I ran across an article, titled When to use CHAR, VARCHAR, or VARCHAR(MAX), which struck me as something I never do. I mean, I do use varchar (and nvarchar), but I can’t remember the last time I actually created a char column. The article is worth a read, and it sets the stage for you to think about your database design process and the choices you make.
It’s Friday, and if you take a few minutes and think about the last few times you’ve added columns to a table have you used CHAR as a data type? Or do you default to varchar of any size as a general rule?
I tend to do a lot of demo work, and I help customers with different situations. In many cases, we are storing text data, often not strongly typed data. As a result, I find most customers using varchar (or nvarchar), and I’ve built the habit of using the variable structures in proofs-of-concept and demos. I find it especially handy when someone asks me to enter some data they use and then show how it would be handled.
Early in my career, I’d often tag a zipcode as a 5 character field, or a state as a 2 character abbreviation. However, these days a postal code can be a 5+4, which is 9 characters or even 10 with the plus. Many companies work overseas and may want to account for longer postal codes. States (or regions), can often be 3 characters, but sometimes more. Often we just leave 10 characters for region abbreviations (or longer) as the data might be spelled out or need to accommodate something unknown.
For many business applications, it seems that there might be a definition for what the data should be, but since exceptions can abound, often using a variable-length data type just prevents issues in the future. Add that to the fact that often we are dealing with cheap storage, and it doesn’t seem worth the time to try and get the exact size correct. Even when knowing an invoice uses 10 characters, are you sure that you won’t exceed the ten-character width? What if you acquire a company that uses 12 character invoice numbers? Easier to set this to a variable 20 and move on.
If you work in data warehousing, then you might know what your data sources contain and be more likely to choose fixed types, but is the space savings worth the work in the event that source systems change? I don’t know. I tend to plan for sources to change and allow a little padding in my schema. You might feel differently, but are the space savings worth the potential hassles in the future? I’d be curious what you think today.
Steve Jones
Listen to the podcast at Libsyn, Stitcher, Spotify, or iTunes.
Good question. The only times I ever really remember using CHAR are when (1) I had data with a fixed number of characters, or (2) I had data that required space padding. Of course, how often do those scenarios come up?
LikeLike
That’s what I have been seeing more lately. We don’t have a lot of “fixed” sizes.
LikeLike
doesn’t it depend on the size of the database? I would think lots of character fields in a large table could add it in terms of storage space and database backups and indexes. Also, isn’t there a performance gain by using number fields like integers if the field is supposed to contain only numbers?
LikeLike
Certainly data size could matter, however many data warehouses, with billions of rows, use compression, so a lot of space isn’t being used with variable columns, but it is with fixed width ones.
You definitely need to use the proper data types. This is only for string columns, not numerical or date fields.
LikeLike
Pingback: Using CHAR – Curated SQL
I like fixed-width types for things like country codes, currency codes, license plates (at least in my country). Not so much because it saves a tiny little bit of space, but because it appeases my fundamental desire for strong typing.
A banking client of mine used nvarchar(35) for account numbers, even though account numbers were entirely numeric, 10-12 characters. That just hurt my eyes whenever I looked at it – but sure *slaps roof* you could probably fit a lot of poop emojis into that account number, I guess.
LikeLike