Data Modeling Information

Data modeling is something we should all be doing when altering the schema in our databases. I’d like to think that most people spend time here, but I don’t believe that’s the case. Plenty of people think “I need to store a piece of data,” pick a string or numeric datatype, and start stuffing in values; if in doubt, they just pick a string. That, I suspect, is why we have so many dates stored in string columns: a string was someone’s first thought.

There was a post recently that talked about storing data in its highest form. It was interesting to me because these are the kinds of decisions I try to make when designing a table. What is the best form in which to store data? The authors talk about picking not only a type that converts easily, but also the fields that make it easiest to work with the data in different ways.

I do think that the aggregations or calculations we need to perform should influence the data type. If you are measuring something, use a numeric type. In fact, in their example of movie times, an integer is probably the best type. While many databases and languages have time datatypes, some represent a measure of time (a timespan), while others represent a clock (the T-SQL time type). Either might work for movies, but in aggregations, the T-SQL time type runs into trouble beyond 24 hours. An integer is a better choice, assuming we don’t care about seconds.
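A quick sketch of that difference in T-SQL (the table and values here are hypothetical, not from the post):

```sql
-- Store movie runtimes as integer minutes: aggregation is trivial.
CREATE TABLE dbo.Movie (
    MovieId    int IDENTITY PRIMARY KEY,
    Title      varchar(200) NOT NULL,
    RuntimeMin int NOT NULL          -- whole minutes
);

INSERT INTO dbo.Movie (Title, RuntimeMin)
VALUES ('Film A', 142), ('Film B', 95), ('Film C', 171);

SELECT SUM(RuntimeMin) AS TotalMin,  -- 408 minutes, no wrapping
       AVG(RuntimeMin) AS AvgMin
FROM dbo.Movie;

-- A time column, by contrast, cannot be summed directly (SUM is not
-- valid for the time type), and converting a total back to time
-- wraps at 24 hours:
SELECT DATEADD(MINUTE, 1500, CAST('00:00' AS time));  -- 01:00, not 25:00
```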

The second part of the post looks at multiple values, in this case customer loyalty points earned and redeemed. A simple running sum is what we might store in a database, though the application class might need two fields. Of course, modern software often totals these things for a customer as part of gamification and an inducement to engage more, so maybe a data store would also want to keep the totals earned and redeemed, with a calculation to show the balance.
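The earned/redeemed split might look like this in T-SQL (table and column names are my own, not the authors’):

```sql
-- Sketch: store earned and redeemed separately; derive the balance.
CREATE TABLE dbo.CustomerLoyalty (
    CustomerId     int NOT NULL PRIMARY KEY,
    PointsEarned   int NOT NULL DEFAULT 0,
    PointsRedeemed int NOT NULL DEFAULT 0,
    -- Computed column: the balance is always derivable, never stored stale.
    PointsBalance  AS (PointsEarned - PointsRedeemed)
);
```

The computed column trades a little CPU at read time for a balance that can never drift out of sync with the two totals it is derived from.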

The one thing I might add for developers to a post about modeling is the need to consider operations at scale. Using a bit more or less storage often doesn’t matter for a single row or an operation on a singleton set of data, but when we scale across millions of rows, the little things matter. Consider how your data might be aggregated and what happens when you have millions of rows to work on. There, a better design decision can outperform a poor one by orders of magnitude.

That, and generate lots of data for testing. You ought to know how to quickly mock up a million rows to check your queries, because you might well have a million rows in production.
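One common trick for this in T-SQL is to cross-join a small digits table to itself; six copies of ten digits yield exactly 1,000,000 rows (the target table and columns here are my own invention):

```sql
-- Mock up 1,000,000 rows of test data with a digits cross-join.
WITH Digits AS (
    SELECT n
    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS d(n)
)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowId,
       ABS(CHECKSUM(NEWID())) % 500               AS RuntimeMin  -- random-ish values
INTO   dbo.MovieTest
FROM   Digits a CROSS JOIN Digits b CROSS JOIN Digits c
       CROSS JOIN Digits d CROSS JOIN Digits e CROSS JOIN Digits f;
```

Swap the expression list for whatever columns your test needs; the cross-join itself is nearly instant compared with a row-by-row loop.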

Steve Jones

Listen to the podcast at Libsyn, Stitcher, Spotify, or iTunes.

About way0utwest

Editor, SQLServerCentral

1 Response to Data Modeling Information

  1. And how you’re going to handle NULL values. The current accounting DB we use has so many NULL-enabled columns that shouldn’t be nullable that it can drive one crazy. The DB has 29,493 nullable columns across 1,654 tables, all of them non-empty. (Because of optional modules in the software we use, we also have a lot of tables that are empty; they exist for modules we don’t license or use.) Granted, some of these columns should be nullable, but there are a lot that shouldn’t be, and those cause issues.

    Consistency is another issue on our end. In some tables 1 means TRUE/YES/ENABLED, in others it’s -1, and in a few cases it’s 0. I wish I were joking about that last one.
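    Both problems can be headed off with a NOT NULL bit column and an explicit default, so there is only one encoding for true; a sketch against a hypothetical table:

    ```sql
    -- Hypothetical: standardize flags as bit, NOT NULL, defaulted.
    ALTER TABLE dbo.SomeTable
        ADD IsEnabled bit NOT NULL
            CONSTRAINT DF_SomeTable_IsEnabled DEFAULT (0);
    ```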

    When I first started out with MS Access 97, I wasn’t as adamant about table design as I should have been. But I also wasn’t building a commercially viable product, just something for in-house use so we could transition away from an Excel spreadsheet that was cumbersome for what we were doing.

