I’ve been trying to work on some new presentations so that I have a variety, including some spares, when I go to events. One of the topics that I think has some value, especially for .NET and sysadmin groups, is a list of common mistakes, how to fix them, and why they’re bad.
I was going to call this Common Developer Mistakes, but I’m not sure that would go over well at Developer events, and I see DBAs making these mistakes along with Windows admins.
I decided to build a series of blog posts as I work through the presentation to document some of the issues, and help me work through speaking points. Please feel free to comment.
SELECT * Is For Short Term Use Only
The first mistake that I often see in application code is that too often people write things like
SELECT *
FROM Sales.Customer
.csharpcode, .csharpcode pre{font-size: small;color: black;font-family: consolas, “Courier New”, courier, monospace;background-color: #ffffff;/*white-space: pre;*/}.csharpcode pre { margin: 0em; }.csharpcode .rem { color: #008000; }.csharpcode .kwrd { color: #0000ff; }.csharpcode .str { color: #006080; }.csharpcode .op { color: #0000c0; }.csharpcode .preproc { color: #cc6633; }.csharpcode .asp { background-color: #ffff00; }.csharpcode .html { color: #800000; }.csharpcode .attr { color: #ff0000; }.csharpcode .alt {background-color: #f4f4f4;width: 100%;margin: 0em;}.csharpcode .lnum { color: #606060; }
What does this do? If I run this in my 2008 AdventureWorks database, I get something like this:
You can see that I end up with multiple columns (CustomerID, TerritoryID, CustomerType, rowguid, ModifiedDate). That’s handy, and cool, and allows me to get all the data in the table.
But do I really need it?
In most applications, my guess is that we don’t. Why do we need TerritoryID? That’s a foreign key to the SalesTerritory table, and typically what I want instead is the SalesTerritory.Name column instead of the ID value.
I could do this:
SELECT *
FROM Sales.Customer c
INNER JOIN Sales.SalesTerritory t
ON c.TerritoryID = t.TerritoryID
but that’s any better. Now I’ve returned even more columns, 10 more to be exact, including TerritoryID twice, once from each table. In AdventureWorks, this is 19k rows, and at a minimum this query has returned 19k rows x 8 bytes (int data type) too much data. That doesn’t sound like a lot, but what if this runs in your application 500 times a day? That’s a lot of wasted:
- bandwidth
- disk access
- memory from caching
- CPU work on the server AND client
I would also guess that most of the time when you access a customer, you don’t even want all the rows. Likely you want to filter this somehow, and you will with a WHERE clause, but it’s still wasted time and resources.
We know that the database often is a bottleneck. It’s a shared resource, it’s one machine, and it doesn’t scale as easy as multiple clients or web servers, or even developers, so we should avoid wasting resources when we don’t have to.
What Do You Do?
Here’s what I recommend:
You can write this, and it’s what I often do:
SELECT TOP 2
*
FROM Sales.Customer
And I get a limited result set:
Why is this better? I do this so I can easily see the column names. I can then include those in my SELECT statement, with a quick rewrite.
I could also quickly use the Object Explorer to find the columns like this:
And you can right click, and choose “script” and “as SELECT”
and paste the code into your query window. The results would look something like this:
Alternatively, my employer, Red Gate Software, makes a fantastic product called SQL Prompt that will help you quickly grab columns. For me, I can do an SSF, and get a SELECT * FROM and then choose the table:
Not that I see the columns to the right. I could also just select the table with a tab and then if I remove my askterisk, I get a list of columns I can easily pick:
SQL Prompt makes this easier, but it isn’t that hard to just do this by hand. You could easily grab the columns you need from SSMS and add them to queries.
The database is a limited resource, even if you have a 256 core server with 1024GB of RAM. You still want to query the data you need and only return what’s necessary. A little more effort when building code will pay off later with much better performing applications.
References
A few links from other people that see this as an issue as well.