Common SQL Server Mistakes – GUID as a Clustered PK

I haven’t been thrilled with GUIDs as primary keys, mainly because I think that it’s hard for humans to work with GUIDs. A GUID, or uniqueidentifier, looks like this:
ECB6ECB4-ACCB-4382-84D1-19990D59CA2F
Not exactly something I want to try and type or include in a query. Cut and paste works, but it’s cumbersome. Much easier for me to work with integers.
I understand that GUIDs have some good advantages. They can reduce round trips, allowing the client to build a primary key and send it to the server. That’s a nice performance trick, and one I’d encourage.
The real issue, however, is when you make a GUID a primary key on your table, using the defaults. Most people use the defaults, and that’s typically OK. However in this case the defaults cause a problem.
The default setting for a primary key is a clustered index. For an integer, especially with the identity property, this is OK. All new rows are added to the end of the index, in new space allocations. This creates a hot spot for heavy insertions, but SQL Server handles those OK.
For a GUID, if I create new rows, I get values like this. These are three new GUIDs I created on my local instance.
ECB6ECB4-ACCB-4382-84D1-19990D59CA2F
3406A5AE-A963-48A6-B2FC-03197DC72478
C5D75C4F-D9EA-4355-A025-2FCC541D6E1E
If you examine these values, you’ll see that they appear to be random. That’s OK, and it can be a good thing. But for inserting new values, that means that item 3 would be inserted before item 1, and that can cause page splits.
Page splits are bad for performance. Data has to be moved to a new page, so not only are you inserting xx amount of data onto a page, you might be moving yyy data to a new page. It’s entirely possible that yyy > xx, which could be really bad.
There are a number of more technical explanations in the references below, but there really is a penalty there. This is in addition to the extra space (16 bytes v 4 bytes for an int). That’s less of an issue, but it’s still an issue.
The other thing is that all this page splitting creates fragmentation. So not only are your inserts slower, but potentially your read queries are also slower.

What can you do?

I think that the first thing you ought to do is read some of the articles below, and consider if you really want to use a GUID as a PK. If you do this…
GUID_a
then do this:
GUID_b
That will at least minimize some of the performance issues that you might have.
The other thing you can do on the server, if you are generating the keys with SQL Server, you can use NewSequentialID, which should generate sequential GUIDs, in the same manner that the identity property builds sequential numbers. There are some potential issues, so don’t assume these will always be sequential, especially if you generate some on .NET, but this is better than a clustered index on a GUID.
Be careful when using defaults, and if you use GUIDs, make sure that it is a good choice for you.

References:

A few posts from around the web on the issues of GUIDs as clustered primary keys.

About way0utwest

Editor, SQLServerCentral
This entry was posted in Blog and tagged , , . Bookmark the permalink.