There is plenty of debate over how to design your database. At SQLServerCentral we have a Stairway Series as well as a few articles that cover design topics. I think it’s important for anyone that builds tables to spend some time learning what others have done and understand the pros and cons of making different choices. It does often become hard to change designs once they are in use, so trying to choose a good entity design early is important.
One of the things I think is important in modeling your particular entity is including a primary key (PK). In my DevOps talk I stress this, as I’d rather most attendees come away thinking a PK is important as their first takeaway from the session. There are exceptions, but they are rare, and I would prefer that most tables just have some PK included from the beginning.
A PK ought to be stable as well, and there are plenty of written words about how to pick the PK for your particular problem domain. Often I have received the advice that natural keys are preferred over surrogate keys, and it is worth the effort to try and identify a suitable column (or set of columns) that will guarantee uniqueness. I think that’s good advice, and it’s also advice I tend to ignore.
There’s an interesting article about keys and the GDPR. The first part is a rather basic description of what PKs are, but the second part talks about keys and some of the rights that data subjects have under the GDPR. I think these are worth considering, especially as it’s likely similar legislation will make its way into other jurisdictions, as already seen in California. The short part of the argument is that the right to be forgotten or to have your data deleted is incompatible with the use of natural keys.
It’s an argument, though I’m not completely sure if I think it would be solid. There are valid reasons to keep some information about a user, and I suspect keeping a list of emails to delete from a database restore as a separate list would be a valid use. Even if the user asked that their information was removed. It would be, but there would also be a need to ensure that the correct data was removed, hence a list of emails.
The bigger problem for me is that if I needed to redact or alter this key data, which I would likely do in order to keep some integrity in my database, I’d need to alter this data in lots of tables. That makes for a much more complex set of scripts, including ensuring that I am correctly building a map of the new values I would use for a key. It’s much easier to have a surrogate key that doesn’t change and just redact the other information.
I’m sure there are arguments both ways, but as we move towards the era of not only seeing data as valuable, but also as an asset we can’t completely control, I think surrogate keys make more sense now than ever. Let me know if you agree.