Statistical Protection

Statistics are being used more and more, and many of us don't understand the lack of security, even in anonymized data.

The things people can do with data are amazing. I remember reading about the anonymized data set released by Netflix and how some of the people in it were identified based on other, related actions on the Internet. This de-anonymization, while scary, was amazing to me. There have been reports of similar “attacks” against other data sets. These reports make me worry that we will have more data security issues in the future, not fewer.

I ran across an article that talked about protecting data in statistical databases. These are the databases that contain data from multiple sources, and are used to analyze the information from these sources. The security of these databases becomes important when the data contains information about individuals that we consider sensitive. Interestingly enough, it seems that the security protections being used are query restrictions.

However, these restrictions are the reverse of what we might expect. There might be a minimum on the number of rows a query must cover, to try to prevent information about a specific individual from being returned. There are also limitations on the types of queries that can be run, usually requiring aggregate functions in the query and restricting which aggregates are allowed.
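To make the idea concrete, here is a minimal sketch of those two restrictions in Python. Everything in it is an assumption for illustration: the names `restricted_query`, `QueryRefused`, `MIN_ROWS`, and `ALLOWED_AGGREGATES`, the threshold of five rows, and the sample data are all made up, and a real statistical database would enforce its policy inside the database engine rather than in application code.

```python
from statistics import mean

# Hypothetical policy values, chosen only for this example.
MIN_ROWS = 5
ALLOWED_AGGREGATES = {"count": len, "sum": sum, "avg": mean}


class QueryRefused(Exception):
    """Raised when a query violates the restriction policy."""


def restricted_query(rows, predicate, aggregate, column):
    """Return an aggregate over the matching rows, or refuse the query."""
    # Restriction 1: only whitelisted aggregate functions may be used.
    if aggregate not in ALLOWED_AGGREGATES:
        raise QueryRefused(f"aggregate {aggregate!r} is not allowed")

    values = [row[column] for row in rows if predicate(row)]

    # Restriction 2: refuse queries that cover too few rows, since the
    # answer could reveal a value belonging to one person.
    if len(values) < MIN_ROWS:
        raise QueryRefused("too few rows match this query")

    return ALLOWED_AGGREGATES[aggregate](values)


# Made-up sample data: six people in one zip code, one in another.
people = [
    {"zip": "80202", "salary": 50000},
    {"zip": "80202", "salary": 60000},
    {"zip": "80202", "salary": 70000},
    {"zip": "80202", "salary": 80000},
    {"zip": "80202", "salary": 90000},
    {"zip": "80202", "salary": 100000},
    {"zip": "80301", "salary": 120000},
]
```

With this sample data, asking for the average salary in zip code 80202 is answered because six rows match, while the same question for 80301 raises `QueryRefused`, since the "average" there would simply be one person's salary. A request for an aggregate outside the allowed set, such as `"max"`, is refused as well.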

This is definitely an area of our industry that needs more work and research. Lots of organizations, especially government organizations, are being called on to open their databases up to the public, and many of them are doing so right now, allowing queries of their statistical databases. This might improve the use of this information by the public, but there are plenty of ways in which this data could be misused. If your company wants to open some of its data to clients or customers, you might raise concerns about possible abuses of the database and ask that time and effort be included to try to secure the data, possibly by implementing query restrictions.

Steve Jones

About way0utwest

Editor, SQLServerCentral