A few months ago I saw this article, which talks about engineers at Facebook not knowing where their customers’ personal data is stored. The engineers were being questioned in a legal matter and were asked to state definitively where all of the personal data (PII) for any given person was stored by Facebook. Their answer was that they didn’t think anyone in the company would be able to answer that question.
Facebook has been controversial over the years, and plenty of people dislike the way the company conducts business. I noticed no shortage of data people (and many others) commenting on this situation, saying that Facebook should be shut down because they don’t know where data is being stored.
However, I don’t agree. Having worked with lots of customers on all aspects of how they handle, process, and manage data, I expect this to be a problem in many organizations. Whether large or small, whether they have few or many software engineers, it is quite possible that there isn’t a good list of where personal data is being stored. When we work with customers to classify data with SQL Data Catalog, the process takes a long time, and very often the system administrators or developers who undertake the task are unaware of all the places where data is stored.
That’s just in relational databases, ignoring all the Excel spreadsheets, text exports, mail merge operations, and uploads to services for mailing, analysis, or something else. Very often the control of personal data is fragmented among groups, with few efforts made to coherently manage a customer’s data.
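For the relational part of the problem, a first pass at finding candidate columns can be automated. The sketch below is only an illustration of the idea, not how SQL Data Catalog works: it assumes a SQLite database, and the name patterns, the `find_pii_candidates` function, and the demo tables are all hypothetical. Matching on column names will miss personal data hiding in generically named columns, so treat the output as a starting list for a human to review.

```python
import re
import sqlite3

# Hypothetical name patterns that often hint at personal data; tune for your schema.
PII_PATTERN = re.compile(
    r"(name|email|phone|address|birth|dob|ssn|passport)", re.IGNORECASE
)


def find_pii_candidates(conn):
    """Return (table, column) pairs whose column names look like personal data."""
    candidates = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            if PII_PATTERN.search(row[1]):
                candidates.append((table, row[1]))
    return candidates


def demo():
    # Assumed example schema, created in memory purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, full_name TEXT, email TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
    return find_pii_candidates(conn)


if __name__ == "__main__":
    for table, column in demo():
        print(f"{table}.{column}")
```

Even a crude list like this tends to surface tables nobody remembered, which is usually where the real classification conversation starts.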
The world has adopted computing at an incredibly fast pace, often driven by people with little knowledge of, or forethought about, the implications of gathering and processing data. In many cases, probably most cases, there is no overriding strategy. Just as applications get slapped together quickly, data gets gathered and stored based on the requirements and demands of business people, with no planning for management or archival, and often without any security requirements.
I liked the GDPR as a step forward, asking companies not only to handle data appropriately, but also to not use it without consent, to keep track of it, and to delete it when it is no longer needed. I don’t know that this has been successful, but it has changed handling practices in some organizations, at least the responsible ones, and many of them have had to track down personal data to delete it. I’m not sure they know where it all is, but I at least assume they know where all of the data about a person is in their various relational stores.
As a technical person, do you know where all the data about a customer is stored? Are you sure you know where marketing has been keeping information, and in which other mailing, analysis, reporting, CRM, and similar systems they’ve put data? Any idea how many copies the operations group keeps? What about test systems, QA, UAT, and others? Are the test data sets sanitized? Perhaps legal or finance has gotten extracts of data to reconcile their systems.
Tracking down all data can be hard, and I’m not surprised Facebook struggles. I would guess engineers in many organizations would have similar answers.