Nearly two decades ago, a manager came to ask me about metadata, specifically what it was. This was a technical manager who understood software well but wasn’t a data expert. I tried to explain the idea of data about data, but at that time the world primarily dealt with client-server technologies, and data was seen as being primarily about a thing. Metadata like types, scales, etc., wasn’t seen as that useful. Descriptive items, like those we might store in the MS_Description extended property, were rare.
Today, we use metadata quite often, and I think in many ways we’ve “consumer-ized” metadata with tagging and other structures that let the users of our software add descriptive attributes to the data they store in our systems. In some sense, we get metadata at a deeper level, almost at the row or cell level, from actual users, as if we were tracking column-level or entity-level markers.
There is a lot of value in this metadata, both to the technical staff and to business users. This article talks about the value of having detailed metadata for data scientists and similar roles, where understanding the meaning of the data can be very valuable. The more focused data scientists and analysts can be, and the more easily they can find data, the more productive they can be.
At Redgate, we work with customers all over the world, but it’s outside the US where we’ve seen the value of metadata grow. Primarily in response to the GDPR, we find many business people want to know about their data: its meaning, its sensitivity, and really, the risk of data loss. This has expanded into our Data Catalog and Data Masker products, which exist to help organizations keep track of their data. It’s a big job, one that isn’t fun, and it often has debatable value when starting from the beginning. It can be a large time sink when one person has to track down the meanings of all the data in databases. It’s also not a very interesting job.
Many find, however, that knowing this information can be quite useful when it’s available. Archive plans, decisions on masking data, even choosing technologies for protection become more focused when you know what data you have. We continue to research and work on finding ways to improve the help software gives users in applying this metadata, as do many others. Privacy and compliance have driven a lot of work in this area, but potentially data science, data lineage, security, and more could find ways to use this information in all our organizations.
The one thing I think is important is that we maintain this information as a part of our regular work. Customers might do some metadata work themselves (which can be extremely valuable information), but asking most people to classify and tag lots of information is a mind-numbing job. It’s better if we can tag sections of data as we add features to software, whether that’s done by developers or by someone else as a part of the software process. That way, when someone goes looking, most data is already partially tagged.
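As a rough illustration of tagging as part of regular work, here is a minimal Python sketch of one way a team might keep tags next to the schema and flag anything a new feature left untagged. Everything in it is hypothetical: the `ColumnTag` structure, the `COLUMN_TAGS` registry, the `untagged_columns` helper, and the table and column names are made up for this example and don’t reflect any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnTag:
    """Hypothetical descriptive metadata for a single column."""
    description: str
    sensitivity: str  # assumed labels, e.g. "public", "internal", "personal"


# Hypothetical registry, updated in the same change that adds a column.
COLUMN_TAGS = {
    ("Customer", "Email"): ColumnTag("Primary contact email", "personal"),
    ("Customer", "CreatedDate"): ColumnTag("Date the account was created", "internal"),
}


def untagged_columns(schema, tags):
    """Return the sorted (table, column) pairs in schema that have no tag."""
    return sorted(
        (table, column)
        for table, columns in schema.items()
        for column in columns
        if (table, column) not in tags
    )


# A made-up schema where a new feature added LoyaltyTier without tagging it.
schema = {"Customer": ["Email", "CreatedDate", "LoyaltyTier"]}
missing = untagged_columns(schema, COLUMN_TAGS)
print(missing)
```

A check like this could run as part of a build or code review, so the person adding a column is the one prompted to describe it, rather than leaving it for someone to classify in bulk later.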
Listen to the podcast at Libsyn, Stitcher or iTunes.