Many of us who work with data will face requests to import or export data at some point. Plenty of us have regular processes that perform these actions, and we may regularly troubleshoot or enhance them. In fact, I know some people have a full-time, or nearly full-time, position just dealing with ETL operations.
Working with data in disparate formats, and with the myriad inconsistencies that appear even when the formats are known, is a challenge. Integration Services is a useful tool, but many of us find that we need to pre-process or post-process data separately from a simple import or export. Some of us may prefer using T-SQL or other languages, such as R or Python, to process data rather than programming SSIS. I often find that every client wants a slightly different format or change to their data that a simple query export won’t handle.
These days, as we add Machine Learning and other downstream processing activities, there is more and more of a need to process data beyond imports and exports. After all, the majority of the time in any ML project seems to be spent preparing and transforming data. In addition, Article 15 of the GDPR notes that a data subject has the right to request a copy of the data relating to them when it is being processed by an organization. I don’t know how often someone will want to get data about themselves or their organization, but I’m sure it will happen more than it does today.
I think this means I’ll need to brush up on my ETL skills, perhaps to ensure I can easily extract a copy of an individual’s data. In fact, I probably should compile some scripts now to ensure I can let someone know what information we keep at SQLServerCentral that would fall under GDPR. I think it’s just email addresses, but I could be wrong.
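As a rough idea of what one of those scripts might look like, here is a minimal Python sketch that gathers every row tied to a single email address and returns it grouped by table. Everything specific in it is an assumption for illustration: the `members` and `newsletter_subscriptions` tables, their columns, and the `export_subject_data` and `build_demo_db` helpers are hypothetical, and it runs against an in-memory SQLite database rather than SQL Server so it stays self-contained.

```python
import json
import sqlite3

# Hypothetical mapping of table name -> column holding the subject's email.
# These names are illustrative only and do not describe any real schema.
SUBJECT_TABLES = {
    "members": "email",
    "newsletter_subscriptions": "email",
}


def export_subject_data(conn, email):
    """Return all rows tied to one email address, grouped by table name."""
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
    export = {}
    for table, column in SUBJECT_TABLES.items():
        # Table and column names come from the fixed mapping above, never
        # from user input; the email value is passed as a bound parameter.
        query = f"SELECT * FROM {table} WHERE {column} = ?"
        rows = conn.execute(query, (email,)).fetchall()
        export[table] = [dict(row) for row in rows]
    return export


def build_demo_db():
    """Create a small in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE members (email TEXT, display_name TEXT);
        CREATE TABLE newsletter_subscriptions (email TEXT, newsletter TEXT);
        INSERT INTO members VALUES ('pat@example.com', 'Pat');
        INSERT INTO members VALUES ('sam@example.com', 'Sam');
        INSERT INTO newsletter_subscriptions VALUES ('pat@example.com', 'daily');
        INSERT INTO newsletter_subscriptions VALUES ('pat@example.com', 'weekly');
        INSERT INTO newsletter_subscriptions VALUES ('sam@example.com', 'daily');
        """
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    # Write the subject's data as JSON, one key per table.
    print(json.dumps(export_subject_data(demo, "pat@example.com"), indent=2))
```

Keeping the list of tables in one mapping means the script doubles as a simple inventory of where personal data lives, which is probably the harder part of answering a request like this.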