I’ve been working through some of the GDPR legislation, trying to find ways to explain it more clearly to customers and to make sure our products make sense once this law takes effect. Redgate is focused on this area: not only do we need to be compliant ourselves, but we also want to build tools that help you stay compliant.
In Article 14, there’s this text: “the controller shall provide the data subject with the following information… the existence of automated decision-making, including … meaningful information about the logic involved.” That sounds a little concerning for those of us who work with data. It’s not always easy, but we can explain how a SUM or AVG function works, even with a complex OVER() clause and lots of joins and criteria.
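To make that concrete, here’s a minimal sketch in Python of why an aggregate is explainable. It emulates something like AVG(amount) OVER (PARTITION BY customer ORDER BY day), assuming a ROWS frame from the start of the partition to the current row, and records exactly which values fed each result. The function name, column names, and sample rows are all hypothetical, made up for illustration.

```python
from collections import defaultdict


def running_avg_with_trace(rows, partition_key, order_key, value_key):
    """Running average per partition, with the inputs behind each result.

    Assumes a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    """
    # Group the rows by partition value, like PARTITION BY.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    results = []
    for key, part in partitions.items():
        # Order the rows inside each partition, like ORDER BY.
        ordered = sorted(part, key=lambda r: r[order_key])
        seen = []
        for row in ordered:
            seen.append(row[value_key])
            results.append({
                "partition": key,
                "order": row[order_key],
                "avg": sum(seen) / len(seen),
                "inputs": list(seen),  # the "logic involved" for this row
            })
    return results


# Hypothetical sample data.
orders = [
    {"customer": "A", "day": 1, "amount": 10.0},
    {"customer": "A", "day": 2, "amount": 30.0},
    {"customer": "B", "day": 1, "amount": 5.0},
    {"customer": "A", "day": 3, "amount": 20.0},
]

for r in running_avg_with_trace(orders, "customer", "day", "amount"):
    print(f"{r['partition']} day {r['order']}: avg={r['avg']} from {r['inputs']}")
```

Every number that comes out can be traced back to a short list of input rows and one arithmetic step, which is the kind of explanation we could hand to a data subject.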
What do we do with a model running under SQL Server Machine Learning Services? The output from those scripts and models often comes with no obvious way to determine how the results were reached. The requirement to explain is enshrined in a law, one that many people are concerned about. With all the ways that ML and AI systems can be gamed, and the biases they may pick up from the data used to train them, I can certainly see no shortage of people asking for explanations of decisions or conclusions.
Fortunately, Article 14 also has this part: “the provision of such information proves impossible or would involve a disproportionate effort …” That seems to give companies an out if they are using current machine learning systems that are largely a black box. Certainly organizations are still charged with protecting the data subject’s rights and freedoms, but this seems to allow for the use of technologies that we can’t quite understand.
I doubt this was the intention of the authors, though I do hope the regulation doesn’t prevent the use of newer tools and technologies. What I’d like to see is more research into how the various algorithms we want to use for ML and AI actually work, perhaps with some more detailed analysis of the inner workings of the models.
GDPR is going to be an interesting regulation that may have dramatic impacts on the world of data. I’m both excited and concerned to see how things move forward from here. Hopefully this results in better and more responsible data handling and doesn’t degenerate into a series of long-term legal battles.