Fragmented or Centralized Data

I read a piece recently about the hassles of copying data multiple times for different applications. In my experience, that hasn't been the main problem with data. We don't often replicate data, in a general sense, across different data stores to support different applications. Certainly lots of ETL jobs exist that copy data to new stores for different purposes, which is perhaps what the author is implying.

Protecting data is becoming a greater concern for many organizations. In fact, I'd argue that a number of the high-profile data breaches of the last couple of years involved copying data from some RDBMS to an Elasticsearch server that wasn't secured. Any movement of sensitive data, whether to a warehouse or a Power BI report, should be done securely.
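As a minimal illustration of the unsecured-Elasticsearch pattern (a sketch, not the author's recommendation): many of those exposed servers were simply running with authentication disabled and bound to a public interface. The relevant elasticsearch.yml settings look roughly like this:

```yaml
# elasticsearch.yml -- minimal hardening sketch
# Require authentication and authorization on the cluster
xpack.security.enabled: true
# Encrypt node-to-node traffic
xpack.security.transport.ssl.enabled: true
# Bind only to a local/private interface instead of exposing the node publicly
network.host: 127.0.0.1
```

The exposed instances that made headlines typically had the opposite of each of these: no security layer and a public bind address.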

For years we've had minor issues with data security in Excel worksheets; a similar problem continues to exist in the data stores and reporting tools that hold copies of data. In some sense, this is no different from losing paper reports in the distant past.

The solution proposed in the article is to share data from a single store among more applications. That's been the practice in many places I've worked, with the attendant challenges of additional load and performance pressure on the data store. Modern SQL Server instances can use Availability Groups or, since SQL Server 2017, Kubernetes to scale out and potentially handle the load, but those choices aren't without their own resource costs and challenges.

Ultimately, we aren't going to get away from moving data around. Even if we had no other data movement, we'd still need to populate dev and test environments. While I do think the future of large data workloads will involve less movement, we won't eliminate it. We may build more applications that connect to a single data store, which becomes more likely as our platforms grow more powerful and gain scale-out capabilities to meet workload growth.

We also need to ensure that copies of data made for different purposes are well protected. Most businesses need to develop better skills and habits: limiting sensitive data in dev and test environments, and applying proper access controls to data copies used in production.
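One common approach to limiting sensitive data in dev and test copies is to mask identifying columns before the copy leaves production. A minimal sketch (the column names and salted-hash scheme here are hypothetical, not a specific product's method):

```python
import hashlib

# Hypothetical list of columns that must never reach dev/test in the clear
SENSITIVE = {"email", "ssn"}

def mask(value: str, salt: str = "dev-refresh") -> str:
    """Replace a sensitive value with a stable, irreversible token.

    Salting and hashing keeps the token consistent across a refresh
    (so joins still work) without exposing the original value.
    """
    digest = hashlib.sha256((salt + value).encode()).hexdigest()
    return digest[:12]

def mask_row(row: dict) -> dict:
    """Mask sensitive columns, leaving the rest intact for realistic tests."""
    return {k: mask(v) if k in SENSITIVE else v for k, v in row.items()}

row = {"id": 42, "email": "jane@example.com", "region": "West"}
masked = mask_row(row)
```

Because the masking is deterministic within a refresh, referential integrity across tables survives, while the original values cannot be recovered from the copy.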

Steve Jones

Listen to the podcast at Libsyn, Stitcher or iTunes.

About way0utwest

Editor, SQLServerCentral
This entry was posted in Editorial.

2 Responses to Fragmented or Centralized Data

  1. Tom says:

I almost fell into a bad situation related to data security just a few days ago. We have a database with secured financial information which applies row-level access to all queries. Recently, I was asked by a high-level executive to quickly add summarized financial data to a report. The problem was that everyone with access to the report would then be able to view the financial information (essentially bypassing the security controls in the financial system). I have the access to add the report enhancement, and I was pressured to do it quickly. I probably could have added the financial information without compromising sensitive data, since it would have been at an aggregated level. However, future requests would likely have arisen to expand the data to a more detailed level. I was able to persuade the executive that bypassing the security controls was a bad idea.

    Bottom line: small company + high pressure to move quickly = data mishaps!

This isn't the exact scenario you covered in your post, but I think it fits. Consumers get more tools and technology and come to expect whatever they want in short order. It's easy to compromise security.


  2. way0utwest says:

    It is easy, and in the US, we do it too often.


Comments are closed.