There was a report that Spotify is writing lots of junk data to users’ local drives. Some people have noted GB a day or writes, even when they aren’t using Spotify for listening to music. Some debugging by users show that a local database (SQLite) is getting a number of maintenance calls to the database that are being repeated over and over. This appears to be a bug, and one that is hopefully being fixed. I know it hasn’t hit me, so perhaps there are more factors in play than just using Spotify.
This does bring to mind a few things to me that we should be careful of as software and data professionals. The first is that while bugs will creep into our software, being able to deliver fixes is important. I’ve seen updates made to Spotify in a smooth fashion, one that doesn’t seem to require much effort from me. While I don’t usually get to decide when these updates will apply, and they can be annoying if I reboot and attempt to use the application, this isn’t a critical piece of software for me. For some of us, though, our users can be impacted if we don’t allow them the chance to choose when updates occur, or at least, when the final time is when the old version might no longer be supported (or stop working).
The bigger issue, to me, is that the local database is being used in a way that users don’t expect. There’s likely a bug here, causing lots of additional activity, but what’s a normal level of activity? What would our users expect? Do they even know that we’re keeping data locally on their system? In many cases they might not care, or even assume we are, but we should disclose this to our users. Security concerns over data leakage or loss may be an issue in some environments, and it’s important that we ensure our users are aware, or at least they can find, information about the data our software may collect, store, and use.
Data becomes more important all the time, and we are constantly finding new issues surrounding its capture, storage, security, and use. The way we even think about data is somewhat immature. Data ownership, protection, lifetime, and more are concerns. The problem is that many of us would struggle to articulate how exactly we want to handle the data that is related to our lives. In fact, we may have vastly different ideas on how different types of data are handled. Perhaps we don’t care about our musical listening habits, but we care deeply about someone learning about our reading history.
If you develop software that uses a database, and many of you do, it would be nice to document (maybe self document with code) the ways in which data is stored and managed. However, if you’re systems are like mine, there might not even be any policies on how data is managed and handled over time. That’s something I hope changes in my lifetime and we start to treat data like a very valuable asset that can impact our lives.