Do you know the difference between XLS and XLSX? They’re both Excel formats, and many of us might just use one or the other. After all, the latest versions of Excel work with both, and if you’ve been using a spreadsheet for years, perhaps you stick with the older format when exchanging data with others.
As many of you might have seen, Public Health England recently learned there is a difference with large amounts of data. They found data was being chopped off in a spreadsheet because they were using the old XLS format, which only supports 65k rows. The newer XLSX format will support a million rows, but both numbers are far below what SQL Server, MySQL, PostgreSQL, and other platforms support. Those platforms support billions, and most are limited only by the storage available.
I know that pandemic has had many groups scrambling to assemble and analyze data. We have people building dashboards and gathering data together in numerous ways, from paper and pencil to Excel to (hopefully) enterprise databases. An import into a relational store would make more sense than Excel, but I also understand that setting a schema, dealing with ETL and different formats from different sources, and other issues are a pain. There is a reason data professionals get paid a lot of money for these tasks.
To me, this highlights one of the issues of working with SQL Server, MySQL, PostgreSQL, etc., in that they are cumbersome and difficult to get started with. Even if scientists chose Cassandra or MongoDB, there would be issues, because there aren’t easy, simple client tools that facilitate work with data sets coming in disparate text files and formats.
I don’t mean to excuse this, because IT professionals should know better. If you’re using XLS, stop. Data volumes increase and you don’t want to realize you’ve hit the limit after data is lost.
I get the ease and convenience of using Excel, but stop using it for major projects once we realize these are important. Once you realize this is data that needs to be intact, secured, and protected, put it in a real platform. Excel, PowerBI, and most tools can query SQL Server.
Use those tools where you need them and where you can, just don’t use them as your database.