As coronavirus goes on, we see more mistakes being made. I’m aware that mistakes happen. This is a unique time. It would be unreasonable for me to be outraged at everything that happened. There are complex decisions that have many factors influencing the outcome. For instance, when to lockdown. People don’t say in, especially if the lockdown is going on for a long time. There are negative effects on the economy. The rate of spread of disease is another complex field of study. I’m glad I’m not making these decisions. However, I’m human and whilst I try not get outraged, there will be times where the action is so bad, and the choices were so obvious, I cannot help myself and this is why I’m writing this post.
There was some data corruption based on the fact that the NHS was using Excel to store and process COVID test data [link] leading to the missing of tests. The reason why I am outraged about this, is because using Excel to process up to a million rows of data in this day and age is just flat out dumb. It reeks with incompetence, and there is frankly no excuse apart from the fact that there are people out there who clearly do not know what they are doing making these decisions. For the rest of this post I am going to go through the completely free alternative that is vastly superior, can run on your computer with a few installs, and has a graphical user interface.
Our approach is an SQL database. The advantage of an SQL database is that it can store a lot more data. It also has tables and relationships, making queries a lot faster and more comprehensive. It also has what we call a schema, which ensures that the data row you are inserting has the right fields and correct format for each field. This ensures that all data is clean. The only reason you should be reaching for Excel is if you have up to a couple of thousand rows and you want to calculate the average of something etc.
Let’s say we’ve been a bit lazy and we’ve let our excel file get to 50,000 rows (even though we’re processing COVID tests for the country, it does not take a PhD in economics to predict that the data is going to get big). We just want to process it on our computer, and we don’t want the hassle of setting up a server etc. This is where docker comes in [link]. It’s a free open source container system that enables you to spin up and run multiple operating systems on your computer.
What’s even cooler, is that companies that maintain the free open source database containers in docker, upload them to dockerhub. You merely just have to type in that you’re doing to use the free open source PostgreSQL database in your docker file, run it, and it will pull down the database container, build it, and start running it on your computer [link].
Ok, so we’ve got a database running, how to we interact with it? There are loads of free, open source graphical user interfaces you can download [link]. Hell, they even have Excel like layouts so you can point and click on cells.
What’s even better is that these user interfaces also support remote connections so you can put you SQL database on a server and have multiple people access it. These graphical user interfaces even have point and click functions where it can ingest Excel files, loop through them and try and insert them into the database. You can even see relationship tables:
Don’t even want to have the hassle of typing a few commands into a terminal for docker, you can buy encrypted SQL databases off the self on Amazon Web Services [link]. With a few point and clicks you can build a database, configure it so it will only accept connections from IP addresses you’ve defined, and then connect to it remotely with your free open source graphical user interface. With a few point and clicks you can get automated backups etc.
What was the excuse for not doing this? Nothing, it doesn’t take any coding ability to set up. You might have to google some SQL scripting but it’s very straightforward. The tools are all free. It’s more secure than a CSV file loaded in excel. And there are point and click graphical user interfaces that can manage the setup and data management. I usually roll my eyes at political outrage. However, this level of incompetence is so insane, I don’t think I can actually stress enough how much of a mess this is. Also, what else is excel being used for in the NHS?
As a side note, even if this is too much for you, there is a disk file type called SQLite [link], this is a file, just like any other file, you can read and write without having any other processes running. However, it still has tables, schemas that enforce correct data fields and formats are inserted, and advanced querying. it can also store a lot more than an Excel file. Again, I cannot stress this enough, there is no excuse for using excel, just sheer incompetence.