As a software engineer, I’ve been a little obsessed with data. However, I’ve held off telling everyone how important it is. I’m still scarred by my days working in the emergency department. Every now and then a specialist would give a talk implying their field was the most important. If we had a patient with X, Y, Z, we’d have to think A, B, C in relation to their field and refer to them (though in practice their juniors didn’t like trigger-happy referrals). My days now consist of manipulating raw data and calculating things, so I’m well aware of my bias. After my recent experience, though, I can’t keep silent anymore. Even if you’re non-technical, if you have a role in business strategy, considering data is a must.
This realisation came when I agreed to help a friend out with his side project. In his spare time he designed and sold pins. He’d buy pin components, make the pins in his bedroom, and sell them online. He didn’t plan for it to scale so I can’t be too hard on him, but his story illustrates what I’m trying to say. He needed to keep track of stock based on the components that made up each pin sold. While it was a small side project, this wasn’t a big hassle. He’d manually type entries into an Excel spreadsheet, and because he was making the pins himself his brain could quickly map pins to components, so it took a couple of seconds to log a sale and update the stock. However, he’s really good at designing and making pins. The orders rose into the thousands. Museums and events started placing orders, and he ended up spending hours a day on data entry. When it got to this point, he contacted me asking for help. Writing software that logged the stock of pin components and tracked pricing? Yeah, sure. Should be a nice fun side project, right?
…… not exactly. Hours burned coding till 2am, and 2 months later we got to a solution. So, what happened? Well, technically, nothing was that hard. One person logging into a server and processing data doesn’t send chills down an engineer’s spine. The one user had direct communication with me, so an intuitive front-end, automated emails etc. weren’t needed. The pain point was the data structures. Whilst he could map things in his head, there was no coherent way of mapping one object to another in the data. We had to introduce hashmaps for each inventory item so that when a newly sold item was entered, it mapped to all of its components. Also, manually entered data isn’t great. Formats, spellings, and characters vary when there’s loads of it. A lot of extra code was needed to make sense of this and process it. Sometimes whole days were burnt writing scripts that went through the thousands of rows to categorise the variations of data the Excel files housed. It was kinda like going round to a friend’s house to put up a simple shelf, only to find out that you have to repair the wall first. Tech is built on data. If your data process is terrible, you will have to do a lot of repairs.
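To give a feel for the hashmap idea, here is a minimal Python sketch. It is not the code we actually wrote: the pin names, component names, quantities and the helper functions `normalise` and `record_sale` are all made up for illustration. It shows a lookup from each pin design to its components, plus a small clean-up step so that differently typed versions of the same name land on the same entry.

```python
import re

# Hypothetical bill of materials: each pin design maps to the components it uses.
PIN_COMPONENTS = {
    "moon pin": {"enamel blank": 1, "butterfly clutch": 1, "backing card": 1},
    "cat pin": {"enamel blank": 1, "rubber clutch": 2, "backing card": 1},
}


def normalise(name: str) -> str:
    """Collapse case, punctuation and spacing so 'Moon-Pin ' matches 'moon pin'."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", name.lower())
    return cleaned.strip()


def record_sale(stock: dict, pin_name: str, quantity: int) -> None:
    """Subtract the components used by `quantity` pins from `stock` in place."""
    key = normalise(pin_name)
    if key not in PIN_COMPONENTS:
        # Fail loudly rather than silently logging a sale we cannot map.
        raise KeyError(f"Unknown pin design: {pin_name!r}")
    for component, per_pin in PIN_COMPONENTS[key].items():
        stock[component] -= per_pin * quantity


# Made-up starting stock levels.
stock = {
    "enamel blank": 100,
    "butterfly clutch": 50,
    "rubber clutch": 50,
    "backing card": 100,
}

record_sale(stock, "  Moon-Pin", 3)  # messy manual entry still resolves
record_sale(stock, "CAT PIN", 2)
print(stock)
```

The point is that the mapping lives in one place instead of in someone’s head, and anything that can’t be matched is rejected at entry time rather than cleaned up thousands of rows later.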
So what can you do? It seems a little premature and expensive to have custom-built software made for every project. However, you need to treat your data seriously, otherwise it’s going to hurt you when you need it. First things first: get rid of unfiltered manual data entry. This can be done with no coding. Google Forms is a good, free, quick solution. Where there are categories, define them and offer a dropdown or checkbox menu. This stops variations of the same category sneaking in. At this stage, you should also consider how one object maps to another. You don’t need any technical skill, just an understanding of how the things you’re recording relate to each other. For instance, if you’re storing doctors, you have their name, age, etc. in one table. However, for something like speciality, you have a separate speciality table that is linked to the doctor table. That way you can add a speciality, search doctors by speciality, and assign multiple specialities to one doctor. You do the same with hospital, and have another table for department. Then, if a department moves to another hospital, you just change the hospital allocation in that department’s row of the department table. You should map out your data in a diagram like the following:
As you can see, a campaign can have multiple goals, managers, sponsors and organisations. Therefore, if we change the campaign ID for a campaign manager, all the goals, sponsors etc. associated with the manager’s new campaign can be looked up. This increases your flexibility, and reduces the amount of data entry needed for each row. You can create data models in Excel [link]. If you can, move to an SQL-based storage system. These check inserts against the data types and constraints you define, and the data can be imported into a range of different software platforms. If you don’t want to run an SQL server, SQLite is a good start as it just stores everything in a single file. It’s free, open source, and you can download a range of graphical user interfaces for it.
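If you’re curious what the doctor example from earlier might look like in SQLite, here is a rough sketch using Python’s built-in `sqlite3` module. The table layout is my own assumption (a doctor belongs to a department, a department belongs to a hospital, and a link table joins doctors to specialities), and the names and helper functions are invented. One caveat: SQLite is more relaxed about column types than most SQL servers unless you declare a table `STRICT`, so the sketch leans on `NOT NULL` and foreign-key constraints instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead to keep the data
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default

conn.executescript("""
CREATE TABLE hospital   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         hospital_id INTEGER NOT NULL REFERENCES hospital(id));
CREATE TABLE doctor     (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         department_id INTEGER NOT NULL REFERENCES department(id));
CREATE TABLE speciality (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Link table: one row per (doctor, speciality) pair, so a doctor can have many.
CREATE TABLE doctor_speciality (
    doctor_id     INTEGER NOT NULL REFERENCES doctor(id),
    speciality_id INTEGER NOT NULL REFERENCES speciality(id),
    PRIMARY KEY (doctor_id, speciality_id));
""")

# Made-up sample data.
conn.executemany("INSERT INTO hospital (id, name) VALUES (?, ?)",
                 [(1, "North General"), (2, "South General")])
conn.execute("INSERT INTO department (id, name, hospital_id) VALUES (1, 'Cardiology', 1)")
conn.execute("INSERT INTO doctor (id, name, department_id) VALUES (1, 'Dr Patel', 1)")
conn.executemany("INSERT INTO speciality (id, name) VALUES (?, ?)",
                 [(1, "Cardiology"), (2, "Electrophysiology")])
conn.executemany("INSERT INTO doctor_speciality (doctor_id, speciality_id) VALUES (?, ?)",
                 [(1, 1), (1, 2)])


def hospital_of(doctor_name):
    """Follow doctor -> department -> hospital and return the hospital name."""
    row = conn.execute("""
        SELECT hospital.name FROM doctor
        JOIN department ON department.id = doctor.department_id
        JOIN hospital   ON hospital.id = department.hospital_id
        WHERE doctor.name = ?""", (doctor_name,)).fetchone()
    return row[0]


def doctors_with(speciality_name):
    """Return the names of all doctors linked to the given speciality."""
    rows = conn.execute("""
        SELECT doctor.name FROM doctor
        JOIN doctor_speciality ON doctor_speciality.doctor_id = doctor.id
        JOIN speciality ON speciality.id = doctor_speciality.speciality_id
        WHERE speciality.name = ? ORDER BY doctor.name""",
        (speciality_name,)).fetchall()
    return [r[0] for r in rows]


# The department moves: one row changes, and every doctor in it follows.
conn.execute("UPDATE department SET hospital_id = 2 WHERE name = 'Cardiology'")
print(hospital_of("Dr Patel"), doctors_with("Electrophysiology"))
```

Nothing about the doctor’s own row was touched when the department moved, which is exactly the reduction in data entry the diagram is meant to buy you.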
These small steps pay massive dividends in the future. In fact, the cost of skipping careful planning in tech is so big that we have a name for it: tech debt. Seasoned engineers know that skimping on planning and structure now has negative consequences that compound the longer it’s left. You definitely pay later. In my day job, I’ve worked at multiple tech companies. All of them have set aside time, sometimes months, for me and other engineers to refactor and sort out tech debt. The flat-out failures I’ve seen in the startup world were usually led by non-technical people. This doesn’t mean that non-technical people can’t succeed. However, they run the risk of not understanding the damage tech debt can do. The ones that failed hired cheap developers and tried to push for features quickly. Initially it works. Their app is technically running. However, soon their app is crashing, server bills shoot up into the thousands per month, and new features are buggy and take ages to implement. I’ve seen it so many times I’d happily put money on it when I see it.