The Importance of Data Engineering

I understand that there are plenty of reasons to start a digital project. I fully believe that most of you have seen an area where computers could solve a problem or streamline a process. New advances like machine learning, VR, and blockchain offer people with domain knowledge opportunities to make a change. Sadly, a lot of well-intentioned people never manage to get their projects off the ground. Just over a year ago, I was constantly contacted by a range of doctors who had ideas that they wanted to develop. I was not alone. A friend of mine, a senior researcher in stroke, was swamped with job applicants stating that they were involved in “developing an app”. Within a year, these queries stopped. Curious as to whether this was just a reflection of my replies, I asked him when he came to my flat warming. It was the same for him. He had advertised two new jobs in the new year, and virtually none of the applicants mentioned any involvement in app development. It looks like the hype has died down for now.

Whilst there seems to be less hype at the grassroots level, the political sphere seems to be waking up to the concept of tech in health. Multiple reports are being generated. Vague fellowships, postgraduate posts, and positions with no real goals are being filled, and academics are raving about VR, machine learning, and blockchain. Don’t get me wrong, it’s exciting to see the establishment taking an interest. However, as always, the crucial, less exciting issue is not at the forefront: data engineering is not getting enough focus.

So what is data engineering? Essentially, data engineering is converting data into a usable format, and it encompasses a lot of skills and disciplines. It doesn’t just mean getting data into a readable format that you can see on the screen; it also involves engineering the state of the data behind the scenes, improving the way data is collected, stored, and distributed. When I was on the front line last year, there was a lot of work that still needed to be done. GP systems do not talk to hospital systems, hospital systems do not talk to imaging systems, and half of the data always seemed to be dodgy or missing. If you want to apply machine learning, the concept of garbage in, garbage out will hold you back: if your data is not well labeled or accurate, then you will not get an accurate algorithm. And even if you do manage to create one, how well will it be implemented if the data engineering is bad and you struggle to feed data into it at a production level?
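To make the “systems do not talk” problem concrete, here is a toy sketch of the same patients recorded under different identifier formats in two systems. Every identifier and value below is made up for illustration; real GP and hospital records are far messier.

```python
# Toy illustration: the same patients keyed differently in two systems
# fail to join until the identifiers are engineered into one format.
# All identifiers and values here are hypothetical.
gp_records = {"NHS-0000123": "metformin", "NHS-0000456": "ramipril"}
hospital_records = {"123": 13.2, "456": 11.9}  # same patients, different key format

# Naive join: no keys overlap, so nothing matches.
matched = {k: v for k, v in gp_records.items() if k in hospital_records}
print(len(matched))  # 0

# One normalisation step (strip the prefix and leading zeros) fixes the join.
normalised = {k.removeprefix("NHS-").lstrip("0"): v for k, v in gp_records.items()}
matched = {k: (v, hospital_records[k]) for k, v in normalised.items()
           if k in hospital_records}
print(len(matched))  # 2
```

The fix here is one line; in practice, agreeing on and enforcing a single identifier format across systems is exactly the unglamorous data engineering work the post is talking about.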


Good quality data flow is so important that there are big companies in financial tech making big profits just by processing and passing data to their clients. They don’t analyze or comment; they just clean, process, and send. This offers opportunity. For the basics, you need to collect data, log processes, set alarms for errors, and streamline the data entry process. Then reliable data flows can be established, enabling the system to support structured and unstructured data. Data pipelines can also be developed, transforming data into other desired forms. Once this has been achieved, data cleaning and quality control become possible. After the transforming and cleaning, analytics and metrics are a possibility, and after that you can start experimenting with basic machine learning. Whilst we have this for things such as blood results, concepts like bed management, patient journey flow, staffing, patient notes, and incidents all desperately need data engineering before we should entertain more exciting projects.
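The steps above (collect, log, alarm on errors, clean, transform, then analytics) can be sketched as a minimal pipeline. The records, field names, and thresholds here are hypothetical stand-ins, not any real hospital system.

```python
# Minimal sketch of a data pipeline: collect, log errors, clean, transform,
# then compute a metric. All fields and values are hypothetical examples.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def collect():
    # Stand-in for reading from a GP or hospital system.
    return [
        {"patient_id": "001", "hb": "13.2"},
        {"patient_id": "002", "hb": ""},      # dodgy/missing value
        {"patient_id": "003", "hb": "12.8"},
    ]

def clean(records):
    # Quality control: drop records with missing results and raise an alarm.
    good = []
    for record in records:
        if not record["hb"]:
            log.warning("missing hb for patient %s", record["patient_id"])
            continue
        good.append(record)
    return good

def transform(records):
    # Transform into the desired form: numeric values ready for analytics.
    return [{"patient_id": r["patient_id"], "hb": float(r["hb"])} for r in records]

records = transform(clean(collect()))
mean_hb = sum(r["hb"] for r in records) / len(records)
print(f"{len(records)} usable records, mean hb {mean_hb:.1f}")
```

Only once each stage is logged and alarmed like this can you trust the metrics coming out of the end, let alone feed them into a model.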

If you want to work on a project in this area, you’re not starting from scratch: FHIR (Fast Healthcare Interoperability Resources) is a standardized API that aims to improve communication between health systems.
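FHIR resources are served over plain REST as JSON, so working with them is straightforward. Below is a sketch of what a (heavily trimmed) Patient resource looks like on the wire; the server URL is hypothetical, and the name data follows the spec’s own example patient.

```python
# A sketch of handling a FHIR Patient resource. The base URL is a
# hypothetical placeholder; the JSON is a trimmed example of the
# standard Patient resource shape.
import json

FHIR_BASE = "https://fhir.example-hospital.org"  # hypothetical endpoint

def patient_url(patient_id):
    # FHIR resources are fetched over plain REST: GET {base}/Patient/{id}
    return f"{FHIR_BASE}/Patient/{patient_id}"

# What a simplified Patient resource looks like on the wire:
raw = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Chalmers", "given": ["Peter"]}],
  "birthDate": "1974-12-25"
}
"""

patient = json.loads(raw)
assert patient["resourceType"] == "Patient"
name = patient["name"][0]
print(name["given"][0], name["family"])  # Peter Chalmers
```

Because every conforming system exposes the same resource shapes, the join-and-clean work described above gets dramatically easier once both ends speak FHIR.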


maxwellflitton

I help clinicians get to grips with coding and tech; I also code for a financial tech firm.
