People who work with data have had a number of job titles in my time, including data admin, data specialist, ETL developer and data integrator. The rise of the term data scientist as the de facto job market term for an analyst, however, left us data people a bit high and dry. We, too, needed a recognised umbrella term, and the chosen option has become data engineer.
Defining data engineers
But what is a data engineer? Ask around, and you may hear that they are hardworking guru types who know all the ins and outs of the data platform, are adept at making complex queries in a plethora of languages, and can stitch together a blanket of usable data from seemingly separate ragged patches and sources. In practice, though, what they do is provide the basis for the work of data scientists.
Data scientists really want well-prepared analytical base tables, ready to support modelling, visualisation, forecasting, optimisation and many other analytical uses. This doesn’t happen automatically. A commonly quoted figure is that 60 to 80 percent of analytical effort is spent on preparing the data. That is where the data engineer comes in.
A team of experts
The process of creating insight from data is a team effort involving multiple roles. The data scientist cannot manage without the data engineer, but without the data scientist, the data engineer is left without a customer. Neither can function without the solid support of IT teams, who own the tools and the platforms that are both sources of data and targets for the deployment of analytics into production. And both also need the business user who defines the problem.
This need for teamwork means that communication is key: between the business user who needs the results of the analysis, the data scientist who has the analytical skills and knowledge of what kind and format of data is required, the data engineer who sources the data and shapes it into a usable format, and the IT team with the knowledge of the underpinning infrastructure.
I put the business user first in that list because understanding the business objective is pivotal to any analytical work. Analytics starts from a business problem that needs solving. The first question is “What are we trying to achieve?” and the answer must be communicated clearly to everyone involved. There is nothing new about this. It is a basic principle of project work, yet it is seldom observed.
For data engineers, this really matters. It is much easier to create the right kind of data to answer a business question if you understand the problem. Business users may think that all they need are tables A, B and C, but you may be able to add considerable value to the project because of your knowledge of the data, offering new options for analysis. Communication is the key.
Two sides of the coin
Being a well-prepared data engineer is very similar to being a well-prepared data scientist. Both work on the same problem and ultimately the same data, although each has a different primary skill set and focus. Both need curiosity and an obsession with solving problems. Both also benefit from having a bag of useful tricks, skills gained from having solved similar problems before.
Data problems can be solved (and caused) regardless of the tool chosen, but having a flexible, open and scalable analytics platform with suitable data preparation tools can help both data engineers and data scientists. These, however, are secondary – the goal is to improve or transform real-world processes. As Deming once said, wisely: “Without data, you're just another person with an opinion.” And without deployment to the real world, you only have an analytical experiment.