Data Engineering

The plumbing in the data value-production chain.

Transforming data into a useful format for analysis

A data scientist is only as good as the data they have access to. Most companies store their data in a variety of formats across databases and text files. This is where data engineers come in: they build pipelines that transform that data into formats that data scientists can use.
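To make the idea of a pipeline concrete, here is a minimal sketch in Python. It assumes two hypothetical sources, a CSV text file (inlined as a string) and a small SQLite table, and joins them into one list of records ready for analysis. The table, column, and function names are illustrative assumptions, not part of any particular company's setup.

```python
import csv
import io
import sqlite3


def extract_from_csv(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_from_db(conn):
    """Read customer rows from a SQLite table into a list of dicts."""
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return [{"id": r[0], "name": r[1]} for r in rows]


def transform(orders, customers):
    """Join orders to customer names and cast amounts to floats."""
    names = {c["id"]: c["name"] for c in customers}
    return [
        {
            "customer": names.get(int(o["customer_id"]), "unknown"),
            "amount": float(o["amount"]),
        }
        for o in orders
    ]


def run_pipeline():
    # Hypothetical source 1: a text file, inlined here as a string.
    csv_text = "customer_id,amount\n1,19.99\n2,5.00\n1,3.50\n"
    # Hypothetical source 2: an in-memory SQLite database.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
        )
        return transform(extract_from_csv(csv_text), extract_from_db(conn))
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

Real pipelines add scheduling, validation, and error handling, but the shape is the same: extract from each source, transform into a common format, and hand the result to whoever analyses it.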

Data engineers are just as important as data scientists, but tend to be less visible because they are further away from the analysis end product.



Data infrastructure

Data scientists apply advanced mathematics and statistical analysis on top of the data infrastructure that data engineers build and maintain. They are not responsible for building or maintaining that infrastructure themselves.

Instead, they are internal clients of that infrastructure. Their job is to identify trends and relations, which requires a variety of sophisticated machines and methods for interacting with and acting upon data.

[Figure: data scientist vs. data engineer]

The plumbers of data science

In contrast, data engineers focus on the applications and harvesting of (big) data. Their role doesn’t include a great deal of analysis or experimental design. Instead, they are out where the rubber meets the road, creating interfaces and mechanisms for the flow and access of information. Their focus is on collecting, managing, analysing, and visualising data, and on developing batch and real-time analytical solutions.

Simply put, data engineers are the plumbers of data science, and data scientists depend on them to be able to do their job well.

[Figure: data pipeline]