Data Engineering, the Cousin of Data Science, Is Troublesome

A contribution by Lissie Mei.


How do you get your analysts to realize the importance of expanding their toolkit? I think I found the answer.

We often hear data science called the “sexiest job of the 21st century”. When a traditional company transforms into an analytics-driven one, both the company and its data scientists expect to dive into the fancy world of analytics as soon as possible. But is that always the case?


A troublesome start

Ever since we started collaborating with Hilti, a leading manufacturer of power tools and related services, we had sketched out several splendid blueprints: pricing automation, a propensity model… Working with such a great company is a precious opportunity for us, and we couldn't wait to apply our analytical skills to create business value. But when we started to tap into the data, we found that a traditional company cannot hand over clean, structured data as readily as a data-driven business, such as an e-commerce company, can.

As I was mainly responsible for the project's data cleaning and engineering, I witnessed how our analysis was held back because the data was not ready.



First, we were working directly with the finance team, while another team, pricing operations, was in charge of the database, so at the beginning the process lagged badly because we could rarely reach the data owners with our requests and questions in time. Second, because Hilti's data is sensitive and the company had not yet set up a secure way to transfer it, we had to wait for masked data after every request. Third, data engineering is the basis of data analytics, so until we had fully resolved the inconsistencies among several tables that reference one another, we could hardly build a solid model or reach a solid conclusion. Finally, we had to deal with various data formats: CSV, JSON, SQLite, etc. Indeed a good chance to learn. After about two months, we had all the data ready and every anomaly discussed and solved.
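To make the format and consistency problems above a little more concrete, here is a minimal Python sketch using only the standard library. It is not the pipeline we actually built. It loads a CSV extract, a JSON extract and an SQLite table into the same list-of-dicts shape, then lists "orphan" keys: values that appear in one table but are missing from the table it references. All table names, columns and sample values (orders, customers, regions, C9) are hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample extracts standing in for the real (masked) files.
ORDERS_CSV = "order_id,customer_id,amount\n1,C1,100.0\n2,C2,250.5\n3,C9,80.0\n"
CUSTOMERS_JSON = '[{"customer_id": "C1", "region": "EU"}, {"customer_id": "C2", "region": "US"}]'


def load_csv(text):
    """Parse CSV text into a list of dicts (every value stays a string)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def load_sqlite(conn, query):
    """Run a query and return each row as a dict keyed by column name."""
    conn.row_factory = sqlite3.Row  # lets dict(row) use column names as keys
    return [dict(row) for row in conn.execute(query)]


def orphan_keys(child_rows, parent_rows, key):
    """Return the sorted key values found in child_rows but missing from parent_rows."""
    parent_keys = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parent_keys)


# An in-memory SQLite table plays the role of the database extract.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regions (region TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO regions VALUES (?)", [("EU",), ("US",)])

orders = load_csv(ORDERS_CSV)
customers = load_json(CUSTOMERS_JSON)
regions = load_sqlite(conn, "SELECT region FROM regions")

print(orphan_keys(orders, customers, "customer_id"))  # ['C9']: an order with an unknown customer
print(orphan_keys(customers, regions, "region"))      # []: every customer region exists
```

Converting everything to one shape first keeps the consistency check independent of where each table came from; each orphan it reports is a question to bring back to the data owners.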


Diving time!

The models and visualizations couldn't wait to taste the fresh data and test their accuracy. However, the most embarrassing thing happened when we presented our first proposal with actual numbers.

Guess what: the big numbers didn't match. Then we realized that we hadn't received the complete data. We had been so focused on the details of the data, such as anomalies and the relationships among data sources, that we forgot to do the basic check of sums and counts. This is a lesson I will remember for a lifetime. Truly!
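The sum-and-count check we skipped takes only a few lines. The Python sketch below assumes that the team owning the database can share control totals (a row count and an amount total) for each extract; the function name, field name and numbers are hypothetical. Amounts are summed as `Decimal` so that floating-point rounding in the sum does not cause a false mismatch.

```python
from decimal import Decimal


def reconcile(rows, amount_field, expected_count, expected_total):
    """Compare a received extract's row count and column total with control totals.

    Returns a list of mismatch messages; an empty list means the extract reconciles.
    """
    problems = []
    actual_count = len(rows)
    # Convert via str() so both string and float amounts become exact decimals.
    actual_total = sum((Decimal(str(r[amount_field])) for r in rows), Decimal("0"))
    if actual_count != expected_count:
        problems.append(f"row count: got {actual_count}, expected {expected_count}")
    if actual_total != Decimal(str(expected_total)):
        problems.append(f"{amount_field} total: got {actual_total}, expected {expected_total}")
    return problems


# Hypothetical extract that silently lost one row in transfer.
received = [{"amount": "100.0"}, {"amount": "250.5"}]
# Hypothetical control totals reported by the source system for the same extract.
print(reconcile(received, "amount", expected_count=3, expected_total="430.5"))
# ['row count: got 2, expected 3', 'amount total: got 350.5, expected 430.5']
```

Running a check like this on every delivery, before any modelling, would have flagged the incomplete data long before the numbers reached a presentation.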



Why data engineering is so important

The most important thing I learned from this data engineering drama is that this role, which usually works behind the scenes, actually holds the gateway to innovation. When a traditional company considers exploiting its data, its first and most efficient hires should be data engineers. With them, the company can build a healthy, replicable data pipeline, which makes it much easier to mine the data and find business insights.


I also learned why a lot of companies require their data analysts to know some programming tools, such as Python and Scala, on top of basic analytics tools such as SQL and Excel. Usually we cannot expect a full-stack analyst, but it is necessary to have someone who can communicate with both engineers and management. For the highest efficiency, a clear division of work is important, but a guru of every data tool is indeed attractive.

What I expect myself to learn next is the basics of both the front end and the back end, with tools such as Java, Kafka, Spark, and Hive, and I believe they will be the highlights of my experience.


mostlyfad

Computer Engineer • Entrepreneur • Blogger
