Data Lineage: The Journey of Data

Born out of the need to understand how data is written, Data Lineage describes the data lifecycle: the data’s origins, what happens to it, and where it moves over time. Put simply, it is the story of how data gets into an organization, who uses it, and how it is transformed.

Data Lineage is essential to data quality, since a full understanding of the data is a prerequisite. It is the first step to take before diving into data cleansing, analytics, and decision making. Every organization and business today faces issues with demonstrating data’s origin and transformation: Where does the data come from? What does it mean? When was it captured? How did it change? Who is using it, and when? How and why is it stored?
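One way to make these questions concrete is to store the answers next to each dataset. The sketch below is only an illustration: the `LineageRecord` class, its field names, and the sample values are all hypothetical and do not come from any particular lineage tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageRecord:
    """One hypothetical lineage entry; every field name here is illustrative."""
    dataset: str                      # which data we are describing
    source: str                       # where the data comes from
    description: str                  # what it means
    captured_at: datetime             # when it was captured
    transformations: list = field(default_factory=list)  # how it changed
    consumers: list = field(default_factory=list)        # who is using it
    storage: str = ""                 # how and why it is stored

    def unanswered(self):
        """Return the names of the fields that are still empty."""
        return [name for name, value in vars(self).items()
                if value in ("", [], None)]


# Made-up sample values, for illustration only.
record = LineageRecord(
    dataset="daily_sales",
    source="pos_terminal_export",
    description="One row per completed sale",
    captured_at=datetime(2020, 1, 15, tzinfo=timezone.utc),
    transformations=["deduplicated", "currency normalized to USD"],
)

# Lists the questions that nobody has documented yet for this dataset.
print(record.unanswered())
```

Here the record still lacks its consumers and its storage details, which is exactly the kind of gap a lineage effort is meant to surface.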

No matter how much data an organization has, or how insightful that data is, without Data Lineage, Big Data becomes synonymous with the last phrase in a game of telephone.


Data Lineage is crucial in analytics and decision making. Before being impressed by the colors and charts of a dashboard, and by the insights drawn from the data visualization and statistics, one must understand the lineage of the data that fed the analysis behind them. Too much data flowing from different sources in many directions across a business makes it harder to understand when the data was created and how it got there.

With Data Lineage, it is far easier to trace errors back to their root cause in an analytics process.
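As a rough illustration of that kind of tracing, lineage can be modeled as a graph in which each dataset points to the datasets it was derived from. The sketch below is a minimal example under that assumption: the `UPSTREAM` mapping, the dataset names, and both helper functions are hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is derived from.
UPSTREAM = {
    "sales_dashboard": ["monthly_revenue"],
    "monthly_revenue": ["clean_orders", "exchange_rates"],
    "clean_orders": ["raw_orders"],
    "raw_orders": [],
    "exchange_rates": [],
}


def trace_upstream(dataset, upstream=UPSTREAM):
    """Return every dataset that feeds `dataset`, nearest first (breadth-first)."""
    seen = []
    queue = deque(upstream.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(upstream.get(current, []))
    return seen


def root_sources(dataset, upstream=UPSTREAM):
    """Return the upstream datasets that have no parents of their own."""
    return [d for d in trace_upstream(dataset, upstream) if not upstream.get(d)]


# If a number on the dashboard looks wrong, these are the places to check.
print(trace_upstream("sales_dashboard"))
print(root_sources("sales_dashboard"))
```

Walking the graph nearest-first means the most likely culprits, the immediate inputs, are inspected before the original sources.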



Data Lineage, Data Governance, and Data Provenance:

While Data Lineage is the journey of data (origin to destination and everything that happens in between), Data Governance is about understanding everything around data (audits, logs, views of the data warehouse, etc.). Data Provenance, on the other hand, documents the inputs, systems, entities, and processes that influence the data of interest, providing the history of the data and its origin. It is more of a high-level view of the system that lets business users understand where their data is coming from.


mostlyfad

Computer Engineer • Entrepreneur • Blogger
