Businesses around the world are rapidly pivoting to become data-driven. For example, a North American financial services organization that manages a diverse clientele’s investment portfolios recently embarked on a journey to profile its master data for quality, provenance, accuracy, and business-process alignment.
The business’ data science teams came together to roadmap an effort that would promote citizen analytics developers within the various operational groups while centralizing an initiative to build machine learning models, with the aims of driving new revenue and making operational workflows more efficient. The organization was tasked with identifying all data sources, asserting the true source of each data asset, recording when each asset was created or changed, and tracing the data’s journey through the technology landscape. Here are a few discoveries they made as the project progressed:
- Business definitions for some of the primary master data were inconsistent across business groups.
- The central IT and Compliance teams uncovered several significant data sources that were uncatalogued and lacked governance oversight.
- Data lifecycle and lineage records were out of date and needed reevaluation.
- Gaps in data audit trails were eroding trust in data as one of the organization’s most valued assets.
- Additionally, data quality had degraded over time due to:
  - Data fields renamed during system upgrades without preserving their context
  - Data lost during migrations across the many application and system changes
  - Data not kept current by the people and systems tasked with its upkeep
  - Data context lost when employees exited the organization, taking that knowledge with them
  - Technology and data that had not kept pace with business expansion over time
A big takeaway from this experience was that companies should dedicate ongoing time and budget to data governance; in the information age, the cost of recovery, and the opportunity lost by failing to use technology and data to their fullest value, is simply too high.
For medium- and large-scale organizations, applications that support and automate data governance and quality control have become foundational to data science activities. Microsoft’s Azure Purview helps businesses, at enterprise scale, connect to various data sources for automated data classification, and it extracts metadata and data lineage for holistic data governance. For organizations in the early stages of data maturity, Power BI’s built-in lineage view offers a quick first step into data lineage tracking and data classification.
Companies just establishing their data maturity levels often view moving up the stages of data maturity as a daunting effort, though it helps to break such projects into smaller milestones and quick wins. Data becomes an asset to a company when the business problem it should solve is well understood. For example, one requirement for the financial services organization described earlier was the capability to predict the behavior of existing customers who had been loyal over the past decade, understand the lifetime values of the most strategic customers, and develop predictions of their future needs. The business SMEs were intentional about the time boundaries on the data while also being specific that aggregate lifetime value exceed a certain benchmark. These value and time boundaries on the customer master data focused the profiling activity onto a targeted data set, and the analytics derived from that data set drove decisions that proved to be a large win for the business.
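The kind of scoping the SMEs applied can be sketched in a few lines. This is a minimal illustration only: the field names (`tenure_years`, `lifetime_value`) and the threshold figures are hypothetical stand-ins, not the organization’s actual schema or benchmark.

```python
# Hypothetical customer master records; field names and figures are illustrative.
customers = [
    {"customer_id": 101, "tenure_years": 12, "lifetime_value": 250_000},
    {"customer_id": 102, "tenure_years": 4,  "lifetime_value": 900_000},
    {"customer_id": 103, "tenure_years": 15, "lifetime_value": 1_200_000},
    {"customer_id": 104, "tenure_years": 11, "lifetime_value": 80_000},
]

TENURE_FLOOR_YEARS = 10             # "loyal ... over the past decade"
LIFETIME_VALUE_BENCHMARK = 200_000  # hypothetical aggregate benchmark

# Narrow profiling to the targeted segment before any deeper analysis.
focus = [
    c for c in customers
    if c["tenure_years"] >= TENURE_FLOOR_YEARS
    and c["lifetime_value"] >= LIFETIME_VALUE_BENCHMARK
]
print([c["customer_id"] for c in focus])  # → [101, 103]
```

Applying the boundaries up front keeps the profiling effort on a small, high-value slice of the master data rather than the full customer population.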
Here are a few steps that helped evaluate the data’s integrity and extract its lineage:
- Establishing the origin and true source of each data asset within this focus group
- Recording the timestamp of when each data asset was created, and whether it was created by people within the organization or by specific systems
- Tracing the data’s movement, and how it changed, across the technology landscape over its lifecycle
- Classifying the type of data, including tagging data sensitivity, business definition, and business relevance, and compiling additional metadata
- Identifying the stewards responsible for the safe upkeep of the data
- Assessing the influence of each data asset on the business decisions identified as the project’s success metrics
- Agreeing on asset normalizations accepted as universal across the organization
- Cataloging references to outliers
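The attributes in the list above map naturally onto a catalog record. A minimal sketch follows, assuming hypothetical field names and values rather than any particular catalog product’s schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DataAsset:
    """One catalog entry; fields mirror the evaluation list above (illustrative only)."""
    name: str
    origin: str                       # true source system of the asset
    created_at: datetime              # when the asset was created or changed
    created_by: str                   # person or system that created it
    lineage: list = field(default_factory=list)  # movement across systems
    sensitivity: str = "internal"     # classification tag
    business_definition: str = ""
    steward: str = ""                 # person responsible for safe upkeep

# Hypothetical example entry.
asset = DataAsset(
    name="client_net_worth",
    origin="portfolio_db",
    created_at=datetime(2021, 3, 1),
    created_by="etl_nightly",
    lineage=["portfolio_db", "staging_lake", "analytics_warehouse"],
    sensitivity="confidential",
    steward="data.governance@example.com",
)
print(asset.name, "->", asset.lineage[-1])
```

Even a lightweight record like this makes gaps visible: an asset with no steward or an empty lineage list is an immediate governance flag.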
Over the course of this process:
Extracting data lineage delivered an audit trail of the data elements at the lowest level of granularity. In the case of our financial services organization, business leaders could trace the various changes that contributed to a shift in a projected metric, such as a client’s net worth and financial contributions or the organization’s own sales projections.
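An audit trail like this can be thought of as replaying a metric’s change records in time order, attributing each step to the system that produced it. A simplified sketch, with hypothetical system names and figures:

```python
from datetime import datetime

# Hypothetical change records for one metric, each tagged with its source system.
changes = [
    {"at": datetime(2022, 1, 5), "system": "trades",    "delta": +50_000},
    {"at": datetime(2022, 2, 9), "system": "fees",      "delta": -1_200},
    {"at": datetime(2022, 3, 2), "system": "transfers", "delta": +10_000},
]

def trace_metric(start, changes):
    """Replay changes in time order, yielding (timestamp, system, running value)."""
    trail, value = [], start
    for ch in sorted(changes, key=lambda c: c["at"]):
        value += ch["delta"]
        trail.append((ch["at"], ch["system"], value))
    return trail

for at, system, value in trace_metric(1_000_000, changes):
    print(at.date(), system, value)
```

Each row of the trail answers the auditor’s question directly: which system changed the value, when, and what the value became as a result.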
Additionally, discovering the new data sources, and the variety of data assets originating from them, helped establish data provenance far more accurately. Enforcing data integrity through the newly implemented data catalog helped the financial services organization build trust in its data assets. It also ensured the assets were classified and categorized to comply with government regulations, reducing the risk of an asset breach and giving business leaders the confidence to democratize data across the organization and empower their employees with information.