Building a Framework for Data Trust

Thank goodness data quality and data governance continue to be on the hot list of topics we’re discussing, teaching, tweeting, blogging and writing about. The more we discuss them, the more people are taking notice, and the better we all understand and communicate the business benefits of improved data governance (DG).
We’re at the point where business leaders are beginning to grok the value of data in achieving their organizational objectives, and with this comes the very real issue of trust in critical data. Master data management (MDM) has done a great job of shining a very bright light on data quality (and therefore trustworthiness), but we have to change our mental model of how to achieve higher levels of data quality.
Surprisingly, there are huge numbers of practitioners who try to improve data quality (and trust in data) by cleansing data on its way into their data warehouse. This is way too late!
As I mentioned in an earlier blog post, this is like allowing cyanide into the municipal water supply and trying to filter it out after the fact! That would be a fool’s errand and inestimably expensive – but it’s pretty much worked out that way in IT, and it’s time for a change.
The only way to keep a poison like that out is to prevent it from entering the water supply in the first place. The same thing applies to our IT ecosystem. We’ve allowed poisoned data into our systems, and this has lowered the trust business leaders have in us as IT professionals, as well as the data we manage on their behalf.
So, how do you solve this problem? Quite simply, you must prevent low-quality data from infecting your systems, and you must constantly check for degradations in data quality whenever data are touched, manipulated, combined and propagated.
This implies real-time enforcement of policies and business rules that build trust into your data. It also implies a widely distributed set of managed assets (services, agents and sensors) that constantly check for quality and trustworthiness of data and make improvements as data flow through your systems.
But how are these separate assets going to detect, analyze and repair the data issues when found? How are they going to manage exceptions? What processes will they invoke to make repairs or notify other assets of a data quality event? What rules are they going to enforce and how are they going to measure and prove success? Certainly not as part of an ETL process!
Enter the Critical Role of Data Governance
Data governance is the ongoing, focused process through which business leaders negotiate policies on how to ensure consistent levels of data quality across an enterprise.
It’s where exception management is addressed so that critical operations take place with the appropriate levels of data quality. It is where business rules are approved for detecting, analyzing and treating quality issues. It is also the place where measurements are taken of the efficacy of those policies, processes and rules.
These declarations (principles, policies, processes, business rules and metrics) instruct the behavior of the myriad data quality assets deployed through an enterprise, and lay the groundwork for a data quality ecosystem and active data governance.
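To make the idea of declarations driving enforcement a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a reference design: the `Rule`, `Metrics` and `enforce` names, the two sample rules, and the "reject" versus "flag" exception policies are all hypothetical and do not come from any particular product. The point it tries to show is that the rules are declared as data (the kind of thing a governance body could approve), while a small checker applies them at the point of capture and counts failures so the efficacy of each rule can be measured.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A governance-approved business rule (hypothetical structure)."""
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes
    on_fail: str                   # assumed exception policy: "reject" or "flag"


@dataclass
class Metrics:
    """Simple counters used to measure how the rules are performing."""
    passed: int = 0
    failures: dict = field(default_factory=dict)  # rule name -> failure count


def enforce(record: dict, rules: list, metrics: Metrics) -> tuple:
    """Check a record at the point of capture, before it enters any system.

    Returns (accepted, flags): accepted is False if any "reject" rule failed;
    flags lists the names of failed "flag" rules for exception handling.
    """
    accepted = True
    flags = []
    for rule in rules:
        if rule.check(record):
            continue
        metrics.failures[rule.name] = metrics.failures.get(rule.name, 0) + 1
        if rule.on_fail == "reject":
            accepted = False
        else:
            flags.append(rule.name)
    if accepted:
        metrics.passed += 1
    return accepted, flags


# Hypothetical declarations; in practice these would come out of governance.
RULES = [
    Rule("customer_id_present", lambda r: bool(r.get("customer_id")), "reject"),
    Rule("email_has_at_sign", lambda r: "@" in (r.get("email") or ""), "flag"),
]

metrics = Metrics()
ok_result = enforce({"customer_id": "C-1", "email": "a@example.com"}, RULES, metrics)
flagged_result = enforce({"customer_id": "C-2", "email": "no-email"}, RULES, metrics)
rejected_result = enforce({"email": "b@example.com"}, RULES, metrics)
```

In this sketch a record that fails a "reject" rule never enters the system, a record that fails a "flag" rule is accepted but routed for exception handling, and the failure counts per rule give a simple measure of whether each rule is earning its keep.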
A Framework for Data Quality
Data governance is the place where you establish a framework for data trust. Through data governance, business leaders gain insight into critical data and put the scaffolding in place to guarantee and measure agreed-to levels of quality. Without this system-wide direction, any attempt at correcting data problems is at risk.
In future blog posts, I’ll begin to lay out how this should work…
6 Responses
Marty,
Good article. One thing I would bring up: trying to establish data governance is like an act of Congress. Finding a body of people who have influence and access to authority, and getting them to come together and establish a governance model, is pretty tough.
Using data governance as a framework to build data quality would be great, but too many times we see it the other way around, where data quality is the driver and ends up serving as a pseudo data governance.
What I have found is that to establish data governance, a vendor needs to provide some tooling... yes, that bad word "tooling"... another product to sell, another blah blah blah sales pitch. But think about the tools on the market today that provide ways to establish stewardship and can be expanded to create policies in data governance. Compliance issues around the data? How about a tool that monitors, tracks, and grants access based on policies, all working in the background, like IBM's Guardium?
The success behind good data, and a good MDM strategy, revolves around the points you have made: policies and governance. I am just hoping we see an explosion of Data Governance and Data Stewardship projects in the near future.
Thanks for the post.
Garnie –
Thanks so much for the feedback – you raise many fine points (very similar to ones that I’ve been preaching about for quite a while)!
I do not know the Guardium product, but I think I’m trying to shoot at a target that’s perhaps a little further upstream – at the point where a piece of data is captured and perhaps composed with other data to complete a business transaction – even before it makes it into a message destined for an ESB and before it gets to a physical schema.
But you’re right on target that it should be the Policies that govern which data are managed from a DQ perspective, which processes should be enforced, what security (and other) constraints, how to handle exceptions, etc. My next blog will be about these different “declarations” of policy, process, rules, metrics, etc.
Also, your point about Data Quality “leading” Data Governance is right on. We have a “chicken or egg” dilemma, don’t we? But why not use the DQ driver as the perfect opportunity to implement some level of "minimalist" DG?
One little note: establishing DG can be very easy in some sectors like healthcare, compared to other commercial sectors (I’ve done it in a matter of a few weeks, believe it or not!). Even then, establishing DG can be done the hard way (your “act of Congress” analogy is perfect!) or an easier way (if you’ve read/heard any of my Agile Data Governance work). Let’s try to find a way to make it less like “an act of Congress” and more like a “Neighborhood Watch” kind of organization!!!
Let’s keep pushing for “Active Data Governance!”
Cheers!
Marty,
Great post - I completely agree with you.
Prevention is better than cure - and "Active Data Governance" is the key.
I'm reminded of a story about Edward De Bono, who coined the phrase "Lateral Thinking".
He was asked to consider how to clean up the rivers in Switzerland (I believe).
His solution was to require factories to have their intake pipe downstream of their outflow pipe.
This was implemented, and water quality rose dramatically - not surprising.
I feel a blog post coming on about the above - for another day.
Well done again Marty - I look forward to your future posts,
Ken
Via Twitter from Mark Horseman:
Data governance evolves at an institution. Data quality becomes paramount when data governance is understood but before it begins.
Hey Ken -
Thanks for your encouragement and agreement!
I also have heard references to De Bono's Lateral Thinking work and now will go look it up. What a superb "pollution solution!"
I can't wait to read your blog - please tweet on it when it's written - I look forward to your insights and wisdom!
Cheers!