I am a strong proponent of data quality investments, and this presents a real challenge when I am trying to create a new information management capability in an organisation. Overcoming this challenge is essential to developing a valued analytic capability, and it is what I want to talk about in my next couple of posts.
It's a challenge because very few other Senior Managers / C-level Executives appreciate the value of good data. Intellectually, most agree with me, but when priorities are being set, data quality almost invariably gets pushed down the list by more 'burning' issues - issues where Managers see a more immediate impact on the bottom line.
The only exception is when a serious compliance issue has occurred, or there is a serious risk of one occurring if nothing is done to prevent it. Even in these circumstances, the decision is usually made to invest in a short-term, one-time-only 'fix' rather than investing more in a solution that continues to work over time, or that goes back and fixes the underlying source of the poor data quality (often a source system or operational process).
So the challenge when constructing a program of work to deliver a new analytic capability is to build into the solution an appropriate and sustainable level of data quality.
Couldn't be easier, right?
Remember - the Board is not going to appreciate explicit data quality goals unless they believe those goals will not delay other (and in their minds more valued) goals. In other words, they want it for free.
Tricky to do, but here are some actions that I have used successfully on multiple large projects/programs:
Establish Business Ownership
If you can't find the formal business owner of the data, then you can't fix the quality issue in a sustainable way. Your work delivering a new data warehouse or other data store may solve today's quality issue - but if no one in the business owns the data, then quality will inevitably decline over time. This usually happens a lot faster than anyone predicts.
The more usual situation is that the quality issue remains and everyone continues to work around it. Often the dashboards, analytic reports, or data extracts you deliver from your shiny new platform will have one or more of the following consequences:
- the data quality issue becomes masked, so that people look at the shiny new analytics and forget that it is still based on the same lousy data
- users continue to distrust the numbers
- areas of analysis that require the poor-quality data are excluded from the analytic platform, making your investment much less valuable.
All of these are negative consequences that can seriously devalue your and your team's hard work.
The best business owners are those who are directly and negatively affected when quality declines. They are also normally (but not always) the ones who benefit most from improved data quality. Why are they the best business owners? Because they will care the most.

Once the correct owner(s) are agreed with the business, I also recommend that data quality improvement of their data be made a measured part of their performance incentive program - but more on this in my next post.
Manage Data As An Asset
Manage your data assets just like you do capital and equipment. If it is worth something then at the very least you need to manage it to protect that value.
Would you leave a valuable printing press or lathe outside to rust? Data is the same. If you don't look after it, it will decline over time, just as capital equipment depreciates.
The best data asset managers are your senior information management experts. Why? Because they care deeply about data quality and should already be closely engaged with the business and IT owners of data.
Link Cause To Effect
Explicitly link data quality issues to known problems that need to be solved. If 1% of your customer records have invalid details, who cares? Why do they care? And most importantly: how much do they care?
If you can identify the department suffering most, then go to that Manager and work with them to quantify the dollar cost, and agree with them on the importance of the resulting problem(s). If it is important and is costing the company enough, can a solution be devised that will save money and deliver the benefit in a reasonable timeframe? In my experience, 'reasonable' almost always means within six months.
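To make that conversation concrete, a back-of-the-envelope sizing is often enough to get started. Here is a minimal sketch in Python - every figure in it is a hypothetical placeholder, not a number from any real engagement:

```python
# Rough sizing of the annual cost of invalid customer records.
# All figures are hypothetical placeholders - replace them with values
# agreed with the affected Manager.

total_customer_records = 500_000
invalid_rate = 0.01              # 1% of records have invalid details
touches_per_record_per_year = 4  # how often a bad record triggers rework
cost_per_touch = 12.50           # average cost of one piece of rework
                                 # (returned mail, call-back, etc.)

invalid_records = total_customer_records * invalid_rate
annual_cost = invalid_records * touches_per_record_per_year * cost_per_touch

print(f"Invalid records: {invalid_records:,.0f}")
print(f"Estimated annual cost: ${annual_cost:,.2f}")
```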
Sidebar: The Value Of Program Management
If you structure a program of work so that delivery is spread evenly throughout its life, you can often buy yourself 12-15 months to solve complex data quality issues. Again, this is a much wider topic than data quality, so I will leave it for another post.
Work Closely With IT
Business ownership is great - but most of the data the business cares about is electronic. If IT can't help deliver the solution, then there is little hope that the beautiful analytic platform you build will operate at maximum efficiency over time.
IT has to be a full and active partner in any data quality initiative, even if initially that only means educating IT and developing their own information management capabilities. In my experience, many IT people will enthusiastically take on IM/DQ roles when they know that people in the business care.
If IT doesn't care about data quality, then at least some of the blame for this sits with the business leaders. In data quality, just like most areas of modern business, IT and business managers are very dependent on each other. The safest way to ensure a successful analytic outcome is close cooperation.
Sanjay, thanks for the comment. I agree that information is almost always imperfect and you almost always have to accept a certain level of error.
As to fuzzy logic - it's an interesting idea that I have only ever applied to data scrubbing.
For example, I frequently need to scrub customer address details in the data integration layer of a data warehouse. Usually it is cheaper and easier to have the addresses scrubbed by a third-party piece of software or SaaS, which will use fuzzy logic to improve the quality of the scrubbed addresses it returns. It works well and is cheaper than reinventing the wheel and having custom software written.
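To illustrate what the fuzzy part is doing (not the proprietary tools themselves, which are far more sophisticated), here is a minimal sketch using Python's standard-library difflib. The reference street names, the dirty inputs, and the cutoff value are all made-up examples:

```python
# Minimal illustration of fuzzy matching for address scrubbing.
# Real third-party scrubbing tools do much more (address parsing,
# postal reference files, phonetic matching, etc.).
from difflib import get_close_matches

# Hypothetical reference list of known, correctly spelled street names.
reference_streets = ["George Street", "Elizabeth Street", "Pitt Street"]

dirty_inputs = ["Gorge St", "Elizbeth Street", "Pit Stret"]

for raw in dirty_inputs:
    # cutoff=0.6 keeps only reasonably similar candidates.
    matches = get_close_matches(raw, reference_streets, n=1, cutoff=0.6)
    best = matches[0] if matches else "<no confident match - flag for review>"
    print(f"{raw!r} -> {best!r}")
```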
More broadly, I use 'hard' measures - such as banding to trigger alerts.
For example:
Set a dashboard alert so that if the frequency of rejected customer records in data integration process X is:
- less than 1%: then show status as green (OK)
- equal to or more than 1% but less than 1.5%: then show status as yellow (Warning)
- equal to or more than 1.5%: then show status as red (Alert) and send an email requesting action to the process owner.
Nothing fuzzy about that.
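For anyone who wants to see it written down, the banding above is trivial to express in code. This is only a sketch of the logic - how the reject rate is actually measured and how the alert email is sent depends on your data integration and dashboard tooling:

```python
# Sketch of the banding logic described above. The thresholds match the
# example; the email/alert hook is a placeholder for your own tooling.

def reject_rate_status(rejected: int, total: int) -> str:
    """Return a traffic-light status for the rejected-record rate."""
    rate = rejected / total if total else 0.0
    if rate < 0.01:
        return "green"    # OK
    if rate < 0.015:
        return "yellow"   # Warning
    return "red"          # Alert - notify the process owner


if __name__ == "__main__":
    status = reject_rate_status(rejected=160, total=10_000)  # 1.6% -> red
    print(status)
    if status == "red":
        # Placeholder: hook in your own email/alerting mechanism here.
        print("Send email requesting action to process owner")
```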
I can see that fuzzy logic is very useful in situations where complex conditions arise or where the volume of realtime transactions is massive - like credit card transactions in Visa or Mastercard.
My warehouses to date mostly run data integration as a batch process, so the need doesn't arise.
Check out NAFIPS (North American Fuzzy Information Processing Society!) at http://nafips.ece.ualberta.ca/ if you want to dig deeper.
I also remember that Business Objects purchased FUZZY! Informatik a couple of years ago. FUZZY! Informatik sold EU-friendly data scrubbing software.
Do you have an example in DQ or IM?
Posted by: OzAnalytics | Wednesday, August 19, 2009 at 08:16 AM
Very interesting and illuminating post, Steve! I feel that organizations need a lot of discipline and management to get their data quality right. I was wondering if there is a concept of fuzzy data/fuzzy logic that can be applied to data in an organization, as the reality is that an organization's data quality is realistically imperfect.
Posted by: Sanjay M Kabe | Wednesday, August 19, 2009 at 07:46 AM