In my last post (Leadership Lessons in Data Quality - Part 1) I talked about some useful techniques I have learned while improving data quality in a number of Australian organisations. In this post I want to talk about how to help business people care more about data quality (DQ).
This is vital because it is the key to making good data quality sustainable in your organisation.
The short answer - as with a lot of analytic challenges faced in the real world - is to measure the problem.
Once you can measure data quality, I also recommend, as I mentioned in my earlier post, that improving the quality of business owners' data be made part of their performance incentive program. In simpler language: link it to their bonus.
Data quality has to be measured like any other KPI so that non-experts can understand two things:
- What we all mean by poor data quality - define it.
- The cost of poor data quality - quantify it.
This is easiest to see in an example:
Don't Set as a KPI:
Currently 6,753 customer records, or 2.0834% of all main customer records in EDW2, contain errors. Our objective is to reduce this by 50% within 6 months.
Do Set as a KPI:
Currently almost 7,000 clients have incorrect data, which results in an average of 2,800 direct mailings being returned by Australia Post. This adds $26,000 directly (printing and posting) and indirectly (administration) to each marketing campaign. Our objective is to reduce this average cost by 50% within 6 months.
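To make the arithmetic behind the second KPI explicit, here is a minimal Python sketch. The figures come from the example above; the function name `campaign_cost_kpi` and its parameter names are illustrative assumptions, not part of any existing tool.

```python
def campaign_cost_kpi(returned_mailings, cost_per_campaign, reduction_target):
    """Express a data quality problem in dollars rather than record counts.

    All names here are hypothetical; the inputs are the figures quoted in
    the example KPI.
    """
    # Average cost attributable to each returned mailing
    cost_per_return = cost_per_campaign / returned_mailings
    # Cost per campaign once the reduction objective is met
    target_cost = cost_per_campaign * (1 - reduction_target)
    return {
        "cost_per_return": round(cost_per_return, 2),
        "target_cost": target_cost,
        "saving_per_campaign": cost_per_campaign - target_cost,
    }


kpi = campaign_cost_kpi(
    returned_mailings=2_800,     # average returns per campaign
    cost_per_campaign=26_000,    # dollars added to each campaign
    reduction_target=0.5,        # 50% within 6 months
)
print(kpi)
```

With these inputs the target works out to $13,000 per campaign, a number a business owner can track without knowing anything about the underlying records.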
A useful technique I have developed to help me understand data quality is to create a taxonomy that classifies data in the following three ways:
- 'Classic' data quality criteria. Things like accuracy, relevance, availability, etc.
- The impact data quality has on the business. Things like dollars lost if data is wrong, the value of increased sales if data is correct, etc.
- The action(s) initiated in an organisation when the data is wrong. Things like intervening at the source of data entry, remedial actions after data entry, etc.
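One way to picture this taxonomy is as a small record with one field per classification. The sketch below is only an illustration: the class name `ClassifiedIssue`, the helper `group_by_action`, and the two sample issues are all invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifiedIssue:
    """A data quality issue classified three ways (hypothetical structure)."""
    description: str
    criterion: str  # 'classic' criterion, e.g. accuracy
    impact: str     # impact on the business
    action: str     # action initiated when the data is wrong


# Sample issues, invented purely for illustration
issues = [
    ClassifiedIssue(
        description="Incorrect client postal address",
        criterion="accuracy",
        impact="returned mail adds cost to each campaign",
        action="intervene at source of data entry",
    ),
    ClassifiedIssue(
        description="Missing client postcode",
        criterion="completeness",
        impact="mailing cannot be sent",
        action="remedial action after data entry",
    ),
]


def group_by_action(items):
    """Group issue descriptions by the action they trigger."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.action].append(item.description)
    return dict(grouped)


by_action = group_by_action(issues)
print(by_action)
```

Grouping by action is just one possible view; the same records could equally be grouped by criterion or by impact.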
So what metrics can you associate with data quality in your organisation? Here are six I use:
- Accuracy: the degree of confidence that data is free of error/defect
- Completeness: the extent to which data is not missing and is of sufficient breadth and depth for the task at hand
- Consistency: the degree to which common data across different sources follows the same definitions, codes and formats
- Timeliness: the degree to which data is up to date
- Security: the degree to which data confidentiality, integrity and availability have been maintained
- Fit for Purpose: the degree to which data is relevant, appropriate and meets business specifications.
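Some of these metrics can be turned into simple ratios. The sketch below scores completeness, consistency and timeliness for a handful of made-up customer records. The field names, the list of valid state codes, the 365-day freshness window and the sample data are all assumptions chosen for illustration; a real organisation would need to agree its own definitions.

```python
from datetime import date

# Assumed definitions, for illustration only
REQUIRED_FIELDS = ("name", "postcode", "state")
VALID_STATES = {"NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT", "ACT"}


def completeness(records):
    """Share of required fields that are filled in."""
    total = len(records) * len(REQUIRED_FIELDS)
    filled = sum(1 for r in records for f in REQUIRED_FIELDS if r.get(f))
    return filled / total


def consistency(records):
    """Share of records whose state uses the agreed code list."""
    return sum(1 for r in records if r.get("state") in VALID_STATES) / len(records)


def timeliness(records, as_of, max_age_days=365):
    """Share of records updated within the freshness window."""
    fresh = sum(
        1 for r in records if (as_of - r["last_updated"]).days <= max_age_days
    )
    return fresh / len(records)


# Made-up sample records
records = [
    {"name": "A. Smith", "postcode": "2000", "state": "NSW", "last_updated": date(2024, 3, 1)},
    {"name": "B. Jones", "postcode": "", "state": "Vic.", "last_updated": date(2021, 6, 15)},
    {"name": "C. Lee", "postcode": "4000", "state": "QLD", "last_updated": date(2024, 1, 10)},
    {"name": "", "postcode": "6000", "state": "WA", "last_updated": date(2023, 11, 30)},
]

scores = {
    "completeness": completeness(records),
    "consistency": consistency(records),
    "timeliness": timeliness(records, as_of=date(2024, 6, 30)),
}
print(scores)
```

Accuracy, security and fitness for purpose are harder to score this way, because they usually need a trusted reference source or a business judgement rather than a rule applied to the data alone.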