Your data is a mess. Your analytics is wilting on the vine, starving for lack of effective data. As quickly as you clean data, more issues pop up.
You find your reporting team continually having to rework reports to deal with odd cases that are jacking up your data. It's difficult to be data-driven with poor data quality and limited capacity, budget, and time.
Sound familiar? All data-driven organizations, regardless of industry or domain, struggle to create an efficient data pipeline because of poor data quality. But how do you fix that? How do you deploy limited resources to get the most bang for your buck?
Or to put it another way, how do we make data more effective more effectively? Let's start with the definition:
ef·fec·tive·ness /iˈfektivnəs/ (noun): the degree to which something is successful in producing the intended or desired result.
Note that while a certain amount of quality is implied here, effective data is by no means perfect data. Effective data is strictly that which leads to desired results: organizational success.
Thus the mantra of data effectiveness is to focus on work that supports your analytics, which support your business outcomes, which result in organizational success, and not to pursue quality for its own sake.
Never let me slip, 'cause if I slip, then I'm slippin'
Driving toward perfection is comfortable; who can argue with wanting to be better? The point is to change your definition of "better" to be more sophisticated, pragmatic, and tailored to the reality of your data. As your data gets better, you iterate your way toward absolute quality.
One approach comes from the realization that there are phases to data quality:
Consistency - If there isn't agreement on what value is being reported, then it is difficult to determine whether it is accurate.
Accuracy - If the value being reported doesn't match reality, it is hard to determine its relevancy to business outcomes.
Relevancy - Once you have consistency and accuracy, you can determine whether the metric is useful.
A particular metric can start being effective even if you only have consistency (e.g., for measuring trends or relative changes), but as you go up the chain, you begin to turn effective data into effective information into effective knowledge. So to be effectively data-driven, you need to have a C.A.R.
Back to the lecture at hand
In the illustration above, if metric A has different values in your three reports, it's difficult to work on accuracy because you don't know which report to start with.
So get that calculation consistent.
Once you have a repeatable, consistent value, you can start tweaking the calculation for accuracy so that it matches reality. And once your metrics are consistent and accurate, you can verify whether or not they capture the aspects of your business that are relevant.
Phase 1 Consistency - Data
“What numerical value is being shown for this metric?”
Driven by reporting
Consistency means literally just that: a metric has the same value for the same parameters no matter who pulls it. Matching reality is not the focus at this stage; repeatability is.
Traceability – the same metric in different reports must trace back to the same source
Same parameters – be careful, because different metrics may go by the same common name
Time factor – legitimate changes can be made after a report is run; don't mistake updated data for an inconsistent metric
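The three consistency checks above can be sketched in Python. This is a minimal sketch under stated assumptions, not a prescribed tool: it assumes each report's pull of a metric can be captured as a record holding the report name, the source the value traces back to, its parameters, the snapshot time of the underlying data, and the value. `MetricPull` and `find_inconsistencies` are hypothetical names made up for illustration. Pulls are compared only when they share metric name, source, parameters, and snapshot time, so data that was legitimately updated after an earlier report ran is not flagged as inconsistent.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MetricPull:
    """One report's value for a metric (hypothetical record shape)."""
    report: str       # which report produced the number
    metric: str       # the metric's common name, e.g. "Metric A"
    source: str       # table or view the value traces back to
    params: tuple     # filters as (key, value) pairs, e.g. (("region", "West"),)
    as_of: datetime   # snapshot time of the underlying data
    value: float


def find_inconsistencies(pulls, tolerance=0.0):
    """Return a list of consistency problems found across reports."""
    problems = []

    # Traceability: the same metric should trace back to a single source.
    sources = defaultdict(set)
    for p in pulls:
        sources[p.metric].add(p.source)
    for metric, srcs in sources.items():
        if len(srcs) > 1:
            problems.append(f"{metric}: traced to multiple sources {sorted(srcs)}")

    # Same parameters + same snapshot should yield the same value.
    # Grouping on as_of keeps legitimately updated data from being flagged.
    groups = defaultdict(list)
    for p in pulls:
        groups[(p.metric, p.source, p.params, p.as_of)].append(p)
    for (metric, _source, params, as_of), group in groups.items():
        values = [p.value for p in group]
        if max(values) - min(values) > tolerance:
            reports = sorted(p.report for p in group)
            problems.append(
                f"{metric} {dict(params)} as of {as_of:%Y-%m-%d}: "
                f"values {values} differ across {reports}"
            )
    return problems


if __name__ == "__main__":
    # Illustrative numbers only: three reports pulling Metric A.
    snap = datetime(2024, 1, 31)
    west = (("region", "West"),)
    pulls = [
        MetricPull("Sales dashboard", "Metric A", "dw.orders", west, snap, 120.0),
        MetricPull("Finance pack", "Metric A", "dw.orders", west, snap, 118.0),
        MetricPull("Ops report", "Metric A", "dw.orders", west, snap, 120.0),
    ]
    for problem in find_inconsistencies(pulls):
        print(problem)
```

Keying on the parameters as well as the name is what keeps two different metrics that happen to share a common name (say, Metric A for one region versus all regions) from being reported as a single inconsistent metric.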
Phase 2 Accuracy - Information
“Is the numerical value shown for this metric correct?”
Driven by analytics
Accuracy is what people normally focus on when talking about data quality, but it's a moving target unless you address consistency first. Once you have consistency, you can verify accuracy by comparing against manually calculated metrics or physical audits.
Data entry errors – identify the source of and reason for bad entries, and verify derived metrics using only good data so that you don't fruitlessly tweak a calculation when the real issue is nonsensical input
Wrong or inconsistent business rules – nail down definitions; two different sets of business rules for the same metric could both be appropriate (e.g., accounting rules may have changed from one year to the next)
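Separating bad input from a bad calculation, as described above, can be sketched as follows. This is a minimal illustration under assumptions: the `OrderRow` record, the `is_plausible` sanity rule, the `revenue` metric, and the 1% tolerance are all hypothetical stand-ins for your own data and business rules, and the audited value is assumed to have been calculated manually over the same valid records.

```python
from dataclasses import dataclass


@dataclass
class OrderRow:
    """Hypothetical source row feeding a derived metric."""
    order_id: str
    quantity: int
    unit_price: float


def is_plausible(row):
    # Hypothetical sanity rule for catching data entry errors; real rules
    # depend on your business (here: no zero/negative quantities or prices).
    return row.quantity > 0 and row.unit_price >= 0


def revenue(rows):
    """Derived metric under test: total revenue."""
    return sum(r.quantity * r.unit_price for r in rows)


def check_accuracy(rows, audited_value, rel_tol=0.01):
    """Compare the calculated metric with a manually audited figure.

    Only plausible rows feed the calculation, so a mismatch points at the
    calculation or business rules rather than at nonsensical input.
    """
    good = [r for r in rows if is_plausible(r)]
    suspect = [r.order_id for r in rows if not is_plausible(r)]
    calculated = revenue(good)
    accurate = abs(calculated - audited_value) <= rel_tol * abs(audited_value)
    return {
        "calculated": calculated,
        "audited": audited_value,
        "accurate": accurate,
        "suspect_rows": suspect,  # send these back to whoever entered them
    }


if __name__ == "__main__":
    # Illustrative rows only; A3 has an impossible negative quantity.
    rows = [
        OrderRow("A1", 2, 10.0),
        OrderRow("A2", 1, 5.0),
        OrderRow("A3", -3, 10.0),
    ]
    print(check_accuracy(rows, audited_value=25.0))
```

Returning the suspect rows alongside the verdict keeps the two problems apart: the rows go back to data entry, while a failed comparison on clean data points at the calculation or its business rules.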
Phase 3 Relevancy - Knowledge
“Is this metric helping to meet our goal?”
Driven by business
Once you have accuracy, you can determine whether that metric is useful. It could be that the previous inconsistencies and inaccuracies were hiding the fact that you are not measuring what you thought you were, or that what you are measuring doesn't really affect the outcome.
Your first instinct may be to change the metric definition – but that restarts the cycle and could be another instance of perfection being the enemy of the good
Be open to changing the business goal - effective data in hand is worth two in the bush. An unmeasurable business goal is really just a wish, not a goal, so if you already have a valid metric or set of metrics, it may be appropriate to ask whether a different business goal that uses those metrics is a better pathway to the same organizational success
So sit back, relax, and strap on your seatbelt
As data becomes information and information becomes knowledge, your analytics will grow more sophisticated and your metrics will likely proliferate, but as long as you follow this C.A.R. cycle, you will continue to be data effective.
In future posts I'll be giving you some specific ideas for putting this into gear.