With a term like data science we may be guilty of unconsciously setting unrealistically high expectations.
In the common vernacular the tone of the word "science" is that of certainty. However, to a true scientist nothing could be further from the truth.
Science deals in theories and the continual testing of them. Theories are living, breathing things that evolve with no static certainty, only a consensus about what is currently accepted.
Refining measurement systems and techniques gathers more data that helps adjust these theories, but they are never really proved, only not yet disproved.
The notion is that of incrementalism. Scientific progress, even a so-called revolution, happens on a much longer time scale than business cycles.
So in order to be true data scientists we need to reset our expectation of finding paradigm-shifting insights and instead look for more realistic uses for our analytics.
What do we do, not what did we do
For example, it is very rare that your brand new data analytics pipeline will come up with something that you didn't already know at some level. But confirming what you already suspected doesn't mean the effort was wasted.
You don't need some unexpected result or startling correlation to justify your investment in analytics. Having a refined, data-supported view of something that you may have already known at an intuitive level will pay dividends just on its own.
It's less about insight in results than insight in actions: what does your analytics tell you about what to do next?
In the case of confirming something you already knew, the next action to take is not necessarily a final decision; it could be an experiment: trying different input parameters to see how outcomes change.
Science makes progress via deliberate experimentation and data science should be no different.
Being data supported gives you the levers to make this systematic. Your analytics is your experimental data-gathering tool, not just your monitoring tool.
You could have come to the same conclusion based on your experience and judgement but it is much harder to run a "what if" scenario if you cannot isolate the test activities.
Your data doesn't have to be complete for you to be able to experiment; you don't even necessarily have to understand the process by which your inputs affect your outputs.
But by varying a specific input in one group while leaving it alone in a control group, and measuring specific data in the outcomes of both, you can discover how to improve those outcomes even if you don't quite understand the processes in between.
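The control-group comparison above can be sketched in a few lines. This is a minimal illustration, not a prescription: the group sizes, outcome numbers, and effect size below are all simulated placeholders, and the significance check is a simple permutation test chosen for clarity.

```python
import random
import statistics

random.seed(0)

# Hypothetical experiment: one input is varied for a treatment group and
# left alone for a control group; a measurable outcome (say, a conversion
# rate) is recorded for each member. These numbers are simulated.
control = [random.gauss(mu=0.10, sigma=0.03) for _ in range(500)]
treatment = [random.gauss(mu=0.12, sigma=0.03) for _ in range(500)]

def lift(a, b):
    """Difference in mean outcome between two groups."""
    return statistics.mean(a) - statistics.mean(b)

observed = lift(treatment, control)

# Permutation test: shuffle the group labels many times and count how
# often chance alone produces a lift at least as large as the observed one.
pooled = control + treatment
exceed = 0
trials = 2000
for _ in range(trials):
    random.shuffle(pooled)
    if lift(pooled[:500], pooled[500:]) >= observed:
        exceed += 1

p_value = exceed / trials
print(f"observed lift: {observed:.4f}, permutation p-value: {p_value:.4f}")
```

Note that nothing here models the mechanism connecting input to outcome; the comparison only tells you whether changing the input moved the needle, which is exactly the point.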
The focus should be on what action should be taken.
A rigorous analysis indicating that something that you can't really control in your organization has a big effect on outcomes may not be as useful as a more loosely correlated parameter that you can change.
The former may be something to consider in the design of the next iteration of your process/system but the latter is how you can start seeing immediate benefits.
The return on your investment doesn't have to be dramatic to be meaningful over time. Even leaving aside your ultimate desired outcome, small wins lead to bigger wins, which leads to more credibility, the true currency of influence.
Data science is a science, and as such we should temper our excitement in its potential with rigor in its execution. Conclusions without decisions are nearly as bad as decisions unsupported by data.