Putting the ‘Science’ in Data Science

These general wonderings need to be formalized into conjectures or hypotheses, which in turn can be tested.

This forms the true starting point of scientific enquiry.

The corporate Data Scientist, on the other hand, is too often ‘looking at what the data says.’

Their starting point is, conceptually, somewhere midway through a methodological framework.

This is problematic for many reasons.

There is no a priori knowledge that the data is representative of the problem at hand, that the sample size is statistically sufficient, that the data can actually answer the question, or that the system’s behavior is ergodic.

Most of all, though, data is malleable; it is surprisingly easy to make the data fit any given narrative.
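To make the malleability point concrete, here is a minimal sketch of how a narrative can be found in pure noise. Everything in it is an assumption for illustration: the function names, the 30 "customers", and the 200 "features" are made up, and all values are independent random draws, so any correlation it reports is spurious by construction.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(n_customers=30, n_features=200, seed=0):
    """Return the strongest correlation between a noise target and noise features.

    Target and features are independent Gaussian noise (hypothetical data),
    so there is no real relationship to be discovered.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_customers)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_customers)]
        r = pearson(feature, target)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    r = strongest_spurious_correlation()
    print(f"strongest correlation among 200 noise features: {r:+.2f}")
```

With a small sample and enough candidate variables, the best-looking correlation tends to be large enough to build a story around, even though nothing is there. Searching the data first and asking the question afterwards makes this kind of result easy to mistake for insight.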

The Data Science Approach

Data Science is quite a ways apart from its nearest scientific counterparts.

Unlike research, Data Science is usually a reactive operation.

Reactive in the sense that it builds on data that was captured and specified at some point in the past.

Proper research is marked by its deductive-nomological approach where, by way of a well-crafted hypothesis based on a conceptual worldview and strict rules, data is captured and an attempt is made to falsify the null hypothesis.

So data follows the question, rather than the other way around.

Data Science is built on the assumption that data, when properly modelled, can reveal new information about the world.

As a result of this positive worldview, some pitfalls of the ‘data-driven approach’ emerge.

Data availability bias: the data available is thought to be relevant, collectively exhaustive, conclusive, and final.

The assumption is that the data holds the right variables to deduce or model whatever is required.

And that no missing data hides the actual predictor.

Data quality and meaningfulness: data gathered by sources external to the Data Science team is typically not captured with the end in mind.

The data is vaguely specified, poorly timestamped, or has quality issues.

Ask any Data Scientist and they will agree: they’d prefer quality to quantity.

Causal relations are not implied: especially when dealing with customer data, it is easy to find high correlations between variables for which, with enough imagination, a plausible causal connection exists.

However, it is unlikely that the data relating to the customer’s ‘job to be done’ is perfectly captured.

Data does not give normative guidance: data will not tell you what the right thing to do is.

It will not magically reveal the way to move forward.

It is either a prediction based on past events or a synchronic slice of time.

The Scientific Approach

When structurally applied, an integrated scientific-method approach could potentially open up completely new fields of integrated Data Science.

Think, for instance, of combining qualitatively oriented social science or behavioral economics with a team of more quantitatively oriented Data Scientists.

The former can specify the hypothesis based on the latest academic research; the latter can specify data requirements and supplement the qualitative research with numbers.

In tandem, they drive the business to new insight.

Take, for example, academic behavioral-economics ideas of incentives and nudges and bring them to the digital world.

Test the assumptions and iterate.

Not just extracting ‘truths’ from your existing data, but reimagining what is possible with online and offline information gathering.

It means moving from passive, reactive analytics to proactive experimentation.

It means moving away from what Richard Feynman called cargo cult science and toward research that compounds, building the house brick by brick.
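As a rough illustration of what proactive experimentation could look like, the sketch below states the hypothesis and significance level first and only then looks at the data. It assumes a simple online experiment with a control group and a "nudge" group and uses a standard pooled two-proportion z-test; the function name, the visitor counts, and the threshold are all hypothetical, and a real experiment would also need a sample size planned in advance.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """One-sided z-test of H0: p_b <= p_a against H1: p_b > p_a.

    conv_a / n_a: conversions and visitors in the control group.
    conv_b / n_b: conversions and visitors in the treatment (nudge) group.
    Returns the z statistic and the one-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Upper-tail probability of the standard normal: P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Fixed BEFORE any data is collected (hypothetical choice).
    ALPHA = 0.05

    # Made-up counts, for illustration only.
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
    verdict = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"z = {z:.2f}, one-sided p = {p:.4f}: {verdict}")
```

The order of operations is the point here: the question, the hypothesis, and the decision rule come first, and the data is collected to answer them. That is the opposite of mining whatever happens to be in the warehouse.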

Although I have yet to see a first implementation, I am not the only one conceptualizing these ideas.

MIT recently published a paper highlighting the shortcomings of Data Science in relation to Customer Experience and proposed a similar extension of Data Science teams with, for instance, customer experience experts.

This change in approach would also, quite dramatically, change the location of Data Science teams in organizations.

It puts Data Science in the driver’s seat: driving organizational change, developing customer knowledge, experimenting, and driving website changes, all based on research and a solid foundation.

And, finally, putting the Science in Data Science.

