Big Data does not equal Big Knowledge

Gordon Webster, May 19

At the risk of alienating half of my audience here, I am going to open with the statement that the life science Big Data scene is largely Big Hype.

This is not because the data itself is not valuable, but rather because its real value is almost invariably buried under mountains of well-meaning but fruitless data analytics and data visualization.

The fancy data dashboards that big pharmaceutical companies spend big bucks on for handling their big data are, for the most part, little more than eye candy whose colorful renderings convey an illusion of progress without the reality of it.

Sure, there’s always the chance that visualizing a particular pattern in the data will lead to some biological insight. But for the kind of complex biological systems that life scientists study, systems that produce big data sets whose complexity mirrors their own, hoping to draw meaningful biological insight from purely mathematical fitting, filtering and visualization of that data is, to put it politely, a long shot.

This approach has served scientists well for much simpler data sets that represent only one or two underlying and relatively well-conditioned trends — the sigmoid curve of a simple dose-response experiment for example.
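To make the "simple" case concrete, here is a minimal sketch of the sigmoid behind a typical dose-response experiment, written as a four-parameter Hill curve. The function name, the parameter names and the numbers are my own illustrative assumptions rather than anything taken from a real data set.

```python
def hill_response(dose, bottom, top, ec50, hill_slope):
    """Four-parameter logistic (Hill) dose-response curve.

    All parameter names here are illustrative assumptions:
    bottom/top are the response plateaus, ec50 is the dose giving a
    half-maximal response, hill_slope controls the steepness.
    """
    if dose <= 0:
        # No drug: the response sits on the lower plateau.
        return bottom
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** hill_slope)


# Hypothetical doses spanning four orders of magnitude.
doses = [0.01, 0.1, 1.0, 10.0, 100.0]
responses = [
    hill_response(d, bottom=0.0, top=100.0, ec50=1.0, hill_slope=1.0)
    for d in doses
]

for dose, response in zip(doses, responses):
    print(f"dose={dose:g}  response={response:.1f}")
```

One well-behaved trend and four parameters describe the whole experiment, which is exactly why plotting and curve fitting are enough for data like this.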

The problem is that this approach rarely scales effectively to more complex systems, for which the patterns of underlying behavior manifest in the data are anything but intuitive.

Per the old saying that when you have a hammer, every problem looks like a nail, the application of these ubiquitous data dashboards to large, multidimensional biological datasets rarely leads to anything more than superficial insight about trends in the data, which almost invariably falls far short of any real, actionable knowledge.

Applying the standard pantheon of data analytics and data visualization techniques to large biological datasets, and expecting to draw some meaningful biological insight from this approach, is like expecting to learn about the life of an Egyptian pharaoh by excavating his tomb with a bulldozer.

Data is not knowledge.

Data can reveal relationships between events — correlations that may or may not be causal in nature; but by itself, data explains nothing without some form of conceptual model with which it can be assimilated into an intellectual framework that allows one to reason about it.

The potential rewards of the pursuit of a modeling approach are considerable.

Modeling in the life sciences has, until now, been largely confined to the realm of theoretical biology. Used as an adjunct to an experimental approach in the laboratory, however, modeling can do a great deal more than the kind of predictive “weather forecasting” for which it is generally best known.

Models can suggest new hypotheses to test in the laboratory, and answer some of the most critical questions facing anybody who is trying to run a productive life science R&D program.

What experiment should I do next?

What are the critical hypotheses to test to get the kind of answers that will move our R&D program forward?

What are the critical components and behaviors of the system that I am studying?

One misconception that is common amongst scientists who are relatively new to modeling is that models need to be complete to be useful.

Physicists have long understood that an incomplete model can still be useful.

For example, the discrepancies that astronomers observe when trying to map the positions and motions of certain binary star systems onto the standard Newtonian gravitational model can actually provide the means to discover new planets orbiting distant stars.

The divergence of the astronomical observations from the gravitational model can be used to predict not only the amount of mass that is unaccounted for (in this case, a missing planet), but also its position in the sky.

Models, therefore, can clearly have predictive value even when they diverge from experimental observations and appear to be “wrong”.
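As a toy illustration of the idea, the sketch below treats the gap between an observed acceleration and the one predicted from the known mass as a measurement of the mass that the model is missing. It is a deliberate oversimplification: everything sits on a single line, the hidden mass is assumed to be at the same distance as the known one, and the function names and numbers are hypothetical. It is not how astronomers actually analyze orbital data.

```python
G = 6.674e-11  # Newtonian gravitational constant, m^3 kg^-1 s^-2


def predicted_acceleration(mass_kg, distance_m):
    """Acceleration toward a point mass under Newtonian gravity."""
    return G * mass_kg / distance_m ** 2


def unaccounted_mass(observed_accel, known_mass_kg, distance_m):
    """Mass implied by the gap between observation and model.

    Toy assumption: the missing mass is located at the same distance
    as the known mass, so the residual acceleration converts directly
    into a mass.
    """
    residual = observed_accel - predicted_acceleration(known_mass_kg, distance_m)
    return residual * distance_m ** 2 / G


# Hypothetical system: a star we know about plus a planet we do not.
known_star_mass = 2.0e30      # kg
hidden_planet_mass = 1.9e27   # kg, used only to generate the "observation"
distance = 1.5e11             # m

# The "observation" includes the pull of the hidden planet.
observed = predicted_acceleration(known_star_mass + hidden_planet_mass, distance)

recovered = unaccounted_mass(observed, known_star_mass, distance)
print(f"mass unaccounted for by the model: {recovered:.3e} kg")
```

The model is "wrong" here in the sense that it fails to reproduce the observation, yet the size of its failure is precisely what points to the missing component.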

The enormous challenge posed by the complexity of biological systems represents a potential intellectual impasse to life science researchers and threatens to stall future progress in basic biology and healthcare.

The Big Data boom, without the kind of modeling with which to reason about the data, is about as likely to produce real biological insight as growing a forest of trees is to yield a habitable log cabin.

The burgeoning volumes of laboratory data gathered in support of the Big Data paradigm generally pose more questions than they answer until such time as they can be assimilated as real knowledge.

Modeling can provide the kind of intellectual frameworks needed to transform data into knowledge, and there has arguably never been a better time for life scientists in the mainstream of biological research, both in academia and industry, to embrace modeling as a core component of their experimental R&D programs.

The true value of the Big Data boom has yet to be realized, but it is providing life scientists who are ready to get their modeling hats on with the raw materials they will need to create something that is truly worthy of the hype.

© The Digital Biologist.
