Why Your Company Needs Reproducible Research

Stuart Buck · Apr 9

Photo by José Alejandro Cuffia on Unsplash

Today’s sciences — especially the social sciences — are in a bit of turmoil.

Many of the most important experiments and findings are not reproducible.

This “reproducibility crisis” has significant implications not just for the future of academic research and development, but for any business expecting increased returns from investing in innovation, experimentation, and data analysis.

Business needs to learn from science’s mistakes.

As Vice President of Research at Arnold Ventures, I have close knowledge of this ongoing crisis, because I have funded a good deal of these ‘second look’ efforts.

Here’s an unhappy sample of what we funded and found:

In 2015, Science published the results of the largest replication project ever performed: the Reproducibility Project in Psychology, in which hundreds of researchers around the world attempted to replicate 100 psychology experiments from top journals.

Only about 40% of the findings could be successfully replicated, while the rest were either inconclusive or definitively not replicated.

In August 2018, the Social Sciences Replication Project replicated all 21 social science experiments that had been published in Science or Nature from 2010 to 2015.

Only 13 of the 21 experiments could be replicated.

Even then, replication revealed that the ‘effect size’ — the magnitude of the declared discovery — was typically about half of what had originally been claimed.

Beyond these well-known replication projects, researchers have documented reproducibility problems in the research literature on economics, finance, marketing, management, organizational sciences, and international business.

Indeed, after analyzing over 2,000 business experiments, Ron Berman and his colleagues estimated that 42% of the effects found to be significant were actually false positives.
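A back-of-the-envelope calculation shows why estimates like Berman’s are plausible: if only a minority of tested ideas truly work, a meaningful share of “significant” results will be false positives even at the standard 5% threshold. A minimal sketch, where the prior and power values are illustrative assumptions and not Berman’s actual figures:

```python
def false_discovery_share(prior_true: float, alpha: float, power: float) -> float:
    """Expected fraction of 'significant' experiment results that are false positives.

    prior_true: fraction of tested ideas that really have an effect
    alpha:      significance threshold (false-positive rate per true-null test)
    power:      probability a real effect is detected when it exists
    """
    false_positives = alpha * (1 - prior_true)  # null ideas that pass the test anyway
    true_positives = power * prior_true         # real effects that are detected
    return false_positives / (false_positives + true_positives)

# If only 10% of ideas truly work and experiments are well powered (80%),
# roughly a third of all "wins" are still false positives:
print(false_discovery_share(prior_true=0.10, alpha=0.05, power=0.80))  # ≈ 0.36

# With underpowered experiments (35% power), more than half are false:
print(false_discovery_share(prior_true=0.10, alpha=0.05, power=0.35))  # ≈ 0.56
```

The takeaway is that the share of false “wins” depends heavily on how many of your ideas actually work and how well-powered your tests are, not just on the significance threshold.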

As John Ioannidis of Stanford told the Washington Post, “I would have expected results to be more reproducible in [top] journals.”

Scientific establishments worldwide have taken these findings seriously and are acting upon them.

For example, Congress officially asked the National Academies of Sciences, Engineering, and Medicine to produce a major national report (it’s still in progress) on how to fix the reproducibility problem in science.

Digital innovators from Alibaba to Google to Facebook to Netflix to Microsoft to Amazon have already actively embraced large-scale, rapid experimentation as integral to their innovation efforts.

But every organization seeking authentic insight from experiment needs to be wary of the problems that have made scientific research unreliable.

Based on the research literature, on ‘reproducible research’ efforts, and on ongoing interaction with scientists who deeply understand these issues, I would suggest four key ‘best practices’ for business experimenters.

Incentives Matter: Don’t Over-Emphasize Positive Results

Don’t put a thumb on the scale by expecting “positive” results from your divisions or teams who have put their work on the line by doing experiments.

Experiments are for testing and learning, not validating what someone has already decided to do.

The pressure to show results creates bias throughout the whole process of research, leading to the reproducibility problems we see throughout academia.

But as Jim Manzi and others have pointed out, most ideas just don’t work that well.

That’s the whole reason for doing experiments in the first place (if we already knew everything that worked, why bother experimenting?).

Sure, it’s only human to be excited when an experiment shows that something “works” — whether it’s a drug that cures leukemia or a marketing innovation that increases lift or revenue.

But that means people can be tempted (even unconsciously) to find such a result by any means necessary.

If a business is going to do experiments or any other type of data analysis, it’s better to have the full truth rather than to bias the process.

Thus, I recommend going out of your way to require experimentation and data analysis to adhere to best practices, and make it clear that you care more about quality assurance than about finding “exciting” results.


Incentives Matter: Value Replications

Even when an exciting result turns up — say, an advertising experiment that improves lift or signup rates — this shouldn’t mean an immediate scale-up.

Even if experiments are done with the utmost rigor, false positives will still exist.

So rather than go from a study on a few customers to a company-wide implementation of the finding, it’s better to phase in the finding more slowly while analysts do one or more replication experiments.

Indeed, you might consider doing what Ronny Kohavi does at Microsoft: even when you roll out a new practice to customers or website visitors, hold back a random 5% or 10% as a control group on an ongoing basis.

Then, if the new practice stops working, you’ll know.
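One common way to implement an ongoing holdout like this is to hash each user into a stable bucket, so the same person always lands in (or out of) the control group across sessions. A minimal sketch, where the experiment name and the 5% figure are illustrative assumptions rather than Microsoft’s actual mechanism:

```python
import hashlib

def in_holdout(user_id: str, experiment: str, holdout_pct: float = 0.05) -> bool:
    """Assign a stable holdout: hash (experiment, user) into [0, 1) and keep
    the bottom slice as an ongoing control group. The same user_id always
    gets the same answer, so the control group never churns."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:12], 16) / 16**12  # uniform in [0, 1)
    return bucket < holdout_pct

# Users in the holdout keep seeing the old experience; everyone else gets
# the rollout. Periodically comparing the metric for this ~5% against the
# rolled-out group tells you if the new practice ever stops working.
```

Because assignment is deterministic, no state needs to be stored: any service can recompute a user’s group from the ID alone, and the holdout stays consistent indefinitely.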

One of the main problems in academia is that careful replication is rarely rewarded, even though it is frequently essential.

But in any given company, no one has to worry about convincing the entire field to care about replication.

You just have to embrace good practices for yourself.


The World is Complicated

One of the hidden and often-undocumented causes of irreproducibility is the unavoidable fact that reality is immensely complicated.

In one famous case from cell biology, scientists at Harvard and Berkeley couldn’t get consistent results on a seemingly straightforward experiment on breast cancer cells.

In their words, “A set of data that was supposed to be completed in a few months took 2 years to understand and sort out.”

After two years, they finally realized that at one point in the experiment, the two labs had been stirring the tissue at different speeds.

That alone was enough to make the experiment turn out a different way.

The same is true for business experiments.

Innumerable factors about the business itself, the economy, political and cultural conditions, and personal characteristics of the users can radically impact experimental outcomes, even with excellent design.

To make matters worse, some business experiments have millions of subjects, which means that they can find even the most trivial of effects to be significant.

Consider the famous Facebook experiment on emotional contagion, in which people were shown differing numbers of “positive” or “negative” posts from their friends, and the researchers then measured whether the people’s own posts used more “positive” or “negative” words later.

The effects found were as small as four-hundredths of one percent.

By analogy, this is like finding that the Cleveland Cavaliers’ average height is a mere three-thousandths of an inch greater than the Golden State Warriors’.

Not only is such a tiny effect of little real-world significance, it could disappear on the next measurement if one person stands a little taller, or slumps ever so slightly.

By the same token, “big data” experiments with millions of Google or Amazon users might be able to find a minute fraction of a percent change in lift.

But such a small effect could be changed by nearly anything, including any of thousands of seemingly minor changes that the company may make in the ordinary course of business.
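A two-proportion z-test shows how sample size alone can manufacture “significance” for a trivial effect. The numbers below are illustrative assumptions (a 10% baseline rate and a 0.04-percentage-point lift), not the Facebook study’s actual figures:

```python
import math

def two_prop_z(p_base: float, lift: float, n_per_arm: int) -> float:
    """z-statistic for an A/B test comparing two proportions with equal arms."""
    p_pooled = p_base + lift / 2
    se = math.sqrt(2 * p_pooled * (1 - p_pooled) / n_per_arm)
    return lift / se

# The same tiny 0.04-point lift clears the usual z > 1.96 bar with
# millions of users per arm, but is indistinguishable from noise with
# tens of thousands:
print(two_prop_z(0.10, 0.0004, 5_000_000))  # ≈ 2.1  -> "significant"
print(two_prop_z(0.10, 0.0004, 50_000))     # ≈ 0.21 -> noise
```

The effect itself is identical in both cases; only the sample size changed. That is exactly why statistical significance at huge n says little about practical importance or durability.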

That’s why it’s even more important to expect regular replications (Lesson 2).

Even if an experiment works on the first, second, or even third tries, those original experiments may “expire,” so to speak, if only by the passage of time and the shifting composition or tastes of users.

It is statistically — and economically — unwise to assume that an experiment showing a 1% increase in revenue can extrapolate to an entire business unit and remain a 1% increase for all time.


Keep Staff up to Speed on Research Practices

Benchmark methodological best practices in research.

Part of the reproducibility problem in science comes down to education, training, and familiarity with good statistical practice.

Make sure that your team has at least one person on staff, or has a regular consultant, who is up to speed on the latest in best practices for research.

The reproducibility crisis in science has led to many important lessons about how to structure experiments, how to do data analyses, and so forth.

Heed those lessons, and business experiments can provide a competitive advantage by showing what really works and for whom.

