Why are p-values like needles? It’s dangerous to share them!

It baffles me.

Argument 2 (against)

Argument 2 (potential for misuse) is fair, but it’s not the p-value’s fault.

It turns out that making decisions carefully using statistics takes effort, but people keep looking for the miraculous no-effort magic that gets them a crystal ball.

The mysterious p-value is tempting — most of its users don’t understand how to use it and the resulting broken telephone has reached ridiculous levels.

I’m with you.

That’s why I’m a huge advocate of chilling out.

In other words, I’m a fan of making data-inspired decisions where you commit to not taking yourself seriously if you’re not willing to put in the effort.

The best solution for those who are feeling lazy: do descriptive analytics and stay humble.

If you’re not willing to put in the effort, opt for descriptive analytics and stay humble.

Statistical inference only makes sense if you go about it rigorously in a way that fully honors the intentional way that you set up your decision frame and assumptions.

This isn’t a p-value problem.

It’s a snake oil problem: Statistics is often sold as a magical cure-all that purports to deliver guarantees that are crazy if you stop to think about it.

There’s no magic that makes certainty out of uncertainty… but somehow there are many charlatans implying the opposite.

The case for p-values

You should also be suspicious of anyone who professes rabid love for p-values.

They’re only useful in very specific circumstances.

But when p-values are useful, they’re very useful.

They are a useful tool for making decisions a particular way.

It’s pretty hard to challenge that one.

For decision-makers wishing to do their best in an uncertain world and make decisions in a specific way, the p-value is perfect.

Don’t rain on their parade just because you’d prefer to make the decision a different way — when it’s your turn to be the decision-maker, you can do it however you please.

The other case for p-values

If you’re interested in analytics (and not statistics), p-values can be a useful way to summarize your data and iterate on your search.

Please don’t interpret them as a statistician would.

They don’t mean anything except there’s a pattern in these data.

Statisticians and analysts may come to blows if they don’t realize that analytics is about what’s in the data (only!) while statistics is about what’s beyond the data.

To an analyst, a p-value is just another statistic, with no interpretation except “this is the number I get when I shake my dataset in a particular way, when it’s small it means my dataset has a certain kind of pattern” — think of it as a way to visualize complicated and large datasets efficiently.

Don’t use the word hypothesis when you’re doing analytics, or you’ll sound like an idiot.

You work with facts: these data have this pattern.


To learn more about the difference between the subfields of data science, see bit.


Enough with analytics — there’s no battle there (just as there are no rules beyond “Don’t make conclusions beyond the data!”).

Back to statistics, where the argument is heated!

The case for confidence intervals instead of p-values

You’re in the wrong room, buddy.

Go back to analytics where confidence intervals are a more efficient way of visualizing and summarizing data.

In statistical decision-making, no one cares.

Why? The decision you get using confidence intervals and p-values is identical.

If you’re doing real statistical inference, you should be indifferent for any reason that isn’t aesthetic.
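To see why the two routes land on the same decision, here is a minimal sketch using a two-sided z-test. Everything specific in it is an assumption for illustration: the sample values, the known noise level `sigma`, the 5% significance level, and the function name `z_test_and_interval`. The point is only that "p-value below the significance level" and "null value outside the matching confidence interval" are the same event.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_and_interval(data, null_mean, sigma, alpha=0.05):
    """Two-sided z-test and the matching (1 - alpha) confidence interval.

    Assumes (for illustration) normally distributed data with known sigma.
    """
    std_normal = NormalDist()
    standard_error = sigma / sqrt(len(data))
    sample_mean = mean(data)

    # Route 1: p-value compared with the significance level.
    z = (sample_mean - null_mean) / standard_error
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p = p_value < alpha

    # Route 2: does the confidence interval exclude the null value?
    half_width = std_normal.inv_cdf(1 - alpha / 2) * standard_error
    interval = (sample_mean - half_width, sample_mean + half_width)
    reject_by_ci = not (interval[0] <= null_mean <= interval[1])

    return p_value, interval, reject_by_p, reject_by_ci


# Hypothetical measurements, made up for this example.
sample = [10.2, 9.8, 10.9, 11.1, 10.4, 10.7, 9.9, 10.8]

for null_mean in (10.0, 10.4):
    p, ci, by_p, by_ci = z_test_and_interval(sample, null_mean, sigma=0.5)
    print(f"null mean {null_mean}: p = {p:.4f}, "
          f"CI = ({ci[0]:.2f}, {ci[1]:.2f}), "
          f"reject via p: {by_p}, reject via CI: {by_ci}")
```

Whichever null value you plug in, the two "reject" flags agree, so picking between them is a matter of taste rather than decision quality.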

(It’s true that it’s a kindness to future data explorers, the analysts, if you report your results with confidence intervals, but that’s got nothing to do with the quality of your decision-making.)

Back to basics

Let’s revisit the situation where p-values make statistical sense.

First, you’re setting up your decision around the notion of a default action and you’re giving the data a chance to talk you out of it.

You’re not trying to form mathematically-describable opinions (go Bayesian for that).

You’re willing to make a decision in a way that follows the logic in this blog post.

If not, p-values are not for you.

There’s nothing to argue about.

They’re a good tool for some jobs, but if that’s not the job you need done, then go get a better tool.

Since when do we expect that one tool should fit every job?!

Now that you’ve decided to test hypotheses the classical way, let’s see how you calculate a p-value.

Create the null world

Once you have your null hypothesis stated formally (after you’ve done this), the bulk of the work will be visualizing the null hypothesis world and figuring out how things work there so we can make a toy model of it.

That’s the point of those arcane scribbles you might remember from stats class: they boil down to making a mathematical model of a universe whose rules are governed by the null hypothesis.

You build that universe out of equations (or by simulation!) so you can examine it in the next step.

The math is all about building a toy model of the null hypothesis universe.

That’s how you get the p-value.

The math is all about making and examining toy universes (how cool is that, fellow megalomaniacs!? So cool!) to see how likely they are to spawn datasets like yours.
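Here is a minimal sketch of a toy universe built by simulation. The setup is my own assumption, not from any real analysis: the null hypothesis is a fair coin, the real-world data are a hypothetical 60 heads in 100 flips, and "datasets like yours" means at least as far from 50 heads in either direction. The function name and settings are made up for illustration.

```python
import random


def simulate_null_p_value(observed_heads, n_flips, n_worlds=10_000, seed=0):
    """Estimate a two-sided p-value by simulating a null world of fair coins."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    expected_heads = n_flips * 0.5
    observed_gap = abs(observed_heads - expected_heads)

    at_least_as_extreme = 0
    for _ in range(n_worlds):
        # One dataset spawned by the null universe: n_flips fair coin flips.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if abs(heads - expected_heads) >= observed_gap:
            at_least_as_extreme += 1

    # Fraction of null-world datasets at least as extreme as the real one.
    return at_least_as_extreme / n_worlds


p_value = simulate_null_p_value(observed_heads=60, n_flips=100)
print(f"estimated p-value: {p_value:.3f}")
```

Change the rules of the toy universe (a different null, a different notion of "extreme") and the number changes with them, which is the whole point.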

If your toy model of the null hypothesis universe is unlikely to give you data like the data you got from the real world, your p-value will be low and you’ll end up rejecting the null hypothesis… change your mind!

Assumptions, assumptions, assumptions

Naturally, you’ll have to make some simplifying assumptions, otherwise you’ll get overwhelmed quickly.

No one has the time to make a universe as rich and complex as the one we actually live in, which is why statistics doesn’t give you Truth-with-a-capital-T, but rather a method for making reasonable decisions under uncertainty… subject to some corners you’re willing to cut.

(It’s also why statistical pedantry looks so silly.)

In STAT101, those assumptions tend to be spoonfed to you as “The data are normally distributed… blah blah blah.” In real life, you have to come up with the assumptions yourself, which can feel scary since suddenly there are no right answers.

In real life, there are no right answers.

The best we can do is make decisions in a way that feels reasonable.

If a p-value was calculated for someone else, it’s probably useless to you.

It should only be shared among people who choose to make the same simplifying assumptions and frame their decision-making in the same way.
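To make that concrete, here is a small sketch in which two people look at the same data and get very different p-values because they bought into different assumptions. The data, the assumed noise level, and both function names are hypothetical. Person A assumes normal noise with a known standard deviation of 1 (a z-test); Person B only assumes that under the null each value is equally likely to fall above or below zero (a sign test).

```python
from math import comb, sqrt
from statistics import NormalDist, mean


def z_test_p_value(data, null_mean, assumed_sigma):
    """Two-sided p-value assuming normal data with a known standard deviation."""
    z = (mean(data) - null_mean) / (assumed_sigma / sqrt(len(data)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def sign_test_p_value(data, null_median):
    """Two-sided exact sign test: only the direction of each value is used."""
    above = [x > null_median for x in data if x != null_median]
    n, k = len(above), sum(above)
    tail = max(k, n - k)
    # Probability of a split at least this lopsided under a 50/50 null.
    p = 2 * sum(comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    return min(1.0, p)


# Hypothetical measurements, made up for this example.
data = [0.9, 1.4, -0.3, 2.1, 0.7, -0.5, 1.8, 1.1, 0.4, -0.2]

p_person_a = z_test_p_value(data, null_mean=0.0, assumed_sigma=1.0)
p_person_b = sign_test_p_value(data, null_median=0.0)
print(f"Person A (z-test, sigma assumed to be 1): p = {p_person_a:.3f}")
print(f"Person B (sign test):                     p = {p_person_b:.3f}")
```

Same data, different null worlds, different numbers. At a 5% significance level Person A would switch actions and Person B would not, so borrowing one of these p-values without its assumptions tells you very little.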

It’s dangerous to use other people’s p-values… they’re like needles: if you’re going to use them, get your own!

Statistical decision-making is always subjective, whether it’s Bayesian or Frequentist, because you always have to make simplifying assumptions.

The conclusions are only valid insofar as you buy into those assumptions, which is why it’s weird to expect someone to agree with your punchline if they haven’t seen the assumptions it’s based on.

Why do we do that? No idea.

I don’t.

If I’m not willing to think about how I’d like to make the decision and whether the stated assumptions are palatable to me (before I see the data or p-value), then all I’ll ever see in a p-value is what an analyst sees: after some settings were twiddled, you saw a pattern.

That’s cute.

Sometimes I see animals when I look at clouds too.

If I’m tempted to take it seriously, I’ll follow the “insight” up in other data.

Otherwise, I’ll treat it as vague inspiration… and at that quality, who the hell cares how good it is anyways?

Does this evidence surprise you?

Now that you have imagined the world that describes your null hypothesis, you’re going to ask whether the evidence you got (your data) is surprising in that world.

The p-value is simply the probability that your null world coughs up data at least as damning as yours.

When it’s low, that means your data look weird in such a world, which makes you feel ridiculous about acting as if you live in that world.

When it’s low enough for your tastes (below a threshold you pick, called a significance level), that means you’re surprised enough to change your mind and switch your action away from your default.

Otherwise, you keep doing what you were going to do anyway.
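The decision rule itself is tiny. Here is a sketch, with a made-up p-value, made-up actions, and a hypothetical `choose_action` helper, showing that the same p-value can lead two decision-makers to different actions simply because they picked different significance levels.

```python
def choose_action(p_value, significance_level, default_action, alternative_action):
    """Switch away from the default only if the data are surprising enough."""
    if p_value < significance_level:
        return alternative_action
    return default_action


p_value = 0.03  # hypothetical result, not from real data

decision_makers = [("cautious decision-maker", 0.01), ("relaxed decision-maker", 0.05)]
for name, alpha in decision_makers:
    action = choose_action(p_value, alpha, "keep the old design", "launch the new design")
    print(f"{name} (significance level {alpha}): {action}")
```

The threshold belongs to whoever owns the decision, and it should be picked before the p-value is seen.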

Interpret a low p-value as: “Some person was surprised by something.”

Who defines what “ridiculous” means? The decision-maker (whoever chose the assumptions and significance level).

If you didn’t set the analysis up, the only valid interpretation of a low p-value is: “Some person was surprised by something.” Let’s all meditate on how little that tells you if you don’t know much about the someone or the something in question.

And that’s why p-values are a bit like medical needles: They’re intended for personal use and it’s dangerous to share them.

