# One Question to Rule Them All

A surprising finding from a test assessing quick probabilistic thinking

Spiros Doxiadis, Mar 8

This is a collaborative article, written by Christoforos A. and Spiros Doxiadis.

## First, a little background

Nice to meet you: Christoforos is an honorary member of staff at Imperial College and Spiros is an entrepreneur and amateur probabilist.

A few months back, we co-created the board game Borel, a game that uses probabilistic quizzes and instruments of chance (such as multi-sided dice and pouches with colored balls) to test players’ probabilistic judgement.

In the interest of exploring the way people use their intuition to deal with probabilistic dilemmas, we recently created and circulated online a short multiple-choice test, calling it, somewhat cheekily, the Probabilistic IQ test.

It consists of 16 experiment descriptions, taken right out of our board game, and a question about the likelihood of their outcome.

So, for example, one of the questions is:

Roll a 6-sided die five times. Which is more probable?

(a) You will roll a 2 at least one time

(b) You will NOT roll a 2

(c) It’s about 50%-50%

Since we were merely interested in measuring the accuracy of people’s probabilistic intuition and not any kind of mathematical competence, we asked respondents to use no more than 1 minute for each question.
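For reference, the example question above has a short exact solution via the complement rule: the chance of at least one 2 is one minus the chance of missing it on every roll. The sketch below assumes a fair die and independent rolls; the function name is ours, for illustration only.

```python
from fractions import Fraction

def prob_at_least_one(sides=6, rolls=5):
    """Exact chance of seeing one given face at least once in `rolls` fair rolls."""
    miss_once = Fraction(sides - 1, sides)  # chance a single roll is NOT the face
    return 1 - miss_once ** rolls           # complement of missing every time

p = prob_at_least_one()
print(p, float(p))
```

This comes to 4651/7776, or roughly 0.60, so rolling at least one 2 is somewhat more likely than not.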

Due to our desire to explore the appeal of Borel to people with varying academic backgrounds, the main objective of the test was to see if formal mathematical training has an effect on the accuracy of one’s probabilistic fast thinking.

So at the end we added a question about the respondent’s post-school formal mathematical training (the available answers were: ‘no formal training’, ‘up to 4 years of training’ and ‘more than 4 years of training’).

## So, first of all, does training have an effect on performance?

Let’s start by owning up to the fact that our test, just like all online polls, can be subject to selection bias: response is voluntary and the audience reached is not randomly selected, so there is no guarantee that the subset of people that actually participate is a representative sample of the population at large.

In particular, one fairly obvious selection mechanism is in place here: People that do not tend to enjoy mathematical riddles are unlikely to respond.

In fact it is somewhat likely that such people would perform worse, on average, than avid amateur probabilists, due to lack of practice at the least.

Moreover, we can reasonably assume that any selection bias is larger in the non-trained population than the trained one: a person trained in mathematics and probability is unlikely to be negatively predisposed to a quiz like this, so the factors that determine whether they would respond to the quiz are less likely to be correlated with their performance in the test.

With this in mind, let’s look at the analysis.

Training does seem to have a small positive effect, but this only becomes significant after 5 years of training.

Given our discussion of selection bias above, we might expect the effect to have been larger if we had a representative sample.

However, what we can still argue is that love for the subject (enough to make you want to take a probability quiz anyway) seems to be nearly as effective as training. The distributions overlap heavily (Figure 1): a solid 30% of respondents without mathematical training performed better than the average of those with 5+ years of training, and almost 50% performed better than the average of those with 1–4 years of training.

It seems that the playing field is pretty level when it comes to snap calls on probability questions.

You might wonder whether there is anything other than training that better differentiates people that are good at this stuff from those that are not.

We have the answer for you at the very end of this post.

Read on!

Figure 1: Performance per years of training

## In general, are people any good at this?

So, how good are we at this sort of problem? A glance at the plot above reveals that the average performance is about 50%.

Does this mean we are no better than random guessing? Well, to answer that we first need to define what “random guessing” means in this context.

Recall that each question has three options.

So one way to define random guessing would be to pick one of the answers at random.

This would in general yield an average score of 1/3, so significantly lower than 50%.

Except, the answers in our quiz are not always mutually exclusive.

Take, for example, an outcome A with 52% chance of occurring.

We treated both “A is more likely” and “It’s about 50/50” as correct answers in such cases (we thought long and hard about this, and felt it was the best compromise between an intuitive quiz and a correct evaluation).

Taking this into account, picking an answer at random would have given you an expected score of 0.39, which is significantly worse than our observed performance.
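To make the adjustment concrete: with three options per question, a uniformly random pick scores 1/3 on a question with one accepted answer and 2/3 on a question with two. The sketch below averages these per-question chances. Note that the post does not say how many questions accepted two answers, so the split used here is hypothetical and is not meant to reproduce the 0.39 figure exactly.

```python
from fractions import Fraction

def expected_random_score(accepted_per_question, n_options=3):
    """Expected share of correct answers when picking one option uniformly at random."""
    per_question = [Fraction(k, n_options) for k in accepted_per_question]
    return sum(per_question) / len(per_question)

# Baseline: every one of the 16 questions has exactly one accepted answer.
baseline = expected_random_score([1] * 16)

# HYPOTHETICAL split (not from the post): 3 of the 16 questions accept two answers.
hypothetical = expected_random_score([1] * 13 + [2] * 3)
print(float(baseline), float(hypothetical))
```

Every question that accepts a second answer lifts the random-guessing baseline above 1/3, which is why the corrected figure sits between 1/3 and the observed 50%.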

## Being right for the wrong reasons

Some of the questions were different instances of the same family of questions.

For example, here are questions 3 and 10, which are really just one question with different-sided dice:

Question 3: Roll a 6-sided die. Then roll a 10-sided die. Then roll a 30-sided die. Will the results be in strictly increasing order?

Question 10: Roll a 10-sided die. Then roll a 20-sided die. Then roll a 30-sided die. Will the results be in strictly increasing order?

Presumably we would expect any given respondent to perform equally well in these two questions, right? Exactly wrong! This particular pair, in fact, was the one for which it was least likely, among all possible pairs of questions, for a respondent to get both right.

Quite peculiar.

Well, let’s dig in.

First, note that the two questions were picked so that a different outcome is likelier in each: a strictly increasing order is more likely in the first question, and less likely in the second (to be fair, it is about 50/50).

Coupling that with the observation that about 60% of the respondents gave the same answer to these two questions, one is led to the conclusion that respondents tended to formulate an opinion about this question that was actually not that sensitive to the sides of the dice.

Such respondents were right half of the time, but for the wrong reasons.
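For readers who want to check how sensitive the answer is to the number of sides, the two dice questions above are small enough to enumerate exhaustively. This is a minimal sketch, assuming fair dice rolled independently; the function name is ours.

```python
from fractions import Fraction
from itertools import product

def prob_strictly_increasing(sides):
    """Exact chance that dice with these side counts, rolled in order, strictly increase."""
    hits = 0
    total = 0
    # Enumerate every equally likely combination of faces.
    for roll in product(*(range(1, s + 1) for s in sides)):
        total += 1
        if all(x < y for x, y in zip(roll, roll[1:])):
            hits += 1
    return Fraction(hits, total)

q3 = prob_strictly_increasing((6, 10, 30))
q10 = prob_strictly_increasing((10, 20, 30))
print(float(q3), float(q10))
```

For the dice exactly as quoted in this post, the enumeration gives 112/225 (about 0.498) for question 3 and 247/600 (about 0.412) for question 10. The first is about as close to 50/50 as it gets, while the second leans clearly towards "not increasing", so giving the same answer to both cannot be right in general.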

It gets even more interesting.

A simple explanation for this phenomenon would be that, because of some cognitive bias, respondents favour a certain answer: for example, it might be that one just tends to assume the most likely outcome will be to have a strictly increasing sequence, because the dice are strictly increasing in their number of sides.

This is categorically not the case in the data: of those that gave the same answer to both these questions, 16% thought it was 50/50 in both cases, 42% thought an increasing order was most likely, and 42% thought it was least likely.

Exactly as many people thought strictly increasing was the right answer to both questions as thought it was the wrong answer to both.

There is only one reasonable conclusion: most people, when faced with this question, pick an answer almost at random, and stick with it!

This analysis is also interesting in that it demonstrates how well-concealed cognitive biases can sometimes be: if we had only asked one of these two questions, we would have no way of telling this was taking place.

## Which question was the hardest?

This result was unsurprising to us: the question that respondents found hardest to answer correctly was one that we knew from experience to be the most pernicious of all.

It was this one:

In a continuous series of coin flips, which sequence is more probable to appear first at some point, [Heads - Tails] or [Tails - Tails]?

Still, the results were dramatic: only 12% of respondents got this answer right.
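If you would rather see the coin-flip question settled empirically (spoiler ahead), a quick simulation does the job. The sketch below assumes a fair coin and independent flips; the function names, trial count, and seed are our own choices.

```python
import random

def first_pattern(rng):
    """Flip a fair coin until HT or TT shows up; return whichever came first."""
    prev = rng.choice("HT")
    while True:
        cur = rng.choice("HT")
        if cur == "T":
            return prev + cur  # the last two flips are now either "HT" or "TT"
        prev = cur             # cur is "H": neither pattern is complete yet

def estimate_ht_first(n_trials=100_000, seed=0):
    """Monte Carlo estimate of the chance that HT appears before TT."""
    rng = random.Random(seed)
    wins = sum(first_pattern(rng) == "HT" for _ in range(n_trials))
    return wins / n_trials

print(estimate_ht_first())
```

The estimate lands close to 0.75. The reason is that [Tails - Tails] can only come first if the very first two flips are both tails (probability 1/4): as soon as a single head appears, the next tail completes [Heads - Tails] before two tails in a row can occur.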

What’s more, the answer to this question was highly predictive of performance in all other questions, too: the average score on the remaining questions of the respondents that got this one right was 58%.

Indeed, the answer to this one question was more predictive of total performance than whether one had mathematical training: respondents with 5+ years of training had an average performance of just 52%.

## A statistical slam dunk

We found this result so striking that we decided to give it the third degree, so to speak (it was also the reason we thought of posting this article).

The data did not disappoint.

It’s a three-step process:

1. If one knows whether a respondent got this question right, whether or not they had mathematical training offers no additional predictive power about their performance in this test.

2. Conversely, if one knows whether a respondent was formally trained in mathematics, additionally knowing whether or not they got this one question right does offer extra predictive power (additional effect in a binomial regression significant at 99% confidence level).

3. To cap it all off, whether or not one got this question right does not correlate with having been trained in statistics. We can sign below that statement: the correlation is nearly 0.

Giannis knows his probabilities

The combination of these three results is what can be safely called a “statistical slam dunk”.

It’s about as strong a statement as a statistician can make about one metric (in this case, the answer to the coin flip question) being superior to another (in this case, the presence of formal mathematical training) in predicting an outcome of interest.

So, still being somewhat cheeky, we are prepared to give the following advice to recruiters and people in HR:

If you are interviewing a candidate for a position that requires fast probabilistic thinking (what job doesn’t, if you really think about it?), don’t bother asking for their academic credentials.

Nonchalantly put their long CV in the bin, and ask them one question:

In a continuous series of coin flips, which sequence is more probable to appear first at some point, [Heads - Tails] or [Tails - Tails]?

## Get in there yourself

Have a go at the test to see how accurate your probabilistic intuition is.

You may also want to check out Borel, the board game that kick-started this whole shebang.