The Sleeping Beauty problem: a data scientist’s perspective

I will come back to this variation in the concluding section.

Sleeping Beauty with an unfair coin

Modifying the number of awakenings is one way to generalize Sleeping Beauty.

Another is to assume an unfair coin and take the original halfer and thirder arguments along for the ride in order to test their robustness under such a generalization.

Before I present a Markov chain perspective in the next section, I will describe an alternative route for generalizing Sleeping Beauty to biased coins that is closer to the usual derivation of the problem’s solutions.

We will once more assume the original number of awakenings: One awakening if the coin landed heads, two awakenings if it landed tails.

Let’s assume that the prior probability of the coin landing heads is P(heads) = p, which is some number between 0 and 1 but not necessarily 1/2.

SB knows the value of p; alternatively she might be given the opportunity to estimate p herself by tossing the coin a lot of times before the experiment is conducted.

The goal is to determine P⁺(heads) := P(heads|awake).

Now, both halfers and thirders agree that P⁺(heads|Tuesday) = 0.

However, the two positions disagree on what conditioning on Monday implies:

Halfer position.

There is no change in SB’s credence upon awakening: P⁺(heads) = P(heads) = p.

Thirder position.

In order for us to generalize the thirder position to biased coins, we need to adjust the original arguments.

The basic argument invokes the principle of indifference: with a fair coin, there is no discernible difference between the outcomes from Sleeping Beauty’s point of view, so in the original setting P⁺(heads, Monday), P⁺(tails, Monday), and P⁺(tails, Tuesday) are all equal.

However, if the coin is biased, that assumption breaks down, since SB living in “heads world” or in “tails world” may have wildly different likelihoods.

If you strip the thirder position of this now invalid argument, you are left with the following claim, which marks the crucial difference to the halfer view: if SB is awakened and the experimenter were to tell her that it is in fact Monday, her credence that the coin landed heads must equal her prior credence (because she is awakened on Monday irrespective of the coin toss’s result and has therefore learned nothing new): P⁺(heads|Monday) = P(heads) = p.

The above assumptions determine P⁺( ⋅ ) as a parameterized joint distribution for halfers and thirders, respectively:

Halfers (with parameter t := P⁺(Monday|tails)): P⁺(heads, Monday) = p, P⁺(tails, Monday) = (1 – p)t, P⁺(tails, Tuesday) = (1 – p)(1 – t).

Thirders (with parameter q := P⁺(Monday)): P⁺(heads, Monday) = pq, P⁺(tails, Monday) = (1 – p)q, P⁺(tails, Tuesday) = 1 – q.

For halfers, the parameter t has no impact on the final result P⁺(heads) = p.

Thirders, however, still need to choose the parameter q which can be interpreted as SB’s credence that it is Monday when she wakes up: P⁺(Monday) = q.

One method to choose this parameter could be the principle of maximum entropy: without further information, we should choose the distribution closest to being uniform.

As a function of the parameter q, given p, the entropy of the above distribution can be computed as

S(q) = –(1 – p)q log((1 – p)q) – pq log(pq) – (1 – q) log(1 – q),

the maximum of which can be found via basic calculus.

However, we need to reject the maximum entropy solution, since it implies P⁺(heads) = 1/2 in the limit p → 1, which of course makes no sense: SB should be certain that the coin landed heads whenever the prior probability of that event is already close to one.
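Here is a minimal numerical sketch of this failure mode (plain Python with NumPy/SciPy; my own illustration, not part of the original derivation). Setting dS/dq = 0 yields the closed form q* = 1/(1 + p^p (1 – p)^(1 – p)), and the implied posterior p ⋅ q* indeed approaches 1/2 as p → 1:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def entropy(q, p):
    # S(q) = -(1-p)q log((1-p)q) - pq log(pq) - (1-q) log(1-q)
    cells = np.array([(1 - p) * q, p * q, 1 - q])
    return -np.sum(cells * np.log(cells))

for p in [0.5, 0.9, 0.99, 0.999]:
    res = minimize_scalar(lambda q: -entropy(q, p),
                          bounds=(1e-9, 1 - 1e-9), method="bounded")
    q_closed = 1 / (1 + p**p * (1 - p)**(1 - p))  # stationary point of S
    print(f"p={p}: maxent q*={res.x:.4f} (closed form {q_closed:.4f}), "
          f"implied P+(heads)={p * res.x:.4f}")
```

Incidentally, for p = 1/2 the maximizer q* = 2/3 happens to coincide with the thirder’s q = 1/(2 – p); the two prescriptions only diverge for biased coins.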

Another way of finding a sensible value for the parameter q would be invoking the principle of indifference, after all — but only on the distribution conditioned on tails: The assumption is that whenever SB is awakened in “tails world”, there is no way for her to distinguish between Monday and Tuesday.

Consequently, P⁺(Monday|tails) = P⁺(Tuesday|tails), thus P⁺(Monday, tails) = P⁺(Tuesday, tails), and therefore (1 – p) q = 1 – q which finally leads to q = 1/(2 – p).

This analysis leads to the following conclusions:

Halfer Beauties do not update their credence that the coin landed heads: P⁺(heads) = P(heads) = p.

Thirder Beauties update their credence by multiplying by their subjective probability, upon awakening, that the day is Monday: P⁺(heads) = P(heads) ⋅ P⁺(Monday) = pq = p/(2 – p).
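For reference, a minimal matplotlib sketch (my own, assuming nothing beyond the two formulas above) that reproduces the figure below:

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0, 1, 200)
plt.plot(p, p, "b-", label="halfer: p")                   # halfer posterior (blue)
plt.plot(p, p / (2 - p), "r-", label="thirder: p/(2-p)")  # thirder posterior (red)
plt.xlabel("prior P(heads) = p")
plt.ylabel("posterior P+(heads)")
plt.legend()
plt.show()
```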

Figure: Halfer posterior (blue line) vs. thirder posterior (red line) as functions of the prior P(heads) = p.

Sleeping Beauty on a Markov chain

Let’s take on a different perspective and model the problem as a Markov chain of repeated Sleeping Beauty experiments. The state transition diagram (two states, Monday and Tuesday; Monday → Monday with probability p, Monday → Tuesday with probability 1 – p, Tuesday → Monday with probability 1) should be read as follows: when SB is awake on Monday, with probability p her next awakening will, again, be on a Monday: there will be no interview on a Tuesday when the coin lands heads, and the experiment will be repeated.

With probability 1 – p, her next awakening will be on a Tuesday before the experiment will be repeated.

SB’s ignorance of what day it is translates directly into ignorance of the state she is currently in: all she ever experiences is a series of indistinguishable awakenings.

Let’s call a transition Monday → Monday a “heads transition”, while Monday → Tuesday is a “tails transition”.

Suppose that the experiment has already been repeated a lot of times.

In this model, the halfer and thirder solutions take on a remarkable characterization:

Markovian halfer.

The probability that the next heads-or-tails transition will be a heads transition is given by p: either it is Monday and the next transition is a heads transition with probability p, or it is Tuesday, the next transition repeats the experiment, and the transition after that is a heads transition with probability p, again.

Markovian thirder.

The probability that the previous heads-or-tails transition was a heads transition is given by p/(2 – p): if it is Tuesday, the previous transition must have been a tails transition.

If it’s Monday, the previous transition was a heads transition with probability p.

If the experiment has already been through many cycles, SB can be confident that the probability that it is Monday is given by the equilibrium distribution Π( ⋅ ) of the Markov chain, which in this case is determined by Π(Monday) = 1/(2 – p), a probability that we already encountered in the more “Bayesian” treatment above.

Consequently, the final probability is not given by the transition probability but rather by the probability flux Π(Monday) ⋅ P(Monday → Monday) = p/(2 – p).
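A quick simulation sketch (plain Python; my own sanity check, with hypothetical function names) that estimates both the equilibrium probability and the heads flux:

```python
import random

def simulate(p, steps=1_000_000, seed=0):
    """Two-state chain: Monday -> Monday (heads, prob p),
    Monday -> Tuesday (tails, prob 1-p), Tuesday -> Monday (prob 1)."""
    rng = random.Random(seed)
    state, mondays, heads_steps = "Mon", 0, 0
    for _ in range(steps):
        if state == "Mon":
            mondays += 1
            if rng.random() < p:
                heads_steps += 1   # heads transition: Monday -> Monday
            else:
                state = "Tue"      # tails transition: Monday -> Tuesday
        else:
            state = "Mon"          # the experiment is repeated
    return mondays / steps, heads_steps / steps

p = 0.5
pi_mon, flux = simulate(p)
print(pi_mon, 1 / (2 - p))  # equilibrium Π(Monday), here ≈ 2/3
print(flux, p / (2 - p))    # probability flux of heads transitions, here ≈ 1/3
```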

Note that the thirder position can also be applied to the distant future: the probability that the transition taking place, say, one million steps after her interview is a heads transition is going to be very close to p/(2 – p).

While we are still confronted with the same distinct solutions when modeling the problem as a Markov chain, I feel that this perspective leads to a clearer picture of how to interpret them.

One might suggest that the apparent distinction can be attributed to a difference in the meaning that we assign to the same word “probability”:

“Predictive probability.” Sleeping Beauty wishes to express her credence that the coin lands heads whenever the experimenter tosses it. This prediction does not depend on what day it is or any other state of affairs. That credence is 1/2 (or, more generally, p).

“Descriptive probability.” Sleeping Beauty wishes to express her view of how often the event “heads” typically happens in her frame of reference. This frequency is 1/3 (or, more generally, p/(2 – p)).

The paradox seems to stem from the fact that the values of the two notions usually coincide: having tossed a die a million times, we are confident that, because the frequency (“descriptive probability”) is very close to 1/6 for each possible outcome, we can reason about the next toss of the same die with the same measure of uncertainty (“predictive probability”).

Sleeping Beauty and the reference class problem

The divide into halfer and thirder solutions to the Sleeping Beauty problem can be interpreted as an instance of the reference class problem: as the reference class, should SB choose the class of experiments (one out of two in a sequence of experiments will yield “heads”), or the class of awakenings (one out of three)?

You could be very quick to argue that SB should, of course, choose her “own” class of reference, i.e., the class of awakenings.

However, in this context, I will give an argument for the halfer position by means of the following variation of the repeated Sleeping Beauty experiment.

In the original formulation, SB is fully aware of the mechanism that generates the data but she has no access to actual data.

Now suppose that SB is told with each awakening whether the coin landed heads, and she is allowed to keep track of the number of awakenings.

However, this time she is unaware of the elaborate procedure with which these data have been generated.

She might therefore obtain a sequence like this (p = 1/2, H = heads, T = tails):

TTTTHTTTTTTHTTHTTHHHHTTTTHHHTTHHTTTTTTTTHTTHTTHTTTTHTTHTTHTTTTTTHHTTTTTTHTTTTTTTTHHTTTTTTTTHTTHTTTTTTTTHTTTTHTTTTHHHHHTTTTHTTTTTTTTHHHHHTTTTTTTTTTTTTTTTTTTTTTHHTTTTTTTTTTHHHTTTTTTTTTTHTTTTTTTTHHHHHHTTHTTHHHTTTTTTTTHTTHHHHTTTTTTHTTTTHHTTHHHHTTHHHTTHTTTTHHHHTTTTTTTTTTTTHHHHHHHTTTTTTTTHHHHHTTHTTTTTTHTTHTTTTHHTTTTHTTHHTTTTHTTHTTTTHTTHHTTHTTHHTTHHTTHTTHHHTTHTTTTHHTTHHTTHTTHHTTHHTTHHHTTHTTTTHHTTHTTHTTTTTTTTHTTTTTTTTHHTTHTTHHTTHHHTTTTTTHTTHTTTTTTTTHTTTTTTHTTHHHTTHTTHTTTTTTTTTTHHTTTTHTTHHHTTHTTTTHHHHHHTTHTTHTTHHHTTHHTTTTTTHTTHTTHHHHTTHHTTTTTTHHHHHTTHTTTTHTTHHHTTTTTTHTTTTHHTTTTHTTTTHTTTTTTTTHHTTTTTTHHTTTTHHHHTTTTHTTTTTTHHTTHHTTTTHTTTTHTTTTTTTTHHTTHTTHTTTTTTTTTTHTTHHHHTTHTTHTTTTHTTHTTHTTHHHHTTTTTTHTTTTHHHHTTTTHHHTTHHTTTTTTTTTT

A simple analysis of this sequence of coin tosses will enable her to compute the frequency of instances of “heads”, which is about one out of three.

However, she can also learn that certain features never seem to appear, such as HTH.
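The generating process is easy to replicate (a minimal sketch, my own; the function name is hypothetical): per experiment, append “H” with probability p and “TT” otherwise, then inspect the resulting record.

```python
import random

def generate(n_tosses, p=0.5, seed=42):
    """Awakening record of repeated experiments: each coin toss appends
    'H' (heads, one awakening) or 'TT' (tails, two awakenings)."""
    rng = random.Random(seed)
    return "".join("H" if rng.random() < p else "TT" for _ in range(n_tosses))

seq = generate(100_000)
print(seq.count("H") / len(seq))  # frequency of H, close to p/(2-p) = 1/3
print("HTH" in seq)               # False: every T comes glued to another T
```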

In this setup, SB is given all the data, and she is not even aware of any particular reference class with respect to which she should compute frequencies.

Still, she will conclude that while the hypothesis that H is drawn at random with probability 1/3 and T with probability 2/3 is consistent with the observed frequencies, the superior explanation of the observed results is the sequential draw of H and TT with equal probability 1/2.

When she is given the details of the experiments after the fact, she will find the superior theory confirmed.

Consequently, if the experiments were to continue but she is not given any new data, there is no reason for her to revise her credence of what is actually happening. That credence includes the two seemingly contradictory statements “H is drawn with probability 1/2” and “H is drawn with probability 1/3”, which are simply parts of two different hypotheses and only seem contradictory when viewed out of context.

Summary & conclusion

When you try to extend the arguments of both halfers and thirders to the experimental setup involving an unfair coin that shows heads with arbitrary probability p, there is some ambiguity in how to extend the thirder’s arguments.

However, a consistent generalized thirder result can be obtained: p/(2 – p).

Furthermore, there are two ways to marry the halfer and thirder views by providing a model that generates the data (unobserved by SB) when the experiment is executed repeatedly:

- A Markov chain model, where p represents the transition probability and p/(2 – p) the probability flux assigned to “skipping Tuesday”.
- A random sequence in which “heads” appears with frequency p/(2 – p), but this frequency serves only as a partial description of the sequence: “heads” and “tails, tails” are drawn with probability p and 1 – p, respectively.

Personally, I have changed my view from a thirder to a halfer while pondering the problem.

Currently, I hold the view that the apparent paradox of Sleeping Beauty’s possible answers can be mitigated by embracing the fact that neither fully describes the situation she is in, as the mechanism that generates the unseen data cannot be fully described by one parameter alone.

In the end, Sleeping Beauty’s correct answer might be as simple as this: “The probability that the coin landed heads in the course of this experiment is given by the value p. But if I were to guess that the coin landed heads right at this instant, I would be correct only with the smaller probability p/(2 – p).”

This can also be exemplified by different utilities:

- If the evil scientist kills SB for giving a wrong answer when she guesses “heads” or “tails”, she is well advised to take p as the basis for her answer. (This is obvious when she is killed after the whole experiment, on Wednesday, say. But it also holds if she were to be killed on Monday, because there would be no Tuesday awakening in that case, effectively shrinking the reference class of awakenings.)
- If SB is awarded a prize (e.g., money) every time she guesses the coin toss correctly, she will have larger expected gains when assessing the probability of the coin showing “heads” as p/(2 – p); see the sketch after this list.
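To check the second scenario, here is a minimal sketch (my own; the reward scheme of one unit per correct guess is an assumption for illustration): per experiment, always guessing “heads” earns p in expectation, while always guessing “tails” earns 2(1 – p), so the frequency-based answer wins for p = 1/2.

```python
import random

def expected_reward(guess, p=0.5, n_experiments=100_000, seed=1):
    """Average reward per experiment: SB guesses once per awakening
    and earns one unit for every correct guess."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_experiments):
        heads = rng.random() < p
        awakenings = 1 if heads else 2  # heads: one awakening, tails: two
        total += awakenings * (guess == ("H" if heads else "T"))
    return total / n_experiments

print(expected_reward("H"))  # ≈ p = 0.5
print(expected_reward("T"))  # ≈ 2(1-p) = 1.0
```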

Consequently, the paradox might be rooted in the misconception that Sleeping Beauty’s reality is exhaustively described by one probabilistic variable alone.

However, I am convinced that the discussion does not end here.

References

- The Sleeping Beauty problem, video by Julia Galef.
- M. Piccione and A. Rubinstein. “On the Interpretation of Decision Problems with Imperfect Recall”. In: Games and Economic Behavior 20 (1997), pp. 3–24.
- A. Elga. “Self-locating Belief and the Sleeping Beauty Problem”. In: Analysis 60.2 (2000), pp. 143–147.
- S. Guiasu and A. Shenitzer. “The principle of maximum entropy”. In: The Mathematical Intelligencer 7.1 (1985), pp. 42–48.
- F. P. Kelly. Reversibility and Stochastic Networks. Wiley, Chichester, 1979.
- D. Lewis. “Sleeping Beauty: reply to Elga”. In: Analysis 61.3 (2001), pp. 171–176.
- N. Bostrom. “Sleeping Beauty and self-location: A hybrid model”. In: Synthese 157.1 (2007), pp. 59–78.
- P. Winkler. “The Sleeping Beauty Controversy”. In: The American Mathematical Monthly 124.7 (2017), p. 579.
