# How to be less wrong

This is what went through J. Richard Gott’s head when he visited Germany in 1969.

If you think about it, this is a tough question, as there is not much data available about lifetimes of walls in Germany (in fact, the Berlin Wall is a single datapoint).

How then to approach a question like that, where we are almost completely in the dark?

Remains of the Berlin Wall. (Source: Pixabay)

Gott, today a professor of astrophysics at Princeton University, thought about it like this: there was nothing special about him seeing the Wall on that particular day in that particular year.

If you divide the Wall’s entire lifetime into four equal-length segments, then there is a 50% probability that he would have arrived within the middle two segments.

This in turn translates into an estimate of how much longer the Wall should remain: between a third of its lifetime so far (if he happened to visit at the end of the two middle segments, with three quarters elapsed and one quarter to go) and three times its lifetime so far (if he happened to visit at the beginning of the two middle segments, with one quarter elapsed and three quarters to go).

Given that the Wall was 8 years old in 1969, he concluded that with 50% confidence it would last another 3 to 24 years (more precisely, between 8/3 ≈ 2.7 and 8 × 3 = 24 years).

The Wall lasted another 20 years.
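Gott's segment argument works for any confidence level, not just 50%. The following is a minimal Python sketch, not taken from Gott's own work. The function name `gott_interval` and its parameters are hypothetical, chosen for illustration. It assumes only that the moment of observation is equally likely to fall anywhere in the total lifetime.

```python
def gott_interval(age, confidence=0.5):
    """Return (low, high) bounds on the remaining lifetime.

    Assumes the observation falls at a uniformly random point of the
    total lifetime (the "nothing special about my visit" assumption).
    """
    if age <= 0 or not 0 < confidence < 1:
        raise ValueError("age must be positive and confidence in (0, 1)")
    tail = (1 - confidence) / 2          # probability mass cut off at each end
    # If a fraction f of the lifetime has elapsed, remaining = age * (1 - f) / f.
    low = age * tail / (1 - tail)        # observed late: f = 1 - tail
    high = age * (1 - tail) / tail       # observed early: f = tail
    return low, high


low, high = gott_interval(8)             # the Wall was 8 years old in 1969
print(f"50% interval: {low:.1f} to {high:.1f} more years")
```

Asking for more confidence widens the interval quickly: under the same assumption, `gott_interval(8, 0.95)` gives roughly 0.2 to 312 years.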

There is a 50% chance that Gott’s visit falls within the middle two segments of the Wall’s existence. (Source)

## The Copernican Principle: we are not special

Gott’s reasoning that he did not visit the Wall at any special moment in time is really an application of the much broader Copernican principle, which states that we occupy neither a special place nor a special time in the Universe’s history.

Earth is not a special place, and neither is our Solar System or Milky Way Galaxy.

2019 is not a special time to be alive.

Gott’s calculation based on the Copernican principle can at least give us an estimate of the timing of an event when we are otherwise completely in the dark.

It can be a fun exercise to estimate various world events with it:

- There is a 50% probability that North Korea will exist as an isolated country for another 24 to 213 years.
- There is a 50% probability that Brexit negotiations will last for another 8 months to 6 years.
- There is a 50% probability that the Euro will exist for another 7 to 60 years.
- There is a 50% probability that the human species will be around for another 70,000 to 600,000 years.

To be fair, Gott was not the first to apply this line of reasoning to make predictions based on limited data.

Consider the German Tank Problem: during World War II, the Allies tried to estimate the total number of German tanks based on the serial numbers of tanks captured so far.

In the extreme case, a single captured tank could at least give a clue about how many tanks there are in total.

If its serial number is 60, say, then there is a 50% chance that the total number of tanks is between 80 and 240.

In practice, the estimate gets better with more tanks captured.
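The single-tank reasoning, and the way the estimate sharpens with more captures, can be sketched in Python. This is a minimal sketch under stated assumptions. `single_tank_interval` applies the same middle-segment argument to one serial number. `estimate_total` uses the standard textbook estimator m + m/k - 1 (largest serial m among k captured tanks), which is a swapped-in technique and not something described in this article. Both function names are hypothetical, serial numbers are assumed to run from 1 to N without gaps, and the four sample serials are made up for illustration.

```python
def single_tank_interval(serial, confidence=0.5):
    """Bounds on the total count N from one serial number.

    Assumes the serial is equally likely to fall anywhere in 1..N.
    """
    tail = (1 - confidence) / 2
    # With the given confidence: tail * N <= serial <= (1 - tail) * N.
    return serial / (1 - tail), serial / tail


def estimate_total(serials):
    """Point estimate of N from several serials: m + m/k - 1."""
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


print(single_tank_interval(60))            # (80.0, 240.0)
print(estimate_total([19, 40, 42, 60]))    # 74.0
```

With one tank the 50% interval spans a factor of three. With four tanks the point estimate already sits close to the largest serial seen.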

Earth, as seen from Jupiter. (NASA)

Gott’s calculation based on the Copernican principle is really a consequence of applying Bayes’ law, which more broadly tells us how to update our probability estimates in the face of new information.

Gott showed how to update our estimates based on a single datapoint.

In most cases, however, we have additional information about the expected outcomes of phenomena simply from our life experiences, and that extra information makes our predictions more accurate.

## Priors shape our predictions

There are broadly two different types of observations in our world: things that cluster around a natural value, and things that do not.

The former follow roughly Gaussian distributions, and the latter power-law distributions.

Human life spans are an example of a Gaussian distribution, and so are human weight, calorie intake, hours of sleep per night, the lengths of movies, the tail lengths of mice, and so on.

Take human life spans, for example.

We have good expectations of life spans: in the United States, for instance, the average life span is around 78 years, and the distribution is roughly Gaussian.

In statistical terms, this extra information is our prior.

Because of our prior, we can estimate someone’s remaining life span from their age more accurately than with the Copernican principle, since we have extra information.

The Social Security Administration, for instance, updates this calculation each year, and so do insurance companies: according to their tables, the expected remaining life span of a 7-year-old is around 70 years, while the expected remaining life span of a 70-year-old is around 14 years.

This is a fundamental consequence of having a Gaussian prior: the longer you live, the shorter you expect your additional life span to be.
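To see this effect in numbers, here is a minimal Python sketch that treats total life span as exactly Gaussian and computes the expected remaining years given survival to a certain age (the mean of a truncated normal). The mean of 78 years comes from the text. The standard deviation of 15 years and the function name `expected_remaining` are illustrative assumptions. Real actuarial tables are not exactly Gaussian, so the output only roughly tracks the figures quoted above.

```python
import math


def expected_remaining(age, mean=78.0, sd=15.0):
    """Expected remaining years given survival to `age`, for a Gaussian
    prior on total life span. `sd=15.0` is an illustrative assumption."""
    a = (age - mean) / sd
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)   # standard normal density at a
    survival = 0.5 * math.erfc(a / math.sqrt(2))            # P(total life span > age)
    # Mean of a normal truncated below at `age`, minus the years already lived.
    return mean + sd * pdf / survival - age


for age in (7, 40, 70, 90):
    print(age, round(expected_remaining(age), 1))
```

The printed values shrink as age grows but never reach zero: someone who has already outlived the mean is still expected to live a little longer.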

## Power-laws

Not everything in our world follows a Gaussian distribution.

When observations vary over many orders of magnitude, we are likely dealing with a power-law distribution. Examples are populations of cities, book sales, movie grosses, and people’s wealth and incomes.

Take movie grosses, for example.

The most successful movie of 2018, Black Panther, made around \$700 million, while Billionaire Boys Club, one of the worst performers, made a meager \$600: a difference of 6 orders of magnitude!

Another way to say this is that power-law distributions have no natural scale: they are scale-free distributions.

Movies can make hundreds of dollars or hundreds of millions of dollars.

Using Bayes’ law, we can again estimate where we expect an observation to end up, given where we see it today, and given the power-law prior.

As it turns out, a power-law prior implies a multiplicative prediction rule: multiply the quantity observed so far by a constant factor, and this is the expected end result.

For movie grosses, for instance, that multiple is around 1.4: given that a movie has already made \$10 million, it is likely to top out at around \$14 million. (For Billionaire Boys Club, the outlook is pretty grim.)

The multiplicative rule is a direct consequence of the scale-free nature of power-laws, where the only thing that gives us a sense of the scale is the single observation we have.
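The multiplicative rule can be derived in a few lines. This is a sketch under two assumptions that the text does not spell out: the total $T$ has a power-law prior with exponent $\gamma > 0$, and the moment of observation $t$ is uniformly distributed over $(0, T]$, as in Gott's argument. The symbols $t$, $T$, $\gamma$ and $x$ are introduced here for illustration.

$$
\begin{aligned}
p(T) &\propto T^{-\gamma}, \qquad p(t \mid T) = \frac{1}{T} \quad (0 < t \le T) \\
p(T \mid t) &\propto p(t \mid T)\, p(T) \propto T^{-(\gamma + 1)} \quad (T \ge t) \\
P(T > x \mid t) &= \frac{\int_x^{\infty} T^{-(\gamma + 1)}\, dT}{\int_t^{\infty} T^{-(\gamma + 1)}\, dT} = \left(\frac{t}{x}\right)^{\gamma} \quad (x \ge t) \\
P(T > x \mid t) = \tfrac{1}{2} \;&\Longrightarrow\; x = 2^{1/\gamma}\, t
\end{aligned}
$$

Under these assumptions the median prediction is the observed value times the constant $2^{1/\gamma}$, whatever the scale of $t$. A multiplier of 1.4 would correspond to $\gamma = \ln 2 / \ln 1.4 \approx 2.06$.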

Gott’s Berlin Wall prediction (it will stand between a third and 3 times as long as it has already stood) is thus similar to a Bayesian prediction with a power-law prior, with the difference that the multiplicative factor is not known.

Gott’s Berlin Wall calculation is more ignorant, given the lack of data.

Thus, the most important difference between Gaussian and power-law priors is this: With a Gaussian prior, the longer something has been going on, the shorter we expect it to continue.

With a power-law prior, the longer something has been going on, the longer we expect it to keep going.

## Life is about learning priors

Whether we acknowledge them or not, we implicitly learn priors for various phenomena in our world over the course of our lifetimes, simply based on the observations we make every day.

We learn that people’s heights, weights, and life spans cluster around a typical value, while city populations, movie grosses, and wealth do not.

We are ultimately Bayesian thinkers.

“Life is a school of probability.” (Walter Bagehot)

Take call center hold times as an example: researchers Tom Griffiths and Josh Tenenbaum surveyed what people think their total hold time would be, based on the time already waited.

By fitting their answers to different distributions, they learned that people, on average, have power-law expectations of hold times, with a multiplication parameter of around 1.3: after holding for 5 minutes, they expect to be waiting a total of 7 minutes.

After holding for 50 minutes, they expect to be waiting a total of 67 minutes.
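The hold-time figures are simple to reproduce. In this minimal Python sketch, the multiplier is set to 4/3, an assumption consistent with "around 1.3" and chosen because it reproduces both quoted numbers after rounding. The function names are hypothetical. `implied_exponent` additionally assumes that the multiplier is the posterior median factor 2^(1/γ) for a power-law prior with exponent γ and a uniformly random moment of observation.

```python
import math


def predict_total(elapsed, multiplier):
    """Multiplicative rule: predicted total = elapsed * constant factor."""
    return elapsed * multiplier


def implied_exponent(multiplier):
    """Power-law exponent gamma for which the posterior median equals
    elapsed * multiplier, assuming multiplier = 2 ** (1 / gamma)."""
    return math.log(2) / math.log(multiplier)


HOLD_MULTIPLIER = 4 / 3   # assumption: "around 1.3", matches the quoted figures

for waited in (5, 50):
    total = round(predict_total(waited, HOLD_MULTIPLIER))
    print(waited, "->", total, "minutes in total")
```

Because the rule is purely multiplicative, waiting ten times longer raises the expected total tenfold, which is the scale-free behavior of a power-law prior.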

## Priors matter

In Algorithms to Live By, authors Brian Christian and Tom Griffiths tell the tragic story of Harvard biologist Stephen Jay Gould, who was diagnosed with a deadly form of cancer in 1982.

By reading the medical literature, Gould learned that his median life expectancy at this stage was a mere 8 months.

However, he reasoned, that statistic said nothing about the distribution of life expectancies.

If it were Gaussian, then his life expectancy would be more or less 8 months, and the closer he got to that point, the shorter his remaining expectancy would be.

If it were a power-law, on the other hand, the longer he survived, the longer he could expect to survive!

Gould discovered that the distribution was in fact strongly skewed (more like a power-law), and went on to live for another 20 years after his diagnosis.

Priors matter: the better our priors, the better our predictions.
