When Birthdays Collide

* (n — k)!) When we plug in 2 for k and 23 for n, our result is 253.

Thus, there are 253 possible pairs to be made from our humble sample size of 23.

When we take 364/365 and raise it to the 253rd power, the result is ~0.

4995, indicating a 49.

95% chance that none of the 23 strangers were born on the same day.

By subtracting this result from 100.

00%, we are able to ultimately obtain the probability that at least one pair does share the same date of birth: ~50.

05%.

The birthday paradox is fascinating for many reasons.

For programmers, understanding the birthday paradox is useful because it illustrates the concept of hash collisions.

Before we delve into collisions, let’s first discuss hash tables (also known as hash maps).

A hash table is a type of data structure that stores items in a random order.

These items are stored in buckets, the quantity of which is determined by the programmer.

A hash function (an in-built function in Python) is responsible for assigning each item to a specific bucket index number.

Look, it’s a bucket.

Photo by Gregory Culmer on UnsplashTo accomplish this important task, the hash function assigns a randomly generated number to the content you want to store.

It then divides this random number by however many buckets there are in the hash table.

The remainder of this division is the bucket index, and the content is then placed in the corresponding bucket.

You’d think that hashing would result in an even distribution of items across the buckets, just like you probably reasoned that the 23 strangers likely had birthdays scattered across the calendar year.

But we observed mathematically that there is a roughly 50% chance that two of the strangers shared their birthday.

We also observed that the likeliness jumps to nearly 100% when the group consists of a measly 80 individuals.

Let’s perceptualize each calendar day as a bucket in a hash table.

When one of the strangers discovers their date of birth matches that of a groupmate, it’s the equivalent of a new item being assigned to a bucket that already contains an item.

This event is called a hash collision.

And you don’t need to have some huge number of items before a hash collision becomes likely — after all, we only needed a small group of people to illustrate the birthday paradox.

Chaining!.Photo by JJ Ying on UnsplashThere are two major ways to deal with collisions in hash tables.

The first is linear probing, which deals with collisions by assigning the new item to the next bucket over.

The second is chaining, which entails sticking the new item into the bucket even though there is already an item there.

Hash tables are often implemented with linked lists, a type of linear data structure.

This means each bucket, unless empty, contains a linked list of items.

So when an item is added to a bucket that already contains something, the new item is simply added to the end of the linked list.

The birthday paradox is like a hash collision resolution.

Obviously, it is possible for more than one person to be born on any given day.

The real-world birthday hash table can therefore be said to resolve its collisions with chaining!References & Further Reading“The Birthday Paradox”, Philip J.

Erdelsky“Probability and the Birthday Paradox”, Scientific American“Understanding the Birthday Paradox”, Better Explained“Understanding the Birthday Paradox”, Shashank Tiwari.

. More details

Leave a Reply