Birthday Problem - Advanced SQL Puzzles

Python Puzzles

If you are unfamiliar with the birthday problem, see the Wikipedia article on it.

Part I

Create a simulation to show that the probability of at least two people sharing a birthday reaches 50% with only 23 people.

Part II

A professor lines her students up in a single-file row and tells them that the first person whose birthday matches that of someone in front of them receives a passing grade. Which position in line has the best chance of winning?

When I run the code for the first part, the estimate for 23 people routinely comes out at about 50% or greater, in line with the exact value of roughly 50.7%.

Looking at the cumulative distribution, the probability is close to 100% (about 97%) by the time the group reaches 50 people.

Probability of 23 individuals having the same birthday is 52%
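
As a cross-check on the simulated figure, the exact probability can be computed from the complement: the chance that all n birthdays are distinct. The following is a minimal sketch rather than part of the original solution. The function name `exact_shared_birthday_probability` is a hypothetical helper, and it assumes 365 equally likely birthdays.

```python
def exact_shared_birthday_probability(n, days=365):
    """Exact probability that at least two of n people share a birthday.

    Hypothetical helper; assumes `days` equally likely birthdays (no leap years).
    """
    # Probability that all n birthdays are distinct
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


# Compare a few group sizes against the simulated curve
for n in (22, 23, 50):
    print(n, f"{exact_shared_birthday_probability(n):.1%}")
```

Under these assumptions, 23 is the smallest group size for which the exact value exceeds 50%.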

For the second part, I first run a simulation of 100,000 trials. I then calculate the exact probabilities mathematically and overlay the two plots.

The plot shows that the simulation and the mathematical calculation agree closely. The 20th position in line has the best chance of winning, at 3.23%.
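
The calculation behind the second curve can be written out as follows, assuming 365 equally likely birthdays. The symbol $P_k$, the probability that the person in position $k$ is the first to match someone in front of them, is notation introduced here for illustration.

```latex
% Position k wins when the k-1 people in front all have distinct birthdays
% and person k matches one of those k-1 birthdays.
P_1 = 0, \qquad
P_k = \left( \prod_{j=0}^{k-2} \frac{365 - j}{365} \right) \cdot \frac{k-1}{365}
\quad \text{for } k \ge 2 .

% Ratio of consecutive positions:
\frac{P_{k+1}}{P_k} = \frac{366 - k}{365} \cdot \frac{k}{k-1} > 1
\iff k^2 - k - 365 < 0
\iff k < \frac{1 + \sqrt{1461}}{2} \approx 19.61 .
```

So $P_k$ increases up to $k = 20$ and decreases afterwards, which is why the 20th position comes out on top.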

Here is the code for Part I.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#Start by comparing 2 individuals, then 3, then 4, and so on
number_of_indiv = 2 
number_of_days = 365
loop = 1_000 
lst_result = []

#run a simulation for each number of possible individuals
while number_of_indiv <= number_of_days:
    
    #reset variable on each loop
    lst = []
    
    #number of simulations
    for _ in range(loop):
        
        #draw a birthday for each individual; append 1 to lst if any birthday repeats, else 0
        arr = np.random.randint(1, 366, number_of_indiv)
        lst.append(1 if len(np.unique(arr)) != len(arr) else 0)
    
    #determine the fraction of simulations with a shared birthday and append to lst_result
    lst_result.append(sum(lst)/len(lst))
    number_of_indiv += 1

#create a pandas series and set the index to 2 through 365        
index = range(2,number_of_days+1)
series = pd.Series(lst_result,index=index)
print('Probability of 23 individuals having the same birthday is ' + '{0:.0%}'.format(series[23]))

#create the plot
series.plot()
plt.title('Birthday Paradox')
plt.xlabel('# of Individuals')
plt.ylabel('Cumulative Distribution')
plt.show()
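
The inner loop of the simulation can also be expressed with array operations, which makes it practical to run many more trials per group size. This is a sketch of one possible alternative, not the approach used above. The function name `simulate_shared_birthday` and its parameters are hypothetical, and it uses NumPy's `default_rng` generator instead of `np.random.randint`.

```python
import numpy as np


def simulate_shared_birthday(n, trials=100_000, days=365, seed=None):
    """Estimate the probability that at least two of n people share a birthday.

    Hypothetical helper; assumes `days` equally likely birthdays.
    """
    rng = np.random.default_rng(seed)
    # One row per trial, one column per person
    birthdays = rng.integers(1, days + 1, size=(trials, n))
    # After sorting each row, a repeated birthday appears as two equal neighbours
    birthdays.sort(axis=1)
    has_match = (np.diff(birthdays, axis=1) == 0).any(axis=1)
    return has_match.mean()


estimate = simulate_shared_birthday(23, seed=0)
print(f"Estimated probability for 23 people: {estimate:.1%}")
```

With 100,000 trials the estimate for 23 people should typically land within a fraction of a percentage point of the exact value, whereas 1,000 trials can be off by a couple of points.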

Here is the code for Part II. It is divided into two parts: the first runs the simulation and the second computes the probabilities mathematically. I then overlay the plots to show both results on one graph.

import numpy as np
import random
import pandas as pd
import matplotlib.pyplot as plt

###########################################################################
###########################################################################
#Below calculates the probability via a simulation

result = []
loop = 100_000

for _ in range(loop):
    #seed the list with the first person's birthday
    lst = [random.randint(1,365)]

    #i is the position in line of the person being checked, starting with the second
    i = 2

    #with 365 possible birthdays, a match is guaranteed by the 366th person
    for _ in range(365):

        birthday = random.randint(1,365)

        #record the position of the first person who matches someone in front of them
        if birthday in lst:
            result.append(i)
            break

        lst.append(birthday)
        i += 1

#count how often each position wins and convert the counts to fractions of all trials
values, counts = np.unique(result, return_counts=True)
percentage = counts/loop
series = pd.Series(percentage,index=values)
    
#create the plot
series.plot(color='red',label='Simulation')
plt.title('Birthday Paradox')
plt.legend()
###########################################################################
###########################################################################
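#Below calculates the exact probability mathematically
result = []
individuals = list(range(1,50,1))

for x in individuals:
    if x == 1:
        #the first person has nobody in front of them, so cannot win
        result.append(0)
    else:
        #probability that nobody in front of person x has already won
        result_sum = 1 - sum(result)
        #person x must match one of the x-1 distinct birthdays in front of them
        prob = result_sum * ((x - 1)/365)
        result.append(prob)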

#create the plot
plt.scatter(y=result, x=individuals,label='Math Proof')
plt.title('Birthday Paradox')
plt.xlabel('Individual')
plt.ylabel('Probability')
plt.legend()
plt.show()
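
To read the winning position off the calculation without relying on the plot, the same probabilities can be computed with exact fractions and the maximum taken directly. This is a minimal sketch under the same assumption of 365 equally likely birthdays. The function name `first_match_probabilities` is a hypothetical helper, not part of the code above.

```python
from fractions import Fraction


def first_match_probabilities(max_position=366, days=365):
    """Exact probability that each position is the first to match someone in front.

    Hypothetical helper; assumes `days` equally likely birthdays.
    """
    probs = {}
    # Probability that everyone in front of the current position is distinct
    no_match_so_far = Fraction(1)
    for k in range(1, max_position + 1):
        # Position k wins if those in front are all distinct and k matches one of them
        probs[k] = no_match_so_far * Fraction(k - 1, days)
        # For the next position, k's birthday must also differ from the k-1 in front
        no_match_so_far *= Fraction(days - (k - 1), days)
    return probs


probs = first_match_probabilities()
best = max(probs, key=probs.get)
print(best, f"{float(probs[best]):.2%}")
```

This should report position 20 at about 3.23%, matching the figure quoted earlier. The probabilities across all 366 positions sum to exactly 1, since a match is guaranteed by the 366th person.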