**Python Puzzles**

If you are unfamiliar with the birthday problem, see its Wikipedia article.

**Part I**

Create a simulation showing that with only 23 people, the probability that at least two of them share a birthday reaches 50%.
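For reference, here is the exact value the simulation should approach. Assuming 365 equally likely birthdays and ignoring leap days, it is one minus the probability that all $n$ birthdays are distinct:

$$
P(n) = 1 - \prod_{i=0}^{n-1}\frac{365-i}{365} = 1 - \frac{365!}{(365-n)!\,365^{n}}
$$

This gives $P(22) \approx 0.476$ and $P(23) \approx 0.507$, so 23 is the smallest group in which a shared birthday is more likely than not.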

**Part II**

A professor lines her students up in single file and tells them that the first person whose birthday matches that of someone in front of them receives a passing grade. Which position in line has the best chance of winning?

When I run the code for the first part, I routinely get a result of about 50% for 23 people. This is in line with the exact value of roughly 50.7% and confirms the birthday problem.
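Part I estimates each group size from only 1,000 trials, so the printed percentage varies from run to run. The following is a quick way to gauge how much. It assumes 365 equally likely birthdays and treats each trial as an independent yes/no outcome, and it uses the standard binomial standard error:

```python
import math

# Exact probability that at least two of 23 people share a birthday,
# assuming 365 equally likely birthdays: 1 - 365! / ((365 - 23)! * 365**23)
p_exact = 1 - math.perm(365, 23) / 365**23

# Standard error of a Monte Carlo estimate built from `loop` independent trials
loop = 1_000
std_error = math.sqrt(p_exact * (1 - p_exact) / loop)

print(f"exact: {p_exact:.4f}, standard error with {loop} trials: {std_error:.4f}")
# About 95% of runs should land within roughly 2 standard errors of the exact value
print(f"typical range: {p_exact - 2 * std_error:.3f} to {p_exact + 2 * std_error:.3f}")
```

With 1,000 trials the spread is about 1.6 percentage points. Runs that land slightly below 50% are therefore expected. Increasing `loop` shrinks the spread in proportion to 1/√loop.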

Looking at the cumulative distribution, the probability of a shared birthday climbs to about 97% once 50 people’s birthdays are compared, and it approaches 100% beyond that.
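The simulated curve can also be checked against the exact one. The sketch below assumes 365 equally likely birthdays. `exact_curve` is a hypothetical helper that returns a pandas Series indexed the same way as Part I's `series` (group sizes 2 through 365), so it could be overlaid on that plot with `.plot()`:

```python
import numpy as np
import pandas as pd


def exact_curve(days: int = 365) -> pd.Series:
    """Exact probability that at least two people share a birthday, for group
    sizes 2..days, assuming `days` equally likely birthdays."""
    n = np.arange(1, days + 1)
    # Person n multiplies P(all birthdays distinct) by (days - (n - 1)) / days
    p_distinct = np.cumprod((days - (n - 1)) / days)
    # p_distinct[0] is a group of one; drop it so the index runs 2..days
    return pd.Series(1 - p_distinct[1:], index=range(2, days + 1))


curve = exact_curve()
print(f"50 people: {curve[50]:.3f}")  # 0.970
print("smallest group with P >= 0.99:", curve[curve >= 0.99].index[0])  # 57
```

The exact curve passes 99% at 57 people. This is why the simulated plot sits essentially at 100% long before the group size reaches 365.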

For the second part, I first run a Monte Carlo simulation with 1 million trials (the code below sets `loop = 100_000`, which runs faster). I then calculate the exact answer mathematically and overlay the two plots.
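The mathematical calculation rests on one observation, sketched here under the assumption of 365 equally likely birthdays. Let $p_k$ be the probability that person $k$ is the first to match someone ahead of them. That happens exactly when the first $k-1$ birthdays are all distinct and person $k$ lands on one of those $k-1$ days:

$$
p_1 = 0, \qquad p_k = \frac{k-1}{365}\prod_{i=0}^{k-2}\frac{365-i}{365} \quad (k \ge 2)
$$

The product is the probability that nobody among the first $k-1$ people has matched yet, so it equals $1 - \sum_{j=1}^{k-1} p_j$. This gives the running-sum form that the Part II code below evaluates:

$$
p_k = \Bigl(1 - \sum_{j=1}^{k-1} p_j\Bigr)\frac{k-1}{365}
$$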

From the plot, we can see that the simulation and the mathematical calculation agree closely. The 20th person in line has the best chance of winning, at about 3.23%.
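To confirm the winning position without simulation noise, the exact distribution can be computed directly and its maximum located. Below is a minimal sketch in plain Python. `first_match_distribution` is a hypothetical helper: as it walks down the line, it updates the probability that everyone so far has a distinct birthday.

```python
def first_match_distribution(days: int = 365) -> list[float]:
    """probs[k] = probability that person k (1-based) is the first whose
    birthday matches someone ahead of them, assuming `days` equally likely days."""
    probs = [0.0] * (days + 2)  # positions 0..days+1; positions 0 and 1 stay 0
    p_all_distinct = 1.0        # probability the first k - 1 birthdays are distinct
    for k in range(2, days + 2):
        # First k - 1 people distinct, and person k hits one of their k - 1 days
        probs[k] = p_all_distinct * (k - 1) / days
        # Update to the probability that the first k birthdays are distinct
        p_all_distinct *= (days - (k - 1)) / days
    return probs


probs = first_match_distribution()
best = max(range(len(probs)), key=probs.__getitem__)
print(best, f"{probs[best]:.2%}")  # 20 3.23%
```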

Here is the code for Part I.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Start by comparing 2 individuals, then 3, then 4, ...
number_of_indiv = 2
number_of_days = 365
loop = 1_000
lst_result = []

# Run a simulation for each group size from 2 to 365
while number_of_indiv <= number_of_days:
    # Reset the per-group-size results on each pass
    lst = []
    # Number of simulations per group size
    for _ in range(loop):
        # Draw a random birthday (1-365) for each individual;
        # record 1 if the array contains a duplicate, else 0
        arr = np.random.randint(1, number_of_days + 1, number_of_indiv)
        lst.append(1 if len(np.unique(arr)) != len(arr) else 0)
    # Fraction of simulations with at least one shared birthday
    lst_result.append(sum(lst) / len(lst))
    number_of_indiv += 1

# Create a pandas Series indexed by group size (2 through 365)
index = range(2, number_of_days + 1)
series = pd.Series(lst_result, index=index)
print('Probability of at least two of 23 individuals sharing a birthday is '
      + '{0:.0%}'.format(series[23]))

# Create the plot
series.plot()
plt.title('Birthday Paradox')
plt.xlabel('# of Individuals')
plt.ylabel('Cumulative Distribution')
plt.show()
```
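The Part I code simulates one group at a time inside a Python loop. A vectorized alternative draws every trial for a given group size as one 2-D array and spots duplicates by sorting each row. This sketch is not the author's code. It assumes NumPy's `Generator` API (`np.random.default_rng`), and `simulate_shared_birthday` is a hypothetical helper name.

```python
import numpy as np


def simulate_shared_birthday(n: int, trials: int, rng: np.random.Generator,
                             days: int = 365) -> float:
    """Estimate P(at least two of n people share a birthday) from `trials`
    simulated groups, all drawn at once."""
    # One row per trial, one column per person; birthdays are coded 0..days-1
    birthdays = rng.integers(0, days, size=(trials, n))
    # After sorting each row, a shared birthday appears as two equal neighbours
    sorted_rows = np.sort(birthdays, axis=1)
    has_match = (np.diff(sorted_rows, axis=1) == 0).any(axis=1)
    # Fraction of trials with at least one shared birthday
    return float(has_match.mean())


rng = np.random.default_rng(seed=0)
# Same setup as Part I: 1,000 trials for every group size from 2 to 365
estimates = {n: simulate_shared_birthday(n, 1_000, rng) for n in range(2, 366)}
print(f"{estimates[23]:.0%}")
```

Sorting each row costs O(n log n) per trial, but it runs inside NumPy rather than in a Python loop. That is usually the bigger win at these sizes.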

Here is the code for Part II. I divide it into two parts: the first runs the simulation, and the second computes the exact probabilities (labelled "Math Proof" in the plot). I then overlay the plots to show both results on one graph.

```python
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

###########################################################################
# Below calculates the probability via a simulation
result = []
loop = 100_000
for _ in range(loop):
    # Seed the set of seen birthdays with the first person's birthday
    seen = {random.randint(1, 365)}
    # i represents the individual's position in line, starting with person #2
    i = 2
    # Walk down the line until someone matches a birthday ahead of them;
    # by the pigeonhole principle this happens by person 366 at the latest
    for _ in range(365):
        birthday = random.randint(1, 365)
        if birthday in seen:
            result.append(i)
            break
        seen.add(birthday)
        i += 1

# Fraction of trials in which each position was the first to match
values, counts = np.unique(result, return_counts=True)
percentage = counts / loop
series = pd.Series(percentage, index=values)

# Create the plot
series.plot(color='red', label='Simulation')
plt.title('Birthday Paradox')
plt.legend()

###########################################################################
# Below calculates the probability via a math proof
result = []
individuals = list(range(1, 50))
for x in individuals:
    if x == 1:
        # The first person has no one in front of them
        result.append(0)
    else:
        # Probability that no one ahead of person x has matched yet...
        result_sum = 1 - sum(result)
        # ...times the chance person x hits one of the x - 1 distinct birthdays ahead
        prob = result_sum * ((x - 1) / 365)
        result.append(prob)

# Overlay the exact values on the simulation plot
plt.scatter(y=result, x=individuals, label='Math Proof')
plt.title('Birthday Paradox')
plt.xlabel('Individual')
plt.ylabel('Probability of winning')
plt.legend()
plt.show()
```
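Running the per-trial Python loop 1 million times is slow. One way to speed it up, sketched here as an assumption rather than the author's method, is to draw all birthdays for a batch of trials at once with NumPy. The sketch then finds, in each row, the first person whose birthday already appeared earlier in line. `simulate_first_match` and its `chunk` parameter are hypothetical names. The sketch also assumes NumPy's `Generator` API and 365 equally likely birthdays.

```python
import numpy as np


def simulate_first_match(trials: int, rng: np.random.Generator, days: int = 365,
                         chunk: int = 10_000) -> np.ndarray:
    """Return freq[k] = fraction of trials in which person k (1-based) is the
    first to share a birthday with someone ahead of them."""
    people = days + 1  # pigeonhole: a match is certain by person days + 1
    counts = np.zeros(people + 1, dtype=np.int64)
    remaining = trials
    while remaining > 0:
        size = min(chunk, remaining)
        birthdays = rng.integers(0, days, size=(size, people))
        # A stable sort keeps equal birthdays in line order, so in each run of
        # equal values every element after the first repeats someone ahead
        order = np.argsort(birthdays, axis=1, kind="stable")
        sorted_vals = np.take_along_axis(birthdays, order, axis=1)
        is_repeat = sorted_vals[:, 1:] == sorted_vals[:, :-1]
        # 0-based line index of each repeat; non-repeats get a sentinel past the end
        repeat_idx = np.where(is_repeat, order[:, 1:], people)
        first_person = repeat_idx.min(axis=1) + 1  # convert to 1-based position
        counts += np.bincount(first_person, minlength=people + 1)
        remaining -= size
    return counts / trials


rng = np.random.default_rng(seed=1)
freq = simulate_first_match(200_000, rng)
print(freq.argmax(), f"{freq.max():.2%}")
```

The exact probabilities for positions 19, 20 and 21 are roughly 3.221%, 3.232% and 3.225%, a difference of only about 0.01 percentage points. Even a million trials can therefore leave the simulated peak one position off. The exact calculation is what pins the winner at position 20.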