Python Puzzles
At the Aces High Card Club, the top player wins her Skip-Bo games 59% of her time based upon 500 games.
- Based on an infinite number of games, what is the 99% confidence interval of the mean?
- Assume a 59% percent win rate, run a simulation where you play the game 500 times, and record the win percentage. What are your highest and lowest win percentages? What is the win percentage at the 99% and 1% percentiles?
First, let’s calculate the 99% confidence interval of the mean.
The equation to calculate a 99% confidence interval for the mean for a proportion is:
p ± 2.58 * sqrt(p(1-p)/n)
where:
- p is the sample proportion
- n is the sample size
The value of 2.58 is the z-score corresponding to a 99% confidence level in a standard normal distribution.
Note that this equation assumes that the sample is both random and independent and that the sample size is sufficiently large (typically n ≥ 30) to satisfy the Central Limit Theorem.
The following Python code calculates the confidence interval of a proportion manually. The norm.pdf function returns the z score based on the confidence interval value.
z-value for Confidence Interval 0.99: 2.576
The standard error of the mean is: 0.02199545407578575
Lower Bound: 53.33%
Upper Bound: 64.67%
We can see that even if the top player has a lucky string of 500 games, it is still a good bet that this player will win more than 50% of the games.
# Import required libraries import random from scipy.stats import norm # Set the parameters for the simulation n = 500 prob_win = 0.59 prob_loss = 0.41 num_simulations = 1000 CI = .99 # Determine the z value based upon the CI z = norm.ppf(1 - (1 - CI)/2) print(f"z-value for Confidence Interval {CI}: {z:.3f}") # Calculate the standard error using the formula sqrt(p*(1-p)/n) se = math.sqrt(prob_win * (1 - prob_win) / n) print("The standard error of the mean is:",se) # Calculate the confidence interval lower_bound = (prob_win - se * z) * 100 upper_bound = (prob_win + se * z) * 100 # Print the results print(f"Lower Bound: {lower_bound:.2f}%") print(f"Upper Bound: {upper_bound:.2f}%")
Now, let’s assume that 59% is the actual mean. Here, we run a simulation 1000 times, playing a game of Skip-Bo 500 times and recording the win percentage.
Index of the 1 percentile: 99
Index of the 99 percentile: 9899
Value at the 1 percentile: 53.80%
Value at the 99 percentile: 64.00
Minimum win percentage: 51.40%
Maximum win percentage: 67.40%
Mean win percentage: 58.98%
We can see the 1st and 99th percentiles values fall within the 99% confidence interval of the mean, and even on the simulation with the most losses, the player is still winning over 50% of the games.
# Import required libraries import random import numpy as np from collections import defaultdict from scipy.stats import norm import math # Define a function that simulates a game and returns the percentage of wins def simulate(n, prob_win, prob_loss): # Create an empty list to store the results data = [] for _ in range(n): # Generate a random number between 0 and 1 random_number = random.random() # If the random number is less than or equal to the probability of winning, # append 1 to the list, which represents a win. if random_number <= prob_win: data.append(1) # If the random number is between the probability of winning and the sum of # probability of winning and probability of losing, append 0 to the list, # which represents a loss. elif random_number <= prob_win + prob_loss: data.append(0) # Calculate the number of wins and the percentage of wins num_wins = sum(data) percentage_wins = num_wins / n * 100 # Return the percentage of wins return percentage_wins # Set the parameters for the simulation n = 500 prob_win = 0.59 prob_loss = 0.41 num_simulations = 10000 CI = .99 # Determine the z value based upon the CI z = norm.ppf(1 - (1 - CI)/2) print(f"z-value for CI={CI}: {z:.3f}") #################################################################################################### print("############################################################################################") print(f"Running {num_simulations} simulations for n = {n} and win probability of {prob_win}") print("############################################################################################") # Create an empty list to store the win percentages from each simulation win_percentages = [] # Run the simulation num_simulations times and append the result to the list for _ in range(num_simulations): result = simulate(n, prob_win, prob_loss) win_percentages.append(result) # Sort the win_percentages list in ascending order sorted_win_percentages = sorted(win_percentages) # Print the indices of the lower and upper nth percentiles lower_index = int(num_simulations * (1 - CI)) - 1 upper_index = int(num_simulations * CI) - 1 print(f"Index of the {(1 - CI)*100:.0f} percentile: {lower_index}") print(f"Index of the {CI*100:.0f} percentile: {upper_index}") # Access the values at the 1st and 99th percentiles in the sorted list value_lower = sorted_win_percentages[lower_index] value_upper = sorted_win_percentages[upper_index] print(f"Value at the {(1 - CI)*100:.0f} percentile: {value_lower:.2f}%") print(f"Value at the {CI*100:.0f} percentile: {value_upper:.2f}%") # Calculate the minimum, maximum, and mean of the win percentages min_value = min(win_percentages) max_value = max(win_percentages) mean = np.mean(win_percentages) print(f"Minimum win percentage: {min_value:.2f}%") print(f"Maximum win percentage: {max_value:.2f}%") print(f"Mean win percentage: {mean:.2f}%") # Create a defaultdict with integer values (default value is 0) counts = defaultdict(int) # Count the occurrences of each win percentage in the list for win_percentage in win_percentages: # Round the win percentage to 2 decimal places and increment the count rounded_win_percentage = round(win_percentage, 2) counts[rounded_win_percentage] += 1