Python Puzzles
At the Aces High Card Club, the top player wins her Skip-Bo games 59% of her time based upon 500 games.
- Based on an infinite number of games, what is the 99% confidence interval of the mean?
- Assume a 59% percent win rate, run a simulation where you play the game 500 times, and record the win percentage. What are your highest and lowest win percentages? What is the win percentage at the 99% and 1% percentiles?
First, let’s calculate the 99% confidence interval of the mean.
The equation to calculate a 99% confidence interval for the mean for a proportion is:
p ± 2.58 * sqrt(p(1-p)/n)
where:
- p is the sample proportion
- n is the sample size
The value of 2.58 is the z-score corresponding to a 99% confidence level in a standard normal distribution.
Note that this equation assumes that the sample is both random and independent and that the sample size is sufficiently large (typically n ≥ 30) to satisfy the Central Limit Theorem.
The following Python code calculates the confidence interval of a proportion manually. The norm.pdf function returns the z score based on the confidence interval value.
z-value for Confidence Interval 0.99: 2.576
The standard error of the mean is: 0.02199545407578575
Lower Bound: 53.33%
Upper Bound: 64.67%We can see that even if the top player has a lucky string of 500 games, it is still a good bet that this player will win more than 50% of the games.
# Import required libraries
import random
from scipy.stats import norm
# Set the parameters for the simulation
n = 500
prob_win = 0.59
prob_loss = 0.41
num_simulations = 1000
CI = .99
# Determine the z value based upon the CI
z = norm.ppf(1 - (1 - CI)/2)
print(f"z-value for Confidence Interval {CI}: {z:.3f}")
# Calculate the standard error using the formula sqrt(p*(1-p)/n)
se = math.sqrt(prob_win * (1 - prob_win) / n)
print("The standard error of the mean is:",se)
# Calculate the confidence interval
lower_bound = (prob_win - se * z) * 100
upper_bound = (prob_win + se * z) * 100
# Print the results
print(f"Lower Bound: {lower_bound:.2f}%")
print(f"Upper Bound: {upper_bound:.2f}%")
Now, let’s assume that 59% is the actual mean. Here, we run a simulation 1000 times, playing a game of Skip-Bo 500 times and recording the win percentage.
Index of the 1 percentile: 99
Index of the 99 percentile: 9899
Value at the 1 percentile: 53.80%
Value at the 99 percentile: 64.00
Minimum win percentage: 51.40%
Maximum win percentage: 67.40%
Mean win percentage: 58.98%We can see the 1st and 99th percentiles values fall within the 99% confidence interval of the mean, and even on the simulation with the most losses, the player is still winning over 50% of the games.
# Import required libraries
import random
import numpy as np
from collections import defaultdict
from scipy.stats import norm
import math
# Define a function that simulates a game and returns the percentage of wins
def simulate(n, prob_win, prob_loss):
# Create an empty list to store the results
data = []
for _ in range(n):
# Generate a random number between 0 and 1
random_number = random.random()
# If the random number is less than or equal to the probability of winning,
# append 1 to the list, which represents a win.
if random_number <= prob_win:
data.append(1)
# If the random number is between the probability of winning and the sum of
# probability of winning and probability of losing, append 0 to the list,
# which represents a loss.
elif random_number <= prob_win + prob_loss:
data.append(0)
# Calculate the number of wins and the percentage of wins
num_wins = sum(data)
percentage_wins = num_wins / n * 100
# Return the percentage of wins
return percentage_wins
# Set the parameters for the simulation
n = 500
prob_win = 0.59
prob_loss = 0.41
num_simulations = 10000
CI = .99
# Determine the z value based upon the CI
z = norm.ppf(1 - (1 - CI)/2)
print(f"z-value for CI={CI}: {z:.3f}")
####################################################################################################
print("############################################################################################")
print(f"Running {num_simulations} simulations for n = {n} and win probability of {prob_win}")
print("############################################################################################")
# Create an empty list to store the win percentages from each simulation
win_percentages = []
# Run the simulation num_simulations times and append the result to the list
for _ in range(num_simulations):
result = simulate(n, prob_win, prob_loss)
win_percentages.append(result)
# Sort the win_percentages list in ascending order
sorted_win_percentages = sorted(win_percentages)
# Print the indices of the lower and upper nth percentiles
lower_index = int(num_simulations * (1 - CI)) - 1
upper_index = int(num_simulations * CI) - 1
print(f"Index of the {(1 - CI)*100:.0f} percentile: {lower_index}")
print(f"Index of the {CI*100:.0f} percentile: {upper_index}")
# Access the values at the 1st and 99th percentiles in the sorted list
value_lower = sorted_win_percentages[lower_index]
value_upper = sorted_win_percentages[upper_index]
print(f"Value at the {(1 - CI)*100:.0f} percentile: {value_lower:.2f}%")
print(f"Value at the {CI*100:.0f} percentile: {value_upper:.2f}%")
# Calculate the minimum, maximum, and mean of the win percentages
min_value = min(win_percentages)
max_value = max(win_percentages)
mean = np.mean(win_percentages)
print(f"Minimum win percentage: {min_value:.2f}%")
print(f"Maximum win percentage: {max_value:.2f}%")
print(f"Mean win percentage: {mean:.2f}%")
# Create a defaultdict with integer values (default value is 0)
counts = defaultdict(int)
# Count the occurrences of each win percentage in the list
for win_percentage in win_percentages:
# Round the win percentage to 2 decimal places and increment the count
rounded_win_percentage = round(win_percentage, 2)
counts[rounded_win_percentage] += 1
