Estimation of the Mean - Advanced SQL Puzzles

Python Puzzles

Back to the Python! homepage

At the Aces High Card Club, the top player wins her Skip-Bo games 59% of her time based upon 500 games.

Based on an infinite number of games, what is the 99% confidence interval of the mean?
Assume a 59% percent win rate, run a simulation where you play the game 500 times, and record the win percentage. What are your highest and lowest win percentages? What is the win percentage at the 99% and 1% percentiles?

First, let’s calculate the 99% confidence interval of the mean.

The equation to calculate a 99% confidence interval for the mean for a proportion is:

p ± 2.58 * sqrt(p(1-p)/n)

where:

p is the sample proportion
n is the sample size

The value of 2.58 is the z-score corresponding to a 99% confidence level in a standard normal distribution.

Note that this equation assumes that the sample is both random and independent and that the sample size is sufficiently large (typically n ≥ 30) to satisfy the Central Limit Theorem.

The following Python code calculates the confidence interval of a proportion manually. The norm.pdf function returns the z score based on the confidence interval value.

z-value for Confidence Interval 0.99: 2.576
The standard error of the mean is: 0.02199545407578575
Lower Bound: 53.33%
Upper Bound: 64.67%

We can see that even if the top player has a lucky string of 500 games, it is still a good bet that this player will win more than 50% of the games.

# Import required libraries
import random
from scipy.stats import norm

# Set the parameters for the simulation
n = 500
prob_win = 0.59
prob_loss = 0.41
num_simulations = 1000
CI = .99

# Determine the z value based upon the CI
z = norm.ppf(1 - (1 - CI)/2)
print(f"z-value for Confidence Interval {CI}: {z:.3f}")


# Calculate the standard error using the formula sqrt(p*(1-p)/n)
se = math.sqrt(prob_win * (1 - prob_win) / n)
print("The standard error of the mean is:",se)
# Calculate the confidence interval
lower_bound = (prob_win - se * z) * 100
upper_bound = (prob_win + se * z) * 100


# Print the results
print(f"Lower Bound: {lower_bound:.2f}%")
print(f"Upper Bound: {upper_bound:.2f}%")

Now, let’s assume that 59% is the actual mean. Here, we run a simulation 1000 times, playing a game of Skip-Bo 500 times and recording the win percentage.

Index of the 1 percentile: 99
Index of the 99 percentile: 9899
Value at the 1 percentile: 53.80%
Value at the 99 percentile: 64.00
Minimum win percentage: 51.40%
Maximum win percentage: 67.40%
Mean win percentage: 58.98%

We can see the 1st and 99th percentiles values fall within the 99% confidence interval of the mean, and even on the simulation with the most losses, the player is still winning over 50% of the games.

# Import required libraries
import random
import numpy as np
from collections import defaultdict
from scipy.stats import norm
import math

# Define a function that simulates a game and returns the percentage of wins
def simulate(n, prob_win, prob_loss):
    # Create an empty list to store the results
    data = []
    for _ in range(n):
        # Generate a random number between 0 and 1
        random_number = random.random()
        # If the random number is less than or equal to the probability of winning,
        # append 1 to the list, which represents a win.
        if random_number <= prob_win:
            data.append(1)
        # If the random number is between the probability of winning and the sum of
        # probability of winning and probability of losing, append 0 to the list,
        # which represents a loss.
        elif random_number <= prob_win + prob_loss:
            data.append(0)

    # Calculate the number of wins and the percentage of wins
    num_wins = sum(data)
    percentage_wins = num_wins / n * 100
    # Return the percentage of wins
    return percentage_wins

# Set the parameters for the simulation
n = 500
prob_win = 0.59
prob_loss = 0.41
num_simulations = 10000
CI = .99

# Determine the z value based upon the CI
z = norm.ppf(1 - (1 - CI)/2)
print(f"z-value for CI={CI}: {z:.3f}")

####################################################################################################
print("############################################################################################")
print(f"Running {num_simulations} simulations for n = {n} and win probability of {prob_win}")
print("############################################################################################")

# Create an empty list to store the win percentages from each simulation
win_percentages = []
# Run the simulation num_simulations times and append the result to the list
for _ in range(num_simulations):
    result = simulate(n, prob_win, prob_loss)
    win_percentages.append(result)

# Sort the win_percentages list in ascending order
sorted_win_percentages = sorted(win_percentages)

# Print the indices of the lower and upper nth percentiles
lower_index = int(num_simulations * (1 - CI)) - 1
upper_index = int(num_simulations * CI) - 1
print(f"Index of the {(1 - CI)*100:.0f} percentile: {lower_index}")
print(f"Index of the {CI*100:.0f} percentile: {upper_index}")

# Access the values at the 1st and 99th percentiles in the sorted list
value_lower = sorted_win_percentages[lower_index]
value_upper = sorted_win_percentages[upper_index]
print(f"Value at the {(1 - CI)*100:.0f} percentile: {value_lower:.2f}%")
print(f"Value at the {CI*100:.0f} percentile: {value_upper:.2f}%")

# Calculate the minimum, maximum, and mean of the win percentages
min_value = min(win_percentages)
max_value = max(win_percentages)
mean = np.mean(win_percentages)
print(f"Minimum win percentage: {min_value:.2f}%")
print(f"Maximum win percentage: {max_value:.2f}%")
print(f"Mean win percentage: {mean:.2f}%")

# Create a defaultdict with integer values (default value is 0)
counts = defaultdict(int)

# Count the occurrences of each win percentage in the list
for win_percentage in win_percentages:
    # Round the win percentage to 2 decimal places and increment the count
    rounded_win_percentage = round(win_percentage, 2)
    counts[rounded_win_percentage] += 1