Confidence Intervals

When working with statistical data, it is common to want to estimate the true value of a population parameter based on a sample of data. One common parameter of interest is the population mean. However, since it is often not possible to measure the entire population, we must rely on a sample mean as an estimate of the population mean.  Confidence intervals are a way to express the precision of our estimate of the population mean.

To put it a little more simply, you have a sample mean that is probably close to the population mean, and you want a range of values that is likely to contain the true population mean at a chosen confidence level, typically 95% or 99%.

There are three scenarios that you will face when calculating confidence intervals.

  1. Calculating the CI of the mean when the population standard deviation is known.
  2. Calculating the CI of the mean when the population standard deviation is unknown.
  3. Calculating the CI for proportions.

There are several points that need to be made when thinking about confidence intervals.

  1. The purpose of constructing a confidence interval for the mean is to estimate the true value of the population mean, based upon the mean of a sample.
  2. To calculate a confidence interval for the mean, we start by calculating the sample mean and the sample standard deviation (when the population standard deviation is unknown, which is often the case). We then use a statistical formula to determine a range of values, called the confidence interval, that is likely to contain the true population mean with a certain level of confidence. This range is calculated by adding and subtracting a value, known as the margin of error, to and from the sample mean.
  3. A confidence interval for the mean is a range of values within which we believe the true population mean is likely to fall.  Statisticians often use a 95% or 99% confidence level.  As the level of confidence increases, the precision of your estimate decreases.
  4. There is an inverse relationship between level of confidence and precision of the estimate.  When a researcher wants to be more confident in their findings and increase the level of confidence, they have to widen the range of values that their estimate can fall within.  To increase the precision of an estimate, we can either increase the sample size or lower the level of confidence.
  5. When constructing a confidence interval for a proportion, the margin of error is half the width of the interval: it is the amount added to and subtracted from the sample proportion.
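The trade-off described in points 3 and 4 can be seen numerically. The following is a minimal sketch that assumes the population standard deviation is known; the value `sd = 15` and the helper name `margin_of_error` are illustrative choices, not part of the examples in this post.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sd, n, confidence):
    """Margin of error for a mean when the population SD is treated as known."""
    # Two-sided critical value: (1 - confidence) / 2 goes in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / sqrt(n)


sd = 15  # hypothetical population standard deviation

for confidence in (0.90, 0.95, 0.99):
    for n in (25, 100, 400):
        me = margin_of_error(sd, n, confidence)
        print(f"confidence={confidence:.0%}  n={n:>3}  margin of error={me:.2f}")
```

Reading down the output, the margin of error grows as the confidence level rises and shrinks as the sample size grows.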

Sampling Error

When constructing a confidence interval, it’s important to understand the central limit theorem and the standard error of the mean. The central limit theorem tells us that the means of repeated samples tend to approximate a normal distribution, while the standard error of the mean is a measure of the variability of the sample means around the population mean. These concepts help us calculate the margin of error for our estimate and determine the range of values within which the population mean is likely to lie with a certain level of confidence.
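A small simulation can make both ideas concrete. This sketch assumes a deliberately skewed population (an exponential distribution with rate 1, which has a mean and standard deviation of 1); the sample size, number of repetitions, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

n = 50               # observations per sample
repetitions = 5000   # number of repeated samples
population_sd = 1.0  # SD of an exponential distribution with rate 1

# Draw repeated samples from the skewed population and keep each sample mean.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(repetitions)
]

theoretical_sem = population_sd / sqrt(n)
observed_sem = stdev(sample_means)

print(f"Mean of the sample means:          {mean(sample_means):.3f}")
print(f"Theoretical SEM (sigma / sqrt(n)): {theoretical_sem:.3f}")
print(f"Observed SD of the sample means:   {observed_sem:.3f}")
```

Even though the population is far from normal, the sample means cluster around the population mean, and their spread should land close to the standard error of the mean.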

I won’t go into too much detail here about the central limit theorem or the standard error, as I have this WordPress page dedicated to these concepts.

A Few More Thoughts

It is uncommon to know the standard deviation of a population when its mean is unknown. Standardized tests like IQ tests, personality tests, and college entrance exams are some examples of situations where the population standard deviation is known. In such cases, the z-distribution can be used instead of the t-distribution.

However, there is no consensus among statisticians on whether to use the z or t distribution to calculate confidence intervals. Some statisticians prefer to use the z-distribution even when the population standard deviation is unknown, provided that the sample size is sufficiently large. Different textbooks and courses may therefore use slightly different methods for determining confidence intervals.

Here is a quick review of the symbols used.

  • X̄ (pronounced as “X-bar”) represents the sample mean, which is the average value of a sample.
  • σ (pronounced as “sigma”) represents the standard deviation of the population, which is a measure of the amount of variation or dispersion in a population of data.
  • μ (pronounced as “mu”) represents the population mean, which is the average value of a population.
  • s represents the standard deviation of the sample, which is a measure of the amount of variation or dispersion in a sample of data.
  • n represents the sample size, which is the number of observations in a sample.
  • df represents the degrees of freedom.
  • ME represents the margin of error.
  • SEM (or SE) represents the standard error.  When it is estimated it is sometimes labeled as ESEM.
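To connect the symbols to something concrete, here is a short sketch that computes each sample quantity from raw data. The eight values in `sample` are made up for illustration only.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of eight measurements (illustrative values only).
sample = [6.1, 5.8, 7.2, 6.5, 6.0, 6.9, 5.5, 6.4]

n = len(sample)       # n: sample size
x_bar = mean(sample)  # X-bar: sample mean
s = stdev(sample)     # s: sample standard deviation (divides by n - 1)
df = n - 1            # df: degrees of freedom for a t-based interval
esem = s / sqrt(n)    # ESEM: estimated standard error of the mean

print(f"n = {n}, df = {df}")
print(f"X-bar = {x_bar:.2f}, s = {s:.3f}, ESEM = {esem:.3f}")
```

Note that `statistics.stdev` is the sample standard deviation (s); the population version (σ) is `statistics.pstdev`.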

The Equations

In the equations, Z represents the z statistic, t the t statistic, and P the proportion.
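Written in the symbols listed above, the standard textbook forms of the three intervals, which the worked examples in this post follow, are sketched here. In each case the margin of error is the critical value multiplied by the standard error.

```latex
\begin{align*}
\text{Mean, } \sigma \text{ known:} \quad
  & \bar{X} \pm Z \cdot \frac{\sigma}{\sqrt{n}} \\[4pt]
\text{Mean, } \sigma \text{ unknown } (df = n - 1)\text{:} \quad
  & \bar{X} \pm t \cdot \frac{s}{\sqrt{n}} \\[4pt]
\text{Proportion:} \quad
  & P \pm Z \cdot \sqrt{\frac{P(1 - P)}{n}}
\end{align*}
```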


Confidence Interval for the Mean With σ Known

A sample of 400 students in a school district yielded a mean score of 498 on the verbal component of the SAT. The population mean for this component is known to be 500, with a standard deviation of 100. To estimate the mean score of all students in the district, a 95% confidence interval is calculated using the sample data.

import math
from scipy.stats import norm

# Define the sample data
n = 400
sample_mean = 498
population_sd = 100

# Calculate the standard error of the mean
sem = population_sd / math.sqrt(n)

# Calculate the z critical value for 95% confidence level
z_critical = norm.ppf(0.975)

# Calculate the margin of error
me = z_critical * sem

# Calculate the confidence interval
lower_limit = sample_mean - me
upper_limit = sample_mean + me

print(f"The z critical value is {z_critical:.2f}")
print(f"The 95% confidence interval for the population mean is: ({lower_limit:.2f}, {upper_limit:.2f})")

Reviewing the print statements, we get the following.

The z critical value is 1.96
The 95% confidence interval for the population mean is: (488.20, 507.80)
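What "95% confidence" means can be checked with a simulation. The sketch below assumes a normally distributed population with the mean and standard deviation from the example above (500 and 100), draws many samples of 400, and counts how often the resulting interval contains the population mean. The number of repetitions and the seed are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma, n = 500, 100, 400  # population values assumed for the simulation
repetitions = 2000

z_critical = NormalDist().inv_cdf(0.975)
me = z_critical * sigma / sqrt(n)  # same margin of error for every sample

covered = 0
for _ in range(repetitions):
    sample_mean = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    # Does this sample's interval contain the true population mean?
    if sample_mean - me <= mu <= sample_mean + me:
        covered += 1

coverage = covered / repetitions
print(f"Share of intervals containing the population mean: {coverage:.1%}")
```

The share should come out close to 95%, which is the sense in which the procedure, rather than any single interval, carries the confidence level.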

Confidence Interval for the Mean With σ Unknown

Data are collected concerning the birth weights for a nation-wide sample of 30 Wimberley Terriers. Results indicate that the mean birth weight for the sample of pups equals 6.36 ounces, with a standard deviation of 1.45 ounces. Based on that information, develop a 95% confidence interval to provide an estimate of the mean birth weight for the national population of Wimberley Terriers.

import math
from scipy.stats import t

# Define the sample data
n = 30
sample_mean = 6.36
sample_sd = 1.45

# Calculate the standard error of the mean
sem = sample_sd / math.sqrt(n)

# Calculate the t critical value for 95% confidence level with (n-1) degrees of freedom
df = n - 1
t_critical = t.ppf(0.975, df)

# Calculate the margin of error
me = t_critical * sem

# Calculate the confidence interval
lower_limit = sample_mean - me
upper_limit = sample_mean + me

print(f"The t critical value is {t_critical:.2f}")
print(f"The 95% confidence interval for the population mean birth weight is: ({lower_limit:.2f}, {upper_limit:.2f})")

Reviewing the print statements, we get the following.

The t critical value is 2.05
The 95% confidence interval for the population mean birth weight is: (5.82, 6.90)
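For comparison, here is a sketch of how much the choice between t and z matters for this sample. It reuses the numbers from the example above and hard-codes 2.045 as the t critical value for 29 degrees of freedom (the value printed above as 2.05, to three decimals), so that only the standard library is needed.

```python
from math import sqrt
from statistics import NormalDist

n = 30
sample_mean = 6.36
sample_sd = 1.45
sem = sample_sd / sqrt(n)

t_critical = 2.045                        # t table value for df = 29, 95% confidence
z_critical = NormalDist().inv_cdf(0.975)  # about 1.96

t_me = t_critical * sem
z_me = z_critical * sem

print(f"t-based interval: ({sample_mean - t_me:.2f}, {sample_mean + t_me:.2f})")
print(f"z-based interval: ({sample_mean - z_me:.2f}, {sample_mean + z_me:.2f})")
```

The z-based interval is slightly narrower because the t-distribution has heavier tails at small sample sizes; the gap shrinks as n grows.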

Confidence Intervals for Proportions

Next, we will calculate a confidence interval for a proportion. The calculation parallels that of a confidence interval for the mean.

Of a sample of 200 registered voters, 32% report that they intend to vote in a school board election. Using a 95% confidence interval, estimate the percentage of all registered voters planning to vote.

import math

# Define the sample data
n = 200
p = 0.32

# Calculate the standard error of the proportion
se = math.sqrt((p * (1 - p)) / n)

# Calculate the margin of error for a 95% confidence level
z_critical = 1.96  # two-tailed test
me = z_critical * se

# Calculate the confidence interval
lower_limit = p - me
upper_limit = p + me

print(f"The z_critical value is {z_critical:.2f}")
print(f"The 95% confidence interval for the percentage of all registered voters planning to vote is: ({lower_limit:.2%}, {upper_limit:.2%})")

Reviewing the print statements, we get the following.

The z_critical value is 1.96
The 95% confidence interval for the percentage of all registered voters planning to vote is: (25.53%, 38.47%)
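The same calculation can be wrapped in a small function so that the confidence level is a parameter rather than a hard-coded 1.96. This is a minimal sketch of the normal-approximation interval used above; the function name `proportion_ci` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample proportion."""
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    me = z * sqrt(p * (1 - p) / n)
    return p - me, p + me


for confidence in (0.95, 0.99):
    lower, upper = proportion_ci(0.32, 200, confidence)
    print(f"{confidence:.0%} interval: ({lower:.2%}, {upper:.2%})")
```

Raising the confidence level to 99% widens the interval around the same 32% estimate.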

Conclusion

It’s worth noting that there are different methods for calculating confidence intervals in statistics, and these methods may vary in their notations and terminology. Learning about these differences can help improve your overall understanding of statistics. Keep in mind that statistics can sometimes involve personal preferences or judgment calls rather than strict rules and facts, so don’t worry if you encounter some variability in how different statisticians approach the topic.