Hypothesis Testing

On this WordPress page, we are going to work through some problems from a Statistics 101 textbook.

I usually prefer to explain the rationale and mechanics behind each problem. Here, though, we’re going to skip the theory and go straight to solving the textbook problems in Python.

Note that I compute everything by hand rather than calling the scipy.stats library. Because of that, you need to know how to look up a critical value in the relevant table, given the significance level and the degrees of freedom. A good statistics book will have an appendix with the appropriate tables for reference.
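
If you would rather keep the table lookups in one place, a small dictionary is enough. This is only a sketch: `T_CRITICAL_05_TWO_TAILED` and `lookup_t_critical` are hypothetical names chosen for illustration, and the table holds just the two-tailed t values at the .05 level for the degrees of freedom that come up on this page (14, 24, and 29), copied from a standard t table.

```python
# Two-tailed critical t values at alpha = .05, copied from a standard t table.
# Only the degrees of freedom needed for the problems on this page are listed.
T_CRITICAL_05_TWO_TAILED = {
    14: 2.145,
    24: 2.064,
    29: 2.045,
}


def lookup_t_critical(df):
    """Return the tabled two-tailed critical t value at alpha = .05 (hypothetical helper)."""
    if df not in T_CRITICAL_05_TWO_TAILED:
        raise KeyError(f"df = {df} is not in this small table; check the textbook appendix.")
    return T_CRITICAL_05_TWO_TAILED[df]


print("Critical t for df = 29:", lookup_t_critical(29))
```

Two of the problems below use these values rounded to two decimals (2.06 and 2.15); the conclusions are the same either way.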


We will be performing the following hypothesis tests:

1. Hypothesis Test Based on a Single Sample With 𝛔 Known
2. Hypothesis Test Based on a Single Sample With 𝛔 Unknown
3. Hypothesis Testing With Two Independent Samples
4. Hypothesis Testing With Two Matched Samples
5. Hypothesis Testing Using ANOVA
6. Chi-Square Test of Independence


Here is a quick review of some common symbols used in statistics.

  • X̄ (pronounced as “X-bar”) represents the sample mean, which is the average value of a sample.
  • σ (pronounced as “sigma”) represents the standard deviation of the population, which is a measure of the amount of variation or dispersion in a population of data.
  • μ (pronounced as “mu”) represents the population mean, which is the average value of a population.
  • s represents the standard deviation of the sample, which is a measure of the amount of variation or dispersion in a sample of data.
  • n represents the sample size, which is the number of observations in a sample.
  • df represents the degrees of freedom.
  • α (pronounced as “alpha”) represents the significance level in hypothesis testing. It is the probability of rejecting the null hypothesis when it is in fact true, a mistake known as a Type I error.
  • ME represents the margin of error.
  • SEM (or SE) represents the standard error of the mean. When it is estimated from the sample standard deviation, it is sometimes labeled ESEM.
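
To make the symbols concrete, here is a minimal sketch that computes X̄, s, df, and the estimated standard error by hand. The `sample` list is made-up data used only for illustration; it does not come from the textbook.

```python
import math

# Hypothetical sample, used only to illustrate the symbols above
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)              # sample size
x_bar = sum(sample) / n      # sample mean (X-bar)
df = n - 1                   # degrees of freedom for a single sample

# Sample standard deviation s: sum of squared deviations divided by n - 1
sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)
s = math.sqrt(sum_sq_dev / df)

# Estimated standard error of the mean (ESEM), because it uses s rather than sigma
esem = s / math.sqrt(n)

print("n =", n, "| X-bar =", x_bar, "| df =", df)
print("s =", s)
print("ESEM =", esem)
```

When σ is known, the same last step uses σ in place of s, which gives the SEM rather than the ESEM.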

On to the statistics problems…


Hypothesis Test Based on a Single Sample With 𝛔 Known

The police department of a major city reports that the mean number of auto thefts per neighborhood per year is 6.88 with a standard deviation of 1.19. As the mayor of a suburban community just outside the major city, you’re curious as to how the auto theft rate in your community compares. You determine that the mean number of auto thefts per neighborhood per year for a random sample of 15 neighborhoods in your community is 8.13. Assume that you’re working at the .05 level of significance.

a. State an appropriate null hypothesis.
b. What is the value of the calculated test statistic?
c. State your conclusion.

The null hypothesis is H0: μ = 6.88
The calculated z statistic is: 4.068259817444769
Reject the null hypothesis at the 0.05 significance level.

import math

# population
population_mean = 6.88
population_stdev = 1.19

# sample
sample_mean = 8.13
n = 15

# Significance level and critical value
significance_level = .05
critical_value = 1.96

# Calculate the difference between the sample mean and the population mean
diff_means = sample_mean - population_mean
# print("The difference of means is:", diff_means)

# Calculate the standard error of the mean
sem = population_stdev / math.sqrt(n)
# print("The standard error of the mean is:", sem)

# Calculate the z statistic
z_statistic = diff_means / sem
# print("The z statistic is:", z_statistic)

# a. State an appropriate null hypothesis.  
print("The null hypothesis is H0: μ =",population_mean)

# b. What is the value of the calculated test statistic (Z)?  
print("The calculated z statistic is:",z_statistic)

# c. State your conclusion.
if abs(z_statistic) < critical_value:
    print("Fail to reject the null hypothesis at the",significance_level, "significance level.")
else:
    print("Reject the null hypothesis at the",significance_level, "significance level.")
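
As a cross-check on the table-based decision, the two-tailed p-value for a z statistic can be computed with nothing but the standard library, using `math.erfc`. This is an addition to the textbook problem, not part of it, and `two_tailed_p_from_z` is a hypothetical helper name.

```python
import math


def two_tailed_p_from_z(z):
    """Two-tailed p-value for a standard normal z statistic.

    Uses the identity P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


# Values from the auto-theft problem above
z_statistic = (8.13 - 6.88) / (1.19 / math.sqrt(15))
p_value = two_tailed_p_from_z(z_statistic)

print("z statistic:", z_statistic)
print("Two-tailed p-value:", p_value)
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

Comparing the p-value with α leads to the same decision as comparing |z| with the critical value of 1.96.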

Hypothesis Test Based on a Single Sample With 𝛔 Unknown

The mean absenteeism rate for the local school district is reported as 8.45 days per year, per student. The mean rate for a sample of 30 students enrolled in a vocational training program is reported as 6.79 days per year with a standard deviation of 2.56 days. Assume that you’re working at the .05 level of significance.

a. State an appropriate null hypothesis.
b. What is the value of the calculated test statistic?
c. Identify the critical value.
d. State your conclusion.

The null hypothesis is H0: μ = 8.45
The calculated t statistic is: -3.55163845882256
The critical value is: 2.045
Reject the null hypothesis at the 0.05 significance level.

import math

# Population
population_mean = 8.45

# Sample
sample_mean = 6.79
n = 30
sample_sd = 2.56

# Level of significance and critical value
critical_value = 2.045 
significance_level = .05
 
# Calculate the difference of means
diff_means = sample_mean - population_mean

# Calculate the estimated standard error of the mean
esem = sample_sd / math.sqrt(n)

# Calculate the t statistic
t_statistic = diff_means / esem

# a. State an appropriate null hypothesis.
print("The null hypothesis is H0: μ =",population_mean)

# b. What is the value of the calculated test statistic (t)?
print("The calculated t statistic is:", t_statistic)

# c. Identify the critical value.
print("The critical value is:", critical_value)

# d. State your conclusion.
if abs(t_statistic) >= critical_value:
    print("Reject the null hypothesis at the",significance_level, "significance level.")
else:
    print("Fail to reject the null hypothesis at the",significance_level, "significance level.")
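
The same arithmetic can be wrapped in a small function that also returns the degrees of freedom, which is the number you need for the table lookup. A minimal sketch, assuming a hypothetical helper named `one_sample_t`:

```python
import math


def one_sample_t(sample_mean, population_mean, sample_sd, n):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    esem = sample_sd / math.sqrt(n)  # estimated standard error of the mean
    t = (sample_mean - population_mean) / esem
    return t, n - 1


# Values from the absenteeism problem above
t_statistic, df = one_sample_t(6.79, 8.45, 2.56, 30)
print("t statistic:", t_statistic)
print("Degrees of freedom:", df)
```

With n = 30 the degrees of freedom are 29, which is the row of the t table that gives the two-tailed critical value of 2.045 at the .05 level.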

Hypothesis Testing With Two Independent Samples

Consider a research situation investigating the potential statistical difference in drinking habits between fraternity members and non-fraternity members. Assume you are working with a .05 significance level.

Fraternity members’ drinks per week: 6, 3, 2, 4, 5, 6, 7, 5, 4, 5, 4, 8, 6, 7
Non-fraternity members’ drinks per week: 0, 5, 3, 4, 3, 6, 3, 6, 5, 4, 4, 2

a. Formulate an appropriate null hypothesis.
b. Calculate the t statistic.
c. Identify the critical value.
d. State your conclusion.

The null hypothesis is H0: μ1 = μ2
The calculated t statistic is: 2.1039711719961014
The critical value is: 2.06
Reject the null hypothesis at the 0.05 significance level.

import numpy as np
import math

# Drinks per week for fraternity and non fraternity
fraternity = np.array([6,3,2,4,5,6,7,5,4,5,4,8,6,7])
non_fraternity = np.array([0,5,3,4,3,6,3,6,5,4,4,2])

# Sample size and degrees of freedom
fraternity_n = len(fraternity) # 14
non_fraternity_n = len(non_fraternity)
fraternity_df = fraternity_n - 1
non_fraternity_df = non_fraternity_n - 1
total_df = fraternity_df + non_fraternity_df

# Significance level and critical value (two-tailed t, df = 24)
significance_level = .05
critical_value = 2.06

# Calculate the mean
fraternity_mean = np.mean(fraternity)
non_fraternity_mean = np.mean(non_fraternity)

# Calculate the standard deviation and the variance
fraternity_std = np.std(fraternity,ddof=1)
non_fraternity_std = np.std(non_fraternity,ddof=1)
fraternity_variance = np.var(fraternity,ddof=1)
non_fraternity_variance = np.var(non_fraternity,ddof=1)

# Pieces of the pooled-variance formula: weighted sum of the variances (p1),
# pooled degrees of freedom (p2), and 1/n1 + 1/n2 (p3)
standard_error_diff_means_p1 = (fraternity_df * fraternity_variance) + (non_fraternity_df * non_fraternity_variance)
standard_error_diff_means_p2 = (fraternity_df + non_fraternity_df)
standard_error_diff_means_p3 = ((1/fraternity_n) + (1/non_fraternity_n))

# Calculate the standard error difference between means
standard_error_diff_means = math.sqrt((standard_error_diff_means_p1/standard_error_diff_means_p2) * standard_error_diff_means_p3)

# Calculate the t statistic
t_statistic = (fraternity_mean - non_fraternity_mean) / standard_error_diff_means

# a. Formulate an appropriate null hypothesis
print("The null hypothesis is H0: μ1 = μ2")

# b. Calculate the t statistic
print("The calculated t statistic is:", t_statistic)

# c. Identify the critical value
print("The critical value is:", critical_value)

# d. State your conclusion
if abs(t_statistic) > critical_value:
    print("Reject the null hypothesis at the",significance_level, "significance level.")
else:
    print("Fail to reject the null hypothesis at the",significance_level, "significance level.")
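
For comparison, the pooled-variance calculation can also be written without numpy, using the standard library's `statistics` module. This is a sketch under the same equal-variance assumption as the code above; `pooled_t` is a hypothetical helper name.

```python
import math
import statistics


def pooled_t(sample_1, sample_2):
    """Return (t statistic, degrees of freedom) using the pooled-variance formula."""
    n1, n2 = len(sample_1), len(sample_2)
    var1 = statistics.variance(sample_1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(sample_2)
    df = (n1 - 1) + (n2 - 1)
    pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / se_diff
    return t, df


fraternity = [6, 3, 2, 4, 5, 6, 7, 5, 4, 5, 4, 8, 6, 7]
non_fraternity = [0, 5, 3, 4, 3, 6, 3, 6, 5, 4, 4, 2]

t_statistic, df = pooled_t(fraternity, non_fraternity)
print("t statistic:", t_statistic)
print("Degrees of freedom:", df)
```

The degrees of freedom come out to 24, which is the row used to look up the critical value.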

Hypothesis Testing With Two Matched Samples

Consider the set of scores which reflect the performance of a drug awareness test administered to a sample of 15 participants in a before/after test situation.

First, each of 15 participants was administered a drug awareness test, and their scores were recorded. The participants were then shown a film concerning the dangers of recreational drug use. Following exposure to the film, the 15 participants were given the drug awareness test again, and these scores were recorded.

Before: 50, 77, 67, 94, 64, 77, 85, 52, 81, 91, 52, 61, 83, 66, 71
After: 55, 79, 82, 90, 64, 83, 80, 55, 79, 91, 61, 77, 83, 70, 75

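Remember, these are matched samples, so each position in the two lists corresponds to the same participant. Assume that you’re working at the .05 level of significance.

a. State an appropriate null hypothesis.
b. What is the value of the calculated test statistic?
c. Identify the critical value.
d. State your conclusion.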

The null hypothesis is H0: μ1 = μ2
The calculated t statistic is: 2.2331171200871616
The critical value is: 2.15
Reject the null hypothesis at the 0.05 significance level.

import numpy as np
import math

# Define the two datasets
data_before = [50, 77, 67, 94, 64, 77, 85, 52, 81, 91, 52, 61, 83, 66, 71]
data_after = [55, 79, 82, 90, 64, 83, 80, 55, 79, 91, 61, 77, 83, 70, 75]
n = 15

# Significance level and critical value
significance_level = 0.05
critical_value = 2.15

# Calculate the differences between before and after scores
data_diff = np.array(data_before) - np.array(data_after)

# Calculate the mean of the differences (abs() drops the sign, so t is reported as a positive value)
mean_of_differences = abs(np.mean(data_diff))

# Calculate the standard deviation of the differences
standard_deviation_of_differences = np.sqrt(np.sum((data_diff - np.mean(data_diff))**2) / (len(data_diff) - 1))

# Calculate the estimated standard error of mean differences
estimated_standard_error_of_mean_differences = standard_deviation_of_differences / math.sqrt(n)

# Calculate the t statistic
t_statistic = mean_of_differences / estimated_standard_error_of_mean_differences

# a. State an appropriate null hypothesis.
print("The null hypothesis is H0: μ1 = μ2")

# b. What is the value of the calculated test statistic (t)?
print("The calculated t statistic is:", t_statistic)

# c. Identify the critical value.
print("The critical value is:", critical_value)

# d. State your conclusion.
if abs(t_statistic) >= critical_value:
    print("Reject the null hypothesis at the", significance_level, "significance level.")
else:
    print("Fail to reject the null hypothesis at the", significance_level, "significance level.")
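
The code above takes the absolute value of the mean difference, so the direction of the change is not visible in the result. The sketch below is a variation rather than the textbook's method: it subtracts before from after so that a positive t means scores went up after the film, and it uses the standard library's `statistics` module instead of numpy.

```python
import math
import statistics

before = [50, 77, 67, 94, 64, 77, 85, 52, 81, 91, 52, 61, 83, 66, 71]
after = [55, 79, 82, 90, 64, 83, 80, 55, 79, 91, 61, 77, 83, 70, 75]

# After minus before, so a positive difference means the score went up
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
df = n - 1
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample standard deviation (n - 1 denominator)

# Estimated standard error of the mean difference, then the t statistic
se_mean_diff = sd_diff / math.sqrt(n)
t_statistic = mean_diff / se_mean_diff

print("Mean difference (after - before):", mean_diff)
print("t statistic:", t_statistic)
print("Degrees of freedom:", df)
```

The magnitude of t matches the value reported above; only the sign convention differs.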

Hypothesis Testing Using ANOVA

An evaluation survey, designed to measure perceived program effectiveness, was administered to a sample of 37 citizens who attended a community crime-prevention meeting. The respondents were asked to rate (on a scale of 1 to 12) the meeting in terms of effectiveness in presenting useful information. The responses were analyzed based upon the place of residence of the respondent (northern, southern, eastern, or western sector), and the following results were found.

Northern: 3.8, 7.1, 9.6, 8.4, 5.1, 11.6, 6.2, 7.9, 9.0, 10.3
Southern: 4.2, 6.5, 4.4, 8.1, 7.6, 5.8, 4.0, 7.3, 5.2, 4.8
Eastern: 8.8, 5.1, 12.7, 6.4, 9.8, 6.3, 10.2, 8.5, 11.9, 8.6
Western: 4.8, 1.2, 8.0, 9.4, 3.6, 8.7, 6.5

a. State an appropriate null hypothesis.
b. What are the values of each category mean?
c. What is the value of the grand mean?
d. What is the value of the between-groups sum of squares?
e. What is the value of the within-groups sum of squares?
f. What is the value of the between-groups degrees of freedom?
g. What is the value of the within-groups degrees of freedom?
h. What is the value of the within-groups mean of squares?
i. What is the value of the between-groups mean of squares?
j. What is the value of F?
k. Assuming you were working at the .05 level of significance, what would you conclude?


The null hypothesis is H0: μ1 = μ2 = μ3 = μ4 
Mean for Northern sector: 7.9
Mean for Southern sector: 5.79
Mean for Eastern sector: 8.83
Mean for Western sector: 6.028571428571429
Grand Mean: 7.2270270270270265
Between-Groups Sum of Squares: 60.92868725868725
Within-Groups Sum of Squares: 179.3042857142857
Between-Groups Degrees of Freedom: 3
Within-Groups Degrees of Freedom: 33
Within-Groups Mean of Squares: 5.433463203463203
Between-Groups Mean of Squares: 20.309562419562415
F value: 3.7378669292574616
Critical value: 2.92
Reject the null hypothesis at the 0.05 significance level.

import numpy as np

# Define the data for each sector
northern = np.array([3.8,7.1,9.6,8.4,5.1,11.6,6.2,7.9,9.0,10.3])
southern = np.array([4.2,6.5,4.4,8.1,7.6,5.8,4.0,7.3,5.2,4.8])
eastern  = np.array([8.8,5.1,12.7,6.4,9.8,6.3,10.2,8.5,11.9,8.6])
western  = np.array([4.8,1.2,8.0,9.4,3.6,8.7,6.5])

# Level of significance and critical value
significance_level = .05
critical_value = 2.92  # tabled F for (3, 30) df, the nearest table row below the 33 within-groups df

# Calculate the mean for each sector
mean_northern = np.mean(northern)
mean_southern = np.mean(southern)
mean_eastern = np.mean(eastern)
mean_western = np.mean(western)

# Calculate the grand mean
grand_mean = np.mean(np.concatenate((northern, southern, eastern, western)))

# Calculate the between-groups sum of squares
ss_between = (len(northern) * (mean_northern - grand_mean)**2 +
              len(southern) * (mean_southern - grand_mean)**2 +
              len(eastern) * (mean_eastern - grand_mean)**2 +
              len(western) * (mean_western - grand_mean)**2)

# Calculate the within-groups sum of squares
ss_within = np.sum((northern - mean_northern)**2) + \
            np.sum((southern - mean_southern)**2) + \
            np.sum((eastern - mean_eastern)**2) + \
            np.sum((western - mean_western)**2)

# Calculate the between-groups degrees of freedom
df_between = 4 - 1

# Calculate the within-groups degrees of freedom
df_within = len(northern) + len(southern) + len(eastern) + len(western) - 4

# Calculate the within-groups mean of squares
ms_within = ss_within / df_within

# Calculate the between-groups mean of squares
ms_between = ss_between / df_between

# Calculate the F statistic
f_statistic = ms_between / ms_within

# a. State the null hypothesis
print("The null hypothesis is H0: μ1 = μ2 = μ3 = μ4")

# b. Print the mean for each sector
print("Mean for Northern sector:", mean_northern)
print("Mean for Southern sector:", mean_southern)
print("Mean for Eastern sector:", mean_eastern)
print("Mean for Western sector:", mean_western)

# c. Print the grand mean
print("Grand Mean:", grand_mean)

# d. Print the between-groups sum of squares
print("Between-Groups Sum of Squares:", ss_between)

# e. Print the within-groups sum of squares
print("Within-Groups Sum of Squares:", ss_within)

# f. Print the between-groups degrees of freedom
print("Between-Groups Degrees of Freedom:", df_between)

# g. Print the within-groups degrees of freedom
print("Within-Groups Degrees of Freedom:", df_within)

# h. Print the within-groups mean of squares
print("Within-Groups Mean of Squares:", ms_within)

# i. Print the between-groups mean of squares
print("Between-Groups Mean of Squares:", ms_between)

# j. Print the F statistic and the critical value
print("F value:", f_statistic)
print("Critical value:", critical_value)

# k. Compare the F statistic with the critical value
if f_statistic > critical_value:
    print("Reject the null hypothesis at the", significance_level, "significance level.")
else:
    print("Fail to reject the null hypothesis at the", significance_level, "significance level.")
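
A useful sanity check on a hand-computed ANOVA is that the total sum of squares equals the between-groups plus the within-groups sum of squares. The sketch below verifies that identity for the same data in plain Python; the `groups` dictionary and `group_mean` helper are hypothetical names used for illustration.

```python
groups = {
    "Northern": [3.8, 7.1, 9.6, 8.4, 5.1, 11.6, 6.2, 7.9, 9.0, 10.3],
    "Southern": [4.2, 6.5, 4.4, 8.1, 7.6, 5.8, 4.0, 7.3, 5.2, 4.8],
    "Eastern": [8.8, 5.1, 12.7, 6.4, 9.8, 6.3, 10.2, 8.5, 11.9, 8.6],
    "Western": [4.8, 1.2, 8.0, 9.4, 3.6, 8.7, 6.5],
}


def group_mean(scores):
    """Arithmetic mean of one group's scores."""
    return sum(scores) / len(scores)


all_scores = [x for scores in groups.values() for x in scores]
grand_mean = group_mean(all_scores)

# Total: every score against the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Between: each group mean against the grand mean, weighted by group size
ss_between = sum(len(s) * (group_mean(s) - grand_mean) ** 2 for s in groups.values())

# Within: every score against its own group mean
ss_within = sum(sum((x - group_mean(s)) ** 2 for x in s) for s in groups.values())

print("SS total:", ss_total)
print("SS between + SS within:", ss_between + ss_within)
```

If the two printed values disagree by more than floating-point rounding, one of the sums of squares was computed incorrectly.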

Chi-Square Test of Independence

You are interested in whether there is any association between gender and academic major. Questioning 75 students, you obtain the following results:

          Business   Science   Liberal Arts   Other   Total
Female       10          9          9            7      35
Male         12         11         10            7      40
Total        22         20         19           14      75

a. How many degrees of freedom are involved?
b. What is the calculated value of Chi-Square?
c. Assuming the .05 level of significance, what would you conclude?

Degrees of Freedom: 3
Chi-Square: 0.10156784005468232
The null hypothesis is not rejected. There is no significant association between gender and major.

import numpy as np

# Observed data
observed = np.array([[10, 9, 9, 7],
                     [12, 11, 10, 7]])

# Calculate row, column and total sums
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()

# Expected data
expected = np.outer(row_totals, col_totals) / grand_total

# a. Degrees of freedom
df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(f'Degrees of Freedom: {df}')

# b. Chi-square statistic
chi_square = ((observed - expected)**2 / expected).sum()
print(f'Chi-Square: {chi_square}')

# c. Conclusion
critical_value = 7.815  # tabled chi-square value for df = 3 at the .05 level
if chi_square > critical_value:
    print('The null hypothesis is rejected. There is a significant association between gender and major.')
else:
    print('The null hypothesis is not rejected. There is no significant association between gender and major.')
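
The `np.outer` call hides the step that is usually done by hand: each expected frequency is (row total × column total) / grand total. Here is a minimal plain-Python sketch of the same calculation that keeps the expected table visible; the variable names are chosen for illustration.

```python
observed = [
    [10, 9, 9, 7],    # Female: Business, Science, Liberal Arts, Other
    [12, 11, 10, 7],  # Male
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square: sum of (observed - expected)^2 / expected over every cell
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

for label, exp_row in zip(["Female", "Male"], expected):
    print(label, [round(e, 2) for e in exp_row])
print("Chi-Square:", chi_square)
```

For example, the expected count for Female and Business is 35 × 22 / 75, or about 10.27, against an observed count of 10.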