Welcome to the data analytics portion of my WordPress site.
MLB: Correlating Runs Scored
Correlating runs scored with AVG, SLG, OBP, and OPS. Which metric correlates best with runs scored?
Iris Dataset
Here is my foray into the famous iris dataset. Beyond the usual EDA, I apply a number of supervised and unsupervised machine learning models to the data. Because the dataset is used so widely in ML training, it seemed worth taking the time to learn it in full.
MLB No-Hitters and the Exponential Distribution
A review of the exponential distribution of major league no-hitters.
Rainfall in Austin, Texas
Analysis of the average rainfall in my home city of Austin, Texas.
Literacy vs Fertility
Analysis of literacy vs fertility throughout the world using linear regression and bootstrap sampling.
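As a rough illustration of the approach (not the code from the post itself), the sketch below fits a least-squares line and then uses a pairs bootstrap to estimate a 95% confidence interval for the slope. The data here is synthetic placeholder data generated on the spot, not real literacy or fertility figures, and the names `fit_line`, `n_boot`, and the chosen coefficients are my own assumptions.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired data."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(1)

# Synthetic placeholder data, NOT real literacy or fertility figures.
x = [rng.uniform(0, 100) for _ in range(150)]
y = [7.0 - 0.05 * xi + rng.gauss(0, 0.8) for xi in x]

slope, intercept = fit_line(x, y)

# Pairs bootstrap: resample (x, y) pairs with replacement and refit each time.
n_boot = 1000
indices = range(len(x))
boot = []
for _ in range(n_boot):
    picked = rng.choices(indices, k=len(x))
    boot_slope, _intercept = fit_line([x[i] for i in picked],
                                      [y[i] for i in picked])
    boot.append(boot_slope)

# Percentile interval: 2.5th and 97.5th percentiles of the bootstrap slopes.
boot.sort()
ci_low = boot[round(0.025 * n_boot)]
ci_high = boot[round(0.975 * n_boot) - 1]

print(f"slope = {slope:.4f}, 95% bootstrap CI = ({ci_low:.4f}, {ci_high:.4f})")
```

Resampling whole pairs keeps each x with its own y, so the interval reflects the variability of the fitted slope without assuming normally distributed residuals.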
Canelo Álvarez vs. Gennady Golovkin II
Who really won the Álvarez vs Golovkin II match? Here I perform an analysis of the CompuBox stats.
Dice Roll Game
A fun dice roll game where I run 10,000 simulations and perform a statistical analysis of my results.
High Low Card Game
A fun card game where I run 1 million simulations and perform a statistical analysis of my results.
Greatest NY Yankee
For this analysis, I look at the career totals of four of the greatest Yankees to play the game.
Simulating a Basketball Game
In this study, we simulate two distinct scenarios involving basketball teams: one team exclusively shoots three-pointers, while the other focuses solely on two-pointers. We then visualize the data in a graph and evaluate which team exhibits a higher win percentage. Finally, we conduct a two-sample t-test to ascertain whether the average win percentages of the two teams are statistically different from one another.
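A minimal sketch of how such a simulation could be set up, using only the Python standard library. The shooting percentages (35% from three, 50% from two), the 100 possessions per game, and the batch sizes are assumptions chosen for illustration, not values taken from the study, and the helper names are hypothetical. The t statistic is computed by hand with Welch's formula; in practice `scipy.stats.ttest_ind` would also return a p-value.

```python
import random
import statistics
from math import sqrt

def simulate_game(rng, possessions=100, p3=0.35, p2=0.50):
    """Return (three_team_points, two_team_points) for one simulated game."""
    threes = sum(3 for _ in range(possessions) if rng.random() < p3)
    twos = sum(2 for _ in range(possessions) if rng.random() < p2)
    return threes, twos

def win_percentages(rng, games=200):
    """Win fraction of each team over a batch of games; ties count for neither."""
    three_wins = two_wins = 0
    for _ in range(games):
        threes, twos = simulate_game(rng)
        if threes > twos:
            three_wins += 1
        elif twos > threes:
            two_wins += 1
    return three_wins / games, two_wins / games

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / sqrt(
        var_a / len(a) + var_b / len(b)
    )

rng = random.Random(42)
batches = [win_percentages(rng) for _ in range(50)]
three_pct = [b[0] for b in batches]
two_pct = [b[1] for b in batches]

t_stat = welch_t(three_pct, two_pct)
print(f"three-point team mean win pct: {statistics.mean(three_pct):.3f}")
print(f"two-point team mean win pct:   {statistics.mean(two_pct):.3f}")
print(f"Welch t statistic: {t_stat:.2f}")
```

One caveat with this setup: both win percentages in a batch come from the same games, so the two samples are not independent. Simulating each team against a common third opponent would be one way to make a two-sample test fit more cleanly.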
This section comprises my writings on inferential statistics, focusing on the fundamentals and essential ideas that I have found critical for understanding the field.
Standard Normal Curve
In this section, I provide a brief Python analysis of the Standard Normal Curve. While the analysis may not be overly complex, it sheds light on a few aspects that might not be immediately evident to all readers.
Central Limit Theorem
In this section, I cover the Central Limit Theorem and why it matters. Understanding the CLT together with the standard error of the mean goes a long way toward understanding inferential statistics as a whole.
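To make the link between the CLT and the standard error concrete, here is a small sketch using only the standard library. It repeatedly draws samples from a strongly skewed exponential population (mean 1, standard deviation 1) and checks that the spread of the sample means comes out close to sigma / sqrt(n). The population, the sample size of 50, and the number of repetitions are assumptions picked for illustration.

```python
import random
import statistics
from math import sqrt

rng = random.Random(7)
n = 50              # observations per sample
num_samples = 5000  # number of repeated samples

# Each entry is the mean of one sample of size n drawn from Exp(1).
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

# Exp(1) has sigma = 1, so the standard error of the mean is 1 / sqrt(n).
theoretical_se = 1.0 / sqrt(n)
observed_se = statistics.stdev(sample_means)

print(f"mean of sample means: {statistics.fmean(sample_means):.3f}")
print(f"observed SE:    {observed_se:.4f}")
print(f"theoretical SE: {theoretical_se:.4f}")
```

Even though the underlying population is far from normal, the sample means cluster around the population mean, and a histogram of `sample_means` would look roughly bell-shaped.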
Confidence Intervals
In this section, I calculate confidence intervals across a variety of scenarios to demonstrate how they are applied in practice.
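As one example of such a calculation, the sketch below computes a large-sample (z-based) confidence interval for a mean using the standard library. The sample is synthetic, drawn from a normal distribution with mean 100 and standard deviation 15 purely for illustration, and `mean_confidence_interval` is a hypothetical helper name. For small samples a t-based interval would be the more appropriate choice.

```python
import random
import statistics
from math import sqrt

def mean_confidence_interval(data, confidence=0.95):
    """Large-sample z interval for the population mean."""
    n = len(data)
    mean = statistics.fmean(data)
    standard_error = statistics.stdev(data) / sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

rng = random.Random(0)
# Synthetic sample for illustration only.
sample = [rng.gauss(100, 15) for _ in range(200)]

low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising `confidence` widens the interval, while a larger sample narrows it, because the standard error shrinks with the square root of the sample size.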
Hypothesis Testing
In this section, I work through hypothesis testing in Python, covering single samples, unmatched samples, matched samples, ANOVA (Analysis of Variance), and chi-square tests.