For this analysis we are going to simulate a dice roll game.
Given 10,000 trials, perform a statical analysis on the number of dice rolls needed to reach 100 points given the following rules.
- Starting at 0, for each dice roll resulting in 1 through 5, add the result to your score.
- For each dice roll resulting in 6, reroll the dice and subtract the result from your score. Any re-roll that results in a 6, re-roll the dice again. Your score cannot go lower than 0.
For this simulation I am going to use SQL. A language such as Python may be better suited to create this simulation, but using SQL is simpler than you might perceive. The trick is to use a numbers table and create a set of random dice rolls and then use summation windows to determine the cumulative score after each dice roll. The code is a bit lengthy, but SQL is a verbose language not known for its conciseness. I’ve included the script at the end of this analysis.
So now that we have covered the specifics of the rules, let’s move on to the analysis.
After 10,000 trials, we have the following summary statistics.
|Minimum Dice Rolls||28|
|Maximum Dice Rolls||133|
The mean is the average, the median is the exact middle, and the mode is the number of dice rolls that appear the most often. Because the mean and median are close, you might expect the data to fit a normal distribution. However, given the mode is slightly lower at 53 we can expect the data to be skewed.
We can view this skewness with the following histogram.
Reviewing a box plot of the dice rolls, we can see that any result over 97 becomes an outlier. Approximately 2% of the dice rolls (197 of the 10,000 simulations) result in an outlier.
Another good graph is a cumulative distribution. From this graph we can see that 50% of the time the number of rolls to reach 100 points will be under 60 dice rolls or over 60 dice rolls.
We can also spin our cumulative distribution graph to show a histogram by percentage. This is the same graph as above, except that after 50% we pivot the slope and create a downslope. We can ascertain the same information from both of these graphs, however, a good example where the below graph may be easier to read is that you can determine there is a 10% chance of reaching 100 points after 45 dice rolls or after 80 or more dice rolls.
Rolling the dice anywhere near 100 times to reach 100 points seems like an extra ordinary number of times. This would need to be a simulation where we roll a large number of 6s combined with the average dice roll where we add to our score is less than 2.5.
Looking at two of the largest outliers in our data, we can use a scatter diagram to review each dice roll and how it contributed to the overall score.
Looking at the diagram, we can see some huge swings of 10 to 20 points that result in an unusual number of rolls to reach 100 points.
Performing these types of simulations is a fun way to learn both coding techniques and statistical analysis. I encourage everyone to try this same approach and create a fun game where you can guess the results and then test via a simulation. When performing data analysis, histograms, box plots, cumulative distributions, and scatter plots are a great resource for creating a visual understanding of your data.
Here is my code written in SQL Server’s T-SQL.
CREATE SCHEMA dicerolls GO DROP TABLE IF EXISTS dicerolls.Numbers; DROP TABLE IF EXISTS dicerolls.DiceRolls; DROP TABLE IF EXISTS dicerolls.DiceRollResults; DROP TABLE IF EXISTS dicerolls.DiceRollsHistory; GO CREATE TABLE dicerolls.Numbers ( MyValue INTEGER NOT NULL, StepNumber INTEGER IDENTITY(1,1) NOT NULL ); GO INSERT INTO dicerolls.Numbers (MyValue) SELECT 1 GO 300 CREATE TABLE dicerolls.DiceRolls ( StepNumber INTEGER NOT NULL, DiceRoll INTEGER NOT NULL, DiceResult INTEGER ); GO CREATE TABLE dicerolls.DiceRollResults ( Iteration INTEGER IDENTITY(1,1) NOT NULL, StepNumber INTEGER ); GO CREATE TABLE dicerolls.DiceRollsHistory ( Iteration INTEGER NOT NULL, StepNumber INTEGER NOT NULL, DiceRoll INTEGER NOT NULL, DiceResult INTEGER ); GO DECLARE @Counter INTEGER = 1; WHILE @Counter <= 10 BEGIN DROP TABLE IF EXISTS dicerolls.DiceRolls_Sixes; TRUNCATE TABLE dicerolls.DiceRolls; INSERT INTO dicerolls.DiceRolls SELECT StepNumber, ABS(CHECKSUM(NEWID()) % 6) + 1 AS DiceRoll, NULL FROM dicerolls.Numbers; --Determines dice rolls that result in 6 and the next dice roll (LEAD) SELECT StepNumber AS StepNumber, DiceRoll AS DiceRoll, '<----->' AS Div, LEAD(StepNumber,1) OVER (ORDER BY StepNumber) AS StepNumber_Lead, LEAD(DiceRoll,1) OVER (ORDER BY StepNumber) AS DiceRoll_Lead INTO dicerolls.DiceRolls_Sixes FROM dicerolls.DiceRolls WHERE DiceRoll = 6; --Determines the dice roll to subtract from the score --This uses a semi-join with an anti-join subquery. UPDATE dicerolls.DiceRolls SET DiceRoll = DiceRoll * -1 WHERE StepNumber IN (SELECT StepNumber_Lead FROM dicerolls.DiceRolls_Sixes WHERE StepNumber_Lead NOT IN (SELECT StepNumber FROM dicerolls.DiceRolls_Sixes)); --Sum the DiceResults up WITH cte_Sum AS ( SELECT StepNumber, DiceRoll, SUM(DiceRoll) OVER (ORDER BY StepNumber) AS DiceResult FROM dicerolls.DiceRolls WHERE DiceRoll <> 6 ) UPDATE dicerolls.DiceRolls SET DiceResult = a.DiceResult FROM cte_Sum a INNER JOIN dicerolls.DiceRolls b ON a.StepNumber = b.StepNumber; --The cumulative sum of the dice cannot go below 0 --I setup a loop to ensure that I properly handle this situation. WHILE 0 > (SELECT MIN(DiceResult) FROM dicerolls.DiceRolls WHERE DiceResult < 0) BEGIN WITH cte_Sum AS ( SELECT StepNumber, DiceRoll, SUM(DiceRoll) OVER (ORDER BY StepNumber) AS DiceResult FROM dicerolls.DiceRolls WHERE DiceRoll <> 6 AND StepNumber > (SELECT MIN(StepNumber) FROM dicerolls.DiceRolls WHERE DiceResult < 0) ) UPDATE dicerolls.DiceRolls SET DiceResult = a.DiceResult FROM cte_Sum a INNER JOIN dicerolls.DiceRolls b ON a.StepNumber = b.StepNumber; UPDATE dicerolls.DiceRolls SET DiceResult = 0 WHERE StepNumber = (SELECT MIN(StepNumber) FROM dicerolls.DiceRolls WHERE DiceResult < 0); END--End Loop INSERT INTO dicerolls.DiceRollResults (StepNumber) SELECT MIN(StepNumber) FROM dicerolls.DiceRolls WHERE DiceResult > 100; INSERT INTO dicerolls.DiceRollsHistory (Iteration, StepNumber,DiceRoll,DiceResult) SELECT @Counter, StepNumber, DiceRoll, DiceResult FROM dicerolls.DiceRolls ORDER BY 1; SET @Counter = @Counter + 1; END--End Loop SELECT * FROM dicerolls.DiceRollResults ORDER BY 2 DESC;