For this analysis, we are going to simulate a dice roll game.

Given 10,000 trials, perform a statistical analysis on the number of dice rolls needed to reach 100 points given the following rules.

- Starting at 0, for each dice roll resulting in 1 through 5, add the result to your score.
- For each dice roll resulting in 6, reroll and subtract the result from your score. Any re-roll that results in a 6 re-roll the dice again. Your score cannot go lower than 0.

For this simulation, I am going to use SQL. A language such as Python may be better suited to create this simulation, but using SQL is simpler than you might perceive. The trick is to use a numbers table, create a set of random dice rolls, and then use summation windows to determine the cumulative score after each dice roll. The code is a bit lengthy, but SQL is a verbose language not known for its conciseness. I’ve included the script at the end of this analysis.

So, now that we have covered the specifics of the rules, let’s move on to the analysis.

**Analysis**

After 10,000 trials, we have the following summary statistics.

Mean | 61 |

Median | 59 |

Mode | 53 |

Standard Deviation | 14 |

Minimum Dice Rolls | 28 |

Maximum Dice Rolls | 133 |

The mean is the average, the median is the exact middle, and the mode is the number of dice rolls that appear the most often. Because the mean and median are close, you might expect the data to fit a normal distribution. However, given the mode is slightly lower at 53, we can expect the data to be skewed.

We can view this skewness with the following histogram.

Reviewing a box plot of the dice rolls, we see that any result over 97 becomes an outlier. Approximately 2% of the dice rolls (197 of the 10,000 simulations) result in an outlier.

Another good graph is a cumulative distribution. From this graph, we can see that 50% of the time, the number of rolls to reach 100 points will be under 60 dice rolls or over 60 dice rolls.

We can also spin our cumulative distribution graph to show a histogram by percentage. This is the same graph as above, except after 50% we pivot the slope and create a downslope. We can ascertain the same information from both graphs; however, a good example where the below graph may be easier to read is that you can determine a 10% chance of reaching 100 points after 45 dice rolls or after 80 or more dice rolls.

Rolling the dice nearly 100 times to reach 100 points seems like an extraordinary number of times. This would need to be a simulation where we roll a large number of 6s combined.

Looking at two of the largest outliers in our data, we can use a scatter diagram to review each dice roll and how it contributed to the overall score.

Looking at the diagram, we can see some huge swings of 10 to 20 points that result in an unusual number of rolls to reach 100 points.

**Conclusion**

Performing these types of simulations is a fun way to learn both coding techniques and statistical analysis. I encourage everyone to try this same approach and create a fun game where you can guess the results and then test via a simulation. When performing data analysis, histograms, box plots, cumulative distributions, and scatter plots are an excellent resource for creating a visual understanding of your data.

Happy coding!

Here is my code written in SQL Server’s T-SQL. For simplicity, this code does one iteration of the game.

------------------------------- ------------------------------- --Tables used DROP TABLE IF EXISTS #Numbers; DROP TABLE IF EXISTS #DiceRolls; DROP TABLE IF EXISTS #DiceRolls_NotSixes; DROP TABLE IF EXISTS #DiceRolls_Modified; DROP TABLE IF EXISTS #DiceRollsResults; GO ------------------------------- ------------------------------- --Declare and set variables --Make an assumption on the total number of rolls needed DECLARE @vDesiredScore INTEGER = 100; DECLARE @vIterations INTEGER = 100; ------------------------------- ------------------------------- --Create and populate a #Numbers table WITH cte_Number (Number) AS ( SELECT 1 AS Number UNION ALL SELECT Number + 1 FROM cte_Number WHERE Number < 200 --@vDesiredScore * 2 --Assuming you will not need x times the desired score for the total number of rolls needed to achieve the desired score ) SELECT Number INTO #Numbers FROM cte_Number OPTION (MAXRECURSION 0);--A value of 0 means no limit to the recursion level ------------------------------- ------------------------------- --Create and populate a #DiceRolls table WITH cte_Random AS ( SELECT Number, ABS(CHECKSUM(NEWID()) % 6) + 1 AS DiceRoll, CAST(NULL AS INTEGER) AS DiceResult FROM #Numbers ) SELECT Number AS StepNumber ,DiceRoll INTO #DiceRolls FROM cte_Random; GO /*************************************** --If you need to create test data with multiple rolls of 6 UPDATE #DiceRolls SET DiceRoll = 6 WHERE StepNumber in (1,3,4,6,7); GO UPDATE #DiceRolls SET DiceRoll = 5 WHERE StepNumber in (2,5); GO ***************************************/ ------------------------------- ------------------------------- --Create and populate a #DiceRolls_NotSixes table SELECT StepNumber ,LEAD(StepNumber) OVER (ORDER BY StepNumber) - 1 AS Lead_StepNumber ,DiceRoll INTO #DiceRolls_NotSixes FROM #DiceRolls WHERE DiceRoll <> 6; GO ------------------------------- ------------------------------- --Create and populate a #DiceRolls_Modified table --Determines which StepNumbers need to be subtracted WITH cte_DiceRolls AS ( SELECT a.Number as StepNumber, ISNULL(b.Lead_StepNumber,a.Number) AS Lead_StepNumber, c.DiceRoll AS DiceRoll_Actual, CASE WHEN a.Number = b.StepNumber THEN b.DiceRoll ELSE ISNULL(b.DiceRoll,0) * - 1 END AS DiceRoll_Modified FROM #Numbers a LEFT OUTER JOIN #DiceRolls_NotSixes b ON a.Number BETWEEN b.StepNumber AND b.Lead_StepNumber LEFT OUTER JOIN #DiceRolls c ON a.Number = c.StepNumber ) SELECT 1 AS ID, StepNumber, Lead_StepNumber, DiceRoll_Actual, DiceRoll_Modified, SUM(DiceRoll_Modified) OVER (ORDER BY StepNumber) AS DiceRoll_Sum INTO #DiceRolls_Modified FROM cte_DiceRolls; GO ------------------------------- ------------------------------- --Create and populate a #DiceRollsResults table --Uses recursion to account for the sum not allowed to go below 0 ;WITH cte_DiceRolls_Modified AS ( --Add a ranking function here if needed --Test data has StepNumber to rank/sort the records. SELECT * FROM #DiceRolls_Modified ), cte_Recursion AS ( SELECT ID, StepNumber, DiceRoll_Actual, DiceRoll_Modified, DiceRoll_Sum, CASE WHEN DiceRoll_Modified < 0 THEN 0 WHEN DiceRoll_Modified > 1000 THEN 1000 ELSE DiceRoll_Modified END AS DiceRoll_Sum_Modified FROM #DiceRolls_Modified WHERE StepNumber = 1 UNION ALL SELECT cte.ID, t.StepNumber, t.DiceRoll_Actual, t.DiceRoll_Modified, t.DiceRoll_Sum, (CASE WHEN t.DiceRoll_Modified + cte.DiceRoll_Sum_Modified < 0 THEN 0 WHEN t.DiceRoll_Modified + cte.DiceRoll_Sum_Modified > 1000 THEN 1000 ELSE t.DiceRoll_Modified + cte.DiceRoll_Sum_Modified END) AS RunningSum FROM cte_Recursion cte INNER JOIN cte_DiceRolls_Modified t ON t.StepNumber = (cte.StepNumber + 1) AND t.ID = cte.ID ) SELECT * INTO #DiceRollsResults FROM cte_Recursion ORDER BY ID, StepNumber OPTION (MAXRECURSION 0);--A value of 0 means no limit to the recursion level; GO ------------------------------- ------------------------------- --Display the results SELECT * FROM #DiceRollsResults WHERE StepNumber <= (SELECT MIN(StepNumber) FROM #DiceRollsResults WHERE DiceRoll_Sum_Modified >= 100) ORDER BY StepNumber DESC;