Dice Roll Game

For this analysis, we are going to simulate a dice roll game.

Given 10,000 trials, perform a statistical analysis on the number of dice rolls needed to reach 100 points given the following rules.

  • Starting at 0, for each dice roll resulting in 1 through 5, add the result to your score.
  • For each dice roll resulting in 6, reroll and subtract the result from your score. Any re-roll that results in a 6 re-roll the dice again. Your score cannot go lower than 0.

For this simulation, I am going to use SQL. A language such as Python may be better suited to create this simulation, but using SQL is simpler than you might perceive. The trick is to use a numbers table, create a set of random dice rolls, and then use summation windows to determine the cumulative score after each dice roll. The code is a bit lengthy, but SQL is a verbose language not known for its conciseness. I’ve included the script at the end of this analysis.

So, now that we have covered the specifics of the rules, let’s move on to the analysis.

Analysis

After 10,000 trials, we have the following summary statistics.

Mean61
Median59
Mode53
Standard Deviation14
Minimum Dice Rolls28
Maximum Dice Rolls133

The mean is the average, the median is the exact middle, and the mode is the number of dice rolls that appear the most often. Because the mean and median are close, you might expect the data to fit a normal distribution. However, given the mode is slightly lower at 53, we can expect the data to be skewed.

We can view this skewness with the following histogram.

Reviewing a box plot of the dice rolls, we see that any result over 97 becomes an outlier. Approximately 2% of the dice rolls (197 of the 10,000 simulations) result in an outlier.

Another good graph is a cumulative distribution. From this graph, we can see that 50% of the time, the number of rolls to reach 100 points will be under 60 dice rolls or over 60 dice rolls.

We can also spin our cumulative distribution graph to show a histogram by percentage. This is the same graph as above, except after 50% we pivot the slope and create a downslope. We can ascertain the same information from both graphs; however, a good example where the below graph may be easier to read is that you can determine a 10% chance of reaching 100 points after 45 dice rolls or after 80 or more dice rolls.

Rolling the dice nearly 100 times to reach 100 points seems like an extraordinary number of times. This would need to be a simulation where we roll a large number of 6s combined.

Looking at two of the largest outliers in our data, we can use a scatter diagram to review each dice roll and how it contributed to the overall score.

Looking at the diagram, we can see some huge swings of 10 to 20 points that result in an unusual number of rolls to reach 100 points.

Conclusion

Performing these types of simulations is a fun way to learn both coding techniques and statistical analysis. I encourage everyone to try this same approach and create a fun game where you can guess the results and then test via a simulation. When performing data analysis, histograms, box plots, cumulative distributions, and scatter plots are an excellent resource for creating a visual understanding of your data.

Happy coding!


Here is my code written in SQL Server’s T-SQL. For simplicity, this code does one iteration of the game.

-------------------------------
-------------------------------
--Tables used
DROP TABLE IF EXISTS #Numbers;
DROP TABLE IF EXISTS #DiceRolls;
DROP TABLE IF EXISTS #DiceRolls_NotSixes;
DROP TABLE IF EXISTS #DiceRolls_Modified;
DROP TABLE IF EXISTS #DiceRollsResults;
GO

-------------------------------
-------------------------------
--Declare and set variables
--Make an assumption on the total number of rolls needed
DECLARE @vDesiredScore INTEGER = 100;
DECLARE @vIterations INTEGER = 100;

-------------------------------
-------------------------------
--Create and populate a #Numbers table
WITH cte_Number (Number)
AS (
    SELECT 1 AS Number
    UNION ALL
    SELECT  Number + 1
    FROM   cte_Number
    WHERE  Number < 200 --@vDesiredScore * 2  --Assuming you will not need x times the desired score for the total number of rolls needed to achieve the desired score
    )
SELECT Number
INTO   #Numbers
FROM   cte_Number
OPTION (MAXRECURSION 0);--A value of 0 means no limit to the recursion level

-------------------------------
-------------------------------
--Create and populate a #DiceRolls table
WITH cte_Random AS
(
SELECT  Number,
        ABS(CHECKSUM(NEWID()) % 6) + 1 AS DiceRoll,
        CAST(NULL AS INTEGER) AS DiceResult
FROM    #Numbers
)
SELECT  Number AS StepNumber
        ,DiceRoll
INTO    #DiceRolls
FROM    cte_Random;
GO

/***************************************
--If you need to create test data with multiple rolls of 6

UPDATE #DiceRolls
SET		DiceRoll = 6
WHERE	StepNumber in (1,3,4,6,7);
GO
	

UPDATE #DiceRolls
SET		DiceRoll = 5
WHERE	StepNumber in (2,5);
GO
***************************************/

-------------------------------
-------------------------------
--Create and populate a #DiceRolls_NotSixes table
SELECT  StepNumber
        ,LEAD(StepNumber) OVER (ORDER BY StepNumber) - 1 AS Lead_StepNumber
        ,DiceRoll
INTO    #DiceRolls_NotSixes
FROM    #DiceRolls
WHERE   DiceRoll <> 6;
GO

-------------------------------
-------------------------------
--Create and populate a #DiceRolls_Modified table
--Determines which StepNumbers need to be subtracted
WITH cte_DiceRolls AS
(
SELECT  a.Number as StepNumber,
        ISNULL(b.Lead_StepNumber,a.Number) AS Lead_StepNumber,
        c.DiceRoll AS DiceRoll_Actual,
        CASE WHEN a.Number = b.StepNumber THEN b.DiceRoll ELSE ISNULL(b.DiceRoll,0) * - 1 END AS DiceRoll_Modified
FROM    #Numbers a LEFT OUTER JOIN
        #DiceRolls_NotSixes b ON a.Number BETWEEN b.StepNumber AND b.Lead_StepNumber
        LEFT OUTER JOIN
        #DiceRolls c ON a.Number = c.StepNumber
)
SELECT  1 AS ID,
        StepNumber,
        Lead_StepNumber,
        DiceRoll_Actual,
        DiceRoll_Modified,
        SUM(DiceRoll_Modified) OVER (ORDER BY StepNumber) AS DiceRoll_Sum
INTO    #DiceRolls_Modified
FROM    cte_DiceRolls;
GO

-------------------------------
-------------------------------
--Create and populate a #DiceRollsResults table
--Uses recursion to account for the sum not allowed to go below 0
;WITH cte_DiceRolls_Modified AS
(
--Add a ranking function here if needed
--Test data has StepNumber to rank/sort the records.
SELECT  *
FROM    #DiceRolls_Modified
),
cte_Recursion AS
(
SELECT  ID,
        StepNumber,
        DiceRoll_Actual,
        DiceRoll_Modified,
        DiceRoll_Sum,
        CASE WHEN DiceRoll_Modified < 0 THEN 0
             WHEN DiceRoll_Modified > 1000 THEN 1000
             ELSE DiceRoll_Modified
        END AS DiceRoll_Sum_Modified 
FROM    #DiceRolls_Modified
WHERE StepNumber = 1
UNION ALL
SELECT  cte.ID,
        t.StepNumber,
        t.DiceRoll_Actual,
        t.DiceRoll_Modified,
        t.DiceRoll_Sum,
        (CASE WHEN t.DiceRoll_Modified + cte.DiceRoll_Sum_Modified < 0 THEN 0
              WHEN t.DiceRoll_Modified + cte.DiceRoll_Sum_Modified > 1000 THEN 1000
              ELSE t.DiceRoll_Modified + cte.DiceRoll_Sum_Modified
        END) AS RunningSum
FROM    cte_Recursion cte
        INNER JOIN
        cte_DiceRolls_Modified t ON t.StepNumber = (cte.StepNumber + 1) AND t.ID = cte.ID
)
SELECT   *
INTO     #DiceRollsResults
FROM     cte_Recursion
ORDER BY ID,
         StepNumber
OPTION (MAXRECURSION 0);--A value of 0 means no limit to the recursion level;
GO

-------------------------------
-------------------------------
--Display the results
SELECT  * 
FROM    #DiceRollsResults
WHERE   StepNumber <= (SELECT MIN(StepNumber) FROM #DiceRollsResults WHERE DiceRoll_Sum_Modified >= 100)
ORDER BY StepNumber DESC;