Dice Roll Game

For this analysis we are going to simulate a dice roll game.

Given 10,000 trials, perform a statical analysis on the number of dice rolls needed to reach 100 points given the following rules.

  • Starting at 0, for each dice roll resulting in 1 through 5, add the result to your score.
  • For each dice roll resulting in 6, reroll the dice and subtract the result from your score. Any re-roll that results in a 6, re-roll the dice again. Your score cannot go lower than 0.

For this simulation I am going to use SQL. A language such as Python may be better suited to create this simulation, but using SQL is simpler than you might perceive. The trick is to use a numbers table and create a set of random dice rolls and then use summation windows to determine the cumulative score after each dice roll. The code is a bit lengthy, but SQL is a verbose language not known for its conciseness. I’ve included the script at the end of this analysis.

So now that we have covered the specifics of the rules, let’s move on to the analysis.

Analysis

After 10,000 trials, we have the following summary statistics.

Mean61
Median59
Mode53
Standard Deviation14
Minimum Dice Rolls28
Maximum Dice Rolls133

The mean is the average, the median is the exact middle, and the mode is the number of dice rolls that appear the most often. Because the mean and median are close, you might expect the data to fit a normal distribution. However, given the mode is slightly lower at 53 we can expect the data to be skewed.

We can view this skewness with the following histogram.

Reviewing a box plot of the dice rolls, we can see that any result over 97 becomes an outlier. Approximately 2% of the dice rolls (197 of the 10,000 simulations) result in an outlier.

Another good graph is a cumulative distribution. From this graph we can see that 50% of the time the number of rolls to reach 100 points will be under 60 dice rolls or over 60 dice rolls.

We can also spin our cumulative distribution graph to show a histogram by percentage. This is the same graph as above, except that after 50% we pivot the slope and create a downslope. We can ascertain the same information from both of these graphs, however, a good example where the below graph may be easier to read is that you can determine there is a 10% chance of reaching 100 points after 45 dice rolls or after 80 or more dice rolls.

Rolling the dice anywhere near 100 times to reach 100 points seems like an extra ordinary number of times. This would need to be a simulation where we roll a large number of 6s combined with the average dice roll where we add to our score is less than 2.5.

Looking at two of the largest outliers in our data, we can use a scatter diagram to review each dice roll and how it contributed to the overall score.

Looking at the diagram, we can see some huge swings of 10 to 20 points that result in an unusual number of rolls to reach 100 points.

Conclusion

Performing these types of simulations is a fun way to learn both coding techniques and statistical analysis. I encourage everyone to try this same approach and create a fun game where you can guess the results and then test via a simulation. When performing data analysis, histograms, box plots, cumulative distributions, and scatter plots are a great resource for creating a visual understanding of your data.

Happy coding!


Here is my code written in SQL Server’s T-SQL.

CREATE SCHEMA dicerolls
GO
DROP TABLE IF EXISTS dicerolls.Numbers;
DROP TABLE IF EXISTS dicerolls.DiceRolls;
DROP TABLE IF EXISTS dicerolls.DiceRollResults;
DROP TABLE IF EXISTS dicerolls.DiceRollsHistory;
GO
CREATE TABLE dicerolls.Numbers
(
MyValue		INTEGER NOT NULL,
StepNumber	INTEGER IDENTITY(1,1) NOT NULL
);
GO
INSERT INTO dicerolls.Numbers (MyValue)
SELECT	1
GO 300
CREATE TABLE dicerolls.DiceRolls
(
StepNumber	INTEGER NOT NULL,
DiceRoll	INTEGER NOT NULL,
DiceResult	INTEGER
);
GO
CREATE TABLE dicerolls.DiceRollResults
(
Iteration		INTEGER IDENTITY(1,1) NOT NULL,
StepNumber		INTEGER
);
GO
CREATE TABLE dicerolls.DiceRollsHistory
(
Iteration	INTEGER NOT NULL,
StepNumber	INTEGER NOT NULL,
DiceRoll	INTEGER NOT NULL,
DiceResult	INTEGER
);
GO
DECLARE @Counter INTEGER = 1;
WHILE @Counter <= 10
	BEGIN
	DROP TABLE IF EXISTS dicerolls.DiceRolls_Sixes;
	TRUNCATE TABLE dicerolls.DiceRolls;
	INSERT INTO dicerolls.DiceRolls
	SELECT	StepNumber,
			ABS(CHECKSUM(NEWID()) % 6) + 1 AS DiceRoll,
			NULL
	FROM	dicerolls.Numbers;
	
	--Determines dice rolls that result in 6 and the next dice roll (LEAD)
	SELECT	StepNumber AS StepNumber,
			DiceRoll AS DiceRoll,
			'<----->' AS Div,
			LEAD(StepNumber,1) OVER (ORDER BY StepNumber) AS StepNumber_Lead,
			LEAD(DiceRoll,1) OVER (ORDER BY StepNumber) AS DiceRoll_Lead
	INTO	dicerolls.DiceRolls_Sixes
	FROM	dicerolls.DiceRolls
	WHERE	DiceRoll = 6;
	--Determines the dice roll to subtract from the score
	--This uses a semi-join with an anti-join subquery.
	UPDATE	dicerolls.DiceRolls
	SET		DiceRoll = DiceRoll * -1
	WHERE	StepNumber IN (SELECT StepNumber_Lead FROM dicerolls.DiceRolls_Sixes WHERE StepNumber_Lead NOT IN (SELECT StepNumber FROM dicerolls.DiceRolls_Sixes));
	--Sum the DiceResults up
	WITH cte_Sum AS
	(
	SELECT	StepNumber,
			DiceRoll,
			SUM(DiceRoll) OVER (ORDER BY StepNumber) AS DiceResult
	FROM	dicerolls.DiceRolls
	WHERE	DiceRoll <> 6
	)
	UPDATE	dicerolls.DiceRolls
	SET		DiceResult = a.DiceResult
	FROM	cte_Sum a INNER JOIN
			dicerolls.DiceRolls b ON a.StepNumber = b.StepNumber;
	--The cumulative sum of the dice cannot go below 0
	--I setup a loop to ensure that I properly handle this situation.
	WHILE 0 > (SELECT MIN(DiceResult) FROM dicerolls.DiceRolls WHERE DiceResult < 0)
		BEGIN
		WITH cte_Sum AS
		(
		SELECT	StepNumber,
				DiceRoll,
				SUM(DiceRoll) OVER (ORDER BY StepNumber) AS DiceResult
		FROM	dicerolls.DiceRolls
		WHERE	DiceRoll <> 6 AND StepNumber > (SELECT MIN(StepNumber) FROM dicerolls.DiceRolls WHERE DiceResult < 0) 
		)
		UPDATE	dicerolls.DiceRolls
		SET		DiceResult = a.DiceResult
		FROM	cte_Sum a INNER JOIN
				dicerolls.DiceRolls b ON a.StepNumber = b.StepNumber;
	
		UPDATE	dicerolls.DiceRolls
		SET		DiceResult = 0
		WHERE	StepNumber = (SELECT MIN(StepNumber) FROM dicerolls.DiceRolls WHERE DiceResult < 0);
		
		END--End Loop
	INSERT INTO dicerolls.DiceRollResults (StepNumber)
	SELECT
			MIN(StepNumber)
	FROM	dicerolls.DiceRolls
	WHERE	DiceResult > 100;
	INSERT INTO dicerolls.DiceRollsHistory (Iteration, StepNumber,DiceRoll,DiceResult)
	SELECT	@Counter,
			StepNumber,
			DiceRoll,
			DiceResult
	FROM	dicerolls.DiceRolls
	ORDER BY 1;
	SET @Counter = @Counter + 1;
	
	END--End Loop
SELECT	*
FROM	dicerolls.DiceRollResults
ORDER BY 2 DESC;