For this analysis we will simulate a high-low card game.

Given a standard randomly shuffled 52 card deck, the card game is played by displaying an initial card and a player must guess if the next card is of higher or lower value. The next card is then displayed, and the player again makes a guess if the next card is of higher or lower value. The game continues until all cards are displayed. If the probability of the next card being higher or lower is equal, the player must make a random guess. At the end of a trial, tally the number of correct guesses. If two cards of the same value are adjacent to each other, the player automatically cannot choose the correct outcome, as the next card is neither higher nor lower in value.

It should be noted here what we mean by the term “guess”. We are creating a computerized simulation where we will count the number of cards available against the current displayed card and make our guess based upon known probabilities. In a situation where the player was human, the guesses would truly be less educated (unless of course you can count cards perfectly).

Now that we understand the rules, let’s get on to the analysis. Given 1 million trials, create a statistical analysis of the outcome. How likely is it to correctly guess all 51 predictions? Note, to correctly guess all 51 predictions, we will need some amount of luck that the shuffled deck does not contain cards of the same value adjacent to each other.

**Analysis**

For this analysis, I used SQL to create the simulation. A language such as Python may be better suited for a simulation like this, but SQL works well once you understand how to make a numbers table and combine it with a sequence and a uniqueidentifier data type.

SQL is a verbose language, so don’t expect it to be nicely compact. But reviewing the script I have included below; you can see how I didn’t need to use conditional statements or variable declarations. I simply could solve the puzzle using set-based solutions, where the only loop needed is to create each of the 1 million trials.

Now that we have that out of the way, let’s first look at the summary statistics given 1 million trails.

Mean | 37.71 |

Median | 38 |

Standard Deviation | 2.79 |

Range | 27 |

Minimum | 24 |

Maximum | 51 |

Looking at the summary statistics we have a good indication that the data follows a normal distribution as the values for the mean, median and mode are very close to each other, and we are lucky enough to have at least one trial where we had a perfect run of 51 correct guesses.

Reviewing a histogram, we can visually determine the data fits a normal distribution. Because it fits a normal distribution, we know that 66% of the results will fit within 1 standard deviation from the mean, 95% will fit within 2 standard deviations of the mean, and 99.7% of the results will fit within 3 standard deviations of the mean.

When looking at a box plot, we can see there are several outliers, both large and small, where the z-score will be above 3 standard deviations.

Looking at the detail, we can see by the z-score how rare the occurrences of these outliers are. Surprisingly, it is more rare to get only 24 guesses correct (4.92 standard deviations from the mean) than it is to get all 51 guesses correct (4.76 standard deviations from the mean).

Remember, by guesses we mean guesses based upon known probabilities of the cards in the deck. Out of 1 million simulations, we were able to guess all 51 guesses correct once, so we can literally say this was a “one in a million” occurrence. And surprisingly, we were able to achieve only 24 correct results twice.

When reviewing a table of areas under the normal curve, any z-score above or below 3.7 has only a .01% chance of occurrence, so we know these occurrences are quite rare. However, given we did run this simulation 1 million times, I would expect to see these occurrences given the law of large numbers.

Result | Count | Z-Score |
---|---|---|

24 | 2 | 4.92 |

25 | 7 | 4.56 |

26 | 28 | 4.20 |

27 | 112 | 3.84 |

28 | 434 | 3.48 |

29 | 1,314 | 3.12 |

47 | 285 | 3.33 |

48 | 50 | 3.69 |

49 | 4 | 4.05 |

50 | 1 | 4.40 |

51 | 1 | 4.76 |

**Conclusion**

Performing these types of simulations is a fun way to learn both coding techniques and statistical analysis. I encourage everyone to try this same approach and create a fun game where you can guess the results and then test via a simulation. When performing data analysis, histograms, box plots, cumulative distributions, and scatter plots are a great resource for creating a visual understanding of your data.

Happy coding!

Here is the SQL code written in SQL Server’s T-SQL.

DROP SEQUENCE IF EXISTS dbo.CardDeckSequence; DROP TABLE IF EXISTS dbo.Numbers; DROP TABLE IF EXISTS dbo.CardShuffle; DROP TABLE IF EXISTS dbo.CardShuffleResults GO CREATE TABLE dbo.Numbers ( MyInteger INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY, MyValue INTEGER, ); GO INSERT INTO dbo.Numbers (MyValue) VALUES(1); GO 52 CREATE SEQUENCE dbo.CardDeckSequence AS TINYINT START WITH 1 INCREMENT BY 1 MINVALUE 1 MAXVALUE 13 CYCLE; GO CREATE TABLE dbo.CardShuffle ( RandomNumber UNIQUEIDENTIFIER DEFAULT NEWID(), CardNumber TINYINT ); GO CREATE TABLE dbo.CardShuffleResults ( Iteration INTEGER IDENTITY(1,1) PRIMARY KEY, Result INTEGER NOT NULL ); GO DECLARE @Counter1 INTEGER = 1; WHILE @Counter1 <= 1000000 BEGIN TRUNCATE TABLE dbo.CardShuffle; DROP TABLE IF EXISTS dbo.CardShuffle2 DROP TABLE IF EXISTS dbo.CardShuffle3 DROP TABLE IF EXISTS dbo.CardShuffle4 DROP TABLE IF EXISTS dbo.CardShuffle5 INSERT INTO dbo.CardShuffle (CardNumber) SELECT (NEXT VALUE FOR dbo.CardDeckSequence) FROM dbo.Numbers; SELECT Row_Number() OVER (ORDER BY RandomNumber) AS StepNumber, * INTO dbo.CardShuffle2 FROM dbo.CardShuffle ORDER BY 1; SELECT DISTINCT a.StepNumber, a.CardNumber, SUM(CASE WHEN a.CardNumber > b.CardNumber THEN 1 ELSE 0 END) AS LowerCardCount, SUM(CASE WHEN a.CardNumber < b.CardNumber THEN 1 ELSE 0 END) AS HigherCardCount, SUM(CASE WHEN a.CardNumber = b.CardNumber THEN 1 ELSE 0 END) AS SameCardCount INTO dbo.CardShuffle3 FROM dbo.CardShuffle2 a LEFT OUTER JOIN dbo.CardShuffle2 b on a.StepNumber < b.StepNumber GROUP BY a.StepNumber, a.CardNumber ORDER BY 1; SELECT StepNumber, CardNumber, HigherCardCount, LowerCardCount, SameCardCount, LEAD(CardNumber,1) OVER (ORDER BY StepNumber) AS NextCardNumber, (CASE WHEN HigherCardCount > LowerCardCount THEN 'Higher' WHEN LowerCardCount > HigherCardCount THEN 'Lower' WHEN LowerCardCount = HigherCardCount THEN IIF(ABS(CHECKSUM(NEWID()) % 2) + 1 = 1,'Higher','Lower') END) AS Prediction, (CASE WHEN CardNumber < LEAD(CardNumber,1) OVER (ORDER BY StepNumber) THEN 'Higher' WHEN CardNumber > LEAD(CardNumber,1) OVER (ORDER BY StepNumber) THEN 'Lower' WHEN CardNumber = LEAD(CardNumber,1) OVER (ORDER BY StepNumber) THEN 'Same' END) AS Outcome INTO dbo.CardShuffle4 FROM dbo.CardShuffle3; SELECT *, (CASE WHEN Prediction = Outcome THEN 1 ELSE 0 END) AS Result INTO dbo.CardShuffle5 FROM dbo.CardShuffle4; INSERT INTO dbo.CardShuffleResults (Result) SELECT SUM(Result) FROM dbo.CardShuffle5; SET @Counter1 = @Counter1 + 1; END