# High Low Card Game

For this analysis we will simulate a high-low card game.

Given a standard randomly shuffled 52 card deck, the card game is played by displaying an initial card and a player must guess if the next card is of higher or lower value. The next card is then displayed, and the player again makes a guess if the next card is of higher or lower value. The game continues until all cards are displayed. If the probability of the next card being higher or lower is equal, the player must make a random guess. At the end of a trial, tally the number of correct guesses. If two cards of the same value are adjacent to each other, the player automatically cannot choose the correct outcome, as the next card is neither higher nor lower in value.

It should be noted here what we mean by the term “guess”. We are creating a computerized simulation where we will count the number of cards available against the current displayed card and make our guess based upon known probabilities. In a situation where the player was human, the guesses would truly be less educated (unless of course you can count cards perfectly).

Now that we understand the rules, let’s get on to the analysis. Given 1 million trials, create a statistical analysis of the outcome. How likely is it to correctly guess all 51 predictions? Note, to correctly guess all 51 predictions, we will need some amount of luck that the shuffled deck does not contain cards of the same value adjacent to each other.

Analysis

For this analysis, I used SQL to create the simulation. A language such as Python may be better suited for a simulation like this, but SQL works well once you understand how to make a numbers table and combine it with a sequence and a uniqueidentifier data type.

SQL is a verbose language, so don’t expect it to be nicely compact. But reviewing the script I have included below; you can see how I didn’t need to use conditional statements or variable declarations. I simply could solve the puzzle using set-based solutions, where the only loop needed is to create each of the 1 million trials.

Now that we have that out of the way, let’s first look at the summary statistics given 1 million trails.

Looking at the summary statistics we have a good indication that the data follows a normal distribution as the values for the mean, median and mode are very close to each other, and we are lucky enough to have at least one trial where we had a perfect run of 51 correct guesses.

Reviewing a histogram, we can visually determine the data fits a normal distribution. Because it fits a normal distribution, we know that 66% of the results will fit within 1 standard deviation from the mean, 95% will fit within 2 standard deviations of the mean, and 99.7% of the results will fit within 3 standard deviations of the mean.

When looking at a box plot, we can see there are several outliers, both large and small, where the z-score will be above 3 standard deviations.

Looking at the detail, we can see by the z-score how rare the occurrences of these outliers are. Surprisingly, it is more rare to get only 24 guesses correct (4.92 standard deviations from the mean) than it is to get all 51 guesses correct (4.76 standard deviations from the mean).

Remember, by guesses we mean guesses based upon known probabilities of the cards in the deck. Out of 1 million simulations, we were able to guess all 51 guesses correct once, so we can literally say this was a “one in a million” occurrence. And surprisingly, we were able to achieve only 24 correct results twice.

When reviewing a table of areas under the normal curve, any z-score above or below 3.7 has only a .01% chance of occurrence, so we know these occurrences are quite rare. However, given we did run this simulation 1 million times, I would expect to see these occurrences given the law of large numbers.

Conclusion

Performing these types of simulations is a fun way to learn both coding techniques and statistical analysis. I encourage everyone to try this same approach and create a fun game where you can guess the results and then test via a simulation. When performing data analysis, histograms, box plots, cumulative distributions, and scatter plots are a great resource for creating a visual understanding of your data.

Happy coding!

Here is the SQL code written in SQL Server’s T-SQL.

```DROP SEQUENCE IF EXISTS dbo.CardDeckSequence;
DROP TABLE IF EXISTS dbo.Numbers;
DROP TABLE IF EXISTS dbo.CardShuffle;
DROP TABLE IF EXISTS dbo.CardShuffleResults
GO

CREATE TABLE dbo.Numbers
(
MyInteger   INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY,
MyValue     INTEGER,
);
GO

INSERT INTO dbo.Numbers (MyValue) VALUES(1);
GO 52

CREATE SEQUENCE dbo.CardDeckSequence AS TINYINT
INCREMENT BY 1
MINVALUE 1
MAXVALUE 13
CYCLE;
GO

CREATE TABLE dbo.CardShuffle
(
RandomNumber    UNIQUEIDENTIFIER DEFAULT NEWID(),
CardNumber      TINYINT
);
GO

CREATE TABLE dbo.CardShuffleResults
(
Iteration   INTEGER IDENTITY(1,1) PRIMARY KEY,
Result      INTEGER NOT NULL
);
GO

DECLARE @Counter1 INTEGER = 1;

WHILE @Counter1 <= 1000000
BEGIN

TRUNCATE TABLE dbo.CardShuffle;
DROP TABLE IF EXISTS dbo.CardShuffle2
DROP TABLE IF EXISTS dbo.CardShuffle3
DROP TABLE IF EXISTS dbo.CardShuffle4
DROP TABLE IF EXISTS dbo.CardShuffle5

INSERT INTO dbo.CardShuffle (CardNumber)
SELECT  (NEXT VALUE FOR dbo.CardDeckSequence)
FROM    dbo.Numbers;

SELECT  Row_Number() OVER (ORDER BY RandomNumber) AS StepNumber,
*
INTO    dbo.CardShuffle2
FROM    dbo.CardShuffle
ORDER BY 1;

SELECT  DISTINCT
a.StepNumber,
a.CardNumber,
SUM(CASE WHEN a.CardNumber > b.CardNumber THEN 1 ELSE 0 END) AS LowerCardCount,
SUM(CASE WHEN a.CardNumber < b.CardNumber THEN 1 ELSE 0 END) AS HigherCardCount,
SUM(CASE WHEN a.CardNumber = b.CardNumber THEN 1 ELSE 0 END) AS SameCardCount
INTO    dbo.CardShuffle3
FROM    dbo.CardShuffle2 a LEFT OUTER JOIN
dbo.CardShuffle2 b on a.StepNumber < b.StepNumber
GROUP BY a.StepNumber,
a.CardNumber
ORDER BY 1;

SELECT  StepNumber,
CardNumber,
HigherCardCount,
LowerCardCount,
SameCardCount,
LEAD(CardNumber,1) OVER (ORDER BY StepNumber) AS NextCardNumber,
(CASE   WHEN HigherCardCount > LowerCardCount THEN 'Higher'
WHEN LowerCardCount > HigherCardCount THEN 'Lower'
WHEN LowerCardCount = HigherCardCount THEN IIF(ABS(CHECKSUM(NEWID()) % 2) + 1 = 1,'Higher','Lower') END) AS Prediction,
(CASE   WHEN CardNumber < LEAD(CardNumber,1) OVER (ORDER BY StepNumber) THEN 'Higher'
WHEN CardNumber > LEAD(CardNumber,1) OVER (ORDER BY StepNumber) THEN 'Lower'
WHEN CardNumber = LEAD(CardNumber,1) OVER (ORDER BY StepNumber) THEN 'Same'
END) AS Outcome
INTO    dbo.CardShuffle4
FROM    dbo.CardShuffle3;

SELECT  *,
(CASE WHEN Prediction = Outcome THEN 1 ELSE 0 END) AS Result
INTO    dbo.CardShuffle5
FROM    dbo.CardShuffle4;

INSERT INTO dbo.CardShuffleResults (Result)
SELECT  SUM(Result)
FROM    dbo.CardShuffle5;
SET @Counter1 = @Counter1 + 1;

END```