High Low Card Game

For this analysis we will simulate a high-low card game.

Given a standard randomly shuffled 52 card deck, the card game is played by displaying an initial card and a player makes a prediction if the next card is of higher or lower value. The next card is then displayed, and the player again makes a prediction if the next card is of higher or lower value. The game continues until all cards are displayed. If the probability of the next card being higher or lower is equal, the player must make a random prediction. At the end of a trial, tally the number of correct predictions. If two cards of the same value are adjacent to each other, the player automatically cannot choose the correct outcome, as the next card is neither higher nor lower in value.

It should be noted here what we mean by the term “prediction”. We are creating a computerized simulation where we will count the number of cards available against the current displayed card and make our prediction based upon known probabilities. In a situation where the player was human, the guesses would truly be less educated (unless of course you can count cards perfectly).

Now that we understand the rules, let’s get on to the analysis. Given 1 million trials, create a statistical analysis of the outcome. How likely is it to correctly guess all 51 predictions? Note, to correctly guess all 51 predictions, we will need some amount of luck that the shuffled deck does not contain cards of the same value adjacent to each other.


Analysis

For this analysis, I used SQL to create the simulation. A language such as Python may be better suited for a simulation like this, but SQL works well once you understand how to make a numbers table and combine it with a sequence and a uniqueidentifier data type.

SQL is a verbose language, so don’t expect it to be nicely compact. But reviewing the script I have included below; you can see how I don’t use conditional statements or variable declarations. I simply solve the puzzle using set-based solutions, where the only loop needed is to create each of the 1 million trials.

Now that we have that out of the way, let’s first look at the summary statistics given 1 million trails.

Mean37.71
Median38
Standard Deviation2.79
Range27
Minimum24
Maximum51

Looking at the summary statistics we have a good indication that the data follows a normal distribution as the values for the mean, median and mode are very close to each other, and we are lucky enough to have at least one trial where we had a perfect run of 51 correct guesses.

Reviewing a histogram, we can visually determine the data fits a normal distribution. Because it fits a normal distribution, we know that 66% of the results will fit within 1 standard deviation from the mean, 95% will fit within 2 standard deviations of the mean, and 99.7% of the results will fit within 3 standard deviations of the mean.

When looking at a box plot, we can see there are several outliers, both large and small, where the z-score will be above 3 standard deviations.

Looking at the detail, we can see by the z-score how rare the occurrences of these outliers are. It is more rare to get a mere 24 predictions correct (4.92 standard deviations from the mean) than it is to get all 51 predictions correct (4.76 standard deviations from the mean).

Out of 1 million simulations, we were able to determine all 51 predictions correctly only once, so we can literally say this was a “one in a million” occurrence. We were able to achieve only 24 correct results twice.

When reviewing a table of areas under the normal curve, any z-score above or below 3.7 has only a .01% chance of occurrence, so we know these occurrences are quite rare. However, given I ran the simulation 1 million times, I would expect to see these occurrences given the law of large numbers.

ResultCountZ-Score
2424.92
2574.56
26284.20
271123.84
284343.48
291,3143.12
472853.33
48503.69
4944.05
5014.40
5114.76

Conclusion

Performing these types of simulations is a fun way to learn both coding techniques and statistical analysis. I encourage everyone to try this same approach and create a fun game where you can guess the results and then test via a simulation. When performing data analysis, histograms, box plots, cumulative distributions, and scatter plots are a great resource for creating a visual understanding of your data.

Happy coding!


Here is the SQL code written in SQL Server’s T-SQL.

DROP SEQUENCE IF EXISTS dbo.CardDeckSequence;
DROP TABLE IF EXISTS #Numbers;
DROP TABLE IF EXISTS #CardShuffle;
DROP TABLE IF EXISTS #CardShuffleResults
GO

CREATE TABLE #Numbers
(
RowNumber   INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY,
MyValue     INTEGER,
);
GO

INSERT INTO #Numbers (MyValue) VALUES(NULL);
GO 52

CREATE SEQUENCE dbo.CardDeckSequence AS TINYINT
    START WITH 1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 13
    CYCLE;
GO

CREATE TABLE #CardShuffle
(
RandomNumber    UNIQUEIDENTIFIER DEFAULT NEWID(),
CardNumber      TINYINT
);
GO

CREATE TABLE #CardShuffleResults
(
Iteration   INTEGER IDENTITY(1,1) PRIMARY KEY,
Result      INTEGER NOT NULL
);
GO

DECLARE @Counter1 INTEGER = 1;

WHILE @Counter1 <= 1000 --Number of simulations to run
BEGIN

    TRUNCATE TABLE #CardShuffle;
    DROP TABLE IF EXISTS #CardShuffle2
    DROP TABLE IF EXISTS #CardShuffle3
    DROP TABLE IF EXISTS #CardShuffle4
    DROP TABLE IF EXISTS #CardShuffle5

    INSERT INTO #CardShuffle (CardNumber)
    SELECT  (NEXT VALUE FOR dbo.CardDeckSequence)
    FROM    #Numbers;

    SELECT  Row_Number() OVER (ORDER BY RandomNumber) AS StepNumber,
            *
    INTO    #CardShuffle2
    FROM    #CardShuffle
    ORDER BY 1;
    
    SELECT  DISTINCT
            a.StepNumber,
            a.CardNumber,
            SUM(CASE WHEN a.CardNumber > b.CardNumber THEN 1 ELSE 0 END) AS LowerCardCount,
            SUM(CASE WHEN a.CardNumber < b.CardNumber THEN 1 ELSE 0 END) AS HigherCardCount,
            SUM(CASE WHEN a.CardNumber = b.CardNumber THEN 1 ELSE 0 END) AS SameCardCount
    INTO    #CardShuffle3
    FROM    #CardShuffle2 a LEFT OUTER JOIN
            #CardShuffle2 b on a.StepNumber < b.StepNumber
    GROUP BY a.StepNumber,
            a.CardNumber
    ORDER BY 1;
    
    SELECT  StepNumber,
            CardNumber,
            HigherCardCount,
            LowerCardCount,
            SameCardCount,
            LEAD(CardNumber,1) OVER (ORDER BY StepNumber) AS NextCardNumber,
            (CASE   WHEN HigherCardCount > LowerCardCount THEN 'Higher'
                    WHEN LowerCardCount > HigherCardCount THEN 'Lower'
                    WHEN LowerCardCount = HigherCardCount THEN IIF(ABS(CHECKSUM(NEWID()) % 2) + 1 = 1,'Higher','Lower') END) AS Prediction,
            (CASE   WHEN CardNumber < LEAD(CardNumber,1) OVER (ORDER BY StepNumber) THEN 'Higher'
                    WHEN CardNumber > LEAD(CardNumber,1) OVER (ORDER BY StepNumber) THEN 'Lower'
                    WHEN CardNumber = LEAD(CardNumber,1) OVER (ORDER BY StepNumber) THEN 'Same'
                    END) AS Outcome
    INTO    #CardShuffle4
    FROM    #CardShuffle3;
    
    SELECT  *,
            (CASE WHEN Prediction = Outcome THEN 1 ELSE 0 END) AS Result
    INTO    #CardShuffle5
    FROM    #CardShuffle4;
    
    INSERT INTO #CardShuffleResults (Result)
    SELECT  SUM(Result)
    FROM    #CardShuffle5;
    SET @Counter1 = @Counter1 + 1;
    
    END

SELECT * FROM #CardShuffleResults ORDER BY 2 DESC;