Analyzing the first 10 million digits of pi: Randomness within structure

10
digitsofpi Saturday, March 14, 2015, is Pi Day, and this year is a super-special Pi Day! This is your once-in-a-lifetime chance to celebrate the first 10 digits of pi (π) by doing something special on 3/14/15 at 9:26:53. Apologies to my European friends, but Pi Day requires that you represent dates with the month placed first in order to match the sequence 3.141592653.... Last year I celebrated Pi Day by using SAS to explore properties of the continued fraction expansion of pi. This year, I will examine statistical properties of the first 10 million digits of pi. In particular, I will show that the digits of pi exhibit statistical properties that are inherent in a random sequence of integers. Editor's note (17Mar2015): Ken Kleinman remarked on the similarity of the analysis in this article to his own analysis from 2010. I was reading his blog regularly in 2010, so I should have cited him for the chi-square and Durbin-Watson analyses in this post. My apologies to Ken and Nick Horton for this oversight!

Reading 10 million digits of pi

I have no desire to type in 10 million digits, so I will use SAS to read a text file at a Princeton University URL. The following statements use the FILENAME statement to point to the URL:
/* read data over the internet from a URL */
filename rawurl url "http://www.cs.princeton.edu/introcs/data/pi-10million.txt"
                /* proxy='http://yourproxy.company.com:80' */ ;
 
data PiDigits;
   infile rawurl lrecl=10000000;
   input Digit 1. @@;
   Position = _n_;
   Diff = dif(digit);      /* compute difference between adjacent digits */
run;
 
proc print data=PiDigits(obs=9);
   var Digit;
run;
t_digitsofpi1 The PiDigits data set contains 10 million rows. The call to PROC PRINT displays the first few decimal digits, which are (skipping the 3) 141592653.... For other ways to use SAS to download data from the internet, search Chris Hemedinger's blog, The SAS Dummy for "PROC HTTP" and you will find several examples of how to download data from a URL.

The distribution of digits of pi

You can run many statistical tests on these numbers. It is conjectured that the digits of pi are randomly uniformly distributed in the sense that the digits 0 through 9 appear equally often, as do pairs of digits, trios of digits, and so forth. You can call PROC FREQ to compute the frequency distribution of the first 10 million digits of pi and to test whether the digits appear to be uniformly distributed:
/* Are the digits 0-9 equally distributed? */
proc freq data = PiDigits;
tables Digit/ chisq out=DigitFreq;
run;
t_digitsofpi The frequency analysis of the first 10 million digits shows that each digit appears about one million times. A chi-square test indicates that the digits appear to be uniformly distributed. If you turn on ODS graphics, PROC FREQ also produces a deviation plot that shows that the deviations from uniformity are tiny.

A "pi chart" of the distribution of the digits of pi

As an advocate of the #OneLessPie Chart Initiative, I am intellectually opposed to creating pie charts. However, I am going to grit my teeth and make an exception for this super-special Pi Day. You can use the Graph Template Language (GTL) to create a pie chart. Even simpler, Sanjay Matange has written a SAS macro that creates a pie chart with minimal effort. The following DATA step create a percentage variable and then calls Sanjay's macro:
data DigitFreq;
   set DigitFreq;
   Pct = Percent/100; 
   format Pct PERCENT8.2;
run;
 
/* macro from http://blogs.sas.com/content/graphicallyspeaking/2012/08/26/how-about-some-pie/ */
%GTLPieChartMacro(data=DigitFreq, category=Digit, response=Pct,
         title=Distribution of First 10 Million Digits of Pi,
         DataSkin=NONE);
The pie chart appears at the top of this article. It shows that the digits 0 through 9 are equally distributed.

Any autocorrelation in the sequence?

In the DATA step that read the digits of pi, I calculated the difference between adjacent digits. You can use the SGPLOT procedure to create a histogram that shows the distribution of this quantity:
proc sgplot data=PiDigits(obs=1000000);  
   vbar Diff;
run;
digitsofpi3 That's a pretty cool triangular distribution! I won't bore you with mathematical details, but this shape arises when you examine the difference between two independent discrete uniform random variables, which suggests that the even digits of pi are independent of the odd digits of pi. In fact, more is true. You can run a formal test to check for autocorrelation in the sequence of numbers. The Durbin-Watson statistic, which is available in PROC REG and PROC AUTOREG, has a value near 2 if a series of values has no autocorrelation. The following call to PROC AUTOREG requests the Durbin-Watson statistic for first-order through fifth-order autocorrelation for the first one million digits of pi. The results show that there is no detectable autocorrelation through fifth order. To the Durban-Watson test, the digits of pi are indistinguishable from a random sequence:
proc autoreg data=PiDigits(obs=1000000);  
   model Digit = / dw=5 dwprob;
run;
t_digitsofpi2

Are the digits of pi random?

Researchers have run dozens of statistical tests for randomness on the digits of pi. They all reach the same conclusion. Statistically speaking, the digits of pi seems to be the realization of a process that spits out digits uniformly at random. Nevertheless, mathematicians have not yet been able to prove that the digits of pi are random. One of the leading researchers in the quest commented that if they are random then you can find in the sequence (appropriately converted into letters) the "entire works of Shakespeare" or any other message that you can imagine (Bailey and Borwein, 2013). For example, if I assign numeric values to the letters of "Pi Day" (P=16, I=9, D=4, A=1, Y=25), then the sequence "1694125" should appear somewhere in the decimal expansion of pi. I wrote a SAS program to search the decimal expansion of pi for the seven-digit "Pi Day" sequence. Here's what I found:
proc print noobs data=PiDigits(firstobs=4686485 obs=4686491);
   var Position Digit;
run;
t_digitsofpi3 There it is! The numeric representation of "Pi Day" appears near the 4.7 millionth decimal place of pi. Other "messages" might not appear in the first 10 million digits, but this one did. Finding Shakespearian sonnets and plays will probably require computing more digits of pi than the current world record. The digits of pi pass every test for randomness, yet pi is a precise mathematical value that describes the relationship between the circumference of a circle and its diameter. This dichotomy between "very random" and "very structured" is fascinating! Happy Pi Day to everyone!
Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

10 Comments

  1. Rick

    Some of your readers will have another chance to celebrate, and this time it will be an international holiday.

    data _null_;
    call symput('pitime',"21jul2059 00:37:34"dt);
    run;
    %put &pitime.;

    Of course, if you don't want to round, celebrate one second earlier, namely at:
    call symput('pitime',"21jul2059 00:37:33"dt);

    Art

    • Rick Wicklin

      Very cool! My new goal is to make that my time of death. I don't know how to manage it, but John Adams, Thomas Jefferson, and James Monroe all died on the 4th of July so I will try.

  2. Richard Seibert on

    I am curious to see a similar analysis of e, and phi?

    Any possibility? Email me direct if you could show me how to do it.

    Thanks,

    rwsiii

  3. Pingback: Find your birthday in the digits of pi - The DO Loop

  4. Interesting! There something I do not understand though...If the digits of PI have a uniform distribution based on chi-square 2.78 at 10mill, how come if you calculate chi-square further, at around position 86mill you will get a chi-sqare of 8.9 suggesting it is not uniform.
    There is an even smaller chi-square at around 12million (~0.9) so I find it strange that the chi-square would then increase.

    • Rick Wicklin

      Remember that a statistical test at 0.05 significance will reject a true hypothesis for 5% of random samples. It would not surprise me to find regions in the sequence of pi that test as "nonuniform" for a while. Some of these issues are addressed in "How to lie with a simulation," which coincidentally is also about pi.

Leave A Reply

Back to Top