Saturday, March 14, 2015, is Pi Day, and this year is a super-special Pi Day! This is your once-in-a-lifetime chance to celebrate the first 10 digits of pi (π) by doing something special on 3/14/15 at 9:26:53. Apologies to my European friends, but Pi Day requires that you represent dates with the month placed first in order to match the sequence 3.141592653....
Last year I celebrated Pi Day by using SAS to explore properties of the continued fraction expansion of pi. This year, I will examine statistical properties of the first 10 million digits of pi. In particular, I will show that the digits of pi exhibit statistical properties that are inherent in a random sequence of integers.
Editor's note (17Mar2015): Ken Kleinman remarked on the similarity of the analysis in this article to his own analysis from 2010. I was reading his blog regularly in 2010, so I suspect that I unconsciously remembered his chi-square and Durbin-Watson analyses and used them without properly citing him. My apologies to Ken and Nick Horton for this oversight!
Reading 10 million digits of pi
I have no desire to type in 10 million digits, so I will use SAS to read a text file at a Princeton University URL. The following statements use the FILENAME statement to point to the URL:
/* read data over the internet from a URL */ filename rawurl url "http://www.cs.princeton.edu/introcs/data/pi-10million.txt" /* proxy='http://yourproxy.company.com:80' */ ; data PiDigits; infile rawurl lrecl=10000000; input Digit 1. @@; Position = _n_; Diff = dif(digit); /* compute difference between adjacent digits */ run; proc print data=PiDigits(obs=9); var Digit; run;
The PiDigits data set contains 10 million rows. The call to PROC PRINT displays the first few decimal digits, which are (skipping the 3) 141592653....
For other ways to use SAS to download data from the internet, search Chris Hemedinger's blog, The SAS Dummy for "PROC HTTP" and you will find several examples of how to download data from a URL.
The distribution of digits of pi
You can run many statistical tests on these numbers. It is conjectured that the digits of pi are randomly uniformly distributed in the sense that the digits 0 through 9 appear equally often, as do pairs of digits, trios of digits, and so forth.
You can call PROC FREQ to compute the frequency distribution of the first 10 million digits of pi and to test whether the digits appear to be uniformly distributed:
/* Are the digits 0-9 equally distributed? */ proc freq data = PiDigits; tables Digit/ chisq out=DigitFreq; run;
The frequency analysis of the first 10 million digits shows that each digit appears about one million times. A chi-square test indicates that the digits appear to be uniformly distributed. If you turn on ODS graphics, PROC FREQ also produces a deviation plot that shows that the deviations from uniformity are tiny.
A "pi chart" of the distribution of the digits of pi
As an advocate of the #OneLessPie Chart Initiative, I am intellectually opposed to creating pie charts. However, I am going to grit my teeth and make an exception for this super-special Pi Day. You can use the Graph Template Language (GTL) to create a pie chart. Even simpler, Sanjay Matange has written a SAS macro that creates a pie chart with minimal effort. The following DATA step create a percentage variable and then calls Sanjay's macro:
data DigitFreq; set DigitFreq; Pct = Percent/100; format Pct PERCENT8.2; run; /* macro from http://blogs.sas.com/content/graphicallyspeaking/2012/08/26/how-about-some-pie/ */ %GTLPieChartMacro(data=DigitFreq, category=Digit, response=Pct, title=Distribution of First 10 Million Digits of Pi, DataSkin=NONE);
The pie chart appears at the top of this article. It shows that the digits 0 through 9 are equally distributed.
Any autocorrelation in the sequence?
In the DATA step that read the digits of pi, I calculated the difference between adjacent digits. You can use the SGPLOT procedure to create a histogram that shows the distribution of this quantity:
proc sgplot data=PiDigits(obs=1000000); vbar Diff; run;
That's a pretty cool triangular distribution! I won't bore you with mathematical details, but this shape arises when you examine the difference between two independent discrete uniform random variables, which suggests that the even digits of pi are independent of the odd digits of pi.
In fact, more is true. You can run a formal test to check for autocorrelation in the sequence of numbers. The Durbin-Watson statistic, which is available in PROC REG and PROC AUTOREG, has a value near 2 if a series of values has no autocorrelation. The following call to PROC AUTOREG requests the Durbin-Watson statistic for first-order through fifth-order autocorrelation for the first one million digits of pi. The results show that there is no detectable autocorrelation through fifth order. To the Durban-Watson test, the digits of pi are indistinguishable from a random sequence:
proc autoreg data=PiDigits(obs=1000000); model Digit = / dw=5 dwprob; run;
Are the digits of pi random?
Researchers have run dozens of statistical tests for randomness on the digits of pi. They all reach the same conclusion. Statistically speaking, the digits of pi seems to be the realization of a process that spits out digits uniformly at random.
Nevertheless, mathematicians have not yet been able to prove that the digits of pi are random. One of the leading researchers in the quest commented that if they are random then you can find in the sequence (appropriately converted into letters) the "entire works of Shakespeare" or any other message that you can imagine (Bailey and Borwein, 2013). For example, if I assign numeric values to the letters of "Pi Day" (P=16, I=9, D=4, A=1, Y=25), then the sequence "1694125" should appear somewhere in the decimal expansion of pi. I wrote a SAS program to search the decimal expansion of pi for the seven-digit "Pi Day" sequence. Here's what I found:
proc print noobs data=PiDigits(firstobs=4686485 obs=4686491); var Position Digit; run;
There it is! The numeric representation of "Pi Day" appears near the 4.7 millionth decimal place of pi. Other "messages" might not appear in the first 10 million digits, but this one did. Finding Shakespearian sonnets and plays will probably require computing more digits of pi than the current world record.
The digits of pi pass every test for randomness, yet pi is a precise mathematical value that describes the relationship between the circumference of a circle and its diameter. This dichotomy between "very random" and "very structured" is fascinating! Happy Pi Day to everyone!