Many introductory courses in probability and statistics encourage students to collect and analyze real data. A popular experiment in categorical data analysis is to give students a bag of M&M® candies and ask them to estimate the proportion of colors in the population from the sample data. In some classes, the students are also asked to perform a chi-square analysis to test whether the colors are uniformly distributed or whether the colors match a hypothetical set of proportions.

M&M's® have a long history at SAS. SAS is the world's largest corporate consumer of M&M's. Every Wednesday a SAS employee visits every breakroom on campus and fills two large containers full of M&M's. This article uses SAS software to analyze the classic "distribution of colors" experiment.

### The proportion of colors for M&M's

The "plain" M&M candies (now called "milk chocolate M&M's") are produced by the Mars, Inc. company. The distribution of colors in M&M's has a long and colorful history. The colors and proportions occasionally change, and the distribution is different for peanut and other varieties. A few incidents from my lifetime that made the national news are:

• 1976: Red M&M's are replaced by orange. This was a big story. "Red dye #2" had been discovered to be a carcinogen. Although Mars did not use this dye in their candies, the company changed colors to alleviate customer concerns.
• 1986: Red M&M's are brought back. Orange stays.
• 1995: The tan color is replaced by a more vivid color. In a promotional campaign, the public is asked to vote for the replacement color. Ten million vote; blue wins in a landslide.
• Late 1990s: The M&M web site lists the distribution of colors. Circa 1997, the color distribution was
30% brown, 20% yellow, 20% red, 10% orange, 10% green, and 10% blue.
Statisticians and educators rejoice and publish many papers on the topic.
• 2008: Mars changes the color distribution to
24% blue, 20% orange, 16% green, 14% yellow, 13% red, 13% brown.
Some time later, the proportions were removed from the web site and have not been restored.
• 2017: What is the current distribution of colors? Read on for an interesting story!

### Proportions and chi-square test

The breakroom containers at SAS are filled from two-pound bags. So as to not steal all the M&M's in the breakroom, I conducted this experiment over many weeks in late 2016 and early 2017, taking one scoop of M&M's each week. The following data set contains the cumulative counts for each of the six colors in a sample of size N = 712:

```
data MandMs;
input Color $7. Count;   /* Color occupies columns 1-7; Count follows */
datalines;
Red    108
Orange 133
Yellow 103
Green  139
Blue   133
Brown  96
;
```

A bar chart that shows the observed distribution of colors in M&M's is shown at the top of this article.

To estimate the proportion of colors in the population, simply divide each count by the total sample size, or use the FREQ procedure in SAS. PROC FREQ also enables you to run a chi-square test that compares the sample counts to the expected counts under a specified distribution. The most recent published distribution is from 2008, so let's test those proportions:

```
proc freq data = MandMs order=data;
   weight Count;
   tables Color / nocum chisq
      /* 2008 proportions:  red  orange yellow green blue brown */
                    testp=(0.13 0.20   0.14   0.16  0.24 0.13);
run;
```

The observed and expected proportions are shown in the table to the right. The chi-square test rejects the null hypothesis at the α = 0.05 significance level (95% confidence). In other words, the distribution of colors for M&M's in this 2017 sample does NOT appear to be the same as the color distribution from 2008! You can see this visually from the bar chart: the red and green bars are too tall and the blue bar is too short compared with the expected values.
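If you want to see what the chi-square test computes, you can reproduce the statistic in a DATA step. The following is a minimal sketch, not part of the original program: it assumes the `MandMs` data set defined above, and the data set name `ChiSqByHand`, the macro variable `N`, and the variable names are chosen only for illustration.

```sas
/* Total sample size, stored in a macro variable (illustrative name) */
proc sql noprint;
   select sum(Count) into :N from MandMs;
quit;

data ChiSqByHand;
   /* 2008 proportions, in data order: red orange yellow green blue brown */
   array p0[6] _temporary_ (0.13 0.20 0.14 0.16 0.24 0.13);
   set MandMs end=eof;
   SampleProp = Count / &N;                /* estimated proportion */
   Expected   = &N * p0[_N_];              /* expected count under the 2008 mix */
   Term       = (Count - Expected)**2 / Expected;
   ChiSq + Term;                           /* sum statement keeps a running total */
   if eof then do;
      DF     = _N_ - 1;                    /* six categories give 5 degrees of freedom */
      PValue = sdf('CHISQUARE', ChiSq, DF);   /* upper-tail probability */
      put ChiSq= DF= PValue=;              /* write the result to the log */
   end;
run;
```

By my arithmetic, the statistic for these counts is about 17.35 on 5 degrees of freedom, which exceeds the 0.05 critical value of about 11.07. The green and blue terms contribute most of the total.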

You need a large sample to be confident that this empirical deviation is real. After collecting data for a few weeks, I did a preliminary analysis that analyzed about 300 candies. With that smaller sample, the difference between the observed and expected proportions could be attributed to sampling variability and so the chi-square test did not reject the null hypothesis. However, while running that test I noticed that the green and blue colors accounted for the majority of the difference between the observed and theoretical proportions, so I decided to collect more data.

### Simultaneous confidence intervals for the M&M proportions

As I explained in a previous article, you can use the sample proportions to construct simultaneous confidence intervals for the population proportions. The following SAS/IML statements load and call the functions from the previous post:

```
%include "conint.sas";   /* define the MultCI and MultCIPrint modules */
proc iml;
load module=(MultCI MultCIPrint);
use MandMs; read all var {"Color" "Count"}; close;

alpha = 0.05;
call MultCIPrint(Color, Count, alpha, 2); /* construct CIs using Goodman's method */
```

The table indicates that the published 2008 proportion for blue (0.24) is far outside the 95% confidence interval, and the proportion for green (0.16) is just barely inside its interval. That by itself does not prove that the 2008 proportions are no longer valid (we might have gotten unlucky during sampling), but combined with the earlier chi-square test, it seems unlikely that the 2008 proportions are applicable to these data.
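If you do not have the `MultCI` modules, you can approximate the computation in a DATA step. The sketch below uses Goodman's (1965) formula as it is commonly stated, with a Bonferroni-adjusted chi-square quantile on 1 degree of freedom. It is my reading of the method, not the code from the previous article, so treat agreement with the `MultCIPrint` output as an assumption. The data set name `GoodmanCI` and the macro variables `N` and `k` are illustrative.

```sas
/* Total sample size and number of categories (illustrative macro names) */
proc sql noprint;
   select sum(Count), count(*) into :N, :k from MandMs;
quit;

data GoodmanCI;
   set MandMs;
   alpha = 0.05;
   /* Bonferroni-adjusted chi-square quantile with 1 degree of freedom */
   B = quantile('CHISQUARE', 1 - alpha/&k, 1);
   Proportion = Count / &N;
   /* Square-root term shared by the lower and upper limits */
   Root  = sqrt( B * (B + 4*Count*(&N - Count)/&N) );
   Lower = (B + 2*Count - Root) / (2*(&N + B));
   Upper = (B + 2*Count + Root) / (2*(&N + B));
   keep Color Count Proportion Lower Upper;
run;

proc print data=GoodmanCI noobs;
run;
```

The intervals are wider than six separate 95% intervals would be, because the quantile uses alpha/k rather than alpha. That is what makes the coverage simultaneous.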

### Calling the experts

The published 2008 proportions for green and blue do not seem to match the sample proportions. For this large sample, the published proportion of blue is too large whereas the published proportion of green is too small.

From reading previous articles, I know that the Customer Care team at M&M/Mars is very friendly and responsive. Apparently they get asked about the distribution of colors quite often, so I sent them a note. The next day they sent a breakdown of the colors for all M&M candies.

Interestingly, plain (and peanut) M&M's are now produced at two different factories in the US, and the factories do not use the same mixture of colors! You need to look on the packaging for the manufacturing code, which is usually stamped inside a rectangle. In the middle of the code will be the letters HKP or CLV. For example, the code might read 632GCLV20.

• CLV: The Cleveland plant uses the following proportion of colors for plain M&M's:
Red=0.131, Orange=0.205, Yellow=0.135, Green=0.198, Blue=0.207, and Brown=0.124.
• HKP: The Hackettstown, NJ, plant uses the following proportion of colors for plain M&M's:
Red=0.125, Orange=0.25, Yellow=0.125, Green=0.125, Blue=0.25, and Brown=0.125.

Although I did not know about the manufacturing codes when I collected the data, I think it is clear that the bulk of my data came from the CLV plant. You can create a graph that shows the sample proportions, the 95% simultaneous confidence intervals, and vertical hash marks to indicate the CLV population parameters.

The graph shows that the observed proportions are close to the proportions from the CLV plant. All proportions are well within the 95% simultaneous confidence intervals from the data. If you rerun the PROC FREQ chi-square analysis with the CLV proportions, the test does not reject the null hypothesis.
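For reference, the rerun might look like the following sketch. It reuses the earlier PROC FREQ call and the `MandMs` data set, and substitutes the CLV proportions quoted above for the TESTP= values.

```sas
proc freq data=MandMs order=data;
   weight Count;
   tables Color / nocum chisq
      /* CLV proportions:   red   orange yellow green blue  brown */
                    testp=(0.131 0.205  0.135  0.198 0.207 0.124);
run;
```

By my arithmetic, the chi-square statistic for these counts is about 6.07 on 5 degrees of freedom, which is below the 0.05 critical value of about 11.07. To compare against the Hackettstown mix, replace the TESTP= list with (0.125 0.25 0.125 0.125 0.25 0.125).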

### Summary

The experimental evidence indicates that the colors of plain M&M's in 2017 do not match the proportions that were published in 2008.

After contacting the M&M/Mars Customer Care team, I was sent a new set of proportions for 2017. The color proportions now depend on where the candies were manufactured. My data matches the proportion of colors from the Cleveland plant (manufacturing code CLV).

If you are running this analysis yourself, be sure to record whether your candies came from the HKP or CLV plant. If you want to see my analysis, you can download the complete SAS program that analyzes these data.

Educators who use M&M's to teach probability and statistics need to record the manufacturing plant, but this is still a fun (and yummy!) experiment. What do you think? Do you prefer the almost-equal orange-blue-green distribution from the CLV plant? Or do you like the orange-blue dominance from the HKP plant? Or do you just enjoy the crunchy shell and melt-in-your-mouth goodness, regardless of what colors the candies are?

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

1. Robert Allison on

I plotted some M&M color data a while back, and I found it useful to sort them by most-to-least frequent, and to show both the percent and frequency for each color. Here's a link to my graph, and the code, in case that would be of interest to anyone:

2. Wonderful article - thank you Rick! I now wonder if other kinds of M&Ms like milk chocolate and peanut butter have the same distributions.

• Milk chocolate (="plain") and dark chocolate have the same distributions. Peanut, almond, crispy, and minis have different distributions (although minis are VERY close to the HKP distribution for milk chocolate). The distributions of colors for peanut M&M's differ between the HKP and CLV facilities.

• Do you really care though??????

• I assume you mean "since they taste so good, does it matter what colors they are?" I'm not a psychologist, but I've read that food that looks appealing is perceived to taste better than food that looks unappealing. But I doubt whether 25% blue versus 20% blue would make a significant difference in our enjoyment of these tasty treats.

3. Kyle Wellmaker on

Just for fun I contacted the Mars corporation and asked for their distributions. Here is the response they gave:

"Our color blends were selected by conducting consumer preference tests, which indicate the assortment of colors that pleased the greatest number of people and created the most attractive overall effect.

On average, our mix of colors for M&M'S CHOCOLATE CANDIES is:

M&M'S MILK AND DARK CHOCOLATE HKP: 25% Cyan Blue, 25% Orange, 12.5% Green, 12.5% Bright Yellow, 12.5% Red, 12.5% Brown.
M&M'S MILK AND DARK CHOCOLATE CLV: 20.7% Cyan Blue, 20.5% Orange, 19.8% Green, 13.5% Bright Yellow, 13.1% Red, 12.4% Brown.

Each large production batch is blended to those ratios and mixed thoroughly. However, since the individual packages are filled by weight on high-speed equipment, and not by count, it is possible to have an unusual color distribution."

I like that last sentence because it helps get the students thinking about whether the sample of bags that we got from our local stores is actually a random sample, and it leads to a talk about how the candies are distributed from the factory to the stores and then into our bellies.

4. Thank you, Dr. Wicklin, for this informative blog. I used it to develop an in-class activity for my Introductory Statistics course. The students enjoyed themselves and were exposed to the Goodness of Fit test and simultaneous confidence intervals, a perfect way to wrap up the term.

• Wonderful! Thanks for sharing.