Munich votes for new mayor – First run-off election in 36 years

Elections are a beloved but controversial topic worldwide. Feelings are often strong, debates intensify on a daily basis, and positions become polarized. Elections for presidents or parliaments get a lot of attention from both news media and citizens. But I believe local elections can be fascinating as well. That’s why I invite you to take a deeper look into the current elections for a new mayor in Munich, Germany.

Munich was voted the world’s most livable city in 2007 and 2010. Munich is the green city of the Oktoberfest, of beer, of Bavarian veal sausage and of coziness! So what? you might say.

It’s the end of an era: Munich’s mayor for the past 20 years has been the social democrat Christian Ude, who was not allowed to run for office again because of his age. Twelve (!) politicians were hoping they would win the opportunity to replace Ude on March 16. But nobody won a majority of the votes. On March 30, we will have a historic vote: the first run-off election in 36 years.

But which parties have already had the opportunity to lead Bavaria’s main city for a six-year term as mayor? The graph below shows you the winners since 1952.

Voting results since 1952

Figure 1 – JMP Graph Builder stacked diagram: Voting results (in percent) of the larger (top) and smaller (middle) parties, as well as turnout (in percent), for all mayoral elections since 1952.

Recalling the past

Let’s take a trip down memory lane with Figure 1: Just twice in post-war history has Munich gone “black,” the color of the conservative party CSU (Christian Social Union). Right after the war, Karl Scharnagl of the CSU was installed by the American occupying power as mayor in 1947. Then in 1948, the city council voted for the “red” Thomas Wimmer of SPD (Social Democratic Party) as the new mayor. Since then, the “red” party has led Munich almost continuously. Only once, in 1978, did Erich Kiesl of the CSU benefit from the SPD’s change of candidate and win in a run-off election. But this lasted for only one legislative period.

The era of Christian Ude started in 1993, and he managed to gain more and more votes over the years despite decreasing turnout. In 2008, he won 66 percent of the vote. Since then, some things have changed. The “black” party has regained some strength in both Germany’s and Bavaria’s elections. Although Munich's citizens still voted for Ude, they became increasingly dissatisfied. Munich has been growing, causing more housing, traffic, child care and education problems. Now 36 years after the last CSU mayor in Munich, that party has a chance to win again.

There were two TV debates before the election on March 16, in which the four main candidates participated: Dieter Reiter (SPD), Josef Schmid (CSU), Sabine Nallinger (Alliance '90/The Greens (GRUENE)) and Michael Mattar (FDP, Free Democratic Party). (Eight other parties also sent their candidates, although they have very little chance of winning the election.) All candidates said they would make it a high priority to work on the issues of growth, housing, traffic, child care and education. So their messages were very similar. Everyone felt it was a time for a change and that Reiter was unlikely to repeat Ude’s results.

And if we look at historic data, which party other than the CSU would have any chance of winning at all? The graph below illustrates the answer to this.

Multivariate Analysis for votes of the main parties in the City Council in the election 2008

Figure 2 – JMP Multivariate Analysis: Correlation between the three largest parties with mayoral candidates in the 2008 city council election, colored by eight clusters of voting districts with different voting behavior. All districts except those in Cluster 2 show the same correlation trend between SPD and CSU. Districts in Clusters 1 and 4 show almost no correlation between GRUENE and SPD, and districts in Clusters 6, 7 and 8 show almost no correlation between GRUENE and CSU. Where two parties have a moderate or strong relationship, the points follow an increasing or decreasing trend; horizontal, vertical or nearly circular point clouds indicate little or no correlation.

It is difficult to predict the outcome when clustering the districts based on their voting behavior and visualizing the relationships between the parties in a multivariate analysis (see Figure 2). The graph shows that most districts voted for either SPD or CSU, although in the districts in Cluster 2, GRUENE were neck-and-neck with CSU. Supposing that many people from GRUENE voted for Ude to prevent a CSU victory, the “Losing Ude” effect might be a game-changer for Nallinger and GRUENE in other districts also.

The here and now

On March 16, 2014, Munich voted, and for the parties, the results were shocking as well as exciting -- especially since the turnout was even lower than in 2008: about 42 percent. Reiter finished with 40.4 percent, a drop of about 26 percentage points from Ude’s 2008 result. It was expected that he would lose votes, but it was surprising how much he lost. Schmid received 36.7 percent, 12.3 percentage points more than his 2008 result. Because the mayor has to have an absolute majority, Munich must go to the polls again at the end of March for a run-off election between Reiter and Schmid.

Cluster Analysis of voting behaviour in all polling stations for Munich's mayor election 2014

Figure 3 – JMP Cluster Analysis with Dendrogram (left) and Parallel Plot (right): Voting behavior of polling-station voters only in Munich’s mayor election 2014, split into nine clusters. Y-axis: absolute number of votes per district; X-axis: parties.

Although Nallinger is not a real competitor, it would be grossly negligent not to note Nallinger's valiant fight that achieved 14.7 percent of the vote. This is a much bigger surprise than the neck-and-neck race between Reiter and Schmid. If you take a deeper look into the city districts (Figure 3), you can see a much more differentiated scenario than in 2008, where SPD was almost always on top. In 2014, there are many more districts where CSU is ahead of or in the same range as SPD (clusters 3, 4, 5, 7, 8 and 9), although some districts are without doubt still led by SPD (clusters 1, 2 and 6). However, many more districts are also a close race between CSU and GRUENE for second place (clusters 1 and 2).

Looking at the overall win-loss results, you can see that GRUENE won over many citizens, especially non-voters, as did CSU. In contrast, SPD couldn’t replicate their 2008 results in 2014 and lost support, both to non-voters and their competitors CSU and GRUENE (Figure 4). Out of 150,000 lost SPD voters, almost 96,000 didn’t vote at all.

Bar Charts: Win-Loss analysis Munich's mayor elections 2014 for parties

Figure 4 – JMP Graph Builder: Bar charts of win-loss of votes by party for Munich’s mayor election 2014 (left), voting proportion of base, swing and non-voters (right): From the people who voted for CSU in 2014, 69 percent also did that in 2008, 20.9 percent voted in 2008 for a mayor of a different party, and 10.1 percent were non-voters in 2008.

Nallinger’s mayoral race enriched her result as well as her party’s result for the city council. The Greens achieved more than 15 percent (+2.3 percent). However, the big surprise is that the biggest party has changed from SPD (31.4 percent, minus 8.3 percent) to CSU (35.2 percent, plus 7.5 percent). Now it’s clear that the Ude advantage is gone.


On March 30, the new mayor will finally be selected in a run-off election. Dieter Reiter is less than 6 percent ahead of Josef Schmid. Of course, the Conservatives are hoping their results will give them the win. I’m skeptical. Based on historic political closeness and alliance, the SPD probably can count on the supporters of GRUENE. Nallinger’s result of almost 15 percent of the vote, if added to Reiter's, could create a majority.

At the same time, taking into account that there is no true majority anymore in the city council for SPD and Alliance '90/The Greens, it may be challenging for a “red” mayor (in case he makes it). He would have to take care of all Munich citizens, including many of those who didn't vote for him. In any case, there will not be another era like Ude’s for SPD.


Extending Capability Animation with an add-in: Interactively exploring the impact of proposed process changes

In quality improvement, it’s common to talk about the "voice of the process" (intrinsic variation in the outputs of an in-control process) and the "voice of the customer" (specification limits that express the range of output values customers will not be unhappy with). A capability analysis compares these two "voices," and summarizes the result via the well-known capability indices Cp and Cpk. These indices are both defined so that larger values correspond to better overall performance. Cp compares the process variation to the width of the spec window, and Cpk is a measure of how well this variation is centered on the target value (mid-way between the spec limits).

When calculating process capability from a normal fit to your data, the Capability Animation feature (available in JMP via Analyze > Quality > Capability or Analyze > Distribution > Capability Analysis) allows you to assess the relationship between Cpk and a proposed shift in the mean or a change in the specification limits.

For example, in Figure 1 we see a normal fit to output from a process that has a standard deviation of 3.51, and for which there is a lower spec limit (LSL) of 70 and an upper spec limit (USL) of 100. Figure 1 also shows the effect of shifting the original mean (72.8, in gray) to a new, higher value (78.12, in blue) that is closer to the target. So if we were (somehow . . . ) able to improve the centering of the process in this way, we could expect the Cpk value to increase from 0.27 to 0.77. Note that in this case, the process variation is assumed to be unchanged, so the value of Cp is unchanged also.
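The numbers in this example can be checked by hand. The short Python sketch below assumes the usual textbook definitions, Cp = (USL - LSL) / (6 x sigma) and Cpk = min(USL - mean, mean - LSL) / (3 x sigma), which this post describes only in words. The function names are illustrative and are not part of JMP.

```python
def cp(sigma, lsl, usl):
    """Width of the spec window relative to six process standard deviations."""
    return (usl - lsl) / (6 * sigma)


def cpk(mean, sigma, lsl, usl):
    """Distance from the mean to the nearer spec limit, in units of 3 sigma."""
    return min(usl - mean, mean - lsl) / (3 * sigma)


# Values quoted in the example: spec limits and process standard deviation.
LSL, USL, SIGMA = 70, 100, 3.51

cpk_before = cpk(72.8, SIGMA, LSL, USL)   # original mean
cpk_after = cpk(78.12, SIGMA, LSL, USL)   # mean shifted toward the target
cp_value = cp(SIGMA, LSL, USL)            # does not depend on the mean

print(f"Cpk before: {cpk_before:.2f}, Cpk after: {cpk_after:.2f}, Cp: {cp_value:.2f}")
```

Because the mean does not appear in the Cp formula, recentering the process changes only Cpk.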

Figure 1: Illustration of the built-in Capability Animation platform

My colleague William Zhou and I developed an add-in that extends this built-in Capability Animation to include the option of exploring how changes in the process variation may affect both Cpk and Cp. This add-in is now available in the JMP File Exchange.

You can install the add-in by double clicking on it, and then use it by selecting Add-Ins > Capability Animation. Figure 2 illustrates that reducing the standard deviation of the process from 3.51 to 1.96 (without shifting the mean) increases the Cpk from 0.27 to 0.48 and Cp from 1.42 to 2.54.

This new add-in allows you to perform what-if analysis to determine how any changes to the process mean, process standard deviation or specification limits are likely to impact the values of Cpk and Cp. This can help you to focus your process improvement efforts in the best way, either by re-engineering the process itself, or by renegotiating requirements with customers to get new spec limits.

Figure 2: Illustration of the Capability Animation add-in


Is your state really the best in college basketball?

As we roll up to another college basketball tournament, there is no shortage of rankings of teams and conferences to help avid fans fill out their brackets. A different kind of ranking that recently caught my eye was “Power Ranking the 50 US States by College Basketball Strength” by Kerry Miller at Bleacher Report.

Miller looked at the past 75 years of college basketball tourney results and ranked the success of all 50 US states.  Some of the rankings made sense, like having a top state ranking for traditional basketball power Kansas.  However, other rankings didn’t make as much sense, such as having low state ranking for Florida, despite recent team tournament successes. So what is really driving these rankings?

To help understand this, we built our own study of historical tournament data available (and used with permission) from Sports Reference. We followed a similar method to the one Miller used, where the majority of the score comes from the tournament performance of basketball schools within each state. Making the tournament was worth one point, getting to the semifinals was worth four points, and then winning the title game was worth 10 points. Finally, we divided this total score by the number of eligible (Division I) teams in each state to provide a weighted score. The logic behind the weight is that some states have more eligible teams that could make the tournament than do other states. While Miller added additional multipliers and even a current component to his final score, our simpler formula returned a similar result that we can now visually explore.
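As a minimal sketch, the scoring rule just described could be written like this in Python. The function names and the example counts are hypothetical, and the sketch assumes the three counts are tallied separately. The description above does not say whether a title also earns the semifinal and appearance points.

```python
# Point values taken from the description above.
APPEARANCE_POINTS = 1
SEMIFINAL_POINTS = 4
TITLE_POINTS = 10


def total_score(appearances, semifinals, titles):
    """Unweighted score for all schools in one state."""
    return (appearances * APPEARANCE_POINTS
            + semifinals * SEMIFINAL_POINTS
            + titles * TITLE_POINTS)


def weighted_score(appearances, semifinals, titles, eligible_teams):
    """Total score divided by the number of eligible Division I teams."""
    return total_score(appearances, semifinals, titles) / eligible_teams


# Hypothetical state: 10 appearances, 2 semifinals, 1 title, 4 eligible teams.
example = weighted_score(appearances=10, semifinals=2, titles=1, eligible_teams=4)
print(example)  # (10 + 8 + 10) / 4 = 7.0
```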

Rather than showing these results in just a list, we can use JMP 11 graphing, labeling and mapping to analyze these results quickly and visually. We first color-coded the map by the state weighted score and labeled each state by its respective rank. Also, we included a second view to zoom in for a clearer picture of the Northeast states.

View 1: State Weighted Score Rank

View 2: State Weighted Score Rank - Northeast Zoom

For fun, we focused on four state comparisons that would interest many of us on the JMP team.

1) North Carolina vs. Kentucky – How can Kentucky at No. 1 (which is dominated by multiple title winner UK and a recent title win at UL) be ranked ahead of North Carolina at No. 6 (with multiple title winners at UNC, NCSU and Duke)?

2) California vs. Nevada – How can Nevada at No. 5 (with only a brief title stretch at UNLV) be ranked ahead of California at No. 11 (with record-setting titles at UCLA)?

3) New York vs. Michigan – How can Michigan at No. 10 (with strong programs at University of Michigan and MSU) be ranked so far ahead of New York at No. 27 (with strong programs in Syracuse & St. John's)?

4) Texas vs. Oklahoma – How can Oklahoma at No. 3 (with occasional basketball tourney appearances by OU and OSU) be ranked ahead of Texas at No. 26 (with frequent tournament participation by UT, Texas A&M, Texas Tech, UTEP, Baylor and Houston)?

View 3: State Comparisons

One possibility is that weighting the score by dividing the total score by the number of eligible teams in each state has a huge impact on the final ranking. Let’s see if a new feature in JMP 11 – geospatial mapping  – can help us see this potential effect.

View 4: State Weighted Scores & Eligible Teams

Immediately, we can see the huge impact that having more eligible teams (as seen by the bigger circles) exerts on the weighted score of the state. In our four state rivalries, the higher-ranked state got at times a sizable edge if it had fewer eligible programs in the state.

A JMP 11 scatterplot helps show the negative slope of the lines between our state comparisons. If we draw axis lines at 12.5 eligible teams and a weighted score of 30.00 to create quadrants, we can see that all our states with a large number of eligible schools fall into the bottom right quadrant of the chart.

View 5: Weighted Scores vs. Eligible Teams

So while the weighting wanted to take into account that some states had more opportunities (eligible programs) to place and win the tourney than other states, putting the number of eligible programs in the denominator of the formula had too much of an impact on the overall score and corresponding ranking of the state.
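To see how strongly the denominator can drive the ranking, here is a small Python sketch with two made-up states. The labels and numbers are hypothetical and are not taken from the tournament data.

```python
# (state label, total score, eligible Division I teams) -- hypothetical values
states = [
    ("State A", 120, 20),  # many programs, high total score
    ("State B", 60, 4),    # few programs, lower total score
]

# Rank by the raw total score, then by the score per eligible team.
by_total = sorted(states, key=lambda s: s[1], reverse=True)
by_weighted = sorted(states, key=lambda s: s[1] / s[2], reverse=True)

top_by_total = by_total[0][0]
top_by_weighted = by_weighted[0][0]

print("Top by total score:   ", top_by_total)
print("Top by weighted score:", top_by_weighted)
```

State A has twice the total score, yet State B comes out on top once the score is divided by the number of eligible teams (15 versus 6 points per team).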

You could argue that the average college basketball fan would rate his or her state’s biggest basketball teams that play in the power conferences (like the North Carolina teams of UNC, NCSU, Duke and Wake Forest in the ACC) as more influential on state basketball supremacy than the weaker teams that play in mid-major or smaller conferences (like Western Carolina, Elon and Appalachian State in the Southern Conference). So perhaps tweaking the formula to show only the raw total score (without any weighting) would be a fairer way to score basketball power. Even if your state has a lot of eligible basketball teams, only a few big teams in the stronger conferences really stand a good chance of earning the higher points for semifinal and title tourney wins.

The map based on the total score rankings (unweighted) gives a very different view of where the top state basketball powers are. Now our previously down-weighted states of California, Texas, New York and North Carolina all finished much higher in the ranking and actually above their comparison states. Looking again at a scatterplot of our comparison states, we can see the magnitude of the differences as these states have moved up to or near the top-right quadrant and reversed the slope against their comparison state. While this may be a very basic way to calculate the rankings (without any weighting or adjusting), it provides a useful view that seems more in line with conventional knowledge and better represents top team performances within the states.

View 6: State Total Score Rankings

View 7: Total Score vs. Eligible Teams

So the debate about the best way to measure the basketball power of a state will continue. However, we can see that it is important to really understand how rankings are constructed and to explore – visually, if possible – whether they are calculated fairly. So enjoy the basketball games, and may your state's teams go far in the tourney this year!


7 things to love about JMP Clinical 5.0

New versions of JMP Clinical and Genomics are available starting today, so I wanted to take the opportunity to give a brief overview of some of the new features you’ll come to enjoy with the new release of JMP Clinical 5.0. Below are seven things to love!

1. Risk-Based Monitoring (RBM). If you’ve been following my posts, features for RBM should come as no surprise. If you need to catch up, you can do so here. This new functionality for RBM was developed using the recommendations of TransCelerate BioPharma, so you can be assured that the current implementation should meet the needs of your company.

2. Fraud Detection. Who doesn’t relish the opportunity to play Sherlock Holmes? New and updated features include:

  • Clustering Subjects Across Study Sites. This analysis can help identify patients who have enrolled at two or more study sites within the same trial, or in multiple studies within the same development program. If basic subject information like initials or birth date is unavailable, this analysis allows you to identify subjects who are overly similar based on pre-dosing data. Zero-in on interesting pairs of subjects with matching gender and race, while allowing for minor differences in age, height and weight.
  • Weekdays and Holidays. Previous functionality identified holidays common to the U.S. and Canada. Though some holidays are celebrated globally, new features allow the user to define custom holidays or events (such as severe weather events) that may interrupt normal business activity at clinical sites, taking into account the country of the sites.
  • Perfect Scheduled Attendance. A new screening feature helps identify the particular site-visit combinations that appear unusual. You can identify sites where the patient attendance appears too good to be true or sites with severe scheduling delays.

Figure 1. Digit Preference Volcano Plot

  • Digit Preference. This analysis helps identify any differences in the distribution of the trailing digit between sites for all Findings domains (Figure 1). A screening feature helps identify the particular site-visit-test combinations that appear unusual. This can help identify sites that may tend to round analysis values, improperly conducted procedures (e.g., taking blood pressure manually in lieu of using an automated blood pressure cuff), improperly calibrated equipment, or important differences in subjective measurements (such as reporting clinical signs using a Likert scale), which could suggest that additional training is needed.
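As a rough illustration of the idea behind a digit preference check (not JMP Clinical's actual algorithm, which is not described here), the Python sketch below compares the trailing-digit counts at one site with those at all other sites using a Pearson chi-square statistic. The recorded values and function names are hypothetical.

```python
from collections import Counter


def digit_counts(values):
    """Count trailing digits 0-9; values are kept as recorded strings."""
    counts = Counter(v.strip()[-1] for v in values)
    return [counts.get(str(d), 0) for d in range(10)]


def chi_square_vs_rest(site_counts, rest_counts):
    """Pearson chi-square statistic for a 2 x 10 table (site vs. all others)."""
    total_site, total_rest = sum(site_counts), sum(rest_counts)
    grand = total_site + total_rest
    stat = 0.0
    for s, r in zip(site_counts, rest_counts):
        column = s + r
        if column == 0:
            continue  # digit never observed; contributes nothing
        for observed, row_total in ((s, total_site), (r, total_rest)):
            expected = row_total * column / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical systolic blood pressure readings.
rounded_site = ["120", "130", "110", "140", "120", "130"]  # always ends in 0
other_sites = ["121", "133", "118", "127", "135",
               "142", "119", "126", "134", "120"]

stat = chi_square_vs_rest(digit_counts(rounded_site), digit_counts(other_sites))
print(f"chi-square statistic: {stat:.2f}")
```

A large statistic suggests the site's trailing digits are distributed differently from everyone else's, which is the kind of signal that would prompt a closer look.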

Figure 2. Hierarchy of Biliary Disorders SMQs

3. Standardised MedDRA Queries (SMQs). Using your current version of the MedDRA dictionary, JMP Clinical identifies occurrences of SMQs using broad, narrow or algorithm criteria. It will summarize findings in histograms and diagrams (Figure 2), and conduct incidence analyses. Further functionality allows the user to identify which preferred or lower-level terms contributed to the SMQs.

4. Predictive Modeling. A Predictive Modeling Review (Figure 3) feature enables the analyst to quickly drag and drop models (with the ability to tune various options) to define a set of predictive models for testing. With cross-validation and learning curve techniques, users can easily identify the most useful form of the predictive model, identify important covariates and limit problems due to overfitting.

Figure 3. Predictive Modeling Review Builder

5. Subgroup Identification. A new subgroup analysis menu enables users to identify subgroups with enhanced treatment response (or excess safety risk) using either the prune-as-you-go interaction tree or Virtual Twins algorithms.

6. Review Builder. Similar to the Predictive Modeling Review, the Review Builder (Figure 4) enables the analyst to quickly drag and drop reports to define a set of analyses (with the ability to tune various options) that can be run in rapid succession each time the study database is updated. You can easily apply these reviews to other studies or modify them to address any changes required due to design, endpoints or options.

Figure 4. Clinical Review Builder

7. Patient Profiles. Due to popular demand, we have added a tabular display to our patient profiles. Users can customize which columns are summarized, the sort order of the rows, and save these tables to PDF or RTF reports.

As you can see, we’ve been busy here at JMP Life Sciences! We've been working on ways to help you understand and reduce safety and quality risks in your clinical trials, more easily predict important safety and efficacy outcomes and generate clinical reviews, and identify subgroups that may potentially be of greater interest. You can expect a deeper dive for many of these features in the weeks to come.


Fitting distributions in JMP

Most statistical procedures benefit from understanding the underlying population distribution, or at the very least offering reassurance that our assumptions about those distributions are valid. JMP provides a number of ways to easily explore and investigate distributional assumptions.

In this post, I provide information on fitting continuous or discrete distributions in the JMP Distribution platform. You can also fit, evaluate and model a wide variety of distributions in the Reliability and Survival > Life Distribution platform.

Fitting One Continuous Distribution

  1. From an open JMP data table, select Analyze > Distribution. For this example, I use Hollywood from the JMP Sample Data Directory.
  2. Select one or more continuous variables from Select Columns, click Y, Columns, and then click OK.
  3. Select Continuous Fit from the red triangle for the variable and select a distribution (LogNormal was selected in the example below).
  4. In the resulting fitted distribution output, click on the red triangle and select Goodness of Fit (shown) or Diagnostic Plot to assess the fit of the distribution.

Here, the small p-value and the note provided indicate that the underlying distribution is not LogNormal.

Fitting All Continuous Distributions

To automate the process of fitting and evaluating different continuous distributions, select Continuous Fit, and then All from the red triangle for the variable. JMP will compare available continuous distributions, and will select and fit the best distribution (the distribution with the lowest AICc value). The check boxes under Compare Distributions allow you to explore the fits of the different distributions.
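To make the AICc comparison concrete, here is a minimal Python sketch that fits a Normal and a LogNormal distribution by maximum likelihood and picks the one with the lower AICc. It assumes the standard formula AICc = -2 logL + 2k + 2k(k + 1) / (n - k - 1); the data values are made up, and the sketch does not reproduce JMP's own fitting code.

```python
import math


def aicc(log_likelihood, k, n):
    """Corrected AIC for k estimated parameters and n observations."""
    return -2 * log_likelihood + 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def normal_loglik(xs):
    """Log-likelihood of a normal fit evaluated at its ML estimates."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # ML (not unbiased) variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(xs):
    """Normal log-likelihood of log(x), plus the change-of-variable term."""
    logs = [math.log(x) for x in xs]
    return normal_loglik(logs) - sum(logs)


# Hypothetical, right-skewed positive data.
data = [1.2, 0.8, 2.5, 1.1, 9.7, 0.6, 3.3, 1.9, 0.9, 15.2, 2.1, 1.4]
n = len(data)

scores = {
    "Normal": aicc(normal_loglik(data), k=2, n=n),
    "LogNormal": aicc(lognormal_loglik(data), k=2, n=n),
}
best = min(scores, key=scores.get)  # lowest AICc wins
print(best, scores)
```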

Fitting Discrete Distributions

If the continuous variable contains discrete values, four discrete distributions are available under Discrete Fit.




Note: For more details on fitting continuous or discrete distributions, search for Fit Distributions in the JMP Help or in the book Basic Analysis (under Help > Books).

See More in the Learning Library

Fitting Distributions is one of the many topics covered in the Learning Library. To download or view one-page guides, tutorials, short videos and other resources, visit


Making #onelesspie with JMP

It's wonderful to see the #onelesspie effort gathering interest, especially on Twitter. For my introductory post yesterday, I wanted to focus on encouraging people to improve pie charts everywhere. In this post, I want to show you how I remade the pie chart in JMP.

Here's the original:

The first step is to get the data, and fortunately the data is contained in the wiki page as a table. Plus, there is a link to the source so I could verify that it was the latest and accurate. JMP’s Internet Open menu command is great at reading Wikipedia tables, making the whole data acquisition step quick and painless.

Making a bar chart in Graph Builder was quite straightforward, but there were three tweaks that are not so obvious. Can you spot them?


1) Showing millions of users instead of raw user counts on the axis. For that, I created a formula column that was just

2) Ordering the bars. For ordering, I used the existing rank column, though it would have been easy enough to add such a column. I dragged it into the Merge/Order hot-spot, which is just inside the axis. It’s highlighted in blue when dragging.

After that, I right-clicked to change ascending to descending to get Others at the bottom.

3) Graying the bar for "Others." In Graph Builder, the bar color can be determined by a data column, so I created a new column called "other" with values of "y" for the "Others" row and "n" for all remaining rows. Then I used that column in the Color role to give that bar a separate color.

I right-clicked on the legend to change red to gray, and I excluded the legend from the final picture (so the actual values didn't matter).
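For readers who prepare their data outside JMP, the three tweaks amount to a few simple column operations. The Python sketch below is only an analogue of the Graph Builder steps. The rows, column names and user counts are hypothetical, and dividing by one million is my assumption about the formula column.

```python
# Hypothetical rows standing in for the Wikipedia table.
rows = [
    {"rank": 2, "language": "Language B", "users": 250_000_000},
    {"rank": 3, "language": "Others", "users": 100_000_000},
    {"rank": 1, "language": "Language A", "users": 500_000_000},
]

for row in rows:
    # Tweak 1: express the user counts in millions for the axis.
    row["users_millions"] = row["users"] / 1_000_000
    # Tweak 3: flag the "Others" row so it can get its own (gray) color.
    row["other"] = "y" if row["language"] == "Others" else "n"

# Tweak 2: order the bars by rank, which leaves "Others" last.
ordered = sorted(rows, key=lambda r: r["rank"])

for row in ordered:
    print(row["language"], row["users_millions"], row["other"])
```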

After some final adjustments for the axis text and sizing, I saved the image as an SVG file and verified it in a Web browser.


Using JMP to respond to workshop submissions en masse

Greetings, everyone. Sorry for the extra-long blogging hiatus. I have recently been wandering the desert in a self-imposed social media exile (well, mostly) due to some other writing responsibilities. If you’ve been upset over the lack of posts on JMP Clinical or statistics, let me just say that my absence isn’t because of you… it’s all me.

As you all know, JMP has many wonderful features for visualizing your data. However, today I am going to talk about a recent application of JMP Scripting Language (JSL) that helped me write and send emails en masse using the JSL Mail function.

To give a bit of background, I am part of a conference steering committee. Part of our responsibilities involves responding to participants as to whether or not their proposal was accepted for the upcoming workshop. Naturally, this could involve some tedious effort in writing and submitting email, especially if the goal is to customize the email message in such a way as to be most informative for each organizer. Faced with this challenge, I wrote a JSL script to respond to individuals whose proposals were accepted. An artificial example based on dystopian literature is shown in Figure 1. You may need to click on the image to read it.

Figure 1. Artificial Example for the Science is Awesome Workshop

Faced with a rather well-defined structure, the JSL script cycles through each line in the data table to build and customize email text, attach a PDF file of important timelines and send the message (Figure 2). The Mail function is placed within a Try so that if the message fails to send for any reason (such as an incorrectly defined email address), a note is written to the JMP log. You may notice the j-indexed loop towards the bottom. Currently, Mail supports sending to a single recipient, so it is necessary to loop through the number of organizers (up to two in this case). This is the reason for including all organizers in the body of the email. However, if a proposal had only a single organizer, the below script would work (as it does for entry #2).
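The same loop structure can be sketched in Python for readers who do not use JSL. This is an analogue, not the script from Figure 2: the row layout, the addresses and the `send` callable are hypothetical, the PDF attachment is omitted, and nothing is actually sent.

```python
from email.message import EmailMessage

SENDER = "committee@example.org"  # hypothetical address


def build_body(row):
    """Customize the message text; every organizer is listed in the body."""
    listing = " and ".join(f"{name}, {org}" for name, org, _ in row["organizers"])
    return (
        "Hello, Organizers. We are pleased to inform you that your proposal, "
        f"{row['id']}: {row['title']}, was selected for this year's workshop. "
        f"Organizers in this session include {listing}."
    )


def build_messages(row):
    """One message per organizer, mirroring the single-recipient limitation."""
    messages = []
    for _, _, address in row["organizers"]:
        msg = EmailMessage()
        msg["From"] = SENDER
        msg["To"] = address
        msg["Subject"] = f"Proposal {row['id']} accepted"
        msg.set_content(build_body(row))
        messages.append(msg)
    return messages


def send_all(rows, send):
    """Try each message; collect failures instead of stopping the run."""
    failures = []
    for row in rows:
        for msg in build_messages(row):
            try:
                send(msg)
            except Exception as exc:  # e.g. a badly defined address
                failures.append((str(msg["To"]), str(exc)))
    return failures
```

Passing the sending step in as a function keeps the sketch testable; in practice it could wrap whatever mail client or SMTP call is available.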

Figure 2. Script to Respond Positively for the Science is Awesome Workshop

Here are examples of the final messages that are generated and sent by the JSL script:

Hello, Organizers. Thank you for taking the time to submit a session proposal for the Science is Awesome Workshop. We are pleased to inform you that your proposal, 321723: CHICKIENOB NUBBINS: PRACTICAL PROBLEMS WITH GENETIC ENGINEERING, was selected for this year’s workshop.  If needed, please work with the other organizers to identify speakers for the session. Organizers in this session include Crake Smith, HelthWyzer ( and Jimmy Thickney, OrganInc ( Please select a chair from among the organizers, keeping to the attached rules for Workshop participation. Rules for participation and important timelines are attached. We thank you for your interest in participating in the Science is Awesome Workshop! Sincerely, The Steering Committee

Hello, Organizers. Thank you for taking the time to submit a session proposal for the Science is Awesome Workshop. We are pleased to inform you that your proposal, 231234: THE IMPORTANCE OF VACCINATION PROGRAMS TO HALT THE SPREAD OF DISEASE, was selected for this year’s workshop.  If needed, please work with the other organizers to identify speakers for the session. Organizers in this session include Perdita Verney, University of Windsor ( Session chair is currently Adrian Shelley, Earl of Windsor (  Please notify the chair of proposal approval. Rules for participation and important timelines are attached. We thank you for your interest in participating in the Science is Awesome Workshop! Sincerely, The Steering Committee

Hello, Organizers. Thank you for taking the time to submit a session proposal for the Science is Awesome Workshop. We are pleased to inform you that your proposal, 655321: USING THE LUDOVICO TECHNIQUE TO LOWER THE INCIDENCE OF VIOLENT CRIME, was selected for this year’s workshop.  If needed, please work with the other organizers to identify speakers for the session. Organizers in this session include Miles Branom, Ludovico Inc. ( and William Brodsky, Ludovico Inc. ( Session chair is currently Alex DeLarge, Droog Partners (  Please notify the chair of proposal approval. Rules for participation and important timelines are attached. We thank you for your interest in participating in the Science is Awesome Workshop! Sincerely, The Steering Committee

Unfortunately, due to computer viruses of days past, email programs have gotten a bit protective when other programs try to gain access to send an email (Figure 3). In these instances, you can give JMP access for up to 10 minutes, though this still requires approving each email as it is sent. While some may find this button-pushing step tedious, I found it much preferable to writing and tailoring 30+ emails.

Figure 3. Warning from Microsoft Outlook

As a final note, please be aware that the bitness (32- or 64-bit) of JMP currently has to match that of the email software. Please see the note here as to when this issue may be addressed.


#OneLessPie chart on Pi Day

It's become too easy and common for data visualization practitioners to point to flaws in pie charts and other artless visualizations. Far better is to pair criticism with demonstrated improvements. Kaiser Fung's junkcharts blog is the pioneer in backing words with actions, but there’s nothing stopping the rest of us from making visualization improvements. Let's all use Pi Day as motivation to clean up the data visualization world, one pie chart at a time.

Many critics have written at length against pie charts, so I'll only recap that most criticisms follow from perceptual studies showing we perceive angles and areas less accurately than positions and lengths. I won't go as far as John Tukey, the father of exploratory data analysis, who is often quoted as saying, "There is no data that can be displayed in a pie chart that cannot be displayed better in some other type of chart."

Pie charts have their supporters and can be useful for simple, low-accuracy views or when visually summing adjacent values is important, for instance. However, I think everyone can agree that pie charts fall down in many cases:

  • When there are many levels.
  • When the data doesn't support proportions.
  • When the wedge ordering is random.
  • When distorting effects are added.

The examples below from Wikipedia illustrate these pitfalls.


Questionable Pie Charts from Wikipedia

So what’s your Pi Day action? Look for wayward pie charts in your own work or in a public space like Wikipedia and replace them with better visualizations. Then leave a comment here or tweet with the tag #onelesspie to share your accomplishment. Anyone can edit Wikipedia, but if you're not up for editing, you can still move things forward by posting a comment on the "Talk" page for an entry or even contacting the chart author. In either case, be sure to read and follow the Five Pillars of Wikipedia.

You can use this Google image search as a starting point, but it's best if you can narrow it to your own field of expertise, such as semiconductors or genetics. That’s because you need to take a few minutes to understand the intended message of the chart in support of the text before you can improve it. Often the improvement will be a bar chart, but sometimes a table or removal may be better (Wikipedia even has a barnstar award for those "who remove unnecessary information from images or descriptions").

To get you going, here are the steps from my early start at improving the visualization of content languages for Internet websites. The page Languages used on the Internet contains two pie charts. The first one was this pie chart:

While the chart succeeds in showing that more than half of websites have English content (pie charts are good at comparisons to 50% and 25%), the rest of the chart underperforms. Plus, the percentages for this pie chart don’t add up to 100% because some websites use more than one language.

Fortunately, the data is provided in the Wikipedia page and via a link to the original source. Sometimes, you may have to hunt a little for the data. In this case, I still had to do some data work because the data in the Wikipedia page was out of date.

I made a bar chart of the data and ran into one of the great problems of data visualization: handling multiple magnitudes of scale. In this case, the English usage dwarfs the others. I thought about showing the top non-English languages, but decided the English dominance was an important part of the message and left it in. To keep the emphasis on English being in more than 50% of websites, I added a label for that bar. For presentation, I added a reference line at 5% where the second-tier languages were and gave "Others" a different styling.

Once the chart was completed, I saved it as resolution-independent SVG (though PNG is fine, too) and uploaded it to Wikimedia Commons using the Upload Wizard. In the description, I added a link to the data to make it easier for anyone who improves on my chart. Finally, I updated the referencing Languages used on the Internet page with the new image and new data.

I also left a note on the user page of the original chart author explaining my changes in case the author has objections. And I still need to update a few other pages that use the old images.

The process was not simple. I rarely edit Wikipedia and had never uploaded image files before, so it was a bit slow going. But it's nice to try to move things forward. #onelesspie


JMP & Women's Initiatives Network to award conference registrations

“Know Your Power” is the slogan of the first-ever Women in Statistics conference to be held in Cary, North Carolina, just across the street from SAS headquarters.

The May 15-17 conference is a celebration of women in statistics, and the program promises to highlight achievements and career interests of women who work in – or who plan to work in – statistical fields. Check out the conference website.

The JMP team and WIN, the SAS Women’s Initiatives Network, want to empower three statistics students by helping them attend the Women in Statistics conference. Here’s how you can apply for one of the three free conference registrations.

Write a short essay telling us why you want to attend this conference, what you hope to get out of it and how you plan to use statistics in your career. Submit your entry at

The essays can be no more than 300 words each and need to be submitted before 5:00 p.m. ET April 11, 2014. Essays must be in English and will be judged on the following criteria: how well the author connects personal goals to the conference goals (25 percent); how well the author articulates career plans (25 percent); how inspiring the author's career plans are (25 percent); and how well the essay is written (25 percent). One essay per person, please. No purchase necessary. Void where prohibited. See official rules for complete details.

The contest is open only to students currently enrolled in a degree-granting institution, and both graduate and undergraduate students are welcome to enter. The prize covers conference registration only; travel, meals and any other expenses incurred are the responsibility of the students. Each of the three registrations is valued at $199.

The conference is a great opportunity for students to learn from women who have successfully navigated courses and careers in technical fields. Submit your essay!


A bit on bootstrapping in JMP Pro

Bootstrapping is a popular resampling method for estimating the sampling distribution of a statistic. While the theory behind resampling methods dates back to Sir R.A. Fisher, bootstrap resampling was first proposed by Bradley Efron in the 1970s. Bootstrapping involves repeatedly sampling from a data set, with replacement, in order to form new data sets of the same size. This approximates the process of taking repeated samples from the target population. The end result is a large number (hundreds or thousands) of bootstrapped samples, which can then be used to estimate the standard error and confidence intervals for nearly any chosen statistic.
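To make the resampling idea concrete, here is a minimal Python sketch of the general procedure described above. It is not what JMP Pro runs internally. The function name `bootstrap_statistics`, the number of resamples, the seed, and the sample values are all assumptions made up for illustration.

```python
import random
import statistics


def bootstrap_statistics(data, statistic, n_resamples=2000, seed=0):
    """Compute `statistic` on n_resamples bootstrap samples of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    results = []
    for _ in range(n_resamples):
        # Draw n observations WITH replacement, so each new data set
        # has the same size as the original sample.
        resample = rng.choices(data, k=n)
        results.append(statistic(resample))
    return results


# Hypothetical measurements, invented for this example only.
sample = [4.1, 5.3, 6.0, 4.8, 5.5, 7.2, 5.9, 6.4, 4.4, 5.1]

boot_means = bootstrap_statistics(sample, statistics.mean)

# The spread of the bootstrapped means is the bootstrap estimate
# of the standard error of the sample mean.
bootstrap_se = statistics.stdev(boot_means)
print(f"Bootstrap standard error of the mean: {bootstrap_se:.3f}")
```

Passing a different function, such as `statistics.median`, in place of `statistics.mean` reuses the same loop for another statistic, which is the flexibility that makes the method attractive.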

Bootstrapping is a nonparametric approach, which doesn’t require distributional assumptions or theoretical calculations and is available for even the most complicated estimator. As a result, you can use it to estimate confidence intervals for a wide variety of population parameters. For these same reasons, bootstrapping is also gaining popularity in statistics education, as an intuitive and more direct approach for teaching concepts of inference.

In JMP Pro, bootstrapping is available from nearly all reports. The two exceptions are Time Series and analyses using Restricted Maximum Likelihood (REML).

Bootstrapping in JMP Pro

1. From an analysis platform report window, right-click on the report of interest and select Bootstrap. In this example, I use the Distribution platform and bootstrap the statistics in the Summary Statistics report for a continuous variable.

2. In the Bootstrapping window, click OK. JMP creates a data table (below) with statistics for the original sample in the first row (excluded), followed by one set of statistics for each of the bootstrap samples. The BootID• column identifies the bootstrap sample number.

3. Use the Distribution platform to explore the statistics of interest for the bootstrap samples. Bootstrap percentile confidence intervals for different confidence levels are provided.
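For readers curious about what a bootstrap percentile interval is, the sketch below shows the general idea in Python. It is an illustration under stated assumptions, not a description of how JMP computes its intervals: the function name `percentile_interval`, the truncating rank rule, the seed, and the sample values are all hypothetical, and other percentile interpolation rules would give slightly different endpoints.

```python
import random
import statistics


def percentile_interval(boot_stats, confidence=0.95):
    """Percentile interval: cut equal tails off the sorted bootstrap statistics."""
    ordered = sorted(boot_stats)
    alpha = 1.0 - confidence
    # Simple rank rule (index truncated to an integer); an assumption
    # of this sketch, other interpolation rules exist.
    lower_index = int((alpha / 2) * (len(ordered) - 1))
    upper_index = int((1 - alpha / 2) * (len(ordered) - 1))
    return ordered[lower_index], ordered[upper_index]


# Hypothetical measurements, invented for this example only.
sample = [4.1, 5.3, 6.0, 4.8, 5.5, 7.2, 5.9, 6.4, 4.4, 5.1]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
boot_medians = [
    statistics.median(rng.choices(sample, k=len(sample)))  # resample with replacement
    for _ in range(1000)
]

low_95, high_95 = percentile_interval(boot_medians, confidence=0.95)
low_90, high_90 = percentile_interval(boot_medians, confidence=0.90)
print(f"95% percentile interval for the median: ({low_95}, {high_95})")
print(f"90% percentile interval for the median: ({low_90}, {high_90})")
```

A lower confidence level trims more from each tail, so the 90% interval always sits inside the 95% interval.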

Note: Advanced bootstrap methods are available in the Partition and Neural platforms in JMP Pro. For more details on bootstrapping in JMP Pro, including information on options in the Bootstrapping window (above), search for Bootstrap in the JMP Help or in the book Basic Analysis (under Help > Books).

See More in the Learning Library

Bootstrapping is one of the many topics covered in the JMP Learning Library. To download or view one-page guides, tutorials, short videos and other resources, visit
