Is that the best (distribution) you've got?

As data analysts, we all try to do the right thing. When there is a choice of statistical distributions to be used for a given application, it’s a natural inclination to try to find the “best” one.

But beware...

Fishing for the best distribution can lead you into a trap. Just because one option appears to be the best doesn’t mean that it’s correct! For example, consider this data set:


What is the best distribution we can use to describe this data? JMP can help us answer this question. From the Distribution platform, we can choose to fit a number of common distributions to the data: Normal, Weibull, Gamma, Exponential, and others. To fit all possible continuous distributions to this data in JMP, go to the red triangle hotspot for this variable in the Distribution report, and choose “Continuous Fit > All”. Here is the result:


JMP has compared 11 potential distributions for this data, and ranked them from best (Gamma) to worst (Exponential). The metric used to perform the ranking is the corrected Akaike Information Criterion (AICc). Lower values of AICc indicate better fit, and so the Gamma distribution is the winner here.
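For reference, here is the textbook form of the ranking statistic, where k is the number of estimated parameters, n the sample size and \hat{L} the maximized likelihood. This follows Burnham and Anderson; we assume, without having checked the JMP documentation, that the Distribution platform uses exactly this form.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}
```

For a two-parameter distribution such as the Normal fitted to 50 observations, the correction term is 2·2·3/47 ≈ 0.26, so it barely moves the ranking. With only 5 observations it grows to 2·2·3/2 = 6, which is why the corrected version matters most for small samples.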

Here’s the catch

This data set was generated by drawing a random sample of size 50 from a population that is normally distributed with a mean of 50 and a standard deviation of 10. The Normal distribution is the correct answer by definition, but our fishing expedition gave us a misleading result.

How often is there a mismatch like this? One way to approach this question is through simulation. I wrote a small JMP script to draw samples of various sizes from a normally distributed population. I investigated sample sizes of 5, 10, 20, 30, 50, 75, 100, 250, and 500 observations; for each of these, I drew 1,000 independent samples and had JMP compute the fit for all possible continuous distributions. Finally, for each sample I recorded the name of the best-fitting distribution, as measured by AICc. (The JSL script is available in the JMP File Exchange.)

The results were quite surprising!


  • Remember, the correct answer in each case is “Normal”. If our fishing expedition were yielding good results across the board, the line for the Normal distribution would be high and flat, hovering near 100%.
  • Instead, the wrong distribution was chosen with disturbing frequency. For sample sizes under 50, the Normal distribution was not even the most commonly chosen. That honor belongs to the Weibull distribution.
  • For a sample size of 5 observations from a Normal distribution, the correct identification was not made a single time out of 1,000 samples.
  • If you want to have at least a 50% chance of correctly identifying normally distributed data by this method, you’ll need more than 100 observations!
  • Even at a sample size of 500 observations, the probability that the Normal distribution is correctly called the best is only about 80%.
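The JSL script itself isn't reproduced here, but the shape of the experiment is easy to sketch. The Python below is a stand-in, not a port of that script: it compares only three candidates (Normal, Lognormal and Exponential), chosen because their maximum-likelihood fits have closed forms, instead of the 11 that JMP fits, so its percentages will not match the results above. The function names, the seed and the three-candidate set are our assumptions.

```python
import math
import random


def aicc(loglik, k, n):
    """Corrected AIC: -2 log L + 2k plus the small-sample correction term."""
    return -2.0 * loglik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def fit_normal(xs):
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE variance divides by n
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return aicc(loglik, 2, n)


def fit_lognormal(xs):
    n = len(xs)
    logs = [math.log(x) for x in xs]
    m = sum(logs) / n
    s2 = sum((v - m) ** 2 for v in logs) / n
    # Normal log-likelihood of log(x), plus the Jacobian term -sum(log x)
    loglik = -sum(logs) - 0.5 * n * (math.log(2 * math.pi * s2) + 1)
    return aicc(loglik, 2, n)


def fit_exponential(xs):
    n = len(xs)
    mean = sum(xs) / n
    loglik = -n * math.log(mean) - n  # MLE rate is 1 / mean
    return aicc(loglik, 1, n)


def best_fit(xs):
    """Name of the candidate with the smallest AICc."""
    scores = {"Normal": fit_normal(xs)}
    if min(xs) > 0:  # Lognormal and Exponential need strictly positive data
        scores["Lognormal"] = fit_lognormal(xs)
        scores["Exponential"] = fit_exponential(xs)
    return min(scores, key=scores.get)


def simulate(sizes, reps=1000, mu=50.0, sigma=10.0, seed=1):
    """Share of normal samples, per sample size, for which Normal wins."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        wins = sum(
            best_fit([rng.gauss(mu, sigma) for _ in range(n)]) == "Normal"
            for _ in range(reps)
        )
        results[n] = wins / reps
    return results


if __name__ == "__main__":
    for n, share in simulate([5, 10, 20, 30, 50, 75, 100, 250, 500]).items():
        print(f"n = {n:3d}: Normal chosen in {share:.1%} of samples")
```

Because a sample that Normal wins against the full field is also one it wins against any subset, Normal's win rate in this reduced contest should be at least as high as in a run that also includes Gamma, Weibull and the rest. Adding those would require a numerical optimizer.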

The moral of the story

When comparing the fit of different distributions to a data set, don’t assume that the distribution with the smallest AICc is the correct one. What counts is the difference between AICc values. A rule of thumb (used elsewhere in JMP) is that models whose AICc values are within 10 units of the “best” one are roughly equivalent.* In our first example above, the Gamma distribution is nominally the best, but its AICc is only 0.2 units lower than that of the Normal distribution. There is no good statistical evidence to choose the Gamma over the Normal.
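Applying the rule of thumb amounts to subtracting the smallest AICc from each candidate's value. The sketch below does that and flags every candidate within 10 units; it also computes Akaike weights, a normalized relative likelihood from Burnham and Anderson that the post does not say JMP reports. Only the 0.2-unit gap between Gamma and Normal comes from the example above; the absolute AICc values are hypothetical, and with only two candidates the weights ignore the other nine fits.

```python
import math


def compare_aicc(scores, threshold=10.0):
    """Return (name, delta, akaike_weight, within_threshold) rows sorted by AICc."""
    best = min(scores.values())
    deltas = {name: s - best for name, s in scores.items()}
    # Akaike weights: relative likelihood exp(-delta / 2), normalized to sum to 1
    rel = {name: math.exp(-d / 2) for name, d in deltas.items()}
    total = sum(rel.values())
    rows = [
        (name, deltas[name], rel[name] / total, deltas[name] <= threshold)
        for name in scores
    ]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical AICc values; only their 0.2-unit difference comes from the post.
example = {"Gamma": 371.0, "Normal": 371.2}
for name, delta, weight, close in compare_aicc(example):
    print(f"{name:8s} delta={delta:4.1f} weight={weight:.3f} within 10: {close}")
```

Both candidates fall inside the 10-unit band, and the weights split roughly 52.5% to 47.5%, which is another way of saying the data cannot tell the two apart.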

More generally, as a best practice it is wise to consider only distributions that make sense in the context of the problem. Your own knowledge and expertise are usually the best guides. Don’t choose an exotic distribution that has a slightly better fit over one that makes sense and has a proven track record in your field of work.

*This rule is used to compare models built in the Generalized Regression personality of the Fit Model platform in JMP Pro. See Burnham, K.P. and Anderson, D.R. (2002), Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer, New York.


Entertaining thing-explaining with Randall Munroe

Randall Munroe, author of the popular xkcd comic, delivered a fantastic closing plenary, Complicated Stuff in Simple Words, at JMP Discovery Summit last month. The talk, based on his very popular second book, Thing Explainer: Complicated Stuff in Simple Words, was hugely entertaining, and we are sharing it as this month’s episode of our web series Analytically Speaking.

Because Randall’s talk was followed by the book signing for his newest book, many didn’t submit feedback on his talk in the JMP Discovery Summit app (probably so they could quickly get in line for the book). I took the comments submitted and used the new Text Explorer platform in JMP 13 to show you the very positive terms from the comments and how the powerful regex handles all those enthusiastic exclamation points!!!!

Below left, you see the most popular terms and phrases listed. And at right, you can see the regular expression editor with default tokenizing options highlighted in different colors under the Word Separator List. These settings can be further customized, but for this simple example, we see that Randall Munroe (using simple words) evoked very positive and enthusiastic comments. For more on text exploration, check out these previous posts.


The "Bang!!! cleaner" nicely handles a sequence of exclamation points for this text exploration.
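To see why a rule like this matters, consider what a naive tokenizer does with enthusiastic punctuation. The Python sketch below is a hypothetical illustration of the idea, not JMP's Text Explorer tokenizer or its actual “Bang!!! cleaner” rule: splitting on whitespace alone turns “awesome!!!!” and “awesome!” into different terms, while treating each run of exclamation points as a word separator merges them. The comments are made up for the example.

```python
import re
from collections import Counter

# Hypothetical sketch of the idea only; this is not JMP's Text Explorer tokenizer.
BANG_RUN = re.compile(r"!+")  # one or more exclamation points in a row


def naive_terms(comment):
    # Split on whitespace only: "awesome!!!!" and "awesome!" stay distinct terms.
    return comment.lower().split()


def cleaned_terms(comment):
    # Replace each run of "!" with a space so it acts as a word separator.
    return BANG_RUN.sub(" ", comment.lower()).split()


comments = ["Awesome!!!! So funny", "awesome! Loved it!!!", "So funny!!"]
print(Counter(t for c in comments for t in naive_terms(c)).most_common(3))
print(Counter(t for c in comments for t in cleaned_terms(c)).most_common(3))
```

With the cleaning step, “awesome”, “so” and “funny” each count twice; without it, the exclamation points scatter those counts across several spellings.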

During the book signing, Randall asked what JMP users do at their organizations. Upon hearing a few answers about some of the “complicated stuff” JMP users do, he actually flipped to a place in his book where he “explained” what they did!


Many attendees were looking forward to Randall's talk and book signing, and had planned ahead by bringing copies of his other books to be autographed.

At the close of his talk, I tweeted, “Awesome keynote by Randall Munroe @ #jmpDiscoverySummit. His curiosity is contagious.” We hope you will tune in Oct. 12 for some entertaining thing-explaining! Or, you can watch the archive along with other episodes of Analytically Speaking.


13 things to know about JMP documentation

Writing JMP documentation is a team effort. Susan Conaghan, Michael Crotty, Colleen McKendry, Karen Copeland and I work with JMP developers, technical support, and other subject matter experts to provide you with help so that you can quickly create and interpret your JMP reports and graphs.

There’s so much to know about JMP help. I’ve whittled down the list to 13 things to celebrate JMP 13 and Triskaidekaphilia (love of the number 13).

1. Visit Google to find help with JMP.
Precede a Google search term with “JMP”, and you’ll likely find the help that you need. A Google search leads you to the same help that’s installed with JMP.

2. Search all books in one PDF file.
The Help > Books menu provides links to JMP documentation in PDF files. Open the JMP Documentation Library PDF file to search all books at once.

3. JMP 13 has a new help system!
The help system in JMP 13 has been completely redesigned with a more modern interface and browser-based features. The help appears in your default browser, where you can easily bookmark pages. And the help provides detailed search results on a page that’s easier to scan.

4. Add JMP books to your e-book library.
Designed for viewing in iBooks on the iPad and on the Kindle, the e-books provide interactive options and let you personalize your reading experience. Why choose e-books over PDF? Here are a few advantages:

  • Search for terms in the e-book or on Wikipedia.
  • Add bookmarks with a single tap.
  • Double-tap a word or drag your finger over a phrase to highlight, underline and enter notes.
  • Look up the definition of a selected word.
  • Carry JMP e-books everywhere, along with your other e-books.

Purchase the e-books at, Google Books, the Apple App Store or the SAS Bookstore for $3.99 to $9.99.

5. JMP provides help in many formats.
To give you the best experience, we provide JMP documentation in several formats:

Each of these resources contains the same information. You choose which format you prefer.

6. Help is only a click away.
Click the question-mark tool on a JMP report or graph. Help for that item appears in your browser.

Question-mark tool on graph

7. Read about JMP 13 new features.
Open the New Features documentation by selecting Help > New Features.

8. Find JMP Pro topics.
In the documentation, you’ll find a JMP Pro graphic next to each topic that refers only to JMP Pro. Want a single list of JMP Pro features? In JMP, select View > JMP Starter > JMP Pro.

9. Learn about JMP every time you open it.
The Tip of the Day appears each time you open JMP. Tips provide help on topics such as automatically saving files, collapsing blocks of JSL code, and making data tables smaller. If you turned off the Tip of the Day in the General preferences, you can read the tips by selecting Help > Tip of the Day. Email the documentation team if you have any helpful hints for working with JMP. On a related note, the @JMPtips Twitter feed provides new tips on a regular basis.

10. Ask a question about a help topic.
In JMP 13, you can ask a question about any help topic. Click the envelope button in the upper-right corner of an in-product help page to email the documentation team. In the help on, click the “Email” link in the lower-left corner of the page.

11. Documentation is consistently structured to make finding the information that you need quicker.
The statistical documentation is consistently structured so that each platform is covered in a separate chapter. A chapter begins with a quick description of the platform and a step-by-step example. The launch window and report sections are followed by statistical details at the end of the chapter. References are located at the end of the book. We began restructuring the documentation in this way a few versions ago and have almost finished the project.

12. Examples are based on sample data that are installed with JMP.
Hundreds of sample data tables are installed with JMP -- covering areas such as manufacturing, consumer research, medical studies, and nutrition. These sample data enable us to illustrate features in JMP. If you’re reading an example in the documentation, open the specified sample data table and then follow each step. View the folder of sample data tables by selecting Help > Sample Data Library. You can also view sample data in categories by selecting Help > Sample Data. Read about the new sample data in Michael Crotty's blog.

13. The Scripting Index is a script writer's best friend.
The Scripting Index in the Help menu describes the JSL functions and messages, provides the syntax, and includes one or more example scripts for each entry. Go to the Scripting Index to learn the syntax, click the Topic Help button to read extra details in the JSL Syntax Reference, and then consult the JMP Scripting Guide for more examples.

Enjoy using JMP 13!


Interactive HTML: Lines, mosaic plots and more for Graph Builder

By now, you may have heard that in JMP 13, you can save Graph Builder reports as interactive HTML, and the most frequently used features remain interactive. These interactive HTML reports can then be viewed using just a web browser.

Getting Graph Builder output to work for the web in JMP 13 involved bringing new features to several graphical elements that had been available in interactive HTML output since JMP 11. Areas and lines can be used to display some of the same information as points but in a different way. Exploring these stacked areas in interactive HTML, you can now see the values along the edge of the area.


The tooltips for lines display the rows that are included in each point along the line as well as information about the values. Graph Builder gives you the ability to customize various attributes of the lines. The example below combines lines using different drawing styles with annotations and the gray reference ranges to create a rich graph.


While the most heavily used graph types and options are exported as interactive HTML, the remaining ones are exported as static images. Contour plots are exported as static images; however, if your data is categorical, Graph Builder produces violin plots, which are exported as interactive HTML. Below you can see the close relationship between the violin plot and another Graph Builder element, the box plot.


What if you want to bin data into categories to explore their distribution? There are a number of ways to do this in Graph Builder. Histograms were already available in interactive HTML from the Distribution platform (and from several other JMP platforms); now a histogram you build by dragging and dropping in Graph Builder can be exported to the web as well. Below is an example created using Titanic passenger data to examine the distribution of ages.


A mosaic plot is used to examine the relationship between two categorical variables. Cells give informative tooltips regarding the share and number of rows associated with each cell, and cells can be selected with rows being linked to other related charts in the report.
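The geometry of a mosaic plot comes straight from a two-way table of counts: each column's width is proportional to its X total, and each cell's height to its share of the rows within that column. The Python sketch below computes those counts and within-column shares. We assume, without having checked, that the share in the JMP tooltip is this within-column share; the function name and the category labels are made up for illustration.

```python
from collections import Counter


def mosaic_cells(pairs):
    """Summarize (x, y) category pairs as mosaic-plot cells.

    Returns {(x, y): (count, share_of_column)}, where share_of_column is the
    cell's fraction of the rows in its x column (its height in a mosaic plot).
    Column widths are proportional to the x totals.
    """
    cell_counts = Counter(pairs)
    column_totals = Counter(x for x, _ in pairs)
    return {
        (x, y): (n, n / column_totals[x])
        for (x, y), n in cell_counts.items()
    }


# Made-up rows for illustration: (group, response)
rows = [("Group 1", "Yes"), ("Group 1", "No"), ("Group 1", "Yes"), ("Group 2", "No")]
for cell, (count, share) in sorted(mosaic_cells(rows).items()):
    print(cell, count, f"{share:.0%}")
```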


In JMP, you can use Dashboard Builder to create reports with several types of Graph Builder output in the same page -- so people who do not have JMP yet can interactively explore your data. Here, a mosaic plot, bars and histograms are combined to analyze the importance of different goals to schoolchildren.


These are just a few examples of the powerful graphs you can create to explore your data in Graph Builder and share with others using interactive HTML. The graphs shown here as well as a few other examples are available as live interactive HTML files to explore on the web at, but be sure to try your own Graph Builder creations!


New sample data tables in JMP 13

By now, you’ve probably seen and heard about a lot of the new features that are available in JMP 13 and JMP Pro 13. To help demonstrate those features, the documentation team has added nearly 60 new sample data tables to the Sample Data Library.

As a reminder, you can access the Sample Data Library through the Help menu. There are more than 500 data tables in the Sample Data Library. I’d like to highlight a few of the new ones in this blog post.

The Sample Data Index in JMP 13

The Sample Data Library in JMP 13 includes nearly 60 new sample data tables that help demonstrate new capabilities in the software.

General Purpose

There are two new tables based on the old favorite, Big Class. Big Class Families illustrates new modeling types and Expression columns in a data table. It provides more data (including pictures) about the fictional Big Class students. World Class contains the original Big Class data columns with multiple language translations of the names for the fictional students.

Big Class Families in JMP 13

Big Class Families is a new sample data table in JMP 13 that illustrates new modeling types and includes pictures.

To support the new virtual join feature, we have added some fictional movie rental data that are split across three data tables: Movie Customers, Movie Inventory, and Movie Rentals. These tables are already linked together so you can try out the virtual join feature right away.
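As a loose analogy for what linked tables buy you, the Python sketch below keeps only key columns in the rental rows and looks up customer and title information on demand, so the three tables are never merged into one. This is not how JMP implements virtual joins, and the table contents and column names are hypothetical rather than taken from the Movie sample tables.

```python
# Loose analogy only; this is not how JMP implements virtual joins.
# All table contents and column names here are hypothetical.
customers = {  # keyed by customer ID
    101: {"Name": "Customer A", "State": "NC"},
    102: {"Name": "Customer B", "State": "VA"},
}
inventory = {  # keyed by item ID
    "M1": {"Title": "Movie 1", "Genre": "Drama"},
    "M2": {"Title": "Movie 2", "Genre": "Comedy"},
}
rentals = [  # each rental row stores only the keys it links through
    {"Customer ID": 101, "Item ID": "M2", "Days": 3},
    {"Customer ID": 102, "Item ID": "M1", "Days": 1},
]


def linked_value(row, key_col, table, col):
    """Follow a key column into a linked table; None if the key has no match."""
    match = table.get(row[key_col])
    return None if match is None else match[col]


def joined_view(rows):
    """Lazily yield rows with linked columns filled in; the tables stay separate."""
    for row in rows:
        yield {
            **row,
            "Name": linked_value(row, "Customer ID", customers, "Name"),
            "Title": linked_value(row, "Item ID", inventory, "Title"),
        }


for r in joined_view(rentals):
    print(r)
```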

We have included Nicardipine, a data table that has shipped with JMP Clinical for a number of years. It is an example of a typical study that can be used to illustrate many exploratory techniques, including Recode, Grouping columns, Graph Builder, Distribution, and Predictor Screening.

For text analysis, we have added Aircraft Incidents, which is a collection of US aircraft incident reports for the year 2001. These reports come from the National Transportation Safety Board (NTSB) and constitute a commonly used data set for text analysis. We also have a simpler data table for trying out the new Text Explorer platform -- Pet Survey is a fictional data table of text responses of pet owners to an open-ended question about their pets.

Fit Model

To support the Generalized Regression personality of Fit Model, we have added Catheters. This data table contains attempts to start intravenous catheters along with many other predictive variables. This table can be used to demonstrate variable selection with binomial, Poisson, and negative binomial response distributions in Generalized Regression.

The VA Lung Cancer IV data table is based on the VA Lung Cancer data table that has been included in JMP Sample Data for many releases. It has been modified to illustrate interval-censored responses. The Generalized Regression personality of Fit Model can now accommodate interval-censored responses.

New covariance structures were added to the Mixed Models personality of Fit Model. Growth Measurements (based on a data set in the SAS PROC MIXED documentation) illustrates the Toeplitz Unequal Variances covariance structure. Although the Unstructured covariance structure isn’t new in JMP 13, we added Wafer Quadrants to support an example in the Mixed Models chapter of the documentation.


Consumer Prices contains Bureau of Labor Statistics (BLS) data on monthly prices of consumer goods and illustrates the new Process Screening platform in the Analyze > Screening menu. It also contains a script to split the data by the type of product. The new split table illustrates the Explore Missing Values platform in the Analyze > Screening menu.

Grocery Purchases is used in an example of the new Association Analysis platform in the Analyze > Screening menu.


To illustrate the new Sparse method in the Principal Components platform, a split version of the Adverse Reactions data table, AdverseR Split, has been added.


Drosophila Aging Distances is an example that uses a distance matrix in the Hierarchical Clustering platform in the Analyze > Clustering menu.

Also, for the Hierarchical Clustering platform, Wafer Stacked has been added to illustrate the use of the new spatial clustering measures for wafer maps.

Health Risk Survey contains data from a Centers for Disease Control (CDC) survey on risky behaviors. This table illustrates the new Latent Class Analysis platform in the Analyze > Clustering menu.

Quality and Process

In the Process Capability platform, you can now fit non-normal distributions. We have added two new measurements data tables that can be used to illustrate these distributions: Process Measurements and Tablet Measurements.

Consumer Research

Four new tables (prefixed with “Potato Chip”) illustrate the new MaxDiff platform in the Analyze > Consumer Research menu.

Pizza Combined No Choice illustrates the new No Choice option in the Choice analysis platform. This table is identical to the existing Pizza Combined sample data table, except that there are some missing responses in the Indicator column.

The new Multidimensional Scaling (MDS) platform in the Analyze > Consumer Research menu can be illustrated with the new Flight Distances and San Francisco Crime Distances data tables. The first table contains flight distances in miles between major US cities. The second table is a dissimilarity matrix based on the existing San Francisco Crime sample data table.

Design of Experiments

In the design of experiments (DOE) area, we have added a number of new data tables to illustrate various design methods. Candy Profiles and Candy Survey are used in an example of a MaxDiff experiment.

Coffee Choice Factors, Laptop Factors, and Weld Factors were all added to facilitate automatic loading of factors into JMP for examples in the documentation. This is especially helpful for experiments with many factors.

Binomial Optimal Start is used in an example of a Nonlinear Design.

Extraction3 Data is used in an example of the Fit Definitive Screening platform.

Torque Prior is used in an example of an ALT Design.

Peanut Data and Peanut Factors represent an example of a definitive screening design.

Algorithm Data and Algorithm Factors illustrate a Gaussian Process model that contains categorical factors, which is a new functionality of the Gaussian Process platform for JMP Pro 13.

Finally, Binomial Experiment and Catalyst Design were added to illustrate the new Simulate feature in JMP Pro 13.


In the reliability area, we added eight data tables (prefixed with “CD”) that can be used in the new Cumulative Damage platform. This platform can be used to analyze accelerated life tests where the stress levels are allowed to change over the time of the experiment. There are four pairs of sample data tables; these pairs correspond to the four tabs in the Cumulative Damage launch window. Each tab requires a time-to-event data table and a stress pattern data table.

The Reliability Growth platform launch window now has two new tabs as well. The new Concurrent Systems data table has data that are in the correct format for the Concurrent Systems tab. There are four data tables (prefixed with “Parallel Systems”) that illustrate various scenarios that are appropriate for the Parallel Systems tab.

Finally, we added Device X Lifetimes and Small Production Time to Event. The first of these tables was added to highlight the new ability of the Generalized Regression personality of Fit Model to handle censored data as well as Cox Proportional Hazards models. The second table was added to illustrate the Time to Event data format in the Reliability Forecast platform.


Best in show at Discovery Summit 2016

Discovery Summit 2016 wrapped up with an announcement of the top-rated papers and posters. The selection is based on the ratings attendees submitted in the conference app. After the last breakout presentations of the conference were finished, we exported the ratings data from the app and brought it into JMP (of course), where we ran an add-in that did the calculation.

We wanted to recognize the top conference content here in the JMP Blog and share it with the larger JMP community. So, below you will find the list with links to the top-rated content, which most of the presenters uploaded to the Discovery Summit 2016 site in the JMP User Community. Contributed papers are those by customers, while invited papers are those by SAS employees. For the first time ever, we gave awards for best student posters.

While you're at the Discovery Summit site, look around. In addition to papers and posters uploaded by presenters, you'll see full-length videos of some plenary and breakout sessions. Many presenters have shared their JMP files, in addition to their slides.

Congratulations to all!

Top 3: Best Contributed Paper

Top 3: Best Invited Paper

Top 3: Best Poster

Best Student Posters

P.S. If you missed the conference and want a sense of what went on, visit our highlights page for photos, tweets and links to live blogs.


Interactive HTML: Graph Builder and more in JMP 13

In JMP 11, we built interactive HTML technology into JMP to enable customers to share results. You can publish JMP results to the Web, post them to a corporate intranet or shared drive, or share them with colleagues via e-mail.

In JMP 12, we added support for Bubble Plots, Profilers and mobile devices. Unlike Flash applications, the interactive HTML reports in JMP can run on iPads and similar devices.

In JMP 13, we've added support for reports created with Graph Builder. The most frequently used features are enabled, for Points, Smoothers, Ellipses, Lines, Bars, Areas, Box Plots, Histograms, Heat maps, Mosaic Plots, Caption Boxes, and Map Shapes. These Graph Builder elements are highlighted in the figure below.


In addition, we've received a number of feature requests from customers over the years. You know who you are ;-)  So beyond Graph Builder, we've added the following in JMP 13:

  1. Dashboard Support
  2. More Profilers
  3. Reference Ranges
  4. Value Labels
  5. Value Ordering
  6. Pinned Tooltips
  7. Hover Pictures

In this post, I'll give a high-level overview of some of these features.

Graph Builder Bar Charts

Bar charts are among the most frequently used graphs. Graph Builder provides a dozen different styles of bar charts, of which six are supported in interactive HTML.

The example below shows the same data drawn with stacked and side-by-side bars. For this market share example, the bar sections all sum to 100%, so arguably the stacked style communicates more clearly.


Bullet charts provide a highly space-efficient presentation. These charts were developed specifically for use in dashboards. The example below shows a dashboard for a hospital interested in patients' and doctors' wait times and emergency room occupancy.


Range bars are useful for showing two values. In the stock market example below, hover tips display the high and low prices for each date.


Graph Builder supports many combinations of layouts for grouping. The example below shows diamond prices vs. carat weights, grouped by cut, in a wrapped layout. The larger diamonds are more expensive, and the cut also matters. The ideal cut is considered to give the most brilliant sparkle to the diamond, so it generally fetches higher prices.


Bar Charts Outside Graph Builder

Implementing bar charts for Graph Builder gave us a bonus: Bar charts are now also supported in all other JMP reports. In the Partial Least Squares analysis, for example, bar charts interactively display X and Y coordinates.


Improved Dashboard Support

Dashboards combine related information in custom layouts for efficient communication. Dashboards are popular on the web, so we've improved support for custom layouts in Interactive HTML. The dashboard below uses a particularly efficient layout with tab controls to show profits per employee for different types of companies.


Besides improving layout, we can support many more kinds of dashboards in JMP 13, because we support many more kinds of graphs. This dashboard of regional air quality combines four separate Graph Builder reports. Along with Bar Charts, Heat maps, Mosaic Plots and Map Shapes are all new graph types supported in JMP 13. My colleagues John Powell and Josh Markwordt will describe these new graphs in future blog posts.


Interactive Examples

In this blog post, I've shown static images and simple animations, but that is no substitute for interacting with the web pages themselves. All of our examples are available as Interactive HTML pages at

 JMP 13 HTML5 Examples

We built these examples with our colleague Michael Goff's excellent new web report generator, available in JMP 13 under the View menu "Create Web Report."

We hope our work helps you, and we look forward to your comments and suggestions!


Live blog of Randall Munroe keynote

Simple comic drawing of Randall Munroe

Randall Munroe explains "Complicated Stuff in Simple Words" at Discovery Summit 2016.

Randall Munroe of the popular webcomic xkcd takes the stage at Discovery Summit today. He will discuss "Complicated Stuff in Simple Words" in his keynote.

The author of the science question-and-answer blog What If, Munroe was born in Easton, Pennsylvania, and grew up outside Richmond, Virginia. After studying physics at Christopher Newport University, he got a job building robots at NASA Langley Research Center. In 2006, he left NASA to draw comics on the Internet full time, supporting himself through the sale of xkcd t-shirts, prints, posters and books.

View the live blog of this speech.

See photos and tweets from the conference at


Live blog of Chris Nachtsheim keynote

Chris Nachtsheim

Chris Nachtsheim gives a keynote speech "DOE: Is the Future Optimal?" at Discovery Summit 2016 on Sept. 22.

Christopher Nachtsheim is the Frank A. Donaldson Chair of Operations Management in the Carlson School of Management at the University of Minnesota. Nachtsheim's teaching and research interests center on the optimal design of industrial experiments, regression and predictive analytics and quality management.

He has co-written several related books, most notably Applied Linear Statistical Models and Applied Linear Regression Models. Nachtsheim has also published over 70 articles in the statistics literature and currently serves as Associate Editor of the Journal of Quality Technology.

His keynote speech at Discovery Summit 2016 is titled "DOE: Is the Future Optimal?"

View the live blog of this speech.

See photos and tweets from the conference at


Live blog of Tom Lange keynote

Tom Lange

Tom Lange delivers a keynote speech at JMP Discovery Summit on Sept. 21.

Tom Lange is a 37-year veteran of Procter & Gamble, where he founded and directed the modeling and simulation (M&S) group. His M&S team led efforts in consumer modeling, computational chemistry and biology, computer-aided engineering, and production system throughput and reliability.

Over the course of his career, Lange contributed to such projects as the improvement of peanut butter production systems, the development and expansion of a chocolate chip cookie brand, and the quality and reliability of baby diapers.

He now spends his professional time consulting with small and medium enterprises on ways to improve their competitive edge with the latest computer-based modeling and simulation tools.

View the live blog of this speech.

See photos and tweets from the conference at
