5 things you don't know about JMP

The title of this post is provocative, I know. But if you read this list, I'll bet you'll find something that you didn't know about.

Now, if it turns out that you already knew most of these, I do want to know. Take a moment and add a comment with a favorite thing that you think most people don't know about JMP.

1. Use shortcut and modifier keys to be more efficient

JMP has lots of shortcut keys and keys that modify mouse clicks. Use them to make your time in JMP more productive. Here are my favorites:

  • Switch tools using single letter keypresses. For example, you can get to the lasso tool with a simple press of the L key. Note that no modifier key is required. Here are the keys associated with each tool: A=Arrow, S=Selection, Z=Zoom, C=Crosshair, L=Lasso, B=Brush, H=Hand, R=Scroller, ?=Help, T=Text Annotate
  • Broadcast commands with Ctrl/Cmd key. When you hold the Ctrl/Cmd key down and manipulate a report window, for example by choosing an option from the red triangle hotspot menu, JMP will send that same command to all similar objects in the report window. For example, in the Bivariate platform, if you hold the Ctrl/Cmd key down as you choose Fit Line from the hotspot menu, JMP will fit a line in all the bivariate plots in the report window. Broadcasting works with lots of different actions, including resizing graphs, changing marker sizes and adjusting axes.
  • Get a dialog from any hotspot menu. If you hold the Alt key down when you click on a red triangle hotspot menu, you'll get a dialog of all the choices in that menu – so you can pick more than one at a time.
  • Get to the Home window quickly. Use Ctrl-1 on Windows or Cmd-2 on the Mac to get to the Home window.
  • Find a buried JMP window. On Windows, use Ctrl-Tab to cycle through JMP windows. Use Alt-Tab to cycle through all open windows. The Alt-Tab combination is part of Windows and makes it easy to move quickly between JMP and another Windows application, for copy and pasting, for example.
  • Reveal all JMP Windows. On Windows, you can reveal all JMP windows by pressing F9. This will make it easy to find exactly the window you want.

You can find these keys and more in the Quick Reference Card from Help -> Books -> Quick Reference.

2. Combine report windows using Application Builder

Sometimes you'd like to combine two reports into a single window for ease of analysis or to create something like a dashboard for someone else when you pass a data table to them. Application Builder makes this pretty easy. Start by arranging the report windows on the screen as you'd like them, side-by-side or top-bottom.

From here the method is different for Windows and Macintosh.

For Windows:

Click the checkbox in the lower right of each window. Then click the drop-down menu next to that checkbox on one of the windows and choose Combine Selected Windows.

Alternatively, you can select the windows in the list in the Home Window (click on the first one and shift/ctrl-click on others) and then right click and choose Combine.

For Macintosh:

Choose Window -> Combine Windows... from the menu bar, and you'll get a list of windows to choose from. Select the ones you want and click OK.

Once you've got your new report window, use the hotspot menu at the top to save a script to the data table – or elsewhere if you like. You can also edit the application using Application Builder to get finer control of the placement and grouping of the report elements.
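If you prefer scripting, you can also assemble a combined window directly in JSL. Here is a minimal sketch using the Big Class sample table – treat the layout as an illustration, not the richer output that Application Builder generates:

// Build two reports, then move their display boxes side by side into one window
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
biv = dt << Bivariate( Y( :weight ), X( :height ) );
dist = dt << Distribution( Column( :height ) );
New Window( "My Dashboard",
    H List Box( Report( biv ), Report( dist ) )
);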

3. Get data from the web with Internet Open

Many webpages have tables full of data that can be used in JMP. Choose File->Internet Open..., enter the address (URL) of the webpage, and JMP will scan the page and find any HTML tables to import.

Try it out with the table in this post from the Discussions forum. Just put this address in the Internet Open... dialog: https://community.jmp.com/message/216850
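The same import can be scripted. Here is a hedged one-liner – the HTML Table argument is the form I have seen in JSL, but check the Scripting Index for your version:

// Import the first HTML table found at the URL as a data table
dt = Open( "https://community.jmp.com/message/216850", HTML Table( 1 ) );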

4. The best formula editor function you're not using is Word()

I've said it before; do I have to say it again? The Word() function is the most useful function for dealing with text strings. If you want more, Craige@JMP has a nice list of JSL Character String Functions.
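If you haven't tried it, here is what Word() does in a formula or in JSL (results shown in the comments):

Word( 2, "the quick brown fox" );  // returns "quick" – the second word, split on spaces
Word( 3, "2015-11-18", "-" );      // returns "18" – you can supply your own delimiter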

5. Five quick Graph Builder tips

  • Drag an element icon from the palette into the graph to add another element.
  • Order a categorical axis by dragging a continuous variable into the “merge” zone of the axis.
  • With multiple Xs or Ys, resize them separately by clicking on the resize zone between them. Bonus tip: Turn off auto-stretching to make a graph bigger than the current window.
  • Some legend items are hidden by default (points and confidence bands); unhide and customize them in Legend Settings.
  • Change the text orientation of the Y Group labels.

So what do you think? Did you know these? Do you have a favorite "hidden" JMP tip to share? Tell me in comments.

Editor's Note: A version of this post first appeared in the JMP User Community.


Potato chips and ANOVA, Part 2: Using analysis of variance to improve sample preparation in analytical chemistry

In an earlier blog post, I began a two-part series to examine a proposed sample preparation scheme for measuring the weight percentage of sodium in potato chips. My first post:

  • Introduced the problem.
  • Entered the raw data into JMP.
  • Standardized the format of the data using Standardize Attributes.
  • Used the Stack function to shape our data into a format that is ready for analysis.

In this second post, I will use the Fit Y by X platform in JMP to visualize the data and analyze them using analysis of variance (ANOVA). I will conclude this series by interpreting the ANOVA results to answer our original question about comparing the variation in the two stages of our sample preparation scheme.

Visualizing the Variation

Our goal is to compare the variation between the two stages of sample preparation. Before performing any statistical analysis, let’s visualize the data using a scatterplot with the grand mean and the group-specific means drawn. Then, we will use ANOVA to answer our question. Fortunately, JMP has one platform that allows you to do both things. Under the Analyze menu, choose Fit Y by X.

fit y by x in JMP

In the Fit Y by X platform, choose Weight Percentage as the response (Y), and choose Chip as the factor (X) – view my example below if needed. Notice that JMP automatically recognizes this model as a one-way ANOVA – see the Oneway label in the lower-left portion of the window. Click OK to run this model.

fit y by x platform
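(If you prefer to script the launch, here is a hedged JSL sketch using the column names from this post:)

// Fit Y by X with a continuous Y and a nominal X runs the Oneway platform
dt = Current Data Table();
ow = dt << Oneway(
    Y( :Weight Percentage ),
    X( :Chip )
);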

The initial output that you get is just a scatterplot of the data. However, there is a lot more information that you can get from the red triangle menu in the upper-left corner. First, let’s enlarge the markers for the points for easier visualization. Right-click anywhere within the plot to adjust the marker size.

enlarge marker size

For our later statistical analysis, the grand mean (the mean of all weight percentages) and the four group-specific (or chip-specific) means will become important. The grand mean is shown by default. Let’s also visualize the four chip-specific means on this scatterplot. Under the red-triangle menu, go to Display Options, and choose Mean Lines.

adding mean lines in JMP

The lines for the grand mean and the chip-specific means are too narrow to be seen clearly, so let’s widen them. Right-click anywhere in the plot, and go to the Line Width menu. Let’s choose Other to widen the line to 6.

widening lines in JMP

Here is the resulting plot with the wider lines.

drawing the 2 sources of variation on the scatterplot in JMP

Those points and lines are much easier to see. I have manually added some arrows and text in blue and orange using Microsoft Word to partition the two different types of variation that are introduced in the sampling process:

  • Drawing four different chips with slightly different weight percentages of sodium from the bag
  • Drawing three different aliquots from each flask of the homogenized solution for each chip

In short, there is variation between the four chip samples, and there is variation within each chip sample. The total variation in the final measured values of the weight percentages originates from these two sources of variation. (For the sake of brevity and simplicity, I am ignoring the propagation of measurement uncertainty. If sufficiently good equipment is used, this uncertainty should be minimal compared to the variation introduced by these two steps of sampling.) The key question is this: Which step contributes more to the total variation (and, therefore, the cumulative uncertainty) in the measured weight percentages of sodium?

Partitioning Variation Using ANOVA

A good statistical technique for partitioning and comparing sources of variation is analysis of variance, or ANOVA. It uses the concept of sums of squares to rigorously measure variation, applying the same formulation to both the between-group variation and the within-group variation.
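In standard notation – with k groups, n_i observations in group i, N observations in total, group means \bar{x}_i and grand mean \bar{x} – the decomposition is:

\[
SS_{\mathrm{between}} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2,
\qquad
SS_{\mathrm{within}} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
\]

\[
F = \frac{MS_{\mathrm{between}}}{MS_{\mathrm{within}}}
  = \frac{SS_{\mathrm{between}}/(k-1)}{SS_{\mathrm{within}}/(N-k)}
\]

Here k = 4 chips and N = 12 aliquots, so the F ratio has 3 and 8 degrees of freedom.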

To implement ANOVA, click on Means/ANOVA under the red-triangle menu.

means and anova in red triangle menu in JMP

The ANOVA report will appear, and the mean diamonds will be added to the scatterplot by default. I prefer to look at the plot without the mean diamonds, so I clicked on the red-triangle menu, went to Display Options, and unchecked Mean Diamonds. Here is the resulting output:

anova output in JMP

Interpreting the Results

We are now ready to use the ANOVA results to answer our original question. Look under Analysis of Variance in the above screenshot. The mean of the sum of squares – abbreviated as Mean Square in the output above – quantifies the variation in each stage of the sampling process.

  • The row titled Chip refers to the between-group variation – this is the variation in the weight percentages of sodium between the four chips that were originally drawn from the bag.
  • The row titled Error refers to the within-group variation – this is the variation in the weight percentages of sodium between the aliquots within each chip, pooled across all four chips.

Notice that the between-group variation is much higher than the within-group variation – 0.009373 compared to 0.000596. Under the null hypothesis that the two sources of variation are equal, the ratio of these two mean squares has an F-distribution. You can test whether this ratio (here, F = 0.009373/0.000596 ≈ 15.7) is significantly bigger than 1. (An F ratio of 1 implies that the two sources of variation are equal.) The P-value of that test (Prob > F) is 0.0010, suggesting that creating four separate homogenized solutions produced much more variation than drawing the 12 aliquots.

Revising the Sample Preparation Process

We have shown that our proposed sample preparation process contributes far more variation in the first stage of sampling than in the second. This prompts us to consider a new way to sample the chips before drawing aliquots from their homogenized solutions for measurement.

Here is an alternative sample preparation scheme that reduces the initial sampling variation.

  • Draw the four chips from the bag.
  • Blend the four chips together and homogenize them into one solution.
  • Draw aliquots from this one solution.

blending method

A rigorous mathematical derivation can show that the total variance in this alternative scheme is smaller than the total variance in the first scheme proposed at the beginning of this blog post. For the sake of brevity, I will not show the derivation here, but you can find a sketch of it on pages 75-78 of Chapter 4 of Miller and Miller (2010).
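For intuition, here is a hedged sketch of that bookkeeping. Assume independent, additive stages with between-chip variance \sigma_1^2 and aliquot-level variance \sigma_2^2, with h chips and n aliquots per chip in the first scheme (h = 4, n = 3 above), and N aliquots from the blended solution in the second:

\[
\mathrm{Var}(\bar{x}_{\mathrm{scheme\,1}}) = \frac{\sigma_1^2}{h} + \frac{\sigma_2^2}{hn},
\qquad
\mathrm{Var}(\bar{x}_{\mathrm{blended}}) = \frac{\sigma_1^2}{h} + \frac{\sigma_2^2}{N}
\]

Under these assumptions, the aliquot term shrinks whenever N > hn, which is exactly why the blended scheme calls for drawing more aliquots – the disadvantage noted below.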

However, there is a disadvantage to this blending strategy – more aliquots need to be drawn and analyzed. These additional measurements cost more time, equipment or money, so this blending strategy is not always the best method. Past experience can inform you about the between-group and within-group variances and the costs of additional sampling and measurement. A careful balance between cost and precision can help you achieve the most cost-effective way to answer your analytical question.


Harris, D. C. (2002). Quantitative chemical analysis (6th edition). Macmillan.

Miller, J. N., & Miller, J. C. (2010). Statistics and chemometrics for analytical chemistry (6th edition). Pearson Education.


What's new in the second edition of JMP Essentials

Ask any user how they first learned JMP, and there’s a good chance that they’ll cite JMP Essentials – An Illustrated Step-by-Step Guide for New Users as a resource they relied on.

Authors Curt Hinrichs and Chuck Boiler have written a second edition of this very popular book that promises to be even more helpful to new JMP users.

Recently, I sat down with Curt and Chuck to ask them a few questions:

1. The first edition of JMP Essentials taught thousands of users how to use JMP. How is this new book different?

Curt Hinrichs: Well, it’s hard to believe that the first edition, which was based on JMP 8, was published five years ago. And we felt a second edition was long overdue, given that we have seen four major releases of JMP since then. There were certainly a number of new features that we needed to introduce to readers of JMP Essentials, and we added 10 new sections to the book to cover these. But what really compelled my thinking in the revision were the workflow improvements that JMP introduced. These may be a little more subtle on the surface, but I find them to be significant timesavers, which help our users stay in the flow with their analyses and discover insights in their data faster.

Chuck Boiler: JMP Essentials, 2nd Edition, has been updated to take advantage of the latest features of JMP. So one of the most exciting features introduced in JMP 12 is the way JMP handles missing data. In a nutshell, you get more information faster when using JMP Essentials, 2nd Edition, in conjunction with JMP 12, even when some of the data is missing. This is a major improvement because almost no one has complete data, despite what most statistics textbooks say.

2. What is the most common question you get asked by new users, and what do you tell them?

Curt Hinrichs: I work with academic customers, professors mostly, and I think the most common question I hear from them is really about the fundamental shift from a technique-driven to a data-driven mindset. That is, “How do I begin to adopt JMP’s data-driven navigation and pedagogy in my course?” This mindset is not exclusive to academic users, and we hope JMP Essentials will continue to help a wide range of curious people explore their data, make interesting discoveries and share them with others. We have organized the contents of JMP Essentials to try to help the reader see and understand this process.

Chuck Boiler: The question that makes me laugh is “Isn’t JMP just a fancy spreadsheet?” That’s a great starting point, because new analysts can use everything they learned using spreadsheets to become better data analysts using appropriate software. The major difference with JMP and the second edition is the faster learning rate they enable. The speed with which you gain knowledge and insight from data should be the metric by which the data analysis process is measured. The aim of JMP Essentials is to increase that learning rate to the maximum.

Now for fun…

3. When you are not working, what do you like to do?

Curt Hinrichs: I like to spend time with my family and at my kids’ soccer games, plus golf, tournament poker, woodworking and working out. I also enjoy reading history and historical fiction.

Chuck Boiler: I’m a huge dog lover, and I love walking. I also enjoy cooking without recipes, which is sort of like driving without a license, but less dangerous.

This blog post is part of a series of posts that celebrate the 25 years of SAS Press. We're also marking this milestone by offering customers 25 percent off their entire order from the SAS Bookstore. Use promo code SMPBBP. Discount ends Dec. 31, 2015.


Dyeing diecast vehicles redux: The results

colored diecast vehicles lined up

What gave us better results with dyeing diecast vehicles? And what made things worse? Photo courtesy of Caroll Co.

Last time, I discussed setting up a new stage of experimentation for dyeing diecast vehicles. Not everything went as planned, but there were some positive results.

I took the alias optimal design from the previous blog post and used a column shuffle to randomize the rows (it was sorted by random blocks originally). Three raters gave a forced ranking as to how well the cars were dyed.

As a quick reminder, these were the factors in the experiment:

  • Dye color (blue/yellow)
  • Dye amount (1 tsp/2 tsp)
  • Additional heat (no/yes)
  • Time (15 minutes/30 minutes)
  • Vinegar (0%/50%)
  • Acetone (0%/50%)

Changes during data collection

When I created the alias optimal design, I had a few acetone and vinegar mixes that were close to 25/25 mixes (20/30 and 22.5/27.5, for example). I manually changed these to 25/25 since there was a line on the measuring cup that made me more comfortable with those measurements.

Once we started the experiment and I began to examine some of the completed cars, I noticed that the plastic bases/tires were starting to melt – not a good sign! I knew this was a potential problem with the acetone. Looking at the design rows that corresponded to the melting, it was the half-water, half-acetone mixes that had the most noticeable issues. The remaining 50% acetone runs were changed to 25% acetone and 25% vinegar – even if the dye was doing a good job, the melting plastic would eliminate 50% acetone from consideration.

After dyeing the 16 cars in the experiment, the results were noticeably better compared to the first experiment (and noticeably worse when you consider the melting of the cars…). Before I let the raters look at the cars, I picked out two of the cars that stood out to me and highlighted their rows in the data table to see if anything popped out. Sure enough, those two (different cars and colors) had two factor level combinations in common: additional heat and 50% vinegar. Alas, because of the constraints, there were only four runs that had additional heat (and two that had added heat and vinegar, which also happened to be the two I was interested in), which still wasn’t much extra information.

I had a couple of extra cars available, so I decided to do my own augmentation with two cars having additional heat. I dyed one car in blue with 50% vinegar, and one in yellow with no vinegar (both for 30 minutes to see if the color would stick).

The analysis

I tried a number of different techniques to analyze the average rank, but I’ll go through the basic idea here. My analyses excluded the three runs that had 50% acetone. Not only did I already know that I was not going to use 50% acetone due to the melting plastic issue, but the raters also disagreed more on those runs because one rater penalized more heavily when he saw issues with the vehicle. However, running the analyses with those runs included gave similar results.

Because of the exclusions with the acetone, the acetone*vinegar interaction is not estimable with the main effects. Fitting the main effects model:
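(In JSL, that launch would look something like this sketch; the response and factor column names are my guesses at what is in the table:)

// Main-effects model for the averaged forced ranking
dt << Fit Model(
    Y( :Average Rank ),
    Effects( :Dye Color, :Dye Amount, :Additional Heat, :Time, :Vinegar, :Acetone ),
    Personality( "Standard Least Squares" ),
    Run
);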


Not surprisingly, given what I had observed, additional heat was the most important factor. Dye amount and acetone look potentially interesting, but I was also curious about the additional heat * vinegar interaction. Fitting that model:


Note that the additional heat*vinegar interaction is not significant, although the effect itself is large. Most of the main effects are relatively close to before, but you may notice that the effect of dye amount has dropped. Recall that additional heat*vinegar was not included in the original model. Even with the alias optimal design, the -0.43 entry in the alias matrix relating dye amount to the additional heat * vinegar interaction is enough to make a difference, given the size of that interaction.

All that said, the one thing that all the modeling attempts I tried agreed on is that additional heat is the biggest driver for the better cars. I usually show Fit Model results at this point in the blog, but I think a picture showing the average rank vs. vinegar, broken up by additional heat, is the most compelling:

In particular, the three highest rated cars were 50% vinegar with additional heat. You may notice there is one point that performed reasonably well with 25% vinegar – this car also had 25% acetone, and on closer inspection, the plastic was a bit affected. Recalling that additional heat meant no acetone, I don’t think I’ll be using acetone in the next experiment (my wife didn’t appreciate the acetone smell in the kitchen either, so it would’ve been a hard sell). So vinegar and heat will definitely be in the next experiment. While time and dye amount are not as significant, there may still be some value in carrying them through.

Final thoughts

This experiment and the analysis ended up being a lot more involved than I anticipated. As disappointed as I was to see the acetone not work, I was pretty excited to see results better than expected.

I’m certainly intrigued to continue experimenting with keeping heat on during the dyeing process, and will likely limit it to one color, since I think it’s an easier comparison for the forced ranking.

I’ve still been picking up diecast vehicles for the purpose of more experimentation… so stay tuned for Part III. Thanks for reading!


Hatching and nesting trends for sea turtles

Baby hatchling sea turtle on the beach

A baby sea turtle that has just hatched

I spend a fair amount of time at the North Carolina coast, specifically at Oak Island. Over the years, I’ve noticed the attention given to saving the sea turtles, but I didn’t get involved with this endeavor until the summer of 2014.

I happened to witness a nest hatching with my mom one evening, and it was amazing to watch all those tiny turtles emerge and march to the water. Did you realize that anywhere from 50 to 350 hatchlings emerge from one nest? Did you know that only 1 in 1,000 survives?

And so it began for me. I learned more and more about sea turtles: nesting, hatching, survival rate, types of turtles, etc. After taking a new job as a JMP software tester, I found out I could use sea turtle data to do my testing. That led to some discoveries in the data and a poster presentation at Discovery Summit 2015 in San Diego in September.

JMP did a wonderful job of reading the data in for me, using the File > Internet Open option in JMP. After that, I was able to pull statistics together with Formula columns, and concatenate tables together for several years. That let me look for trends over time and make some comparisons. Graph Builder and other platforms, such as Distribution and Fit Y by X, allowed me to find hatching and nesting trends for Oak Island, specifically in areas where there was greater nest density.

For example, here are the summary statistics on all of the nests from the 26 beach programs, showing the hatching success from 2010 to 2015. Hatching success is the percentage of eggs in a nest that actually hatch. I had 2015 data only through August 15.

A JMP Distribution Graph and Summary Statistics of In Situ Hatch Success

Summary statistics of hatching success

As you can see, the mean hatching success is about 78 percent.

Sometimes, a sea turtle emerges from the water to lay her nest, but then for some reason changes her mind and returns to the water. That’s called a false crawl (see below).

A sea turtle's false crawl track on a beach

Tracks left behind from a false crawl by a sea turtle.

Using Graph Builder, I saw that nests laid and false crawls are closely correlated in any given year. This bar graph shows the totals for the years 2010-2015:

Bar graph shows nests and false crawls from 2010 to 2015

Nests and false crawls, 2010-2015

Having the latitude and longitude points of all the nests from 2010 to 2015 enabled me to plot the points with Graph Builder, and then use the Street Map Service in the background to produce this density map:

A density map of nesting at Oak Island

A density map of sea turtle nests at Oak Island

Recently, I shared these findings and visualizations with the NC Wildlife Sea Turtle project leader, who suggested that I share my analysis with volunteers next spring to open the new season. And I will do that, after I complete my analysis for 2015!


Dyeing diecast vehicles with DOE redux

We try some different things in our second experiment with dyeing toy cars. Photos courtesy of Caroll Co.

In a previous experiment, my father and I changed the color of diecast cars by placing them in fabric dye. A recent visit from my father allowed us to undertake the next experiment in our dyeing journey, armed with what we learned the first time and some new ideas from my colleague Lou Valente. We were hoping to produce better results than we had in the first experiment.

For this reason, we planned to use only the powdered version of the dye and to concentrate on yellow and blue colors only. We also wanted to allow for the possibility of adding more vinegar, using acetone, and keeping the liquid heated during the dyeing.

How to handle the liquids?

We want the combination of water, vinegar and acetone to result in 1 cup of liquid. This would suggest that I add these factors to the design as mixture factors. However, water can be treated as a slack variable: I want at least half of the liquid composed of water, and I don’t mind if some runs are entirely water. I’m really curious whether there is any benefit in this experiment to using acetone or vinegar. I ultimately decided to include vinegar and acetone as (constrained) continuous factors.

The factors

I still wanted to vary the time and dye amount, so here is our final list of factors:

  • Dye color (blue/yellow)
  • Dye amount (1 tsp/2 tsp)
  • Additional heat (no/yes)
  • Time (15 minutes/30 minutes)
  • Vinegar (0%/50%)
  • Acetone (0%/50%)

The constraints

As mentioned above, we want at least half of the liquid to be water. This necessitated adding a constraint so that vinegar + acetone makes up no more than 50% of the liquid. Because the boiling point of acetone is so low, I also wanted to avoid adding heat when acetone was present.

This is straightforward to do using Disallowed Combinations, particularly with the Disallowed Combinations Filter in JMP 12. To disallow the vinegar and acetone taking up more than half of the liquid, I selected acetone and vinegar from the filter, right-clicked one of the two factor names, and chose Combine -> Sum.


And, I used the combined variable to disallow values greater than 50.


I wanted additional heat to disallow acetone > 0, but the filter will only give acetone >= 0. A handy tip: if you use the Disallowed Combinations Filter and then use Save Script to Script Window, the disallowed combination from the filter is converted into a Disallowed Combinations expression in the script. To handle the acetone and heat issue, I selected additional heat = yes, and put in a small number (0.1) for acetone simply to get the structure of the disallowed combination.


I manually changed the value for acetone to disallow values greater than 0 in the script for when there is additional heat, and reran the script. I’ve also found this tip useful for creating lengthy disallowed combinations where I have similar patterns and just want to change variable names.
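The edited expression ends up looking something like this sketch. This is a hedged illustration: in saved scripts, a two-level categorical factor such as additional heat is typically referenced by a level index, and the exact form varies by JMP version, so check what Save Script to Script Window gives you.

// Disallow more than half the liquid as vinegar + acetone,
// and any acetone at all when additional heat is on (assuming level 2 = "yes")
Disallowed Combinations(
    Vinegar + Acetone > 50 | (Additional Heat == 2 & Acetone > 0)
)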

The design

In the previous experiment, I included the vehicles in the design using covariates. This time, I could treat them as random blocks in the design phase, which can be done under the Design Generation tab before making the design.

I budgeted 16 runs in random blocks of size 4 (I had four each of four different vehicles). In addition to the main effects, I added the interaction for vinegar and acetone, since it allowed me to consider cases where I had some of both mixed with the water. I could now make the design, but…

Choice of optimality criterion

The default optimality criterion for this case would be D-optimality. However, the two constraints for this design are going to cause effects to be correlated – not only those in the model, but also those that I have not included. After the design is created, I can look at the Alias Matrix to see how the effects listed in the Alias Terms outline can bias the effects in my model.

After some discussion with Bradley Jones and some design evaluations, I ultimately decided to go with an alias optimal design. The estimation efficiency was similar for both designs: the confidence interval for additional heat is about 40% larger (due to the restriction with acetone) than it would be for an orthogonal design, while the confidence intervals for the vinegar and acetone terms are about 160%-180% larger (not surprising because of the constraint).

For the fixed effects, the D-efficiency was about 64% for both the alias optimal and D-optimal designs, while the sum of squared terms in the alias matrix was higher for the D-optimal design (around 13 vs. 10). This means the alias optimal design provides more protection for estimating the main effects in the case that important two-factor interactions are missing from the model. I can choose this option from the red triangle menu by selecting Optimality Criterion -> Make Alias Optimal Design.


Now I can set the “Group runs into random blocks of size:” to 4, and the number of runs to 16:


Next time

I’ll share the results next time, but I can say some did not turn out well, such as melted plastic on the base of the left car:


On the other hand, others were more successful than we had hoped:


Thanks for reading!


Potato chips and ANOVA in analytical chemistry, Part 1: Formatting data in JMP

Sample preparation is a very important part of measuring quantities of substances in analytical chemistry. One benefit of a good sample preparation scheme is the minimization of the cumulative uncertainty for the estimated quantity of interest. This two-part blog series will show how a basic statistical technique called analysis of variance (ANOVA) can assess the uncertainty that is introduced in a sample preparation scheme and offer insights on how it can be improved to minimize the cumulative uncertainty.

The first part of this series will introduce the problem and shape the data into a format that is ready for analysis. The second part of this series will use ANOVA to partition and compare the two sources of variation in a proposed sample preparation scheme.

Measuring Sodium in Potato Chips

A common ingredient in potato chips is table salt, or sodium chloride. Suppose that you want to measure the weight percentage of sodium in a bag of potato chips. Here is one possible scheme for drawing samples of chips out of this bag and preparing them for measurement. In this example, the quantity of interest – usually called the analyte in chemistry – is sodium.

  • Randomly draw and weigh four chips from a bag.
  • Grind each chip into a homogeneous paste.
  • Dissolve each sample of paste in an Erlenmeyer flask of water.
  • Draw three sub-samples (called aliquots) of equal volume from the homogenized sample in each flask. Put each aliquot into a volumetric flask.
  • Use an analytical instrument or technique to measure the weight percentage of sodium from each aliquot.
  • Calculate the average of the 12 weight percentages from the 12 aliquots.

The following is a diagram that summarizes this scheme.

sample preparation scheme 1

Image sources: “Erlenmeyer flask” by Danilo Prudêncio Silva and "Volumetric flask" by Lucasbosch - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons.

Here is a data set of measured weight percentages of sodium from the 12 aliquots; I obtained it from page 736 in chapter 29 of the 6th edition of “Quantitative Chemical Analysis” by Daniel Harris.

raw data

Estimating the true weight percentage of sodium in this bag of potato chips can be done in a relatively straightforward manner – simply pick a good analytical technique, build a calibration curve, and use inverse prediction to obtain a point estimate and a confidence interval. However, this blog post will focus on the variation that is introduced throughout the sample preparation process and how it can be minimized. Controlling that variation is critical to minimizing the cumulative uncertainty of the final measurements of the weight percentages.

Entering and Transforming the Data

Let’s enter the above data set into JMP.

raw data in JMP

If you prefer to show the first aliquot under Chip 3 as 0.420, you can change this in the Column Properties. Highlight all columns, and then choose Standardize Attributes under the Cols menu.

standardize attributes

In the Attributes drop-down list, choose Format. This makes the Format settings available for modification.

format attributes

Change the format from Best to Fixed Dec. In the newly available Dec field, change the value from 0 to 3.

fixed decimal places

Notice that the first aliquot under Chip 3 now shows 0.420.

all columns have 3 decimal places
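(The same change can be made in one line of JSL per column – a sketch assuming the columns are named Chip 1 through Chip 4; the width argument of 10 is an arbitrary choice:)

:Chip 3 << Format( "Fixed Dec", 10, 3 );  // field width 10, 3 decimal places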

As you will see later in this blog post, the layout of this data set is not ready for analysis in JMP. Instead, let’s stack this data set so that all data values are in one column, and another column indicates which chip each value came from. We will later use the Fit Y by X platform in JMP, and it requires the data to be structured in this stacked format.

Under the Table menu, choose Stack.


Under Select Columns, choose all four chip columns, and then click Stack Columns. This will ensure that all four columns will be stacked. I have also entered the new names of the output table, the stacked data column, and the source label column.

stack platform - choose columns and set output table

Here is what the stacked data set looks like.

stacked data set
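For scripters, here is a hedged JSL sketch of the same Stack operation (again assuming the chip columns are named Chip 1 through Chip 4; the argument names follow the saved-script form I have seen, so verify against the Scripting Index):

stacked = Current Data Table() << Stack(
    Columns( :Chip 1, :Chip 2, :Chip 3, :Chip 4 ),
    Stacked Data Column( "Weight Percentage" ),
    Source Label Column( "Chip" ),
    Output Table( "Stacked Sodium Data" )
);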

The data are now ready for analysis! In the next blog post of this two-part series, I will use the Fit Y by X platform to visualize the data and analyze them using ANOVA.

I will conclude this series by interpreting the ANOVA results to answer our original question about comparing the variation in the two stages of our sample preparation scheme.

Stay tuned!


Harris, D. C. (2002). Quantitative chemical analysis (6th edition). Macmillan.


Making the world a better place: David Trindade on improving quality, reliability and more

David Trindade, Chief Officer of Best Practices at Bloom Energy, knows a lot about creating value. He is internationally renowned for his expertise in reliability analysis and has been honored with the IEEE Reliability Society 2008 Lifetime Achievement Award.

He co-authored a valued resource for reliability engineers, Applied Reliability, now in its third edition. He has extensive experience enhancing quality and reliability, and crafting innovative analytical solutions to a wide range of industrial applications.

His most recent work is leading quality and reliability initiatives at Bloom Energy, provider of greener, cleaner, more sustainable power. It’s really pretty incredible technology when you think about chemically made power, with “Bloom boxes” fueling customers like these to lower their energy costs, reduce their carbon footprints and improve their energy security. We may even have Bloom boxes powering our homes one day, thanks to the rocket science contributing to this innovative way of creating energy! You can learn about how Bloom Energy uses JMP in our customer success story.

David has great stories to tell about creating value at Bloom Energy and elsewhere, like these:

  • How quality initiatives took hold, spreading a culture of analytics more broadly.
  • How experimental design gave great insight to process improvement, delivering huge returns.
  • How motivating it has been for engineers to apply design of experiments (DOE), analysis and modeling on an ongoing basis to improve quality and reliability.

David has shared a great deal of his expertise as a consultant and instructor, and he continues to educate colleagues and students. He has been an adjunct instructor at Santa Clara University for more than 30 years. An advocate of DOE, he helps his students appreciate the practical usefulness of DOE, relaying a fun story about a student who designed an experiment to effectively remove chocolate stains from her kids' clothing.

We hope you will join us to hear the many interesting stories David has to tell on Analytically Speaking on Nov. 18. Or watch the on-demand version, along with other episodes of Analytically Speaking, at your convenience.


Reducing the time it takes to investigate Salmonella cases

Not only is rapid response by public health authorities to Salmonella cases comforting to the community, but it is also critical to containing the spread of the disease to reduce the likelihood of an outbreak.

“Time to investigation,” or how long it takes to investigate a reported Salmonella case, is a key performance metric for the Communicable Disease Prevention & Control program at Santa Clara County Public Health Department in California. To identify the root causes of “slow down” in the response time for Salmonella cases, the department put together a Six Sigma project team to tackle the problem.

I was fortunate to be able to work with the team on getting up to speed in using JMP to handle the analysis needs for the project. I saw firsthand how team members were able to use data to substantiate a change. During the measure and analyze phase of the project, the team established a baseline for time to investigation using control charts in JMP. For their baseline from January to March 2015, the average time to investigation for Salmonella cases was 23.8 hours.

To analyze the barriers significantly associated with time to investigation, the team used the Fit Model platform in JMP. From the baseline period, barriers that were significantly associated with time to investigation for Salmonella cases included problems locating the physician, the volume of reports and staff assigned to advice calls.

Changes were implemented in April 2015. By May 2015, the average time to investigation for Salmonella cases was 11.7 hours, roughly a 50% decrease. Team members were able to see and visually communicate the difference using control charts in JMP. The control charts are posted online.

The program is continuing the Six Sigma project by monitoring data for time to investigation and barriers to investigation. A report of this work was recently published on the Public Health Quality Improvement Exchange (PHQIX). PHQIX is an online community for public health professionals to share information about quality improvement in public health.

The full story is available online at PHQIX.


The QbD Column: Response surface methods and sequential exploration

George Box and K.B. Wilson introduced the idea of response surface methodology in a famous article[1] in 1951. There were several novel and extremely useful ideas in the article:

  1. Designed experiments can be a great tool in experimentally optimizing conditions.
  2. When feedback is rapid, there are great benefits to breaking up the experimental effort into a sequence of experiments, rather than trying to “learn everything at once”.
  3. The results of one experiment will often stimulate changes in strategy: new factors may be added, old ones may be dropped, factor ranges may move.
  4. The results may indicate that a more complex regression model is needed to adequately reflect the relationship between the factors and the outcomes.

A typical response surface study begins with a screening experiment to identify the most important factors. Small, orthogonal experimental plans and simple regression models are usually used for screening (see our second and third blog posts in this series). Subsequent experiments will depend on the results of the screening experiment. For example, factors that had small effects might be dropped from further consideration. Other factors might be added. The team might decide to shift the levels of some of the factors to get better results for the critical quality attributes (CQAs). If the results suggest that a first-order model is no longer a good fit to the data, the team expands the design to permit fitting a second-degree regression model. Box and Wilson proposed the central composite design for that purpose, and to this day, it remains a popular choice.

How is response surface methodology used in QbD?  

Many QbD studies are aimed at determining improved production conditions and are able to exploit the benefits of sequential experimentation. These studies can benefit from response surface methodology.

We illustrate the approach and some of the methods via a study reported by Xu, Khan and Burgess[2]. Their goal was to use designed experiments to improve the drug delivery system for a class of molecules by using liposome formulations. Such formulations were expected to bring benefits by improving the ability to target the activity of the molecule in the body. However, previous efforts had yielded methods that were not commercially viable, primarily because an important critical quality attribute (CQA), encapsulation efficiency, was too low.

The experimental team focused on three CQAs in this sequence of experiments: encapsulation efficiency (with a goal of at least 20%); particle size (with a target range of 100-200 nm); and storage stability at 4° C. They also carried out a risk analysis to decide which process factors to study, converging on a list of eight: lipid concentration; drug concentration; extrusion pressure; cholesterol concentration; buffer concentration; hydration time; sonication time; and number of freeze-thaw cycles.

What design was used for factor screening? 

The first experiment was aimed at finding the most important factors from the list of eight. The team wanted a small, economical experiment, and so chose to use a 12-run Plackett and Burman (PB) experiment. This is an orthogonal experiment that can screen up to 11 factors, each at two levels. Like most PB designs, the 12-run design does not completely alias two-factor interactions with main effects. Sometimes this has beneficial effects for screening, enabling detection of an especially large interaction when there are only two or three active factors. The team also added three center points, which enables a pure-error estimate of variability and a check on the need for pure quadratic terms.

It is easy to generate the PB design in JMP using the Screening Design platform within DOE. After entering the factors, check the box for choosing from a list of orthogonal designs, and the 12-run PB design appears at the top of the list due to its economical run size. After selecting this design, there is an option to add center points.

The team looked at three immediate outcomes from the experiment: encapsulation efficiency (EE); particle size; and zeta potential. In addition, each formulation was diluted and divided into samples for storage, half at 4° C and half at 37° C. These samples were tested at predetermined times (up to 24 months) for loss of active drug during storage.

What were the results of the data analysis?

The 15 experimental runs had EEs that ranged from 8.2% to 36.5%. Figure 1 shows a summary of the data analysis for EE. Lipid concentration was clearly the most important factor – increasing the lipid concentration led to higher EE. Drug concentration was also highly significant, with a negative effect on EE. None of the other factors reached statistical significance, and all had effects that were quite small compared to those of the two strong factors. The fit to the data was very good. The residual standard deviation was 0.87% (quite small compared to the range of EE results), and there was no evidence of lack of fit.

Figure 1. Factor effects on EE in the initial PB experiment.

The analysis of particle size focused on the mean particle size for each experimental run. These means were all between 160 and 176 nm, well within the acceptable range. None of the factors had strong effects on mean particle size, suggesting that this outcome is highly robust to the settings of the factors in the experiment. Particle size also varies within experimental runs, and the team measured the standard deviation. The SDs were all 3.1 nm or less, so that particle size was actually quite uniform within runs.

The zeta potential is a measure of the magnitude of the electrostatic or charge repulsion/attraction between particles and is known to affect stability. All the process runs in the PB design had zeta potential between 61 and 76. None of the factors had a statistically significant effect on this outcome.

Figure 2 shows the JMP Profiler for the three immediate responses. The strong effects of lipid concentration and drug concentration on EE are evident in Figure 2, as are the weak effects of all the factors on both mean particle size and zeta potential.

The study team was also concerned about storage stability. They assessed stability by measuring drug leakage over a two-year storage period. For the samples stored at 4° C, there was almost no leakage. So cold storage led to excellent stability regardless of the process factor settings. The samples stored at 37° C did suffer from drug leakage, with an average loss of 6.6% after just two weeks of storage. Xu et al. did not present the data on leakage. The summary analysis in their article shows that the factor with the strongest effect on leakage was the lipid concentration; the high setting of 120 for this factor led to a decrease from 6.6% to 3.4%. The use of higher lipid concentrations was thus helpful both for increasing EE and for improving storage stability. The primary conclusions of the storage analysis were that it is important to store at cold temperatures and, when that is done, the drug remains stable for up to two years regardless of the factor settings.

Figure 2. Profiler showing the factor effects on EE, particle size and zeta potential, from the main effects model.

Can we identify any interactions?

We pointed out earlier that one of the advantages of the PB design is that, when there are few active factors, it may be possible to identify an important two-factor interaction. One useful strategy is to look at all the interactions between two factors with active main effects, using stepwise regression to decide which interactions have the most predictive power. The drug delivery experiment found just two active factors, so the interaction between them is the only candidate. Figure 3 shows the results from fitting the corresponding model. The interaction is strongly significant, much more so than any of the other six main effects. This is a solid indication that there are second-order effects associated with the two dominant factors.

Note that the main effect estimates do not change from those in Figure 1 – this is because of the orthogonality of the Plackett-Burman design. When the design is projected onto two factors, as in this analysis, it has three replicates of each of the four combinations of those factors, so adding their interaction does not affect the orthogonality. However, when more factors and/or interactions are included, the model is no longer orthogonal, and the effect estimates will change.

Figure 3. Model for EE with main effects of lipid concentration, drug concentration and their interaction.
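For scripters, fitting this model takes only a few lines of JSL. A hedged sketch, with column names assumed from the article:

// EE model: the two active main effects plus their interaction
dt << Fit Model(
    Y( :EE ),
    Effects(
        :Lipid Concentration,
        :Drug Concentration,
        :Lipid Concentration * :Drug Concentration
    ),
    Personality( "Standard Least Squares" ),
    Run
);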

How did the study proceed?

The initial experiment clearly pointed to lipid concentration and drug concentration as the two process factors that affect EE. The team decided to focus on these two parameters in the second phase of the study, holding all the other factors at default levels. They wanted to explore the possibility of curvature in the relationship, and so chose to use a central composite design (CCD), which allows fitting a full second-order model. The CCD has three types of experimental runs. One set is a two-level factorial. Another set consists of center points. The third set has “axial points,” in which one factor is set at two extreme levels and all the others are at their center settings. The extreme levels can be the same ones used for the factorial points (in which case the CCD has factors at three levels) or can be different (in which case the factors have five levels).
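The full second-order model in two factors that the CCD supports is:

\[
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2
\]

where x_1 is lipid concentration and x_2 is drug concentration.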

In the drug delivery study, the team used five levels for each factor. When five levels are used, the axial points are usually set to levels outside those for the factorial points. So it is also common to extend the ranges of the factors. The experimental range for lipid concentration was stretched upwards from that used in the PB experiment (30-160 instead of 30-120), and the range for drug concentration was extended in both directions (0.5-8 instead of 1-5). The experiment included 12 runs: four factorial points, four axial points (two for each factor) and four center points.

Other strategies can also be used to augment a screening design and fit a higher-order model. When there are more than two or three active factors, it is often possible to find expanded designs that are smaller, hence more economical, than the CCD. One useful option is to apply the custom design platform in JMP for a full quadratic model. The I-optimal design is a good choice here, with small run size and good ability to estimate the response function throughout the factor space.

What were the results of the CCD?

The results for EE in the second phase of the study ranged from 9% to 41%. The lowest EE occurred when the lipid concentration had its low value of 30, matching the findings from the initial experiment. The CCD is specifically designed to let us fit a full second-order model using a relatively small number of experimental runs. That model includes linear and quadratic main effects for each of the factors and all two-factor interactions. We used JMP to fit the second-order model to the data from the drug delivery system experiment via the Fit Model menu. To construct the model effects, choose the two experimental factors, then click on the Macros option and choose Response Surface. Figure 4 shows the results.
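That macro builds the same effect list you would get from this hedged JSL sketch (the & RS marker is how saved scripts tag response-surface effects; column names are assumed):

dt << Fit Model(
    Y( :EE ),
    Effects(
        :Lipid Concentration & RS,
        :Drug Concentration & RS,
        :Lipid Concentration * :Drug Concentration,
        :Lipid Concentration * :Lipid Concentration,
        :Drug Concentration * :Drug Concentration
    ),
    Personality( "Standard Least Squares" ),
    Run
);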

Figure 4. Second-order model fit to the CCD data.

The two factors have strong linear effects, much as in the original screening experiment. The figures show scaled estimates and, because the factor ranges were wider in the CCD, the coefficients relate to the wider scale. The change in ranges results in larger coefficients in the CCD. Both factors also have significant pure quadratic effects. The interaction effect is similar in magnitude to that in the original experiment, but opposite in sign. Again, it is quite possible that this relates to the fact that the model is being fitted to a wider range in the factor space. The interaction effect does not achieve statistical significance here, and that is probably due to the small size of the factorial portion of the CCD. In this experiment, only four observations contribute to estimating the interaction; in the original PB experiment, all 12 observations were relevant.

The Profiler (in Figure 5) is very helpful for “combining” the linear and quadratic effects of the two factors. We can see that for both factors, the dominant effect is the linear effect. The estimated effects are monotone throughout the region of experimentation. The Profiler has both factors set at their optimal settings, within the range of the experiment, with an estimated level of EE equal to 46.9. The quadratic effects suggest that the effects are stronger at low levels of each factor than at high levels. Hence, higher levels of lipid concentration might produce even higher EE, but the marginal benefit is decreasing. Lower drug concentrations also generated higher EE, and there the gain from further reduction could be substantial. Of course, we always need to be careful about extrapolating outside the experimental range, and any proposal to use more extreme settings should be tested in further experimentation.

Figure 6 shows a contour plot of predicted EE levels based on the CCD. The highest predicted values are obtained with high lipid concentrations and low drug concentrations. Using high lipid concentrations gives good predicted values for the full range of drug concentrations that were included in the experiments.

Figure 5. Profiler plot for the CCD.


Figure 6. Contour plot for predicted EE using the model from the CCD.

How did the team verify the models?

To further test the validity of the model for predicting EE, the team ran a number of additional experiments, varying the settings of lipid concentration and drug concentration. For example, drug concentrations of 1% and 5% were matched with several lipid concentrations ranging from 30% to 150%. Some of these combinations are at the border of the region tested in the CCD. A number of other combinations within the domain of the CCD were also tested. The results showed excellent agreement between the new data and the predictions based on the CCD.

What did the team conclude?

The study team used the experiments to derive a design space for the specific active ingredient used in this project. They were convinced that the conclusions would be quite general and would prove relevant to many other hydrophilic agents. Thus, they saw these experiments as the springboard to developing effective procedures for many additional settings. The good results regarding storage at low temperature were also highly encouraging. They showed that the product could remain stable at low temperatures for up to two years, without the need to make adjustments that would complicate and add expense to the production process.

Next in this series

The next blog posts in this series will cover nonlinear designs, split-plot designs and robust designs, in the context of QbD.


[1] Box, G. and Wilson, K. (1951), On the Experimental Attainment of Optimum Conditions, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 13, No. 1, pp. 1-45.

[2] Xu, X., Khan, M. and Burgess, D. (2012), A quality by design (QbD) case study on liposomes containing hydrophilic API: II. Screening of critical variables, and establishment of design space at laboratory scale, International Journal of Pharmaceutics, 423, pp. 543-553.

About the Authors

This blog post is brought to you by members of the KPA Group: Ron Kenett, David Steinberg and Benny Yoskovich.

Ron Kenett

David Steinberg

Benny Yoskovich

