Proper and improper use of Definitive Screening Designs (DSDs)

In 2011, my colleague Prof. Chris Nachtsheim and I introduced Definitive Screening Designs (DSDs) with a paper in the Journal of Quality Technology. A year later, I wrote a JMP Blog post describing these designs using correlation cell plots. Since their introduction, DSDs have found applications in areas as diverse as paint manufacturing, biotechnology, green energy and laser etching.

When a new and exciting methodology comes along, there is a natural inclination for leading-edge investigators to try it out. When these investigators report positive results, it encourages others to give the new method a try as well.

I am a big fan of DSDs, of course, but as a co-inventor I feel a responsibility to the community of practitioners of design of experiments (DOE) to be clear about their intended use and possible misuse.

So when should I use a DSD?

As the name suggests, DSDs are screening designs. Their most appropriate use is in the earliest stages of experimentation when there are a large number of potentially important factors that may affect a response of interest and when the goal is to identify what is generally a much smaller number of highly influential factors.

Since they are screening experiments, I would use a DSD only when I have four or more factors. Moreover, if I had only four factors and wanted to use a DSD, I would create a DSD for six factors and drop the last two columns. The resulting design can fit the full quadratic model in any three of the four factors.

DSDs work best when most of the factors are continuous. That is because each continuous factor has three levels, allowing an investigator to fit a curve rather than a straight line for each continuous factor.
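
To make that structure concrete, here is a minimal numpy sketch. It builds a six-factor DSD from one standard (Paley) order-6 conference matrix; this is a sketch of the general fold-over construction, not necessarily the exact table JMP produces (row order and signs may differ). It also shows the column-dropping trick described above:

```python
import numpy as np

# One standard (Paley) order-6 conference matrix: zero diagonal, C @ C.T = 5*I.
C = np.array([
    [ 0,  1,  1,  1,  1,  1],
    [ 1,  0,  1, -1, -1,  1],
    [ 1,  1,  0,  1, -1, -1],
    [ 1, -1,  1,  0,  1, -1],
    [ 1, -1, -1,  1,  0,  1],
    [ 1,  1, -1, -1,  1,  0],
])

# A six-factor DSD: the conference matrix, its fold-over and one center run.
dsd = np.vstack([C, -C, np.zeros((1, 6))])   # 13 runs x 6 factors, levels -1/0/1

print(dsd.T @ dsd)             # 10 * identity: main effects are mutually orthogonal
print((dsd**3 == dsd).all())   # True: with levels -1/0/1, cubic terms equal main effects

# Four-factor study: drop the last two columns of the six-factor DSD.
dsd4 = dsd[:, :4]              # still 13 runs; supports the full quadratic model
                               # in any three of the four factors
```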

When is using a DSD inappropriate?

[Graph: power comparison of an optimal split-plot design vs. a DSD created by sorting the hard-to-change factor, wp.]

Here, the optimal split-plot design dramatically outperforms the sorted-factor DSD. See point 4) below.

1) When there are constraints on the design region

An implicit assumption behind the use of DSDs is that it is possible to set the levels of any factor independently of the level of any other factor. This assumption is violated if a constraint on the design region makes certain factor combinations infeasible. For example, if I am cooking popcorn, I do not want to set the power at its highest setting while using a long cooking time. I know that if I do that, I will end up with a charred mess.

It might be tempting to draw the ranges of the factors inward to avoid such problems, but this practice reduces the DSD’s power to detect active effects. It is better to use the entire feasible region even if the shape of that region is not cubic or spherical.
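
To illustrate the idea, here is a toy sketch of a constrained candidate set. The constraint and its coefficients are made up for illustration, but filtering out infeasible factor combinations like this is conceptually what a design tool must do when the feasible region is not a full cube:

```python
import numpy as np
from itertools import product

# Toy version of the popcorn constraint: with power and time scaled to
# [-1, 1], suppose (hypothetically) that power + time <= 1.2 must hold
# to avoid a charred mess.
levels = [-1, 0, 1]
candidates = np.array(list(product(levels, repeat=2)))   # 3x3 grid of (power, time)

feasible = candidates[candidates.sum(axis=1) <= 1.2]
print(len(feasible))   # 8 of 9 points survive; only the (1, 1) corner is cut
```

A design chosen from only the feasible candidates uses the whole irregular region instead of shrinking the factor ranges.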

2) When some of the factors are ingredients in a mixture

Similarly, using a DSD is inappropriate if two or more factors are ingredients in a mixture. If I raise the percentage of one ingredient, I must lower the percentage of some other ingredient, so these factors cannot vary independently by their very nature.

3) When there are categorical factors with more than two levels

DSDs can handle a few categorical factors at two levels, but if most of the factors are categorical, using a DSD is inefficient. DSDs are also generally an undesirable choice if categorical factors have more than two levels. A recent discussion in the Design of Experiments (DOE) LinkedIn group involved trying to modify a DSD to accommodate a three-level categorical factor. Though this is possible, it required using the Custom Design tool in JMP, treating the factors of the DSD as covariate factors and adding the three-level categorical factor as the only factor whose levels are chosen by the Custom Design algorithm.

4) When the DSD is run as a split-plot design

It is also improper to alter a DSD by sorting the settings of one factor so that the resulting design is a split-plot design. For the six-factor DSD, the sorted factor would have only three settings: five runs at the low setting, three runs at the middle setting and five runs at the high setting. Using such a design would make inference about the effect of the sorted factor statistically invalid.

5) When the a priori model of interest has higher order effects

For DSDs, cubic terms are confounded with main effects, so identifying a cubic effect is impossible. This is because each continuous factor has only the three levels -1, 0 and 1, so the cubic term x³ takes exactly the same values as the main effect x itself.

Regular two-level fractional factorial designs and Plackett-Burman designs are also inappropriate in most of the above cases, so they are not a viable alternative.

What is the alternative to using a DSD in the above cases?

For users of JMP, the answer is simple: Use the Custom Design tool.

The Custom Design tool in JMP can generate a design that is built to accommodate any combination of the scenarios listed above. The guiding principle behind the Custom Design tool is

“Designs should fit the problem rather than changing the problem to suit the design.”

Final Thoughts

DSDs are extremely useful designs in the scenarios for which they were created. As screening designs they have many desirable characteristics:
1) Main effects are orthogonal.
2) Main effects are orthogonal to two-factor interactions (2FIs) and quadratic effects.
3) All the quadratic effects of continuous factors are estimable.
4) No 2FI is confounded with any other 2FI or quadratic effect although they may be correlated.
5) For DSDs with 13 or more runs, it is possible to fit the full quadratic model in any three-factor subset.
6) DSDs can accommodate a few categorical factors having two levels.
7) Blocking DSDs is very flexible. If there are m factors, you can have any number of blocks between 2 and m.
8) DSDs are inexpensive to field, requiring a minimum of only 2m+1 runs for m factors.
9) You can add runs to a DSD by creating a DSD with more factors than necessary and dropping the extra factors. The resulting design has all of the first seven properties above and has more power as well as the ability to identify more second-order effects.
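
Properties 1) through 4) above are easy to check numerically. Here is a sketch for the 13-run, six-factor DSD, again built from one standard conference matrix, so the exact correlations may differ slightly from a JMP-generated table:

```python
import numpy as np
from itertools import combinations

C = np.array([[0,1,1,1,1,1], [1,0,1,-1,-1,1], [1,1,0,1,-1,-1],
              [1,-1,1,0,1,-1], [1,-1,-1,1,0,1], [1,1,-1,-1,1,0]])
dsd = np.vstack([C, -C, np.zeros((1, 6))])               # 13-run, 6-factor DSD

main = dsd
tfi  = np.column_stack([dsd[:, i] * dsd[:, j]            # all 15 two-factor interactions
                        for i, j in combinations(range(6), 2)])
quad = dsd ** 2                                          # 6 quadratic effects

R = np.corrcoef(np.column_stack([main, tfi, quad]), rowvar=False)

print(np.abs(R[:6, 6:]).max())    # ~0: main effects orthogonal to 2FIs and quadratics
second = np.abs(R[6:, 6:] - np.eye(21))
print(second.max())               # < 1: second-order terms correlated, never confounded
```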

In my opinion, the above characteristics make DSDs the best choice for any screening experiment where most of the factors are continuous.

However, I want to make it clear that using a DSD is not a panacea. In other words, a DSD is not the solution to every experimental design problem.


Better hard-boiled eggs with our eggsperiment

[Photo: peeled, hard-boiled eggs. We learned how to make even better hard-boiled eggs. (Photos by Caroll Co)]

In my post last week, I discussed our latest eggsperiment with hard-boiled eggs – and now it’s time for the results!

As a reminder, the hard-to-change factors were:

  • Cooking Start (Hot/Cold)
  • Cooking Time (10 minutes, 12 minutes, 14 minutes)
  • Salt (0 tsp, ½ tsp, 1 tsp)
  • Vinegar (0 tbsp, ½ tbsp, 1 tbsp)

And easy-to-change factors:

  • Egg (brown, white)
  • Cooling Method (cold water/ice water)

The response is the number of seconds it took my wife to peel the egg. The peeling order was randomized and had no effect when added to the analysis.

Before I present the model, if a picture is worth a thousand words, which peeled eggs would you prefer?

[Photo: the peeled eggs side by side]

Just as in the first eggsperiment, if there's one thing to take away from the results, it's that putting the eggs into already boiling water (the hot start) instead of starting them in cold water produced the nicest (and easiest-to-peel) eggs.

The Model

Looking at the main effects model, only the cooking start was significant, but there does seem to be more to the data when we look a bit deeper. For example, plotting peel time vs. cooking start and overlaying the amount of salt suggests that there's an interaction between cooking start and salt: higher salt has lower peel times with a hot start, but higher peel times with a cold start.

[Graph: peel time vs. cooking start, overlaid with salt amount]

Interactions involving cooking start are also the most natural place to look for significant interactions, since the main effect is so large. Sure enough, the interaction with salt (and cooking start*vinegar) is significant, and my parameter estimates are as follows:

[Table: parameter estimates for the model with interactions]

Not so easy to see what’s going on? Try this interactive HTML to see the results in a Profiler. I've also put the data set on the JMP User Community. Other than the obvious effect of cooking start, the biggest thing I noticed is that salt and vinegar have different effects depending on the cooking start (due to the interactions). Since I want to concentrate on where I observed the best results, I’ll focus on the hot cooking start. The addition of salt appears to reduce the peel time for the hot cooking start, although the effect is only marginally significant when analyzing the hot-start subset of the data. Although it wasn't significant, I also found it interesting that cooling the eggs by running cold water over them reduced the peel time. This was also observed in the first eggsperiment; if there were a difference, I would have expected the ice bath to be better. Maybe next time, I’ll cook a few more eggs to get a better sense.
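
For anyone who wants to reproduce this kind of fit outside JMP, here is a sketch using Python's statsmodels. The file and column names are hypothetical (assuming the JMP table was exported to CSV), and a random pot effect stands in for the whole-plot error term of the split-plot structure:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical export of the egg data: one row per egg, with a "pot"
# column identifying the whole plot each egg was cooked in.
df = pd.read_csv("eggs.csv")

# Mixed model: fixed effects for the factors and the two interactions
# discussed above, plus a random intercept per pot (the whole-plot error).
m = smf.mixedlm(
    "peel_time ~ C(start) * salt + C(start) * vinegar"
    " + cook_time + C(egg) + C(cooling)",
    data=df,
    groups=df["pot"],
).fit()
print(m.summary())
```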

Final Thoughts

This eggsperiment reaffirmed that using the boiling start is the way to go for our household. I’m willing to concede that I’m missing something in my methodology with the cold start, but the boiling start has been consistent in delivering easy-peel and well-cooked eggs. In the past, I’ve also been known to forget to remove the eggs from heat with the cold-start method until long after the pot has been boiling, so the boiling start is more forgiving in that respect.

We will also be adding salt with the boiling start method since it appears to be helpful with peeling cooked eggs. And, for the same reason, we will continue with our practice of running cold water over the cooked eggs.

Any suggestions for another experiment (eggs or otherwise)? The peeler has also told me that if I were to do another eggsperiment, she would much prefer easy-to-peel eggs. So for the eggs option, it’s in my best interest to keep on the boiling start path (I’ve heard steaming is a good way to go too…). Thanks for reading!


Top-rated conference papers and posters

One of the best reasons to attend a Discovery Summit conference is to learn from other JMP users. The conference features high-quality paper and poster presentations that show how using JMP helped to solve a business problem. The recent meeting in Amsterdam was no exception.

Fortunately, many of the Discovery Summit Europe presenters have posted materials related to their presentations in the JMP User Community. And this is something everyone has access to!

Attendees submitted ratings for papers and posters, and we recognized the highest-rated papers and posters during the final plenary gathering at the conference.

The top-rated and prize-winning posters and papers from Discovery Summit Europe 2016 are listed below. The first paper in each list is the winner for the category.

Best Contributed Paper:
  1. Test Time Reduction and Predictive Analysis Using Optimised Flow Based on D-Optimal Design, Principal Component Analysis and Hierarchical Component Analysis by Alain Gautier of Rockwell Collins
  2. Skeletons and Flying Carpets: A Step Beyond Profiles and Contours to Explore Multiple Response Surfaces by Christian Ritter of Ritter and Danielson Consulting
  3. Outlier Screening in Test of Automotive Semiconductors: Use of JMP Pro 12 Multivariate Analysis Platforms and Explore Outliers Utility by Corinne Bergès of NXP
Best Invited Paper (by a SAS employee): 
  1. Object-Oriented JSL – Techniques for Writing Maintainable/Extendable JSL Code by Drew Foglia
  2. Powerful Analysis of Definitive Screening Designs: Taking Advantage of Their Special Structure by Bradley Jones
  3. Run Program – The JMP Link to Other Programs by Michael Hecht
Best Poster: 
  1. Beating Complexity in Automated Method Qualification via Tailored Split-Split-Plot Design With JMP Pro by Davy van den Bosch, Zhiwu Liang and Pablo Moreno Pelaez of Procter & Gamble
  2. Analysis of a Waste Management Process Using Principal Components Analysis and Data Visualisation by Marco Reis of the University of Coimbra
  3. Using Physical and Computer Models to Teach DOE by Sam Gardner of Elanco

Check out these papers and posters as well as all the others in the JMP User Community.


You have less than a week to enter the call for papers

Last week, we heard from some amazing presenters at Discovery Summit Europe. Their topics varied greatly, but they all had at least one thing in common: They took the time to enter the call for papers.

Sending your abstract in is the first step, and it’s an easy one. But time is running out. You have until Monday, March 28, at 9 a.m. ET to take this step.

If you think you’d like to present at our upcoming Discovery Summit, which will be held Sept. 19-23 at SAS world headquarters, we hope you will answer the call for papers.

How do you do that?

All you have to do is submit an abstract – only 150 to 200 words are required – telling us how you are using JMP to solve real problems. You may opt to present a paper or a poster; the choice is yours.

Then leave the rest to the steering committee. In late April, committee members will select presenters from the pool of entrants. If you’re chosen, you’ll present at the conference and attend for half the price.

Paper and poster presentations are an important part of every summit. Your unique perspective on the use of analytics in your field could leave fellow Discovery Summit attendees inspired and better prepared to do their jobs and implement sound strategies.

Would it help to see examples of papers and posters that have been accepted? Visit the Discovery Summit series page in the JMP User Community and select a previous conference to see materials by paper and poster presenters.

We hope to hear from you! Remember: The deadline to submit an abstract is March 28 at 9 a.m. ET.


Eggstra! Eggstra! A new designed eggsperiment

[Photo: a child looking at a carton of eggs. Will you be boiling eggs this weekend? What's your best method? (Photo by Caroll Co)]

It’s been years since I last dyed Easter eggs, but this year we decided to give it a try. This means I’ve suddenly found the need to hard-boil some eggs. One of my favorite experiments I’ve blogged about was the eggciting eggsperiment, so it’s the perfect time to revisit it. That experiment changed the way we hard-boil eggs at home – we’ve been using the boiling start method ever since.

I received a number of suggestions (and even online calculators) after the previous eggsperiment, so this time I wanted to play around more with the cooking factors involved with the pot of water. Previously, the only hard-to-change factor was whether the eggs were placed into hot or cold water. We’ll investigate further this time, with hard-to-change factors:

  • Cooking Start (Hot/Cold)
  • Cooking Time (10 minutes, 12 minutes, 14 minutes)
  • Salt (0 tsp, ½ tsp, 1 tsp)
  • Vinegar (0 tbsp, ½ tbsp, 1 tbsp)

Each pot will have two cups of water. For a hot cooking start, cooking time begins when the egg is placed into boiling water. For a cold cooking start, cooking time begins when the pot is removed from the heat after the water boils. You can read more about the cooking methods in the previous eggsperiment and the links within it.

I also have two easy-to-change factors:

  • Egg (brown, white)
  • Cooling Method (cold water/ice water)

For those of you who are interested only in the results, you’ll have to wait until next time. If you want to know how I’ve designed the experiment, read on…

The Design Setup

There are some interesting considerations for this experiment. First, three of my hard-to-change factors are continuous, and I want each at three levels; the fourth (cooking start) is a two-level categorical factor. I could certainly enter additional effects in the Custom Designer to force three levels for the continuous factors, but it sure would be nice if the design for the whole-plot factors were a definitive screening design (DSD). Here is what a DSD would look like if I enter those four factors in the Definitive Screening Design platform, found under DOE->Definitive Screening Design:

[Screenshot: the four whole-plot factors entered in the Definitive Screening Design platform]

I end up with a 14-run design after choosing the “No Blocks Required” option.

But I still have the easy-to-change factors. To top it off, I only have 24 eggs, which isn’t nicely divisible by 14.

Hard-to-Change Covariates

Fortunately, in JMP 12 we added the ability to create a design with hard-to-change covariates. This means I can base the design for the whole-plot factors on a DSD (although I might not have equally sized whole plots) and have the easy-to-change factors assigned according to the design specifications.

Hard-to-change covariates?

With the 14-run DSD data table open, I load in the design the same way I usually do covariates. After launching Custom Design (DOE->Custom Design), under “Add Factor”, I find “Covariate.”

[Screenshot: choosing Covariate under Add Factor in Custom Design]

After I select the four hard-to-change factors and click OK, they are loaded in as covariates, and I can add the remaining two factors via “Add Factor”. My factors table looks like this:

[Screenshot: the factors table with four covariates and two easy-to-change factors]

Starting with JMP 12, I can change the role of the covariates by clicking an entry under the Changes column for one of them and choosing “Hard”. This automatically changes all the covariates to hard-to-change.

The Model

If I use a DSD for the whole-plot factors, I feel comfortable that those main effects will be minimally aliased with second-order effects among the whole-plot factors. However, I now have two new factors to consider. Ideally, I want a design that reduces the aliasing of second-order effects on the main effects but still offers the chance to detect a few active second-order effects. I used a model with the quadratic effects of the three continuous whole-plot factors, all the interactions involving one whole-plot and one sub-plot factor, and the interaction of the two sub-plot factors. This isn’t the final model I’ll use when fitting, but it should help me with model selection when it comes to analysis.

I chose alias optimality to create the design (from the red triangle menu, Optimality Criterion->Make Alias Optimal Design), although I had similar results with D-optimality. This helps ensure that my main effects are minimally aliased with second-order effects and that I have a reasonable chance of detecting any large second-order effects that occur.
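
For the curious, the alias matrix that alias optimality works on is easy to compute directly. The sketch below uses a deliberately extreme toy design, a regular 2^(3-1) fraction, where each main effect is completely aliased with a two-factor interaction (entries of ±1); an alias-optimal design pushes the analogous entries toward zero:

```python
import numpy as np

# Toy design: a regular 2^(3-1) fractional factorial with x3 = x1*x2.
x1 = np.array([-1, -1,  1,  1])
x2 = np.array([-1,  1, -1,  1])
x3 = x1 * x2

X1 = np.column_stack([np.ones(4), x1, x2, x3])   # fitted model: intercept + mains
X2 = np.column_stack([x1*x2, x1*x3, x2*x3])      # omitted two-factor interactions

# Alias matrix A = (X1'X1)^(-1) X1'X2: row i shows how each omitted term
# would bias the estimate of term i in the fitted model.
A = np.linalg.solve(X1.T @ X1, X1.T @ X2)
print(np.round(A, 3))   # each main effect fully aliased with one 2FI
```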

How Many Whole Plots?

I know my overall run size is constrained to 24 runs. Using 14 whole plots is a far cry from the six I used in the first eggsperiment, but I also have more factors in the whole plots. The first design I tried used 14 whole plots (i.e., the full DSD). I was quite happy with that design, but it would be nice to reduce the number of pots I need. I tried different numbers of whole plots and ultimately decided on 12. The biggest difference is a loss of power for the quadratic effects, but I’m really only expecting to detect quadratic effects if they are quite large. I also end up with two eggs per pot, which I prefer from the psychological standpoint of not having a pot of water with a single egg. Did I still maintain the nice properties of a DSD? Based on the color map on correlations, it was close enough for my liking.

[Color map on correlations for the 12-whole-plot design]

Next Time

Stay tuned for the results, but I’d be curious to hear how you hard-boil your eggs -- what did I miss this time? Thanks for reading!


Follow the live blog of Richard Wiseman keynote

Richard Wiseman is the Professor of the Public Understanding of Psychology at the University of Hertfordshire. He delivers a keynote speech at Discovery Summit Europe 2016 in Amsterdam titled "The Luck Factor."

Based on his book of the same name, Wiseman's speech outlines the principles of good luck: maximising chance opportunities; listening to lucky hunches; expecting good fortune; and turning bad luck to good – so that you too can improve your odds in life.

View the live blog of this speech.

See photos, tweets and other live blogs from the conference at jmp.com/live.


View the live blog of Douglas Montgomery speech

Arizona State University professor Doug Montgomery gives a keynote speech at Discovery Summit Europe 2016 in Amsterdam titled "The Flight of the Phoenix."

Montgomery, a professor of engineering and statistics, discusses why some people had come to believe that design of experiments was no longer of interest -- as well as the new developments and applications that have made it one of the most active and important fields of applied statistics.

View the live blog of this speech.

See photos, tweets and other live blogs from the conference at jmp.com/live.


Follow the live blog of David J. Hand keynote

David J. Hand is Senior Research Investigator and Emeritus Professor of Mathematics at Imperial College London. His speech at Discovery Summit Europe 2016 in Amsterdam this morning is titled "Playing in Everyone's Backyard."

The speech title is a reference to John Tukey's famous statement: “The best thing about statistics is that you get to play in everyone’s backyard.”

In this speech, Hand strongly encourages us to raise public awareness of the foundational role of statistics. He wants everyone to understand that “statistics is the science of extracting information from data,” and “it is thus through statistics that we understand the world, make better decisions and improve the human condition.”

View the live blog of this speech.

See photos, tweets and other live blogs from the conference at jmp.com/live.


View the live blog of John Sall keynote speech

SAS co-founder and JMP creator John Sall is giving a keynote speech this morning at Discovery Summit Europe 2016 in Amsterdam titled "The Design of JMP."

Sall says, "JMP was created in response to various forces, situations and opportunities. It evolved to address needs in unique ways. It got very good in certain areas."

The speech is the story of how it all came together over the last 27 years, from the first release to the latest version of the software.

View the live blog of this speech.

See photos, tweets and other live blogs from the conference at jmp.com/live.


#OneLessPie for Pi Day

Last week, I celebrated the progress we've seen in cleaning up wayward pie charts, and today I again use Pi Day as an opportunity to give the world #OneLessPie.

Since I am at Discovery Summit Europe in Amsterdam this week, I chose a European topic for my makeover. I started with the pie chart below from the Wikipedia page on the European Space Agency (ESA). The chart shows the breakdown of the budget by domain. The good news is that the wedges are ordered by size, but the bad news is that there are too many wedges to make sense of and the colors are saturated. And it seems the three wedges that precede the "Other" category all have the same brown color!

[Pie chart: ESA budget by domain, from the Wikipedia article]

The Wikipedia article contains a reference to the source of the data. I followed it, expecting to find a table. Instead, I found a 3D pie chart at the European Space Agency site. (Aside: I didn't realize until I visited esa.int that .int is a top-level domain for international organizations.)

[3D pie chart: ESA 2016 budget, from the ESA website]

At least the colors are nicer, and the wedges are labeled directly. But the 3D view distorts the sizes, and I can't figure out the order of the wedges. This is actually the 2016 budget; the Wikipedia page had only the 2011 budget, so I had to update some of the text as well.

Here is my replacement, a simple bar chart that uses length to more accurately encode the values and doesn't need multiple colors to distinguish the items.

[Bar chart: ESA 2016 budget by domain]

For Pi Day, I replaced the pie chart that had been on the European Space Agency page on Wikipedia with this bar chart. #OneLessPie

To create this simple bar chart, there were plenty of design decisions to make:

  • Should the axis be in euros or percent?
  • Should the bars be labeled with values?
  • Should I combine the tiny domains into an "other" group (note that they are invisible in the 3D pie chart)?
  • Should I include the asterisk from the original indicating "includes Programmes implemented for other Institutional Partners"? Or should I indicate that information with a different shade of bar?
  • Should I add grid lines?
  • Should I include last year's numbers?

In the context of Wikipedia, I opted for simplicity. But if this chart were part of an ESA report, for instance, I might choose differently on some of those questions. That is, the intended message affects our design decisions.
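
For readers outside JMP, here is a matplotlib sketch of the same kind of makeover. The domain names and values are illustrative placeholders, not the actual ESA figures:

```python
import matplotlib.pyplot as plt

# Illustrative numbers only -- not the actual ESA 2016 budget.
domains = ["Earth observation", "Navigation", "Launchers",
           "Science", "Human spaceflight", "Other"]
budget_meur = [1500, 950, 900, 500, 420, 800]

# Sort so that length, not wedge angle, carries the comparison,
# with the largest bar on top.
values, labels = zip(*sorted(zip(budget_meur, domains)))

fig, ax = plt.subplots()
ax.barh(labels, values, color="steelblue")   # one color; no legend needed
ax.set_xlabel("2016 budget")
ax.set_xticks([0, 500, 1000, 1500])
ax.set_xticklabels(["0", "500M€", "1000M€", "1500M€"])  # euro-formatted ticks
plt.tight_layout()
plt.show()
```

The JMP tip below achieves the same euro-formatted axis natively, without hand-setting tick labels.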

JMP Tip: How did I get the axis tick marks to say "500M€" instead of "500" or "500,000,000"? I could get the "M" part by using the new-with-JMP-12 "Precision SI" number format, but that doesn't go all the way. Instead, I added a Value Label column property with just these numbers in it, formatted the way I wanted, and the axis picked those up.

Happy Pi Day! I look forward to seeing your makeovers. Share them using #OneLessPie.
