New videos to grow your JMP skills available

New demos from live Mastering JMP webcasts are now available for viewing at your leisure. I have broken each webcast into two or three videos.

Registration is required to view the videos. Here's a hint: After you register, bookmark the page from which you launched the videos so that you can come back to the link as often as you like and view the videos without registering again.

Jami Hampton’s videos on Preparing Your Data for Analysis were recorded during the Jan. 20 Mastering JMP live webcast.

In the first episode, Jami covers the JMP data structure and how to import data. In the second episode, she demonstrates how to use column and row functions, including recoding columns, modifying column properties, selecting matching cells, adding value and column labels, compressing selected columns and stacking, splitting, joining, and sorting tables. In the third episode, Jami covers table functions and Tabulate, and demonstrates how to find and handle missing data, subset data, stack data, split data, sort tables by row or column, join tables, and use the JMP tabulate capabilities to group and summarize data.

Scott Wise’s videos on Exploratory Data Analysis and Dynamic Graphics were recorded during the Jan. 27 Mastering JMP live webcast. Wise uses a supply chain case study to show how one might use JMP to understand and correct late shipments impacting profitability.

In the first episode, Scott describes the case study and business problem, and then shows how to use JMP to examine relationships, patterns and outliers to gain critical insight into the problem. In the second episode, he shows how to use Distribution, Data Filter, Recursive Partitioning, Contingency Analysis, One-Way Analysis and more to uncover key variables impacting late shipments. In the third episode, he models the data to predict and provide information for correcting late shipments. He uses Fit Model, Parameter Estimates, Prediction Profiler and more.

The Supply Chain Late Orders Data Scott used is available for download from the JMP File Exchange.

Sam Gardner’s videos on Data Mining and Predictive Modeling were recorded during the Feb. 10 Mastering JMP live webcast.

In the first episode, Sam defines three goals of data mining: discovering patterns, uncovering relationships and building predictive models. He uses paper print banding data to uncover important factors that drive quality and oil field recovery data to identify important factors impacting the profitability of oil recovery efforts. In the second episode, Sam answers questions about the analysis techniques he used when mining the print banding data. In the third episode, he shows how to build predictive models using JMP regression, confusion matrices and decision trees. He uses JMP Pro bootstrap (random) forests and boosted trees to build predictive models and covers the use of training, validation and test data. He closes by briefly describing bootstrap aggregation (bagging) and boosting.

The three data sets Sam used are also available from the JMP File Exchange: Printing Manufacturing Data, Oil Recovery Factors, and Mobile Phone Customer Churn (Cancellation) Data.


JMP Genomics shows the future of biomarker discovery

The Biomarkers Congress is Europe’s largest biomarkers event showcasing case studies in biomarker discovery, and validation strategies and regulation. Its industry delegate list reads like a who’s who in the world of biomarkers, with representation from throughout the globe, for example, AstraZeneca, GSK, Merck, Roche and Takeda. It was fantastic to catch up with JMP customers at the conference over the past couple of days.

Effective use of biomarkers is seen as an increasingly important strategy as pharmaceutical companies look to find new business streams based around personalised medicine.

A common theme at the conference was the need to work with integrated data types. For example, researchers would like to be able to evaluate proteomic, metabolomic and gene data simultaneously and look for relationships between these different data. In addition, they would like to develop predictive models that consider all data types as candidate predictors as opposed to evaluating each data set separately.

JMP Genomics and JMP Clinical have tools that address these needs. The cross-correlation feature will look for relationships between all possible combinations of data, and the predictive modeling tools can accommodate categorical and continuous data as candidate predictors.

Doug Robinson, JMP’s in-field life sciences specialist, presented a talk on how to explore and link genomics and clinical data visually. He showed JMP Genomics and shared information about some of the new features coming in the next release, due out in May this year.


What price orthogonality?

On Feb. 8, my colleague, Professor Douglas Montgomery of Arizona State University, and I presented a webinar for the American Statistical Association. Our first demonstration dealt with designing an experiment for six factors each having two levels in 24 runs. One natural way to construct such a design would be to choose six columns from the orthogonal array discovered by Plackett and Burman, which has 24 rows and 23 columns (see below).

+++++++++++++++++++++++
−++++−+−++−−++−−+−+−−−−
−−++++−+−++−−++−−+−+−−−
−−−++++−+−++−−++−−+−+−−
−−−−++++−+−++−−++−−+−+−
−−−−−++++−+−++−−++−−+−+
+−−−−−++++−+−++−−++−−+−
−+−−−−−++++−+−++−−++−−+
+−+−−−−−++++−+−++−−++−−
−+−+−−−−−++++−+−++−−++−
−−+−+−−−−−++++−+−++−−++
+−−+−+−−−−−++++−+−++−−+
++−−+−+−−−−−++++−+−++−−
−++−−+−+−−−−−++++−+−++−
−−++−−+−+−−−−−++++−+−++
+−−++−−+−+−−−−−++++−+−+
++−−++−−+−+−−−−−++++−+−
−++−−++−−+−+−−−−−++++−+
+−++−−++−−+−+−−−−−++++−
−+−++−−++−−+−+−−−−−++++
+−+−++−−++−−+−+−−−−−+++
++−+−++−−++−−+−+−−−−−++
+++−+−++−−++−−+−+−−−−−+
++++−+−++−−++−−+−+−−−−−

Hold on a sec. What is an orthogonal array?

An orthogonal array is a matrix of symbols, in our case “+” and “–”. In each pair of columns, there are only four possible pairs of symbols: “+ +”, “+ –”, “– +” and “– –”. To be an orthogonal array, each pair of symbols has to occur equally often in every pair of columns. In the above array, each of the four pairs of symbols appears six times in every one of the 253 pairs of columns.
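The definition can be checked mechanically. Here is a minimal sketch in Python (not JMP's own scripting language) that rebuilds the 24-run array shown above by cyclically shifting its second row, then counts the symbol pairs in every pair of columns. The function names are illustrative only, and an ASCII "-" stands in for the minus symbol.

```python
from collections import Counter
from itertools import combinations

# Second row of the array above, with ASCII "-" in place of the minus symbol.
GENERATOR = "-++++-+-++--++--+-+----"


def plackett_burman_24():
    """Return the 24 rows: one all-plus row, then 23 cyclic right shifts."""
    rows = ["+" * len(GENERATOR)]
    row = GENERATOR
    for _ in range(len(GENERATOR)):
        rows.append(row)
        row = row[-1] + row[:-1]  # move the last symbol to the front
    return rows


def is_orthogonal_array(rows):
    """True if every pair of columns shows each symbol pair equally often."""
    if len(rows) % 4 != 0:
        return False  # four symbol pairs cannot be split evenly
    target = len(rows) // 4
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((row[i], row[j]) for row in rows)
        if any(counts[(a, b)] != target for a in "+-" for b in "+-"):
            return False
    return True


rows = plackett_burman_24()
n_column_pairs = len(list(combinations(range(len(rows[0])), 2)))
print(n_column_pairs, is_orthogonal_array(rows))
```

Running it reports 253 column pairs and confirms that each of the four symbol pairs appears six times in every one of them.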

Orthogonal arrays have great historical significance in the field of experiment design. Until the advent of computers, virtually every experiment design used an orthogonal array. There are two main reasons why early practitioners focused on orthogonal arrays. First, it is easy to calculate the effect of changing any factor: you just average all the responses for trials run at the + level of the factor and subtract the average of all the responses for trials run at the – level of the factor. Second, the estimated effect of any factor is statistically independent of the estimated effect of any other factor.
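To make the first reason concrete, here is a small Python sketch of that hand calculation. The four-run experiment and its response values are made up for illustration; they do not come from the webinar.

```python
def main_effect(levels, responses):
    """Average response at the + level minus average response at the - level."""
    high = [y for s, y in zip(levels, responses) if s == "+"]
    low = [y for s, y in zip(levels, responses) if s == "-"]
    return sum(high) / len(high) - sum(low) / len(low)


# Hypothetical two-factor, four-run orthogonal design and responses.
factor_a = ["-", "+", "-", "+"]
factor_b = ["-", "-", "+", "+"]
response = [10.0, 14.0, 11.0, 15.0]

effect_a = main_effect(factor_a, response)  # (14 + 15)/2 - (10 + 11)/2
effect_b = main_effect(factor_b, response)  # (11 + 15)/2 - (10 + 14)/2
print(effect_a, effect_b)
```

No matrix algebra is needed, which is why this approach was practical before computers.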

If orthogonal arrays are so great, why use anything else?

Orthogonal arrays are very useful, but they only exist for certain numbers of trials. For example, orthogonal arrays having two symbols in each column only exist when the number of trials is a multiple of four. If all the factors have two levels, an orthogonal array with 15 rows does not exist.

More important is the fact that although the main effects of each factor in an orthogonal array are independent, the two-factor interactions may not be. In the example that Doug and I presented, we wanted to be able to estimate both the six main effects as well as the 15 two-factor interactions. If you include the overall average, there are 22 unknown quantities that we want to estimate. Since there are 24 runs, it seems possible to fit this 22-term model using ordinary least squares.

Can you choose six columns from the Plackett-Burman design to fit this model?

It depends on which six columns you use. There are 100,947 ways you can pick six columns out of the 23. Of those, more than half (51,359) are incapable of fitting the two-factor interactions model. That leaves 49,588 ways of choosing the six columns that do allow for fitting all the two-factor interactions. Depending on which of those groups of six columns you choose, the average variance of the coefficient estimates can vary by a factor of more than 28. That is, among the six-column designs that can fit the model, the least desirable choice has an average coefficient variance more than 28 times that of the most desirable choice. So, to construct your design by picking six columns from the above array, you have to be very careful.
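The counts in this example are easy to reproduce. This short Python sketch uses only the standard library; the figure of 49,588 usable column choices is taken from the text, not recomputed.

```python
from math import comb

n_factors = 6
n_two_factor = comb(n_factors, 2)        # two-factor interactions
n_terms = 1 + n_factors + n_two_factor   # overall average + main effects + interactions

n_column_choices = comb(23, 6)           # ways to pick six of the 23 columns
n_can_fit = 49_588                       # quoted in the text, not derived here
n_cannot_fit = n_column_choices - n_can_fit

print(n_two_factor, n_terms, n_column_choices, n_cannot_fit)
```

The last number works out to 51,359, which is indeed a little more than half of the 100,947 possible choices.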

Are there other orthogonal design choices that are better?

I asked my colleague, Dr. Eric Schoen of the University of Antwerp, for help here. Eric is a world-renowned researcher in orthogonal design. He has constructed a catalog of all the statistically different orthogonal arrays having six columns and 24 rows. It turns out that there are only 1,350 of them. So, most of the six column choices from the 23 columns of the Plackett-Burman design were non-unique. Also, they are not exhaustive. It turns out that 20 of Eric’s orthogonal arrays were better than any of the six column choices from the Plackett-Burman design. The best of these was about 8% better than the best orthogonal array I found previously. Only 447 of the 1,350 unique orthogonal arrays could fit the model.

Can you do better at estimating all the effects of interest?

The answer is yes but only if you do not use an orthogonal array! I constructed a D-optimal design using the Custom Designer in JMP. Table 1 shows the comparison of the variance inflation factors (VIF) for the D-optimal design compared to the best orthogonal array.

Table 1 – Variance inflation factors for D-optimal design compared to the best orthogonal array.

Note that the VIF for every main effect and two-factor interaction is lower (better) for the D-optimal design than for the best of the orthogonal arrays.

The bottom line – don’t limit yourself by only considering orthogonal arrays.

Many investigators only consider orthogonal arrays when planning their experiments. This restriction comes at a price as the example from the webinar shows.

Finding the best orthogonal array actually required more experiment design expertise and more computing than finding the D-optimal design, and the resulting design was statistically inferior.

Table 2 – Six-factor 24-run D-optimal design for the main effects and two-factor interactions model.

 

Table 3 – Six-factor 24-run best orthogonal array for the main effects and two-factor interactions model.


Visualizing Eurozone debt crisis data

Like many people, I've been closely following news of the European debt crisis. While no one knows what would happen if any European country actually defaults, speculation is rampant. However,  I haven’t come across any visuals that adequately show the scope of the situation and how one country's default would affect the rest of Europe and the UK. I decided to use JMP to see whether data visualization would help me understand the debt crisis better, and I did find it useful.

I started by looking at the crisis from the debtor country perspective.  The map below shows the countries, colored by the size of their debt relative to their Gross Domestic Product (GDP). Then I sized each based on the total amount that each country owes. Notice that Greece, which does not owe a large dollar amount (relative to the other countries), has a deep red color – this indicates that it owes much more than its GDP, making it high-risk.

JMP map of debtor countries in Europe

I then turned to viewing the data from the creditor country perspective. The heat map below shows which countries have loaned money to which, and it gives a view of how risky a creditor country’s portfolio is. Reading the heat map vertically, from the X axis up, it clearly shows that Greece is the riskiest debtor country, followed by Italy, Portugal and Ireland. It also shows which countries Greece owes money to. Read it horizontally to see which creditor country might be in the most trouble if Greece defaults. France and Germany have the highest risk portfolios of all the European countries.

JMP heat map of creditor countries in Europe

There’s only one thing missing from the above graphs – the size of each debt owed to the creditor countries. To view that, I created a tree map in Graph Builder. The graph below shows the relative risk of each loan and the relative size of the loans.

Tree map about Eurozone debt crisis

We can see that France’s portfolio not only contains risky debt, but it also is owed more by Greece and Italy than any other creditor country. Germany is next on the risk portfolio scale, but I am most concerned about Portugal, which has some Greek debt, and Spain, which has substantial Italian debt.

Looking at these graphs, you can hypothesize about how bad things might get should Greece default. Both Italy and Portugal have loaned money to Greece. If Italy were to default, Spain would most likely be next, and France and Germany would feel some severe pain. Even if Italy were able to withstand a Greek default, Portugal might not be so lucky. And if Portugal defaults, Italy would once again be vulnerable, as would Spain.

All in all, it looks pretty daunting. As of this writing, Greece has met the European Union’s requirements for spending cuts and will receive more bailout money. But will it be enough? Are other countries poised for disaster? Just yesterday, Moody's Investors Service lowered its credit ratings on Italy, Portugal and Spain.

What do you make of these graphs? Have I missed anything? Any suggestions for other ways to look at the data?

Source: http://www.bbc.co.uk/news/business-15748696


Ten great things about JMP 10

With the JMP 10 release right around the corner, you might be asking yourself: What are the latest and greatest features that will be available to me when I install this new version? I found myself asking the very same question when I first installed the beta and subsequent early adopter versions while preparing the JMP 10 fact sheet and other JMP 10 product information materials.

I can tell you – this is a big release with many amazing additions that you will find both exciting and useful.  I found it hard to limit the “great things” about JMP 10 to 10 items  – since there are so many new features, improvements and analysis platforms – but “Ten for 10” does have a nice ring to it, so I’m going for it.

Before I start the list, the first thing you’ll notice about this release is just how fast it is. The JMP developers have further optimized support for multi-core CPUs, which makes JMP blaze through importing huge data files and working with data once it’s in a JMP table.

I have a proposition for you when you first install JMP 10: Search around your computer and find the largest data set you have and try using the Graph Builder platform to visualize it. You will be amazed how effortlessly JMP creates different visualizations as you drag-and-drop your data into Graph Builder – even if it has tens of millions of records.

Ten great things about JMP 10

  1. The Graph Builder, the revolutionary drag-and-drop, interactive way to build a graphical analysis, has been greatly improved to make visual discovery faster. Icons are now provided to switch between graph types through a dedicated panel. Graph Builder also includes more customizations, and even the ability to launch the Fit Model platform to determine if a trend you see visually is statistically significant.
  2. The Local Data Filter is a data filter that you can add to many platforms that lets you filter and focus on specific categories without disturbing the original data table.
  3. The Column Switcher saves the hassle of having to repeat the same analysis over and over when you have many columns of data that need the same analysis performed on them. It’s easy to interchange columns, even if you have thousands of columns, by just adding the Column Switcher within a platform. The switch can be performed manually by clicking or using the arrow keys, or even animated.
  4. The Control Chart Builder is a drag-and-drop way of building control charts analogous to Graph Builder. The platform automatically configures the control chart depending on the type of data you are looking at. You can also drag grouping or phase variables and get a handle on your process control data very efficiently.
  5. The Reliability Forecast platform lets you analyze warranty return data to create reserve forecasts. It lets you perform what-if analyses adjusting production volume, forecast length and contract terms.
  6. The Reliability Growth platform lets you model the reliability of a single repairable system over time as improvements are incorporated into the design. This platform fits Crow-AMSAA models and also features a useful change point detection fitting procedure that automatically determines when phases of the reliability model have changed.
  7. The Measurement Systems Analysis (MSA) platform provides a method for assessing the variation in your measurement system and gauges. The platform was developed under the guidance of Dr. Don Wheeler and his book, EMP (Evaluating the Measurement Process) III: Using Imperfect Data. One of the great features in this platform is the ability to use the Shift Detection Profiler to explore the MSA-space and interact with your ability to detect a warning.
  8. The Nonlinear platform has a powerful new way of fitting curved data without the need to enter a formula or starting values beforehand. Simply select from one of the models in a rich library, which includes popular bioassay or pharmacokinetic models, and your data is fit automatically.
  9. The Partial Least Squares (PLS) platform has been greatly improved and now includes a richer set of graphs and reports.
  10. The Custom Designer includes numerous important improvements, which let you set up discrete numeric factor roles, and the number of runs is automatically updated when center-point and replicate runs are changed.  In addition to these improvements to Custom Designer, a new Evaluate Design platform lets you evaluate any JMP data table treated as a design, change model and alias terms, and see updated diagnostics.

So, there we have it! Ten great new features in JMP 10 that should make you very excited to get your hands on this release. If you’d like to see a preview of some of these new features and platforms in action – look for a brand new video overview of JMP 10 coming in the near future.


Embedding images in JSL scripts

Every now and then a question arises that spawns enough discussion that I think it is worthy of a blog post. The topic of discussion around JMP's virtual water cooler this time is the idea of including an image inside a JSL script. A similar discussion arose not long ago and prompted me to write a blog post called How to Add an Image to a Graph in JMP 9. But this time the initial question, and subsequent discussion, specifically asked about embedding the image in the script such that the script is not dependent on an external image file. An email thread started on Tuesday, and on Wednesday it found its way into my inbox. By Friday, I was still seeing more responses to the thread. I decided if there was that much interest within JMP, certainly there would be some interest throughout the JMP user community.

So can you embed an image within a JSL script so that the original image file is no longer necessary? The answer is, of course, yes. (If it was no I wouldn't be writing this blog post!) Ironically, the whole discussion about embedding images was going on at the same time I was writing my last blog post, Using images to add context to your data. So let's start where that one left off.

In that blog post, I used a JMP data table to generate a bivariate plot. The graph included arrows showing wind speed and direction. I then added an image to the graph. The image was a map of the area where the wind readings took place -- around Lake Michigan. If I wanted to save the graph, I could use the little-red-triangle (lrt) and select Script->Save Script to Script Window. This would write the JSL necessary to reproduce the entire graph. If I do that, I will see the following code. (This is only a portion of the actual code.)

Add Image(
      Set Blob(
            Char To Blob(
                  "625799eJM//9WNs7G2+0s6euR1e ... Ix0RcMtzpWCTfj/AXh1fKw=",
                  "base64compressed"
            ),
            "png"
      ),
      Bounds(
            Left( -90.96 ),
            Right( -84.02 ),
            Top( 44.97 ),
            Bottom( 39.04 )
      )
),

If you read How to Add an Image to a Graph in JMP 9, you will recognize the Add Image command. But in that example, I used a file reference, which meant that the JSL was dependent on an external image file. In this example, instead of a file reference, we are using a blob. The blob is a long string of character data that has been compressed using base64 compression. This is the smallest way to store an image as pixel data inside of a script. By the way, the ... in the middle of the string is to show that the majority of the character string has been removed for brevity. The actual string is much longer than what I've shown here. In addition to the compressed character data, the image type is also written out. This is so that when the script is read in, JMP knows how to interpret the character data stored in the blob. The nice thing about this is that I can generate the graph I want and then use Save Script to Script Window to get JMP to generate the compressed character data for me. I don't have to figure out how to do it myself. Nice!

At this point you might be wondering how big the script file is now. Well, in my case, the script without the image embedded is 1 KB. The image file is a PNG file and it is 26 KB. So the two files combined would be 27 KB. If I embed the image in the script file and save it to disk, the script file is now 34 KB. So the script containing the image is larger than the two individual files combined. But, I have to admit, I was surprised that it wasn't much larger. The PNG file is a binary file of compressed pixel data. The image embedded in the script is written out as ASCII characters. So I expected the script to be much larger. I suppose different images will have different results. In other words, your mileage may vary.
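For a rough sense of why text-encoded binary data grows, here is a sketch using Python's standard base64 module. It does not reproduce JMP's "base64compressed" format, whose details are not described here; it only illustrates the general cost of writing bytes as ASCII characters. The stand-in bytes are arbitrary, not a real PNG.

```python
import base64

# Arbitrary stand-in for binary image data (3,072 bytes).
raw = bytes(range(256)) * 12

# Encode to ASCII text that could be pasted into a script, then decode it back.
text = base64.b64encode(raw).decode("ascii")
restored = base64.b64decode(text)

overhead = len(text) / len(raw)
print(len(raw), len(text), round(overhead, 3), restored == raw)
```

Plain base64 writes every three bytes as four characters, so the text is about a third larger than the binary. That is roughly in line with the growth I saw, though other images and other encodings will behave differently.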

Now I can share my JSL script with others, and when they run it in JMP, they will see my graph, complete with the image. And they won't see the error message in the log that says the image file couldn't be found.

I hope you enjoy this little trick and, as always, please let me know what you think.


New in JMP 10 for experiment design: Evaluate Design

JMP 10 is coming in March. In my next few posts, I plan to share the main new capabilities in the area of experiment design. The most visible of these new features is the Evaluate Design item on the DOE menu.

What does the Evaluate Design feature do?

Evaluate Design allows you to see the design diagnostics for any data table as if it were a designed experiment.

Why would anyone want to do that?

I have heard two common reasons for wanting this feature. First, there are many designs in textbooks, and it is desirable to compare the capabilities of the textbook design to the algorithmic design that JMP produces. Second, it often happens that an analyst gets data from a colleague and wants to find out whether the data can adequately support various model possibilities.

How about an example?

The data in the table below was reported by Longley in the Journal of the American Statistical Association in 1967. The data is econometric data. The response, Y, is a measure of total employment. The columns X1 through X6 are the factors. You could probably guess, for instance, that X6 is actually calendar year.

Longley (1967) Highly correlated econometric data

One thing about econometric data is that the variables are often correlated. Of course, this data is not the result of a designed experiment. Nevertheless, we can use design diagnostics to show why it is really hard to determine which of the six factors is actually driving the response.

You can find the Longley data in the file Longley.jmp in JMP’s Sample Data folder. The first thing we see after entering X1-X6 as factors and Y as the response in the launch dialog is a Fraction of the Design Space Plot. Points on the blue line in the plot show the fraction of the volume covered by the data that has a relative variance of prediction less than or equal to the plotted value. For instance, the vertical line falls on the X-axis at 0.5. The horizontal line intersects the vertical line at a point on the blue curve and hits the Y-axis at around 100. The interpretation of this is that half of the volume of space covered by the data has a relative variance of prediction less than 100. Conversely, half of the volume also has a relative variance of prediction greater than 100.

Median Relative Variance of Prediction is ~100.

So is that bad or good?

Actually, it is terrible! There are 16 rows in the table. If you could control X1 through X6 and perform an optimal design, the worst relative prediction variance would be 0.4375. So, in half of the region of the Longley data we are doing more than 200 times worse in prediction variance than the most poorly predicted combination of factors from a well-designed experiment.
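One way to see where 0.4375 can come from: assume the optimal design is an orthogonal two-level design for the main effects model, with factors coded as -1 and +1 (that is an assumption of this sketch, not something stated above). Then X'X is 16 times the identity matrix, and the relative prediction variance at any point is the sum of squared model terms divided by 16. The Python below works that out; the function name is illustrative.

```python
from fractions import Fraction


def relative_prediction_variance(point, n_runs):
    """x'(X'X)^-1 x, assuming an orthogonal design so that X'X = n_runs * I."""
    x = [1] + list(point)  # the leading 1 is the intercept term
    return Fraction(sum(v * v for v in x), n_runs)


corner = [1, -1, 1, 1, -1, 1]  # any corner of the coded region gives the same value
center = [0, 0, 0, 0, 0, 0]

worst = relative_prediction_variance(corner, 16)   # 7 terms / 16 runs
best = relative_prediction_variance(center, 16)    # intercept only / 16 runs
ratio = 100 / float(worst)                         # median of ~100 read from the plot

print(worst, float(worst), best, round(ratio, 1))
```

Under that assumption the worst case is 7/16 = 0.4375 at the corners, and a relative variance of 100 is about 229 times larger.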

So prediction is bad – what about parameter estimation?

The table below shows the VIF (variance inflation factors) for estimating coefficients for the main effects model. For an orthogonal design, the VIF for every coefficient is 1. We can see that because of the poor design of the data, the variance of a coefficient estimate is as much as 2009 times larger than it would be for an orthogonal design.
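For readers who want to see the arithmetic, here is a pure-Python sketch of one common definition of the VIF: the diagonal of the inverse of the factors' correlation matrix. It is not the Evaluate Design implementation, and the two tiny data sets are made up for illustration rather than taken from the Longley table.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlations between every pair of columns."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        dev = [v - mean for v in col]
        norm = sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fails if singular)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """Variance inflation factor for each column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


orthogonal = [[-1, 1, -1, 1], [-1, -1, 1, 1]]    # uncorrelated factors
correlated = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 6]]  # nearly collinear factors

print(vifs(orthogonal), vifs(correlated))
```

The orthogonal pair gives a VIF of 1 for each factor. The nearly collinear pair gives a VIF of about 37 for each, even though only one value differs between the two columns.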

Big VIF values mean poor precision for parameter estimates.

What is the cause of these poor diagnostics?

Opening the Color Map On Correlations outline node in the Evaluate Design report reveals the figure below. It is easy to see that X1, X2, X5 and X6 are nearly perfectly correlated. High correlations among factors result in high coefficient variance and high prediction variance.

Extremely high (nearly 1) pairwise correlations for X1, X2, X5 and X6.

Economists have to deal with econometric data as it comes. Governments do not run designed experiments on national economies. Looking at these diagnostics we can appreciate why it is difficult to say why economic indicators move as they do.

One reason Longley wrote his paper was to emphasize the difficulties in interpreting multiple regression output with historical data like this. Happily, drawing conclusions from well-designed experiments is much simpler.


Using images to add context to your data

On more than one occasion, I have been asked why we added image functionality to JMP. After all, JMP is a statistical software package. What is the value of imagery and what can you do with images in JMP? Well, there are a number of reasons for adding the functionality, and there are lots of things you can do with images in JMP. The most basic reason, however, is simply that an image can add context to your data. Let me explain through an example.

Let's start, as we often do in JMP, with our data. I'll use a data table called Chicago Wind. The data, as the name implies, captures recordings of wind speed and direction. In this data table, there is a script that generates a Bivariate Fit and displays the wind data as arrows.

If you have ever watched a weather report on your local news channel, this type of graphic will be simple to understand. The arrows point in the direction of the wind, and the size of the arrow indicates the magnitude, or speed, of the wind. Along with this data, I was given an image file called windmap.png. The image shows a map of Lake Michigan and the surrounding area. It is covered with little dots, indicating the location of observation stations.

If this looks familiar to you, it may be because I demonstrated this example at the JMP Discovery Summit in September. I can drag and drop the image into my graph. I can then interactively reposition and stretch the image to fit the data. I can even right-mouse-click to get to the image submenu and apply some transparency to the image. This will soften the image enough such that it still is visible and adds context to my data without overpowering the data itself.

I now have a visualization that I can use to draw some conclusions about why the arrows vary in both size and direction. If you look back at the previous graph with no image, you see lots of arrows. But all you can tell is that the wind varies. There are not any cues as to why. Once you add the image, your graph has some context. This context helps explain why the wind varies so much in both direction and magnitude over a relatively small area. Do you know why?


Introducing definitive screening designs

In my two previous posts, I introduced the correlation cell plot for design evaluation and then showed how to use the plot to compare designs. Here, I want to use the same plot to show why definitive screening designs are, well, definitive.

For a complete technical description of definitive screening designs, you can read "A Class of Three-Level Designs for Definitive Screening in the Presence of Second-Order Effects" -- an article I co-wrote with Chris Nachtsheim of the University of Minnesota. Chris and I were delighted to learn recently we had won the American Society for Quality's 2012 Brumbaugh Award for our paper. This award is presented to the author(s) of the paper that has made the largest single contribution to the development of industrial application of quality control. The paper was published in January 2011 in the Journal of Quality Technology, and you can read it via the JMP website.

What is a definitive screening design?

The most notable way that definitive screening designs are different from standard designs is that all the factors are numeric and are tested at three levels. A second distinctive feature of a definitive screening design is that it is a self-foldover. That is, the runs of the design come in pairs that “mirror” each other. Suppose we encode the low setting of a factor as “–”, the high setting as “+” and the middle setting as “0”. Then, if one run of a foldover pair has factor settings encoded “+ 0 – + – +”, the other run has factor settings encoded “– 0 + – + –”. Each pair of runs has one factor at its middle value and all the others at their high or low values. One run is at the center of the design region with all the factors at their middle setting. Table 1 shows a definitive screening design for eight factors. Notice that it has one more than twice as many runs as there are factors, that is, 17 runs.
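The foldover idea and the run count are simple enough to sketch in a few lines of Python. This does not construct a full definitive screening design; it only mirrors a single run and counts runs, using illustrative function names and an ASCII "-" for the low setting.

```python
MIRROR = {"+": "-", "-": "+", "0": "0"}


def foldover(run):
    """Mirror a run: swap high and low settings, leave middle settings alone."""
    return [MIRROR[s] for s in run]


def dsd_run_count(n_factors):
    """One foldover pair per factor, plus a single center run."""
    return 2 * n_factors + 1


run = "+ 0 - + - +".split()
mirrored = foldover(run)

print(" ".join(mirrored), dsd_run_count(8))
```

For the run in the text this gives “– 0 + – + –”, and for eight factors it gives the 17 runs of Table 1.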

Table 1 Definitive screening design for eight three-level factors.

So what makes this design so special?

To see why the design in Table 1 is fantastic, let us use the correlation cell plot in Figure 1. Our potential model terms are all the main effects, two-factor interactions and quadratic effects. Note that only the cells on the diagonal of the plot are pure red. That means that none of the model terms are confounded with each other.

Figure 1 Correlation plot for definitive screening design.

The last eight columns of the cell plot show the quadratic effect terms. These effects are only mildly correlated with each other (|r| = 0.19). Each quadratic effect is uncorrelated with a two-factor interaction involving its factor. That is, the quadratic effect of factor A is uncorrelated with the AB interaction. Other two-factor interactions have an absolute correlation of 0.37. It turns out that all eight quadratic effects are estimable with the definitive screening design. The main effects of the design are all orthogonal to each other and to all the second order terms (two-factor interactions and quadratic effects).

The two-factor interactions have pairwise correlations that can take one of three values. The pink cells represent absolute correlations of two-thirds. The light blue cells represent correlations of only one-sixth. The pure blue cells show uncorrelated interaction pairs.

Let us compare this design and plot to the standard screening design for eight factors. That design is the minimum aberration fractional factorial design. This design is in Table 2, which has one added center run to make both designs have 17 runs including one center run.

Table 2 Standard screening design with one center run.

Figure 2 shows the cell plot for the fractional factorial design. The most notable feature of this cell plot is that all the cells are either pure blue or red. That is, every pair of columns is either completely uncorrelated or completely confounded.

Figure 2 Correlation plot for the standard screening design.

Note the block of red cells in the lower right. These red cells indicate that all the quadratic effects are confounded with each other. With one added center run, the standard screening design has some ability to detect very strong nonlinearity in the factor/response relationship. However, there is no way to determine which factor is causing the nonlinearity. By contrast, the definitive screening design can separately estimate the nonlinear effect of each factor.

Each two-factor interaction in the fractional factorial design is confounded with three other two-factor interactions. This means that if any two-factor interaction is active, the analysis can only indicate that there are four possible interactions that could explain the observed effect. Narrowing down this field to one interaction requires further experimentation. By contrast, the definitive screening design can reliably resolve any two-factor interaction that is large compared to its standard error.

Why are definitive screening designs definitive?

The purpose of screening is to separate the vital few factors that have a substantial effect on the response from the trivial many that have negligible effects. If a factor’s effect is strongly curved, a traditional screening design may miss this effect and screen out that factor. If there is a two-factor interaction, standard screening designs having a similar number of runs to the definitive screening design with the same number of factors will require follow-up experimentation to resolve the ambiguity. The definitive screening design can reliably accomplish the task of screening even if there are a couple of second order effects.


Top 10 JMP Blog posts of 2011

The numbers are in from my SAS colleagues, and I have a list of the top 10 JMP Blog posts of last year. The ranking is based on total number of pageviews.

What's interesting is that almost all of the posts on this list were published in years prior to 2011. But apparently, they have continuing appeal.

Without further ado, here are the top blog posts of 2011:

  1. What Good Are Error Bars? (2008)
  2. Saving Graphs, Tables and Reports in JMP (2010)
  3. The Best Karts in Mario Kart Wii: A Mother's Day Story (2008)
  4. How to Make Tornado Charts in JMP (2008)
  5. Principal Variance Components Analysis (2010)
  6. JMP Into R! (2010)
  7. Keyboard Tips CTRL/Click, ALT/Click, CTRL/ALT/Click (2010)
  8. Solar Panel Output Versus Temperature (2009)
  9. How to Make Practical Sense of Data and Win a Book (2011)
  10. Set the Right Selling Price for Christmas Cookies (2008)