Welcome back from holiday!

We know the timing for our first-ever European Discovery Summit call for papers wasn’t ideal. Basically, it spans the entire summer holiday season. But luckily for all of the JMP champions in Europe, you have a bit of time left.

Before settling back into your regular work routine, take a few minutes to think about the ways you use JMP. Does it help you save energy, save time or save money? Do you use it to design efficient experiments? Does it help you communicate important statistical information to people who don’t otherwise get statistics?

Think about the value you see in JMP and ask yourself, “Would other JMP users find my application interesting or helpful?” If the answer is yes, then please share how you use JMP products by sending in a paper or poster abstract.

Here’s the process: Submit your abstract for a paper or poster by 19 Sept. Our Steering Committee – a group of JMP champions from various industries and European countries – will vote on the submissions in October. The authors of the selected papers and posters will be our guests at Discovery Summit Brussels, showcasing their work on 24 and 25 March 2015.

Need a little inspiration? Check out the paper abstracts that were selected for Discovery Summit 2014 at SAS world headquarters in Cary, North Carolina.

Don't wait any longer. Holiday is over. And this opportunity to shine in front of analytic leaders in industry and academia is serious business. You’ll find all you need to know on the European Discovery Summit call for papers page.


The eggciting results of my designed eggsperiment

In my previous blog entry, I talked about my frustrations in making good-looking hard-boiled eggs that were easy to peel. My Internet searches found a number of different techniques that cooks said were essential to success, but I wanted to know which techniques were best. So I set up a designed experiment to study the factors that affect the qualities of a hard-boiled egg. Now for the results…

How Do I Analyze This?

The responses measured were peel time, attractiveness of the egg and ease of peel. (Photo by Caroll Co)

First off, it’s important to notice that the Custom Designer added a column called Whole Plots (which indicates a batch of eggs that went into a pot) to the data table. We need to include this column as a random effect to ensure an appropriate analysis. Fortunately, since we created this design in JMP, you’ll notice a "Model" script on the left-hand side of the data table. Selecting Run Script brings up the Fit Model dialog with the model we had specified in Custom Design, and with the whole plot random effect included properly. Alternatively, you could go through Analyze -> Fit Model. Before hitting the Run button, I like to change the emphasis to Effect Screening to look at the Profiler.
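
For reference, here is a rough JSL sketch of that model specification. It is not the script JMP generated for my table; the response and factor column names are assumptions based on the factor list from the previous post, and the two-factor interactions from the design would be added to Effects() in the same way (for example, :Cook Start * :Egg Age).

// Hypothetical column names; Whole Plots enters as a random effect fit by REML
Fit Model(
	Y( :Peel Time ),
	Effects( :Cook Start, :Egg Age, :Cooling Method, :Pre Cool Crack ),
	Random Effects( :Whole Plots ),
	Personality( "Standard Least Squares" ),
	Method( "REML" ),
	Emphasis( "Effect Screening" ),
	Run
);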

The Results
I must admit, I had reservations about whether or not any effects would show up as significant. The results surprised me.

Peel Time
There was so much variation in the peel times that nothing came out significant at the 0.05 level. However, the effect of the cooking method was quite large, and the lack of significance could be in part due to having only 4 denominator degrees of freedom. The results suggest that starting the eggs in cold water increases the peeling time of the egg.

[Image: parameter estimates for peel time]

Or, if you prefer something more graphical, here are the 95% confidence intervals for the estimates.

[Image: 95% confidence intervals for the peel time estimates]

If I ignore the split-plot structure and simply look at the box plots of the peel time for the 12 eggs from each cooking method, the boiling start method looks to be more promising in terms of peel time. Note that I can’t throw away the split-plot structure if I’m doing analysis; I’m using this more as a guide to see if there’s reason for further experimentation – particularly since most instructions for hard-boiled eggs start from cold water.

[Image: box plots of peel time for each cooking method]

Attractiveness
The attractiveness rating provided some interesting results. While there are some statistically significant results here, the largest effect is still the cooking start.

[Image: parameter estimates for attractiveness]

[Image: 95% confidence intervals for the attractiveness estimates]

The results suggest that the boiling start leads to more attractive eggs. We also notice that not cracking the egg before cooling leads to more attractive eggs. To get a better idea of what’s happening with the interactions, I like to use the Prediction Profiler.

If we open up the Profiler and select “Maximize Desirability,” we find the best predicted settings are a boiling start, an old egg, cold-water bath and no cracking before cooling. This is particularly interesting since the model suggests better results using a cold-water bath instead of an ice-water bath. At the very least, it suggests that I would want to have the cooling method in my next experiment.

[Image: Prediction Profiler for attractiveness]

Ease of Peel Rating
The ease of peeling showed one significant effect. If you’ve read through to this point, you can probably guess which effect it is. When you consider that there were only six whole plots, it’s a bit surprising that I found a difference based on the cooking method.

[Image: parameter estimates for ease of peel]

[Image: 95% confidence intervals for the ease of peel estimates]

Final Thoughts

The bottom line is that the best predicted settings were a boiling start, an old egg, cold-water bath and no cracking before cooling. When thinking about both attractiveness and ease of peel, the results suggest that I still want to study all four factors in the next experiment (with some additions as well).

This experiment has provided me with some real food for thought (pun intended). You may have noticed that I had nothing about the taste/quality of the eggs among the responses. In part, this was because it would have been difficult to keep track of, and I didn’t want to cut the eggs open right after they were peeled. That said, the eggs that we’ve eaten thus far have been better cooked than any batch I can remember preparing on my own in the last few years.

I also noticed that placing the eggs straight into boiling water led to cracks in some of the eggs (in three out of 12). This didn’t seem to affect the responses, but it is something that I would likely pay better attention to next time.

It certainly looks like there’s more experimenting to be done on this. In the future, I would increase the number of whole plots by using fewer eggs per batch and possibly add another level of randomization so that I can cook multiple batches of eggs at a time in different pots. This would also let me investigate some additional factors.

Thanks to everyone for the fantastic comments thus far, here and in the LinkedIn DOE group. Some additions for next time would be adding salt to the water and treating cooking/cooling time and temperature as continuous factors (which is made all the more interesting after discovering fascinating material such as this blog post about soft-boiled eggs).

What factors would you like looked at next time? Do you have any other ideas for experiments not involving eggs that you would like to see? Please leave a comment to let me know!


Using JMP Scripting Language (JSL) to collect iTunes use data

I was looking for good data to demonstrate the new Query Builder, which is coming in JMP 12 in March. The Query Builder is a modern SQL database query tool that helps users discover interesting data and make correct queries that are repeatable and sharable.  The data needed to be different from the sample data that ships with JMP. It needed to be a set of tables, which is how many of our customers store their data. And this set had to have a defined relationship among the tables, each describing a different topic of interest.

Where should I look for data that contained multiple topics and was rich enough to show how the software could speed up statistical discovery? It occurred to me that many of my co-workers use iTunes to listen to music and that they may be generating interesting data from their listening.

With a little research on the web, I found that iTunes generates an XML file to make music and playlists available to other applications.  This sounded like a good fit for JMP. And because I am a data analysis geek, looking at iTunes data in JMP seemed fun, and I could get the data I needed.

I wrote an XML parser for the iTunes file using JSL. I also wrote a brief survey in JSL to collect demographic data from the people who were willing to share their iTunes data. The project broke neatly into four parts:

  1. Reading XML with JMP
  2. Improving the performance of JSL using the Performance Profiler in the JSL Debugger
  3. Using the Application Builder to construct the survey
  4. Analyzing the iTunes data

Reading XML with JMP

We often get the question, “Does JMP open XML Files?” This is a tricky question; XML is a text file that contains information organized in an arbitrary tree of “tags.”

Let’s take a look at a simple XML example containing data for two variables, X and Y, with three rows of data.

Here's an XML data example:

<table name='fromxml'>
	<col name='x'>[1 2 3]
	</col>
	<col name='y'>[11 22 33]
	</col>
</table>

Here's a simple JSL example to parse the XML data:

example = load text file("xml data example.xml");
 
Parse XML
 ( example,
	On Element
	 ( "table", 
	    Start Tag( New Table( XML Attr( "name" ) ) ) 
	  ),
	On Element
	 ( "col", 
	    End Tag ( New Column( XML Attr( "name" ), 
            Set Values( Parse( XML Text() ) )   ) )
	  )
  );

Notice the three key elements used in the JSL above. The On Element syntax indicates which JSL expression to execute when a “table” or “col” element is found in the XML. Start Tag and End Tag indicate which JSL expression should be executed at the start of the “table” or “col” tag and at the end of the tag. In this case, a New Table expression is executed whenever a “table” tag is started, and a New Column expression is executed whenever a “col” tag is ended (by its closing tag). Running the New Column and Set Values at the end tag ensures that the values for the column have all been read before setting values.

Here’s the resultant JMP data table:

[Image: the resulting JMP data table]

Apple iTunes XML

Now let’s take a look at how Apple stores iTunes data in XML. The XML is in a format based upon Apple’s “plist” data structure. I found a document on the Apple website that gave the basics of the file elements.

Here is a snippet of the iTunes Library.xml file that is generated by my use of iTunes. The key tag is used as a generic trigger for what could be a column name or to indicate that a new track is coming.  The other tags (integer, string, etc.) denote observations in the data and give the types of data.

<key>2739</key>
<dict>
    <key>Track ID</key><integer>2739</integer>
    <key>Name</key><string>Fortune Plango Vulnera</string>
    <key>Artist</key><string>Carl Orff</string>
    ...
</dict>

Here is a short JSL program I wrote that seemed to work fine with a reduced set of my iTunes Library.xml file. To create the reduced set of data, I simply opened the .xml file (mine is 14MB) in a text editor and copied the first few tracks’ worth of data.

iTunespath = "~/Music/iTunes/iTunes Library short.xml";  
 
// Load the XML file at 'iTunespath' into memory
cd_file_contents = Load Text File( iTunespath );
 
//Create an empty data table to hold the raw data
raw dt = New Table( "iTunes Data raw" );
raw dt << New Column( "names", character );
raw dt << New Column( "Values", character );
 
//Parse the iTunes xml
Parse XML( cd_file_contents,
	On Element( "key", Start Tag( If( Row() > 0, raw dt << Add Rows( 1 ) ) ), End Tag( Column( raw dt, "names" )[Row()] = XML Text() ) ),
	On Element( "integer", End Tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	On Element( "real", End Tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	On Element( "date", End Tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	On Element( "data", End Tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	On Element( "string", End Tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	On Element( "true", End Tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	On Element( "false", End Tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	End Tag( row++ )
);  //end parse of the iTunes xml

Here is what the program produced:

[Image: the raw iTunes data table produced by the script]

Notice that there is some data at the beginning that I do not need, but I can clean that up later.

Now, the big question is, will the script scale to my 14MB file? Execution time, by my watch, was 1 minute 45 seconds.  In my next blog post, I will show how to use the JSL Debugger’s Performance Profiler to find out how to speed up my code.
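
As an aside, a stopwatch isn’t strictly necessary: a couple of lines of JSL can time the run for you. A minimal sketch (Tick Seconds() returns elapsed time in seconds):

start = Tick Seconds();
// ... run the Parse XML() script above ...
elapsed = Tick Seconds() - start;
Show( elapsed );  // execution time in seconds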


An eggciting designed eggsperiment

What's the best method for getting hard-boiled eggs that are easy to peel and attractive? (Photos by Caroll Co)

A typical scene in my kitchen: I make a batch of hard-boiled eggs with the hope of an easy peel and a beautifully cooked center. But when I sit down to enjoy my egg, I find that, sadly, it’s not so easy to peel – or I have discoloration around the yolk (or worse yet, sometimes both occur).

Here's how I've been preparing my hard-boiled eggs: I start with the eggs in a pot of cold water. Then, I bring the pot to a boil, remove it from the heat and cover the pot for 12 minutes. After a recent disappointing experience with both overcooked and hard-to-peel eggs, I decided to investigate further in a quest to make better hard-boiled eggs.

My Internet search revealed that almost everyone claims to have a foolproof way to make hard-boiled eggs, but a quick browse through comments shows mixed results. Some common themes and questions appear, so it sounded like the perfect opportunity to use a designed experiment to separate fact from folklore.

For a first try at this eggsperiment, my budget for runs was two dozen eggs – same size/brand, purchased two weeks apart. Perhaps in a future experiment, I will use more eggs, but I wanted the peeler (my wife) to be blinded from knowing how the egg was prepared. Since I wasn’t going to be doing the peeling, 24 eggs seemed to be the limit of asking for help from my wife. I also quailed at the thought of having to eat so many egg salad sandwiches in a short period of time.

While most cooking methods for hard-boiled eggs start with cold water, a recent blog post had me intrigued about putting the eggs directly into boiling water.

So I ultimately decided on the following factors to study:

  1. Cooking method (start with cold water or put into boiling)
  2. Age of the egg (purchased two weeks ago or newly purchased)
  3. Cooling method (ice bath or cold tap water)
  4. Pre-cool crack (yes or no)

The pre-cool crack indicates whether I cracked the egg before using the cooling method in factor 3. If you’re familiar with design of experiments, you may recognize that not all of these factors are equally easy to change. For factors 2-4, I can assign these on an egg-by-egg basis (that is, they’re easy to change). For the cooking method, it is much more convenient if I cook more than one egg at a time. Thus, cooking method is a hard-to-change variable, or whole plot variable in the parlance of split-plot designs.

This means that the estimate of the effect of the cooking method is based on the number of batches I cook rather than the number of eggs. I ultimately decided on six batches of four eggs, or six whole plots. While this gives me only three batches for each cooking method, I hoped that I would get at least some indication whether changing the cooking method mattered. For the easy-to-change factors, I’m more likely to detect the important effects because of the number of eggs I have.

For the cooking method, I cooked one batch at a time in the same pot, using the same amount of water (2 cups) for each batch. For the cold-water start, I heated the pot on medium until the water reached 188 degrees Fahrenheit, at which point I turned off the heat and covered the pot for 10 minutes. For the boiling method, I waited until the water just started boiling and put the eggs in for 11 minutes, reducing the heat to medium so that the water was simmering.

The Responses

The responses I measured were peel time, attractiveness of the egg and ease of peel.

My main purpose here was to find out about ease of peeling, but there is still the aspect of whether or not a peeled egg is aesthetically pleasing. The final responses measured were:

  1. Peel time (in seconds)
  2. Attractiveness of the egg (rating from 1 to 5)
  3. Ease of peel (rating from 1 to 5)

While 1 and 3 seem similar, the peel time is likely to be very noisy and may not always pick up on frustration that can arise while peeling, which ease of peel should capture.

The Experiment

Now it’s time to design the experiment. The first step is to enter my responses and factors in the Custom Design platform, which is the first item under the DOE menu. We get something that looks like this:

[Image: responses and factors entered in the Custom Design dialog]

Notice that all of the factors are set to “Easy” under the Changes column in the Factors table. To change cook start to be hard-to-change, click on the “Easy” under the Changes column for the cook start factor and select “Hard” from the list that comes up.

[Image: changing the cook start factor to Hard in the Changes column]

If we click the Continue button at the bottom, it’s time to set up the rest of the design. By default, the model is set to be able to estimate the main effects. With 24 eggs, we should be able to look at two-factor interactions, so I select Interactions -> 2nd to have the Custom Designer ensure the design can estimate all the main effects and two-factor interactions.

[Image: adding two-factor interactions to the model]

Finally, we need to set up the appropriate run size. Recall that we want six batches of four eggs (24 eggs total). Under the Design Generation tab, this means we set the Number of Whole Plots to 6, and the Number of Runs to 24.

[Image: Design Generation settings with 6 whole plots and 24 runs]

Click the Make Design button, and the experiment is ready to go. The design will look something like this:

[Image: the final design table]

Any predictions as to the results? I’ll reveal the results next week.


Scagnostics JMP Add-In: A new way to explore your data

Scagnostics (scatterplot diagnostics) were introduced by John and Paul Tukey and later popularized by Leland Wilkinson in Graph-Theoretic Scagnostics (2005). These analyses were redefined in High-Dimensional Visual Analytics: Interactive Exploration Guided by Pairwise Views of Point Distributions (2006).

The beauty of scagnostics is the ability to visually explore a data set. JMP has a built-in Scatterplot Matrix (SPLOM) feature, which allows the user to simultaneously compare the relationships between many pairs of variables.

However, SPLOMs lose their effectiveness when the number of variables gets too large. Figure 1 shows a portion of the SPLOM report.

Figure 1. SPLOM for Drosophila Aging Data

Let's explore the Drosophila Aging data (in the JMP Sample Data folder), which has 48 observations and 100 numeric variables. Notice in Figure 1 the substantial number of variables in this data set. This can be overwhelming, and our ability to visually inspect the data suffers. In Figure 1, only about 15 percent of the actual SPLOM is shown. In a world where data sets are growing every day, we need to be able to extract meaningful information from the relationships between our variables. That’s where scagnostics comes in! Scagnostics assesses five aspects of scatterplots: outliers, shape, trend, density and coherence.
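
If you want to reproduce a small piece of the view in Figure 1, a SPLOM on a handful of the variables is a two-line JSL launch. This is only a sketch: the sample data file name is my assumption, and the column names are the ones mentioned later in this post.

// Open the sample data (file name assumed) and launch a small scatterplot matrix
dt = Open( "$SAMPLE_DATA/Drosophila Aging.jmp" );
Scatterplot Matrix( Y( :log2in_Tsp42Ej, :log2in_CG6372, :log2in_alpha_Cat, :log2in_CG3430der ) );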

This summer, I wrote a JMP add-in (which you can download from the File Exchange if you have a free SAS profile) that allows you to interactively explore data using nine graph-theoretic measures. The add-in combines three current features of JMP: Distribution, Scatterplot Matrix and Graph Builder. Each point in the scagnostics scatterplot matrix represents one 2D scatterplot of the original variables. When you select a point in the scatterplot matrix in the bottom left, Graph Builder shows the corresponding scatterplot for the two variables in the bottom right.

As an example, one point has already been selected in the SPLOM in Figure 2. The corresponding variables are log2in_Tsp42Ej and log2in_CG6372. For this pair of variables, there are two discernible clusters of data. This is noted in a high Clumpy value.

Figure 2. Scagnostics for Drosophila Aging Data – Clumpy Example

Figure 3 below shows us that if we select a point with a high Monotonic value, we can observe a clear association and a strong linear relationship between the variables log2in_alpha_Cat and log2in_CG3430der.

Figure 3. Scagnostics for Drosophila Aging Data – Monotonic Example

Another key aspect of Scagnostics is outlier detection. Review the Graph Builder plot in Figure 4 below. When we inspect the two variables log2in_CG18178 and log2in_BcDNA_GH04120, we see two data points that visually appear to be outliers. Results with a substantial outlying value, as well as a relatively high skewed value, support the notion that this pair of variables has major outliers overall.

Figure 4. Scagnostics for Drosophila Aging Data – Outlying Example

As we compare the original SPLOM report in Figure 1 to the recursive SPLOM and Graph Builder reports in Figures 2, 3 and 4, we uncover much more informative and enlightening analyses.

Now it’s time to download the Scagnostics Add-In and begin your own exploration!


JMP add-in measures distance between 2 points

JMP has many tools and features that allow you to interactively explore and analyze data. But what if you just want to measure the distance between two points? You could compute the distance with the standard distance formula, but what if the coordinates are latitude and longitude pairs? The distance formula would not be a lot of help then. Thanks to the extensibility of JMP, I was able to develop a new add-in to do one simple task: measure distance. The add-in, called Distance Tool, is an interactive tool that enables you to perform quick and effortless measurements.

In addition to Euclidean distance, the Distance Tool has various distance metrics for you to select from. The tool can compute the following (a rough sketch of a few of these formulas in JSL appears after the list):

  1. Euclidean distance
  2. Absolute difference between coordinate components
  3. Taxicab distance
  4. Great-circle distance
  5. Other various distance metrics
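
To be clear, the following is not the add-in's own code, just a sketch of a few of the underlying formulas written in JSL. The function names are mine, and the Earth radius of 6371 km is an assumed mean value.

// Points are (x1, y1) and (x2, y2); for great-circle distance they are
// (longitude, latitude) pairs in degrees.
euclidean = Function( {x1, y1, x2, y2},
	Sqrt( (x2 - x1) ^ 2 + (y2 - y1) ^ 2 )
);

taxicab = Function( {x1, y1, x2, y2},
	Abs( x2 - x1 ) + Abs( y2 - y1 )
);

// Great-circle distance via the haversine formula, returned in kilometers
great circle = Function( {lon1, lat1, lon2, lat2},
	{dLat, dLon, a, r},
	r = 6371;  // assumed mean Earth radius, km
	dLat = (lat2 - lat1) * Pi() / 180;
	dLon = (lon2 - lon1) * Pi() / 180;
	a = Sine( dLat / 2 ) ^ 2 +
		Cosine( lat1 * Pi() / 180 ) * Cosine( lat2 * Pi() / 180 ) * Sine( dLon / 2 ) ^ 2;
	2 * r * ArcSine( Sqrt( a ) )
);

Show( euclidean( 0, 0, 3, 4 ) );                  // 5
Show( great circle( -78.8, 35.8, -0.1, 51.5 ) );  // Cary, NC, to London: roughly 6,200 km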

The tool first finds all graphs that contain objects with measurable distances (Figure 1). The graphs are then assigned a unique key value based on the window title and position. You can then make measurements in the current graph by simply clicking and dragging (Figure 2).

Figure 1: The original graph

 

Figure 2: Euclidean measure between two points

Now what if you have a graph with latitude and longitude as its axes? No problem. The Great-Circle metric allows you to measure geographic distances between latitude/longitude coordinates (Figure 3).

Figure 3

With the tool’s custom scale feature, you can even set your own scale for graphs with arbitrary axes.

Figure 4

The tool even allows you to trace out a path or polygon shapes, as in the image of an animal footprint (Figure 4).

All the measurements are recorded in separate data tables to give you the ability to store, analyze and organize the information you want (Figure 5).

Figure 5

The tool’s various options and features make it a powerful add-in for JMP.

You can download the Distance Tool add-in from the JMP File Exchange.


Tips for learning JMP Scripting Language (JSL)

After using JMP in my AP Statistics course this past year, I realized what remarkable software it is. With just a few clicks, JMP could help me complete my homework!

In addition to being a homework helper, JMP was capable of handling large data sets, executing every type of analysis or test I’d ever heard of, creating beautiful custom graphs, and much more for the advanced statistician. I was captivated by the software.

Having (self-proclaimed) proficiency in one computer programming language, C#, and brief exposure to Base SAS, I wanted to add to my arsenal of programming languages during my summer internship at JMP. JMP Scripting Language (JSL) was the perfect choice, incorporating both my love of JMP and my interest in programming. This summer, I used JSL to analyze the popularity of some of our marketing assets.

These are a few things I found useful during my plunge into JSL:

Experimentation with JMP: I took time to explore and play with the functions in JMP. That helped me learn some JSL syntax because many of the JSL functions are similar to or abbreviated versions of their JMP interface equivalents. 

Experience with Programming Languages: My knowledge of C# turned out to be helpful for learning JSL as the two shared several characteristics, including the use of loops and Boolean expressions. However, if you have not had exposure to other programming languages, you can still learn JSL.

Use of a Book: I read and replicated the code in the book Jump into JMP Scripting by Wendy Murphrey and Rosemary Lucas. The book explains the basics of what scripts are; how to obtain, run and edit scripts; and basic JSL statements. It also has frequently asked questions with correctly coded answers. All of this information helped jump-start my JSL learning. I used the FAQ section as exercises – reading, copying and running the code. This helped me memorize the syntax and its uses.

Scripting Guide PDF in JMP: Although I learned the basics of JSL from a helpful introductory book, I was still not quite ready to start writing scripts freehand. So I sifted through the Scripting Guide PDF book under the Help menu in JMP as my next step. The Scripting Guide provided the details I needed to ease into coding, specifically syntax rules.

Viewing, Editing and Experimenting with Scripts Created Using JMP Commands: Once I had a firmer grasp of JSL and felt ready to try writing scripts, I began by running analyses and viewing the scripts they generated. This let me learn by editing the scripts to change how my reports looked, adding functions to them and experimenting with new JSL syntax before I began writing scripts independently.

[Image: Capturing Scripts]
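
For instance, the script behind a simple analysis is often just a line or two. Here is a minimal sketch of the kind of thing you might capture and then edit, using the Big Class sample table (the script JMP actually saves will include more display options):

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
Bivariate( Y( :weight ), X( :height ) );  // a basic Fit Y by X launch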

Sample Data Sets: While I was editing scripts generated by JMP, I used the sample data sets that come with the software. The diverse sample data sets were perfect for trying out different analyses and experimenting with scripts.

JSL Syntax Reference PDF in JMP: When I felt ready to write scripts completely on my own, I used the JSL Syntax Reference PDF under the Help menu in JMP. It’s an excellent resource for learning and searching for JSL functions.

Searching the Help Index: Another helpful resource was the Help Index under the Help menu in JMP. I used it to learn more about the functionalities of JSL while writing scripts.

[Image: the Help menu in JMP]

Another resource I didn’t use but is helpful is the Scripting Index under the Help menu in JMP. It has a dictionary of all JSL syntax and also shows example code of how to use each function, which is useful for learning new functions.

After learning JSL, I believe I have a deeper understanding of how JMP works and the ways I can use JMP. It has been a very enjoyable experience for me, and hopefully for you, too!


John Sall on less data drudgery

One of the guiding principles for developers of JMP software is to keep the user “in flow.” They try to minimize the disruptions to the discovery process so you can stay focused on solving the problem at hand, rather than having to take multiple steps to overcome a data or analysis obstacle. The goal is to flow like water instead of drudging along.

This year, JMP celebrates 25 years of designing for an ever-smoother user experience. With 25 years of enhancements, we have many examples of capabilities that speed up discovery: one-click bootstrapping, Prediction Profiler, Assess Variable Importance, Fit Y by X, optimal designs, Graph Builder, Recode, Model Comparison — the list goes on and on.

The next version of JMP and JMP Pro, scheduled for release in March 2015, will bring more such clever capabilities to make your analytic journey even smoother. Since JMP 12 will be launched six months after this year’s Discovery Summit, John Sall, Co-Founder and Executive Vice President of SAS — and creator of JMP — will devote his keynote speech to providing a sneak peek of some of the new features. Not to give too much away, but here are a few things to pique your interest about what he might share:

  • Easier import, access and manipulation of data — including big data.
  • Many new data utilities to compress, bin and recode. (The substantial recode enhancements along with the Excel Import Wizard for the Mac are among my most-used new features lately.)
  • New modeling utilities — smart ways to explore and better handle outliers and missing data, new validation options and predictor screening.
  • New and enhanced analysis methods, some of which collapse several steps into one.
  • Easier sharing of analysis results and data movies.
  • Notice anything different about this JMP table of the keynote speakers at Discovery Summit 2014?

[Image: JMP table of the Discovery Summit 2014 keynote speakers]

Consider this a teaser list, as there are many more enhancements and capabilities coming in JMP 12 to decrease your data drudgery and augment the ways you can share insights. Hope you'll join us to hear from John Sall at Discovery Summit next month!


Reliability regression with binary response data (probit analysis) with JMP

Many readers may be familiar with the broad spectrum of reliability platforms and analysis methods for reliability-centric problems available in JMP. The methods an engineer will select – whether to solve a problem, improve a system or gain a deeper understanding of a failure mechanism – depend on many things. These dependencies could include whether the system or unit under study is repairable or non-repairable. Is the data censored, and if so, is it right-, interval- or left-censored? What if there are no failures? How can historical data on the same or a similar component be used to augment understanding?

I’d like to address a data issue specific to the response variable. The Reliability Regression with Binary Response technique can be a useful addition to the tools that reliability engineers or medical researchers use to answer critical business and health-related questions. When the response variable is simply a count of failures, rather than the much more common continuous measurement, alternative analytical procedures should be used. For example, say you are testing cell phones for damage from being dropped onto the floor. You might test 25 phones at each of several heights above the floor, e.g., 5 feet, 8 feet and so on, and simply record the number of failures (damaged phones) per sample set. In a health-related field, you might want to test the efficacy of a new drug at differing dosages, or compare different treatment types and record the patient survival counts.

The purpose of this blog post is to help you understand how you can perform regression analysis on reliability and survival data that has counts as the response. This is known as Reliability Regression with Binary Response Data, sometimes referred to as Probit Analysis. The data in Table 1 is a simple example from a class I attended at the University of Michigan a number of years ago. The study is focused on evaluating a new formulation of concrete to determine failure probabilities based on various load levels (stress factor). A failure is defined as a crack of some specified minimum length. Some questions we would like to answer include the following:

  • For a given load, say 4,500 lbs., what percent will fail?
  • What load will cause 10%, 25%, and 50% of the concrete sections to crack?
  • What is the 95% confidence interval that traps the true load where 50% of the concrete sections fail?

Table 1: Concrete Load Study

The data contains three columns. The Load column is the amount of pressure, in pounds, applied to the concrete sections. Trials is the number of sections tested, and Failures is the number of sections that failed as a result of crack development under the applied pressure. We will use JMP’s Fit Model platform to perform the analysis. Depending on the distribution you choose to fit, Table 2 below will help you select the correct Link Function and, if required, the appropriate transformation for your x variable.

Distribution    Link Function    Transformation on X
SEV             Comp LogLog      None
Weibull         Comp LogLog      Log
Normal          Probit           None
Lognormal       Probit           Log
Logistic        Logit            None
Loglogistic     Logit            Log

Table 2: Depending on your distribution, this table will guide you to the appropriate Link and Transformation selections in the Fit Model Dialog.

Open the data table and go to the JMP Analyze menu, then select Fit Model. Once the dialog window opens, select the Failures and Trials columns and add them to the Y role. Add Load as a model effect, then highlight Load in the Construct Model Effects dialog, click the red triangle next to Transform and select Log. Your model effect should now read Log(Load), as seen in the completed Fit Model dialog screen below. Select Generalized Linear Model for Personality, Binomial for Distribution (since we are dealing with counts) and Comp LogLog for the Link Function (since we are using a Weibull fit for this example).
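
For those who prefer scripting, here is a minimal JSL sketch of roughly the same launch. It is not a script captured from JMP, so the exact keyword spellings for the distribution and link options may differ slightly from what JMP generates; saving the script from a completed Fit Model dialog is the reliable way to get them. The Log transform is handled here with a formula column.

// Assumes the table has Load, Trials and Failures columns
dt = Current Data Table();
dt << New Column( "Log Load", Numeric, Formula( Log( :Load ) ) );

Fit Model(
	Y( :Failures, :Trials ),                 // events column, then trials column
	Effects( :Log Load ),
	Personality( "Generalized Linear Model" ),
	GLM Distribution( Binomial ),
	Link Function( Comp LogLog ),            // Weibull fit, per Table 2
	Run
);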

Figure 1: Completed Fit Model Dialog for fitting a Weibull in our example.

 

Next select Run. You will see the output in Figure 2:

Figure 2: Initial output with Regression Plot and associated output. Note the Log(Load) parameter estimate of 4.51 is the Weibull shape parameter.

So now let’s begin to answer the questions we posed at the beginning. To find out what percent of sections fail at a load of 4,500 lbs, go to the red triangle at the top next to the output heading Generalized Linear Model Fit. Select Profilers > Profiler. See Figure 3. Scroll down in the report window and drag the vertical red dashed line to select 4,500 for load, or highlight the load value on the x-axis and type in 4,500. You will see that at a load of 4,500 pounds, we can expect a 45% failure rate. The associated confidence interval may be of interest as well. With this current sample, results could range from as small as 29% up to as high as 65%.

Figure 3: Prediction Profiler with a load of 4,500 pounds.

 

Now, to find out what load will cause 10%, 25% and 50% of the concrete sections to crack, we again go to the red triangle at the top of the report and select Inverse Prediction. You will see the dialog in Figure 4. Type in 0.10, 0.25 and 0.50 to obtain results for 10, 25 and 50 percent, respectively.

Figure 4: Dialog for Inverse Prediction

Scroll down in the report, where you will find the Inverse Prediction output. See Figure 5. The predicted load value, in pounds of pressure, is 3,055 for the B10, 3,817 for the B25 and 4,639 for the B50. A corresponding plot, which includes a visual representation of the confidence intervals, is also provided.

Figure 5: Inverse Prediction output.

Finally, we would like to find the 95% confidence interval that traps the true load at which 50% of the concrete sections fail. Again, refer to the Inverse Prediction output in Figure 5. We find that the interval from a lower bound of 3,873 up to an upper bound of 5,192 pounds traps the true load at which 50% of the sections fail, with 95% confidence.

JMP has numerous capabilities for reliability analysis, with many dedicated platforms such as Life Distribution, Reliability Growth and Reliability Block Diagram, to name just a few. However, as you can see here, you can also perform other reliability and survival analysis methods using other JMP analysis platforms.


Combining city & state information on map in Graph Builder - Part 1

Showing a map within Graph Builder in JMP has become a popular way to visualize data. This is partly because you can color the geographic area of interest based on a variable in the data table (Figure 1).


Figure 1

Or you can plot cities as points if you have latitude and longitude information (Figure 2).


Figure 2

But what if you want to combine both?

A customer wanted to do exactly that. This JMP user was trying to show specific cities within states of interest while coloring those states by a particular property in a data table. On top of that, the JMP user wanted to be able to hover over a city to display its name and additional city information.

No problem! I’ll show you how. In my example, I'll use city pollution and population data from the Cities.jmp data set (found in the Sample Data under Help in JMP) and join it with some state-level crime data (total crime, in this case). The crime data comes from the CrimeData.jmp data set, which is also in the Sample Data directory in JMP. The goal here is to show the crime rate for each state in a given year and be able to see pollution levels for a given city in that state. The purpose is to explore a potential link between the two without plotting too much information.

The desired graph looks like this (Figure 3):


Figure 3

To create the desired graph, I will need to overlay the cities in their geographic locations as points on top of the states, while at the same time making sure that only the states are colored. To make the graph, you would do the following, in order (a JSL sketch of the finished graph follows the steps):

  1. Drag Latitude and Longitude to the Y and X areas, respectively.
  2. Drag State to the Map Shape Zone.
  3. Remove Smoother Graph Type by clicking its icon on the top Graph Type Toolbar.
  4. Drag State Total Crime Rate to the Color Zone.
  5. Drag and drop Points Graph Type onto the plot.
  6. Go to the Points section. Find the Variables subsection, click on the “…” button and uncheck Color and Map Shape (see Figure 4). This is needed to remove the coloring from the points and to allow them to center on their geographic coordinates instead of being centered on the state.

Figure 4
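
For reference, here is a rough JSL sketch of the finished graph. It is not a script captured from Graph Builder, so the element options are approximate; in particular, the step-6 settings (unchecking Color and Map Shape for the Points element) show up as extra options on Points() in a saved script, which is the easiest way to get them exactly. Column names follow the joined table described above.

Graph Builder(
	Variables(
		X( :Longitude ),
		Y( :Latitude ),
		Map Shape( :State ),
		Color( :State Total Crime Rate )
	),
	Elements( Map Shapes( Legend( 1 ) ), Points( X, Y, Legend( 2 ) ) )
);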

For presentation purposes, I need to remove the axes (they do not add any information here) and change the color gradient representing total crime rates to something sequential instead of divergent (so the information is displayed more effectively). Right-clicking on each axis and removing tick marks and labels gets rid of most of the axis. Next, I right-click on the center of the graph and go to Graph > Border to uncheck the left and bottom borders. If Latitude and Longitude still appear as axis titles, I can select the text and delete it. I now have the graph/map depicted in Figure 3, but I am not done yet.

I wanted to be able to hover over each city and see the city name and additional meta-data/information found in the other columns. To make this happen, I:

  1. Select the columns of interest on the data table.
  2. Right-click on one of the column headers and choose Label/Unlabel (see Figure 5).

Figure 5

When I hover the cursor over the city of interest, I get the information I want. I now have the desired output and behavior, as in Figure 3.

Now I can explore each city of interest without having to plot all the information on the same graph!

However, what if I wanted to show more information about the cities on the map? How would I show something like population size for each city and one of the pollution columns in the map without having to hover over each city? Stay tuned – the answer to these questions will come in a follow-up blog post.
