Creating a JMP 12 Recode script from a Lookup Table containing original and new values

I recently used a JMP add-in I wrote to import my complete set of BodyMedia FIT food log data files, including data from Dec. 21, 2010, through March 29, 2015, the last day I logged my meals in that software. My final data table contained 39,942 rows of food item names. When combined with the 60 days (1,551 rows) of data from the MyFitnessPal food log I have been keeping since I switched devices from BodyMedia to a FitBit Charge HR late last March, I have nearly 41,000 rows of food log data!

One essential step in my data preparation process for this data table has been to clean up the food item names by consolidating similar items under a single value. Without this cleanup step, I end up with lots of unique item names that really represent the same food item. For example, I ate a variety of different dark chocolates and indicated the correct brand names where available, but often had to substitute, since many specific items were not available in the BodyMedia database. This made exact names even less meaningful, so I felt it made sense to aggregate all varieties under a single name (“Candy Bar, Dark Chocolate”).

I cleaned up the food log data table that I had presented at Discovery Summit last fall using the Recode platform in an early adopter install of JMP 12. At the time, Recode lacked the new Save/Load Script options that I now rely on for all my data cleanup projects. Instead, I recoded my items and created a lookup table that listed the original and recoded names for each unique item that appeared in my food log. I updated this lookup table each time I added new items to my food log. A section of my lookup table is shown below.

Lookup table

Before I left to attend the QS15 conference in San Francisco last month, I took advantage of JMP formula columns to create a script that could be reloaded into the Recode dialog to recapture my item groupings so that I could update them. This made it simple to recode new food item names and save the updated script in case I needed to tweak my work in the future.

Since I found this script creation trick so useful, I thought I would share it. If you have used a similar lookup table approach in the past, you may be thinking it would be a lot of work to recreate your approach in Recode. Using this example, you too can transition from a lookup table to a JMP 12 Recode script that you can reload in the Recode dialog and update to accommodate new data. In the end, you can easily recreate a new version of your lookup table from your final data table containing the original and recoded values.

I needed to structure my script to be identical to a standard Recode script, so I opened one I had saved for a related project. To save a sample Recode script, you can create some groups in an open Recode dialog window, then click on the platform’s red triangle and choose Script > Save to File.

I opened the JSL file containing the Recode script, and observed that it began with a call to begin updating the table, and a set of match statements that paired the original item name with the recoded name.

Top of script

At the end of the script, it included the column name and end to the data update.

End of script
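
Putting those pieces together, the overall shape of the script looks roughly like the sketch below. This is not an exact copy of a saved JMP 12 Recode script, just an illustration that mirrors the structure described above; the table name, column name and item names are illustrative.

    dt = Data Table( "Food Log" );
    dt << Begin Data Update;
    For Each Row(
        :Item = Match( :Item,
            "85% Cacao Dark Chocolate Bar", "Candy Bar, Dark Chocolate",
            "Dark Chocolate Squares", "Candy Bar, Dark Chocolate",
            :Item  // leave anything unmatched unchanged
        )
    );
    dt << End Data Update;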

To fill in the list of paired original and recoded names that fell between the beginning and ending sections, I created a new formula column in my lookup table containing quoted versions of the original item name and the recoded item name.

Recode Script Syntax
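
Here is a minimal sketch of that formula column, assuming the lookup table columns are named Original Item and Recoded Item. In JSL strings, \!" is an escaped double quote and || concatenates text.

    New Column( "Recode Pair", Character, "Nominal",
        Formula( "\!"" || :Original Item || "\!", \!"" || :Recoded Item || "\!"," )
    );

Each row of the new column then reads like "original name", "recoded name", which is exactly the pair format the Match statements expect.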

I copied and pasted the formula column into a script window, and then added the correct statements and variable names at the top and bottom of the script.

When I reloaded it in the Recode dialog, my script created groupings like this:

Recode reloaded script

I could then add to existing groups, make new ones or edit the representative group names. I used a similar approach to create a version of my script for recoding items into food groupings, since I had included food groups in my lookup table and wanted to group my new items.

One gotcha I encountered was that one food item name included a double quote character (“) that I had to replace so that it didn’t interfere with the quoted strings of the item names.

I hope you'll try out the JMP 12 Recode platform if you haven't already. Whether you are starting a new project or converting an old one to use the platform, I think you will be pleased!

Post a Comment

How to stack data for a Oneway analysis

The data you want to import into JMP often requires some manipulation before it’s ready to be analyzed. Sometimes data is arranged so that a row contains information for multiple observations. To prepare your data for analysis, you must restructure it so that each row of the JMP data table contains information for a single observation. In this example, you will see how to restructure data by stacking and recoding a data table in JMP. After the data is stacked, we can perform a Oneway analysis.

The data used in this example is called Fill Weights.xlsx, which is located in the Samples/Import Data folder installed with JMP. This data represents the weights of cereal boxes produced on three production lines. The goal is to stack the data in order to compare the results of the production lines and see whether they are producing approximately the same mean fill weight. Ideally, the mean fill weight of each production line will be close to the target fill weight.

In Fill Weights.xlsx, the production lines are arranged in three sets of columns. In your JMP data table, you need to stack the data from the three production lines into a single set of columns. This way, each row represents the data for a single cereal box.

The figure below shows the initial format of the data in Excel. The weights of cereal boxes are randomly sampled from three different production lines.

Figure 1.1 Excel Spreadsheet - Unstacked

The ID columns contain an identifier for each cereal box that was measured. The Line columns contain the weights (in ounces) for boxes sampled from the corresponding production line.

The target fill weight for the boxes is 12.5 ounces. Although you are interested in whether the three production lines meet the target, initially you want to see whether the three lines achieve the same mean fill rate. After the data is set up properly, you can conduct a Oneway analysis to test for differences among the mean fill weights.

Import the Data

To get started, first you must import the data. Select File > Open in JMP and select Fill Weights.xlsx from the Samples/Import Data folder.

In the Excel Import Wizard preview, row 1 contains information about the table, and row 2 is blank. The column header information starts on row 3. Also, rows 3 and 4 both contain column header information. Change the settings in the Excel Import Wizard so that the column headers start on row 3 and the number of rows with column headers is 2.

Here’s what the data table looks like once you’re finished editing in the Excel Import Wizard:

Figure 1.2 Imported Data Table

The data is placed in seven rows, and multiple IDs appear in each row. For each of the three lines, there is an ID and Weight column, giving a total of six columns.

Notice that the “Weights” part of the ID column name is unnecessary and misleading. You could rename the columns now, but it will be more efficient to rename the columns after you stack the data.

Stack the Data

Reshape the data so that each row in the JMP data table reflects only a single observation. This requires you to stack the cereal box IDs, the line identifiers and the weights into columns.

To do this, select Tables > Stack to place one observation in each row of a new data table. Because you are stacking two series, ID and Line, this is a multiple series stack. In the Stack window, select the Eliminate Missing Rows option to get rid of any rows with missing data. This is the completed Stack window:

Stack Window

The stacked data table contains columns labeled Data and Data 2. These columns contain the ID and Weight data. Delete the Label column since the entries were the column headings for the box IDs, which you don’t need in your table.
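
If you prefer to script this step, a rough JSL sketch follows. The imported column names and the exact Stack arguments are assumptions, so check them against a script saved from your own completed Stack window.

    dt = Data Table( "Fill Weights" );
    stacked = dt << Stack(
        columns(
            :Line A Weights ID, :Line A Weights Weight,
            :Line B Weights ID, :Line B Weights Weight,
            :Line C Weights ID, :Line C Weights Weight
        ),
        Number of Series( 2 ),           // multiple series stack: ID and Weight
        Eliminate missing rows( 1 ),
        Output Table( "Fill Weights Stacked" )
    );
    stacked << Delete Columns( "Label" );  // the old column headings are not needed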

To make the data table more understandable, rename each column by double-clicking on the column header. In this example, the columns are renamed as follows:

New column headers

As mentioned previously, you can exclude the “Weights” part of the Line column to make the table more readable. Click the Line column header to select the column and select Cols > Recode.

Change the values in the New Values column to match those in the figure below.

Recode columns

After recoding and selecting Done > In Place, your new data table is now properly structured to analyze in JMP. Now, each row contains data for a single cereal box. The first column gives the box ID, the second gives the production line, and the third gives the weight of the box.

Completed stacked data table

Conduct the Oneway Analysis

Now that your data is stacked, we can conduct a Oneway Analysis of Variance to test for differences in the mean fill weights among the three production lines.

To do this, select Analyze > Fit Y by X and assign Weight to Y, Response and Line to X, Factor. Once the plot is created, select Means/Anova from the red triangle menu.
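
The equivalent launch in JSL looks something like the sketch below; the option names are approximate, and it also turns on the box plots and Tukey HSD comparison used in the next steps.

    Oneway(
        Y( :Weight ),
        X( :Line ),
        Means( 1 ),      // Means/Anova
        Box Plots( 1 ),  // Display Options > Box Plots
        All Pairs( 1 )   // Compare Means > All Pairs, Tukey HSD
    );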

The mean diamonds in the plot show 95% confidence intervals for the production line means. The points that fall outside the mean diamonds are not outliers. To see this, add box plots to the plot. From the red triangle menu, select Display Options > Box Plots.

Box plots

Notice that all points fall within the box plot boundaries; therefore, they aren’t outliers.

Let’s look at the All Pairs, Tukey HSD comparison results. From the red triangle menu, select Compare Means > All Pairs, Tukey HSD. In the plot, click on the comparison circle for Line C. Here are the results:

Weight by line

In the Analysis of Variance report, the p-value of 0.0102 provides evidence that the means are not all equal. Compare the group means visually by examining the intersection of the comparison circles. The outside angle of intersection tells you whether the group means are significantly different. If the intersection angle is close to 90 degrees, you can verify whether the means are significantly different by clicking on a comparison circle to select it.

Groups that are different from the selected group appear as thick gray circles. Notice that Line C is selected and appears red (in JMP default colors), and Line B appears as thick gray. This means Line B is not in the same group as Line C; therefore, their means are significantly different. The mean for Line C differs from the mean for Line B at the 0.05 significance level. Lines A and B do not show a statistically significant difference.

In addition, the mean diamonds shown in the plot span 95% confidence intervals for the means. The numeric bounds for the 95% confidence intervals are given in the Means for Oneway ANOVA report. The plot indicates that the confidence intervals for Lines B and C do not contain the target fill weight of 12.5: Line B appears to overfill and Line C appears to underfill. For these two production lines, the underlying causes that result in off-target fill weights should be addressed. Perhaps equipment needs replacing, or maybe the lines need adjusting.

Whatever your case might be, if your data is structured similarly to the data used in this example, stacking it for a Oneway analysis is a great way to compare your results.

Post a Comment

Toy cars and DOE: The results

13 diecast cars of different colors

Find out what happens when my father and I use DOE to figure out a better way to dye toy cars. (Photo courtesy of Caroll Co)

Last time, I gave a Father’s Day tale of a father and son’s quest in dyeing toy cars. This time, I’ll share our results, but first remind you of the factors we studied:

  • Car: A/B/C/D
  • Dye type: Solid/liquid
  • Dye amount: low/high (2 Tbsp liquid/4 Tbsp liquid per half cup, or 1 tsp dry/2 tsp dry per half cup)
  • Length of time: 15 mins/30 mins
  • Dye color: red/blue/yellow
  • Vinegar: yes/no

The Response

When we were discussing this experiment, it wasn’t obvious how best to measure how “well” a car was dyed. One thought was to photograph the cars and measure the change on the RGB scale, but that would require a lot of work with photography, and we usually had a good sense of how well a car was colored just by looking at it. With a subjective response, the cars can all be compared side by side, so each score is relative to the others. In the end, we had three people give a forced ranking of the 17 cars, with 1 being the best and 17 the worst, where the basis of comparison was how close the car was to its advertised color. The final response was the average of these ranks.

The Cars

In setting up the analysis, it also became apparent that car should be a random effect instead of a fixed effect. If we dye cars in the future, we will most likely not be using the same castings used in this experiment, and we want to find a dyeing method that we can use in the future as new castings are released. When I go into the Fit Model platform, I can do this by selecting Attributes -> Random Effect with Car selected in the Model Effects list.
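
In JSL terms, the fitted model looks roughly like the sketch below. The response and factor column names are assumptions; the "& Random" attribute on Car is what makes it a random effect, fit by REML.

    Fit Model(
        Y( :Mean Rank ),
        Effects(
            :Dye Type, :Dye Amount, :Time, :Dye Color, :Vinegar,
            :Car & Random
        ),
        Personality( "Standard Least Squares" ),
        Method( "REML" ),
        Run()
    );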

The Results

It turned out that fitting the main effects model was enough for this data. I tried adding interactions, but nothing came up as warranting further investigation. The amount of dye and the length of time didn’t seem to make much of a difference, but using the solid dye showed a noticeable improvement, and the different dye colors had varying effectiveness. In particular, we found that the blue dye was the most effective, followed by yellow, and that it was difficult to get good coloring with red.

cardye3

Final Thoughts

My father and I were both surprised at the results, in that many others dyeing these cars have been moving toward the liquid form of the dye. Admittedly, this is in part due to convenience, as the liquid form is much more forgiving of splatters during mixing (a word to the wise – if you try this at home, use lots of newspaper). I think there’s still room for improvement, so I’ve started to pick up some cars for a future experiment. Do any readers have experience with dyeing objects of their own? Please leave me a comment and let me know your thoughts. Thanks for reading!

Post a Comment

Seeing differently with Beau Lotto

After a photo of “the Dress” went viral on the Internet this past February, JMP developer John Ponte wrote an entertaining and informative blog post, What color is The Dress? JMP can tell you!

We had shared this blog post with Beau Lotto, neuroscientist, human perception researcher and Director of Change Lab at University College London, before his keynote speech at our first European Discovery Summit this past March.

The Dress is still getting attention. The July/August issue of Scientific American Mind features the article Unraveling “The Dress”, in which a popular color perception example created by Beau Lotto and Dale Purves is shown to illustrate the importance of context in color perception. An interactive version of that image is available on lottolab.org.

While there is no mention of The Dress in Beau Lotto’s Discovery Summit plenary speech (until the Q&A), you can see and hear about his fascinating research on perception and innovation if you tune in to the July 15 Analytically Speaking webcast. As a result of watching his talk, perhaps you will:

  • See science as play and a way of being.
  • Welcome uncertainty (everything interesting begins with doubt).
  • Learn that innovation has two parts….

His talk is highly enjoyable for anyone who is fascinated by the workings of our brains (as I am). If you can’t catch the premiere of this webcast, you can always view it from the archive once it's available there, usually by the following day.

Post a Comment

How to create an axis break in JMP

I’ve been asked three times this year about how to make a graph in JMP with an axis break. Before I show how, I want to ask “Why?” The obvious answer to “Why?” is “to show items with very different values in one graph,” but that’s a little unsatisfying. I want to know why they need to be in one graph. The advantage of a graphical representation of data over a text representation is that we can judge values based on graphical properties like position, length and slope. However, once we break the scale, those properties are no longer as comparable. We effectively have two separate graphs after all – which is actually how we can make such views in JMP.

Related to my “Why?” inquiry, I’ve had a difficult time finding a compelling real-world example to illustrate an axis break, so I made some hypothetical data. Say we have timing values for a series of 100 runs of some process. Usually, the process takes a few seconds per run. But sometimes there’s a glitch, and it takes several minutes. Here’s the data on one graph (all on one y scale).

TimesWhole

We can see where the glitches are, but we can’t see any of the variation in the normal non-glitch runs. Some would also object to the “wasted” space in the middle of the graph. However, those aren’t necessarily bad attributes. The non-glitch variation is lost because it’s insignificant compared to the glitch times, and the space works to show the difference. Nonetheless, if our audience already understands those features of the data, we can break the graph in two to show both subsets on more natural scales.

TimesBroken

Now we can see that the non-glitch times are increasing on some curve. The “trick” in Graph Builder is to add the variable to be split to the graph twice in two different axis slots. Then we can adjust the axes independently, perhaps even making one of them a log axis. The Graph Spacing menu command adds the spacer between the graphs to emphasize the break. It’s easier to show than explain, so here’s an animated GIF of those steps.

playAxisBreak
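
For the script-minded, a rough Graph Builder JSL sketch of the same idea is below. The Run and Time column names are assumptions, and the graph spacing and independent axis ranges would still be adjusted as in the animation.

    Graph Builder(
        Variables( X( :Run ), Y( :Time ), Y( :Time ) ),  // same column in two Y slots
        Elements( Position( 1, 1 ), Points( X, Y ) ),    // one Points element per Y slot
        Elements( Position( 1, 2 ), Points( X, Y ) )
    );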

I skimmed a few journals looking for examples of broken axes. Here’s an example of a pattern I saw a few times for drug treatment studies where the short-term and long-term responses are both interesting. This graph is from Annals of Internal Medicine and shows two different groups’ responses to an HIV treatment.

HIV Treatment 1

Each side of the axis break uses different units of time, which fits perfectly with the idea that there are really two separate axes. One thing that bothers me about this graph, though, is the connection of the lines across the gap. Notice the difference in my JMP version:

HIV Treatment 2

With different x scales, the slopes should be different. That is, the change per week (slope on the left) should be flatter than the change per year (slope on the right) for the same transition. Fortunately, Graph Builder takes care of this for you, but it’s something to be aware of when you’re reading these kinds of graphs in the wild.

The broken line from the HIV study is an example of how an axis break can distort the information encoded by the graphic element. A more serious distortion occurs when bar charts are split by a scale break since the bars can no longer do their job of representing values with length. I’m not even going to show a picture of that. Never use a scale break with a bar chart.

When making graphs with scale breaks, make sure each part works on its own, because perceptually they really are separate graphs.

Post a Comment

Father's Day fun with toy cars and DOE

three customized diecast cars

My father and I have collected and customized diecast cars for the past 15 years. But lately, dyeing the cars has become a challenge. (Photos courtesy of Caroll Co)

With Father’s Day fast approaching, it seemed fitting that I should share a story about a father and son bonding over design of experiments (DOE) and toy cars. Full disclosure: Some (including their wives) think both the father and son in this tale are too old to be playing with toy cars.

My father and I began collecting diecast vehicles 15 years ago. To this day, whenever we go to a store that sells toy cars, it’s our first stop, regardless of what we’re shopping for. And both of us have bedrooms in our homes that have officially become toy rooms.

Back when we started collecting cars, my father and I would often customize vehicles using this simple method: We would get a vehicle that came painted white from the factory. We would prepare a popular fabric dye according to its package directions and leave the white car in the dye for 15 minutes. The car would come out looking great. But lately, when my father has tried dyeing some cars using this method, the results have been disappointing.

My parents were recently visiting us in North Carolina from their home in Canada. So my father and I figured, what better way to spend some time together than designing an experiment to see if we can find a new recipe?

The factors

Our initial thought was that it might be a matter of adjusting how much dye to use. However, there are other possibilities to consider at the same time. It may also depend on the color of dye or length of time in the liquid. The dye is now also available in both solid and liquid forms. We had used the solid form in the past, and my father had used liquid in his recent attempts. Some online searches also suggested that we should add vinegar. Of course, there’s also an issue with what car(s) to use. I was hoping to use four different castings, but it turns out that it requires some extra care, as I’ll discuss shortly.

To summarize, our factor list was as follows:

  • Car: A/B/C/D
  • Dye type: Solid/liquid
  • Dye amount: low/high (2 Tbsp liquid/4 Tbsp liquid per half cup, or 1 tsp dry/2 tsp dry per half cup)
  • Length of time: 15 mins/30 mins
  • Dye color: red/blue/yellow
  • Vinegar: yes/no

The need for covariates

As many collectors know, it’s not easy finding a particular vehicle in multiples when searching at stores. Not only are there few vehicles in white at any given time, it’s also extremely unlikely to find them in equal quantities. After enough searching, I was fortunate enough to find four different castings, with five of vehicle A and four each of B, C and D. These quantities are close enough to balanced that I could probably find a 17-run design and relabel as appropriate, but I would prefer this to be taken care of during the design phase.

Fortunately, this is easy to accomplish by entering the cars as a covariate. Sometimes we use covariates as a means of choosing optimal subsets, but they are also useful when you have an underlying set of design runs that you want the design to obey. All I needed to do in this case was to create a data table with a column for “car” and fill out 17 rows with five A’s and four each of B, C and D. When I go to DOE > Custom Design, under Add Factors there’s an option for Covariate.
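
Here is a minimal JSL sketch of that covariate table; the table and column names are just illustrative.

    New Table( "Cars On Hand",
        Add Rows( 17 ),
        New Column( "car", Character, "Nominal",
            Set Values(
                {"A", "A", "A", "A", "A",
                "B", "B", "B", "B",
                "C", "C", "C", "C",
                "D", "D", "D", "D"}
            )
        )
    );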

cardye1

This lets me select the column “car” and add it as a factor.

The design

After I load in the covariate, all I need to do is add the rest of the factors:

cardye2

I keep the number of runs at 17, and when I create the design, it accommodates the number of each casting that I have on hand.

Next time, I’ll talk about the analysis (notice I haven’t even made mention of a response yet). But in the meantime, happy Father’s Day to all the fathers reading this!

Post a Comment

JMP Student Edition 12: Free with leading intro stats textbooks

Introductory Statistics is notorious for being one of the least popular courses required for graduation. Fortunately, modern approaches to teaching statistics are changing the perceptions and popularity of statistics for the better. These approaches are largely driven by data, rather than mathematics, and the modern data-driven interface of JMP is empowering instructors to teach engaging introductory courses.

To meet the needs of these modern courses, JMP has just released the latest version of JMP Student Edition, a streamlined version that contains all of the analysis and visualization tools covered in any introductory statistics course. (Our website shows a comparison of JMP and JMP Student Edition.) Its easy-to-use, point-and-click interface, Windows and Mac compatibility, and intuitive navigation make it an ideal companion for learning statistics, too.

JMP Student Edition contains all the univariate, bivariate and multivariate statistics and graphics covered in most first-year courses.

The latest version is now easier to get. Many leading textbooks include access to download a copy of JMP Student Edition through an authorization code packaged in their print or e-text products. Each downloaded copy provides 24 months of use, so students can continue to enjoy JMP Student Edition well beyond their introductory course.

Based on JMP 12.1, the latest release contains several new and improved features:

  • For instructors who wish to emphasize randomization and resampling to motivate inference, Bootstrapping is built into the Distribution platform.
  • For courses that wish to introduce concepts and applications of big data, we have removed the limit on data size.
  • Some courses, such as those in business statistics, are beginning to introduce concepts of analytics. So, we have included elements of the Partition platform to provide classification and regression trees.

JMP SE 2 trees

  • To facilitate student projects and communicating results, JMP Student Edition now features a direct export to PowerPoint as well as an interactive HTML output option.
  • “Applet-like” conceptual demonstrations of fundamental ideas such as probability, sampling distributions and confidence intervals, to name a few, are now included and integrated into the help menu. Easy to access, they also feature the ability to use your own data to simulate concepts.

JMP SE 3 concept

  • Mapping tools in JMP Student Edition now include street maps.

JMP SE 4 map

Additional effort went into tailoring JMP Student Edition to the standard output seen in textbooks. Default output of one- and two-variable graphs will more closely reflect standard practice.

JMP SE 5 histogram

For high schools that need a license for computer labs or classrooms, we offer a special five-year schoolwide license for middle and high schools. Contact JMP academic sales for more information at academic@jmp.com. Additional teaching resources for AP Statistics are freely available at our website.

Post a Comment

Did LeBron James step up his game in the playoffs?

The Golden State Warriors beat the Cleveland Cavaliers to win the NBA championship despite the best efforts of LeBron James. With the Cavaliers depleted by injuries (particularly to Kevin Love and Kyrie Irving), James was faced with carrying his team against a very talented and well-rounded Warriors team. And he was most certainly up for the challenge: LeBron had an amazing series, shouldering even more responsibility than usual and making it competitive against the Warriors.

LeBron’s performance in the finals got me wondering: Can we pinpoint exactly when he started to increase his output? Did he step up his game for the finals in particular, or had he been ramping it up throughout the playoffs? Or maybe his performance in the finals was nothing unusual, although I seriously doubted that.

First things first. We should plot his data for the entire season. There are many ways to evaluate a basketball player’s impact on the court. But for our purposes, let’s just look at his points scored, rebounds and assists.

lebron

The data seem a little too noisy to say confidently where LeBron started to increase his output. It’s probably safe to say that his rebounds started to increase around game number 75 (which happens to be the beginning of the playoffs), but even that is hard to pin down. So let’s see if we can use a statistical model to help us find the changepoints.

Finding the changepoints

One approach to finding changepoints in our response is to fit a model like

E(points in game 1) = \(\beta_0\)

E(points in game 2) = \(\beta_0 + \beta_1\)

E(points in game 3) = \(\beta_0 + \beta_1 + \beta_2\)

and so on. This model generalizes to:

E(points in game \(j\)) = E(points in game \(j-1\)) + \(\beta_{j-1}\), so each \(\beta\) measures the shift in mean between consecutive games.

So anytime one of these shift parameters is nonzero, we know that our mean has moved up or down at the corresponding game. We can use a variable selection technique to tell us exactly which of those parameters should be nonzero. If we use the Lasso for estimation and selection (available in the Generalized Regression platform in JMP Pro), this model is a special case of a model called the fused lasso.
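
One way to set this up is to build a step-indicator column for each game, as in the rough JSL sketch below (the table and column names are assumptions). Each new column is 0 before game j and 1 from game j onward, so its coefficient is exactly the shift in mean at game j, and the Lasso in Generalized Regression then decides which of those shifts are nonzero.

    dt = Data Table( "LeBron Game Log" );
    n = N Rows( dt );
    For( j = 2, j <= n, j++,
        col = dt << New Column( "Shift at game " || Char( j ), Numeric, "Continuous" );
        col << Set Each Value( Row() >= j );  // 0 before game j, 1 from game j on
    );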

And the model says...

Let’s take a look at the results of our fused lasso model for LeBron’s points, rebounds and assists. The prediction functions for these models give us a much clearer picture than when we looked at the raw data. LeBron’s points remained constant throughout the regular season, started to increase throughout the playoffs and peaked during the finals. His rebounds steadily increased over the regular season, but increased more dramatically throughout the playoffs. Likewise, his assists jumped up during the playoffs as well.

lebronModel

You want your superstars to respond on the biggest stage, and I feel like LeBron truly did that. Things looked bleak when both Kevin Love and Kyrie Irving got injured in the playoffs, but the remaining Cavaliers were up for the challenge. The Warriors were expected to run them off the court, but the Cavaliers were able to make it a competitive and entertaining series, thanks in large part to LeBron’s historic performance. And this is high praise considering that the Cavaliers took out my beloved Atlanta Hawks in the Eastern Conference Finals!

Reference

Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., & Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(1), 91-108.

Post a Comment

How to combine Squarify and Split layouts with hierarchical Treemaps

In my previous blog post, I introduced you to a new layout algorithm for Treemaps in JMP, called Squarify. I explained how Split preserves the order of your data while Squarify orders the data by size.

But what if you have data that is hierarchical? JMP can display hierarchical data in a Treemap by creating groups. Each group is laid out in tiles, just as each category is tiled within the group. If you select Split, the groups are laid out using Split, preserving the order of your data, and the categories are then laid out using Split within each group. Similarly, selecting Squarify will reorder the groups, displaying the largest group in the upper-left corner and working its way down to the smallest group in the bottom-right corner; the categories are then laid out the same way within the groups. But is there a way to combine these two techniques in one Treemap? There is with hierarchical data.

Mixed mode

When you have hierarchical data, you can select the third layout option, called Mixed. Mixed mode will lay out the groups using Split, which preserves the order of your groups. But it will then display the categories using Squarify, which orders the categories within each group from largest to smallest.

Again, I will demonstrate this using the San Francisco crime data from the sample data folder installed with JMP. My previous post showed the number of incidents of each type of crime. We will do that again, but this time let's group it by the day of the week and select Mixed from the layout menu option.

Mixed

Looking at this Treemap, we see each day of the week as a grouping. Since Mixed uses Split for the groups, the order of our weekdays is preserved, with Sunday in the top-left corner and Saturday in the bottom-right corner. But the categories are laid out using Squarify. This gives us nicely shaped rectangles that are easy to compare and orders them with the largest value in the top-left corner of each grouping.

In my earlier post on Squarify, we saw that Larceny/Theft was the most common type of crime. Grouping by the day of the week, we see that it is the most common crime every day of the week.

Pop quiz

What if we wanted to know which day of the week has the most crime (which group is the largest)? Well, hopefully by now you know how to find the answer to that question....

Yes! That's right. You would use Squarify, and you'd see that the answer is Monday by looking in the top-left corner. (What is it about Mondays?)

SquarifyGroup

Get this sample data set -- or some other hierarchical data -- and try these new options for yourself.

Post a Comment

Using the Disallowed Combinations Filter in JMP 12

In a previous blog post, I investigated my travel time to work using an estimate from Google Maps. In that post, my possible departure times to and from work were the same every day. However, it’s not uncommon in designs, even when using computer simulators, to have restrictions on the design space. Since JMP 11, this could be accommodated for space filling designs using Disallowed Combinations, but it required a Boolean JSL expression, and you needed to remember that categorical factors have to be specified with an ordinal value. We tried to make specifying disallowed combinations easier in JMP 12 with the new Disallowed Combinations Filter.

In the commute time example, after I move past the first screen, there’s an outline box for Define Factor Constraints underneath the factors. Linear Constraints works as before, and the Disallowed Combinations Script is the option if you want to use Disallowed Combinations via a JSL expression.

disallowedp1

Let’s take a look at the Disallowed Combinations Filter. Selecting that option brings up a list of the factors in a panel that looks much like the Data Filter for data tables.

disallowedp2

For this example, suppose that if I leave at 8:30 a.m. or later (i.e., morning >= 60), then my evening commute should start after 5:00 p.m.; that is, I want to disallow morning >= 60 and evening <= 30 together. I simply select the morning and evening factors, click the “Add” button to have them in the filter, and set the sliders to the appropriate condition, like below:

disallowedp3

When I create the design, none of the rows will have both morning >= 60 and evening <= 30 together.

I especially like the Disallowed Combinations Filter with categorical factors. Instead of the above condition, maybe I have a Tuesday afternoon meeting that doesn’t let me leave until after 5:00 p.m. (i.e., disallow evening <= 30 on Tuesdays) and a Thursday morning appointment that means I want to leave before 8:30 a.m. (i.e., disallow morning >= 60 on Thursdays). I select evening and day first and put in that condition, choose OR to add the second condition, and then add that condition by selecting morning and day. My filter looks like this:

disallowedp4

Now I can go ahead and create the design.
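
For reference, the equivalent JSL Boolean expression that the saved design script would contain looks something like the sketch below, where the factor names and the ordinal day values (2 for Tuesday, 4 for Thursday, assuming a Monday-through-Friday ordering) are assumptions:

    Disallowed Combinations(
        (evening <= 30 & day == 2) | (morning >= 60 & day == 4)
    )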

Final Thoughts

The Disallowed Combinations Filter appears in Custom Design, Space Filling Design, Covering Arrays and Augment Design. If you create a design with the Disallowed Combinations Filter, the saved script has the disallowed combinations converted into the JSL Boolean expression. This means that running the DOE script does not bring up the Data Filter, but rather uses the Disallowed Combinations Script. I have found this useful for creating larger disallowed combinations scripts with lots of “OR” statements, when the Filter method begins to get tedious.

Post a Comment