Happy little trees: An updated Forest Plot Add-In

Recently, I experienced an event that brought me back to my childhood. I was having dinner at my brother's house with my sons. The television was on, and at some point Bob Ross' show "The Joy of Painting" came on the air. If you've never experienced the show, it is very entertaining to watch. Bob starts with a blank canvas and, in 30 minutes' time, paints a beautiful landscape using a carefully selected palette of oil-based paints. The show is extremely relaxing, due in part to Bob's soft-spoken instruction and easygoing manner. Though Bob usually had an idea of what he wanted to paint in any given episode, nothing was ever set in stone, and a "happy little tree" or two (as Bob would call them) could end up anywhere on the canvas to fill some space. My brother and I couldn't help but get sucked into watching the show. Initially, our children seemed bored with it, but eventually they too were pulled into Bob's world. It was a really nice moment.

Where am I going with this? Well, many happy little trees make me think of a happy little forest, which makes me think of happy little forest plots.

In all seriousness though, shortly after revisiting the land of happy little trees with my family, I received a request from one of our JMP users about the Forest Plot Add-In. The user wanted to have the ability to size the center bubble using another variable available in the data table. This feature would be particularly useful for summarizing results from a meta-analysis, allowing you to indicate the size of each clinical trial in the plot. You may want to check out a previous blog post on forest plots. So I've updated the add-in to include this feature.

I'll illustrate this new feature with the data table in Figure 1, which shows 95% confidence intervals of the log2(relative risk) for 11 adverse events from the Nicardipine clinical trial.  Count is the number of patients that experienced the event during the study.

Figure 1. Data table of 95% CI for log2(relative risk) for adverse events from the Nicardipine Trial

The dialog of the forest plot add-in has some additional features (Figure 2). First is the Marker Size variable in the column-role section of the dialog. Second is the set of Size Preferences at the bottom of the dialog. The area of each marker is proportional to the value of the Marker Size variable, Count in this example. Count is first scaled to lie between 0 and 1 (call it Count01). The radius of each circle equals sqrt(Scale x (Count01 + Minimum)), so Minimum = 0 implies no symbol is drawn for records where Count takes its minimum value. Running this example produces Figure 3. Transparency makes it possible to easily see narrow confidence intervals that are contained entirely within larger markers (such as vasoconstriction below).
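A minimal Python sketch of this scaling rule (the function and variable names are my own, not the add-in's):

```python
from math import sqrt

def marker_radii(counts, scale=1.0, minimum=0.0):
    """Radii per the add-in's rule: marker area proportional to the size variable."""
    lo, hi = min(counts), max(counts)
    # Scale Count to [0, 1] (Count01 in the text).
    count01 = [(c - lo) / (hi - lo) for c in counts]
    # radius = sqrt(Scale x (Count01 + Minimum)); with Minimum = 0, the
    # record with the smallest Count gets radius 0, i.e. no symbol drawn.
    return [sqrt(scale * (c + minimum)) for c in count01]

radii = marker_radii([5, 20, 80], scale=4.0, minimum=0.1)
```

Raising Minimum guarantees every record gets a visible marker; setting it to 0 hides the smallest one.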

Figure 2. Forest plot dialog


Figure 3. Forest plot applying a size variable

While forest plots are useful for summarizing data from a meta-analysis, they are also useful for summarizing results from various subgroups of patients. While this particular example doesn't focus on subgroups, I'll add some more detail that can also be useful for subgroup forest plots. Click the red triangle > Show Control Panel. This allows you to further manipulate the figure in Graph Builder. Drag AE Class over to the y-axis (Figure 4).

Figure 4. Labeling adverse events with body system

Clicking Done provides Figure 5.

Figure 5. Adverse events grouped within body class

You can edit this a bit by right-clicking on the y-axis, clicking Axis Settings, adding Lower Frame for both labels, and selecting Long Divider as the Tick Mark Style for Label Row 2. This produces Figure 6. You can use either figure to summarize additional detail regarding the classification of the various intervals in the figure. This can be useful for summarizing subgroups, such as Gender (Male, Female). Note that the ordering of the data table is important prior to using the add-in: terms must be ordered the way you want them displayed in the forest plot. If classification is used as in Figures 5 and 6, terms must be sorted by class, the classes must be in the appropriate order, and the terms must be in the desired order within each class. Selection of the bubbles is still possible, though you should select them at the center of each bubble. Using the drag-and-select feature of JMP (as one would use to select several points) is likely the most straightforward approach.
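The required sort order is easy to enforce before launching the add-in; a small Python sketch (the class and term names here are hypothetical, not from the Nicardipine table):

```python
# Desired display order for classes, and for terms within each class.
class_order = ["Cardiac", "Vascular"]
term_order = ["Bradycardia", "Tachycardia", "Hypotension", "Vasoconstriction"]

rows = [
    ("Vascular", "Hypotension"),
    ("Cardiac", "Tachycardia"),
    ("Vascular", "Vasoconstriction"),
    ("Cardiac", "Bradycardia"),
]

# Sort by class first, then by term within class, matching the add-in's
# requirement that rows appear in the order they should be plotted.
rows.sort(key=lambda r: (class_order.index(r[0]), term_order.index(r[1])))
```

In JMP itself, the same effect comes from sorting the data table (or applying Value Ordering) before running the add-in.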

Figure 6. Adverse events grouped within body class with long dividers

You can download this updated Forest Plot Add-In from the JMP User Community (join the community to download the add-in).


An essential book for new JMP users

Are you new to JMP, or do you use it only occasionally to explore and analyze your data? There are many resources that can help you quickly get your work done with JMP, including documentation, webcasts and the User Community. But sometimes, the best thing is a book that guides you, step by step.

JMP Essentials, by Curt Hinrichs and Chuck Boiler, is that book. It's for the new and occasional user of JMP who needs to get the right results from their data right away. It's designed like a cookbook, Curt and Chuck said: "Find what you need and follow the steps." First published four years ago, the book is now in its second edition.

Curt has led the Academic team for many years and knows all about the learning needs of university students as they use JMP to understand statistics, data analysis and data visualization. Chuck leads the group of systems engineers at JMP and has the inside scoop on commercial customers' experience with the software. In addition, you may know Chuck as the founding instructor for the Getting Started With JMP live webcasts for beginners. So the two are the perfect authors of a book for new users.

The book is available for purchase online.

Curt and Chuck told me a bit more about the new edition of the book:

In your experience, who is using JMP Essentials? Who should use it?

We wrote JMP Essentials principally for the new user of JMP, with the goal of getting that person up and running and generating meaningful results quickly. In the first edition, we have found this intended audience to include commercial customers transitioning from another statistical software, students in introductory statistics courses and quite a few spreadsheet users who want to go to that next level of data visualization and analytic capability. While JMP is not difficult to use by any measure, its navigation is different from other statistics packages or spreadsheets, and we have tried to present this along with the “essentials” of JMP in the most efficient way. In fact, much of the coverage in the book has been influenced by the Getting Started with JMP weekly webinar that Chuck hosts and the needs that I typically see among new users in the academic community.

Why did you write a second edition of JMP Essentials?

We are grateful that the first edition was well-received and that readers reached out to give us feedback on the book. That feedback prompted us to begin considering how we would approach a second edition. Plus, the first edition was written with JMP 8 and now – being on the eve of version 12 – we found many areas in need of revision or addition that are reflected by the four versions of JMP that have been released since. Mind you, we have tried to keep these revisions/additions true to our essentials of JMP scope. There have been many useful enhancements to JMP in the areas of statistical modeling, reliability, design of experiments and consumer research in these releases, but these areas are beyond the scope of the book and covered especially well by other books and JMP documentation.

What’s new and different in this edition? How different is it?

You may notice that we’ve gained a little weight in the second edition! That is, the page count has increased by about 25 percent. This wasn’t intentional but is reflected in many of the sections we revised for accuracy, expanded for clarity or added to the book. Even after extensive editing, the content needed to grow just as the audience for data analysis seems to be exploding. The additions to the second edition should serve these new audiences. We have added eight new sections to the book that we think will appeal to many users; they include Filtering Data, Creating Maps, Using the Excel Import Wizard, Combining Windows Into Dashboards, and Sharing Dynamic Graphs with HTML 5.

Can you give us an example of a technical tip that is in the new edition?

It's hard to pick just one tip, but one that we think will be a real time saver is the support in Graph Builder for background street maps with density contours. In the context of positional mapping, seeing the frequency and density of occurrences of anything with a street map can be very enlightening. This is especially relevant with the advent of handheld devices that provide location data, which are becoming ubiquitous. The San Francisco Crime map example in Chapter 4 provides an introduction to this useful feature that also employs filtering, which is covered in Chapter 2. All you need are data where latitude and longitude have been recorded.


Exploring and visualizing sleep data with JMP

My BodyMedia® FIT® armband uses a variety of sensors to monitor my movements, including an accelerometer and gyroscope. When plugged into a computer via USB or connected via Bluetooth to an iPhone, the data stored on my device is uploaded to BodyMedia® servers and an algorithm runs to determine when I was active, lying down, awake or sleeping for each minute of monitoring time. I get an estimate of how long I slept, and I can view a minute-by-minute visualization of the time I spent lying down, awake and asleep. (You can get more details about BodyMedia’s sleep classification algorithm in a white paper.) I can extract the data for these daily sleep summaries from the activity summary files, as I described in an earlier blog post.

I wanted to analyze and create some effective data visualizations of my sleep data. An early version of my JMP Discovery Summit e-poster included a multivariate scatterplot of sleep duration, sleep efficiency and time spent lying down during ~1,300 nights. I colored points red to indicate nights with sleep durations <4 hours. Since I rarely sleep so few hours, I suspected some or all of those low outliers were infrequently occurring data collection errors. On the other hand, I knew that the highest sleep day was an accurate measurement from a recent bout of the stomach flu!

Scatterplot Matrix

By the time I met with Xan Gregg, lead developer of Graph Builder and head of the JMP Data Visualization group, I had switched to using box plots to visualize my sleep data. The box plot view told a much more interesting story, since it showed data over time and gave me a better sense of the variability of my sleep measurements within months. Also, the addition of a time dimension clearly showed that the big drop in my sleep efficiency at the end of September 2011 coincided perfectly with my son's birth!

When I showed this graph to Xan, he recommended:

  • Lightening the colors used in the box plots.
  • Darkening the smoother lines.
  • Adding annotations to point to important trends.
  • Changing my X axis labels to a horizontal orientation.

We both thought that the final version of my sleep graph was more visually appealing and easier to interpret.


I was surprised to see how much my sleep patterns varied with the time of year, as you can see from the box plots and the up-and-down pattern of the smoother line in the sleep duration graph on the bottom. In contrast to my activity measurements, which I know are strongly affected by the fact that I wear my armband less during the day in summer, I wear my armband to bed almost without fail. Seasonal variation in sleep patterns is actually a commonly recognized phenomenon in human sleep, yet I had completely missed it in my own data up till now! This pattern was not obvious when I viewed the standard summary reports of my sleep information within days, weeks and months.

It's easy to see how useful long-term data on nightly sleep patterns could be if you are chronically tired. Although it may not be as accurate as a sleep study done by a medical professional, accumulating nightly data over a long period of time with a sleep-monitoring device could help you assess how much sleep you need to awake feeling rested. If you have insomnia, you can use your sleep data to see when and how long you are awake in the night. If you battle sleepiness during the day, you can change your behavior in various ways and assess the outcome on your sleep.

Experts recommend various sleep improvement strategies such as cutting out or limiting caffeine, going to bed earlier, getting up when you wake up rather than hitting snooze repeatedly, adding exercise to your day and avoiding the use of screens close to bedtime. Sleep monitoring gives you actual outcome data to assess which of these strategies may work to optimize your own sleep patterns. Some sleep monitors even can wake you at an optimum time given where you are in your sleep cycle. Since starting to monitor my sleep, I have rarely used a morning alarm. When I wake up, I assess whether I am still tired, and if not, I check how many hours I slept. If I believe I have had enough uninterrupted sleep based on these two assessments, I get up and start my day.

By examining my food log and sleep data, I have discovered that in addition to seasonal variations, a number of other factors appear to affect how well I sleep. I sleep much less when I am experiencing stress at work, and a few days without exercise can hurt my sleep. What and when I eat can also impact my sleep quality.  I have observed that the stimulant kick provided by chocolate Greek yogurt (my favorite breakfast) disrupts my sleep if I eat it as an evening snack. Perhaps because of my background in biochemistry and genomics, I wasn't satisfied by simply observing this connection. I had habituated to drinking caffeine late at night in the past without such a negative impact on my sleep, so I wondered what was different here?  I did some reading about cocoa powder, and it turns out that it contains caffeine and a related stimulant called theobromine. The same cellular pathway deactivates both these chemicals, and I know from other test results that I have the slow form of a major enzyme involved. I suspect that when I eat chocolate Greek yogurt close to bedtime, I can't metabolize enough of the stimulant chemicals it contains before I go to sleep, and as a result, I sleep poorly throughout the night.

To reproduce my sleep graph with your own data in Graph Builder, open a data table containing sleep duration and/or sleep efficiency measures in hr:m format by date. If your data table doesn't have Year and Month variables, you can create these from the Date variable in your table by right-clicking on it in the Graph Builder variable chooser and adding new transform columns from the Date Time menu. I used a Value Ordering property on the Month variable to create a Month Name column and made sure that Year and Month Name were specified as Ordinal.
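If you prepare the table outside JMP, the same Year and Month derivation takes only a few lines; a minimal Python sketch (the record values and column names here are hypothetical):

```python
from datetime import date

# Hypothetical nightly records: (date, sleep duration in hours).
records = [(date(2011, 9, 28), 7.5), (date(2011, 10, 3), 5.2)]

# Derive Year and Month Name, the nested X-axis variables described above.
rows = [
    {"Year": d.year, "MonthName": d.strftime("%b"), "SleepHours": h}
    for d, h in records
]
```

In JMP the Value Ordering property plays the role of keeping the month names in calendar order rather than alphabetical order.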

To create the graph, open Graph Builder and drag:

  • Year to the X axis.
  • Month Name to the X axis (just above Year, so the axis shows Month Name nested within Year).
  • Sleep Efficiency to the Y axis.
  • Sleep Duration to the Y axis just below Sleep Efficiency so they appear in separate sections.

To complete the graph, change the element type to Box Plot using the icon at the top of the window, adjust your Y-axis, graph title and axis titles if desired, and add one or more annotations from the Tools menu. You can right-click on annotations to change their appearance to match your graph, like I did. Stay tuned for the next few posts, where Xan and I show how he helped improve other visualizations of my data!

You can learn more about my interests in quantified self data analysis in an earlier blog post, see an e-poster on my activity and food log data import project in the JMP Discovery Summit 2014 User Community, and read about how I imported my BodyMedia® Activity Summary files from Excel and my Food Log files from text. I used JMP to recode food item names and classify foods into categories, and then used my data to characterize my activity and meal logging patterns. You can download a JMP add-in from the JMP File Exchange to import your own BodyMedia® activity summary files and food log files, or CSV-formatted food log files from the free MyFitnessPal website.


Hear from analytics thought leaders (and get their books too)

Analytics thought leaders are busy folks these days as more organizations are finding analytics are key to their success. But once a month, a different analytics expert makes time to have a conversation with my colleague Anne Milley for the Analytically Speaking webcast series. Their discussions cover statistics, design of experiments, quality engineering, data visualization, and consumer and market research.

We’ve compiled some of the highlights from the series in a webcast we're calling Analytically Speaking Featuring the Best in Show. The webcast premiered on Wednesday, Dec. 10, and is now available on demand. The best in show includes:

  • SAS co-founder John Sall on the importance of the statistical discipline.
  • Behavioral economist and author Dan Ariely on why more businesses don’t experiment.
  • Statistical Thinking authors Ronald Snee and Roger Hoerl on the relationship between quality and reliability.
  • Popular bloggers Kaiser Fung and Alberto Cairo on the elements of good data visualization.
  • Words of wisdom for aspiring data analysts from esteemed statisticians David Salsburg and Stu Hunter.
  • And, of course, Professor Dick De Veaux on how he became the official statistician of The Grateful Dead!

There were so many good moments that this webcast runs longer than the usual hour. You have the option to watch the webcast in its entirety or segment by segment, as your time permits.

If you do watch, we hope you’ll leave a thoughtful comment below about your favorite moment from the webcast. The first 16 to participate will qualify to receive a free book from the following selection of titles (some of which are signed by the authors):

Numbersense: How to Use Big Data to Your Advantage, Kaiser Fung

Optimal Design of Experiments: A Case Study Approach, Peter Goos and Bradley Jones

The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, David Salsburg

Statistics for Experimenters: Design, Innovation, and Discovery, George E.P. Box, J. Stuart Hunter and William H. Hunter

The Improbability Principle: Why Coincidences, Miracles, and Rare Events Happen Every Day, David J. Hand

The Innovator’s Hypothesis, Michael Schrage

Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Michael J.A. Berry and Gordon S. Linoff

Statistics for Business: Decision Making and Analysis, Robert A. Stine and Dean Foster

Here's how the book giveaway works: After watching the Best in Show webcast, tell us your favorite moment from the webcast, here in a comment. Your comment must be between 50 and 75 words long. Be sure to enter your e-mail address when you write your comment so we can contact you if you are a winner. Only one book per commenter. Commenters must reside in the US to be eligible to receive a book. The first 16 eligible commenters will win a book!


John Sall named AAAS Fellow for contributions to statistical sciences, software

John Sall, SAS co-founder and Executive Vice President, has been named a Fellow of the American Association for the Advancement of Science, the world’s largest general scientific society.

Sall was elected “for distinguished and visionary contributions to statistical sciences and software development, having the greatest impact on businesses, education, engineering and all other sciences,” the AAAS said.

Sall leads the JMP business unit, which creates interactive and visual statistical discovery software designed for scientists and engineers. He frequently speaks about statistics at universities and conferences. Sall developed many of the earliest analytical procedures for Base SAS software and had significant roles in creating other SAS products, including SAS/ETS, SAS/IML, SAS/OR and SAS/QC. Sall also developed JMP software and remains its lead architect.

"John has performed extraordinarily broad and deep work at SAS for more than 35 years – work that has had a powerful impact on science across the globe,” said Russ Wolfinger, Director of Research and Development for Genomics in the JMP business unit and a 2012 AAAS Fellow.

Sall said the work of his team at JMP is to make statistical modeling as friendly, accessible and informative as possible, and to make experimentation as efficient as possible.

“Doing well at these goals will contribute greatly to scientific discovery and engineering breakthroughs,” said Sall. Along with SAS Analytics software, JMP helps organizations deliver the value of analytics by putting data to work for solving problems, making better decisions and improving our world. “The value of statistics is realized when it is translated, taught and delivered with accessible computing environments to the scientists and analysts who use it," Sall added.

Sall recognized the leadership in statistics at North Carolina State University for creating the environment from which SAS launched. "Thank you especially to three statisticians whose exemplary leadership at NCSU put them on the path to becoming university deans: Dan Solomon, Blan Godfrey and Sastry Pantula," Sall said.

Pantula, Dean of the College of Science at Oregon State University, called Sall a true visionary and a great promoter of science. "Statistical sciences are at times invisible, but are having an impeccable impact on innovation and discoveries. The software behind it is even more invisible, but the software developed by John and his colleagues at JMP is having a great impact on business, education, engineering and sciences throughout the world,” Pantula said.

J. Stuart Hunter, Professor Emeritus at Princeton University’s School of Engineering and Applied Science and an expert in design of experiments, said: "John Sall has spent his entire career immersed in the advancement of science and engineering. His career of many decades contains myriad contributions to the statistical solution of engineering and social problems. His personal contributions to computer software programs adapted to statistical data analysis are simply outstanding. His recent innovations in graphical displays of masses of multivariate data has added meaningfully to the art of finding signals in noise."


Holiday book gift ideas for the analytically minded

The gift-giving season is approaching, and it’s time to start thinking about the quantitatively inclined people on your list. A few years ago, I wrote a post offering a list of books. Many new analytical books have been published since then, and there are some classics worth revisiting. So I wanted to list some more recent books and make sure you know about the recommended reading page of the Analytically Speaking webcast series (now in its third year). Many thought leaders from this webcast series — several of whom are authors — share books they recommend, and we continually update that reading list.

The first three books on my list below are from three of our featured keynotes at Discovery Summit this past September. You can view the plenary talks for David Hand and Michael Schrage to get a sense of what their recent books cover. The recording of the speech by Jonah Berger will air in January.

For those interested in analytical concepts — “listening for the melody” versus making the music:

For those seeking more recent views on data visualization:

For big data enthusiasts:

And for those who enjoy books on methods, techniques and more focused topics:

Among the holiday catalogs (arriving since October) was a book catalog with an appropriate plaque: “Life is short. Read fast.”  If you have some books you’d like to suggest, leave me a comment. Happy reading!


Where are they now? How blog posts fare over time

As I approach the one-year anniversary of my first post for this blog, I’ve grown interested in how the popular JMP Blog posts from the past have done in the years that follow. In other words, do the popular posts continue to be popular?

A good starting point for an initial investigation is the top 10 JMP blogs from 2011. I looked at the time period right after that list was published to just a few weeks ago: from Jan. 26, 2012, to Oct. 31, 2014. The data available to me comes from Google Analytics. If you pay attention to the view counts, you might notice a discrepancy between the Google Analytics views and the total views shown on the blog entries themselves – that topic is beyond the scope of this post (it appears to be a common problem when comparing the reports of different web analytics tools, and there are many possible reasons for it).

Changing Ranks
If I were to rank the top 10 JMP Blog posts of 2011 now, the list would look a bit different (change in rank in parentheses):

  1. Saving Graphs, Tables, and Reports in JMP (2010) (+1)
  2. How to Make Tornado Charts in JMP (2008) (+2)
  3. What Good Are Error Bars? (2008) (-2)
  4. JMP into R! (2010) (+2)
  5. Principal Variance Components Analysis (2010) (0)
  6. Keyboard Tips CTRL/Click, ALT/Click, CTRL/ALT/Click (2010) (+1)
  7. Solar Panel Output Versus Temperature (2009) (+1)
  8. The Best Karts in Mario Kart Wii: A Mother’s Day Story (2008) (-5)
  9. Set the Right Selling Price for Christmas Cookies (2008) (+1)
  10. How to Make Practical Sense of Data and Win a Book (2011) (-1)

If you prefer to see this visually, below is a slopegraph created in Graph Builder comparing the 2011 rankings to the 2014 view count since the original list was published.

The drop in rank of some posts isn't so surprising. It may be that Mario Kart Wii isn't as popular a game as it once was. And it makes sense that a blog post about a book giveaway (How to Make Practical Sense of Data and Win a Book) holds much less interest now that the contest has long ended. Why do you think particular posts rose or fell in the ranks?

Posts With Staying Power
The change in ranking tells only part of the story. Fortunately, I also have the weekly view counts available to me. Below is the cumulative view count of each of the blog entries over time. I've highlighted the top three in a different color and labeled them.

It was interesting to me to see that it's the technical posts and, particularly, the “how to” articles that seem to stand the test of time quite well. "Saving Graphs, Tables, and Reports in JMP" is the most noteworthy to me, as even in 2014 it remains among the most popular posts for the year.

A natural next question is: What were the most-viewed blog posts during that time frame, ignoring the original list?

Here's what I found:

  1. Saving Graphs, Tables, and Reports in JMP (2010)
  2. Graphical output options using JMP (2011)
  3. Image analysis of an elephant's foot in JMP (2013)
  4. Introducing definitive screening designs (2012)
  5. “The desktop is dead” and other myths (2013)
  6. George Box: A remembrance (2013)
  7. US ZIP code map files in JMP (2012)
  8. Train, Validate and Test for Data Mining in JMP (2010)
  9. Ordering categorical graph elements (2011)
  10. JSL tip: finding unique values in a list (2011)

Are any of your favorite JMP Blog posts missing from this list? Leave me a comment below and let me know your favorites. Thanks for reading!


What I love about Discovery Summit

John Sall shows the latest version of JMP at Discovery Summit 2014.

A high point of Discovery Summit is John Sall's tour of the latest version of JMP. At the conference in Brussels in March, Sall will show JMP 12.

I attended my first Discovery Summit conference in 2009. Held at the beautiful Swissôtel in Chicago, it included a keynote speech by Malcolm Gladwell (author of Outliers) and an evening of reminiscing by none other than George E.P. Box!

Gladwell gave a memorable speech in which he discussed, among other topics, the problem of having too much information. When Enron was questioned about its accounting practices, the company overwhelmed us with information – so much so that almost no one could wade through all the data and documents. This is the data challenge all of us face now, Gladwell said.

Everything about that conference was first class – the views from the top-floor reception, the food, the speakers and the participants. I’d been to a lot of conferences (maybe too many), but this was special. Even after three days of talks, I didn’t want the conference to end.

With a bar set that high in 2009, I thought JMP could not possibly host a conference like that again. But I was wrong, and I’ve been wrong five times – the number of Discovery Summits held since then. Here’s why Discovery Summit is always the highlight of my conference year.

Keynote speeches
The keynote speeches continue to impress. In 2010, Dan Ariely, author of The Upside of Irrationality, told us about how people make seemingly irrational choices; this happens almost by default because we often have too many choices, and some organizations take advantage of this fact. He shared the example of The Economist magazine’s subscription pricing. The magazine offered online subscriptions for $59, print subscriptions for $125, and print and online for $125. Most people selected the print-and-online option. No one (rationally) chose print only. However, when the print-only option was removed, the percentage of people opting for the $59 online subscription jumped from 16 to 68 percent, while the percentage of people who chose the $125 print-and-online combo dropped from 84 to 32 percent.

In 2013, the San Antonio conference featured Nate Silver, who gave a talk based on his book The Signal and the Noise. I recall he said that sometimes the best answer is “I don’t know,” but it’s hard for experts to say that. This made me think of John Tukey’s Type III error – finding the right answer to the wrong question.

Informal discussions
One of the best parts of Discovery Summit is the opportunity to network and exchange ideas with an amazing collection of thought leaders, researchers and authors. That’s what I want when I commit to spending three days away from home, and no other conference so consistently delivers it.

I’ve made new friends, reconnected with old friends and gotten ideas for new projects from every Summit I’ve attended.

Access to JMP staff
The very people who create JMP give presentations at the conference and host “Meet the Developers” sessions – where you can ask your specific JMP questions, tell them what new features you’d like to see in the software and watch a live demo to learn something totally new. The chance to see the latest wizardry from the JMP developers is worth the price of admission by itself. Watching John Sall analyze entire data sets seemingly without effort and seeing what’s new in the latest version of JMP are always high points as well.

All in all, it’s a conference that’s hard to beat for anyone who explores, analyzes or models data.

I’m happy to say I will be a keynote speaker at Discovery Summit Europe in March. I’ll be reporting on my work studying the performance of aging athletes in running and swimming. Did you know that the record marathon time for the age 90+ group is 5 hours 40 minutes (!?) or that the 100 m record time for those 100 and older is 23.14 sec?

Because of the increased participation of older athletes, we now have a much better sense of how masters’ athletes compare to their 35-year-old counterparts. Would you guess that older athletes fare relatively better in sprints or long-distance events? Men or women? To find out, book your flight to Brussels and join me for the first Discovery Summit Europe!


Analytically Speaking: Q&A with reliability expert Bill Meeker

Earlier this month, we had the pleasure of hosting Bill Meeker, Distinguished Professor of Liberal Arts and Sciences at Iowa State University, on our Analytically Speaking webcast series. Bill is a noted author and expert in the areas of reliability data analysis, reliability test planning, accelerated testing, nondestructive evaluation and statistical computing. Our viewers had so many good questions for Bill during the live webcast, we didn’t have time to include all of them. For those questions that were in the queue, Bill has kindly provided answers.

Question: Is there a link or relationship between cohort analysis and survival analysis? Can they be used together? And if so, how would they complement each other?

Answer: Yes, cohort analysis and survival analysis methods can be used together to get more insight into population behavior. In cohort analysis, we stratify our population into different groups of units that share a similar characteristic or characteristics. For example, we might stratify a population of potential customers on the basis of geographic location and/or past experience with the potential customers. Then we could, for example, do separate analyses of time to respond to an offer for each subgroup. An appropriate model to fit would be the “defective subpopulation model” (or DS model) in JMP, in the Life Distribution platform. Some proportion of the population will never respond. This model is also known as the “limited failure population model” and the “cure model,” allowing estimation of the proportion of the population that will respond and the distribution of time to respond for each subgroup. This model is described in some detail in Chapter 11 of Meeker and Escobar (1998). In the Life Distribution and many other platforms, there is an analyze “By” option that will do separate analyses for each cohort. (In JMP 12, there will be a “Compare Groups” option in the Life Distribution platform to make such comparisons even easier to perform.)
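The defective subpopulation (cure) model Bill describes can be pictured with a short simulation. This is a minimal sketch in Python/NumPy (not JMP, and not a fitting routine): the responding fraction, the lognormal response-time parameters and the follow-up horizon are all made-up values, chosen only to show how the never-respond fraction surfaces as right-censored observations.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_defective_subpopulation(n, p_respond, mu, sigma, horizon):
    """Simulate 'time to respond' data from a defective subpopulation
    (cure) model: a fraction p_respond eventually responds, with
    lognormal response times; the rest never respond.  Observation
    stops at `horizon`, so non-responders appear as right-censored."""
    responds = rng.random(n) < p_respond
    times = np.where(responds, rng.lognormal(mu, sigma, n), np.inf)
    observed = np.minimum(times, horizon)   # what we actually see
    event = times <= horizon                # True = responded
    return observed, event

obs, event = simulate_defective_subpopulation(
    10_000, p_respond=0.4, mu=1.0, sigma=0.5, horizon=30.0)
print(round(event.mean(), 3))  # fraction responding, close to p_respond
```

Fitting such data in JMP's Life Distribution platform with a DS distribution would recover both the responding proportion and the time-to-respond distribution, as described in the answer above.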

Question: You mentioned recidivism as an application of reliability. What are some other areas of application of reliability analysis that you didn’t get to mention during the webcast?

Answer: Yes, Anne mentioned recidivism, and that certainly has been an application of life data analysis (a.k.a. survival analysis) methods. Indeed, it was one of the early applications of the “cure model” mentioned above. There was interest in how long a person would stay out of jail after release. But, of course, some individuals are “cured” and will never return. There are innumerable applications of these methods, which generically might be called “time to event” applications. In engineering reliability, we are often concerned with time to failure or time to return (of a product for warranty repair). In food science and in the development of many other products, one is interested in the shelf life. In the banking industry, there would be interest in “time to payment” for a defaulted loan. In medical applications, there is interest in time to recovery after a treatment. In sociology, there might be interest in time to divorce after marriage. Again, the “cure” model might be appropriate here because a sizable proportion of couples will never have a divorce. In many applications, we are not just interested in the first event, but in the recurrence of certain events over time. Examples include the recurrence of a disease over time (e.g., common colds), repairs of a machine, customers returning for more purchases, etc. Special models and methods are available for such data. Again, I recommend Wayne Nelson’s 2003 book on the subject as a good place to start learning about the analysis of recurrence data. JMP also has powerful analysis methods for recurrence data.

Question: Do you often need to convince end users or customers that 25 or 30 trials are necessary, when these trials are expensive and therefore resisted? If so, what approach would you use?

Answer: Yes, the most common question asked of any statistical consultant is “how many units do I need to test?” And in reliability applications we hear the related question “How long do I need to test?” JMP has extensive tools for planning experiments of different kinds, including reliability experiments such as demonstration tests and accelerated life tests. The theory behind the methods is impeccable. When the software says you need 30 units to achieve the desired precision, it is correct. But that might not help to convince end users with limited resources. I have found it useful to supplement the “black box” answers with repeated simulation of the proposed experiment. I typically run through the analysis of five or six complete simulated data sets and then graphically summarize the results of 50 such simulated experiments. The graphically presented simulation-summary results allow visualization of the variability that one could expect to see in repeated experiments and how far away from the truth any given result might be. Such simulations can be used to compare different candidate test plans (e.g., different sample sizes). You do not need to know any theory of experimental design to appreciate the implications coming from the visualization of the simulation results. Such simulations could be programmed in the JMP Scripting Language, JSL.
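The repeated-simulation idea is easy to prototype outside JMP as well. The sketch below (Python/NumPy; the “true” Weibull parameters, the sample size of 30 and the naive plug-in percentile estimate are all assumptions for illustration) simulates 50 replicates of a proposed life test and summarises how far the B10 estimates scatter around the truth:

```python
import numpy as np

rng = np.random.default_rng(1)

shape, scale = 2.0, 100.0  # assumed "true" Weibull life model
true_b10 = scale * (-np.log(0.9)) ** (1 / shape)  # 10th percentile, ~32.5

def simulated_b10(n):
    """One simulated test of n units; naive plug-in B10 estimate."""
    lifetimes = scale * rng.weibull(shape, n)
    return np.quantile(lifetimes, 0.10)

# Repeat the proposed n = 30 experiment 50 times and look at the spread
estimates = np.array([simulated_b10(30) for _ in range(50)])
print(f"true B10 {true_b10:.1f}; estimates mean {estimates.mean():.1f}, "
      f"sd {estimates.std():.1f}")
```

Plotting the 50 estimates (or repeating with n = 10 versus n = 30) makes the precision argument visually, which is exactly the point of the answer above: no design theory is needed to read the spread.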

Question: Can you talk about reliability studies as one – or the main – consideration in a designed experiment, particularly with regard to the approach taken with Taguchi methods?

Answer: Taguchi methods (a.k.a. robust design methods) provide a collection of design-of-experiment tools that can be used to make products and processes more robust to external noises, such as variability in raw materials or variability in the manner in which a product is used. The use of these methods has been shown to have high potential for improving the quality of a product or the output of a process. Because quality is a prerequisite for high reliability (recall that reliability is “quality over time”), the skillful use of robust design methods will also improve reliability. In some applications, robust design methods can be used to focus directly on reliability. Two books that I highly recommend for this area are:

Question: When you exclude a failure mode, does it treat those as censored data?

Answer: In the multiple failure mode analysis, the first step is to estimate the “marginal” distributions for each failure mode. Under an assumption of independence of the different failure modes, this is done, literally, by making separate data sets for each of the different failure modes and then doing separate analyses for each. In the construction of these data sets, with focus on one failure mode, failures from all other failure modes are treated as right-censored (because all we know is that the failure mode in focus has not occurred yet). This is done for each failure mode. Then the so-called “series system model” can be used to combine the estimates of the marginal distributions to obtain an estimate of the failure time distribution with all of the failure modes active. A simple extension of this approach is to provide an estimate of the failure time distribution with just some of the failure modes active (so you can see the effect of eliminating one or more of the other failure modes). Modern software with capabilities for the analysis of reliability data, like JMP, will do all of this automatically. Technical details for this topic can be found in Chapter 15 of Meeker and Escobar (1998).
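The series-system combination step has a simple closed form: with independent modes, the system survives time t only if every active mode does, so S_sys(t) is the product of the marginal survival functions S_k(t). A toy numeric sketch in Python (the two exponential failure modes and their rates are hypothetical, not from any real data set):

```python
import numpy as np

# Hypothetical independent failure modes with exponential marginals
rates = {"wear": 1 / 500.0, "corrosion": 1 / 2000.0}

def survival(t, active_modes):
    """System survival at time t with only `active_modes` active:
    S_sys(t) = product of the marginal survival functions S_k(t)."""
    return float(np.prod([np.exp(-rates[m] * t) for m in active_modes]))

t = 300.0
print(survival(t, ["wear", "corrosion"]))  # all modes active
print(survival(t, ["corrosion"]))          # after eliminating "wear"
```

Eliminating a failure mode simply drops its factor from the product, which is exactly the “some failure modes active” comparison described in the answer above.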

If you’d like to learn more from Bill’s extensive expertise on statistics, reliability and more, you can watch the archived webcast.


UK User Forum warms up for Discovery Summit Europe

Fujifilm's bioprocessing line


The theme running through the recent UK User Forum was the use of JMP to simulate and optimise processes, particularly within the chemical industry. Fujifilm Diosynth Technologies hosted the event in its Billingham offices, which used to be part of the ICI chemicals conglomerate. Users came together to share how they successfully use JMP to solve problems and to drive efficiencies within their organisations.

Dr. Graham McCreath, Head of R&D at Fujifilm, welcomed the users. The day went smoothly, with an excellent standard of presentations, each introduced by the new Chair of the forum, Dr. Claire Crawford of W.L. Gore.

Dr. Mahesh Shivhare is the Senior Process Statistician at Fujifilm, one of the world's leading GMP drug contract manufacturing organizations. Mahesh described how the bottleneck in their processes had moved from experiments to data analysis. To ease this situation, he created the Rapid Data Analysis Platform, RaDAP, using JMP Scripting Language (JSL), so that scientists could quickly and robustly analyse the data from their bioreactors. Mahesh used automatically generated scripts in JMP. “The idea came from the user forum in the south [at Infineum],” Mahesh said, “and allows scientists to just click on a button to identify which reactors work well based on different characteristics.”

Dr. Stephen Pearson is a Chemical Process Statistician at Syngenta, a global agri-business company. Stephen wowed the audience by presenting 100 percent in JMP. He even recreated his slides using Graph Builder. Stephen helps 70 scientists at Syngenta UK manufacturing sites carry out data analysis. He found that an effective way to achieve this was to create a JMP application he calls the “Analysis Assistant.” This JMP application takes scientists through a process of gathering, processing and visualising data. “My aim is to create tools that allow the scientists to do things more efficiently. Every hour I spend writing code needs to save double the scientist’s time,” he said. Claire described Stephen’s scripts as “really impressive.”

David Payne is Head of Continuous Improvement at Macfarlan Smith, a leading supplier of active pharmaceutical ingredients for the pain relief market, and part of Johnson Matthey. David said that his company faced a challenge in improving the efficiency of manufacturing morphine from poppy straw. Variability in the morphine extraction process created significant processing problems on the plant. David and his team used JMP to analyse process data to identify how they could improve control and increase throughput. The net result is an elegant solution that has considerable financial benefit to the organisation. It also forms the foundation for controlled processes in future plants. David will be presenting his paper at Discovery Summit Europe in Brussels next March.

Matt Linsley is Director of Newcastle University's Industrial Statistics Research Unit. ISRU’s aim is to bridge the gap between academia and industry in the fields of applied statistics and continuous improvement methodologies. ISRU's learning programmes involve JMP to teach design of experiments. An "Improving Chemical Processes using Statistical Methodologies" learning programme is delivered on an annual basis with Durham University's School of Chemistry. This programme includes the use of a Reaction Simulator developed by GSK, with the experimental data being analysed using JMP. Matt shared a YouTube video providing a flavour of the 2014 programme. "It is absolutely key to control key process variables in order to optimise process performance. A design of experiments strategy supported by a statistical software package such as JMP can help to identify those key process variables and support their long-term control,” Matt said. “Our intention is to build Definitive Screening Designs into next year's programmes in order to compare their benefits to more traditional designs.”

David Burnham is owner of Pega Analytics, which provides training and consulting in JMP. David started his presentation on the value of simulation with an archive video by Stu Hunter: What Is Design of Experiments? - Part 2. David has a personal interest in the combination of design of experiments and simulation. He showed how simulation could help a scientist make an informed decision, for example, about the tradeoff of doing more experimental runs. “Scientists are trying to develop understanding through experience, and I am using simulation to artificially create experience,” David said. “In a sense, embracing the art of statistics can give us an understanding of how statistics can help us even before we collect data.” David also will be presenting his paper at Discovery Summit Europe in Brussels in March.

Ian Cox, European Marketing Manager for JMP, gave a whistle-stop tour of JMP 12. Ian started by saying that John Sall’s vision of statistical discovery, where there is a graphic for every statistic, still holds true after 25 years. Ian demonstrated many of the exciting capabilities in JMP 12, due for release in March 2015. Users who are interested in exploring these can contact me for further information.

The forum's steering committee analysed the results of the survey conducted over the course of the day to work out what the users would like at future user group events. Claire will provide details of these results to the users over the coming weeks.

The next gathering of UK users will be at the Discovery Summit Europe in Brussels in March 2015. We hope to see you there!

Stephen Pearson of Syngenta shows how he created his presentation slides in Graph Builder.


David Payne of Macfarlan Smith explains how he uses the simulator in JMP to solve problems.


Matt Linsley of Newcastle University's Industrial Statistics Research Unit shows how he profiles multiple responses in JMP.


David Burnham of Pega Analytics speaks to an engaged audience at the UK User Forum.


Steering committee members, Claire Crawford and Mahesh Shivhare, discuss the survey at the UK User Forum.

