New to Discovery Summit? Let me connect you

If you’ve been to a Discovery Summit, you know what you have to look forward to. One of the best parts of the conference is meeting other smart, curious people who are interested in discovering cool things in data.

Even the introverts come out of their shells when around hundreds of other like-minded people. When smart, curious and interested people get together, they talk. They talk a lot. And when a large cadre of attendees return year after year, they have a lot to share and catch up on.

So what about people who have just recently started playing with data in JMP or who are novices to the conference?

To those people, I say: Come find me.

I’ll introduce you to other people in your industry who have similar business pains, to users who apply JMP in the ways that you do so you can share best practices, and to the Steering Committee members who select the papers and posters. I’ll hook you up with the developers who work on your favorite JMP platforms. I’ll even introduce you to John Sall, SAS co-founder and JMP creator. Like you, he’s smart, curious and interested in learning new things.

Malcolm Gladwell (a speaker at a previous Discovery Summit) would call me a “Connector.” Per Wikipedia, “Connectors are the people in a community who know large numbers of people and who are in the habit of making introductions.” That’s me. There’s a good chance I’ll know somebody who knows somebody who knows somebody. By the end of the Summit, you’ll feel like you know a whole lot of somebodies too.

So, when you get to San Diego on Sept. 14, come find me so I can start connecting you.


Creating a covering array when you can't test some factors together

I’ve written before about how we use covering arrays created by JMP to test JMP itself. A recent example that came from my colleagues Wenjun Bao and Joseph Morgan was so intriguing that I wanted to share it with others.

To test a particular platform in JMP Genomics, there are 14 different factors that can be varied, ranging from 14 levels to two levels. What makes this case different from the testing software preferences example I discussed previously is that for three of the factors (Interactive Hierarchical Clustering Options, Automated Hierarchical Clustering Options, Minimum Recombination Grouping Options), only one can be set for any given test run. This restriction arises because of the behavior of the following radio control:


In this example, we have a factor for Linkage Grouping Method that has three possible levels. Each of the grouping methods has one of the above-mentioned factors associated with it. Essentially, these three associated factors break into three separate cases to consider. If we wanted a strength 2 covering array, couldn’t we just create a strength 2 covering array for each possibility? This would ensure that each possible (allowable) pair occurs in our testing.

We could, but…

The two factors with the largest numbers of levels have 14 and 9 levels, respectively. If we didn’t have any restrictions, the smallest possible strength 2 covering array would have 126 = 14*9 runs (and the Covering Array platform can find such a design). However, if we create three separate covering arrays, each will be 126 runs, for a total of 378 = 126*3 runs. A strength 2 covering array ensures each possible pair occurs at least once, but by breaking the problem into three covering arrays, we end up with more coverage than we need: when the three arrays are combined, every pair not involving the restricted factors occurs at least three times.

What would be nice is a design that has missing values for factors that cannot occur based on the other settings.

Really? Missing values?

When you think of it from a traditional DOE standpoint, creating a design with missing values sounds silly. But for covering arrays, where we’re looking at combinations of factors, it makes perfect sense: If a factor has a missing value in a row, it means it’s not relevant for that particular test. This also means that if we see a failure for that test, we know the missing factor is not involved in the cause. Fortunately, our Analysis tool recognizes missing values as well.
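
To make the pair-coverage bookkeeping concrete, here is a minimal Python sketch (not the JMP covering array machinery) of how you might check which allowable pairs a test plan still misses when a missing entry (None) means "not applicable for this run." The tiny three-factor plan at the end is purely hypothetical.

from itertools import combinations

def uncovered_pairs(runs, levels, allowed=lambda f1, l1, f2, l2: True):
    """Return the allowable factor-level pairs that no run covers.

    runs   : list of dicts mapping factor name -> level (None = not applicable)
    levels : dict mapping factor name -> list of possible levels
    allowed: predicate saying whether two settings may occur together
    """
    # Every allowable pair of settings for every pair of factors.
    wanted = {
        (f1, l1, f2, l2)
        for f1, f2 in combinations(levels, 2)
        for l1 in levels[f1]
        for l2 in levels[f2]
        if allowed(f1, l1, f2, l2)
    }
    # Pairs actually covered; a None entry covers nothing for that factor.
    covered = {
        (f1, run[f1], f2, run[f2])
        for run in runs
        for f1, f2 in combinations(levels, 2)
        if run[f1] is not None and run[f2] is not None
    }
    return wanted - covered

# Hypothetical miniature example: 3 runs over 3 two-level factors.
levels = {"A": [1, 2], "B": [1, 2], "C": [1, 2]}
runs = [
    {"A": 1, "B": 1, "C": None},   # C is not applicable in this run
    {"A": 2, "B": 2, "C": 1},
    {"A": 1, "B": 2, "C": 2},
]
print(uncovered_pairs(runs, levels))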

OK, so how can I create this design?

Consider one of the three grouping methods, Automated Hierarchical Clustering. When it appears, another factor, Automated Hierarchical Clustering Options, can take on three different levels, while the factors Interactive Hierarchical Clustering Options and Minimum Recombination Grouping Options should be missing. We can use the handy disallowed combinations filter: when Linkage Grouping Method is Automated Hierarchical Clustering, disallow all values for Interactive Hierarchical Clustering Options; then join with an OR and do the same thing for Automated Hierarchical Clustering and Minimum Recombination Grouping Options.


We could then follow a similar procedure for the other two grouping methods, linking these with OR statements in the Data Filter. So we should now be ready to create the design…

Not quite yet

We’ve overlooked one thing in our disallowed combinations that is very easy to miss – the designer will still try to make pairs of those restricted factors show up in rows with Grouping Method missing, which doesn’t make any sense for our design. So, we have to disallow all possible combinations between those columns. For example, if we choose Interactive Hierarchical Clustering Options and Automated Hierarchical Clustering Options from the filter, we would get (with the earlier disallowed combinations cropped from the top):


After we’ve added those combinations to the previously mentioned disallowed combinations, all connected with OR in the filter, we can create the design.
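
For readers who like to see the logic spelled out, here is a rough Python rendering of the full set of restrictions as a single validity check on a candidate run. It is only a sketch, not the JMP Disallowed Combinations filter itself, and the level names for the other two grouping methods (and the option levels such as "Ward") are guesses made up for illustration:

# Which option factor goes with which Linkage Grouping Method level.
# The first level name is quoted in the post; the other two are assumed.
OPTION_FOR_METHOD = {
    "Automated Hierarchical Clustering": "Automated Hierarchical Clustering Options",
    "Interactive Hierarchical Clustering": "Interactive Hierarchical Clustering Options",
    "Minimum Recombination": "Minimum Recombination Grouping Options",
}
OPTION_FACTORS = set(OPTION_FOR_METHOD.values())

def run_is_valid(run):
    """Only the option factor matching the chosen grouping method may be set;
    every other option factor (and all of them if the method is missing)
    must stay missing (None)."""
    allowed = OPTION_FOR_METHOD.get(run.get("Linkage Grouping Method"))
    return all(run.get(f) is None for f in OPTION_FACTORS if f != allowed)

# A run that respects the restrictions ("Ward" is a made-up level)...
print(run_is_valid({
    "Linkage Grouping Method": "Automated Hierarchical Clustering",
    "Automated Hierarchical Clustering Options": "Ward",
}))  # True

# ...and one that the disallowed combinations should reject.
print(run_is_valid({
    "Linkage Grouping Method": "Automated Hierarchical Clustering",
    "Interactive Hierarchical Clustering Options": "Complete linkage",
}))  # False: the wrong option factor is set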

The design

The Covering Array platform finds us a 140-run design (more on this in a bit, but this is, in fact, the smallest possible run size), with only one of Interactive Hierarchical Clustering Options, Automated Hierarchical Clustering Options and Minimum Recombination Grouping Options set for each row and the other two missing. I’ve put the data table on the File Exchange in the JMP User Community, where you can see the final result and take a peek at what the resulting disallowed combinations looked like.

Final thoughts

While some extra work was needed to set up the disallowed combinations, the savings in the number of tests (378 vs. 140) was dramatic. With the help of the Disallowed Combinations Filter, once it was determined what should be disallowed, it was easy to input them, even with a number of different combinations.

The keen reader may have noticed that I could collapse all the possible values from the three variables into one factor with 10 levels. Since any strength 2 design must pair each of those 10 levels with each of the 14 levels of the largest factor, this is how I knew that 140 = 14*10 runs was the lower bound. While it’s certainly possible to construct the design that way, it’s easy to lose the context of what that variable is trying to describe.

Do you run across similar types of problems when using covering arrays? If there’s interest, we can see about adding the capability to easily generate the specialized disallowed combinations when some factors cannot be tested together. Thanks for reading!


Presenting at Discovery Summit Europe in 6 easy steps

Staircase at Discovery Summit Europe

Go ahead -- take the steps.

What’s more satisfying than attending a Discovery Summit event? Presenting at one!

I know, I know. That’s an honor reserved for the best and the brightest analytic thinkers in the world. Before you say you’re not up for it, answer this question: Do you have a story of data discovery to share, a tale of a cool application of analytics and/or ROI to tell?

If yes, consider yourself qualified to submit a Discovery Summit abstract for consideration. The call for papers for Discovery Summit Europe is open now, but closes 30 September.

Now that you know that you CAN submit an abstract, I’ll tell you HOW. Here are six easy steps:

  1. Write 150-200 words describing your story: As the protagonist, what conflict did you need to overcome, and how did you resolve it with analytics? Oh, and make up a title.
  2. Decide if you want to talk for 30 or 45 minutes, or present a poster.
  3. Write a 75-word bio for yourself and any friends who will be presenting with you.
  4. Decide if your story would be best for beginners or experts, or folks who are somewhere in between.
  5. Fill out this easy form.
  6. If your paper or poster submission is accepted, plan to attend Discovery Summit Europe 14-17 March 2016 in Amsterdam.

See, it’s a simple process. But start now, because the deadline is approaching and if you don’t take those steps, you can’t make it onto the stage.


Assessing my Skulpt Aim data with JMP

In my last post, I mentioned that I have recently acquired several new quantified-self devices, including the Skulpt Aim. These new devices and the data from them have brought me greater opportunities to think more deeply about measurement systems.

When I begin using a new self-tracking device, I have the same kind of basic questions that any scientist or engineer might ask about a novel measurement tool:

  • Why does this tool interest me?
  • What does this device measure?
  • Does this new data confirm what I would expect?
  • Do changes over time represent important trends or random noise?

These questions have been on my mind over the past six weeks as I’ve collected daily data with the Aim, a body fat and muscle quality monitoring tool.

Why does this tool interest me?

I've blogged in my Fitness and Food series about my past body size fluctuations and how I adopted quantified-self practices such as food logging, activity monitoring and weight tracking with a wireless scale to reach a healthy weight maintenance zone. Tracking my diet, activity and weight over the past six years has helped me better understand how my food intake and exercise habits have affected my short- and long-term weight trends throughout my lifetime. However, since strength training is my workout of choice, body weight has always felt unsatisfying as a long-term success metric.

Weight history graph

What does this device measure?

Unlike other methods I have tried before, the Aim provides two different metrics: % fat (the tried-and-true measure of body fat percentage) and a novel measure called muscle quality (MQ). In short, the device estimates % fat by passing a current through a specific body part and measuring its resistance. It uses the time between discharge of the current into the muscle and detection of the corresponding voltage measurement to calculate muscle quality. The basic idea is that larger, fitter muscle fibers retain current longer. (The Skulpt site has more information about how it works.)

The Aim estimates overall body fat percentage and average muscle quality through a four-site measurement, similar to the multisite approach used by caliper assessments. But the Aim’s real novelty is its ability to assess and report measures for individual muscle areas. This fills a gap in my quantified-self data collection by providing me a frequent and convenient way to quantify muscle maintenance and incremental changes in body areas due to training choices. I had seen the Aim online several months ago, but having the chance to try the device myself at the recent QS15 conference really sealed the deal.

To use the Aim, I spray water on the area I am going to measure and on the back of the device; I then set the device on a specific muscle area, following the recommendations for device placement shown in the app’s embedded videos. A few seconds later, the Aim displays % fat and MQ for that area. When I fit a regression line to data points across all body parts in Graph Builder, as shown in the first graph below, an inverse relationship between MQ and % fat appears. Intuitively, it makes sense that areas with higher muscle quality will tend to have lower fat percentages.

Percent fat vs muscle quality 8-29-15

However, adding Body part as an overlay variable in the second graph reveals that the MQ and % fat profiles of different muscle areas can vary greatly.

Percent fat vs muscle quality overlay 8-29-15

Does this new data confirm what I would expect?

To answer this question, I had to start collecting data! So for the past six weeks, I have been performing three to five replicates of the Aim’s Quick Test each day. It uses measurements of my right-side bicep, tricep, ab and quadricep muscles to estimate my overall body fat. Every week or so, I also measure other individual muscle areas. I perform all these tests first thing in the morning, before eating and drinking, right after I weigh myself. The graph below summarizes the number of measurements I have taken for different areas over this period of time.

N reps 8-30-15

Muscle quality is a new metric to me so I don’t have any past measurements for comparison. But the patterns in the data I collected indicate that the muscles that I train regularly and heavily tend to have the highest muscle quality (MQ) scores. As expected, areas that I haven’t trained regularly with weights in recent years (e.g., calves) have lower muscle quality scores. My abs are an interesting exception. I rarely train them directly, but their MQ scores are very high, probably because most weight training exercises require the use of abdominal muscles to stabilize the movement.

The best body fat data I have comes from a January 2014 DXA scan, which assessed me at 17.5% body fat at a dieted-down weight of 127.5 lbs. My recent quick test measurements with the Aim have been taken at a more typical maintenance weight around 135 lbs and estimate my % fat at 18-19%. Although my weight is not directly comparable to my weight on the day of my DXA, my results are in the ballpark of what I’d expect after adding in a few pounds for extra food and water in my system, a few pounds for extra body fat, and 1.75 years of training time.

I used my Skulpt data with a custom body map I created earlier this year in JMP to show mean MQ and % fat by body area (averaged over left and right sides). I reversed the color scales so the trends for each measure could be compared more easily. Like the body-part specific regression lines shown above, this graph also reflects the inverse relationship between MQ and % fat.

Mean MQ and mean % fat body maps

Do changes over time represent important trends or random noise?

I had some questions I wanted to answer before assessing how my workouts might affect my % fat and MQ measures in the short and long term. While casual Aim users might be satisfied by taking a single measurement daily or weekly, I expected my measurements to vary around the true mean for each body part/side combination due to random and systematic variables.

Without daily access to a gold standard test like DXA, I could not verify the accuracy of the Aim’s measurements, but that has never been my intent. I am much more interested in establishing a measurement routine that generates precise measurements each day so I can make sense of daily or weekly trends in the context of my weight, eating and workout variables. The Skulpt blog mentioned an expected between-day test-retest variation of 5%. Put another way, an area measured at 20% body fat one day would be expected to measure 20% +/-1% the next day. But I predicted that I would see variables like water weight impact my daily measurements, such that my true values could differ between days, so I was more concerned with assessing replicate measurements taken on the same day. Establishing within-day precision would be key to establishing a baseline for my MQ and % fat values.

To assess within-day variation, I used Graph Builder to create a graph of the standard deviations of my MQ and % fat measurements for the four sites I measure daily. I used a light blue shaded reference range to indicate the 1% fat and 1 MQ point standard deviation that I hoped to achieve.

Measurement variability
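
For anyone who keeps the replicate measurements in a plain table outside JMP, a minimal pandas sketch of the same within-day summary might look like the following. The column names and sample values are made up; the 1-unit thresholds mirror the goal described above.

import pandas as pd

# Long-format replicate data: one row per rep (column names are assumed).
df = pd.DataFrame({
    "date":      ["2015-07-20"] * 3 + ["2015-07-21"] * 3,
    "body_part": ["bicep"] * 6,
    "pct_fat":   [18.9, 20.1, 19.2, 18.8, 18.9, 19.0],
    "mq":        [128, 122, 125, 126, 127, 126],
})

# Within-day spread: standard deviation of the replicates taken on one day.
within_day_sd = (
    df.groupby(["date", "body_part"])[["pct_fat", "mq"]]
      .std()
      .rename(columns={"pct_fat": "sd_pct_fat", "mq": "sd_mq"})
      .reset_index()
)
print(within_day_sd)

# Flag days that miss the hoped-for precision of 1 % fat / 1 MQ point.
print(within_day_sd[(within_day_sd.sd_pct_fat > 1) | (within_day_sd.sd_mq > 1)])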

The variability trends I saw in my July measurements caused me to question and adjust my measurement techniques:

  • Early on, my within-day variability for my MQ and body fat scores was relatively high. I soon realized that I wasn’t following the Aim instructions to the letter. I began to spray the back of the unit before each and every rep, ensuring that the metal contacts were consistently soaked for each measurement. You can see this change begin to reduce the variability of my data around July 21.
  • Once I made the above improvement, I started to notice a new pattern. My first rep for a muscle group seemed to be different from later reps. I confirmed this suspicion by examining my raw data. I theorized that this happened because the device was wet before rep 1, but the muscle area itself was dry until after rep 1 was complete. I began spraying each body area before I started measuring, and this further improved my measurement consistency.

At the end of July, I noticed another disturbing trend. The standard deviation of my MQ measurement for my right bicep was trending up, not down! This affected the consistency of my four-site average. Puzzling over it, I concluded that since my bicep is a relatively small and narrow muscle, slight position changes probably affected its measurement more than they would for larger muscle groups.

I decided to test my theory by experimenting with the position of the device. For five reps (group 1), I made an effort to hold the unit slightly higher on my bicep area, and then moved it to a slightly lower position for five reps (group 2). The figure below illustrates how this affected my results. Although one rep in the higher position group had an MQ score of 125 (marked with a red x), the rest of the MQ scores in the higher position group were several points lower than those in the second group. It seemed clear that I needed to choose one of these positions and stick with it to obtain the most consistent measurements for this problematic muscle area.

Device position 9-2-15

Over subsequent days, I applied the lessons learned above and chose my bicep measurement area more consistently, reducing the SD(MQ) and SD(% fat) for biceps in my data set. At this point, I’m happy with being able to consistently measure MQ +/- 2 and % fat +/- 1% on most days for almost all areas, and the four-site overall estimate that I take daily has fallen into a predictable range.

What's next with this data?

Given my initial adventures in measurement consistency above, I knew I had more work to do with this data set. I was continuing to collect daily data, but wanted to assess it and my measurement technique more systematically. JMP has an MSA (Measurement Systems Analysis) platform designed to help assess sources of variability in a measurement system. I wanted to learn more about the platform and use it to assess my measurements so far. I already knew I had outlier measurements in my data table. What’s the best way to identify and remove them? I needed to explore my data, evaluate my outlier filtering options, apply them, and assess how their removal affected my within-day measurement consistency. I’ll share what I discovered in future posts.


Discovery Summit Europe: What's your story?

Speaker on stage at Discovery Summit Brussels

What's your JMP story? Submit an abstract.

At Discovery Summit Europe last March, I met many amazing people. You may not remember me, but I remember you. We met during delicious dinners, perky plenaries, and thought-provoking paper and poster presentations. Oh, and we bonded over a lovely glass of red wine or two, if I remember correctly.

You’re from Lufthansa, STATCON and Novozymes … GE, GSK and P&G … and you’re the brilliant young researchers at the University of Exeter.

Yes, I’m calling you out because now is the time to submit abstracts for consideration for Europe’s second Discovery Summit: 14-17 March 2016. This time, the Summit will be in Amsterdam, and it's shaping up to be another awesome event for analytic thinkers.

I know you have a good story to tell, but you need to let us know about it! Time is running out to submit an abstract for consideration; in fact, the call for papers and posters closes on 30 September. You’ll find all of the details at the Discovery Summit Europe webpages.


Exploring when to begin drawing Social Security benefits

How old will you be when you die? This may seem like an odd question to ask, but the US Social Security system is set up in such a way that this is a question retirees need to consider. Deciding when to begin collecting benefits greatly depends on how long retirees think they will be collecting benefits.

My interest in this all started when my wife and I discussed my father-in-law’s decision to wait until “full” retirement age to start collecting his Social Security benefits. As a result of that conversation, I used JMP to explore the data, and I wanted to share what I found. I'm not a financial planner, so please note that I'm not offering advice to anyone here.

My father-in-law was born in 1953, so "full" retirement age for him would be 66 years old. For those of you who are not familiar with the system, Social Security offers a sliding scale of benefits depending on the age at which you begin collecting. If full retirement age is 66 and the monthly benefit is, say, $1,000, the scale looks like the following JMP data table (I got the formula from the Social Security site):

Drawing age vs. Collection rate

This seems like a fairly straightforward calculation. If an early death is expected (because of family history or poor health, for instance), the analysis shows it makes sense to start collecting benefits at age 62. Assuming it is financially possible to wait – and a long life is expected – the table shows that it makes more sense to start collecting at age 70. Using the same scenario as above, JMP has broken down the optimum age for benefit collection by age at death.

Age at death vs. amount collected

The table shows that up to the age of 76, it is in a retiree's best interest to start collecting Social Security at the age of 62. It also shows that retirees who live to 86 years old or longer should delay collecting Social Security until age 70. For retirees who die between age 77 and 86, the optimum age of benefit collection is also summarized in the table. The one thing I found most interesting about this analysis was that at no point was it best to retire at the “full” retirement age of 66, my father-in-law’s retirement age.
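
As a rough sketch of that first calculation (assuming the usual Social Security reduction and delayed-credit schedule for a full retirement age of 66, a $1,000 monthly benefit at full retirement age, and no investing or COLA yet; this is not necessarily the exact formula used in the post), the total collected by any age at death can be tabulated in a few lines of Python:

def monthly_benefit(draw_age, fra=66, fra_benefit=1000.0):
    """Monthly benefit for a given drawing age (whole years), using the usual
    schedule: 5/9 of 1% per month reduction for the first 36 months before
    full retirement age, 5/12 of 1% per month beyond that, and a 2/3 of 1%
    per month delayed credit after full retirement age (up to age 70)."""
    months = (draw_age - fra) * 12
    if months >= 0:
        factor = 1 + (2 / 3 / 100) * months
    elif months >= -36:
        factor = 1 - (5 / 9 / 100) * (-months)
    else:
        factor = 1 - (5 / 9 / 100) * 36 - (5 / 12 / 100) * (-months - 36)
    return fra_benefit * factor

def total_collected(draw_age, death_age):
    """Total benefits collected from drawing age until death (no investing)."""
    return monthly_benefit(draw_age) * 12 * (death_age - draw_age)

# Best drawing age for a range of ages at death.
for death_age in range(70, 96, 2):
    best = max(range(62, 71), key=lambda a: total_collected(a, death_age))
    print(death_age, best, round(total_collected(best, death_age)))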

That calculation is pretty straightforward. But it got me thinking: if a retiree is able to wait until 70 to start collecting benefits, then there is no immediate need for the money. That means this is really not an apples-to-apples comparison, because a retiree who starts collecting earlier can invest the money rather than spend it. If a retiree started drawing benefits at 62 but invested the money instead of spending it, the graph looks different (this assumes a 5% interest rate, compounded monthly).

Money Accrued vs. Drawing age

If retirees are able to invest the money, it is in their best interest to start drawing Social Security at age 62 as long as they do not live longer than 81. However, if they live to 86 or more, the graph above shows that they should delay drawing Social Security until 70. The graph also shows the window where another age besides 62 or 70 is ideal. (The line peaks at the optimal timing for each age in this window.) The window shrinks when retirees invest.

Changing the interest rate did not have a huge impact on the result. If the interest rate is between 0 and 8.75%, that just changes the size of that window. The higher the interest rate (up to 8.75%), the smaller the window. Here also we see that in no scenario was it the best option to retire at “full” retirement age of 66.

There is another factor to consider as well: the cost-of-living adjustment, or COLA. The COLA is in place to ensure that the purchasing power of Social Security income is not eroded by inflation. Over the past 10 years, it has averaged 2.6%. Factoring a 2.6% COLA into the equation does not change the results dramatically. If we add the COLA to the investment scenario above, it is in retirees' best interest to start drawing at 62 as long as they do not live longer than 80; if they live to 84 or more, they should delay drawing Social Security until age 70.
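
Continuing the sketch above (it reuses the monthly_benefit helper defined there), here is a rough version of the invest-the-checks scenario: each month's check is deposited into an account earning 5% a year, compounded monthly, and the check itself grows by a 2.6% COLA once a year after drawing begins. Applying the COLA only from the first check onward is a simplification, and the exact crossover ages will depend on these assumptions.

def accrued_value(draw_age, death_age, annual_rate=0.05, cola=0.026,
                  fra=66, fra_benefit=1000.0):
    """Value at death of all checks received, with each check invested when
    it arrives and growing at annual_rate compounded monthly until death."""
    monthly_rate = annual_rate / 12
    benefit = monthly_benefit(draw_age, fra, fra_benefit)
    balance = 0.0
    for month in range((death_age - draw_age) * 12):
        if month > 0 and month % 12 == 0:
            benefit *= 1 + cola            # annual cost-of-living adjustment
        balance = balance * (1 + monthly_rate) + benefit
    return balance

# Best drawing age for each age at death under these assumptions.
for death_age in range(70, 96, 2):
    best = max(range(62, 71), key=lambda a: accrued_value(a, death_age))
    print(death_age, best, round(accrued_value(best, death_age)))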

In summary, I found that in most cases, the best ages to begin collecting Social Security benefits are either 62 or 70. For those who expect a long life into their mid-80s, the best bet is delaying until age 70 to begin benefit collection.


Beyond Spreadsheets: Amy Clayman, Voice Systems Engineering

“When building a predictive model, we find the JMP Pro interfaces to be very intuitive, allowing us to work closely with other JMP Pro users to build the model together.”

-- Amy Clayman, Data-Driven Decisions Circle, VSE

Beyond Spreadsheets is a blog series that highlights how JMP customers are augmenting their tools and processes for exploratory data analysis to make breakthrough discoveries. We talked with JMP users to learn more about how and why their organizations bring Excel data into JMP to create graphics and explore what-if scenarios.

Our fifth and final interview in the series is with Amy Clayman, Consultant at VSE and owner of Financial Fluency LLC. She has more than 20 years of experience in corporate finance and public accounting and has been featured in CFO magazine. She is also the president of the Jewish Community Relations Council of Southern New Jersey.

Fast Facts

  • JMP user since 2014
  • Favorite JMP features: Graph Builder, Profiler and modeling capabilities.
  • Proudest professional moment: We built a model in JMP Pro that accurately predicts customer behavior more than 85 percent of the time. We have integrated this algorithm into our data warehouse to share with other teams in our organization.

How long have you been a JMP user?

We selected JMP Pro a year ago. We had been exploring various predictive analysis software solutions for well over a year before selecting JMP Pro. We tested and used an Excel data mining add-in and various cloud-based solutions before we made our final decision.

Tell us a little bit about the function of your department and how it contributes to your organization’s mission.

Our department is referred to as the “Data-Driven Decisions Circle.” We are responsible for helping VSE use financial, operational and external data to guide the organization’s decision-making process. The group is dedicated to helping the company improve its profitability and ROI. For example, we are focused on the following initiatives:

  • Understanding and enhancing the entire customer experience so that we can improve interactions and increase spending levels.
  • Determining which media sources attract the most valuable customers and generate the best ROI.
  • Discovering which programs, campaigns or system enhancements help improve VSE’s ROI.

What do you like most about the type of work you do? 

I started at VSE over 10 years ago to initially support its accounting and reporting needs; later, I became involved in its data analytics project. I love working with the people! VSE has an amazingly talented and compassionate staff. They challenge you to do your best work, and they like to have fun.

Whether it’s people from the marketing, creative services, service delivery, technology or finance teams, every member of the organization wants to better understand what is happening and how we can improve the customer experience in a fiscally responsible manner. We often get inundated with questions that usually start with “Why did this happen?”, “What is the impact to the customer?”, and “How will this affect future revenues?” Sometimes I feel like the pathologist in a Law & Order episode. The company expects our group, in an unbiased manner, to dissect the event or problem and provide an explanation that will better enable the company to create a path to success.

What is a professional accomplishment of which you are most proud?

We built a model in JMP Pro to predict the annual spending category of a new member based on their behavior in their first seven days on the service. The model accurately predicts the correct category more than 85 percent of the time. We have integrated this algorithm into our data warehouse so these predictions are easily accessible to the media team.

This model allows us to help our highly skilled media team quickly understand the revenue opportunities of the member’s media choices. Historically, the team may have chosen to wait several weeks or months before pausing or extending a program. Now we can provide them with additional intelligence about the likely outcome to complement their decision-making process.

The magic in this process is centered on identifying and building the relevant data so that the algorithm tells the business user what is likely to happen with a high degree of confidence.

What do you like most about using JMP Pro?

My group is responsible for educating and communicating to the key business users. We need to do this in a concise, thorough and organized fashion. The data visualization tools are an excellent starting point and allow us to communicate the trends quickly. Using features such as Graph Builder and Profiler allows us to tell the story – fast.

As our team better understands the data using the visualization tools, we then look to identify patterns or relationships. In JMP, we use the modeling features to help predict potential outcomes and identify which attributes have the strongest correlation to the predicted outcome. Communicating these patterns and relationships helps our key business users create an improved customer experience.

VSE is a highly collaborative environment. When building a predictive model, we find the JMP Pro interfaces to be very intuitive, allowing us to work closely with other JMP users to build the model together.

Have you used spreadsheet programs in the past to conduct your statistical analysis? If so, can you describe the pros and cons?

Yes, we have used several other programs.

From my perspective, the pros of Microsoft Excel:

  • Most finance professionals are comfortable with Excel, so the environment is familiar.
  • It has some data visualization features that can be easily manipulated.
  • It is inexpensive.

And the cons of Microsoft Excel:

  • Data visualization is very limited when compared to JMP Pro.
  • Selection of data modeling techniques is limited.
  • Ability to compare model results is limited.
  • Ability to clean and prep the data for modeling is limited.
  • There are some latency issues.
  • We have limited access to trained experts who exclusively support this product.

JMP allows us to more effectively understand, present and predict potential outcomes.

The most important advantage to selecting JMP Pro over the other spreadsheet tools is the access to JMP’s exceptional technical staff. Our technical resource representative guides us on how to best use the software and is constantly educating us on the best approaches to get the most out of the tool.

It is truly the combination of JMP Pro and the people at JMP that has helped us advance our mission to have data drive our decision-making process. We believe in our staff’s instincts, but we have an obligation to provide them with the most relevant information in the most intelligent fashion to help them lead our organization.

What advice or best practices would you give to other companies that are currently relying on spreadsheet tools to conduct statistical analyses?

Don’t be afraid of or overwhelmed by all of the functionality of JMP. We continue to migrate in stages. Your ability to grow as a professional in this field will be limited if you choose to only use a spreadsheet tool. The predictive analytics field is constantly evolving, and the tools and professionals you interact with will determine how effective you can be in this role. Do not sell yourself or your company short by using less sophisticated tools to address this need.

Want to learn how to uncover information you might miss from using spreadsheets alone? Watch the new webcast case study, Going Beyond Spreadsheet Analytics With Visual Data Discovery, to see how a sports equipment and apparel manufacturer digs deep into the data to improve a supply chain process that was not working.


New book to spark enthusiasm for descriptive statistics and probability in the classroom

Whether you teach introductory statistics courses in engineering, economics or the natural sciences, or master's courses on applied statistics or probability theory, you’ll want to consider a new book: Statistics with JMP: Graphs, Descriptive Statistics and Probability by Peter Goos and David Meintrup. Unlike comparable books, it combines mathematical depth, clear statistical concepts and real-life applications. What sets the book apart is that it shows mathematical derivations in full and presents step-by-step guides to making calculations and graphs.

The origin of the book is a series of lectures on descriptive statistics and probability presented in Dutch by Peter Goos at the Faculty of Applied Economics of the University of Antwerp in Belgium.

Goos (who is also with the University of Leuven in Belgium) migrated the course demos, exercises and exams from Excel to JMP and teamed up with David Meintrup of the University of Applied Sciences Ingolstadt in Germany to thoroughly revise, extend and translate the content into English. Goos and Meintrup are both passionate educators and longtime JMP users, so they were a dream team to work on this book.

The pair’s motivation to write this book was twofold: As expressed in their preface, they did not want to "sweep technicalities and mathematical derivations under the carpet.” For the sake of deepening the students’ understanding of statistical concepts, they showed all mathematical derivations in detail throughout the book.

Their second impetus was to “ensure that the concepts introduced in the book can be successfully put into practice.” Step-by-step instructions and numerous screenshots show how to generate graphs, calculate descriptive statistics and compute probabilities in JMP 12. They chose JMP “because it is powerful, yet easy to use.”

To illustrate the methods and to emphasize their usefulness, the book contains many examples involving real-life data from various application fields, including business, economics, sports, engineering and natural sciences. All data sets are available with stored scripts to easily reproduce figures, tables and analyses. The data files are wrapped by a JMP Journal and packaged as a JMP add-in (except two larger data sets that are available separately), making them ready to use in the classroom. This add-in is available as a resource from the JMP Academic Community, or with additional supporting material from the Wiley book companion website.

With the purchase of this book, you receive a 12-month license for JMP Student Edition. The software is directly available for download and can be activated using the code found in each hard copy; an electronic copy is also available upon request.

But wait, there’s more! A companion book, “Statistics with JMP: Hypothesis Tests, ANOVA and Regression,” which follows the same approach, is planned for early 2016.

Book details: ISBN 978-1-119-03570-1, hardcover, 368 pages, April 2015. Also available as an e-book on Amazon, Apple iBooks and Google Play. Visit the Wiley book page for a book index, a sample chapter or an evaluation copy.


Top 5 Discovery Summit paper picks by SEs

Some mornings over coffee or tea, my husband asks me what I’ll be doing that day. I like this exercise because it helps me get mentally prepared for the various meetings and projects that await me in the office. Usually, I only make it to my 10:30 appointment before he tires of my calendar entries and moves on to the next topic. That either says a lot about my meetings or about his inability to focus before a second cup of coffee. I opt to believe it’s him and not my calendar.

But this morning was different. I told him I’d be spending my day working on Discovery Summit 2015, answering questions from Steering Committee members, telling our R&D leaders about this year’s conference facility, and working with our Systems Engineers (SE) to make sure there was at least one at every paper presentation. Bingo – I had Greg’s full attention.

Here’s why: Every year, we ask an SE to hang out in each of the rooms where papers are presented to ensure that any hiccups are addressed quickly. SEs are the natural choice for this assignment because they are always brilliant and always curious. This is a win-win for all: Paper presentations run smoothly, and the SE is guaranteed a seat in the room.

I start the process by asking which papers they’d most like to see, and then make room assignments by preference. One of the best things about this process is that I learn which papers pique the curiosity of our SEs – remember, they’re brilliant and curious people. My husband – always curious and, at times, brilliant – wanted to know which papers were most requested. While each and every paper had at least one SE request, these five were the most popular:

  1. Developing a Nondestructive Test Gauge Using the JMP Discriminant Analysis Platform
    By Jason Robbins, Process Engineer, US Synthetic
  2. An Interactive JMP Environment for Control Group Selection and Visualizing Impact From Promotional Uplift Experiments
    By Brian Eastwood, Market Analyst, Nu Skin Enterprises; John Salmon, PhD, Assistant Professor, Brigham Young University
  3. Transforming Consumer Surveys into Consumer Insight Using JMP 12
    By Mike Creed, Consumer Modeling Expert, Procter & Gamble; Diane Farris, Consumer Modeling Leader, Procter & Gamble
  4. Bias Adjustment in Data Mining
    By Stanley Young, PhD, CEO, CGStat; Bob Obenchain, PhD, Principal Consultant, Risk Benefit Statistics
  5. Truth and Lies: Consumer Perception vs. Data
    By Rosario Murguia, Consumer and Product Research Manager, Procter & Gamble; Diana Ballard, Senior Consulting Statistician, Predictum Inc.; Michael E. Haslam, PhD, Vice President of Application Development, Predictum Inc.

Now that you’re thinking about a few of the awesome breakout options you’ll have at Discovery Summit this year, you can begin planning which papers you want to see. And you’ll know that each and every talk will run smoothly because there will always be an SE on call, usually standing in the back of the room, ready to jump in at a moment’s notice.


Beyond Spreadsheets: Mary Ann Shifflet, University of Southern Indiana

“I love being able to tell my students that if they learn JMP, it will be a skill they can put on their resume that will set them apart from other applicants.”

-- Mary Ann Shifflet, Professor, Romain College of Business, University of Southern Indiana

Beyond Spreadsheets is a blog series that highlights how JMP customers are augmenting their tools and processes for exploratory data analysis to make breakthrough discoveries. We are featuring Q&As with JMP users to learn more about how and why their organizations bring Excel data into JMP to create graphics and explore what-if scenarios.

Our fourth interview in the series is with Mary Ann Shifflet, statistics professor and JMP advocate.

Fast Facts

  • JMP user since 2012
  • Favorite JMP features: Graph Builder, Distribution, Fit Model
  • Proudest professional moment: Receiving the Dean’s Award for Teaching Excellence in 2012 for redesigning an Elementary Statistics course to incorporate JMP.

What do you like most about the type of work you do? 

There are two things that I really enjoy about teaching statistics to college students. One is seeing the lightbulb come on for someone who has been trying to understand a difficult concept. The other is a little more subtle; it’s knowing that I am teaching them one of the most important elements of their course work in the College of Business. All business decisions require the use of data, so laying that foundation is critical. If you look at the skills required for a career in business, many of them have to do with data analysis and problem solving – the skills taught in my course.

Why do you use JMP in your teaching? 

I use JMP in my teaching because I wanted students to be able to do real data analysis when they leave my class – or at the very least, be able to summarize and interpret data. I wanted to give them a tool that would allow them to develop a little bit of confidence in using data for business decisions. I had used other programs in my professional practice, but JMP was the one that we selected.

Since incorporating JMP in the class, I am able to teach some relatively sophisticated regression modeling and diagnostics – sophisticated for a sophomore-level course, at least. This modeling would not be possible if we were doing the calculations by hand with scientific calculators or even if we were using Excel.

In what ways have you used Excel for teaching statistical analysis? How is using JMP different?

We, of course, use it [Excel] for data input and storage, but I have not used it much for analysis. I made the transition from using TI-84 calculators to JMP for analysis.

I find JMP much easier to use than Excel and easier to incorporate in the class.

There were a number of things to consider as we made this transition, not the least of which was availability. Any student who has a computer probably has Excel on it. Students didn’t have JMP available, which is why we felt it was essential to give them free access to it. We were able to do that through a campus-wide academic license.

I find JMP makes it very easy for me to teach students the steps to use in acquiring the necessary output. The ease of use, along with the support features available with JMP, makes it ideal for my purposes.

How have students reacted to using JMP? 

The initial feedback from students in the first experimental section was very positive, leading us to transition all of my sections to the JMP model more quickly than we had originally planned. As time has passed, I have received mostly positive feedback. I have had students come back after leaving the class and tell me they were able to use JMP for projects in other classes and even in some team competitions. Several students were able to obtain very good internships due to their experience with JMP in the classroom.

Is there anything else you would like to mention? 

Since JMP is such an easy tool to use, I have the opportunity to focus much more on the business decision aspects of data analysis. As a result, students leave the class truly able to use data to support their business decisions.

I hear many people say that students need to be able to use Excel for data analysis since that may be their only data analysis option in many companies. While that may be true, I love being able to tell my students that if they learn JMP, it will be a skill they can put on their resume that will set them apart from other applicants. Knowing a software program like JMP communicates something different – and in my view, better – than simply knowing how to use Excel to analyze data.

Ready to go beyond spreadsheets? Learn how to augment your existing processes for exploratory data analysis to uncover information you might miss from using spreadsheets alone.
