Assessing my Skulpt Aim data with JMP

In my last post, I mentioned that I have recently acquired several new quantified-self devices, including the Skulpt Aim. These new devices and the data from them have brought me greater opportunities to think more deeply about measurement systems.

When I begin using a new self-tracking device, I have the same kind of basic questions that any scientist or engineer might ask about a novel measurement tool:

  • Why does this tool interest me?
  • What does this device measure?
  • Does this new data confirm what I would expect?
  • Do changes over time represent important trends or random noise?

These questions have been on my mind over the past six weeks as I’ve collected daily data with the Aim, a body fat and muscle quality monitoring tool.

Why does this tool interest me?

I've blogged in my Fitness and Food series about my past body size fluctuations and how I adopted quantified-self practices such as food logging, activity monitoring and weight tracking with a wireless scale to reach a healthy weight maintenance zone. Tracking my diet, activity and weight over the past six years has helped me better understand how my food intake and exercise habits have affected my short- and long-term weight trends throughout my lifetime. However, since strength training is my workout of choice, body weight has always felt unsatisfying as a long-term success metric.

Weight history purple and aqua2

What does this device measure?

Unlike other methods I have tried before, the Aim provides two different metrics: % fat (the tried-and-true measure of body fat percentage) and a novel measure called muscle quality (MQ). In short, the device estimates % fat by passing a current through a specific body part and measuring its resistance. It calculates muscle quality from the time between discharge of the current into the muscle and detection of the corresponding voltage. The basic idea is that larger, fitter muscle fibers retain current longer. (The Skulpt site has more information about how it works.)

The Aim estimates overall body fat percentage and average muscle quality through a four-site measurement, similar to the multisite approach used by caliper assessments. But the Aim’s real novelty is its ability to assess and report measures for individual muscle areas. This fills a gap in my quantified-self data collection by providing me a frequent and convenient way to quantify muscle maintenance and incremental changes in body areas due to training choices. I had seen the Aim online several months ago, but having the chance to try the device myself at the recent QS15 conference really sealed the deal.

To use the Aim, I spray water on the area I am going to measure and on the back of the device; I then set the device on a specific muscle area, following the recommendations for device placement shown in the app’s embedded videos. A few seconds later, the Aim displays % fat and MQ for that area. Fitting a regression line to data points across all body parts in Graph Builder, as shown in the first graph below, reveals an inverse relationship between MQ and % fat. Intuitively, it makes sense that areas with higher muscle quality will tend to have lower fat percentages.

Percent fat vs muscle quality 8-29-15

However, adding Body part as an overlay variable in the second graph reveals that the MQ and % fat profiles of different muscle areas can vary greatly.

Percent fat vs muscle quality overlay 8-29-15

Does this new data confirm what I would expect?

To answer this question, I had to start collecting data! So for the past six weeks, I have been performing three to five replicates of the Aim’s Quick Test each day. The Quick Test uses measurements of my right-side bicep, tricep, ab and quadricep muscles to estimate my overall body fat. Every week or so, I also measure other individual muscle areas. I perform all these tests first thing in the morning before eating and drinking, right after I weigh myself. The graph below summarizes the number of measurements I have taken for different areas over this period.

N reps 8-30-15

Muscle quality is a new metric to me so I don’t have any past measurements for comparison. But the patterns in the data I collected indicate that the muscles that I train regularly and heavily tend to have the highest muscle quality (MQ) scores. As expected, areas that I haven’t trained regularly with weights in recent years (e.g., calves) have lower muscle quality scores. My abs are an interesting exception. I rarely train them directly, but their MQ scores are very high, probably because most weight training exercises require the use of abdominal muscles to stabilize the movement.

The best body fat data I have comes from a January 2014 DXA scan, which assessed me at 17.5% body fat at a dieted-down weight of 127.5 lbs. My recent Quick Test measurements with the Aim have been taken at a more typical maintenance weight around 135 lbs and estimate my % fat at 18-19%. Although my current weight is not directly comparable to my weight on the day of the DXA scan, my results are in the ballpark of what I’d expect after adding in a few pounds for extra food and water in my system, a few pounds of extra body fat, and 1.75 years of training time.

I used my Skulpt data with a custom body map I created earlier this year in JMP to show mean MQ and % fat by body area (averaged over left and right sides). I reversed the color scales so the trends for each measure could be compared more easily. Like the body-part specific regression lines shown above, this graph also reflects the inverse relationship between MQ and % fat.

Mean MQ 8-30-15
Mean fat 8-30-15

Do changes over time represent important trends or random noise?

I had some questions I wanted to answer before assessing how my workouts might affect my % fat and MQ measures in the short and long term. While casual Aim users might be satisfied by taking a single measurement daily or weekly, I expected my measurements to vary around the true mean for each body part/side combination due to random and systematic variables.

Without daily access to a gold standard test like DXA, I could not verify the accuracy of the Aim’s measurements, but that has never been my intent. I am much more interested in establishing a measurement routine that generates precise measurements each day so I can make sense of daily or weekly trends in the context of my weight, eating and workout variables. The Skulpt blog mentioned an expected between-day test-retest variation of 5%. Put another way, an area measured at 20% body fat one day would be expected to measure 20% +/-1% the next day. But I predicted that I would see variables like water weight impact my daily measurements, such that my true values could differ between days, so I was more concerned with assessing replicate measurements taken on the same day. Establishing within-day precision would be key to establishing a baseline for my MQ and % fat values.

To assess within-day variation, I used Graph Builder to create a graph of the standard deviations of my MQ and % fat measurements for the four sites I measure daily. I used a light blue shaded reference range to indicate the 1% fat and 1 MQ point standard deviation that I hoped to achieve.

Measurement variability
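If you keep your replicate readings in a plain table and want to reproduce this summary outside of Graph Builder, the calculation is a short script. Here is a minimal Python/pandas sketch; the file name and column names (date, body_part, mq, pct_fat) are just placeholders for a long-format export, not the format the Skulpt app actually produces.

```python
import pandas as pd

# Hypothetical long-format table of replicate readings:
# one row per rep, with columns date, body_part, mq, pct_fat.
df = pd.read_csv("aim_readings.csv", parse_dates=["date"])

# Within-day variability: standard deviation of the replicates
# taken for each body part on each day.
within_day_sd = (
    df.groupby(["date", "body_part"])[["mq", "pct_fat"]]
      .std()
      .rename(columns={"mq": "sd_mq", "pct_fat": "sd_pct_fat"})
      .reset_index()
)

# Flag day/area combinations that miss the precision target of roughly
# 1 MQ point and 1 percentage point of body fat.
within_day_sd["meets_target"] = (
    (within_day_sd["sd_mq"] <= 1.0) & (within_day_sd["sd_pct_fat"] <= 1.0)
)

print(within_day_sd.tail())
```

Plotted over time with a shaded band at 1, these standard deviations tell essentially the same story as the Graph Builder view above.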

The variability trends I saw in my July measurements caused me to question and adjust my measurement techniques:

  • Early on, my within-day variability for my MQ and body fat scores was relatively high. I soon realized that I wasn’t following the Aim instructions to the letter. I began to spray the back of the unit before each and every rep, ensuring that the metal contacts were consistently soaked for each measurement. You can see this change begin to reduce the variability of my data around July 21.
  • Once I made the above improvement, I started to notice a new pattern. My first rep for a muscle group seemed to be different from later reps. I confirmed this suspicion by examining my raw data. I theorized that this might happen because the device was wet before rep 1, but the muscle area itself was dry until after rep 1 was complete. I began spraying each body area before I started measuring, and this further improved my measurement consistency.

At the end of July, I noticed another disturbing trend. The standard deviation of my MQ measurement for my right bicep was trending up, not down! This affected the consistency of my four-site average. After puzzling over it, I concluded that since my bicep is a relatively small and narrow muscle, slight position changes probably affect its measurement more than they do for larger muscle groups.

I decided to test my theory by experimenting with the position of the device. For five reps (group 1), I made an effort to hold the unit slightly higher on my bicep area, and then moved it to a slightly lower position for five reps (group 2). The figure below illustrates how this affected my results. Although one rep in the higher position group had an MQ score of 125 (marked with a red x), the rest of the MQ scores in the higher position group were several points lower than those in the second group. It seemed clear that I needed to choose one of these positions and stick with it to obtain the most consistent measurements for this problematic muscle area.

Device position 9-2-15
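If you want a quick check that a position shift like this is more than replicate noise, a simple two-sample comparison on the two groups of reps does the job. The MQ values below are placeholders standing in for my two groups of five reps (not the actual readings), and the sketch drops the flagged outlier before running a Welch t-test.

```python
import numpy as np
from scipy import stats

# Placeholder MQ readings for the two device positions (five reps each);
# illustrative values only, not the readings from the actual test.
higher_position = np.array([125.0, 112.0, 113.0, 111.0, 112.0])  # includes one odd rep
lower_position = np.array([117.0, 118.0, 116.0, 117.0, 118.0])

# Drop the obvious outlier from the higher-position group before comparing.
trimmed_higher = higher_position[higher_position < 120]

t_stat, p_value = stats.ttest_ind(trimmed_higher, lower_position, equal_var=False)
print(f"mean difference (lower - higher): "
      f"{lower_position.mean() - trimmed_higher.mean():.1f} MQ points")
print(f"Welch t = {t_stat:.2f}, p = {p_value:.4f}")
```

In my case the figure made the answer obvious without a formal test, but the comparison is handy when the difference is less clear-cut.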

Over subsequent days, I applied the lessons learned above and chose my bicep measurement area more consistently, reducing the SD(MQ) and SD(% fat) for biceps in my data set. At this point, I’m happy to be able to consistently measure MQ +/- 2 and % fat +/- 1% on most days for almost all areas, and the four-site overall estimate that I take daily has fallen into a predictable range.

What's next with this data?

Given my initial adventures in measurement consistency above, I knew I had more work to do with this data set. I was continuing to collect daily data, but wanted to assess it and my measurement technique more systematically. JMP has an MSA (Measurement Systems Analysis) platform designed to help assess sources of variability in a measurement system. I wanted to learn more about the platform and use it to assess my measurements so far. I already knew I had outlier measurements in my data table. What’s the best way to identify and remove them? I needed to explore my data, evaluate my outlier filtering options, apply them, and assess how their removal affected my within-day measurement consistency. I’ll share what I discovered in future posts.

Discovery Summit Europe: What's your story?

Speaker on stage at Discovery Summit Brussels

What's your JMP story? Submit an abstract.

At Discovery Summit Europe last March, I met many amazing people. You may not remember me, but I remember you. We met during delicious dinners, perky plenaries, and thought-provoking paper and poster presentations. Oh, and we bonded over a lovely glass of red wine or two, if I remember correctly.

You’re from Lufthansa, STATCON and Novozymes … GE, GSK and P&G … and you’re the brilliant young researchers at the University of Exeter.

Yes, I’m calling you out because now is the time to submit abstracts for consideration for Europe’s second Discovery Summit: 14-17 March 2016. This time, the Summit will be in Amsterdam, and it's shaping up to be another awesome event for analytic thinkers.

I know you have a good story to tell, but you need to let us know about it...now! Time is running out to submit an abstract for consideration; in fact, the call for papers and posters closes on 18 September. You’ll find all of the details at the Discovery Summit Europe webpages.

Exploring when to begin drawing Social Security benefits

How old will you be when you die? This may seem like an odd question to ask, but the US Social Security system is set up in such a way that this is a question retirees need to consider. Deciding when to begin collecting benefits depends greatly on how long retirees expect to be collecting them.

My interest in this all started when my wife and I discussed my father-in-law’s decision to wait until “full” retirement age to start collecting his Social Security benefits. As a result of that conversation, I used JMP to explore the data, and I wanted to share what I found. I'm not a financial planner, so please note that I'm not offering advice to anyone here.

My father-in-law was born in 1953, so “full” retirement age for him is 66. For those of you who are not familiar with the system, Social Security offers a sliding scale of benefits depending on the age at which you begin collecting. If full retirement age is 66 and the monthly benefit is, say, $1,000, the scale looks like the following JMP data table (I got the formula from the Social Security site):

Drawing age vs. Collection rate

This seems like a fairly straightforward calculation. If an early death is expected (because of family history or poor health, for instance), the analysis shows it makes sense to start collecting benefits at age 62. Assuming it is financially possible to wait – and a long life is expected – the table shows that it makes more sense to start collecting at age 70. Using the same scenario as above, JMP has broken down the optimum age for benefit collection by age at death.

Age at Death vs. amount collected_2

The table shows that up to the age of 76, it is in a retiree's best interest to start collecting Social Security at age 62. It also shows that retirees who live to 86 or longer should delay collecting until age 70. For retirees who die between ages 77 and 85, the optimum collection age is also summarized in the table. The thing I found most interesting about this analysis was that at no point was it best to begin collecting at the “full” retirement age of 66 – the age my father-in-law chose.
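If you'd like to recreate that breakdown without JMP, the logic fits in a short Python sketch. It applies the published early-claiming reduction (5/9 of 1% per month for the first 36 months, 5/12 of 1% per month beyond that) and the delayed-retirement credit (2/3 of 1% per month, or 8% per year, up to age 70) for a full retirement age of 66. It's only a rough stand-in for the formula I pulled from the Social Security site, and it ignores taxes, spousal benefits and the COLA.

```python
FULL_RETIREMENT_AGE = 66       # birth years 1943-1954
FULL_MONTHLY_BENEFIT = 1000.0  # example benefit at full retirement age

def monthly_benefit(drawing_age):
    """Approximate monthly benefit for a whole-year drawing age from 62 to 70."""
    months_early = max(0, (FULL_RETIREMENT_AGE - drawing_age) * 12)
    months_late = max(0, (drawing_age - FULL_RETIREMENT_AGE) * 12)
    # Early-claiming reduction: 5/9 of 1% per month for the first 36 months,
    # then 5/12 of 1% per month for each additional month.
    reduction = (min(months_early, 36) * (5 / 9) / 100
                 + max(0, months_early - 36) * (5 / 12) / 100)
    # Delayed-retirement credit: 2/3 of 1% per month (8% per year) up to age 70.
    credit = months_late * (2 / 3) / 100
    return FULL_MONTHLY_BENEFIT * (1 - reduction + credit)

def total_collected(drawing_age, age_at_death):
    """Total benefits collected from the drawing age until death (no investing)."""
    months = max(0, (age_at_death - drawing_age) * 12)
    return monthly_benefit(drawing_age) * months

# Optimal drawing age for each age at death.
for age_at_death in range(70, 96):
    best = max(range(62, 71), key=lambda age: total_collected(age, age_at_death))
    print(age_at_death, best, round(total_collected(best, age_at_death)))
```

Running it reproduces the same overall pattern as the table: 62 wins for deaths through 76, 70 wins from 86 on, the intermediate ages fill the window in between, and 66 never comes out on top.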

That calculation is pretty straightforward. But it got me thinking: if a retiree is able to wait until 70 to start collecting benefits, then there is no immediate need for the money. That means this is really not an apples-to-apples comparison, because a retiree who doesn't need the money right away can invest the benefits instead of spending them. If a retiree starts drawing benefits at 62 but invests the money, the graph looks different (this assumes a 5% interest rate, compounded monthly).

Money Accrued vs. Drawing age

If retirees are able to invest the money and do not live longer than 81, it is in their best interest to start drawing Social Security at age 62. However, if they live to 86 or more, the graph above shows that they should delay drawing Social Security until 70. The graph also shows the window where another age besides 62 or 70 is ideal. (The line peaks at the optimal timing for each age in this window.) The window shrinks when retirees invest.
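For reference, the mechanics behind the invested scenario are just the future value of a stream of monthly checks. With a monthly benefit P, an annual rate r compounded monthly, and n checks received by the age at death, the money accrued is approximately

$$\text{FV} = P \cdot \frac{\left(1 + r/12\right)^{n} - 1}{r/12}$$

evaluated for each candidate drawing age (which changes both P and n) and compared at the same age at death. Exactly where the break-even ages fall depends on details such as when each check arrives and what return, taxes and COLA you assume, which is why the window moves around with those assumptions.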

Changing the interest rate did not have a huge impact on the result: any rate between 0 and 8.75% just changes the size of that window, and the higher the rate (up to 8.75%), the smaller the window. Here, too, in no scenario was it best to begin collecting at the “full” retirement age of 66.

There is one more factor to consider: the cost-of-living adjustment, or COLA. The COLA is in place to ensure the purchasing power of Social Security income is not eroded by inflation. Over the past 10 years, it has averaged 2.6%. Factoring a 2.6% COLA into the equation does not change the results dramatically. If we add the COLA to the investing scenario above, it is in retirees' best interest to start drawing at 62 if they do not live longer than 80, and to delay drawing until age 70 if they live to 84 or more.

In summary, I found that in most cases, the best ages to begin collecting Social Security benefits are either 62 or 70. For those who expect a long life into their mid-80s, the best bet is delaying until age 70 to begin benefit collection.

Beyond Spreadsheets: Amy Clayman, Voice Systems Engineering

“When building a predictive model, we find the JMP Pro interfaces to be very intuitive, allowing us to work closely with other JMP Pro users to build the model together.”

-- Amy Clayman, Data-Driven Decisions Circle, VSE

Beyond Spreadsheets is a blog series that highlights how JMP customers are augmenting their tools and processes for exploratory data analysis to make breakthrough discoveries. We talked with JMP users to learn more about how and why their organizations bring Excel data into JMP to create graphics and explore what-if scenarios.

Our fifth and final interview in the series is with Amy Clayman, Consultant at VSE and owner of Financial Fluency LLC. She has more than 20 years of experience in corporate finance and public accounting and has been featured in CFO magazine. She is also the president of the Jewish Community Relations Council of Southern New Jersey.

Fast Facts

  • JMP user since 2014
  • Favorite JMP features: Graph Builder, Profiler and modeling capabilities.
  • Proudest professional moment: We built a model in JMP Pro that accurately predicts customer behavior more than 85 percent of the time. We have integrated this algorithm into our data warehouse to share with other teams in our organization.

How long have you been a JMP user?

We selected JMP Pro a year ago. We had been exploring various predictive analysis software solutions for well over a year before selecting JMP Pro. We tested and used an Excel data mining add-in and various cloud-based solutions before we made our final decision.

Tell us a little bit about the function of your department and how it contributes to your organization’s mission.

Our department is referred to as the “Data-Driven Decisions Circle.” We are responsible for helping VSE use financial, operational and external data to guide the organization’s decision-making process. The group is dedicated to helping the company improve its profitability and ROI. For example, we are focused on the following initiatives:

  • Understanding and enhancing the entire customer experience so that we can improve interactions and increase spending levels.
  • Determining which media sources attract the most valuable customers and generate the best ROI.
  • Discovering which programs, campaigns or system enhancements help improve VSE’s ROI.

What do you like most about the type of work you do? 

I started at VSE over 10 years ago, initially to support its accounting and reporting needs; later, I became involved in its data analytics project. I love working with the people! VSE has an amazingly talented and compassionate staff. They challenge you to do your best work, and they like to have fun.

Whether it’s people from the marketing, creative services, service delivery, technology or finance teams, every member of the organization wants to better understand what is happening and how we can improve the customer experience in a fiscally responsible manner. We often get inundated with questions that usually start with “Why did this happen?”, “What is the impact to the customer?”, and “How will this affect future revenues?” Sometimes I feel like the pathologist in a Law & Order episode. The company expects our group, in an unbiased manner, to dissect the event or problem and provide an explanation that will better enable the company to create a path to success.

What is a professional accomplishment of which you are most proud?

We built a model in JMP Pro to predict the annual spending category of a new member based on their behavior in their first seven days on the service. The model accurately predicts the correct category more than 85 percent of the time. We have integrated this algorithm into our data warehouse so these predictions are easily accessible to the media team.

This model allows us to help our highly skilled media team quickly understand the revenue opportunities of the member’s media choices. Historically, the team may have chosen to wait several weeks or months before pausing or extending a program. Now we can provide them with additional intelligence about the likely outcome to complement their decision-making process.

The magic in this process is centered on identifying and building the relevant data so that the algorithm tells the business user what is likely to happen with a high degree of confidence.

What do you like most about using JMP Pro?

My group is responsible for educating and communicating to the key business users. We need to do this in a concise, thorough and organized fashion. The data visualization tools are an excellent starting point and allow us to communicate the trends quickly. Using features such as Graph Builder and Profiler allows us to tell the story – fast.

As our team better understands the data using the visualization tools, we then look to identify patterns or relationships. In JMP, we use the modeling features to help predict potential outcomes and identify which attributes have the strongest correlation to the predicted outcome. Communicating these patterns and relationships helps our key business users create an improved customer experience.

VSE is a highly collaborative environment. When building a predictive model, we find the JMP Pro interfaces to be very intuitive, allowing us to work closely with other JMP users to build the model together.

Have you used spreadsheet programs in the past to conduct your statistical analysis? If so, can you describe the pros and cons?

Yes, we have used several other programs.

From my perspective, the pros of Microsoft Excel:

  • Most finance professionals are comfortable with Excel, so the environment is familiar.
  • It has some data visualization features that can be easily manipulated.
  • It is inexpensive.

And the cons of Microsoft Excel:

  • Data visualization is very limited when compared to JMP Pro.
  • Selection of data modeling techniques is limited.
  • Ability to compare model results is limited.
  • Ability to clean and prep the data for modeling is limited.
  • There are some latency issues.
  • We have limited access to trained experts who exclusively support this product.

JMP allows us to more effectively understand, present and predict potential outcomes.

The most important advantage to selecting JMP Pro over the other spreadsheet tools is the access to JMP’s exceptional technical staff. Our technical resource representative guides us on how to best use the software and is constantly educating us on the best approaches to get the most out of the tool.

It is truly the combination of JMP Pro and the people at JMP that has helped us advance our mission to have data drive our decision-making process. We believe in our staff’s instincts, but we have an obligation to provide them with the most relevant information in the most intelligent fashion to help them lead our organization.

What advice or best practices would you give to other companies that are currently relying on spreadsheet tools to conduct statistical analyses?

Don’t be afraid of or overwhelmed by all of the functionality of JMP. We continue to migrate in stages. Your ability to grow as a professional in this field will be limited if you choose to only use a spreadsheet tool. The predictive analytics field is constantly evolving, and the tools and professionals you interact with will determine how effective you can be in this role. Do not sell yourself or your company short by using less sophisticated tools to address this need.

Want to learn how to uncover information you might miss from using spreadsheets alone? Watch the new webcast case study, Going Beyond Spreadsheet Analytics With Visual Data Discovery, to see how a sports equipment and apparel manufacturer digs deep into the data to improve a supply chain process that was not working.

New book to spark enthusiasm for descriptive statistics and probability in the classroom

Whether you teach introductory statistics courses in engineering, economics or natural sciences, or master's courses on applied statistics or probability theory, you’ll want to consider using a new book: Statistics with JMP: Graphs, Descriptive Statistics and Probability by Peter Goos and David Meintrup. Unlike other comparable books, it covers all levels of mathematical depth, statistical concepts and real-life applications. What sets the book apart is that it clearly shows mathematical derivations and presents a step-by-step guide to making calculations and graphs.

The origin of the book is a series of lectures on descriptive statistics and probability presented in Dutch by Peter Goos at the Faculty of Applied Economics of the University of Antwerp in Belgium.

Goos (who is also with the University of Leuven in Belgium) migrated the course demos, exercises and exam from Excel to JMP and teamed up with David Meintrup from University of Applied Sciences, Ingolstadt/Germany, to thoroughly revise, extend and translate the content into English. Goos and Meintrup are both passionate educators and longtime JMP users, so they were a dream team to work on this book.

The pair’s motivation to write this book was twofold: As expressed in their preface, they did not want to “sweep technicalities and mathematical derivations under the carpet.” For the sake of deepening the students’ understanding of statistical concepts, they showed all mathematical derivations in detail throughout the book.

Their second impetus was to “ensure that the concepts introduced in the book can be successfully put into practice.” Step-by-step instructions and numerous screenshots show how to generate graphs, calculate descriptive statistics and compute probabilities in JMP 12. They chose JMP “because it is powerful, yet easy to use.”

To illustrate the methods and to emphasize their usefulness, the book contains many examples involving real-life data from various application fields, including business, economics, sports, engineering and natural sciences. All data sets are available with stored scripts to easily reproduce figures, tables and analyses. The data files are wrapped by a JMP Journal and packaged as a JMP add-in (except two larger data sets that are available separately), making them ready to use in the classroom. This add-in is available as a resource from the JMP Academic Community, or with additional supporting material from the Wiley book companion website.

With the purchase of this book, you receive a 12-month license for JMP Student Edition. The software is directly available for download and can be activated using the code found in each hard copy; an electronic copy is also available upon request.

But wait, there’s more! A companion book, “Statistics with JMP: Hypothesis Tests, ANOVA and Regression,” which follows the same approach, is planned for early 2016.

Book details: ISBN 978-1-119-03570-1, hardcover, 368 pages, April 2015. Also available as an e-book on Amazon, Apple iBooks and Google Play. Visit the Wiley book page for a book index, a sample chapter or an evaluation copy.

Top 5 Discovery Summit paper picks by SEs

Some mornings over coffee or tea, my husband asks me what I’ll be doing that day. I like this exercise because it helps me get mentally prepared for the various meetings and projects that await me in the office. Usually, I only make it to my 10:30 appointment before he bores of my calendar entries and moves on to the next topic. That either says a lot about my meetings or about his inability to focus before a second cup of coffee….I opt to believe it’s him and not my calendar.

But this morning was different. I told him I’d be spending my day working on Discovery Summit 2015, answering questions from Steering Committee members, telling our R&D leaders about this year’s conference facility, and working with our Systems Engineers (SE) to make sure there was at least one at every paper presentation. Bingo – I had Greg’s full attention.

Here’s why: Every year, we ask an SE to hang out in each of the rooms where papers are presented to ensure that any hiccups are addressed quickly. SEs are the natural choice for this assignment because they are always brilliant and always curious. This is a win-win for all: Paper presentations run smoothly, and the SE is guaranteed a seat in the room.

I start the process by asking which papers they’d most like to see, and then make room assignments by preference. One of the best things about this process is that I learn which papers pique the curiosity of our SEs – remember, they’re brilliant and curious people. My husband – always curious and, at times, brilliant – wanted to know which papers were most requested. While each and every paper had at least one SE request, these five were the most popular:

  1. Developing a Nondestructive Test Gauge Using the JMP Discriminant Analysis Platform
    By Jason Robbins, Process Engineer, US Synthetic
  2. An Interactive JMP Environment for Control Group Selection and Visualizing Impact From Promotional Uplift Experiments
    By Brian Eastwood, Market Analyst, Nu Skin Enterprises; John Salmon, PhD, Assistant Professor, Brigham Young University
  3. Transforming Consumer Surveys into Consumer Insight Using JMP 12
    By Mike Creed, Consumer Modeling Expert, Procter & Gamble; Diane Farris, Consumer Modeling Leader, Procter & Gamble
  4. Bias Adjustment in Data Mining
    By Stanley Young, PhD, CEO, CGStat; Bob Obenchain, PhD, Principal Consultant, Risk Benefit Statistics
  5. Truth and Lies: Consumer Perception vs. Data
    By Rosario Murguia, Consumer and Product Research Manager, Procter & Gamble; Diana Ballard, Senior Consulting Statistician, Predictum Inc.; Michael E. Haslam, PhD, Vice President of Application Development, Predictum Inc.

Now that you’re thinking about a few of the awesome breakout options you’ll have at Discovery Summit this year, you can begin planning which papers you want to see. And you’ll know that each and every talk will run smoothly because there will always be an SE on call, usually standing in the back of the room, ready to jump in at a moment’s notice.

Beyond Spreadsheets: Mary Ann Shifflet, University of Southern Indiana

“I love being able to tell my students that if they learn JMP, it will be a skill they can put on their resume that will set them apart from other applicants.”

-- Mary Ann Shifflet, Professor, Romain College of Business, University of Southern Indiana

Beyond Spreadsheets is a blog series that highlights how JMP customers are augmenting their tools and processes for exploratory data analysis to make breakthrough discoveries. We are featuring Q&As with JMP users to learn more about how and why their organizations bring Excel data into JMP to create graphics and explore what-if scenarios.

Our fourth interview in the series is with Mary Ann Shifflet, statistics professor and JMP advocate.

Fast Facts

  • JMP user since 2012
  • Favorite JMP features: Graph Builder, Distribution, Fit Model
  • Proudest professional moment: Receiving the Dean’s Award for Teaching Excellence in 2012 for redesigning an Elementary Statistics course to incorporate JMP.

What do you like most about the type of work you do? 

There are two things that I really enjoy about teaching statistics to college students. One is seeing the lightbulb come on for someone who has been trying to understand a difficult concept. The other is a little more subtle; it’s knowing that I am teaching them one of the most important elements of their course work in the College of Business. All business decisions require the use of data, so laying that foundation is critical. If you look at the skills required for a career in business, many of them have to do with data analysis and problem solving – the skills taught in my course.

Why do you use JMP in your teaching? 

I use JMP in my teaching because I wanted students to be able to do real data analysis when they leave my class – or at the very least, be able to summarize and interpret data. I wanted to give them a tool that would allow them to develop a little bit of confidence in using data for business decisions. I had used other programs in my professional practice, but JMP was the one that we selected.

Since incorporating JMP in the class, I am able to teach some relatively sophisticated regression modeling and diagnostics – sophisticated for a sophomore-level course, at least. This modeling would not be possible if we were doing the calculations by hand with scientific calculators or even if we were using Excel.

In what ways have you used Excel for teaching statistical analysis? How is using JMP different?

We, of course, use it [Excel] for data input and storage, but I have not used it much for analysis. I made the transition from using TI-84 calculators to JMP for analysis.

I find JMP much easier to use than Excel and easier to incorporate in the class.

There were a number of things to consider as we made this transition, not the least of which was availability. Any student who has a computer probably has Excel on it. Students didn’t have JMP available, which is why we felt it was essential to give them free access to it. We were able to do that through a campus-wide academic license.

I find JMP makes it very easy for me to teach students the steps to use in acquiring the necessary output. The ease of use, along with the support features available with JMP, make it ideal for my purposes.

How have students reacted to using JMP? 

The initial feedback from students in the first experimental section was very positive, leading us to transition all of my sections to the JMP model more quickly than we had originally planned. As time has passed, I have received mostly positive feedback. I have had students come back after leaving the class and tell me they were able to use JMP for projects in other classes and even in some team competitions. Several students were able to obtain very good internships due to their experience with JMP in the classroom.

Is there anything else you would like to mention? 

Since JMP is such an easy tool to use, I have the opportunity to focus much more on the business decision aspects of data analysis. As a result, students leave the class truly able to use data to support their business decisions.

I hear many people say that students need to be able to use Excel for data analysis since that may be their only data analysis option in many companies. While that may be true, I love being able to tell my students that if they learn JMP, it will be a skill they can put on their resume that will set them apart from other applicants. Knowing a software program like JMP communicates something different – and in my view, better – than simply knowing how to use Excel to analyze data.

Ready to go beyond spreadsheets? Visit www.jmp.com/beyond and learn how to augment your existing processes for exploratory data analysis to uncover information you might miss from using spreadsheets alone.

Beyond Spreadsheets: Bruce McCullough, Drexel University

“I want my students focusing on statistical methods, not on software.”
-- Bruce McCullough, Professor, LeBow College of Business, Drexel University

Beyond Spreadsheets is a blog series that highlights how JMP customers are augmenting their tools and processes for exploratory data analysis to make breakthrough discoveries. We talk with JMP users to understand how and why their organizations bring Excel data into JMP. Our third interview in this series is with Professor Bruce McCullough, who teaches statistics at Drexel University.

Fast Facts

  • JMP user since 2011
  • Favorite JMP features: Its superior visualization capabilities
  • Proudest professional moment: Leading the American Economic Review to adopt a mandatory data-code archive for its articles

In which courses do you and your students use JMP?
I teach statistics to undergrads, MBAs, master’s degree students in business analytics, and PhD students. I use a variety of statistical packages in my own work. I use JMP when I teach undergrads and MBAs because it’s powerful and easy to learn. I have been using JMP in my teaching for about four years.

What do you like most about the type of work you do?
Every day is different, and if a day is boring, it’s my fault. I have written on the accuracy of statistical software, the replicability of published research and other topics. Lately, I have refocused my efforts toward data mining and predictive analytics.

What is a professional accomplishment of which you are most proud?
I co-authored the article that led the American Economic Review to adopt a mandatory data-code archive for the articles it publishes. Prior to this, research in economics journals was largely not replicable. After the American Economic Review took its action, all the other top economics journals followed suit.

Why do you use JMP in your teaching?
I want my students focusing on statistical methods, not on software. I need software that is easy to use, accurate, and available for PCs and Macs. Only a very few packages meet these requirements. I gave up on Minitab when Macs got popular because it was available only for PCs. I tried R with a GUI for a couple years, but the students just had too many problems with it. So I tried JMP, and found that it met all my needs. One principal advantage of JMP over the other packages that I considered is its superior visualization capabilities, which are either nonexistent or harder to use in the other packages.

How have students reacted to using JMP?
Students find JMP easy and intuitive. Of course, it was designed from the ground up for the sole purpose of doing statistics. All the statistical functionality of Excel is merely a kludge grafted onto a spreadsheet. Remember, spreadsheets are for accountants to balance numbers, not for doing statistics.

Is there anything else you would like to mention?
Many years ago I was on a panel with Jon Cryer, a statistician from the University of Iowa, and he showed a picture of a man holding a handsaw standing next to a giant redwood tree, captioned thusly: “Get the right tool for the right job! Friends don't let friends use Excel for statistics!”

Ready to go beyond spreadsheets? Visit http://www.jmp.com/beyond and learn how to gather and prepare data for analysis from multiple sources.

Beyond Spreadsheets: Tom Treynor, Zymergen

“The software you use not only shapes what you learn from your data; it shapes the questions you ask!”

-- Tom Treynor, Director, Zymergen

The Beyond Spreadsheets blog series shows how JMP customers are augmenting their tools and processes for exploratory data analysis to make breakthrough discoveries. The series features Q&As with JMP users to learn more about how and why their organizations bring Excel data into JMP to create graphics and explore what-if scenarios. This week's Q&A is with Tom Treynor, Director at Zymergen, and an experienced scientist, chemist and quality engineer.

Fast Facts

  • JMP user since 2011
  • Favorite JMP feature: JMP Profiler
  • Proudest professional moment: "It was pretty cool the first time we reduced the unexplained variation in a fermentation process from over 10% to under 2%, yet it gets cooler every time we do it!"

Tell us a little bit about the function of your department and how it contributes to your organization’s mission.

Zymergen is harnessing the power of biology to make chemical products that are good for business, people and the environment. Every quarter my department, Test Operations, is making hundreds of thousands of measurements and distilling them into a handful of critical decisions about which engineered microbes should deliver the best process economics at industrial fermentation scales. However, the high throughput and low volumetric scales with which we operate at Zymergen pose significant false positive and false negative risks, respectively, so we also distill our data into critical decisions about how to improve the processes we use to improve the microbes.

What do you like most about the type of work you do?

Every week at work, I learn new things about biology, chemistry and physics, because every week my team is deriving new mechanistic insights through testing and refining our statistical models. Even now, when there are hundreds of things already tracked in our databases, we are still innovating better ways to measure them and discovering new things worth measuring.

What do you like most about using JMP?

The folks who built JMP recognize that any problem worth solving has both multiple factors and multiple competing responses (e.g., cost, quality, speed). Although JMP has many capabilities that derive from this insight (for example, the ability to decorate columns with information such as Specification Limits), they all come together so nicely in JMP’s Profiler platform. Another of JMP’s greatest capabilities is the way its user interface has been so well designed for training scientists and engineers to become applied statisticians themselves.

What is a professional accomplishment of which you are most proud?

It was pretty cool the first time we reduced the unexplained variation in a fermentation process from over 10% to under 2%, yet it gets cooler every time we do it! Although it is theoretically possible to achieve the same increase in decision quality by buying and operating more than 25 times as many bioreactors, I think it is more satisfying to increase the productivity of my team by an order of magnitude than to grow it by that amount. I can't wait to tackle some of the most challenging fermentation processes in my industry!

How is using JMP different from using spreadsheets to conduct statistical analysis?

The greatest power of spreadsheet programs is also their greatest weakness: They let you put anything anywhere. As my colleagues make the transition to JMP, they eventually realize the flexibility they have lost is not a bug, but a feature of their new statistical software. For example, no longer do they need to copy and paste their equations to other cells in their spreadsheet, since a JMP formula is applied automatically to every row in its column. Although it takes some practice to write formulae that explicitly incorporate all the conditional statements they had previously implied by selectively aggregating or copying and pasting in a spreadsheet, developing this skill makes it so much easier for them to read the many, many stories that each measurement has to tell.

What advice or best practices would you give to other companies that are currently relying on spreadsheet tools to conduct statistical analyses?

The software you use not only shapes what you learn from your data; it shapes the questions you ask! For example, we all learned how to make histograms in grade school, and yet so few of our colleagues have ever made a histogram in a spreadsheet. Although it can be done, and it doesn’t even take long once you know how to do it, making histograms in a spreadsheet is simply not as easy as making a scatterplot. As a result, our colleagues almost always default to making scatterplots without even considering other kinds of analysis. In fact, I bet few of you reading this even thought to make a histogram in your workplace until you started using JMP – and now you make histograms all the time. Why? Because with JMP you can make countless histograms, visualize outliers, test for normality and fit each distribution in under 10 seconds. JMP makes it so easy to ask the right questions and to perform the right analyses once you know how to use it.

Ready to go beyond spreadsheets, too? Visit www.jmp.com/beyond to gain key insights on visualization, including how to determine what graphs should be used in certain situations for “what-if” analysis.

Chocolate smackdown: The final analysis

Recently, my colleague Ryan Lekivetz wrote about our trip to Discovery Summit Europe in Brussels and our plan to test whether Belgian chocolate was really better-tasting than US chocolate. Ryan has blogged in detail about the constraints of designing the study, as well as the factors involved. In this blog entry, I’ll describe how we collected the data and analyzed the results.

Sensory analysis is a big topic with a well-developed literature (MacFie & Thomson, 1994, and Næs, Brockhoff & Tomic, 2010, are good references to learn about the topic). We just wanted to answer one question: In a blind taste test, would people prefer the Belgian chocolate we’d carried across the Atlantic Ocean – or the US chocolate we could purchase at the corner grocery? Given the dollar/euro exchange rate and the limited space in my suitcase, this was an important question to answer before Discovery Summit Europe 2016!

Ryan’s designed experiment not only balanced on our two hypothesized effects (origin and cacao content), but it also randomized the question order. Study participants get tired, especially in taste tests, so we needed to make sure people were seeing their chocolates in the random orders assigned without giving away which chocolates they were tasting. I was also tasked with finding a US chocolate that was of comparable quality to our Belgian chocolate. More on that later.

Clean data collection requires preparation.

I therefore spent an evening at my kitchen table marking paper plates with choice set numbers, Survey IDs and a “Start Here” arrow (see above). Each of the survey subjects would get a plate with five side-by-side pairs of chocolates. There are 10 pieces of chocolate on each plate, and 25 plates in all (250 pieces total).

Sheila Loring, industriously sorting chocolates.

Each side-by-side pair represents a choice set in the experiment, and each subject had to choose which chocolate of each pair (choice set) he or she preferred.

Setting up those 250 pieces of chocolate was not an easy task. Many thanks to our documentation specialist Sheila Loring and JMP tester Bryan Shepherd for their assistance. (Strangely, they wouldn’t accept extra chocolate as a thank you.)

Once we had the paper plates loaded, it was time to run the surveys. We did our surveys in two groups so we didn’t have to wrangle 25 participants at once; at the beginning of each session, we made sure the participants knew how to sample their chocolate and mark their plates.

If you try this at home, I strongly recommend running the surveys just after lunch. It’s hard to make your test subjects listen to orientation when there’s delicious chocolate sitting right in front of them.

Two of our test subjects, pondering the mysteries of chocolate.

Once we had the data collected, I entered it into JMP. My first job was to make sure we’d collected the data correctly, and for that I used the Categorical platform. Ryan’s design was balanced, but we’d also collected respondent information (chocolate preference and JMP division) as part of our survey information. I also wanted to make sure my data entry was correct.

Note: This was run on my choice response data, which has 10 rows per participant, so there are actually five people who don’t prefer dark chocolate, not 50.

Figure 1

I did find some data entry errors using Categorical. Sometime during the data entry process, I got tired of typing “Development” over and over and started typing “Dev” instead. I also had to change how I measured chocolate preference because instead of checking “Milk” or “Dark,” two of my survey respondents got industrious and wrote in “Neither.” Figure 1 shows the data after I used Cols->Utilities->Recode to clean it up.

Once my data was clean, it was time to use the Choice platform. Analyze->Consumer Research->Choice took me to the dialog, where I was able to specify how the data were collected and enter the effects.

Figure 2

In Figure 2, you can see that the Grouping columns are: Participant ID, Survey ID, and Choice Set ID. This tells JMP which column IDs uniquely identify a single choice set for a given participant. See the Choice documentation for more information on how to specify grouping columns for a single-table survey.

Figure 3

Once I had my grouping columns identified, it was time to specify the effects. Figure 3 shows the effects I first entered into the model. Notice that Chocolate Preference and JMP Division only appear as interaction effects. Choice models always compare the Choice a subject made to all of the choices he/she was shown in a given choice set, which means you can’t estimate a main effect (fixed) for something that is subject-specific (like whether a person prefers dark chocolate). You can only estimate how it affects subjects’ preferences for product-level factors (like percent cacao in a piece of plain chocolate).
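A tiny numeric sketch (hand-rolled in Python, not the Choice platform’s code) shows why: in a multinomial logit, the probability of picking an alternative depends only on utility differences within the choice set, so anything that is constant across the alternatives a subject sees cancels out unless it is interacted with a product attribute. The utility values below are purely illustrative.

```python
import numpy as np

def choice_probs(utilities):
    """Multinomial logit probabilities for the alternatives in one choice set."""
    u = np.asarray(utilities, dtype=float)
    expu = np.exp(u - u.max())          # subtract the max for numerical stability
    return expu / expu.sum()

# Utilities for the two chocolates in a choice set (illustrative values).
pair = np.array([0.2, 0.9])

print(choice_probs(pair))               # approximately [0.33, 0.67]

# Adding a subject-specific constant (say, +1.5 for a dark-chocolate lover)
# to BOTH alternatives leaves the choice probabilities unchanged ...
print(choice_probs(pair + 1.5))         # same probabilities as above

# ... which is why subject traits can only enter the model through
# interactions with product attributes such as cacao content or origin.
```

That cancellation is exactly why Chocolate Preference and JMP Division enter the model only through their interactions with origin and cacao content.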

Figure 4

For this analysis, I was really glad the Choice model dialog stays open even after you click the “Run” button. Using the AIC (lower is better) and the Likelihood Ratio tests, I was able to whittle my model effects down fairly quickly. Figure 4 shows my final model. You can see that origin isn’t statistically significant at the 95% level. I left it in since it was our variable of interest, and I wanted to discuss it in context with the other effects, but I’m pretty sure I had my answer by Figure 4.

Figure 5 illustrates the results using the Probability Profiler:

Figure 5

In this case, my comparison group is the Belgian chocolate with 41% cacao, and I make sure the subject effects are consistent (so dark-chocolate lovers are compared to other dark-chocolate lovers).

As you might have guessed, dark-chocolate lovers prefer dark chocolate (you can see the line go up, and then up again, as cacao content increases). They also appear to have a slight preference for Belgian chocolates over US ones. When I move the preference toggles for both the base and comparison products to “Not Dark,” the slight preference for Belgian chocolate remains, but you can see a sharp drop in “Probability of Purchase” between 41% and 60% cacao (Figure 6).

Figure 6

In fact, the Probability Profiler shows that if you presented our five test subjects who did not express a preference for dark chocolate with two pieces of chocolate that differed only in cacao content (one 41% cacao and the other 60% cacao), our model predicts only a 1% chance that they would choose the dark chocolate. Percentage of cacao is the most important factor in predicting which one people will prefer.

So the winner is...

If we compare US vs. Belgian, keeping the cacao content and chocolate preferences constant, we see that there’s a much smaller difference in the predicted probabilities. If we had two pieces of chocolate with the same cacao content, one from the US and one from Belgium, and asked someone to choose between them, the model predicts a 39% probability that they would choose the US chocolate and a 61% probability that they would choose the Belgian chocolate. One could argue there's a slight preference for Belgian chocolate.

I did try some other variations, including an analysis without the five milk-chocolate lovers. Origin was statistically significant in some of the competing models, but the effect stayed the same: 39% probability US vs. 61% Belgium. The data is available on the JMP User Community if you want to try some analyses of your own.

Now, a study of just 25 participants is pretty small, and if I were working with a customer, I’d caution against making a huge decision based on my analysis. However, when it comes to deciding whether I’m going to stuff my suitcase with heavy, melty, space-consuming chocolate on my next trip to Europe, I think I have my answer.

  • The most important thing is to make sure I bring milk chocolate for my friends who like milk chocolate and dark chocolate for my friends who like dark chocolate.
  • Controlling for that factor, Belgian chocolate might be slightly better (though I can't prove it's significantly better), but the effect is small.
  • There’s at least one chocolate store in Raleigh, North Carolina, that did great in our blind taste test -- and it has ice cream as well!

References

  • MacFie, H. & Thomson, D. (Eds.) (1994) Measurement of Food Preferences. London: Blackie.
  • Næs, T., Brockhoff, P.B. & Tomic, O. (2010) Statistics for Sensory and Consumer Science. London: John Wiley & Sons.