Where are they now? How blog posts fare over time

As I approach the one-year anniversary of my first post for this blog, I’ve grown interested in how the popular JMP Blog posts of the past have fared in the years since. In other words, do the popular posts continue to be popular?

A good starting point for an initial investigation is the list of top 10 JMP Blog posts of 2011. I looked at the time period from just after that list was published to just a few weeks ago: from Jan. 26, 2012, to Oct. 31, 2014. The data available to me comes from Google Analytics. If you pay attention to the view counts, you might notice a discrepancy between the Google Analytics views and the total views shown on the blog entries themselves – that topic is beyond the scope of this post (it appears to be a common problem when comparing the reports of different web analytics tools, and there are many possible reasons for it).

Changing Ranks
If I were to rank the top 10 JMP Blog posts of 2011 now, the list would look a bit different (change in rank in parentheses):

  1. Saving Graphs, Tables, and Reports in JMP (2010) (+1)
  2. How to Make Tornado Charts in JMP (2008) (+2)
  3. What Good Are Error Bars? (2008) (-2)
  4. JMP into R! (2010) (+2)
  5. Principal Variance Components Analysis (2010) (0)
  6. Keyboard Tips CTRL/Click, ALT/Click, CTRL/ALT/Click (2010) (+1)
  7. Solar Panel Output Versus Temperature (2009) (+1)
  8. The Best Karts in Mario Kart Wii: A Mother’s Day Story (2008) (-5)
  9. Set the Right Selling Price for Christmas Cookies (2008) (+1)
  10. How to Make Practical Sense of Data and Win a Book (2011) (-1)

If you prefer to see this visually, below is a slopegraph created in Graph Builder comparing the 2011 rankings with the new rankings based on view counts since the original list was published.

The drop in rank of some posts isn't so surprising. It may be that Mario Kart Wii isn't as popular a game as it once was. And it makes sense that a blog post about a book giveaway (How to Make Practical Sense of Data and Win a Book) holds much less interest now that the contest has long ended. Why do you think particular posts rose or fell in the ranks?

Posts With Staying Power
The change in ranking tells only part of the story. Fortunately, I also have the weekly view counts available to me. Below is the cumulative view count of each of the blog entries over time. I've highlighted the top three in a different color and labeled them.

It was interesting to me to see that the technical posts, and particularly the “how to” articles, seem to stand the test of time quite well. The "Saving Graphs, Tables, and Reports in JMP" post is the most noteworthy to me, as even in 2014 it remains among the most popular posts for the year.

A natural next question is: What were the most-viewed blog posts during that time frame, ignoring the original list?

Here's what I found:

  1. Saving Graphs, Tables, and Reports in JMP (2010)
  2. Graphical output options using JMP (2011)
  3. Image analysis of an elephant's foot in JMP (2013)
  4. Introducing definitive screening designs (2012)
  5. “The desktop is dead” and other myths (2013)
  6. George Box: A remembrance (2013)
  7. US ZIP code map files in JMP (2012)
  8. Train, Validate and Test for Data Mining in JMP (2010)
  9. Ordering categorical graph elements (2011)
  10. JSL tip: finding unique values in a list (2011)

Are any of your favorite JMP Blog posts missing from this list? Leave me a comment below and let me know your favorites. Thanks for reading!

Post a Comment

What I love about Discovery Summit

John Sall shows the latest version of JMP at Discovery Summit 2014.

A high point of Discovery Summit is John Sall's tour of the latest version of JMP. At the conference in Brussels in March, Sall will show JMP 12.

I attended my first Discovery Summit conference in 2009. Held at the beautiful Swissôtel in Chicago, it included a keynote speech by Malcolm Gladwell (author of Outliers) and an evening of reminiscing by none other than George E.P. Box!

Gladwell gave a memorable speech in which he discussed, among other topics, the problem of having too much information. When Enron was questioned about its accounting practices, the company overwhelmed us with information – so much so that almost no one could wade through all the data and documents. This is the data challenge all of us face now, Gladwell said.

Everything about that conference was first class – the views from the top-floor reception, the food, the speakers and the participants. I’d been to a lot of conferences (maybe too many), but this was special. Even after three days of talks, I didn’t want the conference to end.

With a bar set that high in 2009, I thought JMP could not possibly host a conference like that again. But I was wrong, and I’ve been wrong five times – the number of Discovery Summits held since then. Here’s why Discovery Summit is always the highlight of my conference year.

Keynote speeches
The keynote speeches continue to impress. In 2010, Dan Ariely, author of The Upside of Irrationality, told us how people make seemingly irrational choices; this happens almost by default because we often have too many choices, and some organizations take advantage of this fact. He shared the example of The Economist magazine’s subscription pricing. The magazine offered online subscriptions for $59, print subscriptions for $125, and print plus online for $125. Most people selected the print-and-online option. No one (rationally) chose print only. However, when the print-only option was removed, the percentage of people opting for the $59 online subscription jumped from 16 to 68 percent, while the percentage who chose the $125 print-and-online combo dropped from 84 to 32 percent.

In 2013, the San Antonio conference featured Nate Silver, who gave a talk based on his book The Signal and the Noise. I recall he said that sometimes the best answer is “I don’t know,” but it’s hard for experts to say that. This made me think of John Tukey’s Type III error – finding the right answer to the wrong question.

Informal discussions
One of the best parts of Discovery Summit is the opportunity to network and exchange ideas with an amazing collection of thought leaders, researchers and authors. That’s what I want when I commit to spending three days away from home, and no other conference so consistently delivers it.

I’ve made new friends, reconnected with old friends and gotten ideas for new projects from every Summit I’ve attended.

Access to JMP staff
The very people who create JMP give presentations at the conference and host “Meet the Developers” sessions – where you can ask your specific JMP questions, tell them what new features you’d like to see in the software and watch a live demo to learn something totally new. The chance to see the latest wizardry from the JMP developers is worth the price of admission by itself. Watching John Sall analyze entire data sets seemingly without effort and seeing what’s new in the latest version of JMP are always high points as well.

All in all, it’s a conference that’s hard to beat for anyone who explores, analyzes or models data.

I’m happy to say I will be a keynote speaker at Discovery Summit Europe in March. I’ll be reporting on my work studying the performance of aging athletes in running and swimming. Did you know that the record marathon time for the age 90+ group is 5 hours 40 minutes (!?) or that the 100 m record time for those 100 and older is 23.14 sec?

Because of the increased participation of older athletes, we now have a much better sense of how masters’ athletes compare to their 35-year-old counterparts. Would you guess that older athletes fare relatively better in sprints or long-distance events? Men or women? To find out, book your flight to Brussels and join me for the first Discovery Summit Europe!

Post a Comment

Analytically Speaking: Q&A with reliability expert Bill Meeker

Earlier this month, we had the pleasure of hosting Bill Meeker, Distinguished Professor of Liberal Arts and Sciences at Iowa State University, on our Analytically Speaking webcast series. Bill is a noted author and expert in the areas of reliability data analysis, reliability test planning, accelerated testing, nondestructive evaluation and statistical computing. Our viewers had so many good questions for Bill during the live webcast that we didn’t have time to include all of them. For those that were left in the queue, Bill has kindly provided answers.

Question: Is there a link or relationship between cohort analysis and survival analysis? Can they be used together? And if so, how would they complement each other?

Answer: Yes, cohort analysis and survival analysis methods can be used together to get more insight into population behavior. In cohort analysis, we stratify our population into different groups of units that share a similar characteristic or characteristics. For example, we might stratify a population of potential customers on the basis of geographic location and/or past experience with the potential customers. Then we could, for example, do separate analyses of time to respond to an offer for each subgroup. An appropriate model to fit would be the “defective subpopulation model” (or DS model) in JMP, in the Life Distribution platform. Some proportion of the population will never respond. This model is also known as the “limited failure population model” and the “cure model,” allowing estimation of the proportion of the population that will respond and the distribution of time to respond for each subgroup. This model is described in some detail in Chapter 11 of Meeker and Escobar (1998). In the Life Distribution and many other platforms, there is an analyze “By” option that will do separate analyses for each cohort. (In JMP 12, there will be a “Compare Groups” option in the Life Distribution platform to make such comparisons even easier to perform.)
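
If you want to try this in JMP, a minimal JSL sketch for launching the Life Distribution platform with a By variable might look like the following (this is not Bill's code, and the column names Days to Respond, Censor and Region are hypothetical):

    // Launch Life Distribution separately for each cohort;
    // a 1 in the Censor column marks a unit that has not (yet) responded.
    Life Distribution(
        Y( :Days to Respond ),
        Censor( :Censor ),
        By( :Region )
    );
    // The defective subpopulation (DS) distributions are then chosen
    // from the platform's red triangle menu.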

Question: You mentioned recidivism as an application of reliability. What are some other areas of application of reliability analysis that you didn’t get to mention during the webcast?

Answer: Yes, Anne mentioned recidivism, and that certainly has been an application of life data analysis (a.k.a. survival analysis) methods. Indeed, it was one of the early applications of the “cure model” mentioned above. There was interest in how long a person would stay out of jail after release. But, of course, some individuals are “cured” and will never return. There are innumerable applications of these methods, which generically might be called “time to event” applications. In engineering reliability, we are often concerned with time to failure or time to return (of a product for warranty repair). In food science and in the development of many other products, one is interested in the shelf life. In the banking industry, there would be interest in “time to payment” for a defaulted loan. In medical applications, there is interest in time to recovery after a treatment. In sociology, there might be interest in time to divorce after marriage. Again, the “cure” model might be appropriate here because a sizable proportion of couples will never have a divorce. In many applications, we are not just interested in the first event, but in the recurrence of certain events over time. Examples include the recurrence of a disease over time (e.g., common colds), repairs of a machine, customers returning for more purchases, etc. Special models and methods are available for such data. Again, I recommend Wayne Nelson’s 2003 book on the subject as a good place to start learning about the analysis of recurrence data. JMP also has powerful analysis methods for recurrence data.

Question: Do you often need to convince end users or customers that 25 or 30 trials are necessary, when these trials are expensive and therefore resisted? If so, what approach would you use?

Answer: Yes, the most common question asked of any statistical consultant is “how many units do I need to test?” And in reliability applications we hear the related question “How long do I need to test?” JMP has extensive tools for planning experiments of different kinds, including reliability experiments such as demonstration tests and accelerated life tests. The theory behind the methods is impeccable. When the software says you need 30 units to achieve the desired precision, it is correct. But that might not help to convince end users with limited resources. I have found it useful to supplement the “black box” answers with repeated simulation of the proposed experiment. I typically run through the analysis of five or six complete simulated data sets and then graphically summarize the results of 50 such simulated experiments. The graphically presented simulation-summary results allow visualization of the variability that one could expect to see in repeated experiments and how far away from the truth any given result might be. Such simulations can be used to compare different candidate test plans (e.g., different sample sizes). You do not need to know any theory of experimental design to appreciate the implications coming from the visualization of the simulation results. Such simulations could be programmed in the JMP Scripting Language, JSL.
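
As a simple illustration of this idea (not Bill's actual code), the JSL sketch below simulates a proposed 30-unit experiment 50 times, assuming lognormal lifetimes with made-up parameters, and then summarizes the spread of the resulting estimates:

    Names Default To Here( 1 );
    n = 30;       // proposed sample size
    nsim = 50;    // number of simulated experiments
    mu = 4;       // assumed log-life mean (hypothetical)
    sigma = 0.5;  // assumed log-life standard deviation (hypothetical)
    dt = New Table( "Simulated Estimates",
        Add Rows( nsim ),
        New Column( "Median Life Estimate", Numeric, "Continuous" )
    );
    For( i = 1, i <= nsim, i++,
        loglife = J( n, 1, Random Normal( mu, sigma ) ); // one simulated experiment
        Column( dt, "Median Life Estimate" )[i] = Exp( Sum( loglife ) / n ); // estimated median lifetime
    );
    // Graphically summarize the simulation-to-simulation variability
    Distribution( Continuous Distribution( Column( :Median Life Estimate ) ) );

Seeing how much the 50 estimates bounce around for a given sample size is often more persuasive than a single precision calculation.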

Question: Can you talk about reliability studies as one – or the main – consideration in a designed experiment, particularly with regard to the approach taken with Taguchi methods?

Answer: Taguchi methods (a.k.a. robust design methods) provide a collection of design-of-experiment tools that can be used to make products and processes more robust to external noises, such as variability in raw materials or variability in the manner in which a product is used. The use of these methods has been shown to have high potential for improving the quality of a product or the output of a process. Because quality is a prerequisite for high reliability (recall that reliability is “quality over time”), the skillful use of robust design methods will also improve reliability. In some applications, robust design methods can be used to focus directly on reliability. Two books that I highly recommend for this area are:

Question: When you exclude a failure mode, does it treat those as censored data?

Answer: In the multiple failure mode analysis, the first step is to estimate the “marginal” distributions for each failure mode. Under an assumption of independence of the different failure modes, this is done, literally, by making separate data sets for each of the different failure modes and then doing separate analyses for each. In the construction of these data sets, with focus on one failure mode, failures from all other failure modes are treated as right-censored (because all we know is that the failure mode getting focus has not occurred yet). This is done for each failure mode. Then the so-called “series system model” can be used to combine the estimates of the marginal distributions to obtain an estimate of the failure time distribution with all of the failure modes active. A simple extension of this approach is to provide an estimate of the failure time distribution with just some of the failure modes active (so you can see the effect of eliminating one or more of the other failure modes). Modern software with capabilities for the analysis of reliability data, like JMP, will do all of this automatically. Technical details for this topic can be found in Chapter 15 of Meeker and Escobar (1998).
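
As a small illustration of that first step, the JSL sketch below builds a censor indicator for analyzing a single failure mode; the column names and the "Bearing" mode are hypothetical:

    dt = Current Data Table();
    // 0 = failure from the mode of interest, 1 = right-censored
    // (1 is JMP's default censor code)
    dt << New Column( "Censor for Bearing", Numeric, "Continuous",
        Formula( If( :Failure Mode == "Bearing", 0, 1 ) )
    );
    Life Distribution( Y( :Hours ), Censor( :Censor for Bearing ) );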

If you’d like to learn more from Bill’s extensive expertise on statistics, reliability and more, you can watch the archived webcast.

Post a Comment

UK User Forum warms up for Discovery Summit Europe

Fujifilm's bioprocessing line

The theme running through the recent UK User Forum was the use of JMP to simulate and optimise processes, particularly within the chemical industry. Fujifilm Diosynth Technologies hosted the event in its Billingham offices, which used to be part of the ICI chemicals conglomerate. Users came together to share how they successfully use JMP to solve problems and to drive efficiencies within their organisations.

Dr. Graham McCreath, Head of R&D at Fujifilm, welcomed the users. The day went smoothly, with an excellent standard of presentations, each introduced by the new Chair of the forum, Dr. Claire Crawford of W.L. Gore.

Dr. Mahesh Shivhare is the Senior Process Statistician at Fujifilm, one of the world's leading GMP drug contract manufacturing organizations. Mahesh described how the bottleneck in their processes had moved from experiments to data analysis. To ease this situation, he created the Rapid Data Analysis Platform, RaDAP, using JMP Scripting Language (JSL), so that scientists could quickly and robustly analyse the data from their bioreactors. Mahesh used automatically generated scripts in JMP. “The idea came from the user forum in the south [at Infineum],” Mahesh said, “and allows scientists to just click on a button to identify which reactors work well based on different characteristics.”

Dr. Stephen Pearson is a Chemical Process Statistician at Syngenta, a global agri-business company. Stephen wowed the audience by presenting 100 percent in JMP. He even recreated his slides using Graph Builder. Stephen helps 70 scientists at Syngenta UK manufacturing sites carry out data analysis. He found that an effective way to achieve this was to create a JMP application he calls the “Analysis Assistant.” This JMP application takes scientists through a process of gathering, processing and visualising data. “My aim is to create tools that allow the scientists to do things more efficiently. Every hour I spend writing code needs to save double the scientist’s time,” he said. Claire described Stephen’s scripts as “really impressive.”

David Payne is Head of Continuous Improvement at Macfarlan Smith, a leading supplier of active pharmaceutical ingredients for the pain relief market, and part of Johnson Matthey. David said that his company faced a challenge in improving the efficiency of manufacturing morphine from poppy straw. Variability in the morphine extraction process created significant processing problems on the plant. David and his team used JMP to analyse process data to identify how they could improve control and increase throughput. The net result is an elegant solution that has considerable financial benefit to the organisation. It also forms the foundation for controlled processes in future plants. David will be presenting his paper at Discovery Summit Europe in Brussels next March.

Matt Linsley is Director of Newcastle University's Industrial Statistics Research Unit. ISRU’s aim is to bridge the gap between academia and industry in the fields of applied statistics and continuous improvement methodologies. ISRU's learning programmes use JMP to teach design of experiments. An "Improving Chemical Processes using Statistical Methodologies" learning programme is delivered on an annual basis with Durham University's School of Chemistry. This programme includes the use of a Reaction Simulator developed by GSK, with the experimental data being analysed using JMP. Matt shared a YouTube video providing a flavour of the 2014 programme. "It is absolutely key to control key process variables in order to optimise process performance. A design of experiments strategy supported by a statistical software package such as JMP can help to identify those key process variables and support their long-term control,” Matt said. “Our intention is to build Definitive Screening Designs into next year's programmes in order to compare their benefits to more traditional designs.”

David Burnham is owner of Pega Analytics, which provides training and consulting in JMP. David started his presentation on the value of simulation with an archive video by Stu Hunter: What Is Design of Experiments? - Part 2. David has a personal interest in the combination of design of experiments and simulation. He showed how simulation could help a scientist make an informed decision, for example, about the tradeoff of doing more experimental runs. “Scientists are trying to develop understanding through experience, and I am using simulation to artificially create experience,” David said. “In a sense, embracing the art of statistics can give us an understanding of how statistics can help us even before we collect data.” David also will be presenting his paper at Discovery Summit Europe in Brussels in March.

Ian Cox, European Marketing Manager for JMP, gave a whistle-stop tour of JMP 12. Ian started by saying that John Sall’s vision of statistical discovery, where there is a graphic for every statistic, still holds true after 25 years. Ian demonstrated many of the exciting capabilities in JMP 12, due for release in March 2015. Users who are interested in exploring these can contact me for further information.

The forum's steering committee analysed the results of the survey conducted over the course of the day to work out what the users would like at future user group events. Claire will provide details of these results to the users over the coming weeks.

The next gathering of UK users will be at the Discovery Summit Europe in Brussels in March 2015. We hope to see you there!

Dr. Stephen Pearson of Syngenta shows how he created his presentation slides in Graph Builder.

David Payne of Macfarlan Smith explains how he uses the simulator in JMP to solve problems.

Matt Linsley of Newcastle University's Industrial Statistics Research Unit shows how he profiles multiple responses in JMP.

David Burnham of Pega Analytics speaks to an engaged audience at the UK User Forum.

Steering committee members Claire Crawford and Mahesh Shivhare discuss the survey at the UK User Forum.

Post a Comment

Not lost in translation

Two keynotes were presented in English, two in Japanese. Yet nothing was lost in translation. Well, maybe a joke or two fell a bit flat. But for the most part, simultaneous translations bridged the language gap at Discovery Summit Japan.

As SAS Principal Research Fellow Bradley Jones talks about definitive screening designs, one screen shows English content and the other shows the Japanese translation.

The Nov. 7 event was the first time the conference series had been held outside the United States. Discovery Summit Japan mirrored the US conference in several ways, principally by conveying the universal themes of excellence in data exploration, innovative uses of analytics and the need for analytics to be more widely accessible across organizations.

“Statistics are going to make us so strong,” said Tadashi Mitsui of Toshiba Corporation. As long as you use JMP, you are getting the best from statistics, he went on to say during his keynote address.

Mitsui and Takaya Kojima from Waseda University both gave keynote speeches. Also featured were talks from John Sall, SAS Executive Vice President and Co-Founder, and SAS Principal Research Fellow Bradley Jones.

More than 150 JMP users attended, including Yoko Suzuki of Tokyo Metropolitan University. Suzuki said she had wanted to attend Discovery Summit in the United States but hadn’t been able to make the trip. As soon as she saw that JMP was hosting one in Japan, she knew she would attend.

The conference series will continue to move into regions with high concentrations of JMP users, with Discovery Summit Europe next. That Summit will be held in March in Brussels, and is being led by a Steering Committee made up of JMP users from across the continent.

Perhaps we will see you there? If not, plan to attend Discovery Summit 2015, which will be held next September in San Diego.

JMP Customer Care Manager Jeff Perkinson gets a microphone in preparation for hosting Discovery Summit Japan.

 

Translators work from a soundproof booth, providing both Japanese-to-English and English-to-Japanese simultaneous translations.

 

Japanese-speaking attendees wear ear pieces to hear English translated into Japanese.

 

JMP Japan Systems Engineers help staff the “Ask the Experts” stations. This is comparable to the “Meet the Developers” sessions at Discovery Summit in the US.

Post a Comment

Visualizing completeness of food logging data with Graph Builder

The second graph of my Discovery Summit 2014 poster summarized my meal logging habits. I made this graph while trying to identify patterns in my summary data that could alert me to days with missing or incomplete daily food logs. Initially, I created a point chart in Graph Builder to plot each day’s calories consumed vs calories burned, with the percentage of meals I had logged (out of 6 possible meals) specified as a Wrap variable.

Calories burned by calories consumed

JMP testing manager Audrey Shull and product manager Dan Valente both suggested simplifying this graph when they reviewed my poster. While it was easy for me to spot outliers, like the blue points on the left side of the top two graphs, there was too much going on to understand at a quick glance. Next, I created a simpler view of this data showing the percentage of meals logged on the X axis and calories consumed on the Y axis in a point chart.

When I met with Xan Gregg, lead developer of Graph Builder and head of the data visualization group at JMP, he thought my chart could be improved further by plotting the number of meals logged instead of the percentage and adding jittering to better display the point density. We played around with the data in Graph Builder to explore additional graph type possibilities. In the end, I preferred the look of a density plot to all others we considered. The final version I included in my poster is shown below.

Density of cals consumed and meals

As you can see in the lower left above, I have skipped only 10 days of data collection since I began using the armband and its food logging software. I estimated daily calories without logging specific food items on 34 days, which show up with a single meal logged. In reviewing days with three logged meals, I noted that they all occurred during my pregnancy on days when I had apparently decided I really didn’t want to know my daily total, and stopped tracking!

This graph illustrates clearly that I usually log four to six meals per day. When looking at this graph, I think it’s important to remember that I collected this data eating foods that I chose, not following any specific diet or meal plan. I made an effort to log before or just after meals to improve recall of items and quantities; however, I didn’t plan for a particular number of meals per day or specific macronutrient percentages. Though food labels often refer to a 2,000-calorie-per-day diet, actual individual calorie needs are affected by many factors, including gender, age and base amount of lean body mass (LBM). I used an online calculator and prior experience to estimate my calorie needs as a 5’4.5” woman who has worked out with weights since age 13, walks several times a week and has a sedentary desk job. Trial and error using my own data is the best way to fine-tune how many calories I can eat and burn to reach and maintain weight-related goals. If this data-driven approach to weight loss interests you, I'd suggest collecting your own data and using it to find the intake and activity levels that allow you to progress toward your own goals.

To reproduce my food log meal graph with your own data, open a data table containing the number of meals you logged each day and the corresponding calories consumed each day. I created a number-of-meals variable by recoding the Meal Log Compliance measure imported from my BodyMedia Activity Summary file, and I set the variable's data type to Numeric and its modeling type to Continuous. In my case, 100% compliance corresponded to six meals logged.

To create the graph, launch Graph Builder and drag:

  • Calories consumed to the Y axis.
  • Number of meals to the X axis.

To complete the graph, change the element type to Contour using the icon at the top of the window, adjust your Y-axis, graph title and axis titles if desired, and add one or more annotations from the Tools menu. You can right-click on annotations to change their appearance to match your graph color theme. Stay tuned for the next post where Xan and I show how I summarized my sleep data history using Graph Builder.
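
If you prefer to script the graph, a minimal Graph Builder JSL sketch along these lines is a starting point (the column names are placeholders for your own; the saved script from your finished graph may differ):

    Graph Builder(
        Variables( X( :Number of Meals ), Y( :Calories Consumed ) ),
        Elements( Contour( X, Y, Legend( 1 ) ) )
    );
    // Axis settings, titles and annotations can then be adjusted
    // interactively, as described above.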

If you'd like to read more about this project, you can check out the first blog post in this series to learn more about my interest in quantified self (QS) data analysis and my JMP Discovery Summit 2014 e-poster that explored 1,316 days of activity and food log data. You can read more in blog posts detailing how I wrote JSL scripts to import Excel-formatted Activity Summary files and text-formatted food log files from BodyMedia®’s Activity Manager software into JMP. A JMP add-in is available on the JMP File Exchange so that you can import your own files. You can view a copy of my e-poster on the JMP User Community. It’s free to join the community, where you can learn from JMP users all over the world!

Post a Comment

Using Graph Builder to visualize my activity data collection patterns

Before finalizing my Discovery Summit 2014 poster on my personal diet and fitness data, I asked my colleague Xan Gregg (lead developer of Graph Builder in JMP) to review my poster draft. By the time Xan and I sat down at my computer, I had created several graphs that I liked. I had also reviewed a helpful set of graph suggestions Xan wrote for JMP Blog authors and as a result, edited my graph titles to better reflect their main messages and improved my axis label descriptions. (By the way, if you would like to see Xan show how to create a number of interesting graphs in Graph Builder, including another graph from my poster, you can see the recording of his Discovery Summit talk titled "Unlocking the Secrets of Graph Builder.")

Together, Xan and I looked at the first graph on my poster, which I used to show how my seasonal activity patterns were confounded with differences in how much I wear my BodyMedia® FIT® armband. I had experimented with line and bar graphs for this data, and by the time I showed it to Xan, I had settled on using an area graph to show how the mean percentage of time I wore the armband (top, in blue) each week tracked very closely with my mean step count (bottom, in red).

Activity and compliance early

Upon seeing the draft version of this graph, Xan recommended:

  • Reordering the sections of the graph to tell a better story.
  • Using Y axis variables for armband wear and activity with the same units (hr:m).
  • Using a nested X axis for a hierarchical display of month and year.
  • Adding annotations to draw attention to a key area of the graph.

Here is the final version that I used in my poster:

Seasonal compliance

The annotations that I added draw attention to the fact that my device usage patterns tend to be different in the summer and winter. I wear my armband less with sleeveless and short-sleeve outfits because it’s rather conspicuous on my upper arm. During the summer of 2012, you can see that I actually wore the armband more regularly, and I ended up with a strap tan line that I didn’t like. As a result, I wore it less during the summers of 2013 and 2014.

Clearly, I made a conscious decision to use the armband less in the summer without realizing just how much impact it could have on the accuracy of my activity and step measurements. Now I know that I will have to treat this data carefully when analyzing it further. If I had not explored my activity and usage data first to remind me of this usage pattern, I could have created any number of plausible explanations for why my activity levels were so much lower during the hot North Carolina summer months.

Although I am unlikely to change my summer wear pattern for the armband, I have been experimenting with step counting apps on my phone that can provide supplementary activity estimation data. The iPhone Moves app seems especially good at passive data collection on my movements and activity, but that topic probably deserves its own blog post!

To reproduce my activity area graph with your own data in Graph Builder, open a data table containing an activity measure and an hours-of-usage measure in hr:m format. If you don’t yet have Year and Month transformations of your date variable in your table, you can create them by right-clicking on your Date variable in the Graph Builder variable chooser and adding new transform columns from the Date Time menu. I used a Value Ordering property on the Month variable to create a Month Name column and made sure that Year and Month Name were specified as Ordinal.

Then, to create the graph, drag:

  • Year to the X axis.
  • Month Name to the X Axis (just above Year) so the axis has Month Name nested in Year.
  • Activity to the Y Axis.
  • Time Onbody to the Y axis just below Activity so they appear in separate graph sections.

To complete the graph, change the element type to Area using the icon at the top of the window, adjust your Y-axis, graph title and axis titles if desired, and add one or more annotations from the Tools menu. You can right-click on annotations to change their appearance to match your graph like I did. Stay tuned for the next post where Xan and I show how I summarized my food log compliance data using Graph Builder!
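
For scripters, a rough Graph Builder JSL sketch of the two stacked area sections looks something like this (column names follow the steps above; I added the nested Month-within-Year axis and the annotations interactively, so treat the syntax as a starting point):

    Graph Builder(
        Variables( X( :Month Name ), Y( :Activity ), Y( :Time Onbody ) ),
        Elements( Position( 1, 1 ), Area( X, Y, Legend( 1 ) ) ),
        Elements( Position( 1, 2 ), Area( X, Y, Legend( 2 ) ) )
    );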

Check out the first blog post in this series to learn more about my interest in quantified self (QS) data analysis and my JMP Discovery Summit 2014 e-poster that explored 1,316 days of my activity and food log data. You can read more details about how I exported my Excel-formatted Activity Summary files and Food Log files from BodyMedia®’s Activity Manager software and imported them into JMP. I also shared how I used the JMP 12 Recode platform to clean my imported data table. I wrote a JMP add-in available on the JMP File Exchange that you can use to import your own files. You can find a copy of my e-poster on the JMP User Community. It’s free to join the community, where you can learn from JMP users all over the world!

Post a Comment

Which Belgian beer tastes best? A designed experiment

During The Scale-Up of Chemical Processes conference in Brussels earlier this year, the organizers and I decided to do an experiment using JMP. Of course, the experiment had to involve beer tasting!

We had 24 participants for the experiment. We used eight Belgian beer brands for testing and planned the experiment using the Custom Designer in JMP. Participant and beer lists were available in electronic form so that both could be imported and used for the design. Each beer would be rated for aroma, taste, complexity and balance on a 1 to 5 scale, with 1 being excellent and 5 being poor. The Custom Designer took all of this information and delivered a list with a random assignment of four of the eight beers to each participant in a randomized order. Each participant also received a guide that gave some basic information about the art of beer tasting and how he or she should rate the four categories.

The newly trained beer experts took their job seriously and had well-informed discussions. During the experiment, we realized that there might be a strong gender influence, and although gender was neither a blocking factor nor a randomized factor, we recorded it as a variable in the data table.


Looking at the model fits for the evaluation, we achieved R-square values between 33% and 43%, which would not be acceptable in a technical application but are good for this type of experiment. In both models, the influence of the raters on the respective responses was higher than the influence of the beer brands themselves (you can see that the slopes of the red lines for “Name” are steeper than those for “Beer”).
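
If you would like to try a similar analysis on your own tasting data, a minimal Fit Model sketch in JSL looks something like the one below, shown here for the Taste response with the rater (Name) and Beer as model effects; treat it as a sketch rather than our exact script:

    Fit Model(
        Y( :Taste ),
        Effects( :Name, :Beer ),
        Personality( "Standard Least Squares" ),
        Run
    );
    // Repeat with the aroma, complexity and balance responses,
    // or list several Y columns to fit them all at once.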


In order to find the best beer, we used the optimizer in JMP, which combined the influence of every beer upon all four criteria and selected the beer with the best overall rating. In our test, this was Kwaremont Blond. We saved the formula for this weighted combination of criteria per beer back to the data table. Thus, we were able to calculate the relative preference for each beer.

Returning to the question of gender influence, we separated the male and female cohort.


The bar charts ordered by the preferences of the ladies show a clear preference for Wilderen Kriek, which is a sweet cherry beer. For the male participants, Kwaremont Blond was the preferred beer, a brand that was not tested by the women at all.


 

Because men outnumbered women by 4:1, the overall ranking reflects the men’s preferences.


We treated the ratings for the different criteria in this analysis as continuous variables, which is not strictly correct; we should have analyzed them as ordinal variables. However, that would have taken much more effort, and it would have been more difficult to produce summaries. Finally, we looked at the preference model that combined the five influence functions per criterion.

When we did this exercise, the ranking remained essentially the same. Only Estaminet Premium Pils and Vedett White changed positions. Not a major change, since their average ratings did not differ much anyway.


Here are the top three beers:

  1. Kwaremont Blond
  2. Estaminet Premium Pils
  3. Vedett White

So the next time you visit Belgium, look for these favorite beers. Statistical planning and chemical expertise can’t be wrong! You can have a chance to taste them if you attend Discovery Summit Europe 2015 in Brussels. Registration is now open. Hope to see you there!

Post a Comment

Recoding BodyMedia® food log data in JMP

I ended my previous blog post at the point in my JMP Discovery Summit project when I realized the extent of food item name redundancy across my nearly four years of food logs collected with the BodyMedia® Activity Manager app. While I knew I had eaten differently prepared varieties of certain foods, the replication was also an artifact of using keyword searches to locate the right items to add to my food log. The keyword I used to search for a given item varied, and the matching item that I chose to log at a given meal also varied, so I often selected different item names for highly similar foods.

Ultimately, I wanted to summarize the number of calories I ate from related items and also total up calories eaten by food category. A sensible first step was to reduce the number of redundant food item names in my data table. I wanted the food item recoding process to be as easy as possible, and of course, reproducible through scripting so I would be able to process new data with minimal work.

I explored using the JMP 11 Recode platform to consolidate similar food item names into a single cleaned value. Before I started recoding, my data table contained 1,859 unique food item names. Since food item names were displayed in the Recode window in alphabetical order, I found it challenging to locate similar food items whose names did not sort near each other. For example, if I wanted to rename nearly identical items listed under different brand names, I had to first locate all the related items scattered throughout my item list (e.g., "CHIPS AHOY! Chewy Chocolate Chip Cookies," "Cookie, Chocolate Chip, Commercial, 12%-17% Fat," "Jason's Deli Chocolate Chip Cookie," "PILLSBURY Chocolate Chip Cookies, Refrigerated Dough") and rename them to a common cleaned value (e.g., "Cookie, Chocolate Chip"). To locate all related items, I searched the data table using the Find function or used Find under the Data Filter red triangle menu.

Data Filter Find

Once I located all the related items, I scrolled to their location in the Recode window and pasted in the cleaned item name. At one point as I worked through my data set, I accidentally closed the Recode window without saving my changes. Instead of repeating my work, I decided to explore an alternative strategy that I hoped would allow me to classify my items more quickly and easily assign new items to my food groupings.

I used the Free Text feature (found on the Multiple tab of the JMP Categorical platform launch dialog) to extract the list of unique words from my food item names. I reviewed the list to remove common or non-specific words and placed the remaining words into food categories. Then, I used a JSL loop to scan for these keywords in food item names using the PatMatch function in JMP. If I found a keyword, I added that word’s category to a comma-delimited list in a column saved with a Multiple Response column property.
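
Here is a simplified sketch of that loop; the keywords, categories and column names shown are illustrative stand-ins rather than my actual lists:

    Names Default To Here( 1 );
    dt = Current Data Table();
    // Map keywords to food categories (illustrative entries only)
    keywords = Associative Array();
    keywords["chicken"] = "Meat";
    keywords["coffee"] = "CoffeeMilk";
    keywords["cookie"] = "Sweets";
    keys = keywords << Get Keys;
    dt << New Column( "Food Categories", Character );
    For( r = 1, r <= N Rows( dt ), r++,
        item = Lowercase( Column( dt, "Item Name" )[r] );
        hits = {};
        For( k = 1, k <= N Items( keys ), k++,
            If( Pat Match( item, keys[k] ),
                Insert Into( hits, keywords[keys[k]] )
            )
        );
        // Comma-delimited list for the Multiple Response column property
        Column( dt, "Food Categories" )[r] = Concat Items( hits, "," );
    );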

While initially I thought this approach would make it simpler to classify new items, it turned out to be time-consuming for my script to search all items for all keywords. It took even longer for me to review all the classified food items and verify that they had been placed into sensible categories based on the keywords they contained. As I examined my processed table, I was dismayed to note many non-specific keyword matches. In one example, "Chicken of the Sea Chunk Light Tuna" matched both Meat (keyword: chicken) and Fish (keyword: fish) food categories. Coffeemate Non-Dairy Creamer included the keyword coffee, causing it to be incorrectly assigned to the CoffeeMilk group. I realized that I would need to fix some of the original names before the pattern match and clean up other category lists after the match. Since I needed to reproduce each step through scripting, I would need to write custom JSL or generate data cleaning JSL with Recode -- so I decided to go back to my original Recode strategy.

Right around that time, newly hired JMP developer James Preiss began to revamp Recode for JMP 12. I shared my food log Recode use case with James and many of my challenges lined up with customer requests already on his to-do list. As soon as updates to Recode began to surface in JMP 12 daily builds, I tested them with my food log files and shared a subset of my item list with James and Recode tester Rosemary Lucas. I was thrilled to see that many of the steps I did manually in JMP 11 with a combination of Recode, Data Filter, Find/Replace and JSL scripting are integrated into Recode in JMP 12.

In fact, long before the Recode platform updates were complete, I was able to create a table of cleaned, grouped item names from my food item list, in far less time than I had spent trying to script around the keyword matching problem. I then added categories for each cleaned food item name and merged them into my food log data table by joining on the original item name. Using Recode helped me cut the original number of unique names in my table (1,859) in half! Now, when I import new food log files, I return briefly to Recode to classify any new items, update my item name/category table, merge it with my data, and I am ready to proceed.

JMP 12 won’t be out till March 2015, so I’ll admit I am being purposefully vague about the many new features in the Recode platform. I love the Recode updates, and I know you will too! (Look for more detailed blog posts about Recode as March approaches and after the software is available.)

In my next blog posts, I will introduce some of the graphs I created for my Discovery Summit poster and show how I improved them with the help of Xan Gregg, creator of the Graph Builder platform and leader of the Data Discovery group at JMP.

For more background on my poster and my interests in quantified self (QS) data analysis, check out the first blog post in this series. Subsequent posts share details about how I exported my Excel-formatted Activity Summary files and Food Log files from the BodyMedia® Activity Manager software and imported them into JMP. I used custom JSL scripts to create two JMP data tables, one with 1,316 rows of activity data and the other with 34,432 rows of food items logged over nearly four years. I wrote a JMP add-in supporting these data types and CSV-formatted  food log files from the free MyFitnessPal website.

Post a Comment

Drawing 95% confidence intervals for parameter estimates from Fit Model

In my recent blog entry discussing the results of my adventures with hard-boiled eggs, one reader asked how I created the figures with the confidence intervals for the parameter estimates from Fit Model. I typically use Graph Builder whenever I can for visualization, and the graph below, with the parameter estimates for attractiveness of the eggs as the response, was no exception.

The finished graph: 95% confidence intervals for the parameter estimates, drawn in Graph Builder.

Since producing this graph takes a few extra steps, it seemed worthwhile to write a post explaining how I did it. The key is being able to use Graph Builder, which gives me the flexibility to add whatever customization I want.

Getting the Values Out of Fit Model

I’ll assume that the model has already been fit using Fit Model; in this example, I’m using attractiveness of the egg as the response. You can find the data set on the JMP File Exchange. Our first step is to get the confidence intervals for the parameter estimates. Right-click on the table under Parameter Estimates, select Columns and choose to add Lower 95% to the table, and then repeat for Upper 95%. Alternatively, you could click the red triangle at the top of the report and choose Regression Reports -> Show All Confidence Intervals.


Another right-click on that table gives us the option to “Make into Data Table,” which is what we choose. This data table contains a row for each term in the model, as well as the columns from the parameter estimates in Fit Model – particularly the lower and upper bounds. It’s this data table that we’ll use to create the graph.

In Graph Builder, you can move Term to the Y-axis on the left (I could put it on the X axis, but the length of the term names looks better on the Y), select Lower 95% and Upper 95%, and move these to the X-axis.

Drawing the Confidence Intervals

To get the intervals, I do the following:

  • Right-click in the graph, and choose Add-> Bar.
  • On the left-hand side, go to the Bar section and change Bar Style from “Side by side” to “Interval” (see figure below). Alternatively, right-click on the graph and choose Bar -> Bar Style -> Interval.
  • At the top of Graph Builder, deselect the "Points" button so you are left with just the intervals.

The Bar section of the Graph Builder control panel, with Bar Style changed to Interval.

At this point, your graph should look something like this:

The intervals drawn in Graph Builder, with the terms still in alphabetical order.

Adjusting the Ordering of the Terms

If you look at the graph, you’ll notice that the terms have been placed in alphabetical order, and not the order we had in the parameter estimates. There’s also the Intercept term, which you may or may not be interested in including. If we go back to the data table and exclude the intercept row, Graph Builder will update the graph automatically.

If you want the terms to appear in the same order as the original table, you can head back to the parameter estimates data table (while leaving Graph Builder open):

  • Right-click on the Term column and select “Column Info.”
  • From the Column Properties drop-down, select “Value Ordering.”
  • Rearrange the Term labels to the desired order.
  • Choose to reverse this after it’s all done (the top term on the Y-axis in Graph Builder corresponds to the largest value).
  • Click Apply.

Graph Builder will update itself with the new order for terms!

What if you didn’t care that the order matched up the parameter estimates table? For instance, maybe you prefer to have the terms sorted by the actual estimate. This is even easier – simply select Estimate and drag it to the right of the Y-axis.


Getting the Reference Line at 0

Adding the reference line is easy enough:

  • Double-click on the X-axis where the values are (or right-click and choose Axis Settings).
  • Under Reference Lines, select a color, and click the Add button.
  • Click OK.

Final Thoughts

You can even add points at the center of the intervals (i.e., the estimates) by selecting the Estimate column and moving it to the X-axis. The axes and title can now be adjusted with a simple double-click. There are plenty of options in Graph Builder to customize the graph to your liking. Hopefully, you found this blog post useful!
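
And if you would rather script the whole graph, a rough JSL sketch of this kind of Graph Builder call is below. The surest way to get the exact syntax for your version of JMP is to build the graph interactively and then use the red triangle's Script > Save Script to Script Window option; the sketch assumes the parameter estimates data table created earlier:

    Graph Builder(
        Variables(
            X( :Name( "Lower 95%" ) ),
            X( :Name( "Upper 95%" ), Position( 1 ) ),  // merged onto the same X axis
            Y( :Term )
        ),
        Elements( Bar( X( 1 ), X( 2 ), Y, Legend( 1 ), Bar Style( "Interval" ) ) )
    );
    // Add the reference line at 0 through the X-axis settings, as described above.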

Post a Comment