Not lost in translation

Two keynotes were presented in English, two in Japanese. Yet nothing was lost in translation. Well, maybe a joke or two fell a bit flat. But for the most part, simultaneous translations bridged the language gap at Discovery Summit Japan.

As SAS Principal Research Fellow Bradley Jones talks about definitive screening designs, one screen shows English content and the other shows the Japanese translation.

The Nov. 7 event marked the first time the conference series had traveled outside the United States. Discovery Summit Japan mirrored the US conference in several ways, principally by conveying the universal themes of excellence in data exploration, innovative uses of analytics and the need for analytics to be more widely accessible across organizations.

“Statistics are going to make us so strong,” said Tadashi Mitsui of Toshiba Corporation. As long as you use JMP, you are getting the best from statistics, he went on to say during his keynote address.

Mitsui and Takaya Kojima from Waseda University both gave keynote speeches. Also featured were talks from John Sall, SAS Executive Vice President and Co-Founder, and SAS Principal Research Fellow Bradley Jones.

More than 150 JMP users attended, including Yoko Suzuki of Tokyo Metropolitan University. Suzuki said she had wanted to attend Discovery Summit in the United States but hadn’t been able to make the trip. As soon as she saw that JMP was hosting one in Japan, she knew she would attend.

The conference series will continue to move into regions with high concentrations of JMP users, with Discovery Summit Europe next. That Summit will be held in March in Brussels, and is being led by a Steering Committee made up of JMP users from across the continent.

Perhaps we will see you there? If not, plan to attend Discovery Summit 2015, which will be held next September in San Diego.

JMP Customer Care Manager Jeff Perkinson gets a microphone in preparation for hosting Discovery Summit Japan.



Translators work from a soundproof booth, providing both Japanese-to-English and English-to-Japanese simultaneous translations.


Japanese-speaking attendees wear ear pieces to hear English translated into Japanese.


JMP Japan Systems Engineers help staff the “Ask the Experts” stations, comparable to the “Meet the Developers” sessions at Discovery Summit in the US.


Visualizing completeness of food logging data with Graph Builder

The second graph of my Discovery Summit 2014 poster summarized my meal logging habits. I made this graph while trying to identify patterns in my summary data that could alert me to days with missing or incomplete daily food logs. Initially, I created a point chart in Graph Builder to plot each day’s calories consumed vs calories burned, with the percentage of meals I had logged (out of 6 possible meals) specified as a Wrap variable.

Calories burned by calories consumed

JMP testing manager Audrey Shull and product manager Dan Valente both suggested simplifying this graph when they reviewed my poster. While it was easy for me to spot outliers, like the blue points on the left side of the top two graphs, there was too much going on to understand at a quick glance. Next, I created a simpler view of this data showing the percentage of meals logged on the X axis and calories consumed on the Y axis in a point chart.

Compliance point chart

When I met with Xan Gregg, lead developer of Graph Builder and head of the data visualization group at JMP, he thought my chart could be improved further by plotting the number of meals logged instead of the percentage and adding jittering to better display the point density. We played around with the data in Graph Builder to explore additional graph type possibilities. In the end, I preferred the look of a density plot to all others we considered. The final version I included in my poster is shown below.

Density of cals consumed and meals

As you can see in the lower left above, I have skipped only 10 days of data collection since I began using the armband and its food logging software. I estimated daily calories without logging specific food items on 34 days, which show up with a single meal logged. In reviewing days with three logged meals, I noted that they all occurred during my pregnancy on days when I had apparently decided I really didn’t want to know my daily total, and stopped tracking!

This graph illustrates clearly that I usually log four to six meals per day. When looking at this graph, I think it’s important to remember that I collected this data eating foods that I chose, not following any specific diet or meal plan. I made an effort to log before or just after meals to improve recall of items and quantities; however, I didn’t plan for a particular number of meals per day or specific macronutrient percentages. Though food labels often refer to a 2,000-calorie-per-day diet, actual individual calorie needs are affected by many factors, including gender, age and base amount of lean body mass (LBM). I used an online calculator and prior experience to estimate my calorie needs as a 5’4.5” woman who has worked out with weights since age 13, walks several times a week and has a sedentary desk job. Trial and error using my own data is the best way to fine-tune how many calories I can eat and burn to reach and maintain weight-related goals. If this data-driven approach to weight loss interests you, I'd suggest collecting your own data and using it to find the intake and activity levels that allow you to progress toward your own goals.

To reproduce my food log meal graph with your own data, open a data table containing the number of meals you logged each day and the corresponding calories consumed each day. I created a number of meals variable by recoding the Meal Log Compliance measure imported from my BodyMedia Activity Summary file and set the meal number variable’s data type to Numeric and its modeling type to Continuous. In my case, 100% compliance corresponded to six meals logged.

To create the graph, launch Graph Builder and drag:

  • Calories consumed to the Y axis.
  • Number of meals to the X axis.

To complete the graph, change the element type to Contour using the icon at the top of the window, adjust your Y-axis, graph title and axis titles if desired, and add one or more annotations from the Tools menu. You can right-click on annotations to change their appearance to match your graph color theme. Stay tuned for the next post where Xan and I show how I summarized my sleep data history using Graph Builder.
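If you prefer scripting, the finished contour graph corresponds to a short Graph Builder script along these lines. This is only a sketch: the column names are placeholders for whatever your own table uses, and the authoritative version is the script JMP saves from the Graph Builder red triangle menu.

```jsl
// Sketch of the finished contour graph as a Graph Builder script.
// "Number of Meals" and "Calories Consumed" are placeholder column names.
dt = Current Data Table();
gb = dt << Graph Builder(
	Variables(
		X( :Number of Meals ),
		Y( :Calories Consumed )
	),
	// The Contour element gives the density plot look chosen for the poster
	Elements( Contour( X, Y, Legend( 1 ) ) )
);
```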

If you'd like to read more about this project, you can check out the first blog post in this series to learn more about my interest in quantified self (QS) data analysis and my JMP Discovery Summit 2014 e-poster that explored 1,316 days of activity and food log data. You can read more in blog posts detailing how I wrote JSL scripts to import Excel-formatted Activity Summary files and text-formatted food log files from BodyMedia®’s Activity Manager software into JMP. A JMP add-in is available on the JMP File Exchange so that you can import your own files. You can view a copy of my e-poster on the JMP User Community. It’s free to join the community, where you can learn from JMP users all over the world!


Using Graph Builder to visualize my activity data collection patterns

Before finalizing my Discovery Summit 2014 poster on my personal diet and fitness data, I asked my colleague Xan Gregg (lead developer of Graph Builder in JMP) to review my poster draft. By the time Xan and I sat down at my computer, I had created several graphs that I liked. I had also reviewed a helpful set of graph suggestions Xan wrote for JMP Blog authors and as a result, edited my graph titles to better reflect their main messages and improved my axis label descriptions. (By the way, if you would like to see Xan show how to create a number of interesting graphs in Graph Builder, including another graph from my poster, you can see the recording of his Discovery Summit talk titled "Unlocking the Secrets of Graph Builder.")

Together, Xan and I looked at the first graph on my poster, which I used to show how my seasonal activity patterns were confounded with differences in how much I wear my BodyMedia® FIT® armband. I had experimented with line and bar graphs for this data, and by the time I showed it to Xan, I had settled on using an area graph to show how the mean percentage of time I wore the armband (top, in blue) each week tracked very closely with my mean step count (bottom, in red).

Activity and compliance early

Upon seeing the draft version of this graph, Xan recommended:

  • Reordering the sections of the graph to tell a better story.
  • Using Y axis variables for armband wear and activity with the same units (hr:m).
  • Using a nested X axis for a hierarchical display of month and year.
  • Adding annotations to draw attention to a key area of the graph.

Here is the final version that I used in my poster:

Seasonal compliance

The annotations that I added draw attention to the fact that my device usage patterns tend to be different in the summer and winter. I wear my armband less with sleeveless and short-sleeve outfits because it’s rather conspicuous on my upper arm. During the summer of 2012, you can see that I actually wore the armband more regularly, and I ended up with a strap tan line that I didn’t like. As a result, I wore it less during the summers of 2013 and 2014.

Clearly, I made a conscious decision to use the armband less in the summer without realizing just how much impact it could have on the accuracy of my activity and step measurements. Now I know that I will have to treat this data carefully when analyzing it further. If I had not explored my activity and usage data first to remind me of this usage pattern, I could have created any number of plausible explanations for why my activity levels were so much lower during the hot North Carolina summer months.

Although I am unlikely to change my summer wear pattern for the armband, I have been experimenting with step counting apps on my phone that can provide supplementary activity estimation data. The iPhone Moves app seems especially good at passive data collection on my movements and activity, but that topic probably deserves its own blog post!

To reproduce my activity area graph with your own data in Graph Builder, open a data table containing an activity measure and an hours-of-usage measure in hr:m format. If you don’t yet have Year and Month transformations of your date variable in your table, you can create them by right-clicking on your Date variable in the Graph Builder variable chooser and adding new transform columns from the Date Time menu. I used a Value Ordering property on the Month variable to create a Month Name column and made sure that Year and Month Name were specified as Ordinal.

Then, to create the graph, drag:

  • Year to the X axis.
  • Month Name to the X axis (just above Year) so the axis has Month Name nested in Year.
  • Activity to the Y axis.
  • Time Onbody to the Y axis just below Activity so they appear in separate graph sections.

To complete the graph, change the element type to Area using the icon at the top of the window, adjust your Y-axis, graph title and axis titles if desired, and add one or more annotations from the Tools menu. You can right-click on annotations to change their appearance to match your graph like I did. Stay tuned for the next post where Xan and I show how I summarized my food log compliance data using Graph Builder!
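For scripting, the two-section area graph corresponds roughly to the following Graph Builder script. Again, this is a sketch only: the column names are placeholders, and the script JMP saves from the red triangle menu for your own graph is the authoritative version.

```jsl
// Sketch of the two-section area graph as a Graph Builder script.
// Column names are placeholders for your own table.
dt = Current Data Table();
gb = dt << Graph Builder(
	Variables(
		X( :Year ),
		X( :Month Name, Position( 1 ) ), // nested inside Year
		Y( :Activity ),
		Y( :Time Onbody, Position( 1 ) ) // second graph section below
	),
	Elements( Position( 1, 1 ), Area( X, Y, Legend( 1 ) ) ),
	Elements( Position( 1, 2 ), Area( X, Y, Legend( 2 ) ) )
);
```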

Check out the first blog post in this series to learn more about my interest in quantified self (QS) data analysis and my JMP Discovery Summit 2014 e-poster that explored 1,316 days of my activity and food log data. You can read more details about how I exported my Excel-formatted Activity Summary files and Food Log files from BodyMedia®’s Activity Manager software and imported them into JMP. I also shared how I used the JMP 12 Recode platform to clean my imported data table. I wrote a JMP add-in available on the JMP File Exchange that you can use to import your own files. You can find a copy of my e-poster on the JMP User Community. It’s free to join the community, where you can learn from JMP users all over the world!


Which Belgian beer tastes best? A designed experiment

During The Scale-Up of Chemical Processes conference in Brussels earlier this year, the organizers and I decided to do an experiment using JMP. Of course, the experiment had to involve beer tasting!

We had 24 participants for the experiment. We used eight Belgian beer brands for testing and planned the experiment using the Custom Designer in JMP. Participant and beer lists were available in electronic form so that both could be imported and used for the design. Each beer would be rated for aroma, taste, complexity and balance on a 1 to 5 scale, with 1 being excellent and 5 being poor. The Custom Designer took all of this information and delivered a list with a random assignment of four of the eight beers to each participant in a randomized order. Each participant also received a guide that gave some basic information about the art of beer tasting and how he or she should rate the four categories.

The newly trained beer experts took their job seriously and had well-informed discussions. During the experiment, we realized that there might be a strong gender influence, and although gender was neither a blocking nor a randomized factor, we recorded it as a variable in the data table.


Looking at the model fits for the evaluation, we achieved R-square values between 33% and 43%, which would not be acceptable for technical applications but are good for this type of experiment. In both models, the influence of the raters on the respective responses was higher than the influence of the beer brands themselves (you can see that the slope of the red line for “Name” is steeper than that for “Beer”).


In order to find the best beer, we used the optimizer in JMP, which combined the influence of every beer upon all four criteria and selected the beer with the best overall rating. In our test, this was Kwaremont Blond. We saved the formula for this weighted combination of criteria per beer back to the data table. Thus, we were able to calculate the relative preference for each beer.

Returning to the question of gender influence, we separated the male and female cohort.


The bar charts, ordered by the preferences of the ladies, show a clear preference for Wilderen Kriek, which is a sweet cherry beer. For the male participants, Kwaremont Blond was the preferred beer, a brand that was not tested by the women at all.



Because men outnumbered women by 4:1, the overall ranking reflects the men’s preferences:


We treated the ratings for the different criteria in this analysis as continuous variables, which is not correct. We should have analyzed them as ordinal variables. However, this takes much more effort, and it would have been more difficult to produce summaries. Finally, we looked at the preference model that combined the five influence functions per criterion.

When we repeated the analysis this way, the ranking remained essentially the same. Only Estaminet Premium Pils and Vedett White swapped positions. Not a major change, since their average ratings did not differ much anyway.


Here are the top three beers:

  1. Kwaremont Blond
  2. Estaminet Premium Pils
  3. Vedett White

So the next time you visit Belgium, look for these favorite beers. Statistical planning and chemical expertise can’t be wrong! You’ll have a chance to taste them if you attend Discovery Summit Europe 2015 in Brussels. Registration is now open. Hope to see you there!


Recoding BodyMedia® food log data in JMP

I ended my previous blog post at the point in my JMP Discovery Summit project when I realized the extent of food item name redundancy across my nearly four years of food logs collected with the BodyMedia® Activity Manager app. While I knew I had eaten differently prepared varieties of certain foods, the replication was also an artifact of using keyword searches to locate the right items to add to my food log. The keyword I used to search for a given item varied, and the matching item that I chose to log at a given meal also varied, so I often selected different item names for highly similar foods.

Ultimately, I wanted to summarize the number of calories I ate from related items and also total up calories eaten by food category. A sensible first step was to reduce the number of redundant food item names in my data table. I wanted the food item recoding process to be as easy as possible, and of course, reproducible through scripting so I would be able to process new data with minimal work.

I explored using the JMP 11 Recode platform to consolidate similar food item names into a single cleaned value. Before I started recoding, my data table contained 1,859 unique food item names. Since food item names were displayed in the Recode window in alphabetical order, I found it challenging to locate similar food items that were not listed alphabetically. For example, if I wanted to rename nearly identical items listed under different brand names, I had to first locate all the related items scattered throughout my item list (e.g., "CHIPS AHOY! Chewy Chocolate Chip Cookies," "Cookie, Chocolate Chip, Commercial, 12%-17% Fat," "Jason's Deli Chocolate Chip Cookie," "PILLSBURY Chocolate Chip Cookies, Refrigerated Dough") and rename them to a common cleaned value (e.g., "Cookie, Chocolate Chip"). To locate all related items, I searched the data table using the Find function or used Find under the Data Filter red triangle menu.

Data Filter Find

Once I located all the related items, I scrolled to their location in the Recode window and pasted in the cleaned item name. At one point as I worked through my data set, I accidentally closed the Recode window without saving my changes. Instead of repeating my work, I decided to explore an alternative strategy that I hoped would allow me to classify my items more quickly and easily assign new items to my food groupings.

I used the Free Text feature (found on the Multiple tab of the JMP Categorical platform launch dialog) to extract the list of unique words from my food item names. I reviewed the list to remove common or non-specific words and placed the remaining words into food categories. Then, I used a JSL loop to scan for these keywords in food item names using the PatMatch function in JMP. If I found a keyword, I added that word’s category to a comma-delimited list in a column saved with a Multiple Response column property.
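In outline, the keyword scan looked something like the following sketch. It is simplified from my actual script: the keyword and category lists here are hypothetical examples, and the column names are placeholders.

```jsl
// Simplified sketch of the keyword classification loop.
// keywords and categories are parallel placeholder lists.
keywords   = {"chicken", "fish", "coffee"};
categories = {"Meat", "Fish", "CoffeeMilk"};
dt = Current Data Table();
For( r = 1, r <= N Rows( dt ), r++,
	item = Lowercase( dt:Item Name[r] );
	found = {};
	For( k = 1, k <= N Items( keywords ), k++,
		// Pat Match returns 1 when the keyword occurs in the item name
		If( Pat Match( item, Pat Regex( keywords[k] ) ),
			Insert Into( found, categories[k] )
		);
	);
	// Comma-delimited list for a Multiple Response column property
	dt:Food Category[r] = Concat Items( found, "," );
);
```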

While initially I thought this approach would make it simpler to classify new items, it turned out to be time-consuming for my script to search all items for all keywords. It took even longer for me to review all the classified food items and verify that they had been placed into sensible categories based on the keywords they contained. As I examined my processed table, I was dismayed to note many non-specific keyword matches. In one example, "Chicken of the Sea Chunk Light Tuna" matched both the Meat (keyword: chicken) and Fish (keyword: fish) food categories. "Coffeemate Non-Dairy Creamer" included the keyword coffee, causing it to be incorrectly assigned to the CoffeeMilk group. I realized that I would need to fix some of the original names before the pattern match and clean up other category lists after the match. Since I needed to reproduce each step through scripting, I would have to write custom JSL or generate data cleaning JSL with Recode -- so I decided to go back to my original Recode strategy.

Right around that time, newly hired JMP developer James Preiss began to revamp Recode for JMP 12. I shared my food log Recode use case with James and many of my challenges lined up with customer requests already on his to-do list. As soon as updates to Recode began to surface in JMP 12 daily builds, I tested them with my food log files and shared a subset of my item list with James and Recode tester Rosemary Lucas. I was thrilled to see that many of the steps I did manually in JMP 11 with a combination of Recode, Data Filter, Find/Replace and JSL scripting are integrated into Recode in JMP 12.

In fact, long before the Recode platform updates were complete, I was able to create a table of cleaned, grouped item names from my food item list, in far less time than I had spent trying to script around the keyword matching problem. I then added categories for each cleaned food item name and merged them into my food log data table by joining on the original item name. Using Recode helped me cut the original number of unique names in my table (1,859) in half! Now, when I import new food log files, I return briefly to Recode to classify any new items, update my item name/category table, merge it with my data, and I am ready to proceed.

JMP 12 won’t be out till March 2015, so I’ll admit I am being purposefully vague about the many new features in the Recode platform. I love the Recode updates, and I know you will too! (Look for more detailed blog posts about Recode as March approaches and after the software is available.)

In my next blog posts, I will introduce some of the graphs I created for my Discovery Summit poster and show how I improved them with the help of Xan Gregg, creator of the Graph Builder platform and leader of the Data Discovery group at JMP.

For more background on my poster and my interests in quantified self (QS) data analysis, check out the first blog post in this series. Subsequent posts share details about how I exported my Excel-formatted Activity Summary files and Food Log files from the BodyMedia® Activity Manager software and imported them into JMP. I used custom JSL scripts to create two JMP data tables, one with 1,316 rows of activity data and the other with 34,432 rows of food items logged over nearly four years. I wrote a JMP add-in supporting these data types and CSV-formatted food log files from the free MyFitnessPal website.


Drawing 95% confidence intervals for parameter estimates from Fit Model

In my recent blog entry discussing the results of my adventures with hard-boiled eggs, a reader asked how I created the figures with the confidence intervals for the parameter estimates from Fit Model. I typically use Graph Builder whenever I can for visualization, and the graph below, with the parameter estimates for attractiveness of the eggs as the response, was no exception.


Seeing that I needed a few extra steps to produce this graph, it seemed worthwhile to write a blog post explaining how I did it. The biggest piece is being able to use Graph Builder, giving me the flexibility to add any customization I may want.

Getting the Values Out of Fit Model

I’ll assume that the model has already been fit using Fit Model; in this example, I’m using attractiveness of the egg as the response. You can find the data set on the JMP File Exchange. Our first step is to get the confidence interval for the parameter estimates. Right-click on the table under Parameter Estimates, select Columns and choose to add the Lower 95% to the table; then repeat for the Upper 95%. Alternatively, you could click the red triangle at the beginning of the report and choose Regression Reports -> Show All Confidence Intervals.


Another right-click on that table gives us the option to “Make into Data Table,” which is what we choose. This data table contains a row for each term in the model, as well as the columns from the parameter estimates in Fit Model – particularly the lower and upper bounds. It’s this data table that we’ll use to create the graph.

In Graph Builder, you can move Term to the Y-axis on the left (I could put it on the X axis, but the length of the term names looks better on the Y), select Lower 95% and Upper 95%, and move these to the X-axis.

Drawing the Confidence Intervals

To get the intervals, I do the following:

  • Right-click in the graph, and choose Add -> Bar.
  • On the left-hand side, go to the Bar section, and change Bar style from “Side by side” to Interval (see figure below). Alternatively, right-click on the graph and choose Bar -> Bar Style -> Interval.
  • At the top of Graph Builder, de-select the “Points” button to be left with just the intervals.
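The interval graph can also be scripted. Here is a rough Graph Builder sketch, run on the data table made from the Parameter Estimates report; I'm assuming the default column names from that report, and the script JMP saves from your own graph may differ.

```jsl
// Sketch of the interval graph as a Graph Builder script, run on the
// data table made from the Parameter Estimates report.
dt = Current Data Table();
gb = dt << Graph Builder(
	Variables(
		X( :Name( "Lower 95%" ) ),
		X( :Name( "Upper 95%" ), Position( 1 ) ),
		Y( :Term )
	),
	// The Interval bar style spans from Lower 95% to Upper 95%
	Elements( Bar( X( 1 ), X( 2 ), Y, Legend( 1 ), Bar Style( "Interval" ) ) )
);
```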


At this point, your graph should look something like this:


Adjusting the Ordering of the Terms

If you look at the graph, you’ll notice that the terms have been placed in alphabetical order, and not the order we had in the parameter estimates. There’s also the Intercept term, which you may or may not be interested in including. If we go back to the data table and exclude the intercept row, Graph Builder will update the graph automatically.

If you want the terms to appear in the same order as the original table, you can head back to the parameter estimates data table (while leaving Graph Builder open):

  • Right-click on the Term column and select “Column Info.”
  • From the Column Properties drop-down, select “Value Ordering.”
  • Rearrange the Term labels to the desired order.
  • Reverse the order when you’re done (the top term on the Y-axis in Graph Builder corresponds to the largest value).
  • Click Apply.

Graph Builder will update itself with the new order for terms!
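The same Value Ordering property can be set in JSL instead of through the Column Info dialog. In this sketch, the term names are hypothetical examples, listed in reverse so the first model term ends up at the top of the Y-axis.

```jsl
// Sketch: setting Value Ordering on Term via JSL instead of the
// Column Info dialog. Term names here are hypothetical examples.
dt = Current Data Table();
Column( dt, "Term" ) << Set Property(
	"Value Ordering",
	{"X1*X2", "X2", "X1", "Intercept"}
);
```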

What if you didn’t care that the order matched up the parameter estimates table? For instance, maybe you prefer to have the terms sorted by the actual estimate. This is even easier – simply select Estimate and drag it to the right of the Y-axis.


Getting the Reference Line at 0

Adding the reference line is easy enough:

  • Double-click on the X-axis where the values are (or right-click and choose Axis Settings).
  • Under Reference Lines, select a color, and click the Add button.
  • Click OK.
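A scripted equivalent sends an Add Ref Line message to the axis. This sketch assumes `gb` holds a reference to the open Graph Builder object; which Axis Box corresponds to the X axis can vary, so check the report tree before relying on the index.

```jsl
// Sketch: adding the reference line at 0 by script. Assumes gb is a
// reference to the Graph Builder object; verify which Axis Box is the
// X axis in your own report.
Report( gb )[Axis Box( 1 )] << Add Ref Line( 0, "Solid", "Blue" );
```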

Final Thoughts

You can even add points at the center of the intervals (i.e., the estimates) by selecting the Estimate column and dragging it to the X-axis. The axes and title can now be adjusted with a simple double-click. There are plenty of options in Graph Builder to customize the graph to your liking. Hopefully, you found this blog post useful!


Using JSL to import BodyMedia® FIT® food log files into JMP

In my previous blog post, I described how I imported my BodyMedia® Activity Summary data files from Excel into JMP. Today, I will share how I automated importing nearly four years of my BodyMedia® food logs into JMP. I have also uploaded an add-in to the JMP File Exchange that you can use to import your own Activity Summary and Food Log files, as well as CSV-formatted food log files from the popular and free MyFitnessPal food logging website.

For my poster project (see details below), I created JSL scripts to combine, parse and format nearly 50 Activity Summary Excel files and 50 food log files exported from BodyMedia® software covering the time period from 12/20/10 to 7/28/14. I exported my food log files from the BodyMedia® web-based Activity Monitor software in PDF format.

When I first started working with my food log files, I realized that I would not be able to process them directly since JMP does not support import from PDF format. The experience that followed gave me a much better understanding of why that is the case. I have heard customers suggest that JMP should add support for PDF import, and JMP developers respond that tables in PDFs are intended for presentation purposes and are often not organized in a regular format like other file types that JMP supports. I experienced this firsthand with my own files!

Initially, I thought I could convert my PDF food log files to Excel format and import them using a similar approach to the one I used to import my activity summary files. While I tried a variety of software products promising easy PDF-to-Excel conversion, I can only characterize the whole experience as an epic fail. I found that the structure of the converted tables was highly irregular from page to page. After spending an hour manually editing a single Excel file into a more regular format, I gave up. I could not imagine spending a minute more on this incredibly boring task, and I had 40 files to process at that point! I was not willing to wait till Discovery Summit 2017 to get my data imported, so I decided to look for an easier and faster route.

I discovered that I could save my PDF files as text files using Adobe Acrobat. Unfortunately, the files contained space-delimited item names of varying length, followed by space-delimited numbers representing calories, serving size and macronutrient amounts. The point-and-click text import options in JMP require fixed length or delimiters between fields, so neither option produced an analysis-ready data table from my files. I decided instead to import and combine my files using a loop similar in structure to the one I wrote to import my Activity Summary Excel files. (If you read my earlier posts, you know that I borrowed heavily from a text import JSL example from the SESUG paper written by Michael Hecht.)

Once my files were imported and concatenated, I could parse out individual pieces of information from each line of text using regular expressions in JSL. I had some trepidation about this path at first. Although I had used Perl for regular expressions in graduate school, my regex skills were rusty, and I had not used regexes in JSL before. I visited JMP developer Craige Hales, who implemented the regular expression engine in JMP, and he provided plenty of encouragement and pointers to helpful examples.

To format my table for further analysis, I needed to loop over all the rows in the table and match patterns that corresponded to dates, meal names, calorie content and grams of various macronutrient types. I used loops like the following to parse out the data I needed and place each piece of information into its own column. Below, I checked for the meal name and captured it from where it appeared -- on the first line for each day. I then filled the cells of the Meal column with that meal name value until I encountered a new match, indicating the start of items for the next meal.

//Capture Meal
For( c = 1, c <= y, c++,
	string = Combinedlog:Lifestyle and Calorie Management System[c];
	//Check for a match to a meal name at the start of the line
	w = Regex Match(
		string,
		Pat Regex( "^(Breakfast|AM Snack|Lunch|PM Snack|Dinner|Late Snack) (.*$)" )
	);
	//No match: carry the current meal name forward to this row
	If( N Items( w ) == 0,
		Combinedlog:Meal[c] = meal;
		Continue();
	);
	//Match: w[2] holds the meal name, w[3] the rest of the line
	Combinedlog:Lifestyle and Calorie Management System[c] = w[3];
	Combinedlog:Meal[c] = w[2];
	meal = w[2];
);

I extracted all the information I needed using the general regex matching strategy shown above. After running the final version of my script, I had a nicely formatted table with 34,432 rows of food item names with associated calorie and macronutrient information organized by date and meal.

If you have BodyMedia food log files saved in PDF format, you can save them to text files in Adobe Acrobat and download my add-in from the JMP File Exchange to perform a point-and-click import of your own food log files into JMP. The tools in this add-in support Activity Summary files, food log files saved as text and food log text files exported from MyFitnessPal. Special thanks to JMP testing manager Audrey Shull and tester Melanie Drake for scripting suggestions and add-in testing help!

Unfortunately, the sense of victory I felt from my successful import was short-lived. A quick summary of my table revealed that it contained more than 1,800 unique food item names! I began to realize that my work with this data table was just beginning. My food log was filled with naming inconsistencies since I had selected a variety of differently named but highly similar items from the food item database over the years. I was going to have to clean up my food item names and group them in a more meaningful way if I wanted to begin to understand the patterns in my food log data. Stay tuned for part four of this blog series to learn more about my adventures in food item name recoding.

Check out this recent blog post to learn more about how my interest in quantified self (QS) analysis projects led me to create a Discovery Summit 2014 e-poster titled “Analysis of Personal Diet and Fitness Data With JMP.” If you were not able to see my e-poster at the conference in Cary, you can sign in and see a PDF version in the Discovery Summit 2014 section of the JMP User Community.


Using JSL to import BodyMedia® FIT® activity monitor data into JMP

In an earlier blog post, I introduced the topic of my JMP Discovery Summit 2014 e-poster titled “Analysis of Personal Diet and Fitness Data With JMP” and shared my interests in quantified self (QS) analysis projects. For my poster project, I exported two different types of data files from the web-based Activity Manager software by BodyMedia® and wrote JMP Scripting Language (JSL) scripts to import, format and clean my data for further analysis and visualization in JMP. I hope you were able to join me to hear more at the conference in Cary, but if not, you can sign in to the JMP User Community and check out a PDF of my e-poster in the Discovery Summit 2014 section. (Membership in the User Community is free and is a great way to learn from JMP users around the world!)

Today, I’ll share how I used JSL to import and combine a set of multi-worksheet Excel files containing activity and calorie summary information. The Activity Manager software also exports PDF Food Log files, which I’ll cover in my next blog post. By the way, I have uploaded an add-in to the JMP File Exchange that you can use to import your own BodyMedia® Activity Summary and Food Log files into JMP. It also includes a bonus add-in that imports CSV files that you can download from the popular MyFitnessPal food logging website using a Chrome extension.

In early August, I exported nearly 50 Activity Summary and 50 Food Log files covering the time period from 12/20/10 to 7/28/14. Activity Summary data is saved in Excel workbooks that contain six different worksheets, but I was interested in importing data from just the first five: Summary, Activity, Meals, Sleep and Weight. You can see the details of the contents of the other worksheets in my e-poster.

(Figure: Excel workbook showing the Activity Summary worksheets)

Most of my Activity Summary Excel files contained four weeks of data, although some covered fewer weeks. All files had column headers on line 5 and most had 28 lines of data on the first four tabs, though the number of rows on the fifth tab (Weight) varied. Before 2013, I entered weight measurements manually into the Activity Manager. In January 2013, I began to use a Withings Smart Body Analyzer scale, which uploads weight measurements wirelessly into its own web-based software and shares them automatically via a third-party integration with the Activity Manager.

I decided to script the import process after interactively importing my data files twice, about a year apart. Interactive import was more time-consuming the second time because I had even more files to work with. Since I accumulate new data every day, my number of files will always grow over time. After the second import, I formatted my columns and added formulas to the combined data table, only to compare it to the table from my first import and realize that I had forgotten several important steps.

While I backtracked to complete the steps I'd missed, I began to consider how scripting could make reprocessing all my files much simpler and faster. If the import process was easier, I would look at new data much more often. Fortunately, I found that even a novice scripter like me could write the JSL to import and combine my files.

I started with Source scripts (automatically added to JMP data tables during interactive import) and JSL examples from online documentation and discussion forums. I looped over all my files using a strategy patterned after a text file import example in a SESUG paper written by JMP developer Michael Hecht. As the longtime Mac expert at JMP, Michael is well-known for his high-quality and elegant code in all languages, so why not borrow from a master?
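The file-looping strategy can be sketched in Python as follows. This assumes a folder of exported files saved as plain text with the layout described above (column headers on line 5); the post's actual files are Excel workbooks opened with JSL, so the function name and text format here are illustrative:

```python
from pathlib import Path

def combine_logs(folder):
    """Loop over exported log files in a folder and stack their data rows."""
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        lines = path.read_text().splitlines()
        # Skip the metadata lines and the column-header row on line 5;
        # data starts on line 6.
        rows.extend(lines[5:])
    return rows
```

Because the script revisits every file each time it runs, newly exported files are picked up automatically as the collection grows.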

I ran into some snags while working to combine my data into a single unified table. One that I had to handle prior to import was that the Date column on the Sleep tab contained an overnight range rather than a single day like all the other worksheets did, preventing me from merging it directly without a preprocessing step. For example, a value of 12/20/2012-12/21/2012 indicated the night of sleep that started on 12/20/12 and ended the morning of 12/21/12. I parsed out the second day’s date using the JMP Formula Editor and the JMP Word character pattern function.

(Figure: JMP Formula Editor parsing the Date column)

After creating the formula interactively, I added code to my script to make these steps easily repeatable with help from the JSL Get Script command. Running the command MyColName << Get Script; on the Date column from the Sleep tab printed a key snippet of JSL to JMP’s log, which I added to my script to automate this step. In my final table, the hr:m value displayed in the Sleep column represented how much I had slept before awakening that morning.
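The date-range parsing step itself is small. In Python it might look like the sketch below, which splits the overnight range on the hyphen and keeps the second date, analogous to what the Word character pattern function does in the JMP formula:

```python
from datetime import datetime, date

def wake_date(date_range):
    """Return the second date of an overnight range like
    '12/20/2012-12/21/2012' -- the morning the sleep period ended."""
    second = date_range.split("-")[-1]
    return datetime.strptime(second, "%m/%d/%Y").date()
```

With every Sleep row reduced to a single wake-up date, the tab can be merged with the other worksheets on a common Date key.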

There was extra information in my input files that I didn’t want included in my final JMP table. Calorie totals, averages and targets were summarized at the bottom of all of my worksheets (lines 36-39 in the picture of the table above). I added steps to my script to filter out these summaries, the lines of information above the column headers and also blank rows. In total, the version of my activity data table that I used for my Discovery Summit poster contained 1,316 rows of daily data on calories eaten and burned, activity measures, sleep and weight measurements.
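The filtering step can be sketched like this. The summary labels here are hypothetical placeholders; the actual labels in the BodyMedia export may differ:

```python
# Hypothetical labels marking the summary lines at the bottom of each
# worksheet, plus "" for blank rows.
SUMMARY_LABELS = {"Total", "Average", "Target", ""}

def keep_daily_rows(rows):
    """Keep only rows whose first field is a real date label,
    dropping summary lines and blanks."""
    return [row for row in rows if row[0] not in SUMMARY_LABELS]
```

Filtering by label rather than by fixed line numbers keeps the script robust when a worksheet has fewer than the usual 28 data rows.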

My data table wasn’t quite analysis-ready yet. The numeric columns containing durations and percentages were not formatted correctly upon import. I added formats, missing value codes and modeling types as column properties interactively. I used the Transform column feature on my Date column to quickly add new Date Time variables like Year, Month, Week and Day to my table, and then extended my script to automate those steps. I also added a new formula column to the table (Calories Burned-Calories Consumed) to represent my caloric deficit/excess for a given day.
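The derived columns can be sketched in Python as a per-row transformation. The field names here are hypothetical stand-ins for the table's columns:

```python
from datetime import date

def derive(row):
    """Add calendar parts pulled from Date, plus the caloric
    deficit/excess formula (Calories Burned - Calories Consumed)."""
    d = row["Date"]
    return {
        **row,
        "Year": d.year,
        "Month": d.month,
        "Week": d.isocalendar()[1],
        "Deficit": row["Calories Burned"] - row["Calories Consumed"],
    }
```

Grouping by the new Year, Month or Week columns then makes it easy to roll daily values up into longer-term trends.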

If you have BodyMedia® Activity Summary files saved in Excel format, you can download my add-in from the File Exchange to perform a point-and-click import of your own files into JMP. This add-in supports the Activity Summary file type I just described, BodyMedia® food log files saved as text, and also food log text files that you can export from the popular (and free) MyFitnessPal with the help of a Chrome extension.

Special thanks to JMP testing manager Audrey Shull and technical writer Melanie Drake for scripting suggestions and add-in testing help! Stay tuned for my next blog post, where I’ll describe how I automated the import of my BodyMedia® food log files using JSL.


Webcasts show how to build better statistical models

We have two upcoming webcasts on Building Better Models presented at times convenient for a UK audience:

These webcasts will help you understand techniques for predictive modelling. Today’s data-driven organisations find that they need a range of modelling techniques, such as bootstrap forest (a random-forest technique) and partial least squares (PLS), both of which are particularly suitable for variable reduction for numeric data with many correlated variables. For example, some organisations deal with a multitude of potential predictors of a response, sometimes numbering into the thousands. Bootstrap forest and PLS can help analysts separate the signal from the noise, and find the handful of important variables.

Other organisations deal with the problem of customer segmentation. They may need to employ techniques including cluster analysis, decision trees and principal component analysis (PCA). Decision trees are particularly good for variable selection. Using a variety of modelling techniques can result in a different selection of variables, which can provide useful insight into the hidden drivers of behaviour.

Consumer data is notoriously messy, with missing values, outliers and in some cases variables that are correlated. Missing values can be a real problem because the common regression techniques exclude incomplete rows when building the models. This "missingness" itself can be meaningful, so using informative missing techniques to understand its importance can help you create better models. Some techniques, such as bootstrap forest and generalised regression, handle messy data seamlessly.

A critical step in building better models is to use holdback techniques to build models that give good predictions for new data, as well as describe the data used to build the models. Holding back data to validate models helps to keep the model honest by avoiding overfitting and creating a more accurate model.
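The idea of holding back data can be illustrated with a minimal sketch: randomly reserve a fraction of the rows for validation so the fitted model is judged on data it never saw. This is a generic illustration of the concept, not how JMP implements validation columns:

```python
import random

def holdback_split(rows, fraction=0.25, seed=1):
    """Split rows into (training, validation) sets,
    holding back the given fraction for validation."""
    rng = random.Random(seed)   # fixed seed keeps the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - fraction)))
    return shuffled[:cut], shuffled[cut:]
```

A model that fits the training set well but predicts the validation set poorly is overfit, which is exactly what the holdback guards against.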

Analysts face a major hurdle in explaining their models to executives in a way that enables them to do "what if" or scenario analysis, thereby exploring decisions before committing to them. A powerful way to do this is by dynamically profiling the models. Once companies have selected the best model, they often want to deploy the models to score existing and new data so that different departments can take appropriate actions.

I hope you can join us for one of these live presentations where we will demonstrate how to use these predictive modelling techniques using case studies.


Getting started with risk-based monitoring

Our own Richard Zink has written extensively about the risk-based monitoring (RBM) capabilities in JMP Clinical, both on this blog and, of course, in his book.

Risk-based monitoring diagram

A risk-based monitoring process feeds data from study sites into a dashboard, which then alerts the sponsor to situations that need further investigation.

As a complement to the wealth of hands-on information that Richard has created, which primarily covers the mechanics of RBM with JMP software, we decided to publish a brief article on getting started with risk-based monitoring.

The article covers several aspects of RBM, including:

  • A basic definition.
  • Why a risk-based approach is better (hint: it's a lot cheaper).
  • Details about how the monitoring process might work in this new model.
  • An overview of the risk dashboard, a key piece of an RBM platform.
  • How to approach the transition to RBM and dealing with organizational change.

If nothing else, we hope the article facilitates some conversations in your organization as you transition to the new world of risk-based trial monitoring.

Read the guide to getting started with risk-based monitoring.
