In my previous blog post, I described how I imported my BodyMedia® Activity Summary data files from Excel into JMP. Today, I will share how I automated importing nearly four years of my BodyMedia® food logs into JMP. I have also uploaded an add-in to the JMP File Exchange that you can use to import your own Activity Summary and Food Log files, as well as CSV-formatted food log files from the popular and free MyFitnessPal food logging website.
For my poster project, I created JSL scripts to combine, parse and format nearly 50 Activity Summary Excel files and 50 food log files exported from BodyMedia® software covering the time period from 12/20/10 to 7/28/14. I exported my food log files from the BodyMedia® web-based Activity Monitor software in PDF format.
When I first started working with my food log files, I realized that I would not be able to process them directly since JMP does not support import from PDF format. The experience that followed gave me a much better understanding of why that is the case. I have heard customers suggest that JMP should add support for PDF import, and JMP developers respond that tables in PDFs are intended for presentation purposes and are often not organized in a regular format like other file types that JMP supports. I experienced this firsthand with my own files!
Initially, I thought I could convert my PDF food log files to Excel format and import them using a similar approach to the one I used to import my activity summary files. While I tried a variety of software products promising easy PDF-to-Excel conversion, I can only characterize the whole experience as an epic fail. I found that the structure of the converted tables was highly irregular from page to page. After spending an hour manually editing a single Excel file into a more regular format, I gave up. I could not imagine spending a minute more on this incredibly boring task, and I had 40 files to process at that point! I was not willing to wait till Discovery Summit 2017 to get my data imported, so I decided to look for an easier and faster route.
I discovered that I could save my PDF files as text files using Adobe Acrobat. Unfortunately, the files contained space-delimited item names of varying length, followed by space-delimited numbers representing calories, serving size and macronutrient amounts. The point-and-click text import options in JMP require fixed length or delimiters between fields, so neither option produced an analysis-ready data table from my files. I decided instead to import and combine my files using a loop similar in structure to the one I wrote to import my Activity Summary Excel files. (If you read my earlier posts, you know that I borrowed heavily from a text import JSL example from the SESUG paper written by Michael Hecht.
Once my files were imported and concatenated, I could parse out individual pieces of information from each line of text using regular expressions in JSL. I had some trepidation about this path at first. Although had used Perl for regular expressions in graduate school, my regex skills were rusty, and I had not used regexes in JSL before. I visited JMP developer Craige Hales, who implemented the regular expression engine in JMP, and he provided plenty of encouragement and pointers to helpful examples.
To format my table for further analysis, I needed to loop over all the rows in the table and match patterns that corresponded to dates, meal names, calorie content and grams of various macronutrient types. I used loops like the following to parse out the data I needed and place each piece of information into its own column. Below, I checked for the meal name and captured it from where it appeared -- on the first line for each day. I then filled the cells of the Meal column with that meal name value until I encountered a new match, indicating the start of items for the next meal.
//Capture Meal for (c=1, c <= y, c++, string = Combinedlog:Lifestyle and Calorie Management System[c]; //Check for match to Meal name w = ( Regex Match ( string, Pat Regex("^(Breakfast|AM Snack|Lunch|PM Snack|Dinner|Late Snack) (.*$)"))); if (N Items(w) == 0, Combinedlog:Meal[c] = meal; continue ()); Combinedlog:Lifestyle and Calorie Management System[c] = w; Combinedlog:Meal[c] = w; meal = w; );
I extracted all the information I needed from using the general regex matching strategy shown above. After running the final version of my script, I had a nicely formatted table with 34,432 rows of food item names with associated calorie and macronutrient information organized by date and meal.
If you have BodyMedia food log files saved in PDF format, you can save them to text files in Adobe Acrobat and download my add-in from the JMP File Exchange to perform a point-and-click import of your own food log files into JMP. The tools in this add-in support Activity Summary files, food log files saved as text and food log text files exported from MyFitnessPal. Special thanks to JMP testing manager Audrey Shull and tester Melanie Drake for scripting suggestions and add-in testing help!
Unfortunately, the sense of victory I felt from my successful import was short-lived. A quick summary of my table revealed that it contained >1,800 unique food item names! I began to realize that my work with this data table was just beginning. My food log was filled with naming inconsistencies since I had selected a variety of differently named but highly similar items from the food item database over the years. I was going to have to clean up my food item names and group them in a more meaningful way if I wanted to begin to understand the patterns in my food log data. Stay tuned for part four of this blog series to learn more about my adventures in food item name recoding.
Check out this recent blog post to learn more about how my interest in quantified self (QS) analysis projects led me to create a Discovery Summit 2014 e-poster titled “Analysis of Personal Diet and Fitness Data With JMP.” If you were not able to see my e-poster at the conference in Cary, you can sign in and check out a PDF version in the Discovery Summit 2014 section of the JMP User Community.