A rising sophomore in college, I am nearing the end of my summer internship with the JMP marketing team. While I’ve spent previous summers doing more technical work, I was interested in learning the ways that technical knowledge could help to solve business problems. I got the chance to complete
Tag: Recode
Data preparation before modeling is an unavoidable chore. One of the most time-consuming tasks can be cleaning up categorical data that may have misspellings, inconsistent capitalization and abbreviations, and the like. The Recode tool in JMP makes data prep a lot easier. Watch this video by my colleague Ryan DeWitt
If you read my last post, then you know that I’m giving myself the gift of data this holiday season! For me, collecting data on my diet and fitness habits is a gift that just keeps on giving. Although I may not look at all my data sets on a
With Halloween right around the corner, it's time to decide what costume to wear. The National Retail Federation conducted a survey to find out the most popular costumes this year, and I thought it would be fun to explore and visualize the results of that survey. The survey asked three questions:
Recently, my colleague Ryan Lekivetz wrote about our trip to Discovery Summit Europe in Brussels and our plan to test whether Belgian chocolate was really better-tasting than US chocolate. Ryan has blogged in detail about the constraints of designing the study, as well as the factors involved. In this blog
I recently used a JMP add-in I wrote to import my complete set of BodyMedia FIT food log data files, including data from Dec. 21, 2010, through the last day I logged my meals in that software on March 29, 2015. My final data table contained 39,942 rows of food item names.
In my previous blog post, I shared how I created a table of workout information in JMP and summarized my workout patterns in 2014. To drill down into more detailed summaries of my data at the exercise level, I first had to clean up my data table with the JMP 12 Recode
Data entered manually is usually not clean and consistent. Even when data is entered through multiple-choice fields rather than text-entry fields, it might need additional work when it is combined with data from sources that do not use the same categories. Sometimes the same categories are spelled differently, abbreviated
In an earlier blog post, I shared that I used the JMP 12 version of the Recode platform to clean up food item names in a data table containing nearly four years of food log information. I was able to halve the number of unique food item names that appeared in my ~35,000-row table, reducing the
Text data cleaning is an unglamorous but important step in statistics and analytics. Manually entered data is full of misspellings, typographical errors and inconsistencies. Even machine-generated data can cause problems if two data sources disagree on formatting. Errors must be fixed before analyzing data, because the tiniest difference makes two