Coming to Discovery Summit? Get the mobile app

Discovery Summit 2014 starts on Monday, and there's lots going on and much to know about the conference. You can get all that info as well as interactive features in a free app for iOS and Android. For a quick overview of what the app offers, watch this video:

 

With the mobile app, you can get the latest agenda, messages from the conference planner, speaker information and a map of the conference venue. You can also:

  • Build your own agenda.
  • Find sessions you are interested in, based on level and topic.
  • Learn about JMP developers and their expertise. You can identify the right person to talk to during Meet the Developers sessions.
  • Rate and comment on sessions that you attend. We really want everyone to do this.
  • Take notes on attendees and developers you meet and sessions you attend -- and then email your notes to yourself.
  • Find and message other attendees.
  • Create a public profile of yourself for other attendees. Click the My Account section in the app menu to add information, including a photo.
  • Earn badges by checking into sessions, and you may win a conference prize. Check the Info section in the app for Badge Game Rules.

The Discovery Summit 2014 mobile app is available for both iOS and Android.

How to get started with the app
The app is password-protected so that only registered attendees of the conference can use it. The first time you use the app, you will need to establish a password:

  1. Launch the app and click the Login button.
  2. Enter the email address you used to register for the conference and click the "Email Password" button.
  3. You will receive an email with a link that enables you to set a password. Click the link and set a password.
  4. Return to the app and enter your email and password.

You will only have to do this once!

See you soon!

Post a Comment

Digging into my diet and fitness data

If you’re a regular reader of the JMP Blog, then you already know that those of us who work for JMP have taken a page from the Hair Club for Men. From our hobbies to internal activities, the people who work at JMP are also JMP users! I seriously considered using that classic line from Hair Club for Men commercials while preparing my poster titled “Analysis of Personal Diet and Fitness Data With JMP” for the upcoming Discovery Summit 2014 conference in Cary, NC.

Over my next few blog posts, I will be sharing some of what I have learned while preparing my Discovery Summit poster along with a few reflections on what I have learned during the first year of what I like to call “the PhD of me.”

My interest in self-tracking grew from a long struggle with my own weight and emotional eating habits. As an overweight middle school student, I discovered that tracking my meals and strength training workouts in a notebook helped me reach a healthy weight. Unfortunately, during stressful periods of my life, I often returned to food for stress relief. I gained 30 pounds during college and lost it in graduate school, only to gain 60 pounds when my first pregnancy coincided with the end of my degree program. I have experienced several large weight swings since that time, yet I reached my goal of being in a healthy weight range by the time I entered my second pregnancy in early 2011. Within the first six months after my son's birth, I lost all the weight I had gained, and tracking my diet and workouts was an integral part of the process. I have maintained my weight within a much smaller range since spring 2012. While maintaining, I find that continuing to track my eating habits and activity level helps me stay mindful and avoid the patterns that caused me trouble in the past.

Wouldn’t it have been better just to share a graph instead of writing the paragraph above? As you may already know, JMP 12 offers me new tools to do exactly that!

Weight Graph Grad School to Present 9-9-14

Over the years, I have used a variety of heart rate monitors, pedometers, and pulse and blood pressure monitors, though having to take and collate manual notes on those measures has been a hurdle to getting more of my data into an analysis-ready format. I have been thrilled with the evolution of activity monitors and smartphone apps capable of collecting data passively without the need for extensive note-taking. In fact, the rise of activity monitors has fueled a whole movement called the Quantified Self (QS) that includes people like me who track activities ranging from diet and fitness information to Internet use, sleep, stress levels or other measures.

It may sound a bit weird or obsessive to people who don’t track information about themselves, but many QS fans find daily data collection incredibly useful in identifying and optimizing their dietary habits and daily routines. QS data can even be useful in health-related pursuits. Some have used it to successfully pinpoint mysterious food or environmental allergy triggers. This past spring, my dad sent me a link to Gary Wolf’s 2010 TED Talk, and his description of the QS movement sounded immediately familiar to me. I identify with the QS movement more than ever after nearly four years of using a BodyMedia® FIT armband activity tracker and its food-logging software.

Like many users of such devices, I depend on the daily dashboards and the weekly and monthly reports provided by the monitor’s web- and app-based software to see short-term trends. I never seriously considered getting my data out of the tracking software and into JMP until I had accumulated years’ worth of food logging and activity data. Unfortunately, the longest time frame I could specify when exporting my activity data or food log information was 28 days. After interactively importing just two years’ worth of those multi-worksheet Excel export files, I concluded that I would need to automate the process through scripting.

As it is for so many of our customers, JMP was the perfect tool to help me move beyond standard reports to truly exploring my data. If I had realized how much I would learn from tackling this seemingly unrelated analysis project, I would have started it sooner! I mapped out the steps I needed to take to import my activity summary data from each of the multiple worksheets and collected snippets of JSL code, including a very helpful loop example from a SESUG paper written by JMP Mac developer Michael Hecht. Soon, I was able to merge, clean and format my combined data table.
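For readers who don't write JSL, the stack-and-append pattern behind that import loop looks much the same in any language. Here is a minimal Python sketch of the idea, assuming the 28-day exports were saved as CSV files sharing a header row; the field names below are stand-ins, not the actual BodyMedia export columns:

```python
import csv
import io

def stack_exports(files):
    """Concatenate 28-day export files that share a header row."""
    rows = []
    for f in files:
        rows.extend(csv.DictReader(f))
    rows.sort(key=lambda r: r["Date"])  # put the combined log in chronological order
    return rows

# Simulate two monthly export files; "Calories Burned" is a stand-in field name.
jan = io.StringIO("Date,Calories Burned\n2013-01-01,2100\n2013-01-02,2250\n")
feb = io.StringIO("Date,Calories Burned\n2013-02-01,2300\n")
combined = stack_exports([jan, feb])
print(len(combined))  # 3
```

The JSL version does the same loop-and-concatenate shape across Excel worksheets instead of CSV files.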

The scripting experience I gained from successfully tackling the Excel file import helped me with my next challenge: importing my food log files. While I hoped to use a PDF-to-Excel conversion program, I found the structure of the PDF tables in the BodyMedia® files was not regular enough to convert cleanly to Excel.

I converted my food log files to text instead and imported them into JMP. With advice from JMP developer Craige Hales, I parsed out the information I needed using the JSL-based regular expression engine in JMP. When I got stuck, I depended heavily on online scripting resources and helpful suggestions from resident JSL experts Melanie Drake, Rosemary Lucas and Audrey Ventura. Upon completing the project, I decided to submit an abstract for Discovery Summit covering the import, processing and visualization of my data.
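The BodyMedia log format isn't shown here, so purely as an illustration, this is what that kind of regular-expression parsing looks like in Python on a made-up log layout (the real files would need a different pattern):

```python
import re

# Hypothetical food-log lines; the actual BodyMedia export format differs.
log = """\
Breakfast  Oatmeal with raisins  320 cal
Lunch  Turkey sandwich  450 cal
Dinner  Grilled salmon  510 cal
"""

# Capture the meal, the item description, and the calorie count from each line.
pattern = re.compile(r"^(\w+)\s{2,}(.+?)\s{2,}(\d+) cal$", re.MULTILINE)

entries = [
    {"meal": m.group(1), "item": m.group(2), "calories": int(m.group(3))}
    for m in pattern.finditer(log)
]
total = sum(e["calories"] for e in entries)
print(total)  # 1280
```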

My Discovery Summit poster shares more details about how I imported and prepared nearly four years of two different types of data collected with my activity monitor armband and its web- and app-based food logging software. I hope you will join me at the poster session on Wednesday, Sept. 17, to learn more about how I’ve used my own data to better understand the patterns in my weight loss and maintenance efforts.

If you stop by my poster, you’ll also get a sneak peek at some of the new features coming in JMP 12 because I used many of them! If you are a member of the JMP User Community, you can see a PDF version of my poster on the JMP Discovery Summit 2014 community. Those of you who know me are probably not surprised that this blog post was edited for length. You can read a longer version of this post on my JMP User Community blog here. (Psst. It’s free and simple to become a member of the JMP User Community.)

Upcoming blog posts will share more about how I got my fitness and diet data into JMP and worked with JMP visualization expert and developer Xan Gregg to optimize visualizations that appear on my Discovery Summit poster.

Post a Comment

5 intriguing things I learned from 'Contagious'

Author and professor Jonah Berger

As one who works in communication, I was keenly interested to read Contagious: Why Things Catch On by Jonah Berger, a Wharton School of Business marketing professor.

I enjoyed many of the stories in the book about why ideas, products and information spread, but there were a handful of examples that particularly intrigued me:

  1. The physical building in which you cast your vote in political elections can affect your vote. So if you vote in a public school building, as I used to do, you would be more likely to vote in favor of raising taxes to support public schools. Seeing classrooms, hall lockers and children's artwork as you walk to the voting booth has an effect on your vote. I now vote in a neighborhood church; I wonder how that will influence my voting.
  2. So many Vietnamese immigrants became manicurists simply because of word of mouth. It all began with Vietnamese refugees who had escaped the fall of Saigon and were living in a tent city outside Sacramento, California, in 1975. The actress Tippi Hedren visited the refugees in the camp frequently. Some of the Vietnamese women, who had had impressive careers of their own in Vietnam, admired Hedren's manicured nails. So the actress brought her manicurist to the camp to teach these women how to do nails. These 20 women subsequently enjoyed success with their own nail salons and spread the word to new immigrants. Now 80 percent of manicurists in California and 40 percent nationwide are Vietnamese-American.
  3. People bought candy bars because they heard about a NASA mission to Mars. News reports of the Pathfinder mission in 1997 kept the planet Mars at the top of Americans' minds, and they bought more Mars candy bars as a result. It sounds bizarre since there was no connection between the stories about Pathfinder and candy. But it's an interesting point: People may respond to cues about products, whether intended or not, whether in context or not.
  4. Sporting a mustache in November raises money and awareness of a cause. When a guy you know goes from being clean-shaven to growing a "mo," it's visible and obvious, and you've got to ask about it. When you do ask, you find out he's raising awareness of prostate cancer as well as raising money for the Prostate Cancer Foundation. The ALS Ice Bucket Challenge works in the same way: Each video posted online makes support for the ALS cause visible. It's a public declaration, and it starts (or continues) conversations.
  5. People will happily spend $100 on a Philly cheesesteak sandwich. I found this difficult to believe partly because I have little appreciation for beef, having grown up in a vegetarian household. Of course, the cheesesteak from Barclay Prime in Philadelphia is no ordinary cheesesteak sandwich. It's seriously gourmet, with sophisticated ingredients like Kobe beef, heirloom tomatoes, black truffles and lobster tail. That it costs $100 makes it newsworthy.

Jonah Berger will be a keynote speaker at Discovery Summit 2014 in a couple of weeks. I expect he will be telling these sorts of stories in his speech, and I hope you'll be in the audience along with me and hundreds of other JMP users. If you are joining us, you'll receive a signed copy of Berger's book.

Post a Comment

Exploring data on the best pizza in the US

I am of Italian descent from the greater New York City area, so it should be no surprise that I love pizza. My interest was piqued when my niece Samantha recently posted a ranking of the “101 Best Pizzas in America,” according to the Daily Meal® website, which had 78 food experts vote on pizzas from 700 pizza shops. The list contains the name of each restaurant, its signature pizza, and its city and state.

I decided that this data was worthy of bringing into JMP and exploring further (don’t judge me). So I imported the list, looked up every address, and ran the JMP Geocoder Add-In to ascertain the longitude and latitude of each restaurant. (You can download the add-in with a free JMP User Community account.) I then used Recode to create a new column for US regions, and started to investigate the data.
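A Recode step like the one that built the region column boils down to a lookup table from state to region. Here is a minimal Python sketch of the same idea; the shop list and any region assignments beyond what this post mentions are illustrative, not the post's actual data:

```python
# State-to-region lookup; the post groups California with the Southwest.
REGION = {
    "NY": "Northeast", "CT": "Northeast", "NJ": "Northeast",
    "IL": "Midwest", "CA": "Southwest", "AZ": "Southwest",
}

# Illustrative rows; only Frank Pepe appears in the post itself.
shops = [
    {"name": "Frank Pepe", "state": "CT"},
    {"name": "Di Fara", "state": "NY"},
    {"name": "Pizzeria Bianco", "state": "AZ"},
]

# The Recode: derive a new "region" column from the existing "state" column.
for shop in shops:
    shop["region"] = REGION.get(shop["state"], "Other")

print([s["region"] for s in shops])  # ['Northeast', 'Northeast', 'Southwest']
```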

I first decided that Tabulate would be a good place to start to see which region has most of the winners (you know where this is going, don’t you?). I clicked on the box for Order by count of grouping columns to sort the data in descending order.

Let’s take a look:

JMP_tabulate_pizza

And, as I expected, the Northeast had the most.  But I didn’t expect that it would contain more than half of the total. My next surprise was that the Southwest narrowly beat out the Midwest for second place. (Oh, I can see all the nasty emails from my Chicago in-laws and friends pouring in.)

Then I drilled down by state:

JMP_tabulate_pizza2

Again, no surprise that New York is numero uno, but it was nice to see that Connecticut (CT) trounced New Jersey (NJ) for the number two spot. (More emails, sigh, but I concede that Jersey is cooler because they have The Boss.)

California was the big winner in the Southwest – more on that later. And of course Chicago dominates the Midwest.

Still, there was more to learn by looking at the data plotted on a map. So here it is using Open Street maps:

JMP_bubble_plot_pizza

First I looked at California (CA) to see where its winners were located:

JMP_bubble_plot_pizza2

Six of the 10 California places are in San Francisco, with two more in Oakland and Berkeley. So, good for you, folks in the Bay Area!

But ultimately, I was most interested in the Northeast:

JMP_bubble_plot_pizza3

And what did I learn? The biggest concentration of the best pizza in the US is Brooklyn, New York. From my hometown of Bridgeport, Connecticut, all 53 of the places in this area are within a 150-mile radius. But most importantly to me, the No. 1 spot goes not to a pizzeria in New York City, but to Frank Pepe of New Haven, Connecticut, just 20 miles from where I grew up.
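With latitudes and longitudes from the Geocoder add-in in hand, straight-line distances like that 150-mile radius can be checked with the haversine formula. A sketch with approximate coordinates for Bridgeport and New Haven; note that the straight-line mileage comes out somewhat under the roughly 20 road miles:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(a))  # 3958.8 = mean Earth radius in miles

# Approximate coordinates; the Geocoder add-in would supply exact ones.
bridgeport = (41.179, -73.189)
new_haven = (41.308, -72.928)
d = haversine_miles(*bridgeport, *new_haven)
print(round(d, 1))
```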

JMP_bubble_plot_pizza4

I’ll definitely need to have a slice when I’m visiting family.  But hey, since they are all within driving distance from Bridgeport, maybe I should hit them all.

Post a Comment

Why design of experiments keeps the science in science

Custom Design dialog box in JMP

Design of experiments complemented my work as a chemist.

In a recent discussion in the LinkedIn DOE group, I learned that some scientists resist the use of design of experiments because they believe that using DOE would be "taking the science out of science.” I believe this type of resistance comes from the minds of those who fear change. The unfortunate effect of this fear is that scientists end up chasing noise and experimenting less efficiently than they could by leveraging the synergy of statistics and science.

A George Box quote that I love is "Discovering the unexpected is more important than confirming the known." As a chemist at Kodak for 28 years, I was involved in hundreds of designed experiments in the chemistry space.

The patents that we obtained during those years were due to finding the exceptions to the rules rather than confirming the known. The non-intuitive findings and efficiencies gained using DOE gave us a significant competitive advantage. We always had the threat of outsourcing chemical manufacturing hanging over our heads, and we found out that rarely could anyone elsewhere compete with the quality and cost of our chemicals. (Stay tuned for some non-intuitive discoveries that I will share in an upcoming blog post celebrating the 25th anniversary of JMP.)

The science was paramount at the critical stage of discovering the synthetic route – the most efficient pathway to produce a compound – and its chemistry. Once you have a synthetic route that accounts for all of the health, safety and environmental considerations, and a chemical compound that is fit for use, you need to switch your focus from feasibility to manufacturability. You gain significant efficiencies by knowing when to apply DOE to the identified process and synthetic route.

During this phase, DOE helps you understand the impact of input variation on the outputs of the manufacturing process, as well as the nuances of the entire measurement system. This understanding is key to being able to deliver on a commercialization timeline. And this, in turn, is critical to running a business with the lowest inventory, which you are now able to do because your processes are well-defined and predictable.

The bottom line is that DOE and science – in my case, chemistry – are complementary, not contradictory. Scientists can and should use DOE to make their work in developing and improving processes much more efficient. They’ll have more time to focus on the science they love.

Post a Comment

Collecting iTunes data – Part 2

In my previous blog post about iTunes data, I showed a simple JMP Scripting Language (JSL) program to parse the iTunes XML file. I ended the post wondering whether the code would perform well on my full 13.3MB iTunes Library.xml file.

The JSL program ran in less than a second on my 7KB test file. How long did it take for the entire 13.3MB file? A full 90 seconds. Where was the time being spent? The answer to that question is the focus of this post, which is part two of a series about my project collecting iTunes data.

How do I easily improve my script’s performance?

Before JMP 11, this was not an easy question to answer. JMP 11 introduced the JSL Profiler. This is a great tool to assess where a JSL program is spending time on a line-by-line basis.

To use the JSL Profiler, I clicked the “Debug Script” icon at the top of the JSL editor, and then clicked on the “timer” icon to put the debugger into profiler mode. Running my script on the full 13.3MB file showed that the code spent all its time in the Parse XML line, which includes the Add Row() command. My guess is that most of the time is being spent updating the table state.

JSL Profiler Report

The JSL Profiler Report in JMP
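The question the JSL Profiler answers, namely which lines the time is being spent on, is the same one profilers in other languages answer. For comparison only, here is a small Python sketch using the standard library's cProfile on a stand-in for the per-row parsing work:

```python
import cProfile
import io
import pstats

def slow_parse(n):
    # Stand-in for the per-row work the XML parse was doing.
    total = 0
    for i in range(n):
        total += sum(int(c) for c in str(i))
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_parse(50_000)
profiler.disable()

# Print the five most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print("slow_parse" in report)  # the hot function shows up in the report
```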

I realized I could send a JSL message to the data table to hold off all update messages while rows are being added. The message is:

  dt << Begin Data Update;

At the end of the data table processing, I would need to add this message:

  dt << End Data Update;

After adding these messages, the program took only about 13 seconds to run, rather than 90 seconds – an 85 percent reduction in processing time!
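Why does this help so much? Conceptually, every Add Rows fires table-update work, and Begin/End Data Update defers it all to a single update at the end. A toy Python sketch of that batching idea (the Table class here is purely illustrative, not how JMP is implemented):

```python
class Table:
    """Toy table that 'redraws' after every change unless updates are suspended."""

    def __init__(self):
        self.rows = []
        self.redraws = 0       # stand-in for the display/update work
        self.suspended = False

    def add_row(self, row):
        self.rows.append(row)
        if not self.suspended:
            self.redraws += 1  # each add triggers an update

    def begin_data_update(self):   # analogue of dt << Begin Data Update
        self.suspended = True

    def end_data_update(self):     # analogue of dt << End Data Update
        self.suspended = False
        self.redraws += 1          # one update for the whole batch

naive, batched = Table(), Table()
for i in range(1000):
    naive.add_row(i)

batched.begin_data_update()
for i in range(1000):
    batched.add_row(i)
batched.end_data_update()

print(naive.redraws, batched.redraws)  # 1000 1
```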

Now I need to clean up some of the data that I do not want for my analysis. My data has a lot of records that detail which tracks are part of playlists. I need to add some code to my JSL to not process tags that contain playlist data. In my data, this was about 10 percent of the total.

iTunes Playlist data table

iTunes Playlist Data

Thanks to JMP developer Michael Hecht for suggesting the following code, which I added to the JSL. Now any “key” tags that have “Playlists” as their text value are not processed as data.

OnElement(
	"key",
	End Tag(
		txt = XML Text();
		If( process tag,
			If( txt == "Playlists", process tag = 0,
				Is Missing( Num( txt ) ),
					raw dt << Add Rows( 1 );
					raw dt:Key[Row()] = txt;
			),
			txt == "Tracks", process tag = 1
		);
	)
)

You can download my JSL program from the JMP File Exchange and investigate your own iTunes data. (Download requires a free JMP User Community account.)

I’ll save the full analysis of the data for my last post, but I’ll give you a preview. Below is a Treemap showing my listening by genre and artist. I do love Bruce Springsteen and have “forced” my kids to listen to him a lot over the years, even taking them to many live concerts. I also love opera, jazz and blues. As you can see, I just love music.

Treemap of Play Count by Genre and Artist using the JMP Treemap platform.

Post a Comment

Welcome back from holiday!

We know the timing for our first-ever European Discovery Summit call for papers wasn’t ideal. Basically, it spans the entire summer holiday season. But luckily for all of the JMP champions in Europe, you have a bit of time left.

Before settling back into your regular work routine, take a few minutes to think about the ways you use JMP. Does it help you save energy, save time or save money? Do you use it to design efficient experiments? Does it help you communicate important statistical information to people who don’t otherwise get statistics?

Think about the value you see in JMP and ask yourself, “Would other JMP users find my application interesting or helpful?” If the answer is yes, then please share how you use JMP products by sending in a paper or poster abstract.

Here’s the process: Submit your abstract for a paper or poster by 19 Sept. Our Steering Committee – a group of JMP champions from various industries and various European countries – will vote on the submissions in October. The selected paper and poster presenters will be our guests at Discovery Summit Brussels, showcasing their work on 24 and 25 March 2015.

Need a little inspiration? Check out the paper abstracts that were selected for Discovery Summit 2014 at SAS world headquarters in Cary, North Carolina.

Don't wait any longer. Holiday is over. And this opportunity to shine in front of analytic leaders in industry and academia is serious business. You’ll find all you need to know on the European Discovery Summit call for papers page.

Post a Comment

The eggciting results of my designed eggsperiment

In my previous blog entry, I talked about my frustrations in making good-looking hard-boiled eggs that were easy to peel. My Internet searches found a number of different techniques that cooks said were essential to success, but I wanted to know which techniques were best. So I set up a designed experiment to study the factors that affect the qualities of a hard-boiled egg. Now for the results…

How Do I Analyze This?

The responses measured were peel time, attractiveness of the egg and ease of peel. (Photo by Caroll Co)

First off, it’s important to notice that the Custom Designer added a column called Whole Plots (which indicates a batch of eggs that went into a pot) to the data table. We need to include this column as a random effect to ensure appropriate analysis. Fortunately, since we’ve created this design in JMP, you’ll notice a "Model" script on the left-hand side of the data table. Selecting Run Script brings up the Fit Model dialog with the model we had selected in Custom Design, and with the whole plot included properly. Alternatively, you could also go through Analyze -> Fit Model. Before hitting the Run button, I like to change the emphasis to Effect Screening to look at the Profiler.

The Results
I must admit, I had reservations about whether or not any effects would show up as significant. The results surprised me.

Peel Time
There was so much variation in the peel times that nothing came out significant at the 0.05 level. However, the effect of the cooking method was quite large, and the lack of significance could be in part due to having only 4 denominator degrees of freedom. The results suggest that starting the eggs in cold water increases the peeling time of the egg.

eggs_peel_time_estimates2

Or, if you prefer something more graphical, here are the 95% confidence intervals for the estimates.

peel time CIs

If I ignore the split-plot structure and simply look at the box plots of the peel time for the 12 eggs from each cooking method, the boiling start method looks more promising in terms of peel time. Note that I can’t actually throw away the split-plot structure when doing the analysis; I’m using this view more as a guide to see if there’s reason for further experimentation – particularly since most instructions for hard-boiled eggs start from cold water.

eggs_peel_time_boxplot

Attractiveness
The attractiveness rating provided some interesting results. While there are some statistically significant results here, the largest effect is still the cooking start.

eggs_attractiveness_estimates2

attractiveness CIs

The results suggest that the boiling start leads to more attractive eggs. We also notice that not cracking the egg before cooling leads to more attractive eggs. To get a better idea of what’s happening with the interactions, I like to use the Prediction Profiler.

If we open up the Profiler and select “Maximize Desirability,” we find the best predicted settings are a boiling start, an old egg, cold-water bath and no cracking before cooling. This is particularly interesting since the model suggests better results using a cold-water bath instead of an ice-water bath. At the very least, it suggests that I would want to have the cooling method in my next experiment.

eggs_attractiveness_profiler

Ease of Peel Rating
The ease of peeling showed one significant effect. If you’ve read through to this point, you can probably guess which effect it is. When you consider that there were only six whole plots, it’s a bit surprising that I found a difference based on the cooking method.

eggs_ease_estimates2

peel ease CIs

Final Thoughts

The bottom line is that the best predicted settings were a boiling start, an old egg, cold-water bath and no cracking before cooling. When thinking about both attractiveness and ease of peel, the results suggest that I still want to study all four factors in the next experiment (with some additions as well).

This experiment has provided me some real food for thought (pun intended). You may have noticed that I had nothing about the taste/quality of the eggs in the response. In part, this was because it was going to be difficult to keep track of, and I didn’t want to cut the eggs right after they were peeled. That said, the eggs that we’ve eaten thus far have been better cooked than any batch I can remember preparing on my own in the last few years.

I also noticed that placing the eggs straight into boiling water led to cracks in some of the eggs (in three out of 12). This didn’t seem to affect the responses, but it is something that I would likely pay better attention to next time.

It certainly looks like there’s more experimenting to be done on this. In the future, I would increase the number of whole plots by using fewer eggs per batch and possibly add another level of randomization so that I can cook multiple batches of eggs at a time in different pots. This would also let me investigate some additional factors.

Thanks to everyone for the fantastic comments thus far, here and in the LinkedIn DOE group. Additional factors to consider next time include adding salt to the water and treating cooking/cooling time and temperature as continuous factors (which is made all the more interesting after discovering fascinating material such as this blog post about soft-boiled eggs).

What factors would you like looked at next time? Do you have any other ideas for experiments not involving eggs that you would like to see? Please leave a comment to let me know!

Post a Comment

Using JMP Scripting Language (JSL) to collect iTunes use data

I was looking for good data to demonstrate the new Query Builder, which is coming in JMP 12 in March. The Query Builder is a modern SQL database query tool that helps users discover interesting data and make correct queries that are repeatable and sharable.  The data needed to be different from the sample data that ships with JMP. It needed to be a set of tables, which is how many of our customers store their data. And this set had to have a defined relationship among the tables, each describing a different topic of interest.

Where should I look for data that contained multiple topics and was rich enough to show how the software could speed up statistical discovery? It occurred to me that many of my co-workers use iTunes to listen to music and that they may be generating interesting data from their listening.

With a little research on the web, I found that iTunes generates an XML file to make music and playlists available to other applications.  This sounded like a good fit for JMP. And because I am a data analysis geek, looking at iTunes data in JMP seemed fun, and I could get the data I needed.

I wrote an XML parser for the iTunes file using JSL. I also wrote a brief survey in JSL to collect demographic data from the people who were willing to share their iTunes data. The project broke neatly into four parts:

  1. Reading XML with JMP
  2. Improving the performance of JSL using the Performance Profiler in the JSL Debugger
  3. Using the Application Builder to construct the survey
  4. Analyzing the iTunes data

Reading XML with JMP

We often get the question, “Does JMP open XML Files?” This is a tricky question; XML is a text file that contains information organized in an arbitrary tree of “tags.”

Let’s take a look at a simple XML example containing data for two variables, X and Y, with three rows of data.

Here's an XML data example:

<table name='fromxml'>
	<col name='x'>[1 2 3]
	</col>
	<col name='y'>[11 22 33]
	</col>
</table>

Here's a simple JSL example to parse the XML data:

example = load text file("xml data example.xml");
 
Parse XML
 ( example,
	On Element
	 ( "table", 
	    Start Tag( New Table( XML Attr( "name" ) ) ) 
	  ),
	On Element
	 ( "col", 
	    End Tag ( New Column( XML Attr( "name" ), 
            Set Values( Parse( XML Text() ) )   ) )
	  )
  );

Notice the three key elements used in the JSL above. The On Element syntax indicates what JSL expression to execute when the “table” or “col” string is found within an XML tag (the text between the < > symbols). The Start Tag and End Tag arguments indicate what JSL expression should be executed at the start of the “table” or “col” tags or at the end of the tags. In this case, a New Table expression is executed whenever a “table” tag is started, and a New Column expression is executed whenever a “col” tag is ended (by a closing </col> tag). Running the New Column and Set Values at the end tag ensures that the values for the column have all been read before setting values.

Here’s the resultant JMP data table:

itunes_data_table
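For comparison, here is the same small file read with Python's xml.etree.ElementTree, which builds the whole tree up front rather than firing Start Tag/End Tag callbacks the way Parse XML does:

```python
import xml.etree.ElementTree as ET

xml_data = """\
<table name='fromxml'>
    <col name='x'>[1 2 3]
    </col>
    <col name='y'>[11 22 33]
    </col>
</table>"""

root = ET.fromstring(xml_data)
table = {}
for col in root.iter("col"):
    # Strip the surrounding [ ] and split on whitespace,
    # much as Parse() does for a JSL matrix literal.
    values = col.text.strip().strip("[]").split()
    table[col.get("name")] = [int(v) for v in values]

print(table)  # {'x': [1, 2, 3], 'y': [11, 22, 33]}
```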

Apple iTunes XML

Now let’s take a look at how Apple stores iTunes data in XML. The XML is in a format based upon Apple’s “plist” data structure. I found a document on the Apple website that gave the basics of the file elements.

Here is a snippet of the iTunes Library.xml file that is generated by my use of iTunes. The “key” tag is used as a generic trigger for what could be a column name or an indication that a new track is coming. The other tags (integer, string, etc.) denote observations in the data and give the types of the data.

<key>2739</key>
<dict>
    <key>Track ID</key><integer>2739</integer>
    <key>Name</key><string>Fortune Plango Vulnera</string>
    <key>Artist</key><string>Carl Orff</string>
    ...
</dict>
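Since iTunes Library.xml is a standard Apple property list, a general-purpose plist reader can also handle it. Here is a sketch using Python's plistlib on a minimal stand-in document (not a real library file):

```python
import plistlib

# A minimal stand-in for the Tracks section of an iTunes Library.xml file.
doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>Tracks</key>
    <dict>
        <key>2739</key>
        <dict>
            <key>Track ID</key><integer>2739</integer>
            <key>Name</key><string>Fortune Plango Vulnera</string>
            <key>Artist</key><string>Carl Orff</string>
        </dict>
    </dict>
</dict>
</plist>"""

# plistlib maps <dict>/<key> pairs to Python dicts and <integer> to int.
library = plistlib.loads(doc)
track = library["Tracks"]["2739"]
print(track["Artist"])  # Carl Orff
```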

Here is a short JSL program I wrote that seemed to work fine with a reduced set of my iTunes Library.xml file. To create the reduced set of data, I simply opened the .xml file (mine is 14MB) in a text editor and copied the first few tracks worth of data.

iTunespath = "~/Music/iTunes/iTunes Library short.xml";  
 
// Load the xml file named in the global 'iTunespath' into memory
cd_file_contents = Load Text File( iTunespath );
 
//Create an empty data table to hold the raw data
raw dt = New Table( "iTunes Data raw" );
raw dt << New Column( "names", character );
raw dt << New Column( "Values", character );
 
//Parse the iTunes xml
Parse XML( cd_file_contents,
	OnElement( "key", Start Tag( If( Row() > 0, raw dt << Add Rows( 1 ) ) ), end tag( Column( raw dt, "names" )[Row()] = XML Text() ) ),
	onElement( "integer", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	onElement( "real", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	onElement( "date", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	onElement( "data", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	onElement( "string", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	onElement( "true", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	onElement( "false", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
	End Tag( row++ )
);  //end parse of the iTunes xml

Here is what the program produced:

itunes_JMP_data_table2

Notice that there is some data at the beginning that I do not need, but I can clean that up later.

Now, the big question is: Will the script scale to my 14MB file? Execution time, by my watch, was 1 minute 45 seconds. In my next blog post, I will show how to use the JSL Debugger’s Performance Profiler to find out how to speed up my code.
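Timing a run by watch is coarse. Before reaching for a profiler, a first step in most scripting environments is to bracket the slow call with a simple timer. Here is a generic Python sketch of that pattern (the JSL equivalent would wrap the Parse XML call the same way):

```python
import time

def timed(fn, *args):
    """Run fn(*args) and report the elapsed wall-clock time."""
    start = time.perf_counter()
    result = fn(*args)
    print(f"elapsed: {time.perf_counter() - start:.3f} s")
    return result

# Stand-in workload for illustration; a real use would wrap the
# file read and parse calls instead.
total = timed(sum, range(1_000_000))
```

A timer tells you the total cost; a profiler tells you where that cost is spent, which is what the next post will dig into.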


An eggciting designed eggsperiment

What's the best method for getting hard-boiled eggs that are easy to peel and attractive? (Photos by Carroll Co)


A typical scene in my kitchen: I make a batch of hard-boiled eggs with the hope of an easy peel and a beautifully cooked center. But when I sit down to enjoy my egg, I find that, sadly, it’s not so easy to peel – or I have discoloration around the yolk (or worse yet, sometimes both occur).

Here's how I've been preparing my hard-boiled eggs: I start with the eggs in a pot of cold water. Then, I bring the pot to a boil, remove it from the heat and cover the pot for 12 minutes. After a recent disappointing experience with both overcooked and hard-to-peel eggs, I decided to investigate further in a quest to make better hard-boiled eggs.

My Internet search revealed that almost everyone claims to have a foolproof way to make hard-boiled eggs, but a quick browse through comments shows mixed results. Some common themes and questions appear, so it sounded like the perfect opportunity to use a designed experiment to separate fact from folklore.

For a first try at this eggsperiment, my budget for runs was two dozen eggs – same size/brand, purchased two weeks apart. Perhaps in a future experiment, I will use more eggs, but I wanted the peeler (my wife) to be blinded from knowing how the egg was prepared. Since I wasn’t going to be doing the peeling, 24 eggs seemed to be the limit of asking for help from my wife. I also quailed at the thought of having to eat so many egg salad sandwiches in a short period of time.

While most cooking methods for hard-boiled eggs start with cold water, a recent blog post had me intrigued about putting the eggs directly into boiling water.

So I ultimately decided on the following factors to study:

  1. Cooking method (start with cold water or put into boiling water)
  2. Age of the egg (purchased two weeks ago or newly purchased)
  3. Cooling method (ice bath or cold tap water)
  4. Pre-cool crack (yes or no)

The pre-cool crack indicates whether I cracked the egg before using the cooling method in factor 3. If you’re familiar with design of experiments, you may recognize that not all of these factors are equally easy to change. Factors 2-4 can be assigned on an egg-by-egg basis (that is, they’re easy to change). For the cooking method, it is much more convenient to cook more than one egg at a time. Thus, cooking method is a hard-to-change variable, or whole-plot variable in the parlance of split-plot designs.

This means that the estimate of the effect of the cooking method is based on the number of batches I cook rather than the number of eggs. I ultimately decided on six batches of four eggs, or six whole plots. While this gives me only three batches for each cooking method, I hoped that I would get at least some indication of whether changing the cooking method mattered. For the easy-to-change factors, I’m more likely to detect the important effects because of the number of eggs I have.
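To make the split-plot structure concrete, here is a small Python sketch of the randomization. This is not JMP's Custom Design algorithm (which optimizes the allocation); it only illustrates how the hard-to-change factor is randomized once per batch while the easy-to-change factors vary egg by egg:

```python
import random
from itertools import product

random.seed(1)  # fixed seed so the illustration is reproducible

# Hard-to-change (whole-plot) factor: one setting per batch of four eggs
whole_plots = ["cold start", "boiling start"] * 3
random.shuffle(whole_plots)  # randomize which batches get which method

# Easy-to-change factors vary egg by egg within a batch
easy_levels = list(product(["old", "new"],             # egg age
                           ["ice bath", "tap water"],  # cooling method
                           ["yes", "no"]))             # pre-cool crack

design = []
for batch, method in enumerate(whole_plots, start=1):
    # four of the eight easy-factor combinations per batch, chosen at random
    for age, cooling, crack in random.sample(easy_levels, 4):
        design.append((batch, method, age, cooling, crack))

print(len(design))  # 24 eggs: 6 batches of 4
```

The point of the structure is visible in the output: the cooking method changes only six times (once per batch), so its effect is judged against batch-to-batch variation rather than egg-to-egg variation.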

For the cooking method, I cooked one batch at a time in the same pot, with the same amount of water (2 cups) for each batch. For the cold-water start, I heated the water on medium until it reached 188 degrees Fahrenheit, then turned off the heat and covered the pot for 10 minutes. For the boiling start, I waited until the water just started boiling, put the eggs in for 11 minutes, and reduced the heat to medium so that the water kept simmering.

The Responses

The responses I measured were peel time, attractiveness of the egg and ease of peel.


My main purpose here was to find out about ease of peeling, but there is still the aspect of whether or not a peeled egg is aesthetically pleasing. The final responses measured were:

  1. Peel time (in seconds)
  2. Attractiveness of the egg (rating from 1 to 5)
  3. Ease of peel (rating from 1 to 5)

While responses 1 and 3 seem similar, peel time is likely to be very noisy and may not reflect the frustration that can arise while peeling, which the ease-of-peel rating should capture.

The Experiment

Now it’s time to design the experiment. The first step is to enter my responses and factors in the Custom Design platform, which is the first item under the DOE menu. We get something that looks like this:

eggs_factors_responses

Notice that all of the factors are set to “Easy” under the Changes column in the Factors table. To change cook start to be hard-to-change, click on the “Easy” under the Changes column for the cook start factor and select “Hard” from the list that comes up.

eggs_make_HTC

After clicking the Continue button at the bottom, it’s time to set up the rest of the design. By default, the model is set to be able to estimate the main effects. With 24 eggs, we should be able to look at two-factor interactions, so I select Interactions -> 2nd to have the Custom Designer ensure the design can estimate all the main effects and two-factor interactions.
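As a quick sanity check that 24 runs can support this model: with four two-level factors, the main-effects-plus-two-factor-interactions model needs 1 intercept + 4 main effects + 6 two-factor interactions = 11 parameters, well under 24 runs. A two-line check (illustrative arithmetic, not part of JMP):

```python
from math import comb

factors = 4
params = 1 + factors + comb(factors, 2)  # intercept + main effects + 2FIs
print(params)  # 11
```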

eggs_make_interactions

Finally, we need to set up the appropriate run size. Recall that we want six batches of four eggs (24 eggs total). Under the Design Generation tab, this means we set the Number of Whole Plots to 6, and the Number of Runs to 24.

eggs_design_generation

Click the Make Design button, and the experiment is ready to go. The design will look something like this:

egg_final_design

Any predictions as to the results? I’ll reveal the results next week.
