Are Lilly Pulitzer for Target items still a bargain on eBay?

Lilly Pulitzer box

Photo courtesy of Caroll Co.

Readers in the US may have seen ads for the Lilly Pulitzer collection at Target. On Sunday, April 19, this collection launched, with queues at many Target stores resembling those for Black Friday sales. Most products sold out within a few minutes of the store openings. The online launch was no better; items lasted as long as they did mainly because of technical difficulties.

With demand far exceeding supply, it's not surprising that shoppers were upset when a very large number of items appeared on eBay at much higher prices than retail. You can read plenty of discussion on this if you follow the Twitter hashtag #LillyforTarget or do an Internet search.

Because of the shopping frenzy (and my wife's interest in the brand), I was curious about how close the secondary market prices are to reaching the prices you would pay for items directly from Lilly Pulitzer, which are typically made with different materials, and potentially have a larger variety of prints.

I found 10 different Lilly Pulitzer for Target items that closely resembled items on the Lilly Pulitzer website. I chose a variety of items from the Target collection, and the two sets of items passed my wife's "close-enough-to-the-same" test. On the evening of April 20, I collected data for 30-50 sold eBay listings of each item and took the median of the total price (cost + shipping). For most items, this consisted of all listings whose titles were descriptive enough that I didn't need to go to the item description. The median seemed a more reasonable choice than the mean, since it accounts for people who might have picked up items locally to save on shipping, and it is robust to a few outliers on the high end.
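The summary I used (outside JMP) can be sketched in a few lines of Python. The listing prices below are hypothetical, but they show why the median resists a high-end outlier better than the mean:

```python
from statistics import mean, median

# Hypothetical sold listings for one item: (cost, shipping) pairs.
# The $0.00 shipping entry stands in for a local pickup, and the
# last entry is the kind of high-end outlier the median guards against.
listings = [(24.0, 5.0), (26.0, 4.5), (25.0, 0.0), (27.0, 5.0), (95.0, 10.0)]

totals = [cost + shipping for cost, shipping in listings]  # [29.0, 30.5, 25.0, 32.0, 105.0]

print(median(totals))  # 30.5 -- barely moved by the outlier
print(mean(totals))    # 44.3 -- pulled well above every typical listing
```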

Visualizing data in Graph Builder

I decided to look at a slope graph that showed Target’s original price, the eBay price and the Lilly Pulitzer price. I knew this would be easy to do in Graph Builder, especially if I had my columns set up ahead of time: one for Item, one for “group” (categorical with Target original/eBay and Lilly), and finally price. Double-clicking on the legend in Graph Builder allowed me to quickly change the ordering to the way I wanted it. The resulting graph is below.


Not surprisingly, the eBay prices are higher than the original Target prices, but for the most part, they are not reaching the prices you would pay at Lilly Pulitzer. While the Target dresses are commanding a premium on eBay, they are still cheaper than buying from Lilly Pulitzer. It's also interesting to note that the flip flops in the pattern I looked at must be very popular, as they are selling for more on eBay than comparable ones on the Lilly Pulitzer website (unless my comparable just wasn't very comparable).


Another way to look at this is to consider the Target prices in relation to the Lilly Pulitzer costs, to see if there’s some type of trend -- whether it be outperforming items, or whether items that are cheaper relative to Lilly to begin with see larger increases when sold on eBay.


From this, I found no striking differences. But the flip flops, shorts and scarf are seeing slightly bigger increases, while the tunic and girls maxi dress are seeing smaller ones.

Final Thoughts

Unless you are desperate for the Target prints, it doesn't make sense to buy many of these items on eBay. It's not that big a leap to simply buy the Lilly Pulitzer products (with the exception of the dresses), or to wait for one of Lilly Pulitzer's big sales (where items sell out within minutes as well).

While I was collecting the data, my wife pointed out that Lilly Pulitzer online doesn't carry items in XXL. Naturally, this raises the question of whether there's a difference in the realized prices on eBay. I will look into this for one of the dresses in my next post. Thanks for reading!


Cleaning categories at scale with Recode in JMP 12

Data entered manually is usually not clean and consistent. Even when data is entered via multiple-choice fields rather than text-entry fields, it might need additional work when it is combined with data from sources that don't use the same categories. Sometimes the same category is spelled, abbreviated or capitalized differently, or simply miskeyed, resulting in many more categories than really exist and an invalid analysis when they are used in a model.

For example, consider the customer field in the "Cylinder Bands" data that can be downloaded from the UCI Machine Learning Repository (originally from Bob Evans, RR Donnelley & Sons).

When you import this data, you will find that there are 83 unique values of "Customer". There are a lot fewer actual customers than that — it is just that there are multiple codings of the same customer. For example, the customer Abbey Press is coded in three ways: “ABBEY”, “ABBEYPRESS”, and “ABBYPRESS”.


In the early versions of JMP, we used tricks to recode the values. For example, we might copy “ABBEYPRESS” into the clipboard, then use a histogram to select rows corresponding to “ABBEY”, select the column “Customer” and then paste — which would change each “ABBEY” into “ABBEYPRESS”.

Once the Recode command became available, we could just copy "ABBEYPRESS" and paste it over the recode values for "ABBEY" and "ABBYPRESS" in the Recode dialog.

All this works if you don’t have too many categories, but what if you have hundreds of categories? It can be very laborious to perform all those copies and pastes.

There are two important ways in which recoding is much better in the new Recode facility in JMP 12.

First, you can form a whole group just by selecting all the values to combine and right-clicking to pick the chosen category. The values then appear as a group, all together, even if they were originally separated. No more copy and paste; it's simply select and choose, with the results forming a visible group.



But the really amazing feature in the new Recode is the one that allows you to automatically find groups of similar values.


In this example, the feature automatically formed 16 groups. You can then check to see if the resulting groups have all the categories you want, but not too many.

How does it determine which values to combine? It looks at each pair of category labels and determines the edit distance between the two character strings. Also called the Levenshtein distance, it is the minimum number of edit operations to convert from one string to the other. When you run the “Group Similar Values” command, it brings up a dialog to choose which kind of changes to consider when calculating this distance and the criterion to use to call it a match. The default choices for this dialog usually work well.
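As a sketch of the metric "Group Similar Values" is computing, here is the standard dynamic-programming edit distance in Python (an illustration of the Levenshtein distance itself, not JMP's implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein("ABBEYPRESS", "ABBYPRESS"))  # 1 -- similar enough to group
print(levenshtein("ABBEY", "ABBEYPRESS"))      # 5 -- too far apart to group
```

A small distance relative to the string lengths flags a likely match, which is why "ABBYPRESS" gets grouped automatically while the much shorter "ABBEY" does not.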

Some of the groups were done perfectly, such as for “ECKERD”:

Because “ABBEY” is so different from “ABBEYPRESS”, it doesn’t satisfy the criterion to combine them, so you must select and right-click "Group" to do that.


Similarly, it didn’t catch the abbreviation of “CAS” for “CASUAL” in these two categories:


And it only got five of the six “HANOVERHOUSES”:




While you will need to check the results, the automated feature gets most of them.

After recoding, we have reduced the number of categories from 83 to 56.

Closer inspection reveals that other recodings are needed; in fact, the entries after row 503 seem to have switched from uppercase to lowercase. Selecting those values and choosing the menu item "Convert to lowercase" fixes them all.



As you move into larger data tables that need cleaning up, it can be very helpful to have some automated features like “Group Similar Values”, and an improved user interface flow, such as selecting and grouping instead of copying and doing multiple pastes.

(The new Recode features, introduced in an earlier blog post, were implemented by James Preiss.)


Exploring workout data history with Graph Builder

I have already posted in my fitness and food blog series about automating the import of my BodyMedia activity and food log data files via JSL and visualizing my data in JMP. I presented an e-poster on this project at the 2014 Discovery Summit US conference, which I've posted in the JMP User Community.

Unfortunately, I wasn't able to automate the import of the historical workout data I have tracked in written notebooks for years. I recently adopted the Full Fitness iPhone app to track workouts I do without my notebook in front of me. Full Fitness allows me to enter custom workouts and export my data to CSV. However, I've found it's quicker to type my older workout data into a table so I can copy and paste data for repeated workouts.

Now that I've dedicated some early morning hours to data entry, I have all of my workouts from 2014 and 13 of the past 17 Januaries saved in a JMP data table. What kind of information does this table contain? Most lines record a unique exercise, weight and rep combination from my weight training workouts. I record the date and the name of the workout program I am following that day, and I indicate the number of weights I used (1 for a barbell, 2 for dumbbells). I created a formula column that uses this information to calculate a Total Weight Lifted metric for each row in my table.

total weight lifted formula
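The formula column itself isn't reproduced here, but a plausible version of the Total Weight Lifted calculation, sketched in Python (the function name and the weight × reps × number-of-weights form are my assumptions based on the description above), is:

```python
def total_weight_lifted(weight: float, reps: int, num_weights: int) -> float:
    # num_weights: 1 for a barbell, 2 for a pair of dumbbells
    return weight * reps * num_weights

# One set of 10 reps with a pair of 15-lb dumbbells moves 300 lb in total:
print(total_weight_lifted(15, 10, 2))  # 300
```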

I note my starting and ending time or simply the duration of my workout, if I wrote that down. In the past, I tracked the duration of my cardio workouts, but since I don't train for endurance sports now, I leave it to my activity monitor to capture information on non-weight training activity. Here is an example of how the table looks for the first workout I completed in 2014. (I'll explain more about how I created the Primary Body Part column in my next post.)

Workouts in a data table


Below, I used Graph Builder to create a multi-element chart to summarize my 2014 workout timing and volume information by day of the week. A box plot element on top shows information about the duration of my workouts, a heat map element in the middle displays the relative frequency of my workout start times, and the bar element at the bottom summarizes the total weight I lifted each day across all exercises.

You can clearly see from my data that I usually work out on Tuesday, Thursday, Saturday and Sunday -- and Friday workouts are a rare occurrence for me. I tend to work out earlier in the morning during the week, and slightly later on the weekends. I sometimes work out during my lunch hour, but usually only early in the week or if I miss a scheduled morning workout.

Workout schedule in Graph Builder in JMP

I found it interesting to compare my weekly workout patterns with the aggregated data from Up activity monitor users as shown on the Jawbone blog. Up users log the most workouts on Monday, and then workout logging declines steadily every day throughout the week. Like me, Up users log the fewest workouts on Friday. They suggested that perhaps this day-of-week effect could be attributed to higher motivation to work out after weekend indulgences. Post-weekend compensation might also explain users eating more "health foods" earlier in the week (the "Quinoa Monday" effect). I was excited to see that a similar percentage of female and male Jawbone users logged weight training workouts! Lifting weights has been shown to have beneficial effects on lipid profiles and resting blood glucose levels, and also appears to be a relatively low-impact form of exercise that can help maintain or improve bone density, which is crucially important for women as we age.

To recreate my workout summary graph in Graph Builder, you will first need to create a Day of Week variable. You can quickly add this column to your data table by right-clicking on the column header of a continuous Date variable column and choosing New Formula Column > Date Time > Day of Week. (This highly useful feature is new in JMP 12.) I recoded my day-of-week numbers into day-of-week abbreviations. Then I dragged

  • Day Abbr (recoded Day of Week) to the X axis
  • Workout duration to the Y axis
  • Start Time to the Y axis, below duration, so that it created a second section, and right-clicked to change it to a heat map element
  • Total Weight Lifted to the Y axis below start time, in a third separate section, and changed it to a bar element
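Outside JMP, the same day-of-week derivation can be sketched in Python; the Monday-first abbreviations here mirror the recoded day-of-week labels:

```python
from datetime import date

# Abbreviations indexed to match date.weekday(), where 0 = Monday
ABBRS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def day_abbr(d: date) -> str:
    return ABBRS[d.weekday()]

print(day_abbr(date(2014, 1, 6)))  # Mon -- the first Monday of 2014
```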

I customized my color theme to a purple one by double-clicking on the legend elements to edit their colors and themes -- since everything looks better in purple!

Stay tuned for my next blog post on this topic, where I'll show how I used the JMP 12 Recode platform to consolidate and clean up my workout data.


5 reasons to catch Analytically Speaking with John Sall

Back by popular demand, SAS co-founder and JMP chief architect John Sall is the guest on this month's Analytically Speaking webcast on April 15. There are many reasons you don't want to miss it, but here are my top five.

1. JMP 12 has just been released, and John is sure to show some very useful features that will keep your analysis in flow and make sharing results even easier. Though we’ve had many blog posts showcasing new JMP 12 features, it’s always a treat to see the chief architect of JMP highlighting new capabilities. One of the new options in Graph Builder is to “squarify” your treemaps. The treemap below shows the mix of titles from registrants for the last webcast with John. It will be interesting to see if it changes much — I bet we will see more data scientists.

Graph Builder

2. In this era of big data, you may be pleasantly surprised at how adept JMP is at dealing with really wide data. John and others in JMP R&D have made enhancements to JMP 12 to better enable the analysis of data even with hundreds of thousands of columns.

3. JMP recently celebrated its 25th birthday, so you’ll get to hear a little about the original impetus to create JMP and how it’s evolved over the years.

4. Last month, John kicked off the first Discovery Summit Europe conference by demonstrating some of the key capabilities of JMP 12. Attendees had the opportunity to flip through early copies of Peter Goos and David Meintrup’s new textbook, Statistics with JMP: Graphs, Descriptive Statistics and Probability. (Peter was a keynote speaker at the conference, and David was a conference steering committee member.) We'll be talking about John's impressions of Discovery Summit Europe and what motivated Peter and David to write the book.

5. Because this webcast will be live, you can ask questions. If you can’t tune in for the live webcast April 15, feel free to submit questions in the comments area below for consideration. You can watch the archived webcast, which usually posts the next day. We hope you’ll join us!


Results of our designed experiment: Tasty iced tea

glass of iced tea

The designed experiment gave us a way to make crowd-pleasing iced tea. (Photos courtesy of Caroll Co.)

In my previous post, I described an experiment that my wife and I conducted to find a method for making delicious iced tea with juice. The factors we looked at were these:

  • Tea type: black tea or oolong
  • Steep method: hot water vs. cold water
  • Steep time: short (5 minutes hot/4 hours cold) or long (10 minutes hot/8 hours cold)
  • Amount of tea: 2 tsp per cup or 3 tsp per cup
  • Juice: cranberry or apple
  • Juice proportion: 25% or 50%
  • Added sugar: 1 tsp or 2 tsp per cup of liquid

So, were we able to make some great-tasting iced tea? Fortunately, yes.

However, in doing the analysis, I realized that I should have been more careful in setting up the design. For those who don't care to read all the details below, here's what our experiment suggested: oolong tea with half apple juice was the best combination, with the cold steep method potentially being better.

Performing the Experiment

While the kitchen did become a bit cluttered, we were able to complete the experiment. I randomized the numbers on the cups so my wife couldn’t easily associate batches of tea, and using the measuring tape for the response was effective.

The Analysis

If you recall the design setup, I specified only a main effects model in the Custom Designer. Looking at the color map on correlations, the design is orthogonal for the main effects, so I can estimate them independently... but there is full aliasing between the whole-plot main effects and two-factor interactions of whole-plot factors.

The one that concerned me the most is tea type being confounded with steep method * steep time. The steep time is really just a placeholder that depends on the method, so in hindsight I would have included this interaction in the model – I guess that’s what follow-up experimentation is for, although I didn’t have that luxury this time.


Fitting the main effects model, I see that apple juice and using more of it show up as significant. I also noticed that oolong tea and cold steeping have larger effects.


After doing some model exploration, I see that a much simpler model comes from using tea, juice and juice proportion:


It is interesting to note the juice*juice proportion interaction, since it was partially aliased with steep method in the original design. With the interaction in the model, the effect of steep method becomes rather small. We’ll still investigate steep method if/when we do some more experiments, but the fact that both main effects from the interaction are significant suggests the larger effect in the main effects model was mostly due to the interaction.

Final Thoughts

tea pot with tea leaves in it

I would have been more careful with the design if we had had more time. But we still got useful results.

If I hadn't been pressed for time, I would have been more careful with the design we ran, either by adjusting the number of whole plots or by ensuring that the confounding happened only between two-factor interactions. However, it was still better than no experiment at all: We had no idea whether we would even be able to come up with an iced tea worth serving, so the results were a pleasant surprise.

We did cold steep oolong tea overnight and then used the 50% apple juice proportion. Based on the results, we also used less tea and sugar than we might have otherwise, since they didn’t have a large effect. We went through 2 gallons for 10 people, with other drink options (including apple juice all by itself), so it must not have tasted all that bad.

Any future kitchen experiments you would like to see? Leave a comment below and let me know. Thanks for reading!


Creating a better iced tea with design of experiments

iced tea in a glass and a small bowl with tea leaves

What's the best way to make delicious iced tea with fruit juice? (Photo courtesy of Caroll Co.)

My wife is a huge fan of tea – to the extent that our kitchen has two shelves dedicated to it. For a recent family gathering, she wanted to clear out some tea by making a large batch of iced tea (i.e., to allow more room in the cupboard to buy more tea). She had served iced tea at previous gatherings to rave reviews, but wanted to try something different by making iced tea with added fruit juice after trying such a concoction at a café.

Her own attempts at fruit-juiced iced tea had mixed results, so she was nervous about making a large batch if it wasn’t going to turn out well. We were also in a time crunch – we had only one day to figure this out. What better way to spend a day off than running a designed experiment?

The Factors

For this experiment, we used what we had on hand at home to create an iced tea that (hopefully) tastes good enough to serve to guests. While I couldn’t promise my wife that we’d find a usable recipe, I was confident enough that I convinced her to be the official taste-tester. The factors we considered:

  • Tea type: black tea or oolong
  • Steep method: hot water vs. cold water
  • Steep time: short (5 minutes hot/4 hours cold) or long (10 minutes hot/8 hours cold)
  • Amount of tea: 2 tsp per cup or 3 tsp per cup
  • Juice: cranberry or apple
  • Juice proportion: 25% or 50%
  • Added sugar: 1 tsp or 2 tsp per cup of liquid

The first four factors are based on the brewing of the tea, while the last three are related to the tea itself. This means that if I make bigger batches of tea, I can split these batches up and vary the last three factors. This would make the first four factors hard-to-change, and the last three easy-to-change (since I can just vary those on a measured cup). On the other hand, there’s nothing stopping me from varying all seven factors for each cup, so how to decide what to do?

Whole Plots and Run Size

We knew that the final tasting would be done with a Styrofoam cup, so in theory there’s almost no limit as to how many cups she could taste. However, even taking a few sips at a time, we didn’t want to overwhelm her with too many choices. We decided on 16 cups as a reasonable number to use.

We also realized that making a batch of tea in a teapot and letting it cool posed a limitation given the number of containers and the space we had available, so we went with eight batches of tea -- i.e., 8 whole plots. Each batch would be used for 2 cups of iced tea, for a total of 8*2 = 16 cups (i.e., runs).

The Response

We wanted to measure the taste, but how could we differentiate between the 16 cups of tea? We could try a forced ranking, but there may be subtle differences that would make it difficult to distinguish, and large gaps may exist between the good-tasting and bad-tasting teas. The nice thing is that she can take small sips, so it’s easy enough to taste everything more than once before deciding. In the end, we laid out a tape measure on the table, with my wife placing each cup somewhere on the tape measure relative to how it tasted compared to the others. The worst would be 0, best would be 10, and all other scores based on their final placement.
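One way to read that tape-measure protocol as a formula is a simple min-max rescaling of each cup's position along the tape. Here is a Python sketch, where the positions are hypothetical inches (the exact scaling is my assumption; the post only fixes the worst at 0 and the best at 10):

```python
def taste_scores(positions):
    """Rescale tape-measure positions so the worst cup scores 0
    and the best scores 10; the rest scale linearly in between."""
    lo, hi = min(positions), max(positions)
    return [10 * (p - lo) / (hi - lo) for p in positions]

# Three hypothetical cups placed at 3, 15 and 27 inches:
print(taste_scores([3.0, 15.0, 27.0]))  # [0.0, 5.0, 10.0]
```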

The Experiment

We decided each batch of tea would be made with 2 cups of water. All the teas were chilled in the fridge so they had the same starting temperature. Each Styrofoam cup would end up with one cup of liquid (tea and juice mix).

Below is what my Factors Table looks like (note the first four factors are hard to change) in the Custom Designer.

I created the design using main effects only, and in the Design Generation outline, I specified 8 whole plots (batches of tea), and 16 runs (cups of tea).




Check back tomorrow for my next post, in which I'll reveal the results of this experiment.

Is there any special way you like to make iced tea? Leave me a comment below and let me know.


Graph makeover: Fractal scatterplot

One of the marvels of the Internet is the Online Encyclopedia of Integer Sequences (OEIS). Started 50 years ago by Neil J. A. Sloane as a graduate student, the repository now contains more than 256,000 integer sequences and is run by the non-profit OEIS Foundation, with Sloane still at the helm. (Aside: I wanted to donate $1,000 to the foundation so I could have my name next to Donald Knuth's on the short list of $1,000 donors, but my wife provided a voice of reason as usual.)

Why do we need an encyclopedia of integer sequences? If your study produces a sequence of integers, you can use the encyclopedia to see what the next term is or what else is known about the sequence. For (a contrived) example, if you're counting carbon trees and get counts of 4, then 9, then 18, then 42 trees when you add more and more atoms, you can look up "4, 9, 18, 42" in the OEIS and find sequence A000678, which has more information and even a formula.

For the 50th anniversary, the foundation produced a poster featuring nine visually interesting sequences. One of them, A229037, the greedy sequence in which no three equally spaced terms form an arithmetic progression, appeared as this scatterplot:


The graph was automatically generated by the website, and the interesting aspect is that the sequence appears to have a fractal structure in this space in that each cluster of points is a larger version of a previous cluster. However, there are two artifacts of the graph that obscure that information. There is serious over-striking of points, especially at the low end. Also, the clusters appear to get more sparse just because the spacing is growing, but the dots are staying the same size. As an attempt to remedy those artifacts, I made a version with translucent and variably-sized dots.


I think it's easy to see the fractal structure now. When I shared it with Sloane, he promptly added it to the sequence's web page and plans to update the poster as well. By the way, this sequence has a geometric explanation that hints at why it has a fractal structure: No three points in the scatterplot fall along the same line with equal spacing.
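For the curious, the greedy construction of A229037 can be sketched in Python: at each step, take the smallest positive integer that doesn't complete an equally spaced arithmetic progression with any earlier pair of terms.

```python
def a229037(n_terms: int) -> list:
    """Greedily build A229037: at each position n, choose the smallest
    positive value v such that no equally spaced triple
    a[n-2k], a[n-k], v forms an arithmetic progression."""
    a = []
    for n in range(n_terms):
        # v completes an AP with a[n-2k], a[n-k] iff v == 2*a[n-k] - a[n-2k]
        forbidden = {2 * a[n - k] - a[n - 2 * k] for k in range(1, n // 2 + 1)}
        v = 1
        while v in forbidden:
            v += 1
        a.append(v)
    return a

print(a229037(8))  # [1, 1, 2, 1, 1, 2, 2, 4]
```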

I'll leave you with a puzzle involving another sequence from the poster A250001. It's the number of ways to arrange n circles, ignoring size and not allowing single-point intersections. This image from the poster shows 7 of the 14 ways of arranging 3 circles. Can you find the other 7? There are 168 ways of arranging 4 circles, and no one knows how many ways there are to arrange 5 or more circles.



Graph makeover: 3-D yield curve surface

A couple weeks ago, The Upshot section of The New York Times produced this "glorious" interactive 3-D graph of the last 25 years of US Treasury yield curve data titled "A 3-D View of a Chart That Predicts The Economic Future: The Yield Curve."


The graph is very appealing at some level and comes with well-done animated flyovers to highlight some interesting features. Commentaries at Flowing Data and Visualising Data have been mostly positive. However, I'm always suspicious of 3-D views because of the extra step needed to translate values accurately in our minds and, for surfaces, the danger of missing information that's obscured. This graph works more as a backdrop for drill-downs into slices of interest rather than as a standalone data representation. While I think it adds value as a context, there is also a complexity cost to consider, and it's worth exploring other views.

Getting the data was refreshingly easy in this case. The US Treasury Department provides the data in an HTML table, and the Import HTML feature in JMP brings it into a data table nicely. Though there are more than 100 HTML <table> elements in the web page, JMP correctly identifies the one that contains data (the others are likely used for page layout). The only glitch was that the date values use two-digit years. Fortunately, JMP has a preference for how to interpret two-digit years, and after setting it to treat "90" as "1990," the dates come in correctly.
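The two-digit-year pitfall is easy to reproduce outside JMP as well. Python's strptime, for example, applies the common POSIX pivot (69-99 map to 1969-1999, 00-68 to 2000-2068), which happens to give the interpretation we want for this data; this is an analogous sketch, not JMP's mechanism:

```python
from datetime import datetime

# %y applies the POSIX pivot: 69-99 -> 1969-1999, 00-68 -> 2000-2068
print(datetime.strptime("01/02/90", "%m/%d/%y").year)  # 1990
print(datetime.strptime("01/02/15", "%m/%d/%y").year)  # 2015
```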

First, I'll try a 3-D surface view in JMP for comparison. Though the original looks beautiful in many ways, one feature I thought strange for a surface graph is the way the loan term lengths are treated categorically. That is, the spacing between 1-month and 3-month rates is the same as between 20-year and 30-year rates. I've seen yield curves drawn both ways, and it usually doesn't matter too much since the curve is often simplified to one of three states: rising, level or inverted. But given the context of the graph's title about “predicting the future,” it seems reasonable to look at the term length as a continuous value (that is how far into the future we're predicting).

Here is the surface plot in JMP. I could play with the lighting and smoothing, but this lets us get a sense of the effect of a continuous representation of the term length.


The call-outs of the original piece focus on the three possible 2-D profiles. Looking at the rate versus the term length with a separate curve for each date (yellow to red) produces an attractive view, even if not very informative.


With the coloring, we can sense the downward trend over time though we miss the dips, which are obscured. Possibly this could serve as a backdrop if a few years of interest were highlighted and labeled.

Here's the same view with only one out of every 40 days shown. At least we can get a sense of the older low rates, which were previously obscured.


Another way to slice the cube is to look at each term length's rate over time. This graph of two term lengths representing short-term and long-term rates over the last 25 years in 2-D gives a clearer view:


To me, this 2-D view is clearer than the same 2-D profile within the context of the 3-D view. It's easier to see both the steady declining trend in the long-term rate and where the short-term rates were higher than the long-term rates. Another embodiment of my favorite maxim, "Less is more."

Finally, here is a reproduction of the heat map of date versus term length, using the interest rate as the color. The cut-out for the missing 30-year rates in the mid-2000s is a good application of the "alpha hull" feature added to contour plots in JMP 11.


I usually like heat maps for 3-D data, but this one doesn't seem very informative. Maybe it's the amount of variation in the rate or the irregular spacing of the term length values, but it's harder for me to get a good sense of the data from this view. I think the core issue is that the interest rate is too important to be represented by color alone (necessarily imprecise).

One benefit of remaking graphs like this is you discover some of the many decisions the designers had to consider when making the published view. A few substantive decisions for this data:

  • Continuous versus categorical term length.
  • An appropriate level of smoothing, since there were too many days in the history to show every value.
  • Dealing with gaps in the data.
  • Deciding which of many interesting data features merit call-outs.

I saved my work as a JMP script (uploaded to the JMP User Community), so I could redo it easily with, for instance, new data or new smoothing parameters for experimentation. It takes a little more effort to create a reproducible script from an interactive data exploration, but I'm finding the practice to be rewarding.


Q&A with market research expert Walter R. Paczkowski

Last month, we featured consumer and market research expert and founder of Data Analytics Corp. Walter R. Paczkowski on our Analytically Speaking webcast series. If you missed the live webcast, you can still view it on demand. Host Anne Milley took many audience questions, but was unable to get to all of them. So Walter graciously agreed to answer some of them in this Q&A.

Questions: (a) What is your approach to building hypotheses to be tested ahead of an analytics project? (b) Do you find that analytical work for B2B segments is much harder than for B2C segments because there can be so many factors in B2B that cannot be put into a model?

Answer: The responses to these first two questions are similar, so I'll answer them together. This is where the upfront qualitative work becomes an important part of the overall research design. Remember, there are two phases I advocate for most projects: qualitative followed by quantitative. The qualitative phase helps set the parameters for the quantitative phase. We generally don't know all the parameters needed for the quantitative phase -- key factors or attributes, levels for the factors, correct wording, important concepts, to mention just a few. The qualitative research -- focus groups or one-on-one in-depth interviews with subject matter experts (SMEs) or key opinion leaders (KOLs) -- helps identify them. This makes the quantitative phase more focused and powerful.

What does this have to do with hypotheses and B2B factors? Hypotheses are just parameters, no different from a list of factors or attributes to include in the quantitative phase. Discussions with consumers or SMEs or KOLs can help formulate hypotheses that marketing personnel may never have imagined.

The same holds for B2B modeling – or, in fact, for any modeling for B2B, as well as B2C or B2B2C. If the list of factors is large, then seek help from SMEs and KOLs. They’ll help tell you what is important and what can be ignored. But this is just upfront qualitative research.

Question: Do you think organizations have a balanced approach to creating value from both the found data and the more information-rich data to be gained from well-designed surveys and experiments?

Answer: I’m not sure about “balanced,” but the use of both types of data is definitely there. Since I do work across a wide range of industries, I see many practices, the best and worst, which I talked about with Anne. Many of the large organizations, the sophisticated ones I mentioned in the interview, use these two sources of data to answer their key business questions and understand their markets. These are the ones who follow the best practice of using the right tools – the tool being the type of data in this case.

Over the past few years, I’ve presented workshops on choice modeling, a great example of an experimental approach, and working with Big Data, as I mentioned in the interview. Not only have they been well-attended, but I noticed that many attendees were from the same company, different divisions but nonetheless the same company. So the use of both types is there – I have the data!

Question: When pricing is a factor in a choice experiment, how well does the optimal price indicated by the experiment correspond to what the actual best price should be in the field?

Answer: This is a great question – and a hard one to answer. First, the last part of the question asks about "what the actual best price should be in the field." Finding that price is the whole purpose of the study. I think what the question is really asking is whether or not the study replicates current market prices. That can best be determined using the Profiler in JMP, or a simulator, by setting the base case to the current actual conditions. But the results won't match the market exactly, since market prices are driven by many other factors that the study can't capture. Nonetheless, the study should come close. So look to the Profiler for help on this issue.
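For intuition, here's a minimal sketch of the kind of base-case check Walter describes, written in Python with made-up intercepts and a made-up price coefficient (not output from any actual JMP study): a multinomial-logit simulator evaluated at the prices currently in the market, whose predicted shares can then be compared against observed market shares.

```python
import math

def choice_shares(utilities):
    """Multinomial-logit choice shares: share_i = exp(u_i) / sum_j exp(u_j)."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical estimates from a choice experiment: a brand intercept per
# product plus a shared price sensitivity (utility change per dollar).
intercepts = [1.2, 0.8, 0.5]
price_coef = -0.04

# The "base case": set prices to the current actual market conditions.
current_prices = [25, 20, 15]

utilities = [a + price_coef * p for a, p in zip(intercepts, current_prices)]
shares = choice_shares(utilities)
print([round(s, 3) for s in shares])  # predicted shares at market prices
```

If the predicted shares land close to observed market shares, that's some reassurance; a gap suggests factors outside the study are at work, as Walter notes.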


John Tukey on the rule of zero-origin scales

I saw the following post recently on Twitter:

Eric Jonas @stochastician Mar 16
There’s basically never a reason to start the y-axis of your comparison graph anywhere besides zero.

It generated several dissenting replies, including one from me. Coincidentally, I had just reread part of John Tukey's classic book Exploratory Data Analysis (1977), in which he shows a good counter-example to that guideline. His example comes from a discussion introducing a variation of the box plot called a "schematic plot." Tukey introduced the general box plot in 1969; the schematic plot refined it with a specific set of rules for whisker lengths and outlier displays. That refinement has always been the default in JMP, where it's called an "outlier box plot."
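Tukey's schematic-plot rule is simple enough to state in code. The sketch below (Python, with made-up data; JMP and other packages use slightly different quantile definitions, so results near the fences can differ) computes the 1.5×IQR fences, the whisker endpoints, and the individually plotted outliers:

```python
def tukey_fences(values, k=1.5):
    """Whisker limits and outliers per Tukey's schematic-plot rule:
    whiskers reach the most extreme observations within k*IQR of the
    quartiles; anything beyond is drawn as an individual point."""
    xs = sorted(values)
    n = len(xs)

    # Simple quartile estimate (medians of the lower and upper halves);
    # quantile conventions vary between packages.
    def median(a):
        m = len(a) // 2
        return a[m] if len(a) % 2 else (a[m - 1] + a[m]) / 2

    q1 = median(xs[: n // 2])
    q3 = median(xs[(n + 1) // 2 :])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr

    inside = [x for x in xs if lo <= x <= hi]
    outliers = [x for x in xs if x < lo or x > hi]
    return (inside[0], inside[-1]), outliers

# Illustrative numbers only (not Rayleigh's actual measurements):
whiskers, outliers = tukey_fences(
    [2.29, 2.30, 2.30, 2.31, 2.31, 2.31, 2.32, 2.50]
)
print(whiskers, outliers)  # whiskers stop at 2.29 and 2.32; 2.50 is flagged
```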

The example in question uses Lord Rayleigh's measurements of the mass of nitrogen. It's a very small data set by today's standards, and Tukey lists the data in full in his book. I also checked Lord Rayleigh's 1894 publication, "On an Anomaly Encountered in Determinations of the Density of Nitrogen Gas," which contains a few more observations and a couple of minor differences from Tukey's data. I've attached CSV and JMP versions of the data set in my JMP User Community post, with Tukey's data in a separate column. Here's an excerpt from Lord Rayleigh's paper:


Importantly, the measurements record the details about how the nitrogen itself was produced. The graph below shows the recorded weight (in grams per "globe") versus its source and the purifying agent.


The main difference is whether or not the nitrogen comes from air, which is how Tukey shows it. Here is some of his text, along with his figures.




Although Tukey is comparing summary views (box plot vs. mean bar chart), his point holds for raw data as well. Here are JMP scatterplot versions of those plots.



It turns out that Lord Rayleigh's "nitrogen" from air also contained other elements unknown at the time, and the small differences led to the discovery of the element argon, for which he won a Nobel Prize.
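A quick back-of-the-envelope calculation shows just how small those differences were, and why axis choice matters. Using approximate group means from Rayleigh's experiment (about 2.310 g per globe for air-derived nitrogen versus about 2.299 g for chemically derived; the exact axis limits below are my own illustrative choices), the gap occupies a fraction of a percent of a zero-origin axis but more than half of a data-driven one:

```python
# Approximate group means from Rayleigh's measurements, grams per globe.
air, chemical = 2.3102, 2.2990
gap = air - chemical

# On a zero-origin axis running from 0 to ~2.32, the gap spans this
# fraction of the vertical extent:
zero_axis_fraction = gap / 2.32

# On a data-driven axis (say 2.295 to 2.315), it spans this fraction:
tight_axis_fraction = gap / (2.315 - 2.295)

print(round(zero_axis_fraction, 4), round(tight_axis_fraction, 2))
```

On the zero-origin plot the Nobel-worthy discrepancy is all but invisible; on the tight axis it dominates the picture.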

So while a zero scale is often wise for comparison graphs, there is no substitute for making an intelligent choice. As Tukey suggests, the zero-origin plot doesn't make the case for a Nobel Prize.
