Monday, October 5, 2009
JMP Is 20 Years Old
Today is the 20th anniversary of JMP's first release, and I want to thank everyone who has helped to make JMP a success.
JMP Version 1 shipped on October 5, 1989 -- or, as we claimed at the time, September 35 -- so that we could say we shipped in the third quarter of 1989, our goal. JMP started as a research project in the late '80s. In the earlier part of that decade, we had spent several years rewriting SAS completely (but compatibly) to fit on personal computers. But by 1988, we felt three big forces, which can be characterized as the Vehicle, the Roles and the Technology.
As for the Vehicle, SAS was becoming a large enterprise-scale product -- a larger investment than some users, like engineers and scientists, were willing to handle. We were producing analytical trucks, but there was a market for analytical cars, i.e., something with low investment and ease of driving. We needed a more personal-scale tool, one for the desktop project rather than for the enterprise system.
As for the Roles, statistics itself was seeing the opportunities in exploratory techniques, and the value of graphics and interactivity. The statistics profession had been molded as a testing discipline, a role like a lawyer whose job is to prove things that we already knew. What was missing was the exploratory role, like a detective, whose job is to discover things we didn't already know. Especially since John Tukey's Exploratory Data Analysis and the improvement of statistical graphics, statistics needed to serve in the detective role as well as the lawyer role. Graphics was the key enabler of seeing patterns, and points that don't fit patterns.
As for the Technology, the graphical user interface arrived with the Macintosh, and later, Windows. It is a huge difference to just point and click rather than look up and type. Applications written for batch computing through languages were not suited for graphical interactivity. It was time for some fresh design.
In response to these three forces, we formed a small group to put something together. In a year and a half, we released Version 1 of JMP. This was a very small product compared to the JMP of today, but it had all the basics of statistics and graphics, with many innovative features. We thought "jump" was a name to suggest a big step into a new future, a product that jumps in responsiveness to the mouse, and a tool that enables our customers to do the experiments and make the discoveries to take huge strides in their products and processes.
In the early years, we learned important lessons.
We learned that engineers and scientists were our most important customer segment. These people were smart, motivated and in a hurry -- too impatient to spend time learning languages, and eager to just point and click on their data. We had a product that was nearly walk-up-and-use, with enough delights to hold their loyalty.
We learned that engineers need design of experiments (DOE), quality and productivity support (Six Sigma), and reliability modeling. We made sure we got better in these areas -- particularly DOE. We thought that engineers should be able to just ask the computer to custom-make a design that fits their needs rather than attempting to find a pre-built design that works.
We learned how to port to Windows. We made JMP work on Windows with release 3.1, using the Altura library. This was a quick effort. Soon we were busy rewriting the whole product in a different implementation language with a portability host-interface layer, which led to a wait of more than three years before Version 4. Version 4 not only switched languages, but also introduced a new nervous system for the product, including the JMP Scripting Language.
In the last few years, JMP has matured considerably. The big driving force has been meeting the needs of the users we talk to, who correspond with us and who sometimes invite us into their sites. We have a very dedicated group of users who keep us directed and help us serve more and more researchers every year. Recently, I heard the group of passionate JMP users termed the “JMPerati,” analogous to Stephen Baker’s term, the “numerati.”
JMP has broadened to become more versatile. JMP now supports business visualization in partnership with SAS Business Intelligence, and this in turn has encouraged us to introduce more visualization platforms, like the drag-and-drop Graph Builder in JMP 8.
JMP can now handle larger problems because of work we have done to multithread many of the bottleneck methods and to implement JMP on 64-bit systems. And we now work with various SAS teams on projects in several areas, collaborating and sharing efforts.
JMP is 20 years old, but it seems like it is just getting started. We are growing fast. Last year, our business grew faster than ever, and we are set up to grow even faster in the future.
Happy birthday, JMP, and thank you, everyone, for your contributions to JMP's success.
Monday, April 6, 2009
Not Really an $11 Trillion Hole
The front page of the Wall Street Journal on March 13 highlighted an "$11 Trillion Hole" and said "Americans See 18% of Wealth Vanish." I looked at the chart, and the 2008 number indeed looked as if it had fallen off a cliff.
But then I looked at the rest of the curve and remembered the two big bubbles that were going on, the Internet Bubble in the 1990s and the Housing Bubble in the 1990s and 2000s. I thought I should just exclude those points from the long-term trend. So I got the data from the Federal Reserve's Web site and tried to reproduce the Wall Street Journal plot, adding a trend line that excluded the bubble points.
So how does our current net worth look with respect to the long-term trend? Not bad at all. We are not in an $11 trillion hole but are back on track after some roller-coaster years.
[Figure: net worth by year, with the long-term trend line]
Legend:
green = points used to estimate the regression line, 1985 to 1996
red = the points in the bubbles
blue = the current value that was the subject of the Wall Street Journal article
I don't want to deny in any way that we are in an economic crisis. But I do want to remind everyone that portfolio valuation drops are not quite as bad as they seem if you consider that the huge yields of the last few years were somewhat artificial, and just returning to normal valuations will look like a crash.
Sources:
Posted by John Sall in Data Visualization, JMP - General, Statistics at 09:57
Thursday, January 15, 2009
Optimal Design of the Choice Experiment
My previous blog post covered issues in the design of a choice experiment for laptop computers. The goal was to model the trade-offs among features and price. In this post, I'll show how to design a choice experiment.
The Choice Design feature, which you access from the DOE menu in JMP 8, designs choice experiments. This platform was developed by Bradley Jones, with help from Chris Gotwalt.
The first job is to enter the factors in the experiment. After adding the factors and specifying the levels, the window looks like this:
[Figure: Choice Design window with the factors and their levels]
Next, we specify the model. This has to be a small experiment, so we just take the default main-effects model.
[Figure: model specification]
Next, we fill in the Prior Specification. Remember from my previous post that the optimal design depends on what the answer is, and we don't know the answer. Actually, we already know a lot about the choices. We already know that people want larger disks, higher speeds, longer battery life and lower prices. The experiment measures the relative strengths of these characteristics; it measures trade-offs, particularly the trade-off between price and features. The response is utility, where higher is better. Notice that all the factor levels are ordered so that the least desirable levels are first and the most desirable levels are last.
Now we can tell the designer that we know the direction of these levels. We do this by entering a prior mean. We say that 80 Gig (GB) is worth 1 utility unit more than 40 Gig. We say that a 2.0 GHz CPU is worth 1 utility unit more than a 1.5 GHz CPU, etc. Of course we don't really know the magnitudes of these, and that uncertainty is expressed in the Prior Variance Matrix, with 1s on the diagonal. The convention is that if the first level is less desirable, then you enter a negative value, as we do here. When there are three levels in increasing utility order, enter a negative value, then 0. Actually, it doesn't matter whether we enter the levels in the right order for the parameterization as long as the ordering is consistent across levels.
[Figure: Prior Specification with the prior means and the Prior Variance Matrix]
This Prior Specification is important in experiments like this, where the factors all have known preference directions and the goal is to measure trade-offs.
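To make the prior mean concrete, here is a minimal Python sketch of how a prior mean turns into predicted choice probabilities under a multinomial logit model, the usual model for choice data. It assumes effects coding (a two-level factor is coded +1 for its first level and -1 for its last), which is one common parameterization; JMP's internal coding may differ. Under that assumption, entering -0.5 for a factor's first level makes its last level worth 1 utility unit more. The four two-level factors and all function names are hypothetical, not JMP's API.

```python
import math

def effects_code(level, n_levels):
    """Effects coding: levels 0..n-2 map to unit vectors; the last level maps to all -1."""
    if level == n_levels - 1:
        return [-1.0] * (n_levels - 1)
    return [1.0 if j == level else 0.0 for j in range(n_levels - 1)]

def utility(profile, n_levels, beta):
    """Linear utility x'beta for one profile (a tuple of 0-based level indices)."""
    x = []
    for level, n in zip(profile, n_levels):
        x.extend(effects_code(level, n))
    return sum(xi * bi for xi, bi in zip(x, beta))

def choice_probabilities(choice_set, n_levels, beta):
    """Multinomial logit: P(j) = exp(u_j) / sum_k exp(u_k)."""
    u = [utility(p, n_levels, beta) for p in choice_set]
    m = max(u)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(ui - m) for ui in u]
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical setup: disk, CPU, battery, price, each with 2 levels ordered
# least desirable first (for price, level 0 is the higher price).
# A prior mean of -0.5 on each first level says the better level is worth 1 unit more.
N_LEVELS = (2, 2, 2, 2)
PRIOR_MEAN = (-0.5, -0.5, -0.5, -0.5)

if __name__ == "__main__":
    tradeoff = [(1, 0, 1, 0), (0, 1, 0, 1)]             # each profile wins on two factors
    all_good_vs_all_bad = [(1, 1, 1, 1), (0, 0, 0, 0)]  # no trade-off at all
    print(choice_probabilities(tradeoff, N_LEVELS, PRIOR_MEAN))
    print(choice_probabilities(all_good_vs_all_bad, N_LEVELS, PRIOR_MEAN))
```

Under this prior, the trade-off pair is a coin flip (0.5 each), which is where a subject's answer carries information, while the all-good versus all-bad pair is predicted at about 0.98 for the better profile, so its answer is nearly known in advance.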
If we didn't specify the prior, then we could easily get choice sets where one profile included all of the better factor levels and the other included all of the worse factor levels; in such a case, the choice response would be trivially obvious, and the run would be wasted.
Now we specify the rest of the experiment we want:
[Figure: Design Generation settings]
Suppose we have 16 subjects lined up to take the choice survey. We figure that each subject has the patience to do six comparisons. Each choice set will be two profiles. We could ask people to choose among more, but that is more work for the subject; two is standard. We choose to do two surveys. This is a compromise between giving everyone the same questions and giving everyone his or her own separate survey with separately designed choice sets. The total number of subjects is the product of the last two specifications (2*8=16). The total number of choice responses is the product of the last three specifications (6*2*8=96).
Now there are two levels of design data here. First, there are the profiles that go into making each choice set. There are two profiles per choice set times six choice sets per survey times two surveys, making a table of 24 (2*6*2) unique choice profiles.
[Figure: Profiles table]
This structures the factor-level data so that you can prepare the raw material for the survey. Second, there is the subject-level data for the responses, showing which subjects get which survey and providing a slot to enter the response for each choice trial. Here are the rows for the first two subjects. The first subject is taking Survey 1, and the second subject is taking Survey 2.
[Figure: Responses table rows for the first two subjects]
The Choice1 and Choice2 values index the Choice ID value in the Profiles table that matches the Choice Set ID. For example, in row 10, Choice1 is Choice ID 1 for Choice Set 10 in Survey 2, which is row 19 in the Profiles table (80 Gig, 1.5 GHz, 4 hours, $1,000), and the other choice is the next profile, in row 20 (40 Gig, 2.0 GHz, 4 hours, $1,500).
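The counting in this paragraph, and the lookup from a Choice ID to a row of the Profiles table, can be sketched in a few lines of Python. The sketch assumes, consistent with the row 10 example, that the Profiles table is stored choice set by choice set in Choice ID order and that Choice Set IDs run consecutively across surveys (sets 1 to 6 in Survey 1, sets 7 to 12 in Survey 2); the function names are hypothetical.

```python
def design_sizes(profiles_per_set, sets_per_survey, n_surveys, respondents_per_survey):
    """Sizes implied by the Design Generation settings described in the post."""
    return {
        "subjects": n_surveys * respondents_per_survey,
        "choice_responses": sets_per_survey * n_surveys * respondents_per_survey,
        "unique_profiles": profiles_per_set * sets_per_survey * n_surveys,
    }

def profile_row(choice_set_id, choice_id, profiles_per_set):
    """1-based row in the Profiles table for a (Choice Set ID, Choice ID) pair,
    assuming rows are ordered by choice set, then by Choice ID."""
    return (choice_set_id - 1) * profiles_per_set + choice_id

def survey_for_set(choice_set_id, sets_per_survey):
    """Survey number that owns a choice set, assuming consecutive numbering."""
    return (choice_set_id - 1) // sets_per_survey + 1

if __name__ == "__main__":
    print(design_sizes(profiles_per_set=2, sets_per_survey=6, n_surveys=2, respondents_per_survey=8))
    # {'subjects': 16, 'choice_responses': 96, 'unique_profiles': 24}
    print(profile_row(10, 1, 2), profile_row(10, 2, 2), survey_for_set(10, 6))
    # 19 20 2
```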
Why have two tables instead of one? It turns out that you have a choice of one table or two. Let’s see whether this design follows the guidelines. Every choice must be a trade-off of desirable alternatives:
This tests whether you are willing to pay $300 more to get two more hours of battery life even if you also have to sacrifice speed. Trade-off of $300 and speed for battery life.
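The checks that follow (degenerate choices, factors held constant within a choice set, and all-good versus all-bad pairs) are mechanical enough to script. Here is a minimal Python sketch, assuming each profile is a tuple of level indices ordered from least to most desirable, as in the Prior Specification, and that all factors are polar. `audit_choice_design` is a hypothetical helper for illustration, not a JMP feature.

```python
from itertools import combinations

def audit_choice_design(choice_sets, n_factors):
    """Check a choice design against the choice-design guidelines.

    choice_sets: list of choice sets; each is a list of profiles, and each
    profile is a tuple of level indices where a higher index is more desirable.
    """
    degenerate = []   # sets containing two identical profiles
    dominated = []    # sets where one profile is at least as good on every factor
    constant_somewhere = [False] * n_factors
    max_changed = 0   # most attributes that change within any one set
    for set_id, profiles in enumerate(choice_sets, start=1):
        for a, b in combinations(profiles, 2):
            if a == b:
                degenerate.append(set_id)
            elif all(x >= y for x, y in zip(a, b)) or all(x <= y for x, y in zip(a, b)):
                dominated.append(set_id)
        changed = 0
        for f in range(n_factors):
            if len({p[f] for p in profiles}) == 1:
                constant_somewhere[f] = True
            else:
                changed += 1
        max_changed = max(max_changed, changed)
    return {
        "degenerate_sets": degenerate,
        "dominated_sets": dominated,
        "never_constant_factors": [f for f in range(n_factors) if not constant_somewhere[f]],
        "max_attributes_changed": max_changed,
    }

if __name__ == "__main__":
    # Hypothetical factors: 0 = disk, 1 = CPU, 2 = battery, 3 = price.
    design = [
        [(1, 0, 1, 0), (0, 1, 0, 1)],
        [(1, 0, 1, 1), (1, 1, 0, 0)],
    ]
    print(audit_choice_design(design, n_factors=4))
```

In this toy design, factor 3 (price) is never held constant, which is the same weakness the next paragraph points out for the laptop design.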
Are there any degenerate choices (i.e., where the choices are equal)? No. That's good.
For each factor, do we have choices where that factor is constant (so that a dominant factor can’t prevent the other factors from being measured)? Well, no. Price is always different in each choice set, so if price is totally dominant, we can’t measure other effects. If this is a concern, then we need to go back to the Design Generation field and change 4 to 3 in “Number of attributes that can change within a choice set.”
How about the polarity question? Polar factors should always have a mixture of polarity. That means the trade-offs should always be meaningful, not just all-good versus all-bad. This is where the Prior Specification works well. All of the choices are working pretty hard to measure values of interest. No choice is uninteresting.
Now we have an experimental design. Thanks to Brad Jones for this example.
Tuesday, December 16, 2008
Choice Experimental Designs Are Different
Laptop vendors need to know which features are valued in a laptop and how much customers are willing to pay for them. Manufacturers could learn this through a market research technique known as a choice experiment. This post covers the elements of experimental design for choice experiments using JMP 8.
But first, I've got to give credit where it's due. Both the R&D and examples used here are the work of Bradley Jones and Chris Gotwalt, who implemented the techniques in JMP 8. So let's design a choice experiment to figure out how valuable a number of features are to customers. In particular, we focus on the following:
In the experiment, a subject has to choose between two configurations, selecting the configuration that is most appealing. For example, one choice might be between an expensive but full-featured laptop, and a cheap but feature-compromised one.
Suppose we used an ordinary experimental design for a choice experiment. Each choice set could be specified like a block in a traditional design. Would that be a good idea? Consider the following choice between a and b:
These are runs in which the factors have the same values. There is no real choice here; there is no information to be gained because the choice is arbitrary. Thus, we have our first rule of choice experiments:
Guideline: In choice experiments, there are no within-block replicates, i.e., the alternatives tested have to be different in order to learn something.
This is quite different from, say, an industrial response surface design, where replicates are valuable in estimating factor effects with greater precision, in getting a more precise estimate of experimental error and in getting an estimate of pure error for a lack-of-fit test. So choice designs are different from other experimental designs.
Now let's consider a situation in which one factor outweighs all others. This could easily happen if the levels of one factor are spread much further apart than the levels of another, relative to how much subjects care about each. For example, suppose that the high price was so high that no one could ever choose it, regardless of the other factor values. Now consider running the typical classical design, in which all the factors vary within blocks. Suppose each subject is given three choice questions, each with two choices.
This design can tell us a lot about a dominant factor, like price. Can it show anything else? If price dominates the decision, the subject doesn't even have to look at the other factor values to make a decision. The other factors are not even measurable, other than being smaller than the price effect. You sacrifice learning about the other factors. To fix this, we need to keep some factors the same within a choice set for some of the trials.
Guideline: For each factor, you must have choice sets where that factor is constant across the choice set.
There is another reason that you shouldn't vary too many factors across a choice set: Subjects get confused and fatigued when there are too many differences for them to evaluate the trade-offs. If the two choices are very different, the choice will tend to look like a choice between two very different things; we say it is like comparing apples and oranges -- they are each good or bad in their own way, and they don't really compare against each other.
Guideline: Never vary more than three or four factors across a choice set.
There is another problem with ordinary designs. Consider the following choice set for a laptop:
This choice is going to be easy for the subject. There are no trade-offs to make. The laptop experiment was built for trade-offs because all the factors have a naturally preferred level. Faster is always preferred to slower. Larger disk is always preferred to smaller. Longer battery life is always preferred to shorter. Cheaper price is always preferred to more expensive, other things being equal. This experiment has all polar factors. Thus, this choice set doesn't tell us anything we don't already know. This choice set is an insult to the subject. Yet traditional experimental designs will produce runs like this. Guideline: Polar factors should always have a mixture of polarity. No choice set should have all the polar factors set in the same direction within choices. There are other issues with some surveys:
All these rules sound pretty obvious in retrospect, right? The irony is that these considerations are not usually followed, especially the last few. Often market researchers use the same design of experiments (DOE) software for choice experiments that they use for industrial experiments, thinking that DOE is an abstract and general concept that is the same in every situation. It is not the same. Choice experiments are harder to design well.
One good approach to the very specific needs of each choice experiment is to use the tools of optimal experimental design, adapted to the specific needs of choice experiments and of each individual situation. The technique of optimal design in general arranges factor settings in runs so that the most is learned from a given number of runs. In linear models, the optimal arrangement does not depend on the actual parameter values, so the situation is straightforward. It turns out that choice models, which are fit with a specialized kind of logistic regression, are not linear in the parameters, so the optimal design depends on the true values of the parameters, which are unknown. So some range, or prior distribution, of the parameters is used to represent the range you need to consider. Optimizing the design for this is fairly difficult, involving integrating over the prior distribution to create the optimal design. [Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: a review. Statistical Science 10: 273-304.] But JMP was able to take what it had learned for Nonlinear DOE and apply it to Choice designs. [Gotwalt, C., Jones, B. and Steinberg, D. (2009). Fast Computation of Designs Robust to Parameter Uncertainty for Nonlinear Settings. Accepted at Technometrics.]
So a good experimental design for a choice experiment is different from that for other experiments, and optimal design techniques can handle them best.
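To show why the optimal choice design depends on the unknown parameters, here is a minimal Python sketch of a Bayesian D-criterion for a multinomial logit choice design. Each choice set's Fisher information is X'(diag(p) - pp')X, which depends on the parameters β through the choice probabilities p, so the sketch averages the log-determinant of the information matrix over draws from an independent normal prior. The plain Monte Carlo averaging is a simplification chosen for brevity, not the fast method of the cited Gotwalt, Jones and Steinberg work, and all names here are hypothetical.

```python
import math
import random

def mnl_information(design, beta):
    """Fisher information for a multinomial logit choice design.

    design: list of choice sets; each choice set is a list of coded attribute
    vectors x_j. Each set contributes sum_j p_j (x_j - xbar)(x_j - xbar)',
    which equals X'(diag(p) - p p')X.
    """
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for choice_set in design:
        u = [sum(xi * bi for xi, bi in zip(x, beta)) for x in choice_set]
        m = max(u)
        w = [math.exp(ui - m) for ui in u]
        total = sum(w)
        p = [wi / total for wi in w]
        xbar = [sum(pj * x[i] for pj, x in zip(p, choice_set)) for i in range(k)]
        for pj, x in zip(p, choice_set):
            dev = [x[i] - xbar[i] for i in range(k)]
            for r in range(k):
                for s in range(k):
                    info[r][s] += pj * dev[r] * dev[s]
    return info

def log_det(a):
    """log|det(a)| by Gaussian elimination with partial pivoting; -inf if singular."""
    a = [row[:] for row in a]
    n = len(a)
    total = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            return float("-inf")
        a[col], a[piv] = a[piv], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return total

def bayesian_d_criterion(design, prior_mean, prior_sd, n_draws=1000, seed=1):
    """Average log det of the information over prior draws (larger is better)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta = [rng.gauss(m, s) for m, s in zip(prior_mean, prior_sd)]
        total += log_det(mnl_information(design, beta))
    return total / n_draws

if __name__ == "__main__":
    # Two hypothetical +/-1 coded polar attributes; the prior says the +1 level
    # of each is preferred by about 1 unit, with prior variance 1.
    mean, sd = [1.0, 1.0], [1.0, 1.0]
    mixed = [[(1, -1), (-1, 1)], [(1, 1), (-1, 1)], [(1, 1), (1, -1)]]
    has_dominant_pair = [[(1, 1), (-1, -1)], [(1, -1), (-1, 1)], [(1, 1), (-1, 1)]]
    print(bayesian_d_criterion(mixed, mean, sd))
    print(bayesian_d_criterion(has_dominant_pair, mean, sd))
```

The all-good versus all-bad pair contributes little because, at the prior mean β = (1, 1), it is predicted about 98 percent one way, so p(1 - p) is tiny; at that β the information determinant is about 1.02 for `mixed` versus about 0.73 for `has_dominant_pair`. The Bayesian criterion asks the same question averaged over the whole prior.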
In my next post, we'll see how the laptop experiment was actually designed and run.
Monday, August 4, 2008
Experiments on Experiments, Models of Models
(NOTE: This is part three of a three-part series on stochastic optimization.)
Over the last two weeks, I introduced robust process engineering and stochastic optimization – the effort to achieve good product in the face of variation among the factors. Last week, I gave a cooking example. This week, I present a solution to the optimization problem.
In-Silico Surrogate
The inspiration for the solution comes from the world of computer experiments, also called in-silico science. Suppose you want to build the optimal passenger jet. You have factors like wing length, wing pitch, engine size and body composition, and you have responses like fuel economy, passenger volume, noise and speed. You create an experimental design with 64 runs, and you are ready to go. No problem. Each plane will cost around $85 million, so that makes the experiment cost around $5.44 billion. Oops. Your experimental budget is just $40,000. What do you do?
You don’t build those planes. You create computer models of them and run those computer models to determine the performance characteristics. The planes are flown in-silico. If the models are good, they will report responses that are reasonably close to the real values. So you could, theoretically, optimize the characteristics using these models.
But those in-silico models are expensive, too. Each run for the computational fluid dynamics could take hours on a supercomputer. So you develop what is called a surrogate model, or meta model. You make a space-filling experimental design that samples 100 or more factor combinations in the factor space. Then you carry out the runs on the supercomputer. Then you fit an interpolation model to those points, and now, instead of taking hours for each point in the visualization or optimization, the interpolation model takes a fraction of a second to evaluate.
We now have a three-stage model for a computer experiment:
Real World << Expensive Computer Model << Cheap Surrogate Model
So let's return to the stochastic optimization problem.
'Cooking' a Chemical Process
We needed to determine whether we should “cook” a chemical process “hot and fast” or “warm and slow.” If the factors could be fixed, the hot and fast settings would be best. But because the factors are subject to variation, we already know that “hot and fast” is not a good setting; the variation will cause about 4 percent of the batches to yield below the minimum of .55, and those batches will have to be discarded. So now we develop a surrogate model of the defect rate:
Chemical Process << Model + Variation Simulation << Surrogate Model for Defects
Though our model of chemical yield is cheap and easy, the model of the defect rate in the presence of variation is not, and it is obtained through Monte Carlo simulation. We generate 10,000 runs of random factor data at given factor center settings, calculate the yield for each run, and estimate the defect rate as the portion of the simulations that fall below the lower specification limit. We already know defect rates for two points:
Remember that these are not fixed factor settings, but centers of the distributions of the factor settings, which have underlying variation with standard deviations of 1 and .03, respectively. Those defect rates are just estimates based on simulation. If you do a new Monte Carlo simulation, you will get slightly different values.
The Surrogate Experiment
So now we need to systematically vary the centers of the temperature and time distributions according to a space-filling experimental design. We use space-filling designs because we expect a complex surface, and we can afford to investigate that surface. The workhorse space-filling design is the Latin Hypercube. These are easy to make. You just make an evenly spaced set of values for each factor and scramble them individually. The result will have a uniform distribution across each factor and at least a random joint distribution. The JMP Design of Experiments platform actually optimizes the scrambles to fill the space better. If the runs are computer experiments, you don’t have to worry about randomization and replication because there are no outside factors to randomize against.
The Profiler Simulator in JMP has a built-in feature called “Simulation Experiment” that makes all this very easy. It prompts you to enter the number of runs and to identify the portion of the factor space you want to investigate (around current settings), and it performs the simulations and estimates the defect rates. In our case, we will ask it to run a computer experiment in 80 runs across the whole factor space. This is a lot of work: for each of 80 defect rate estimates, the software does 10,000 runs. Fortunately, computers are fast, so this takes less than half a minute. Here is how the space-filling design arranges the points and what the defect rates are at each point.
[Figure: space-filling design points with their simulated defect rates]
Now we need to model this defect surface. The emerging standard fitting technique for computer models is the Gaussian Process model.
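Here is a minimal Python sketch of the two ingredients just described: a random Latin hypercube (evenly spaced values for each factor, scrambled independently) and a Gaussian Process interpolator. Several simplifications are assumptions of this sketch, not JMP's method: the scrambles are random rather than optimized, the Gaussian correlation parameters `theta` are fixed by hand instead of estimated by maximum likelihood, and a tiny `nugget` is added to the diagonal for numerical stability. The predictor is the usual kriging form: a constant mean plus a correlation-weighted combination of the residuals at the design points.

```python
import math
import random

def latin_hypercube(n_runs, bounds, seed=None):
    """n_runs evenly spaced cell midpoints per factor, each column shuffled independently."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        step = (hi - lo) / n_runs
        col = [lo + (i + 0.5) * step for i in range(n_runs)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_runs)]

def gaussian_corr(a, b, theta):
    """Gaussian correlation exp(-sum_i theta_i (a_i - b_i)^2)."""
    return math.exp(-sum(t * (x - y) ** 2 for t, x, y in zip(theta, a, b)))

def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_gp(X, y, theta, nugget=1e-10):
    """Return a predictor x -> mu + r(x)' R^-1 (y - mu 1), the kriging / GP mean."""
    n = len(X)
    R = [[gaussian_corr(X[i], X[j], theta) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    r_inv_ones = solve(R, [1.0] * n)
    r_inv_y = solve(R, y)
    mu = sum(r_inv_y) / sum(r_inv_ones)  # generalized least squares estimate of the constant mean
    weights = solve(R, [yi - mu for yi in y])

    def predict(x):
        return mu + sum(w * gaussian_corr(x, xi, theta) for w, xi in zip(weights, X))
    return predict

if __name__ == "__main__":
    # Hypothetical stand-in for an expensive simulation (e.g., a log defect rate).
    def expensive(x):
        return math.sin(3 * x[0]) + x[1] ** 2
    design = latin_hypercube(20, [(0.0, 1.0), (0.0, 1.0)], seed=42)
    surrogate = fit_gp(design, [expensive(p) for p in design], theta=[10.0, 10.0])
    print(surrogate((0.5, 0.5)), expensive((0.5, 0.5)))
```

Once fitted, the surrogate is cheap to evaluate, so it can be handed to any optimizer or grid search in place of the expensive simulation.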
The Gaussian Process model essentially calculates a weighted average of the neighboring points to predict each point on the surface. (Kriging and radial basis function neural nets are close relatives of Gaussian Process models.) After we fit the surface, we call the optimizer to find the minimum on this surface.
[Figure: Profiler at the minimum of the fitted defect-rate surface]
Now we know that to minimize defects, we cook it warm (526 degrees) and slow (.287), the opposite of the optimum for fixed-factor settings, which was hot and fast. The predicted log10 defect rate is -3.206, that is, a defect rate of 10^-3.206, or about 0.000622, much smaller than the 4 percent defect rate at the fixed-factor optimum. A cross-section at the minimum of the surface looks like this:
[Figure: cross-section of the defect-rate surface at the minimum]
Now let’s use simulation again to see whether the defect rate holds up to this prediction.
[Figure: simulation results at the new settings]
The actual rate in this simulation is .0007. We have dropped our defect rate from 4 percent to .07 percent, about one-sixtieth of the defects at the previous settings. How about the average yield? Before, the average yield was .602; now it is .595, a small price to pay for the decreased variation.
Conclusion
This new technique worked when previous techniques (which involved finding the flats) didn’t work. Not only did it work, but it also enabled us to build an understanding of the defect rate behavior as a separate response surface that can be visualized, as well as optimized.
What about the older techniques? If the variation is small relative to the curvature in the response surface, then local methods using the derivatives still work well. If the variation is large enough to be affected by the curvature (second derivative) of the response surface, then you need to switch to simulation experiments.
With surrogate models, we now have a great new way to do stochastic optimization. Now we can tune our processes to be robust to variation in the factors, improving quality and reducing waste.
Monday, July 28, 2008
Cooking Optimization: Should You Cook Hot and Fast, or Warm and Slow?
(NOTE: This is part two of a three-part series on stochastic optimization.)
In my previous post, I introduced stochastic optimization. In this post, I show a real example. This example was reported in the classic text by George Box and Norman Draper: Empirical Model-Building and Response Surfaces (page 32), and JMP's Statistical R&D Director Brad Jones noticed that it makes a great robust process engineering example. Imagine you are doing serious cooking, but instead of making food, you are cooking up chemicals, perhaps even a life-saving drug. Your cooking pot is really a chemical reactor, and people are going to depend on your product to save lives. The reaction that cooks your chemical product has two big controllable factors: temperature and reaction time.
The reaction you make converts the initial ingredient, A, into the chemical you want, B. But if you cook it too hot and long, the B that you make will turn into another chemical, C. Here is the picture. Remember that we want to maximize the green B, and minimize the blue and red, A and C:
[Figure: amounts of A (blue), B (green) and C (red) over the course of the reaction]
This fits a classic optimization framework that is certain to have a nice optimum that maximizes the yield of B. Here are the formulas as they are in the JMP table. The yield formula is a function of time and the reaction rates; the reaction rates are also formulas, functions of the temperature. We don’t even have to estimate the parameters theta1 to theta2; they are already known. The reaction temperature is already in Kelvin, so these are basically Arrhenius-type models, well known to chemists.
[Figures: JMP column formulas for Yield, k1 and k2]
So let's optimize. In JMP, we use the Profiler to visualize cross-sections of the response surface for yield, and we use a command there to find the settings that maximize yield. Here, we see that we must cook hot and fast to maximize yield at .621 (temperature at 539.95 degrees, time at .1158).
[Figure: Profiler at the maximum-yield settings]
Another perspective, using horizontal cross-sections, is available with the Contour Profiler, where we can see various combinations of temperature and time that will produce a good yield of at least .60 (unshaded) or .61 (inside the red contour line), with the crosshairs at the optimal settings that produce a yield of .621.
[Figure: Contour Profiler for yield]
But we can’t really control the temperature or the reaction time exactly. The temperature and time vary, at least in a production situation. Suppose that the standard deviation of temperature is 1 and the standard deviation of time is .03. In the contour plot, that is represented by the black ellipse, which would contain 95 percent of the variation in the two factors. Notice that the variation in time is going to mean that many batches will fall into the pink zone and fail to achieve even a yield of .60. How bad will it be?
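Here is a minimal Python sketch of the same structure: first-order consecutive reactions A → B → C with Arrhenius-type rate constants, the fixed-factor maximum found by a grid search, and a Monte Carlo defect rate of the kind the Profiler simulator reports. The formula images are not reproduced here, so the kinetic constants below are hypothetical placeholders chosen only so that hot-and-fast is best for fixed factors; they are not Box and Draper's values, and this code will not reproduce the post's .621 yield or 4.2 percent defect rate. The yield expression is the textbook solution for the fraction of B formed from pure A; the simulation defaults (SDs of 1 and .03, lower limit .55, 10,000 runs) come from the post.

```python
import math
import random

# Hypothetical kinetics (NOT the book's parameters): k = k_ref * exp(-E * (1/T - 1/T_REF)),
# which is the Arrhenius form theta * exp(-theta' / T) rewritten around a reference temperature.
T_REF = 540.0
K1_REF, E1 = 10.0, 20000.0   # A -> B
K2_REF, E2 = 2.0, 10000.0    # B -> C (lower activation energy, so heat favors A -> B)

def rate(k_ref, e, temp):
    return k_ref * math.exp(-e * (1.0 / temp - 1.0 / T_REF))

def yield_b(temp, time):
    """Fraction of A converted to B after `time`, starting from pure A."""
    k1, k2 = rate(K1_REF, E1, temp), rate(K2_REF, E2, temp)
    if abs(k1 - k2) < 1e-12:
        return k1 * time * math.exp(-k1 * time)   # limiting form when k1 == k2
    return k1 / (k2 - k1) * (math.exp(-k1 * time) - math.exp(-k2 * time))

def grid_maximize(temps, times):
    """Return (yield, temp, time) with the highest fixed-factor yield on the grid."""
    return max((yield_b(T, t), T, t) for T in temps for t in times)

def defect_rate(temp_center, time_center, sd_temp=1.0, sd_time=0.03,
                lsl=0.55, n=10_000, seed=0):
    """Monte Carlo fraction of batches whose yield falls below the lower spec limit."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(n):
        temp = rng.gauss(temp_center, sd_temp)
        time = max(0.0, rng.gauss(time_center, sd_time))  # a reaction time cannot be negative
        if yield_b(temp, time) < lsl:
            bad += 1
    return bad / n

if __name__ == "__main__":
    temps = [520.0 + 0.5 * i for i in range(41)]    # hypothetical range 520..540
    times = [0.05 + 0.005 * j for j in range(91)]   # hypothetical range 0.05..0.5
    best_yield, best_temp, best_time = grid_maximize(temps, times)
    print(best_yield, best_temp, best_time)
    print(defect_rate(best_temp, best_time))
```

With these placeholder constants the fixed-factor optimum lands at the hot edge of the grid, mirroring the post's hot-and-fast result; `defect_rate` then answers "How bad will it be?" for any pair of factor centers.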
The Profiler has a built-in simulation facility, so we enter the standard deviations there and click the Simulate button.
[Figure: Profiler simulator results at the maximum-yield settings]
We have a lower specification limit of .55 for yield, which the Profiler's simulator shows as a red line on the histogram. If a batch fails to achieve .55, it must be discarded. At the current settings for the center of temperature and time, the process produces 4.2 percent bad batches. That is not good.
Let's try other settings. Suppose that I lower the temperature to 535 and then set time to the point that maximizes yield for that temperature. There, my defect rate goes down to around 1.9 percent, which is much better. So the combinations that maximize a fixed yield do not minimize a defect rate in the presence of variation in the inputs.
[Figure: simulator results at temperature 535]
Remember my blog post about finding the “flats”? Most optimization ends up on a hill against some component limit. But if we find a flatter place, it will reduce the variation. The definition of flatness is that the slopes are very small or zero in every direction. We can model those slopes (gradients, derivatives). The Profiler has a built-in feature to specify that one or more factors are “noise factors,” so that it models the derivatives of the response surface with respect to those noise factors and tries to jointly maximize yield and minimize the slope. After optimizing, we see that we are now on a flat area where the gradient is near zero in both directions.
[Figure: Profiler at the flat spot]
Now we use the simulator to calculate the defect rate. It is 3.3 percent. This is not much different from the fixed optimum; in fact, note that the factor settings are not much different from the fixed-optimal hot-and-fast settings.
[Figure: simulator results at the flat spot]
Haven’t we landed on a flat spot? Take a look at a surface plot.
[Figure: surface plot of yield, with grids at the current settings]
The two grids intersect at the current values, and you see that we have landed on a relatively flat spot near the top of the hill. But it is on the top of a fairly narrow ridge.
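The flat-spot reasoning, and the narrow-ridge problem, can be made concrete with the standard Taylor-series approximation for how input variation is transmitted to the response. For independent normal inputs with standard deviations σ_i, a second-order expansion gives Var[y] ≈ Σ g_i² σ_i² + ½ Σ_i Σ_j H_ij² σ_i² σ_j², where g is the gradient and H the Hessian at the factor centers (exact when y is quadratic). A flat spot makes the first term small, but on a narrow ridge the second term can still be large. This Python sketch estimates both terms by central finite differences; the function name and the step-size choice (a small fraction of each input's SD) are assumptions of the sketch, and it is not meant to describe how the JMP Profiler computes its noise-factor derivatives.

```python
def transmitted_variance(f, center, sds, rel_step=1e-3):
    """Return (first_order, second_order_total) approximations of Var[f(X)]
    for independent normal inputs X_i ~ N(center_i, sds_i^2). SDs must be > 0."""
    n = len(center)
    steps = [rel_step * s for s in sds]   # finite-difference step per input

    def shifted(*moves):
        x = list(center)
        for i, k in moves:                # move input i by k steps
            x[i] += k * steps[i]
        return f(x)

    f0 = f(list(center))
    grad = [(shifted((i, 1)) - shifted((i, -1))) / (2 * steps[i]) for i in range(n)]
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        hess[i][i] = (shifted((i, 1)) - 2 * f0 + shifted((i, -1))) / steps[i] ** 2
        for j in range(i + 1, n):
            hij = (shifted((i, 1), (j, 1)) - shifted((i, 1), (j, -1))
                   - shifted((i, -1), (j, 1)) + shifted((i, -1), (j, -1))) / (4 * steps[i] * steps[j])
            hess[i][j] = hess[j][i] = hij
    first = sum((g * s) ** 2 for g, s in zip(grad, sds))
    second = 0.5 * sum((hess[i][j] * sds[i] * sds[j]) ** 2 for i in range(n) for j in range(n))
    return first, first + second

if __name__ == "__main__":
    def ridge(x):
        # Hypothetical narrow ridge along x0 == x1: flat along the crest, steep across it.
        return 1.0 - 50.0 * (x[0] - x[1]) ** 2
    print(transmitted_variance(ridge, [0.3, 0.3], [0.05, 0.05]))
```

On the crest of this toy ridge the first-order term is essentially zero, so a gradient-only criterion calls it flat, yet the second-order term is not zero; that is the situation the surface plot shows.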
Even though the first derivatives may be small at this spot, the second derivative is large, because the sharp bending leads to a steep drop-off from that point. So we might consider finding a flat spot in a second-degree sense. But there are better ways to go about finding the stochastic optimum: finding the factor centers that minimize the defect rate. Stochastic programming does this. But stochastic programming is hard. How can we make this simple? The answer will be in my next blog post. It turns out that we can reduce defects by an order of magnitude with this technique, so it is very valuable. We need to move from hot-and-fast to cooler-and-slower to achieve this, and there is a great way to find the best settings for this. UPDATE: The third blog post in this series is also available.
Wednesday, July 9, 2008
The Challenge of Optimizing Products and Processes
(NOTE: This is part one of a three-part series on stochastic optimization.)
To get to the top of a hill, you just keep going up. However, hills can have subpeaks, so sometimes you have to hunt around to keep going up. But going up is still the basic idea. This is what optimization is: finding the top of the hill. Operations research is about solving optimization problems more generally, with higher-dimensional hills that might have fenced areas that are off-limits.
Now imagine that instead of climbing the hill, you ride in a helicopter; you just tell the helicopter where to go, and then you parachute down from 5,000 feet above that location. Sounds easy. But there are clouds, so you can't see the hill itself, and there are random gusts of wind that can blow you hundreds of meters in any direction. Also, you have to land above a certain altitude, or you will get sick. You do get a few trial drops at different GPS locations, but you have to live around that target location, and you get one jump a day. Welcome to the world of stochastic optimization. Getting to high altitude is now a very messy business.
Why study something that behaves this strangely and is this frustratingly difficult to understand? Well, it turns out that the future quality of the world's products and processes depends on just this type of situation. We try to optimize our products and processes, but then it turns out that the input factors vary, and the products and processes are no longer optimal.
The input factors might change due to environmental factors: You know how to grow the best-yielding corn crop, but unfortunately, you can't seem to control the weather to get the optimal yield. The input factors may vary due to natural variation: Your ingredients are the output of some other process, and you can't get all the variability out of that process; you can often control where the center of each factor's distribution is, but you can't reduce the variation.
The literature on this kind of optimization is not particularly rich.
The field of study for this application is called robust process engineering: the struggle to make products and processes that behave well in the face of variation. The first good attempt at solving this kind of problem came from a Japanese engineer, Genichi Taguchi. He said that you construct an experiment in two directions. There are the Control factors, which you assume are fixed and not subject to random variation. Then there are the Noise factors, which you can't control completely in production; they have random variation. In an experiment, you might be able to control them; for example, you can control the weather for a corn crop by growing it indoors and controlling light and water. (In agriculture, that kind of experimental facility has a name: phytotron.) Then you cross the two designs, running every Noise setting at every Control setting. Next, for each Control setting, you measure how much the response varies across the Noise design. Then you optimize with respect to both mean and variation, or some combined measure, a so-called signal-to-noise ratio.
This worked. Taguchi clubs sprouted up all over the world, and engineers learned Taguchi’s method. Some Western statisticians looked at the method and said, “We can do better.” Various schemes emerged, along with a recognition of what you should be looking for: There may be a lot of places on the hill that have good altitude, but among those good places, try to find the place that has the widest, flattest area around it. Then when you are randomly dropped around that target, you are likely to land in a narrow range of altitudes.
For example, below you see the contours around Longs Peak in Rocky Mountain National Park. If you want to parachute to above 13,400 feet, then rather than aiming for the peak above 14,200 feet and risking going off-course and landing at 12,400 feet off the northwest face, you aim for “The Loft,” which is a wide target above 13,400 feet.
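The crossed-array bookkeeping can be sketched in a few lines of Python (numpy assumed). The response values below are made up for illustration; the two signal-to-noise ratios are the standard Taguchi "larger-the-better" and "nominal-the-best" forms, shown as examples of a combined measure rather than as the specific one any particular study used.

```python
import numpy as np

def taguchi_summary(y):
    """Summarize a crossed (inner x outer) array.

    y[i, j] is the response at Control setting i and Noise setting j.
    Returns, per Control setting: mean, standard deviation, and two of
    Taguchi's signal-to-noise ratios (in decibels)."""
    y = np.asarray(y, dtype=float)
    mean = y.mean(axis=1)
    sd = y.std(axis=1, ddof=1)                                  # variation across the Noise runs
    sn_larger = -10.0 * np.log10(np.mean(1.0 / y**2, axis=1))  # larger-the-better
    sn_nominal = 10.0 * np.log10(mean**2 / sd**2)               # nominal-the-best
    return mean, sd, sn_larger, sn_nominal

# Hypothetical responses: 2 Control settings (rows) crossed with 4 Noise settings (columns).
y = [[10.0, 12.0, 8.0, 11.0],   # setting A: higher mean, sensitive to noise
     [9.5, 9.8, 9.6, 9.7]]      # setting B: slightly lower mean, robust to noise
mean, sd, sn_larger, sn_nominal = taguchi_summary(y)
print("mean:", mean)
print("sd:", sd)
print("S/N larger-the-better:", sn_larger)
print("S/N nominal-the-best:", sn_nominal)
```

Setting A has the higher mean, while setting B has a far higher nominal-the-best ratio because its response barely moves across the Noise runs; which setting looks best depends on the summary you choose to optimize.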
[Figure: contour map of the area around Longs Peak, Rocky Mountain National Park]
In my next blog post, we will see a real example involving some high-tech cooking, a chemical reaction example, and how a classic example with a well-known optimum gets its lesson reversed when variation is taken into account.
UPDATE: The second and third blog posts in this series are now available.
Credits: Warren Sarle wrote two neural net papers years ago about how optimizing is like climbing to the top of a hill, and they inspired my analogy. The map is from Google Maps.
Thursday, May 29. 2008
JMP's Director of Statistical R&D Honored as ASA Fellow
Brad Jones, JMP's Director of Statistical R&D, has been elected a Fellow of the American Statistical Association, the most prominent professional statistical society in the US. This honor recognizes "outstanding professional contributions to and leadership in the field of statistical science."
Brad has a career-long passion for the field of optimal experimental design. Experiments are the way you learn, by trial and error, how to make things work better, and optimal designed experiments are simply the way you learn the most from a given number of experimental trials. The world is too complicated to discover things by pure theory; we need some experimental data to find out how the world works. For about 25 years, Brad has been working on a Big Idea that has shaped most of his work. The Big Idea is that experimental design has to fit naturally within the workflow of an engineer, and the best way to make that happen is for great software to support that workflow. For new cutting-edge software to evolve, statisticians had to change the way they think about experimental design. In the past, statisticians created designs from algebraic and geometric patterns; the resulting designs could accommodate only certain situations with fixed numbers of runs and factors. For example, if you had a budget for 17 runs on five categorical factors, you had to throw one run away to get a classical design. In the classical design world, there was a lot to learn, and that learning burden was an impediment to engineers. Brad's Big Idea for the engineering workflow turned experimental design upside down. Instead of forcing conditions to fit a tabled design, you tell the computer software what your experiment is all about and how many runs you can afford, and the software creates a custom-built design for that situation: a design that is optimal for learning what you need to know from the experiment. Brad embraced the field of optimal experimental design and, with other statisticians in the field, pushed all the boundaries to make it a rich and robust field. One early breakthrough, by Chris Nachtsheim and Ruth Meyer, was a general algorithm for optimizing the design, called coordinate exchange. Brad adopted and refined the coordinate exchange algorithm.
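Here is a minimal Python (numpy) sketch of the coordinate-exchange idea for the 17-run, five-factor case mentioned above. It is not JMP's implementation. The assumptions are ours: each categorical factor has two levels coded -1 and +1, the model has main effects only, the criterion is D-optimality (maximize det(X'X)), and the number of random restarts is arbitrary. Each pass visits every cell of the design matrix, tries the other level, and keeps the change only if det(X'X) increases.

```python
import numpy as np

def log_d(design):
    """log det(X'X) for a main-effects model: intercept plus one column per factor."""
    X = np.column_stack([np.ones(len(design)), design])
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

def coordinate_exchange(n_runs, n_factors, levels=(-1.0, 1.0), n_starts=10, seed=0):
    """Search for a D-optimal design by changing one coordinate (run i, factor j)
    at a time, keeping a change only if it increases det(X'X)."""
    rng = np.random.default_rng(seed)
    best_design, best_value = None, -np.inf
    for _ in range(n_starts):                      # random restarts guard against local optima
        design = rng.choice(levels, size=(n_runs, n_factors))
        current = log_d(design)
        improved = True
        while improved:                            # sweep until a full pass changes nothing
            improved = False
            for i in range(n_runs):
                for j in range(n_factors):
                    for level in levels:
                        old = design[i, j]
                        if level == old:
                            continue
                        design[i, j] = level
                        value = log_d(design)
                        if value > current + 1e-9:
                            current, improved = value, True   # keep the improvement
                        else:
                            design[i, j] = old                # undo the change
        if current > best_value:
            best_design, best_value = design.copy(), current
    return best_design, best_value

# The situation from the text: 17 runs, five factors (two levels each is our assumption).
design, value = coordinate_exchange(17, 5)
print(design)
print("log det(X'X):", value)
```

The search works directly with whatever run count you can afford, so no run has to be thrown away to match a tabled design.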
Another significant development was Brad's work with co-author Bill DuMouchel on Bayesian D-optimal designs. These designs try to estimate as many potential interaction effects as possible, even when they are not all estimable. Brad then recognized that the Bayesian D-optimal method could be applied even to main effects, and he pioneered the optimal design of what are called supersaturated designs, which have more factors than runs. This occurs in screening situations where you expect only a few factors to be large, and the objective is to identify these large factors. In collaboration with Chris Gotwalt at SAS and other statisticians, Brad went on to pioneer I-optimal designs, various space-filling designs, mixture designs, split-plot designs, designs for nonlinear models, spherical designs, choice designs for market research, and designs for computer experiments. For the Big Idea to work, that is, to get engineers used to designing experiments, there also had to be fitting and analysis tools. Here Brad invented a way to visualize a response surface by taking vertical cross sections across each factor, given fixed values of the other factors. This tool, called the Profiler, is now implemented in just about every DOE fitting system. When Brad joined SAS to work with JMP, the Profiler was extended to provide complete optimization services, and it now includes optimization in the presence of variability (stochastic optimization). For the Big Idea to gain traction, Brad had to evangelize. To that end, he has established ties with many leading DOE researchers and has jointly authored papers with some of them. Brad is a regular speaker at academic statistics seminars, JMP seminars, and academic meetings. One recent meeting was the International Conference on Design of Industrial Experiments at the University of Antwerp, where Brad gave the public defense of his PhD dissertation.
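The step that makes supersaturated designs possible can be seen numerically. With more model terms than runs, X'X is singular, so the ordinary D criterion det(X'X) is zero for every design. The Bayesian version adds a prior precision matrix K/τ², with zeros on the diagonal for primary terms and ones for potential terms. This sketch reflects our reading of the DuMouchel-Jones formulation and omits their scaling of the potential terms; treating the intercept as the only primary term and every main effect as a potential term is an assumption made here for illustration.

```python
import numpy as np

def bayesian_d_logdet(X, n_primary, tau=1.0):
    """log det(X'X + K / tau^2), where K is diagonal with 0 for the first
    n_primary columns (primary terms) and 1 for the rest (potential terms)."""
    p = X.shape[1]
    K = np.diag([0.0] * n_primary + [1.0] * (p - n_primary))
    sign, logdet = np.linalg.slogdet(X.T @ X + K / tau**2)
    return logdet if sign > 0 else -np.inf

# A hypothetical supersaturated setting: 6 runs, 8 two-level factors.
rng = np.random.default_rng(0)
runs, factors = 6, 8
D = rng.choice([-1.0, 1.0], size=(runs, factors))
X = np.column_stack([np.ones(runs), D])    # intercept + 8 main effects = 9 columns

print("rank of X'X:", np.linalg.matrix_rank(X.T @ X), "of", X.shape[1])  # at most 6
print("Bayesian log-D:", bayesian_d_logdet(X, n_primary=1))           # finite
```

Because the prior term makes the matrix positive definite, a design search such as coordinate exchange could maximize this quantity even though det(X'X) itself is always zero here.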
In the last two years, Brad has submitted numerous papers for publication, most of them with co-authors. For seminars, Brad and I developed a popular demonstration we call the "card trick," which involves doing a live supersaturated screening experiment with the audience. On the side, Brad is a concert violinist, a highly ranked Go player and author, and a marathon runner, and he has bred show dogs that compete nationally. He was a published author in the field of photochemistry at age 17. I am delighted to see Brad recognized as an ASA Fellow.
Posted by John Sall in Design of Experiments (DOE), JMP - General, Statistics at 16:27
Friday, October 5. 2007
Superposition Magic - How can you identify several clusters that are at the same place at the same time?
If physicists can have their superposition magic, then so can statisticians. Suppose that you have lots of points across three variables in three groups. You need to count the number of points in each group. But you don't know which group each point comes from. And, by the way, each group has the same mean across all the variables. Try inventing an approach before you read the rest of this blog entry.
Grouping points is usually the job of cluster analysis. Computer scientists have a more colorful name for this: unsupervised learning. It's easy. You just cluster the data so that points that are near each other form clusters, assign each point to the closest cluster, count the points in each cluster, and you are done. But these clustering methods never allow the clusters to overlap, much less have the same centers. So ordinary clustering just won't do this job.
Before we solve the problem, we have to be a little more specific about the data and generate a problem data set. The data will be multivariate normal, and though each cluster will have the same mean, we will distinguish the clusters by giving each one a different covariance structure. To keep things simple, we will generate uncorrelated multivariate normal data, with each of the three clusters having a larger variance in one component direction that is unique to that cluster. So we generate a table with a variance of 1 for each variable in each cluster, except that each cluster has one component with a variance of 4 instead of 1. The result is that each group sticks out in a different direction, though they all have the same center.
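The original JSL script survives only as an image caption, so here is a hypothetical Python (numpy) translation of the setup just described, not the author's script. The group sizes of 350, 350, and 300 follow the 35%, 35%, 30% split reported in this post; the total of 1,000 rows and the seed are our own choices.

```python
import numpy as np

def make_superposed_clusters(counts=(350, 350, 300), seed=1):
    """Three uncorrelated multivariate normal groups, all centered at the origin.
    Every variance is 1, except that group g has variance 4 on variable g."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for g, n_g in enumerate(counts):
        sd = np.ones(3)
        sd[g] = 2.0                                # standard deviation 2 -> variance 4
        blocks.append(rng.normal(0.0, sd, size=(n_g, 3)))
        labels.append(np.full(n_g, g))
    return np.vstack(blocks), np.concatenate(labels)

X, group = make_superposed_clusters()
for g in range(3):
    # Per-group means should be near 0; the variance is near 4 on variable g, near 1 elsewhere.
    print(g, X[group == g].mean(axis=0).round(2), X[group == g].var(axis=0).round(2))
```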
Here is the JSL code that I used to make some data:
[Figure: JSL script to generate data]
And here is a picture of the multivariate normal density contours that result:
[Figure: Normal ellipsoid contours for 3 groups]
Notice that the groups with the large X and Y variances each have 35% of the data, and the group with the large Z variance has only 30%. If the method estimates these proportions well, then the problem is solved. We don't need to identify each point; we just need to estimate the proportions, or equivalently, the number of points in each group.
Here is the secret: Instead of the usual hierarchical or k-means clustering, we fit normal mixtures. That is, we fit the means, variances, covariances, and relative proportions of each group so as to maximize the likelihood of the data. This was implemented in JMP by Chris Gotwalt. Here are the results:
[Figure: Normal Mixtures results]
Notice how well we did. The estimated proportions are .343, .351, and .308 for the three groups, very close to the proportions used to generate the data (.35, .35, .30). The means, standard deviations, and correlations are close, too. Problem solved. Here is a picture of the data with the points colored by their most-probable group membership:
[Figure: Points colored for most-probable group]
Is this magic useful? It turns out that this kind of counting is very important in measuring the infection density in HIV cases. There is a special kind of white blood cell, the helper T cell, that expresses a protein called CD4. HIV infection is measured, in part, by how many of these cells are present in a blood sample, relative to other leucocytes. You can make different types of white blood cells identify themselves by tagging their binding sites with different fluorescent dyes. Then you send the sample through an instrument called a flow cytometer, which makes a tiny jet of fluid droplets containing the cells.
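For readers who want to see the mechanics, here is a minimal expectation-maximization (EM) sketch of a normal-mixtures fit in Python with numpy. It is not Chris Gotwalt's JMP implementation and has no robust option. It starts from random soft group memberships, alternates between re-estimating each group's proportion, mean, and covariance and recomputing the membership probabilities, and stops when the log-likelihood stops improving. The small ridge added to each covariance, the tolerance, and the seeds are our choices. EM can settle on a local optimum, so in practice you would try several seeds and keep the fit with the highest log-likelihood.

```python
import numpy as np

def log_normal_pdf(X, mean, cov):
    """Log density of a multivariate normal, evaluated at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)            # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z**2, axis=0))

def fit_normal_mixture(X, k, n_iter=1000, tol=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    resp = rng.dirichlet(np.ones(k), size=n)         # random soft memberships to start
    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step: proportions, means, and covariances weighted by membership
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = np.empty((k, d, d))
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
        # E-step: membership probabilities, computed stably on the log scale
        log_p = np.column_stack([np.log(weights[j]) + log_normal_pdf(X, means[j], covs[j])
                                 for j in range(k)])
        m = log_p.max(axis=1, keepdims=True)
        log_total = m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))
        resp = np.exp(log_p - log_total[:, None])
        ll = log_total.sum()
        if ll - prev_ll < tol * abs(ll):
            break
        prev_ll = ll
    return weights, means, covs, resp, ll

# Same-center data as in the post: variance 4 on a different axis for each group.
rng = np.random.default_rng(1)
sds = [(2.0, 1.0, 1.0), (1.0, 2.0, 1.0), (1.0, 1.0, 2.0)]
X = np.vstack([rng.normal(0.0, sd, size=(c, 3)) for sd, c in zip(sds, (1050, 1050, 900))])

weights, means, covs, resp, ll = fit_normal_mixture(X, k=3)
labels = resp.argmax(axis=1)                          # most-probable group for each point
print("proportions:", weights.round(3))
print("log-likelihood:", round(ll, 1))
```

Because the groups share one center, everything that separates them lives in the covariance matrices, which is exactly what the mixture fit estimates and what k-means ignores.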
Several lasers of different wavelengths are shot through the droplets, and the fluorescence of each droplet is measured. The result is a huge data set, perhaps half a million rows with 12 or so intensity measurements at different wavelengths, one row for each cell. The different cells form clusters that overlap, and you need to count the cells in each cluster. The current practice is to hand-drag polygons over the clusters, dividing the groups arbitrarily by eye. Using normal mixtures to do the counting would give a much more objective method and improve the reproducibility of the results. But there are lots of stray points that don't belong to any cluster. No problem. Chris Gotwalt's routines include a (Huber) robust method of handling outliers. Doing this in 12 dimensions for half a million points is currently pretty expensive, so we are looking for ideas on how to speed it up. Pretty good magic.
Saturday, May 5. 2007
Russ Wolfinger, pioneer of JMP Genomics, will be honored as an ASA Fellow
Russ Wolfinger, the leader of the JMP Genomics project and developer of SAS PROC MIXED, was recently informed that he will be honored this summer as a Fellow of the American Statistical Association.
One of my best perspectives on Russ Wolfinger's work came when I attended a seminar that James Roger of GSK gave a couple of years ago. He said that one of the biggest problems in new drug approval submissions was that they used the ancient rule of last-value-carried-forward to cover for missing data. This distorted analyses in a way that would lead to falsely optimistic conclusions about the effectiveness of drugs for degenerative diseases, a tragic situation. But it was hard to analyze data any other way. The breakthrough was to switch to mixed models, which let you use all your data without any carry-forwards. But none of this was possible until full mixed models implementations arrived in FDA-trusted commercial software, and that was due to Russ Wolfinger. Russ, inspired by the Jennrich-Schluchter (1987) paper, and in collaboration with Randy Tobias, designed a full mixed models system implementing very general covariance parameterizations. Russ's implementation was very complete. You could model the random terms through random effects, with their covariance matrix [G], or directly through the residual covariance matrix [R]. A large number of covariance structures was provided. You could also specify group-varying parameters. A variety of computational options were provided. At the time, it was by far the largest and most important statistical development work in our company's history. When the Kackar-Harville and Kenward-Roger refinements were published to resolve the degrees-of-freedom issues in REML, Russ quickly added them to PROC MIXED. Russ then pioneered the generalization of mixed models to generalized linear models (with O'Connell), and then their generalization to nonlinear models in two different ways. The implementation of PROC MIXED led the whole biopharmaceutical industry to become much better users of mixed models, more properly accounting for the variation in a model. Oliver Schabenberger later followed Russ's pioneering work at SAS with a next-generation procedure, PROC GLIMMIX.
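To make the G and R distinction concrete, here is a small numpy sketch of the marginal covariance that a linear mixed model implies, V = ZGZ' + R, for a hypothetical study of 3 subjects measured 4 times each. The variance values and the AR(1) choice are illustrative assumptions, and the sketch shows only the covariance structure, not the REML estimation that PROC MIXED performs.

```python
import numpy as np

n_subjects, n_times = 3, 4
# Z maps each observation to its subject's random intercept (12 x 3).
Z = np.kron(np.eye(n_subjects), np.ones((n_times, 1)))

# Option 1: random intercepts modeled through G, independent residuals in R.
G = 2.0 * np.eye(n_subjects)                  # between-subject variance 2.0
R_iid = 1.0 * np.eye(n_subjects * n_times)    # residual variance 1.0
V_cs = Z @ G @ Z.T + R_iid                    # compound symmetry within each subject

# Option 2: no random effects, so V is just R; serial correlation is modeled as AR(1).
rho, sigma2 = 0.6, 3.0
lags = np.abs(np.subtract.outer(np.arange(n_times), np.arange(n_times)))
V_ar1 = np.kron(np.eye(n_subjects), sigma2 * rho ** lags)

print(V_cs[:4, :4])    # subject 1: 3.0 on the diagonal, 2.0 off the diagonal
print(V_ar1[:4, :4])   # subject 1: covariance decays with the time lag
```

Because every observed value enters the likelihood through V, a subject with a missing visit simply contributes fewer rows; nothing needs to be carried forward.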
Oliver became an ASA Fellow last year, so now three SAS mixed models statisticians are ASA Fellows. (You might recognize the name of the first of the three, Jim Goodnight.) Of course, over the past few years Russ has become an expert in genomics, heading the SAS-JMP Genomics group to produce software in that area, and also helping lead the MAQC consortium for microarray quality, which will meet here at SAS in a few weeks. Congratulations, Russ, on becoming an ASA Fellow.
Tuesday, April 24. 2007
Dino Nuggets and Snotties versus Simpson's Paradox
Last week's news had the story from NC State University that dinosaurs probably tasted like chicken, and this week's Discovery Planet episode brought us vivid scenes of the living snottie cave ceilings of Cueva de Villa Luz, an acid-fuming cavern in the Mexican state of Tabasco.
So how can statisticians compete for attention with those juicy stories? Well, it turns out that Leonard Stefanski made the local news headlines with research just published in The American Statistician. You analyze certain data sets, examine the residual-by-predicted plot, and check for any patterns in the plot. This simple and routine practice can, with specially constructed data, reveal surprising patterns that reflect a regression in a different sense. I haven't seen patterns like this since the Coleman pollen data. Stefanski has it right. You have to check the graph, not just look at the estimates and p-values. This is the central message embodied in JMP: look at your data and check for patterns, and for points that don't fit patterns. Now, please go to your web browser, search on three words, Homer Simpson Stefanski, follow the links, and see the patterns.
Wednesday, April 18. 2007
Serendipity
Yesterday at the SAS Global Forum in Orlando, I had the wonderful experience of watching Scott Adams (Dilbert) make a presentation about his life in the corporation and his subsequent career as a cartoonist. Scott is a great speaker, as you might guess from his cartoons. But the talk was not all about corporate waste and misdeeds; there were some real moments of revelation. He took the equation that "Oprah" quoted, Luck = Preparation + Opportunity, played some funny mathematical tricks with it, and then focused on the "luck" part. He described a social psychology experiment in which subjects were told to count the number of pictures in a document, and the statistic of interest contrasted subjects who felt they were basically lucky with those who did not. The lucky group and the other group had the same accuracy (100%), but the lucky group found the answer much sooner. They noticed that page two of the document carried a bold-type message saying that the answer was 42 pictures and that they could stop counting. "Luck" is noticing things like that. People who are lucky are lucky because they notice more things than unlucky people.
What's the point for us? The point of visualization software is to encourage us to notice things. To notice things because graphics are easy and rich. To notice things because the graphs are in more dimensions, because they can be run in groups, even animated in groups, and to notice points that move and shake when animated across time. That is our business. We are out to become luckier. To make discoveries. Discoveries that lead to "innovation," the theme of the conference. Serendipity.
Thursday, April 12. 2007
JMP 7.0 enters the production pipeline
JMP 7.0 is only a few weeks away now that it has entered the production system. The first release out the door will be for 32-bit Windows on annual license, and the others will follow over the next few months.
I am very pleased with the super effort from all the developers, testers, technical writers, and release and localization engineers who put this major release together. It took only about a year and a half. I am proud that it will be on 3 operating systems, in both 32-bit and 64-bit versions, and in six languages. I also want to thank all the users who contributed to our beta testing program.
ABOUT THIS BLOG
JMP Statistical Discovery Software from SAS is proud to bring you this blog on all things related to data visualization, visual Six Sigma, design of experiments and other statistical topics.
The blog content appearing on this site does not necessarily represent the opinions of SAS. Your use of this blog is governed by the Terms of Use.

