Monday, November 16. 2009
Statistics and Malcolm Gladwell
Did you see the cover article in yesterday's New York Times Sunday Book Review? Harvard psychologist Steven Pinker reviewed the new book by Malcolm Gladwell, who was the keynote speaker for the Innovators' Summit in Chicago.
Pinker writes that Gladwell is "a minor genius who unwittingly demonstrates the hazards of statistical reasoning and who occasionally blunders into spectacular failures." Also worth a read is the editors' note for the book review section, which points out that Gladwell says getting a graduate degree in statistics may be the best way to start a career in journalism now. The editors' note also quotes Pinker as saying statistical reasoning is the most important scientific concept that non-scientists lack.
Tuesday, October 20. 2009
Why You Need to Know About Split-Plot Designs
Bradley Jones, Director of R&D for JMP, and Christopher Nachtsheim, Professor in the Carlson School of Management at the University of Minnesota, collaborated on a paper that was published this month in the Journal of Quality Technology. Both authors have published widely on the subject of design of experiments and are recognized as experts in the field.
Their new paper is titled “Split-Plot Designs: What, Why and How,” and in this interview with Brad, I asked him the same questions posed in the paper title. Subscribers to the Journal of Quality Technology can read the full article via the ASQ Web site.
Arati: What is a split-plot design?
Brad: A split-plot experiment is a statistically designed study where the experimental runs are grouped so that certain variables do not change their settings within a group. The experimenter only changes these variables or factors between groups of runs. Holding these factors constant for an entire group of runs means that the run order for these experiments is not completely random. Statisticians often recommend the use of completely randomized designs rather than split-plot designs.
Arati: Why are split-plot designs important for practicing statisticians to know about?
Brad: Though complete randomization avoids certain problems in the analysis and interpretation of results from experiments, it often requires substantial extra effort. In many processes, certain factors are hard to change from one processing run to the next. To change some factors is as easy as turning a dial or flipping a switch. Changing others can require making time-consuming and expensive alterations to the system. It makes sense to structure your experimental runs to take this practical constraint into account. That is, you would like to do several runs in a row only changing factors that are easy to manipulate before stopping the system to make a big change. Slavish insistence on complete randomization has sometimes resulted in operational people sorting a randomized design for logistical convenience without informing the principal investigator. This sorting makes the design a split-plot design but generally a poorly constructed one. Worse yet, the principal investigator, being unaware of the sorting, analyzes the data as though the run order were random. Since the appropriate analysis of a completely randomized design is different from that of a split-plot design, the consequence of the run-sorting subterfuge can invalidate the results and make for poor decisions.
Arati: Tell me about one or two of the recent developments in designing and analyzing split-plot experiments.
Brad: In the last decade, the design and analysis of split-plot experiments has been a hot topic in research literature. For my money, the most exciting developments have been the methods for design and analysis of optimal split-plot experiments. It is not really the optimality of the designs that makes this approach exciting, though optimal designs certainly have desirable properties. The real value of these methods is that they allow for much more flexible problem specification and thus much wider applicability.
Arati: What was your purpose in writing this article with Professor Nachtsheim?
Brad: A couple of years ago I had a conversation with two past editors of the Journal of Quality Technology. They noted the resurgence of interest in split-plot designs. They also were concerned that the mathematical complexity of some of the publications was leaving most practitioners behind. There was also the fact that different researchers often recommend slightly different approaches. This can confuse practitioners who do not have the background to discriminate between competing methods. Professor Nachtsheim and I wrote the article to cover all the major research lines pursued in the last decade or so and to present even-handedly their strong and weak points. I included Professor Nachtsheim in the project because I have been an advocate for one approach, and I thought it would be more objective to include an author who had no strong prior convictions.
Arati: Who is the audience for this article, and how do you hope they will use the information in it?
Brad: We wrote the article for two audiences. For the statistician who is unfamiliar with the recent trend in this area, our article provides a reference list and a guide to the main lines of research. The more important audience, however, is the community of practitioners. We wanted to provide information to empower this community to use these new methods profitably.
Tuesday, October 13. 2009
Authors Take Statistics, JMP to Engineers and Scientists
I met José Ramírez, PhD, in Chicago at the JMP Discovery conference and Innovators’ Summit. An industrial statistician and longtime JMP and SAS user, he was quite the celebrity at the conference, where he gave a well-attended talk about designing experiments using JMP and SAS.
José told me about his new book, co-authored with his wife Brenda Ramírez, who is also an industrial statistician and expert user of JMP and SAS. The pair wrote the book, Analyzing and Interpreting Continuous Data Using JMP: A Step-by-Step Guide, over two years, on weekends and evenings. They also write a blog called Stat Insights that includes excerpts from their book and discusses “statistics as a catalyst for engineering and scientific discoveries.” Here, José and Brenda share details about the book for readers of the JMP Blog.
Arati: Why did you decide to write this book?
José & Brenda: A few years ago, the JMP team approached us with the idea to write a book for engineers and scientists. This seemed like a natural progression in our careers, since we have been collaborating with engineers and scientists for many years and we have developed and delivered countless hours of training in statistics and continuous improvement. In addition, we are big fans of JMP software and have been using it for a long time. So writing this book seemed like the perfect opportunity for us to consolidate the significant knowledge we have gained as practicing industrial statisticians, and share it in a way that is far-reaching and useful to this community. An additional inspiration for our book comes from the National Bureau of Standards Handbook 91 Experimental Statistics by Mary Natrella. We wanted to bring the same spirit and utility of the NBS Handbook 91 to the countless engineers, scientists and data analysts whose work requires them to transform data into actionable information.
Arati: Who, specifically, will benefit from reading and using your book? And how do you hope they will use the book?
Brenda: The book is primarily written for engineers and scientists who need to use statistics and JMP to make sense of data and make sound decisions based on their analyses. This includes, for example, people working in semiconductor, automotive, chemical and aerospace industries. Other professionals in these industries who will find it valuable include quality engineers, reliability engineers, Six Sigma Black Belts and statisticians. In addition to the working professional, those who are studying to become engineers, scientists or even statisticians, as well as those teaching them, should get a copy of our book. It is a great teaching aid. For those who want a reference for how to solve common problems using statistics and JMP, we walk through different case studies using a seven-step problem-solving framework, with heavy emphasis on the problem setup, interpretation, and translation of the results in the context of the problem. For those who want to learn more about the statistical techniques and concepts, we provide a practical overview of the underpinnings and provide appropriate references. Finally, for those who want to learn how to benefit from the power of JMP, we have loaded the book with many step-by-step instructions and tips and tricks.
Arati: What kinds of case studies or problems do you discuss in the book?
José: In Chapters 3 through 7, we start with a problem description, setting the stage for the uncertainties that need to be solved using the statistical techniques described in the chapter. All of the case studies in the book are based upon common problems that engineers or scientists will come across at some point in their careers, and the chapter headings reflect the specific application. For example, in Chapter 4, “Comparing the Measured Performance of a Material, Process, or Product to a Standard,” we use a semiconductor example involving a new three-zone vertical furnace for thin film deposition of wafers to illustrate the usefulness of one-sample significance tests to qualify a new piece of equipment. In Chapter 5, “Comparing the Measured Performance of Two Materials, Processes, or Products,” we compare the performance of two mass spectrometers in an analytical laboratory using the atomic weight of silver to determine if a bias exists and to understand their measurement error. Although it is not officially a case study, we are thrilled to include in Chapter 7 the data from Albert Einstein’s first published paper. In his 1901 paper, a young Einstein used least squares to fit a model to investigate the nature of intermolecular forces.
Arati: It’s pretty cool that you had Professor Douglas Montgomery write the foreword to your book. How did you make that happen?
José: Ever since we were students, we have been using and following the work of Professor Montgomery, and we believe his books are excellent references for engineers and scientists. We also share a passion for industrial statistics and Doug and I have crossed paths many times over the years at various statistical conferences and events, including, more recently, at JMP conferences. When we put all of these pieces together – statistics, engineering and JMP – Professor Montgomery seemed like the perfect person to entrust with this important part of our book. So we just had to find a way to ask him if he would be willing to write the foreword to our book. Luckily for us, that opportunity arose at the Quality & Productivity Research conference in June 2008 in Madison, WI. At that event, I was able to discuss this possibility with him, and without hesitation he said, “Yes.”
Arati: How will you use this book going forward in your professional career?
Brenda: This book is a reflection of how we collaborate with engineers and scientists to use statistics as a catalyst for new discoveries and insights. Having the book will make it easier to share our statistical engineering philosophy with others.
Arati: Where is your book sold?
José: Our book is available online from the SAS Web site or Amazon.com. Both Web sites allow the reader to view the table of contents and a sample chapter from the book.
Posted by Arati Bechtel in Discovery, Innovators' Summit, JMP - General, Statistics at 10:36
Monday, October 12. 2009
Answering Your Demand for Design of Experiments
Recently, JMP has been deluged with requests for information about Design of Experiments (DOE or DOX). Was it due to atmospheric disruption when NASA hit the lunar south pole last week? Or shall we just chalk it up to a growing desire to work smart and make better use of resources in the workplace?
Never fear, JMP is responding. On October 28, author and Arizona State University professor Douglas Montgomery and JMP’s Brad Jones are offering a free seminar on DOE in Phoenix. At the seminar we will give away some copies of several books, including Montgomery’s latest edition of Design and Analysis of Experiments and a new SAS Press book by W.L. Gore employees and JMP users José and Brenda Ramírez, Analyzing and Interpreting Continuous Data Using JMP: A Step-by-Step Guide. Join Doug, Brad, Susan Glick, John Guerrero and the southwest US JMP team on Thursday, October 28. Seats are filling fast so register today.
Posted by Gail Massari in Design of Experiments (DOE), JMP 8, Statistics at 11:20
Monday, September 21. 2009
Dr. George Box Speaks at Discovery 2009
It is a rare and exhilarating opportunity to dine with a legend. And at Discovery 2009, that is exactly what attendees did.
After a full day of keynote speakers, breakout sessions, poster browsing and meeting the developers, conference-goers convened for deep-dish pizza and the chance to hear from Dr. George E.P. Box, whom many would label “the father of modern-day statistics.” The audience was filled with people who learned statistics from his many books, including “Statistics for Experimenters.” Each person received a copy and the chance to have it signed by the man for whom Box-Cox transformations, Box-Jenkins models and Box-Behnken designs are named. In the words of one audience member, his “book is one of the best. I look at it every week when helping people set up experiments.”
Dr. Bradley Jones, Director of R&D at JMP, opened the event, calling Box a “personal hero” and “the leading statistician of the previous millennium.” Box entered to electrifying applause and a standing ovation from his many admirers. Clearly overwhelmed by the moving response, he jokingly likened the moment to a story he remembered of a sultan who, on his 21st birthday, attended a celebration in his honor where there were many concubines and “he didn’t know where to start!”
Infusing his entire presentation with humor and fascinating tales of his memories, Box focused on sequential design of experiments. He attributed much of what he knows about DOE to Ronald A. Fisher. Box explained that Fisher couldn’t find the things he was looking for in his data, “and he was right. Even if he had had the fastest available computer, he’d still be right,” said Box. Therefore, Fisher figured out how to study a number of factors at one time. And so, the beginnings of DOE.
Having worked and studied with many other famous statisticians and analytic thinkers, Box did not hesitate to share his characterizations of them. He told a story about Dr. Bill Hunter and how he required his students to run an experiment. Apparently a variety of subjects was studied, from baking cakes to experimenting with sex to finding a better way to get out of a spin in an airplane (according to Box, the student didn’t actually kill himself, although he came close).
At the conclusion of his presentation, audience members were invited to participate in a Q&A session. Dr. Dick De Veaux, professor of mathematics and statistics at Williams College and a Discovery Keynote Speaker, had a funny exchange with Box. It went like this:
De Veaux: “You invented a lot of things, and we are thankful for all of those. But the box plot, you didn’t invent. And you once confided in me you’d like to invent your own plot. I would like to know how that’s going.”
Box (chuckling): “Well, John Tukey was working in the same group as me at the time that he invented the box plot. And he decided to call it that. Why? I have no idea. He was a remarkable man. But on the other hand, I sometimes got irritated with him. I remember once, I had been asked to give a seminar. And he thought he knew what I would say and continued to interrupt me, but he didn’t know what I was going to say. I decided to take a vote, if it comes out in my favor, John Tukey will keep quiet. And it did come out in my favor.”
De Veaux: “So there!”
Box: “But he really was a remarkable person in most ways.”
His answer to why DOE has not taken root in more organizations where Six Sigma and quality process control already occur was priceless as well. He said, “I don’t see why people doing Six Sigma shouldn’t do DOE. I’d say, if they aren’t, you should teach them and say it’s Six Sigma.”
Breakout session presenter and experimental design advocate Dr. Chris Nachtsheim asked Box if he had any comments on the state of the statistical profession today. Box explained that in order to teach statistics today, all you need is a math degree. He said that many professors “aren’t statisticians at all; they are actually mathematicians who didn’t quite make it.” Therefore, it is very unlikely that these mathematicians have ever run an experiment. According to Box, the difficulty of getting DOE to take root lies in the fact that these mathematicians “can’t really get the fact that it’s not about proving a theorem, it’s about being curious about things. There aren’t enough people who will apply [DOE] as a way of finding things out. But maybe with JMP, things will change that way.”
Well said, Dr. Box. Thank you for sharing your time, talents and thoughts with us.
Thursday, June 25. 2009
Nuggets of Wisdom from Risk Visualization Expert
I attended Sam Savage’s presentation based on his book The Flaw of Averages (PDF) not once, but twice. Although I have one solitary statistics course under my belt, I found Sam’s ideas quite accessible, and worth hearing a second time. Sam uses what he calls the five “mindles.” Like a handle that is used to physically grasp an object, a mindle helps us mentally grasp information.
A few nuggets of wisdom learned by this statistically obtuse observer:
1. Do not build a model to get the right answer. Build a model to get the right question.
2. Forget the terms you learned in statistics class. Random variable. Central Limit Theorem. Correlation. They won’t be useful in a singles bar and even those with statistical insight don’t understand them. He says that “The world is an uncertain place and we must understand the language to use it.” He translates the language into approachable lingo, also known as the mindles.
3. The five mindles. For a complete understanding of these, I definitely suggest going to hear Sam speak and reading his book.
   a. Uncertainty vs. risk. Uncertainty is certain to exist. But risk is subjective.
   b. Uncertain number. An uncertain number is a shape (called the distribution). But, so what? According to Sam, “If the world could start to use the word, it would be a different place. We might not have flown the economy into the side of a mountain.”
   c. Combinations of uncertainties. Or, diversification.
   d. Plans based on uncertainties. But, all plans are based on uncertainties so, just plans.
   e. Interrelated uncertainties. Or covariance, which is the basis of modern portfolio theory. And look where that got us.
4. The Levels of Stochastic Enlightenment. Want to work dumb? Say you don’t know the answer. Want to work dumber? Use a point estimate (which is an accepted accounting practice). Want to work smart or smarter? Simulate and do something.
Using the JMP Profiler for interactive simulation, he shows how you can play with scenarios to identify the best case. The Profiler simulates “100,000 trials before your finger leaves the enter key. It’s a new paradigm for risk assessment,” he explains. Sam puts it this way: Interactivity is important. To learn to ride a bike, you must interact with the handle bars, physically manipulating them to stay on course.
JMP provides that interaction (via mindles) to mentally manipulate your projects or issues to stay on course. However, about 50 million people base decisions for course of action on averages. Can all of those people really be wrong? Sam says yes. And JMP can demonstrate why.
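For readers who want to see why a plan built on an average can mislead, here is a minimal sketch in plain Python (not the JMP Profiler). Everything in it is a made-up illustration: the toy profit model, the capacity of 100 units, and the normally distributed demand with mean 100 and standard deviation 30 are assumptions chosen only to show the effect. The 100,000 trials echo the number Sam quotes.

```python
import random
import statistics


def profit(demand, capacity=100.0, margin=1.0):
    """Toy model (hypothetical): sales are capped by capacity."""
    return margin * min(demand, capacity)


def simulate(trials=100_000, mean_demand=100.0, sd_demand=30.0, seed=1):
    """Average profit over many simulated demand values."""
    rng = random.Random(seed)
    # Assumed demand distribution: normal, truncated at zero.
    draws = (max(0.0, rng.gauss(mean_demand, sd_demand)) for _ in range(trials))
    return statistics.fmean(profit(d) for d in draws)


# "Working dumber": plug the average demand into the model.
point_estimate = profit(100.0)
# "Working smarter": average the model over the uncertain demand.
simulated_mean = simulate()

print(point_estimate, round(simulated_mean, 1))
```

The point estimate says 100, but the simulated average comes out noticeably lower, because demand above capacity cannot be sold while demand below capacity still hurts. The plan based on average demand is wrong on average.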
Posted by Jessica Marquardt in Biz Viz, Data Visualization, JMP - General, Statistics at 15:01
Friday, June 19. 2009
Bogeys, Pictures and Numbers
This weekend features one of my all-time favorite sporting events: the golf US Open (plus Father’s Day on Sunday provides a convenient guilt-free excuse to actually watch it). This year, the tournament is held at the Bethpage Black course just outside of New York City. It has a classic sign:
Not sure who “We” represents, but there is an unquestionable tone of authority, with prepositions, definite and indefinite articles all capitalized. And note this course is not just for “Skilled Golfers”, but “Highly Skilled Golfers.” (Hats off to the pros this week who are also battling Mother Nature under very soggy conditions.) Some data sets should come with a similar warning. In genomics, we are now faced with experiments conducted on thousands of individuals with millions of measurements on each across a variety of complex molecular domains: genetic markers, transcript abundance, copy number, microRNA, protein and metabolite intensities, not to mention thousands of standard phenotypes. Analyzing such data sets properly certainly requires skill along with the best possible software. A primary goal of JMP since its inception more than 20 years ago has been to provide a dynamic and optimal combination of both statistics and graphics. Drop one of this pair, and you are going to miss something critical. JMP Genomics, although much younger, is definitely building on the same philosophy. Accomplishing this goal is difficult, but we continue to make progress and relish your feedback on how to do it better. Confession: Every time I pass a mirror and find no one is looking, I practice my golf swing. I’m a sucker for every newfangled idea on how to hit a golf ball better -- trust me, there is an infinite supply of them -- and just have to try it. I’ve been tinkering with my dang swing since graduate school and still don’t have it right. Not seeing him enter, I almost knocked a guy out one time with my mock follow-through in a small men’s room. One technology that’s a godsend is digital video with slow motion. Interactively successive freeze frames taken from good angles show exactly what I’m doing (even though I think I’m doing something else) -- that, of course, and actually striking that dumb little 1.68 inch diameter sphere and adding up my score. 
Pictures and Numbers. One day they might even get me to “Highly Skilled.”
Posted by Russ Wolfinger in Genomics, JMP - General, Statistics at 09:39
Wednesday, May 20. 2009
Scientific Computing Review: 'Stunning' Graphics in JMP 8
Referring to JMP as "an old friend," statistician John Wass reviewed JMP 8 for Scientific Computing and calls it a "major upgrade." The review includes several visualizations and covers many of the major new features of JMP 8.
Wass concludes: "This latest version is stunning in the quality of the graphics, and JMP has pioneered the advancement of statistical graphics by heavily linking most number-crunching operations to a graphic. Interested parties are highly encouraged to download a trial version." Here's where you can get the fully functional 30-day trial version of JMP 8, for Windows, Mac and Linux.
Posted by Arati Bechtel in Data Visualization, JMP 8, Statistics at 14:23
Tuesday, May 19. 2009
3-D Pie Reply
Thanks for the enlightening comments to my blog post "I Like 3-D Pie Charts" and for the new graphs. While the bar charts from Joe and John are very nice, I prefer vertical bars because of their connection with the gravity orientation of trees, mountains, buildings and of course cell-phone signal strength. It’s interesting that Paige and Lee suggest 2D pie charts. Warning: The graphics gods are watching you. Daniel, I’m glad you did not sleep through Art Appreciation in college.
SAS is well known for its corporate amenities, and its cafés are no exception. I was in line for lunch the other day and came upon the always tempting dessert case. The middle shelf featured over a dozen haphazardly arranged pieces of another one of my all-time favorites: chocolate crème pie. Simply irresistible. Maybe it’s my upbringing, but I had absolutely no problem identifying the largest slice. I don’t think it’s a guy thing either, because in the extra second I took reveling about how good it was going to look on my tray, the woman behind me handily placed it on hers. How dare she! But I was smiling inwardly as I quickly grabbed the next largest one, not only because we were likely identical by state for a latent canine quantitative trait, but because I had some more assurance that at least some humans are actually decent at assessing the size of three-dimensional wedge shapes. The psychological experimental evidence cited to the contrary is largely conducted on college students who are really only good at determining how much liquid remains in bottle-shaped objects. This has me wondering if exploding pieces of the pie chart might actually be helpful in avoiding volumetric distortion. If nothing else, exploding and appropriately labeling one or more pieces seems like a great way to emphasize them, and, conversely, leaving the really thin slices unlabeled is desirable when they are not of interest. Regarding ordering of slices or bars, I forgot that in examples like this one there is an overriding analytical criterion. In statistical modeling in general, one typically puts main effects first, then two-way interactions, then succeeding higher order interactions and finally residual/unexplained variance. One way to accomplish this (or any other ordering) in JMP is with the Value Ordering column property. For our example, if you first run JSL code like the following…
…then the bars appear in the desired order. We use such a column property in JMP Genomics scripts to make sure chromosome plots appear in numerical instead of alphabetical order. If you’re having trouble comparing sizes of slices in the next pie chart you see, just pretend that it’s your favorite dessert and you’re really hungry. Works every time.
Posted by Russ Wolfinger in Data Visualization, Genomics, JMP 8, JSL, Statistics at 09:33
Monday, May 18. 2009
Soccer Analytics Using JMP
NOTE: This entry comes to the JMP Blog from our colleague Jerome Bryssinck of SAS Belgium. Jerome had seen Jeff Perkinson's examples of basketball analytics using JMP and created his own example using football (or soccer) data. In response to comments from readers, Jerome updated his model on May 26, and this blog post now reflects those changes.
THE QUESTION: Has the game been decided yet? (HTGBD)
This is the question that most people constantly ask themselves when they are watching a football game. This question can take different forms depending on the circumstances. If you're lucky enough to support the winning team, you might ask yourself: "How secure is the lead?" And for the less fortunate of us: "Is there still a chance for my team to win?"
THE ANSWER: Analytics
Graph1: Probability of the game having been decided as a function of the elapsed time and the goal difference.
Graph1 shows the probability of the game having been decided as a function of the elapsed time and the goal difference. It is possible to change the elapsed time and the goal difference on the graph by clicking on a different value. Some interpretation examples:
If Time=45 and Goal Difference=0: The game has been going on for 45 minutes, and the goal difference is 0. There is a 23% probability that the outcome of the game won't change. Here, as the teams are even (0 goal difference), this would mean that there is a 23% probability the game will end in a tie.
If Time=45 and Goal Difference=1: The game has been going on for 45 minutes, and one of the teams is leading by 1 goal. There is a 60% probability that the outcome of the game won't change. Here, this would mean that the leading team has a 60% probability of winning.
More Details about the Answer
The model used above has been built using data from the UK Premier League from 2002 to 2006. The type of model used is a regression model. The following representations are useful to understand the underlying data.
Graph2: Has the Game Been Decided vs. Time
Graph2 shows the percentage of the games that have been decided as a function of the Elapsed Time. I must say that I wasn't surprised by this graph, which basically states that the HTGBD (Has The Game Been Decided) percentage rises with the Elapsed Time.
Graph3: Has the Game Been Decided vs. Time By Goal Difference
Graph3 shows the percentage of the games that have been decided as a function of the Elapsed Time, by goal difference. According to this graph, the goal difference is an excellent predictor for the HTGBD.
Additional readings: Similar models are available for basketball. Check out Bill James and Jeff Perkinson if you want to learn more.
This entry was first published in Jerome Bryssinck's blog, Brisink. It is republished here with his permission.
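To make the HTGBD label from the entry above concrete, here is a small Python sketch of one plausible way to compute it for a single match. This is an assumption about the definition, not Jerome's actual data preparation or model: a game counts as "decided" at a given minute if the result standing at that minute (home lead, draw, or away lead) is the same as the final result. The function names and the example match are hypothetical.

```python
def outcome(goal_difference):
    """Sign of the goal difference: 1 = home leads, 0 = level, -1 = away leads."""
    return (goal_difference > 0) - (goal_difference < 0)


def goal_difference_at(minute, home_goals, away_goals):
    """Home minus away goals scored up to and including `minute`."""
    home = sum(t <= minute for t in home_goals)
    away = sum(t <= minute for t in away_goals)
    return home - away


def decided_at(minute, home_goals, away_goals):
    """True if the result standing at `minute` equals the final result."""
    now = outcome(goal_difference_at(minute, home_goals, away_goals))
    final = outcome(len(home_goals) - len(away_goals))
    return now == final


# Hypothetical match: home scores at 30' and 80', away scores at 50'.
home, away = [30, 80], [50]
print([decided_at(m, home, away) for m in (15, 45, 60, 85)])
# → [False, True, False, True]
```

Averaging this label over many matches, grouped by minute and goal difference, would give the kind of percentages plotted in Graph2 and Graph3, and the same label could serve as the response in a regression model.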
Posted by Arati Bechtel in Biz Viz, Data Visualization, JMP 8, Statistics at 09:14
Wednesday, April 29. 2009
Mixed Models: Yes and No
When asked if I miss being the lead developer of mixed model software at SAS, I usually reply “yes and no.” The “yes” comes from feeling very fortunate to have been involved with this powerful methodology throughout my nearly 20 years at SAS and a part of the strong legacy that began with the work of Jim Goodnight on PROC GLM and PROC VARCOMP in the early days of the company. A big debt of gratitude goes to Dave Delong, who initially pointed me in this direction, and to John Sall and Randy Tobias (our newest ASA Fellow!), who provided key pieces of code that formed the beginnings of PROC MIXED. Interestingly, John’s code was based on the pioneering work of Jennrich and Schluchter on covariance-structure modeling (the R side of the mixed model), while Randy’s code focused on generalizing REML estimation of variance components (the G side of the mixed model). Putting the two together was somewhat serendipitous and turned out to provide a wonderful technology for such diverse applications as animal breeding, clinical trials, manufacturing quality control, education assessment, space-time modeling and statistical genetics.
The “no” comes from confidence in the very impressive work done by our current mixed model gurus: Oliver Schabenberger (PROC GLIMMIX), Chris Gotwalt (JMP), Paul Wright (EVAAS) and Tianlin Wang (PROC HPMIXED). That, and my now decade-long passion for genomics and the true joy that comes from leading a top-notch team and working with all of the dedicated professionals who have helped JMP Genomics become successful. I’ve been amazed at how mixed model theory is a unifying theme throughout statistics, encompassing such methods as empirical Bayes, ridge regression, time series, kriging, and smoothing splines. Did you know you can fit a support vector machine using the radial basis functions in PROC GLIMMIX? In a bit of a throwback to SAS in the '70s, we’re using variance components on principal components to quantitatively compare sources of variability in microarray data. Most recently, we’ve been exploring how mixed models can effectively adjust for population stratification in genome-wide association studies. So I can’t seem to get away from mixed models — not that I really want to! Let me close with a mixed model curiosity that I was looking at just last week. 
Consider the following SAS code, which compares a simple linear regression model with the same model fit with random effects:
    /* simulate data for a simple linear regression and compare fixed and random models */
    %let seed = 2817340;
    data xy;
      do slope = 1 to 5;
        do nx = 1 to 10;
          x1 = rannor(&seed);
          x2 = rannor(&seed);
          do rep = 1 to 10;
            y = x1*slope + x2*slope*2 + rannor(&seed);
            output;
          end;
        end;
      end;
    run;

    ods exclude all;
    ods noresults;
    proc mixed data=xy method=ml;
      by slope;
      model y = x1 x2 / s;
      ods output fitstatistics=fs covparms=cp solutionf=sf;
    run;
    ods exclude none;
    ods results;

    title "Fit Statistics from Fixed Model";
    proc sort data=fs; by descr slope; run;
    proc print data=fs; run;
    title "Covariance Parameters from Fixed Model";
    proc print data=cp; run;
    title "Beta Hat from Fixed Model";
    proc sort data=sf; by Effect Slope; run;
    proc print data=sf; run;

    ods exclude all;
    ods noresults;
    proc mixed data=xy method=ml;
      by slope;
      model y = / s;
      random x1 x2 / s;
      ods output fitstatistics=fsr covparms=cpr solutionf=sfr solutionr=srr;
    run;
    ods exclude none;
    ods results;

    title "Fit Statistics from Random Model";
    proc sort data=fsr; by descr slope; run;
    proc print data=fsr; run;
    title "Covariance Parameters from Random Model";
    proc sort data=cpr; by CovParm Slope; run;
    proc print data=cpr; run;
    title "Beta Hat from Random Model";
    proc print data=sfr; run;
    title "Gamma Hat from Random Model";
    proc sort data=srr; by effect slope; run;
    proc print data=srr; run;
The second PROC MIXED call does something that I would have (up until last week!) recommended against — using simple covariates in a RANDOM statement with no SUBJECT= effect. It turns out that the gamma-hat estimates (the empirical BLUPs) are virtually identical with the beta-hat estimates from the first simple regression model. So covariance structure modeling is intimately connected with mean modeling.
If you’re really into mixed models (or would like to be), here’s a question for you: Why are the variance component estimates from the second model approximately equal to the square of the slopes? Hint: The answer is not “yes and no.”
Posted by Russ Wolfinger in Genomics, SAS Integration, Statistics at 13:28
Monday, April 6. 2009
Not Really an $11 Trillion Hole
The front page of the Wall Street Journal on March 13 highlighted an "$11 Trillion Hole" and said "Americans See 18% of Wealth Vanish." I looked at the chart, and the 2008 number indeed looked as if it had fallen off a cliff.
But then I looked at the rest of the curve and remembered the two big bubbles that were going on, the Internet Bubble in the 1990s and the Housing Bubble in the 1990s and 2000s. I thought I should just exclude those points from the long-term trend. So I got the data from the Federal Reserve's Web site and tried to reproduce the Wall Street Journal plot, adding a trend line that excluded the bubble points.

So how does our current net worth look with respect to the long-term trend? Not bad at all. We are not in an $11 trillion hole but are back on track after some roller-coaster years.

Legend:
• green = used to estimate the regression line, 1985 to 1996
• red = the points in the bubbles
• blue = the current value that was the subject of the Wall Street Journal article

I don't want to deny in any way that we are in an economic crisis. But I do want to remind everyone that portfolio valuation drops are not quite as bad as they seem if you consider that the last few years of huge yields were somewhat artificial, and just returning to normal valuations will look like a crash.
Posted by John Sall in Data Visualization, JMP - General, Statistics at 09:57
Monday, March 23. 2009
Reliability Analysis Expert Kicks Off JMP Explorers Series
More than 70 people gathered at SAS headquarters last Friday to hear Dr. William Meeker, Professor of Statistics at Iowa State University, talk about reliability analysis. Following a hot breakfast, attendees watched Meeker's presentation covering the principles behind probability plots, multiple failure modes analysis and accelerated life testing.
For each topic, JMP Statistical R&D Director Brad Jones showed JMP software's capabilities to analyze, plot and provide relevant statistics about the variety of distributions available to help users understand their data. Of particular interest to the audience was the discussion of multiple failure modes analysis, which is important for understanding the impact of the wear of different parts on the performance of a product. Among the attendees were engineers who help their companies determine reasonable warranty periods, so this topic was particularly relevant to them.

This was the first live seminar in the JMP Explorers Series. Upcoming speakers include Richard DeVeaux, Sam Savage and Stephen Few.

While he was at SAS headquarters, Meeker recorded four videos about reliability analysis, and Leo Wright from JMP recorded four accompanying JMP demos. View the webcast videos and demos. You will need to log in using a SAS profile, or set up a profile and then log in. The data for the demos is also available for download. Why not use the data to run the reliability analyses yourself?

Wednesday, March 11. 2009
Reliability Analysis with JMP
After the sticker shock has worn off or the glow of a good deal fades into dullness, customers are left with a product that they expect will meet their needs. Many product designers and manufacturers whose customers love their products use JMP for reliability analysis.
JMP 8 integrates new, valuable reliability capabilities that are interactive and graphical. Many are based on the guidance of a leading expert on statistical methods for reliability, Dr. Bill Meeker, Professor of Statistics and Distinguished Professor of Liberal Arts and Sciences at Iowa State University. Meeker has written a number of textbooks, including Statistical Methods for Reliability Data, which he co-wrote with Luis A. Escobar. By the way, if you are in or near Cary on March 20, we invite you to hear Bill Meeker talk about reliability.

So what’s new for reliability in JMP? Chris Gotwalt, JMP Software Development Manager and Senior Research Statistician, described and walked me through some of the new features.

Two New Platforms Predict Events

Life Distribution and Fit Life By X are two new platforms. Users start by fitting multiple time-to-event distributions to their data. Plots of the distributions are overlaid with a plot of the data, all in the same window. Then, with a few mouse clicks, users visually and statistically examine and compare the distributions to determine which ones offer good explanations for the data. JMP makes this easy by overlaying all the distributions onto one graph.

After choosing a model, JMP profilers determine the probability that an event will occur. Based on criteria specified by the users, the profilers extract from the model relevant quantities of interest, such as estimates of median time to failure or the probability that an event will have happened by a certain time.

The new platforms are analogous to the Distribution and Fit Y By X platforms in JMP, in that Life Distribution provides graphical and analytical tools for examining a single variable, while Fit Life By X allows the user to explore the relationship between a response and an explanatory variable. Fit Life By X offers accelerated life testing. Both new platforms support the censoring that is typical of reliability data.
New ‘Distribution Dredger’ Recommends a Distribution

For users who want guidance selecting the distribution that best fits the data, a great new interactive interface fits all distributions behind the scenes, and then it suggests a distribution by identifying the distribution with the best AICc score.

Here’s an Example

Using fan.jmp data found in the JMP Sample Data, we used Life Distribution to create a model, for which we compared four different distributions on a lognormal scale simply by checking boxes next to the distribution types and selecting a radio button for the probability scale type. JMP displayed a graph that linearized the data. The graph is analogous to a probability paper plot. JMP also displayed the AICc values and profilers for each distribution.

At this point, we looked at graphs and AICc values to determine which distribution explains the data best. In this case, Lognormal looked best. We didn’t stop there, however. We used the Fit All Distributions option, found under the red triangle menu, to fit all the distributions automatically. JMP identified Threshold Loglogistic as the best-fitting distribution because it had the lowest AICc value.

16 Interactive Distributions in All

The JMP 8 Life Distribution platform includes 14 new distributions plus two distributions that were formerly available elsewhere in JMP and are now available in the survival reliability context.

The new distributions are:
• Frechet
• Log Logistic
• Smallest Extreme Value
• Largest Extreme Value
• Logistic
• Threshold Frechet
• Threshold Lognormal
• Threshold Loglogistic
• Zero Inflated Weibull
• Zero Inflated Frechet
• Zero Inflated Lognormal
• Zero Inflated Loglogistic
• Generalized Gamma
• Log Generalized Gamma

The two distributions that were formerly available elsewhere in JMP but are now available in the survival reliability context are:
• Threshold Weibull
• Normal

Want to see for yourself? View a video that includes more examples.
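For readers unfamiliar with the AICc score used above to rank the candidate distributions, the usual textbook definition is sketched below. The symbols are assumptions introduced here, not taken from the post: L-hat is the maximized likelihood of a fitted distribution, k is its number of estimated parameters, and n is the number of observations.

```latex
% Small-sample corrected Akaike information criterion (standard definition)
\[
\mathrm{AICc} \;=\; -2\ln\hat L \;+\; 2k \;+\; \frac{2k(k+1)}{n-k-1}
\]
```

Lower values indicate a better trade-off between fit and complexity. The last term penalizes extra parameters more heavily when n is small, which matters here because the threshold and zero-inflated distributions carry one more parameter than their plain counterparts. Scores are only comparable across distributions fit to the same data.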
Friday, February 20. 2009
Prediction Intervals Are Even Better in JMP 8
JMP® 8 offers a new capability to generate simultaneous prediction intervals, which are very useful when making claims about the future performance of small lots of products. In the latest edition of his white paper Statistical Intervals: Confidence, Prediction, Enclosure (PDF file), José Ramírez updates his examples to incorporate simultaneous prediction intervals and summarizes the corresponding conclusions.
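To make the idea of a simultaneous prediction interval concrete, here is a textbook sketch under stated assumptions: the n past observations and the m future units are independent draws from one normal distribution, with sample mean x-bar and sample standard deviation s. The Bonferroni form shown is a conservative approximation and is not necessarily the factor that JMP or the white paper uses.

```latex
% One future observation, two-sided 100(1-\alpha)% prediction interval
\[
\bar x \;\pm\; t_{1-\alpha/2,\;n-1}\; s\,\sqrt{1 + \tfrac{1}{n}}
\]
% All m future observations simultaneously (conservative Bonferroni approximation)
\[
\bar x \;\pm\; t_{1-\alpha/(2m),\;n-1}\; s\,\sqrt{1 + \tfrac{1}{n}}
\]
```

The second interval is wider because it must contain every one of the m future units at once, which is the situation when a claim covers a whole small lot rather than a single item.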
José also adds a new section titled "How Much Can We Trust Our Claims?" in which he describes how to determine whether the data used to make the claims based on statistical intervals are homogeneous. He uses JMP to create a process behavior chart (control chart) for individual measurements. The chart captures a running record with limits that allow the analyst to decide whether the data are homogeneous, whether the process is stable and whether the claims made based on the prediction intervals are, therefore, valid. Why not download the paper for some good reading?
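For reference, the limits on a process behavior chart for individual measurements are conventionally computed from the average moving range. The sketch below gives the standard formulas; the notation is assumed here and is not quoted from the white paper.

```latex
% Individuals chart: limits from the average moving range of successive points
\[
\overline{MR} \;=\; \frac{1}{n-1}\sum_{i=2}^{n} \lvert x_i - x_{i-1} \rvert ,
\qquad
\text{limits} \;=\; \bar x \;\pm\; 2.66\,\overline{MR}
\]
```

The constant 2.66 is 3 divided by d2 = 1.128, the bias-correction factor for moving ranges of two observations. Points outside the limits suggest that the data are not homogeneous, in which case intervals computed from them should be treated with caution.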
ABOUT THIS BLOG
JMP Statistical Discovery Software from SAS
is proud to bring you this blog on all things related to
data visualization, visual Six Sigma, design of experiments
and other statistical topics.
The blog content appearing on this site does not necessarily represent the opinions of SAS. Your use of this blog is governed by the Terms of Use.