Tuesday, June 30, 2009
JMP Workshops for Professors
In the past two years, close to 1000 professors and students have attended JMP workshops held by Melodie Rush on U.S. college campuses. As the 2009-2010 academic year approaches, Mia Stephens joins the workshop circuit.
Mia, a new member of our JMP academic team, is an applied statistician who is no stranger to JMP. You may have seen her present at a JMP User Conference or be familiar with a data mining white paper she co-authored with her colleagues at North Haven Group. I spoke with Mel and Mia after a campus visit.
Gail: What is the purpose of the workshops?
Mel: When a university licenses JMP, unlimited numbers of professors and students in the licensing group, say a Math Department or a whole campus, have access to the software. Some professors know JMP; others don’t. We want to arm the new professors with the JMP basics they need to teach the statistics covered in their courses.
Mia: And we want to show the JMP-savvy professors new features that they might use in teaching, like the reliability capabilities added in JMP 8.
Gail: What do you cover?
Mel: We demonstrate how to use JMP for basic stats; how to interact with the graphics JMP generates for almost all statistics; and how the data, graphs and statistics are all linked. This always creates a lot of energy in the room!
Mia: We make sure to cover at least univariate, bivariate, and multivariate summary statistics; t-test; basic regression; ANOVA; and contingency tables. Often, there are professors in the room who also teach or use SAS, and we show them how JMP integrates with SAS.
Gail: I've heard a lot about JMP from business schools.
Mel: We get asked about Biz Viz all the time. The JMP Graph Builder is a great way to compare data, so we show people how to analyze business and other data by dragging and dropping onto graphs. Everyone loves this!
Gail: How long are the workshops?
Mia: Two hours is about right to cover the basics and still keep people engaged.
Gail: Does each participant have JMP on a computer?
Mia: No, these aren't training sessions. We use JMP on a laptop connected to a projector.
Gail: How do you decide where to hold workshops?
Mel: Time and travel costs are a consideration. So, we take requests from campuses that can provide a room and are sure there will be at least 15-20 professors attending the session. They can bring as many students as the room will hold.
Mia: Our academic account team offers workshops to new sites that license JMP. We also get requests directly from professors.
Interested? To inquire about a workshop for your campus, send an email to the JMP academic team. Hear Mel and Mia talk about the workshops.
JMP Users Group Established in Atlanta
New capabilities in JMP 8 and use of the software in classroom and business environments were focal topics at the inaugural meeting of the Atlanta JMP Users Group. More than 30 people attended the June 17 gathering at the SAS Atlanta Training Center. The newly formed group is led by business effectiveness consultant Kevin Holston and includes members from a variety of industries, including healthcare, education, finance, telecom, energy, consumer products and marketing.
The kick-off event featured three well-received presentations. JMP Systems Engineer Mike Vorburger (pictured below) discussed new capabilities in JMP 8. Jennifer Priestley, PhD, of Kennesaw State University presented SAS in the Classroom: Credit Scoring Using JMP. Eric Schmidt, PhD, of Georgia-Pacific discussed Time Series Models and Forecasting in JMP 8.
According to Holston, there was “a lot of excitement and interest in forming the Atlanta JMP Users Group. Many were awed to see the power in JMP 8.” More than 80 percent of the participants noted that they have some experience with Base SAS programming; 5 to 10 percent have some experience with SAS Enterprise Guide and JMP. “All wanted to know more and each spoke to potential applications in their business environment,” Holston explains.
In the future, the group hopes to cover topics such as SAS and JMP integration, data mining (partitioning) and modeling. Many expressed interest in meeting on a quarterly basis. The next meeting is slated for September or October. A core team of volunteers will soon be recruited to plan meetings and select presenters. The group hopes to leverage social networking tools such as LinkedIn, Twitter and Facebook to spread the word about upcoming opportunities.
Friday, June 26, 2009
Talking Visual Analytics with Stephen Few
I had a chance to sit down with Stephen Few during his whirlwind tour of the East Coast as part of the JMP Explorers Seminar Series. Stephen has been extolling the power of data visualization in business analytics and how it enables good decision making.
In the first segment, I ask Stephen to discuss his seminar topic, visual analytics, and the importance of understanding it.
The second question for Few: "Why is the topic of visual analytics so important during the current economic downturn?"
Stephen will be finishing up his East Coast swing of the Explorers Series today in Atlanta. If you'd like to hear more of his ideas, check out his blog or download his latest white paper, Predictive Analytics for the Eyes and Mind. Better yet, make plans to see him during Discovery 2009, September 16-18 in Chicago.
Posted by John Jones in Biz Viz, Data Visualization, Innovators' Summit at 11:09
Thursday, June 25, 2009
Nuggets of Wisdom from Risk Visualization Expert
I attended Sam Savage’s presentation based on his book The Flaw of Averages (PDF) not once, but twice. Although I have one solitary statistics course under my belt, I found Sam’s ideas quite accessible, and worth hearing a second time. Sam uses what he calls the five “mindles.” Like a handle that is used to physically grasp an object, a mindle helps us mentally grasp information.
A few nuggets of wisdom learned by this statistically obtuse observer:
1. Do not build a model to get the right answer. Build a model to get the right question.
2. Forget the terms you learned in statistics class. Random variable. Central Limit Theorem. Correlation. They won’t be useful in a singles bar and even those with statistical insight don’t understand them. He says that “The world is an uncertain place and we must understand the language to use it.” He translates the language into approachable lingo, also known as the mindles.
3. The five mindles. For a complete understanding of these, I definitely suggest going to hear Sam speak and reading his book.
a. Uncertainty vs. risk. Uncertainty is certain to exist. But risk is subjective.
b. Uncertain number. An uncertain number is a shape (called the distribution). But, so what? According to Sam, “If the world could start to use the word, it would be a different place. We might not have flown the economy into the side of a mountain.”
c. Combinations of uncertainties. Or, diversification.
d. Plans based on uncertainties. But, all plans are based on uncertainties so, just plans.
e. Interrelated uncertainties. Or covariance, which is the basis of modern portfolio theory. And look where that got us.
4. The Levels of Stochastic Enlightenment. Want to work dumb? Say you don’t know the answer. Want to work dumber? Use a point estimate (which is an accepted accounting practice). Want to work smart or smarter? Simulate and do something. Using the JMP Profiler for interactive simulation, he shows how you can play with scenarios to identify the best case. The Profiler simulates “100,000 trials before your finger leaves the enter key. It’s a new paradigm for risk assessment,” he explains.
Sam puts it this way: Interactivity is important. To learn to ride a bike, you must interact with the handlebars, physically manipulating them to stay on course. JMP provides that interaction (via mindles) to mentally manipulate your projects or issues to stay on course.
However, about 50 million people base decisions for course of action on averages. Can all of those people really be wrong? Sam says yes. And JMP can demonstrate why.
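To make the contrast between a point estimate and a simulation concrete, here is a minimal sketch in Python (not JMP or the Profiler). The capacity-limited profit model, the uniform demand range and all names are hypothetical, chosen only for illustration. It compares the profit computed at average demand with the average of the simulated profits.

```python
import random
import statistics

def profit(demand, capacity=100, unit_margin=10.0):
    """Toy, hypothetical model: sales cannot exceed capacity."""
    return unit_margin * min(demand, capacity)

def simulate(trials=100_000, seed=1):
    """Return (average demand, profit at average demand, average profit)."""
    rng = random.Random(seed)
    # Assumed uncertain number: demand spread evenly between 50 and 150.
    demands = [rng.uniform(50, 150) for _ in range(trials)]
    avg_demand = statistics.fmean(demands)
    profit_at_avg = profit(avg_demand)  # the point estimate
    avg_profit = statistics.fmean(profit(d) for d in demands)  # the simulation
    return avg_demand, profit_at_avg, avg_profit

if __name__ == "__main__":
    avg_demand, profit_at_avg, avg_profit = simulate()
    print(f"average demand:           {avg_demand:8.1f}")
    print(f"profit at average demand: {profit_at_avg:8.1f}")
    print(f"average simulated profit: {avg_profit:8.1f}")
```

Under these assumptions the plan built on average demand looks better than the average outcome, because good months are capped by capacity while bad months are not cushioned. That gap is the kind of error the simulation exposes and the single number hides.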
Posted by Jessica Marquardt in Biz Viz, Data Visualization, JMP - General, Statistics at 15:01
Friday, June 19, 2009
Bogeys, Pictures and Numbers
This weekend features one of my all-time favorite sporting events: the golf US Open (plus Father’s Day on Sunday provides a convenient guilt-free excuse to actually watch it). This year, the tournament is held at the Bethpage Black course just outside of New York City. It has a classic sign:
Not sure who “We” represents, but there is an unquestionable tone of authority, with prepositions, definite and indefinite articles all capitalized. And note this course is not just for “Skilled Golfers”, but “Highly Skilled Golfers.” (Hats off to the pros this week who are also battling Mother Nature under very soggy conditions.)
Some data sets should come with a similar warning. In genomics, we are now faced with experiments conducted on thousands of individuals with millions of measurements on each across a variety of complex molecular domains: genetic markers, transcript abundance, copy number, microRNA, protein and metabolite intensities, not to mention thousands of standard phenotypes. Analyzing such data sets properly certainly requires skill along with the best possible software.
A primary goal of JMP since its inception more than 20 years ago has been to provide a dynamic and optimal combination of both statistics and graphics. Drop one of this pair, and you are going to miss something critical. JMP Genomics, although much younger, is definitely building on the same philosophy. Accomplishing this goal is difficult, but we continue to make progress and relish your feedback on how to do it better.
Confession: Every time I pass a mirror and find no one is looking, I practice my golf swing. I’m a sucker for every newfangled idea on how to hit a golf ball better -- trust me, there is an infinite supply of them -- and just have to try it. I’ve been tinkering with my dang swing since graduate school and still don’t have it right. Not seeing him enter, I almost knocked a guy out one time with my mock follow-through in a small men’s room.
One technology that’s a godsend is digital video with slow motion. Interactive, successive freeze frames taken from good angles show exactly what I’m doing (even though I think I’m doing something else) -- that, of course, and actually striking that dumb little 1.68-inch-diameter sphere and adding up my score.
Pictures and Numbers. One day they might even get me to “Highly Skilled.”
Posted by Russ Wolfinger in Genomics, JMP - General, Statistics at 09:39
Monday, June 15, 2009
JSL Tip: Use Matrices Instead of Lists
The list is one of the basic building tools for organizing data in JSL. The power of lists is that they can contain any kind of items (numbers, characters, other lists, ...), and they can grow and shrink as needed. However, if you just need a fixed list of numbers, you can get much better performance from a one-dimensional matrix. Consider these two samples that make arrays of odd numbers.
For a few hundred items, there's not much difference. But as n goes beyond 1000 (or if the above code is in a loop), you start to notice the difference in speed. For lists, there is a certain amount of overhead for growing dynamically and general flexibility.
Notice the J function can be used to create a new matrix of a given size. J_{m,n} is the name for the unit matrix (the m-by-n matrix of ones) in linear algebra. Also notice that a one-dimensional matrix can be indexed with just one subscript. That is, we can use y[i] instead of y[1][i]. That makes it easier to treat matrices as lists, as in the loop above, which is the same in both cases.
Question until next time: How can the code be made even faster?
Tuesday, June 9, 2009
Vector Plots in JMP
Vector plots show arrows on a two-dimensional plot and allow one to see four dimensions of data: x position, y position, arrow angle, and arrow length. Equivalently, the four dimensions can be x start position, y start position, x end position, and y end position. The latter form is most convenient for JMP. Though JMP doesn't have a menu command to create vector plots, arrows can be added to almost any plot without much trouble.
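The equivalence between the two four-dimensional forms of a vector plot is plain trigonometry. The following is a small Python sketch (not JSL) of the conversion in both directions. The function names are hypothetical, and it assumes angles are measured in degrees counterclockwise from the positive x axis.

```python
import math

def arrow_endpoints(x, y, angle_deg, length):
    """Convert (x, y, angle, length) to (x start, y start, x end, y end)."""
    theta = math.radians(angle_deg)
    return (x, y, x + length * math.cos(theta), y + length * math.sin(theta))

def arrow_polar(x0, y0, x1, y1):
    """Convert (x start, y start, x end, y end) back to (x, y, angle, length)."""
    dx, dy = x1 - x0, y1 - y0
    return (x0, y0, math.degrees(math.atan2(dy, dx)), math.hypot(dx, dy))

if __name__ == "__main__":
    # An arrow of length 3 pointing straight up from (1, 2).
    print(arrow_endpoints(1.0, 2.0, 90.0, 3.0))
```

With start and end columns computed this way, each row of a data table describes one arrow, which is the form the post calls most convenient.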
Posted by Xan Gregg in Data Visualization, JMP 8, Tips and Tricks at 09:27
Monday, June 8, 2009
Benefitting from the Wisdom of Others
For about a year now, I’ve been having trouble with my faithful Craftsman Eager-1 push mower, which I bought at Sears about 15 years ago. The daggum thing starts up fine but then cuts off after about 2 seconds. This has led me to adopt the following algorithm:
1. Press rubber priming balloon 5 times.
2. Pull rip cord to start engine.
3. If the engine does not start, go to Step 1.
4. If the engine starts but then stalls after a few seconds, utter expletives and go to Step 1.
5. If the engine stays on, mow the grass.
6. If the engine cuts off again, utter stronger expletives and go back to Step 1.
My problem is that the number of iterations through Steps 1-4 has been steadily increasing by about 2 times per month. Now that I’m north of 20 times for just one mow, our longstanding relationship is really on the rocks. I think it’s time for a new mower, but what am I to do with this one?
On my way to the store to check out the latest new models (urggh-urggh), I pass a small house with a hand-written sign out front:
Good Used Lawn Mowers 555-1234
“What the heck,” I say to myself as I pop the number into my cell. After a few rings, an elderly voice with a friendly Southern drawl answers: “Hello.”
“Hi, I’ve been having trouble with my mower and was wondering if you might be interested in it.”
“Bring it by on Saturday at 9 in the mornin’, ‘cause I like to sleep in on weekends. When you come, knock on the back door.”
“OK, see you then.”
On Saturday, I pull around behind the house to find a double-wide carport with what must be the largest assemblage of used hand mowers in the state of North Carolina. Some are obviously very old but are neatly arranged, and all appear to be ready for action. I knock on the door, and a spry old gentleman with a flannel shirt and workman pants pulled up over his stomach with suspenders greets me with a firm handshake and beckons me to an overstuffed chair in his small living room. We exchange pleasantries, and I learn he is 91 years old and had spent 45 years as a railroad engineer. He has lived in Cary his entire life, and after his wife passed away, he started tinkering with mowers.
We go out back to take a look at the Eager-1. He deftly removes the air filter and squirts a bit of starting fluid into the exposed hole. “Give ’r a pull,” he tells me. I obey, and the Tecumseh engine roars to life but then quickly begins to stall in its usual fashion. But just before it completely dies, he gently places his index finger over the hole and, lo-and-behold, the engine coughs and cycles back to full power! It tries to stall again, but with perfect timing he chokes off the hole just enough to maintain that magic mixture of fuel and air. After a few more taps the engine is running steadily and better than ever. “Give ’r a try,” he says, pointing to a patch of grass. I am so ecstatic that I mow his whole backyard.
As software users and developers, we’re often tempted to abandon and bash old technologies and go for the latest, greatest new thing. While I’m certainly in favor of using the best means possible for the task at hand, sometimes those best means are those that have stood the test of time and have benefitted from the wisdom of those who have struggled through and solved myriads of problems using them. I put classic SAS software into this category – it provides a richly deep and powerful foundation for the processes in JMP Genomics. Our team continues to learn about clever new ways to use it to more effectively handle genomics data.
By the way, using this new-to-me-but-really-old-school technique, I can now start my Eager-1 with a single pull.
Wednesday, June 3, 2009
Stephen Few's New Book Is a Must-Read
“Now You See It” is data visualization expert Stephen Few’s new book, explaining how to use simple visual techniques for quantitative analysis. In this textbook-sized offering, Stephen explores one of the more overlooked aspects of analysis: the graphic representation of information.
Stephen lays the foundation for good visual analysis in Part I by defining how we perceive information. He states, "…there are ways to visually display data that are effective because they correspond naturally to the working of vision and cognition, and there are ways that break the rules and consequently don’t work. If we wish to display information in a way that will enable us and others to make sense of it, we must understand and follow the rules." He goes on to explain that we "…perceive several basic attributes of visual images pre-attentively, that is, prior to and without the need for conscious perception."
Below is a list of those pre-attentive attributes that are quantitatively perceived in and of themselves, without having values arbitrarily assigned to them:
Length – Longer = greater
2-D Position – Offset higher/lower, or left/right = greater
Width – Wider = greater
Size – Bigger = greater
Intensity – Darker = greater
Blur – Clearer = greater
These pre-attentive attributes help us all consume and digest information and make sense of its meaning. However, Stephen points out that "pre-attentive symbols become less distinct as the variety of distracters increases. It is easy to spot a single hawk in a sky full of pigeons, but if the sky contains a greater variety of birds, the hawk will be more difficult to see."
Good visualizations not only take advantage of the pre-attentive attributes mentioned above, but they also use them appropriately while considering the limitations of our visual memory. Stephen explains that we have both "working memory and long-term memory. Working memory stores information only briefly. Working memory is where information resides when we are thinking about it. If we think about it long enough, it will end up in long-term memory." And visual memory (which is part of working memory) is very limited. Visual memory processes information in “chunks.” How much is a chunk of information? Well, it depends on how it is conveyed. Information chunks have to be relatively small when they consist of text or numbers. However, they can be larger when served up graphically.
Part II, which is the meat of the book, takes the reader through several different analyses and shows examples of good visual techniques to best convey the information used in each one.
Part III covers promising new trends. Here, Stephen discusses “Illuminating Predictive Models,” and this is where JMP is prominently featured. Noting that his book largely focuses on analysis of existing information, or descriptive statistics, here he highlights the benefits of predictive statistics. "If we understand the past well enough to describe it clearly and accurately, we can often build a model that we can use to predict what will likely happen as a result of particular conditions, events, or decisions in the future," Stephen writes. He explains that "the goal of predictive analysis is not to produce certainty about the future, but to reduce uncertainty to a degree that enables us to make better decisions."
He then describes what he refers to as “transparent predictive models.” And it is here where Stephen uses JMP to explain how transparent predictive models help us make better decisions with less risk. He says, “This level of involvement in the analytical process [using transparent predictive models] takes advantage of our brains in a way that throws open the windows to insights that we might never otherwise experience.”
Many of the visualizations discussed in Stephen’s book are available in JMP. But even more importantly, JMP’s visualizations also incorporate world-class analytics. This is why, when Stephen turns his attention to more advanced analytics, like predictive modeling, JMP is prominently featured.
“Now You See It” is a must-read for anyone who needs to explore and understand data that guides his or her decision-making process. Following Stephen’s advice will help readers explore their data better, discover trends and patterns more quickly, and make decisions with confidence.
Posted by Charles Pirrello in Biz Viz, Data Visualization, JMP 8 at 10:31
Friday, May 29, 2009
JMP Tree Map in Data Visualization Report
In case you missed my Twitter update about it last week, a JMP tree map created by Daniel Arneman of UNC Energy Services was featured in a recent data visualization report by Intelligent Enterprise titled "Seeing Connections: Visualization Makes Sense of Data." The report, by Seth Grimes, is available as a free download, with registration, and it's definitely worth a read.
Posted by Arati Bechtel in Academic, Biz Viz, Data Visualization, JMP - Customer Stories, JMP 8 at 13:01
Tuesday, May 26, 2009
Our Evolving Science
Sometimes, I’m totally astounded at how much our science has advanced since my days as a graduate student. Back then, the closest anyone got to “genomic” studies of eukaryotic organisms involved “melting” DNA and watching it come back together using CoT curves. Cloning and sequencing a single cDNA could get you a paper in Science or even (if it was especially important) Nature. As a classically trained molecular biologist, I was used to thinking about single genes. We ran northern blots and probed them with single cDNA probes. On a good day, our sequencing gels could resolve up to 300 nucleotides, maybe a few more if you did multiple loadings. Experiments were labor-intensive and not very quantitative. While we could generate some very pretty pictures, we certainly couldn’t do statistics. Back then, our science was limited by a lack of data.
Since that remote time (half-way back to the Pleistocene, as my kids would say), we have made incredible progress. We have sequenced the genomes of a growing list of diverse organisms. We can quantitatively assess the expression of not just one gene, but of every gene in an organism, all at the same time. We can ask global questions that we could never have asked just a few short years ago, and we can get answers to those questions in a relatively short period of time. In fact, today we have the opposite problem: far too much data! We used to spend months or even years gathering a few crucial data points that could be assessed by a mere glance at an autoradiogram. Today, we can do an experiment in a fraction of the time, but the analysis takes so much longer.
Fortunately, our tools and skills are evolving along with our science. The initial release of JMP Genomics in 2006 married the visualization capabilities and ease of use of JMP software with the power of SAS. It offered researchers more than 100 different processes for importing, manipulating and analyzing the vast amounts of data generated by the new technologies.
The recent release of JMP Genomics 4.0 builds on an already strong platform of data management and analysis tools. We have added features to and enhanced the power of all of the existing processes. In addition, we have added 16 totally new processes. In fact, this latest release contains almost 200 different processes for importing, assessing, normalizing, annotating, and exploring genetic and microarray data. Every process is fully documented and available for you to use as is or to adapt to your particular needs. You can modify existing processes and workflows or build new ones and add them to your menus. In addition, if there is something special that you need, just let us know, and we’ll work with you to build it. As always, we remain committed to helping you meet your research goals.
We have come so far in such a short time. Where will our science go next? You will help decide, and JMP Genomics will help you take us there. We’re already hard at work on our next release. Stay tuned. The best is yet to come.
Thursday, May 21, 2009
Welcome to the SAS Solar Farm, Gov. Perdue
I had the privilege to lead off a press event this morning for North Carolina Gov. Bev Perdue at the SAS Solar Farm in Cary, NC. It was a beautiful photo event on a small hill overlooking the field. Here are my remarks:
Good morning and welcome to the SAS Solar Farm. I am John Sall, co-founder of SAS Institute. This field of solar panels was completed last December; it has 5,040 panels, generates 1 megawatt of electric power at peak and is projected to produce 1.7 million kWh per year. The panels swivel to track the sun across the sky. This solar farm will eliminate 1,600 tons of carbon emissions annually. The solar field occupies 4.8 acres of land -- and we also use the field as a pasture for Dorper Sheep (short sheep that fit better under the panels).
Eventually, the revenues from this facility will repay our investment, but only because of the generous state and federal tax credits and NC GreenPower electric rates. Without the incentives, this solar-generating facility would not have been built. I hope that federal legislation will be forthcoming to make alternative energy and energy conservation economic by a federal charge on fossil carbon energy sourcing; this would be the most effective, efficient and ultimately the least painful way to a sustainable energy future. Until that becomes politically viable, other measures, such as alternative energy subsidies and quantity limits, will at least move us in the right direction toward sustainability and energy security.
This solar farm is one of several energy initiatives at SAS. We aim to conserve energy use as we grow more jobs here:
SAS is happy to call North Carolina home, with the state’s support for business, research and higher education, all of this enabling better jobs, better health and long-term sustainability. North Carolina has many opportunities in alternative energy: in solar, in biofuels and in wind. Recently, the federal Department of the Interior, under Ken Salazar, made the first step to unlock leasing for offshore wind farms, and North Carolina has some of the best opportunities. North Carolina is a home for energy research, too. We congratulate NC State University, which last year was appointed the lead institution for a smart grid NSF grant, which led to the FREEDM Systems Center (Future Renewable Electric Energy Delivery and Management).
We welcome everyone here to this new energy farm, and we look forward to our governor’s announcements on the subject of sustainability. It is my privilege to introduce the Governor of the State of North Carolina, Bev Perdue.
Dale Carroll, Deputy Secretary of NC Department of Commerce (left); Hilda Pinnix-Ragland, Vice President of Corporate Public Affairs for Progress Energy (second from left); and NC Gov. Bev Perdue (center) join Jerry Williams of SAS (second from right) and me (right) at the SAS Solar Farm this morning. The Dorper Sheep are in the background. Photo by Steve Muir, SAS
Wednesday, May 20, 2009
Scientific Computing Review: 'Stunning' Graphics in JMP 8
Referring to JMP as "an old friend," statistician John Wass reviewed JMP 8 for Scientific Computing and calls it a "major upgrade." The review includes several visualizations and covers many of the major new features of JMP 8.
Wass concludes: "This latest version is stunning in the quality of the graphics, and JMP has pioneered the advancement of statistical graphics by heavily linking most number-crunching operations to a graphic. Interested parties are highly encouraged to download a trial version." Here's where you can get the fully functional 30-day trial version of JMP 8, for Windows, Mac and Linux.
Posted by Arati Bechtel in Data Visualization, JMP 8, Statistics at 14:23
Tuesday, May 19, 2009
3-D Pie Reply
Thanks for the enlightening comments to my blog post "I Like 3-D Pie Charts" and for the new graphs. While the bar charts from Joe and John are very nice, I prefer vertical bars because of their connection with the gravity orientation of trees, mountains, buildings and of course cell-phone signal strength. It’s interesting that Paige and Lee suggest 2D pie charts. Warning: The graphics gods are watching you. Daniel, I’m glad you did not sleep through Art Appreciation in college.
SAS is well known for its corporate amenities, and its cafés are no exception. I was in line for lunch the other day and came upon the always tempting dessert case. The middle shelf featured over a dozen haphazardly arranged pieces of another one of my all-time favorites: chocolate crème pie. Simply irresistible. Maybe it’s my upbringing, but I had absolutely no problem identifying the largest slice. I don’t think it’s a guy thing either, because in the extra second I took reveling about how good it was going to look on my tray, the woman behind me handily placed it on hers. How dare she! But I was smiling inwardly as I quickly grabbed the next largest one, not only because we were likely identical by state for a latent canine quantitative trait, but because I had some more assurance that at least some humans are actually decent at assessing the size of three-dimensional wedge shapes. The psychological experimental evidence cited to the contrary is largely conducted on college students who are really only good at determining how much liquid remains in bottle-shaped objects.
This has me wondering if exploding pieces of the pie chart might actually be helpful in avoiding volumetric distortion. If nothing else, exploding and appropriately labeling one or more pieces seems like a great way to emphasize them, and, conversely, leaving the really thin slices unlabeled is desirable when they are not of interest.
Regarding ordering of slices or bars, I forgot that in examples like this one there is an overriding analytical criterion. In statistical modeling in general, one typically puts main effects first, then two-way interactions, then succeeding higher order interactions and finally residual/unexplained variance. One way to accomplish this (or any other ordering) in JMP is with the Value Ordering column property. For our example, if you first run JSL code like the following…
…then the bars appear in the desired order. We use such a column property in JMP Genomics scripts to make sure chromosome plots appear in numerical instead of alphabetical order. If you’re having trouble comparing sizes of slices in the next pie chart you see, just pretend that it’s your favorite dessert and you’re really hungry. Works every time.
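The ordering rule described in this post (main effects, then two-way interactions, then higher-order interactions, then residual variance) can be sketched as a sort key. This is Python rather than JSL, and it does not show how the Value Ordering property itself is set. The term labels, the "*" separator and the "Residual" label are hypothetical conventions assumed for the example.

```python
def term_order(term):
    """Sort key: interaction order first, with the residual term always last."""
    if term.strip().lower() == "residual":
        return (1, 0)
    # "A" is order 1, "A*B" is order 2, "A*B*C" is order 3, and so on.
    return (0, term.count("*") + 1)

def order_terms(terms):
    """Return the terms in model order; ties keep their original sequence."""
    return sorted(terms, key=term_order)

if __name__ == "__main__":
    labels = ["A*B", "Residual", "B", "A*B*C", "A"]
    print(order_terms(labels))
```

A list produced this way is the sort of explicit value sequence that a column-level ordering needs, so that bars or slices follow the analysis rather than the alphabet.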
Posted by Russ Wolfinger in Data Visualization, Genomics, JMP 8, JSL, Statistics at 09:33 | Comments (2)
Monday, May 18, 2009
Soccer Analytics Using JMP
NOTE: This entry comes to the JMP Blog from our colleague Jerome Bryssinck of SAS Belgium. Jerome had seen Jeff Perkinson's examples of basketball analytics using JMP and created his own example using football (or soccer) data. In response to comments from readers, Jerome updated his model on May 26, and this blog post now reflects those changes.
THE QUESTION: Has the game been decided yet? (HTGBD)
This is the question that most people constantly ask themselves when they are watching a football game. This question can take different forms depending on the circumstances. If you're lucky to support the winning team, you might ask yourself: "How secure is the lead?" And for the less fortunate of us: "Is there still a chance for my team to win?"
THE ANSWER: Analytics
Graph 1: Probability of the game having been decided as a function of the elapsed time and the goal difference.
Graph 1 shows the probability of the game having been decided as a function of the elapsed time and the goal difference. It is possible to change the elapsed time and the goal difference on the graph by clicking on a different value.
Some interpretation examples:
If Time=45 and Goal Difference=0: The game has been going on for 45 minutes, and the goal difference is 0. There is a 23% probability that the outcome of the game won't change. Here, as the teams are even (0 goal difference), this would mean that there is a 23% probability the game will end in a tie.
If Time=45 and Goal Difference=1: The game has been going on for 45 minutes, and one of the teams is leading by 1 goal, so we have a 60% probability that the outcome of the game won't change. Here, this would mean that the leading team has a 60% probability to win.
More Details about the Answer
The model used above has been built using data from the UK Premier League from 2002 to 2006. The type of model used is a regression model. The following representations are useful to understand the underlying data.
Graph 2: Has the Game Been Decided vs. Time
Graph 2 shows the percentage of the games that have been decided as a function of the Elapsed Time. I must say that I wasn't surprised by this graph, which basically states that the Elapsed Time and the HTGBD (Has The Game Been Decided) are directly proportional.
Graph 3: Has the Game Been Decided vs. Time By Goal Difference
Graph 3 shows the percentage of the games that have been decided as a function of the Elapsed Time, by goal difference. According to this graph, the goal difference is an excellent predictor for the HTGBD.
Additional readings: Similar models are available for basketball. Check out Bill James and Jeff Perkinson if you want to learn more.
This entry was first published in Jerome Bryssinck's blog, Brisink. It is republished here with his permission.
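For readers who want to see what "decided" means in data terms, here is a rough Python sketch of the raw percentages behind Graph 2 and Graph 3. It is not Jerome's regression model. It assumes a game counts as decided at a given minute when the outcome at that minute (home lead, away lead or tie) equals the final outcome, and the goal-timeline format and function names are hypothetical.

```python
def outcome(home_goals, away_goals):
    """Label the state of a game: 'home', 'away' or 'tie'."""
    if home_goals > away_goals:
        return "home"
    if away_goals > home_goals:
        return "away"
    return "tie"

def decided_at(minute, goals):
    """goals: list of (minute_scored, 'home' or 'away') for one match.

    Returns (goal difference at `minute`, whether the outcome then
    matches the final outcome).
    """
    def score(events):
        home = sum(1 for _, side in events if side == "home")
        return home, len(events) - home

    so_far = [g for g in goals if g[0] <= minute]
    home_now, away_now = score(so_far)
    home_end, away_end = score(goals)
    decided = outcome(home_now, away_now) == outcome(home_end, away_end)
    return abs(home_now - away_now), decided

def decided_share(matches, minute):
    """Share of matches already decided at `minute`, keyed by goal difference."""
    tallies = {}
    for goals in matches:
        diff, decided = decided_at(minute, goals)
        played, kept = tallies.get(diff, (0, 0))
        tallies[diff] = (played + 1, kept + int(decided))
    return {diff: kept / played for diff, (played, kept) in tallies.items()}

if __name__ == "__main__":
    # Four made-up matches, for illustration only (not Premier League data).
    toy = [[(10, "home"), (80, "away")], [(30, "home")], [], [(60, "away")]]
    print(decided_share(toy, 45))
```

Running the same tally for every minute and goal difference over real match records would give the empirical curves that a regression model then smooths.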
Posted by Arati Bechtel in Biz Viz, Data Visualization, JMP 8, Statistics at 09:14 | Comments (2)


