In two previous blog posts, I worked through examples in the survey article, "Robust statistics for outlier detection," by Peter Rousseeuw and Mia Hubert. Robust estimates of location in a univariate setting are well known, with the median being the classic example. Robust estimates of scale are less well known, with
The other day I encountered the following SAS DATA step for generating three normally distributed variables. Study it and see if you can discover what is unnecessary (and misleading!) about this program:

    data points;
       drop i;
       do i=1 to 10;
          x=rannor(34343);
          y=rannor(12345);
          z=rannor(54321);
          output;
       end;
    run;

The program creates the
Recently, I had a discussion with a user concerning the volume of imagemap data generated for an interactive, web-based visual that contained a large number of graphs. The large amount of imagemap data was causing problems with the current version of their web browser. The graphs consisted of either bar charts
Ten years ago I spent some time in women's undergarments*, as Director of Forecasting at Sara Lee Intimate Apparel (now Hanesbrands). Sure, it sounds glamorous -- product posters on our office walls, quarterly runway shows of new products, and partying with the full-figured Playtex models (some of whom were fuller than I figured).
So many of us struggle with this mountain. In fact, 68.27% of us get within sight of reaching the summit (while 95.45% of us are at least on a perceivable slope). We run, walk, crawl and sometimes slide our way uphill (from one direction or the other) until we finally
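The percentages in this excerpt are the areas under the standard normal curve within one and two standard deviations of the mean. As a quick check, here is a minimal Python sketch (my own illustration, not from the post) that computes P(|Z| < k) = erf(k/√2) with the standard library's `math.erf`; the function name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(|Z| < k) for a standard normal Z, via P = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    # Prints 68.27%, 95.45%, and 99.73% for k = 1, 2, 3
    print(f"within {k} sd: {100 * prob_within(k):.2f}%")
```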
In a previous blog post on robust estimation of location, I worked through some of the examples in the survey article, "Robust statistics for outlier detection," by Peter Rousseeuw and Mia Hubert. I showed that SAS/IML software and PROC UNIVARIATE both support the robust estimators of location that are mentioned
We’ve just published Chris Olsen’s Teaching Elementary Statistics with JMP, which offers the latest research on best practices and how JMP can facilitate teaching statistics. To mark the book’s publication, we asked Chris to tell us the top three things every elementary statistics student should know. Statistics is about numbers
Love can make a person do bad, dangerous, stupid, and irresponsible things. Love of country can make a politician stray from his wife. Love of Pepsi can make a pop musician lose his hair in a pyrotechnics-gone-bad commercial. Love of acting can make Academy Award winners accept starring roles in Ishtar. And for
Note: as this is a popular topic, I've added a few notes with minor updates, including a link to a popular how-to tutorial video. In case you missed it, SAS recently shipped the first maintenance release for SAS 9.3. Because we're all friends here, you may call it "SAS 9.3M1"
I was on vacation when a family member sidled up to me. "Rick, you're a statistician..." he began. I knew I was in trouble. He proceeded to tell me the story of Joseph "Newsboy" Moriarty, a New Jersey mobster who rose to prominence and became known as the bookie who
The single most costly employee benefit for any organization is health insurance, and the price is going up. From 2003 to 2009, the cost per hour worked for employee health insurance increased from $1.03 to $2.00. These costs continue to increase by 5% to 7% per year. The reality is that employee health insurance costs will continue
Many cities and counties are following the lead of private industry and developing 311 call centers to consolidate incoming calls for service and information requests from citizens. The business advantages are clear: Citizens have one number to call for service and information rather than having to waste time searching for
All too often, an unspeakable tragedy exposes a previously unrecognized gap in the criminal justice system and leads to a needed policy or operational change. While we continually work to minimize existing gaps, the reality is that as law enforcement evolves, so does crime, and so do the criminals. In recognition of those
Statistical programmers often need mathematical constants such as π (3.14159...) and e (2.71828...). Programmers of numerical algorithms often need to know machine-specific constants such as the machine precision constant (2.22E-16 on my Windows PC) or the largest representable double-precision value (1.798E308 on my Windows PC). Some computer languages build these
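As one example of a language that exposes these values directly, here is a minimal Python sketch (an illustration, not the post's SAS approach). It assumes a platform with IEEE 754 double-precision floats, which is where the values in the comments come from; `sys.float_info.epsilon` is the gap between 1.0 and the next representable double.

```python
import math
import sys

print(math.pi)                 # 3.141592653589793
print(math.e)                  # 2.718281828459045
print(sys.float_info.epsilon)  # 2.220446049250313e-16 (machine precision)
print(sys.float_info.max)      # 1.7976931348623157e+308 (largest double)
```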
I encountered a wonderful survey article, "Robust statistics for outlier detection," by Peter Rousseeuw and Mia Hubert. Not only are the authors major contributors to the field of robust estimation, but the article is short and very readable. This blog post walks through the examples in the paper and shows
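To make the idea concrete, here is a minimal Python sketch (not the authors' or the post's code) that contrasts the mean and standard deviation with their robust counterparts: the median and the normalized median absolute deviation (MADN). The data are hypothetical, with 98.0 standing in for a mistyped 9.8, and `madn` is a helper name introduced for illustration. The factor 1.4826 makes the MAD consistent for the standard deviation of normal data.

```python
import statistics

def madn(values):
    """Median absolute deviation from the median, scaled by 1.4826 so that
    it estimates the standard deviation when the data are normal."""
    med = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - med) for v in values])

# Hypothetical sample with one gross outlier (98.0)
data = [9.8, 10.1, 10.0, 9.9, 10.2, 98.0]

print(statistics.mean(data))    # about 24.67, dragged toward the outlier
print(statistics.median(data))  # about 10.05, barely affected
print(statistics.stdev(data))   # about 35.9, inflated by the outlier
print(madn(data))               # about 0.22, reflects the bulk of the data
```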
This morning I logged onto my e-mail at 6:45 AM to learn that SAS was ranked as the No. 3 Best Company to Work For. No. 3 is not as high as No. 1. But it's very, very close. Perhaps even barely distinguishable, in the larger scheme of things. I
An issue that SAS/GRAPH users have wrestled with in the past has been how to put tick marks at irregular intervals on their axes. In PROC GPLOT, if you specify irregular intervals using the ORDER option on the AXIS statement, the procedure’s axis kicks into a “discrete” mode, where the
The Winter 2012 issue of Foresight is now available. Here is Editor Len Tashman's preview:
Our last two issues featured Steve Morlidge’s Guiding Principles for managing an organization’s forecasting process. You can see the summary table of these principles on page 31. With this issue, we continue their development
In my recent article on simulating Buffon's needle experiment, I computed the "running mean" of a series of values by using a single call to the CUSUM function in the SAS/IML language. For example, the following SAS/IML statements define a RunningMean function, generate 1,000 random normal values, and compute the
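The running mean is just the cumulative sum divided by the number of terms summed so far. A minimal Python sketch of that idea follows (an illustration, not the post's SAS/IML RunningMean function); `itertools.accumulate` plays the role that CUSUM plays in SAS/IML, and the seed is arbitrary.

```python
import random
from itertools import accumulate

def running_mean(values):
    """Running (cumulative) mean: entry k is the mean of the first k values,
    computed as cumulative sum / count."""
    return [total / k for k, total in enumerate(accumulate(values), start=1)]

random.seed(12345)  # arbitrary seed, for reproducibility
x = [random.gauss(0.0, 1.0) for _ in range(1000)]
rm = running_mean(x)
print(rm[-1])  # the mean of all 1,000 values
```

For example, `running_mean([2, 4, 6, 8])` returns `[2.0, 3.0, 4.0, 5.0]`.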
Cities and counties are responsible for building and maintaining the infrastructure to support a broad range of services. Local governments must develop and implement multiyear capital projects plans that provide infrastructure for things such as jails, courts, public office buildings, streets, bridges, parks, athletic complexes and community centers, water treatment
Once again I rediscovered something that I once knew, but had forgotten. Fortunately, this blog is a good place to share little code snippets that I don't want to forget. I needed to compute the diagonal elements of a product of two matrices. In symbols, I have an n x p matrix,
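The key identity is that the i-th diagonal element of AB is the sum over k of A[i][k]·B[k][i], so you never need the off-diagonal entries. Here is a minimal Python sketch of that identity (my illustration, not necessarily the post's solution), assuming A is n x p and B is p x n, both stored as lists of rows; `diag_of_product` is a hypothetical name. It costs O(np) operations instead of the O(n²p) needed to form the full product.

```python
def diag_of_product(A, B):
    """Diagonal of the matrix product A*B without forming the n x n product.

    A is n x p and B is p x n (lists of rows); entry i is sum_k A[i][k] * B[k][i].
    """
    n, p = len(A), len(A[0])
    if len(B) != p or any(len(row) != n for row in B):
        raise ValueError("A must be n x p and B must be p x n")
    return [sum(A[i][k] * B[k][i] for k in range(p)) for i in range(n)]

A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[1, 0, 2], [0, 1, 3]]     # 2 x 3
print(diag_of_product(A, B))   # [1, 4, 28]
```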
Here at SAS Publishing, we’ve started the new year off with a bang, particularly when it comes to conferences. We’re attending a number of new shows in addition to the usual lineup this year. Visit our booth, meet our authors, check out our new and forthcoming titles, and talk with
The SAS/IML READ statement has a few convenient features for reading data from SAS data sets. One is that you can read all variables into vectors of the same names by using the _ALL_ keyword. The following DATA steps create a data set called Mixed that contains three numeric and
In the Star Wars movie, Obi-Wan could just wave his hand, mutter a few words, and the stormtroopers would "move along". How the power of the Force makes ridding yourself of problematic characters so much easier! I recently was invited to become an alternate instructor for Ron Cody’s SAS Business
A recent question on a SAS Discussion Forum was, "How can you overlay multiple kernel density estimates on a single plot?" There are three ways to do this, depending on your goals and objectives.
Overlay different estimates of the same variable
Sometimes you have a single variable and want to
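One way to get "different estimates of the same variable" is to smooth the same data with different bandwidths. The minimal Python sketch below (an illustration, not the SAS approach from the post) evaluates a Gaussian kernel density estimate on a shared grid for two bandwidths; the sample, grid, and bandwidths are hypothetical, and plotting is left to whatever graphics tool you use.

```python
import math

def gaussian_kde(data, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate of `data` at each grid point:
    f(x) = 1/(n*h) * sum_i phi((x - x_i)/h), with phi the standard normal pdf."""
    n = len(data)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        for x in grid
    ]

# Hypothetical sample and evaluation grid (not from the forum question)
sample = [1.2, 1.9, 2.1, 2.4, 3.0, 3.3, 4.8, 5.1]
grid = [i * 0.05 for i in range(141)]  # 0.0, 0.05, ..., 7.0

# Two estimates of the same variable that differ only in bandwidth;
# each list is one curve to overlay against the shared grid.
curves = {h: gaussian_kde(sample, h, grid) for h in (0.3, 1.0)}
```

A small bandwidth shows more local bumps; a large one gives a smoother curve, which is why overlaying them can be informative.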
It is "well known" that the pairwise deletion of missing values and the resulting computation of correlations can lead to problems in statistical computing. I have previously written about this phenomenon in my article "When is a correlation matrix not a correlation matrix." Specifically, consider the symmetric array whose elements
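Here is a contrived example of my own (not from the article) that shows how the problem arises. In the hypothetical data below, each pair of variables is observed together on a different set of rows. Pairwise deletion then yields corr(x, y) = 1 and corr(x, z) = 1 but corr(y, z) = -1, which no real data set can produce. The resulting matrix has a negative determinant, so it has a negative eigenvalue and is not positive semidefinite. The helper names are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def pairwise_corr(rows, i, j):
    """Correlation of columns i and j using only rows where both are present."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    u, v = zip(*pairs)
    return pearson(u, v)

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical (x, y, z) observations; None marks a missing value
rows = [
    (1, 1, None), (2, 2, None), (3, 3, None),     # only x and y observed
    (1, None, 1), (2, None, 2), (3, None, 3),     # only x and z observed
    (None, 1, -1), (None, 2, -2), (None, 3, -3),  # only y and z observed
]
R = [[1.0 if i == j else pairwise_corr(rows, i, j) for j in range(3)]
     for i in range(3)]
print(R)        # [[1.0, 1.0, 1.0], [1.0, 1.0, -1.0], [1.0, -1.0, 1.0]]
print(det3(R))  # -4.0: negative, so R is not a valid correlation matrix
```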
A’s in the front, Z’s in the back. How many of us grew up sitting in alphabetical order next to the same few classmates throughout school? While this is a quick and efficient way to learn student names, which is no easy task, it is not the most effective way
Before there was CNN or FOX News, people used to get their news from SAS. At least, that's how I imagine that people kept themselves informed. What else can explain the existence of the NEWS= system option, which helps SAS admins to surface the must-know information to the SAS community?
In my article on Buffon's needle experiment, I showed a graph that converges fairly nicely and regularly to the value π, which is the value that the simulation is trying to estimate. This graph is, indeed, a typical graph, as you can verify by running the simulation yourself. However, notice
Dear Miss SAS Answers,

In PROC REPORT, can I use one calculated (computed) variable in the calculation of another computed variable? In the example below, I’m trying to use the value of the Bonus column to calculate the Total column:

    compute Bonus;
       Bonus = sal.sum*0.05;
    endcomp;
    compute Total;
       total = sum(sal.sum, Bonus.sum);
    endcomp;