I recently returned from a five-day conference in Las Vegas. On the way there, I finally had time to read a classic statistical paper: Bayer and Diaconis (1992) describes how many shuffles are needed to randomize a deck of cards. Their famous result that it takes seven shuffles to randomize
This morning I delivered a talk to visiting high school students at the SAS campus. The topic: using SAS to analyze Twitter content. Being teenagers, high school students are well familiar with Twitter. But this batch of students was also very familiar with SAS, as they all have taken SAS
Linking business analytics to economic value is a hard problem. Despite all the smarts that get poured into models, it's hard to tie them to financial measures such as profitability. And, because of that, it's hard to justify investment in analytics. Need headcount? Sorry, try again. Need tools? Sorry, can't
"Convergence after 23 iterations to (1.23, 4.56)." That's the message that I want to print at the end of a program. The problem, of course, is that when I write the program, I don't know how many iterations an algorithm requires nor the value to which an algorithm converges. How
At the beginning of 2011, I heard about the Dow Piano, which was created by CNNMoney.com. The Dow Piano visualizes the performance of the Dow Jones industrial average in 2010 with a line plot, but also adds an auditory component. As Bård Edlund, Art Director at CNNMoney.com, said, The daily
In a previous blog post about computing confidence intervals for rankings, I inadvertently used the VAR function in SAS/IML 9.22, without providing equivalent functionality for those readers who are running an earlier version of SAS/IML software. (Thanks to Eric for pointing this out.) If you are using a version of
When comparing scores from different subjects, it is often useful to rank the subjects. A rank is the order of a subject when the associated score is listed in ascending order. I've written a few articles about the importance of including confidence intervals when you display rankings, but I haven't
The federal government is more aggressively pursuing health care fraud, and helping the states do the same, by proposing funding changes and investing in new technologies. A newly proposed rule would allow 90% Federal Financial Participation (FFP) for data mining initiatives in state Medicaid Fraud Control Units (MFCUs). Another proposed
Suppose that friendship is a 2-way relationship: Either two people are friends with each other, or they are not. (By this definition, X cannot be a friend of Y if Y is not a friend of X. Also, you cannot be a friend of yourself -- no matter how attractive
Yesterday I was in the Big Room for the rehearsal of the Technology Connection, the part of SAS Global Forum where SAS shows off its wares: what's been released recently and what's coming. I believe that customers are going to love what they see. And just about every product that
In my article on computing confidence intervals for rankings, I had to generate p random vectors that each contained N random numbers. Each vector was generated from a normal distribution with different parameters. This post compares two different ways to generate p vectors that are sampled from independent normal distributions. Sampling
While talking to fellow SAS users at SAS Global Forum 2011 this week, I'll be discussing how SAS programmers can "play" with social media data that they can access on Facebook and Twitter. I always refer people to my blog for more information, and so I've prepared this blog post
This morning Rick Wicklin announced his (hostile?, nah...) takeover of the technical blog space at SAS. I'll admit that it took me by surprise when I awoke in Siberia this morning. It's so cold here; I can't feel my fingers as I type. This is probably a punishment for the
Editor's Note: This article was an April Fool's prank from 2011. The entire article is fake. Today, SAS, the leader in business analytics, announces significant changes to two popular SAS blogs, The DO Loop (written by Rick Wicklin) and The SAS Dummy (previously written by Chris Hemedinger). The two blogs
"Twitter, thou art nought but data." So sayeth the SAS programmer. Many data analysts now recognize Twitter for what it is: a tremendous source of data covering almost any topic, from Justin Bieber's hair to political uprisings to technical conferences to company brands. SAS offers sophisticated solutions to harness this
This week, I posted the 100th article to The DO Loop. To celebrate, I'm going to analyze the content of my first 100 articles. In December 2010, I compiled a list of The DO Loop's most-read posts, so I won't repeat that exercise. Instead, I thought it would be interesting
Let’s start with a quiz. Which of the following is the Programmer’s Rule #1? (1) Expert knowledge of multiple languages, like SAS and Java; (2) talent to maneuver with complex algorithms; (3) innate ability to draw flowcharts; (4) none of the above. Dear reader, as a savvy programmer, you
In a previous post, I described how to compute means and standard errors for data that I want to rank. The example data (which are available for download) are mean daily delays for 20 US airlines in 2007. The previous post carried out steps 1 and 2 of the method
Many SAS users love "undocumented features" within SAS software that they have found or heard about. Sometimes they can be really useful, and the fact that they are undocumented adds to the mystique. Some users have written entire conference papers on the subject. After 35 years of evolution, SAS contains
When you create a character matrix in SAS/IML software, the initial values determine the number of characters that can fit into any element of the matrix. For example, the following statements define a 1x3 character matrix: proc iml; m = {"Low" "Med" "High"}; After the matrix is defined, at most
I recently posted an article about representing uncertainty in rankings on the blog of the ASA Section for Statistical Programmers and Analysts (SSPA). The posting discusses the importance of including confidence intervals or other indicators of uncertainty when you display rankings. Today's article complements the SSPA post by showing how
I recently blogged about how to eliminate a macro loop in favor of using SAS/IML language statements. The purpose of the program was to extract N 3x3 matrices from a big 3Nx3 matrix. The main portion of my PROC IML program looked something like this: proc iml; ... do i=0
The SAS SUMMARY procedure is a quick way to convert a detail table to a fully summarized one. A sample is included. The key option to set is NWAY, which generates the lowest level of summary for use in the OLAP cube. Essentially, the CLASS statement contains all
In the computer software industry, 35 years is like an eon. I mean, 35 years ago, the computing power that I carry around within my mobile phone didn't even exist all in one place; but if it did, it would have filled an entire building. That's why the recent posting
Statistical programmers can be creative and innovative. But when it comes to choosing names of variables, often x1, x2, x3,... works as well as any other choice. In this blog post, I have two tips that are related to constructing variable names of the form x1, x2,..., xn. Both tips
We're having an early spring in North Carolina. Trees are budding, flowers are blooming, and the warmer temperatures make even a pistol whipping more enjoyable. What better way to take advantage of the new season than filling your spring with educational opportunities in forecasting? Plan in Perfect Sync with Customer
Loony. Zany. Brilliant. Hysterical. Those are some of the adjectives I use to describe The Far Side® cartoons by Gary Larson from the 1980s and early '90s. I recently rediscovered an old book, The Far Side Gallery 2, which collects some of the best of Larson's wonderfully wacky cartoons. Every
SAS Global Forum 2011 is just over two weeks away. The R&D and product management teams are preparing the demos to show on stage during the highly visible opening sessions. A tremendous amount of work goes into planning the program. It's great to see what they come up with. When it comes
In a previous blog post, I showed how to use the SAS/IML SORT and SORTNDX subroutines to sort rows of a matrix according to the values of one or more columns. There is another common situation in which you might need to sort a matrix: you compute a statistic for
Sorting is a fundamental operation in statistical programming, and most SAS programmers are familiar with PROC SORT for sorting data sets. But did you know that you can also sort rows of a SAS/IML matrix according to the value of one or more columns? This post shows how. Sorting a