Last week I presented the GSR algorithm, a statistical model of a riffle shuffle. In the model, a deck of n cards is split into two parts according to the binomial distribution. Each piece has roughly n/2 cards. Then cards are dropped from the two stacks according to the number
In a previous post, I showed how to read data from a SAS data set into SAS/IML matrices or vectors. This article shows the converse: how to use the CREATE, APPEND, and CLOSE statements to create a SAS data set from data stored in a matrix or in vectors. Creating
Unless you’ve been living under a rock, you’ve heard about the budget problems running rampant across all levels of government. Federal, state, and local governments are all facing historic budget shortfalls due to the economic crisis and decreased tax receipts. This has led to a much closer examination of services
On March 28 I had the pleasure of moving to our new office building on the scenic SAS campus in Cary, NC. This aesthetic and functional structure houses the sales, marketing, and SAS executive management offices, as well as a generously appointed Executive Briefing Center for hosting our visiting customers.
In a previous blog post, I showed how you can use simulation to construct confidence intervals for ranks. This idea (from a paper by E. Marshall and D. Spiegelhalter) enables you to display a graph that compares the performance of several institutions, where "institutions" can mean schools, companies, airlines, or
I recently returned from a five-day conference in Las Vegas. On the way there, I finally had time to read a classic statistical paper: Bayer and Diaconis (1992) describes how many shuffles are needed to randomize a deck of cards. Their famous result that it takes seven shuffles to randomize
This morning I delivered a talk to visiting high school students at the SAS campus. The topic: using SAS to analyze Twitter content. Being teenagers, high school students are well familiar with Twitter. But this batch of students was also very familiar with SAS, as they all have taken SAS
Linking business analytics to economic value is a hard problem. Despite all the smarts that get poured into models, it's hard to tie them to financial measures such as profitability. And, because of that, it's hard to justify investment in analytics. Need headcount? Sorry, try again. Need tools? Sorry, can't
"Convergence after 23 iterations to (1.23, 4.56)." That's the message that I want to print at the end of a program. The problem, of course, is that when I write the program, I don't know how many iterations an algorithm requires nor the value to which an algorithm converges. How
At the beginning of 2011, I heard about the Dow Piano, which was created by CNNMoney.com. The Dow Piano visualizes the performance of the Dow Jones industrial average in 2010 with a line plot, but also adds an auditory component. As Bård Edlund, Art Director at CNNMoney.com, said, The daily
In a previous blog post about computing confidence intervals for rankings, I inadvertently used the VAR function in SAS/IML 9.22, without providing equivalent functionality for those readers who are running an earlier version of SAS/IML software. (Thanks to Eric for pointing this out.) If you are using a version of
When comparing scores from different subjects, it is often useful to rank the subjects. A rank is the order of a subject when the associated score is listed in ascending order. I've written a few articles about the importance of including confidence intervals when you display rankings, but I haven't
The federal government is more aggressively pursuing health care fraud, and helping the states do the same, by proposing funding changes and investing in new technologies. A newly proposed rule would allow 90% Federal Financial Participation (FFP) for data mining initiatives in state Medicaid Fraud Control Units (MFCUs). Another proposed
Suppose that friendship is a 2-way relationship: Either two people are friends with each other, or they are not. (By this definition, X cannot be a friend of Y if Y is not a friend of X. Also, you cannot be a friend of yourself -- no matter how attractive
Yesterday I was in the Big Room for the rehearsal of the Technology Connection, the part of SAS Global Forum where SAS shows off its wares: what's been released recently and what's coming. I believe that customers are going to love what they see. And just about every product that
In my article on computing confidence intervals for rankings, I had to generate p random vectors that each contained N random numbers. Each vector was generated from a normal distribution with different parameters. This post compares two different ways to generate p vectors that are sampled from independent normal distributions. Sampling
While talking to fellow SAS users at SAS Global Forum 2011 this week, I'll be discussing how SAS programmers can "play" with social media data that they can access on Facebook and Twitter. I always refer people to my blog for more information, and so I've prepared this blog post
This morning Rick Wicklin announced his (hostile?, nah...) takeover of the technical blog space at SAS. I'll admit that it took me by surprise when I awoke in Siberia this morning. It's so cold here; I can't feel my fingers as I type. This is probably a punishment for the
Editor's Note: This article was an April Fool's prank from 2011. The entire article is fake. Today, SAS, the leader in business analytics, announces significant changes to two popular SAS blogs, The DO Loop (written by Rick Wicklin) and The SAS Dummy (previously written by Chris Hemedinger). The two blogs
"Twitter, thou art nought but data." So sayeth the SAS programmer. Many data analysts now recognize Twitter for what it is: a tremendous source of data covering almost any topic, from Justin Bieber's hair to political uprisings to technical conferences to company brands. SAS offers sophisticated solutions to harness this
This week, I posted the 100th article to The DO Loop. To celebrate, I'm going to analyze the content of my first 100 articles. In December 2010, I compiled a list of The DO Loop's most-read posts, so I won't repeat that exercise. Instead, I thought it would be interesting
Let’s start with a quiz. Which of the following is the Programmer’s Rule # 1? 1. Expert knowledge of multiple languages, like SAS and Java 2. Talent to maneuver with complex algorithms 3. Innate ability to draw flowcharts 4. None of the above Dear reader, as a savvy programmer, you
In a previous post, I described how to compute means and standard errors for data that I want to rank. The example data (which are available for download) are mean daily delays for 20 US airlines in 2007. The previous post carried out steps 1 and 2 of the method
Many SAS users love "undocumented features" within SAS software that they have found or heard about. Sometimes they can be really useful, and the fact that they are undocumented adds to the mystique. Some users have written entire conference papers on the subject. After 35 years of evolution, SAS contains
When you create a character matrix in SAS/IML software, the initial values determine the number of characters that can fit into any element of the matrix. For example, the following statements define a 1x3 character matrix: proc iml; m = {"Low" "Med" "High"}; After the matrix is defined, at most
I recently posted an article about representing uncertainty in rankings on the blog of the ASA Section for Statistical Programmers and Analysts (SSPA). The posting discusses the importance of including confidence intervals or other indicators of uncertainty when you display rankings. Today's article complements the SSPA post by showing how
I recently blogged about how to eliminate a macro loop in favor of using SAS/IML language statements. The purpose of the program was to extract N 3x3 matrices from a big 3Nx3 matrix. The main portion of my PROC IML program looked something like this: proc iml; ... do i=0
The SAS procedure SUMMARY is a quick way to convert a detail table into a fully summarized one. A sample is included. The key option to set is NWAY, which generates the lowest level of summary for use in the OLAP cube. Essentially - the class statement contains all
In the computer software industry, 35 years is like an eon. I mean, 35 years ago, the computing power that I carry around within my mobile phone didn't even exist all in one place; but if it did, it would have filled an entire building. That's why the recent posting
Statistical programmers can be creative and innovative. But when it comes to choosing names of variables, often x1, x2, x3,... works as well as any other choice. In this blog post, I have two tips that are related to constructing variable names of the form x1, x2,..., xn. Both tips