This post demonstrates how to rank data and how to place these ranks into roughly equal groups. There are certain variables, such as annual salary, that are highly skewed. There are many who earn between $50,000 and $150,000, but some who earn millions or hundreds of millions of dollars a
Thousands of SAS users are migrating from SAS University Edition to SAS OnDemand for Academics (ODA). I thought I would share some of my thoughts, having just finished two books using ODA (Getting Started with SAS Programming: Using SAS Studio in the Cloud and A Gentle Introduction to Statistics Using
In the past, the COMPRESS function was useful. Since SAS version 9, it has become a blockbuster, and you might not have noticed. The major change was the addition of a new optional parameter called MODIFIERS. The traditional use of the COMPRESS function was to remove blanks or a list
In SAS Studio, the rows and columns in the Table Analysis task are, by default, arranged by the internal ordering of the values used in the table. The table arranges the variables alphabetically or numerically by increasing value. For example, traditional coding uses 1 for Yes and 0
The more I use SAS Studio in the cloud via SAS OnDemand for Academics, the more I like it. To demonstrate how useful the Files tab is, I'm going to show you what happens when you drag a text file, a SAS data set, and a SAS program into the
A lookup table is a programming technique where one or more values can be used to retrieve another value. For example, many years ago, I had benzene exposure estimates for 10 years (1940 to 1949) for each of five locations in a factory. Given a year and a job location,
While working at the Rutgers Robert Wood Johnson Medical School, I had access to data on over ten million visits to emergency departments in central New Jersey, including ICD-9 (International Classification of Diseases, 9th Revision) codes along with some patient demographic data. I also had the ozone level from
One of the first and most important steps in analyzing data, whether for descriptive or inferential statistical tasks, is to check for possible errors in your data. In my book, Cody's Data Cleaning Techniques Using SAS, Third Edition, I describe a macro called %Auto_Outliers. This macro allows you to search
Did I trick you into seeing what this blog is about with its mysterious title? I am going to talk about how to use the FIND function to search text values. The FIND function searches for substrings in character values. For example, you might want to extract all email addresses
The t-test is a very useful test that compares one variable (perhaps blood pressure) between two groups. T-tests are called t-tests because the test results are all based on t-values. T-values are an example of what statisticians call test statistics. A test statistic is a standardized value that is calculated
Years ago I saw a line of SAS code that was really puzzling. It was a statement that started with: if 0 then … ; What? This was a statement that would always be evaluated as false. Why would anyone write such a statement? Recently, I was discussing with a
I often get asked for programming tips. Here, I share three of my favorite tips for beginners. Tip #1: COUNTC and CATS Functions Together The CATS function concatenates all of its arguments after it strips leading and trailing blanks. The COUNTC function counts characters. Together, they can let you operate
Find out about the new edition of Ron Cody's latest best-selling book.
In a previous blog, I demonstrated a program and macro that could identify all numeric variables set to a specific value, such as 999. This blog discusses an immensely useful technique that allows you to perform an operation on all numeric or all character variables in a SAS data set.
When I teach my Data Cleaning course, the last topic I cover in the two-day course is SAS Integrity Constraints. I find that most of the students, who are usually quite advanced programmers, have never heard of Integrity Constraints (abbreviated ICs). I decided a short discussion on this topic would
Wait! Don't close this window. I understand that regular expressions can be very complicated (yes, there are many books on the subject), but some basic expressions to test patterns such as zip codes or telephone numbers are not that difficult. In addition, you can sometimes use Google to search for
How many times have you entered a phone number on a web page, only to be told that you did not type it in the "correct" form? I find that annoying. Don't you? In my latest book, Cody's Data Cleaning Techniques, 3rd edition, I show how to convert a phone number
There's an old song that starts out, "You Can Get Anything You Want at Alice's Restaurant." Well, maybe you are too young to know that song, but if you’re a SAS user, you’ll be glad to know that you can capture anything produced by any SAS procedure (even if the
What?!? You mean a period (.) isn't the only SAS numeric missing value? Well, there are 27 others: .A, .B, through .Z, and ._ (period underscore). Your first question might be: "Why would you need more than one missing value?" One situation where multiple missing values are useful involves survey data. Suppose
How many of you have been given a SAS data set with variables such as Age, Height, and Weight and some or all of them were stored as character values instead of numeric? Probably EVERYONE! Yes, we all know how to do the old "swap and drop" (rename and convert), but
SAS temporary arrays are an underutilized jewel in the SAS toolbox. I find that many beginning to intermediate SAS programmers are not familiar with temporary arrays. The good news is that there is nothing complicated about them and they are very useful. First of all, what is a temporary array?
Suppose you are using SAS Studio and the statistical task you need to perform is not a supported option or feature in SAS. I know that sounds almost impossible because the statistical tasks in SAS Studio are so awesome. But, just in case you need to tweak a program or