Which SAS technique should you use? Consider how often the program is to be run, your comfort level and more.
When you use PROC MEANS or PROC SUMMARY to create a summary data set and include a CLASS statement, SAS includes two variables, _FREQ_ and _TYPE_, in the output data set. This blog shows you two ways to interpret and use _TYPE_ using the data set Shoes in the SASHELP
If you have ever needed to score a multiple-choice test, this blog is for you. Even if you are not planning to score a test, the techniques used in this example are useful for many other programming tasks. The example I am going to present assumes the answer key and
If you use formatted variables in a CLASS statement in procedures such as MEANS or UNIVARIATE, SAS will use the formatted values of those variables and not the internal values. For example, suppose you have a data set (Health) with variables Subj, Age, Weight, and Height. You want to see
In my previous blog, you saw how to create a Beale cipher. In this blog, you will see a program that can decode a Beale cipher. As a reminder, here is a list of numbers that you can use as a substitute for a letter when creating your cipher. Now,
This blog serves two purposes: the main purpose is to show you some useful SAS coding techniques, and the second is to show you an interesting method of creating a Beale cipher. TJ Beale is famous in Virginia for leaving behind three ciphers, supposedly describing the location of hidden gold
Many SAS programmers use macros. I have seen students in my SAS classes use several methods to activate their macros. One way is to load the macro in the Display Manager or the editor in SAS OnDemand for Academics and submit it. Another technique is to use the statement %Include macro-name.
Last year, I wrote a blog demonstrating how to use the %Auto_Outliers macro to automatically identify possible data errors. This blog demonstrates a different approach, one that is useful for variables for which you can identify reasonable ranges of values. For example, you would not expect resting heart
Because it is near the end of the year, I thought a blog about "Summarizing" data might be in order. For these examples, I am going to use a simulated data set called Drug_Study, containing some categorical and numerical variables. For those interested readers, the SAS code that I used
This blog is a continuation of a previous blog that discussed creating simulated data sets. If you have not seen it, you might want to review it, especially if you are not familiar with the RAND function. The program that I'm going to show you simulates a drug study with
There are times when it is useful to simulate data. One of the reasons I use simulated data sets is to demonstrate statistical techniques such as multiple or logistic regression. By using SAS random functions and some DATA step logic, you can create variables that follow certain distributions or are
There are many reasons why you might want to encrypt data. I use a SAS program to encrypt a list of logon names and passwords. Before we get started describing how to encrypt data, let's discuss some basic concepts concerning encrypting and decrypting data. All computer data is stored as
The term "fuzzy matching" describes a method of comparing two strings that might have slight differences, such as a misspelling or a middle initial that appears in one name but not the other. One of my favorite functions to compare the "closeness" of two strings is the SPEDIS (spelling distance) function. Have you
This post demonstrates how to rank data and how to place these ranks into roughly equal groups. There are certain variables, such as annual salary, that are highly skewed. There are many who earn between $50,000 and $150,000, but some who earn millions or hundreds of millions of dollars a
Thousands of SAS users are migrating from SAS University Edition to SAS OnDemand for Academics (ODA). I thought I would share some of my thoughts, having just finished two books using ODA (Getting Started with SAS Programming: Using SAS Studio in the Cloud and A Gentle Introduction to Statistics Using
In the past, the COMPRESS function was useful. Since SAS version 9, it has become a blockbuster, and you might not have noticed. The major change was the addition of a new optional parameter called MODIFIERS. The traditional use of the COMPRESS function was to remove blanks or a list
In SAS Studio, the ordering of rows and columns in the Table Analysis task is, by default, determined by the internal ordering of the values used in the table. The table arranges the variables alphabetically or numerically by increasing value. For example, traditional coding uses 1 for Yes and 0
The more I use SAS Studio in the cloud via SAS OnDemand for Academics, the more I like it. To demonstrate how useful the Files tab is, I'm going to show you what happens when you drag a text file, a SAS data set, and a SAS program into the
A lookup table is a programming technique where one or more values can be used to retrieve another value. For example, many years ago, I had benzene exposure estimates for 10 years (1940 to 1949) for each of five locations in a factory. Given a year and a job location,
While working at the Rutgers Robert Wood Johnson Medical School, I had access to data on over ten million visits to emergency departments in central New Jersey, including ICD-9 (International Classification of Diseases, 9th Revision) codes along with some patient demographic data. I also had the ozone level from
One of the first and most important steps in analyzing data, whether for descriptive or inferential statistical tasks, is to check for possible errors in your data. In my book, Cody's Data Cleaning Techniques Using SAS, Third Edition, I describe a macro called %Auto_Outliers. This macro allows you to search
Did I trick you into seeing what this blog is about with its mysterious title? I am going to talk about how to use the FIND function to search text values. The FIND function searches for substrings in character values. For example, you might want to extract all email addresses
The t-test is a very useful test that compares one variable (perhaps blood pressure) between two groups. T-tests are called t-tests because the test results are all based on t-values. T-values are an example of what statisticians call test statistics. A test statistic is a standardized value that is calculated
Learn about best-selling SAS author Ron Cody's programming standards.
Years ago I saw a line of SAS code that was really puzzling. It was a statement that started with: if 0 then … ; What? This was a statement that would always be evaluated as false. Why would anyone write such a statement? Recently, I was discussing with a
I often get asked for programming tips. Here, I share three of my favorite tips for beginners. Tip #1: COUNTC and CATS Functions Together The CATS function concatenates all of its arguments after it strips leading and trailing blanks. The COUNTC function counts characters. Together, they can let you operate
Find out about the new edition of Ron Cody's latest best-selling book.
In a previous blog, I demonstrated a program and macro that could identify all numeric variables set to a specific value, such as 999. This blog discusses an immensely useful technique that allows you to perform an operation on all numeric or all character variables in a SAS data set.
When I teach my Data Cleaning course, the last topic I cover in the two-day course is SAS Integrity Constraints. I find that most of the students, who are usually quite advanced programmers, have never heard of Integrity Constraints (abbreviated ICs). I decided a short discussion on this topic would
Wait! Don't close this window. I understand that regular expressions can be very complicated (yes, there are many books on the subject), but some basic expressions to test patterns such as zip codes or telephone numbers are not that difficult. In addition, you can sometimes use Google to search for