Top posts from The DO Loop in 2019

3

Last year, I wrote more than 100 posts for The DO Loop blog. The most popular articles were about SAS programming tips for data analysis, statistical analysis, and data visualization. Here are the most popular articles from 2019 in each category.

SAS programming tips

The Essential Guide to Binning

  • Create training, testing, and validation data sets:: This post shows how to create training, validation, and test data sets in SAS. This technique is popular in data science and machine learning because you typically want to fit the model on one set of data and then evaluate the goodness of fit by using a different set of data.
  • 5 reasons to use PROC FORMAT to recode variables in SAS: Often SAS programmers use PROC SQL or the SAS DATA step to create new data variables to recode raw data. This is not always necessary. It is more efficient to use PROC FORMAT to recode the raw data. Learn five reasons why you should use PROC FORMAT to recode variables.
  • Conditionally append observations to a data set: In preparing data for graphing. you might want to add additional data to the end of a data set. (For example, to plot reference lines or text.) You can do this by using one DATA step to create the new observations and another DATA step to merge the new observations to the end of the original data. However, you can also use the END= option and end-of-file processing to append new observations to data. Read about how to use the END= option to append data. The comments at the end of the article show how to perform similar computations by using a hash table, PROC SQL, and a DOW loop.
  • Use PROC HPBIN to bin numerical variables: In machine learning, it is common to bin numerical variables into a set of discrete values. You can use PROC HPBIN to bin multiple variables in a single pass through the data. PROC HPBIN can bin data into equal-length bins (called bucket binning) or by using quantiles of the data. This article and the PROC FORMAT article are both referenced in my Essential Guide to Binning in SAS.

Statistical analyses

Geometric Meaning of Kolmogorov's D

Data visualization

Visualize an Interaction Effect

I always enjoy learning new programming methods, new statistical ideas, and new data visualization techniques. If you like to learn new things, too, read (or re-read!) these popular articles from 2019. Then share this page with a friend. I hope we both have many opportunities to learn and share together in the new year.

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

3 Comments

  1. The volume of technical material that you churn out on so many topics is astonishing. While you obviously have a very fertile mind, I'm sure this must also require a lot of hard work and research. So I second Michelle in thanking you for sharing all this knowledge. Once again in 2019, several times when I needed to "figure out" how to do something, a web search directed me to one of your blogs which showed me exactly what I needed!

Leave A Reply

Back to Top