Got Data? Teaching SAS Programming for the Real World


For students to become capable data analysts, they need experience that they can take with them into the real world after graduation.  By far the most critical skill for their toolkit is learning to work with real-life data. Therefore, it is important from a teaching standpoint that instructors provide students with programming assignments that will challenge them and allow them to explore all the nuances of realistic data.  A SAS programming course that combines a focus on manipulating data, a solid foundation using visual and analytic tools, and experience working with realistic data sets will give students the opportunity to learn from situations similar to what they will encounter in the workplace. Of these three areas, the last is the most time-consuming for an instructor: identifying meaningful data sets to use for classroom examples, exam questions, and programming assignments.

The book, Exercises and Projects for The Little SAS® Book, Fifth Edition, uses more than 70 data sets that are any combination of: 1) current and interesting; 2) messy; and 3) extensive.  This new exercise book, which I coauthored with Lora Delwiche and Susan Slaughter, contains multiple-choice and short-answer questions, along with programming exercises using the aforementioned data sets.  The chapters in this book correspond to the chapters in The Little SAS® Book, Fifth Edition.  We made a special effort to include extra variables in many of the data sets for Chapter 8, “Visualizing Your Data,” and Chapter 9, “Using Basic Statistical Procedures,” so that instructors could add questions of their own depending on the content covered in their course.  The following are brief descriptions of a few data sets used in the book.

  • From the United States Department of Transportation, the AIRLINES SAS data set contains 21,938 observations and 8 variables about air travel in the United States over a 20-year period.
  • The SAS data set LOANAPP contains 4,999 observations and 17 variables representing mortgage application data from a national bank with five branches in California.
  • A local gym records data from its computerized check-in system in a file called NewYears.dat. The raw data contain check-in and check-out times for 245 gym members, stored in 238 variables.
  • Central State University tracks registration information on users of its supercomputer cluster, which runs in a grid environment. The raw data file CompUsers.dat contains data on 7 variables for 1,797 faculty, staff, and student users.
  • The SAS data set SFF contains 22 variables for 179 countries, collected by the World Health Organization (WHO) regarding the outbreak of swine flu cases and deaths in 2009.

As you can see, the data sets used in the book are sometimes long, sometimes wide, and sometimes both.  In addition, many of the data sets contain real-life information, while others represent realistic scenarios.  The examples used in our book were carefully designed to give readers an opportunity to test their SAS programming skills on data sets that would challenge and educate them.  More information about the book, including the table of contents, an excerpt, and the data sets, can be found on my author page on the SAS support website.


About Author

Rebecca Ottesen

Professor at California Polytechnic State University

Rebecca A. Ottesen first learned SAS as a student at California Polytechnic State University, San Luis Obispo, where she now teaches for the Statistics Department. As a Biostatistician for the City of Hope, Rebecca uses every opportunity to incorporate her research and programming experience into the coursework for her Cal Poly students.
