Meet SAS Press authors at SAS Global Forum

This year SAS Global Forum attendees have the chance to have lunch with SAS authors and find out what it takes to write a book.

When: Tuesday, April 28, 12 - 12:50 p.m.

Where: Ballroom D2.

Lunch is provided!  We’ll talk a little about what it’s like to write a book and then open the floor for questions. And, those who attend the lunch will have a chance to win a copy of a book of their choice from the SAS Bookstore.

We are continuing to expand the SAS Press program, so if you're interested in writing a SAS book, or know someone who should write one, please pop over to the SAS Press booth in the Quad and let us know, or send your author recommendations to saspress@sas.com.

And don’t forget the SAS Author Roundup Tuesday at lunch! We look forward to seeing you in Dallas!


Keep your hands off of my SAS data sets

I am all about sharing.  Knock on my door and I will gladly lend you a stick of margarine, a cup of sugar, an egg or two, some flour, a corkscrew, or a beer.  Not a problem to borrow a tie, one of my extra belts, a white shirt (if it fits), a scarf, or a pair of gloves.  Sure, I will gladly lend you one of the books in my home library, a couple of my music CDs, or one of my movie DVDs.  I am reasonably sure that most of these items will either be returned to me in due time, or reciprocated, or paid forward.  But, keep your hands off of my SAS data sets!

I would bet that this final sentiment about SAS data sets is pretty prevalent in organizations where SAS programmers work with shared storage resources.  Whether you store your SAS data sets in server directories or in network directories, you do not want other programmers on your team to delete, re-sort, update, or otherwise overwrite them.  At least, not without your permission.

Fortunately, most organizations create security permissions that largely safeguard against unauthorized access of data.  They implement security packages, either native to the OS or purchased from vendors, to ensure that data is accessed only by those with a need to know.  Groups of programmers are given access to specific directories, and other staff, with the exception of systems administrators, cannot get into them or see them at all.  Despite all of this protection, data integrity issues can come from programmers within your own group: programmers who have access to the same directories that you do, with the same access rights that you have.

Unfortunately, there is no foolproof way for you to keep your teammates from crunching your SAS data sets.  But, here are a few protection measures that you can put into place to help safeguard your SAS data sets:

  • SAS Data Set Passwords.  SAS allows you to specify ALTER, READ, and WRITE passwords.  Users must know the password in order to process password-protected data sets with SAS.  So, you can specify passwords for your do-not-disturb SAS data sets, wait for your co-workers to complain, and determine if they really do need access to those data sets.
  • ACCESS=READONLY LIBNAME Option.  This option specifies that no data sets in the library can be updated and no new files can be written to the library.  You can deploy it effectively in a shared group AUTOEXEC.sas file.  If your colleagues grumble, tell them to copy the data sets in question to one of their own libraries and process them there.  Caution them to be mindful of version control issues.
  • SAS Views.  Create views of your SAS data sets via the DATA step or PROC SQL and allow your colleagues to use the views instead of the actual data sets.  The beauty of this approach is that you can have your valued data sets in one directory and the views in another.  Point your coworkers at the views directory and do not tell them the whereabouts of the permanent data sets.
  • Generation Groups.  You can tell SAS to keep several generations of your SAS data sets available.  Consequently, when one of them is replaced, you will still have the older version available.
  • SAS Audit Trails.  SAS audit trails can be used to determine, after the fact, who updated a SAS data set.  They don't stop your colleagues from updating SAS data sets, but you can use them to determine the who, what, and when so that you can storm into the right office for an explanation.
  • LOCK Statement.  You could use the LOCK statement to lock your SAS data sets so that no other SAS session can read or write them.  This is a bit extreme, and it requires that the SAS session that issued the LOCK statements keep running for as long as you want to safeguard your SAS data sets.
  • Zip Files.  You could simply zip your SAS data sets up into zip files, delete the permanent SAS data sets, and restore the data sets from the zip files when you need them.  This is another extreme measure, but if your data sets get clobbered on a regular basis, you may find it more appealing than having them constantly restored.
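Several of the measures above can be sketched in a single SAS program. This is a minimal sketch under stated assumptions, not a production setup: the paths, the librefs PRODLIB, VIEWLIB, and TEAMLIB, the data set names, and the passwords are all hypothetical, and SASHELP.CLASS stands in for real project data. A read-only libref only guards against accidents, because a colleague can still assign a libref of their own to the same directory. Also, as we understand the documentation, a new generation is created when a data set is replaced, not when it is updated in place (for example, with the MODIFY statement).

```sas
/* Hypothetical paths; substitute your project's directories. */
libname prodlib "/project/shared/production";
libname viewlib "/project/shared/views";

/* Generation groups: keep the base version plus up to three      */
/* historical versions. GENMAX= is stored with the data set.      */
data prodlib.delivery(genmax=4);
   set sashelp.class;            /* stand-in for real project data */
run;

/* Replacing the data set pushes the prior version into the group. */
/* In-place updates (e.g., MODIFY) do not create a new generation. */
data prodlib.delivery;
   set sashelp.class;
   where sex = 'F';
run;

/* GENNUM=-1 reads the most recent historical version. */
proc print data=prodlib.delivery(gennum=-1);
run;

/* Passwords: READ= guards reading, WRITE= guards updating, and    */
/* ALTER= guards deleting, replacing, or changing attributes.      */
/* They protect the data set only when it is processed with SAS.   */
data prodlib.payroll(read=Rd4Team write=Wr1te alter=Alt3rX);
   set sashelp.class;
run;

proc print data=prodlib.payroll(read=Rd4Team);
run;

/* ACCESS=READONLY: the kind of libref a shared AUTOEXEC.sas could */
/* give teammates. Reading works; writing through TEAMLIB fails.   */
libname teamlib "/project/shared/production" access=readonly;

data work.my_copy;
   set teamlib.delivery;
run;

/* DATA step view: colleagues read VIEWLIB.DELIVERY_V instead of   */
/* the data set. DATA step views cannot be updated. The view       */
/* stores the libref PRODLIB, not the path, so that libref must be */
/* assigned (read-only is enough) in any session that reads it.    */
data viewlib.delivery_v / view=viewlib.delivery_v;
   set prodlib.delivery;
run;

/* LOCK: hold an exclusive lock while this session keeps running. */
/* SYSLCKRC is 0 when the lock is acquired.                        */
lock prodlib.delivery;
%put NOTE: LOCK return code is &syslckrc;

/* ... later, release the lock. */
lock prodlib.delivery clear;
```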

Obviously, the best solution for keeping your important SAS data sets from being updated by your teammates is good communication among all involved, and a shared set of best practices for accessing and modifying SAS data sets.  But, when that is not available, you may have to reach for some of the ideas I have posted here.

Oh, and about that book that you borrowed from my SAS bookshelf library a couple of months ago... can I get it back?

Best of luck in all your SAS endeavors!

Have a great work-week!


We don’t need no education!

So, how did you first learn SAS programming?

Originally, I was self-taught.  Many years ago, I learned SAS on the job when a systems programmer quit and I took over supporting a mainframe performance software package that was written in SAS. I got a copy of the Base SAS users guide and the SAS Procedures guide and learned how to write my own SAS programs.  It was new and fun; and SAS was more powerful than any of the other programming languages I had been using.

After I had been programming with SAS for about three years, my then employer finally paid to send me to a SAS class.  I entered that class pretty confident of my SAS programming abilities, but was humbled within the first hour.  I had been running my SAS programs in batch on mainframe computers.  All of the students in the class ran their SAS programs interactively with SAS Display Manager, which I had never seen before.  So, I had to catch up to their level of expertise just to do the class exercises.

It was a great class!  I learned to use the SAS Display Manager; the proper ways to perform match merges; how the Program Data Vector works in the DATA step; the intricacies of PROC MEANS and its first cousin PROC SUMMARY; and a host of other very useful programming techniques. I learned the fundamentals of SAS that I had missed by simply jumping into using it and getting all of my information from the manuals.  That class made me a much stronger SAS programmer.

Today's programming professionals who are interested in taking formal SAS classes to increase their SAS programming acumen have a lot of choices.  SAS currently has the following training formats:

  • Classroom.  You can take an instructor-led SAS class in one of SAS's state-of-the-art training facilities located throughout the US and in many other countries.
  • Live Web Classroom.  Let an instructor-led SAS class come to you via the Internet.  This format saves on travel costs and time out of the office.
  • E-Learning.  This option allows you to take SAS classes at your own pace via the Internet 24/7.  This is another way to save travel costs and time out of the office.
  • On-Site Training.  Have qualified SAS instructors come on-site to teach you and your colleagues various SAS courses.
  • Mentoring Services.  This option provides you with a SAS instructor who becomes your coach to help you learn how to write SAS programs that address your organization's unique data processing needs.

You can find out much more about the SAS classes being offered, their content, when they are scheduled, and their training formats in the Training section of the support.sas.com web site.

Check it out because there is likely a SAS class that would make you a stronger programmer in a format that fits with your own work schedule.

"We don't need no education"?  Nope, that couldn't be further from the truth!

Best of luck in all your SAS endeavors!


What to do when all your boss wants is a spreadsheet

Most SAS programmers have been here. Someone just wants a handful of numbers to add to a graph or PowerPoint presentation that is due tomorrow. You have the data files, you have a job to summarize them, and you have a dilemma: how do you get the data where the boss wants it, into Excel?

Transferring data between SAS and Microsoft Excel may be easier than you think.

I do not know how many times I have “Googled” something and gotten a cryptic answer that was marginally effective or even useless. You know that something somewhere will tell you how to do this, but where is it? Then you remember that the company that wrote the software has information online that documents everything your software can do. But if you do not know the name of the procedure to use, how do you find the documentation about it?

My new book, Exchanging Data between SAS and Microsoft Excel: Tips and Techniques to Transfer and Manage Data More Efficiently is designed to help solve that problem by culling information from the SAS manuals and my personal experience into a document that shows you how to transfer data between SAS and Excel. In this first article on the subject, I will show you a simple way to transfer data to Excel with very little effort on your part. It is done with a “Right Click” of your mouse.

When you view your SAS datasets in the SAS Explorer window, each dataset is displayed with an icon. Right-click a dataset icon and a menu appears with an option entitled “View in Excel”. Selecting this option creates an HTML file that Excel can open and use to view the data. In fact, SAS actually invokes Excel to open the HTML output file so you can use the data in Excel. The file will typically have a name similar to “#LNxxxxxx.xls”. With Excel versions prior to 2007, the three-character extension (xls) lets Excel open the file without hesitation. Newer versions check the contents of the file, and if the file name ends in .xls but the file contains HTML or XML formatted commands for Excel, a message asks you to verify that you want to proceed.  Select “Yes”, and Excel opens and your data appears.
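If you would rather script a similar result than right-click, one option is to have ODS HTML write its output to a file with an .xls extension. This is a minimal sketch of that idea, not necessarily what the “View in Excel” menu item runs internally; the path C:\temp\class.xls is hypothetical, and SASHELP.CLASS stands in for your own dataset.

```sas
/* Write HTML output to a file whose name ends in .xls so that   */
/* Excel will open it (hypothetical path).                        */
ods html file="C:\temp\class.xls";

proc print data=sashelp.class noobs;
run;

/* Close the destination so the file is complete */
ods html close;
```

Because the file contains HTML rather than a native workbook, Excel 2007 and later should show the same “verify that you want to proceed” message described above.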


SAS author’s tip: Macro language timing is everything

This SAS tip is from Robert Virgile and his book “SAS Macro Language Magic: Discovering Advanced Techniques”.

We hope you find this tip useful. You can also read an excerpt from Virgile’s book.

In macro language, as in life, timing is everything.  Macro language students need to learn the timing of the DATA step, the timing of macro language, and the relationship between the two.

Let’s begin with the DATA step.  All DATA steps operate in two separate phases:

  1. The compilation phase. In a nutshell, the software checks the syntax of the DATA step statements, and sets up storage space in memory to hold each variable.
  2. The execution phase. Given that there are no syntax errors, the software executes the DATA step … reading data, performing calculations, outputting results.

Macro language statements may have an impact on step 1, the compilation phase.  The resolution of macro variables affects the statements within the DATA step:

%let dataset=MALES;
data &dataset;
   set everyone;
   if gender='M';
run;

During the compilation phase of the DATA step, &DATASET resolves into MALES.  Therefore, the name of the output data set becomes MALES.  However, macro language statements impact only the compilation phase, not the execution phase of the DATA step.  This concept forms a frequent stumbling block when learning macro language.  To illustrate, consider this DATA step (before the programmer complicated it by adding macro language):

data MALES FEMALES;
   set everyone;
   if gender='M' then output MALES;
   else if gender='F' then output FEMALES;
run;

Perhaps the programmer was trying to learn macro language, and using this as an experiment.  Perhaps the programmer sought job security.  But the simple DATA step above morphed into this nonworking version:

data MALES FEMALES;
   set everyone;
   if gender='M' then do;
      %let dataset=MALES;
   end;
   else if gender='F' then do;
      %let dataset=FEMALES;
   end;
   output &dataset;
run;

Mistakenly, the programmer believed that %LET statements could execute as part of the DATA step.  That is just never true.  %LET statements execute immediately … in this case before the compilation phase of the DATA step completes.  So the order of execution of these statements is:

%let dataset=MALES;
%let dataset=FEMALES;
data MALES FEMALES;
   set everyone;
   if gender='M' then do;
   end;
   else if gender='F' then do;
   end;
   output FEMALES;
run;

Clearly, the program revisions alter the outcome: every observation is written to FEMALES, and MALES ends up empty.  Remember these basics:

  • %LET statements are never part of a DATA step. Macro language statements execute immediately, and do not wait for the DATA step to begin executing.
  • If you need to control macro variables (either assigning or retrieving a value) while the DATA step executes, tools exist. But they are DATA step tools, not macro language tools.  The primary ones, CALL SYMPUT and SYMGET, will become the subject of a future article.
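Ahead of that future article, here is a minimal sketch of the execution-time behavior, assuming SASHELP.CLASS (with its SEX variable) as a stand-in for EVERYONE. CALL SYMPUTX assigns the macro variable while the DATA step executes, and SYMGET can read it back during the same execution phase; an &N_MALES reference inside the step could not, because it would be resolved before the step runs.

```sas
data _null_;
   set sashelp.class end=last;
   length check $ 12;
   if sex = 'M' then n_males + 1;     /* sum statement: retained counter */
   if last then do;
      /* Assign the macro variable during execution */
      call symputx('n_males', n_males);
      /* Retrieve it during the same execution phase */
      check = symget('n_males');
      put 'Retrieved during execution: ' check=;
   end;
run;

/* After the step has finished, &N_MALES resolves normally */
%put NOTE: Number of males is &n_males;
```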

Let’s consider another example that both illustrates timing and illustrates a basic use of CALL SYMPUT.  Once again, improper use of macro language complicates the program.  Here is the original version, without macro language:

data percentages;
   state_pop=0;
   do until (last.state);
      set cities;
      by state;
      state_pop + city_pop;
   end;
   do until (last.state);
      set cities;
      by state;
      percent_pop = city_pop / state_pop;
      output;
   end;
run;

For each STATE:

  • The top DO loop computes STATE_POP (the total population for the STATE).
  • The bottom DO loop reads the same observations, computes PERCENT_POP for each, and outputs the result.

Now a macro language student might attempt a slightly different, nonworking variation:

data percentages;
   state_pop=0;
   do until (last.state);
      set cities;
      by state;
      state_pop + city_pop;
   end;
   call symputx ('denom', state_pop);
   do until (last.state);
      set cities;
      by state;
      percent_pop = city_pop / &denom;
      output;
   end;
run;

Bad timing is the critical issue:

  • Before the DATA step runs, &DENOM does not exist.
  • The software doesn’t begin to run the DATA step until it encounters the RUN statement.
  • By that time, the reference to &DENOM has already been encountered, generating an error.

There are many ways to introduce timing errors.  The remedy begins with understanding the relationship between macro language statements, DATA step compilation, and DATA step execution.  Most importantly, macro language statements execute immediately, and are never part of DATA step execution.
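One common remedy is to move the CALL SYMPUTX into its own earlier step, so the macro variable exists before the step that references it is compiled. This sketch only illustrates the timing: it computes a single grand total, with SASHELP.CLASS and WEIGHT standing in for CITIES and CITY_POP. For per-state percentages, the original DATA step without macro language remains the simpler tool.

```sas
/* Step 1: compute the total and store it in a macro variable.  */
/* &TOTAL_WT exists only after this step has executed.          */
data _null_;
   set sashelp.class end=last;
   total_wt + weight;
   if last then call symputx('total_wt', total_wt);
run;

/* Step 2: &TOTAL_WT now exists, so it resolves while this step */
/* is being compiled.                                           */
data pct_weight;
   set sashelp.class;
   pct_wt = weight / &total_wt;
   format pct_wt percent8.1;
run;
```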

For more information about the macro language and the magic you can create with it, check out Robert Virgile’s book “SAS Macro Language Magic: Discovering Advanced Techniques”.


SAS author’s tip: Bayesian analysis of item response theory models

This SAS tip comes from Clement A. Stone and Xiaowen Zhu, authors of Bayesian Analysis of Item Response Theory Models using SAS.

Item response theory (IRT) models are the models of choice for analyzing item responses from assessments in the educational, psychological, health, social, and behavioral sciences. SAS PROC MCMC can be used in all types of assessment applications to investigate how particular characteristics of items and how particular characteristics of persons affect item performance. Use of the SAS system for Bayesian analysis of IRT models has several significant advantages over other available programs: (1) it is commonly used by researchers across disciplines; (2) it provides a robust programming language that extends the capability of the program, in particular the capability for model checking; and (3) it shows increased performance and efficiency through the use of parallel processing.

Our book Bayesian Analysis of Item Response Theory Models using SAS provides step-by-step instructions for using SAS PROC MCMC to analyze various IRT models. Working through the examples in the book or with some prior knowledge of IRT models and Bayesian methods, you can…

Estimate simple as well as complex IRT models using PROC MCMC. It is a straightforward task in PROC MCMC to implement Bayesian estimation of a variety of simple and more complex IRT models. All you need to do is express the response probability function or likelihood for your particular model, declare the model parameters, and specify prior probability distributions for these parameters. PROC MCMC may be particularly useful for applications investigating multidimensionality or heterogeneity in item responses due to, for example, differential item functioning, content-related processes (shared context or word orientation), or response-related processes (solution strategies, response styles, response sets).

Evaluate the estimation of the model. Because the Markov chain Monte Carlo (MCMC) method is a simulation-based approach, you should determine whether the simulated draws have converged to the target posterior distributions for model parameters. PROC MCMC includes a number of tools and statistics for evaluating the convergence of the sampling process in the posterior distributions for model parameters. These include trace (history) and autocorrelation plots as well as various diagnostic tests and statistics: Gelman-Rubin, Geweke, Heidelberger-Welch (stationarity and half-width tests), Raftery-Lewis, and effective sample size.

Compare competing models and evaluate model fit. In many applications, different models may be estimated that reflect competing theoretical perspectives or competing formulations of the item and person characteristics that are modeled. PROC MCMC and the SAS system provide the tools for choosing among competing models. The Posterior Predictive Model Checking (PPMC) method is a commonly used Bayesian model checking tool and has proved useful for evaluating the fit of models. PPMC can be implemented using the robust programming language in the SAS system, and a variety of different plots can also be obtained to display results.
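To make the first of those tasks concrete, here is a minimal sketch of a one-parameter logistic (1PL, Rasch-type) model in PROC MCMC, following the three steps named above: write the likelihood, declare the parameters, and specify priors. It is not an example from the book. The data set RESPONSES, its variables PERSON_ID and X1-X10 (scored 1 = correct, 0 = incorrect, one record per person), the priors, the seed, and the chain lengths are all hypothetical choices.

```sas
/* Hypothetical wide data set: PERSON_ID plus items X1-X10 (0/1) */
proc mcmc data=responses outpost=irt_post seed=2015
          nbi=5000 nmc=20000 plots=(trace autocorr);
   array b[10];              /* item difficulty parameters b1-b10 */
   array x[10] x1-x10;       /* observed item responses           */

   /* Declare the item parameters with starting values of 0 */
   parms b1-b10 0;

   /* Illustrative prior for the item difficulties */
   prior b1-b10 ~ normal(0, var=4);

   /* Person ability: one random effect per PERSON_ID */
   random theta ~ normal(0, var=1) subject=person_id;

   /* Log-likelihood of one person's response vector under the 1PL */
   llike = 0;
   do j = 1 to 10;
      p = logistic(theta - b[j]);
      llike = llike + x[j]*log(p) + (1 - x[j])*log(1 - p);
   end;
   model general(llike);
run;
```

The PLOTS= request produces trace and autocorrelation plots, two of the convergence tools mentioned above.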

In conclusion, PROC MCMC makes estimating and model checking of IRT models in a Bayesian paradigm more accessible to researchers, scale developers, and measurement practitioners.

We hope you find this blog informative and invite you to read a free chapter from the book here.


Raiders of the lost spreadsheet

Have you ever peered intently into an unfamiliar data delivery directory, realized what was in it, rolled over onto your side, stared blankly into the distance, and dejectedly uttered something akin to:

"Spreadsheets! Why did it have to be spreadsheets?"

If so, then we are definitely on the same page. Why does it always have to be spreadsheets?

The answer to that question is actually pretty obvious when you think about it. The popularity of Microsoft Office has made Excel one of the most popular mediums for storing data. It is used extensively in grade schools, middle schools, high schools, and colleges. People with home businesses use it; office administrators use it; clerical staff use it; scientists use it; lawyers use it; hospital workers use it; Federal, state and local government workers use it; and programmers use it too.

An individual who needs to store data in electronic format and then process it may not have SAS, or C++, or Java, or C#, or Python, or PHP, or R, or MATLAB, or ColdFusion, or FOCUS, or FORTRAN, or Groovy, or JavaScript, or MOBY, or MUMPS, or NATURAL, or Perl, or PL/SQL, or PowerShell, or S-PLUS, or Visual Basic installed on his or her PC. But, that person will undoubtedly have Microsoft Office and thus have Excel. That is why it always has to be spreadsheets.

But, processing data stored in spreadsheets is not really a problem for intrepid SAS programmers. When I go on a data exploration expedition where there is a good chance of encountering spreadsheets, I pack the usual: my brown leather jacket, fedora, and bullwhip. But, most importantly, I put SAS/ACCESS Interface to PC Files into my backpack.

SAS/ACCESS Interface to PC Files is a SAS for Windows product that allows you to read, write, and update data in Excel and Access. As such, it is a must-have for your Windows SAS installation.

Here is an example of a program that I use to map out the contents of an unexplored spreadsheet:

ods rtf file="G:\BigProject\Worksheets in NewDataSpreadsheet.rtf";

/* Read-only libref for the workbook (requires SAS/ACCESS Interface to PC Files) */
libname xlslib "G:\NewProject\DeliveryDirectory\NewDataSpreadsheet.xlsx" access=readonly;

/* One row per worksheet/column pair; strip quotes, commas, and $ from MEMNAME */
proc sql;
   create table WorkSheets as
   select distinct compress(memname, "',$") as WorkSheet_Name,
          name as ColumnName
   from dictionary.columns
   where libname = 'XLSLIB';
quit;

proc print noobs data=WorkSheets;
   var WorkSheet_Name ColumnName;
   title1 "Worksheets in NewDataSpreadsheet.xlsx";
run;

ods rtf close;

The ODS statement specifies that my report will be created as an RTF document. Because I have SAS/ACCESS Interface to PC Files, the LIBNAME statement allocates the NewDataSpreadsheet.xlsx spreadsheet much the same way as it would a SAS data library. (Notice that I specified access=readonly so that I do not accidentally update the spreadsheet.) Since I have "LIBNAME-d" the spreadsheet, information about its worksheets and column names is now available in the SAS Dictionary Tables.

I use PROC SQL to extract the name of each worksheet (variable WorkSheet_Name) in the Excel file and the names of the columns (variable ColumnName) within each worksheet, and then plop them into a SAS data set for further exploration. The code snippet compress(MEMNAME,"',$") gets rid of the annoying quotes and dollar signs that are found in spreadsheet MEMNAMEs. Then, I use the PRINT procedure to create a report. A simple, neat, quick, and easily macro-tized piece of code.
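As for macro-tizing it, here is one minimal sketch, not the author's own macro. The macro name MAP_WORKBOOK and its parameters XLSFILE= and RTFFILE= are hypothetical; the body is the program above with the two file names replaced by parameters, plus a LIBNAME ... CLEAR to release the workbook when the report is done.

```sas
%macro map_workbook(xlsfile=, rtffile=);
   ods rtf file="&rtffile";

   /* Read-only libref so the workbook cannot be updated by accident */
   libname xlslib "&xlsfile" access=readonly;

   proc sql;
      create table WorkSheets as
      select distinct compress(memname, "',$") as WorkSheet_Name,
             name as ColumnName
      from dictionary.columns
      where libname = 'XLSLIB';
   quit;

   proc print noobs data=WorkSheets;
      var WorkSheet_Name ColumnName;
      title1 "Worksheets in &xlsfile";
   run;

   ods rtf close;

   /* Release the workbook */
   libname xlslib clear;
%mend map_workbook;

%map_workbook(xlsfile=G:\NewProject\DeliveryDirectory\NewDataSpreadsheet.xlsx,
              rtffile=G:\BigProject\Worksheets in NewDataSpreadsheet.rtf)
```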

Here are several good references that you can use to find out more about processing spreadsheets with SAS:

Armed with those resources, some pluck, a sense of adventure, and your own trusty copy of SAS/ACCESS Interface to PC Files, you too can be a raider of the lost spreadsheet!

Best of luck in all your SAS endeavors!


3 bestselling books at ENAR 2015 Spring Meeting

We had a lot of books at the ENAR 2015 Spring Meeting in Miami last week, but these were the top three bestsellers.

  1. Analysis of Observational Healthcare Data using SAS by Douglas E. Faries, Robert L. Obenchain, Josep Maria Haro, and Andrew C. Leon
  2. Survival Analysis Using SAS®: A Practical Guide, Second Edition by Paul D. Allison
  3. Bayesian Analysis of Item Response Theory Models Using SAS by Clement A. Stone and Xiaowen Zhu

I also met a young girl who’s ready to become our next author. Or maybe she just likes our buttons. Either way, you’re never too young to think about becoming a SAS Press author. If you have any publishing ideas, visit SAS Books to learn more.


If you were at the conference and picked up the card with the ENAR discount code, don’t forget to use it before April 1st.


I Know What You Did Last Summer!

I know what you did last summer.

If it was unintentional, then you probably don't know what I am talking about.  If it was intentional, then you probably thought that I would never find out.  Either way, the damage is done.  The actions that you took on that warm summer evening are as clear to me now as they would have been if I had been watching over your shoulder while you did them.  I know what you did last summer: You updated one of my SAS data sets.

We work on the same project and both have read, write, update, and delete rights to the project's directories.  The production SAS data set that I created for the spring data delivery was inexplicably updated in the summer.  And, you were the one who did it.  Because we have been teammates for a while, I am giving you the benefit of the doubt.  I bet that you made a copy of the production SAS program for a different use, updated it, but forgot to change the LIBREF to point to your test SAS data library.  So when you ran it, you accidentally deleted 400 observations and updated 273 observations in the production data set.

Oh, you want to know how I determined it was you and how I know exactly what changed.

Well, because that production data set is very important, I used PROC DATASETS to create a SAS audit trail file for it.  SAS audit trails record changes to SAS data sets.  They can record the before and after image of observations that were changed, the type of change, the date/time of the change, and the userid of the person who changed the SAS data set.  So, SAS audit trails can be very useful in a shared directory environment where many staff members have access to important SAS data sets.

Here is the code I used to create the audit trail for the production SAS data set:

proc datasets library=prodlib nolist;
        audit SpringDeliveryData;
        initiate;
        log admin_image=yes
              before_image=yes
              data_image=yes
              error_image=yes;
        run;
quit;

When I executed that DATASETS procedure code, SAS created a file named SpringDeliveryData.sas7baud in the same directory as the SAS data set.  When an observation is updated, added, or deleted from SpringDeliveryData, SAS writes an observation to the audit trail data set containing the variables in the original SAS data set and six specific audit trail variables.  Of note are _ATDATETIME_ which specifies the date/time of the change; _ATOPCODE_ which specifies the type of change that took place--e.g. add, delete, modify; and _ATUSERID_ which specifies the userid of the person whose SAS program made the change.

When I noticed that SpringDeliveryData had been modified, I used a PROC PRINT to dump the audit trail file.  That is how I know that the data set was updated at 5:27 PM on August 5th by a program submitted under your userid.
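A minimal sketch of that dump, assuming the PRODLIB libref from the audit code is still assigned. The TYPE=AUDIT data set option reads the audit trail instead of the data set itself. The filter on _ATOPCODE_ is an illustrative choice: DD marks a deleted record, DR a before-update image, and DW an after-update image; DD and DW records are written only when the LOG statement sets DATA_IMAGE=YES.

```sas
/* Read the audit trail rather than the data set itself */
proc print data=prodlib.SpringDeliveryData(type=audit) noobs;
   /* Keep deletions plus before- and after-update images */
   where _atopcode_ in ('DD', 'DR', 'DW');
   var _atdatetime_ _atuserid_ _atopcode_ _atobsno_;
   format _atdatetime_ datetime20.;
run;
```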

You are interested in using SAS audit trails for your own production SAS data sets?  Great!  You can find a comprehensive write-up in the SAS documentation on support.sas.com.

Don't fret about the updates to the SpringDeliveryData SAS data set.  I am going to request that our systems administrator restore the data set to the day before the summer update.  That way, we will have the original data set available in case our client has questions about it.

Good to know that I was right that you accidentally updated the production data set last summer.  Oh, don't go.  Unfortunately we have one more thing to talk about:

I know what you did last fall...

Best of luck in all your SAS endeavors!


SAS Press is heading to ENAR 2015 Spring Meeting

Are you heading to the ENAR 2015 Spring Meeting in Miami this week? SAS author and Program Chair Mithat Gönen, of Memorial Sloan-Kettering Cancer Center, and Associate Chair Brisa Sánchez, of the University of Michigan School of Public Health, have created an outstanding scientific program this year. The sessions cover a wide range of topics, such as data science (big data), genomics, clinical trials, neuroimaging, biomarkers, health policy, electronic health records, ecology, and epidemiology.

After whetting your appetite at some of these great sessions, come and browse the SAS Press booth and find informative, up-to-date titles to further your knowledge, such as the SAS classics Analyzing Receiver Operating Characteristic Curves with SAS by Gönen and Analysis of Clinical Trials Using SAS by Dmitrienko et al.; and preview new titles such as Bayesian Analysis of Item Response Theory Models Using SAS by Clement Stone and Xiaowen Zhu, and Time Series Modeling Using the SAS VARMAX Procedure by Anders Milhoj.

While we do have some great titles, I know we haven’t covered everything. Please stop by and have a quick chat with me. I am happy to discuss what we do have available, but perhaps there is a topic you would like to see covered that we don’t have yet? Perhaps you have a topic you would love to write about?
