SAS users collaborating, improving SAS®

As many of you know, I'm also the Editor of the SAS Tech Report. For those who don't know what that is, the SAS Tech Report is a free newsletter published once a month and sent straight to your email inbox. I gather SAS papers, tips and practical information about SAS software and pack it all into the newsletter. I also hunt for places where SAS users are gathering to talk about SAS or answer SAS questions. You can subscribe online. Today, I published the first edition of the newsletter for 2012.

In every newsletter, I begin with an Editor's note containing information that I consider special. In today's newsletter, I included emails that I'd received in response to the December SAS Tech Report. These emails included ideas and tips from SAS users about ways to improve SAS software. I've already gotten another email since the January newsletter mailed (just today!), so I decided to post my Editor's note here so that we can have a broader conversation.

************************************************************************************************************************************************

Dear Readers,

I get a lot of emails from SAS Tech Report readers, and in December, I received a couple that I want to share with you because they add useful information and may make your work easier.

The first email was from Mike Zdeb, from the University at Albany School of Public Health, Rensselaer, New York. He noted that the method used in a SAS Sample in the Tips & Techniques section (Counting the Number of Missing and Non-Missing Values for Each Variable in a Data Set) "seems overly complicated."

Zdeb wrote a paper for NESUG 2011 where he conquered this problem. He believes that his code is a lot easier to follow. He says there are "none of those multiple ampersands; there are seven occurrences in the SAS support code. And the output is pretty nice too – the SAS code in the paper uses PROC FREQ just once and gets all of the counts in one pass." [The SAS support code makes multiple passes through PROC FREQ, once per variable.]
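For readers curious about the general idea, here is a rough sketch of the one-pass technique (this is an illustration, not the exact code from Zdeb's paper; the data set name is a stand-in): user-defined formats collapse every value into "Missing" or "Not Missing," and a single PROC FREQ call with TABLES _ALL_ then reports the counts for every variable in one pass.

```sas
/* Formats that collapse every value into Missing / Not Missing */
proc format;
   value $missfmt ' '      = 'Missing' other = 'Not Missing';  /* character */
   value  missfmt ._ - .z  = 'Missing' other = 'Not Missing';  /* numeric,
                                          including special missing values */
run;

/* One PROC FREQ call, one pass through the data: counts for    */
/* every variable. Replace sashelp.heart with your own data set. */
proc freq data=sashelp.heart;
   format _character_ $missfmt. _numeric_ missfmt.;
   tables _all_ / missing nocum nopercent;
run;
```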

The second email was also very helpful. It came from Mike O'Neil, Manager of the Data Warehouse for the Ministry of Social Development, Wellington, New Zealand. O'Neil wrote regarding Andrea Wainwright-Zimmerman's SAS Global Forum paper, While You Were Sleeping, SAS® Was Hard at Work.

“There are a number of techniques presented in this paper [that] are perfectly appropriate for a small organisation using SAS or a single SAS user managing their own workflow. For a large IT installation where there are dependencies on completion of non-SAS processing or other events such as file arrival, and where there is a considerable stream of SAS processing that has its own set of dependencies, the techniques described do not scale.

The ability to re-start processing part way through a processing stream of many tasks, or place some job streams on hold because of a failure elsewhere, is an important part of production control.

We are a large Unix-based IT shop where more than 60 percent of the entire IO on the disk arrays is generated by data warehouse processing or user processes accessing the warehouse data, and we use Control-M to schedule more than 500 processing tasks, most of which run daily. Our biggest headache is user-processing scheduled by them using the Unix ‘at’ command. We encourage users to use Control-M as soon as they know the job will run regularly.

The techniques described in this paper can be improved to make them more 'fit for IT.' SAS users and Administrators at sites need to be aware that these techniques do not scale."


Thank you, Mike and Mike; this is great information. If you have information like this, please send it to me to share with our SAS colleagues.

************************************************************************************************************************************************

Perhaps I'll start a series so that we can talk about the emails that you send me in response to the newsletter. After all, there can be more than one way to do something. Thank you to Mike and Mike for sending in their comments. How are you handling the situations described above? Please add your approach in the comment section.


These code examples are provided as is, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.

tags: proc, sas tech report, tips & techniques

9 Comments

  1. Steve Sanders
    Posted January 18, 2012 at 6:16 pm | Permalink

    I agree with Mr. O'Neil's comments, though I felt that the automated processes described in that particular paper were a great starting point.

    Over the last several years, I have developed a task scheduler that runs in batch SAS, using the SAS DATA step and SAS Macro languages. It addresses the data dependency issues, but in what I consider to be a rather unusual way.

    Instead of processing jobs sequentially as most schedulers do, my system processes jobs based on data dependencies and the availability of the data that the specific job is dependent upon. It monitors available data and then selects jobs from the job list that can be run with the data that is currently available. The sequence of execution may vary significantly from one day to the next, but it automatically seeks its own optimum run order by sorting the job list in the order of execution the day before. If the processes that create the source data are consistently executed by a chronological scheduler, my scheduler will quickly determine the optimal order to execute its programs in.
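    A rough sketch of what such a data-availability check could look like (hypothetical data set and variable names; a simplified illustration, not the actual system, which also handles the re-sorting and retry logic described above):

    ```sas
    /* Simplified illustration of dependency-driven dispatch.      */
    /* JOBLIST has one row per job: program (path to a .sas file)  */
    /* and depfile (the input file the job depends on). Jobs whose */
    /* dependency exists are launched; the rest wait for the next  */
    /* polling cycle. Requires the XCMD system option on Unix.     */
    data launched waiting;
       set joblist;
       if fileexist(depfile) then do;
          call system(cats('sas ', program, ' &'));  /* run in background */
          output launched;
       end;
       else output waiting;    /* dependency not ready yet */
    run;
    ```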

    I've talked about this process with some others at SAS, and I've considered writing a paper about it to present at SGF or SESUG. The problem is that it is extremely difficult to explain.

    Anyway, I thought I would add my two cents.

    • Waynette Tubbs
      Posted January 18, 2012 at 6:32 pm | Permalink

      Hey Steve,
      Thanks for your suggestion. The idea of writing a paper is a good one. The idea seems daunting, but you may be surprised that the act of putting it all down on paper helps you wrestle with the explaining part. That's where you can move around the words and chapters to help it all make sense. Let me know if you write that paper; I'd love to meet you in person!

    • Andrea Wainwright-Zimmerman
      Posted February 9, 2012 at 1:38 pm | Permalink

      I'd love to see this paper! Peter has added an advanced section for 'black belt' programmers at SESUG 12 that this might be a great fit for!
      The original paper was just me dealing with the monotonous task of running the same PC SAS job every day. I only had Windows Scheduler to help me, so I did the best I could and figured there were others out there in the same boat who might benefit from what I had to figure out on my own. But I'd love to learn more and take it to a new level!

      • Waynette Tubbs
        Posted February 9, 2012 at 2:35 pm | Permalink

        Thank you Andrea. As you can tell from the popularity of your paper in the SAS Tech Report, there are many SAS users who are in that same boat and find your method very useful. There are also others whose companies are growing into that next level. Like you, they are eager for a 'next step' paper. Thanks for letting us know about Peter Eberhardt's decision to include a new, advanced section. I will be reporting from SESUG 2012 this year (the conference will be held here in NC), so I'm going to be checking out some of those sessions in the 'black belt' section. I'm sure they will be very popular with the SAS Tech Report audience.

  2. Rex Pruitt
    Posted January 19, 2012 at 4:40 pm | Permalink

    FYI...I forgot I could post it here! LOL!

    >>> On 1/19/2012 at 8:42 AM, Rex Pruitt wrote:
    Awesome Article Waynette!!!

    Specifically...I've downloaded the SAS_Log_Checker and will be sharing this with our entire user base. Your articles are always "Value Add" in my world!!!

    Appreciatively,
    Rex

  3. Kevin Elliott
    Posted January 20, 2012 at 9:30 am | Permalink

    Hi Waynette

    I can only second the comments published from Mike O'Neil. I read the original article and my thoughts were probably very similar to his. Kudos to him for commenting so constructively. Like him, we run many SAS jobs every day in a large automated Unix environment. Much of our design for production was based on mainframe experience, so when moving to Unix, we kept the good side of mainframe production and dropped the clutter that had built up over the years.

    A couple of additions to Mike's suggestions that will help as smaller shops become bigger:

    -Cron jobs should also be banned and only a recognised scheduling package used.

    -All jobs should be designed to be re-runnable without special recovery action being taken (e.g. no file restores, flags to set, parameter files to edit, initialisation jobs to run), so that if there's a failure, you fix the problem and restart the job at the point of failure. This can take a big change of mind-set and some work, especially if people are used to appending to historical data files, or using GDGs.

    -Separate environments must be set up to allow job testing under the scheduling package in a pre-production environment (e.g. I'd see a minimum of three: ad-hoc/development, pre-production and production, with a fourth for testing new releases as appropriate).

    -For maintenance, the submission of jobs must be controllable at a single point, e.g. for Control-M, all SAS jobs should be dependent on a single control resource as well as any other resources that good scheduling requires.

    -Intelligent use can/should be made of symbolic links to simplify maintenance and testing.

    -GDGs should not be used. Substitute date based files instead and manage with simple scripts.

    -Parameters should be passed at run time by the scheduler so that no code changes are necessary between environments. This includes run dates, enabling a previous day to be re-run when needed.

    -SAS sysout should be written to a standard place, organised by job/suite/time and managed by scripts.

    A lot will depend on the individual site philosophy, but general principles outlined in Mike's comments and those above will help a lot.
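    As a small illustration of the run-time-parameter and date-based-file points above (paths and file names are hypothetical): the scheduler passes the run date on the command line via -sysparm, and the job defaults to today's date when none is supplied, so the same code runs unchanged in every environment and a previous day can be re-run on demand.

    ```sas
    /* Scheduler invokes:  sas daily_load.sas -sysparm 20120118       */
    /* &SYSPARM carries the run date; default to today when empty.    */
    %let rundate = %sysfunc(ifc(%length(&sysparm),
                                &sysparm,
                                %sysfunc(today(), yymmddn8.)));

    /* Date-based file in place of a GDG (hypothetical path) */
    filename extract "/data/warehouse/extracts/sales_&rundate..txt";
    ```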

  4. Posted February 6, 2012 at 11:54 am | Permalink

    I really appreciate the communications from Mike O'Neil, Kevin Elliott, Andrea Wainwright-Zimmerman, and Steve Sanders about automation and scheduling of SAS jobs. Technical papers describing how to automate SAS jobs, using task schedulers and scripts, as in bash or other Linux/Unix/Mac OS X scripting languages, could greatly benefit the SAS community. Too many people are unaware of the power of Unix! Mike O'Neil and/or Kevin Elliott, please write a SAS User Group conference paper.

  5. Sammi Khan
    Posted February 15, 2012 at 5:19 pm | Permalink

    Referring to Andrea's comment about the amount of work involved in putting together a paper: I just spent quite a bit of time writing a paper, and it was approved. However, I was told that my paper will not be published unless I attend the conference. Other conferences allow the paper to be published even if one isn't able to present live.
    Sammi Khan
