SAS administrators tip: Automatically deleting old SAS logs


Attention SAS administrators! SAS batch jobs, whether run on schedule or manually, usually produce date-stamped SAS logs, which are essential for automated system maintenance and troubleshooting. Similar log files are created by various SAS infrastructure services (Metadata server, mid-tier servers, etc.). However, as time goes on, the relevance of such logs diminishes while the clutter piles up. In some cases, this may even lead to disk space problems.

I also recommend: SAS administrators tip: Keeping track of SAS users

There are multiple ways to solve this problem, either by deleting older log files or by stashing them away for auditing purposes (zipping and archiving). One solution would be using Unix/Linux or Windows scripts run on schedule. The other is much "SAS-sier."

Let SAS clean up its "mess"

We are going to write SAS code that you can run manually or on schedule and that deletes, for a specified directory (folder), all .log files older than 30 days.
First, we need to capture the contents of that directory, then select the file names with the extension .log, and finally subset that selection to the files whose Date Modified is no later than today's date minus 30 days.

Perhaps the easiest way to get the contents of a directory is by using the X statement, submitting the DOS DIR command from within SAS with its output redirected (>) to a file, e.g.

x 'dir > dirlist.txt';

or using pipe option in the filename statement:

filename DIRLIST pipe 'dir "C:\Documents and Settings"';

However, SAS administrators know that in many organizations, due to cyber-security concerns, IT department policies do not allow the X statement and disable it by setting the SAS XCMD system option to NOXCMD (see the XCMD system option documentation for Unix). This is usually done system-wide for the whole SAS Enterprise client-server installation via SAS configuration. In this case, no operating system command can be executed from within SAS, and that includes the FILENAME statement with the PIPE option. Try running any X statement in your environment; if it is disabled, you will get the following ERROR in the SAS log:

ERROR: Shell escape is not valid in this SAS session.
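
If you would rather check the setting without provoking an error, you can query the option directly. This is a minimal sketch using the GETOPTION function; it should write either XCMD or NOXCMD to the log:

```sas
/* Report whether OS commands are allowed in this SAS session. */
/* GETOPTION returns the current option setting as text.       */
%put NOTE: XCMD setting is %sysfunc(getoption(XCMD));
```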

To avoid that potential roadblock, we’ll use a different technique for capturing the contents of a directory along with file date stamps. The coding technique described below does not rely on the XCMD system option and is therefore preferable.

Macro to delete old log files in a directory/folder

The following SAS macro cleans up a Unix directory or a Windows folder by removing old .log files. I must admit that this statement is a little misleading: the macro is much more powerful. Not only can it delete old .log files, it can remove files of ANY type specified by extension.

%macro mr_clean(dirpath=,dayskeep=30,ext=.log);
   data _null_;
      length memname $256;
      deldate = today() - &dayskeep;
      rc = filename('indir',"&dirpath");
      did = dopen('indir');
      if did then
      do i=1 to dnum(did);
         memname = dread(did,i);
         if reverse(trim(memname)) ^=: reverse("&ext") then continue;
         rc = filename('inmem',"&dirpath/"!!memname);
         fid = fopen('inmem');
         if fid then 
         do;
            moddate = input(finfo(fid,'Last Modified'),date9.); /* see WARNING below */
            rc = fclose(fid);
            if . < moddate <= deldate then rc = fdelete('inmem');
         end;
      end; 
      rc = dclose(did);
      rc = filename('inmem');
      rc = filename('indir');
   run;
%mend mr_clean;

This macro has 3 parameters:

  • dirpath - directory path (required);
  • dayskeep - days to keep (optional, default 30);
  • ext - file extension (optional, default .log).

This macro works in both Windows and Linux/Unix environments. Please note that dirpath and ext parameter values are case-sensitive.
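
Since deletion cannot be undone, it may be worth previewing what would be removed before running the real thing. Below is a sketch of a dry-run variant, not part of the original macro: the name %mr_preview is hypothetical, and it assumes the same three parameters. It follows the same logic but only writes the candidates to the SAS log.

```sas
/* Hypothetical dry-run helper: lists deletion candidates */
/* in the SAS log and deletes nothing.                    */
%macro mr_preview(dirpath=,dayskeep=30,ext=.log);
   data _null_;
      length memname $256 modstr $64;
      deldate = today() - &dayskeep;
      rc = filename('indir',"&dirpath");
      did = dopen('indir');
      if did then
      do i=1 to dnum(did);
         memname = dread(did,i);
         /* skip members that do not end with the extension */
         if reverse(trim(memname)) ^=: reverse("&ext") then continue;
         rc = filename('inmem',"&dirpath/"!!memname);
         fid = fopen('inmem');
         if fid then
         do;
            modstr = finfo(fid,'Last Modified');
            /* ?? suppresses log notes when the string is not DATE9-compatible */
            moddate = input(modstr,?? date9.);
            rc = fclose(fid);
            if moddate = . then put 'Unparsed date: ' memname= modstr=;
            else if moddate <= deldate then put 'Would delete: ' memname= moddate= date9.;
         end;
      end;
      if did then rc = dclose(did);
      rc = filename('inmem');
      rc = filename('indir');
   run;
%mend mr_preview;
```

Call it with the same arguments you plan to pass to %mr_clean. Any "Unparsed date" lines in the log suggest that the DATE9 informat does not fit your environment.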

WARNING: In most cases, finfo(fid,'Last Modified') returns a date/time string in the DDMMMYYYY:HH:MM:SS format, as described in Usage Note 40934. In these cases, applying the DATE9 informat produces a valid SAS date. In some other cases, the ANYDTDTE informat may be appropriate (ANYDTDTM reads similar strings but returns a datetime value, which would need DATEPART() before it can be compared with deldate). However, as reported by reader Rajeev Meena, there are OS installations that return date/time strings in some odd formats, such as "Mon Jun 21 11:11:00 2018". I am not aware of any SAS informat that can directly convert such a string into a SAS date. In such cases, some string parsing might be in order to get a convertible string (see Rajeev's suggested solution in his comment below). I suggest checking the date/time string format returned on your operating system by running the following code:

data _null_;
   rc = filename('infile','full_path_name_of_any_existing_file');
   fid = fopen('infile');
   s = finfo(fid,'Last Modified');
   rc = fclose(fid);
   put s=;
run;

By looking at the s value in the SAS log, you can decide which informat to use (DATE9 or one of the ANYDT* informats) or whether you need to rearrange the date/time string before converting it to a SAS date.
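
To illustrate the rearranging option, here is a small sketch that tries DATE9 first and falls back to rebuilding a DDMMMYYYY string from the individual words. The two sample strings are just the formats mentioned above; the word positions passed to SCAN are an assumption that fits the "Mon Jun 21 11:11:00 2018" layout and would need adjusting for other layouts.

```sas
data _null_;
   length s $64 tmp $9;
   do s = '21Jun2018:11:11:00', 'Mon Jun 21 11:11:00 2018';
      /* Try the common DDMMMYYYY:HH:MM:SS layout first */
      moddate = input(s, ?? date9.);
      if moddate = . then
      do;
         /* Fallback: rebuild DDMMMYYYY from the day, month and year words */
         tmp = cats(scan(s,3,' '), scan(s,2,' '), scan(s,5,' '));
         moddate = input(tmp, ?? date9.);
      end;
      put s= moddate= date9.;
   end;
run;
```

Both strings should come out as the same date in the log. The ?? modifier keeps the log clean when the first attempt fails.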

Macro invocation

Here are examples of the macro invocation:

1. Using defaults

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean)

With this macro call, all files with extension .log (default) which are older than 30 days (default) will be deleted from the specified directory.

2. Using default extension

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean,dayskeep=20)

With this macro call, all files with extension .log (default) which are older than 20 days will be deleted from the specified directory.

3. Using explicit parameters

%let dir_to_clean = C:\PROJECTS\Automatically deleting old SAS logs\Logs;
%mr_clean(dirpath=&dir_to_clean,dayskeep=10,ext=.xls)

With this macro call, all files with extension .xls (Excel files) which are older than 10 days will be deleted from the specified directory.

Old file deletion SAS macro code explanation

The above SAS macro logic and actions are done within a single data _NULL_ step. First, we calculate the date from which file deletion starts (going back): deldate = today() - &dayskeep. Then we assign fileref indir to the specified directory &dirpath:

rc = filename('indir',"&dirpath");

Then we open that directory:

did = dopen('indir');

and if it opened successfully (did>0) we loop through its members which can be either files or directories:

do i=1 to dnum(did);

In that loop, first we grab the directory member name:

memname = dread(did,i);

and look for our candidates for deletion, i.e., determine if that name (memname) ends with "&ext". In order to do that we reverse both character strings and compare their first characters. If they don’t match (^=: operator) then we are not going to touch that member - the continue statement skips to the end of the loop. If they do match it means that the member name does end with "&ext" and it’s a candidate for deletion. We assign fileref inmem to that member:

rc = filename('inmem',"&dirpath/"!!memname);

Note that forward slash (/) Unix/Linux path separator in the above statement is also a valid path separator in Windows. Windows will convert it to back slash (\) for display purposes, but it interprets forward slash as a valid path separator along with back slash.
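
To see how the reversed-string comparison behaves, here is a small standalone sketch. The sample member names are made up for illustration, and it uses =: (the "equals" counterpart of ^=:), which compares only as many leading characters as the shorter operand has.

```sas
data _null_;
   length memname $256;
   ext = '.log';
   do memname = 'job_20180621.log', 'job.log.bak', 'catalog';
      /* Reversing both strings turns an "ends with" test */
      /* into a "starts with" test                        */
      is_match = (reverse(trim(memname)) =: reverse(trim(ext)));
      put memname= is_match=;
   end;
run;
```

Only the first name yields is_match=1. Note that 'catalog' is rejected even though it ends with "log", because the dot is part of the extension being compared.
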
Then we open that file using fopen function:

fid = fopen('inmem');

If inmem is a directory, the opening will fail (fid=0) and we will skip the following do-group that is responsible for the file deletion. If it is a file and it opens successfully (fid>0), then we go through the deletion do-group, where we first grab the file's Last Modified date as moddate, close the file, and, if moddate is non-missing and moddate <= deldate, delete that file:

rc = fdelete('inmem');

Then we close the directory and un-assign filerefs for the members and directory itself.
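
The macro does not inspect the return code of fdelete(), so a file that cannot be removed (for example, because it is locked or permissions are insufficient) is skipped silently. If you want such cases reported, one option is to check the return code and print the message returned by sysmsg(). A minimal sketch, with a placeholder path that you would replace with a file you really intend to delete:

```sas
data _null_;
   length msg $200;
   /* Placeholder path: replace with a file you intend to delete */
   rc = filename('inmem','full_path_name_of_a_file_to_delete');
   rc = fdelete('inmem');
   if rc ne 0 then
   do;
      /* SYSMSG returns the text of the last file-function error */
      msg = sysmsg();
      put 'Delete failed: ' rc= msg=;
   end;
   rc = filename('inmem');
run;
```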

Deleting old files across multiple directories/folders

Macro %mr_clean is flexible enough to address various SAS administrators' needs. You can use this macro to delete old files of various types across multiple directories/folders. First, let’s create a driver table as follows:

data delete_instructions;
   length days 8 extn $9 path $256;
   infile datalines truncover;
   input days 1-2 extn $ 4-12 path $ 14-270;
   datalines;
30 .log      C:\PROJECTS\Automatically deleting old files\Logs1
20 .log      C:\PROJECTS\Automatically deleting old files\Logs2
25 .txt      C:\PROJECTS\Automatically deleting old files\Texts
35 .xls      C:\PROJECTS\Automatically deleting old files\Excel
30 .sas7bdat C:\PROJECTS\Automatically deleting old files\SAS_Backups
;

This driver table specifies how many days to keep files of certain extensions in each directory. In this example, perhaps the most beneficial deletion applies to the SAS_Backups folder since it contains SAS data tables (extension .sas7bdat). Data files typically have much larger size than SAS log files, and therefore their deletion frees up much more of the valuable disk space.

Then we can use this driver table to loop through its observations and dynamically build macro invocations using CALL EXECUTE:

data _null_;
   set delete_instructions;
   s = cats('%nrstr(%mr_clean(dirpath=',path,',dayskeep=',days,',ext=',extn,'))');
   call execute(s);
run;

Alternatively, we can use DOSUBL() function to dynamically execute our macro at every iteration of the driver table:

data _null_;
   set delete_instructions;
   s = cats('%mr_clean(dirpath=',path,',dayskeep=',days,',ext=',extn,')');
   rc = dosubl(s);
run;
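
A typo in a driver-table path would make the macro quietly do nothing for that row, so it may be worth checking the paths first. Here is a sketch using the FILEEXIST function, which also works for directories. It assumes the delete_instructions table and the %mr_clean macro from above; the data set name delete_checked and the variable path_ok are hypothetical.

```sas
/* Flag driver-table rows whose path cannot be found */
data delete_checked;
   set delete_instructions;
   path_ok = fileexist(strip(path));
   if not path_ok then put 'WARNING: path not found, skipping: ' path=;
run;

/* Run the cleanup only for the rows that passed the check */
data _null_;
   set delete_checked;
   where path_ok;
   s = cats('%nrstr(%mr_clean(dirpath=',path,',dayskeep=',days,',ext=',extn,'))');
   call execute(s);
run;
```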

Put it on autopilot

When it comes to cleaning your old files (logs, backups, etc.), the best practice for SAS administrators is to schedule your cleaning job to automatically run on a regular basis. Then you can forget about this chore around your "SAS house" as %mr_clean macro will do it quietly for you without the noise and fuss of a Roomba.

Your turn, SAS administrators

Would you use this approach in your SAS environment? Any suggestions for improvement? How do you deal with old log files? Other old files? Please share below.

 


About Author

Leonid Batkhan

Leonid Batkhan, Ph.D. in Computer Science and Automatic Control Systems, has been a SAS user for more than 25 years. He came to work for SAS in 1995 and is currently a Senior Consultant with the SAS Federal Data Management and Business Intelligence Practice. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries.

14 Comments

  1. Andrew Howell

    Leonid, great article, especially using SAS functions to bypass "X" commands.

    I do find it a tad ironic to use a SAS program (which itself will generate logs) to delete SAS logs.

    I've previously run monthly batch jobs which execute O/S script in each log folder, namely to archive the previous month's logs into a zip file, and to erase any zip file older than 6 months. (I'll try to dig up the script & post it.)

    • Leonid Batkhan

      Andrew, thank you for your comment. I see what you are trying to say "I do find it a tad ironic to use a SAS program (which itself will generate logs) to delete SAS logs". I would call it "self-sustaining system". It's about time to behave as a responsible adult and clean up your own mess 🙂 .

      Looking forward to seeing your example of zipping the logs...

  2. Rajeev Meena

    Hi Leonid,
    Really good article.

    I would like to add something on this.

    moddate = input(finfo(fid,'Last Modified'),date9.);

    finfo(fid,'Last Modified') would provide the "Last Modified" timestamp, but as this will be in different formats in different operating systems, we might need to tweak it per OS.

    For instance, in AIX, finfo(fid,'Last Modified') would return "Mon Jun 21 11:11:00 2018" (Day of Week, Month, Day of Month, Time, Year) and as a result, input("Mon Jun 21 11:11:00 2018",date9.) won't be able to convert it to a numeric date value.

    I have used scan and cats to alleviate this issue as follows:

    replaced
    moddate = input(finfo(fid,'Last Modified'),date9.);

    with
    tmp1 = finfo(fid,'Last Modified');
    tmp2=cats(scan(tmp1,3),scan(tmp1,2),scan(tmp1,5));
    moddate = input(tmp2,date9.);

    The scan parameters would need to change as per the target OS.

    • Leonid Batkhan

      Hi Rajeev,
      Thank you for bringing it to my attention.
      When I wrote this macro I was under the impression that finfo(fid,'Last Modified') always returns the date in the DDMMMYYYY:HH:MM:SS format as described in Usage Note 40934. This was true for all the environments where I tested that macro, and applying the date9. informat produced correct date values.
      As your example shows, in different OS setups the finfo() function might return datetime strings in some odd formats, and that would require proper parsing.
      I can add that applying the ANYDTDTE. informat would be a more robust choice than DATE., as it extracts date values from a greater variety of datetime strings than the DATE. informat. For example, it will produce the same correct date values when ANYDTDTE. (default length = 9) is applied to '21Jun2018:11:11:00' or ANYDTDTE32. is applied to 'Jun 21, 2018' strings. Still, it does not cover your case 🙁 .

  3. Craig Malone

    Hi,

    If you're running this on multiple different servers (mid-tier, metadata etc), what do you need to change? Is it just enter the server host names and port into the directory path, or is it based on where the code runs?

    Ideally, I'd be able to delete log files from different servers within the one script using that driver table

    • Leonid Batkhan

      Hi Craig, thank you for your great question.
      Yes, the idea was to run this code on a SAS server (machine that has SAS on it) and access all other servers on that network. You don't have to change macro code, you would just need to provide valid paths to your PATH variable in the driver table. See my reply below to your followup question.

  4. Craig Malone

    Hi,

    Where do you differentiate between servers, e.g. mid-tier, metadata etc., in the code? Is it just entering the hostname into the directory path, or should you be entering a different parameter for it? Ideally, I'd like to be able to delete log files in multiple folders off multiple servers using the driver table, and just using the one script.

    • Leonid Batkhan

      Hi Craig,
      You can use a single driver table to clean all your folders/directories on your network.
      In Windows environment, you can map a drive to a remote folder. Then in the driver table you would specify the path variable value as
      T:\\subfolder1\subfolder2\...
      For example, if you map your C:\SAS folder on a mid-tier server to S: drive on a machine running your cleaning code then your path variable value may look like this:
      S:\\Config\Lev1\Web\WebAppServer\SASServer1_1\logs
      Similarly, you can make a shared mount point in Unix/Linux environment using NFS (Network File System), something like this:
      mount -t nfs remote.host.com:/remote/shared/directory /my/local/mountpoint
      Then in the driver table you would specify the path variable value as:
      /my/local/mountpoint
      Hope this helps.

  5. jeremy ovaere

    Very interesting post about doing the job without X-command. I have a preference on not using SAS code if it's an IT job. If you're administrating and managing your server this is great but if you have a team of sysadmins monitoring your server be kind with them and write dos or powershell commands so they see what's happening and can intervene.
    robocopy is a good tool to move and delete your files.

    compressing on a windows server with powershell (source is google):
    $AllLogs = Get-Childitem -Path D:\Logs\Notification\Temp -Recurse -Filter *.log
    $LastWeek = (Get-Date).AddDays(-7)
    $OldLogs = $AllLogs | Where-Object { $_.LastWriteTime -lt $LastWeek }
    Foreach ($Log in $OldLogs) {
    $Destination = ($Log.Directory.ToString() + "\" + $Log.BaseName)
    Compress-Archive -Path $Log.FullName -DestinationPath $Destination
    }

    • Leonid Batkhan

      Thank you, Jeremy, for your constructive feedback. While I agree that IT system administrators like taking care of their servers themselves, in many cases they would prefer not to deal with SAS applications logs, outputs, or other operational files. That is where good applications development practices step in. Wouldn't it be nice to clean up your own mess after yourself! I think both ways have their justification, place and right to co-exist.
      And, by the way, ironically, you can use SAS to automate scripts writing, those same OS scripts utilized by IT people 🙂
