DATA STEP text file tricks

3

Files_FilesWhen reading a text file (common extensions: TXT, DAT; or, for the adventurous: HTML) with the DATA STEP, you should always view several lines from the text file, and compare to the record layout, before completing the INPUT statement.  There are many ways to view a text file.  I use the DATA STEP as a file viewer, which is convenient, since I use the DATA STEP to read the text file anyway.

Advantages of using the DATA STEP as a text file viewer:

  1. Platform independence: Windows, Unix, or Z/os
  2. Hex code for unprintable characters
  3. Column ruler

It's super easy!  Check out this SAS log:

95   data _null_;
96      infile 'c:\myfile1.txt' obs=10;
97      input;
98      list;
99   run;
 
NOTE: The infile 'c:\myfile1.txt' is:
      Filename=c:\myfile1.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=19,
      Last Modified=27May2015:14:30:25,
      Create Time=27May2015:11:18:52
 
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
1         1,a 3
2         234,bcd 7
3         56,ef 5
NOTE: 3 records were read from the infile 'c:\myfile1.txt'.
      The minimum record length was 3.
      The maximum record length was 7.

The above file has two fields, comma delimited.   There are only 3 records in this silly file, but the OBS=10 INFILE statement option gives me a quick peek at any size file.   The null DATA statement does not create a SAS dataset.  The null INPUT statement does not create variables.  The LIST statement copies each record into the log.  The numbers 3, 7, and 5 following each record represent the record lengths.

Now let's try a similar, but different, file.  Check out the new SAS log:

100  data _null_;
101     infile 'c:\myfile2.txt' obs=10;
102     input;
103     list;
104  run;
 
NOTE: The infile 'c:\myfile2.txt' is:
      Filename=c:\myfile2.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=19,
      Last Modified=27May2015:14:31:03,
      Create Time=27May2015:11:15:55
 
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
 
1   CHAR  1.a 3
    ZONE  306
    NUMR  191
 
2   CHAR  234.bcd 7
    ZONE  3330666
    NUMR  2349234
 
3   CHAR  56.ef 5
    ZONE  33066
    NUMR  56956
NOTE: 3 records were read from the infile 'c:\myfile2.txt'.
      The minimum record length was 3.
      The maximum record length was 7.

Each record from the text file is now represented by 3 lines in the log: CHAR, ZONE, NUMR.  CHAR represents the actual characters, identical to the prior example, except for the delimiter.  ZONE and NUMR represent the two halves of the ASCII hexadecimal code.  Why did hex codes suddenly appear?  And why should I care about hex codes?  Before answering these questions, notice the different delimiter.  It's a period, instead of a comma, right?  Wrong!  This is a tab-delimited file.  The period is a placeholder for a tab character, which is unprintable.  How can I tell?  Notice ZONE and NUMR are 0 and 9.

Google "ascii table" or visit this page and you will find something like the partial table below.

ASCII Code

Whenever an unprintable character appears in a file, the LIST statement automatically displays hex codes, as shown in the partial ASCII table above.

How do you read a tab-delimted file?  After viewing the file, make these five changes:

  1. Change the DATA statement to specify a dataset name.
  2. Add DLM='09'x to the INFILE statement.
  3. Add variables to the INPUT statement.
  4. Remove the LIST statement.
  5. After testing with the above changes, remove OBS=10 from the INFILE statement.

Your final DATA STEP will look something like this:

data mySASfile2;
   infile 'c:\myfile2.txt' dlm='09'x;
   input Number Text: $3.;
run;

Watch the SAS Training Post blog for more tips and tricks!

Share

About Author

Jim Simon

Principal Technical Training Consultant

Jim Simon is a principal instructor and course developer for SAS Education. Jim has a bachelor's degree from UCLA and a master's degree from California State University at Northridge. Prior to joining the SAS Irvine office in 1988, Jim was an instructor at Ventura College and a SAS programmer at The Medstat Group in Santa Barbara. Jim's areas of specialization include the DATA step, application development, web enablement, and the SAS macro language. A native of Southern California, Jim enjoys anything in the warm California sun. On weekends, Jim loves jumping in his Corvette, turning up the stereo, and cruising Pacific Coast Highway, top down, South to Laguna Beach or North to his old home town, Santa Barbara.

3 Comments

    • Jim Simon

      Thanks, Michelle, for the helpful link. Additional details: The PUT statement writes from the PDV, while the LIST statement writes from the input buffer, placing a snapshot of the input buffer into the log. The LIST statement therefore writes pure text, before it is brought into the PDV, while the PUT statement writes data following interpretation by the informat and placement into the PDV. Both are useful!

      • Thanks for the differentiation on the PUT and LIST statement. I will certainly keep that in mind! I have used the hex method to "view" unprintable characters from mainframe flat files moved to my PC. In fact, I typically move my mainframe data using the binary FTP option and use the SAS PC capabilities to read EBCDIC. Or read on the mainframe and use Proc Export and Import.

Leave A Reply

Back to Top