Tricks for importing text files in SAS Enterprise Guide

29

I'm a big fan of the Import Data task in SAS Enterprise Guide, especially for its support of text-based files (CSV, tab delimited, fixed width, and more). There's no faster method for generating SAS code that reads your data exactly the way you need it. I use the tool so often that I take for granted some of its neatest features, and I forget that many new users (and even veteran users) might not know about them. In this article, I'll review a few of the cool things that this task can do for you.

(Update 23Mar2020) I've recorded a video that shows how to use the Import Data task as a start, and then adapt the code that it generates for reuse in other environments, with other files, and to read multiple text files in a single step. Watch the video to learn more, or scroll down to read my step-by-step tips.

Read fixed-width text files into SAS

We think of CSV files (and...alas...Excel files) as the main standard for data exchange among systems, but many legacy systems still produce and consume fixed-width text data formats. The SAS DATA step is a perfect tool for reading these files, but defining the columns and their properties can be tedious. The "Fixed columns" option on the Import Data task can make this job simple.

Suppose that you're beginning with a spec like this:

And a raw data file like this:

You can use the Import Data wizard to define the boundaries of your columns by adding boundary lines with just click-and-drag operations. Beginning with the File->Import Data task, select your source text file and advance to the second page of the wizard. When you select "Fixed columns" as the input text format, you'll see a layout ruler that looks like this:

Click at the column boundaries (referring to your original spec!) and drag the rule lines as needed to define those column boundaries. Then click Next, and fill out details for the column names and types:

Which then tells the Import Data task how to generate the proper INPUT statements:

When you click Finish, you end up with a data set that's ready for business:

Modify the properties for multiple columns -- with one step

Here's a click-saving trick. Sometimes you have an input data file that contains many columns that share the same properties: type, length, and SAS format. It can be tedious to click and modify the properties of each column that you want to import. There's a shortcut on the Define Field Attributes page of the wizard that you can use to change the attributes for several columns at the same time. Simply SHIFT+Click to select multiple column definitions on the page, then click Modify.... The "Field Attributes for Multiple Selections" window appears, and you can change the necessary attributes just once and apply to the many items you picked.

This trick works as you import any text file or Excel file.

Create SAS program code that you can reuse anywhere

In a previous article I described how the Import Data task works "behind the scenes." Some of the magic that the task performs is not captured in SAS code, and that can present a challenge when you want to reuse this work in other settings -- for example, in a batch process or in a larger SAS program. However, with a couple of tweaks you can coerce the Import Data task into creating SAS code that you can almost just "lift and shift," as is.

The first option is hidden under the Performance window, labeled as "Bypass the data cleansing process." By default, the Import Data task reformats your input text file to normalize it for a cleaner import step. While doing no harm, most of the time this step isn't needed -- especially if your original data file is well formed. And since this step changes the input file, it's isn't repeatable outside of this task. My first tip for the best reusable code: click Performance... on the first page of the wizard, then select the "Bypass.." checkbox. That guarantees that the code will be formulated to read your original raw file. (Note that the Performance button is available only when importing text files, not Excel files.)

The second option you'll want to change is related to this, but you'll find it on the final page with the Advanced Options. Select "Generalize import step to run outside of SAS Enterprise Guide." This ensures that the task won't attempt any behind-the-scenes monkey business with your original file -- everything is captured in the DATA step that the task generates. Well, almost everything...

The one missing piece, a confounding factor when you select a local text file to import on a remote SAS Workspace session, is the transfer of the local file to the remote server. SAS Enterprise Guide copies the file for you -- behind the scenes -- and there is no SAS code to represent this step.

You can take control of even this step, though, if you make use of the Copy Files task (now available for you on the Tasks->Data menu). You can then copy the file from a local source folder, and land it wherever you want on the SAS server. Modify your newly repurposed Import Data code to pull from that server-based destination, giving you more control over the individual steps in the import process.

Learn more about importing text files

If you're new to importing data into SAS, whether using a SAS program or SAS Enterprise Guide, you might learn some of the basics from these video tutorials that were produced by SAS instructors:

Share

About Author

Chris Hemedinger

Director, SAS User Engagement

+Chris Hemedinger is the Director of SAS User Engagement, which includes our SAS Communities and SAS User Groups. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies

29 Comments

  1. This is great Chris, thank you!

    We literally began some projects this week where have an issue with a simple CSV file import; a libref that is assigned (verified libref properties are readOnly:no, Temporary:no) using the servers menu does not get recognized by the ImportData until after a base sas program is executed containing the libname does the ImportData produce a SAS datatset on the server directory assigned.

    The libname statement being used is copied from Management Console>Data Library Manager>lib-rightClick "Display Libname Statement" and it's about a simple as it gets...
    LIBNAME ImpactAP BASE "D:\Impact_APP";

    What am I missing here? Is there a option or config setting that I've overlooked?
    (Other that making the library assign on server connect via Options, but we don't need this for all projects)
    Does right clicking the library not assign the libname ?

    Or worse, am I expecting too much auto-magic?

    thanks!

    • Chris Hemedinger
      Chris Hemedinger on

      If this is a metadata library, you probably want to make sure that "Pre-assigned" is checked ON within the definition. That ensures that the library will show as Assigned when you connect to the server. See more in this documentation topic.

  2. Excellent under the hood article and immediately useful for me. Yes, I often like to do a task in SAS Enterprise Guide first, then generalize to a SAS program often in fact in a SAS macro program loop. So often, I must import a data file at some point in my SAS program, so I make the import code in SAS Enterprise Guide. I am writing a SAS Global Forum paper about this type of step in SAS using SAS 9.2, 9.3, and 9.4 and going from 32 bit to 64 bit then mixing in Microsoft Office versions and Microsoft Windows versions. I had not thought of the changes in versions of SAS Enterprise Guide. Is the step with the "Generalize import step to run outside of SAS Enterprise Guide." check box, only in some more recent versions of SAS Enterprise Guide?

    • Chris Hemedinger
      Chris Hemedinger on

      Peter, these features for the Import Data task have been in the application for several releases.

  3. Hi Chris,

    I have a question about SAS EG. Is there any way to redirect the log of SAS to a file when executing a SAS EG project?

    A second question..., about parameters:

    1) Can yo export a parameter defined in a SAS project (proj1) to another SAS project(proj2)

    2) Can you send a parameter or macrovariable of a SAS project to the LASR server of SAS VA?

    Thanks in advance

    • Chris Hemedinger
      Chris Hemedinger on

      Juan, You can use the Project Log feature in EG, and Export As a Step at the end of your flow. That should get you what you need as the record of a project run.

      Regarding parameters -- the only way to share a prompt value from one project to another would be to save the value from Project 1 (using VBS if automated, or DATA step with FILE statement to store macro var externally), and then read it in for Project 2. There is no built-in mechanism.

      And regarding sharing a value from an EG project to LASR -- I don't think so. But there is probably a way to achieve whatever it is you're trying to do. I suggest posting the question to the Visual Analytics board in SAS Support Communities.

  4. Thank you for the post! This is exactly what I need. I have another question related to this. I have 5 files that needs to be imported into Oracle. The first file has a header and the 5 files have the same data format. Is it possible to import all the 5 files at once and into the same table?

    • Chris Hemedinger
      Chris Hemedinger on

      Yes! The INFILE accepts filenames with wildcards, so if your files are all in the same folder, you can use:

      filename in '/path-to-your-folder';
       
      data want;
       /* file spec within your folder */
       infile in(*.csv);
      ...
      ;
  5. I inadvertently added an extra boundary line, how can I delete it? I've tried simply dragging to another boundary line, but this is duplicating the variable.

    Thanks

    • Chris Hemedinger
      Chris Hemedinger on

      One way: use the mouse to click on the boundary line (selecting it) then press the Delete key.

      Also, from the online help: "Use the arrow keys to move the gray break line to the break line that you want to remove and press DELETE. You can also press SPACEBAR to delete a break link. "

    • Chris Hemedinger
      Chris Hemedinger on

      I use the Import Data task to generate the code for me, and then copy/paste that into a program that I can run repeatedly in production. Works with CSV or fixed-width text files!

  6. Eva-Maria Kegelmann on

    Is it possible to exchange the input file of the import-task?

    I want to import several identically structured text files. I thought I could define the import once, copy it and exchange the input file (like in other taks).
    But I could not find a way doing this.

    • Chris Hemedinger
      Chris Hemedinger on

      Yes, you can. Right-click on the input file in your process flow (for example, the text or Excel file), then select Properties, then the Change button. You can browse to a new source to change where the flow shortcut is pointing.

      • Eva-Maria Kegelmann on

        If I copy the IMPORT-Task, both Import Tasks have the same Input-Table. But I only want to change the input table of the second IMPORT-Task. IS that possible?

        • Chris Hemedinger
          Chris Hemedinger on

          Try this - add the second file to the project (with File->Open) and DON'T import it. Then right-click->Copy on the Import Data task you want, and right-click->Paste on the second text file.

  7. Hi, Chris,

    Eminently useful stuff, as always.

    QUESTION: Can I get Enterprise Guide to generate the SAS code for me *without* transferring the data across to the server? The reason I ask is that I already have the files on my server but would just like to use Enterprise Guide to generate the DATA step to read the data. The file is large and takes forever (well, it feels like it anyway) to transfer across to where I already have the data.

    I tried limiting the rows to 10 in the Import Wizard (Limit number of records read to: 10), but Enterprise Guide transfers the whole shebang across and then uses an OBS=10 on the SAS code to limit the rows. :(

    Is there a way that I can either
    a) Limit the number of rows transferred across
    or
    b) Just have Enterprise Guide generate the DATA step *without* transferring anything?

    Option "b" would be ideal, but "a" would be nearly as good.

    Thank you,

    Jim

    • Chris Hemedinger
      Chris Hemedinger on

      Jim, I don't think you can limit the rows that EG will download/read, but you can use a DATA step to create a smaller subset of your text file, then use that smaller file as a way to create your template for import. Maybe STOP the DATA step when _N_=20 or so, and that should be enough of a file sample to design your import process.

      • Chris,

        Yes, I thought of just reading in the csv file and writing it back out with a PUT _INFILE_ with an OBS = 500 so as to subset the data. I might be able to do it even faster in UNIX with HEAD -500 my_file.csv > new_file.csv.

        Using either method to subset, I'd then download the subset and generate a detailed DATA step using SAS Enterprise Guide. It's just a bit of fiddling and a number of manual steps.

        Now, say a certain leading software company were in the process of developing a fantastic new version of their Enterprise Guide software and that company decided to add a little check box like:
        [] Generate code but do not upload data
        or something along those lines, wouldn't that be a great idea (hint, hint)? :)

        Jim

  8. Hi there,
    is there a way to put filters into the import programm, in order to reduce a very large csv (approx. 120 mio lines) to those, that fulfill certain conditions?
    For example: only import those lines, where column x and z are not "0"...
    :-)

    • Chris Hemedinger
      Chris Hemedinger on

      Assuming that you end up with a SAS program that you're customizing, using DATA step and INFILE, then yes. You can add an IF statement that will test for a condition, and only those records that meet the condition will be kept. Your IF condition can scan the _INFILE_ (current input buffer) or it can test condition of variables you INPUT/assign.

  9. Chris, This is so helpful! I am a Base SAS coder and don't know much about SAS Enterprise, so would not have thought to use it in this way. But I have 5000 txt files to read in with a few different formats and a | delimiter so was dreading writing the datastep. This is such a time saver!
    Thank you!
    Kim

  10. Hi Chris.

    Is it possible to override the format that the Import Wizard is prepared to use for date columns, for instance? Using the Modify feature I can change the format to ”None”, but in the end the format is still applied. I made a copy of the program to make that change, but I wonder why this is not possible at a first glance in the import wizard. Thank you!

    • Chris Hemedinger
      Chris Hemedinger on

      It depends on what type of file you're reading, I think. If you want the data to be a DATE value but just formatted as a number, not using a date format? Or do you want SAS to read the value differently, such as a character value? You may be able to change the Type (numeric->character) as you import.

  11. Louis Deconinck on

    Is it possible to limit the amount of rows that are read to determine the attributes? I still want to import all data.

Back to Top