The Daunting Delimiter and the Intriguing Informat

You may (or may not) have noticed that we took the week off from our rigorous blogging schedule (and, frankly, from studying for the certification exam) to celebrate Thanksgiving. The tryptophan-filled turkey probably would have made me nod off in the pages of my certification guide anyway. My holiday was filled with food, football, and raking leaves. I hope everyone had a great one.

But, luckily, I do have a couple of topics to write about, since our study group met the week before Thanksgiving. We had another good session. It is so helpful to hash out these topics with other people, who more often than not are having the same issues I am. If there is anyone out there on the SAS campus who wants to join us, please let me know. The more the merrier.

For our last assignment, we studied chapter 3 in the SAS Certification Prep Guide and chapter 3 in Ron Cody’s Learning SAS by Example. We pretty much breezed through the quiz at the end of the Prep Guide chapter (though it still had its share of tricky questions), but Cody’s chapter was challenging. We had two questions that we could not resolve ourselves. I plan to write to Ron to get his insight (which I’ll post here, if he doesn’t), but perhaps our loyal readers can help us out.

The first involves delimiters. In chapter 3, he talks about using the DSD and DLM= options together: we get the benefits of DSD, and adding DLM= overrides the default delimiter (the comma) to tell SAS which delimiter our raw data uses (for instance, a colon). The question that came out of our discussion is this: What if there are multiple delimiters in our raw data? For instance, what if we had a file where both periods and colons were used as delimiters? Could we just use:

infile 'file-description' dsd dlm='.:';

Or do we have to do some data cleaning first?
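
One way to find out is simply to try it. The step below is a minimal sketch, not anything taken from Cody's book: the data set name, variable names, and data lines are all invented for the test, and DATALINES stands in for the external file. It lists both characters in DLM= alongside DSD, and the third data line puts a period and a colon side by side to show how adjacent delimiters are handled.

```sas
/* Hypothetical test: data set, variables, and data are invented */
data delim_test;
    /* DATALINES stands in for the external raw data file */
    infile datalines dsd dlm='.:';
    input val1 $ val2 $ val3 $;
    datalines;
one.two:three
four:five.six
seven.:eight
;
run;

/* Print the result to see how each line was split */
proc print data=delim_test;
run;
```

Rerunning the same step without DSD should make it clear whether the side-by-side period and colon are being read as one delimiter or two.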

Second (and perhaps we’re showing our inexperience by asking this), we were stumped about the difference between Programs 3-11 and 3-12 in Cody’s book. If you don’t have the book in front of you (and if not, why not???), this involves using an informat with list input versus supplying an INFORMAT statement with list input. Here is program 3-11:

And here is program 3-12:

Program 3-12 requires quite a few more keystrokes than program 3-11, but we get the same result. I know that doesn’t mean much for us, as mere aspirants to programming expertise, but if we wrote this kind of code all the time, why would we follow program 3-12? I’m sure there is a very good answer to this, and I hope someone out there can tell me what it is!
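
For readers without the book, the two styles look roughly like this. This is a sketch with invented variable names and data, not a reproduction of Programs 3-11 and 3-12: the first step attaches each informat in the INPUT statement with the colon modifier, and the second declares the informats up front in an INFORMAT statement.

```sas
/* Hypothetical data and names; not Cody's Programs 3-11 and 3-12 */

/* Style 1: informats in the INPUT statement, using the colon modifier */
data colon_style;
    infile datalines dsd;
    input name : $20. dob : mmddyy10. salary;
    format dob mmddyy10.;
    datalines;
Pat Smith,12/25/1980,52000
Lee Jones,03/04/1975,61000
;
run;

/* Style 2: informats declared in a separate INFORMAT statement */
data statement_style;
    infile datalines dsd;
    informat name $20. dob mmddyy10.;
    input name dob salary;
    format dob mmddyy10.;
    datalines;
Pat Smith,12/25/1980,52000
Lee Jones,03/04/1975,61000
;
run;

/* Compare the two data sets; any differences are reported in the output */
proc compare base=colon_style compare=statement_style;
run;
```

PROC COMPARE is used here only as a convenient way to check whether the two steps really do produce the same result.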

About Author

Stacey Hamilton

Acquisitions Editor

Stacey joined SAS in 2008 as an editor for SAS Press, after a long career at a university press. She has worked in nearly every facet of the book publishing industry, from acquisitions to proofreading to manufacturing.

6 Comments

  1. Pingback: Zuzu’s petals - The SAS Bookshelf

  2. Michelle Buchecker on

    Another reason for using the INFORMAT statement is that the informat is then stored in the descriptor portion of the data set. So if you were to edit the data interactively through Viewtable, for instance, you could type in the date as 12/25/2009 instead of having to guess what the SAS date was.

  3. I didn't know about the DLMSTR option. Very nice. Thanks for bringing it to my attention. See, you can ALWAYS learn something new with SAS.

  4. Richard Hallquist on

    I just learned this a couple weeks ago: starting in SAS 9.2, there's a DLMSTR= option on the infile statement, which is an alternative to DLM=. The difference is that the exact combination of characters on the DLMSTR= option gets treated as a delimiter. So DLMSTR="~!" would treat the two-character string "~!" as a delimiter. DLM="~!" would treat both "~" and "!" as individual delimiters, just like Ron Cody indicated. Not on the certification exam, but cool to know about.
    For your scenario where both : and . were delimiters in the file, you would want to use DLM=".:".

  5. Here's a situation that would entice you into using 3-12 style code instead of the more succinct 3-11 style: if you want the variables in the output data set (LIST_EXAMPLE) arranged in a specific order that differs from the order in which the values are stored in the raw data file. For example, say you want the variables in LIST_EXAMPLE to be arranged as Name, DOB, Salary, Subject. Because you are using list input, you can't change the order in which you read the values in the INPUT statement. And in program 3-11 the INPUT statement sets the order in which the variables are created in the PDV at compile time, which controls the variables' logical storage order in the output data set. In program 3-12, the INFORMAT statement will set the variable order in the PDV, allowing you to separately control the variables' logical storage order while maintaining the required "read" order in the INPUT statement.

  6. Hi Stacey.
    Great question! I had to do some testing to see what would happen when you included more than one delimiter in the DLM= option. My conclusion is that you CAN include more than one, BUT SAS treats any ONE of them as a delimiter, NOT the two in a row as a single delimiter. Thus, if you used a period and colon and put them together in the data, SAS would assume the period was the first delimiter and the colon was the second and, because of the DSD option, give you a missing value. If you wanted to specify that multiple delimiters were separating data values, you would need to leave out the DSD option so that multiple consecutive delimiters were treated as a single delimiter.
    As to your question about programs 3-11 and 3-12: it is just a matter of style. I kind of like the INFORMAT statement but, since I'm lazy, I often use the colons.
    Best,
    Ron Cody
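
The DLM= versus DLMSTR= distinction described in comment 4 can be tried out with a small test. This is a sketch with invented data and data set names, assuming SAS 9.2 or later for DLMSTR=. The single data line contains a lone tilde as well as the two-character string ~!, so the two options should split it differently.

```sas
/* Hypothetical test data; DLMSTR= requires SAS 9.2 or later */

/* DLM=: each listed character is a delimiter on its own */
data dlm_chars;
    infile datalines dlm='~!';
    input val1 $ val2 $ val3 $;
    datalines;
a~b~!c
;
run;

/* DLMSTR=: only the exact two-character string is a delimiter */
data dlm_string;
    infile datalines dlmstr='~!';
    input val1 $ val2 $;
    datalines;
a~b~!c
;
run;

proc print data=dlm_chars;
run;

proc print data=dlm_string;
run;
```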
