SAS author's tip: Encoding of external files


Manfred Kiefer is a Globalization Specialist for SAS and the author of SAS Encoding: Understanding the Details. This week's tip is from his new book. In a review, Edwin Hart said "This book provides a very readable description of a topic that has long needed exposure: Why do my characters get garbled on the computer, and how do I fix the problem?" Kiefer's featured tip provides a good start.

The following excerpt is from SAS Press author Manfred Kiefer and his book "SAS Encoding: Understanding the Details" Copyright © 2012, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED. (Please note that results may vary depending on your version of SAS software.)

Encoding of External Files

The FILE, FILENAME, and INFILE statements support the ENCODING= option, which enables users to dynamically change the encoding for processing external data.

SAS reads and writes external files using the current session encoding; that is, SAS assumes that an external file is in the same encoding as the session. For example, if your session encoding is UTF-8 and you create a new SAS data set by reading an external file, SAS assumes that the file is also encoded in UTF-8. If it is not, the data can be written to the new SAS data set incorrectly unless you specify an appropriate ENCODING= option. Here is an example:

filename in 'external-file' encoding='Shift-JIS';
data mylib.contacts;
   infile in;
   length name $ 30 first $ 30 street $ 60 zip $ 10 city $ 30;
   input name first street zip city;
run;

In a UTF-8 session, this code ensures that the data in the external file is transcoded from Shift-JIS to UTF-8 as it is read.
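Because the default depends on the session encoding, it can help to confirm that encoding before reading the file. The minimal sketch below writes the session encoding to the log with PROC OPTIONS and then performs the same read with ENCODING= on the INFILE statement instead of the FILENAME statement. As in the original, 'external-file' is a placeholder path, and the MYLIB library is assumed to be assigned already.

```sas
/* Write the current session encoding (for example, UTF-8) to the log */
proc options option=encoding;
run;

/* Same read as above, with ENCODING= on the INFILE statement.
   'external-file' is a placeholder for the real path. */
data mylib.contacts;
   infile 'external-file' encoding='Shift-JIS';
   length name $ 30 first $ 30 street $ 60 zip $ 10 city $ 30;
   input name first street zip city;
run;
```

Keep in mind that variable lengths are in bytes. Most Japanese characters take 2 bytes in Shift-JIS but 3 bytes in UTF-8, so in a UTF-8 session the $30 variables may be too short to hold the transcoded text.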

Likewise, when you write data to an external file, the data is written out in session encoding unless you specify an appropriate ENCODING= option. In the following example, we first create a subset of our contacts data, and then write the output to an external file with an encoding of Shift-JIS:

/* Create a subset with Japanese data from the main table */
proc sql;
   create table japan as
   select * from mylib.contacts where country_e = 'Japan';
quit;

/* Write the output to an external file with an appropriate
   encoding */
filename out 'external-file' encoding='Shift-JIS';
data _null_;
   set WORK.japan;
   file out;
   put @1 name @31 first @62 street @133 zip @144 city;
run;
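The FILE statement accepts the same option, so a separate FILENAME statement is not strictly required. This sketch writes the same Shift-JIS output by naming the file directly on the FILE statement; 'external-file' is again a placeholder path, and WORK.JAPAN is the subset created above.

```sas
/* Same output as above, with ENCODING= on the FILE statement.
   'external-file' is a placeholder for the real path. */
data _null_;
   set work.japan;
   file 'external-file' encoding='Shift-JIS';
   put @1 name @31 first @62 street @133 zip @144 city;
run;
```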

When an external file contains a mix of character and binary data, you must instead use the KCVT function to convert the individual character fields from the file encoding to the session encoding, because an ENCODING= option would transcode the binary fields as well. See the SAS National Language Support (NLS): Reference Guide for details on using the KCVT function.
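The following is a minimal sketch of that approach, not a tested recipe. It assumes a UTF-8 session and a hypothetical fixed-length record layout: a 4-byte binary ID followed by a 30-byte Shift-JIS name field. The file is read without ENCODING=, so no bytes are transcoded, and only the character field is then passed to KCVT. The fileref MIXED, the data set WORK.MIXED, and the variable names are illustrative, and the encoding names given to KCVT are assumed to be valid for your release; check them against the NLS reference guide.

```sas
/* Hypothetical layout: 34-byte records = 4-byte binary integer ID
   followed by a 30-byte Shift-JIS name. No ENCODING= is given,
   so the bytes are read as-is, without transcoding. */
filename mixed 'external-file';

data work.mixed;
   infile mixed recfm=f lrecl=34;
   /* One Shift-JIS byte can become up to 3 UTF-8 bytes
      (half-width katakana), so allow 3 x 30 bytes for NAME */
   length name_raw $ 30 name $ 90;
   /* IB4. assumes the ID uses this host's native byte order */
   input @1 id ib4. @5 name_raw $char30.;
   /* Convert only the character field from Shift-JIS to UTF-8 */
   name = kcvt(name_raw, 'shift-jis', 'utf-8');
   drop name_raw;
run;
```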

To learn more about SAS Encoding, read Hart's review in its entirety, as well as a free chapter from the book.


About the Author

Shelly Goodin

Social Media Specialist, SAS Publications

Shelly Goodin is SAS Publications’ social media marketer and the editor of “SAS Publishing News”. She’s worked in the publishing industry for over thirteen years, including seven years at SAS, and enjoys creating opportunities for fans of SAS and JMP software to get to know SAS Publications’ many offerings and authors.

11 Comments

  1. Dear SAS Lady,
    I tried the first code example and used the encoding='shift-JIS' option with the FILENAME statement.
    However, it gave me the following result in the Log window:
    ERROR: Unable to load translate table .
    FATAL: Unrecoverable I/O error detected in the execution of the data step program.
    Aborted during the EXECUTION phase.

    Why didn't it create the data set? I'm using the Portable 9.1 SAS version.

  2. Why/when would I ever run into this transcoding problem?
    (A modest question from a UK user who tried to install a tagsets.excelxp upgrade.)

    • Peter,

      Even a user working only in English might run into transcoding issues.

      Just think of the ASCII standard, which is not even minimally adequate for British English since it lacks the pound sterling sign. Think of the various encodings that are very similar but not the same, e.g. Windows Latin 1 (used on Windows platforms) and ISO Latin 1 (used on UNIX platforms). Problem Note 35355 (http://support.sas.com/kb/35/355.html) explains that the Euro symbol (€) as well as the special quotation marks (“ and ”) used in Microsoft Word or Microsoft Excel might be potential sources of transcoding errors.

      Think of data containing strings in foreign languages that you might get from others and that might not be handled correctly when the wrong encoding is used.

      Just to name a few things ...

      Regards,
      Manfred

  3. Priyanka,

    It looks to me as if you were not using the Unicode Server when running the code. Since the SAS version used in this case is 9.1, I recommend reading the technical paper "SAS 9.1.3 Service Pack 4 in a Unicode Environment" (http://support.sas.com/resources/papers/unicode913.pdf), which explains the installation, configuration, and usage of the SAS 9.1.3 Unicode Server. It contains workarounds for known issues and general helpful information to improve the overall customer experience.

    Regards,
    Manfred

  4. Why might a file's encoding change automatically just by saving it from a network drive to a local drive?

    One of our SAS users has a data file. When he runs PROC CONTENTS, the encoding is listed as wlatin1. He runs a DATA step to save the file on his C: drive, then runs PROC CONTENTS again, and the encoding is listed as shift-jis. When he tries to use this file, he gets an error: "Data file ... is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance."

    We thought maybe something was wrong with his SAS profile, so we uninstalled and reinstalled SAS. But this did not help the issue. Any ideas?
