Yesterday a frustrated SAS user complained on Twitter. He's working with a database that stores an ID field as a big long number (perhaps using the database BIGINT type), and SAS can't display a number greater than 15 digits. Well, it's actually 16 digits, depending on the value:
%put Biggest Exact Int = %sysfunc(constant(EXACTINT,8)); >> Biggest Exact Int = 9007199254740992
It's a controversial design decision to use an integer to represent an ID value in a database. You might save a few bytes of storage, but it limits your ability to write programs (not just SAS programs) that have to store and manipulate that value. And if you don't need to do math operations with the ID, your data consumers would rather see a nice character value there.
Fortunately, when working with databases, you can tell SAS to read numeric values as character values into your SAS data sets. In addition to solving the precision problem I've just described, this can also help when you need to join database fields with other source systems that store their key fields differently. It's usually much easier to convert the field "on the way in" rather than try to mangle it after you've already read in the records. Use the DBSASTYPE= data set option to tell SAS how to read database fields. Here's a sample SAS program that shows how I access a table using ODBC, one step without and one step with the DBSASTYPE= option.
libname wpblogs odbc datasrc="wpblogs"; options obs=10; data users_IDint (keep=ID display_name); set wpblogs.wp_users; run; data users_IDchar (keep=ID display_name); set wpblogs.wp_users (dbsastype=(ID='char(20)')); run; |
Here are the resulting tables; you can see the simple difference. One has ID as a number, and one has it as a character. Magic!
The DBSASTYPE= option is supported for virtually all SAS/ACCESS database engines, including the ubiquitous SAS/ACCESS to ODBC.
Oh, and you might be wondering how things turned out for our frustrated user on Twitter. Our SAS Cares social media team heard his plea and responded -- as they always do. And our user not only found the information useful, he took it a step further by replying back with an additional syntax tip.
@cjdinger @SAS_Cares @SASsoftware Multiple vars can be reformatted in one PROC SQL statement for a join. #SQLprobs pic.twitter.com/2kEWdmOmPv
— Sam Van Horne (@LearningPlaces) May 19, 2016
7 Comments
Ahh @SAS_CARES the power is yours! And now Sam Van Horne also!
Is the difference between
(dbsastype=(ID='char(20)')) and DBDSOPTS="DBTYPE=(BilledTo=’CHAR(8)’)" is that the former works in the DATA step with relational databases and the latter works in PROC IMPORT with MS Access and Excel spreadsheets?
Yes, I think that's right. The DBDSOPTS= option is specific to PROC IMPORT and the EXCEL/PCFILES engines, which passes through these options to the subsystems that process Excel and MS Access files. Also, DBTYPE= usually means "take this SAS value and write it to the database as this type" whereas DBSASTYPE= means "take this database value and read it into SAS as this type".
This is a great tip, and I wish I had know about this back when I worked at the bank :-)
Another option, available to SAS 9.4 users, is to process the data using a DS2 DATA program and/or a FedSQL query. If you read from database tables and write the result back to the database, you don't need to worry about those pesky data type conversions, because both DS2 and FedSQL can handle those big numbers at native precision.
I find DBSASTYPE invaluable for reading Excel columns. The automatic type guessing frequently fails, particularly if you have many null values.
Is there a way around the 256 character limit for DBDSOPTS= option's text length?
Hi Gordon, from what I've read the current allowance is 4096 characters for this value (extended as of SAS 9.3) You might still get a warning about "unbalanced quotes" -- Base SAS issues that automatically, but you can suppress with the OPTIONS NOQUOTELENMAX statement.