When you weren't watching, SAS did it again. We smuggled Yet Another Excel Engine into a SAS release.
SAS 9.4 Maintenance 2 added the XLSX engine, which allows you to read and write Microsoft Excel files as if they were data sets in a library. The big advantage of using this engine is that it accesses the XLSX file directly, and doesn't use the Microsoft data APIs as a go-between. (LIBNAME EXCEL and LIBNAME PCFILES rely on those Microsoft components.) That means that you can use this engine on Windows or Unix systems without having to worry about bitness (32-bit versus 64-bit) or setting up a separate PC Files Server process.
The XLSX engine does require a license for SAS/ACCESS to PC Files. Are you a SAS OnDemand for Academics user? The SAS/ACCESS product is part of that package, so this technique works there. It's an easy way to get well-formed Excel data into your SAS process.
/* because Excel field names often have spaces */ options validvarname=any; libname xl XLSX '/folders/myfolders/sas_tech_talks_15.xlsx'; /* discover member (DATA) names */ proc datasets lib=xl; quit; libname xl CLEAR; |
Example output:
Once the library is assigned, I can read the contents of a spreadsheet into a new SAS data set:
/* because Excel field names often have spaces */ options validvarname=any; libname xl XLSX '/folders/myfolders/sas_tech_talks_15.xlsx'; /* read in one of the tables */ data confirmed; set xl.confirmed; run; libname xl CLEAR; |
Sometimes you need just one value from a spreadsheet. That's a common use case for dynamic data exchange (DDE), which isn't as feasible as it once was. With the XLSX engine, you can use FIRSTOBS= and OBS= options to control how much data you retain:
/* read in just one value */ data _null_; set xl.confirmed (firstobs=6 obs=6 keep='Job title'n); call symput('VALUE','Job Title'n); run; %put &value; |
Output:
76 %put &value; Testing Manager, Quality-driven User Experience Testing
You can also use the XLSX engine to create and update XLSX files.
libname xlout XLSX '/folders/myfolders/samples.xlsx'; data xlout.cars; set sashelp.cars; run; data xlout.classfit; set sashelp.classfit; run; data xlout.baseball; set sashelp.baseball; run; data xlout.air; set sashelp.air; run; libname xlout clear; |
Here is my output in Microsoft Excel with all of these data sets now as sheets:
Remember, you can also create Microsoft Excel files with Base SAS by using ODS EXCEL (as of 9.4 Maintenance 3).
The XLSX libname is different from the EXCEL and PCFILES engines in other ways. For example, the XLSX engine does not support Excel named ranges (which can surface a portion of a spreadsheet as a discrete table). Also, you won't see the familiar "$" decoration around the spreadsheet names when they are surfaced in the library within SAS. If you need that sort of flexibility, you can use PROC IMPORT to provide more control over exactly what Excel content is brought into SAS and how.
One other IMPORTANT caution: The XLSX engine is a sequential access engine in that it processes data one record after the other. The engine starts at the beginning of the file and continues in sequence to the end of the file. Some techniques to MODIFY the data in-place will not work. For these reasons, I recommend using DATA step to copy the Excel content that you want to another SAS library, then CLEAR the XLSX library to free the lock on the Excel file.
I have found LIBNAME XLSX to be a quick, convenient method to bring in Excel data on any SAS platform. If you have SAS 9.4 Maintenance 2 or later, try it out! Let me know how it works for you by sharing a comment here.
WANT MORE GREAT INSIGHTS MONTHLY? | SUBSCRIBE TO THE SAS TECH REPORT
123 Comments
Hi Chris,
Would this new XLSX engine have the same old issues of having to guess which columns should be numeric, which should be character, and the length of the character field based on the first 20 rows? That seems to related to MS Jet engine, if like you said this has nothing to do with MS API, potentially it should not have that problem, right?
Thanks
Ya
Ya,
The XLSX method takes advantage of the Excel metadata that is encoded into the XLSX file format, so I think that there are fewer nuances with "guessing" column types based on values. However, there might be some other nuances and limitations. You'll have to test it with your content to see how it works for you. I think that it works best when your data are well-formed in the spreadsheet, database-like -- and not sparse mixed content like you might encounter in some Excel files.
Hi Chris,
Thanks! I'm still using 9.4M0, wonder if the benefit of this new engine is good enough to push me to update to M2, in addition to some other benefits.
If anyone reading Chris' blog has 9.4M2, would you please run a test and let us know?
Thanks
Ya
Ya, you can try it for yourself in SAS University Edition -- free software for learners from SAS.
Is it possible to use row headers for variable names?
Matt, yes. In fact, that's the default behavior for the XLSX engine. One note: since sometimes Excel headers have spaces or special characters in the text, use OPTIONS VALIDVARNAME=ANY to tell SAS to preserve these.
Nice this feature is there in 2015 after it was introduced by MS in 2007.
That is leaving us with another question. As the xlsx file and the open-office format (OASIS eg libre office) are closely related (MS did some add-ons) is that format also supported?
The other one is Excel is supporting an AES encryption on the whole zip (xlsx) file. Any option to specify that?
Jaap, do I detect some sarcasm in your comment? :)
As you know, the XLSX libname is not the first support for the XLSX format in SAS. XLSX support has been in SAS for several years, and this latest advance is another method that makes it more convenient to treat Excel content like data. I've been blogging about XLSX techniques since 2010.
I don't think that password-protected Excel XLSX files are supported in SAS, yet. Are they widely used? We would look to our customers to tell us how important encrypted XLSX files are to them.
And I'm not aware of any projects to support Libre Office specifically. Since Libre supports exchange with the Microsoft standard file formats, I'm not sure of the demand among our users. Again, we would look to our customers to provide that feedback. Adding ideas on http://communities.sas.com allows us and our users to review and "vote" as part of the SASWare Ballot.
You are right for the "some sarcasm".
1/ Customers do not always tell the things they are really needing but instead telling things they are having some difficulties with. There is a difference that is doing root cause analyses. The feed-back with ideas it a good one but do not forget doing a root cause. I am remembering your "autoexec" EGuide post.
http://blogs.sas.com/content/sasdummy/2011/01/10/you-asked-for-it-the-autoexec-process-flow/
2/ The Excel (better MS office) question is having the same background. Some users are wanting that as primary interface, others have replacements not wanting to use MS. http://www.theguardian.com/technology/2014/jan/29/uk-government-plans-switch-to-open-source-from-microsoft-office-suite
Personal I have the experience that an Excel is often getting the primary concern instead of the data-analytics area. The first and most difficult part is getting acceptance for those XLSX interface on the Unix area. They want to have it being delivered and accessed as easy as on their windows desktop. I think that would be no problem with a nice sftp browser associating the xlsx with Excel and processing it all server-side with SAS. Distributing by mail is another option. That are user functionality requests and not the same as IT-people requests for blocking that.
The encryption will be a next step. To much to solve first with higher prio's. But at some moment there will be the functionality requirement of data having being encrypted. There is an aes option added with 9.4 for sas datasets. To agree I hope you see the same kind of conflict interest of the "users end-users" and "users ts-support and audit - decision makers" is there. To whom are you listening in those cases?
The below program is not working in SAS University edition. Infile is accepting for .txt .dat .csv files except .xlsx. Can you explain the reason why?
Data USpresidents;
Infile "/folders/myshortcuts/Raw/Excel/USPRES.xlsx";
Input PresidentName $ Party $ Number;
Run;
Proc print data=USpresidents;
Run;
Proc contents data=USpresidents;
Run;
Yes. You can't use DATA step to read an Excel file directly. You need to either import the file with PROC IMPORT, or use LIBNAME XLSX (as described in this post). Example:
libname xl XLSX "/folders/myshortcuts/Raw/Excel/USPRES.xlsx";
proc datasets lib=xl;
quit;
Hi,
Just wondering what is the advantage to using this method as apposed to the PROC EXPORT function (allows .csv file export) :)
I would expect that .csv files would allow for larger amounts of data due to the file type?
Thanks
B
CSV does allow for large data and doesn't have the same restrictions as Excel, although XLSX does allow for some large spreadsheets. CSV doesn't have the same level of metadata encoded into the column types, but much of that can be inferred upon import.
Sometimes, the requirement (from a client, from the business) is Excel output, and this helps to manage that.
I've also found it to be a good way to get around the 200 character datafile name limit.
I noticed some unexpected behavior when the following steps are taken
1. output a SAS dataset to an existing sheet sheet in an xlsx file using the basic method described in the post
2. open the xlsx file and manually make changes to the data in the sheet from step 1
3. create a new dataset from the sheet in step 2, treating it as a dataset as described in the blog post
Reviewing the new data shows that only the records from step 1 are captured in the new dataset. Worth noting is that after step2, a .bak file is created that is a copy of the original excel file prior to being written to but deleting this has no impact on the results of performing step 3
Hi Chris,
I have come back periodically over the last 5 months hoping that I would at least be told that my results can't be duplicated and therefore I must be doing something wrong. Are my steps hard to follow? I could write something more thorough if that would help. Do you have any idea what is going on? I have had several occasions when this would have been useful but I hesitated at implementing it in our process so as not to introduce unexpected and hard to trace bugs.
Thanks,
David
Hi David, I'm not sure what's happening for you here. I'd suggest tracking with SAS Technical Support -- they can get you a definitive answer.
how do I specify in the 2nd example that my first row in excel contains the columnnames like the GETNAMES=yes option in proc import?
I think that's the default behavior, at least from what I've seen. The first row is treated as column names. You might need to specify OPTIONS VALIDVARNAME=ANY in order for the names to be valid, if they contain spaces or special characters.
thanks for the reply. See this:
options validvarname=any;
libname xl XLSX "my_excel.xlsx" ;
data DATA_PWZ ; set xl.'DATA PWZ'n ; run;
NOTE: The import data set has 446 observations and 91 variables.
NOTE: There were 446 observations read from the data set XL.'DATA PWZ'n.
NOTE: The data set WORK.DATA_PWZ has 446 observations and 91 variables.
NOTE: DATA statement used (Total process time):
real time 0.50 seconds
Looks good, but a proc print shows that the columnnames are in first row :
Obs A B C D
1 BEGINDATUM BEGINTIJD EINDDATUM EINDTIJD
2 41971 0.4305556 41971 0.43125
3 41971 0.4131944 41971 0.4166667
4 41971 0.4840278 41971 0.4847222
5 41971 0.4604167 41971 0.4625
6 41971 0.425 41971 0.4284722
7 41971 0.4583333 41971 0.4604167
8 41971 0.5236111 41971 0.5291667
9 41971 0.4097222 41971 0.4104167
10 41971 0.41875 41971 0.4201389
gr, Arie
I don't see that behavior when the column names are in the first row of the sheet. You might need to track this with SAS Technical Support, as the issue might be triggered by your particular data. Does this happen with all Excel sheets for you?
I'll check SAS / thnks
Hi Chris,
Good info. Is there any easy way to remove the spaces in the dataset names once brought in (if they exist)? In your example, the second dataset TOPIC IDEAS has spaces.
Alan
Alan,
You can't automatically change the sheet names on the way in, but you could use PROC DATASETS to copy and rename the data members into another library. You could use the NVALID function to determine whether the name was valid, and then use COMPRESS to remove characters that don't comply with the SAS rules.
Thanks, Chris! Knew about the COMPRESS function, but did not know about the NVALID function. Worked marvelously.
Hi Alan,
An example of this for the xl library example above would be very helpful. Thanks for posting this.
John
Hi Chris:
Is there a roadmap for XLSX engine to provide member level status to the named ranges, of either worksheet or workbook scope, or named tables ?
Hi Richard -- good question. I'm not aware of a project for that, but it would be a good suggestion if you want to submit it to the SASWare Ballot. In the meantime you can accomplish it with PROC IMPORT.
Hi Chris,
I am having an interesting problem here, I followed the steps posted here with proc dataset but for some reason the individual sheet are not read as data member type
/*because excel field names often have spaces*/
options validvarname=any;
libname xl XLSX 'ntummc\datashares\Dept\JHS\SC\Working\Stanford\Datareq\Jh1480\SNPs.xlsx';
/*discover member (data) names */
proc datasets lib = xl ; quit;
libname xl clear;
log message is WARNING: No matching members in directory.
What is it that it cannot recognize the individuals sheets as member type?
thanks
Stanford
Hi Stanford, I'm not sure why -- what are the member names (sheet names)? Have you tried a PROC IMPORT with DBMS=XLSX to see what that might reveal? You might need to add OPTIONS VALIDMEMNAME=EXTEND if the sheet names have special characters.
Hi Chris,
I am trying to determine if libname XLSX can reduce the need for a PCFIles Server in a WIndows Data / Linux SAS situation.
The documentation for libname xlsx says "The XLSX engine enables you to read and write XLSX data directly across UNIX, Linux, and Microsoft Windows systems. If you have XLSX files stored
on a UNIX or Linux system, you do not need to use the SAS PC Files Server (or the PCFILES LIBNAME engine) to access the Excel (.xlsx) files. If you have SAS running on UNIX or Linux and your XLSX files reside remotely on Windows, you need to use the PCFILES LIBNAME engine access them."
However what if the files on the Windows Server are available on the Linux service via a Samba share? Will Libname XLSX work then?
Joe,
If the XLSX files are network-accessible from a file mount on UNIX, you should be able to work this without PC Files Server. This would be true for LIBNAME XLSX as well as the DBMS=XLSX option on PROC IMPORT and PROC EXPORT.
Hi Crhis,
Thanks for the article. I am trying to update a sheet using this method but Iget the error below. Do you know any solution?
Can not replace sheet because it has existing formulas that reference other pages
Romain,
This is by design -- so that the export/update process doesn't overwrite a linked sheet and corrupt the file. You might try using LIBNAME PCFILES instead, but that would require more setup (with the PC Files Server) and I'm not sure it's possible even then. This is the sort of thing DDE is good at, but DDE relies on even more moving parts that are increasingly rare in a SAS environment (SAS and Excel together on a single Windows machine).
I get the same error "can not replace sheet because it has existing formulas that reference other pages" on some, but not all, of my Excel destination export sheets using the XLSX DBMS for proc export.
It seems to affect random sheets, but consistently affect the same sheets. I've checked several times that the sheets do not have formulas embedded. Their cells are referenced from other sheets within the workbook, but this is true also for other destination sheets where the proc export executes without fail.
Even if I replace the sheet the error will subsequently occur. Nothing apparently funny about the sheet names associated with this error either.
I have 9.4 EG (64-bit) and Office 2010 (32-bit)
I think the message is issued if an existing sheet has any reference to other content in the workbook (such as a link to another sheet). In the XLSX file, there is some metadata of sheet references called a "calculation chain", and if there is any reference from your target sheet, the XLSX method errs on the side of caution and won't overwrite. If you think this is NOT the case for your workbook, you should open a case with SAS Technical Support with the details.
Thanks, Chris. I think our company may have to open a ticket.
What's weird about you described as the references problem is that I can "fix" it by doing the following. Let's say there are two sheets in the Excel book (A and B). The proc export writes to sheet B and sheet A has formulas with cell references to sheet B. No formulas on sheet B. If the proc export fails with this error (above), I can create a new sheet "Bx" and successfully use a proc export to sheet Bx. Then change the formulas on sheet A to reference cells on sheet B2 instead, then the proc export will continue to work on subsequent attempts.
It can't be the existence of formulas pointing to an export sheet, but something as you say about Excel's underlying metadata for a particular sheet that comes into conflict with SAS in circumstances I have yet to pinpoint.
Hi Chris,
I have a similar issue. Would you know if a pivot table built with a data source coming from a proc export in SAS is considered a reference? I'm not using any formula, but rather a pivot table. What's it is odd is that even when I remove all the pivot tables associated with the Excel table, the export just won't work.
It basically seems quite random why it works or not, since sometimes it does, but I'm losing hope in finding the solution for this one. I need consistency to run the report automatically, can't just expect it works
Hi Fabio,
You could post the details of your question, including an example and details about your SAS software version, on the SAS Support Communities. Experts might be able to help. Or you could track these details with SAS Tech Support. And have you seen this blog post about SAS and Excel and Pivot tables?
Pingback: Your favorite SAS posts from 2015 - SAS Voices
Hi Chris,
At http://support.sas.com/documentation/cdl/en/acpcref/67382/HTML/default/viewer.htm#p05hxa7xihwolmn1sdjpfui9nosc.htm it says that LIBNAME options and data set options are "limited" for the XLSX LIBNAME engine.
Can you please illuminate me or point me to the doc that says which ones are or are not supported?
Thanks!
Michelle
Michelle,
LIBNAME XLSX is a sequential engine, which has some limitations. From the doc:
The XLSX engine allows the sequential reading of data only. That is, it does not support random access. It does not support certain tasks that require random access, such as the RANK procedure, which requires the reading of rows in a random order.
I'm not sure which data set options are specific to random-access engines, but I guess the limitations are similar to those for the XML engines.
Thanks. POINTOBS= is the only one I can think of off the top of my head.
I'm afraid I've run into the gotcha as was noted in the original post when trying to "view" the xlsx sheet while in EG. The first, I could find no way to interrupt the hung EG session except to End the process from Windows Task Manager. I was 95% confident the .EGP would be autosaved, so I re-entered EG and chose the Autosave to the .EGP's original location (on Linux btw). I was also prompted to save an autocall macro to it's original location, which I did. From there, I noticed the .EGP file disappeared from the folder directory as I viewed it on Linux. The entire .EGP was lost and unfortunately our IT is not doing backups of SAS-related files in this filesystem at a new data center.
I tried to recapture the EGP from the location assigned as the Project Recovery area on Windows, but the only thing saved there, was the last copy of a program I had last executed, nothing else from the EGP.
I have not been able to find reference to this known bug so any help like what is the safest way to interrupt EG at this point would be greatly appreciated. We are runnnig this version of EG:
7.11 HF3 (7.100.1.2805) (64-bit)
Regards,
Marty Light
Hi Marty,
Sorry to hear that you lost work with this hangup. The Autosave location for EG is usually something like "%appdata%\SAS\EnterpriseGuide\7.1\ProjectRecovery" on you local machine, so be sure to check there for historical versions of you project file.
I don't know of a way to interrupt EG safely, other than killing the process, when you get this particular hang. I know it's a defect that the EG developers have on their list to fix (I entered it, but I wasn't the only one). In the meantime, EG programmers need to be careful about direct access using this engine.
How can i make other than 1th row as a variable row.Example-->I have an excel file where i want to make 4th row as a header row while importing because first 3 rows has redundant data. My data start from 6th row as 5th row also has redundant data.
I knew the option as name row in proc import for xls file but what option to use for xlsx file while importing from unix and windows environment.
You can use DBMS=XLSX in a PROC IMPORT statement, and you should be able to achieve the same effect with STARTROW and DATAROW options.
Hi Chris,
Does this approach allow to export variable labels instead of names?
data xlout.CMMC2014;
set work.hosp;
If Hospital eq 'CMMC' & Match in ('E','Y');
run;
Stanford,
I think an approach that might work for you is: use ODS EXCEL, and PROC PRINT with the LABEL option.
ods excel file="c:\temp\CMMC2014.xlsx"; proc print data=work.hosp (where=(Hospital eq 'CMMC' AND Match in ('E','Y'))) label noobs; run; ods excel close;
Many thanks Chris, I've to say I have saved your page on my favorites and do visit frequently, you're doing a great job in helping the intermediate programmers such as myself,
Hi Chris,
Can we also password protect our excel while exporting the same.
Thanks
DG
No, password-protected Excel files are not supported with this...
Thanks for this article. Super helpful. Quick question, do you know if it's possible to start writing data from the second row instead of the first? Thanks!
Do you mean on the second row of the sheet -- or beginning with the second row of data? For the latter, use FIRSTOBS or similar:
libname t xlsx "c:\temp\t.xlsx"; data t.a; set sashelp.class(firstobs=2); run; libname t clear;
For the former -- no, I don't think you have that control with LIBNAME XLSX. Check out ODS EXCEL though -- it has lots of options for placement and formatting.
When i try to use xlsx engine and create a xlsx file from a sas dataset it gets created and at the same time it creates define name which has the range of the sheet.
It can bee seen in excel formulas in excel:
Below is th code used to create xlsx file.
libname xlout xlsx"c:\temp\define.xlsx" ;
data xlout.air_help;
set sashelp.air;
run;
Now when i try to modify the sheet manually created in the above step by adding the more rows and columns and save it and try to convert this modified xlsx back to SAS dataset by referring the to the libname assigned using xlsx engine i do not get the updated data as per the updated xlsx sheet which i modified after xlsx was created by sas using xlsx engine in the above step.
data air;
set xlout.air_help;
run;
It gives me 144 observations and 2 variables where as i added one more column in the sheet manually after it got created in first step.
Ideally it should have shown 144 obs and 3 variables.
Where as proc import works fine on the modified sheet.
Is this a defect in xlsx engine.
It might be a defect, or just a nuance. That's open to interpretation...
The process of creating the sheet "air_help" creates an Excel named range with the same name, and that's what shows even after you make changes to the workbook. To workaround this: when you edit the workbook to add new rows/columns, be sure to also visit the Formulas->Name Manager menu. DELETE the named range ("air_help") that you see listed there. This affects only the range definition -- it won't affect the data. Save the workbook and then SAS will be able to see the full sheet again, not just that predefined range.
Edit a formula would do but if we do all this using proc import then formula does not need to be edited in the scenario stated. This makes me think it is a defect .and SAS user might not realize it .
I agree -- I'll pass it on to the developers.
Hi Chris,
I tried to combine multiple excels - It worked . but the combined excel does not have the formats(colour ,font) etc ? Do we have any way to preserve the existing formats?
Thanks
Teena.
Not with LIBNAME XLSX, which is purely a data exchange (no formatting). For colors and formatted coming from SAS, you need to use ODS EXCEL as a destination. To merge with an existing Excel file, you might need to use LIBNAME PCFILES (which treats Excel more like a database) or else the SAS Add-In for Microsoft Office.
Hello Chris! I am using SAS On Demand for Academics in my classes at University. When read a Excel file in SAS using Libname and Xlsx some variables that is numerical on Excel shows as a Character variable on SAS table. I tried to configure for Number o Excel but it turns Character on SAS table.
I am using the following program:
Options valivarname=v7;
Libname Name xlsx “path”;
There are some configuration on Excel or Libname options to do?
Best regards
Unfortunately you don't have a lot of control with LIBNAME XLSX. You could try importing with PROC IMPORT and DBMS=XLSX to see if that yields a better result. Or, if it's detecting that a field is character (maybe there are some non-numeric values in some rows), you can convert after importing by using the INPUT statement in DATA step.
When I use the proc datasets lib=[excel workbook libname] to give the list of sheets within the Excel workbook, sometimes it does not return any of the names, even though I have used the option validmemname=extend. Examples of the sheet names within the file are 'FRN 2853951 - IA' (quotes mine). In other examples such as 'test-123-456' (no spaces) show up just fine in the datasets statement. So I'm wondering if the spaces are the problem. Without the ability to do proc datasets statement I don't see how the Excel files can be opened in SAS... Any way around the not recognizing sheet names, do you know? Thanks.
This should work, and it does in my test cases. Another approach you might try (assuming a libname of "xLIB"):
proc sql;
select * from sashelp.vmember where libname="XLIB";
quit;
If it's not working, I suggest reaching out to SAS Tech Support.
When i try to read the xlsx file using proc import in SAS grid 9.4 M3 using dbms=xlsx i get the below error:
ERROR: An exception has been encountered.
Please contact technical support and provide them with the following traceback information:
The SAS task name is [IMPORT (]
Segmentation Violation
Traceback of the Exception:
ERROR: An exception has been encountered.
But when i try to run proc import in PC sas 9.2 with dbms =excel option it runs fine without error.
I am not able to figure out whats going on
That's most likely a bug in the XLSX method for PROC IMPORT -- I recommend you contact SAS Technical Support to see if there is a fix available. Anything unusual about the file? Very large column names or other strange attributes?
No around 59k records when i break it into two it works fine so no data issue. DO u have any idea what does segmentation error mean.
[SEGMENTATION error] is simply a crash in SAS. Shouldn't happen, but sometimes does -- so working with Tech Support is the best way to get it resolved.
thanks
Is there any other way to read this file in sas grid.
DBMS=XLSX is the best way if you can get it to work. Otherwise you could try LIBNAME XLSX. Work with SAS Tech Support if you continue to see errors.
Hi Chris,
It is possible to capture the order of the worksheets in the Excel file?
Datasets/contents/dictionary tables all appear to list the worksheets in alphabetical order.
Thanks,
Daniel.
Daniel, I guess the order is not preserved with LIBNAME XLSX, since "table sequence" is not really a concept for most SAS libraries. It might work with something like LIBNAME PCFILES which uses Microsoft APIs to read the Excel file. It would be nice to have some sort of ordinal metadata available.
I have a xls file which has 2 diff column which has values as Chinese character and numeric value i am trying to import using proc import
proc import datafile="path/china.xls"
out=tp dbms=XLS replace; sheet=sheet1; GETNAMES=YES;
run;
It imports successfully without any error in the log but when i see the sas dataset the column which has Chinese character the column with numeric values have ?? and -e in place of Chinese character displayed
When i save the xls file as xlsx and run the below code it runs pecfectlt fine with correct values dispalyed
Note I am running the above code in SAS GRID using EG and has utf 8 encoding set at server level
proc import datafile="path/china.xlsx"
out=tp dbms=XLSX replace; sheet=sheet1; GETNAMES=YES;
run;
Not able to figure out whats wrong with this file
DBMS=XLS is a legacy method that does not support the most modern Excel formats. You can use DBMS=EXCELCS if you have a PC Files Server set up, or DBMS=XLSX (as you tried with success).
Hi Chris,
Your post is very useful! Thank you!
I also have a few questions on using LIBNAME to read excel data.
1. Can you say in the code that you just want to read sheet1, regardless of what the name of sheet1 is?
2. If not, how would you read the data sheet that has a space in the end, such as "Rater_1 " (there is a space after 1)?
3. How would you read the data sheet whose name has a space in between two words, such as "Rater 1"? I tried the following, but it didn't work.
OPTIONS VALIDVARNAME = ANY;
LIBNAME MYDATA XLSX "&DATA_PATH.\&DATA_NAME..xlsx";
DATA RAWDAT;
SET MYDATA.'Rater 1$'n;
RUN;
LIBNAME MYDATA CLEAR;
Thank you in advance for your help!
I find that if you run PROC DATASETS with LIB= pointing to the XLSX library, you'll learn the sheet names as that engine can reference them. It's slightly different than LIBNAME EXCEL or PCFILES.
Thank you Chris. I'll check that out :)
In your 3rd sheet, "YOUTUBE METADATA", how would you read that with a space between the two words? Would you please show an example?
Thank you!!!
Ji
For such names, use the "literal" notation:
/* read in one of the tables */ data yt; set xl.'YOUTUBE METADATA'n; run; libname xl CLEAR;
Hi,
This was very helpful. I have an existing program that uses proc sql to import excel files. This article was timely as I had a request where I had to update my program with a small excel file. So, I thought, what a better way to try this new method. It worked to import that one excel file but the proc sql statments follwoing all failed. When I remove the new method of import, my program ran just fine, once again. I find this very odd if the xl libname is cleared.
Any insight?
Thanks
Michelle
Are you trying to use the Excel data directly from the XLSX libname as a table in your query? If so, I would instead use a DATA step to copy the table to a work data set, then use that in your PROC SQL step.
LIBNAME XLSX is a convenient way to pull data into SAS, but I think there are limitations for how you can use it directly from the library "in situe".
Is the option
Header = no
valid in LIBNAME XLSX statement?
Farideh -- I don't think so. I think that works only in the EXCEL/PCFILES engine.
Using SAS 9.4 TS1M4, using HEADER=NO in the LIBNAME statement with the XLSX engine gives an invalid option error. Is there any way using LIBNAME XLSX ... to prevent SAS from assuming the first row is a header row?
I don't think this option is supported in the XLSX engine. For that, you might need to resort to the PCFILES or EXCEL engine (which uses native APIs or the PC File Server).
Nice article, Chris!
Hi,
would this have issues on data truncation?how would sas know the max length to assign to a field?
Cheers,
R
Good question. I think that in general it works okay, because the XLSX file has a little bit of metadata about its fields and SAS should pick that up. However, it's always a good idea to test the edge cases if you have some Excel files with very long cell values.
Hello Chris,
I was trying to run the following code to create multiple sheets within the Excel file. (My actual code has multiple conditions and creates around 30-40 different datasets in a single DATA SET query)
I tried with the sample code to see if I can create 'N' number of sheets. But it is throwing me an error about file-lock. Can you please let me know how to rectify this?
Mine is an UNIX server.
options validvarname=any;
libname dsn xlsx ".../Tes.xlsx" ;
data dsn.class dsn.class2 dsn.class3;
set sashelp.class;
if Sex='M' then output dsn.class;
if Sex='F' then output dsn.class2;
if Age=12 then output dsn.class3;
run;
libname _all_ clear;
ERROR: Resource is write-locked by another thread. File =..../Tes.1 . It is either not an Excel spreadsheet or
it is damaged. Error code=80001019
This looks like a bug in the XLSX engine. I suggest opening a track with SAS Technical Support. Workarounds: use PROC EXPORT to export multiple sheets, or multiple DATA steps. (Not ideal, I know.)
I am having an issue. I'm running SAS 9.4 on a Windows 10 machine with Office 2016.
When I use the xlsx library engine on a file, I want to either use proc sql or proc datasets to delete one of the tabs in the Excel. In both instances I get:
Excel.tran_table (the name of the libname and the exact tab I want to delete) cannot be deleted because files cannot be deleted from the Excel library. The library was created this way:
libname excel xlsx "path\filename.xlsx"
Now from looking around this does seem to be something that can be done. The file is NOT file protected. I even made a very short simple example Excel file myself and get the same message when I try to remove the sheet.
Any idea?
I don't think LIBNAME XLSX supports deleting sheets. You could use LIBNAME EXCEL or EXCELCS, but you would need PC Files Server in order to make that work.
Hi Chris,
That is really helpful with your post with XLSX engine using in LIBNAME statement.
I have tried in SAS 9.4 TS1M5 to import data sets with 2016 excel file. But the strange thing is there's no any sheets read in library. Could you please help to debug this? I'm not sure if this related to code.
The log is as followings:
38 options validvarname=any;
39 libname RSSB xlsx "C:\...\Data_2018.xlsx";
NOTE: Libref RSSB was successfully assigned as follows:
Engine: XLSX
Physical Name: C:\...\Data_2018.xlsx
40 proc datasets lib=rssb;
Directory
Libref RSSB
Engine XLSX
Physical Name C:\...\Data_2018.xlsx
Por .
WARNING: No matching members in directory.
40 ! quit;
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.06 seconds
cpu time 0.03 seconds
Rachel,
Not sure what is going on. Make sure that the file is not ALSO open in another app (like Excel), as that causes conflicts. If that's not it, this might be a question for SAS Tech Support.
Hi Chris
I have a question I am using the following:
libname xyz xlsx ''c:\anyfile.xlsx';
data decodes;
set lk.decodes ;
run;
I get the following warning
395 data decodes;
396 set lk.decodes ;
397 qsorres= compress(qsorres,';','');
398 run;
WARNING: Some character data was lost during transcoding in column: QSORRES at obs 145.
WARNING: Some character data was lost during transcoding in column: QSORRES at obs 175.
WARNING: Some character data was lost during transcoding in column: QSORRES at obs 248.
NOTE: The import data set has 1256 observations and 9 variables.
NOTE: There were 1256 observations read from the data set LK.decodes.
NOTE: The data set WORK.DECODES has 1256 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.15 seconds
cpu time 0.15 seconds
when i look at the sas dataset everything seems to be there except in the observations mentioned above the ';' seems to have been transcoded to an unrecognizable charecter, is there any way to get rid of this warning?
Are you using a SAS session with UTF-8 encoding? I recommend that for use with data that might have national characters. And then look into the KCOMPRESS function, which is the version of the COMPRESS function designed with multibyte characters in mind.
it works when i run it on a utf-8 sas session.
thanks
Rajesh
Great! I use UTF-8 by default for most of my SAS jobs now.
Hi Chris,
I know this has been addressed, but I still cannot get it to work even after using validvarname = any.
I've never had this problem before!
I have an excel spreadsheet with a sheet name with a space in it. I've used validvarname = any and v7. Tried the proc datasets pointing to the excel library but the log says 'No matching members in directory.'
I KNOW sheets with spaces and special characters can be imported, I just can't get it to work! I just started at this job, do you think I need some kind of plug or extension?
OPTIONS VALIDVARNAME=any;
libname og xlsx "H:\test\nback_raw_460control_day1_3-12-19.xlsx";
PROC DATASETS LIB = og;
quit;
Nevermind. It's because my SAS is 64 bit and Excel is 32. The horror!
Actually, bitness should not make a difference when using the XLSX libname. Did you get it to work?
Hi Chris,
What can cause the difference noted below. Using EG 7.1, when I run this program on the SAS Grid Server (SASApp), I get the result of "no matching members"
Program:
libname outxl xlsx "h:\testxlout.xlsx";
proc datasets lib=outxl; quit;
Log:
26 libname outxl xlsx "h:\testxlout.xlsx";
NOTE: Libref OUTXL was successfully assigned as follows:
Engine: XLSX
Physical Name: h:\testxlout.xlsx
27 proc datasets lib=outxl;
WARNING: No matching members in directory.
27 ! quit;
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
In the "Results" tab I see:
Directory
Libref OUTXL
Engine XLSX
Physical Name h:\testxlout.xlsx
Por .
However, when I connect to the "Local" Server, the program works correctly:
Log:
26 libname outxl xlsx "h:\testxlout.xlsx";
NOTE: Libref OUTXL was successfully assigned as follows:
Engine: XLSX
Physical Name: h:\testxlout.xlsx
27 proc datasets lib=outxl;
27 ! quit;
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.20 seconds
cpu time 0.01 seconds
Results tab:
Directory
Libref OUTXL
Engine XLSX
Physical Name h:\testxlout.xlsx
Por .
# Name Member Type DBMSTYPE
1 CLASS DATA TABLE
Note the first table is the same as before, but now there is a "Member" table identifying the spreadsheet in the Excel file.
Does this happen because Excel is not on the SASApp Server, but is on the Local server?
Thanks
Not sure what's happening here. Nothing to do with Excel being present, as Excel is not used to manage the XLSX reading. Maybe something else is going on with the options you have set. Or, LIBNAME XLSX does like exclusive access -- so make sure you CLEAR the libname in between tests.
Is there any option so that the first row isn't automatically treated as the variable name?
I don't think so. For that, you would use PROC IMPORT with DBMS=XLSX and use the RANGE option.
Pingback: SAS export to Excel just got a little bit easier - The SAS Dummy
Is there a way to save as a file. For example, I have a general template that I export data to using the libname statements above. But I don't want to write over my general template but do a file save as after exporting the data.
Debra,
I don't think that LIBNAME XLSX will allow you to write to a sheet in a "template" XLSX without destroying your template. You might be able to use LIBNAME EXCELCS (which uses PC Files Server) to write to a named range, or use PROC EXPORT to write to a different sheet (and reference that sheet from your template to pull in data). Sounds like you might be trying to replace a process that used DDE?
I'm looking for a way to import an Excel spreadsheet, regardless of whether or not a user may have the spreadsheet open. It doesn't look there are any options on the proc import to specify this, and using the libname statement with XLSX and access=readonly doesn't quite accomplish it either because it gives me an error ERROR: Error opening XLSX file -> \\emcspa\APS\Business Intelligence\PLAN\Plan_2020.xlsx . It is either not an Excel spreadsheet
or it is damaged. Error code=80001019
Requested Input File Is Invalid
if the file is open. Any thoughts??
Unfortunately, SAS really wants an exclusive lock on the file. I admit it's a frustrating behavior, but that's always been how it works.
I will need to import information on a tab with special characters in the tab name that include "&" and "#" - is there way to reference tab number instead?
You can refer to these sheets with literal syntax:
Hello Chris,
Thank you for this informative post.
I have some questions regarding reading excel files.
I am using current release of SAS : 9.04.01M6P110718 and OS is Linux
I am using libname xlsx engine to read excel file with multiple tabs.
Libname xlsref xlsx “path/filename.xlsx”;
1. Could I know how many rows are considered by SAS to auto-assign length and datatype to every column imported by using the above libname stmt.
2.Suppose the filename has say 10000 or more records and it has a column with character values , for first 1000 records, the length of the character is 30, and the rest of the records has length greater than 30 (say 100).
When we use the above libname statement (xlsx engine), is there a possibility that my rest of the records will be truncated to length 30.
If so, is it possible to avoid this truncation.
As convenient as it is, the XLSX library engine does not offer much flexibility for types/lengths. You can use the LIBNAME EXCEL (or PCFILES) engine and then add data set options like DBSASTYPE to have more control over how data is read.
Hi Chris,
I am just now getting to use SAS again, this maybe trivial but I used to know how to get around it, I am running a following code and it is not reading the member files
/* because Excel field names often have spaces */
options validvarname=any;
libname xl XLSX "S:\Stanford\COVID-19_Cases_Sept 8_Cleaned_DedupsREMOVED_NEW FORMAT.xlsx";
/* discover member (DATA) names */
proc datasets lib=xl; quit;
libname xl CLEAR;
Log page
126
127 libname xl CLEAR;
NOTE: Libref XL has been deassigned.
128 /* because Excel field names often have spaces */
129 options validvarname=any;
130
131 libname xl XLSX "S:\Stanford\COVID-19_Cases_Sept 8_Cleaned_DedupsREMOVED_NEW FORMAT.xlsx";
NOTE: Libref XL was successfully assigned as follows:
Engine: XLSX
Physical Name: S:\Stanford\COVID-19_Cases_Sept 8_Cleaned_DedupsREMOVED_NEW FORMAT.xlsx
132
133 /* discover member (DATA) names */
134 proc datasets lib=xl;
WARNING: No matching members in directory.
Stanford - any chance you have the Excel file open in another application? SAS likes (demands) exclusive access to the file to read it.
Something is going on with this particular file, I need to figure out what it is, I was able to read the member name fine for the other excel file. Thank for your prompt response as always
Stanford
Hi Chris,
can xlsx engine import csv files directly (i.e. without changing them to xlsx extension), I want to save time and not having to go download the file then save into excel
No, but you can use PROC IMPORT and DBMS=CSV to read these directly!
is it possible use "set xl._all_ " , if we don't know the sheet name in excel?
Not exactly, but you can generate a list of sheets (tables) easily:
Is there a way to control how SAS reads in a variable when using the xlsx engine? I have an Excel file with a variable called "Date" that has a mix of date and date time values:
Date
2/5/2015
4/4/2016 11:10:00AM
3/23/2015
6/2/2015
When I use the libname xlxs engine, this is what the resulting SAS dataset looks like:
Date
42040
4/4/2016 11:10:00AM
42086
42157
This new SAS "Date" var is formatted as character ($21.), with the original date values as a SAS date, and the date time values as character. I don't mind if the SAS "Date" is character but I don't want it to convert the date values to SAS dates (want to keep as "2/5/2015" instead of "42040").
I have to do this for several Excel files with multiple sheets in each file, so want to have this process automated within a macro and not have to manually adjust any columns myself before using the xlsx engine... am I out of luck??
Hi Amy, you cannot change the data "on the way in" -- it's going to require post-processing, especially with mixed types like this. Possibly you could use CSV as an intermediary.