Using SAS to access data stored on Dropbox

I used "Dropbox" in the title for this post, but these techniques can be used for other cloud-based file sharing services, such as GitHub and Google Drive.

Using PROC HTTP (added in SAS 9.2), you can easily access any "cloud-based" file as long as you have a public link to reach it. I'm a cloudy sort of guy, so I've got profiles on Dropbox, Google Drive and Github. I've shared a few sample data sources on these accounts so that you can try these techniques for yourself.

Here's one working example to get you started. (In all of my examples, I included a placeholder for PROXYHOST=, which you need only if your Internet connection is routed through a gateway proxy. That's common for corporate intranet connections, less common for schools and homes.)

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat" 
 out=_inbox
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;

There are a few additional considerations for accessing cloud-based data.

You need the direct download link for the file


When you click "Share Link" on a service like Dropbox, you are provided with a browser-friendly link that you can give to your friends or colleagues. When they visit that link, they are presented with a simple view of the file name and a Download button.

Here's the Share Link for my sample data set:

https://www.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat

I can't use that link in my SAS program because PROC HTTP isn't going to "click" on the Download button for me. Instead, I need the link that results from clicking on the Download button, which I can get by substituting the dl.dropbox.com domain instead of www.dropbox.com in the address:

https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat

Note: You should assume that URL schemes for the "direct download" links can be subject to change, and of course they vary based on the cloud service provider. Today, the "dl.dropbox.com" scheme works for Dropbox.

A cloud-based folder is NOT a cloud-based path for a SAS library

Dropbox allows you to share a link to a folder that contains multiple files. As tempting as it might be to try, you cannot treat a cloud folder as a "path" for a SAS library. Did you notice the FILENAME statement in my first code example? I'm providing a local file destination for the download operation. PROC HTTP is my method to copy the file that I need into my SAS session.

If I have multiple files to fetch, I'll need to repeat those steps for each file, using its direct-download URL.

Your file might require additional processing before you can use it

My earlier program example works by copying a SAS7BDAT file (a SAS data set) into my SAS WORK folder. Because it's a data set, my SAS session recognizes as a valid data member. I was lucky because the data set encoding was compatible with my SAS session.

Sometimes the data might not be quite ready for you to use directly. If a colleague shares a file in a CSV or Excel format, you will need to run a PROC IMPORT step before you can use it. For convenience, a colleague might use PROC CPORT to save multiple SAS data sets into a single transport file. You'll need to run it through PROC CIMPORT before you can use it.

Here's an example that uses a CPORT file that I created with the CARS and CLASS sample data sets:

filename _inbox TEMP;
 
proc http method="get" 
 url="https://dl.dropbox.com/s/5uehelsok9oslgv/sample.cpo" 
 out=_inbox
  /* proxyhost="http://yourproxy.company.com" */
;
run;
 
proc cimport file=_inbox lib=work;
run;
filename _inbox clear;

From the SAS log:

NOTE: PROCEDURE HTTP used (Total process time):
      real time           0.91 seconds
      cpu time            0.00 seconds

NOTE: PROC CIMPORT begins to create/update data set WORK.CARS
NOTE: Data set contains 15 variables and 428 observations. 
      Logical record length is 152

NOTE: PROC CIMPORT begins to create/update data set WORK.CLASS
NOTE: Data set contains 5 variables and 19 observations. 
      Logical record length is 40

You can %include SAS programs from the cloud

It might not be the wisest move to blindly access a cloud-based SAS program and execute it with your SAS session, but you can do it. And because SAS programs can contain data records (in the form of CARDS or DATALINES), this is another way to share data in a portable way.

This program downloads and runs Rick Wicklin's "Christmas Tree" challenge, which he posted on The DO Loop blog last week.

filename _inbox "%sysfunc(getoption(work))/ifsxmas.sas";
proc http method="get" 
 url="https://dl.dropbox.com/s/g1hat0woohud9yc/IFSChristmasTree.sas" 
 out=_inbox
  /* proxyhost="http://yourproxy.company.com" */
;
run;
%include _inbox;
filename _inbox clear;

Working with Google Drive and GitHub

These techniques also work with the other popular services, but the download links are different.

For files stored on GitHub, you need to find the direct link to the "raw" file in your repository. Here's a working example:

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://github.com/cjdinger/proc_http_example/raw/master/data/streaming.sas7bdat" 
 out=_inbox
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;

For Google Drive, you must first select to make the file or folder public, or at least reachable by "anyone who has the link". By default, your Google Drive content is private to your account.

The "Share Link" is more cryptic than some of the other services. The important part is the id= portion of the URL, which allows you to form the direct link like this:

https://docs.google.com/uc?export=download&id=<cryptic_id_from_Google_Drive>

Here's my SAS-enabled example. Note the use of the %str() function to avoid complaints about trying to resolve the ampersand portion of the URL:

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://docs.google.com/uc?export=download%str(&)id=0BwSh7LOTCPQ5MmlJZFdOOFJhOHc" 
 out=_inbox 
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;
tags: cloud, Dropbox, github, Google Drive, PROC HTTP, SAS programming

16 Comments

  1. Posted December 18, 2012 at 11:12 am | Permalink

    Cool collection.

    For text files hosted on cloud, a simple FILENAME statement with URL option works, like

    filename Tree url 'https://dl.dropbox.com/s/g1hat0woohud9yc/IFSChristmasTree.sas';
    %include Tree;

    • Chris Hemedinger Chris Hemedinger
      Posted December 18, 2012 at 11:16 am | Permalink

      Yes, FILENAME URL works for certain types of files. I've got examples of that in this post as well.

      PROC HTTP offers more options to provide a little more control, and it's immensely useful for other Web interactions.

  2. David Moors
    Posted December 18, 2012 at 12:32 pm | Permalink

    Will SAS get a proc https anytime soon? I'm sure most people/companies would like their data to be secure when transferring it over the internet.

    I'm aware you can use the following (below) to set https via the JRE, however it must be easier for people use a proc.

    http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a003286920.htm

    • Chris Hemedinger Chris Hemedinger
      Posted December 18, 2012 at 12:47 pm | Permalink

      PROC HTTP supports access to https web sites (as seen in my examples). It also supports Basic Authentication via WEBUSERNAME and WEBPASSWORD (or WEBAUTHDOMAIN) options.

    • Chris Hemedinger Chris Hemedinger
      Posted December 19, 2012 at 9:32 am | Permalink

      David, now I see what you're talking about. Looks like you can use PROC HTTP and METHOD="GET" to pull information via SSL with no additional setup, but if you want to exchange information with another endpoint via https, you need to configure the SSL certificates. Because that involves encryption from the "client" side (where SAS is a client, like a browser), it must rely on the capabilities of your operating environment. Outside of SAS/SECURE, SAS doesn't provide any built-in encryption capabilities.

      • David Moors
        Posted December 20, 2012 at 8:41 am | Permalink

        Hi Chris,

        Thanks for the information, I didn't realise you could pull info over SSL with PROC HTTP without additional setup. That's quite interesting. Perhaps I should read the documentation first before posting questions!

        regards

        David

  3. Christian H.
    Posted January 16, 2013 at 5:03 am | Permalink

    Interesting idea, but it doesn´t work on our site.
    I found, it´s because PROC HTTP is included since 9.2 and we are still on 9.1.

    May I suggest to include the software requirements on these entries?

    • Chris Hemedinger Chris Hemedinger
      Posted January 16, 2013 at 9:16 am | Permalink

      Apologies for the confusion -- yes, PROC HTTP was added in 9.2. In 9.1 you can use FILENAME URL for similar effect, although it's not quite as flexible. I'll update the post to indicate the 9.2-or-later requirement.

  4. Amy E.
    Posted February 14, 2013 at 3:40 pm | Permalink

    The ability to import files from dropbox has been extremely useful. Now I'm wondering if it is possible to export to a dropbox (or other cloud-based service) location. It would be amazing to be able to automatically output a file such as a csv or xml file that other users can access in a dropbox folder.

    • Chris Hemedinger Chris Hemedinger
      Posted February 14, 2013 at 3:48 pm | Permalink

      Amy,

      I think that the PROC HTTP interaction to create a file on Dropbox would be complex. However, if you use the Dropbox capability to synch files from a local folder to a Dropbox account, you can have your program create its output into that local folder, and let the "cloud synchro" magic do the rest.

  5. Sitarama
    Posted March 13, 2013 at 12:10 am | Permalink

    Hi Chris,
    One of our vendor is set-up the cloud based MY-SQL for data. Our site is trying access them.
    Could you please tell me is there a way for EXPORT/IMPORT to cloud based MY-SQL using SAS?
    It would really helpful if you point any references.

    Thanks,
    Sitarama

    • Chris Hemedinger Chris Hemedinger
      Posted March 13, 2013 at 8:28 am | Permalink

      I have limited experience with cloud-based MySQL. If you can configure an ODBC connection (using the MySQL Connector or ODBC driver), then you should be able to access from SAS using SAS/ACCESS to ODBC, the same as with any generic database driver. I have used this approach to read data from MySQL over the Internet before, but I'm not sure that's the same as "cloud-based". In my case, my ODBC connection was to a specific host and port that just happened to be on the wide Internet.

      This might be a good question for SAS Technical Support. Be prepared to provide the types of connection details your vendor issues: is there a specific host node, port, authentication mechanism -- that sort of thing.

  6. Posted September 23, 2013 at 6:45 pm | Permalink

    Chris,

    I'm just starting out in SAS, so I'm not an experienced programmer. I have a REST that pulls data into Excel - how do I get this to pull the data into a work table in SAS 9.2/9.3? I'm a bit thrown by the ampersands and macro variables. In Excel, the begin date and end date are kept in cells referenced by the braces, but are generally in this format: 2013-08-25 00:00:00 and 2013-08-31 23:59:59.

    Here's the URL (with fake auth_key and Merchant_ID):
    https://www.avantlink.com/api.php?auth_key=123456789&merchant_id=123456&module=MerchantReport&report_id=8&date_begin=&date_end=&output=tab

    • Chris Hemedinger Chris Hemedinger
      Posted September 25, 2013 at 8:56 am | Permalink

      Carla,

      I'm not sure of the exact form of the URL for your web service, but you might want to look at the URLENCODE function (to help create a correctly formed URL) as well as the use of the %str() function to "protect" non-macro instances of ampersands.

      Something like this:

      https://www.avantlink.com/api.php?auth_key=123456789%str(&merchant_id)=123456%str(&module)=MerchantReport%str(&report_id)=8&date_begin.=&date_end.=%str(&output)=tab
      
  7. Chase DeHan
    Posted October 27, 2014 at 8:58 pm | Permalink

    Despite what is said in the above post, you can write to a google drive as your library if you have it synced up with the computer. As such, it creates a path that can be used to write your files to. This is the method I use since I have computers at home and at work.

    • Chris Hemedinger Chris Hemedinger
      Posted October 28, 2014 at 9:17 am | Permalink

      Chase, you're correct! You can use the "sync" features of Google Drive, Dropbox, Box, Microsoft OneDrive and others to maintain a local version of what you have stored in the cloud, and then reference those synchronized folders in any LIBNAME statement that you run from local SAS. That's a good workaround as long as your SAS session has access to those local folders.

2 Trackbacks

  1. [...] If you follow the link, there are several examples of how to read the entire sample, or just read the data. The last example shows how to "extract the initial comment block, title, and DATA step and submit them to SAS." If you like the idea of reading data directly from the Web, Chris Hemedinger recently showed how to read data from a public URL by using PROC HTTP. [...]

  2. [...] (source) and _bcout (target), then call the %binaryFileCopy() macro. Here is an example use that copies a file from my Dropbox account: filename _bcin TEMP; filename _bcout "C:tempstreaming.sas7bdat"; proc http [...]

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <p> <pre lang="" line="" escaped=""> <q cite=""> <strike> <strong>