Jedi SAS Tricks: Explicit SQL Pass-through in DS2

23

One of the things I’ve come to love most about DS2 is the tight integration with SQL which makes so many data prep chores so much less onerous. An example is DATA program BY group processing. With a traditional DATA step, you must first sort or index the source data before using a BY statement. For example, this program produces an error, because the data has not been properly sorted:

data new;
   keep make model cylinders;
   set work.cars;
   by descending cylinders make model;
   if _n_=10 then stop;
run;

But, because DS2 always retrieves your data with an implicit SQL query, there is no need to pre-sort the data. This program runs just fine:

proc ds2;
title 'DS2 BY processing with base SAS datasets' ;
title2'BY descending cylinders make model';
data;
   keep make model cylinders;
   method run();
   set work.cars;
   by descending cylinders make model;
   if _n_=10 then stop;
   end;
enddata;
run;
quit;

Results of DS2 BY processing without pre-sorting the data
And it makes no difference if the source data resides in SAS or in a Relational Database Management System (RDBMS) like Oracle.

proc ds2;
title 'DS2 BY processing with Oracle tables' ;
title2'BY descending cylinders make model';
data;
   keep make model cylinders;
   method run();
   set myora.cars;
   by descending cylinders make model;
   if _n_=10 then stop;
   end;
enddata;
run;
quit;

Results of processing Oracle data with a BY statement
Another DS2 superpower is the ability to use an explicit SQL query in the SET statement. Because FedSQL does not require a CONNECT statement before writing explicit pass-through SQL, your query can even leverage the unique capabilities of the RDBMS. For example, this DS2 DATA program uses explicit pass-through to leverage Oracle’s DECODE function, and explicitly orders the results in the RDBMS before processing the results in DS2:

proc ds2;
title 'DS2 with SQL on SET statement and Explicit Passthrough to Oracle' ;
title2'Oracle SQL includes DECODE function and ORDER BY clause ';
data;
   keep make model cylinders engine_size;
   method run();
   set {select * from connection to myora
            (select Make
                  , Model
                  , Cylinders
                  , decode (cylinders, 
                            4, 'Compact', 
                            6, 'Mid',
                            8, 'Full',
                            'Oversized') as "ENGINE_SIZE"
                from cars
                order by cylinders desc, make, model)
       };
   if _n_=10 then stop;
   end;
enddata;
run;
quit;

Results of Explicit Pass-through with embedded Oracle DECODE function and ORDER BY clause
You could, if desired, leverage a DS2 BY statement to order the results instead, though this may be less efficient than having the RDBMS sort the data:

proc ds2;
title 'DS2 with SQL on SET statement and Explicit Passthrough to Oracle' ;
title2'Oracle SQL includes DECODE function ';
title3'Ordering performed by DS2 with a BY statement ';
data;
   keep make model cylinders engine_size;
   method run();
   set {select * from connection to myora
            (select Make
                  , Model
                  , Cylinders
                  , decode (cylinders, 
                            4, 'Compact', 
                            6, 'Mid',
                            8, 'Full',
                            'Oversized') as "ENGINE_SIZE"
                from cars)
       };
   by descending cylinders make model;
   if _n_=10 then stop;
   end;
enddata;
run;
quit;

Results of Explicit Pass-through with embedded Oracle DECODE function, results ordered with a DS2 BY statement
The more I use DS2, the more I have come to appreciate its tremendous power and flexibility. It is a delight to manipulate my data with this amazing tool. The SAS is truly strong with this one!

UPDATE 2/7/2018:
User Josh noted in a comment that the ordering of rows in DS2 is a complex subject, with platform, threading and code execution in-database or in SAS Cloud Analytic Services (CAS) having some profound effects. The wise DS2 programmer will read the SAS Online Documentation article SET Statement with Embedded FedSQL in the SAS® 9.4 DS2 Programmer’s Guide to gain a better understanding of this topic.

You can download a ZIP file containing a PDF of this blog entry and the SAS code used to create it from this link. The program includes a setup section that creates the SAS and Oracle data for you. The free Oracle Express (Oracle XE) served as the RDBMS. Of course, for all of this to run on your SAS installation, you’ll need a license for the SAS/Access Interface to Oracle in addition to your Base SAS.

Until next time, may the SAS be with you!
Mark

Image of book cover
Find loads of information like this in my book "Mastering the SAS(R) DS2 Procedure: Advanced Data Wrangling Techniques".

Tags
Share

About Author

SAS Jedi

Principal Technical Training Consultant

Mark Jordan (a.k.a. SAS Jedi) grew up in northeast Brazil as the son of Baptist missionaries. After 20 years as a US Navy submariner pursuing his passion for programming as a hobby, in 1994 he retired, turned his hobby into a dream job, and has been a SAS programmer ever since. Mark writes and teaches a broad spectrum of SAS programming classes, and his book, "Mastering the SAS® DS2 Procedure: Advanced Data Wrangling Techniques" is in its second edition. When he isn’t writing, teaching, or posting “Jedi SAS Tricks”, Mark enjoys playing with his grand and great-grandchildren, hanging out at the beach, and reading science fiction novels. His secret obsession is flying toys – kites, rockets, drones – and though he usually tries to convince Lori that they are for the grandkids, she isn't buying it. Mark lives in historic Williamsburg, VA with his wife, Lori, and Stella, their cat. To connect with Mark, check out his SAS Press Author page, follow him on Twitter @SASJedi or connect on Facebook or LinkedIn.

23 Comments

  1. Hey Mark, great article. I usually end up coming back to it when trying to remember how to incorporate FedSQL queries into DS2 code, which I've been using more and more often as Viya increases in prominence.

    One issue I've noted though is that you recommend in this article that users rely on the RDBMS's sorting functionality via the explicit FedSQL set statement. However, the SAS 9.4 documentation on DS2 has a note buried deep inside it that you actually cannot rely on the ordering from the data source to remain the same when it comes through to the DS2 program, in particular not in the distributed CAS environment.

    The documentation I mean can be found here: http://go.documentation.sas.com/?docsetId=ds2pg&docsetTarget=p1polmk2yv18uvn15rp9wcdvwpay.htm&docsetVersion=9.4&locale=en#n14nliy5jmxld3n1wt4whlm1or3k

    Is this a distinction that has changed over time? It may be worth making a note in your paper here so that future users are not led astray into believing they can rely on the passed-through data to be sorted correctly.

    • SAS Jedi

      You make an excellent point. This blog was written quite some time ago, when I was younger and my knowledge of DS2 was younger, too. I set out to demonstrate the ability to execute explicit pass-though code from a DS2 SET statement, and I should have stuck with the DECODE function and not tried to get too fancy :-)

      This code returns ordered rows reliably in Windows-based SAS, but the Online Docs do note that this may not be true in all environments. As for distributed processing, that all of the DS2 code in this example is being executed on a traditional SAS platform and DS2ACCEL is not set to YES. Because the code is run single threaded on Base SAS, row order is more predictable here than when running distributed (threaded). If running threaded, in-database or in CAS, the order in which rows are returned to the DS2 data program becomes even more fluid. That said, as noted in the Online Docs, you can better ensure row order by using a BY statement after the SET statement in the data program instead of using an ORDER BY clause in the pass-through SQL. Thanks for taking the time to share your insight. I've added an update in the body of the article to make this issue more prominent.
      May the SAS be with you!
      Mark

  2. Nice post. I'd be interested in taking this a step further and querying with parameters... something like:

    LIBNAME devdb SQLSVR Datasrc="Dev DB" SCHEMA=dbo;
    proc ds2 libs=(devdb);
    data work.myTestDS2 /overwrite=yes;
    method run(varchar testParm);
    set {select * from connection to devdb (
    select * from myTestTable where myTestColumn = testParm
    )};
    end;
    enddata;
    run;
    quit;

    This, of course, doesn't work, but I'm hoping perhaps you have pointers to help make it work? (For that example, simply removing the parameter and the where clause, it does work for me... I'm just trying to add in the where condition, as a parameter into the method.

    • SAS Jedi

      System methods (like RUN) do not accept parameters. To execute parameterized dynamic SQL queries, you'll want to use the SQLSTMT package. I believe you'll be able to do exactly what you want with that.
      Regards,
      Mark

  3. Hi Mark,
    I have a question: What is the benefit of using DS2 to execute native database SQL using set {select * from table} instead of proc sql implicit SQL pass through? If you know how to deal with your database in it's native SQL proficiently is there still a benefit to this DS2 technique simply for this purpose?
    Cheers,
    Chris

    • SAS Jedi

      Chris,

      This post shows "how" without a good "why" so this is an excellent question. For the example you proposed - a simple "select * from table" - I can see little efficiency advantage from any method you choose. PROC SQL or PROC FedSQL (either explicit or implicit pass-through) or execution from DS2 - all would produce the same result with little advantage over the other. DS2 would probably have the most processing overhead in that case, and I would personally choose PROC FedSQL with implicit pass-through, due the simplicity of writing the code and PROC FedSQL's ability to handle ANSI data types at full precision (like DS2). PROC SQL is limited to CHAR and DOUBLE values, like the other traditional SAS procedures, but if precision wasn't an issue, it would work just as well.

      Now consider a more complex case. Say you know some super cool summary function in DBMS-native SQL, and there isn't an easy way yo do the same thing in SAS. Now, it would be worth the effort of writing the explicit code. Let's also say you need to pre-summarize the result set of a join by ID, order the result set by Country, produce 3 separate detail subsets and a custom summarization by Country. Let's say that the custom summarization process requires DATA step techniques and or functions. In this case, DS2 would be the ideal choice.

      For example, you could write something like this:
      proc ds2;
      data good bad ugly summary;
      method run();
      set {select * from connection to libref
      (select ID, country, fancyDMBSFunction(col2)
      from dbmstable1 as t1, dbmstable2 as t2
      where t1.id=t2.id
      group by ID
      order by country, ID)};
      by country;
      if first.country then do;
      /* pre-summarization stuff */
      end;
      /* other sas code */
      if condition=1 then output good;
      else if condition=2 then output bad;
      else output ugly;
      if last.country then output summary;
      end;
      enddata;
      run;
      quit;

      With one program, and without having to write intermediate data to disk, I have managed to do it all. To solve this with pure SQL, you would have to write the join result out to a table, then read that table at least 4 times - once to produce each subset and once to produce the summary - to get it all done. That's a whole lot of code and a boat load of disk I/O to boot. But we have DS2 - so woot!!

      May the SAS be with you :-)
      Mark

  4. Thanks for your samples,

    I was trying to run Oracle sepcific queries using default sqlstmt package but it was not going through . So needs away to use sql pass thru queries with SQLSTMT DS2 package .. Any idea?

    • SAS Jedi

      The most common error when trying to use explicit pass-through in an SQLSTMT package is to forget to include the required FedSQL syntax as part of the query. Here is an example of a functional explicit pass-through SQL query on a SET statement:
      proc ds2;
      data;
      method run();
      set {select * from CONNECTION TO db
      (SELECT id, hire_date FROM test)};
      end;
      enddata;
      run;
      quit;

      The requirements for using explicit pass-through in FedSQL are:
      1. There must be a LIBREF assigned pointing to the database you wish to query.
      2. There are 2 parts to the pass-through query, and both are required for proper execution:
      a. The FedSQL portion:
      select * from CONNECTION TO db
      b. The RDBMS-specific code in parenthesis:
      (SELECT id, hire_date FROM test);

      To execute that same query in an SQLSTMT package, you would use something like this:
      DCL PACKAGE SQLSTMT T('select * from CONNECTION TO db (SELECT ID, Hire_Date FROM test)');
      I've tested this syntax using explicit pass-through to Teradata, and it works just fine. Hope this helps.

      May the SAS be with you!
      Mark

  5. Hello Mark,

    Thank you for all your answers.

    Yes, I will contact the support.

    Thank you for your book and your interesting blog.
    May the force be with US !

    Amazigh.

  6. Hello Mark.

    Can you please tell how can I see the pass-through code in log?

    I just try to switch on "options sastrace='d,,d,d' sastraceloc=saslog nostsuffix" but it doesn't help.

    If I write proc sql block then I see my native code in sas log, but it doesn't work with DS2 block.

    Is any options that help me to see pass through code in log for DS2 ?

    • SAS Jedi

      The SASTRACE system option can be used to make a SAS/Access engine to report its generated SQL in the SAS log, but PROC DS2 does not use the SAS/Access engine to retrieve data. DS2 uses a special threaded driver to access all data. This driver allows DS2 to retrieve the native ANSI data types instead of having everything converted to SAS NUMERIC or CHARACTER data, as would happen if the LIBNAME engine were used. Because the LIBNAME engine does not retrieve the data for a DS2 program, SASTRACE produces no output.

      Getting access to DS2 generated SQL is not an easy task, nor is it for the faint of heart. You will need to use the SAS Logging Facility and the DS2-specific loggers in order to get this information.

      May the SAS be with you, Nikolay, if you undertake this quest!
      Mark

    • I got rid of all the macro reference and I still have an error :

      The code is :

      PROC DS2 NOLIBS CONN="DRIVER=ODBC;DB=GERS_HD; UID=LoginSasSocAcl; PWD=*; CATALOG= (SI=DW_GersIt)";
      DATA;
      METHOD RUN();
      SET

      And the error message is :

      23 PROC DS2 NOLIBS
      23 ! CONN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      23 ! XXXXXXXXXXXXXXXXXXXXXXXX;
      ERROR: Invalid connection string.
      ERROR: TKTS initialization failed.
      ERROR: Expected list item separator.
      ERROR: Parsing failed at or near column 57: Columns 41-72 = "cl; PWD=************

      NOTE: The SAS System stopped processing this step because of errors.

      I remind you that I am under SAS 9.4.01M2PXXXXXXXX.

      Amazigh.

      • SAS Jedi

        Amazigh,
        I don't think this is a DS2 language issue, which is where my expertise lies. It looks more like a driver connection issue. I'm about to start teaching a class, and will have significantly less opportunity to respond here. So at this point I'd recommend you open a track with SAS Tech Support. They can get experts to help you resolve the issue more quickly than I will be able to help you here.

        May the SAS be with you!
        Mark

  7. Bonjour,

    Thanks for your answer.

    The reason why I use the nolibs option is that when I run the code without it, I get the following error message :

    ERROR: Duplicate catalog name, DWArchivage_GIt, encountered in connection string, DSN or file DSN
    WARNING: WARNING: [Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Changed database context to 'DW_GIt'.
    WARNING: WARNING: [Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Changed language setting to us_english.
    WARNING: WARNING: [Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Changed database context to 'DW_GIt'.
    WARNING: WARNING: [Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Changed language setting to us_english.
    ERROR: TKTS initialization failed.

    When I include the msglevel=i option, I can see that :

    (DRIVER=ODBC; DB=GERS_HD; UID=LoginSasSocAcl; PWD=*; CATALOG= (COMMUN=DW_GIt));
    (DRIVER=ODBC; DB=GERS_HD; UID=LoginSasSocAcl; PWD=*; CATALOG= (DWSI=DW_GIt));
    (DRIVER=ODBC; DB=GERS_HD; UID=LoginSasSocAcl; PWD=*; CATALOG= (DWSO=DW_GIt));

    (DRIVER=ODBC; DB=GERS_HD; UID=LoginSasSocAcl; PWD=*; CATALOG= (SI=DW_GIt));
    (DRIVER=ODBC; DB=GERS_HD; UID=LoginSasSocAcl; PWD=*; CATALOG= (SO=DW_GIt));
    (DRIVER=ODBC; DB=GERS_HD; UID=LoginSasSocAcl; PWD=*; CATALOG= (COMMUN =DW_GIt));

    (DRIVER=base;CATALOG=SOURCE; SCHEMA= (NAME=SOURCE; PRIMARYPATH={E:\PROD\Data}));
    (DRIVER=base;CATALOG=STPSAMP; SCHEMA= (NAME=STPSAMP; PRIMARYPATH={C:\Program Files\SASHome\SASFoundation\9.4\inttech\sample}));
    (DRIVER=base;CATALOG=MAPS; SCHEMA= (NAME=MAPS; PRIMARYPATH={C:\Program Files\SASHome\SASFoundation\9.4\maps}));
    (DRIVER=base;CATALOG=MAPSSAS; SCHEMA= (NAME=MAPSSAS; PRIMARYPATH={C:\Program Files\SASHome\SASFoundation\9.4\maps}));
    (DRIVER=base;CATALOG=MAPSGFK; SCHEMA= (NAME=MAPSGFK; PRIMARYPATH={C:\Program Files\SASHome\SASFoundation\9.4\mapsgfk}));
    (DRIVER=base;CATALOG=SASUSER; SCHEMA= (NAME=SASUSER; PRIMARYPATH={C:\Users\mazi\Documents\My SAS Files\9.4}))

    The first three library COMMUN, DWSI and DWSO are assigned by me with libname statement.
    The 3 next SI, SO and COMMUN are assign troughout the autoexec.

    I tried without assigning COMMUN, DWSI and DWSO but unsuccessfully.

    So I've been trying any options that I discover. But none permit me to handle that simple query.

    May the force be with me !

    Amazigh.

    • SAS Jedi

      OK - so the first thing to do is get rid of all the macro references in your code to ensure that is not the problem. From the log info above, try something like:
      PROC DS2 NOLIBS conn="DRIVER=ODBC; DB=GERS_HD; UID=LoginSasSocAcl; PWD=*; CATALOG= (SI=DW_GIt)";

      Of course, you will need to fix "PWD=*" so that it provides the actual, clear-text password.

      Let me know how it goes. And may the SAS be with you!
      Mark

  8. Hi,

    I'm trying to execute a simple proc DS2 and I have an ERROR message that gives me a headache !

    I'm under SAS 9.4.01M2PXXXXXXXX.

    PROC DS2 NOLIBS CONN="DRIVER=ODBC;DB=&Db.; UID=&uid.; PWD=&DbPwdEncrypt.; CATALOG= (SI=DW_GIt)";
    DATA;
    METHOD RUN();
    SET
    {
    SELECT DISTINCT
    a.ClePointDeVenteSI
    , STRIP(PUT(a.CleMoisAnnee,6.)) AS DT_PERIODE
    , SUBSTR(CALCULATED DT_PERIODE,5,2) AS DT_MOIS
    , SUBSTR(CALCULATED DT_PERIODE,1,4) AS DT_ANNEE
    , a.ClePresentation
    , a.CleCanalDeDistribution
    , SUM(a.Unite) AS Unite
    , SUM(a.CAHT) AS CAHT

    FROM SI.FaitMoisAchat(WHERE =(CleMoisAnnee = 201604 AND CleCanalDeDistribution in (1,2))) AS a
    GROUP BY a.ClePointDeVenteSI, a.ClePresentation

    };
    END;
    ENDDATA;

    RUN;QUIT;

    ERROR: Invalid connection string.
    ERROR: TKTS initialization failed.
    ERROR: Expected list item separator.

    I can not manage to handle this issue that hampers me to handle further merges with PROC DS2

    Anyone could help me please ?

    Amazigh.

    • SAS Jedi

      From the error message, the problem is with your connect string. Unless you have an exceptionally good reason to do so, you should not use the NOLIBS option unless you have expertise with connect string writing. It's an arcane art form, there is very little documentation available, and the correct connect string will vary depending on your SAS installation.

      instead of using
      PROC DS2 NOLIBS CONN="DRIVER=ODBC;DB=&Db.; UID=&uid.; PWD=&DbPwdEncrypt.; CATALOG= (SI=DW_GIt)";

      Try ensuring you have a LIBNAME assigned pointing a LIBREF to the data you need, then just use this instead:
      PROC DS2;

      That should resolve the first issue.
      The second issue is see is your use of a WHERE= dataset (table) option. If you review the DS2 documentation for supported table options, you will see this is not listed. Use a WHERE clause in the SQL query instead. Something like this should work:
      PROC DS2;
      DATA;
      METHOD RUN();
      SET {
      SELECT DISTINCT a.ClePointDeVenteSI
      ,STRIP(PUT(a.CleMoisAnnee, 6.)) AS DT_PERIODE
      ,SUBSTR(CALCULATED DT_PERIODE, 5, 2) AS DT_MOIS
      ,SUBSTR(CALCULATED DT_PERIODE, 1, 4) AS DT_ANNEE
      ,a.ClePresentation
      ,a.CleCanalDeDistribution
      ,SUM(a.Unite) AS Unite
      ,SUM(a.CAHT) AS CAHT
      FROM SI.FaitMoisAchat AS a
      WHERE CleMoisAnnee=201604
      AND CleCanalDeDistribution in (1, 2)
      GROUP BY a.ClePointDeVenteSI
      ,a.ClePresentation
      };
      END;
      ENDDATA;
      RUN;
      QUIT;

  9. Anders Sköllermo on

    Interesting to see examples of DS2 code. My questions:
    * What problems can you solve using DS2, that you could NOT solving using the ordinary Data step.
    * What problems can you solve BETTER using DS2, than solving using the ordinary Data step.
    * What problems can you solve WORSE using DS2, than solving using the ordinary Data step.
    * What problems can you NOT solve using DS2, that you can solve using the ordinary Data step.

    • SAS Jedi

      Wow - talk about big, open questions! :-) Here's the quick answer:
      * What problems can you solve WORSE using DS2, than solving using the ordinary Data step?
      The only thing I can think of offhand is that, if your DATA step process is I/O bound, the single-threaded processing is already causing the CPU to idle while it waits for additional data to be delivered. If you don’t have in-database processing capability with DS2, threading that process on the SAS platform will just result in multiple CPUs idling & waiting for additional data to be delivered, while adding the additional overhead of running the a PROC and monitoring the threads. DS2 will probably run the process more slowly in this case.
      * What problems can you NOT solve using DS2, that you can solve using the ordinary Data step?
      Reading directly from and writing directly to raw data files (text files) is a capability in the traditional SAS DATA step which does not currently exist in DS2.
      * What problems can you solve using DS2, that you could NOT solving using the ordinary Data step?
      The traditional Base SAS DATA step is a mature, flexible and powerful too. There is almost nothing I can’t do with a DATA step. So at first I wanted to say “nothing” – but that’s not 100% true. If you are doing complex data manipulation on large data and are running into network latency issues, DS2 with the In-Database Code Accelerator can resolve that problem by taking the code to where the data resides instead of having to bring the data to the code.
      * What problems can you solve BETTER using DS2, than solving using the ordinary Data step?
      Now THIS gives us some room to expound on the benefits of DS2. Any data manipulation problem that is CPU bound can be easily converted to a DS2 THREAD and safely executed in parallel by a DATA program with very little effort on the part of the programmer. You COULD accomplish this with a traditional DATA step and MPCONNECT, but it’s not pretty, safe or easy…
      In addition, DS2 can directly ingest the result of an SQL query on the DATA program SET statement, eliminating the need to write the query result to a data set before it can be processed with DATA program logic.
      Another better thing in DS2: if you want to produce a report but need to manipulate some data first, in traditional DATA step this might require an SQL join, a DATA step to manipulate the data, and a reporting step (PROC PRINT, PROC SQL, PROC REPORT, etc.) to produce the report. You could accomplish all of that in a single DS2 DATA program! Leaving out the output destination on the DATA statement causes DS2 to output a report instead of a DATA set. It’s pretty awesome!
      Of course, you would learn all of this (and much more) by taking the “DS2 Programming: Essentials” or "DS2 Programming Essentials with Hadoop" course, or getting my book “Mastering the SAS® DS2 Procedure: Advanced Data Wrangling Techniques

      • Bart Heinsius on

        Thanks for these answers.

        The one thing I can't seem to find on DS2: How do I specify the parallellism?

        -Bart

        • SAS Jedi

          Hey, Bart! Welcome to parallel processing in Base SAS :-)
          If you are running threads in-database using the SAS In-Database Code Accelerator (PROC DS2 DS2ACCEL=YES;), your threads will automatically run in parallel on the available worker nodes. If you are running threads on the SAS server (PROC DS2;) just use the THREADS= option on the SET FROM statement in your DATA program to control the number of threads executed in parallel. It is a good rule not to specify more threads than you have CPUs.

  10. thanks Mark for another amazing tip. I just got your new book in the mail - DS2 advanced data wrangling procedures. Can't wait to dig in after our internal office move! happy summer

Back to Top