How to Emulate PROC APPEND in CAS


SAS Viya provides a robust, scalable, cloud-ready, distributed runtime engine. This engine is driven by CAS (Cloud Analytic Services), which provides fast processing for many data management techniques that run distributed, i.e., using all threads on all defined compute nodes.

Note:

SAS Viya 3.5+ now supports the data set option APPEND=YES. The code displayed in figure 4 can be rewritten as:

data casuser.append_target_table (append=yes);
set casuser.table_two;
run;

Why

PROC APPEND is a common technique used in SAS processes to concatenate two data sets, appending the rows of one to the other. However, PROC APPEND produces an ERROR if the target CAS table already exists before the PROC APPEND runs.

Figure 1. SAS Log with the PROC APPEND ERROR message
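
Figure 1 is an image of the log; the step that triggers this error follows the pattern below (a minimal sketch, using the two CAS tables introduced in the next section):

proc append base=casuser.append_target data=casuser.table_two;
run;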

Now what?

How

To explain how to emulate PROC APPEND, we first need to create two CAS tables. The first CAS table is named CASUSER.APPEND_TARGET. Notice the variables table, row, and variable in figure 2.

Figure 2. Creating the CAS table we need to append rows to
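
Figure 2 is an image, so the exact data values are not recoverable here. A minimal sketch that builds a comparable table might look like this, with the variable names from the figure and purely illustrative values (it assumes a CAS session is active and the CASUSER caslib is assigned):

data casuser.append_target;
   length table $ 32;           /* name of the originating table */
   do row = 1 to 5;             /* row number within the table   */
      table = 'APPEND_TARGET';
      variable = row * 10;      /* illustrative numeric value    */
      output;
   end;
run;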

The second CAS table is named CASUSER.TABLE_TWO; in figure 3 we can review its variables table, row, and variable.

Figure 3. Creating the table with the rows that we need to append to the existing CAS table
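
Again, a minimal sketch with illustrative values, mirroring the structure described for figure 3:

data casuser.table_two;
   length table $ 32;           /* name of the originating table */
   do row = 1 to 5;             /* row number within the table   */
      table = 'TABLE_TWO';
      variable = row * 100;     /* illustrative numeric value    */
      output;
   end;
run;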

To emulate PROC APPEND we will use a DATA Step. Notice on line 77 in figure 4 that we overwrite an existing CAS table, i.e., CASUSER.APPEND_TARGET. On line 78, the first table on the SET statement is CASUSER.APPEND_TARGET, followed by CASUSER.TABLE_TWO. When this DATA Step runs, all of the rows in CASUSER.APPEND_TARGET are processed first, followed by the rows in CASUSER.TABLE_TWO. Also, note that we are not limited to two tables on the SET statement; we can use as many as we need to solve our business problems.

Figure 4. SAS log validating the DATA Step ran in CAS i.e. distributed
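
Reconstructed from the description above, the DATA Step on lines 77 and 78 of figure 4 follows this pattern:

data casuser.append_target;
   set casuser.append_target   /* rows of the existing target are read first */
       casuser.table_two;      /* the rows to append follow                  */
run;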

The result table created by the DATA Step is shown in figure 5.

Figure 5. Result table from emulating PROC APPEND using a DATA Step

Conclusion

SAS Viya’s CAS processing allows us to stage data for downstream consumption by leveraging robust SAS programming techniques that run distributed, i.e., fast. PROC APPEND is a common procedure used in SAS processes. To emulate PROC APPEND when using CAS tables as the source and target tables of the procedure, use a DATA Step.


About Author

Steven Sober

Advisory Solutions Architect, Data Management

Steven is responsible for empowering SAS sales and system engineers in the positioning and integration of SAS® Viya, SAS® In-Database Code Accelerator for Hadoop, SAS® Scalable Performance Data Server, and SAS® Grid Manager. During his tenure at SAS, Steven has traveled globally working with customers to quickly integrate the SAS system to automate processes. Before joining SAS in 1989 as the first employee of SAS Switzerland, Steven worked for the U.S. Geological Survey, Water Resources Division, assisting hydrologists to leverage the SAS system to derive intelligence on ground and surface water in Southern Colorado. He holds a Bachelor of Science degree in Computer Science from the University of Southern Colorado in Pueblo, Colorado.

15 Comments

  1. The target does not exist. Table 2 is in Hive. I want to create the target in Hive by appending Table 2. Then I will continue to append other Hive tables to the target.

    • Steven Sober on

      Hi ljc,
      Researching this, I found a link that explains your options for using HiveQL to append two Hive tables with the INSERT INTO statement: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Syntax.1

      To test whether PROC SQL INSERT INTO will push down, submit these OPTIONS statements prior to your PROC SQL code:

      /*-------------------------------------------------------------------------------
      **1.0 SAS system option to push PROC SQL, FREQ, MEANS, RANK, REPORT, SUMMARY and
      ** TABULATE processing down to the database as SQL
      */
      options SQLGENERATION=(NONE DBMS='TERADATA DB2 ORACLE NETEZZA ASTER GREENPLM HADOOP SAPHANA IMPALA HAWQ POSTGRES REDSHIFT SQLSVR VERTICA');
      /*------------------------------------------------------
      **2.0 SAS system option to prove where SQL was processed
      */
      OPTIONS sastrace=',,,ds' sastraceloc=saslog nostsuffix;
      OPTIONS MSGLEVEL=I;

      Best regards,
      Steve

      • Steven Sober on

        Hi ljc,
        I was able to validate that one can append Hive tables using PROC SQL and/or PROC FEDSQL:

        /* Full pushdown */
        proc FedSQL iptrace libs=hivelib;
        insert into hivelib.trash2
        select * from hivelib.trash;
        quit;

        /* Full pushdown */
        proc sql;
        insert into hivelib.trash2
        select * from hivelib.trash;
        quit;

        148 /* Full pushdown */
        149 proc FedSQL iptrace libs=hivelib;
        NOTE: FEDSQL: Using libname HIVELIB (engine HADOOP).
        NOTE: FEDSQL: Ignoring libname SASHELP (engine V9); not in LIBS= option.
        NOTE: FEDSQL: Ignoring libname MAPS (engine V9); not in LIBS= option.
        NOTE: FEDSQL: Ignoring libname MAPSSAS (engine V9); not in LIBS= option.
        NOTE: FEDSQL: Ignoring libname MAPSGFK (engine V9); not in LIBS= option.
        NOTE: FEDSQL: Ignoring libname SASUSER (engine V9); not in LIBS= option.
        NOTE: FEDSQL: Using libname WORK (engine V9).
        NOTE: Connection string:
        NOTE: DRIVER=FEDSQL;CONOPTS= (
        (DRIVER=HIVE;SERVER=controller.gtpdm.sashq-d.openstack.sas.com;BULK_LOAD=YES;DBMAX_TEXT=256;PORT=10000;SCHEMA=default;CATALOG=
        HIVELIB); (DRIVER=BASE;CATALOG=WORK;SCHEMA=
        (NAME=WORK;PRIMARYPATH={/opt/sas/viya/config/var/tmp/compsrv/default/c282f4fd-510a-4f8d-8c35-4e12ac4de221/SAS_work39C700000778
        _gtpviya.gtpdm.sashq-d.openstack.sas.com})))
        150 insert into hivelib.trash2
        151 select * from hivelib.trash;
        IPTRACE: Query:
        insert into hivelib.trash2 select * from hivelib.trash
        IPTRACE: FULL (DDL) pushdown to HIVE SUCCESS!
        IPTRACE: Retextualized child query:
        insert into table `default`.`TRASH2`select `TRASH`.`make`, `TRASH`.`model`, `TRASH`.`type`, `TRASH`.`origin`, `TRASH`.`drivetrain`,
        `TRASH`.`msrp`, `TRASH`.`invoice`, `TRASH`.`enginesize`, `TRASH`.`cylinders`, `TRASH`.`horsepower`, `TRASH`.`mpg_city`,
        `TRASH`.`mpg_highway`, `TRASH`.`weight`, `TRASH`.`wheelbase`, `TRASH`.`length` from `default`.`TRASH`
        IPTRACE: END
        NOTE: Execution succeeded. No rows affected.
        152 quit;
        NOTE: PROCEDURE FEDSQL used (Total process time):
        real time 25.54 seconds
        cpu time 0.14 seconds

        153
        154 /* Full pushdown */
        155 proc sql;
        156 insert into hivelib.trash2
        157 select * from hivelib.trash;
        60 1591806039 no_name 0 SQL
        HADOOP_16: Prepared: on connection 0 61 1591806039 no_name 0 SQL
        DESCRIBE FORMATTED `default`.`TRASH2` 62 1591806039 no_name 0 SQL
        63 1591806039 no_name 0 SQL
        64 1591806039 no_name 0 SQL
        HADOOP_17: Prepared: on connection 0 65 1591806039 no_name 0 SQL
        DESCRIBE FORMATTED `default`.`TRASH` 66 1591806039 no_name 0 SQL
        67 1591806039 no_name 0 SQL
        68 1591806039 no_name 0 SQL
        HADOOP_18: Executed: on connection 1 69 1591806039 no_name 0 SQL
        insert into table `default`.`TRASH2` (`make`, `model`, `type`, `origin`, `drivetrain`, `msrp`, `invoice`, `enginesize`,
        `cylinders`, `horsepower`, `mpg_city`, `mpg_highway`, `weight`, `wheelbase`, `length`) select TXT_2.`make`, TXT_2.`model`,
        TXT_2.`type`, TXT_2.`origin`, TXT_2.`drivetrain`, TXT_2.`msrp`, TXT_2.`invoice`, TXT_2.`enginesize`, TXT_2.`cylinders`,
        TXT_2.`horsepower`, TXT_2.`mpg_city`, TXT_2.`mpg_highway`, TXT_2.`weight`, TXT_2.`wheelbase`, TXT_2.`length` from `default`.`TRASH`
        TXT_2 70 1591806039 no_name 0 SQL
        71 1591806039 no_name 0 SQL
        158 quit;
        NOTE: PROCEDURE SQL used (Total process time):
        real time 25.02 seconds
        cpu time 0.27 seconds

        Best regards,
        Steve

    • Steven Sober on

      Hi ljc,
      I need clarification on your question: what do you mean by "first dataset does not exist"? Also, is the source data you want to use for the append an existing Hive table or a CAS table?
      Best regards,
      Steve

    • Steven Sober on

      Hi Ramprakash,
      Great question, but no, one cannot. If you try, you will get an error in the SAS log:

      ERROR: Appending to a table that is both ordered and partitioned is not supported.

      Best regards,
      Steve

  2. In 9.4, PROC APPEND would create the base dataset if it did not exist, whereas the DATA Step complains if you try to SET a dataset that does not exist. Any way around this?

  3. Steven Sober on

    Thank you for your comment Fernando. You are correct, with Viya 3.4 support was added for the DATA Step option APPEND=YES.

    Proc & Roll,
    Steven

  4. Hi,
    Have you tried:

    data casuser.append_target (append=yes);
    set casuser.table_two;
    run;

    This DATA Step executes in CAS.

    Regards

  5. Hi Steven, when our source data is Oracle, there is a column type mismatch issue when we append to CAS.
    The process goes like this: we do an initial bulk load from Oracle to CAS (so the varchar type is read and imported). The incremental load is based on the max date from the initial load, which we use as a filter on the Oracle source data. The delta rows are then loaded as a SAS WORK dataset. Then, when we append to the initial bulk CAS data, there is a type mismatch, since in WORK the varchar was converted to character.

    An alternative is for us to load the delta rows directly into a CAS table. But how do we append CAS data directly to the initial CAS data? When we test this, the output is a session-only table, which we then need to promote. Using the promote option shows this error: "ERROR: The target table intialCAS of the promotion already exists. Please specify a different name.
    ERROR: The action stopped due to errors."

    • Thank you for your question, Mark. I need to do some research to ensure I provide a working solution. To facilitate that research, please email me directly at steven.sober@sas.com so I can ask you additional questions.

      Proc & Roll,
      Steven

  6. Steven Sober

    Hi Bart,

    Great comment; I should have mentioned in my blog that I like using a DATA Step for appends when I must apply transformations to the data being processed by the DATA Step, i.e.:

    /* PROC APPEND BASE=&SOURCE OUT=&TARGET; RUN; */
    DATA &TARGET;
       SET &TARGET(IN=S1) &SOURCE2(IN=S2) &SOURCE3(IN=S3);
       IF S1 THEN…;
       IF S3 THEN…;
    RUN;

    In the case where there are no transformations to be made to the data and you only want to append a few rows to an existing CAS table, consider these two alternatives (a sketch of the first follows the list):

    1. PROC CASUTIL http://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.2&docsetId=casref&docsetTarget=n1lmowayn4qghtn18lj7kupsdzi7.htm&locale=en

    2. PROC CAS addTable Action http://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.2&docsetId=caspg&docsetTarget=cas-table-addtable.htm&locale=en
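
    Hypothetically, the first alternative might look like the sketch below. It assumes your release's CASUTIL LOAD statement supports the APPEND option, and WORK.NEW_ROWS is a made-up dataset holding the rows to add:

    proc casutil;
       /* append the rows of WORK.NEW_ROWS (hypothetical) to the existing
          CAS table APPEND_TARGET in the CASUSER caslib */
       load data=work.new_rows casout="append_target" outcaslib="casuser" append;
    quit;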

    Proc & Roll,
    Steven

  7. Hi Steven,

    If APPEND_TARGET already existed and were big, and I had only a few rows in TABLE_TWO to append to it, this does not look very efficient, since APPEND_TARGET is first copied onto itself before the few rows in TABLE_TWO are appended.
    Or does this work more efficiently behind the covers?

    Regards,
    Bart


