How to Emulate PROC APPEND in CAS

By Steven Sober on SAS Users November 20, 2017 Topics | Advanced Analytics Programming Tips

SAS Viya provides a robust, scalable, cloud-ready, distributed runtime engine. This engine is driven by CAS (Cloud Analytic Services), providing fast processing for many data management techniques that run distributed, i.e. using all threads on all defined compute nodes.

Note:

SAS Viya 3.5+ now supports the data set option APPEND=YES. The code displayed in figure 4 can be re-written as:

data casuser.append_target (append=yes);
set casuser.table_two;
run;

Why

PROC APPEND is a common technique in SAS processes: it appends the rows of one data set to another. However, PROC APPEND will produce an ERROR if the target CAS table already exists when the procedure runs.

Figure 1. SAS Log with the PROC APPEND ERROR message
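
As a rough sketch of the kind of step that raises this error, the following assumes a CAS session named mysess (a hypothetical name), a CASUSER libref bound to the CASUSER caslib, and that both CAS tables described in the How section already exist. The exact message text depends on the release, so none is reproduced here.

```sas
/* Assumption: start a CAS session and bind a libref; skip if already done. */
cas mysess;                          /* hypothetical session name */
libname casuser cas caslib=casuser;

/* Classic append: add the rows of TABLE_TWO to APPEND_TARGET.        */
/* Per the article, this step errors when the BASE= CAS table exists. */
proc append base=casuser.append_target
            data=casuser.table_two;
run;
```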

Now what?

How

To explain how to emulate PROC APPEND we first need to create two CAS tables. The first CAS table is named CASUSER.APPEND_TARGET. Notice the variables table, row and variable in figure 2.

Figure 2. Creating the CAS table we need to append rows to

The second CAS table is called CASUSER.TABLE_TWO and in figure 3 we can review the variables table, row, and variable.

Figure 3. Creating the table with the rows that we need to append to the existing CAS table
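
A minimal sketch of how two such tables could be built. The session name mysess, the variable lengths, and the row values are illustrative assumptions, not the author's data; only the table names and the variable names table, row, and variable come from the text.

```sas
/* Assumption: start a CAS session and bind a libref; skip if already done. */
cas mysess;                          /* hypothetical session name */
libname casuser cas caslib=casuser;

/* Target table: the rows we will later append to. */
data casuser.append_target;
   length table $ 16 variable $ 16;
   table = 'APPEND_TARGET';
   do row = 1 to 3;
      variable = cats('value_', row);   /* illustrative values */
      output;
   end;
run;

/* Second table: the rows to be appended. */
data casuser.table_two;
   length table $ 16 variable $ 16;
   table = 'TABLE_TWO';
   do row = 4 to 6;
      variable = cats('value_', row);   /* illustrative values */
      output;
   end;
run;
```

Storing the table name in each row makes it easy to see, after the append, which source every row came from.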

To emulate PROC APPEND we will use a DATA Step. Notice that on line 77 in figure 4 we overwrite an existing CAS table, i.e. CASUSER.APPEND_TARGET. On line 78, the first table on the SET statement is CASUSER.APPEND_TARGET, followed by CASUSER.TABLE_TWO. When this DATA Step runs, all of the rows in CASUSER.APPEND_TARGET are processed first, followed by the rows in CASUSER.TABLE_TWO. Also note that the SET statement is not limited to two tables; we can list as many as we need to solve our business problems.

Figure 4. SAS log validating the DATA Step ran in CAS i.e. distributed
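
The DATA Step described above can be sketched as follows. It assumes a CAS session (the name mysess is hypothetical), a CASUSER libref, and that both tables already exist. The PROC PRINT at the end is only an illustrative check and is not part of the original figure.

```sas
/* Assumption: start a CAS session and bind a libref; skip if already done. */
cas mysess;                          /* hypothetical session name */
libname casuser cas caslib=casuser;

/* Overwrite the target with its own rows followed by the new rows. */
data casuser.append_target;
   set casuser.append_target   /* existing rows, listed first  */
       casuser.table_two;      /* rows to append, listed second */
run;

/* Optional check of the combined table (illustrative only). */
proc print data=casuser.append_target;
run;
```

Because both the input and output tables are referenced through the CAS libref, the SAS log should indicate whether the step ran in CAS, which is what figure 4 is meant to validate.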

The result table created by the DATA Step is shown in figure 5.

Figure 5. Result table from emulating PROC APPEND using a DATA Step

Conclusion

SAS Viya’s CAS processing allows us to stage data for downstream consumption by leveraging robust SAS programming techniques that run distributed, i.e. fast. PROC APPEND is a common procedure used in SAS processes. To emulate PROC APPEND when using CAS tables as source and target tables to the procedure, use DATA Step.

About Author

Steven Sober
Advisory Solutions Architect, Data Management

Steven is responsible for empowering SAS sales and system engineers in the positioning and integration of SAS® Viya, SAS® In-Database Code Accelerator for Hadoop, SAS® Scalable Performance Data Server, and SAS® Grid Manager. During his tenure at SAS, Steven has traveled globally working with customers to quickly integrate the SAS system to automate processes. Before joining SAS in 1989 as the first employee of SAS Switzerland, Steven worked for the U.S. Geological Survey, Water Resources Division, assisting hydrologists to leverage the SAS system to derive intelligence on ground and surface water in Southern Colorado. He holds a Bachelor of Science degree in Computer Science from the University of Southern Colorado in Pueblo, Colorado.

15 Comments

ljc on June 5, 2020 9:59 pm

The target does not exist. Table 2 is in hive. I want to create the target in hive by appending Table 2. Then I will continue to append other hive tables to the target.

- Steven Sober on June 9, 2020 12:35 pm
  
  Hi ljc,
  While researching this I found the following link, which explains your options for appending two Hive tables with the HiveQL INSERT INTO statement: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Syntax.1
  
  To test whether PROC SQL INSERT INTO will push down, submit these OPTIONS statements prior to your PROC SQL code:
  
  /*-------------------------------------------------------------------------------
  **1.0 SAS system option to push PROC SQL, FREQ, MEANS, RANK, REPORT, SUMMARY and
  ** TABULATE down in their SQL
  */
  options SQLGENERATION=(NONE DBMS='TERADATA DB2 ORACLE NETEZZA ASTER GREENPLM HADOOP SAPHANA IMPALA HAWQ POSTGRES REDSHIFT SQLSVR VERTICA');
  /*------------------------------------------------------
  **2.0 SAS system option to prove where SQL was processed
  */
  OPTIONS sastrace=',,,ds' sastraceloc=saslog nostsuffix;
  OPTIONS MSGLEVEL=I;
  
  Best regards,
  Steve
  
  - Steven Sober on June 10, 2020 12:41 pm
    
    Hi ljc,
    I was able to validate that one can append Hive tables using PROC SQL and/or PROC FEDSQL:
    
    /* Full pushdown */
    proc FedSQL iptrace libs=hivelib;
    insert into hivelib.trash2
    select * from hivelib.trash;
    quit;
    
    /* Full pushdown */
    proc sql;
    insert into hivelib.trash2
    select * from hivelib.trash;
    quit;
    
    148 /* Full pushdown */
    149 proc FedSQL iptrace libs=hivelib;
    NOTE: FEDSQL: Using libname HIVELIB (engine HADOOP).
    NOTE: FEDSQL: Ignoring libname SASHELP (engine V9); not in LIBS= option.
    NOTE: FEDSQL: Ignoring libname MAPS (engine V9); not in LIBS= option.
    NOTE: FEDSQL: Ignoring libname MAPSSAS (engine V9); not in LIBS= option.
    NOTE: FEDSQL: Ignoring libname MAPSGFK (engine V9); not in LIBS= option.
    NOTE: FEDSQL: Ignoring libname SASUSER (engine V9); not in LIBS= option.
    NOTE: FEDSQL: Using libname WORK (engine V9).
    NOTE: Connection string:
    NOTE: DRIVER=FEDSQL;CONOPTS= (
    (DRIVER=HIVE;SERVER=controller.gtpdm.sashq-d.openstack.sas.com;BULK_LOAD=YES;DBMAX_TEXT=256;PORT=10000;SCHEMA=default;CATALOG=
    HIVELIB); (DRIVER=BASE;CATALOG=WORK;SCHEMA=
    (NAME=WORK;PRIMARYPATH={/opt/sas/viya/config/var/tmp/compsrv/default/c282f4fd-510a-4f8d-8c35-4e12ac4de221/SAS_work39C700000778
    _gtpviya.gtpdm.sashq-d.openstack.sas.com})))
    150 insert into hivelib.trash2
    151 select * from hivelib.trash;
    IPTRACE: Query:
    insert into hivelib.trash2 select * from hivelib.trash
    IPTRACE: FULL (DDL) pushdown to HIVE SUCCESS!
    IPTRACE: Retextualized child query:
    insert into table `default`.`TRASH2`select `TRASH`.`make`, `TRASH`.`model`, `TRASH`.`type`, `TRASH`.`origin`, `TRASH`.`drivetrain`,
    `TRASH`.`msrp`, `TRASH`.`invoice`, `TRASH`.`enginesize`, `TRASH`.`cylinders`, `TRASH`.`horsepower`, `TRASH`.`mpg_city`,
    `TRASH`.`mpg_highway`, `TRASH`.`weight`, `TRASH`.`wheelbase`, `TRASH`.`length` from `default`.`TRASH`
    IPTRACE: END
    NOTE: Execution succeeded. No rows affected.
    152 quit;
    NOTE: PROCEDURE FEDSQL used (Total process time):
    real time 25.54 seconds
    cpu time 0.14 seconds
    
    153
    154 /* Full pushdown */
    155 proc sql;
    156 insert into hivelib.trash2
    157 select * from hivelib.trash;
    60 1591806039 no_name 0 SQL
    HADOOP_16: Prepared: on connection 0 61 1591806039 no_name 0 SQL
    DESCRIBE FORMATTED `default`.`TRASH2` 62 1591806039 no_name 0 SQL
    63 1591806039 no_name 0 SQL
    64 1591806039 no_name 0 SQL
    HADOOP_17: Prepared: on connection 0 65 1591806039 no_name 0 SQL
    DESCRIBE FORMATTED `default`.`TRASH` 66 1591806039 no_name 0 SQL
    67 1591806039 no_name 0 SQL
    68 1591806039 no_name 0 SQL
    HADOOP_18: Executed: on connection 1 69 1591806039 no_name 0 SQL
    insert into table `default`.`TRASH2` (`make`, `model`, `type`, `origin`, `drivetrain`, `msrp`, `invoice`, `enginesize`,
    `cylinders`, `horsepower`, `mpg_city`, `mpg_highway`, `weight`, `wheelbase`, `length`) select TXT_2.`make`, TXT_2.`model`,
    TXT_2.`type`, TXT_2.`origin`, TXT_2.`drivetrain`, TXT_2.`msrp`, TXT_2.`invoice`, TXT_2.`enginesize`, TXT_2.`cylinders`,
    TXT_2.`horsepower`, TXT_2.`mpg_city`, TXT_2.`mpg_highway`, TXT_2.`weight`, TXT_2.`wheelbase`, TXT_2.`length` from `default`.`TRASH`
    TXT_2 70 1591806039 no_name 0 SQL
    71 1591806039 no_name 0 SQL
    158 quit;
    NOTE: PROCEDURE SQL used (Total process time):
    real time 25.02 seconds
    cpu time 0.27 seconds
    
    Best regards,
    Steve
    
ljc on June 3, 2020 11:19 pm

How would you append in Hive/Hadoop (first dataset does not exist)?

- Steven Sober on June 4, 2020 3:43 pm
  
  Hi ljc,
  I need clarification on your question: what do you mean by "first dataset does not exist"? Also, is the source data you want to append an existing Hive table or a CAS table?
  Best regards,
  Steve
  
Ramprakash Durairaj on May 28, 2020 1:03 pm

Can we append to an in-memory partitioned table?

- Steven Sober on May 28, 2020 1:31 pm
  
  Hi Ramprakash,
  Great question, but no, one cannot. If you try, you will get an error in the SAS Log:
  
  ERROR: Appending to a table that is both ordered and partitioned is not supported.
  
  Best regards,
  Steve
  
ljc on February 6, 2020 5:42 pm

In 9.4, PROC APPEND would create the base data set if it did not exist, whereas the DATA step complains if you try to SET a data set that does not exist. Any way around this?

- Steven Sober on February 6, 2020 6:37 pm
  
  Thank you for your question but there is no way around this when using DATA Step to append data.
  
Steven Sober on June 18, 2019 4:31 pm

Thank you for your comment, Fernando. You are correct: with Viya 3.4, support was added for the data set option APPEND=YES.

Proc & Roll,
Steven

Fernando on September 14, 2018 6:30 pm

Hi
Did you try:

data casuser.append_target (append=yes);
set casuser.table_two;
run;

This data step is executed in CAS

Regards

Mark on February 2, 2018 3:04 am

Hi Steven, when our source data is Oracle there is a column type mismatch when we append to CAS.
The process goes like this: we do an initial bulk load from Oracle to CAS (so the varchar type is read and imported). The incremental load is based on the max date from the initial load, which we use as a filter on the Oracle source data. The delta rows are then loaded into a SAS WORK data set. When we then append them to the initial bulk CAS table, there is a type mismatch, because in SAS WORK the varchar was converted to character.

An alternative is for us to load the delta rows directly into a CAS table. But how do we append CAS data directly to the initial CAS table? When we test this, the output is a session-only table, which we then need to promote. Using the promote option shows an error - "ERROR: The target table intialCAS of the promotion already exists. Please specify a different name.
ERROR: The action stopped due to errors."

- Steven Sober on February 2, 2018 12:31 pm
  
  Thank you for your question Mark. I need to research to ensure I provide a working solution. To facilitate the research please email me directly at steven.sober@sas.com so I can ask you additional questions as I research.
  
  Proc & Roll,
  Steven
  
Steven Sober on December 4, 2017 1:54 pm

Hi Bart,

Great comment; I should have mentioned in my blog that I like using DATA Step for APPENDs when I must apply transformations to the data being processed by the DATA Step, i.e.

/* PROC APPEND BASE=&TARGET DATA=&SOURCE; RUN; */
DATA &TARGET;
SET &TARGET(IN=S1) &SOURCE2(IN=S2) &SOURCE3(IN=S3);
IF S1 THEN…;
IF S3 THEN…;
RUN;

In the case where there are no transformations to be made to the data and you only want to append a few rows to an existing CAS table consider these two alternatives:

1. PROC CASUTIL http://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.2&docsetId=casref&docsetTarget=n1lmowayn4qghtn18lj7kupsdzi7.htm&locale=en

2. PROC CAS addTable Action http://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.2&docsetId=caspg&docsetTarget=cas-table-addtable.htm&locale=en

Proc & Roll,
Steven

Bart Heinsius on November 21, 2017 6:53 pm

Hi Steven,

If APPEND_TARGET already existed and were big, and I had only a few rows in TABLE_TWO to append to it, this does not look very efficient, since APPEND_TARGET is first copied onto itself before the few rows in TABLE_TWO are appended.
Or does this work more efficiently behind the covers?

Regards,
Bart

