How to Emulate PROC APPEND in CAS

4

SAS Viya provides a robust, scalable, cloud-ready, distributed runtime engine. This engine is driven by CAS (Cloud Analytic Services), providing fast processing for many data management techniques that run distributive, i.e. using all threads on all defined compute nodes.

Why

PROC APPEND is a common technique used in SAS processes. This technique will concatenate two data sets together. However, PROC APPEND will produce an ERROR if the target CAS table exists prior to the PROC APPEND.

Simulating PROC APPEND

Figure 1. SAS Log with the PROC APPEND ERROR message

Now what?

How

To explain how to emulate PROC APPEND we first need to create two CAS tables. The first CAS table is named CASUSER.APPEND_TARGET. Notice the variables table, row and variable in figure 2.

Figure 2. Creating the CAS table we need to append rows to

The second CAS table is called CASUSER.TABLE_TWO and in figure 3 we can review the variables table, row, and variable.

Figure 3. Creating the table with the rows that we need to append to the existing CAS table

To emulate PROC APPEND we will use a DATA Step. Notice on line 77 in figure 4 we will overwrite an existing CAS table, i.e. CASUSER.APEND_TARGET. On Line 78, we see the first table on the SET statement is CASUSER.APPEND_TARGET, followed by CASUSER.TABLE_TWO. When this DATA Step runs, all of the rows in CASUSER.APPEND_TARGET will be processed first, followed by the rows in CASUSER.TABLE_TWO. Also, note we are not limited to two tables on the SET statement with DATA Step; we can use as many as we need to solve our business problems.

Figure 4. SAS log validating the DATA Step ran in CAS i.e. distributed

The result table created by the DATA Step is shown in figure 5.

Figure 5. Result table from simulating PROC APPEND using a DATA Step

Conclusion

SAS Viya’s CAS processing allows us to stage data for downstream consumption by leveraging robust SAS programming techniques that run distributed, i.e. fast. PROC APPEND is a common procedure used in SAS processes. To emulate PROC APPEND when using CAS tables as source and target tables to the procedure, use DATA Step.

Share

About Author

Steven Sober

Advisory Solutions Architect, Data Management

Steven is responsible for empowering SAS sales and system engineers in the positioning and integration of SAS® Viya, SAS® In-Database Code Accelerator for Hadoop, SAS® Scalable Performance Data Server, and SAS® Grid Manager. During his tenure at SAS, Steven has traveled globally working with customers to quickly integrate the SAS system to automate processes. Before joining SAS in 1989 as the first employee of SAS Switzerland, Steven worked for the U.S. Geological Survey, Water Resources Division, assisting hydrologists to leverage the SAS system to derive intelligence on ground and surface water in Southern Colorado. He holds a Bachelor of Science degree in Computer Science from the University of Southern Colorado in Pueblo, Colorado.

4 Comments

  1. Hi Steven,

    If APPEND_TARGET were existing and big and i would have only a few rows in TABLE_TWO that I need to append to APPEND_TARGET, this does not look very efficient since APPEND_TARGET is first copied onto itself before the few rows in TABLE_TWO are appended.
    Or does this work more efficient behind the covers?

    Regards,
    Bart

  2. Steven Sober

    Hi Bart,

    Great comment; I should of mentioned in my blog that I like using DATA Step for APPENDs when I must apply transformations to the data being processed by the DATA Step i.e.

    /*PROC APPEND BASE=&SOURCE OUT=⌖*//*RUN;*/ DATA ⌖
    SET &TARGET(IN=S1) &SOURCE2(IN=S2) &SOURCE3(IN=S3);
    IF S1 THEN…;
    IF S3 THEN…;
    RUN;

    In the case where there are no transformations to be made to the data and you only want to append a few rows to an existing CAS table consider these two alternatives:

    1. PROC CASUTIL http://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.2&docsetId=casref&docsetTarget=n1lmowayn4qghtn18lj7kupsdzi7.htm&locale=en

    2. PROC CAS addTable Action http://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.2&docsetId=caspg&docsetTarget=cas-table-addtable.htm&locale=en

    Proc & Roll,
    Steven

  3. Hi Steven, when our source data is Oracle there is a mismatch column type issue when we append to CAS.
    The process goes like this - we do initial bulk load from Oracle to CAS (so varchar type is read and imported). The incremental load is based on the max date from the initial load, and use it as a filter for the source data Oracle. The delta rows are then loaded as a SAS work dataset. Then when we append to the initial bulk CAS data, there will be a mismatch of type since in the SAS work the varchar was converted to character.

    An alternative is for us to load the delta rows directly into a CAS data. But, how do we append CAS data directly to the initial CAS data? When we test this, the output is a session-only data, then we need to promote. Using the promote option shows error - "ERROR: The target table intialCAS of the promotion already exists. Please specify a different name.
    ERROR: The action stopped due to errors."

    • Steven Sober

      Thank you for your question Mark. I need to research to ensure I provide a working solution. To facilitate the research please email me directly at steven.sober@sas.com so I can ask you additional questions as I research.

      Proc & Roll,
      Steven

Leave A Reply

Back to Top