When using conventional methods to access and analyze data sets from Teradata tables, SAS brings all the rows from a Teradata table to SAS Workspace Server. As the number of rows in the table grows over time, it adds to the network latency to fetch the data from a database management system (DBMS) and move it to SAS Workspace Server. Considering big data, SAS Workspace Server may not have enough capacity to hold all the rows from a Teradata table.
SAS In-Database processing can help solve the problem of returning too much data from the database. SAS In-Database processing allows you to perform data operations inside the DBMS and use the distributed processing over multiple Access Module Processors (AMPs). Select SAS procedures take advantage of Teradata SQL functionality, and in some cases leverage SAS functions deployed inside the DBMS. The goal of in-database processing is to reduce the I/O required to transfer the data from Teradata to SAS.
Using SAS In-Database processing, you can run scoring models, some SAS procedures, DS2 threaded programs, and formatted SQL queries inside the Teradata database.
The list of SAS In-Database features supported for Teradata include:
- Format publishing and SAS_PUT()function
- Scoring Models
- Select BASE SAS® Procedures ( FREQ, RANK, REPORT, SORT, SUMMARY/MEAN , TABULATE)
- Select SAS/STAT® Procedures (CORR, CANCORR, DMDB, DMINE, DMREG, FACTOR, PRINCOMP,
- REG, SCORE, TIMESERIES, VARCLUS )
- DS2 Threaded programs
- Data quality operations
- Extract and transform data
SAS In-Database Deployment Package for Teradata
The in-database deployment package for Teradata includes the following:
- The SAS formats library, accelterafmt-######.rpm, installs a SAS formats library on the Teradata server. By having a SAS formats library on your Teradata system, you can publish SAS formats in Teradata, which enables you to process SAS statements with SAS formats in the Teradata database. This also enables you to publish SAS PUT functions to Teradata as a SAS_PUT() function. This software can be found in your SAS Install folder under /SAS-install-directory/SASFormatsLibraryforTeradata/3.1/TeradataonLinux/.
- The SAS® Embedded Process package, sepcoretera-######.rpm, installs SAS Embedded Process in the Teradata database. This is the core package of in-database components. This software can be found in your software depot under folder /depot/standalone_installs/SAS_Core_Embedded_Process_Package_for_Teradata/13_0/Teradata_on_Linux.
- The SASEPFUNC package, sasepfunc-#####.x86_64.tar.gz, installs SAS Embedded Process support functions on Teradata. SAS Embedded Process support functions are Teradata stored procedures that generate SQL to interface with SAS Embedded Process. The script from the package creates a Teradata database named SAS_SYSFNLIB with a list of tables, views, functions, and procedures to support SAS Embedded Process. The same script also adds a list of functions in the TD_SYSFNLIB database. The package can be obtained from the Teradata support group.
The following figure shows the list of objects from the SAS_SYSFNLIB database to support SAS Embedded Process:
The following shows the list of objects from the TD_SYSFNLIB database to support SAS Embedded Process:
- The SAS® Quality Knowledge Base package, sasqkb_ci-27.#####.noarch.rpm, installs SAS Quality Knowledge Base on the Teradata server. This is an optional package to SAS Embedded Process. This package is needed along with SAS® Quality Accelerator, if you are planning to run data cleansing operations in the Teradata database. The package can be downloaded from the SAS support site.
- The SAS Quality Accelerator package. There are two scripts (dq_install.sh and dq_grant.sh) located under SAS-Install-directory to install the data quality accelerator at Teradata. This is an optional package to SAS Embedded Process, and needed only if you are planning to run data cleansing operations in Teradata. The software install files can be found in the folder /SAS-install-directory/SASDataQualityAcceleratorforTeradata/9.4/dqacctera/sasmisc/. As a part of script execution, it adds a list of objects (procedures, functions) to the SAS_SYSFNLIB database.
Sample list of data quality related objects from the SAS_SYSFNLIB database.
Examples of running DS2 Code to perform data quality, data extract, and transform operations in Teradata:
The following example describes the execution of DS2 code by using SAS Data Quality Accelerator and SAS Quality Knowledge Base to match and extract a data set from the Teradata database. The log shows that both Threads program and Data program ran in the Teradata database as in-database program execution.
45 46 47 proc ds2 ds2accel=yes; 48 thread t_pgm / overwrite=yes; 49 dcl package dq dq(); 50 dcl package dqtokens _PARSE_RESULT_(); 51 52 dcl varchar(256) customer_address_match_code; 53 dcl varchar(256) customer_name_match_code; 54 dcl varchar(256) customer_name_analyzed_gender; 55 dcl varchar(256) givenname; 56 dcl varchar(256) familyname; 57 58 method init(); 59 dq.loadlocale('ENUSA'); 60 end; 61 62 63 method run(); 64 dcl integer _PARSE_TOKEN_COUNT_; 65 dcl integer i; 66 set mytera01.customers; 67 customer_address_match_code = dq.match('Address (Long)', customer_address, 85); 68 customer_name_match_code = dq.match('Name', customer_name, 85); 69 customer_name_analyzed_gender = dq.gender('Name', customer_name); 70 71 dq.parse2('Name', customer_name, _PARSE_RESULT_); 72 _PARSE_TOKEN_COUNT_ = _PARSE_RESULT_.getcount(); 73 74 if _PARSE_TOKEN_COUNT_ > 0 then do; 75 do i = 1 to _PARSE_TOKEN_COUNT_; 76 if _PARSE_RESULT_.getname(i) = 'Given Name' then 77 givenname = _PARSE_RESULT_.getvalue(i); 78 else if _PARSE_RESULT_.getname(i) = 'Family Name' then 79 familyname = _PARSE_RESULT_.getvalue(i); 80 end; 81 end; 82 83 output; 84 end; 85 endthread; 86 87 88 89 data mytera01.customer_dq (overwrite=yes); 90 declare thread t_pgm t; 3 The SAS System 16:27 Monday, December 18, 2017 91 method run(); 92 set from t; 93 end; 94 enddata; 95 96 run; NOTE: Created thread t_pgm in data set work.t_pgm. NOTE: Running THREAD program in-database NOTE: Running DATA program in-database NOTE: Execution succeeded. No rows affected. 97 98 quit; NOTE: PROCEDURE DS2 used (Total process time): real time 20.89 seconds cpu time 1.20 seconds
Stay tuned in for the next part of the SAS In-Database Processing in Teradata blog series. Coming up is one about publishing SAS format in Teradata.