There is a well-known Russian saying that goes “Если нельзя, но очень хочется, то можно.” The English translation of it can span anywhere from “If you can’t, but want it badly, then you can” to “If you shouldn’t, but want it badly, then you should” to “If you may not, but want it badly, then you may.” Depending on your situation, any possible combination of “may,” “can,” or “should” may apply. You can even replace “want” with “need” to get a slightly different flavor.
There are known means of modifying variable attributes with PROC DATASETS, but they are limited to variable name, format, informat, and label. But what if we want/need to modify a variable length, or change a variable type? And I am not talking about creating a new variable with a different length or converting a numeric variable value into a character value of another variable. We want to change variable type and/or variable length in place, without adding new variables. If you believe it can’t be done, read the first paragraph again.
Imagine that we have two data tables that we need to concatenate into one table. However, there is one common variable that is of different type in each table – in the first table it is numeric, but in the second table it is character.
Let’s create some sample data to emulate our situation by running the following SAS code:
libname sasdl 'C:\PROJECTS\_BLOG_SAS\changing_variable_type_and_length_in_sas_datasets'; /* create study2016 data table */ data sasdl.study2016; length subjid dob 8 state $2 start_date end_date 8; infile datalines truncover; input subjid dob : mmddyy10. state start_date : mmddyy10. end_date : mmddyy10.; format dob start_date end_date mmddyy10.; datalines; 123456 08/15/1960 MD 01/02/2016 234567 11/13/1970 AL 05/12/2016 12/30/2016 ; /* create study2017 data table */ data sasdl.study2017; length subjid $6 dob 4 state $2 start_date end_date 4; infile datalines truncover; input subjid dob : mmddyy10. state start_date : mmddyy10. end_date : mmddyy10.; format dob start_date end_date mmddyy10.; datalines; 987654 03/15/1980 VA 02/13/2017 876543 11/13/1970 NC 01/11/2017 01/30/2017 765432 12/15/1990 NY 03/14/2017 ;
The produced data tables will look as follows:
If we look at the tables’ variable properties, we will see that the subjid variable is of different type in these two data tables: it is of type Numeric (length of 8) in STUDY2016 and of type Character (length of 6) in STUDY2017:
Also, notice that variables dob, start_date, and end_date, although of the same Numeric type, have different length attributes - 8 in the STUDY2016 table, and 4 in the STUDY2017 table.
Data table concatenating problem
If we try to concatenate these two tables using PROC APPEND, SAS will generate an ERROR:
proc append base=sasdl.study2016 data=sasdl.study2017; run; NOTE: Appending SASDL.STUDY2017 to SASDL.STUDY2016. WARNING: Variable subjid not appended because of type mismatch. WARNING: Variable dob has different lengths on BASE and DATA files (BASE 8 DATA 4). WARNING: Variable start_date has different lengths on BASE and DATA files (BASE 8 DATA 4). WARNING: Variable end_date has different lengths on BASE and DATA files (BASE 8 DATA 4). ERROR: No appending done because of anomalies listed above. Use FORCE option to append these files. NOTE: 0 observations added.
Even if we do use the FORCE option as the ERROR message suggests, the result will be disappointing:
proc append base=sasdl.study2016 data=sasdl.study2017 force; run; NOTE: Appending SASDL.STUDY2017 to SASDL.STUDY2016. WARNING: Variable subjid not appended because of type mismatch. WARNING: Variable dob has different lengths on BASE and DATA files (BASE 8 DATA 4). WARNING: Variable start_date has different lengths on BASE and DATA files (BASE 8 DATA 4). WARNING: Variable end_date has different lengths on BASE and DATA files (BASE 8 DATA 4). NOTE: FORCE is specified, so dropping/truncating will occur.
The resulting data table will have missing values for the appended subjid:
In order to concatenate these tables, we must make the mismatching variable subjid of the same type in both data tables, either Character or Numeric. Making them both of Character type seems more robust, since it would allow for the values to contain both digit and non-digit characters. But if you know for sure that the value contains only digits, making them both Numeric works just as well.
Let’s say we decide to make them of Character type. Also note that our numeric variables representing dates (dob, start_date and end_date) are of different lengths: they are length 8 in STUDY2016 and length 4 in STUDY2017. Let’s make them the same length as well. From the standpoint of numerical accuracy in SAS for the dates, a length of 4 seems to be quite adequate to represent them accurately.
Let’s apply all our modifications to the STUDY 2016 dataset. Even though we are going to re-build the dataset in order to modify variable type and length, we are going to preserve the variable order so it feels like we just modified those variable attributes.
Here is how it can be done.
/* create macrovariable varlist containing a list of variable names */ proc sql noprint; select name into :varlist separated by ' ' from sashelp.vcolumn where upcase(libname) eq 'SASDL' and upcase(memname) eq 'STUDY2016'; quit; /* modify variable type and length */ data sasdl.study2016 (drop=v1-v4); retain &varlist; *<-- preserve variable order ; length subjid $6 dob start_date end_date 4; *<-- define new types/lengths ; format dob start_date end_date mmddyy10.; *<-- recreate formats ; set sasdl.study2016 (rename=(subjid=v1 dob=v2 start_date=v3 end_date=v4)); subjid = put(v1,6.); *<-- redefine subjid variable ; dob = v2; *<-- redefine dob variable ; start_date = v3; *<-- redefine start_date variable ; end_date = v4; *<-- redefine end_date variable ; run; /* make sure new concatenated file (study_all) does not exist */ proc datasets library=sasdl nolist; delete study_all; quit; /* append both (study2016 and study2017) to study_all */ proc append base=sasdl.study_all data=sasdl.study2016; run; proc append base=sasdl.study_all data=sasdl.study2017; run;
In this code, first, using proc sql and SAS view sashelp.vcolumn, we create a macro variable varlist to hold the list of all the variable names in our table, sasdl.study2016.
Then in the data step, we use a retain statement to preserve the variable order. When we read the sasdl.study2016 dataset using the set statement, we rename our variables-to-be-modified to some temporary names (e.g. v1 – v4) which we eventually drop in the data statement.
Then we re-assign the values of those temporary variables to the original variable names, thereby essentially creating new variables with new type and length. Since these new variables are named exactly as the old ones, the effect is as if their type and length attributes where modified, while in fact the whole table was rebuilt and replaced. Problem solved.
When we concatenate the data tables we create a new table sasdl.study_all. Before concatenating our two tables using proc append twice, we use proc datasets to delete that new table first. Even if the table does not exist, proc datasets will at least attempt to delete it. With all the seeming redundancy of this step, you will definitely appreciate it when you try running this code more than one time.