Introducing TRIMS function to remove any leading and/or trailing characters from SAS strings

10

Trimming strings left and right

I am pretty sure you have never heard of the TRIMS function, and I would be genuinely surprised if you told me otherwise. This is because this function does not exist (at least at the time of this writing).

But don’t worry, the difference between "nonexistence" and "existence" is only a matter of time, and from now it is less than a blog away. Let me explain. Recently, I published two complementary blog posts:

[1] Removing leading characters from SAS strings

[2] Removing trailing characters from SAS strings

While working on these pieces and researching “prior art” I stumbled upon a multipurpose function in the SAS FedSQL Language that alone does either one or both of these things – remove leading or/and trailing characters from SAS strings.

FedSQL Language and Proc FedSQL

The FedSQL language is the SAS proprietary implementation of the ANSI SQL:1999 core standard. Expectedly, the FedSQL language is implemented in SAS by means of the FedSQL procedure (PROC FEDSQL). This procedure enables you to submit FedSQL language statements from a Base SAS session, and it is supported in both SAS 9.4 and SAS Viya.

Using the FEDSQL procedure, you can submit FedSQL language statements to SAS and third-party data sources that are accessed with SAS and SAS/ACCESS library engines. Or, if you have SAS Cloud Analytic Services (CAS) configured, you can submit FedSQL language statements to the CAS server.

FedSQL TRIM function

FedSQL language has its own vast FedSQL Functions library with hundreds of functions many of which replicate SAS 9.4 Functions. Many, but not all. Deep inside this FedSQL functions library, there is a unique treasure modestly called TRIM Function which is quite different from the BASE SAS Language TRIM() function.

While SAS 9.4 BASE TRIM() function capabilities are quite limited - it removes just trailing blanks from a character string, the FedSQL TRIM() function is way much more powerful. This triple-action function can remove not just trailing blanks, but also leading blanks, as well as both, leading and trailing blanks. On top of it, it can remove not just blanks, but any characters (although one character at a time). See for yourself, this function has the following pretty self-explanatory syntax:

TRIM( [BOTH | LEADING | TRAILING] [trim-character] FROM column)

Here trim-character specifies one character (in single quotations marks) to remove from column. If trim-character is not specified, the function removes blanks.

While being called a function, it does not look like a regular SAS function where arguments are separated by commas.  It looks more like an SQL statement (which it understandably is – it is part of the FedSQL language). However, this function is available only in PROC FEDSQL; it’s not available in SAS DATA steps or other PROC steps. Still, it gives us pretty good idea of what such a universal function may look like.

User-defined function TRIMS to remove leading or/and trailing characters in SAS strings

Let’s build such a function by means of the PROC FCMP for the outside the FedSQL usage (it is worth noticing that the FCMP procedure is not supported for FedSQL). To avoid confusion with the existing TRIM function we will call our new function TRIMS (with an ‘S’ at the end) which suits our purpose quite well denoting its plural purpose. First, we define what we are going to create.

User-defined TRIMS function

TRIMS Function

Removes leading characters, trailing characters, or both from a character string.

Syntax

TRIMS(function-modifier, string, trim-list, trim-list-modifier)

Required Arguments

  • function-modifier is a case-insensitive character constant, variable, or expression that specifies one of three possible operations:
    'L' or 'l' – removes leading characters.
    'T' or 't' – removes trailing characters.
    'B' or 'b' – removes both, leading and trailing characters.
  • string is a case-sensitive character constant, variable, or expression that specifies the character string to be trimmed.
  • trim-list is a case-sensitive character constant, variable, or expression that specifies character(s) to remove from the string.
  • trim-list-modifier is a case-insensitive character constant variable, or expression that supplements the trim-list.
    The valid values are those modifiers of the FINDC function that “add” groups of characters (e.g. 'a' or 'A', 'c' or 'C', 'd' or 'D', etc.) to the trim-list.

The following user-defined function implementation is based on the coding techniques described in the two previous posts, [1] and [2] that I mentioned above. Here goes.

 
libname funclib 'c:\projects\functions';
 
/* delete previous function definition during debugging */
options cmplib=funclib.userfuncs;
proc fcmp outlib=funclib.userfuncs.package1;
   deletefunc trims;
run;
 
/* new function defintion */
proc fcmp outlib=funclib.userfuncs.package1;
   function trims(f $, str $, clist $, mod $) $32767;
      from = 1;
      last = length(str);
      if upcase(f) in ('L', 'B') then from = findc(str, clist, 'K'||mod);
      if from=0 then return('');
      if upcase(f) in ('T', 'B') then last = findc(str, clist, 'K'||mod, -last); 
      if last=0 then return('');
      return(substr(str, from, last-from+1));      
   endfunc; 
run;

Code highlights

  • In the FUNCTION statement we assign the length of the returned value as $32767 to accommodate the longest possible character string.
  • In the function definition, we first assign initial values of the target substring positions as from=1 and last=length(str).
  • Then for Leading or Both character removal, we calculate an adjusted value of from as a position of the first character in str that is not listed in clist and not defined by the mod
  • If from=0 then we return blank and stop further calculations as this means that ALL characters are to be removed.
  • Then for Trailing or Both character removal, we calculate an adjusted value of last as a position of the last character in str that is not listed in clist and not defined by the mod
  • If last=0 then we return blank and stop further calculations as this means that ALL characters are to be removed.
  • And finally, we return a substring of str starting at the from position and ending at the last position, that is with the length of last-from+1.

TRIMS function usage

Let’s define SAS data set SOURCE as follows:

data SOURCE;
   input X $ 1-30;
   datalines;
*00It's done*2*1**-
*--*1****9*55
94*Clean record-*00
;

In the following DATA step, we will create three new variables with removed leading (variable XL), trailing (variable XT) and both - leading and trailing (variable XB) characters '*' and '-' as well as any digits:

options cmplib=funclib.userfuncs;
data TARGET;
   set SOURCE;
   length XB XL XT $30;
   XB = trims('b', X, '*-', 'd');
   XL = trims('L', X, '*-', 'd');
   XT = trims('t', X, '*-', 'd');
run;

In this code we use the TRIMS function three times, each time with a different first argument to illustrate how this affects the outcome.

Usage highlights

  • The first argument of the TRIMS function specifies whether we remove characters from both leading and trailing positions ('b'), from leading positions only ('L'), or from trailing positions only ('t'). This argument is case-insensitive. (I prefer using capital 'L' for clarity since lowercase 'l' looks like digit '1').
  • The second argument specifies the name of the variable (X) that we are going to remove characters from (variable X is coming from the dataset SOURCE).
  • The third argument '*-' specifies which character (or characters) to remove. In our example we are removing '*' and '-'. If you do not need to explicitly specify any character here, you still must supply a null value ('') since it is a required argument. In this case, the fourth argument (trim-list-modifier) will determine the set of characters to be removed.
  • The fourth argument (case-insensitive) of the TRIMS function specifies the FINDC function modifier(s) to remove certain characters in bulk (in our example 'd' will remove all digits). If such modifier is not needed, you still must supply a null value ('') since all four arguments of the TRIMS function are positional and required.
  • Please note that before calling the TRIMS function we assign the lengths for the variables XB, XL, and XT. Had we not done that, these variables would be assigned the maximum length of 32,767 bytes as specified in the TRIMS function definition.

Here is the output data table TARGET showing the original string X and the resulting strings XB (Both leading and trailing characters removed), XL (Leading characters removed) and XT (Trailing characters removed) side by side:

Result of leading and trailing characters trimming

Conclusion

The new TRIMS function presented in this blog post goes ways further the ubiquitous LEFT, TRIM, and STRIP functions that remove the leading (LEFT), trailing (TRIM), or both - leading and trailing (STRIP) blanks. The TRIMS function handles ANY characters, not just blanks. It also expands the character deletion functionality of the powerful  FedSQL TRIM function beyond just removing any single leading and/or trailing character. The TRIMS function single-handedly removes any number of explicitly specified characters from either leading, trailing or both (leading and trailing) positions. Plus, it removes in bulk many implicitly specified characters. For example 'd' modifier removes all digits, 'du' modifier removes all digits ('d') and all uppercase letters ('u'), 'dup' modifier removes all digits ('d'), all uppercase letters ('u') and all punctuation marks ('p'); and so on as described by the FINDC function modifiers. The order in which modifier characters are listed does not matter.

Additional resources

Questions? Thoughts? Comments?

Do you find this post useful? Please share your thoughts with us below.

Share

About Author

Leonid Batkhan

Leonid Batkhan is an independent SAS consultant and blogger. He holds a Ph.D. in Computer Science and Automatic Control Systems and has been a SAS user for more than 25 years. From 1995 to 2021 he worked as a Data Management and Business Intelligence consultant at SAS Institute. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries. All posts by Leonid Batkhan >>>

10 Comments

  1. It seems like you need to supply all four arguments whenever you use TRIMS(). There are no defaults set (I don't know if that is even possible with PROC FCMP). I would like to see defaults, and also a re-arrangement of the order of the parameters so I can just type TRIMS(variablename), which is in my mind more intuitive than typing TRIMS('b',variablename,' ',' '), and also TRIMS(variablename) is less error-prone the way I type.

    • Leonid Batkhan

      Correct, PROC FCMP does not support optional arguments and you need to supply all four argument values. While working on this piece I thought about it and gave it a try, but was not successful. Another option would be using the VARARGS option which allows you to pass all your arguments as a single array, but that's hardly a simplification. The FCMP procedure has many limitations, but for this implementation consider this as a small inconvenience price for universal functionality 🙂

      My hope is that some day SAS will implement this function natively and include defaults and optional arguments as you suggest. Thank you for your input, Paige!

      • I have run this code and stored this function so I can now use it if needed. Thank you.

        One concern:

        data class;
        set sashelp.class;
        nname=trims('b',name,' ',' ');
        run;

        This works, but the new variable nname has length 32767. It's easy to add a LENGTH statement in the DATA step to prevent this from happening, but many people might not realize this extremely long length does result and obviously that's going to chew up a lot of space.

        • Leonid Batkhan

          Agree, this needs to be clarified. I added the following justification to the Code highlights section:
          ▪ In the FUNCTION statement we assign the length of the returned value as $32767 to accommodate the longest possible character string.

          I also added the following clarification to the Usage highlights section:
          ▪ Please note that before calling the TRIMS function we assign the lengths for the variables XB, XL, and XT. Had we not done that, these variables would be assigned the maximum length of 32,767 bytes as specified in the TRIMS function definition.

          Thank you, Paige, for your very insightful feedback.

  2. Great complementary blogpost to your previous post on removing trailing characters from strings! Have used Proc fcmp a few times to create simple functions and call routines, but the Trims function definitely demonstrates its capabilities!

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top