Add Unicode symbols and format text labels in SAS

It has been more than a decade since SAS 9.3 changed the default ODS destination from the old LISTING destination to more modern destinations such as HTML. One of the advantages of modern output destinations is support for Unicode symbols, superscripts, subscripts, and for formatting text by using boldface, italics, and color. SAS supports a markup-like syntax (called ODS markup) that enables you to embed an "escape character" into a string. When ODS renders the string, it interprets the text that follows the escape character as a command. The command tells ODS to apply a format or to render a symbol.

This article shows that you can use these ideas to format column headers and row headers when you print a matrix in the SAS IML language. This article was inspired by a question on the SAS Support Communities. A SAS programmer wanted to force the headers for a matrix to break across multiple lines. The user 'KSharp' responded that you can use an escape character to embed a newline character into the strings that are used for headers. This article shows that KSharp's solution applies more generally: You can use a wide array of formatting options to manipulate the row and column headers when you print a SAS IML matrix.

The end of this article contains a References section that cites several SAS papers and blog posts that explain how you can use escape characters to customize SAS titles, footnotes, and graphs, and to embed symbols into data strings.

Define the escape character

By default, you can use the string "(*ESC*)" as an escape character, but many SAS programmers prefer to define a shorter sequence of characters. Two popular choices are the tilde (~) and the caret (^). In this article, I will use the caret. You can use the ODS ESCAPECHAR statement to define the escape character. To use the caret, submit the following statement:

ods escapechar = "^";   /* use a caret to embed formats and symbols in a string */
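
As a minimal sketch of the two choices, the following statements write the same superscript twice: once with the default "(*ESC*)" sequence and once with the caret. The {super 2} command is explained in the next section. The TITLE and FOOTNOTE text and the use of the Sashelp.Class data are my own illustration, and I assume that a destination such as HTML is open:

```sas
ods escapechar = "^";                           /* define the caret as the escape character */
title    "Default sequence: R(*ESC*){super 2}"; /* the default sequence */
footnote "User-defined caret: R^{super 2}";     /* the user-defined caret */
proc print data=sashelp.class(obs=3);           /* any procedure output will do */
run;
title; footnote;                                /* clear the title and footnote */
```

If the escape character is defined as intended, the title and the footnote should both display the 2 as a superscript.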

Embed superscripts, subscripts, and symbols

By far, the most important application of ODS rendering is embedding superscripts, subscripts, and mathematical symbols into character strings. Certainly, using an ASCII string in a label is adequate for personal and informal use. For example, I often use text strings such as R^2, X_1, alpha, and P(X<=1) for my blog and for informal work. However, if you want to show the output in a presentation to upper management, you might prefer to use symbols. For example, if I am giving a webinar, I will format the previous ASCII text strings as R², X₁, α, and P(X≤1), respectively.

In the SAS IML language, you can add headers to a matrix when you print it:

  • Use the COLNAME= option in the PRINT statement to add column headers to a matrix. The C= option is an alias.
  • Use the ROWNAME= option to add row headers. The R= option is an alias.
  • Use the LABEL= option to add a spanning header. The L= option is an alias.

The following example shows how you can use an escape character to embed superscripts, subscripts, Greek letters, and mathematical symbols in column headers and in a label. The example in the section on line breaks also shows how to embed an arbitrary symbol (the "female" and "male" symbols) in a row header:

proc iml;
/* superscripts */
powers = {5 25 125};
h = {"5^{super 1}" "5^{super 2}" "5^{super 3}"};
print powers[colname=h];
 
s = "Atomic Weight H^{sub 2}O";
wt = 18.0153;
print wt[label=s];
 
/* greek letters and symbols */
Parameters = {0.05 2.3 0.8 0.456};
hdr  ={'^{unicode ALPHA}' 
       '^{unicode MU}'
       '^{unicode SIGMA}'
       'P(X ^{unicode 2264} 1)'};
print Parameters[colname=hdr];

If you have ever used a markup language, the syntax is self-explanatory. The escape character (here, '^') begins the markup environment. The curly braces delimit the scope of the command. Keywords such as SUPER, SUB, or UNICODE provide the context for the markup command. This syntax is sometimes called inline formatting.

For the UNICODE command, you can use a four-digit hexadecimal Unicode value to represent a mathematical symbol. For example, the value 2264 (the code point U+2264) will render a less-than-or-equal-to character (≤). SAS also provides some aliases for common Unicode characters such as Greek letters. See the Appendix for more details about the SAS aliases.
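
To illustrate the relationship between aliases and hexadecimal values, here is a small sketch that puts both forms side by side in the same column header. The hexadecimal values 03B1 and 03BC are taken from the table in the Appendix; the matrix x and the header vector h are names that I made up for this example:

```sas
ods escapechar = "^";
proc iml;
x = {1 2 3};
/* alias and hex value in the same header; each pair should render the same symbol */
h = {'^{unicode ALPHA} and ^{unicode 03B1}'
     '^{unicode MU} and ^{unicode 03BC}'
     'X ^{unicode 2264} 1'};      /* hex value only: less-than-or-equal-to */
print x[colname=h];
quit;
```

If the aliases on your system match the Appendix, the first two headers should each show the same Greek letter twice.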

Embed line breaks

The original motivation for this article was a user who wanted to embed line breaks into a header. This is accomplished by using the NEWLINE keyword, as follows:

/* line breaks in column headers */
GPA = { 3.61 3.89 0.62, 
        3.24 3.35 0.78};
/* symbols in row headers; line breaks in column headers */
sex = {'Female ^{unicode 2640}' 'Male ^{unicode 2642}'};
hdr  ={'Raw^{newline}Mean' 
       'Weighted^{newline}Mean' 
       'Weighted^{newline}Standard^{newline}Deviation'};
print GPA[rowname=sex colname=hdr];

In this example, the NEWLINE command forces the header to break across multiple lines, which prevents any column from becoming excessively wide.

Boldface, italics, and colors

Except for rare situations, I do not use or encourage the use of boldface, italics, or color in table headers. However, it is possible to change these attributes. For the sake of completeness, this section shows how to use the STYLE keyword to control formatting in a text string.

When you use the STYLE keyword, you can specify one or more suboptions such as the COLOR=, BACKGROUND=, and FONTSTYLE= options. These are specified in square brackets prior to the text that will be modified by the suboptions. For example, the following example shows how to use the COLOR=RED, BACKGROUND=LIGHTGREY, and FONTSTYLE=ITALIC options:

/* colors in column headers 
   COLOR= and FOREGROUND= are aliases to the same feature
*/
hdr2 ={'^{style [color=red] Raw}^{newline}Mean'   
       '^{style [background=lightgrey] Weighted}^{newline}Mean' 
       '^{style [fontstyle=Italic] Weighted}^{newline}Standard^{newline}Deviation'};
print GPA[rowname=sex colname=hdr2];
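
The previous example does not show boldface or how to combine suboptions. The following sketch assumes that the FONTWEIGHT= style attribute is honored by your destination and that several attributes can be listed, separated by spaces, inside one pair of brackets. The names m and hdr3 are hypothetical, and the block starts its own PROC IML session so that it can be run by itself:

```sas
ods escapechar = "^";
proc iml;
m = {3.61 3.89,
     3.24 3.35};
/* FONTWEIGHT=BOLD requests boldface; the second header combines three attributes */
hdr3 = {'^{style [fontweight=bold] Raw}^{newline}Mean'
        '^{style [fontweight=bold fontstyle=italic color=blue] Weighted}^{newline}Mean'};
print m[colname=hdr3];
quit;
```

Many ODS styles already display headers in bold, so the effect of FONTWEIGHT=BOLD might be easier to see in text that is not bold by default.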

Summary

This article shows how you can use the ODS ESCAPECHAR statement to define an escape character that enables you to embed superscripts, subscripts, symbols, and formatting instructions into text strings. The examples in this article are all related to printing headers for SAS IML matrices, but the ideas in this article apply more generally to other SAS features, such as titles, footnotes, labels, insets, and character data. For more details, see the References section.
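
As a brief sketch of that more general use, the following statements embed the same commands in a title, in a variable label, and in character data. The data set Params and its variables are invented for this illustration, and I assume that the output is sent to a destination such as HTML:

```sas
ods escapechar = "^";
data Params;
   length Symbol $ 32;
   /* escape sequences stored as character data */
   Symbol = "^{unicode ALPHA}"; Estimate = 0.05; output;
   Symbol = "^{unicode SIGMA}"; Estimate = 0.8;  output;
   /* line break in a variable label */
   label Estimate = "Parameter^{newline}Estimate";
run;

title "Estimates for a model with R^{super 2} = 0.9";   /* superscript in a title */
proc print data=Params label noobs;
run;
title;
```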

Do you have additional resources that you have found useful for ODS markup and inline formatting? Leave a comment!

References

  • "ODS ESCAPECHAR Statement", SAS Output Delivery System: User’s Guide. Provides syntax and examples of using inline formatting commands.
  • Hadden, Louise S. (2010) "The Great Escape(char)," Proceedings of the 2010 SAS Global Forum conference.
  • Heath, Dan (2011) "The Power of Unicode," the Graphically Speaking blog. Shows how to use Unicode in the SG procedures for axis labels and insets.
  • Matange, Sanjay (2015) "Displaying Unicode Symbols in Legend," the Graphically Speaking blog. Shows how to use Unicode in the SG procedures for legends.
  • Matange, Sanjay (2015) "Marker Symbols," the Graphically Speaking blog. Shows how to use Unicode for marker symbols.

Appendix

SAS provides aliases for common Unicode characters such as Greek letters. These are provided in the Base.Template.Tagset tagset, which contains a table of Unicode values and their aliases. You can add additional aliases by editing the tagset. The following call to PROC TEMPLATE enables you to see the aliases that are defined on your system:

proc template;
 source base.template.tagset;
run;

The log displays the aliases. I have provided a partial list below that shows the important aliases.

define tagset Base.Template.Tagset;
   notes "Implicit parent for all tagsets";
...
 
   define event unicode_init;
      set $unicodeMap["ALPHA" ] "03B1";
      set $unicodeMap["BAR" ] "0305";
      set $unicodeMap["BAR2" ] "033F";
      set $unicodeMap["BETA" ] "03B2";
      set $unicodeMap["CHI" ] "03C7";
      set $unicodeMap["DAGGER" ] "2020";
      set $unicodeMap["DBL_DAGGER" ] "2021";
      set $unicodeMap["DELTA" ] "03B4";
      set $unicodeMap["EPSILON" ] "03B5";
      set $unicodeMap["ETA" ] "03B7";
      set $unicodeMap["GAMMA" ] "03B3";
      set $unicodeMap["DIGAMMA" ] "03DD";
      set $unicodeMap["HAT" ] "0302";
      set $unicodeMap["IOTA" ] "03B9";
      set $unicodeMap["KAPPA" ] "03BA";
      set $unicodeMap["LAMBDA" ] "03BB";
      set $unicodeMap["MU" ] "03BC";
      set $unicodeMap["NU" ] "03BD";
      set $unicodeMap["OMEGA" ] "03C9";
      set $unicodeMap["OMICRON" ] "03BF";
      set $unicodeMap["PHI" ] "03C6";
      set $unicodeMap["PI" ] "03C0";
      set $unicodeMap["PRIME" ] "00B4";
      set $unicodeMap["PSI" ] "03C8";
      set $unicodeMap["RHO" ] "03C1";
      set $unicodeMap["SIGMA" ] "03C3";
      set $unicodeMap["TAU" ] "03C4";
      set $unicodeMap["THETA" ] "03B8";
      set $unicodeMap["TILDE" ] "0303";
      set $unicodeMap["UPSILON" ] "03C5";
      set $unicodeMap["XI" ] "03BE";
      set $unicodeMap["ZETA" ] "03B6";
   end;
...

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
