Create a deviation plot to visualize values relative to a baseline

0

A colleague recently posted an article about how to use SAS Visual Analytics to create a circular graph that displays a year's worth of temperature data. Specifically, the graph shows the air temperature for each day in a year relative to some baseline temperature, such as 65F (18C). Days warmer than baseline are displayed in one color (red for warm) whereas days colder than the baseline are displayed in another color (blue for cold). The graph was very pretty. A reader posted a comment asking whether a similar graph could be created by using other graphical tools such as GTL or even PROC SGPLOT. The answer is yes, but I am going to propose a different graph that I think is more flexible and easier to read.

Let's generalize the problem. Suppose you have a time series and you want to compare the values to a baseline (or reference) value. One way to do this is to visualize the data as deviations from the baseline. Data values that are close to the baseline will be small and almost unnoticeable. The eye will be drawn to values that indicate large deviations from the baseline. A "deviation plot" like this can be used for many purposes. Some applications include monitoring blood glucose relative to a target value, showing expenditures relative to a fixed income amount, and, yes, displaying the temperature relative to some comfortable reference value. Deviation plots sometimes accompany a hypothesis test for a one-way frequency distribution.

Linear displays versus circular displays

My colleague's display shows one year's worth of temperatures by plotting the day of the year along a circle. While this makes for an eye-catching display, there are a few shortcomings to this approach:

  • It is difficult to read the data values. It is also difficult to compare values that are on opposite sides of a circle. For example, how does March data compare with October data?
  • Although a circle can show data for one year, it is less effective for showing 8 or 14 months of data.
  • Even for one year's worth of data, it has a problem: It places December 31 next to January 1. In the temperature graph, the series began on 01JAN2018. However, the graph places 31DEC2018 next to 01JAN2018 even though those values are a year apart.

As mentioned earlier, you can use SAS/GRAPH or the statistical graphics (SG) procedure in SAS to display the data in polar coordinates. Sanjay Matange's article shows how to create a polar plot. For some of my thought about circular versus rectangular displays, see "Smoothers for periodic data."

A deviation-from-baseline plot

The graph to the right (click to enlarge) shows an example of a deviation plot (or deviation-from-baseline plot). It is similar to a waterfall chart, but in many waterfall charts the values are shown as percentages, whereas for the deviation plot we will show the observed values. You can see that the values are plotted for each day. The high values are plotted in one color (red) whereas low values are plotted in a different color (blue). A reference line (in this case, at 100) is displayed.

To create a deviation plot, you need to perform these three steps:

  1. Use the SAS DATA step to encode the data as 'High' or 'Low' by using the reference value. Compute the deviations from the reference value.
  2. Create a discrete attribute map that maps values to colors. This step is optional. Alternatively, SAS will assign colors based on the current ODS style.
  3. Use a HIGHLOW plot to graph the deviations from the reference value.

Let's implement these steps on a time series for three months of daily blood glucose values. An elderly male takes oral medications to control his blood glucose level. Each morning he takes his fasting blood glucose level and records it. The doctor has advised him to try to keep the blood glucose level below 100 mg/dL, so the reference value is 100. The following DATA step defines the dates and glucose levels for a three-month period.

data Series;
informat Date date.;
format Date Date.;
input Date y @@;
label y = "Blood Glucose (mg/dL)";
datalines;
01SEP19 100 02SEP19  96 03SEP19  86 04SEP19  93 05SEP19 105 06SEP19 106 07SEP19 123 
08SEP19 121 09SEP19 115 10SEP19 108 11SEP19  94 12SEP19  96 13SEP19  95 14SEP19 120
15SEP19 112 16SEP19 104 17SEP19  97 18SEP19 101 19SEP19 108 20SEP19 108 21SEP19 117 
22SEP19 103 23SEP19 109 24SEP19  97 25SEP19  93 26SEP19 100 27SEP19  98 28SEP19 122 
29SEP19 116 30SEP19  99 01OCT19 102 02OCT19  99 03OCT19  95 04OCT19  99 05OCT19 116 
06OCT19 109 07OCT19 106 08OCT19  94 09OCT19 104 10OCT19 112 11OCT19 119 12OCT19 111 
13OCT19 104 14OCT19 101 15OCT19  99 16OCT19  92 17OCT19 101 18OCT19 115 19OCT19 109 
20OCT19  98 21OCT19  91 22OCT19  92 23OCT19 100 24OCT19 109 25OCT19 102 26OCT19 117 
27OCT19 106 28OCT19  98 29OCT19  98 30OCT19  95 31OCT19  97 01NOV19 129 02NOV19 120 
03NOV19 117 04NOV19   . 05NOV19 101 06NOV19 105 07NOV19 105 08NOV19 106 09NOV19 118 
10NOV19 109 11NOV19 102 12NOV19  98 13NOV19  97 14NOV19 .   15NOV19  92 16NOV19 114 
17NOV19 107 18NOV19  98 19NOV19  91 20NOV19  97 21NOV19 109 22NOV19  98 23NOV19 95 
24NOV19  95 25NOV19  94 26NOV19   . 27NOV19  98 28NOV19 115 29NOV19 123 30NOV19 114 
01DEC19 104 02DEC19  96 03DEC19  97 04DEC19 100 05DEC19  94 06DEC19  93 07DEC19 105 
08DEC19   . 09DEC19  88 10DEC19  84 11DEC19 101 12DEC19 122 13DEC19 114 14DEC19 108 
15DEC19 103 16DEC19  88 17DEC19  74 18DEC19  92 19DEC19 110 20DEC19 118 21DEC19 106 
22DEC19 100 23DEC19 106 24DEC19 107 25DEC19 116 26DEC19 113 27DEC19 113 28DEC19 117 
29DEC19 101 30DEC19  96 31DEC19 101  
;

Encode the data

The first step is to compute the deviation of each observed value from the reference value. If an observed value is above the reference value, mark it as 'High', otherwise mark it as 'Low'. We will plot a vertical bar that goes from the reference level to the observed value. Because we will use a HIGHLOW statement to display the graph, the DATA step computes two new variables, High and Low.

/* 1. Compute the deviation and encode the data as 'High' or 'Low' by using the reference value */
%let RefValue = 100;
 
data Center;
set Series;
if (y > &RefValue) then Group="High";
else Group="Low";
Low  = min(y, &RefValue);    /* lower end of highlow bar */
High = max(y, &RefValue);    /* upper end of highlow bar */
run;

Maps high and low values to colors

If you want SAS to assign colors to the two groups, you can skip this step. However, in many cases you might want to choose which color is plotted for the high and low categories. You can map levels of a group to colors by using a discrete attribute map ("DATTR map", for short) in PROC SGPLOT. Because we are going to use a HIGHLOW statement to graph the data, we need to define a map that has the FillColor and LineColor for the vertical bars. The following DATA step maps the 'High' category to red and the 'Low' category to blue:

/* 2. Create a discrete attribute map that maps values to colors */
data DAttrs;
length c FillColor LineColor $16.;
ID = "HighLow";
Value = "High"; c="DarkRed";  FillColor=c; LineColor=c; output;
Value = "Low";  c="DarkBlue"; FillColor=c; LineColor=c; output;
run;

Create a high-low plot

The final step is to create a high-low plot that shows the deviations from the reference value. You can use the DATTRMAP= option to tell PROC SGPLOT how to assign colors for the group values. Because a data set can contain multiple maps, the ATTRID= option specifies which mapping to use.

/* 3. Use a HIGHLOW plot to graph the deviations from the reference value */
title "Deviations from Reference Value (&RefValue)";
title2 "Morning Fasting Blood Glucose";
ods graphics / width=600px height=400px;
proc sgplot data=Center DATTRMAP=DAttrs noautolegend;
   highlow x=day low=low high=high / group=Group ATTRID=HighLow;
   refline &RefValue / axis=y;
   yaxis grid label="Blood Glucose Level";
run;

The graph is shown at the top of this section. It is clear that on most days the patient has high blood sugar. With additional investigation, you can discover that the highest levels are associated with weekends and holidays.

Note that these data would not be appropriate to plot on a circular graph because the data are not for a full year. Furthermore, on this graph it is easy to see specific values and days and to compare days in September with days in December.

A deviation plot for daily average temperatures

My colleague's graph displayed daily average temperatures. The following deviation plot shows average temperatures and a reference value of 65F. The graph shows the daily average temperature in Raleigh, NC, for 2018:

In this graph, it is easy to find the approximate temperature for any range of dates (such as "mid-October") and to compare the temperature for different time periods, such as March versus October. I think the rectangular deviation plot makes an effective visualization of how these data compare to a baseline value.

You can download the SAS program that creates the graphs in this article.

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Leave A Reply

Back to Top