I've written about how to add a diagonal line to a scatter plot by using the SGPLOT procedure in SAS 9.2. The main idea (use the VECTOR statement) is easy enough, but writing a program that handles a line with any slope requires some additional effort.
Add a Diagonal Line
Adding a diagonal line is now trivial in SAS 9.3, thanks to the LINEPARM statement: just specify the line in "point-slope" form. Use the X= and Y= options to specify a point that the line passes through; use the SLOPE= option to specify the slope. For example, the following statements create a scatter plot of two variables in the SASHelp.Cars data set and overlay the identity line, which has unit slope and passes through the origin:
    proc sgplot data=sashelp.cars noautolegend;
       title "Graph with Diagonal Line";
       scatter x=MPG_City y=MPG_Highway;
       lineparm x=0 y=0 slope=1; /** intercept, slope **/
       xaxis grid;
       yaxis grid;
    run;
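Because the line is given in point-slope form, any point on the line works for X= and Y=. The following sketch illustrates this: the first LINEPARM statement redraws the identity line from the point (10, 10), and the second draws a hypothetical reference line, y = x + 5, that I made up purely for illustration. The dashed pattern is only there to tell the two lines apart.

```sas
proc sgplot data=sashelp.cars noautolegend;
   title "Two Lines in Point-Slope Form";
   scatter x=MPG_City y=MPG_Highway;
   /* identity line: (10,10) lies on y = x, so this is the same line as x=0 y=0 */
   lineparm x=10 y=10 slope=1;
   /* hypothetical reference line y = x + 5, specified by its y-intercept (0,5) */
   lineparm x=0 y=5 slope=1 / lineattrs=(pattern=dash);
   xaxis grid;
   yaxis grid;
run;
```

When X=0, the Y= value is simply the intercept of the line, which is why the slope-intercept form is a convenient special case.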
Add Multiple Lines
You can overlay multiple lines on a plot by specifying the name of a variable for the X=, Y=, and SLOPE= options. For example, suppose that you want to plot several lines that represent different linear regression models. The following DATA step creates a SAS data set with parameter estimates for three lines: ordinary least squares regression, robust M-estimation, and robust least trimmed squares (LTS) estimation:
    data lines;
       input Method $ Intercept Slope;
       datalines;
    OLS 6.15322 1.03138
    M   4.8254  1.1026
    LTS 6.2569  1.0514
    ;
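The post does not say how these estimates were computed. One plausible route, shown here only as a sketch, is to regress MPG_Highway on MPG_City with PROC REG and with PROC ROBUSTREG (METHOD=M and METHOD=LTS) and read the intercept and slope from the parameter estimates tables. The choice of response and explanatory variable is my assumption based on the plot, and the exact numbers you get may differ from those above depending on the release and the procedure options.

```sas
/* Sketch only: assumes the lines model MPG_Highway as a function of MPG_City */

/* ordinary least squares */
proc reg data=sashelp.cars;
   model MPG_Highway = MPG_City;
run;
quit;

/* robust M-estimation */
proc robustreg data=sashelp.cars method=M;
   model MPG_Highway = MPG_City;
run;

/* robust least trimmed squares (LTS) estimation */
proc robustreg data=sashelp.cars method=LTS;
   model MPG_Highway = MPG_City;
run;
```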
The following statements add these variables and observations to the SASHelp.Cars data, and create a plot that displays the lines and the data:
    data cars;
       set sashelp.cars lines;
    run;

    proc sgplot data=cars noautolegend;
       title "Three Regression Lines";
       scatter x=MPG_City y=MPG_Highway;
       lineparm x=0 y=Intercept slope=Slope / group=Method curvelabel noextend;
       xaxis grid offsetmax=0.07;
       yaxis grid offsetmax=0.05;
    run;
Notice that outliers and high-leverage points "pull down" the OLS line, whereas the robust regression lines are not affected. Also, notice a few techniques used in the PROC SGPLOT call:
- The CARS data set has a block structure: the SASHelp.Cars data occupies the upper left block and the LINES data occupies the lower right block. Missing values occupy the other two blocks.
- The names of the Intercept and Slope variables are used for the Y= and SLOPE= options, respectively.
- The GROUP= option is used so that each line gets different graphical attributes (such as colors).
- The NOEXTEND option is used in conjunction with the OFFSETMAX= option in the XAXIS and YAXIS statements in order to create blank space in the upper right corner of the plot. That area is used to label the lines. The labels come from the Method variable.
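If the block structure in the first item is hard to picture, printing the last few rows of the combined data set makes it concrete. This sketch assumes that SASHelp.Cars contains 428 observations (you can confirm the count with PROC CONTENTS), so that FIRSTOBS=427 shows the last two cars followed by the three rows from the LINES data set.

```sas
/* Assumes sashelp.cars has 428 observations; adjust FIRSTOBS= if yours differs */
proc print data=cars(firstobs=427);
   var MPG_City MPG_Highway Method Intercept Slope;
run;
```

The rows that come from SASHelp.Cars should show missing values for Method, Intercept, and Slope, whereas the rows that come from LINES should show missing values for MPG_City and MPG_Highway. The SCATTER statement skips the latter rows and the LINEPARM statement skips the former, which is why a single data set can drive both overlays.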
Thanks, SAS 9.3. Sometimes a small feature can make a big difference.