Maybe if we think and wish and hope and pray
It might come true.
Oh, wouldn't it be nice?

The Beach Boys

Months ago, I wrote about how to use the EFFECT statement in SAS to perform regression with restricted cubic splines. This is the modern way to use splines in a regression analysis in SAS, and it replaces the need to use older macros such as Frank Harrell's %RCSPLINE macro. I shared my blog post with a colleague at SAS and mentioned that the process could be simplified. In order to specify the placement of the knots as suggested by Harrell (Regression Modeling Strategies, 2010 and 2015), I had to use PROC UNIVARIATE to get the percentiles of the explanatory variable. "Wouldn't it be nice," I said, "if the EFFECT statement could perform that computation automatically?"

I am happy to report that the 15.1 release of SAS/STAT (SAS 9.4M6) includes a new option that makes it easy to place internal knots at percentiles of the data. You can now use the KNOTMETHOD=PERCENTILELIST option on the EFFECT statement to place knots. For example, the following statement places five internal knots at percentiles that are recommended in Harrell's book:
EFFECT spl = spline(x / knotmethod=percentilelist(5 27.5 50 72.5 95));

### An example of using restricted cubic in regression in SAS

Restricted cubic splines are also called "natural cubic splines." This section shows how to perform a regression fit by using restricted cubic splines in SAS.

For the example, I use the same Sashelp.Cars data that I used in the previous article. For clarity, the following SAS DATA step renames the Weight and MPG_City variables to X and Y, respectively. If you want to graph the regression curve, you can sort the data by the X variable, but this step is not required to perform the regression.

```/* create (X,Y) data from the Sashelp.Cars data. Sort by X for easy graphing. */ data Have; set sashelp.cars; rename mpg_city = Y weight = X model = ID; run;   proc sort data=Have; by X; run;```

The following call to PROC GLMSELECT includes an EFFECT statement that generates a natural cubic spline basis using internal knots placed at specified percentiles of the data. The MODEL statement fits the regression model and the OUTPUT statement writes an output data set that contains the predicted values. The SGPLOT procedure displays a graph of the regression curve overlaid on the data:

```/* fit data by using restricted cubic splines using SAS/STAT 15.1 (SAS 9.4M6) */ ods select ANOVA ParameterEstimates SplineKnots; proc glmselect data=Have; effect spl = spline(X/ details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95) ); /* new in SAS/STAT 15.1 (SAS 9.4M6) */ model Y = spl / selection=none; /* fit model by using spline effects */ output out=SplineOut predicted=Fit; /* output predicted values */ quit;   title "Restricted Cubic Spline Regression"; title2 "Five Knots Placed at Percentiles"; proc sgplot data=SplineOut noautolegend; scatter x=X y=Y; series x=X y=Fit / lineattrs=(thickness=3 color=red); run;``` In summary, the new KNOTMETHOD=PERCENTILELIST option on the EFFECT statement simplifies the process of using percentiles of a variable to place internal knots for a spline basis. The example shows knots placed at the 5th, 27.5th, 50th, 72.5th, and 95th percentiles of an explanatory variable. These heuristic values are recommended in Harrell's book. For more details about the EFFECT statement and how the location of knots affects the regression fit, see my previous article "Regression with restricted cubic splines in SAS."

You can download the complete SAS program that generates this example, which requires SAS/STAT 15.1 (SAS 9.4M6). If you have an earlier release of SAS, the program also shows how to perform the same computations by calling PROC UNIVARIATE to obtain the location of the knots.

Share Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

1. Thank you for your wonderful work. And I want to konw how to adjusted other confounders in the restricted cubic spline.

2. Hi,

I have 4 knots.
Is there anyway to get the parameters for the curve between the second and third knots, third and fourth knots, etc.

Thank you.

• Yes. When you use knots in a linear regression, you will get estimates for the regression coefficients. The curve of the predicted values is therefore a linear combination of the spline basis, as shown in the article "Visualize a regression with splines." If you want to restrict the interval between the second and third knots, just add together the knots with nonzero support on that interval.

3. can we add spline model in proc MCMC ?

4. Thank you for your wonderful work. And I want to konw how to adjusted other confounders in the restricted cubic spline.

• You can add other effects on the MODEL statement. If you have specific data or syntax that you want to discuss, you can post them to the SAS Support Communities.

5. Hi Rick,
I have 9 data points and I would like to ask you if it's appropriate to apply regression with restricted cubic splines. I have the scatterplot from sgplot, but I can't add it here. The plot shows an upward quadratic then decreases linearly. I will be happy to send the display.

Thanks!

• 6. This forum doesn't support tables or graphs. I don't have your email. How can I share it with you?

• Sorry that I was unclear. I meant "Sure, go ahead and fit a cubic spline model to 9 data points."

If you need more help, you can post questions to the SAS Support Communities. The Communities enable you to upload data, programs, and graphics.

7. Hi Rick,

Following your above SAS syntax, I wrote the following one (Model one), by specifying the values at each knot.

```/*model one*/ proc glmselect data=Have; effect spl = spline(X/ details naturalcubic basis=tpf(noint) knotmethod=list(2513, 3174, 3474.5, 3903, 5000) ); model Y = spl / selection=none; output out=SplineOut2 predicted=Fit; quit;```

I then created 3 spline variables and wrote model two. The results generated by the two models are almost the same, except the coefficients of the thee spline variables (the t-values and p-values are the same. Would you please share with me your thoughts about the differences?
Many thanks, Hongjie

```/*model two*/ data Have2; set Have;   if x<=2513 then x2513=0; else x2513=x-2513; if x<=3174 then x3174=0; else x3174=x-3174; if x<=3474.5 then x3474=0; else x3474=x-3474.5; if x<=3903 then x3903=0; else x3903=x-3903; if x<=5000 then x5000=0; else x5000=x-5000; x_sq=x*x; x_q=x**3; x2513q=x2513**3; x3174q=x3174**3; x3474q=x3474**3; x3903q=x3903**3; x5000q=x5000**3;   term2=x2513q-x3903q*((5000-2513) /(5000-3903))+x5000q*((3903-2513) /(5000-3903)); term3=x3174q-x3903q*((5000-3174) /(5000-3903))+x5000q*((3903-3174) /(5000-3903)); term4=x3474q-x3903q*((5000-3474.5)/(5000-3903))+x5000q*((3903-3474.5)/(5000-3903)); run;   proc reg data=Have2; model y=x term2 term3 term4; output out=out4 p=Fit; run;```
• Your definitions of the spline variables (term2-term4) are wrong. You do not need to use a separate DATA step to generate the spline effects because PROC GLMSELECT will give them to you. See the article, "Visualize a regression with splines."

If you add OUTDESIGN=Splines option to the PROC GLMSELECT statement, then you can reproduce the parameter estimates by using the following:

```proc reg data=Splines plots=none; ods select ParameterEstimates; model y= spl_: ; /* Note: spl_1 = x */ quit;```
• Hi Rick,
Thank you very much for your quick help!
Do you happen to know how the SAS EFFECT statement calculates SPLs for each subject or what equation it uses? I could not find references from SAS.

• As mentioned in "Regression with restricted cubic splines in SAS," the EFFECT statement uses the definition from The Elements of Statistical Learning (Hastie, Tibshirani, and Friedman, 2009, 2nd Ed, pp. 144-146).

• Great and thanks! Hongjie

8. Hi Rick,

I have to manually create basis functions for a variable but SAS doesn't seem to have a procedure to do it. I am using Frank Harrell's macro RCSPLINE but the basis variables are not identical to those created by SAS using OUTDESIGN. It seems like SAS is dividing the second basis by 4 and the last one by 4.5. Not sure where these numbers come from for the example given below. Why are SAS numbers different from the ones created by the macro?

data simulation;
do i=1 to 10000;
t=rand('uniform',0,10);
p=1/(1+exp(sin(t)));
y=rand('bernoulli',p);
output;
end;
run;

*model a natural cubic spline;
*and store the result in "mystore";
proc logistic data=simulation OUTDESIGN=splines;
effect myspline=spline(t/knotmethod=list(1,2,3,4) NATURALCUBIC);
model y(event="1")=myspline;
store mystore;
run;
/*-------------------------
MACRO FOR CUBIC SPLINES
-------------------------*/
%macro rcspline(x,knot1,knot2,knot3,knot4,knot5,knot6,knot7,knot8,knot9,knot10, norm=2);
%LOCAL j v7 k tk tk1 t k1 k2;
%LET v7=&x; %IF %length(&v7)=8 %THEN %LET v7=%SUBSTR(&v7,1,7);
%*get no. knots, last knot, next to last knot;
%DO k=1 %TO 10;
%IF %QUOTE(&&knot&k)= %THEN %GOTO nomorek;
%END;
%LET k=11;
%nomorek: %LET k=%EVAL(&k-1); %LET k1=%EVAL(&k-1); %LET k2=%EVAL(&k-2);
%IF &k<3 %THEN %PUT ERROR: <3 knots given. no spline variables CREATEd.;
%ELSE %DO;
%LET tk=&&knot&k;
%LET tk1=&&knot&k1;
DROP _kd_; _kd_=
%IF &norm=0 %THEN 1;
%ELSE %IF &norm=1 %THEN &tk - &tk1;
%ELSE (&tk - &knot1)**.666666666666; ;
%DO j=1 %TO &k2;
%LET t=&&knot&j;
&v7&j=max((&x-&t)/_kd_,0)**3+((&tk1-&t)*max((&x-&tk)/_kd_,0)**3
-(&tk-&t)*max((&x-&tk1)/_kd_,0)**3)/(&tk-&tk1)%STR(;);
%END;
%END;
%mend rcspline;

**comparing to splines datasets from PROC LOGISTIC
**t1 from PROC LOGISTIC=t1 from below divided by 4-1=3;
DATA want; SET simulation;
%rcspline(t,1,2,3,4);
RUN;

• Near the end of my previous article about splines, I explain why the basis functions are not equal, but the predicted values are the same. Briefly, the macro uses a different scaling.

9. Thank you Rick!

I am also looking to implement penalized spline and I was wondering if there is a way to do so in SAS. I am looking for a SAS equivalent of the R- PSPLINE functionthat can be used in PROC LOGISTIC.

10. Harells %PSLINE macro is for plotting the restricted cubic spline. The PSPLINE function is R is a piecewise polynomial with a smoothing penalty. I am not sure if SAS has anything at the moment.

11. I just found out that there is a PSPLINE(x) that can be used only in PROC TRANSREG. I am running conditional logit so I cannot use it in PROC LOGISTIC or PROC PHREG.