Maybe if we think and wish and hope and pray
It might come true.
Oh, wouldn't it be nice?

The Beach Boys

Months ago, I wrote about how to use the EFFECT statement in SAS to perform regression with restricted cubic splines. This is the modern way to use splines in a regression analysis in SAS, and it replaces the need to use older macros such as Frank Harrell's %RCSPLINE macro. I shared my blog post with a colleague at SAS and mentioned that the process could be simplified. In order to specify the placement of the knots as suggested by Harrell (Regression Modeling Strategies, 2010 and 2015), I had to use PROC UNIVARIATE to get the percentiles of the explanatory variable. "Wouldn't it be nice," I said, "if the EFFECT statement could perform that computation automatically?"

I am happy to report that the 15.1 release of SAS/STAT (SAS 9.4M6) includes a new option that makes it easy to place internal knots at percentiles of the data. You can now use the KNOTMETHOD=PERCENTILELIST option on the EFFECT statement to place knots. For example, the following statement places five internal knots at percentiles that are recommended in Harrell's book:
EFFECT spl = spline(x / knotmethod=percentilelist(5 27.5 50 72.5 95));

### An example of using restricted cubic in regression in SAS

Restricted cubic splines are also called "natural cubic splines." This section shows how to perform a regression fit by using restricted cubic splines in SAS.

For the example, I use the same Sashelp.Cars data that I used in the previous article. For clarity, the following SAS DATA step renames the Weight and MPG_City variables to X and Y, respectively. If you want to graph the regression curve, you can sort the data by the X variable, but this step is not required to perform the regression.

```/* create (X,Y) data from the Sashelp.Cars data. Sort by X for easy graphing. */ data Have; set sashelp.cars; rename mpg_city = Y weight = X model = ID; run;   proc sort data=Have; by X; run;```

The following call to PROC GLMSELECT includes an EFFECT statement that generates a natural cubic spline basis using internal knots placed at specified percentiles of the data. The MODEL statement fits the regression model and the OUTPUT statement writes an output data set that contains the predicted values. The SGPLOT procedure displays a graph of the regression curve overlaid on the data:

```/* fit data by using restricted cubic splines using SAS/STAT 15.1 (SAS 9.4M6) */ ods select ANOVA ParameterEstimates SplineKnots; proc glmselect data=Have; effect spl = spline(X/ details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95) ); /* new in SAS/STAT 15.1 (SAS 9.4M6) */ model Y = spl / selection=none; /* fit model by using spline effects */ output out=SplineOut predicted=Fit; /* output predicted values */ quit;   title "Restricted Cubic Spline Regression"; title2 "Five Knots Placed at Percentiles"; proc sgplot data=SplineOut noautolegend; scatter x=X y=Y; series x=X y=Fit / lineattrs=(thickness=3 color=red); run;```

In summary, the new KNOTMETHOD=PERCENTILELIST option on the EFFECT statement simplifies the process of using percentiles of a variable to place internal knots for a spline basis. The example shows knots placed at the 5th, 27.5th, 50th, 72.5th, and 95th percentiles of an explanatory variable. These heuristic values are recommended in Harrell's book. For more details about the EFFECT statement and how the location of knots affects the regression fit, see my previous article "Regression with restricted cubic splines in SAS."

You can download the complete SAS program that generates this example, which requires SAS/STAT 15.1 (SAS 9.4M6). If you have an earlier release of SAS, the program also shows how to perform the same computations by calling PROC UNIVARIATE to obtain the location of the knots.

Share

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

1. Thank you for your wonderful work. And I want to konw how to adjusted other confounders in the restricted cubic spline.

2. Hi,

I have 4 knots.
Is there anyway to get the parameters for the curve between the second and third knots, third and fourth knots, etc.

Thank you.

• Yes. When you use knots in a linear regression, you will get estimates for the regression coefficients. The curve of the predicted values is therefore a linear combination of the spline basis, as shown in the article "Visualize a regression with splines." If you want to restrict the interval between the second and third knots, just add together the knots with nonzero support on that interval.

3. can we add spline model in proc MCMC ?

4. Thank you for your wonderful work. And I want to konw how to adjusted other confounders in the restricted cubic spline.

• You can add other effects on the MODEL statement. If you have specific data or syntax that you want to discuss, you can post them to the SAS Support Communities.

5. Hi Rick,
I have 9 data points and I would like to ask you if it's appropriate to apply regression with restricted cubic splines. I have the scatterplot from sgplot, but I can't add it here. The plot shows an upward quadratic then decreases linearly. I will be happy to send the display.

Thanks!

• Sure. Go ahead.

6. This forum doesn't support tables or graphs. I don't have your email. How can I share it with you?

• Sorry that I was unclear. I meant "Sure, go ahead and fit a cubic spline model to 9 data points."

If you need more help, you can post questions to the SAS Support Communities. The Communities enable you to upload data, programs, and graphics.

7. Hi Rick,
Thank you for this post and your comments. They are really helpful.

Following your above SAS syntax, I wrote the following one (Model one), by specifying the values at each knot.

```/*model one*/ proc glmselect data=Have; effect spl = spline(X/ details naturalcubic basis=tpf(noint) knotmethod=list(2513, 3174, 3474.5, 3903, 5000) ); model Y = spl / selection=none; output out=SplineOut2 predicted=Fit; quit;```

I then created 3 spline variables and wrote model two. The results generated by the two models are almost the same, except the coefficients of the thee spline variables (the t-values and p-values are the same. Would you please share with me your thoughts about the differences?
Many thanks, Hongjie

```/*model two*/ data Have2; set Have;   if x<=2513 then x2513=0; else x2513=x-2513; if x<=3174 then x3174=0; else x3174=x-3174; if x<=3474.5 then x3474=0; else x3474=x-3474.5; if x<=3903 then x3903=0; else x3903=x-3903; if x<=5000 then x5000=0; else x5000=x-5000; x_sq=x*x; x_q=x**3; x2513q=x2513**3; x3174q=x3174**3; x3474q=x3474**3; x3903q=x3903**3; x5000q=x5000**3;   term2=x2513q-x3903q*((5000-2513) /(5000-3903))+x5000q*((3903-2513) /(5000-3903)); term3=x3174q-x3903q*((5000-3174) /(5000-3903))+x5000q*((3903-3174) /(5000-3903)); term4=x3474q-x3903q*((5000-3474.5)/(5000-3903))+x5000q*((3903-3474.5)/(5000-3903)); run;   proc reg data=Have2; model y=x term2 term3 term4; output out=out4 p=Fit; run;```
• Your definitions of the spline variables (term2-term4) are wrong. You do not need to use a separate DATA step to generate the spline effects because PROC GLMSELECT will give them to you. See the article, "Visualize a regression with splines."

If you add OUTDESIGN=Splines option to the PROC GLMSELECT statement, then you can reproduce the parameter estimates by using the following:

```proc reg data=Splines plots=none; ods select ParameterEstimates; model y= spl_: ; /* Note: spl_1 = x */ quit;```
8. Hi Rick,

I have to manually create basis functions for a variable but SAS doesn't seem to have a procedure to do it. I am using Frank Harrell's macro RCSPLINE but the basis variables are not identical to those created by SAS using OUTDESIGN. It seems like SAS is dividing the second basis by 4 and the last one by 4.5. Not sure where these numbers come from for the example given below. Why are SAS numbers different from the ones created by the macro?

data simulation;
do i=1 to 10000;
t=rand('uniform',0,10);
p=1/(1+exp(sin(t)));
y=rand('bernoulli',p);
output;
end;
run;

*model a natural cubic spline;
*and store the result in "mystore";
proc logistic data=simulation OUTDESIGN=splines;
effect myspline=spline(t/knotmethod=list(1,2,3,4) NATURALCUBIC);
model y(event="1")=myspline;
store mystore;
run;
/*-------------------------
MACRO FOR CUBIC SPLINES
-------------------------*/
%macro rcspline(x,knot1,knot2,knot3,knot4,knot5,knot6,knot7,knot8,knot9,knot10, norm=2);
%LOCAL j v7 k tk tk1 t k1 k2;
%LET v7=&x; %IF %length(&v7)=8 %THEN %LET v7=%SUBSTR(&v7,1,7);
%*get no. knots, last knot, next to last knot;
%DO k=1 %TO 10;
%IF %QUOTE(&&knot&k)= %THEN %GOTO nomorek;
%END;
%LET k=11;
%nomorek: %LET k=%EVAL(&k-1); %LET k1=%EVAL(&k-1); %LET k2=%EVAL(&k-2);
%IF &k<3 %THEN %PUT ERROR: <3 knots given. no spline variables CREATEd.;
%ELSE %DO;
%LET tk=&&knot&k;
%LET tk1=&&knot&k1;
DROP _kd_; _kd_=
%IF &norm=0 %THEN 1;
%ELSE %IF &norm=1 %THEN &tk - &tk1;
%ELSE (&tk - &knot1)**.666666666666; ;
%DO j=1 %TO &k2;
%LET t=&&knot&j;
&v7&j=max((&x-&t)/_kd_,0)**3+((&tk1-&t)*max((&x-&tk)/_kd_,0)**3
-(&tk-&t)*max((&x-&tk1)/_kd_,0)**3)/(&tk-&tk1)%STR(;);
%END;
%END;
%mend rcspline;

**comparing to splines datasets from PROC LOGISTIC
**t1 from PROC LOGISTIC=t1 from below divided by 4-1=3;
DATA want; SET simulation;
%rcspline(t,1,2,3,4);
RUN;

• Near the end of my previous article about splines, I explain why the basis functions are not equal, but the predicted values are the same. Briefly, the macro uses a different scaling.

9. Thank you Rick!

I am also looking to implement penalized spline and I was wondering if there is a way to do so in SAS. I am looking for a SAS equivalent of the R- PSPLINE functionthat can be used in PROC LOGISTIC.

10. Harells %PSLINE macro is for plotting the restricted cubic spline. The PSPLINE function is R is a piecewise polynomial with a smoothing penalty. I am not sure if SAS has anything at the moment.

11. I just found out that there is a PSPLINE(x) that can be used only in PROC TRANSREG. I am running conditional logit so I cannot use it in PROC LOGISTIC or PROC PHREG.

• Yes. PROC PHREG supports the EFFECT statement, so you can create spline effects and use them as explanatory variables in the P-H model.

12. Hello Rick,
I am having trouble creating these visualizations in SAS using PROC PHREG. I tried using your code above by trying to request an output dataset with predicted fit values using the following line within PHREG: OUTPUT out =SplineOut predicted = Fit;. However, it does not seem as if PHREG supports requesting predicted fit. Is there some other way to get a similar graph in PHREG? Thanks.

• Check the doc. The SURVIVAL= keyword on the OUTPUT statement enables you to output a variable that contains the survival probability.

13. Hi Rick,
I would like to ask you if you have used the %RCS_reg package, I am using it and I have found that I can run the results but I cannot generate the RCS curves. The log reports an error saying, I would like to consult you on what this can be done. Here is my code%RCS_Reg( infile= NHSALL_2, outfile= NHSALL_out,
Dir_data= ,
Main_spline_var= LBXVIDMS, ref_val= , AVK_msv= 0, knots_msv=5 50 95,
Oth_spline_var1=,
typ_reg= cox,
dep_var= mort_all, surv_time_var= PERyear_EXM,
adjust_var=sex1 sex2 race1 race2 race3 race4 BMI1 BMI2 BMI2 edu1
edu2 edu3 income1 income2 income3 drinking1 drinking2
drinking3 smoking1 smoking2 smoking3 phyacty1 phyacty2 phyacty3
DURDM1 DURDM2 DURDM3 MEDDM1 MEDDM2 MEDDM3 MEDDM4
hbp cvd ghpgp hbc,
Y_ref_line= 1, max_Xaxis= ,
exp_beta= 1, round=,
specif_val= );
quit;
. The variables are converted to dummy variables except for age which is continuous.

• No, I have never used that macro. Now that SAS regression routines support spline effects natively, there is no need to run a third-party macro from 2008.

14. Hi Rick,
This blog is really so helpful and appreciated. Your code for plotting linear gression with RCS works very well when there are no covariates, but how can I visualize the multivaraible-adjusted spline. Here is my code. I found this code only plots unadjusted predicted value.

proc glmselect data=diet outdesign(addinputvars fullmodel)=SplineBasis;
effect splT_fish= spline(raw_T_fish/ basis=tpf(noint) NATURALCUBIC details knotmethod=percentilelist(5 27.5 50 72.5 95));
model n3pufas=splT_fish AGE SEX BMI raw_cereals raw_pulses raw_meat raw_egg raw_milk
raw_potato raw_fruits raw_veg nuts vegoilfat seaweeds energy /selection=none;

proc reg data=SplineBasis plots=none;
model n3pufas=splT_fish_: ;
output out=regout predicted=fit LCLM=Lower UCLM=Upper;

proc sgplot data=regOut noautolegend Noborder nowall;
scatter x=raw_T_fish y=Fit;
series x=raw_T_fish y=Fit ;
Series y=Lower x=raw_T_fish ;
Series y=Upper x=raw_T_fish;

15. Hocine Tighiouart on

Hi Rick,
I am fitting a restricted cubic spline using proc phreg using the effect statement, see below. My question is when you use the baseline statement to predicted survival at certain time point for different values of the variable dpgfr3m, does below work if the covariate included in the model is spl? the dataset cox1_temp has value for dpgfr3m and not spl

proc phreg data=resample_treated;
effect spl=spline(dpgfr3m/details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
model v1_fu_d*v1_ev_d(0)=spl;
baseline covariates=cox1_temp out=cox1_temp1 survival=surv1 timelist=12 24 36 48;
by replicate;
run;

16. Md Bayzidur Rahman on

Hi Rick,
Is it possible to fit a Poisson model with offset that allowed the effect statement?

• PROC GLMIMMIX supports the EFFECT statement. You can use DIST=POISSON on the MODEL statement.