Should you use principal component regression?

This article describes the advantages and disadvantages of principal component regression (PCR). This article also presents alternative techniques to PCR.

In a previous article, I showed how to compute a principal component regression in SAS. Recall that principal component regression is a technique for handling near collinearities among the regression variables in a linear regression. The PCR algorithm in most statistical software is more correctly called "incomplete" PCR because it uses only a subset of the principal components. Incomplete PCR means that you compute the principal components for the explanatory variables, keep only the first k principal components (which explain most of the variance among the regressors), and regress the response variable onto those k components.

The principal components that are dropped correspond to the near collinearities. Consequently, the standard errors of the parameter estimates are reduced, although the tradeoff is that the estimates are biased, and "the bias increases as more [principal components]are dropped" (Jackson, p. 276).

Some arguments in this article are from J. E. Jackson's excellent book, A User's Guide to Principal Components, (1991, pp. 271–278). Jackson introduces PCR and then immediately cautions against using it (p. 271). He writes that PCR "is a widely used technique," but "it also has some serious drawbacks." Let's examine the advantages and disadvantages of principal component regression.

Advantages of principal component regression

Principal component regression is a popular and widely used method. Advantages of PCR include the following:

PCR can perform regression when the explanatory variables are highly correlated or even collinear.
PCR is intuitive: you replace the basis {X1, X2, ..., Xp} with an orthogonal basis of principal components, drop the components that do not explain much variance, and regress the response onto the remaining components.
PCR is automatic: The only decision you need to make is how many principal components to keep.
The principal components that are dropped give insight into which linear combinations of variables are responsible for the collinearities.
PCR has a discrete parameter, namely the number of components kept. This parameter is very interpretable in terms of geometry (linear dimensions kept) and in terms of linear algebra (low-rank approximations).
You can run PCR when there are more variables than observations (wide data).

Drawbacks of principal component regression

The algorithm that is currently known as PCR is actually a misinterpretation of the original ideas behind PCR (Jolliffe, 1982, p. 201). When Kendall and Hotelling first proposed PCR in the 1950s, they proposed "complete" PCR, which means replacing the original variables by all the principal components, thereby stabilizing the numerical computations. Which principal components are included in the final model is determined by looking at the significance of the parameter estimates. By the early 1980s, the term PCR had changed to mean "incomplete PCR."

The primary argument against using (incomplete) principal component regression can be summarized in a single sentence: Principal component regression does not consider the response variable when deciding which principal components to drop. The decision to drop components is based only on the magnitude of the variance of the components.

There is no a priori reason to believe that the principal components with the largest variance are the components that best predict the response. In fact, it is trivial to construct an artificial example in which the best predictor is the last component, which will surely be dropped from the analysis. (Just define the response to be the last principal component!) More damning, Jolliffe (1982, p. 302) presents four examples from published papers that advocate PCR, and he shows that some of the low-variance components (which were dropped) have greater predictive power than the high-variance components that were kept. Jolliffe concludes that "it is not necessary to find obscure or bizarre data in order for the last few principal components to be important in principal component regression. Rather it seems that such examples may be rather common in practice."

There is a hybrid version of PCR that enables you to use cross validation and the predicted residual sum of squares (PRESS) criterion to select how many components to keep. (In SAS, the syntax is proc pls method=PCR cv=one cvtest(stat=press).) Although this partially addresses the issue by including the response variable in the selection of components, it is still the case that the first k components are selected and the last p – k are dropped. The method never keeps the first, third, and sixth components, for example.

Alternatives to principal component regression

Some alternatives to principal component regression include the following:

Ridge regression: In ridge regression, a diagonal matrix is added to the X`X matrix so that it becomes better conditioned. This results in biased parameter estimates. You can read an explanation of ridge regression and how to compute it by using PROC REG in SAS.
Complete PCR: As mentioned previously, use the PCs as the variables and keep the components whose parameter estimates are significant.
Complete PCR with variable selection: Use the PCs as the variables and use the variable-selection techniques to decide which components to retain. However, if your primary goal is variable reduction, then use variable-selection techniques on the original variables.
Partial Least Squares (PLS): Partial least square regression is similar to PCR in that both select components that explain the most variance in the model. The difference is that PLS incorporates the response variable. That is, the components that are produced are those that explain the most variance in the explanatory AND response variables. In SAS, you can compute a PLS regression by using PROC PLS with METHOD=PLS or METHOD=SIMPLS. You will probably also want to use the CV and CVTEST options.

Summary

In summary, principal component regression is a technique for computing regressions when the explanatory variables are highly correlated. It has several advantages, but the main drawback of PCR is that the decision about how many principal components to keep does not depend on the response variable. Consequently, some of the variables that you keep might not be strong predictors of the response, and some of the components that you drop might be excellent predictors. A good alternative is partial least squares regression, which I recommend. In SAS, you can run partial least squares regression by using PROC PLS with METHOD=PLS.

References

Jackson, J. E. (1991), A User’s Guide to Principal Components, New York: John Wiley & Sons.
Jolliffe, I, T. (1982) “A Note on the Use of Principal Components in Regression,” J. Royal Statistical Society.

2 Comments

Peter Flom on October 25, 2017 3:38 pm

"PCR is automatic" .... is that an advantage or a disadvantage? I'd say it is some of each, but maybe more disadvantage. Letting the computer do your thinking is a way of telling your boss to pay you less.

Pingback: Principal component regression in SAS - The DO Loop

Blogs