Blogs

Tag: linear regression

Analytics | Learn SAS

Graph of norm of solutions to the singular system A*b=c. The norm is plotted for vectors b + alpha*x_Null where b is the Moore-Penrose solution and x_Null is a basis for the nullspace of A.

Rick Wicklin, November 21, 2018

Generalized inverses for matrices

A data analyst asked how to compute parameter estimates in a linear regression model when the underlying data matrix is rank deficient. This situation can occur if one of the variables in the regression is a linear combination of other variables. It also occurs when you use the GLM parameterization

Read More
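As a small illustration of the idea in the excerpt and figure caption above (this is not code from the article), the following Python sketch uses a hand-picked singular 2x2 system. The matrix `A`, the right-hand side `c`, and the helper names are assumptions made for this example. It checks that every vector b + alpha*x_Null solves A*b = c and that the norm is smallest at alpha = 0, where b is the Moore-Penrose solution.

```python
import math

# Hypothetical singular system chosen for illustration: both rows are identical.
A = [[1.0, 1.0],
     [1.0, 1.0]]
c = [2.0, 2.0]

# For this A, the Moore-Penrose inverse is [[0.25, 0.25], [0.25, 0.25]],
# so the minimum-norm solution is b = pinv(A) * c = [1, 1].
b = [1.0, 1.0]

# Unit vector that spans the null space of A (A * x_null = 0).
x_null = [1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v_i * v_i for v_i in v))

def solution(alpha):
    """Another solution of A*b = c: shift b along the null space."""
    return [b_i + alpha * n_i for b_i, n_i in zip(b, x_null)]

alphas = [-2.0, -1.0, 0.0, 1.0, 2.0]
norms = [norm(solution(a)) for a in alphas]
for a, nrm in zip(alphas, norms):
    print(f"alpha = {a:5.1f}   norm = {nrm:.4f}")
```

Because b is orthogonal to x_null in this example, the norm equals sqrt(2 + alpha^2), which is the kind of curve with a single minimum that the figure caption describes.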

Programming Tips

Rick Wicklin, August 27, 2018

On the assumptions (and misconceptions) of linear regression

A frequent topic on SAS discussion forums is how to check the assumptions of an ordinary least squares linear regression model. Some posts indicate misconceptions about the assumptions of linear regression. In particular, I see incorrect statements such as the following: Help! A histogram of my variables shows that they

Read More

Analytics | Programming Tips

Rick Wicklin, April 25, 2018

An easier way to run thousands of regressions

SAS programmers on SAS discussion forums sometimes ask how to run thousands of regressions of the form Y = B0 + B1*X_i, where i=1,2,.... A similar question asks how to solve thousands of regressions of the form Y_i = B0 + B1*X for thousands of response variables. I have previously

Read More
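To make the problem in the excerpt above concrete, here is a minimal Python sketch of fitting Y = B0 + B1*X_i once per explanatory variable. It is not the SAS approach that the article goes on to describe. The function name `simple_ols` and the tiny data set are hypothetical, and the closed-form formulas are the standard ones for one-variable least squares.

```python
def simple_ols(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# One response and several candidate regressors (made-up values).
y = [1.0, 3.0, 5.0, 7.0]
X = {
    "x1": [0.0, 1.0, 2.0, 3.0],
    "x2": [3.0, 2.0, 1.0, 0.0],
    "x3": [1.0, 0.0, 1.0, 0.0],
}

# One regression per explanatory variable, all against the same response.
estimates = {name: simple_ols(x, y) for name, x in X.items()}
for name, (b0, b1) in estimates.items():
    print(f"{name}: intercept = {b0:.3f}, slope = {b1:.3f}")
```

The same loop handles the second form in the excerpt (many responses Y_i, one X) if you iterate over a dictionary of responses instead.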

Programming Tips

Rick Wicklin, April 18, 2018

The sweep operator: A fundamental operation in regression

The sweep operator performs elementary row operations on a system of linear equations. The sweep operator enables you to build regression models by "sweeping in" or "sweeping out" particular rows of the X`X matrix. As you do so, the estimates for the regression coefficients, the error sum of squares, and

Read More
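The following Python sketch shows one common textbook form of the sweep operator applied to the augmented cross-product matrix [X`X X`y; y`X y`y]. It is an illustration under stated assumptions, not the article's SAS/IML code: the function name `sweep`, the tolerance, and the four-observation data set are all made up for this example.

```python
def sweep(matrix, k, tol=1e-12):
    """Return a copy of a square matrix after sweeping on pivot k."""
    a = [row[:] for row in matrix]
    d = a[k][k]
    if abs(d) < tol:
        raise ValueError("pivot is (near) zero; column is linearly dependent")
    a[k] = [v / d for v in a[k]]             # scale the pivot row
    for i in range(len(a)):
        if i == k:
            continue
        b = a[i][k]
        a[i] = [v - b * p for v, p in zip(a[i], a[k])]  # eliminate column k
        a[i][k] = -b / d
    a[k][k] = 1.0 / d
    return a

# Made-up data: intercept column, one regressor x, and response y.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0, 5.0]
Z = [[1.0, xi, yi] for xi, yi in zip(x, y)]   # columns: 1, x, y

# Cross-product matrix Z`Z, which contains X`X, X`y, and y`y as blocks.
S = [[sum(r[i] * r[j] for r in Z) for j in range(3)] for i in range(3)]

# "Sweep in" the intercept (pivot 0) and then x (pivot 1).
swept = sweep(sweep(S, 0), 1)
b0, b1 = swept[0][2], swept[1][2]   # parameter estimates
sse = swept[2][2]                   # error sum of squares
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.3f}")
```

After both pivots are swept, the upper-left 2x2 block holds the inverse of X`X, the last column holds the estimates, and the lower-right element holds the error sum of squares. In this convention, sweeping the same pivot a second time restores the previous matrix, which corresponds to "sweeping out" an effect.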

Analytics

Principal component regression in SAS: Loadings plot

Rick Wicklin, October 25, 2017

Should you use principal component regression?

This article describes the advantages and disadvantages of principal component regression (PCR). This article also presents alternative techniques to PCR. In a previous article, I showed how to compute a principal component regression in SAS. Recall that principal component regression is a technique for handling near collinearities among the regression

Read More

Analytics | Learn SAS

Rick Wicklin, October 23, 2017

Principal component regression in SAS

A common question on discussion forums is how to compute a principal component regression in SAS. One reason people give for wanting to run a principal component regression is that the explanatory variables in the model are highly correlated with each other, a condition known as multicollinearity. Although principal component

Read More

Advanced Analytics | Programming Tips

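Makoto Unemi (畝見真), September 4, 2017

Linear regression in SAS Viya

This article shows how to perform linear regression in SAS Viya, using Python. SAS Viya provides three main approaches to linear regression. Polynomial regression: provided by the simple action set. Generalized linear regression or general linear regression: provided by the regression action set. Regression by machine learning: provided by the various machine learning action sets. This time, we use a simple sine curve to build all three kinds of regression model. [Sine curve] We create a sine curve over the range -4 ≤ x < 4. Simply computing $$y = \sin(x)$$ would not be very interesting, so I added random noise to create the data shown below, which serves as the training data. The blue dotted line is the curve $$y = \sin(x)$$, and the gray circles plot $$y = \sin(x)$$ with random noise added. You can see that the blue dotted line passes through the center of the gray points. This time, we run linear regression with the gray points as the training data. The gray points look fairly scattered, but a regression model should be able to draw a curve through their center, like the blue dotted line. The training data set is named "sinx". The explanatory variable is "x" and the target variable is "y". To score with the model produced by each method, we also prepare a data set named "rangex" that contains "x" values in steps of 0.01 over the range -4 ≤ x < 4. First, create a CAS session and upload each data set to CAS.

    import swat
    host = "localhost"
    port = 5570
    user = "cas"
    password = "p@ssw0rd"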

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin, April 26, 2017

Visualize a design matrix

Most SAS regression procedures support a CLASS statement, which internally generates dummy variables for categorical variables. I have previously described what dummy variables are and how they are used. I have also written about how to create design matrices that contain dummy variables in SAS, and in particular how to

Read More

Advanced Analytics | Learn SAS

Rick Wicklin, February 13, 2017

An easy way to run thousands of regressions in SAS

A common question on SAS discussion forums is how to repeat an analysis multiple times. Most programmers know that the most efficient way to analyze one model across many subsets of the data (perhaps each country or each state) is to sort the data and use a BY statement to

Read More

Advanced Analytics

Rick Wicklin, February 1, 2017

Simulate many samples from a linear regression model

In a previous article, I showed how to simulate data for a linear regression model with an arbitrary number of continuous explanatory variables. To keep the discussion simple, I simulated a single sample with N observations and p variables. However, to use Monte Carlo methods to approximate the sampling distribution

Read More

Advanced Analytics

Rick Wicklin, January 25, 2017

Simulate data for a linear regression model

This article shows how to simulate a data set in SAS that satisfies a least squares regression model for continuous variables. When you simulate to create "synthetic" (or "fake") data, you (the programmer) control the true parameter values, the form of the model, the sample size, and magnitude of the

Read More
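The simulation idea in the excerpt above can be sketched in a few lines of Python. This is an illustrative sketch, not the article's SAS program: the function name `simulate_regression`, the standard normal regressors, the parameter values, and the seed are all assumptions chosen for the example.

```python
import random

def simulate_regression(n, beta, sigma, seed=1):
    """Simulate n observations from y = beta[0] + sum_j beta[j]*x_j + eps.

    Regressors are drawn from Normal(0, 1) and eps from Normal(0, sigma).
    Returns (X, y), where X is a list of rows of regressor values.
    """
    rng = random.Random(seed)          # private generator for reproducibility
    p = len(beta) - 1                  # number of continuous regressors
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        X.append(row)
        y.append(eta + rng.gauss(0.0, sigma))
    return X, y

# The programmer controls the true parameters, sample size, and error scale.
true_beta = [1.0, 2.0, -1.0, 0.5]      # intercept followed by three slopes
X, y = simulate_regression(n=1000, beta=true_beta, sigma=1.0, seed=123)

# Recover the simulated errors to confirm that they are centered near zero.
errors = [yi - (true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], row)))
          for row, yi in zip(X, y)]
print(len(X), len(X[0]), sum(errors) / len(errors))
```

Calling the function repeatedly with different seeds gives many independent samples, which is the starting point for a Monte Carlo study of the sampling distribution of the estimates.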

Rick Wicklin, March 2, 2016

Dummy variables in SAS/IML

Last week I showed how to create dummy variables in SAS by using the GLMMOD procedure. The procedure enables you to create design matrices that encode continuous variables, categorical variables, and their interactions. You can use dummy variables to replace categorical variables in procedures that do not support a CLASS

Read More

Rick Wicklin, February 24, 2016

Four ways to create a design matrix in SAS

SAS programmers sometimes ask, "How do I create a design matrix in SAS?" A design matrix is a numerical matrix that represents the explanatory variables in regression models. In simple models, the design matrix contains one column for each continuous variable and multiple columns (called dummy variables) for each classification

Read More

Learn SAS

Rick Wicklin, February 22, 2016

Create dummy variables in SAS

A dummy variable (also known as indicator variable) is a numeric variable that indicates the presence or absence of some level of a categorical variable. The word "dummy" does not imply that these variables are not smart. Rather, dummy variables serve as a substitute or a proxy for a categorical

Read More
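As a quick illustration of the definition in the excerpt above, this Python sketch builds one indicator column per level of a categorical variable. It is not the SAS technique from the article. The function name `dummy_columns` and the sample values are hypothetical.

```python
def dummy_columns(values):
    """Map each level of a categorical variable to a 0/1 indicator column."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values]
            for level in levels}

# Made-up categorical variable with three levels.
origin = ["Asia", "Europe", "USA", "Asia", "USA"]
dummies = dummy_columns(origin)

for level, column in dummies.items():
    print(f"{level:8s} {column}")
```

Each observation has a 1 in exactly one column, so the columns sum to a column of ones. If the model also contains an intercept, the design matrix is therefore rank deficient, which is why some parameterizations drop one level.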

Rick Wicklin, June 12, 2013

How to interpret a residual-fit spread plot

In a previous blog post, I described how to use a spread plot to compare the distributions of several variables. Each spread plot is a graph of centered data values plotted against the estimated cumulative probability. Thus, spread plots are similar to a (rotated) plot of the empirical cumulative distribution

Read More

Advanced Analytics

Rick Wicklin, March 13, 2013

The case of spilled coffee and the regression intercept

Argh! I've just spilled coffee on output that shows the least squares coefficients for a regression model that I was investigating. Now the parameter estimate for the intercept is completely obscured, although I can still see the parameter estimates for the coefficients of the continuous explanatory variable. What can I

Read More

Learn SAS

Rick Wicklin, February 27, 2013

How to use PROC SGPLOT to display the slope and intercept of a regression line

A SAS user asked an interesting question on the SAS/GRAPH and ODS Graphics Support Forum. The question is: Does PROC SGPLOT support a way to display the slope of the regression line that is computed by the REG statement? Recall that the REG statement in PROC SGPLOT fits and displays

Read More