I started out as a Psychology major. During my third year as an undergraduate, I was hired as a research assistant in my advisor's cognitive psychology lab. Through this and progressively more involved research experience, I quickly grew to love statistics, and by the end of that year I decided to declare it as a second major. My first introduction to SAS came as a fourth-year undergraduate psychology student - still new to my statistics curriculum and working on a large-scale meta-analysis project spanning years of data. I had never programmed before seeing my first SAS procedure. I broke down in tears, terrified at what I had gotten myself into. I toughed it out (with help from my statistics professor), finished my psychology honors thesis with top grades, and later used SAS in my statistics thesis for good measure.
About a year later, in 2011, that same statistics professor encouraged us to submit our work for presentation at MWSUG, sweetening the deal with a promise of extra credit if we did. I jumped on that opportunity and submitted both my psychology thesis and my statistics thesis that night. A couple of months later, I received an email… they accepted both of my papers and awarded me a FULL student scholarship to attend!
I have come a long way from presenting my first thesis projects (I just arrived home from my 27th conference last weekend). I have learned to love not only SAS, but the statistics behind each procedure. This year, at MWSUG 2016 in Cincinnati, OH, I will be presenting three projects. One project will be in ePoster format. As the chair of that section (yes, this is correct - I've gone from terrified student to section chair!), I felt the need to support it with my own research as well. This project is dedicated to the common and very pesky concept of multicollinearity.
What is multicollinearity? It is the statistical phenomenon wherein a perfect or near-perfect linear relationship exists among two or more of the predictor variables themselves - not between the predictors and the outcome of interest. Simply put, it is the existence of predictor co-dependence. Conveniently, it is quite easy to detect. You can do so with three simple options and one additional procedure, such as those given in the example below:
/* Examination of the Correlation Matrix */
Proc corr data=temp;
   Var hypertension aspirin hicholesterol anginachd smokingstatus
       obese_BMI exercise _AGE_G sex alcoholbinge;
Run;

/* Multicollinearity Investigation: VIF TOL COLLIN */
Proc reg data=temp;
   Model stroke = hypertension aspirin hicholesterol anginachd smokingstatus
                  obese_BMI exercise _AGE_G sex alcoholbinge
                  / vif tol collin;
Run;
Quit;
Through the CORR procedure, we can examine the correlation matrix and manually check for any predictor variables that show high correlation with other predictors. In the REG procedure, we can specify the VIF, TOL, and COLLIN options in the MODEL statement to request the variance inflation factor, tolerance, and collinearity diagnostics for each predictor. Since the VIF is simply the reciprocal of the tolerance, a VIF above 10 (equivalently, a tolerance below 0.1) is a common rule-of-thumb warning sign.
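For intuition about what the VIF and TOL options are actually measuring, here is a rough sketch outside of SAS entirely - a small Python example (the data and variable names are invented purely for illustration). The VIF for a predictor is 1/(1 − R²), where R² comes from regressing that predictor on all of the other predictors; the tolerance is just 1 − R².

```python
import numpy as np

def vif_and_tolerance(X):
    """For each column of predictor matrix X (rows = observations),
    regress it on the remaining columns and return (VIF, tolerance)."""
    n, p = X.shape
    results = []
    for j in range(p):
        y = X[:, j]                       # treat predictor j as the response
        others = np.delete(X, j, axis=1)  # all remaining predictors
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r_sq = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        tol = 1 - r_sq                    # tolerance
        results.append((1 / tol, tol))    # VIF = 1 / tolerance
    return results

# Hypothetical data: x2 is nearly a copy of x1, so both get large VIFs,
# while the independent x3 stays near the ideal VIF of 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # almost collinear with x1
x3 = rng.normal(size=200)                  # unrelated predictor
X = np.column_stack([x1, x2, x3])
for name, (vif, tol) in zip(["x1", "x2", "x3"], vif_and_tolerance(X)):
    print(f"{name}: VIF={vif:.1f}, tolerance={tol:.3f}")
```

This is just the arithmetic behind the diagnostics; in practice the VIF and TOL options on the MODEL statement hand you these numbers directly.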
Would you like to learn more about how to interpret the results these procedures produce? Would you like to learn ways to control for multicollinearity once it has been detected? Come check out my poster at MWSUG 2016, October 9-11 in Cincinnati! I would love to chat about your multicollinearity issues, interests, or any other curious questions.