Examining your inputs with SAS Enterprise Miner


To build accurate predictive models, you need clean data. A good understanding of the data also helps you interpret the models correctly and make sound business decisions.

SAS Enterprise Miner can help you achieve the twin goals of improving the quality of your data and gaining a good understanding of the variables. Here are a few examples of how you can examine your inputs before building a model.

(1)    To examine the distributions of your input variables:

  • Right-click the data source
  • Select “Explore”
  • Change “Sample Method” to “Random”
  • Set “Fetch Size” to “Max”
  • Click the “Apply” button
  • Click the “Plot” button
  • Select “Histogram”
  • Select “Percent” in the “Response Statistics” box
  • Select a variable and assign it the role of “X”
  • Click “Finish”
  • To plot another variable, click “Actions” on the menu bar, select “Histogram”, and repeat the steps above
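
For readers who want to see what the percent histogram computes, here is a rough Python sketch of the same idea outside Enterprise Miner: draw a random sample, bin one variable, and report the percentage of observations in each bin. The data, the sample size, the number of bins, and the function name percent_histogram are all assumptions made for illustration; this is not how Enterprise Miner builds its plot internally.

```python
import random

def percent_histogram(values, n_bins=10):
    """Return (bin_edges, percents): the share of observations in each equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    percents = [100.0 * c / len(values) for c in counts]
    return edges, percents

random.seed(0)
# Made-up, right-skewed input variable (illustration only)
population = [random.lognormvariate(0, 1) for _ in range(10_000)]
sample = random.sample(population, 2_000)  # simple random sample

edges, percents = percent_histogram(sample, n_bins=10)
for left, right, pct in zip(edges, edges[1:], percents):
    # One text bar per bin; each '#' stands for roughly 2 percent
    print(f"{left:8.2f} to {right:8.2f}: {pct:5.1f}% {'#' * round(pct / 2)}")
```

A long tail of nearly empty bins on one side of the printout is the text equivalent of a skewed histogram.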

(2)    A skewed variable can sometimes be made more symmetric by applying a transformation, as discussed in Chapter 2 of my new book, Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Second Edition.
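
As a hedged illustration of the effect, the Python sketch below applies a log transformation, which is one common choice for right-skewed, strictly positive variables and is assumed here only as an example; it is not necessarily the transformation recommended in the book. The variable name income and its values are made up.

```python
import math
import random

def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(1)
# Made-up, right-skewed input variable (illustration only)
income = [random.lognormvariate(10, 0.8) for _ in range(5_000)]
log_income = [math.log(v) for v in income]  # requires strictly positive values

print(f"skewness before log: {skewness(income):.2f}")
print(f"skewness after log:  {skewness(log_income):.2f}")
```

A skewness near zero after the transformation indicates a roughly symmetric distribution.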

(3)    Use the StatExplore node, as described in Chapter 2 of my book. Its output window shows the mode for class variables, and the mean, median, and other statistics for interval variables, by target level.
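
To make "statistics by target level" concrete, here is a minimal Python sketch that groups a handful of made-up records by a binary target and reports the mean and median of an interval input and the mode of a class input. The records and the function name summarize_by_target are hypothetical; the StatExplore node reports more statistics than these.

```python
from collections import defaultdict
from statistics import mean, median, mode

# Made-up records: (target, interval input, class input)
records = [
    (1, 52.0, "urban"), (1, 61.0, "urban"), (1, 47.0, "rural"),
    (0, 30.0, "rural"), (0, 35.0, "rural"), (0, 90.0, "urban"), (0, 28.0, "rural"),
]

def summarize_by_target(rows):
    """Group rows by target level and summarize each input within the group."""
    groups = defaultdict(list)
    for target, interval_value, class_value in rows:
        groups[target].append((interval_value, class_value))
    summary = {}
    for target, members in groups.items():
        interval_values = [m[0] for m in members]
        class_values = [m[1] for m in members]
        summary[target] = {
            "n": len(members),
            "mean": mean(interval_values),
            "median": median(interval_values),
            "mode": mode(class_values),
        }
    return summary

summary = summarize_by_target(records)
for target in sorted(summary):
    print(target, summary[target])
```

In this toy data the mean for target level 0 is pulled up by a single large value while the median is not, which is the kind of difference that a by-target summary makes easy to spot.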

(4)    Use the Decision Tree node on each input individually, or on sets of inputs, to uncover patterns in the data. See Chapter 4 of my book for a detailed explanation of the Decision Tree node.
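
The idea of running a tree on a single input can be sketched as a one-split search: try every cut point on that input and keep the one that best separates the target levels. The Python sketch below uses Gini impurity as the splitting criterion, which is an assumption made for illustration; the Decision Tree node offers its own criteria and options. The data and the function names gini and best_split are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best rule 'x < threshold'.

    The threshold is None when no split improves on the unsplit sample.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))  # baseline: impurity with no split
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up data: one interval input and a binary target
age = [22, 25, 31, 38, 45, 52, 58, 63]
responded = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, score = best_split(age, responded)
print(f"best cut: age < {threshold}, weighted Gini = {score:.4f}")
```

A cut point that lowers impurity sharply suggests the input carries a pattern worth investigating; an input with no useful cut returns a threshold of None.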

(5)    Use the Replacement node to set ceilings on extreme values, or the Filter node to delete observations with extreme values. See Chapter 2 of my book for how these two nodes differ in their treatment of extreme values.
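
The difference between the two treatments can be sketched in a few lines of Python: capping keeps every observation but replaces values above a ceiling, while filtering removes those observations altogether. The data, the ceiling of 1,000, and the function names are hypothetical and chosen only for illustration; the nodes themselves offer several ways to define the limits.

```python
def cap_at_ceiling(values, ceiling):
    """Replace values above the ceiling with the ceiling; the row count is unchanged."""
    return [min(v, ceiling) for v in values]

def drop_above_ceiling(values, ceiling):
    """Delete values above the ceiling; the row count shrinks."""
    return [v for v in values if v <= ceiling]

balances = [120, 340, 95, 410, 15_000, 275]  # made-up data with one extreme value
ceiling = 1_000  # hypothetical limit chosen for illustration

capped = cap_at_ceiling(balances, ceiling)
filtered = drop_above_ceiling(balances, ceiling)

print("capped:  ", capped, "rows:", len(capped))
print("filtered:", filtered, "rows:", len(filtered))
```

Capping preserves the sample size at the cost of altering values; filtering leaves the remaining values untouched but discards rows, which matters when the extreme cases are rare events you want to model.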

Learn more about Kattamuri Sarma and his new book Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Second Edition.


About the Author

Kattamuri Sarma

Economist and Statistician

Kattamuri S. Sarma, PhD, is an economist and statistician with 30 years of experience in American business, including stints with IBM and AT&T. He is the founder and president of Ecostat Research Corp., a consulting firm specializing in predictive modeling and forecasting. He has been a SAS user since 1992, and he has extensive experience with multivariate statistical methods, econometrics, decision trees, and data mining with neural networks. Dr. Sarma is a SAS Certified Professional and a SAS Alliance Partner. Dr. Sarma received his PhD in economics from the University of Pennsylvania, where he worked under the supervision of Nobel Laureate Lawrence R. Klein.
