To build accurate predictive models you need clean data. A good understanding of the data helps you interpret the models correctly and make sound business decisions.
SAS Enterprise Miner can help you achieve the twin goals of improving the quality of your data and gaining a good understanding of the variables. Here are a few examples of how you can examine your inputs prior to model building.
(1) To examine the distributions of your input variables:
- Right-click the data source
- Select Explore
- Change Sample Method to Random
- Set Fetch Size to Max
- Click the “Apply” button
- Then click the “Plot” button
- Select “Histogram”
- Select “Percent” in the “Response Statistics” box
- Select a variable and assign it the role of “X”
- Click “Finish”
- To plot another variable click “Actions” from the menu bar, select “Histogram” and repeat the above steps.
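For readers who want to see what a percent histogram computes, here is a minimal Python sketch outside of Enterprise Miner. The function name `percent_histogram`, the bin width, and the income values are all made up for illustration; this does not reproduce the Explore window's own binning.

```python
from collections import Counter

def percent_histogram(values, bin_width):
    """Return {bin_start: percent of observations} for fixed-width bins."""
    # Map each value to the lower edge of its bin, then count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    total = len(values)
    return {start: 100.0 * n / total for start, n in sorted(counts.items())}

# Hypothetical input variable (illustrative values only).
incomes = [12, 15, 18, 22, 25, 27, 31, 38, 45, 95]
hist = percent_histogram(incomes, bin_width=20)

for start, pct in hist.items():
    # One '#' per 5 percentage points gives a rough text plot.
    print(f"{start:>3} to {start + 20:<3} {'#' * round(pct / 5)}")
```

A long right tail, like the single value in the highest bin here, is the kind of pattern the histogram is meant to reveal.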
(2) Sometimes a skewed variable can be made more symmetric by applying a transformation, as discussed in Chapter 2 of my new book Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Second Edition.
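As a rough illustration of the idea, the Python sketch below compares the skewness of a right-skewed variable before and after a log transformation. The log is only one common choice and is an assumption here, not necessarily the transformation the book recommends; the data and the `skewness` helper are hypothetical.

```python
import math

def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed input (all values positive, so log is defined).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]
logged = [math.log(v) for v in raw]

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```

On this sample the transformed variable is still right-skewed, but noticeably less so than the original.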
(3) Use the StatExplore node as described in Chapter 2 of my book. The output window shows the mode for class variables, and the mean, median, and other statistics for interval variables, by target level.
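To make "statistics by target level" concrete, here is a small Python sketch that groups records by the target and reports the mean and median of an interval variable and the mode of a class variable. The records, the variable names (`age`, `region`), and the function `summarize_by_target` are hypothetical, and the real node reports many more statistics.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Hypothetical data: a binary target, one interval input, one class input.
records = [
    {"target": 0, "age": 30, "region": "West"},
    {"target": 0, "age": 40, "region": "West"},
    {"target": 0, "age": 50, "region": "East"},
    {"target": 1, "age": 20, "region": "East"},
    {"target": 1, "age": 22, "region": "East"},
    {"target": 1, "age": 30, "region": "West"},
]

def summarize_by_target(rows, interval_var, class_var):
    """Mean and median of interval_var, mode of class_var, per target level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["target"]].append(row)
    summary = {}
    for level, members in groups.items():
        numbers = [m[interval_var] for m in members]
        classes = Counter(m[class_var] for m in members)
        summary[level] = {
            "mean": mean(numbers),
            "median": median(numbers),
            "mode": classes.most_common(1)[0][0],
        }
    return summary

summary = summarize_by_target(records, "age", "region")
for level, stats in sorted(summary.items()):
    print(level, stats)
```

Large differences between target levels, such as the lower ages for target level 1 in this made-up sample, suggest that an input may be useful for prediction.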
(4) Use the Decision Tree node for each input individually, or for sets of inputs, to uncover patterns in the data. See Chapter 4 of my book for a detailed explanation of the Decision Tree node.
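The sketch below shows, in Python, the simplest version of this idea: a one-level tree on a single input that searches for the cut point that best separates a binary target. It uses Gini impurity as the criterion, which is an assumption for illustration and not a description of how the Decision Tree node chooses its splits; the data and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best single cut on xs."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut point between identical input values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            # Place the threshold midway between the two neighboring values.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best

# Hypothetical input and binary target.
ages = [22, 25, 31, 38, 44, 52, 58, 63]
responded = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(ages, responded)
print(f"best cut: age < {threshold}, weighted impurity {impurity:.2f}")
```

A clean cut like the one in this made-up sample is rare in practice, but even a modest drop in impurity shows that the input carries information about the target.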
(5) You can use the Replacement node to set ceilings on extreme values, or use the Filter node to delete observations with extreme values. See Chapter 2 of my book to see how these two nodes differ in their treatment of extreme values.
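The difference between the two approaches can be sketched in a few lines of Python. The limits (0 and 20), the data, and the function names are hypothetical; how the nodes themselves determine their limits is not reproduced here.

```python
def cap(values, lower, upper):
    """Replacement-style treatment: pull extreme values back to the limits."""
    return [min(max(v, lower), upper) for v in values]

def drop_extremes(values, lower, upper):
    """Filter-style treatment: remove observations outside the limits."""
    return [v for v in values if lower <= v <= upper]

# Hypothetical input with one extreme value.
values = [3, 5, 7, 9, 250]

capped = cap(values, lower=0, upper=20)
filtered = drop_extremes(values, lower=0, upper=20)

print("capped:  ", capped)    # same number of observations, extreme value capped
print("filtered:", filtered)  # the extreme observation is gone
```

Capping keeps every observation in the training data, while filtering shrinks the data set, which matters when the extreme cases are also the ones you most want to predict.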
Learn more about Kattamuri Sarma and his new book Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Second Edition.