Are you struggling to master SAS Enterprise Miner? Does it feel as if there are ten duotrigintillion (i.e. a googol) choices and options that are keeping you from making progress? Why not proceed in the same way Neanderthals consumed a mammoth: piece by piece!
You first have to decide on your analytical objective: what would you like to achieve with this project (e.g. increase student retention at your school)? If possible, choose a domain that you are familiar with, as that will help you assess and interpret the generated results. This also provides the opportunity to finally exploit that “pet” data set you have nurtured for so many years.
If you then follow the SAS Data Mining methodology, SEMMA (Sample, Explore, Modify, Model, Assess), you already have a skeleton framework that you can exploit to solve your analytical objective. Choose one node from each of the five steps in the SAS data mining methodology, connect the nodes to your input data set in the order given in the methodology, and you have a functioning workflow or plan.
The five nodes that you might choose from the SEMMA categories could be:
• Sample: Data Partition (to split my data set into a training and a validation data set)
• Explore: Graph Explore (to produce histograms of variable distributions)
• Modify: Impute (to fill in missing values)
• Model: Regression (to model the relationship between inputs and the target)
• Assess: Model Comparison (to summarize and compare model fit measures)
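If it helps to see the shape of such a flow outside the point-and-click interface, here is a minimal sketch in plain Python. It is not SAS Enterprise Miner code and does not use any SAS API; every function name, the synthetic data set, and the choice of a one-input least-squares fit scored by mean squared error are my own assumptions for illustration. Each function simply stands in for one of the five nodes above, chained in SEMMA order.

```python
import random
import statistics


def make_demo_rows(n=200, seed=1):
    """Hypothetical data: y = 2 + 3x + noise, with every 10th input missing."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        x = rng.uniform(0, 10)
        y = 2 + 3 * x + rng.gauss(0, 1)
        rows.append((None if i % 10 == 0 else x, y))
    return rows


def sample(rows, train_fraction=0.7, seed=42):
    """Sample: shuffle and split into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def explore(rows, bins=5):
    """Explore: histogram counts of the non-missing input values."""
    xs = [x for x, _ in rows if x is not None]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range
    counts = [0] * bins
    for x in xs:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return counts


def modify(rows, fill_value):
    """Modify: impute missing inputs with a fill value."""
    return [(fill_value if x is None else x, y) for x, y in rows]


def model(rows):
    """Model: least-squares fit of y = intercept + slope * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return my - slope * mx, slope


def assess(rows, intercept, slope):
    """Assess: mean squared error of the fitted line on a data set."""
    return statistics.fmean([(y - (intercept + slope * x)) ** 2 for x, y in rows])


def run_flow(rows):
    """Connect the five steps in SEMMA order."""
    train, valid = sample(rows)
    histogram = explore(train)
    # The fill value comes from the training data only, then is applied to both sets.
    fill = statistics.fmean([x for x, _ in train if x is not None])
    train, valid = modify(train, fill), modify(valid, fill)
    intercept, slope = model(train)
    return {
        "n_train": len(train),
        "n_valid": len(valid),
        "histogram": histogram,
        "intercept": intercept,
        "slope": slope,
        "valid_mse": assess(valid, intercept, slope),
    }


result = run_flow(make_demo_rows())
print(result)
```

The point is only the structure: each step takes the output of the previous one, and any step can be swapped for a different node of the same category without disturbing the rest of the flow.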
Most nodes in SAS Enterprise Miner should run with their default settings, which you can then modify if the results do not meet your expectations (hence the suggestion to start with a problem from a familiar domain). If a node is not needed to solve a particular problem (e.g. you want to use the whole data set), there is usually a combination of settings that lets the data pass to the following node in the flow without modification.
The Node Reference section in the SAS Enterprise Miner Help provides a wealth of resources on Data Mining in general and also on each node in particular. The default settings are described and explained, and all the alternatives are enumerated. Run the workflow, inspect the results and voilà, the mammoth is gone (OK, only the teeth remain, but who would want to eat those anyway)!
The methodology proposed here is similar to that of SAS RPM (Rapid Predictive Modeler) that was developed to assist business users in generating sound models quickly. Once your confidence with using SAS Enterprise Miner increases, you can explore more nodes and delve deeper into the options and the algorithms behind the nodes.
One of my esteemed professors once told me that Mathematics is not a spectator sport; the same holds true for Data Mining. You can only get so far reading textbooks. So go on, get your hands dirty and the world as you know it will never be the same again!