This week's SAS tip is from Barry de Ville and Padraic Neville's enterprising new book Decision Trees for Analytics Using SAS Enterprise Miner. Drawing on their combined expertise, de Ville and Neville have created a comprehensive guide to decision tree theory, use, and applications.
If you're interested in this week's free tip and want to learn more about the topic or book, visit our online catalog. You'll find a free book excerpt, example code and data, and more...
The following excerpt is from SAS Press authors Barry de Ville and Padraic Neville and their book “Decision Trees for Analytics Using SAS Enterprise Miner.” Copyright © 2013, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED. (Please note that results may vary depending on your version of SAS software.)
Crafting the Decision Tree Structure for Insight and Exposition
Here we talk about the art of growing a decision tree for insight (extracting conceptually appealing information from data) and exposition (displaying the decision tree results in a form that communicates insight and informs policy and planning). The goals of insight and exposition differ from, and complement, the goal of using decision trees to extract key relationships and predictive structure from data; serving them requires maintaining an overall form, structure, and sequence of branch formation in the decision tree.
You might find it useful to think in terms of telling a story when growing a decision tree to reveal information and communicate results. The storyline and theme need to support the conceptual framework of the audience. The story illuminates key interests and might contain a few plot twists that upset conventional ways of looking at the data and thereby pave the way for insight and improved understanding.
In telling the story, it is important to have a beginning, middle, and end. The story should be told in terms that are familiar to the audience. And, while it can be useful to include a few twists in the plot, the insights that are revealed should be plausible. The best way to ensure a good storyline is to construct the decision tree in line with the conceptual model of the area that the decision tree is designed to illuminate. For example, if you are looking at purchase behavior, then the attributes of the decision tree need to reflect concepts that are relevant to purchase behavior. If the application is quality control and you are looking at part failures, then the attributes of the decision tree need to reflect concepts that are relevant to part failures.
Every application area in which expository decision trees can be deployed is characterized by concepts that exist, explicitly or implicitly, in the minds of the audience. These concepts can be measured by different entities in the data set and linked in different ways, particularly when the empirical characteristics of the data suggest different links. However, there is always an underlying storyline: a presumed relation, and a presumed cause and effect or sequence of causes and effects. Some decision trees are more comprehensive than others. One characteristic of a comprehensive decision tree is that the data in the conceptual area being explored contains a range of related attributes. As a result, the story told by the decision tree reflects a set of relationships that is both plausible and fairly complete (i.e., to the extent possible, the substantial drivers of the relationships being explored have been included).
To build this type of decision tree for exposition, the following tasks should be performed:
1.) Define the business and/or scientific question.
2.) Determine the main features of a conceptual model that describes the major constructs involved in resolving the question.
3.) Determine the data measures, fields, and field values that will become the operational components of the conceptual model when the model is translated to form the decision tree.
4.) Develop the storyline (i.e., the presumed sequence of events as the operational components unfold to tell the story).
5.) Determine key relationships or potential plot twists to be examined in shaping the form of the decision tree.
6.) Translate the tree results into a form that illuminates the original question.