I was first exposed to decision trees several years ago while working on a predictive modeling project for Baylor University. At first glance, the graphics resembled an organizational chart, and the terminology associated with trees seemed different and strange: root, branch, leaves, pruning, splitting, random forest, maximum tree, optimum tree, node purity, stump, bagging, boosting, and many other terms that might be associated with forestry. After my initial exposure to these terms and decision tree concepts, I soon learned that the decision tree technique is one of the most intuitive and popular data mining methods.
There are many definitions of data mining, but in this context data mining is the science and art of exploring large amounts of data in order to discover useful patterns. Data miners can apply numerous tools and techniques for discovery and prediction, and, as mentioned above, one such tool is the decision tree. An important characteristic of decision trees is that they are easy to use and simple to explain; being a simple person, I like decision trees. A decision tree can be described as a set of algorithms whose purpose is to develop a hierarchical set of rules that describe how to partition a data set into smaller data sets. The objective is for the observations within each smaller data set to be more similar to one another than the observations in the original set. Trees are successful at explaining the relationship between predictor variables and the target variable, which makes them a good tool for predictive modeling. In data analysis without a target variable, decision trees are useful for detecting patterns in the data and for grouping or clustering the data.
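To make "partition a data set into smaller, more similar data sets" concrete, here is a minimal sketch of a single split on one numeric predictor. It assumes Gini impurity as the node-purity measure (a common choice in CART-style trees; other tools use entropy or chi-square-based criteria), and the `ages`/`bought` data are hypothetical. A full tree would apply the same search recursively to each resulting subset and across all predictors.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 means the node is pure (all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Find the threshold on one numeric predictor that minimizes the
    weighted Gini impurity of the two child nodes (x <= t and x > t).
    Returns (None, root_impurity) if no split improves purity."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_threshold, best_score = None, gini(ys)  # impurity of the unsplit node
    for i in range(1, n):
        # Only consider thresholds between distinct adjacent values
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weight each child's impurity by its share of the observations
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


ages = [22, 25, 30, 35, 40, 45, 50, 55]   # hypothetical predictor
bought = ["no"] * 4 + ["yes"] * 4         # hypothetical target
print(best_split(ages, bought))  # → (37.5, 0.0)
```

Here the rule "age <= 37.5" splits a mixed root node (impurity 0.5) into two pure child nodes, which is exactly the "more similar observations" objective described above.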
Some of the applications and benefits of decision trees are:
- A tool for data mining tasks, such as classification, regression, clustering, and variable selection
- Intuitive and easy to use
- Handles continuous, categorical, and textual data
- Handles data from non-normal distributions, as well as missing values and extreme values
- Achieves high predictive performance with relatively small computational resources
- Used as an exploratory tool
- Available in almost all data mining software packages
To learn more about decision trees and their application to data mining tasks, Google “Decision Trees” and browse some of the many documents on this subject. If you are new to data mining and want to gain knowledge and hands-on experience with decision trees, SAS Education also has several training courses that present excellent coverage of this topic.
If you are teaching data mining or are interested in developing a data mining course and are looking for good data sets and teaching materials, the SAS Global Academic Program offers teaching materials on several data mining topics. Contact email@example.com to request data mining teaching materials.