Mining With Trees?


I was first exposed to decision trees several years ago when working on a predictive modeling project for Baylor University. At first glance the graphics resembled an organizational chart, and the terminology associated with trees was different and strange: root, branch, leaves, pruning, splitting, random forest, maximum tree, optimum tree, node purity, stump, bagging, boosting, and many other terms that might be associated with forestry. After the initial exposure to these terms and decision tree concepts, I soon learned that the decision tree technique is one of the most intuitive and popular data mining methods.

There are many definitions for data mining, but in this context data mining is the science and art of exploring large amounts of data in order to discover useful patterns. Data miners have numerous tools and techniques available for discovery and prediction. As mentioned above, one such tool is the decision tree. An important characteristic of decision trees is that they are easy to use and simple to explain; being a simple person, I like decision trees. A decision tree could be described as a set of algorithms whose purpose is to develop a hierarchical set of rules that describe how to partition a data set into smaller data sets, with the objective that the observations within each smaller data set are more similar to one another than those in the original. Trees are successful in explaining the relationship between predictor variables and the target variable, making them a good tool for predictive modeling. In data analysis without a target variable, decision trees are useful in detecting patterns in the data and grouping or clustering the data.
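To make the partitioning idea concrete, here is a minimal sketch in Python of a single split, the building block of a tree. It searches one predictor for the rule that leaves the two resulting groups as similar as possible internally. The choice of Gini impurity as the measure of similarity is an assumption for illustration (it is one common criterion among several), and the GPA and enrollment values are made-up toy data, not results from any real project.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Find the rule `value <= threshold` with the lowest weighted impurity.

    Returns (threshold, weighted_impurity); threshold is None when no
    split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        threshold = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: applicant GPA (predictor) and enrollment (target).
gpa = [2.1, 2.4, 2.8, 3.0, 3.3, 3.6, 3.8, 3.9]
enrolled = ["no", "no", "no", "no", "yes", "no", "yes", "yes"]

threshold, impurity = best_split(gpa, enrolled)
print(f"impurity before splitting: {gini(enrolled):.4f}")
print(f"best rule: gpa <= {threshold:.2f}")
print(f"weighted impurity after splitting: {impurity:.4f}")
```

A full tree would apply the same search again to each of the two smaller data sets, and across every available predictor, until the groups are pure enough or too small to split further.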

Some of the applications and benefits of decision trees are:

  • A tool for data mining tasks, such as classification, regression, clustering, and variable selection
  • Intuitive and easy to use
  • Handles continuous, categorical, and textual data
  • Handles data from non-normal distributions, missing values, and extreme values
  • Delivers high predictive performance with relatively small computational resources
  • Used as an exploratory tool
  • Available in almost all data mining software packages

To learn more about decision trees and their application to data mining tasks, Google “Decision Trees” and browse some of the many documents on this subject. If you are new to data mining and want to gain knowledge and hands-on experience with decision trees, SAS Education also has several training courses that present excellent coverage of this topic.

If you are teaching data mining or are interested in developing a data mining course and are looking for good data sets and teaching materials, the SAS Global Academic Program offers teaching materials on several data mining topics.  Contact academic@sas.com to request data mining teaching materials.


About Author

Tom Bohannon

Dr. Tom R. Bohannon is an analytical consultant for SAS Institute specializing in applying analytical methods to business problems in industry and higher education. For the past three years, he has also served as a visiting professor in the statistics department at Texas A&M University. Before retiring from Baylor University in April of 2007, Bohannon was Director and Assistant Vice President for the Office of Institutional Research and Testing for twenty years. Prior to joining Baylor University, Bohannon spent ten years at Appalachian State University as an Associate Professor of mathematics and statistics, as the University Statistical Consultant, and as Director of Institutional Research. Dr. Bohannon has spent nearly 30 years in the institutional research field specializing in the application of statistical methods to business problems in higher education. These applications include overseeing construction of data warehouses for Baylor University and applying data mining methods to enrollment management, retention, and fund raising. Dr. Bohannon earned a PhD in Statistics from Texas A&M University in 1976 and an MA in Mathematics from Wake Forest University in 1965. He also holds a BS in Mathematics with a Physics Minor from McNeese State University.
