A contract instructor for SAS since 1981 and an NC State University professor, David A. Dickey has a passion for statistics and sharing that passion with others. On Tuesday at SAS Global Forum he gave a general overview of data mining, touching on decision trees, regression trees, logistic regression, neural networks, association analysis, text mining and more during his one-hour talk.
To illustrate the concepts and show real-life application of data mining, Dickey used data from the Framingham Heart Study, police reports on car accidents in Portugal and the US space shuttle missions.
People like decision trees, Dickey said, because they reflect the way we make decisions in everyday life. As a light-hearted example, Dickey cited “speed dating,” where people ask each other a string of yes-no questions and then make a final decision as to whether they want to date a particular person.
Highlights from the presentation included:
- Recursive splitting – continually splitting the data into parts – is a key technique in decision trees. In the Framingham Heart Study, which looked at age, blood pressure and cholesterol levels as indicators of first-stage coronary heart disease, recursive splitting led to the finding that first-stage coronary heart disease is less common among people who keep their cholesterol under control.
- A study of police reports in Portugal was conducted to estimate the cost to society of auto accidents. Researchers split the data first on whether alcohol or drugs were involved. The next split was based on small or large engine size. A third split was based on whether or not it was a rear-end collision.
- Another study took publicly available data from the space shuttle missions and used logistic regression to estimate the probability of an O-ring failure at lower temperatures. O-ring failure was determined to be the cause of the Space Shuttle Challenger disaster in 1986.
- Sometimes you discover the obvious, but sometimes you discover something interesting and useful.
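For readers who want to see what recursive splitting looks like in practice, here is a minimal Python sketch. It is not the code or the data from Dickey's talk: the eight (age, cholesterol) rows are made-up toy values, the function names are hypothetical, and Gini impurity is assumed as the splitting criterion simply because it is a common choice.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature_index, threshold) that most reduces weighted
    Gini impurity, or None if no split improves on the unsplit data."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Split the data, then split each part again, until a part is pure
    or max_depth is reached. Leaves hold the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the yes-no questions down the tree to a leaf label."""
    while isinstance(tree, dict):
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree

# Made-up toy data, NOT from the Framingham Heart Study.
# Each row is (age, cholesterol); label 1 means "disease", 0 means "none".
rows = [(45, 180), (50, 190), (62, 185), (48, 260),
        (55, 270), (66, 250), (70, 195), (41, 265)]
labels = [0, 0, 0, 1, 1, 1, 0, 0]

tree = build_tree(rows, labels)
print(tree)
```

Each internal node of the resulting tree is one yes-no question (is this feature at or below the threshold?), which is the same pattern as the speed-dating example: a string of simple questions followed by a final decision.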
For more, view the archived video of Dickey’s presentation below.