Decision Trees: A Gentle Introduction

Richard Hector
Banner M.D. Anderson Cancer Center


Every car is a vehicle, but not every vehicle is a car. Similarly, classification and regression trees (CART) and decision trees look alike: both begin with a single node followed by an increasing number of branches. However, they serve different purposes, just as the purpose of a family automobile differs from that of a giant mining truck. This paper is a gentle introduction to decision trees using PROC DTREE. When you need to explore the relationship between factors and an outcome, CART is a useful non-parametric tool; its branching algorithm moves through the variables in the data to determine their effect on the outcome. In contrast, in a decision tree the variables are chosen because the decision maker knows or can infer their effects on the outcome. Also, the decision maker knows or can estimate their distribution among relevant groups. Finally, the result (reward or pay-off) of each pathway is known or can be estimated. The goal of a decision tree is to ascertain the most desirable outcome given the combination of variables and costs (in other words, the best pathway). In addition, the amount of risk the decision maker is willing to accept can be incorporated into a decision tree analysis. This paper focuses on an example from medical care. Intermediate-level familiarity with the DATA step is sufficient for understanding this paper.
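The evaluation a decision tree performs can be illustrated with a small sketch. The "rollback" below takes the probability-weighted average of pay-offs at each chance node and picks the best option at each decision node; this is the kind of calculation PROC DTREE automates from its stage, probability, and pay-off data sets. The treatment names, probabilities, and pay-offs here are entirely hypothetical, and Python is used only for illustration.

```python
# Hedged sketch of decision-tree "rollback" (expected-value) evaluation.
# All treatment names, probabilities, and pay-offs are hypothetical.

def expected_value(node):
    """Recursively evaluate a node of the tree.

    A leaf is a numeric pay-off. A chance node is a list of
    (probability, subtree) pairs. A decision node is a dict mapping
    option name -> subtree; the decision maker picks the option with
    the highest expected value.
    """
    if isinstance(node, (int, float)):      # leaf: known pay-off
        return node
    if isinstance(node, dict):              # decision node: choose best option
        return max(expected_value(sub) for sub in node.values())
    # chance node: probability-weighted average of the outcomes
    return sum(p * expected_value(sub) for p, sub in node)

# Hypothetical two-treatment decision; pay-offs in arbitrary benefit units.
tree = {
    "Surgery":  [(0.7, 10.0), (0.3, 2.0)],   # 70% cure, 30% complication
    "Medicine": [(0.5, 8.0),  (0.5, 5.0)],   # milder, less effective
}

best = max(tree, key=lambda opt: expected_value(tree[opt]))
print(best, expected_value(tree[best]))     # the best pathway and its value
```

Here Surgery has expected value 0.7(10.0) + 0.3(2.0) = 7.6 versus 6.5 for Medicine, so the rollback identifies Surgery as the best pathway under these (made-up) numbers.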