Variable Selection Using Random Forests in SAS

DENIS NYONGESA
KAISER PERMANENTE


Abstract

Random forest, introduced by Leo Breiman in 2001, is an increasingly popular statistical method for classification and regression. A good prediction model begins with a sound feature selection process. This paper proposes ways of using random forests to select the important variables to include in a model. The candidate variables are ranked according to their importance scores.