Ordinal vs. Multinomial: Does Model Selection Make a Difference?

Corey Leadbeater
National University


Abstract

Regression modeling, a foundational component of data analysis and machine learning, is one of the skills most sought after by employers hiring new data scientists.1 Although most data science curricula include regression modeling techniques, the conceptual differences between theoretical and practical applications can be unclear. In this paper we use SAS 9.4 and the 2016 Monitoring Futures Survey to demonstrate how model selection affects context, that is, the ability of the model to properly communicate the user’s intended information. A cross-sectional secondary analysis of the 2016 Monitoring Futures Study dataset was conducted using Pearson’s chi-square test of independence to determine the association between “behavior risk” (v7335) and each of three predictor variables: “parental communication” (v7254), “time spent alone after school” (loner), and student letter grades (v7221). Next, two logistic models were computed using these predictors. Statistically significant associations were identified, and adjusted odds ratios were produced using both multinomial and ordinal regression models. Output from both models was evaluated and compared to demonstrate the utility of ordinal modeling (and its output) as a “higher-view,” generalizing procedure, whereas multinomial models are better suited to “detail-view,” group-level comparisons.

SAS 9.4 statistical software was used for all analyses. This paper is intended for students, healthcare administrators, and others interested in using SAS software for biostatistical analytics and reporting.