Analyzing Non-normal Data in the Presence of Missing Data: Application in Social and Biomedical Studies

Niloofar Ramezani
George Mason University


In many applications, the response variable is neither continuous nor normally distributed. If the outcome variable is binary, count, multinomial, or ordinal, using ordinary linear models developed for continuous data is inappropriate because of the discrete nature of the response and the non-linear relationship between the predictors and the response variable. Models such as binomial, multinomial, and ordinal logistic regression are therefore better suited to modeling the relationship between a non-normal discrete response and predictors of any type. This study examines several methods for modeling binary, count, categorical, and ordinal response variables within regression models in the presence of missing data, as well as statistical classification approaches. Starting with the simplest case of binary responses and moving through Poisson (count), multinomial, and ordinal responses, it discusses the available modeling options and the appropriate missing data handling techniques in SAS. Three missing data handling techniques, including multiple imputation, are considered to account for the high percentages of missing observations that are present in the majority of social and biomedical studies. The techniques are discussed and applied to real data with non-normal outcome variables. The paper also discusses the options available within SAS 9.4 for these models. The procedures covered include, but are not limited to, PROC LOGISTIC, PROC GENMOD, PROC NLMIXED, PROC CATMOD, PROC STANDARD, PROC MI, and PROC MIANALYZE.