Using Factor Analysis and Multivariate Analysis of Variance to Explore Academic Achievement in the 2016 Monitoring the Future Study

Stefanie Kairs
National University


The 2016 Monitoring the Future survey is part of an annual, long-term study of American adolescents and adult high school graduates conducted by the University of Michigan’s Institute for Social Research. This secondary data analysis uses the FACTOR procedure in SAS(TM) Studio to extract latent structures describing academic achievement, academic environment, and student delinquency. A total of 17,719 observations were then used in a multivariate analysis of variance (MANOVA), performed with the GLM procedure, to explore the relationships between the extracted factors and the demographic variables ethnicity, gender, and population density. The SAS code and results are presented, along with a discussion of the necessary data cleaning steps, data quality assessment, and post-hoc analyses. A statistically significant overall effect was found for each independent variable: gender (Wilks’ Lambda = 0.96, F = 273.23, df = (3, 17711), p < 0.0001); race (Wilks’ Lambda = 0.86, F = 479.83, df = (6, 35422), p < 0.0001); and population density (Wilks’ Lambda = 0.98, F = 65.31, df = (6, 35422), p < 0.0001). Population density explains 2% (Pillai’s trace = 0.022, p < 0.0001) of the variance in academic achievement, academic environment, and at-risk behaviors; gender explains 4% (Pillai’s trace = 0.044, p < 0.0001); and race explains 14% (Pillai’s trace = 0.145, p < 0.0001). The academic environment of 8th and 10th grade students was described by an extracted factor with high loadings for parental education, college preparatory program, and remedial schooling (negative loading), and this factor was shown to vary significantly by race.
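
As a rough orientation for readers, the two-step workflow summarized above (factor extraction with PROC FACTOR, then MANOVA on the factor scores with PROC GLM) might be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in the study: the data set name mtf2016, every variable name, the principal-component extraction method, and the varimax rotation are hypothetical placeholders. The choice of three factors is inferred from the degrees of freedom reported above rather than stated in the abstract.

```sas
/* Step 1: extract latent factors and save factor scores.             */
/* ASSUMPTIONS: mtf2016 and all variable names are hypothetical;      */
/* method= and rotate= are illustrative choices, not the study's.     */
proc factor data=mtf2016
            method=principal   /* assumed extraction method           */
            rotate=varimax     /* assumed orthogonal rotation         */
            nfactors=3         /* inferred from the reported df       */
            out=mtf_scores;    /* adds Factor1-Factor3 to the data    */
   var grades par_educ college_prep remedial skipped_class sent_to_office;
run;

/* Step 2: MANOVA of the three factor scores on the demographic       */
/* variables (main effects only).                                     */
proc glm data=mtf_scores;
   class gender race pop_density;                 /* hypothetical names */
   model Factor1 Factor2 Factor3 = gender race pop_density;
   manova h=_all_ / printe printh;  /* Wilks' Lambda, Pillai's trace  */
run;
quit;
```

The MANOVA statement with H=_ALL_ requests the multivariate tests (including Wilks' Lambda and Pillai's trace) for every effect in the model. The reported degrees of freedom are consistent with this layout: three dependent factors give 3 numerator df for a two-level gender variable and 6 for three-level race and population density variables.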