What is the regression equation for imputing missing values on cesd?
Using the answer to Exercise B9 (g), impute missing CES-D scores. First, create a new CES-D variable so that any mistakes do not affect the original scores. With the Compute Variable command, set the new variable (e.g., newcesd) equal to the original, cesd. Then run Compute Variable a second time, setting newcesd equal to the predicted value from the regression equation. Before running it, click the If pushbutton, select “Include if case satisfies condition,” and enter: MISSING(cesd). This restricts imputation to cases with missing CES-D scores (and nonmissing values for education and current employment status). Click Continue, then OK. Then run descriptive statistics (within the Frequencies program) for both cesd and newcesd and answer the following questions: (a) Look at the Polit2SetC dataset in Data View and find the values for the newcesd variable. Case 17 originally had a missing value for cesd. What is the imputed value for this case on the newcesd variable? (b) How many cases have valid CES-D scores for the newcesd variable? (c) In an earlier exercise, we learned that there were missing values on current employment status for two women. What would explain why there are not two missing values for newcesd? (d) Are the means for cesd and newcesd the same? (e) Are the SDs for the two variables the same?
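The imputation logic described above can also be sketched outside SPSS. The following Python example is only an illustration under stated assumptions: the eight cases are made-up values, not the Polit2SetC data, and the names `educ`, `employ`, `fit_ols`, and `impute_regression` are hypothetical. It fits the regression on complete cases, copies cesd into newcesd, and replaces only the missing scores with predicted values.

```python
import numpy as np

# Made-up illustration data (NOT the Polit2SetC dataset); np.nan marks a missing value.
educ   = np.array([1, 2, 3, 1, 2, 3, 2, 1], dtype=float)       # hypothetical education codes
employ = np.array([0, 1, 1, 0, 0, 1, np.nan, 1], dtype=float)  # hypothetical 0/1 employment codes
cesd   = np.array([22, 14, np.nan, 25, 18, 9, np.nan, 17], dtype=float)

def fit_ols(y, X):
    """Least-squares fit of y = a + X @ b using complete cases only (listwise deletion)."""
    complete = ~np.isnan(y) & ~np.isnan(X).any(axis=1)
    design = np.column_stack([np.ones(complete.sum()), X[complete]])
    coefs, *_ = np.linalg.lstsq(design, y[complete], rcond=None)
    return coefs  # [intercept, b for column 0, b for column 1, ...]

def impute_regression(y, X, coefs):
    """Return a copy of y in which missing entries are replaced by predicted values.

    A case with a missing predictor gets a NaN prediction, so it stays missing.
    """
    predicted = coefs[0] + X @ coefs[1:]
    new_y = y.copy()              # work on a copy so the original scores are untouched
    missing = np.isnan(y)         # plays the role of the MISSING(cesd) condition
    new_y[missing] = predicted[missing]
    return new_y

X = np.column_stack([educ, employ])
coefs = fit_ols(cesd, X)
newcesd = impute_regression(cesd, X, coefs)

# Compare descriptive statistics for the original and imputed variables.
for name, values in (("cesd", cesd), ("newcesd", newcesd)):
    valid = int(np.sum(~np.isnan(values)))
    print(name, "valid N =", valid,
          "mean =", round(float(np.nanmean(values)), 2),
          "SD =", round(float(np.nanstd(values, ddof=1)), 2))
```

In this toy data, one case is missing cesd but has both predictors, so it receives an imputed score; another is missing both cesd and employment status, so it remains missing. Comparing the printed statistics for the two variables parallels questions (b), (d), and (e).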
With only 3.8% missing values on the CES-D scale in a very large sample, we might well use listwise deletion for any further substantive analyses with the cesd variable. As an exercise, however, we will impute missing values on the CES-D total scale using regression analysis. We will restrict the regression to two predictors that are significantly correlated with CES-D scores but that have minimal missing values themselves: educational attainment and current employment status. (Other variables with low levels of missing data, namely age, race/ethnicity, and number of children, were not significantly correlated with CES-D scores; you could verify this yourself.) Use the instructions for running a standard multiple regression as discussed in the topic on Multiple Regression, with cesd as the dependent variable and educational attainment and current employment status as the independent variables. Then answer the following questions: (a) How many cases were used in this regression analysis? (b) What was the value of R? (c) Was the overall model statistically significant? (d) Was educational attainment significant, once current employment status was controlled? (e) Was current employment status significant, with educational attainment controlled? (f) Interpret what the b weights mean in terms of scores on the CES-D. (g) What is the regression equation for imputing missing values on cesd?
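As a reminder of the general form that the answer to (g) takes, a two-predictor regression equation can be written as below. The symbols are placeholders, not values from the dataset: $a$ stands for the intercept (the constant in the SPSS coefficients table), $b_1$ and $b_2$ for the unstandardized b weights, and $\text{educ}$ and $\text{employ}$ are hypothetical names for the educational attainment and current employment status variables.

```latex
\widehat{\text{cesd}} = a + b_1\,\text{educ} + b_2\,\text{employ}
```

Each b weight is the predicted change in CES-D score for a one-unit increase in that predictor, holding the other predictor constant, which is the interpretation asked for in (f).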