# Logistic regression

## Test multivariate associations when predicting for a dichotomous categorical outcome

Logistic regression is the **multivariate** extension of a bivariate chi-square analysis. It allows researchers to **control for various demographic, prognostic, clinical, and potentially confounding factors** that affect the relationship between a primary predictor variable and a dichotomous categorical outcome variable. Logistic regression generates **adjusted odds ratios with 95% confidence intervals**. It is published often in the medical literature and provides a measure of the strength of the relationship to a dichotomous categorical outcome when controlling for other variables.

The figure below depicts the use of logistic regression. Predictor, clinical, confounding, and demographic variables are used to predict a dichotomous categorical outcome.
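To make "adjusted odds ratio with a 95% confidence interval" concrete, the sketch below converts a logistic regression coefficient and its standard error into an odds ratio and interval, then classifies the result by where the interval sits relative to 1.0. It assumes a Wald-type interval (coefficient plus or minus a normal quantile times the standard error); the function names `odds_ratio_with_ci` and `interpret` and the input values are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(b, se, level=0.95):
    """Return (odds ratio, lower limit, upper limit) for a logit coefficient.

    Assumes a Wald-type interval: exp(b +/- z * se).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # roughly 1.96 for a 95% CI
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def interpret(lower, upper):
    """Classify the association by where the interval sits relative to 1.0."""
    if lower > 1.0:
        return "increases the odds of the outcome"
    if upper < 1.0:
        return "decreases the odds of the outcome"
    return "non-significant association"


# Hypothetical coefficient and standard error, not taken from any real dataset.
b, se = math.log(2.0), 0.20
odds_ratio, lower, upper = odds_ratio_with_ci(b, se)
print(f"AOR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}: "
      f"{interpret(lower, upper)}")
```

For an ordinal or continuous predictor, the same odds ratio applies per one-unit increase, so a change of several units multiplies the odds by the ratio raised to that number of units.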

### The steps for conducting a logistic regression in SPSS

1. The data is entered in a between-subjects fashion. The dichotomous categorical outcome is codified with **"0"** for not having the outcome and **"1"** for having the outcome. Categorical predictor variables with two levels are codified as 0 = NOT having the characteristic and 1 = HAVING the characteristic. Polychotomous categorical variables have a reference category that is codified as "0."
2. Click **Analyze**.
3. Drag the cursor over the **Regression** drop-down menu.
4. Click **Binary Logistic**.
5. Click on the dichotomous categorical outcome variable to highlight it.
6. Click on the **arrow** to move the variable into the **Dependent:** box.
7. Click on the primary predictor variable to highlight it.
8. Click on the **arrow** to move the variable into the **Covariates:** box.
9. Repeat Steps 7 and 8 until all of the predictor, clinical, confounding, and demographic variables are moved into the **Covariates:** box.
10. Click on the **Categorical** button if researchers moved any categorical variables into the **Covariates:** box.
11. Click on the categorical variable in the **Covariates:** box to highlight it.
12. Click on the **arrow** to move the variable into the **Categorical Covariates:** box.
13. Click on the **Reference Category: First** marker.
14. Click **Change**.
15. Repeat Steps 11, 12, 13, and 14 until all of the categorical variables are in the **Categorical Covariates:** box.
16. Click **Continue**.
17. Click on the **Save** button.
18. In the **Residuals** table, click on the **Unstandardized** and **Standardized** boxes to select them.
19. In the **Predicted Values** table, click on the **Probabilities** box to select it.
20. Click **Continue**.
21. Click on the **Options** button.
22. In the **Statistics and Plots** table, click on the **Hosmer-Lemeshow goodness-of-fit**, **Casewise listing of residuals**, and **CI for exp(B):** boxes to select them.
23. Click **Continue**.
24. Click **OK**.

### The steps for interpreting the SPSS output for a logistic regression

1. Scroll down to the **Block 1: Method = Enter** section of the output.
2. Look in the **Omnibus Tests of Model Coefficients** table, under the **Sig.** column, in the **Model** row. This is the *p*-value that is interpreted.
   - If the *p*-value is **LESS THAN .05**, then researchers have a significant model that should be further interpreted.
   - If the *p*-value is **MORE THAN .05**, then researchers do not have a significant model and the results should be reported.
3. Look in the **Hosmer and Lemeshow Test** table, under the **Sig.** column. This is the *p*-value that is interpreted.
   - If the *p*-value is **LESS THAN .05**, then the model does not fit the data.
   - If the *p*-value is **MORE THAN .05**, then the model does fit the data and should be further interpreted.
4. Look in the **Classification Table**, under the **Percentage Correct** column, in the **Overall Percentage** row. This is the total accuracy of the model. Researchers want it to ultimately be at least **80%**.
5. Look in the **Variables in the Equation** table, under the **Sig.**, **Exp(B)**, and **Lower** and **Upper** columns. The **Sig.** column is the *p*-value associated with the adjusted odds ratios and 95% CIs for each predictor, clinical, demographic, or confounding variable. The value in the **Exp(B)** column is the adjusted odds ratio. The **Lower** and **Upper** values are the limits of the 95% CI associated with the adjusted odds ratio.
6. Researchers will interpret the adjusted odds ratio in the **Exp(B)** column and the confidence interval in the **Lower** and **Upper** columns for each variable.
   - If the confidence interval associated with the adjusted odds ratio crosses over 1.0, then there is a non-significant association. The *p*-value associated with these variables will also be **HIGHER** than .05.
   - If the adjusted odds ratio is **ABOVE 1.0** and the confidence interval is entirely above 1.0, then exposure to the predictor increases the odds of the outcome.
   - If the adjusted odds ratio is **BELOW 1.0** and the confidence interval is entirely below 1.0, then exposure to the predictor decreases the odds of the outcome.
   - If the variable is measured at the ordinal or continuous level, then the adjusted odds ratio is interpreted as meaning that **for every one unit increase** in the ordinal or continuous variable, the odds of the outcome are multiplied by the adjusted odds ratio.

### Residuals and logistic regression

At this point, researchers need to construct and interpret several plots of the raw and standardized residuals to fully assess the fit of the model. Residuals can be thought of as **the error associated with predicting or estimating outcomes using predictor variables**. Residual analysis is **extremely important** for meeting the linearity, normality, and homogeneity of variance assumptions of logistic regression.

### The steps for conducting residual analysis for logistic regression in SPSS

1. Go back to the **Data View**. There are three new variables that have been created.
   - The first is the **predicted probability** of that observation and is given the variable name **PRE_1**.
   - The second variable contains the **raw residuals** (the difference between the observed and predicted probabilities of the model) and is given the variable name **RES_1**.
   - The third variable has **standardized residuals** based on the raw residuals in the second variable and is given the variable name **ZRE_1**.
2. Click **Graphs**.
3. Drag the cursor over the **Legacy Dialogs** drop-down menu.
4. Click **Scatter/Dot**.
5. Click **Simple Scatter** to select it.
6. Click **Define**.
7. Click on the **RES_1** or raw residual variable to highlight it.
8. Click on the **arrow** to move the variable into the **Y Axis:** box.
9. Click on the **PRE_1** or predicted probability variable to highlight it.
10. Click on the **arrow** to move the variable into the **X Axis:** box.
11. Click **OK**.

### The steps for interpreting the SPSS scatterplot output for logistic regression

1. If the points along the scatterplot are **symmetric** both above and below a straight line, with observations being **equally spaced out** along the line, then **linearity** can be assumed. Interpretation of these types of scatterplot graphs allows for some **subjectivity** with regard to symmetry and spread along the line.
2. If there are significantly **larger residuals** and **wider dispersal of observations** along the line, then linearity cannot be assumed.

### Outliers and logistic regression

**Normality and equal variance** assumptions apply to logistic regression analyses. Here is how to assess if there are any outliers in the dataset.
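The outlier check can also be sketched numerically outside SPSS. The example below computes raw residuals (observed outcome minus predicted probability) and standardized residuals, then flags cases whose standardized residual exceeds an absolute value of 2.0. It assumes that the standardized residual is the raw residual divided by the square root of p(1 - p), which is an assumption about how a saved variable such as ZRE_1 is defined; the function names and the data are hypothetical.

```python
import math


def residuals(observed, predicted):
    """Return (raw, standardized) residuals for 0/1 outcomes.

    Assumption: standardized = raw / sqrt(p * (1 - p)).
    """
    raw = [y - p for y, p in zip(observed, predicted)]
    standardized = [r / math.sqrt(p * (1 - p)) for r, p in zip(raw, predicted)]
    return raw, standardized


def flag_outliers(standardized, cutoff=2.0):
    """Return the positions of cases whose |standardized residual| > cutoff."""
    return [i for i, z in enumerate(standardized) if abs(z) > cutoff]


# Hypothetical observed outcomes and predicted probabilities for four cases.
observed = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.1, 0.5]

raw, standardized = residuals(observed, predicted)
outliers = flag_outliers(standardized)
print("Standardized residuals:", [round(z, 2) for z in standardized])
print("Cases flagged as outliers (zero-based positions):", outliers)
```

In this made-up data, the third case had the outcome despite a predicted probability of only 0.1, so it is the one that gets flagged.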

### The steps for checking for outliers with logistic regression in SPSS

1. Click **Analyze**.
2. Drag the cursor over the **Descriptive Statistics** drop-down menu.
3. Click **Frequencies**.
4. Click on the **ZRE_1** or standardized residuals variable to highlight it.
5. Click on the **arrow** to move the variable into the **Variable(s):** box.
6. Click **OK**.

### The steps for interpreting the SPSS output for outliers with logistic regression

1. Look in the **Normalized residual** table, under the **first column**. (It has the word "Valid" in it.)
2. Scroll through the entirety of the table.
3. If there are values that are **above an absolute value of 2.0**, then there are outliers in the dataset.

## Hire A Statistician - Statistical Consulting for Students

**DO YOU NEED TO HIRE A STATISTICIAN?**

Eric Heidel, Ph.D. will provide the following statistical consulting services for undergraduate and graduate students at $75/hour. Secure checkout is available with Stripe, Venmo, Zelle, or PayPal.

- Statistical Analysis
- Research Design
- Sample Size Calculations
- Diagnostic Testing and Epidemiological Calculations
- Survey Design and Psychometrics