Residual analysis
Assess regression model fit using residuals
Residual analysis is important in regression because it provides a measure of model fit. Model fit denotes the amount of error associated with predicting an outcome. All regression models have some error when estimating outcomes. A residual is the difference (or error) between the observed value and the value predicted by the model.
When assessing overall model fit (or error) of multiple regression and logistic regression models, plot the raw residuals on the y-axis against the estimated outcomes on the x-axis. The residuals should be close to zero ("0"). This means that the predicted values are relatively similar to the observed values.
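As a minimal sketch of the residual-versus-estimate check described above, the following fits a simple least-squares line in plain Python and returns the pairs that would go on the x-axis (estimated outcomes) and y-axis (raw residuals). The function names and the small data set are hypothetical and only for illustration; a real analysis would take the fitted values from whatever statistical package produced the model.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals_vs_fitted(x, y):
    """Return (fitted values, raw residuals = observed - predicted)."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid


# Hypothetical illustration data, not taken from any study.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted, resid = residuals_vs_fitted(x, y)

# Each row is one point of the plot: x = estimated outcome, y = raw residual.
for f, r in zip(fitted, resid):
    print(f"{f:6.2f}  {r:+.2f}")
```

The two returned lists can be passed to any plotting tool to draw the scatter plot; residuals that scatter tightly around zero suggest the predictions track the observations.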
Assessing overall model fit with proportional odds regression and multinomial logistic regression is a tedious and time-consuming process. Researchers choose a reference category within the categorical or ordinal outcome, create "a-1" logistic regression models (where "a" is the number of independent categories or ordinal ranks in the outcome), and repeat the residual analysis for each model. Plot the raw residuals against the estimated outcomes for every model. If the residuals in all models are close to "0," then model fit can be assumed.
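The bookkeeping behind the "a-1" models can be sketched as follows. This assumes one plausible reading of the procedure: each non-reference category is compared against the reference category in its own binary data set, which is then fitted with an ordinary logistic regression and checked with the residual plot. The function name `binary_subsets` and the category labels are hypothetical; for an ordinal outcome a different way of splitting the ranks may be preferred.

```python
def binary_subsets(outcomes, reference):
    """Build one binary data set per non-reference category.

    Returns a dict mapping each non-reference category to a list of
    (row index, 0/1 indicator) pairs, keeping only rows that belong to
    that category (coded 1) or to the reference category (coded 0).
    """
    categories = sorted(set(outcomes) - {reference})
    subsets = {}
    for cat in categories:
        subsets[cat] = [
            (i, 1 if y == cat else 0)
            for i, y in enumerate(outcomes)
            if y in (cat, reference)
        ]
    return subsets


# Hypothetical three-category outcome, so a - 1 = 2 binary models.
outcomes = ["low", "mid", "high", "low", "high"]
subsets = binary_subsets(outcomes, reference="low")
for cat, rows in subsets.items():
    print(cat, rows)
```

Each entry of `subsets` identifies the rows and the 0/1 outcome for one logistic regression model; the residual analysis is then repeated once per entry.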
With Cox regression, Cox-Snell residuals should be calculated. These residuals are then used as the time variable in a Kaplan-Meier curve for the outcome. This curve is then compared to a survival function in which the outcome has been modeled using a unit exponential distribution.* If the curves are similar, then model fit can be assumed.
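The comparison described above can be sketched in plain Python, assuming the Cox-Snell residuals have already been obtained from a fitted Cox model (the values below are hypothetical). The sketch treats the residuals as the time variable in a Kaplan-Meier estimate and pairs each estimate with the unit exponential survival function, exp(-r). The function names are illustrative only.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times  : follow-up values (here, Cox-Snell residuals)
    events : 1 if the outcome occurred, 0 if censored
    """
    curve = []
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        occurred = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1.0 - occurred / at_risk
        curve.append((t, surv))
    return curve


def compare_to_unit_exponential(cox_snell, events):
    """Pair each Kaplan-Meier estimate with exp(-r) at the same residual r."""
    return [(r, s, math.exp(-r)) for r, s in kaplan_meier(cox_snell, events)]


# Hypothetical Cox-Snell residuals and event indicators.
cox_snell = [0.1, 0.5, 1.0, 2.0]
events = [1, 1, 0, 1]
for r, km, expo in compare_to_unit_exponential(cox_snell, events):
    print(f"r={r:.1f}  KM={km:.3f}  exp(-r)={expo:.3f}")
```

With a realistic sample size, the two columns should track each other closely when the model fits; large, systematic gaps suggest poor fit.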
Finally, for Poisson regression, plot the standardized residuals on the y-axis against the expected rate of the outcome on the x-axis. Evidence of model fit is assumed when 95% of the residuals fall between -2 and 2.*
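The 95% rule can be checked with a short calculation. This sketch assumes that "standardized residuals" means Pearson residuals, (observed - expected) / sqrt(expected), which is one common choice for Poisson models; other packages may standardize differently. The observed and expected counts below are hypothetical, and the expected values would normally come from the fitted model.

```python
import math


def pearson_residuals(observed, expected):
    """Pearson residuals for counts: (observed - expected) / sqrt(expected)."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


def share_within(resid, bound=2.0):
    """Proportion of residuals lying between -bound and +bound."""
    return sum(1 for r in resid if -bound <= r <= bound) / len(resid)


# Hypothetical observed counts and model-expected counts.
observed = [2, 5, 9, 0]
expected = [3.0, 4.0, 4.0, 1.0]

resid = pearson_residuals(observed, expected)
share = share_within(resid)
print(f"{share:.0%} of residuals fall between -2 and 2")
```

Plotting `resid` against `expected` gives the diagnostic plot described above, and a `share` of about 0.95 or higher is the evidence of fit the rule looks for.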
*Katz, M.H. Multivariable analysis: A practical guide for clinicians and public health researchers. 3rd edn. Cambridge: Cambridge University Press, 2011.