Binary logistic regression is used to study the relationship between one or more predictor variables and a single binary outcome variable. Binary variables are categorical variables that can take only two values, typically coded as 1 and 0; for example, male/female, dead/alive or disease/no disease. This differs from linear regression (continuous outcome variable) and Cox regression (time-to-event outcome variable). Linear regression is unsuitable for predicting a binary outcome for two reasons. First, its assumption of normally distributed residuals no longer holds. Second, its predicted values are not restricted to the range 0 to 1. To make the regression model suit a binary outcome, a logistic transformation (the logit) is applied. The model is fitted to the log-odds of the outcome, log(p / (1 - p)), where p is the probability that the outcome equals 1, rather than to the outcome itself.
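The logit maps a probability p between 0 and 1 onto the whole real line as the log-odds, log(p / (1 - p)). Its inverse, the logistic function, maps any value of the linear predictor back to a probability. The minimal Python sketch below shows the two transformations; the function names are illustrative and not part of rBiostatistics.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p; defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Logistic function: converts log-odds back to a probability.

    The two branches avoid overflow in math.exp for large |eta|.
    """
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# A probability of 0.5 corresponds to odds of 1 and log-odds of 0.
print(logit(0.5))      # 0.0
print(inv_logit(0.0))  # 0.5
# Round trip: probability -> log-odds -> probability.
print(round(inv_logit(logit(0.2)), 10))  # 0.2
```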
A multivariable logistic regression analysis lets one study the effects of multiple predictor variables on an outcome. The effect of each predictor variable is adjusted for the others. For example, individuals who smoke have been shown to have a higher risk of being diagnosed with a particular cancer (binary outcome variable). However, this estimate may be affected by confounding. To adjust for confounding, researchers may decide to fit a logistic regression model that includes additional predictor variables such as age, BMI, gender and alcohol intake. After adjusting for these variables, they may find that the increase in risk of this cancer associated with smoking is not as high as previously believed. Logistic regression also allows the probability of a particular outcome to be predicted from an individual's values for each predictor variable. Using this new model, the researchers could therefore estimate the probability that a patient who was not included in the original data has this cancer, based on that patient's own characteristics.
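The prediction described above can be sketched in a few lines. The fitted model's intercept and coefficients are combined with a new patient's values to give the log-odds, and the logistic function is then applied. The coefficients below are invented purely for illustration and are not results from any study. The `predicted_probability` helper and the variable names are hypothetical.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale (illustration only).
INTERCEPT = -5.0
COEFFICIENTS = {"smoker": 0.9, "age": 0.04, "bmi": 0.03}

def predicted_probability(patient: dict) -> float:
    """Linear predictor (intercept + sum of coefficient * value), then the logistic function."""
    eta = INTERCEPT + sum(COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-eta))

# A new patient not in the original data: smoker (1), aged 60, BMI 27.
new_patient = {"smoker": 1, "age": 60, "bmi": 27.0}
print(round(predicted_probability(new_patient), 3))  # 0.291
```

Because each coefficient is adjusted for the others, the smoking coefficient here describes the change in log-odds for smokers compared with non-smokers of the same age and BMI.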
Requirements for binary logistic regression:
- Single binary outcome variable
- Observations must be independent of one another, so the data should not contain matched pairs or repeated measurements on the same individuals. Paired measurements could instead be combined into a single variable (for example, their difference), which can then be included in the model.
- Predictor variables must not be collinear (i.e., highly correlated with one another). Collinear predictors may be variables that describe the same measurement, or a variable derived from other predictor variables in the model (for example, BMI alongside weight and height).
- For continuous predictor variables, there should be a linear relationship between the predictor and the logit of the outcome. Extreme values of a predictor should therefore tend to be concentrated at one or the other of the two possible outcomes. For example, patients who take extremely high doses of a drug experience an overdose (coded as 1), while those who take no dose or a very low dose do not (coded as 0).
Interpreting the output for binary logistic regression
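Coefficients - The sign (positive or negative) indicates the direction of a predictor's effect. The magnitude indicates the size of the change in the log-odds of the outcome per one-unit increase in that predictor. Exponentiating a coefficient gives the corresponding odds ratio.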
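Converting a coefficient and its standard error into an odds ratio with a 95% confidence interval works as follows: exponentiate the coefficient, and exponentiate the coefficient plus and minus 1.96 standard errors. The coefficient (0.9) and standard error (0.3) below are hypothetical, and the `odds_ratio_with_ci` helper is an illustrative name. rBiostatistics reports these quantities itself, so this sketch only shows where they come from.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal distribution

def odds_ratio_with_ci(coef: float, se: float, z: float = Z_95):
    """Odds ratio and Wald-type confidence interval from a log-odds coefficient."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

or_, lo, hi = odds_ratio_with_ci(0.9, 0.3)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")  # OR = 2.46 (95% CI 1.37 to 4.43)
```

A negative coefficient gives an odds ratio below 1 (a protective association), and a coefficient of 0 gives an odds ratio of exactly 1 (no association).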
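Wald test - This tests whether an individual coefficient differs from zero; its Z score is the coefficient divided by its standard error. If it is not statistically significant for a predictor variable, removing that variable is unlikely to harm the fit of the model.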
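The Wald Z score and its two-sided p-value can be sketched with the standard library alone, because 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)). The coefficients and standard errors in the example are hypothetical, and `wald_test` is an illustrative name.

```python
import math

def wald_test(coef: float, se: float):
    """Wald Z score and two-sided p-value under the standard normal distribution."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical predictors: one clearly non-zero, one not.
z1, p1 = wald_test(0.9, 0.3)
z2, p2 = wald_test(0.1, 0.25)
print(f"z = {z1:.2f}, p = {p1:.4f}")
print(f"z = {z2:.2f}, p = {p2:.3f}")
```

In this example the second predictor's p-value is far above 0.05, so by the reasoning above it is a candidate for removal.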
Model Likelihood Ratio Test - This compares how well two nested models fit the data: a model, and the same model with one or more predictor variables removed. If the difference is statistically significant, the model with more variables is the better fit.
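The likelihood ratio statistic is twice the difference between the log-likelihoods of the fuller and the reduced model. It is compared with a chi-square distribution whose degrees of freedom equal the number of parameters removed. The standard-library sketch below computes the chi-square upper tail from the incomplete gamma series, which is intended for moderate statistic values. The log-likelihoods and the helper names are hypothetical; in practice a statistics library would normally be used.

```python
import math

def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail chi-square probability via the regularized lower incomplete gamma series."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while n < 100_000:
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(ll_reduced: float, ll_full: float, df: int):
    """LR statistic 2 * (LL_full - LL_reduced) and its chi-square p-value."""
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: two predictors removed from the full model.
stat, p = likelihood_ratio_test(ll_reduced=-120.4, ll_full=-115.2, df=2)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")  # LR statistic = 10.40, p = 0.0055
```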
Rank Discrimination Indexes
- C-statistic - This is equal to the area under the ROC curve. It measures the model's discrimination, i.e. how well the model separates individuals with the outcome from those without it. Commonly used rules of thumb are:
- Values < 0.5 indicate a model that performs worse than chance
- Values = 0.5 indicate that the model's predictions are no better than random chance
- Values > 0.7 indicate a good model
- Values > 0.8 indicate a very good model
- A value of 1 indicates perfect discrimination
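The C-statistic can be computed directly from its definition. Consider every pair made up of one individual with the outcome and one without; the C-statistic is the proportion of these pairs in which the individual with the outcome has the higher predicted probability, with ties counted as half. This is a minimal Python sketch; the outcomes and predicted probabilities are hypothetical, and `c_statistic` is an illustrative name.

```python
def c_statistic(outcomes, predicted):
    """Concordance (area under the ROC curve) over all case/non-case pairs; ties count as 0.5."""
    cases = [p for y, p in zip(outcomes, predicted) if y == 1]
    controls = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one observation with each outcome")
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(controls))

# Hypothetical outcomes (0/1) and model-predicted probabilities.
y = [0, 0, 1, 1]
p = [0.10, 0.40, 0.35, 0.80]
print(c_statistic(y, p))  # 0.75: 3 of the 4 case/non-case pairs are ranked correctly
```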
Prepare your data in a format like the example spreadsheet provided above. All data should be expressed as numbers with the binary outcome variable being expressed as 0 or 1. For example, 0 = no disease, 1 = disease.
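If an outcome is stored as text in your raw data, it can be recoded to 0/1 before upload. The sketch below uses Python's `csv` module on a small in-memory example. The column name `diagnosis`, the labels and the `recode_outcome` helper are hypothetical, so adjust them to match your own file.

```python
import csv
import io

# Hypothetical raw export in which the outcome is stored as text.
RAW = """age,smoker,diagnosis
54,1,disease
61,0,no disease
47,1,no disease
"""

OUTCOME_CODES = {"no disease": 0, "disease": 1}

def recode_outcome(text: str, column: str = "diagnosis") -> str:
    """Replace text outcome labels with 0/1 and fail loudly on unexpected labels."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        label = row[column].strip().lower()
        if label not in OUTCOME_CODES:
            raise ValueError(f"unexpected outcome label: {row[column]!r}")
        row[column] = OUTCOME_CODES[label]
        writer.writerow(row)
    return out.getvalue()

print(recode_outcome(RAW))
```

Raising an error on unknown labels, rather than silently coding them as 0, helps catch typos such as "Disease " or "unknown" before the data reach the model.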
- Click on Analyze above and then upload your .csv or .xlsx data file (indicate which type of file you are uploading).
- If uploading a .csv file, also indicate the separator used (e.g. comma or semicolon).
- Under the tab Table you will see all your data as tabulated in the original file.
- Under the tab Selected columns, you will see the categorical variable/s you have previously selected.
- Under ‘Select Variables’, select all the variables you would like to include in the test, both predictor and outcome.
- Under ‘Define Dependent’, select your binary outcome variable.
- If any predictor variables are continuous, divide them into discrete categories before including them in your model. Otherwise, rBiostatistics will attempt to calculate an odds ratio for each unique value of the continuous variable.
- Under ‘Define Ranking Variables’, select your predictor variables only.
- Under the tab ‘Results-1’, frequency tables are produced showing how often each value of each predictor variable occurs. For continuous variables, these show the frequency of every unique value.
- Under the tab ‘Results-2’, the results of your regression model will be calculated including the following:
- The coefficients of the intercept and each predictor variable
- Standard errors for the coefficients and their Z scores
- Odds ratio of the intercept and each predictor variable and their respective confidence intervals
- Measures of how well your model fits
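As noted in the steps above, continuous predictors should be divided into categories before upload. Otherwise, every unique value is treated separately. The sketch below recodes ages into numeric bands and counts how many individuals fall into each band, mirroring the kind of frequency table shown under ‘Results-1’. The cut-points, codes and `age_category` helper are hypothetical; choose categories that make sense for your own study.

```python
from collections import Counter

# Hypothetical age bands: (lower bound inclusive, upper bound exclusive, numeric code).
AGE_BANDS = [(0, 40, 0), (40, 60, 1), (60, 200, 2)]

def age_category(age: float) -> int:
    """Map a continuous age to the numeric code of its band."""
    for lower, upper, code in AGE_BANDS:
        if lower <= age < upper:
            return code
    raise ValueError(f"age out of range: {age}")

ages = [34, 45, 52, 61, 38, 70, 59, 44]
codes = [age_category(a) for a in ages]

print(len(Counter(ages)), "distinct raw values")  # 8: one category per unique age
print(sorted(Counter(codes).items()))             # [(0, 2), (1, 4), (2, 2)]
```

Coding the bands as numbers keeps the file consistent with the requirement that all data be expressed as numbers.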
Written by Arif Jalal