Correlation

↬ Analyse

Description

Correlation can be defined as the relationship between two variables. This relationship can be positive (i.e. when one variable increases in value, so does the other), negative (i.e. when one variable decreases, the other increases) or absent (i.e. there is no correlation).

Two-variable correlation in rBiostatistics allows you to find the strength of the association between two variables, using two different statistical tests, which are described below.

Worked example

An example dataset can be downloaded below, comparing the continuous variables ‘Age’ (in years) and ‘Height’ (in cm) (the data points in this example were produced randomly). Each data point is paired (i.e. age and height are measured from the same person to produce one data point) and independent (i.e. one person’s age and height are not associated with another person’s age and height).

Instructions

  1. Download the Excel dataset ‘correlation2variables’
  2. Click ↬ Analyse
  3. Select the file type (.xlsx) (if using a .csv file, also select a separator)
  4. Select only the two variables to be compared

 

Results can be found under the ‘Spearman’ or ‘Pearson’ tabs. A plot with a line of best fit can be seen under the ‘plot reg. Line’ tab. Under the ‘Summary’ tab, useful statistical metrics such as standard deviation and mean can be found.

 

Pearson’s product-moment correlation

Found under the ‘Pearson’ tab, this test produces the output ‘cor’ (commonly denoted in statistics as ‘r’, or Pearson’s correlation coefficient), which will be a value between +1 and -1. The closer the ‘cor’ value is to +1, the stronger the positive association between the variables. Likewise, the closer ‘cor’ is to -1, the stronger the negative association. If the ‘cor’ value is close to 0, the linear association between the variables is described as weak.
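To make the meaning of the coefficient more concrete, the following is a minimal Python sketch of how Pearson’s r can be computed by hand for paired data. It is an illustration only and is not taken from rBiostatistics; the function name pearson_r and the age/height values are hypothetical and are not the downloadable dataset.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two points")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Deviations of each observation from its variable's mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]

    # Sum of cross-products and sums of squares
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)

    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable is constant")

    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical illustrative values (assumption: not the example dataset)
age = [5, 8, 11, 14, 17]               # years
height = [110, 128, 144, 162, 175]     # cm

r = pearson_r(age, height)
print(round(r, 3))  # close to +1, i.e. a strong positive association
```

Reversing the order of one of the two lists would give a value close to -1, which corresponds to a strong negative association.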

This test makes several assumptions, which also limit when it can be used:

  1. Both variables must be continuous and paired 
  2. Data points should be independent (i.e. not have an association with other data points)
  3. The association between the variables should be linear - a plot of the data points can be viewed under the ‘plot points’ tab in rBiostatistics.

 

Spearman’s rank correlation

Found under the ‘Spearman’ tab, this test produces the output ‘rho’ (ρ in Greek/statistics). This is interpreted similarly to Pearson’s correlation coefficient, i.e. values near +1 indicate a strong positive association, values near -1 indicate a strong negative association, and so on. Like Pearson’s correlation, data points between the two variables must be paired and independent. Unlike Pearson’s correlation, Spearman’s rank correlation can be used with ordinal variables and with relationships that are non-linear, provided they are monotonic (i.e. consistently increasing or consistently decreasing).
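The difference between the two tests can be illustrated with a short Python sketch. Spearman’s rho can be computed by replacing each value with its rank (tied values share the average of their ranks) and then calculating Pearson’s coefficient on those ranks. This is a sketch under those assumptions and is not the rBiostatistics implementation; the helper names average_ranks, pearson_r and spearman_rho and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's coefficient applied to the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two points")
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but non-linear relationship: y = x cubed
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)   # ranks agree perfectly
r = pearson_r(x, y)        # penalised because the relationship is curved
print(rho, r)
```

Because y always increases as x increases, the ranks of the two variables match exactly, whereas Pearson’s coefficient on the raw values is lower since the points do not lie on a straight line.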

 

Written by Kevin Michell.

 

 
