# Correlation

**Description**

Correlation can be defined as the relationship between two variables. This relationship can be positive (i.e. when one variable increases in value, so does the other), negative (i.e. when one variable increases, the other decreases) or absent (no correlation).

Two-variable correlation in rBiostatistics allows you to measure the strength of the association between two variables using two different statistical tests, which are described below.

**Worked example**

An example dataset can be downloaded below, comparing the continuous variables ‘Age’ (in years) and ‘Height’ (in cm); the data points in this example were randomly generated. Each data point is paired (i.e. age and height are measured from the same person to produce one data point) and independent (i.e. one person’s age and height are not associated with another person’s age and height).

**Instructions**

- Download the Excel dataset ‘correlation2variables’
- Click **↬ Analyse**
- Select file type (.xlsx) (if using a .csv file, select a separator)
- Select **two variables only** to be compared

Results can be found under the ‘Spearman’ or ‘Pearson’ tabs. A plot with a line of best fit can be seen under the ‘plot reg. Line’ tab. Under the ‘Summary’ tab, useful statistical metrics such as standard deviation and mean can be found.

**Pearson’s product-moment correlation**

Found under the ‘Pearson’ tab, this test produces the output ‘cor’ (commonly denoted in statistics as ‘r’, or Pearson’s correlation coefficient), which will be a value between +1 and -1. The closer the ‘cor’ value is to +1, the stronger the *positive* association between the variables. Likewise, the closer ‘cor’ is to -1, the stronger the *negative* association. If the ‘cor’ value is close to 0, the linear association between the variables is described as weak.

There are multiple limitations/assumptions to this test:

- Both variables must be **continuous** and **paired**
- Data points should be **independent** (i.e. not have an association with other data points)
- The association between the variables should be **linear**; a plot of the data points can be viewed under the ‘plot points’ tab in rBiostatistics.
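
To make the ‘cor’ value more concrete, here is a minimal Python sketch of how Pearson’s correlation coefficient can be calculated by hand. It is not the code used by rBiostatistics; the function name `pearson_r` and the age and height values are hypothetical and serve only as an illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviation of every observation from its variable's mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Sum of cross-products (numerator) and root sums of squares (denominator)
    cross = sum(a * b for a, b in zip(dx, dy))
    spread_x = math.sqrt(sum(a * a for a in dx))
    spread_y = math.sqrt(sum(b * b for b in dy))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / (spread_x * spread_y)


# Hypothetical paired measurements (one pair per person), not the example dataset
age = [5, 8, 11, 14, 17]             # years
height = [110, 128, 143, 160, 172]   # cm

r = pearson_r(age, height)
print(f"cor = {r:.3f}")
```

Because height rises almost linearly with age in these made-up values, the printed ‘cor’ is close to +1. Reversing the order of one of the lists would give a value close to -1.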

**Spearman’s Rank correlation**

Found under the ‘Spearman’ tab, this test produces the output ‘rho’ (ρ in Greek/statistics). This is interpreted similarly to Pearson’s correlation coefficient, i.e. values close to +1 indicate a strong positive association, values close to -1 indicate a strong negative association, and so on. Like Pearson’s correlation, data points between the two variables must be **paired and independent**. Unlike Pearson’s correlation, Spearman’s rank correlation can be used with **ordinal variables and non-linear relationships**, provided the relationship is monotonic (i.e. consistently increasing or consistently decreasing).
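
The difference between the two tests can be sketched in Python: Spearman’s rho is Pearson’s coefficient calculated on the ranks of the data rather than on the raw values. The sketch below is an illustration only and is not the code used by rBiostatistics; the helper names `average_ranks`, `pearson_r` and `spearman_rho` and the data values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the full run of values tied with position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def spearman_rho(x, y):
    """Spearman's rho: Pearson's coefficient applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always increases with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(f"Pearson cor  = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Since `y` increases every time `x` increases, the ranks of the two variables match exactly and rho reaches +1, whereas Pearson’s ‘cor’ stays below +1 because the points do not lie on a straight line.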

Written by Kevin Michell.