
Visualizing correlation between variables


The correlation between two variables in a data frame measures the strength and direction of their relationship; the most common measure, the Pearson coefficient, captures the linear relationship specifically. A correlation coefficient ranges from -1 to 1, where -1 represents a perfect negative relationship, 0 represents no (linear) relationship, and 1 represents a perfect positive relationship.
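To make the definition concrete: the Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it by hand on R's built-in mtcars data (an illustrative choice, not necessarily the data used in this post) and checks it against cor(). The second part shows why 0 only means no *linear* relationship: y = x^2 is completely determined by x, yet its Pearson correlation with a symmetric x is zero.

```r
# Pearson r = sum of centred cross-products / sqrt(product of sums of squares)
pearson_by_hand <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_by_hand(mtcars$mpg, mtcars$wt)
r_builtin <- cor(mtcars$mpg, mtcars$wt)
print(c(manual = r_manual, builtin = r_builtin))

# A perfect but non-linear dependence: Pearson r is zero here
x <- -5:5
y <- x^2
print(cor(x, y))
```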

There are several ways to calculate the correlation between the variables in a data frame in R; the most common methods are:

  1. Pearson correlation coefficient: This is the most widely used method. It measures the linear association between two continuous variables and is what the cor() function in R computes by default.
  2. Spearman rank correlation coefficient: This method works on the ranks of the data, so it is suitable for ordinal data and for relationships that are monotonic but not necessarily linear. It is calculated using the cor() function in R with the method = "spearman" argument.
  3. Kendall rank correlation coefficient: This method is also rank-based, like Spearman's, and is often preferred for small samples or when outliers are a concern. It is calculated using the cor() function in R with the method = "kendall" argument.
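The difference between the three methods shows up clearly on data that is monotonic but curved. In this small sketch (the simulated x and y are purely illustrative), y always increases with x, so both rank-based coefficients equal 1, while Pearson's r is lower because the relationship is not a straight line.

```r
x <- 1:20
y <- exp(x / 4)   # strictly increasing, but curved

r <- c(
  pearson  = cor(x, y),                      # linear association
  spearman = cor(x, y, method = "spearman"), # correlation of the ranks
  kendall  = cor(x, y, method = "kendall")   # concordant vs discordant pairs
)
print(round(r, 3))
```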

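Once you have calculated the correlation coefficients, you can visualize the correlation matrix with functions such as corrplot() (from the corrplot package) and ggcorrplot() (from the ggcorrplot package). Base R's pairs() is a useful complement: it draws a scatterplot matrix of the raw variables, so you can see the shape of each relationship behind its coefficient.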
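pairs() and the coefficients can also be combined in a single figure. The sketch below follows the same idea as the panel.cor example in R's ?pairs help page: a custom upper-panel function prints the Pearson r instead of a second scatterplot. The function name panel_cor and the choice of mtcars columns are illustrative.

```r
# Upper-panel function: write the Pearson r instead of drawing points
panel_cor <- function(x, y, ...) {
  old <- par(usr = c(0, 1, 0, 1))  # draw in a 0-1 coordinate box
  on.exit(par(old))                # restore the panel's coordinates afterwards
  r <- cor(x, y)
  text(0.5, 0.5, formatC(r, digits = 2, format = "f"), cex = 1.5)
}

vars <- mtcars[, c("mpg", "disp", "hp", "wt")]
pairs(vars, upper.panel = panel_cor)  # scatterplots below, r values above
```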

It's important to note that correlation does not imply causation: a correlation coefficient only describes the strength and direction of the association between two variables.

Here are some examples:

First, let's load the packages that we are going to need.
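Judging from the functions used below, the packages needed are psych (for corr.test()), ggcorrplot and corrplot, plus DescTools if PlotCorr() is, as assumed here, the DescTools function of that name. A minimal sketch that attaches whichever of these are installed and lists the rest:

```r
# Packages assumed from the functions used in this post
pkgs <- c("psych", "ggcorrplot", "corrplot", "DescTools")

# requireNamespace() checks availability without attaching the package
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!available)) {
  message("Install first: install.packages(c(",
          paste0('"', pkgs[!available], '"', collapse = ", "), "))")
}

# Attach the packages that are installed
for (p in pkgs[available]) library(p, character.only = TRUE)
```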


  1. cor(df): calculates the Pearson correlation coefficient for every pair of variables in the data frame (all columns must be numeric). The resulting correlation matrix is r_pearson.
  2. cor(df, method = "spearman"): calculates the Spearman rank correlation coefficient for every pair of variables in the data frame. The resulting correlation matrix is r_spearman.
  3. PlotCorr(r_pearson): plots the Pearson correlation matrix.
  4. PlotCorr(r_spearman): plots the Spearman correlation matrix.
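A minimal sketch of these four calls, assuming df is an all-numeric data frame (a subset of mtcars stands in for it here) and that PlotCorr() is DescTools::PlotCorr(), which takes a correlation matrix as its first argument. The plotting step is simply skipped when DescTools is not installed.

```r
df <- mtcars[, c("mpg", "cyl", "disp", "hp", "wt", "qsec")]  # stand-in data

r_pearson  <- cor(df)                       # Pearson is the default method
r_spearman <- cor(df, method = "spearman")  # Spearman rank correlation

# Large differences between the two matrices can point to
# non-linear or outlier-driven relationships
print(round(r_pearson - r_spearman, 2))

# Assumption: PlotCorr() is the DescTools function
if (requireNamespace("DescTools", quietly = TRUE)) {
  DescTools::PlotCorr(r_pearson)
  DescTools::PlotCorr(r_spearman)
}
```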



corr.test(data, method = "spearman", adjust = "none")$p, from the psych package, tests whether each Spearman rank correlation coefficient differs significantly from zero. The $p element extracts the matrix of p-values (adjust = "none" means they are not corrected for multiple comparisons), and this matrix is assigned to the variable “seg_mat”.
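If psych is not available, a comparable p-value matrix can be built in base R by running cor.test() on every pair of columns. The helper spearman_pvalues() below is hypothetical (not part of any package), and mtcars again stands in for the data. exact = FALSE asks cor.test() for the t approximation, which avoids warnings about tied ranks and should closely agree with corr.test()'s p-values; the optional psych block prints the largest difference so you can check.

```r
# Hypothetical helper: unadjusted Spearman p-values, one cor.test() per pair
spearman_pvalues <- function(df) {
  k <- ncol(df)
  p <- matrix(0, k, k, dimnames = list(names(df), names(df)))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      test <- cor.test(df[[i]], df[[j]], method = "spearman", exact = FALSE)
      p[i, j] <- p[j, i] <- test$p.value  # symmetric; diagonal stays 0
    }
  }
  p
}

df <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
p_spearman <- spearman_pvalues(df)
print(signif(p_spearman, 3))

# With psych installed, compare against corr.test()
if (requireNamespace("psych", quietly = TRUE)) {
  p_psych <- psych::corr.test(df, method = "spearman", adjust = "none")$p
  print(max(abs(p_psych - p_spearman)))
}
```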


ggcorrplot(corr = r_spearman, hc.order = TRUE, type = "lower", p.mat = anlam_mat) creates a visual representation of the Spearman correlation matrix that also uses the p-values from the significance test: by default, coefficients whose p-value is above 0.05 are marked with a cross. hc.order = TRUE orders the variables using hierarchical clustering, and type = "lower" shows only the lower triangle of the correlation matrix. The function is from the ggcorrplot package.
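A self-contained sketch of this step, again using mtcars as stand-in data. To stay independent of psych, the p-values here come from ggcorrplot's own cor_pmat() helper, which runs cor.test() on each pair and passes extra arguments through to it. lab = TRUE is an addition to the original call that prints each coefficient in its tile.

```r
df <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
r_spearman <- cor(df, method = "spearman")

if (requireNamespace("ggcorrplot", quietly = TRUE)) {
  # Pairwise Spearman p-values; method and exact are forwarded to cor.test()
  p_mat <- ggcorrplot::cor_pmat(df, method = "spearman", exact = FALSE)

  plt <- ggcorrplot::ggcorrplot(
    corr = r_spearman,
    hc.order = TRUE,   # reorder variables by hierarchical clustering
    type = "lower",    # lower triangle only
    p.mat = p_mat,     # cells with p > 0.05 (the default sig.level) get a cross
    lab = TRUE         # print the coefficient in each tile
  )
  print(plt)
}
```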



corrplot(r_spearman, method = "pie", type = "upper") creates a correlation plot with pie charts to represent the correlation coefficient values. It uses the Spearman rank correlation coefficients and it shows only the upper triangle of the correlation matrix. The function is from the corrplot package.
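A minimal version of this call on the same stand-in data. order = "hclust" is an optional addition that sorts the variables by hierarchical clustering, similar in spirit to hc.order = TRUE in ggcorrplot(); tl.col only changes the colour of the variable labels.

```r
df <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
r_spearman <- cor(df, method = "spearman")

if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(
    r_spearman,
    method = "pie",     # each cell is a pie filled according to |r|
    type = "upper",     # upper triangle only
    order = "hclust",   # cluster-based variable order
    tl.col = "black"    # text label colour
  )
}
```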



This code uses the Spearman rank correlation coefficient to evaluate the correlation between the variables in the data frame and shows the results in two different ways: one with the ggcorrplot package and the other with the corrplot package. Only the ggcorrplot() call uses the p-values of the significance test; the corrplot() call as written displays the coefficients alone.
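corrplot() only reflects the significance test when it is given the p-values through its p.mat argument; with insig = "blank", cells whose p-value exceeds sig.level are left empty. A sketch on the same stand-in data, computing the unadjusted Spearman p-values with base R's cor.test():

```r
df <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
r_spearman <- cor(df, method = "spearman")

# Unadjusted Spearman p-values for every pair (diagonal set to 0)
idx <- seq_along(df)
p_mat <- sapply(idx, function(j) sapply(idx, function(i) {
  if (i == j) return(0)
  cor.test(df[[i]], df[[j]], method = "spearman", exact = FALSE)$p.value
}))
dimnames(p_mat) <- dimnames(r_spearman)

if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(r_spearman, method = "pie", type = "upper",
                     p.mat = p_mat, sig.level = 0.05, insig = "blank")
}
```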
