How to run cor.test() on two different dataframes
I would like to run cor.test() on two separate dataframes, but I am unsure how to proceed.
I have two example dataframes with identical columns (patients) but differing rows (bacteria and genes, respectively):
Bacterium | 1C | 1L | 2C | 2L |
---|---|---|---|---|
Staphylococcus | 10 | 400 | 20 | 600 |
Enterococcus | 15 | 607 | 39 | 800 |

Gene | 1C | 1L | 2C | 2L |
---|---|---|---|---|
IL4 | 60 | 300 | 90 | 450 |
IL8 | 30 | 600 | 54 | 750 |
TNFA | 89 | 450 | 96 | 600 |
I want to run a Spearman correlation test between the two dataframes to identify whether bacterial counts (abundance) are associated with increased gene expression. In other words, I want to test every bacterium against every gene.
I have tried running:
cor.test(df1, df2, method = "spearman", alternative = c("two.sided"))
But I get this error:
Error in cor.test.default(df1, df2, method = "spearman", :
'x' and 'y' must have the same length
Solution 1:[1]
I think the issue is that cor.test() expects x and y to be numeric vectors of the same length, but you are passing it two whole data frames with different numbers of rows (two bacteria and three genes).
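To illustrate the same-length requirement, here is a minimal sketch that tests a single bacterium against a single gene. It assumes df1 and df2 hold the question's tables with the bacteria and genes as row names; that layout, and the names staph, il4 and res, are assumptions made for illustration only.

```r
# Rebuild the question's example tables (values copied from the question).
# Assumption: bacteria/genes are stored as row names, patients as columns.
df1 <- data.frame(
  `1C` = c(10, 15), `1L` = c(400, 607), `2C` = c(20, 39), `2L` = c(600, 800),
  row.names = c("Staphylococcus", "Enterococcus"), check.names = FALSE
)
df2 <- data.frame(
  `1C` = c(60, 30, 89), `1L` = c(300, 600, 450),
  `2C` = c(90, 54, 96), `2L` = c(450, 750, 600),
  row.names = c("IL4", "IL8", "TNFA"), check.names = FALSE
)

# Extract one row from each table as a plain numeric vector
# (one value per patient, so both vectors have length 4).
staph <- unlist(df1["Staphylococcus", ], use.names = FALSE)
il4   <- unlist(df2["IL4", ], use.names = FALSE)

# Two equal-length vectors are what cor.test() expects.
res <- cor.test(staph, il4, method = "spearman", alternative = "two.sided")
res$estimate  # Spearman's rho for this one pair
res$p.value
```

This works because each row supplies one value per patient, so the two vectors line up patient by patient.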
In order to compare all genes to all bacteria counts across subjects you have to get them into a tabular format the function can work with. You can use pivot_longer() from tidyr for that and then merge to join on subject.
library(tidyr)  # provides pivot_longer()
Bacteria <- data.frame(name = c("Staph", "Enter"), C1 = c(10, 15), L1 = c(400, 607), C2 = c(20, 39), L2 = c(600, 800))
Genes <- data.frame(name = c("IL4", "IL8", "TNFA"), C1 = c(60, 30, 89), L1 = c(300, 600, 450), C2 = c(90, 54, 96), L2 = c(450, 750, 600))
Bacteria <- pivot_longer(Bacteria, -1, names_to = "Subject", values_to = "Counts")  # one row per bacterium-subject pair
Genes <- pivot_longer(Genes, -1, names_to = "Subject", values_to = "Counts")  # one row per gene-subject pair
FullSet <- merge(Bacteria, Genes, by = "Subject", suffixes = c(".Bac", ".Gene"))  # every bacterium paired with every gene, per subject
cor.test(FullSet$Counts.Bac, FullSet$Counts.Gene, method = "spearman", alternative = "two.sided")  # pools all bacterium-gene pairs into one test
Edit: to test each bacterium-gene pair separately and create a nice-looking corrplot with a p-value matrix:
library(tidyverse)  # loads dplyr (bind_rows) and tidyr (pivot_wider)
library(corrplot)  # provides corrplot()
MakeStats <- function(x) {
  result <- cor.test(x$Counts.Bac, x$Counts.Gene, method = "spearman", alternative = "two.sided")
  return(data.frame(Bacteria = x$name.Bac[1], Gene = x$name.Gene[1], Estimate = result$estimate[1], PValue = result$p.value, row.names = NULL))
}
ListOfTests <- split(FullSet, list(FullSet$name.Bac, FullSet$name.Gene))
Results <- bind_rows(lapply(ListOfTests, MakeStats))
PValues <- Results[,-3]
Estimates <- Results[,-4]
Estimates <- pivot_wider(Estimates, id_cols="Gene", names_from="Bacteria", values_from="Estimate")
PValues <- pivot_wider(PValues, id_cols="Gene", names_from="Bacteria", values_from="PValue")
EstMatrix <- as.matrix(data.frame(Estimates[-1], row.names = Estimates$Gene))
PMatrix <- as.matrix(data.frame(PValues[-1], row.names = PValues$Gene))
corrplot(EstMatrix, method="square", p.mat = PMatrix, pch=8)
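As a cross-check on the matrices above, the same gene-by-bacterium estimates and p-values can also be sketched in base R without reshaping. This is an alternative illustration, not part of the original answer; the object names patients, bac, gen, rho and pval are hypothetical, and the data are copied from the question.

```r
# Example data from the question, stored as numeric matrices
# (rows = bacteria or genes, columns = patients).
patients <- c("1C", "1L", "2C", "2L")
bac <- rbind(
  Staphylococcus = c(10, 400, 20, 600),
  Enterococcus   = c(15, 607, 39, 800)
)
gen <- rbind(
  IL4  = c(60, 300, 90, 450),
  IL8  = c(30, 600, 54, 750),
  TNFA = c(89, 450, 96, 600)
)
colnames(bac) <- colnames(gen) <- patients

# Spearman rho for every gene (rows) against every bacterium (columns).
# cor() correlates columns, so both matrices are transposed first.
rho <- cor(t(gen), t(bac), method = "spearman")

# Matching matrix of p-values: one cor.test() call per gene-bacterium pair.
pval <- sapply(rownames(bac), function(b) {
  sapply(rownames(gen), function(g) {
    cor.test(bac[b, ], gen[g, ], method = "spearman")$p.value
  })
})

rho
pval
```

Both results are 3 x 2 matrices with genes as rows and bacteria as columns, the same layout as EstMatrix and PMatrix, so they should be usable in the corrplot() call in the same way.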
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |