Use of the Pearson correlation co-efficient is common in genomics and bioinformatics, which is OK as it goes (I have used it extensively myself), but it has some major drawbacks – the major one being that Pearson can produce large coefficients in the presence of very large measurements.

This is best shown via example in R:

# let's correlate some random data
g1 <- rnorm(50)
g2 <- rnorm(50)

cor(g1, g2)
# [1] -0.1486646

So we get a small, -ve correlation from correlating two sets of 50 random values. If we ran this 1000 times we would get a distribution around zero, as expected.

Let's add in a single, large value:

# let's correlate some random data with the addition of a single, large value
g1 <- c(g1, 10)
g2 <- c(g2, 11)
cor(g1, g2)
# [1] 0.6040776

Holy smokes, all of a sudden my random datasets are positively correlated with r>=0.6!

It's also significant.

> cor.test(g1,g2, method="pearson")

        Pearsons product-moment correlation

data:  g1 and g2
t = 5.3061, df = 49, p-value = 2.687e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3941015 0.7541199
sample estimates:

So if you have used Pearson in large datasets, you will almost certainly have some of these spurious correlations in your data.

How can you solve this? By using Spearman, of course:

> cor(g1, g2, method="spearman")
[1] -0.0961086
> cor.test(g1, g2, method="spearman")

        Spearmans rank correlation rho

data:  g1 and g2
S = 24224, p-value = 0.5012
alternative hypothesis: true rho is not equal to 0
sample estimates: