Opiniomics

bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

I can’t recreate a graph from Ioannidis et al – can you?

Very quick one this!  Really interesting paper from Ioannidis et al about citation indices.

I wanted to recreate figure 1, which is:

[Figure: journal.pbio.1002501.g001]

Closest I could get (code here) is this:

[Figure: plos_weird]

Biggest difference is in NS, where they find all negative correlations, but most of mine are positive.

Source data are Table S1 Data.

Am I doing something wrong?  Or is the paper wrong?

 

UPDATE 9th July 2016

Using Spearman gets us closer, but it’s still not quite correct (code updated too).

results_spearman
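For anyone who wants to see the shape of the comparison without downloading Table S1, here is a minimal sketch in base R. The data frame is synthetic, and the column names Field, NC and NS are assumptions borrowed from the comments below, not checked against the supplementary file. It only illustrates how Pearson and Spearman can give different per-field values on skewed data; it is not the paper's method, and its numbers mean nothing.

```r
# Synthetic stand-in for Table S1. This is NOT the paper's data.
set.seed(1)
n <- 200
toy <- data.frame(
  Field = rep(c("A", "B"), each = n),              # hypothetical field labels
  NC    = rlnorm(2 * n, meanlog = 8, sdlog = 1)    # right-skewed, count-like values
)
# NS is loosely tied to NC, with heavy multiplicative noise (an assumption).
toy$NS <- (toy$NC / median(toy$NC))^0.3 * rlnorm(2 * n, meanlog = 3, sdlog = 1.5)

# Correlate NS with NC within each field, for a given method.
cor_by_field <- function(df, method) {
  sapply(split(df, df$Field), function(d) cor(d$NS, d$NC, method = method))
}

# One row per method, one column per field.
res <- rbind(
  pearson  = cor_by_field(toy, "pearson"),
  spearman = cor_by_field(toy, "spearman")
)
print(round(res, 2))
```

With the real table loaded into a data frame, the same cor_by_field() call should work as long as the columns really are called Field, NC and NS.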

11 Comments

  1. Probably a non-normal distribution. Rank correlation may be more appropriate, and it gives results as reported in the paper.
    If df = the dataframe from the paper
    library(dplyr)
    df1%select(Field:NC,NS)%>%group_by(Field)%>%summarise(NSpearson=cor(NS,NC),NSspearman=cor(NS,NC,method=”spearman”))%>%gather(Method,value,NSpearson:NSspearman)
    ggplot(df1,aes(x=Field,y=value,fill=Field))+facet_grid(Method~.)+geom_bar(stat=”identity”)
    Though I'm not familiar with the \N field.

    • The code didn’t come out right:
      library(dplyr)
      library(ggplot2)
      df1%select(Field:NC,NS)%>%group_by(Field)%>%summarise(NSpearson=cor(NS,NC),NSspearman=cor(NS,NC,method=”spearman”))

      • Oh, autoformatted again! It's the first bit. I will try one last time:

        The df1 assignment above (in the first comment) should read:

        df1 = df %>% select(Field:NC, NS) %>% group_by(Field) %>% summarise(NSpearson = cor(NS, NC), NSspearman = cor(NS, NC, method = "spearman")) %>% gather(Method, value, NSpearson:NSspearman)
        ggplot(df1, aes(x = Field, y = value, fill = Field)) + facet_grid(Method ~ .) + geom_bar(stat = "identity")
        Hopefully this will be more legible.

        Also forgot to mention: library(tidyr).

  2. Weird. Maybe for one last attempt, try method="kendall" in cor() for Kendall’s Tau. I recall that method being pretty slow compared to Pearson and Spearman, so hopefully it's not a big dataset.

  3. I used log-transformed values for np, h, hm, s, sf, sfl for the correlations. Try that.
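The two suggestions above interact in a way worth checking: log() is strictly increasing on positive values, so it leaves ranks unchanged. That means a log transform can move Pearson but should not move Spearman or Kendall. A minimal sketch on synthetic data (x and y are made-up vectors, not columns from Table S1):

```r
# Synthetic, strictly positive, right-skewed data. NOT the paper's data.
set.seed(42)
x <- rlnorm(300, meanlog = 5, sdlog = 1.2)
y <- sqrt(x) * rlnorm(300, meanlog = 0, sdlog = 0.8)

methods <- c("pearson", "spearman", "kendall")

# Correlation on the raw scale and on the log scale, for each method.
raw    <- sapply(methods, function(m) cor(x, y, method = m))
logged <- sapply(methods, function(m) cor(log(x), log(y), method = m))

# Rows: raw vs logged; columns: the three methods.
print(round(rbind(raw, logged), 3))
```

Only the Pearson column should differ between the two rows. One caveat for real data: counts can contain zeros, which log() turns into -Inf, and the comment above does not say how those were handled.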


© 2017 Opiniomics
