Opiniomics

bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

I can’t recreate a graph from Ioannidis et al – can you?

Very quick one this!  Really interesting paper from Ioannidis et al about citation indices.

I wanted to recreate figure 1, which is:

[Figure 1 from the paper: journal.pbio.1002501.g001]

Closest I could get (code here) is this:

[Figure: plos_weird]

Biggest difference is in NS, where they find all negative correlations, but most of mine are positive.

Source data are Table S1 Data.

Am I doing something wrong?  Or is the paper wrong?

 

UPDATE 9th July 2016

Using Spearman gets us closer, but it’s still not quite right (code updated too).

[Figure: results_spearman]
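For anyone following along, here is a minimal sketch of the per-field Pearson versus Spearman comparison in base R. The data are invented toy values, not Table S1; the column names Field, NC and NS are borrowed from the comments below, and `cor_by_field` is a hypothetical helper, not anything from the paper. It also checks one thing that is safe to say in general: Spearman works on ranks, so a monotone transform such as log(n+1) cannot change it, and only the Pearson values can move.

```r
# Toy data only: invented numbers, NOT the paper's Table S1.
set.seed(42)
toy <- data.frame(
  Field = rep(c("FieldA", "FieldB"), each = 50),
  NC    = round(rexp(100, rate = 0.01))   # right-skewed, like citation counts
)
toy$NS <- round(toy$NC * runif(100, min = 0.05, max = 0.30))

# Hypothetical helper: correlate NS with NC within each field, using both
# methods, after applying an optional transform to both variables.
cor_by_field <- function(d, transform = identity) {
  per_field <- lapply(split(d, d$Field), function(g) {
    x <- transform(g$NC)
    y <- transform(g$NS)
    c(pearson  = cor(x, y, method = "pearson"),
      spearman = cor(x, y, method = "spearman"))
  })
  do.call(rbind, per_field)   # one row per field, one column per method
}

raw_cor <- cor_by_field(toy)
log_cor <- cor_by_field(toy, transform = log1p)   # log1p(n) is ln(n + 1)

print(raw_cor)
print(log_cor)
```

If the Spearman column is identical in both tables while the Pearson column differs, the choice of transform only matters under Pearson.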

11 Comments

  1. Probably a non-normal distribution. Rank correlation may be more appropriate, and it gives results as reported in the paper.
    If df = the dataframe from the paper
    library(dplyr)
    df1%select(Field:NC,NS)%>%group_by(Field)%>%summarise(NSpearson=cor(NS,NC),NSspearman=cor(NS,NC,method=”spearman”))%>%gather(Method,value,NSpearson:NSspearman)
    ggplot(df1,aes(x=Field,y=value,fill=Field))+facet_grid(Method~.)+geom_bar(stat=”identity”)
    Though I'm not familiar with the \N field.

  2. The code didn’t come out right:
    library(dplyr)
    library(ggplot2)
    df1%select(Field:NC,NS)%>%group_by(Field)%>%summarise(NSpearson=cor(NS,NC),NSspearman=cor(NS,NC,method=”spearman”))

  3. Oh, autoformatted again! It's the first bit. I will try one last time:

    the df1 above (in the first post) should read:

    df1 = df %>% select(Field:NC, NS) %>% group_by(Field) %>% summarise(NSpearson = cor(NS, NC), NSspearman = cor(NS, NC, method = "spearman")) %>% gather(Method, value, NSpearson:NSspearman)
    ggplot(df1, aes(x = Field, y = value, fill = Field)) + facet_grid(Method ~ .) + geom_bar(stat = "identity")
    Hopefully this will be more legible.

    Also forgot to mention: library(tidyr).

  4. biomickwatson

    9th July 2016 at 3:28 pm

    Maybe use a gist 🙂

  5. Ah yes. Thanks! I'm a newbie to commenting, and with code snippets at that. 🙂

  6. biomickwatson

    9th July 2016 at 6:36 pm

    I added in a Spearman analysis (it isn’t clear from the paper which they used) and it gets us close but not quite….

  7. Weird. Maybe for one last attempt, try method="kendall" in cor() for Kendall’s Tau. I recall that method being pretty slow compared to Pearson and Spearman, so hopefully it's not a big dataset.

  8. I used log-transformed values for np, h, hm, s, sf, sfl for the correlations. Try that.

  9. biomickwatson

    13th July 2016 at 4:58 pm

    log(n+1)?

    Which base?

  10. ln(n+1)/ln(max(n)+1) for each of the 6 original indicators puts them each within a [0,1] range. The sum of these gives the composite value.

  11. biomickwatson

    25th July 2016 at 12:09 pm

    Hi Kevin

    I tried again with your formula and still couldn’t reproduce the graph:

    https://github.com/mw55309/PLOS_citation_graph/blob/master/boyack.R

    And

    https://raw.githubusercontent.com/mw55309/PLOS_citation_graph/master/boyack_results_pearson.png

    Cheers
    Mick
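
For reference, the scaling described in comment 10 can be sketched in a few lines of base R. This is only my reading of the formula ln(n+1)/ln(max(n)+1), not code from the paper: the values are invented, the column names reuse the labels from comment 8 as placeholders, and `scale_log` is a hypothetical helper name. On the "which base?" question, the ratio is the same in any base, because the change-of-base factor cancels between numerator and denominator.

```r
# Hypothetical helper: ln(n + 1) / ln(max(n) + 1), which maps n >= 0 into [0, 1].
scale_log <- function(n) log1p(n) / log1p(max(n))

# Invented toy values for the six indicators; labels follow comment 8 and
# are placeholders, not the column names of Table S1.
toy <- data.frame(
  np  = c(0, 12, 150, 900),
  h   = c(0, 2, 6, 15),
  hm  = c(0, 1.5, 3.2, 8.0),
  s   = c(0, 3, 10, 40),
  sf  = c(0, 5, 30, 120),
  sfl = c(0, 8, 45, 200)
)

# Scale each indicator separately, then sum across indicators per row.
scaled    <- as.data.frame(lapply(toy, scale_log))
composite <- rowSums(scaled)   # lies in [0, 6] for six indicators

print(scaled)
print(composite)

# The same scaling computed with base-10 logs, to show the base cancels.
np_base10 <- log10(toy$np + 1) / log10(max(toy$np) + 1)
print(all.equal(scaled$np, np_base10))
```

Note that max(n) here is taken over whatever rows are passed in, so scaling within each field and scaling over the whole table will generally give different composites.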


© 2017 Opiniomics
