bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"


Here is a guest post that Dan Graur didn’t write:

Dear ENCODE researchers,

First of all, allow me to congratulate you on delivering such a large and complex project.  The results of your research will benefit many hundreds of scientists throughout the world, and the resources generated are invaluable.

However, I have to disagree with the headline figure that 80% of the genome is functional.  This contradicts much of the research in this area, and I feel you have used an incorrect definition of the term “functional”.


  1. Transcription does not equal function
  2. Histone modification does not equal function
  3. Open chromatin does not equal function
  4. Transcription-factor binding does not equal function
  5. DNA-methylation does not equal function

It is unfortunate that the headline figure of 80% functionality has taken away much focus from the success of the project.  It is also unfortunate how many journalists treated this particular piece of information.

However, I do appreciate that science, like many other disciplines, requires, and benefits from, people with opposing views.  Your view of functionality certainly opposes mine; however, at the very least, what you have achieved is to stimulate debate on the topic, which is of benefit to everyone.


Not Dan Graur

DISCLAIMER: Dan Graur has nothing to do with the above post.  I have written this in response to Dan’s paper here, the tone of which I feel is of no benefit to anyone, and which sets a bad example to young scientists.


  1. I think that the tone is not that important. What sets a good example for young scientists is other scientists not being afraid to disagree, and to openly challenge other researchers’ results. Even more when the challenged project is an outstanding one. There is far too much consensus in science, and ironically, too little reproducibility.

  2. I don’t mind the tone of the critique. There is nothing wrong with scientists having emotions and letting them show. It’s humanizing and reveals what we really care about.

    • Emotions are great, showing them great, but I just cannot comprehend why the authors are so angry, and I do not think anger is a positive emotion, and it certainly has no place in a peer reviewed article.

      • Even though not a positive emotion, anger can be a quite important emotional motivator in science.

        Max F. Perutz, “I Wish I’d Made You Angry Earlier: Essays on Science, Scientists and Humanity”. Oxford University Press, 2002

  3. I agree with Janis. I think that by loudly promoting an over-the-top interpretation of their data, the ENCODE team (1) brought such replies upon themselves, and (2) diminished the otherwise very valuable contribution of their data. I.e., they leave it to users to find out how full of false positives and problems of weird cell lines the data are, instead of showing this clearly themselves.

    I also think that the tone serves a purpose in this case: since Nature/Science will almost never accept to publish a strong criticism of the science which they hyped, the criticism will be published in a less visible venue. So either it is then missed by most scientists, which is detrimental, or it must be made visible. Which is also paradoxical, since we come back to the issues of hype and marketing…

    I wouldn’t have written in the same tone as Dan, but I think that his tone is a very minor sin, relative to the deadly sin of over-interpretation and over-selling of results. Now there is a peer-reviewed reference to cite for the fact that no, ENCODE did not show that 80% of the genome is functional.

    • But if Dan is correct, then those peer reviewed articles already exist – he cites them.

      So what is the purpose of Dan’s article?

      • What’s the purpose of the ENCODE summary article, or any review? Ignoring the tone, do you think the paper is not necessary or useful? I found marshalling the arguments about not only “80% functional” but also the dangers of using cell lines to study function very useful.

      • I understand your criticism of Dan’s article in terms of tone, but not this argument.

        Dan cites articles in support of his criticisms which, in discussing and being specifically directed towards the main ENCODE paper, are not to be found in those cited articles – most of which were written and published before ENCODE. The purpose of his article appears to be precisely to, by making detailed and well-supported criticisms of its more extravagant claims, deflate the bubble of hype he perceives around the ENCODE project, and nip in the bud the uncritical uptake of those claims by others who don’t have a similarly-informed background.

        Also if we deemed as purposeless, and by implication unnecessary, all articles that only synthesised (or even less!) the content of primary research, that would be the whole secondary review literature out the window, which would be a shame 😉

      • I have no problem with the scientific arguments made in the paper, but the tone is unacceptable.

        Consider the alternative: that Dan, and his co-authors, could have made the self-same arguments in a reasoned and well-argued report, with a complete absence of snark.

        Wouldn’t that have been better?

  4. I agree in a perfect world this kind of sarcasm would not make it into a peer-reviewed paper, and I’m a little surprised so much survived in the final version. I have previously ranted about How Not To Be A Bioinformatician (http://www.scfbm.org/content/7/1/3), which I think is embarrassingly bad.

    But I’m inclined to let it pass in this case, because the paper is very good otherwise, and necessary. Would you agree that the tone of statements like ‘These results are going “to change the way a lot of [genomics] concepts are written about and presented in textbooks”‘ and ‘The project has played an important role in changing our concept of the gene’ (http://www.sciencemag.org/content/337/6099/1159.full) is also inappropriate? The ENCODE leaders have already spoiled the impact of the work of the consortium by making a big deal of the 80% functional claim; I don’t see Graur is doing any additional damage by fighting fire with fire.

    Perhaps another outlet would have been better – a commentary piece, or a blog. But this is the headline claim from one of the biggest big science projects, given the highest possible profile, and it is extremely dubious. So much is at stake here – can we expect the same from the Brain Activity Map? At the expense of how many other projects? How long will it take to unravel the damage done by this 80% claim? There is a serious political issue here, and responding emotionally is understandable. Graur expressing the strength of his feeling through sarcasm is one way of making a political point, in response to a serious show of political force. Ideally we’d keep politics out of the peer-reviewed literature, but the genie is long out of the bottle in this case.

    xp: what Marc said!

    • I don’t think the argument “they did a bad thing, so I’ll do a bad thing in reply” really holds any water, to be honest.

      If one thinks ENCODE was handled badly, then lead by example and show us how it should be done. Writing a snarky, angry little piece is not how it should be done.

      Dan’s paper won’t be remembered for the arguments within it, but for the tone.

      • You know what? I agree with you. It would have been better without the anger and the snark. But what’s your line here? No snark in peer-review, or no snark at all? What’s the difference between Graur et al and https://biomickwatson.wordpress.com/2013/01/14/call-the-bioinformatics-police, which actually I only remember for the tone? Is snark OK if you’re constructive? Because I think the Graur paper is constructive – it’s a useful summary of how to define biological function. If snark is unacceptable in peer-review, why not cut it out on blogs and Twitter too?

      • You ask a good question 🙂 I’d have liked to have seen well reasoned arguments in the peer reviewed paper, without snark.

        A blog would have been more appropriate for that tone.

        And I’m upset you only remember my post for the tone…

      • I disagree that Dan’s article will be remembered only for its tone; maybe it would not have got the attention it has without it. Although I do agree that the wording is sarcastic in some places, the arguments are in there, and they target the interpretation and analysis of the data rather than the science. The hype that the ENCODE claims have generated needs to be debated and communicated to the general public. In my opinion, the arguments in the article can be well understood by non-specialists as well.

    • There have been blog reactions by Larry Moran, Ryan Gregory, PZ Myers and many others. Their posts didn’t help because ENCODE was ruling the headlines. The tone of Graur et al. might appear inappropriate under normal conditions. But without it the article may have passed unnoticed. Remember that ENCODE was not a single normal article but a co-ordinated action that made very strong claims and involved about 30 articles in several high-ranking journals and a media campaign full of hype.

  5. No reply link on the specific post, so:

    “Wouldn’t that have been better?”

    That’s not so clear to me. It would have been less rude, less disruptive, and annoyed fewer people, so if that’s “better”, then yes. If the rudeness draws more people’s attention to problems with the ENCODE claims so that they’re more self-critical about their work, and others pay more attention to these issues when reviewing papers and grants, is that “better”? If it angers the ENCODE authors into a robust defence that vindicates their claims more strongly than the publications themselves, is that “better”? If it encourages individuals not to take the claims of very large (and intimidating) projects at face value, to avoid dogma, and to always be critical, is that “better”? I don’t know. I think it comes down to a value judgement in this case.

    • I agree that opening up a debate on the issue is a good thing – arguably, by making the 80% claim, that was what ENCODE were trying to do.

      I agree that some of what you propose would be good – but Dan’s paper hasn’t had that effect yet, so let’s not pat him on the back just yet, and we’ll never know if the paper would have had the same effect if written without snark.

  6. If it was ENCODE’s aim to open up a debate with a provocative statement, then they’ve certainly got that debate with Dan’s paper, and can’t really complain 😉

  7. I disagree that Graur “sets a bad example” or anything along those lines. Maybe Huxley set a bad example in 1860 with his putdowns? In the end, arguments are important, not tone; personally, I’m fine with some emotion in the literature.

  8. I think maybe the tone/snark could have been dialed back a bit. I certainly wouldn’t have written it that way, but I think there is a place for snark in peer-reviewed science publications and scientific debate. The witty and (hopefully) subtle put-down is a rhetorical tool, and we shouldn’t completely ignore rhetorical tools in science.

  9. On the one hand, Graur’s reply may not have garnered so much interest without the tone he set. On the other hand, all that is being discussed here is the tone rather than the substance of his criticism of ENCODE. So I am not sure his paper, while definitely achieving a level of interest, has achieved the kind of attention it solicited. Graur’s tone, on the whole, is distracting from his important points, and may ultimately be self-defeating to his message.

  10. Please remove my “signature” from your article. I am the only person allowed to use my name. Dan Graur

  11. opiniomics’s imaginary version of Dan Graur says: “It is unfortunate that the headline figure of 80% functionality has taken away much focus from the success of the project. It is also unfortunate how many journalists treated this particular piece of information.”

    But the “the inaccuracies are the media’s fault” suggestion, while often true, is almost entirely untrue in this case. The ENCODE leaders, and the journal that published it, chose the 80% number, chose to hype it, and chose to declare that they’d basically disproved junk DNA. The media just repeated these claims from the scientists. So that’s where most of the criticism has to be directed.

    • This is not strictly true. If you dig around, you will find that ultimately what ENCODE were talking about was “biochemical function”. Unfortunately, this often was shortened to “function”.

      • It’s bad science communications to put a sensational, exaggerated claim near the top of a press release and leave the media to “dig around” to find the less dramatic reality. That guarantees inaccurate coverage. It was also misleading to use the loaded word “function”; it would have been far more accurate to use the more neutral word “activity”.

  12. Hey Mick,

    There are two parts of the ENCODE story – the scientific part (project design, data generation) and the communication part (publications in CNS, press release and media coverage).

    I think we can all agree that the communication part was problematic on many levels, and most of the early responses indeed expressed concerns about how the results were presented by the media as well as the ENCODE team. From Ewan’s follow-up post on his personal blog, it was clear that he sincerely shared some of these concerns. Ideally, at this point, while keeping in mind the lessons taken from this project’s PR mess, we would celebrate the ENCODE team for their effort to generate such an incredible data resource.

    However, according to what Dan points out in his response, the scientific part was problematic as well – which raises concerns about the scientific value of some parts of the data generated (e.g., choice of the cell lines).

    Thus, I think what primarily led to Dan’s emotional response (and deservedly so based on his arguments) was the fact that the ENCODE team did a bad scientific job – other than the concerns about the “big science” model or the PR mess. The latter two have been covered exhaustively on social media. However, someone had to point out the scientific problems with the ENCODE project. I think it is imperative for the sake of good science to discuss to what extent the ENCODE data are reliable and whether the “big science” model played a role in encouraging the ENCODE team to do a bad job.

    • I disagree that the science in ENCODE was carried out in a bad way – over and above the amount of bad science that takes place in *any* project. It is just more visible with large projects.

      I wholeheartedly believe that there are some people who would not have been happy with ENCODE no matter what the result, and I suspect their dissatisfaction is a result of either i) jealousy they didn’t get that level of funding themselves, or ii) a moral objection to such large projects in the first place.

  13. Two separate issues. Encode is a technical exercise. Well conducted it seems by technical people. The interpretation of what has been found is an entirely different matter. It requires different skills. Physics has been through all this. Biology will learn. We will in time absorb ENCODE’s findings and we will discover its true significance. The rest is noise.

    • That’s mighty generous in some ways and a backhanded swipe at biologists in another.
      The vast majority of the contemporaries of the ENCODE project recognize that the comments about function were completely bogus. And it isn’t just sloppy. They perpetuate a misconception that nobody should have if they get a degree in biology/biochemistry/molecular biology/genetics or any of multiple other life science degrees. Biology is sloppy. Enzymes are not magic, they bind to all sorts of “near misses”. Binding is not function. We expect RNA polymerase to make copies of useless sequences in ways that are pure waste. And with the “cost” of this to the cell being much less than 0.1% of its energy consumption, evolution cannot filter out that waste. It’s a basic synthesis of basic lessons. The ENCODE authors failed at the basics.

  14. Hi Mick,
    Besides the snarky tone, I personally have another issue with Dan Graur’s paper, which I am somewhat surprised I haven’t seen mentioned much anywhere.
    I completely agree that ENCODE presented as “functional” anything that could be traced down to some biochemical activity, which may or may not in fact be functional (as in, actually doing something for the cell). The data likely contains large amounts of false positives, and has probably made far-reaching assumptions and claims that need to be toned down. And this needs to be said out loud.
    However, I do not believe that this makes it ok to misrepresent what is actually said in the ENCODE papers, and nor does it make it alright for the authors of the Graur et al. paper to use fallacies of their own to back up their claims. There are countless reasonings in this paper that look intuitive at first sight but prove to be wrong. For example – the authors (rightly) claim that ENCODE use a fallacy by saying “functional regions exhibit feature A -> regions with feature A are functional”. But they fall into the exact same fallacy by claiming that since conserved sequences are functional, functional sequences should exhibit sequence conservation. It is entirely possible that some functions, like say DNA compaction into the nucleus, may be fulfilled following physical principles based on whatever the genome has to offer at a given time, and do not rely on conservation at all. It is also possible that conservation exists but not at the sequence level (higher-order organisation of the chromatin, for example). But the authors seem to consider that the only possible thing that may be considered as “functional” in the genome is whatever directly and specifically results in the production of functional and needed transcripts. They are, in fact, falling into the opposite extreme from ENCODE. Besides, what does sequence conservation even mean? Haemoglobin is not conserved if your evolutionary scale of choice is human to mussel : have I just proved that haemoglobin is therefore not functional? That is of course a fallacy.
    There are lots and lots of other points that could be raised: the claim that since 0.28% of STAT3 binding sites could be estimated to be within 10kb of a TSS, conserved across 5 species, and functional, it is somehow an indication that 0.28% of the genome may be functional TFBS; or the claim that CpG-free regions are somehow more interesting than CpG islands (while CpGs are known to decay faster than they appear, and the entire paper advocates looking for what is conserved)…
    I guess my point is: put your money where your mouth is. Calling out bad science? Ok, but do it properly. This just does not help. And I am very surprised to see blogs and magazines all around hailing this paper as a “spot-on, meticulous critique” and so on.

    • Hi Cam,

      I completely agree with you! The Graur paper is riddled with false assumptions, to the point that the authors apparently don’t even understand what ChIP sequencing is and does. Likewise, their understanding of CpG sites and islands is, well, not exactly in line with mainstream science.

      Most of all, it comes down to the definition of “functional”. While the ENCODE paper is very specific about their definition, Graur et al. simply do not accept that and insist on using their own definition, namely to be under some sort of evolutionary pressure. What if one were to come up with a different definition of, say, “sequence conservation”, no matter how arbitrary – can one then harshly trash tons of papers?

      Graur et al.’s definition of “functional” is “having a function that is essential for the survival of the individual organism”. In any organism, many features are non-essential for survival – unless it needs them, and those can turn into features that are essential for species survival!

      I have a toolbox in my basement, with screwdrivers, a hammer, nails, a drill etc. Since I haven’t used it, though, by Graur’s definition, my toolbox is “non-functional”.

    • “But they fall into the exact same fallacy by claiming that since conserved sequences are functional, functional sequences should exhibit sequence conservation.”

      The reason why you haven’t seen anyone else point out this error is that Graur et al. do not commit this error. They refer to sequence conservation as just one of the methods that can be used to identify function. In fact, they explicitly note that using sequence conservation is likely to yield estimates that are too conservative (page 9).

      “For example – the authors (rightly) claim that ENCODE use a fallacy by saying “functional regions exhibit feature A -> regions with feature A are functional”. But they fall into the exact same fallacy by claiming that since conserved sequences are functional, functional sequences should exhibit sequence conservation. It is entirely possible that some functions, like say DNA compaction into the nucleus, may be fulfilled following physical principles based on whatever the genome has to offer at a given time, and do not rely on conservation at all.”

      Graur et al. address this. They refer to DNA that serves a function in a non-sequence dependent way as “indifferent DNA” (page 32).

      • Taylor, yes, they do commit this error.
        It is the underlying assumption behind statements such as “ENCODE absurdly claims that 80-10% = 70% of the genome is indefinitely maintained without selection and no deleterious mutations can ever happen in these regions”, or “it is ridiculous to claim that 70% of the genome is affected by undetectable selection” (quoting off the top of my head – you may want to check the exact wording). These statements only make sense if the basic premise is that any functional region must be a functional *sequence*, and hence must be conserved in some way. This is a fallacy – a functional region is not necessarily a functional sequence. As such, some functions may be maintained without sequence conservation (which does not mean that selection does not occur at another level).
        I agree with you that they nuance this point of view at the end of the paper with the notion of “indifferent DNA”. Interestingly, this part of the paper directly contradicts everything they have said in the first paragraphs. Once you accept that an unknown and potentially large part of the genome may be involved in non-sequence-specific binding or marking that has functional properties, the whole argument collapses.

        Also – you mention that Graur et al. refer to sequence conservation as “one of the methods that can be used to identify function”. Since the paragraph titles of the paper are vocally claiming that neither transcription factor binding, nor histone modifications, nor open chromatin, nor DNA methylation are adequate indications of function – I would be really interested in knowing what are those “other methods” apart from detecting selection that are blessed with their approval.

  15. “…sets a bad example to young scientists.” Interesting. Is sending $288 million into a sinkhole, one where the primary beneficiary is a publicly traded company (Illumina) combined with the poorly defined ideas of “translation science” setting a good example? Given the number of soft money jobs, and the willingness of the NIH to tolerate that behavior, is that setting a good example? Here’s the thing, the only people entering “science” now are those with something to prove, whether to themselves or to others. This albeit cynical (but IMO completely accurate) assessment of the ENCODE project does not set a bad example, it merely confirms what the naive and the Pollyannas refuse to admit, and these young scientists know that quite well. (Or at least the ones I meet) “Young Scientists” know that their future is grim, that funding is scarce, and chances are they will not get very far. However, they also believe, and most irrationally, that they are the ones who will succeed, and if they jump on this poorly defined gravy train of translation (because stem cells derailed last decade) science they have a better chance. Now whoever encourages those decisions, that is setting a bad example.

  16. I’m generally a “can’t we all just get along” sort of fellow, but I didn’t think the critique of Graur et al. was at all bad from a tonal perspective. I’ve been in science since 1989 (when I started grad school) and I’ve seen scores (if not hundreds) of remarks in seminars, thesis defenses and scientific meetings that were far harsher and more personal than anything in that paper. Giving a colleague a bit of grief in a humorous way (the re-writing textbooks for marketing hype line) is a perfectly acceptable form of ribbing. Onkelbob also has a point about the worth of spending nearly $300 M on a set of studies and then being so slapdash with one’s analysis, neglecting (it appears) an evolutionary criterion for function. (I’ve got nothing against “big” science, either, when it’s done properly. Much of my work now benefits from big science.)

  17. The tone was right on the mark because the ENCODE conclusions about function were codswallop. It’s really basic. You’re working with DNA and you’re speculating about function. But you don’t consider genetics and evolution in your interpretation of function. You fail. You fail big. You fail in a dramatic way that exposes your complete lack of understanding of biology in general. And when challenged, you don’t get it, bringing the Dunning-Kruger effect into play.

  18. I have no problem with the tone, in many ways he let them off lightly.
    The ENCODE authors have previously responded to some critics and in so doing provided examples of the Dunning-Kruger effect in action.
    They apparently have little to no understanding of the expectation that non-functional biochemical activity will be present, and apparently believe that most everything that happens in a cell happens for a reason. On occasion they seem to acknowledge the possibility while showing a complete lack of understanding. This implicit denial of non-functional molecular interactions is a remake of vitalism that demonstrates gross ignorance of biochemistry, enzymology, molecular natural history and genetics. And any definition of function that is not aligned with selective value is a failed teleology, full stop.

  19. Graur wasn’t the only one to write a rebuke. There were a bunch of people who wrote something similar to what you have and they got no attention. Nothing compared to the big headlines that ENCODE were shouting.

    Graur used the tone he did on purpose, in order to get the attention that this issue needed.

    When there’s that much money put into a project, there’s much more at stake than the wounded egos of some bad scientists, or the ‘civility of scientific discourse’. I think the productivity of scientific discourse is much more important.





© 2017 Opiniomics
