I’m not perfect. Not in any way. I am sure that if anyone were so inclined, they could work their way through my research with forensic attention to detail and uncover all sorts of mistakes. The same will be true for any other scientist, I expect. We’re human and we make mistakes.
However, there is one mistake in bioinformatics that is so common, and which has been around for so long, that it’s really annoying when it keeps happening:
It turns out the Carp genome is full of Illumina adapters.
One of the first things we teach people in our NGS courses is how to remove adapters. It’s not hard: we use Cutadapt, but many other tools exist. It’s simple, but really important. With de Bruijn graph assemblers, untrimmed adapters give you paths through the graph that converge on adapter k-mers; with OLC assemblers, you get spurious overlaps between reads that share nothing but an adapter. With gap-fillers, it’s possible to fill the gaps with sequences ending in adapters, and this may be what happened in the Carp genome.
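To show how little is involved, here is a rough sketch of 3' adapter trimming in Python. It is not how Cutadapt works internally: real trimmers tolerate sequencing errors in the adapter match, whereas this toy version only handles exact matches. The adapter string is the prefix shared by the Illumina TruSeq adapters, and assuming that is the right adapter for your library is exactly that, an assumption. The function name and the `min_overlap` parameter are made up for illustration.

```python
# Assumption: the library was built with TruSeq-style adapters, which share this prefix.
ADAPTER = "AGATCGGAAGAGC"


def trim_3prime_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove an exact-match 3' adapter (and anything after it) from a read.

    Hypothetical helper for illustration only; exact matching, no error tolerance.
    """
    seq = read.upper()

    # Case 1: the whole adapter occurs inside the read; cut at its first base.
    pos = seq.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends part-way through the adapter, so only a prefix of
    # the adapter is present at the 3' end. Try the longest prefix first.
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return read[:-k]

    # No adapter found: return the read unchanged.
    return read
```

The `min_overlap` threshold is there because a one- or two-base match at the end of a read happens by chance all the time; trimming on those would eat real sequence.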
Why then are we finding such elementary mistakes in such important papers? Why aren’t reviewers picking up on this? It’s frustrating.
This is a separate but related issue to genomic contamination: the Wheat genome has PhiX in it; tons of bacterial genomes do too; and lots of bacterial genes were problematically included in the Tardigrade genome and declared as horizontal gene transfer.
Genomic contamination can be hard to find, but sequencing adapters are not. Who isn’t adapter trimming in 2016?!
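For what it’s worth, checking a finished assembly for leftover adapters takes a few lines. The sketch below counts exact occurrences of an adapter, in either orientation, in each sequence of a FASTA file. It assumes the TruSeq adapter prefix is the relevant one, and the function names are hypothetical. Exact matching will miss adapters carrying sequencing errors, so treat a clean result as weak evidence and any hit as a reason to look closer.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: TruSeq-style adapters; this is the prefix they share.
ADAPTER_PREFIX = "AGATCGGAAGAGC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the sequence name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def count_adapter_hits(lines: Iterable[str], adapter: str = ADAPTER_PREFIX) -> Dict[str, int]:
    """Map sequence name -> number of exact adapter hits on either strand.

    Sequences with no hits are left out of the result.
    """
    rc = reverse_complement(adapter)
    hits = {}
    for name, seq in read_fasta(lines):
        n = seq.count(adapter) + seq.count(rc)
        if n:
            hits[name] = n
    return hits
```

To run it on a real file, pass an open file handle, for example `with open("assembly.fa") as fh: print(count_adapter_hits(fh))`, where `assembly.fa` is a placeholder for your own assembly.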