Opiniomics

bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

101 bioinformatics facts

I am trying to crowd-source 101 fun bioinformatics facts!  Please contribute in the comments and I will add them below:

  1. BLAST is so fast, the authors had to deliberately slow down the code so it doesn’t overheat the servers
  2. GCG, the old bioinformatics package, was named after the authors kept high-fiving each other, shouting “good code guys!”
  3. Bowtie is named so because “it is almost impossible to tie”, referring to code to avoid a “race condition” when using multiple processors
  4. TopHat is named so because it was the first spliced RNA-Seq aligner, and when it worked first time, the authors shouted “Top that!”
  5. Over 1 billion people have searched the NCBI protein database for their own name
  6. The EBI is an elaborate front-end to NCBI services
  7. The SRA (short read archive) is the best known of the archives, and not many people know or use the MRA (medium read archive), the KLRA (kinda long read archive) and the LRA (long read archive)
  8. Europe PubMed Central has only ever been accessed by people accidentally clicking on links.  100% of visitors immediately bounce to pubmed.com
  9. There are now more journals than papers
  10. The HGAP assembler is actually an elaborate front-end hiding three thousand slave labourers all running GAP4 (via @IanGoodhead)
  11. HPC actually means “Homunculus Powered Computing”, and all servers are actually just mechanical turks full of leprechauns (via @froggleston)
  12. The biosemantics.org group of LUMC is doing psipred protein folding on 1kWh household radiators (128 cores each) https://vimeo.com/122893200 (via Eric Feliksik)
  13. The ‘p’ in p-value actually stands for p-otentially interesting! (via )
  14. Velvet is so named because @dzerbino wore velvet gloves when coding it (via @pathogenomenick)
  15. The @PacBio machines are so large because inside there’s an Illumina machine + a bioinformatician running assemblies (via )
  16. The Cloud is actually just a cloud. That’s it. A real cloud (via @froggleston)
  17. If you plug in a @nanopore MinION and hit left,right,up,up,A,B,down, it’ll transform into a lifesize statue of @Clive_G_Brown (via @froggleston)
  18. DDBJ have their data centre in a volcano, and are basically a front for Osato Chemicals (SPECTRE) (via @SCEdmunds)
  19. Hidden Markov Models were initially developed to find Waldo shawnhymel.com/portfolio/413/ (via )
  20. 99.5% of people who cite Altschul et al have never read the paper
  21. Bioinformatics Applications Notes have to be automatically generated by the software they describe
  22. BGI exclusively publish in Nature journals because their papers are first rejected by Gigascience
  23. BGI actually only have one HiSeq, but it is made to look like hundreds by a set-up of mirrors, like that bit in Enter the Dragon (via @froggleston)
  24. the consumption rate of coffee (+ beer 🍻) among Bioinformaticians from around the world is increasing every year. TRUE FACT! (via @NazeefaFatima)
  25. “EBI” actually stands for “European bureau of investigation”. It’s a front of the EU secret service, collecting genomic info (via @klmr)
  26. There are only 3 facts in 101 (via @mcaccamo)
  27. If all you have is a hmmer everything looks like it can be resolved with Viterbi (via @mcaccamo)
  28. Hidden Markov Models are like the recipe for Kentucky Fried Chicken.  There are only three people in the world who understand small parts of how HMMs work, and only when they get together do they know the full picture
  29. The “e” in e-value stands for “excellent”, as in “that’s an excellent BLAST hit”
  30. The Burrows-Wheeler transform, used in BWA and Bowtie, saves memory by transforming the DNA sequence data into a parallel dimension, meaning it ceases to exist in 4D space/time in this Universe
  31. Base qualities are called “Phred” scores in honour of Fred Sanger who developed DNA sequencing. #101bioinfofunfacts (via @tostenseemann)
  32. In a recent public survey of the 100 most desirable jobs, bioinformatician was a close second to astronaut (via @dynomics)
  33. Heng Li writes all his code in x86 assembly language, and uses a C decompiler before releasing it. @lh3lh3 (via @torstenseemann)
  34. The EBI secretly funds the Perl Foundation to ensure its legacy internal software infrastructure won’t collapse (via @torstenseemann)
  35. Illumina reads are short as before the development of Basespace they were delivered via Twitter (via @RoyChaudhuri)
  36. Pet Bioinformaticians are paid with cuddling #101bioinfofunfacts (via )
  37. Python was conceived in the 1980s by @gvanrossum & named after his favourite British comedy, Monty Python’s Flying Circus (via )
  38. the word “ELVIS” appears 35 times in human peps (GRCh38). “ELVISLIVES” appears 0 times. The king has left the genome #slowday (via @rdemes)
  39. Tuxedo suit is so named because only the ‘privileged’ know how to use it! #bioinformaticsfun (via )
  40. It’s easy! You only have to download this database in which all the genes have only one ID and you can retrieve the IDs in the most important databases (via @jorjial)
  41. If you stand in front of a mirror and say ‘HiSeq’ 3 times, an Illumina staff member will show up holding the HiSeq X Ten system (via @nazeetafatima)
  42. @BenLangmead wrote Bowtie while wearing a tuxedo but he did all the testing in zip-up onesie batman pajamas (via @coletrapnell)
  43. Spike-ins are like gold (via @nomad421)
  44. Do you need more hard disk space to store and do the analysis? Sure! Let’s buy 10 hard disks of 3 TB in the supermarket (via @jorjial)
  45. This could be the basis for 10.1 papers in PLOS Comp. Biol. (via @kbradnam)
  46. All bioinformatics problems can be solved through the medium of twitter, snide and ranting 😉 (via @guyleonard)
  47. Installing TopHat with option –reverse will install HotTap, a program that spews vapid results on a random science hot topic (via @CamLBerthelot)
  48. SOLiD sequencers generated colour-space sequence using an algorithm based on the once popular “Simon Says” hand held game (via @iandcalling)
  49. CriMap was called CriMap because users do an awful lot of crying before they get a half decent map (via @dj_de_koning)
  50. A single anonymous donor, RP11, accounts for 72 percent of the human reference genome (via CanGenom)
  51. If you amass the de-bugging tears of a bioinformatician it is enough to fill an Olympic size swimming pool annually (via @paulhoskisson)
  52. FASTA 80 character line wrapping was invented to standardise data sharing using MS Word (via @IanGoodhead)
  53. nine out of ten Bioinformaticians prefer Excel (via
  54. if you’ve never shown the NIH sequencing costs plot in talk/lecture you’re not a real bioinformatician pic.twitter.com/jQzG7MGosd (via @AliciaOshlack)
  55. Illumina is short for Illuminati, the shadowy organisation that controls sequencing worldwide. (via @neilfws)
  56. Every time you run a closed source bioinformatics tool, a PhD student’s soul is sacrificed to the Blood God. (via @froggleston)
  57. The number of replicates needed for your RNA-seq experiment equals the impact factor of the journal you want to publish in (via @torstenseemann)
  58. NCBI’s bacterial annotation takes 6 weeks because it’s done manually by work experience students pasting ORFs into web BLAST (via @torstenseemann)
  59. The majority of bioinformaticians can’t pronounce “de Bruijn” properly (see also thegenomefactory.blogspot.sg/2013/08/how-to… @torstenseemann) (via @rvaerle)
  60. Oxford Nanopore plans to introduce a new FASTQ encoding scheme using an ASCII offset of 48 with optional emoji (via @torstenseemann)
  61. The HMMer package was so named when someone asked how it worked, and the developers said “Hmmmm… errr….” (via @mgollery)
  62. 63% of Bioinformaticists were Biologists to start with, but they realized that the cold room is really COLD! (via @mgollery)
  63. It has been calculated that there are twice as many data formats as there are Bioinformaticians (via @mgollery)

24 Comments

  1. The biosemantics.org group of LUMC is doing psipred protein folding on 1kWh household radiators (128 cores each)

  2. It’s easy! You only have to download this database in which all the genes have only one ID and you can retrieve the IDs in the most important databases

  3. No differential methylation method has ever yielded a significant p-value using real-world data. This has led some to suggest that epigenetics is an Illuminati-led global conspiracy.

  4. 3-letter amino acid code used to make Margaret Dayhoff cry in the lab, so she created the 1-letter code.

  5. #20 might actually be literally true in my experience. But in the spirit of things:

    “Make sure you provide a URL in your paper for code or data that will soon expire when you leave grad school or a postdoctoral position. Readers find it boring and unchallenging when they can just follow a link. Make it a treasure hunt for the readers to find what institution you are associated with these days and whether they can find the code or data there.”

  6. The HMMer package was so named when someone asked how it worked, and the developers said “Hmmmm… errr….”

  7. 63% of Bioinformaticists were Biologists to start with, but they realized that the cold room is really COLD!

  8. It has been calculated that there are twice as many data formats as there are Bioinformaticians.

  9. GI numbers, present in all NCBI sequence and structure records, will soon be removed from GenBank records, but I assume they will always be present in the ASN.1 files on which everything at the NCBI is based. GI is an acronym for GenInfo.

  10. The “Pub” in PubMed comes from “Publishers”, which were great contributors to making MEDLINE public. With Entrez, initially it was only the subset of MEDLINE that was cited by DNA and protein sequences (about 200,000 MEDLINE entries), then it was direct citations and related papers (aka “BigMed”), and then PubMed was all of MEDLINE plus other papers that publishers also wanted to include. That was PubMed, announced at the NLM in 1997: https://www.youtube.com/watch?v=lYIfNEOeC-E (I was in the room, see https://twitter.com/bffo/status/557380865823866881)

  11. actually, you can delete that comment (and this comment); late night ramblings of a tired mind 🙂

  12. Leighton Pritchard

    18th June 2015 at 9:49 pm

    You know number 37 is true, right?

  13. Dietrich Lueerssen

    18th June 2015 at 11:06 pm

    In decimal, Bioinformatics 101 is actually Bioinformatics 5 and thus an advanced course.

  14. Legend has it if you walk into a Bar in Dublin and mention the secret words “Clustal Omega”, Des Higgins appears from behind the bar (in a tutu) to serve you a Guinness.

  15. Some people know that the Newick format was named after the restaurant where it was invented, but few people know that BED format was named after the bed where John Lennon and Yoko Ono first came up with the idea of a compact genome annotation file format during one of their love-ins or that SAMtools was named after the serial killer “Son of Sam”, who invented short-read sequencing

  17. The most common question for a Bioinformatician: “You’re the IT guy, right? Can you fix my laptop?”

  18. Hipster Bioinformaticians focus solely on obscure metagenomes, and sequencing is done with old ABI 3700’s. Analysis is done on clusters of thousands of Raspberry Pi computers.

  19. Bert Overduin

    2nd July 2015 at 1:53 pm

    EBI actually stands for Ewan Birney Institute.

© 2017 Opiniomics
