bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

Bioinformatics is not something you are taught, it’s a way of life

Clearly I’m going to have to clarify this title – because of course you can teach bioinformatics and you can teach it well – but I want to make it clear that being taught bioinformatics is not the only way that you should learn bioinformatics, and it shouldn’t even be the major way.

The best bioinformaticians I know are problem solvers – they start the day not knowing something, and they enjoy finding out (themselves) how to do it. It’s a great skill to have, but for most, it’s not even a skill – it’s a passion, it’s a way of life, it’s a thrill. It’s what these people would do at the weekend (if their families let them).  In many ways, this post is in response to Nick Loman’s tweet:

And his subsequent blog post, which details some of the responses.

Like many bioinformaticians, I train people in bioinformatics. This usually takes the form of 2-5 day hands-on courses, which may be specific to a particular domain (e.g. RNA-Seq, Metagenomics) or may try to cover everything (e.g. next generation sequencing data analysis). In my experience these types of courses have a huge attrition rate; by that I mean that 6 months later, very few people are actually using any of the skills they were taught.

I suspect what we probably end up doing is accidentally selling a lot of CLC licenses.

One of the reasons is that we’re not teaching the right thing – we should teach “problem solving” not “this is how to use TopHat from the command line”. So why don’t we? I don’t know. Maybe because we are told, frequently, that people want to learn how to use command line tools; maybe because a course teaching “problem solving” wouldn’t get many applicants. I don’t have all of the answers, but we can revisit this point later.

A second reason for low success rates of short bioinformatics courses is, unfortunately, an attitude that after spending a week learning, the students will know everything they need to know and will be able to apply it to their own data. This is not true – the short course is not the end of the training, it is the start. It’s merely an introduction, and the students need to continue learning after the course ends. This rarely happens, in my experience.  But why?  I rack my brains about this, frequently.

We have to move away from the idea that a lot of people have, which is that you can turn up to a training course and learn bioinformatics in a week.  You can’t.  It takes more than that, much more, and that’s what this post is about.  It’s about having a “can do” attitude.  It’s about saying “I have no idea how to do this, but I’m going to find out”.

If I sound a bit frustrated by this, it’s because I am.  Places on many of these short courses cost money and time, they’re in high demand, and it’s incredibly frustrating to see them wasted because students aren’t willing to go the extra mile.

I tried to tackle this in a recent course I was involved with in The Netherlands. My first presentation was a 30 minute “motivational” talk about what is required to learn bioinformatics. This is roughly what I recommended:

  • Buy a high spec computer and install Linux on it – BioLinux, Ubuntu, choose one and install it
  • Speak to your institute’s sys admin and get access to their Linux servers and clusters
  • Start using Linux regularly, and become familiar with the command line. Practise, lots
  • Install some bioinformatics tools – from source, using a package manager etc
  • Download some data (SRA has plenty) and play with it – assembly, alignment, SNP calling, exomes, whatever
  • Try and do this as much as possible by yourself i.e. by reading online resources rather than specifically asking someone
  • Only ask for help when you get really stuck
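
To give a flavour of what “playing with data” can look like, here is a minimal sketch in Python that reads a FASTQ file (the format most SRA read data ends up in) and reports a few basic numbers. It assumes the simple four-lines-per-record layout; the function names and the toy reads are made up for illustration, and it is a starting point to tinker with rather than a replacement for proper QC tools.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) tuples from a FASTQ stream.

    Assumes the common four-lines-per-record layout (no wrapped sequences).
    """
    while True:
        header = handle.readline()
        if not header:            # end of file
            return
        if not header.strip():    # tolerate stray blank lines
            continue
        seq = handle.readline().strip()
        handle.readline()         # the '+' separator line, ignored
        qual = handle.readline().strip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].strip(), seq, qual


def summarise(handle: TextIO) -> Dict[str, float]:
    """Count reads and bases, and compute mean read length and GC fraction."""
    n_reads = 0
    n_bases = 0
    n_gc = 0
    for _name, seq, _qual in read_fastq(handle):
        n_reads += 1
        n_bases += len(seq)
        upper = seq.upper()
        n_gc += upper.count("G") + upper.count("C")
    return {
        "reads": n_reads,
        "bases": n_bases,
        "mean_length": n_bases / n_reads if n_reads else 0.0,
        "gc_fraction": n_gc / n_bases if n_bases else 0.0,
    }


# Toy, made-up reads so the sketch runs without downloading anything.
toy = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nGGCC\n+\nIIII\n"
)
stats = summarise(toy)
print(stats)

# For a real (hypothetical) gzipped file you might instead do:
#   import gzip
#   with gzip.open("my_reads.fastq.gz", "rt") as fh:
#       print(summarise(fh))
```

Once something like this works, the obvious next exercise is to break it: feed it a real file, find out where the assumptions fail, and go and read about why.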

Some of the above may have to be done in your spare time – evenings, weekends – and that’s not great, but it’s what it takes. This comes back to Nick’s tweet above: how badly do you want to learn bioinformatics? How hard are you willing to work? The rewards are huge, but they don’t come easily. Sure, you want to learn, but how badly do you want it?

I’m not trying to put anyone off, but many bioinformatics trainers will have been working in bioinformatics for many, many years – you can’t expect to spend a week being trained by them and then know everything they do by the end of the course. You’re going to have to put in some extra time to get there.

Put another way – you may think it’s difficult to find good bioinformatics courses, but can anyone show me a one-week course where I can go and learn everything I need to know about molecular biology so that I can sequence genomes myself? Does such a course even exist?  The point I’m making is that it would take me months of training and practice before I could even accomplish the simplest of lab-based tasks; bioinformatics is no different.

Problem Solving

OK, I’m going to make a statement:

The only thing you need (yes you; yes, even you!) to set up a fully functional Galaxy server, or a fully functional mirror of Ensembl, or to accomplish any number of other tasks, is a laptop, access to the internet and a credit card.

Are you sat there thinking: “yeah, pretty sure I could do that”?  Or are you sat there thinking “no way, that is far beyond my capabilities”?

Actually, Galaxy is pretty easy (in one of two ways), and Ensembl mirrors are not that hard either.  The reason you need a credit card is that you might want to buy some server time from Amazon EC2 – instructions here (please note, you are not allowed to use the credit card to pay someone to do things for you – that would be cheating!).

The point is that everything you need to do those seemingly complex tasks is on the internet.  Everything.  There are tutorials and guides galore.  All you have to do is try.

To be perfectly honest, I find it very hard to believe that anyone who can do this will have a hard time mastering Linux and running a few bioinformatics commands.  It just takes time to teach yourself.  Anyone who has intelligence and confidence can learn a hell of a lot of bioinformatics straight from the internet, completely free of charge.

So why does it so rarely happen?  Is it that some scientists lack confidence in their computing abilities?  Is that what stops people from trying?

Just do it!

This isn’t just a rather cheesy Nike slogan, it’s a pretty good piece of advice – the internet is stuffed full of information on how to do a whole variety of bioinformatics tasks – get yourself a Linux PC and just do it! It’s the best way to learn.  Then, when you get stuck, use Biostars or SeqAnswers.

The future of bioinformatics training?

Imagine for a minute that you pay a few hundred pounds to attend a two-day training course on genome assembly.  You get there, and are presented with a Linux PC, some Amazon vouchers and the simple instructions “Download a Salmonella genome from the SRA and assemble it”.  That’s it.  There’s nothing more, except a few bioinformatics experts ready to answer (some of) your questions.

Would you enjoy that?  Or would you prefer a course that takes you through genome assembly step-by-step?

The latter is what is generally available; however, I have a sneaking suspicion you’d learn far more if you attended a course like the former.  The problem is, I don’t think it exists (yet).


  1. Interesting post. I have many a caller knocking on my door and assuming that bioinformatics is a passive learning process. Typically asking if they can watch me align their sequence (like I do it by hand).
    Sadly I feel that the training offered as a postdoc is now very strict and doesn’t allow time to develop as an interested scientist wanting to ask questions and find out ways to solve them. I was lucky in my career to obtain an MRC training fellowship that funded a lowly zoologist to learn to be a bioinformatician. Most important in this funding was the chance to attend the infamous CSHL Programming for biology course with Lincoln Stein and Jim Tisdall. This intense course taught the basics of scripting but mostly just split people into groups to write software to solve a genuine problem we brought to the course. I came back and felt I had the tools to tinker, write some scruffy code and get things done.
    My code is still scruffy, but I love the idea that I can sit in my office and think up a way to solve a problem, and this should be encouraged. I like the idea of a training course where “teaching” is banned and “learning” is encouraged, let’s get a venue and do it!!

  2. I see it’s still going! http://meetings.cshl.edu/courses/2013/c-info13.shtml – I don’t think Lincoln teaches it any more though….

    I am genuinely tempted to try out a “Problem Solving” workshop – what do you think the ratio of learners to teachers should be?

  3. Very interesting idea Mick. Spoon-fed courses do a terrible job of preparing people to work independently. I think most non-bioinformaticians come away from these types of courses and fall at the first hurdle.

    Hope that you get the chance to try out a more free-form workshop.

  4. heavier on students needed. This should encourage fleshing out of a question, writing pseudocode or pseudopipeline which is then discussed, developed, “googled for a set of tools” and generated.

    Had planned to do something similar flanking the UK NGS meeting but it never got off the ground. Count me in if looking for volunteers.

  5. Actually, a very motivational post for people who are just starting in the bioinformatics world like me, thanks! 🙂

  6. I think I do about as much training as you, Mick, and I largely agree. I do think that the courses with more advanced grad students and postdocs end up being quite a bit about networking, and finding friendly people of whom to ask questions; your post emphasizes the “just ask!” without mentioning that, for many people, knowing how to start asking and having the self-confidence to ask are the hardest bits.

  7. I think some of the issue here is miscommunication. From my experience, when a wet-lab biologist tells me, “I want to learn bioinformatics,” they do not intend to put down the pipette and become a bioinformatician. They have generated some high-throughput data to help answer their research question, and they want to become proficient enough to analyze the data on the side as they continue their wet-lab experiments. The analysis will be very standard and is only intended to be one part of the data presented in their manuscript. Often, they have collaborators that are more computational, and they simply want to be able to effectively communicate with the collaborators and to be able to explore the data on their own as well.

    Thus I don’t think a “problem solving” workshop would really cater to this audience. They are spending their “weekend time” problem solving at the bench. On the other hand, if someone wanted to make the hard transition from bench to computer, I think your idea of a “problem solving” workshop would be great. The question is, how many scientists out there are making that abrupt change?

    And of course it is useful to look at this from the other angle. Say a computational grad student wants to do some PCR+Sanger sequencing to validate some of their results. With some guidance, they could probably start generating usable data in the 3-6 month time frame, and then return to their computer. They didn’t need to be taught all of molecular biology to pull this off (and a “problem solving in molecular biology” workshop would have likely been overkill).

  8. I think we can start with a statement, and maybe you will agree or not agree: “the next generation of biology graduates will be computer literate, capable of analysing their own data using appropriate tools, and of simple scripting tasks”

    If you agree with that, then the question is how we get there. Certainly in the UK, too many universities produce biology graduates who are not mathematically literate and who are not computer literate. This has to stop.

    Then the question is what we do with the young students and post docs out there now – do we just abandon them? Or do we try and make them maths and computer literate? And how best to do that?

  9. Confidence is an interesting concept – something neither you nor I lack – but how did I become confident with computers? It wasn’t because I was taught. ….

  10. Great post, and mirrors a lot of my own thoughts as a Post-Doc. I think another stumbling block is where the attitude that Bioinformatics can be passively learned in a short course comes from. In my experience many wet-lab biologists, particularly older “old-school” PIs (and often their trainees) have no real idea what bioinformaticians do or how broad their knowledge-base has to be in order to be effective. They have the idea that bioinformatics is easy, because the only stats they do for their data analysis traditionally is in Excel or using a point-and-click, no thought required package like Prism.

  11. Some People Want to Race Cars. Others Need a Vehicle to Get Their Groceries

    “BioMickWatson (BMW) is angry again !!

    Bioinformatics is not something you are taught, it’s a way of life

    That is what happens, when you take a terrific car-racer and ask him to train others to drive. Those, who come to the class, are not very interested in knowing how a fast car goes around the corner, or in figuring out how to cross every other racer in front of him. Many are simply happy to let the other person go ahead and move to the slowest lane. Some of them will probably be racers in future, but most will be happy to get their groceries and maybe weekend trips to the coast.”


  12. Hmmm, I can’t help think you missed the point….

    I hardly think e.g. going from FastQ to counts-per-gene to differential expression is racing a car – but in my experience, running TopHat from the command line and then edgeR within R is beyond a lot of people. This *is* a vehicle to get the groceries, and it is still beyond some people, despite being pretty simple.

    It’s a confidence thing, I suspect

  13. Well, it’s not helped by very high level Profs getting papers published with awful stats in them.

    I may be wrong here, but I saw a guy (well established, well known in his field) present % data in one of his presentations where the error bars clearly went above 100%… hmmm… is that allowed?!

  14. Thank you 🙂

  15. But why do they fall at the first hurdle? It’s not because they’re not intelligent enough – they are, without exception – so why?

  16. Well, that is why I brought up the WordPress example. When I was in college, creating webpages and keeping all hyperlinks in order was beyond the skills of many bright professors I encountered, but they all wanted to be online.

    Believe me, if there is enough demand, doing things from command line and then taking data to R portal will go away. R itself was created to replace C/python/PERL programming and had been quite successful.

    I am trying to create some tutorials to minimize the language differences and solve a problem from start to end in many languages. I have not gotten too far with writing, but will see whether that helps.


  17. To continue your metaphor, bioinformatics is very often like occasionally taking your car to the grocery store, except that you occasionally have to stop at the deli and the bakery, and occasionally there are detours depending on what you have in your fridge at home. If you need to go to the bakery and the deli, but you have to avoid King Street, and definitely take Main Street, but you can’t get stuck on McDonald way, then even a simple trip to get groceries does become a problem-solving activity.

    If your data does not resemble the contents of your bioinformatics course in most ways, you will have to experiment and play with the tools a little to get your results.

  18. I partially agree. 🙂

    On the one hand, I do not like how much undergraduate education in biology focuses on memorization (e.g. signalling cascades, organ systems, metabolic pathways, etc.). And it is not just biology! Even chemistry labs, a great time for teaching problem solving, are instead reduced to simply following a set of directions. The main impediment to improving this situation is the fact that by far the majority of bio majors (at least at a big public school like I attended) are pre-professional students. In the current situation, I would advise any prospective grad student in biology to major in one of CS/stats/math/engineering, minor in biology, and do research in a biology lab. Ideally, we would move to a system of science-education focused on problem solving instead of memorization (e.g. I think the Integrated Science program being spearheaded by David Botstein at Princeton is a great step in the right direction). At the very least, it would be nice if biology departments would offer their undergrads interested in grad school a separate track where they can substitute traditional pre-med courses like anatomy for CS/stats/math courses.

    On the other hand, there are still many biologists conducting meticulous research in biochemistry, cell biology, etc. that have not jumped on the “big data” (for lack of a better term) trend in biology. Thus, I think that proclaiming all future biologists must be able to write computer code to be relevant is premature.

    As for current grad students and postdocs, they are busy. They have to fit in learning to be computationally proficient on top of all the other demands on their time, which leads to the short courses offered that you are correct to point out will not make someone magically computationally proficient. But I would argue that they do teach enough to allow the participants to start experimenting themselves and to ask informed questions on relevant internet forums, and after just days to a week of instruction I think that is a success.

  19. I don’t think bioinformatics is particularly special in this regard. I’m just a student and I’ve seen all my friends studying various fields all follow the same learning process; “here’s some fundamentals, apply it to this complex problem, extra resources and tutors are here if you need them. Figure it out yourself.” Rinse and repeat for 3-5 years. It’s the same thing for writing, finance, carpentry, IT, and all shades of engineering and art.

    From what I remember of my undergraduate biology, there was relatively very little problem based learning like other fields. The learning process was usually like a spelling test; read > learn > cover > write > check. Most of the technical skills we learnt were simply a matter of following a recipe and the exciting part was then talking about the implications of the results on the wider philosophy of the field.

    Do biologists still think the same way with a general disregard of technical skill?

  20. Because typical courses skip over the first hurdle (installation). I remember installation being one of the most baffling things in my pre-bioinformatics days.

    Making it a pre-requisite that people try to install the software they are going to use on the course before attending and then making the first morning of the course be about installation might help more people to get up and running on their own. Give people an idea of the sensible things to try when an installation goes wrong (which it definitely will).

    Although, as some of the other comments say, there are so many potential use cases (lone ranger, hands-off bioinf support, hands-on bioinf support) that you are never going to please everyone. Doing a survey of whether people use the things they have learnt on a course and if not, why not, might be helpful. Especially considering the amount of effort put into preparing and presenting courses.

  21. Shameless plug: We have recently started a “NGS textbook” on wikibooks:

    Accompanying publication:

    The idea is to create a way for people to learn both the principles and the actual commands, in contrast to the scattered mini-how-to-run-bwa-etc

  22. Yea I think it (aka bioinformatics) has to become a way of life. Post NGS course with Titus Brown many of us have in fact really pursued doing something with our data, but there was some definite shock value in getting back home and confronting the terminal all by your little lonesome. I suspect some of the course attendees will be able to really go on with their projects and accomplish something while others will abandon ship. I do wonder about this myself. It is nice to have NGS experts (from the course) to now pester with questions and then get yelled at (not naming any names here). The internet is invaluable of course, but human connection is also a nice addition. I agree that, in general, you do have to be a self starter and a problem solver, and committed, because it’s a long long road. But it’s cool!

  23. Mick:

    You said “Then the question is what we do with the young students and post docs out there now – do we just abandon them? Or do we try and make them maths and computer literate? And how best to do that?”

    Personally, I think that overall curricula need to be updated: requirements in maths and basic computing. A course in general Linux/UNIX use, which would include scripting (incl. perl or preferably python), source code control (ideally using github) from the start (i.e. for first-year students). Classes going forward that require data analysis should use the tools and techniques from that first class.

  24. Nice post. I am not a bioinformatician at all, yet I am in the process of brushing up my bioinformatic literacy. Any suggestions for good resources and books are always welcome. Sometimes it is simply best to ask the experts.
    Regarding your comment that people need to get the basics right and need to problem solve when doing bioinformatics: I agree, yet I would like to extend this statement to being a good scientist. Science is problem solving and puzzling at its best. You need to understand your tools, be it reagents/pipettes or programming languages, to really excel at what you are up to. I think the main problem of learning bioinformatics is to become literate. This just takes time and effort, the same way as learning to read and write back in the day took time. Yet having good instructors and human interaction is key for a great learning process.

  25. I am currently learning/doing bioinformatics (mostly metagenomics) after joining a medicinal chemistry lab. I learned from a post-doc who had himself no prior experience and had attended a short bioinformatics course, which only taught him the bare basics. Early on, I realized that learning how to problem-solve and searching on my own was much better than constantly asking the post-doc for help. He set a very good groundwork, but I often find taking a fresh look at what’s been done allows me to think of alternatives, some of which are improvements.

    However, the only con I can see about learning on my own is reinventing the wheel and not being aware of what tools more experienced bioinformaticians are using. It’s kind of like the difference between “known unknowns” and “unknown unknowns,” which create a disadvantage.

  26. Even if after the course a student does not end up performing bioinformatic analyses themselves, the experience may not have been wasted effort. I took a car mechanics class in high school, and I hardly ever work on my (now nonexistent) car myself, but I feel much more confident speaking with a mechanic. I have the vocabulary to discuss my problem, and the ability to understand (some of) the details of the mechanic’s solution.

  27. I think the problem you’ve pointed out with bioinformatics training has more to do with the student’s perception of how computing can be applied to biological sciences. Teaching very explicit and specific tasks reinforces the perception of the computer as an appliance used to perform a defined set of functions rather than a rich environment for executing ideas.

    The way to go is to just throw the students in the water and have them figure it out through trial and error. Hopefully the student will make tons of mistakes and start to build a mental image of the environment through testing the bounds.

    However, this requires a lot of patience and inquisitiveness on the part of the student, which may be in short supply for people who just want to do a quick one-off analysis of their data.

  28. Nice post !! motivating.. I recently graduated my masters and now I’m working as RA. My master thesis, internship and current projects are related to epigenetics works such as histone modifications, DNA methylation and gene expression studies. I do only wet lab experiments (ChIP-seq, Bisulfite, so on). I am interested in learning Data analysis to play with my own data. The reasons why I want to learn NGS data analysis are 1) In foreseeable future, I want to work in the same field (epigenetics), inevitably, NGS is part of my work. 2) Nowadays, even job recruiters are looking for biologist with programming knowledge either in academic/industry (hmmm,based on my job hunt 😉 ).

    My doubts are :
    1. So, I think if i learn how to do data analysis (of course, with my full effort) I can stand out of the crowd.. right ?
    2. I’m not an illiterate at programming. I have some knowledge on using DOS, C, C++ and Foxpro. I have learned these during my schooling and undergrads. Now, I am trying to learn Linux Shell and later planning to learn R and Perl. Even though, I can understand how to execute Linux commands and its output, but my frustrating quest is how to use Shell programming to work/handle my NGS data.? Looking forward to your suggestion 🙂

  29. Reblogged this on Kurui's blog and commented:
    Totally agree with this. As much as you can take a bioinformatics course, figuring it out on your own sticks. That’s what you can remember. Learning through hard work.

  30. I couldn’t agree more with the author. Having gone from being a complete novice in bioinformatics to someone who has skills sufficient for a scientific publication, I personally think bioinformatics is a lifelong learning process and experience. Every day the scientific community is updated with new tools/programs and we have to update ourselves with these new changes. Learning bioinformatics is certainly not a one-day or one-week course; it involves lots of trial and error and problem solving, figuring out ways to overcome the problems presented and using available resources with lots of thinking to solve the puzzles, coupled with experiments/analysis to make sense of the data. To be good at it, the only way is to experience it yourself, getting hands-on and trying it with your own set of data, instead of following instructions step by step. You have to really get yourself moving with it rather than learning passively. You will forget in time if you do not practice. Practice makes perfect always holds true.

  31. Ok, I’m a microbiologist with absolutely no literacy in computer..I mean I deal with windows and use google that’s it!
    I’m interested in knowing what is bioinformatics and really want to learn it…I just don’t know exactly where to start. As you mentioned the web is loaded with free tutorials but I just don’t know which of them suits me.
    Shall I start learning Linux and programming as a start to defeat my computer illiteracy? and then step 2 is..?
    Is there a site that can help complete newbies in this field..by giving them examples and examples and problems to solve?
    Sorry if I’ll let you repeat yourself , I just can’t hide the fact that I’m still lost amid the hundreds and thousands of suggested links and books that I find while trying to learn

  32. Two thoughts, based on my experience teaching R to people who mostly need it for biostatistics:

    – A lot of nebulous “problem solving skills” can actually be broken down into very specific, teachable tools, and that these should be part of any short training course. These include navigating the help system, how to google a question well, how to ask a question on Stack Overflow, etc. It’s easy for experienced users to think of these as trivial, but a lot of them are non-intuitive.

    – One technique that I’ve observed and that I think is very helpful is to assign exercises that go just beyond the scope of material already taught, so that students get to a certain point and then have to use the above skills to finish the material.

  33. Nice article, but it seems the general idea applies to almost every branch of science. One has to learn a lot, ask people, learn through mistakes, be extremely pro-active, persistent, passionate. I am absolutely confident that it is all applicable to both wet and computer-based biology. Therefore, it seems to me that time/resource/project-management questions are also important. OK, I’m passionate, persistent and able to analyse and learn by myself. But if I (or my PI) manage badly, it’s gonna sink everything. So, pro-activity is an absolute must, but this seems quite self-evident, doesn’t it? If you are not pro-active, why do research at all? On the other hand, the management side seems to be not so self-evident. Generally, researchers I’ve met had problems lying in this direction (management, strategic thinking), even those who were exceptionally passionate and fast-minded. Those who are good at both are superstars, I suppose.

    So why not teach scientists strategic management?

  34. You refer to SRA in your article, what is it, can you give a link? Google gives tons of result on various “SRA”s.


© 2018 Opiniomics
