Clearly I’m going to have to clarify this title – because of course you can teach bioinformatics and you can teach it well – but I want to make it clear that being taught bioinformatics is not the only way that you should learn bioinformatics, and it shouldn’t even be the major way.

The best bioinformaticians I know are problem solvers – they start the day not knowing something, and they enjoy finding out (themselves) how to do it. It’s a great skill to have, but for most, it’s not even a skill – it’s a passion, it’s a way of life, it’s a thrill. It’s what these people would do at the weekend (if their families let them). In many ways, this post is a response to Nick Loman’s tweet and to his subsequent blog post, which details some of the responses.

Like many bioinformaticians, I train people in bioinformatics. The training usually takes the form of 2-5 day hands-on courses, which may be specific to a particular domain (e.g. RNA-Seq, Metagenomics) or may try to cover everything (e.g. next generation sequencing data analysis). In my experience these types of courses have a huge attrition rate; by that I mean that 6 months later, very few people are actually using any of the skills they were taught.

I suspect what we probably end up doing is accidentally selling a lot of CLC licenses.

One of the reasons is that we’re not teaching the right thing – we should teach “problem solving” not “this is how to use TopHat from the command line”. So why don’t we? I don’t know. Maybe because we are told, frequently, that people want to learn how to use command line tools; maybe because a course teaching “problem solving” wouldn’t get many applicants. I don’t have all of the answers, but we can revisit this point later.

A second reason for low success rates of short bioinformatics courses is, unfortunately, an attitude that after spending a week learning, the students will know everything they need to know and will be able to apply it to their own data. This is not true – the short course is not the end of the training, it is the start. It’s merely an introduction, and the students need to continue learning after the course ends. This rarely happens, in my experience.  But why?  I rack my brains about this, frequently.

We have to move away from the idea that a lot of people have, which is that you can turn up to a training course and learn bioinformatics in a week. You can’t. It takes more than that, much more, and that’s what this post is about. It’s about having a “can do” attitude. It’s about saying “I have no idea how to do this, but I’m going to find out”.

If I sound a bit frustrated by this, it’s because I am.  Places on many of these short courses cost money and time, they’re in high demand, and it’s incredibly frustrating to see them wasted because students aren’t willing to go the extra mile.

I tried to tackle this in a recent course I was involved with in The Netherlands. My first presentation was a 30-minute “motivational” talk about what is required to learn bioinformatics. This is roughly what I recommended:

  • Buy a high spec computer and install Linux on it – BioLinux, Ubuntu, choose one and install it
  • Speak to your institute’s sys admin and get access to their Linux servers and clusters
  • Start using Linux regularly, and become familiar with the command line. Practise, lots
  • Install some bioinformatics tools – from source, using a package manager etc
  • Download some data (SRA has plenty) and play with it – assembly, alignment, SNP calling, exomes, whatever
  • Try and do this as much as possible by yourself i.e. by reading online resources rather than specifically asking someone
  • Only ask for help when you get really stuck
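
To give a flavour of what “playing with data” can look like on day one, here is a minimal sketch in Python (my choice of language for illustration; nothing above depends on it). It reads FASTA-formatted text and reports the length and GC content of each record. The function names `read_fasta` and `gc_content` and the toy sequences are made up for this example; with real data you would open a file you downloaded instead of the built-in string.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header: emit the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line.upper())
    # Emit the final record.
    if name is not None:
        yield name, "".join(chunks)


def gc_content(sequence):
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Made-up toy records, standing in for a real downloaded file.
EXAMPLE = """>seq1 toy record
ACGTACGT
GGCC
>seq2
ATATAT
"""

if __name__ == "__main__":
    for name, seq in read_fasta(StringIO(EXAMPLE)):
        print(name, len(seq), round(gc_content(seq), 2))
```

Writing something like this yourself, rather than reaching straight for an existing library, is exactly the kind of small problem that teaches you how the file formats work.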

Some of the above may have to be done in your spare time – evenings, weekends – and that’s not great, but it’s what it takes. This comes back to Nick’s tweet above: how badly do you want to learn bioinformatics? How hard are you willing to work? The rewards are huge, but they don’t come easily. Sure, you want to learn, but how badly do you want it?

I’m not trying to put anyone off, but many bioinformatics trainers will have been working in bioinformatics for many, many years – you can’t expect to spend a week being trained by them and then know everything they do by the end of the course. You’re going to have to put in some extra time to get there.

Put another way – you may think it’s difficult to find good bioinformatics courses, but can anyone show me a one-week course where I can go and learn everything I need to know about molecular biology so that I can sequence genomes myself? Does such a course even exist? The point I’m making is that it would take me months of training and practice before I could accomplish even the simplest of lab-based tasks; bioinformatics is no different.

Problem Solving

OK, I’m going to make a statement:

The only thing you need (yes you; yes, even you!) to set up a fully functional Galaxy server, or a fully functional mirror of Ensembl, or to accomplish any number of other tasks, is a laptop, access to the internet and a credit card.

Are you sat there thinking: “yeah, pretty sure I could do that”?  Or are you sat there thinking “no way, that is far beyond my capabilities”?

Actually, Galaxy is pretty easy (in one of two ways), and Ensembl mirrors are not that hard either. The reason you need a credit card is that you might want to buy some server time from Amazon EC2 – instructions here (please note, you are not allowed to use the credit card to pay someone to do things for you – that would be cheating!).

The point is that everything you need to do those seemingly complex tasks is on the internet.  Everything.  There are tutorials and guides galore.  All you have to do is try.

To be perfectly honest, I find it very hard to believe that anyone who can do this will have a hard time mastering Linux and running a few bioinformatics commands. It just takes time to teach yourself. Anyone who has intelligence and confidence can learn a hell of a lot of bioinformatics straight from the internet, completely free of charge.

So why does it so rarely happen?  Is it that some scientists lack confidence in their computing abilities?  Is that what stops people from trying?

Just do it!

This isn’t just a rather cheesy Nike slogan, it’s a pretty good piece of advice – the internet is stuffed full of information on how to do a whole variety of bioinformatics tasks – get yourself a Linux PC and just do it! It’s the best way to learn. Then, when you get stuck, use Biostars or SEQanswers.

The future of bioinformatics training?

Imagine for a minute that you pay a few hundred pounds to attend a two-day training course on genome assembly. You get there, and are presented with a Linux PC, some Amazon vouchers and the simple instructions “Download a Salmonella genome from the SRA and assemble it”. That’s it. There’s nothing more, except a few bioinformatics experts ready to answer (some of) your questions.

Would you enjoy that?  Or would you prefer a course that takes you through genome assembly step-by-step?

The latter is what is generally available; however, I have a sneaking suspicion you’d learn far more if you attended a course like the former. The problem is, I don’t think it exists (yet).