Inspired by this beautifully written piece over at the NY Genome Centre Blog, I thought I’d quickly write down the alternative version, according to yours truly 🙂
I mean, the NY Genome piece has some lovely soundbites:
- bioinformaticians are “rewriting biology”
- a postdoc in bioinformatics can expect to earn about 50 percent more than a postdoc in biology
But I can’t help thinking the whole piece is a little too “nice”, a little too “perfect”, and that it ignores some of the deficiencies of the real world.
So, here is the alternative version of what it takes to be a bioinformatician:
1. Patience. You won’t be spending the majority of your time running beautifully crafted machine learning algorithms to find that perfect, but hidden, signal that reflects the true biology. No. I’d say that’s about 1% of your job. The vast majority of your time, upwards of 90%, will be spent getting data into the correct format, dealing with the fact that no two databases use the same identifiers or the same format, trying to figure out why your cluster jobs didn’t run, and removing errors and systematic bias from your data. This is the true art of bioinformatics. Try to get this done quickly and efficiently, so you can spend more time on the biology.
2. Suspicion. If it looks too good to be true, it probably is. A large majority of your “Eureka!” moments will just be errors and systematic bias. Whenever you find an answer, treat it with huge suspicion until you are absolutely sure it’s not an error. Don’t trust quality values, of any kind.
3. Biological knowledge. Your job does not finish when the alignment jobs do. Nor does it finish when GATK does. If your day-to-day job is simply running algorithms, the results of which you then give to a biologist to interpret, then you are not a bioinformatician, you are just an informatician. No one can expect you to know everything about every problem, but you should have enough biological knowledge to be able to add some interpretation to the data. If not, see (4). You need the biological knowledge to figure out the errors.
4. Social skills. Whatever it is you’re working on, go talk to a biologist who knows a lot about it. In fact, talk to lots of them. They won’t need to be encouraged to talk about their science; in fact, you’ll probably have to put a time limit on the conversation. Learn about the biology. Learn about the system. Not only will it help you interpret the data, it will also help you realise which results are errors, which are bias and which are real.
5. Big cojones. I’m sorry for swearing, even if it is in Spanish, but I’m trying to make a point. You’ve just been given hundreds of millions, if not billions, of data points. You need to find the answer, the story, within that data. It’s in there, somewhere. Finding it will not be easy. Do you have what it takes to confidently disregard what you suspect are errors, and engage in a dialogue with biologists about what you think the data is telling you? Or do you just send off an Excel sheet with all the data in it and expect someone else to do it?
6. The mind of a super sleuth. You are basically a detective, and all the clues to the murder are in your data. The murderer is not going to make it easy and hand themselves in with a full confession. Work the clues. Work the data. Figure it out. Be a detective.
7. Delivery. This is related to (5). Deliver an end product. Often this will be a paper. If I had to divide up all scientists (not just bioinformaticians) into two groups, it would be (i) those who can write papers; and (ii) those who can’t. As a bioinformatician, you can write papers. Not always, and not with every project, but with some projects you can. I’m not talking about writing the “bioinformatics method” section and providing a few figures, I am talking about designing and executing an in silico experiment, interpreting the results and writing the paper. Or creating some software, releasing it, supporting it and writing the paper. The guys who do this are the guys who get promotions, and the guys who get that extra 50% Purvesh Khatri is talking about in the NY Genome piece. And yes, even if you’re in a “support role” – I was running a bioinformatics support group when I wrote my first bioinformatics papers.
8. The ability to code. Perl, Python, Ruby, R, whatever. Some kind of coding ability is essential. Using GUIs and web-tools will only get you so far. If you need to do 1000 things, do you sit and open 1000 browser tabs and laboriously start every job? Or do you write a few lines of code and submit 1000 jobs to the cluster? The latter is what a bioinformatician does; the former is what an **** does (insert your own word here).
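
To make that last point concrete, here is a minimal sketch of what “a few lines of code” for 1000 jobs might look like in Python. It is only an illustration under some assumptions: the cluster is assumed to run Slurm (hence `sbatch` with its `--job-name` and `--wrap` options), and `my_aligner` and the `data/sample_NNNN.fastq` paths are made-up placeholders, not real tools or files. By default it only prints the commands (a dry run), so you can check them before anything reaches the scheduler.

```python
import shlex
import subprocess
from pathlib import Path


def build_submit_command(sample, tool="my_aligner"):
    """Build one submission command for one input file.

    Assumption: a Slurm cluster, so we use `sbatch --wrap`.
    `my_aligner` is a hypothetical tool name used for illustration.
    """
    # Quote the path so odd characters in file names cannot break the shell command.
    job = f"{tool} {shlex.quote(sample)}"
    # Name each job after its input file so it is easy to find in the queue.
    return ["sbatch", f"--job-name={Path(sample).stem}", f"--wrap={job}"]


def submit_all(samples, dry_run=True):
    """Submit one job per sample; with dry_run=True, only print the commands."""
    commands = [build_submit_command(s) for s in samples]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises if the scheduler rejects a submission.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # 1000 hypothetical input files: data/sample_0001.fastq ... data/sample_1000.fastq
    samples = [f"data/sample_{i:04d}.fastq" for i in range(1, 1001)]
    submit_all(samples, dry_run=True)
```

Switching to `dry_run=False` would hand the same commands to the scheduler. On a real cluster a job array is often kinder to the scheduler than 1000 separate submissions, but the principle is the same: a loop, not 1000 browser tabs.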
The idea that you can get ahead more quickly, and get paid more, because you have skills that are in demand is true – but this will only happen to the best, and to the lucky. Ultimately, if you are an academic, then (7) is the most important and that is what you will be judged on. You need first and last author papers if you want to get promoted, and if you want to be a PI. Being able to produce those is tough, very tough – harder even than installing QIIME. And to produce the papers, you’ll need 1-8 above.
Good luck 🙂
UPDATE (19/03/2013): Sorry, this post isn’t meant to be a criticism of people who perhaps feel they don’t have some of the above, and I am sorry if you feel I am trying to tell you you’re not a bioinformatician. There is nothing wrong with being an informatician. There is nothing wrong with being the support guy who doesn’t publish papers. This post was in response to the NY Genome post which paints a beautiful, romanticised version of what it means to be a bioinformatician. However, only a few of us will ever realise that vision – steps 1-8 above are what you’ll need.