In case you haven’t read it already, some colleagues of mine, whom I know mostly through GOBLET, have written a paper titled “Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies”.  You should go read it; it’s a nice paper.

I actually think this is a decent stab at defining core competencies for a profession which really struggles to define itself – how on Earth do you define the skill set needed when it’s impossible to define the role itself?  Sure, the paper itself has its idiosyncrasies (Oracle, PostgreSQL, and MySQL are defined as “database management languages”); and by surveying bioinformatics core facility directors they are limiting responses to a certain type of bioinformatician;  but overall, I think it’s easy to tell that the authors have actually sat down and thought long and hard about the content, and there’s a refreshing honesty to their approach to a difficult problem.

Readers of this blog will know that I have written about related issues at great length: see “So you want to be a computational biologist?”, “A guide for the lonely bioinformatician”, “Bioinformatics is not something you are taught, it’s a way of life”, and “The alternative ‘what it takes to be a bioinformatician’”.

However, even before that, before the blog was even started, I presented at Eagle Genomics’ annual symposium, titled “Provisioning Bioinformatics For The Next Decade – Are We Prepared?”.  My slides are here, and of course in order to answer the question, one has to define what bioinformaticians actually do, which I start on slide 18.  I’ll expand on this later.

First, back to Welch et al.  In their competencies paper, they define three roles: bioinformatics user, bioinformatics scientist and bioinformatics engineer:

[Table 2 from Welch et al. (journal.pcbi.1003496.t002)]

What immediately struck me is that all of the skills for the first role, the role of “bioinformatics user”, are skills that any biological scientist will need; on reflection, this makes sense.  Welch et al are trying to define those competencies required by anyone who needs bioinformatics training.  What else is a “bioinformatics user” but a scientist (alternatively, a “bench” or “wet lab” scientist)?

Therefore, I think what Welch et al are saying is that there are two types of bioinformatician (if we ignore “bioinformatics user”, who is essentially any kind of researcher in biology) – that of “bioinformatics scientist”, someone who takes existing tools/databases and produces pipelines to answer specific biological questions; and that of “bioinformatics engineer”, someone who develops the tools/databases themselves.

I have different ideas.  When I spoke about this back in 2011, I defined 4 roles for bioinformaticians:

  1. The software developer
  2. The statistician
  3. The data miner/analyst
  4. The database developer

Crucially, it’s important to note that these are roles that exist outside of the domain of biology too, and therefore to be a bioinformatician, one has to carry out one of the four roles and possess and use a knowledge of biology.  I’ve emphasized the “use” deliberately – just because you have a degree in biology, it doesn’t make you a bioinformatician.  Do you actually use your knowledge of biology?  If not then you may not actually be a bioinformatician.

At face value, my “software developer” and “database developer” roles are “bioinformatics engineers”, and my “statistician” and “data miner/analyst” roles are “bioinformatics scientists”.  I’ve split these out specifically because they require quite different skills – a software developer will have different skills to someone who models and creates databases; and a statistician requires in-depth knowledge of statistics that perhaps a data miner does not.  The guys who write Galaxy don’t need to know about statistics; but the guys who wrote edgeR did.  The roles are related and overlap, but they are definitely different roles.  Therefore I’d prefer to stick with my four roles over Welch et al’s two.

For those of you bursting at the gut to say “what about modelling?” or “what about systems biology?”: (i) modelling is just a subset of statistics, and (ii) systems biology isn’t a separate science – a wise colleague of mine once said “we all do systems biology – who doesn’t?”

The “bio” bit; the important bit

As I mentioned above, roles 1-4 also exist outside of the domain of biology; for example, all of those roles exist within finance, within the social sciences, and so on.  So what makes a bioinformatician different?  Clearly it is the knowledge of biology, but I consistently question and challenge bioinformaticians on how much of their biology they actually use.

For example, software developers can enter into a contract to develop a computer system to be used by a bank; it doesn’t make them bankers. The software developers creating a system for managing NHS data in the UK don’t automatically become doctors or nurses.  They’re just software developers, working towards a specification created by domain experts.

So I challenge bioinformaticians again – do you use your biological knowledge?  Or are you working to the specification of a “domain expert”?  Does someone else define the what and you simply define the how?

I may appear as if I’m being mean, but actually biological knowledge, and knowing how to apply it, is the most important “competency” (aka skill) that a bioinformatician can possess.  In a field full of techies, the thing that will make you stand out is your biological knowledge, not your impressive array of awk one-liners.

When we were writing “So you want to be a computational biologist?”, Nick turned to me and said “I’m just not sure what we’re actually trying to say”.  This is a good question to ask oneself!  I guess with many of these posts, what I’m trying to do is to get bioinformaticians to be scientists.  To be researchers.  To hypothesize, to test, to use their biological knowledge to pose questions and their bioinformatics skills to answer them.  We’ve all seen the rise of the pet bioinformatician, and I guess what I’m trying to say is that you don’t have to be the dude that analyses someone else’s data, you don’t have to be the dude that writes the pipeline that enables every single one of your PI’s papers yet you remain middle author, you don’t have to be the only dude in the room capable of dealing with the data yet made to feel like a second-class citizen.  What I’m trying to say is that your core competency, the only one you will ever actually need, is the skill of being a scientist.  Develop that skill and you won’t ever look back.

Note: I specifically define “dude” as encompassing all genders.