bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

The only core competency you’re ever going to need

In case you haven’t read it already, some colleagues of mine, who I know mostly through GOBLET, have written a paper titled “Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies”.  You should go read it; it’s a nice paper.

I actually think this is a decent stab at defining core competencies for a profession which really struggles to define itself – how on Earth do you define the skill set needed when it’s impossible to define the role itself?  Sure, the paper itself has its idiosyncrasies (Oracle, PostgreSQL, and MySQL are defined as “database management languages”); and by surveying bioinformatics core facility directors they are limiting responses to a certain type of bioinformatician;  but overall, I think it’s easy to tell that the authors have actually sat down and thought long and hard about the content, and there’s a refreshing honesty to their approach to a difficult problem.

Readers of this blog will know that I have written about related issues at great length: see “So you want to be a computational biologist?”, “A guide for the lonely bioinformatician”, “Bioinformatics is not something you are taught, it’s a way of life”, and “The alternative ‘what it takes to be a bioinformatician’”.

However, even before that, before the blog was even started, I presented at Eagle Genomics’ annual symposium titled “Provisioning Bioinformatics For The Next Decade – Are We Prepared?”.  My slides are here, and of course in order to answer the question, one has to define what bioinformaticians actually do, which I start on slide 18.  I’ll expand on this later.

First, back to Welch et al.  In their competencies paper, they define three roles: bioinformatics user, bioinformatics scientist and bioinformatics engineer:

[Table from Welch et al.: journal.pcbi.1003496.t002]

What immediately struck me is that all of the skills for the first role, the role of “bioinformatics user”, are skills that any biological scientist will need; on reflection, this makes sense.  Welch et al are trying to define those competencies required by anyone who needs bioinformatics training.  What else is a “bioinformatics user” but a scientist? (alternatively, a “bench” or “wet lab” scientist)

Therefore, I think what Welch et al are saying is that there are two types of bioinformatician (if we ignore “bioinformatics user”, who is essentially any kind of researcher in biology) – that of “bioinformatics scientist”, someone who takes existing tools/databases and produces pipelines to answer specific biological questions; and that of “bioinformatics engineer”, someone who develops the tools/databases themselves.

I have different ideas.  When I spoke about this back in 2011, I defined 4 roles for bioinformaticians:

  1. The software developer
  2. The statistician
  3. The data miner/analyst
  4. The database developer

Crucially, it’s important to note that these are roles that exist outside of the domain of biology too, and therefore to be a bioinformatician, one has to carry out one of the four roles and possess and use a knowledge of biology.  I’ve emphasized the “use” deliberately – just because you have a degree in biology, it doesn’t make you a bioinformatician.  Do you actually use your knowledge of biology?  If not then you may not actually be a bioinformatician.

At face value, my “software developer” and “database developer” roles are “bioinformatics engineers”, and my “statistician” and “data miner/analyst” roles are “bioinformatics scientists”.  I’ve split these out specifically because they require quite different skills – a software developer will have different skills to someone who models and creates databases; and a statistician requires in-depth knowledge of statistics that perhaps a data miner does not.  The guys who write Galaxy don’t need to know about statistics;  but the guys who wrote edgeR do.  The roles are related and overlap, but they are definitely different roles.  Therefore I’d prefer to stick with my four roles over Welch et al’s two.

For those of you bursting at the gut to say “what about modelling?” or “what about systems biology?”, then (i) modelling is just a subset of statistics, and (ii) systems biology isn’t a separate science – a wise colleague of mine once said “we all do systems biology – who doesn’t?”

The “bio” bit; the important bit

As I mentioned above, roles 1-4 also exist outside of the domain of biology; for example, all of those roles exist within finance, within the social sciences, etc.  So what makes a bioinformatician different?  Clearly it is the knowledge of biology, but I consistently question and challenge bioinformaticians on how much of their biology they actually use.

For example, software developers can enter into a contract to develop a computer system to be used by a bank; it doesn’t make them bankers. The software developers creating a system for managing NHS data in the UK don’t automatically become doctors or nurses.  They’re just software developers, working towards a specification created by domain experts.

So I challenge bioinformaticians again – do you use your biological knowledge?  Or are you working to the specification of a “domain expert”?  Does someone else define the what and you simply define the how?

I may appear as if I’m being mean, but actually biological knowledge, and knowing how to apply it, is the most important “competency” (aka skill) that a bioinformatician can possess.  In a field full of techies, the thing that will make you stand out is your biological knowledge, not your impressive array of awk one-liners.

When we were writing “So you want to be a computational biologist?”, Nick turned to me and said “I’m just not sure what we’re actually trying to say”.  This is a good question to ask oneself!  I guess with many of these posts, what I’m trying to do is to get bioinformaticians to be scientists.  To be researchers.  To hypothesize, to test, to use their biological knowledge to pose questions and their bioinformatics skills to answer them.  We’ve all seen the rise of the pet bioinformatician, and I guess what I’m trying to say is that you don’t have to be the dude that analyses someone else’s data, you don’t have to be the dude that writes the pipeline that enables every single one of your PI’s papers yet you remain middle author, you don’t have to be the only dude in the room capable of dealing with the data yet made to feel like a second-class citizen.  What I’m trying to say is that your core competency, the only one you will ever actually need, is the skill of being a scientist.  Develop that skill and you won’t ever look back.

Note: I specifically define “dude” as encompassing all genders


  1. I still think the logical partitioning I made here is the useful one.
    Computer scientists are scientists too, but in a different way. They observe patterns and algorithms, but have less knowledge of biological systems. When someone like Heng Li develops sophisticated algorithms, you cannot ask those guys to learn biology before accepting their ideas. Academics did that to Gene Myers during the Human Genome Project and learned a lesson.
    > within social sciences
    “social science” is not science. Give me a break.

  2. I totally agree with science being the core competency. I have been a biologist who somehow learnt to program. Now I use tools, “engineer” tools and write code, develop databases, decide on algorithms etc, basically all 4 tasks that you have mentioned. The main factor is that I get to decide the problems I will work on (which takes up the major chunk of my time) besides helping others (a minor chunk). I hope the formally trained bioinformaticians learn from your blog. Good contributions!

  3. I agree with layering the level of expertise in bioinformatics, but I’m curious where I stand, because I do what’s there in all 5 stages but still prefer Perl/R scripting for stages 4 and 5. So is it about the tools (language?) or the conceptual method?

  4. Great post Mick – and the Slideshare content is enlightening about the ‘day work’ you do. And a good challenge for those who style themselves as something they are clearly not – software or database experts who simply pick up or parrot others’ ideas around biology, rather than combining the two. The illustration about software and other folks working in banking is apt.

    Of course Welch et al. were careful in their phrasing of the title – ‘Toward a definition…’ rather than ‘The end-all definition…’, and I find your subdivisions useful.

  5. Totally agree Mick. I did my undergrad in Biology-Chemistry, MSc in Chemistry (computer aided drug design), and PhD in Chemistry (analysis of DNA Microarrays), and am currently a PostDoc. At every stage of my career I have to make use of the basic biology (and some advanced biology I learned in my PhD, and sometimes chemistry even) to write software and pipelines to solve problems in Bioinformatics. And I love it. The biology is what motivates me, getting to learn something new about biology and disease, and getting to do that by writing code on computers is awesome (mainly because I kill anything that is being cultured).

    I always get discouraged when I interact with CS folks who don’t understand the fundamental biology of the problem they are trying to solve, or even biologists who don’t get how the technique they used works and have messed something up in the process.

  6. Great post Mick! I would consider myself a “bioinformatics user”; I did my undergrad in Marine Biology and now I’m doing an MSc in Biology to test the interdependence of circadian and circalunar timekeeping as it relates to coral broadcast spawning.
    I tried taking the MOOC “Bioinformatics Algorithms” on Coursera; however, I found that the automated programming challenges on Stepic were too advanced/hard for a beginner.

  7. Oops my last post got cut off.
    I was going to say that now I’m working on learning computational methods. I took an Intro to R programming course on Coursera, and I’ve also done numerous bioinformatics tutorials in Bioconductor and R.
    I really think I should take a programming course to become proficient in a scripting language (likely Perl over Python) and was wondering if anyone knew of any free courses/tutorials that are good for beginners – as I mentioned before, Stepic is not.
    Furthermore, does anyone know of good resources for:
    Database management languages
    Networking Technology/Internet Protocols
    Object-oriented analysis, design, implementation
    Parameter estimation
    Graph theory

  8. Great post Mick, and I also prefer your four divisions to the two in the paper. I think I tend to straddle the Developer/Analyst positions myself (and as one of the aforementioned “Pet Bioinformaticians” I think this is important). I lead the initial genetic analysis on all of our genetic disease projects and have one or two that are my core ongoing projects. I then spend lots of my additional time developing publishable software that I use in my day-to-day work.

  9. Well, I think I’m the only one here that’s not a bioinformatician, yet. If you don’t mind, I would like to share some thoughts with you, and if you have any tips I’ll appreciate them =).

    I’m actually a biologist, but after my degree I discovered how much I love programming. This feeling was so big that I decided to enroll in a 2-year Professional Training Course focused on programming multiplatform apps (I’m learning Java in the course, but I’m learning Python on my own).

    I like Biology and I like programming so, I think, Bioinformatics is the perfect field for me. However, I’m a bit confused. This is because I would like to code as much as I can, and I don’t know if the coding tasks are a good fit for people with my curriculum. I mean, developing software and algorithms may not be suitable for someone with my biology and CS background. I wonder if these tasks are for people who came from a CS degree and learnt a bit of biology during their MS.

    Reading your article, and the comments, it seems that I’m on the right path. I’m not the only one who doesn’t have a CS degree background.

    My fear comes from reading reddit. There’s a subreddit called r/bioinformatics and almost everyone who is studying Bioinformatics and posting there has a strong programming background. Which do you think is the better pathway: a computer scientist who learns biology, or the other way round, a biologist who learns computer science?

  10. These are the 2 MOOCs I took for learning programming from scratch:

    1. https://www.coursera.org/course/pythonlearn
    2. https://www.coursera.org/course/interactivepython1

    I highly recommend them =).

© 2018 Opiniomics