bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

A quick guide to making your software installable

This post is a blatant excuse to give myself the opportunity to let off some steam 🙂 Some bioinformatics software packages/pipelines are notoriously difficult to install; others have just given no thought whatsoever to environments other than their own.  Here is a quick checklist of “DOs and DON’Ts” – well, in fact it’s mostly DON’Ts.

When considering the people who will install your software:

  1. Do not assume, under any circumstance, that we have root access.  We may very well have to obey the orders of a paranoid sysadmin who does not allow us root access.  We have to try and work round that.
  2. Do not assume that our system is in any way standard.  It won’t be.  It isn’t.  It never has been.
  3. Do not assume we have the latest anything.  We might do, but it won’t be where you think it is (see (1)).
  4. Do not assume that our Perl, Python etc. libraries are in standard places.  See (1) above – we probably had to install them somewhere we could write to.  Yes, we know about $PERL5LIB and $PYTHONPATH – does your install script?
  5. Do not assume the system (/usr/bin/) Perl, Python etc. is the one we use.  See (1) above – we probably had to put newer versions of Perl, Python etc. somewhere we could write to.
  6. Do not assume, just because it’s not in our PATH, that we don’t have it.  There are all sorts of reasons why we might have something, but not put it in our PATH.
  7. Don’t force us to install, or link, software into a sub-directory of your software.
  8. Don’t force us to download, or link, large databases into a sub-directory of your software.
  9. Don’t assume the user, or the server, has internet access.
  10. Just because you are a developer doesn’t mean we have all the *-dev packages installed.  Don’t assume we do.
  11. Don’t write an install script that installs something else which installs something else which installs something else etc etc.  If you have multiple dependencies, give us a separate script for each one and let us decide how, when and where it is installed.
  12. We will absolutely, 100% shut down and remove anything that tries to gather statistics and “call home”.  If this feature were a person, we would punch it, it makes us that angry.   Just. Don’t. Do. It.
  13. Do not think, for one second, that providing a VM circumvents any of these problems – it doesn’t.  VMs are occasionally useful, but ultimately, we will need the software on a local server – and the day our sysadmin lets us put your VM onto our network is the day hell freezes over.  And no, AWS is not the solution.  Not at those prices.
  14. DO – write a good step-by-step tutorial on how to install all of the dependencies and your software.  It’ll take time, and you will really not want to do it.  But good developers write good documentation.  So go write some.
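Points 4 to 8 mostly come down to one rule: let the user tell your software where things are, and only guess as a last resort. Here is a minimal sketch in Python. It assumes a hypothetical tool called `mytool` that needs samtools and a reference database. The environment variable names `MYTOOL_SAMTOOLS` and `MYTOOL_DB_DIR`, the `--db-dir` flag mentioned in the messages, and the helper functions are all invented for illustration and are not taken from any real package.

```python
import os
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Hypothetical names, for illustration only: a tool called "mytool" that needs
# samtools and a reference database. These variables come from no real package.
SAMTOOLS_ENV = "MYTOOL_SAMTOOLS"
DB_DIR_ENV = "MYTOOL_DB_DIR"


def find_executable(name: str, env_var: str, explicit: Optional[str] = None) -> Optional[str]:
    """Locate an external program without assuming root, /usr/bin or PATH.

    Checked in order: a path the user passed explicitly, a path held in an
    environment variable, and only then whatever happens to be on PATH.
    """
    for candidate in (explicit, os.environ.get(env_var)):
        if candidate and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which(name)  # may be None: report it, don't try to install it


def resolve_db_dir(cli_value: Optional[str], env_var: str) -> Path:
    """Use whatever database location the user gives; never a sub-directory of our install."""
    value = cli_value or os.environ.get(env_var)
    if not value:
        raise SystemExit(f"Please give the database directory with --db-dir or ${env_var}")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise SystemExit(f"Database directory not found: {path}")
    return path


def run_python_helper(script: Path, *args: str) -> subprocess.CompletedProcess:
    """Run a helper script with the interpreter the user launched us with, not /usr/bin/python.

    The child inherits our environment, so the user's PYTHONPATH is respected.
    """
    return subprocess.run([sys.executable, str(script), *args], check=True)
```

A user without root could then keep samtools in their home directory and point the (made-up) tool at it with something like `MYTOOL_SAMTOOLS=$HOME/sw/bin/samtools mytool --db-dir /data/refs`. The design choice is simply that explicit user settings always win over anything the software guesses.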

And breathe…

DISCLAIMER: if you think I am talking about your software and feel offended, then I promise I am not talking about your software.


  1. So you’re basically saying “As an academic, you don’t have time to make your software installable to my standards, so don’t bother”?

    • I’m saying that it should be a requirement of publication that your software has good documentation on how to install it, along with all of the dependencies 🙂

      • In a field where not even making the software that is published available seems to be a requirement, you’re setting a very high bar there. In a perfect world, I agree that everything would be nice and shiny the way you describe it. In the real world not having to take a paper and re-implement the described method if I want to use it is the bar that I can deal with.

      • I’m a bit more vicious, I guess – http://biomickwatson.wordpress.com/2013/01/14/call-the-bioinformatics-police/ – I think that if software is published in an academic journal and then it stops working, or something is missing, or it is too hard to install, then the paper should be retracted.

      • So in that “call the bioinformatics police” post, I do remember reading:

        “I think, as a scientist, if I take some published code, that it should work. Not much too ask is it? Sure, a readme.txt or a manual.pdf would be nice too, but first and foremost, it has to just do the eff-ing job it’s supposed to.”

        I think going from “just do the job it’s supposed to do” to “needs to install on my heavily non-standard system where I run perl 4 and python 1.7, on debian potato, without internet access” is quite a jump. OK, maybe I’m exaggerating, but there’s a practical limit on what you can support. I’ve got users who don’t know or care how to compile dependencies, so I need to provide binaries for Linux, Windows and OS X, both 32-bit and 64-bit, and for different Linux distros if possible, because of all the package-management bikeshedding that’s going on. And then the people who don’t want to use the pre-built installers (or can’t, because they don’t have root on the system) also want a step-by-step guide on how to build all the deps standalone. All of that in an environment where it’s hard to justify spending time on anything like that to a funding agency.

        I have no doubt that making your software as easy as possible to install and use will make it more successful and will generate more citations. But as long as people expect that this comes for free, I don’t see it happening.

      • Haha, touché!

        I absolutely 100% reserve the right to hold two completely opposing opinions on two completely different days 🙂

  2. This is a crushing burden to provide for any realistic subset of platforms. It’s why I provide Amazon EC2 install instructions; if someone can’t get it installed there (where I can provide completely static dependencies) then it’s on them. If they can get it installed there but run into other problems on their home cluster, then it’s something we can help with.

    Also, tests! Tests to help make sure it’s compiled properly, etc. Very important.

    But, overall, a really hard set of problems to tackle, and something that most academics suck at.

    • Even publishing a list of dependencies, with versions, would help – many software packages don’t even do that.

      • +1.

        I wrote something very much along these lines in a book chapter that was just accepted; I need to push the editors to clarify whether I can post a preprint online. That’s going to be a precondition of any future book chapter I agree to write – otherwise I feel I’d be far better off writing an open access paper or a blog post.
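Two suggestions from this thread, publishing a list of dependencies with versions and shipping tests that check the install, can be the same piece of work. The following is a minimal sketch in Python, written for a Python-based tool whose dependencies are Python packages. The package names and minimum versions in `REQUIREMENTS` are illustrative placeholders, not recommendations. The version comparison is deliberately crude: it reads numeric parts only and ignores pre-release tags, so a real project would more likely use a proper version parser.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Tuple

# Illustrative placeholders only: the published dependency list, with minimum versions.
REQUIREMENTS: Dict[str, str] = {"numpy": "1.20", "pysam": "0.15"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Crude numeric parse: '1.24.3' -> (1, 24, 3); stops at the first non-numeric part."""
    parts: List[int] = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '0rc1': keep the 0, then stop
            break
    return tuple(parts)


def at_least(installed: str, required: str) -> bool:
    """True if installed >= required, padding with zeros so '1.20' equals '1.20.0'."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(requirements: Dict[str, str]) -> List[str]:
    """Return a human-readable problem for each missing or too-old package."""
    problems: List[str] = []
    for package, minimum in requirements.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            problems.append(f"{package}: not installed (need >= {minimum})")
            continue
        if not at_least(found, minimum):
            problems.append(f"{package}: found {found}, need >= {minimum}")
    return problems
```

Calling `check_dependencies(REQUIREMENTS)` at start-up, or from the package’s test suite, turns the published list into something that fails loudly and says exactly what is missing, instead of crashing halfway through a run.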


© 2017 Opiniomics
