bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

Is this a realistic portrait of a modern student/post-doc in biology?

I was at a training workshop in Portugal recently, and whilst there I entered into a rather fruitless conversation about the usefulness of a certain software tool in bioinformatics training.  I won’t bore you with the details.  I came to the conclusion that there are many different types of trainee, and I wanted to paint a portrait of a certain type of trainee that I have come across fairly frequently, and then see how realistic readers of this blog think that portrait is.

DISCLAIMER: I encounter many trainees, in person and through my blog/twitter feed; they come from many different institutes and universities around the world.  The portrait below by no means reflects any specific individual or project, and it certainly does not reflect experiences at any particular institute or institution, including my own.

Inspired by Welch et al, I thought I would create a persona!  Here goes:

Ian has just started a 4-year PhD project with Professor Lollipop, an internationally renowned A. bacterium expert.  A. bacterium is pathogenic to both humans and animals, and is found in soil and water.  Ian’s project involves the sequencing and analysis of 1000 environmental and clinical samples of A. bacterium, including genome assembly, genome annotation, SNP calling, phylogenetic comparison and biological interpretation.  The project therefore requires significant bioinformatics expertise.  Prof Lollipop does not have any of the bioinformatics skills necessary to do this himself, and nor do any of Ian’s other co-supervisors.  Ian graduated from the University of Westeros with a 2:1 in Microbiology, and accepted this PhD project immediately after graduation.  Ian’s experience of bioinformatics is a week long course on how to use Galaxy during his second year.  Ian has heard of Linux/Unix, but has never worked on that platform.  Ian is bright and enthusiastic, and has a good knowledge of how to use Windows and Microsoft Office.  The institute where Ian is doing his PhD has some Linux/Unix servers, but Ian needs to demonstrate competence before he will be allowed to use them.  The institute offers Linux/Unix training, but this is only run every 6 months and there is a waiting list.  The servers themselves use a slightly obscure version of Linux that is 1 major version out-of-date.  Ian would not be allowed to have admin rights on these servers and couldn’t install any software.  He would be allowed to install software to his home directory, but any software that depends on up-to-date system libraries or software would not work.  The institute does not allow Ian to use external clouds such as Amazon EC2 as they consider them unsecure.  
Ian’s project comes with £5,000 per year to spend on consumables (a total of £20,000) but only £500 from the entire budget has been allocated to training, and that has already been given to the institute to fund their “core skills” training programme – which includes “paper writing”, “how to use powerpoint” etc workshops. Ian has been tasked with learning the skills he needs to do the data analysis himself, and is very keen; however, he is also extremely worried that he may not be able to complete his PhD as he doesn’t have those skills and nor does he feel he has the support to acquire them.

So, question – do you think this is common?  Have you come across this type of person?  Are you this type of person?  Please comment below!

Now let’s extend the imagined scenario above to include the trainer:

Ian has managed to persuade Prof. Lollipop to fund a week-long training course on NGS analysis at the University of Middle Earth.  You are the tutor on this training course.  Ian’s entire PhD depends on your ability to teach him everything he needs to know about bioinformatics.  Go!

Are you a trainer who has experienced this?  Thoughts?  Comments?


  1. Yes, I come across this kind of person regularly.

  2. >The institute does not allow Ian to use external clouds such as Amazon EC2 as they consider them unsecure.

    I have heard of that occurring, especially in governmental (e.g. Porton Down) and US military institutions, but never in plain academic ones. Mostly I would say it comes down to the actual running costs… The university will not authorise departmental credit card use and if you’re a lowly paid, in-debt PhD you’re not going to want to put your details into Amazon EC2!

    >Ian’s project comes with £5,000 per year to spend on consumables (a total of £20,000) but only £500 from the entire budget has been allocated to training, and that has already been allocated to the institute to fund their “core skills” training programme – which includes “paper writing”, “how to use powerpoint” etc workshops.

    Eugh. This is all too common. When I did my PhD they made me do stupid things like how to use MS Word and Endnote and other similar banal examples (the excuse was these were Research Council mandates, but I never believed that for a minute). These are entirely pointless for nearly anyone under the age of 45. They also would not allow PhDs to do computational courses instead of these. I was lucky as I could already program, so it was just more biology to learn…

    Having recently taught on the evomics.org course, a NERC funded metagenomics course and also on internal Unix+Perl courses, I would estimate that between 60 and 80% of the students were/are in a similar position to “Ian”, although many of them did secure funding to attend – so it may be that Universities/RCs are becoming better at allowing students to do courses that are more focused on and relevant to their PhD.

    It was clear though that nearly every student wanted to bring their own data to work on at these workshops and often had questions specific to their datasets and about the ‘best’ approaches to analysing it.

    It had to be made clear quite often that the aim of the workshops was to give an understanding and starting point on a range of programs suited to certain research questions and delivered by a set of invited specialists – who maintain preferences on programs/teaching style – and that we were not there to go into detail on their data.

    We did hold breakout sessions to discuss individual concerns/data etc but the main sessions were on how to familiarise yourself with *nix and become comfortable using it and how to then apply this to running a suite of genomic/informatics tools… This seems to be a good way – I think (and from feedback) – to approach the “Ian” problem, assuming students can secure funding/time to attend one of these courses.

  3. Laura (@MicroWavesSci)

    20th March 2014 at 2:06 pm

    I’d say that part 1 (describing the student) was me, but part 2 (describing the institution) was not my situation. I was largely self-taught, but I had incredibly helpful and supportive sysadmins (who installed software for me, etc.) and excellent computational resources. For me, that made a tremendous difference, but I’m sure these resources aren’t in place for many students/postdocs.

    It sounds like there are three main problems in the scenario you describe: (1) lack of meaningful opportunities for the student to gain or improve computational skill sets, (2) lack of appropriate computational infrastructure for student to be successful with the project, and (3) PI’s general unawareness of (1) and (2). As more labs launch into sequencing projects, it seems like this scenario will become even more common. What needs to happen to help these labs/projects be successful?

  4. Adam Hargreaves

    20th March 2014 at 2:12 pm

    I’ve just completed my PhD, and I suppose I would say I was in a similar-ish situation. I had no bioinformatics experience (no perl, no linux, nada) and needed to learn a lot of things in order to get the most out of my data. Personally I’ve found it extremely useful being thrown in at the deep end; it forces you to work at it and most definitely to become extremely efficient at troubleshooting!

    The genomics and Linux courses that NBAF run are excellent, and I frequently recommend them to other students, but they’re certainly not a quick fix. Knowing a good (and very patient) bioinformatician has been a life saver in terms of specific questions, and there’s a hell of a lot of info on the seqanswers threads.

  5. Given the scenario you described above, it’s not entirely Ian’s fault – save for that he probably should have considered joining another lab and then collaborated with Dr. Lollipop; but hindsight is always 20/20.

    I think this situation is unfortunately all too common – the University and his postdoc advisors have a responsibility to the postdocs they hire to provide them with the resources they need to succeed. Funding the postdoc’s stipend is only -part- of the equation. If there’s no Training Plan (Career Development plan, etc), or resources for training and doing the informatics work – then they are getting short changed. Mentors and advisors have a responsibility to provide their students and postdocs with resources/support (and TIME to train) – otherwise they shouldn’t take them on board in the first place.

    IMHO Ian and his advisor should a) find a collaborating extramural R&D group that has the computational resources/willingness to share and a common interest in Ian’s project; b) find an appropriate mentor for bioinformatics who is outside their department or even Univ. and c) map out what Ian’s immediate, near term, and long term goals are – prioritize them according to the resources he has, and be realistic about how he will achieve them.

    Lastly – I don’t think as a trainer you can realistically “teach him everything he needs to know about bioinformatics” in a week. Best you could probably do would be to get him scripting in Python or Perl 🙂 and send him on his way with a plan of action. My 2¢

  6. I have come to rely on VirtualBox, Vagrant, and Ansible to set up any environment on my Windows laptop to sidestep any IT constraints that we might run into at clients or collaborators. With a modern laptop you can run complete private clouds locally with multi-tier applications complete with web servers, application servers, and databases. So, I would bestow our Ian with that set of requirements as well.

    You do need to have a story ready for when you use bridged networking and present a new chain of MACs to the DHCP server: depending on the sophistication of the IT department, you may find yourself setting off a firestorm.

  7. I’m fighting to not be that person, albeit the core issues aren’t a problem here at MSU.

  8. Very realistic. I would add that a non-trivial portion of support burden on the GATK forum comes from insufficient computer literacy in users. In the GATK workshops specifically, we always get a few newcomers to the field whose Unix skills are flimsy to non-existent (generally through no fault of their own). It is certainly a big challenge for training; and we’re not even talking about scripting here, just running commands in the terminal. Going forward we are hoping to preface our workshops with an optional “remedial Unix” training session with the help of our IT department. But that’s not going to be enough to help your prototypical grad student in the long run…

  9. Well, instead of just suggesting a one-size-fits-all solution: Ian should really sit down and learn two languages, one general-purpose scripting language (Perl or Python; either will do, as both are similar) and perhaps R for quick data analysis, as it already comes with all the stats he needs built in. Next he’ll need to sign up for one of the ongoing online courses about UNIX/Linux environments. Nothing comes easy; I’m sure we all started as ‘noobs’.

    But I must say, having a good mentor is really important.

  10. Apologies for the long response:
    Prof omnipresent (heretofore Prof Omni) hires Gena (green-enthusiastic-naive-adaptable) grad student to pursue PhD on advanced ecological speciation theory that requires sequencing of 14+ unlinked yet nearby genes from 2 different organisms found at 20% within a 600,000 clone BAC library (he has her create prior to sequencing). Project requires ‘only’ sequencing and full genome construction/annotation for 2 organisms but then requires sequencing and analysis of 600,000 clones after sifting through to find the ones that matter and contain any/all subsets of the 14+ genes or less. Other needs: phylogenetics, recombination analyses, ecological diversity indexing… Prof Omni does not have said skillset, prof omni is not ‘hot’ on spending money on workshops or other training. The closest thing Gena has touched to a Linux was a Kaypro II computer in 1992 that her dad had and a Macintosh Classic in 1993…Gena never manages to convince her prof to let her take molecular evolution at wood’s hole…or any other training workshop she found. That was me.

    None of Gena’s classes touched bioinformatics or sequence analysis, most sequencing was still sanger and NGS was just getting ramped up and the most advanced program in the lab was Sequencher prior to her arrival and the lab had no Linux set up. In response: Gena buys books from amazon starting with Phylogenetics Made Easy, progressing to The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing, Molecular Evolution and Phylogenetics, and Inferring Phylogenies by Felsenstein. When another grad student enters the lab with an understanding of beginning Linux she latches and he sets up Putty and Linuxes in the lab. She starts combing the internet and downloading random command-line programs that profess to analyze her data…combed the manuals, combs tutorials, compares programs, starts irritating the developers, has 3 notebooks full of cheat sheets and terminology she’s not familiar with. When finally allowed to go to a conference 3 years into her degree…makes connections with people that know more than her prof about analysis. Starts bugging them. Manages to take a prokaryotic annotation workshop at the now dissolved TIGR. All online classes are still ‘buy me’…pays for and enrolls in online certification in biostatistics to learn stats programming. Fast forward 7 years of PhD (2010)…Gena has taught herself data analysis and has countless other grad students, developers and mentors to thank for putting up with her incessant questions and requests for books or programs to help her learn. A learning process that took 7 years (not 1 week) and is still going to this day and quite frankly in this field…it does not and should not stop.

    Now I am that trainer at NGS and Phylogenetics workshops…now there are MOOC’s…now there are blogs and online tutorials and things to make learning programming and Linux more user-friendly for ‘newbies’ like I was (and still am sometimes)…now I still bug developers, researchers, I take online classes and tutorials on python and linux to upkeep my skills and gain new ones when I have time and it’s fantastic and frustrating. I teach the basics using the resources they have onsite (which often times is a dell computer or mac) and pile on the resources, the considerations, the encouragement to compare, the suggestions, the links, the books the papers, the manuals…I make them read, I make them troubleshoot programs and data issues, I make them as frustrated as I was and I walk them through it (hopefully!) with the same patience that I was shown as a grad student.

    Yes, I encounter the students you are talking about. When students ask for the ‘holy grail’ program that will do their PhD for them…I pick up their hand, form it into a pointer, and point it back at them. Simply my opinion of course. 🙂

  11. My concern for Ian lies less in the inability to run the scripts than in the lack of close mentors and peers who can help him analyze the resulting data and critique his analyses. The proper pipeline parameters to get a good list of candidate mutations (and subsequent analyses) are different for each project, and the ability to identify a suitable approach cannot be learned in a workshop. Labs should not embark on analyzing such complex data unless there are suitable mentors nearby. It is a huge disappointment when I read papers produced by people like Ian, who plug in whatever tools look appropriate and don’t realize that they’ve gotten garbage out.

  12. Not where I went to school.

    At Boston College all of the biology grad students were required to learn the fundamentals of Unix in a required core course that Gabor Marth taught. There were a couple projects similar to the one you describe going on at the time and the students in wet labs working on those collaborated with one of the informatics labs. If the collaboration was not formal, they would just come to our lab and we’d help them out. There was a university-wide computing cluster available, as well, and if the physics grad students were not bogarting it we could use that, though my lab had its own resources.

    There is also a state resource for academic computing in Massachusetts that has a lot of computational biology programs installed and working. I don’t know the details of that but my brother is using it for his post doc at Northeastern and he says it works well. He has taken on computational stuff in a wet lab but he is the smart one in the family and he went into grad school with a math degree. He benefits a little bit from having a sister at the Broad.

  13. To be completely fair though, my prof did allow me to attend a modeling workshop my last year and it was an amazing breath of air in workshop learning and converted me completely to the value of workshops as a great starting point for launching into bioinformatics!

  14. Very common. I teach a lot of 3 day workshops in bioinformatics to clinicians, grad students, etc. People that didn’t know their mac/winbox had a terminal. An aggravating problem is that their POS laptop has easily $10k worth of data on it and the laptop cost $500. Five years ago. I don’t think the PIs do enough to champion these problems for their staff, in part because they just don’t know any better.

  15. David Eccles (gringer)

    20th March 2014 at 9:43 pm

    I was quite lucky on the *nix side of things because I chose to do a lot of computer science undergraduate courses, which introduced me to BSD. So I knew what a good computer system looked like, knew how it could benefit me, and was instilled with a drive to hunt for ways to do my research in a similar fashion.
    While I was still using a bit of Windows for Honours, I was in the process of switching and shifted to almost exclusively Linux for my PhD. Our university essentially had two IT groups, computer science and the rest of the university, with the computer science group being very open about allowing most things, and the university IT becoming gradually more draconian. After shifting from the computer science department for Honours to the Biology department for my PhD I skirted around the university support issue by arranging funds for my own Linux desktop computer. It was a bit of a pain trying to work around university IT silliness when connecting my computer to the university network (e.g. I could use R and install packages from the NZ repository, but not use any bioconductor packages), but for the most part I could do my Linux work away from the frustration of ITS.
    My recommendation for others who have to work around silly IT policies is to get your own desktop computer and install Linux on it. A very high-spec computer can be found for under $3000, and $1500 will cover most bioinformatics use cases. Even an airgapped (i.e. no Internet connection) Linux computer will provide substantial benefits to people who want to do bioinformatics.

  16. yep. I did a PhD in systems biology and the entire program was structured like this. I moved on to big pharma and they operate the same system for their staff. It’s pathetic.

  17. I would say that I’m in the same position as Ian. My institution does not offer courses which pertain to bioinformatics; I’m in the biological science department. I will have to attend a week-long workshop abroad to gain the required skills. I’m in the first year of my M.Sc program and the workshop won’t be until July. I’ve taken a number of MOOCs but they didn’t focus enough on CS background. They train a wet-lab biologist to run GUIs and workflows in R, but with little focus on custom applications, which I’ve since noticed is a necessity.

  18. I am a member of the Bioinformatics Core at UC Davis and we have been doing bioinformatics analysis on a recharge basis for over 8 years now and we have seen a wide range of skill levels in our clients. We have also been teaching bioinformatics workshops for over six years. There we have also seen many different kinds of students with widely varying skill levels when it comes to bioinformatics analysis. We began our training courses six years ago on the command-line and it felt like it was too difficult for people. We eventually migrated to teaching analysis using Galaxy in the Amazon Cloud, which is easier for people, but obviously much less flexible as an analysis tool. So now we have decided to offer two different courses, one using the command-line and one using Galaxy. My point is that there are some biologists who will never learn the command-line, and they might never need to based upon their project. I think any course work, whether it be a university class or a paid workshop or a MOOC, needs to address these two kinds of researchers: computer-savvy and computer-averse. However, we always stress that our courses are not meant to teach everything about bioinformatics; they are only the first of many stepping stones.
    One of the earlier comments mentioned taking 7 years to really learn how to do bioinformatics analysis for their project, which is great if you have that kind of time, but many of the projects that come to us have a much shorter time line, sometimes on the order of weeks. For these projects, it would be infeasible to learn bioinformatics analysis techniques, so it makes sense to come to a group such as ours. However, we work on a recharge basis, which means you need the money to hire us. I feel like the climate of grant money is changing so that granting agencies are much more receptive to adding bioinformatics analysis costs to the budget and that PIs are beginning to realize that they need to do so. However, there are still many PIs that don’t want to accept the idea that they should have to pay money to get analysis done. They try to get by with having the grad student teach themselves or by assuming that a week-long course will be sufficient. For some PIs and grad students that works, but for many others it does not.
    What is really needed is comprehensive bioinformatics education at the graduate or even undergraduate level. Just like students have to take classes in biology and genetics, why not in bioinformatics? Teach them the command-line, teach them how to use bioinformatics tools for analysis, and just as important, give them access to high performance computational clusters. Just my $0.02.

  19. I think this scenario is unfortunately common, but the fault doesn’t lie with the training, the choice of software used for that training, or the system in general. The fault lies with the PI in being reticent to move with the times, and to fight for and secure the right environment to complete the research project. PhDs are about training young graduates to be researchers, and if the supervisor isn’t equipped to support that student in their training, then they shouldn’t have put forward the PhD project in the first place. It’s irresponsible to themselves and that PhD student.
    Secondly, if an entire PhD rests on a single week-long training course, that is supposed to teach *everything* about bioinformatics, then it’s going to be a tough slog full stop. There’s no way that a single course can do that, for anyone.
    The nice thing is that there are a huge number of online free resources to act as reference guides, and things like Software Carpentry that are *really* good value (not to mention crazily cheap compared to Industry courses) to help people get started. Lastly, the brilliant thing about bioinformatics is access to a welcoming community – the open source movement, for me at least, encompasses a mentality that knowledge should be shared for free if at all possible. I’m more than happy to help any PhD student from any institution in any country to gain access to resources to further their studies!

  20. Hi Nikhil…I was the 7 year comment…it was during my PhD and my advisor was averse to allocating funds to training beyond university classwork required for the degree. That is until my last year (which doesn’t help the 6 years prior). He also, though, was not on a hot path to finish the project as fast as possible –it was a massive project. I agree for projects nowadays on a shorter timeline, 7 years is not feasible. But to ‘learn’…for a student I think 4-6 years is reasonable with the right curriculum and if the project requires bioinformatic knowledge; well, as a student, they need to learn what needs to be learned for their project right? Especially if the prof is not keen on paying for the expertise, and that’s what I was talking about. If a PI wants to just ‘get it done’ then ya, they need to pay a facility to do it and you are right it’s difficult for many to accept that this type of work isn’t just ‘easy’ it’s not a push button solution and you actually have to pay for computational power and the time of those that do it for you. I also agree…this view is changing and PIs are starting to allocate funds to computation and people to do analysis which is great, just wasn’t the case during my graduate program. It depends on the project of course but for a student whose focus is bioinformatics coming in or they accept a project that requires bioinformatics skills and they don’t have a skillset –it’ll just have to be learned and that will take time and I think PIs need to understand that and I think students need to understand/accept that and be open to figuring it out rather than just ask for a button or ‘the right program’ to process their data. If they have a short order project then they’ll need to convince their PI to hire a skilled bioinformatician which is more money or pay a facility. 
And yes, in the meantime…bioinformatics education is a must in the field now, some of the suggested curriculums that are coming out are forging a path –some more daunting than others, but all going the right direction. Cheers.

  21. “Secondly, if an entire PhD rests on a single week-long training course, that is supposed to teach *everything* about bioinformatics, then it’s going to be a tough slog full stop. There’s no way that a single course can do that, for anyone.”

    –I whole-heartedly agree.

  22. I agree with several commenters that 1 week of training will not fill the huge gap that “Ian” has. It can give him leads to understanding how much he is missing though, and directions to pursue to improve things.

    In my experience, in such cases what is needed is a mentor / co-advisor from bioinformatics. Which is more or less easy depending on the state of bioinformatics in the surroundings.

  23. I am an old hand now and I did my PhD starting in 1992. I did it in homology modelling and molecular dynamics. So the sum of my training was – here is a guide to shell commands on SGI IRIX – here are the Biosym and MSI manuals and here is a £20,000 SGI computer. We could do something about peptides and aldolase. So actually this student has had much more care and attention than was usual in the past. They actually have core training (although often this is bad).
    A PhD is more like an apprenticeship than a degree, so a problem with training is that you focus on one thing (for example Galaxy which I would not use personally) when you could have a broader and more scattered approach. You learn by reading and doing. Bioinformatics is no different and I may be a dinosaur but there is nothing lost except time by taking a student self-learning approach. There is plenty of literature, there are some OK but not great textbooks and a little bit of FOFO learning makes them more independent. There is no quick and easy way to learn linux and being thrown in at the deep-end makes you a fast learner.

  24. Yes I agree – it is the computing infrastructure to get you started that is the most important. A sympathetic sys admin. is a must.

  25. I think you misread the OP — “Ian” is a PhD student straight out of an undergrad degree, not a postdoc. I don’t think it’s realistic to expect a 22 year old graduate to critically evaluate whether a PhD supervisor can provide the proper training.
    Many moons ago I was 22 years old, straight out of a BSc, and starting a PhD. My thought process would have been, “Hey, great! I’ve got a place and funding to study with Professor Lollipop at the University of Narnia! It’s a really well-known and prestigious department! My supervisor is a senior professor, so he must know everything about everything! I’m so lucky, I can’t wait to get started!” It really wouldn’t have occurred to me that Prof. L might not bother to provide me with the correct training to succeed.
    (To be clear, my PhD supervisor was great. But that involved dumb luck on my part, not wisdom and strategy. 22 year olds are not good at wisdom and strategy.)
    So in this hypothetical case, the supervisor, department and university have an even heavier responsibility to make sure “Ian” gets the right support in the way that you describe.

  26. I’m somewhat better off, in that I started using Linux 8 years ago on my private computers. I’m not afraid of ‘middlebrow’ computations and I’m kind of curious about the world. This portrait seems to me a little bit too dramatic and exceptional. If Ian is not able to break through in his studies and install Linux anywhere, he should simply resign, in my opinion. Of course there is the prof issue, but after all he is the one with the most interest in what he puts his effort into. That interest demands realistic thinking. It’s not too demanding for a PhD student (a kind of clever guy) to be aware of such a must-do list.

  27. I have to say I am really surprised to hear all of these stories but perhaps there is a bias towards the willingness to share the dramatic experiences over the positive ones. I am a first year Bioinformatics PhD student at a university in Poland, where PhD is typically a 4 year program and requires an MSc. I joined a computational biology lab with no prior bioinformatics experience, no Python, no Linux and am working for a PI who has a reputation of being unapproachable. As it turned out however, this could not be further from the truth and I count myself lucky. My PI is patient and supportive and is interested in hearing my ideas. Additionally, I have one excellent and dedicated mentor and at least two other people that I feel very comfortable asking for help. Everybody in my lab is a bioinformatician and less than half of us came from the wet lab (including my PI). I have access to several servers owned by my lab and a super computer cluster at an affiliated institution that is an easy grant away. So far I have attended a Linux training course and two R classes all for free. My machine runs on Linux as does everybody else’s in the lab. Besides all the mentorship in the lab, I often find myself looking for answers on Biostars, StackOverflow and Seqanswers as well as any documentation I can get my hands on. I actually moved back to Poland after many years in the US and knew I was taking a chance with my education, today I feel like I made the right decision (of course there were other reasons that tipped the scale toward the return to the old country).


© 2018 Opiniomics
