Opiniomics

bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"


HiSeq move over, here comes Nova! A first look at Illumina NovaSeq

Illumina have announced NovaSeq, an entirely new sequencing system that completely disrupts their existing HiSeq user base.  In my opinion, if you have a HiSeq and you are NOT currently planning a migration to NovaSeq, then you will be out of business in 1-2 years’ time.  It’s not quite the death knell for HiSeqs, but it’s pretty close, and moving to NovaSeq over the next couple of years is now the only viable option if you see Illumina as an important part of your offering.

Illumina have done this before, it’s what they do, so no-one should be surprised.

The stats

I’ve taken the stats from the spec sheet linked above and produced the following.  If there are any mistakes let me know.

There are two machines – the NovaSeq 5000 and 6000 – and four flowcell types – S1, S2, S3 and S4.  The 6000 will run all four flowcell types and the 5000 will only run the first two.  Not all flowcell types are immediately available, with S4 scheduled for 2018 (see below).

                                  S1     S2     S3     S4     2500 HO  4000   X
    Reads per flowcell (billion)  1.6    3.3    6.6    10     2        2.8    3.44
    Lanes per flowcell            2      2      4      4      8        8      8
    Reads per lane (million)      800    1650   1650   2500   250      350    430
    Throughput per lane (Gb)      240    495    495    750    62.5     105    129
    Throughput per flowcell (Gb)  480    990    1980   3000   500      840    1032
    Total Lanes                   4      4      8      8      16       16     16
    Total Flowcells               2      2      2      2      2        2      2
    Run Throughput (Gb)           960    1980   3960   6000   1000     1680   2064
    Run Time (days)               2-2.5  2-2.5  2-2.5  2-2.5  6        3.5    3

For X Ten, simply multiply the X figures by 10.  These are maximum figures, and assume maximum read lengths.

Read lengths available on NovaSeq are 2×50, 2×100 and 2×150bp.  This is unfortunate, as the sweet spot for RNA-Seq and exomes is 2×75bp.

As you can see from the stats, the massive innovation here is cluster density, which has increased hugely.  We also get shorter run times.

So what does this all mean?

Well let’s put this to bed straight away – HiSeq X installations are still viable.  This from an Illumina tech on Twitter:

 

We learn two things from this – first, that HiSeq X is still going to be cheaper for human genomes until S4 comes out; and second, that S4 won’t be out until 2018.

So Illumina won’t sell any more HiSeq X, but current installations are still viable and still the cheapest way to sequence genomes.

I also have this from an unnamed source:

speculation from Illumina rep “X’s will be king for awhile. Cost per GB on those will likely be adjusted to keep them competitive for a long time.”

So X is OK, for a while.

What about HiSeq 4000? Well to understand this, you need to understand 4000 and X.

The HiSeq 4000 and HiSeq X

First off, the HiSeq X IS NOT a human genome only machine.  It is a genome-only machine.  You have been able to do non-human genomes for about a year now.  Anything you like as long as it’s a whole genome and it’s 30X or above.  The 4000 is reserved for everything else because you cannot do exomes, RNA-Seq, ChIP-Seq etc on the HiSeq X.  HiSeq 4000 reagents are more expensive, which means that per-Gb every assay is more expensive than genome sequencing on Illumina.

However, no such restrictions exist on the NovaSeq – which means that every assay will now cost the same on NovaSeq.   This is what led me to say this on Twitter:

At Edinburgh Genomics, roughly speaking, we charge approx. 2x as much for a 4000 lane as we do for an X lane.  Therefore, per Gb, RNA-Seq is approx. twice as expensive as genome sequencing.  NovaSeq promises to make this per-Gb cost the same, so does that mean RNA-Seq will be half price?  Not quite.  Of course no-one does a whole lane of RNA-Seq; we multiplex multiple samples in one lane.  When you do this, library prep costs begin to dominate, and for most of my own RNA-Seq samples, library prep is about 50% of the per-sample cost and sequencing is the other 50%.  NovaSeq promises to halve the sequencing costs, which means the per-sample cost will come down by 25%.

These are really rough numbers, but they will do for now.  To be honest, I think this will make a huge difference to some facilities, but not for others.  Larger centers will absolutely need to grab that 25% reduction to remain competitive, but smaller, boutique facilities may be able to ignore it for a while.
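If you want to sanity-check that arithmetic, it’s a few lines of Python (the 50/50 split and the half-price sequencing figure are the rough numbers above, not a real quote):

    # Toy model of the per-sample RNA-Seq cost argument above.
    library_prep = 0.50   # fraction of per-sample cost that is library prep
    sequencing = 0.50     # fraction that is sequencing

    new_cost = library_prep + sequencing * 0.5   # NovaSeq halves sequencing cost
    print(f"New per-sample cost: {new_cost:.0%} of the old cost")
    # -> New per-sample cost: 75% of the old cost, i.e. a 25% reduction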

Capital outlay

Expect to pay $985k for a NovaSeq 6000 and $850k for a 5000.

Time issues

One supposedly big advantage is that NovaSeq takes 40 hours to run, compared to the existing 3 days for a HiSeq X.   Comparing like with like, that’s 40 hours vs 72 hours.  This might be important in the clinical space, but not for much else.

Putting this in context, when you send your samples to a facility, they will be QC-ed first, then put in the library prep queue, then put in the sequencing queue, then QC-ed bioinformatically before finally being delivered.  Let’s be generous and say this takes 2 weeks.  Out of that, sequencing time is 3 days.  So instead of waiting 14 days, you’re waiting 13 days.  Who cares?

Clinically having the answer 1 day earlier may be important, but let’s not forget, even on our £1M cluster, at scale the BWA+GATK pipeline itself takes 3 days.  So again you’re looking at 5 days vs 6 days.  Is that a massive advantage?  I’m not sure.  Of course you could buy one of the super-fast bioinformatics solutions, and maybe then the 40 hour run time will count.

Colours and quality

NovaSeq marks a switch from the traditional HiSeq 4 colour chemistry to the quicker NextSeq 2 colour chemistry.  As Brian Bushnell has noted on this blog, NextSeq data quality is quite a lot worse than HiSeq 2500, so we may see a dip in data quality, though Illumina claim 85% above Q30.

 

Is the long read sequencing war already over?

My enthusiasm for nanopore sequencing is well known; we have some awesome software for working with the data; we won a grant to support this work; and we successfully assembled a tricky bacterial genome.  This all led to Nick and me writing an editorial for Nature Methods.

So, clearly some bias towards ONT from me.

Having said all of that, when PacBio announced the Sequel, I was genuinely excited.   Why?  Well, revolutionary and wonderful as the MinION was at the time, we were getting ~100Mb runs.  Amazing technology, mobile sequencer, tri-corder, just incredible engineering – but 100Mb was never going to change the world.  Some uses, yes; but for other uses we need more data.  Enter Sequel.

However, it turns out Sequel isn’t really delivering on its promises.  Rather than 10Gb runs, folk are getting between 3 and 5Gb from the Sequel:

At the same time, MinION has been going great guns:

Whilst we are right to be skeptical of ONT’s claims about their own sequencer, other people who use the MinION have backed up these claims and say they regularly get figures similar to this. If you don’t believe me, go get some of the world’s first Nanopore human data here.

PacBio also released some data for Sequel here.

So how do they stack up against one another?  I won’t deal with accuracy here, but we can look at #reads, read length and throughput.

To be clear, we are comparing “rel2-nanopore-wgs-216722908-FAB42316.fastq.gz”, a fairly middling run from the NA12878 release, with “m54113_160913_184949.subreads.bam”, one of the released Sequel SMRT cell datasets.

Read length histograms:


As you can see, the longer reads are roughly equivalent in length, but MinION has far more reads at shorter read lengths.  I know the PacBio samples were size selected on Blue Pippin, but I am unsure about the MinION data.

The MinION dataset includes 466,325 reads, over twice as many as the Sequel dataset at 208,573 reads.

In terms of throughput, MinION again came out on top, with 2.4Gbases of data compared to just 2Gbases for the Sequel.
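If you want to compute these kinds of numbers yourself, here is a minimal Python sketch (the file name is the MinION release above; you would need to convert the Sequel subreads BAM to FASTQ first, e.g. with samtools fastq):

    import gzip

    def fastq_stats(path, min_len=0):
        # Count reads and bases in a gzipped FASTQ, keeping reads >= min_len
        n_reads = n_bases = 0
        with gzip.open(path, "rt") as fh:
            for i, line in enumerate(fh):
                if i % 4 == 1:                    # sequence lines
                    n = len(line.strip())
                    if n >= min_len:
                        n_reads += 1
                        n_bases += n
        return n_reads, n_bases

    reads, bases = fastq_stats("rel2-nanopore-wgs-216722908-FAB42316.fastq.gz")
    print(f"{reads} reads, {bases / 1e9:.2f} Gbases")
    print(fastq_stats("rel2-nanopore-wgs-216722908-FAB42316.fastq.gz", min_len=1000))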

We can limit to reads >1000bp, and see a bit more detail:


  • The MinION data has 326,466 reads greater than 1000bp summing to 2.37Gb.
  • The Sequel data has 192,718 reads greater than 1000bp, summing to 2Gb.

Finally, for reads over 10,000bp:

  • The MinION data has 84,803 reads greater than 10000bp summing to 1.36Gb.
  • The Sequel data has 83,771 reads greater than 10000bp, summing to 1.48Gb.

These are very interesting stats!


This is pretty bad news for PacBio.  If you add in the low cost of entry for MinION, and the £300k cost of the Sequel, the fact that MinION is performing as well as, if not better than, Sequel is incredible.  Both machines have a long way to go – PacBio will point to their roadmap, with longer reads scheduled and improvements in chemistry and flowcells.  In response, ONT will point to the incredible development path of MinION, increased sequencing speeds and bigger flowcells.  And then there is PromethION.

So is the war already over?   Not quite yet.  But PacBio are fighting for their lives.

From there to here

This is going to be quite a self-indulgent post – in summary, I am now a full Professor at the University of Edinburgh, and this is the story of my early life and career up until now.  Why would you want to read it? Well this is just some stuff I want to say, it’s a personal thing.  You might find it boring, or you may not.  There are some aspects of my career that are non-traditional, so perhaps that will be inspiring to others; to learn that there is another way and that there are multiple routes to academic success.

Anyway, let’s get it over with 😉

Early life

I was born and bred in Newcastle.  I come from a working class background (both of my grandfathers were coal miners) and growing up, most of my extended family lived in council houses.  I went to a fairly average comprehensive (i.e. state) school, which I pretty much hated.  Something to endure rather than enjoy.  Academic ability was not celebrated – I don’t know the exact figures but I’d guess less than 5% of my fellow pupils ended up in higher education.  Being able to fight and play football was what made you popular, and I could do neither 🙂 Choosing to work extra hours so I could learn German didn’t improve my popularity much…

My parents both worked hard – really hard – and I had everything I needed, but not everything I wanted.  Looking back this was a good thing.  My Dad especially is a bit of a hero.  He started off as a “painter and decorator” – going to other people’s houses and doing DIY, painting, putting up shelves, wallpaper, laying carpets etc.  As if doing that, having a small family (I have an older brother) and buying a house weren’t enough, he studied part-time for a degree in sociology at Newcastle Polytechnic, famously going off to do painting and decorating jobs between lectures, and at the weekends.  After graduating, he went on to have a successful career in adult social care, and finished his working life as a lecturer at a further education college.  My mother also worked in social care, and together they supported me through University (BSc and MSc).  Just amazing, hard working parents.  I attribute my own work ethic to both of them; they set the example and both my brother and I followed.

Education

I did a BSc hons in Biology at the University of York.  Some bits I loved, some bits I didn’t, and I came out with a 2.1.

One practical sticks in my mind – in pairs, we had 3 hours to write a program in BASIC (on a BBC B!) to recreate a graph (yes – we had to recreate an ASCII graph) based on an ecological formula (I don’t remember which).  I finished it in ten minutes and my partner didn’t even touch the keyboard.  Writing this now reminds me of two things – firstly, hours sat with my Mum patiently punching type-in programs into our Vic 20 so I could play space invaders (written in, you guessed it, BASIC); and secondly, hacking one of my favourite ZX Spectrum games, United, so I could have infinite money to spend on players.  Happy days!

At the end of my undergraduate, I didn’t know what else to do, so I took the MSc in Biological Computation, also at the University of York.  This was awesome and I loved every minute (previous graduates include Clive Brown, but more about that later).  The syllabus was wide-ranging – we covered programming (in C), mathematical modelling, statistics (lots and lots of statistics), GIS and Bioinformatics, among many other courses.  It was hard work but wonderful.  It really taught me independence; that I could start the day knowing nothing, and end the day having completed major tasks, using nothing but books and the internet.

At the end of the course I had three job offers and offers to do a PhD – looking back this was a pivotal decision, but I didn’t realise it at the time.  I chose a job in the pharmaceutical industry.

Early career

My first job was a 12 month contract at GlaxoWellcome in Stevenage.  GW had invested pretty heavily in gene expression arrays.  Now, these were very different to modern microarrays – this was pre-genome, so we didn’t know how many genes were in the human genome, nor what they did.  What we had were cDNA libraries taken from various tissues and normalised in various ways.  These would be spotted on to nylon membranes, naively – we didn’t know what was on each spot.  We then performed experiments using radioactively labelled mRNA, and any differentially expressed spots were identified.  We could then go back to the original plate, pick the clone, sequence it and find out what it was.

It’s now obvious that having a genome is really, really useful 😉

I was in a team responsible for all aspects of the bioinformatics of this process.  We manufactured the arrays, so we wrote and managed the LIMS; and then we handled the resulting data, including statistical analysis.  I worked under Clive Brown (now CTO at Oxford Nanopore).  He asked me in interview whether I could code in Perl and I had never heard of it!  How times have changed…. he was very much into Perl in those days, and Microsoft – we wrote a lot of stuff in Visual Basic.  Yes – bioinformatics in Visual Basic.  It can be done…

I spent about four years there – working under Francesco Falciani for a time – then left to find new challenges.  I spent an uneventful year at Incyte Genomics working for Tim Cutts (now at Sanger) and David Townley (now at Illumina), before we were all unceremoniously made redundant.  With the genome being published, Incyte’s core business disappeared and they were shutting everything down.

Remember this when you discuss the lack of job security in academia – there is no job security in industry either.  The notion of job security disappeared with the baby boomers, I’m afraid.

I managed to get a job at Paradigm Therapeutics, also in Cambridge, a small spin-out focused on mouse knockouts and phenotyping.  I set up and ran a local Ensembl server (I think version 12) including the full genome annotation pipeline.  This was really my first experience of Linux sys/admin and running my own LAMP server.  It was fun and I enjoyed it, but again I stayed less than a year.  Having been made redundant from Incyte, my CV was on a few recruitment websites, and from there an agent spotted me and invited me to apply for my first senior role – Head of Bioinformatics at the Institute for Animal Health (now Pirbright).

Academia

So, my first job in academia.  It’s 2002.  I am 28.  I have never been in academia before.  I have no PhD.  I have never written a paper, nor written a grant (I naively asked if we got the opportunity to present our grant ideas in person in front of the committee).  I am a group leader/PI so I am expected to do both – I need to win money to build up a bioinformatics group.  What the institute needed was bioinformatics support; but they wanted me to win research grants to do it.

This job was really, really tough, and I was hopelessly ill-equipped to do it.

My first grant application (to create and manage pathway genome databases for the major bacterial pathogens we worked on) was absolutely annihilated by one reviewer, who wrote a two-page destruction of all of my arguments.  Other reviewers were OK, but this one killed it.  Disaster.  I figured out that no-one knew who I was; I needed a footprint, a track record; I needed to publish (I’d still like to find out who that reviewer was….)

In 2005 I published my first paper, and in 2006 my second.  You’ll note I am the sole author on both.  Take from that what you will.  Meanwhile I was also supporting other PIs at the institute, carrying out data analyses, building collaborations, and these also led to papers; and so slowly but surely I began building my academic life.  I got involved in European networks such as EADGENE, which eventually funded a post-doc in my lab.  I won a BBSRC grant from the BEP-II initiative to investigate co-expression networks across animal gene expression studies; and I successfully applied for and won a PhD studentship via an internal competition.  With papers and money coming in, the institute agreed to provide me with a post-doc from their core support and hey presto, I was up and running.  I had a research group publishing papers, and providing support to the rest of the institute.  It only took me five or six years of hard work; good luck; and some excellent support from one of my managers, Fiona Tomley.  During my time at IAH I had 7 managers in 8 years; Fiona was the only good one, and she provided excellent support and advice I’ll never forget.

So what are the lessons here?  I think this was an awful job and had I known better I wouldn’t have taken it.  At the beginning there was little or no support, and I was expected to do things I had no chance of achieving.  However, I loved it, every minute of it.  I made some great friends, and we did some excellent work.  I forged collaborations that still exist today.  I worked really, really hard and it was quite stressful – at one point my dentist advised gum shields as I was grinding my teeth – but ultimately, through hard work and good luck, I did it.

It’s worth pointing out here that I believe this was only possible in bioinformatics.  My skills were in such short supply that a place like IAH had no choice but to take on an inexperienced kid with no PhD.   This couldn’t have happened in any other discipline, in my opinion.  It’s sad, because ultimately I showed that if you invest in youth they can and will make a success of it.  Being older, or having a PhD, is no guarantee of success; and being young and inexperienced is no guarantee of failure.

Roslin

And so, in 2010, to Roslin.  There was quite an exodus from IAH to Roslin as the former shrank and the latter grew.  I was lucky enough to be part of that exodus.  My role at Roslin was to build a research group and to help manage their sequencing facility, ARK-Genomics.  I certainly don’t claim all of the credit, but when I arrived ARK-Genomics had a single Illumina GAIIx and when we merged to form Edinburgh Genomics, we had three HiSeq 2000/2500.  We also had significantly better computing infrastructure, and I’m proud that we supported well over 200 academic publications.  My research is also going really well – we have, or have had, grants from TSB, BBSRC, and industry, and we’re publishing plenty of papers too.  The focus is on improved methods for understanding big sequencing datasets and how they can contribute to improved farm animal health and production.  Scientifically, we are tackling questions in gut microbiome and host-pathogen interactions.

Roslin is a great place to work and has been very supportive; the group enjoys a small amount of core support through which we help deliver Roslin’s core strategic programmes; and it is an incredibly rich, dynamic environment with over 70 research groups and over 100 PhD students.  You should come join us!

In 2015, I was awarded a PhD by research publication – “Bioinformatic analysis of genome-scale data reveals insights into host-pathogen interactions in farm animals” (I don’t think it’s online yet) – and in 2016 promoted to Personal chair in bioinformatics and computational biology.  Happy days!

A note on privilege

Clearly being a white male born in the UK has provided me with advantages that, in today’s society, are simply not available to others.  I want readers of this blog to know I am aware of this.  I wish we lived in a more equal society, and if you have suggestions about ways in which I could help make that happen, please do get in contact.

So what next?

More of the same – I love what I do, and am lucky to do it!  More specifically, I want to be a champion of bioinformatics as a science and as an area of research; I want to champion the young and early career researchers; I want to continue to train and mentor scientists entering or surviving in the crazy world of academia; and I want to keep pushing for open science.  Mostly, I want to keep being sarcastic on Twitter and pretending to be grumpy.

Cheers 🙂

 

Tips for PhD students and early-career researchers

As we enter October, here at Roslin it is the start of the academic year and many new PhD students begin.  We have over 100 PhD students here at any one time; it’s a very exciting and dynamic environment.  However, for some (many?) a PhD is one of the most stressful things that will ever happen to them (ed: interesting choice of language).   So how can we get round that?  Below are my tips for new PhD students.  Note my experience is in bio, so if you’re in another field, be aware of that.  Enjoy.

1. Lead

PhD projects, and PhD supervisors, come in all shapes and sizes, and work in many different ways.  For some, there will be a very detailed plan for the whole 3/4 years, with well defined objectives/chapters etc; others will be little more than a collection of ideas that may or may not work; and many will be between these two extremes.  Whichever project/supervisor you have, you the student are responsible for making it all happen.  This will be difficult for many; some people are not “natural born” leaders; and even those who are may not have had much chance to practice.  However, we have to recognize that a PhD is not a taught course; it is a project whereby a student learns how to carry out their own research, to investigate their own ideas, to plan and execute research.  That doesn’t happen if someone tells you what to do at every stage.  So, lead.  Take the lead.  This is your project – if you have ideas that go beyond what’s written in the original project plan, then you now have the opportunity to explore them.  Of course take advice; speak to your supervisor; speak to other experts; figure out whether your ideas are good or not;  do things by the book and be healthy and safe.  But if your ideas are good and they are worth exploring, get on and do it.  If there is a predefined plan, execute it.  Don’t wait; don’t ask; don’t sit nervously waiting for your supervisor to ask if there’s anything you want to explore – get on and do it.  Lead.

2. Read

Read the literature.  In a fast paced field such as genomics, papers will be out of date within ~2 years.  This means that to be on top of your game, you will have to read a lot of papers.  This is something you need to just bite the bullet and do.  Hopefully, if you’re in a field you love, this won’t be too arduous.  Combine with Twitter (see below) and news feeds so you can figure out which papers to prioritize.  As I have said before, you want to be the student sending a mail to your supervisor saying “hey have you seen this cool new paper?” rather than the student receiving those mails.  Take a coffee, a bunch of papers, go somewhere quiet and read them.

3. Write

Write.  For the love of all that is holy, write.  Learn to write quickly and well.  Science is not only pipettes, tubes, plates, labs and computers; it is about writing.  As a scientist it’s what I spend my time doing more than any other activity.  Grants, papers, reviews, reports, plans, emails etc etc.  Being asked to put together a coherent (referenced!) 3-page document in 24 hours is not unheard of; being asked to do the same for a “one pager” in a few hours is even more common.  Writing is so important.  I can’t emphasize this enough.  If you hate writing, then perhaps science isn’t for you; honestly, genuinely, I mean that.  Think about it.  Being a scientist means being a writer. If all the results are in, papers shouldn’t take months to write; posters should take no more than a few hours.

4. Engage

I am not the first to say this and I won’t be the last, so here it is again – science is a social activity. Engage with people. Talk to people. Go to meetings, conferences, workshops and be active in them. Talk to people about yourself and about your project, ask them what their interests are. Much of success is about luck, about being in the right place at the right time. Go out and talk to people about what you do. Don’t be shy, don’t think they aren’t interested. You might also consider blogging and social media. The more you are out there, the more people know about you and what you’re doing, the higher chance they might want to work with you.

5. Record

Keep a record of everything you do. In my group, we use a wiki; others use Github; obviously lab-based students have a lab-book (but this isn’t always sufficient!). It doesn’t matter what you use, the basic requirement is to keep a record of what you’ve done to such a standard that the work can easily be understood and repeated by someone else. This will help you in future, and it may very well serve as a record for protecting intellectual property.

6. Tweet

It’s just possible that if I had to name the single thing that has had the biggest impact on my career, that thing might be joining Twitter. I can say with great confidence that people on every continent know who I am and what I do. I’m not sure that would have been true on my scientific record alone. Twitter has enabled me to meet, and engage with, 1000s of wonderful people, scientific leaders in their field. Twitter is an online community of (mostly liberal) forward-thinking scientists. If you’re not on Twitter, you don’t get it; I was once a person who didn’t get Twitter, and I actually joined just to drum up applications for a job I had advertised. However, it has been a transformational experience. Now – it’s hard. When you first join, you have no followers, and no-one is listening. You have to work hard to get followers, to get retweets, and it’ll take years to get 1000s of followers. But it’s worth it, I promise you!

7. Learn

Find out what skills are in demand and try and focus your research on those. Bioinformatics is a good example – learn Unix, learn to code and learn some stats. If you have those, you will always be employable and in demand. Try and look for trends – at the moment CRISPR is very hot, as is single-cell work – if you’re in the lab, can you integrate these into your project and therefore practice these techniques? Seriously, anyone can do DNA preps and PCR; find out the skills and techniques that are in demand and learn them, if you can.

8. Plan

I want my PhD students to have a thesis plan by 3 months. You don’t have to stick to it, but it’s good to have an idea. What are the results chapters? What are the likely papers? If you are a post-doc, then if you have 3 years, what are you going to do with them? If you’re a PI, again, plan – what questions? Which experiments? Papers? Where is this going? What will be your first grant and how do you get to the stage where you are ready to be funded? Plan.

9. Speak

At meetings, conferences, workshops. Submit abstracts and talk about your research. If you are not confident at public speaking, then practice until you are. Don’t submit poster abstracts, submit speaking abstracts. People remember amazing talks they have seen more than they remember amazing posters. It’s quite common, I find, for young scientists to default to posters as they don’t feel ready or willing to speak. You must get over this. I know it’s hard. I used to be nervous as hell before speaking at conferences. However, it has to be done and it has to be done well. Practice is key. Get out there, overcome your fears, and do it.

10. Realise

You have an amazing job and an amazing position. You get to push back the boundaries of human knowledge. That’s your job. You come in to work and by the end of the day we know more about the world we live in than we did before. It is an amazing, incredible, privileged position to be in. Yes it’s hard; yes there are barriers and it can be stressful. However, if you can get by those, and you have a good supportive team around you, then you have the most amazing job/position in the world. Enjoy it!

And I’ll leave it there. As always, I look forward to your comments!

Why you probably shouldn’t be happy at Stern’s recommendations for REF

If you are a British academic then you will be aware that Lord Stern published his recommendations for REF (the Research Excellence Framework) this week.  REF is a thoroughly awful but necessary process.  Currently your academic career is distilled down to 4 publications and assessed every 4-5 years.  Papers are classified via some unknown system into 2*, 3* or 4* outputs, and your value as an academic is recorded accordingly.   Given that higher REF scores result in more money for Universities, your individual REF score has a very real impact on your value to your employer.  This has pros and cons, as I will set out below.

Here are some of the recommendations and my thoughts:

  • Recommendation 1: All research active staff should be returned in the REF
  • Recommendation 2: Outputs should be submitted at Unit of Assessment level with a set average number per FTE but with flexibility for some faculty members to submit more and others less than the average.
  • What currently happens?  At the moment your employer decides whether your outputs make you “REF-able”.  In other words, if you don’t have 4 good outputs/publications, you won’t be submitted and you are REF-invisible
  • Stern recommendation: Stern recommends that all research-active staff be submitted and that the average number of outputs is 2.  However, there is a twist – the number of submissions per person can be between 0 and 6.  Therefore you may be submitted with zero outputs, which is perhaps even worse than being REF-invisible.  Given that the formula for the number of expected outputs is 2N (where N is the number of research-active staff), if a University has fewer than 2N good outputs, there must surely be pressure to transfer those with few outputs onto a teaching contract rather than a research contract.  And given the range of 0 to 6, I can see established Profs taking up all 6, with early career researchers being dumped or submitted with zero outputs.  So I’m not impressed by this one.

 

  • Recommendation 3: Outputs should not be portable.
  • What currently happens?  At the moment, an output stays with the individual.  So if I publish a Nature paper during a REF cycle and then move to another University, then my new employer gets the benefit, rather than my old employer.  This has resulted in REF-based recruitment, whereby individuals are recruited by Universities (often with high salaries and incentives) specifically because they have good REF outputs.
  • Stern recommendation: that outputs are not portable.  Specifically that publications remain with the employer present when they are accepted for publication.   It’s worth reading what the Stern report says here: “There is a problem in the current REF system associated with the demonstrable increase in the number of individuals being recruited from other institutions shortly before the census date. This has costs for the UK HEI system in terms of recruitment and retention”.   Read and re-read this sentence in context – high impact publications directly influence how much money a University gets from the government; yet here Stern argues that this shouldn’t be used for “recruitment and retention” of staff who produce those publications.  In other words current REF rules are pitched not as some sort of incentive to reward good performance, but as some kind of unnecessary cost that should be banished from the system.   Yes – read it again – potential staff rewards for good performance (“retention”) are quite clearly stated as a “cost” and as a “problem” to HEIs.
  • What the old REF rules did, in a very real way, is give power to the individual.  Publish some high impact papers and not only will other HEI’s offer you a job, but your existing employer might try and keep you, offering incentives such as pay rises and promotions.  What Stern is recommending is that power is taken from the individual and handed to the institution.  Once you publish, that’s it, they own the output.  No need to reward the individual anymore.
  • This also has the perverse outcome that an institution’s REF score shows how good they were not how good they are.  Take an extreme toy example – University A might have 100 amazing researchers between 2010 and 2014 and achieve an incredible REF score in 2015; yet they all may have left to go to University B.  How good is University A at research?  Well, not very good because all of their research-active staff left – yet they still have a really good REF score.

 

I don’t really have any major objections to the other recommendations; I think Stern has done a pretty good job on those.  However, I’m not at all happy with 1-3 above.  There are actually very few incentives for pay rises amongst UK academics, and REF was one of those incentives.  Stern wants to remove it.  You can see how healthy your University’s accounts are here (from here); you will see that the vast majority (about 110 out of 120) of UK universities generated an annual surplus last year, and the whole sector generated a surplus of £1.8Bn.  Yet somehow, incentives to promote, recruit and retain staff who are performing well are a “cost” and a “problem”.  I also don’t think that the recommendations help ECRs, as they could remain invisible to the entire process.

In conclusion, I don’t think the recommendations of Stern – or to give him his full title, Professor Lord Stern, Baron Stern of Brentford, FRS, FBA – do anything positive for the individual researcher; they don’t provide much help for ECRs; and they hand power back to Universities.

 

 

Plot your own EU referendum poll results

Due to the unspeakable horror of the EU referendum, I have to find something to make me feel better.  This poll of polls usually does so, though it is way too close for comfort.

Anyway, I took their data and plotted it for myself.  Data and script are on github, and all you need is R.

Enjoy! #voteRemain


You don’t need to work 80 hours a week in academia…. but you do need to succeed

I’ve been thinking a lot lately about academic careers, chiefly because I happen to be involved in some way with three fellowship applications at the moment.  For those of you unfamiliar with the academic system at work here, the process is: PhD -> Post-Doc -> Fellowship -> PI (group leader) -> Professorship (Chair).  So getting a fellowship is that crucial jump from post-doc to PI and represents a person’s first chance to drive their own research programme.  Sounds grand doesn’t it?  “Drive your own research programme”.  Wow.  Who wouldn’t want to do that?

Well, be careful what you wish for.  Seriously.  I love my job; I love science and I love computers and I get paid to do both.  It’s amazing and possibly (for me) one of the best jobs in the world.  However, it comes with huge pressures; job insecurity; unparalleled and relentless criticism; failures, both of your ideas and your experiments; and occasionally the necessity of working with and for people who act like awful human beings.  It also requires a lot of hard work, and even then, that isn’t enough.  This THE article states very clearly and eloquently that very few people actually work an 80-hour week in academia, and that you do not need to in order to succeed.   I would tentatively agree, though I have pointed out some of the things you need to do to succeed in the UK system, and one of them is working hard.

It’s true, you don’t need to work 80 hours a week in academia…. but you do need to succeed.

What does success look like?

Unfortunately, science lends itself very well to metrics: number of papers published; amount of money won in external grant funding; number of PhD students trained; feedback scores from students you teach; citation indices; journal indices.  And probably many more.  All of these count, I’m sorry but they do.  We may wish for a better world, but we don’t yet live in one, so believe me – these numbers count.

To succeed as a PI, even a new one such as a fellow, you will need to win at least one external grant.  Grant success rates are published: here they are for BBSRC, NERC and MRC.  I skim-read these statistics and the success rate for standard “response mode” grants seems to be somewhere between 10 and 25%.  However, bear in mind that this includes established professors and people with far better track records and reputations than new fellows have.   Conservatively, I would halve those success rates for new PIs, taking your chances of success to between 5 and 12%.  What that means is you’re going to have to write somewhere between 8 and 20 grants just to win one.   I couldn’t find statistics for the UK, but the average age at which a researcher in the US gets their first R01 grant is 42.  Just take a moment and think about that.
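That “8 to 20 grants” figure is just the expectation of a geometric distribution; a quick sketch in Python, using the assumed success rates above:

    # Expected number of applications to the first success, assuming each
    # application is independent with success probability p; the expectation
    # of a geometric distribution is 1/p.
    for p in (0.05, 0.12, 0.25):
        print(f"success rate {p:.0%}: expect to write ~{1 / p:.0f} grants")
    # success rate 5%: expect to write ~20 grants
    # success rate 12%: expect to write ~8 grants
    # success rate 25%: expect to write ~4 grants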

It’s not all doom and gloom – there are “new investigator” schemes specifically designed for new PIs.  The statistics look better – success rates between 20 and 30% for NERC, and similar for BBSRC.   However, note the NERC grants are very small – £1.3M over 20 awards is an average of £65k per award, and that probably covers you for about 8 months at FEC costing rates.  The BBSRC new investigator awards have no upper limit, so there is a tiny speck of light at the end of the tunnel.  The statistics say that you will still need to write between 3 and 5 of these just to win one though.

What do grant applications look like?

I am most familiar with BBSRC, so what’s below may be specific to them, but I imagine other councils are similar.  Grant applications consist of the following documents:

  1. JeS form
  2. Case for support
  3. Justification of resources
  4. Pathways to impact
  5. Data management plan
  6. CV
  7. Diagrammatic workplan (optional)
  8. Letters of support (optional)

The JeS form is an online form containing several text sections: Objectives, Summary, Technical Summary, Academic Beneficiaries and Impact Statement.  I haven’t done a word count because they are PDFs, but that’s probably around 1000 words.

The case for support is the major description of the research project and stretches to at least 8 pages, depending on how much money you’re asking for.   Word counts for my last 4 are 4450, 4171, 3666, and 3830.

The JoR, DMP and PtI are relatively short, 1-2 pages, and mine are typically 300-500 words each, so let’s say 1000 words in total.

Therefore, each grant is going to need 6000 words (properly referenced, properly argued) over 5 documents.  They need to be coherent, they need to tell a story and they need to convince reviewers that this is something worth funding.

Given the success rates I mentioned above, there is every possibility that you need to write between 5 and 10 of these in any given period to be deemed a success.   In other words, for success, you’re going to need to write often, write quickly and write well.   Don’t come into academia if you don’t like writing.

(by the way, there is such a thing as a LoLa which stands for “longer, larger”.  These are, as you may guess, longer and larger grants – the last one I was involved in, the case for support was 24 pages and 15,400 words – about half a PhD thesis)

Failure is brutal

I’ll take you through a few of my failures so you can get a taste….

In 2013 the BBSRC put out a call for research projects in metagenomics.  We had been working on this since 2011, looking to discover novel enzymes of industrial relevance from metagenomics data.  What we found when we assembled such data was that we had lots of fragmented/incomplete genes.  I had a bunch of ideas about how to solve this problem, including  targeted assembly of specific genes, something we were not alone in thinking about.    Reviews were generally good (Exceptional, Excellent and Very Good scores), but we had one comment about the danger of mis-assemblies.  Now, I had an entire section in the proposal dealing with this, basically stating that we would use reads mapped back to the assembly to find and remove mis-assembled contigs.  This is a tried, tested, and established method for finding bad regions of assemblies, and we have used it very successfully in other circumstances.   Besides which, mis-assembled contigs in metagenomic assemblies are very rare, probably around 1-3%.  I explained all this and didn’t think anything of it.  Mis-assemblies really aren’t a problem, and we have a method for dealing with it anyway.
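For anyone wondering what that method looks like in practice, here is a minimal sketch using pysam, assuming an indexed BAM of the reads mapped back to the assembly (the file name and the 90% threshold are illustrative, not what was in the grant):

    # Sketch: flag possibly mis-assembled contigs by the fraction of
    # properly-paired reads mapping back to them.
    import pysam

    bam = pysam.AlignmentFile("reads_vs_assembly.bam", "rb")   # indexed BAM
    for contig in bam.references:
        total = proper = 0
        for read in bam.fetch(contig):
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            total += 1
            if read.is_proper_pair:
                proper += 1
        if total and proper / total < 0.90:
            print(f"possible mis-assembly: {contig} ({proper / total:.1%} proper pairs)")
    bam.close()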

The grant was rejected.  I asked for feedback from the committee (which can take 3 months, by the way, and is often just a few sentences).   The feedback was that we had a problem with mis-assemblies and we didn’t have a method for dealing with it.  Apparently, the method we proposed (a tried and tested method!) represented a “circular argument”, i.e. using the same reads for both assembly and validation was wrong.   Anyone working in this area can see that argument doesn’t make sense.  So our grant was rejected because of a problem that isn’t important, which we had a method for dealing with, by someone who demonstrated a complete lack of understanding of that problem.   Frustrating?  I had to take a long walk, believe me.

In 2015 I wrote a grant to the BBSRC FLIP scheme for a small amount of money (~£150k) to get various bioinformatics software tools and pipelines (e.g. BWA+GATK) working on Archer, the UK’s national supercomputer.   It’s a Cray supercomputer, massively parallel but with relatively low RAM per core, and with jobs absolutely limited to 48 hours.   The grant was rejected, with feedback containing such gems as “the PI is not a software developer” and “Roslin is not an appropriate place to do software development”.   That was over a year ago and I am still angry.

The last LoLa I was involved in was the highest-scoring LoLa from the committee that assessed it.  They fully expected it to be funded.  It wasn’t – it was killed at a higher-level committee.  So even after getting through review and committee approval, you can still lose out.  One of the reviewer’s comments was that better assembled and annotated animal genomes would only represent a “1% improvement” over SNP chips.   I can’t even….

Our Meta4 paper was initially rejected for being “just a bunch of Perl scripts”; our viRome paper similarly rejected for being “a collection of simple R functions”; our paper on identifying poor regions of the pig genome assembly got “it seems a bunch of bioinformaticians ran some algorithms with no understanding of biology”; whilst our poRe paper was initially rejected without review because it “contains no data” (at the time I knew the poretools paper was under review at the same journal and also contained no data).

What point am I trying to make?   That failure is common, criticism is brutal, and often you will fail because of comments that are incorrect, unfair or both.  And there is often no appeal.

Lack of success may mean lack of a job

It’s more and more common now for academic institutions to apply funding criteria when assessing the success of their PIs: there have been redundancies at the University of Birmingham as expectations on grant income were set; staff at Warwick have been given financial targets; Dr Alison Hayman was sacked for not winning enough grants; and there was the awful, tragic death of Stefan Grimm after Imperial set targets of £200k per annum.

To put that in context, the average size of a BBSRC grant is £240k.  So Imperial are asking their PIs to win approx. one RCUK grant per year.  Do the maths using the success rates I mention above.

Is the 80-hour week a myth?

Yes it is; but the 60-hour week is not.  You may have a family, mouths to feed, bills to pay, a mortgage.  To do all of that you need a job, and to keep that job you need to win grants.  Maybe you haven’t won one in a while.  Tell me, under those circumstances, how many hours are you working?

Working in academia (for me) is wonderful.  I absolutely love it and wouldn’t change it for anything else.  However, it’s also highly competitive and at times brutal.  There are nights I don’t sleep.  A few years ago, my dentist told me I had to stop grinding my teeth.

It’s a wonderful, wonderful job – but in the current system, believe me, it’s not for everyone.  I recommend you choose your career wisely.  You don’t need to work 80 hours a week, but you do need to succeed.

 

 

 

We need to stop making this simple f*cking mistake

I’m not perfect.  Not in any way.  I am sure if anyone was so inclined, they could work their way through my research with clinical, forensic attention to detail and uncover all sorts of mistakes.  The same will be true for any other scientist, I expect.  We’re human and we make mistakes.

However, there is one mistake in bioinformatics that is so common, and which has been around for so long, that it’s really annoying when it keeps happening:

It turns out the Carp genome is full of Illumina adapters.

One of the first things we teach people in our NGS courses is how to remove adapters.  It’s not hard – we use CutAdapt, but many other tools exist.   It’s simple, but really important – with de Bruijn graphs you will get paths through the graph converging on k-mers from adapters; and with OLC assemblers you will get spurious overlaps.  With gap-fillers, it’s possible to fill gaps with sequences ending in adapters, and this may be what happened in the Carp genome.
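Checking for gross adapter contamination is genuinely easy – a minimal sketch in Python (the adapter string is the standard TruSeq adapter prefix that cutadapt’s documentation uses; the FASTA name is hypothetical, and a real screen would use cutadapt or BLAST against the full adapter set):

    # Sketch: count contigs in an assembly that contain the common Illumina
    # TruSeq adapter prefix.
    ADAPTER = "AGATCGGAAGAGC"   # standard TruSeq adapter prefix

    def read_fasta(path):
        # Yield (name, sequence) pairs from a FASTA file
        name, seq = None, []
        with open(path) as fh:
            for line in fh:
                if line.startswith(">"):
                    if name:
                        yield name, "".join(seq)
                    name, seq = line[1:].split()[0], []
                else:
                    seq.append(line.strip().upper())
        if name:
            yield name, "".join(seq)

    hits = [name for name, seq in read_fasta("assembly.fasta") if ADAPTER in seq]
    print(f"{len(hits)} contigs contain the adapter prefix")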

Why then are we finding such elementary mistakes in such important papers?  Why aren’t reviewers picking up on this?  It’s frustrating.

This is a separate, but related, issue to genomic contamination – the Wheat genome has PhiX in it; tons of bacterial genomes do too; and lots of bacterial genes were problematically included in the Tardigrade genome and declared to be horizontal gene transfer.

Genomic contamination can be hard to find, but sequence adapters are not.  Who isn’t adapter trimming in 2016?!

On preprints, open access and generational change

A bunch of things are happening/happened recently that are all tied together in my head so I thought writing some of these things down would be useful (for me at least!).  The “things” are:

Let me try and get this straight 😉

Generational change

Generational change is both inevitable and necessary.  Each new generation comes along, takes a look at the system, identifies problems with that system, and takes measures to fix those problems.  I don’t mean just in science, I mean across life in general – a good example might be our treatment of the environment.  Twenty years ago, no-one cared about the environment; in twenty years’ time pretty much everyone will care.  This is generational change in action, and often it has to involve the disruption of existing power structures.

The problem with disruption of power structures is that those in power don’t like it; they want to hold on to those structures, because they are the source of that power.  However, this only serves to slow down progress – change is inevitable, and the best thing those in power could do is to get out of the way and help enable the change to happen.   However, crucially, they cannot own that change; those in power cannot and should not take credit for it.  It doesn’t belong to them – the old system is the one that belonged to them, they reaped the benefits.  The new system belongs to and should be driven by the next generation.

This is important – it is important that we empower our younger generations, rather than taking their ideas and pretending they are our own.

Blogs and social media as the democratisation of opinions and power

Let me paint you a picture.  A young graduate says to an established professor “Hey, I love science and I want to be a researcher.  I have some great ideas about research, but I also want to influence how research is done.  How do I get in to it?”.  The answer is simple.  “First you need to do a PhD, which may mean you are effectively used as cheap labour to carry out some of your supervisor’s ideas that they couldn’t get funding to do elsewhere.  After four years, you will need to get a post-doc in a good lab, and probably 90% of people will drop out at that stage.  As soon as you are a post-doc be sure to publish in high impact journals (Nature/Science/Cell etc) because you will need those to get a second post-doc or fellowship – though you won’t have much influence on where you publish as your PI will decide that.   To be a PI/group leader you will need to apply for and win one of a very few, highly competitive fellowships.  Finally, as a fellow, you will be given a small budget and have the chance to explore ideas of your own.  You will have 5 years to prove you can ‘cut it’ as a PI i.e. win a grant.  If you win enough grants as a fellow, you can be a junior group leader.  However, this is not a secure position – you will have to constantly publish and win grants for many years before finally you will be given tenure.  Then people might start listening to you – you may get to be an expert on grant panels, you may get to have some tiny influence on strategy and the type of science that gets funded.  You’ll probably be at least 50 by then”

Are we surprised that young people might take one look at that and say “Fuck that” ?

Jingmai O’Connor’s assertion that critiques of published papers should only happen via similarly published papers means that probably 90% of the scientific workforce would be unable to critique her work, because only PIs get to decide what gets published, and when, by their research groups.

Is anyone else looking at this system and thinking “the young have no voice”?  Is it any wonder that the next generation have taken to blogs and social media to find that voice?

Don’t get me wrong, blogs and social media are still biased – if a Nobel winner starts a blog, it’ll be read way more than if a PhD student starts one – but they are still a far more level playing field than the academic system – because if someone starts a blog or joins social media, by-and-large, if they say something interesting, engaging or useful, they will build up a following and they will become known, they will become influential in their own way, and this is an incredibly good thing.   It is a form of empowerment of the younger generation in a system that almost completely lacks it.

Anything that improves access to research outputs is a good thing

I must say I have the utmost respect for Mike Eisen.  He has been passionate about open access from the start, and now he is passionate about preprints.  You will find no criticism of him here.  I am 100% an open access advocate, and I believe preprints are an excellent idea.

However, Andy Fraser makes an excellent point:

As soon as established, superstar scientists adopt something, the story is no longer the story, the story is the superstar.  Take a look here:

This is the very same Randy Schekman who published countless papers in pay-walled glamour mags, but then started telling everyone to publish open access.  Well, the open access movement isn’t about Randy.

Is it a good thing that Nobel laureates are putting out preprints and supporting open access?  Of course it is!

Does it annoy me that they are getting tons of credit and attention for something (open access) that I and others have been doing our entire careers?  Of course it does.  It annoys the shit out of me.  Because the story of revolution in academic publishing doesn’t belong to the guys who made the old system and then changed their minds; the story belongs to the people who made the new system – the Mike Eisens of the open access world and the countless PIs, post-docs and students who have never been anything other than open.

I am glad some established professors are changing their mind, but the credit and attention for the OA movement has to go to those who’ve committed their entire career to open access.

“They can’t win”

The obvious response to Andy’s tweet is that the established professors can’t win; they are damned if they do, damned if they don’t.  It’s an argument Mike made too:

I see the argument, I really do, but the point is that the established professors have already won.  They have tenure, they have funding, they have established reputations.  Don’t say they can’t win, because they already did win.

Of course it’s great that the establishment are embracing open access, and preprints, but somehow they need to make the story not about them.  They need to make the story about the people who drove the change – perhaps it was a student or post-doc who persuaded them to put up a preprint, or to submit to an OA journal.  If that’s the case, make the story about the student/post-doc.   Perhaps they just had an epiphany?  Well if that’s the case, a bit of humility wouldn’t go amiss.  Don’t ride in on your white horse and take all the credit for winning the war; instead, fall on your sword and apologize that you once fought for the other side.

The revolution in academic publishing isn’t about established professors, it’s about generational change and empowerment of a new generation of scientists.  Let’s not lose sight of that.  And let’s not take something away from the younger generation who have so little to begin with.

Did you notice the paradigm shift in DNA sequencing last week?

There is a very funny tweet by Matthew Hankins about the over-use of “paradigm shift” in articles in Scopus between 1962 and 2014, which clearly suggests we over-use the term:

However, last week there was a genuine shift in DNA sequencing published in bioRxiv by Matt Loose called “Real time selective sequencing using nanopore technology“.  So what makes this paper special?  Why is this a genuine paradigm shift?

Well, this is the first example, ever, of a DNA sequencer selecting regions of the input genome to sequence.   To be more accurate, Matt demonstrates the MinION sequencing DNA molecules and analyzing them in real time; testing whether they come from a region of the genome he is interested in; and if they do not, rejecting them (“spitting them out”) and making the nanopore available for the next read.
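The decision loop is conceptually very simple.  Here is a toy simulation in Python to illustrate the idea (the real implementation matches raw current signal – “squiggles” – via ONT’s API in real time; the target motif and read lengths here are entirely made up):

    import random

    # Toy "read until" loop: look at the start of each read as it streams off
    # the sequencer, keep sequencing if it matches the target, eject otherwise.
    TARGET = "ACGTACGTAC"   # stand-in for a region of interest

    def simulated_read_prefixes(n=10):
        # Roughly half the simulated molecules come from the target region
        for _ in range(n):
            prefix = "".join(random.choice("ACGT") for _ in range(50))
            yield TARGET + prefix if random.random() < 0.5 else prefix

    for prefix in simulated_read_prefixes():
        if TARGET in prefix:
            print("on target  -> carry on sequencing this molecule")
        else:
            print("off target -> eject read; pore is free for the next molecule")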

This has huge applications from human health through to pathogen genomics and microbiome research.  There are many applications in genomics where you are only interested in a subset of the DNA molecules in your sample.  For example, exome sequencing, or other sequence capture experiments – at the moment a separate capture reaction needs to take place and this can actually be more expensive than the sequencing.   Or pathogen diagnostics – here most of what you sequence would be host DNA, which you could ask the sequencer to reject, leaving only the pathogen to sequence; or flip it round – human saliva samples can have up to 40% oral microbiome DNA, which is wasted if what you want to sequence is the host.  I have worked with piRNA and siRNA datasets from insects where less than 1% of the Illumina reads are those I actually want.  If you want to know if a particular bacterial infection is resistant to antibiotics, you only actually need to sequence the antibiotic-resistance genes/islands/plasmids, not the entire genome. etc etc etc

Of course, we still have a bit of a way to go – Matt demonstrates so-called “read until” on Lambda, a relatively small, simple genome – but this is an amazing proof-of-principle of an incredible idea – that you can tell your sequencer what to sequence.  The paper deserves your attention.  Please read it and fire up your imagination!
