Opiniomics

bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

HiSeq move over, here comes Nova! A first look at Illumina NovaSeq

Illumina have announced NovaSeq, an entirely new sequencing system that completely disrupts their existing HiSeq user-base.  In my opinion, if you have a HiSeq and you are NOT currently engaged in planning to migrate to NovaSeq, then you will be out of business in 1-2 years time.  It’s not quite the death knell for HiSeqs, but it’s pretty close and moving to NovaSeq over the next couple of years is now the only viable option if you see Illumina as an important part of your offering.

Illumina have done this before, it’s what they do, so no-one should be surprised.

The stats

I’ve taken the stats from the spec sheet linked above and produced the following.  If there are any mistakes let me know.

There are two machines – the NovaSeq 5000 and 6000 – and 4 flowcell types – S1, S2, S3 and S4.  The 6000 will run all four flowcell types and the 5000 will only run the first two.  Not all flowcell types are immediately available, with S4 scheduled for 2018 (See below)

S1 S2 S3 S4 2500 HO 4000 X
Reads per flowcell (billion) 1.6 3.3 6.6 10 2 2.8 3.44
Lanes per flowcell 2 2 4 4 8 8 8
Reads per lane (million) 800 1650 1650 2500 250 350 430
Throughput per lane (Gb) 240 495 495 750 62.5 105 129
Throughput per flowcell (Gb) 480 990 1980 3000 500 840 1032
Total Lanes 4 4 8 8 16 16 16
Total Flowcells 2 2 2 2 2 2 2
Run Throughput (Gb) 960 1980 3960 6000 1000 1680 2064
Run Time (days) 2-2.5 2-2.5 2-2.5 2-2.5 6 3.5 3

For X Ten, simply mutiply X figures by 10.  These are maximum figures, and assume maximum read lengths.

Read lengths available on NovaSeq 2×50, 2×100 and 2x150bp.  This is unfortunate as the sweet spot for RNA-Seq and exomes is 2x75bp.

As you can see from the stats, the massive innovation here is the cluster density, which has hugely increased. We also have shorter run times.

So what does this all mean?

Well let’s put this to bed straight away – HiSeq X installations are still viable.  This from an Illumina tech on Twitter:

 

We learn two things from this – first, that HiSeq X is still going to be cheaper for human genomes until S4 comes out, and S4 won’t be out until 2018.

So Illumina won’t sell any more HiSeq X, but current installations are still viable and still the cheapest way to sequence genomes.

I also have this from an un-named source:

speculation from Illumina rep “X’s will be king for awhile. Cost per GB on those will likely be adjusted to keep them competitive for a long time.”

So X is OK, for a while.

What about HiSeq 4000? Well to understand this, you need to understand 4000 and X.

The HiSeq 4000 and HiSeq X

First off, the HiSeq X IS NOT a human genome only machine.  It is a genome-only machine.  You have been able to do non-human genomes for about a year now.  Anything you like as long as it’s a whole genome and it’s 30X or above.  The 4000 is reserved for everything else because you cannot do exomes, RNA-Seq, ChIP-Seq etc on the HiSeq X.  HiSeq 4000 reagents are more expensive, which means that per-Gb every assay is more expensive than genome sequencing on Illumina.

However, no such restrictions exist on the NovaSeq – which means that every assay will now cost the same on NovaSeq.   This is what led me to say this on Twitter:

At Edinburgh Genomics, roughly speaking, we charge approx. 2x as much for a 4000 lane as we do for an X lane.  Therefore, per Gb, RNA-Seq is approx. twice as expensive as genome sequencing.  NovaSeq promises to make this per-Gb cost the same, so does that mean RNA-Seq will be half price?  Not quite.  Of course no-one does a whole lane of RNA-Seq, we multiplex multiple samples in one lane.  When you do this, library prep costs begin to dominate, and for most of my own RNA-Seq samples, library prep is about 50% of the per-sample cost, and 50% is sequencing.  NovaSeq promises to half the sequencing costs, which means the per-sample cost will come down by 25%.

These are really rough numbers, but they will do for now.  To be honest, I think this will make a huge difference to some facilities, but not for others.  Larger centers will absolutely need to grab that 25% reduction to remain competitive, but smaller, boutique facilities may be able to ignore it for a while.

Capital outlay

Expect to get pay $985k for a NovaSeq 6000 and $850k for a 5000.

Time issues

One supposedly big advantage is that NovaSeq takes 40 hours to run, compared to the existing 3 days for a HiSeq X.   Comparing like with like that’s 40 hours vs 72 hours.  This might be important in the clinical space, but not for much else.

Putting this in context, when you send your samples to a facility, they will be QC-ed first, then put in library prep queue, then put in sequencing queue, then QC-ed bioinformatically before finally being delivered.  Let’s be generous and say this takes 2 weeks.  Out of that sequencing time is 3 days.  So instead of waiting 14 days, you’re waiting 13 days.  Who cares?

Clinically having the answer 1 day earlier may be important, but let’s not forget, even on our £1M cluster, at scale the BWA+GATK pipeline itself takes 3 days.  So again you’re looking at 5 days vs 6 days.  Is that a massive advantage?  I’m not sure.  Of course you could buy one of the super-fast bioinformatics solutions, and maybe then the 40 hour run time will count.

Colours and quality

NovaSeq marks a switch from the traditional HiSeq 4 colour chemistry to the quicker NextSeq 2 colour chemistry.  As Brian Bushnell has noted on this blog, NextSeq data quality is quite a lot worse than HiSeq 2500, so we may see a dip in data quality, though Illumina claim 85% above Q30.

 

4 Comments

  1. Nice round-up, thanks. Illumina’s claiming only 75% of reads >Q30 for 2 x 150 nt paired-ends, which is what most will use for genomes. That seems pretty low but at some point, I don’t think I or anyone else will care (it does way more sequence, so if you have to throw a quarter of it out, what difference does it make?).

    Thanks also for the real-world assessment of run-time impact. As you say, 72 hr vs 40 hr (which is two working days anyway, practically speaking) is a wash when library prep, QC, and data wrangling is factored in.

  2. Great post. I have the same concerns as Richard, with a realistic 75% reads >Q30 (even 85%, but lets give some real world vs. Illumina validation data buffer of 10%) and at best 2x150bp reads, many applications will not be able to use such data. I think we’ll have to see how it rolls out, in my short time in the field (2-3 years) I’ve already seen claims by new NGS platforms fall apart and first adapters having to play guinea pig before being able to really use the technology. I’m not sure the cost savings will be there for small labs, really ever, with this new technology.

  3. Any opinion on how long the NextSeq is viable/supported? We’re a smaller university/user-base and have had one for about a year. My work is heavily dependent on it and we use it at-cost (reagents only). Our uni will NEVER purchase another instrument.

Leave a Reply

Your email address will not be published.

*

*

code

© 2017 Opiniomics

Theme by Anders NorenUp ↑