Opiniomics

bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

Which sequencer should you buy?

My aim for this post is a quick, pithy review of available sequencers, what they’re good for, what they’re not and under which circumstances you should buy one.   However, your first thought should be: do I need to buy a sequencer?  I know it’s great to have your own box (I love all of ours), but they are expensive, difficult to run, temperamental and time-consuming.  So think on that before you spend millions of pounds of your institute’s money – running the sequencers may cost millions more.  My blog post on choosing an NGS supplier is still valid today.

Illumina HiSeq X Ten

To paraphrase Quentin Tarantino, you buy one of these “When you absolutely, positively got to sequence every person in the room, accept no substitutes”.  The HiSeq X Ten is actually ten HiSeq X instruments; each instrument can run two flowcells and each flowcell has 8 lanes.  Each lane will produce 380-450million 150PE reads (120Gbase of data, or 40X of a human genome).  Runs take 3 days.  Expect to pay an extra £1M on computing to cope with the data streams.  Ordered flowcells are quite difficult to deal with and can result in up to 30% “optical duplicates” (actually spill-over from one well to adjacent wells).  You can produce 160 genomes every 3 days.  Essentially now used as a very cheap, whole-genome genotyping system; cost per genome is currently £850+VAT at Edinburgh Genomics.  Limited to 30X (or greater) genome sequencing.  I have checked with Illumina and this is definitely still true.
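As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic in Python.  The 400 million read-pair midpoint and the 3 Gbase human genome size are my assumptions for illustration, not Illumina specifications, and the function names are made up for this example.

```python
# Rough yield arithmetic for the HiSeq X Ten figures quoted above.
# Assumptions (illustrative only): 400 million read pairs per lane as a
# midpoint of 380-450 million, and a 3 Gbase human genome.

def lane_yield_gbase(read_pairs, read_length, paired=True):
    """Return the yield of one lane in Gbase."""
    reads_per_cluster = 2 if paired else 1
    return read_pairs * reads_per_cluster * read_length / 1e9

def coverage(yield_gbase, genome_gbase=3.0):
    """Return fold coverage of a genome of the given size."""
    return yield_gbase / genome_gbase

def lanes_per_run(instruments, flowcells_per_instrument, lanes_per_flowcell):
    """Return the total number of lanes across all instruments in one run."""
    return instruments * flowcells_per_instrument * lanes_per_flowcell

x_ten_lane = lane_yield_gbase(400e6, 150)   # Gbase per lane
x_ten_cov = coverage(x_ten_lane)            # fold coverage per lane
x_ten_lanes = lanes_per_run(10, 2, 8)       # one genome per lane

print(f"{x_ten_lane:.0f} Gbase per lane, {x_ten_cov:.0f}X, "
      f"{x_ten_lanes} genomes per run")
```

The same `lane_yield_gbase` call with 300 million or 25 million read pairs gives the 90Gbase and 7.5Gbase figures quoted for the HiSeq 4000 lane and the MiniSeq run.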

Illumina HiSeq X Five

Do not buy one of these.  The reagents are 40% more expensive for no good reason.  Simply outsource to someone with an X Ten.

Illumina HiSeq 4000

The baby brother of the HiSeq X, I actually think it’s the same machine, except with smaller flowcells (possibly a different camera or laser).  Expect 300million 150PE reads per lane (same setup, two flowcells, each with 8 lanes, 3.5 day run time).  That’s 90Gbase per lane.  Same caveats apply – ordered flowcells are tricky and it’s easy to generate lots of “optical duplicates”.  No limitations, so you can run anything you like on this.  The new workhorse of the Illumina stable.  Buy one of these if you have lots and lots of things to sequence, and you want to run a variety of different protocols.

Illumina HiSeq 2500

One of the most reliable machines Illumina has ever produced.  Two modes: high output has the classic 2 flowcell, 8 lane set-up and takes 6 days; rapid is 2 flowcell, 2 lanes and takes less time (all run times depend on read length).  High output V4 is capable of 250million 125PE reads, and rapid is capable of 120million 250PE reads.   The increased throughput of the 4000 makes the 2500 more expensive per Gb, so only buy a 2500 if you can get a good one cheap second-hand, or get offered a really great deal on a new one.  Even then, outsourced 4000 data is likely to be cheaper than generating your own data on a 2500.

Illumina NextSeq 500

I’ve never really seen the point – small projects can go on MiSeq, and medium to large projects fit better and are cheaper as a subset of lanes on the 2500/4000.  The machine only measures 3 bases, with the 4th base being an absence of signal.  This means the runs are ~25% quicker.  I am told V1 data was terrible, but V2 data is much improved.  NextSeq flowcells are massive, the size of an iPad mini, and have four lanes, each capable of 100million 150PE reads.  Illumina claim these are good for “Everyday exome, transcriptome, and targeted resequencing”, but realistically these would all be better and cheaper run multiplexed on a 4000.

Illumina MiSeq

A great little machine, one lane per run, V2 is capable of 12million 250PE reads per run; V3 claims 25million 300PE reads but good luck getting those, there has been a problem with V3 300PE for as long as I can remember – it just doesn’t work well.  Great for small genomes and 16S.

Illumina MiniSeq

I suspect Illumina will sell tons of these as they are so cheap (< $50k), but no-one yet knows how well it will run.  Supposedly capable of 25million 150PE reads per run, that’s 7.5Gbase of data.  You could just about run a single RNA-Seq sample on there, but why would you?  A possible replacement for MiSeq if they get the read length up.  Could be good for small genomes and possibly 16S.  Illumina claim it’s for targeted DNA and RNA samples, so could work well with e.g. gene panels for rapid diagnostics.


 

One interesting downside of Illumina machines is that you have to fill the whole flowcell before you can run the machine.  What this means is that despite the fact Illumina’s cost-per-Gb is smaller, small projects can be cheaper and faster on other platforms.


Ion Torrent and Ion Proton

The people I meet who are happy with their Ion* platforms are generally diagnostic labs, where run time is really important (they are faster than Illumina machines) and where absolute base-pair accuracy is not important.   No-one I know who works in genomics research uses Ion* data – it’s just not good enough.  There is a major indel problem, and Illumina data is cheaper and better.

PacBio Sequel

No-one has seen any data but this looks like an impressive machine.   There are 1 million ZMWs per SMRT cell and about 30-40% will return useable data.  Useable data will be 10-20Kb reads at 85% raw accuracy, but correctable to 99% accuracy.  Output at launch is 5-10Gbase per SMRT cell, and PacBio expect to produce 20Kb and 30Kb library protocols in 2016.  Great for genome assembly and structural variation, not quite quantitative for RNA-Seq but fantastic for gene discovery.  Link this up to NimbleGen’s long fragment capture kits and you can target difficult areas of the genome with long reads.  Machine cost is £300k so good value compared to the RSII.  These will fly off the shelf.

PacBio RSII

The previous workhorse of PacBio, capable of 2Gbase of long reads per SMRT cell.  Cool machine, but overshadowed by Sequel; I wouldn’t recommend buying one.

Oxford Nanopore MinION

The coolest sequencer on the planet: a $1000 access fee gets you a USB sequencer the size of an office stapler.   Each run on the mark I MinION can produce several hundred Mb of 2D data, and fast mode (in limited early access) promises to push this into the Gbases.  Read lengths are a mean of 10Kb with raw 2D accuracy at 85% and a range of options for correction to 98-99% accuracy.  We use it for scaffolding bacterial genomes, and also for pathogen detection.  Should you buy one?  You should have one already!

Oxford Nanopore PromethION

The big brother of the MinION, this is best imagined as 48 bigger, fast-mode MinIONs run in parallel.  If fast mode MinION can produce 1Gbase per run, the PromethION will produce 300Gbase per run.  This machine is in limited early access, but offers the possibility of long-read, population scale sequencing.  Access fee is $75,000 but expect to spend ten times that on compute to deal with the data.  Get one if you can deal with the data.
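It is worth spelling out what “bigger” has to mean for those projected numbers to add up.  A minimal sketch, using only the figures quoted above (these are projections, not measured data, and the function name is made up for this example):

```python
# Implied per-flowcell yield from the projected PromethION figures above.
# These are the post's projected numbers, not measured data.

def per_flowcell_gbase(total_gbase, flowcells):
    """Return the average yield each flowcell must deliver."""
    return total_gbase / flowcells

promethion_flowcell = per_flowcell_gbase(300, 48)  # Gbase per flowcell
minion_fast_mode = 1.0                             # Gbase per run, as quoted

print(f"Each PromethION flowcell would need about "
      f"{promethion_flowcell / minion_fast_mode:.2f}x a fast-mode MinION run")
```

In other words, each PromethION flowcell would have to yield roughly six times a fast-mode MinION run.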

 

30 Comments

  1. PromethION in early access?!? Have you really seen it anywhere other than the ONT booth or Clive Brown’s photos?

  2. Alejandro Sanchez-Flores

    23rd January 2016 at 3:41 am

    Just a few comments and info missing here:
    -Illumina Ten X: not an affordable option unless you have 10M USD burning holes in your pockets. The downside is that they only allow you to run human genome samples.
    -NextSeq 500: it is the replacement for the good old GAIIx. Flowcell with lanes that are not independent like in the HiSeq 2000. The data is not bad at all in v1 or v2. Very quick, though with some bias and artifacts we haven’t had time to evaluate with precision.
    -MiSeq: the V3 600 cycle kit has a faulty reagent that makes the last ~100 bases of each PE read very poor in quality.
    -Nanopores: they are like ink printers. The printer itself is cheap but the cartridges are as expensive as buying a whole new printer.

    Nice review!

  3. Wrt getting Illumina machines second hand, do you think there is substance to the claim posted on SeqAnswers that Illumina won’t support machines purchased from third parties?

    • No, they will support this but you will have to pay to get the machine tested, re-certified and under warranty.

      • So get a cheap machine, and be willing to write it off as a loss if Illumina come back with a big bill. My ex-boss many years ago bought two ABI 3700s from Celera in Cambridge, I think. They were cheap but one cost over £40,000 to get working – and it was always a bit of a dog.

  4. Hi Mick,

    I love reading your blog, but I disagree about the Ion sequencer on some levels. Full disclosure: I work on a PGM for both diagnostics and research. While I agree that Illumina is cheaper in that you can run 300 samples at a time, our lab is not doing that level of research (sadly). A big study for us right now is an evaluation of 30 isolates, so the cost-effectiveness of Illumina would be lost on us. Also, the indel issue with Ion is addressable, and will become an issue for Illumina as they push read lengths further as well; it will just be downstream of GC-rich regions instead of homopolymer regions. I feel that Ion data and Illumina data are getting to be the same quality as Ion has honed its chemistries.

    Although if you want to point out that Ion (or Life Tech in general) is only concerned with being first to market, and that you should avoid buying their top-of-the-line anything until the second iteration, I would agree.

    Also, we are currently awaiting our first nanopore so I am excited about that as well, and I have to agree with Alejandro’s comment about them being the new printers, but I’m still pumped to start playing with it.

    • Alejandro Sanchez-Flores

      24th January 2016 at 2:41 am

      My personal experience with Ion sequencing is that the chip loading, the amount of polyclonals and the fact that I can’t guarantee my users a minimum amount of reads really make me angry. Having said that, the PGM has saved us a couple of times when we had problems with the GAIIx back in the day.

      I still think that if you can outsource to a service provider, there is no point in buying a sequencer.

      Mick forgot to write about the cost of maintenance policies, which in the case of Illumina is about 10-15% of the machine price.

      • I agree about polyclonals: an average of 20% is frustrating, and we have noticed it is very user-variable. We are small so we all spend time at the bench, and I can tell the difference between when I load and when our other tech, who routinely loads, does it. We ordered the Ion Chef for automated emPCR and chip loading; I will let you know if we get more consistent loading if you are curious.

        Ion’s warranty/service agreement is a flat fee per year so it can be about 5% of the machine or it can become a very sizable line item if you have multiple pieces of equipment from them. Currently between the sequencers and the qPCR machines it’s a sizable line item.

  5. In my opinion, for a non-negligible fraction of NGS customers (either actual or prospective) the answer is: NONE.
    Good buyers are facilities and genome centres. They have trained staff, equipped laboratories and adequate storage redundancy.
    Outsourcing to them is a good deal, while several micro-laboratories with just some funding to burn on a MiSeq usually waste public money on a machine that will be under-used and loaded by PhD students / postdocs who will last one or two years, and so on and so forth. And this MiniSeq?

    In Italy where I live the sequencing capacity is some petabase per day if you sum all the machines… I even know of a single *individual* that bought an IonTorrent!

  6. I’m really interested in getting a MinION but it’s a hard sell when you already have a MiSeq – costs the same to run. What are some great applications for MinION?

  7. Little late, but I largely agree with most of the opinions expressed above. These are largely best done with dedicated centres or core facilities. In those situations some of the medium-sized sequencers don’t make sense. That said, there are some cases on the research side, like unacceptable turnaround times from busy core facilities, where if you plan on doing a fairly steady stream of exomes and transcriptomes, the NextSeq 500 makes a lot of sense. Not as cheap to run, but you may not have the capacity for a HiSeq and also can’t wait 6-8 weeks for data.

    Of course the other big one is diagnostics. Depending on your jurisdiction sending things out versus doing them in house can be a major problem. The NextSeq is perfectly suited for a diagnostics lab doing clinical exomes + research that won’t run a HiSeq and needs things turned around quickly. I think that is their primary market for that machine.

    • biomickwatson

      25th February 2016 at 5:39 pm

      Indeed these are good points but assume (i) your NextSeq works all the time and (ii) you know how to run it. Large centres have redundant sequencers so can handle downtime and dodgy Illumina reagents. This needs to be factored in to turn around comparisons!

      • NextSeq is ideal for fast turnaround projects with modest numbers of reads, specifically RNA-Seq. 12 samples, >30M reads each in about 16 hours (overnight). Cheaper than HiSeq for short read sequencing. Gone are the days of us having to sequence ChIP 100bp PE because that’s the flow cell that was going on that week.
        Main downside is that longer reads are expensive and reads give crap %>Q30 after about 100bp.

        • biomickwatson

          31st August 2016 at 6:18 pm

          I’m fairly confident that RNA-Seq on a HiSeq 4000 is cheaper than on NextSeq, but would need to see the numbers.

          What read-length do you do?

  8. I can say that you very much misspoke about the NextSeq 500. It’s the absolute cheapest cost per read on any Illumina machine; the high output mode lists 400M reads but we have gotten close to 800M with 95%+ Q30 for less than $1000 in consumables. If you do the math, it’s cheaper per read to run 4 NextSeqs than a single 3k or 4k, especially when you factor in service contracts (5-10% of instrument price/year) and initial purchase price. The 11hr run time for 1×75 or 19hr for 1×150 is awesome turnaround. We get better quality reads off the NextSeq 500 than a rapid run on the 2500 in terms of error rate and Q30%.

    Cheers

    • biomickwatson

      23rd December 2016 at 1:04 pm

      If the recommended output is 400M I can’t see how you’re getting 800M without a massive PCR duplicate problem….

      • I’d also worry about the possible impact on GC bias at that cluster density. But it depends very much on the application. High clustering is not always bad, but some have reported unexpected side-effects of squeezing that much out of the instruments.

        Overall NextSeq is a great machine for labs that want to run their own sequencer. I still believe HiSeq 4000 is the “cheapest” way to do ChIP-seq and RNA-seq especially using single-end 50bp reads. Not everyone will agree that single-end is a good idea though.

        • I agree that nextseq single end 75bp high output run is a good way of doing rnaseq. 400M reads is a good number of reads for a 20-30 sample experiment – depending on its aims of course. Our hiseq tends to run PE most of the time and output is too high usually for a single experiment so coordination more tricky.

      • We regularly get 550M-600M clusters passing filter on our NextSeq 500; we haven’t pushed it more (the 800M number may include clusters not passing filters, I imagine??). We are also very happy about the running time. The short running time, combined with high multiplexing, has helped us to detect bad library preps, and to iteratively repool libraries to get similar numbers of reads. We use it mostly for RNA-seq and ATAC-seq.

        • biomickwatson

          24th December 2016 at 11:54 am

          Have you checked for high levels of PCR duplication?

          • Error rates go up, but I have not noticed a big increase in PCR duplicates for loading more but I can look more into it. I will try to find a case where we loaded twice the same thing with two different loadings. Would it matter which version of bcl2fastq I use for base-calling? I usually notice more duplicate problems if we had problems in library prep.

      • Reads per run is a function of cluster density, and cluster density has nothing to do with PCR duplicates. Cluster density is directly related to the amount of library loaded. PCR duplicates are a function of input, and cycles during library building. I run a NextSeq, and regularly get 500-550 million reads per high output run. I’ve never tried to push it to 800.

        • biomickwatson

          5th January 2017 at 5:03 pm

          Massively over-clustered flowcells will have millions of optical duplicates.

          • Ah, I see. My confusion stems from your original comment mentioning PCR, not optical duplicates.

            I see examples and explanations of the impact of under-loading patterned flow cells and increased optical duplicates, but I haven’t seen a clear explanation of what would drive optical duplicates on the NextSeq platform. The few times I’ve overclustered, my quality and PF% have dropped like a stone.

          • Alejandro Sanchez-Flores

            5th January 2017 at 6:52 pm

            Same here… Also, you find problems when you pool libraries with different insert sizes. Even if we quantify and normalize considering the insert size, we can never correctly estimate the concentration needed for a certain yield for short insert size libraries (~200bp) when they are mixed with long ones (~500bp).

            Sequencing is a fine art…

          • biomickwatson

            5th January 2017 at 7:40 pm

            My apologies if I was being confusing 🙂

            OK, there are three (possibly 4 now!) types of duplicates:

            PCR – caused by too many rounds of PCR during lib prep. Found by mapping to a genome and counting fragments that map in identical locations
            Optical – these actually occur on NON-PATTERNED flowcells. Basically what happens is the Illumina software counts one cluster as two, and this is correlated with over-clustering. Found by inspecting flowcell co-ordinates in the read ID and looking for identical sequences near one another on the flowcell
            Well – these occur on PATTERNED flowcells. Patterned flowcells work by competitive exclusion – i.e. you get one molecule per well because they amplify quickly and fill the well, preventing other molecules from getting in. If the loading concentration is not right, a well will fill up and spill over into adjacent, empty wells. Found by inspecting flowcell co-ordinates in the read ID and looking for identical sequences near one another on the flowcell – but they are NOT optical duplicates, technically, despite the fact Picard calls them this.

            And then there are the new edge duplicates: https://twitter.com/BBToolsBio/status/816761172053458944

            James does a good job here: http://core-genomics.blogspot.co.uk/2016/05/increased-read-duplication-on-patterned.html
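The “inspect the flowcell co-ordinates in the read ID” step described above can be sketched roughly as follows.  This is a toy illustration, not how Picard or any other tool implements it: it assumes Illumina-style read IDs of the form `instrument:run:flowcell:lane:tile:x:y`, uses exact sequence identity instead of mapping position, and the 100-unit distance threshold and function names are arbitrary choices for the example.

```python
from collections import defaultdict
from itertools import combinations

def parse_read_id(read_id):
    """Extract (lane, tile, x, y) from an Illumina-style read ID."""
    fields = read_id.lstrip("@").split(" ")[0].split(":")
    lane, tile, x, y = (int(value) for value in fields[3:7])
    return lane, tile, x, y

def nearby_duplicates(reads, max_distance=100):
    """Return pairs of read IDs with identical sequence that sit close
    together on the same lane and tile.

    reads is an iterable of (read_id, sequence) tuples. max_distance is
    an illustrative threshold applied to both x and y.
    """
    groups = defaultdict(list)
    for read_id, seq in reads:
        lane, tile, x, y = parse_read_id(read_id)
        groups[(seq, lane, tile)].append((read_id, x, y))

    pairs = []
    for members in groups.values():
        for (id_a, xa, ya), (id_b, xb, yb) in combinations(members, 2):
            if abs(xa - xb) <= max_distance and abs(ya - yb) <= max_distance:
                pairs.append((id_a, id_b))
    return pairs

# Made-up reads: the first two share a sequence and sit close together.
reads = [
    ("@M1:1:FC1:1:1101:1000:2000 1:N:0:1", "ACGT"),
    ("@M1:1:FC1:1:1101:1040:2030 1:N:0:1", "ACGT"),
    ("@M1:1:FC1:1:1101:9000:9000 1:N:0:1", "ACGT"),  # same tile, far away
    ("@M1:1:FC1:1:1102:1000:2000 1:N:0:1", "ACGT"),  # different tile
    ("@M1:1:FC1:1:1101:1010:2010 1:N:0:1", "TTTT"),  # different sequence
]
for pair in nearby_duplicates(reads):
    print(pair)
```

Note that this same check flags both optical and well duplicates, since both show up as identical sequences near one another; telling them apart depends on knowing whether the flowcell was patterned.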

    • This thread has some of my analysis of NextSeq vs 2500 data:

      http://seqanswers.com/forums/showthread.php?t=40741

      I’ve since repeated this analysis numerous times, and the results are fairly consistent:

      1) The NextSeq typically has a 5-10x higher error rate.
      and
      2) Both platforms have horribly inaccurate quality scores.

      When V2 chemistry first arrived, there were a couple runs with fairly good accuracy. But recent runs are low quality again, with inflated quality scores, presumably to pass Illumina’s specifications (since those are based on quality scores rather than error rates). And the base-frequency divergence issue for read 2 was never addressed. The error profiles also seem substantially more biased than HiSeq, making low-frequency variant analysis very difficult.


© 2017 Opiniomics
