This article was also published in Biome
Last month Illumina announced two new sequencing machines, the NextSeq 500 and the HiSeq X, and, lost in the noise, a 1Tb upgrade to their current flagship sequencer, the HiSeq 2500. All three systems deserve more attention, and I intend to give them some in this Perspective, but before we get to that, it’s worth recounting a little history. The last 8 or 9 years have been termed by many the ‘genomics revolution’, as the paradigm of DNA sequencing has shifted from Sanger sequencing to massively parallel systems capable of huge throughput. Here I want to tease out some important aspects of the last few years, describe the impact of Illumina’s new systems, and discuss the future of DNA sequencing.
Throughout this article, when I quote throughput figures, I am using those provided by the relevant technology provider.
Those of us of a certain age will remember the great video-tape format war of the 1970s and 80s – Betamax vs VHS vs V2000. If you’re a geek, it’s fascinating to look at the history of this ‘war’ – here we had three competing technologies, all viable with different pros and cons, fighting on issues such as price, quality and reliability. Those three issues should certainly strike a chord with modern genomics researchers. Remarkably, the video format war was repeated between 2006 and 2008, with HD-DVD competing with Blu-ray for the storage of high-definition video. Again, issues of cost and quality were prevalent, and the war was eventually won by Blu-ray.
It is tempting to describe sequencing technologies in the same way, but in reality there never was a war; or if there was, it was won by Illumina many years ago, and ‘massacre’ may be a more appropriate term than ‘war’. However, I have been accused of being a ‘pom-pom swinging Illumina cheerleader’, so let’s look at some statistics.
Initially the three ‘competing’ technologies were Roche/454, ABI SOLiD and Illumina/Solexa. However, as far back as 2009, the writing was on the wall as two major sequencing centres, The Sanger Institute and Washington University, decided to focus on the Illumina technology (detailed in Nick Loman’s blog post). Other blog posts followed announcing Illumina’s dominance (such as this one), and in 2010 Nick’s Omicsmaps survey revealed an estimated 60% market share for Illumina. A GenomeWeb survey in 2013 revealed this had grown to 71%. In the same time-frame, ABI SOLiD was replaced by its owners, Life Technologies, with the Ion Torrent sequencing platforms (Proton and Personal Genome Machine); Pacific Biosciences’ SMRT sequencers emerged as a useful technology for genome assembly; and Roche announced that it would be shutting down its 454 sequencing business.
Both GenomeWeb and Nick’s statistics are based on machine ownership, rather than data produced – and many of you will be aware of machines that are owned but never run (I certainly am). In terms of actual output, Illumina estimate that over 90% of the world’s sequencing data has been produced on their machines.
While we can and should question all of these statistics, they all point to a simple fact – Illumina’s is the dominant sequencing technology, and arguably has been since 2009. If you are interested in why they won the war, then in my opinion it can be summed up in a single word familiar to all biologists: evolution.
The fundamental technology used by Illumina’s sequencing platforms has not changed since its purchase of Solexa in 2007. The Solexa technology was developed in Shankar Balasubramanian and David Klenerman’s labs at the University of Cambridge in the late 1990s and early 2000s. When Illumina purchased Solexa, Solexa’s first machine, the Genome Analyzer, was producing 25bp reads at a rate of 1 Gigabase (1 billion bases) per day. However, Illumina didn’t rest on their laurels – they took what was a really good sequencing instrument and turned it into an exceptional one; through many rounds of evolution, they increased both read length and throughput, eventually producing the hugely successful sequencing platforms you see today. Instead of 25bp reads, the MiSeq instrument can produce paired-end 300bp reads (2x250bp reads are set to hit the HiSeq 2500 this year). Instead of 1Gb per day, the HiSeq 2500 produces over 50Gb per day – and that’s before taking into account the recent 1Tb upgrade.
Illumina’s evolution of the Solexa technology is impressive, and has largely fuelled the genomics revolution, making genomics one of the most exciting areas of science to work in. Their ability to develop the technology was something their rivals were unable to match. It is my belief that it was this continuous evolution of the Solexa technology that ensured that Illumina won the sequencing war, and they deserve a huge amount of credit for what they have achieved. Ultimately, they beat their rivals on all three fronts: price, quality and reliability.
What’s possible now
With Roche shutting down their 454 sequencing business and Life Technologies largely pushing their Ion Torrent systems over the SOLiD platform, there are really only 3 technologies currently available: Ion Torrent, Pacific Biosciences and Illumina.
Ion Proton, the higher-throughput of Life Technologies’ Ion Torrent machines, currently runs the PI chip, capable of producing 60 to 80 million 200bp reads in a 4 hour run, a total output of 10Gb. The PII chip, having been scheduled for release early in 2013, has now been pushed back to mid-2014. The PII chip will apparently be capable of producing 300 million 100bp reads, resulting in a 30Gb output.
Pacific Biosciences’ RS II machine, the only single-molecule sequencer on the market, does not really compete with Illumina or Ion in terms of throughput; its P5-C3 chemistry produces only 375Mb of sequence per run. The real strength of the RS II is its long reads: the average read being 8.5Kb, with the longest being in excess of 30Kb. Recently published read correction strategies remove many of the errors, and now the SMRT technology of Pacific Biosciences (or ‘PacBio’) seems the weapon of choice for finishing genomes or de novo sequencing of new genomes.
I will first restate the current capabilities of the HiSeq 2500 system, before moving on to the 1Tb upgrade and the new systems announced by Illumina in the next section. The current HiSeq 2500 has two modes, rapid run and high output. The rapid run is configured as 2 independent flowcells, each of which has 2 lanes. Each lane produces between 120 and 150 million reads, and the read length is currently at a maximum of 2x150bp. The run takes a little under 2 days (40 hours) and produces roughly 180Gb, or a little over 100Gb per day.
The high-output mode is configured as 2 independent flowcells, each of which has 8 lanes. Each lane produces between 150 and 180 million reads, and the maximum read length is 2x100bp. The run takes about 11 days and produces roughly 600Gb, or roughly 55Gb per day.
Illumina’s new systems and upgrades
The 1Tb upgrade to the HiSeq 2500, available on all factory-made HiSeq 2500s and some newer HiSeq 2000s, extends the high-output mode. The 2 flowcell by 8 lane configuration is maintained, but read length is extended to 2x125bp and read numbers are increased to 250 million per lane. Multiply these numbers together (2 flowcells x 8 lanes x 250 million reads x 250 bases per read pair) and one reaches the magical 1,000Gb (1Tb) number. The run time has also been reduced to 6 days, meaning the upgraded HiSeq 2500 can produce roughly 167Gb of sequence per day.
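The multiplication behind that 1Tb figure is easy to check. The short Python sketch below is illustrative only: the function names are my own, the inputs are the provider figures quoted above, and it assumes every lane reaches the quoted maximum read count.

```python
def run_output_gb(flowcells: int, lanes_per_flowcell: int,
                  reads_per_lane: int, read_length_bp: int,
                  paired_end: bool = True) -> float:
    """Total bases produced by one run, in gigabases (1Gb = 1e9 bases)."""
    reads_per_run = flowcells * lanes_per_flowcell * reads_per_lane
    # A paired-end run sequences each fragment from both ends.
    bases_per_read = read_length_bp * (2 if paired_end else 1)
    return reads_per_run * bases_per_read / 1e9


def output_per_day_gb(run_gb: float, run_days: float) -> float:
    """Average daily output, assuming the machine runs for the whole period."""
    return run_gb / run_days


# 1Tb upgrade: 2 flowcells x 8 lanes x 250 million reads x 2x125bp
upgrade_gb = run_output_gb(2, 8, 250_000_000, 125)
print(upgrade_gb)                                  # 1000.0
print(round(output_per_day_gb(upgrade_gb, 6), 1))  # 166.7

# Current rapid-run mode at its upper bound: 2 x 2 lanes x 150 million x 2x150bp
print(run_output_gb(2, 2, 150_000_000, 150))       # 180.0
```

The same two functions can be pointed at any of the other configurations described in this article; only the lane counts, read numbers and read lengths change.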
The NextSeq 500 is a new sequencing system, in more than one way. The NextSeq 500 has a single flowcell with 4 lanes, and kits are available that produce between 20Gb and 120Gb per run. The modes are termed ‘mid output’ and ‘high output’. The 120Gb run comes as 2x150bp reads, and takes just over a day to run (29 hours). However, the most interesting change in the NextSeq is its new chemistry: the NextSeq 500 will use only 2 dyes to detect base incorporation, compared with 4 dyes on every other Illumina system. In the NextSeq 500, T and C will each be detected by a single dye (one dye for T, the other for C), a mix of both dyes will be used to detect A, and an absence of dye will be used to detect G. This solves two problems for Illumina: first, detecting 2 dyes instead of 4 reduces the scanning time and speeds up sequencing; second, problems with certain motifs (such as the GGC motif) have been reported on Illumina systems, and the removal of the dye from the detection of the G base is designed to solve this.
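To make the two-dye encoding concrete, here is a minimal Python sketch of how two signals per cycle can be decoded into four bases. It is not Illumina's base caller: the names `dye_1` and `dye_2`, the choice of which dye marks T and which marks C, and the fixed intensity threshold are all assumptions made purely for illustration.

```python
from typing import Dict, Iterable, Tuple

# (dye_1 detected, dye_2 detected) -> base.
# Assumption: dye_1 alone marks T and dye_2 alone marks C; the text only
# says each of T and C carries a single dye, A both, and G none.
TWO_DYE_CODE: Dict[Tuple[bool, bool], str] = {
    (True, False): "T",
    (False, True): "C",
    (True, True): "A",
    (False, False): "G",
}


def call_base(dye_1: float, dye_2: float, threshold: float = 0.5) -> str:
    """Decode one cycle from two intensities (threshold is hypothetical)."""
    return TWO_DYE_CODE[(dye_1 >= threshold, dye_2 >= threshold)]


def call_read(signals: Iterable[Tuple[float, float]]) -> str:
    """Decode a series of per-cycle (dye_1, dye_2) intensities into bases."""
    return "".join(call_base(d1, d2) for d1, d2 in signals)


cycles = [(0.9, 0.1), (0.1, 0.8), (0.95, 0.9), (0.05, 0.02)]
print(call_read(cycles))  # TCAG
```

Note that G is inferred from the absence of any signal, so in this scheme a cluster that has simply gone dark is indistinguishable from a run of Gs without further checks.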
The system that will deliver the $1,000 genome is the HiSeq X Ten, which is actually a configuration of ten HiSeq X machines. One can only buy HiSeq X machines in groups of ten or more, and each X costs $1 million. The $1,000 genome is achieved by purchasing one HiSeq X Ten system, sequencing 18,000 human genomes per year and depreciating the equipment over 4 years. On paper, this appears to add up. Configured as either a single or dual flowcell, each HiSeq X machine will have a maximum output of 6 billion paired-end reads (2x150bp), and the run time is around 3 days. The total output of a single HiSeq X machine is therefore 1.8Tb per run, or 600Gb per day. With ten of those working in tandem, the total output of the HiSeq X Ten is therefore 18Tb per run, or 6Tb per day.
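As a rough sanity check on those numbers, the Python sketch below works out how much raw sequence each of the 18,000 genomes would receive. Two inputs are my assumptions rather than Illumina's figures: a haploid human genome of about 3.1Gb, and machines running 365 days a year with no downtime or failed runs.

```python
READS_PER_RUN = 6_000_000_000   # paired-end reads per machine per run (from the text)
READ_LENGTH_BP = 150            # 2x150bp
RUN_DAYS = 3
MACHINES = 10
GENOMES_PER_YEAR = 18_000
GENOME_SIZE_GB = 3.1            # assumption: approximate haploid human genome size
DAYS_PER_YEAR = 365             # assumption: no downtime at all


def machine_run_gb() -> float:
    """Output of one HiSeq X machine per run, in gigabases."""
    return READS_PER_RUN * 2 * READ_LENGTH_BP / 1e9


def system_day_gb() -> float:
    """Output of the ten-machine system per day, in gigabases."""
    return machine_run_gb() / RUN_DAYS * MACHINES


def gb_per_genome() -> float:
    """Raw sequence available per genome at 18,000 genomes per year."""
    return system_day_gb() * DAYS_PER_YEAR / GENOMES_PER_YEAR


print(machine_run_gb())                             # 1800.0
print(system_day_gb())                              # 6000.0
print(round(gb_per_genome(), 1))                    # 121.7
print(round(gb_per_genome() / GENOME_SIZE_GB, 1))   # 39.2
```

Under these idealised assumptions each genome gets a little under 40x raw coverage, so any real-world downtime or filtering of low-quality reads would have to come out of that margin.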
Apart from the huge increase in throughput, the HiSeq X system is the first and only Illumina system to use ordered arrays. Instead of clusters forming randomly on the surface of the flowcell, they will form in pre-defined wells, allowing Illumina much more control over the position and nature of the clusters, and easing image analysis.
The future for Illumina’s competitors
It will be really interesting to see how Life Technologies responds to Illumina’s latest developments. Their key advantage is speed, with the Ion Torrent platforms carrying out the sequencing component in hours rather than days. However, the throughput and cost-per-base do not match current Illumina platforms, never mind the new ones. To remain a viable business, Life Technologies, and its Ion Torrent platforms, must respond.
Pacific Biosciences’ SMRT technology has evolved significantly too and has become an essential tool for those wishing to close genomes, or sequence new genomes de novo. Intriguingly, Roche, a global health-care company, announced an agreement with Pacific Biosciences to develop DNA sequencing products for clinical diagnostics. This is not a space that Pacific Biosciences have been in up until now, and it is difficult to see how their RS II system can compete with Illumina and Ion Torrent in the clinic. Because of this, rumours of a new (benchtop?) PacBio machine abound on social media.
For many, Oxford Nanopore Technologies, heir apparent to the sequencing crown, remain the future. The company recently opened up an early-access program for their USB sequencer, the MinION. At the AGBT conference, David Jaffe showed the first data from this platform. These are exciting times for nanopore-based sequencing, but there remain challenges – particularly in interpreting the signal and turning it into the ‘bases + quality scores’ paradigm that many are used to. In fact, to get the best out of the platform, many think a change to the above paradigm will need to happen, and software may need to interpret the raw signal rather than the traditional 4 bases. On his blog, Yaniv Erlich commented that ‘MinION is not a sequencing platform. It is a sequencing sensor.’ (his italics) and I think this is a key differentiator of the platform. It is the only platform that detects an actual single strand of DNA (rather than incorporation events as a template strand is copied), and it still astounds me to think that we are soon to have in our hands the very first mobile device capable of sequencing DNA – surely an historic moment. With that idea in mind, it is hard not to believe that nanopore-based sequencing is the future. Of course, Oxford Nanopore have competitors, but one look at their management team, followed by a look at their IP portfolio, and it is hard to imagine anyone being better placed to deliver on the promises of nanopore-based sequencing.