There’s been a bit of chatter on Twitter about how the cost of sequencing is not going down anymore, with reference to that genome.gov graph. I realise that sensationalist statements get more retweets, but I am sorry, this one is complete crap – the cost of sequencing is still on a long-term downward trend.
By way of qualification, I have been helping to run our University’s next-generation genomics facility, Edinburgh Genomics, since 2010. We are an academic facility running an FEC model – which means we recover all of our costs (reagents, staff, lab space etc) but do not make a profit. If you are interested in Illumina sequencing, especially after reading below, you should contact us.
You can read some relatively up-to-date references here, here and here.
What I am going to talk about below are the medium- to long-term trends in sequencing prices on the Illumina platform. There are fluctuations in the price in any given period (for example Illumina put prices up across the board a few weeks ago), but these fluctuations are very small in the context of the wider, global trend of cost reduction.
Back in 2013/14, the most cost-effective method of sequencing genomes was on the HiSeq 2000/2500 using V3 chemistry. At its best, a lane of sequencing would produce 180 million paired 100bp reads, or 36 gigabases of sequence data. I am going to use this as our baseline.
After HiSeq V3 came HiSeq V4, which was introduced last year. All new 2500s could offer this and many new-ish 2500s could be upgraded. At its best, V4 produces 250 million paired 125bp reads, or 62.5 gigabases of sequence data.
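For anyone who wants to check the gigabase figures, here is a minimal sketch of the arithmetic: read pairs, times two reads per pair, times read length. The function name `lane_yield_gb` is just an illustrative choice, and the inputs are the best-case numbers quoted above.

```python
def lane_yield_gb(read_pairs: float, read_length_bp: int) -> float:
    """Gigabases per lane for paired-end sequencing.

    Each read pair contributes two reads of read_length_bp bases.
    """
    return read_pairs * 2 * read_length_bp / 1e9


# Best-case figures quoted in the text
print(lane_yield_gb(180e6, 100))  # HiSeq V3 -> 36.0
print(lane_yield_gb(250e6, 125))  # HiSeq V4 -> 62.5
```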
Of course, Illumina charge more for V4 reagents than they do for V3 reagents, but crucially, the price increase is proportionally smaller than the increase in throughput. So, at Edinburgh Genomics, the cost of a V4 lane was approx. 1.2 times the cost of a V3 lane, but you get 1.7 times the data. Therefore, the cost of sequencing decreased.
The HiSeq X comparison is rather trivial, I think! By my calculations, a lane of the HiSeq X will produce around 110 gigabases of sequence data, which is about 3 times as much data as a HiSeq V3 lane, at about 0.4 times the cost of a V3 lane. Therefore, the cost of sequencing decreased.
The HiSeq 4000 is a bit of an unknown quantity at present, as we haven’t seen one in the wild (yet), nor have we come up with a detailed costing model. However, my back-of-a-post-it calculations tell me the HiSeq 4000 lanes will produce around 93 gigabases of data (about 2.6 times as much data as a HiSeq V3 lane) at about 1.4 times the cost of a V3 lane. Therefore, the cost of sequencing decreased.
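To put the comparisons on one scale, here is a rough sketch that turns the figures above into a cost per gigabase relative to HiSeq V3. The yields and relative lane costs are the approximate numbers quoted in this post (the HiSeq 4000 ones are back-of-a-post-it estimates); the dictionary and function names are made up for illustration, and no absolute prices are assumed.

```python
# (gigabases per lane, lane cost relative to a HiSeq V3 lane)
# Approximate figures from the text; HiSeq 4000 values are rough estimates.
PLATFORMS = {
    "HiSeq V3": (36.0, 1.0),
    "HiSeq V4": (62.5, 1.2),
    "HiSeq X": (110.0, 0.4),
    "HiSeq 4000": (93.0, 1.4),
}


def relative_cost_per_gb(platform: str, baseline: str = "HiSeq V3") -> float:
    """Cost per gigabase on `platform`, as a multiple of the baseline's."""
    gb, cost = PLATFORMS[platform]
    base_gb, base_cost = PLATFORMS[baseline]
    return (cost / gb) / (base_cost / base_gb)


for name in PLATFORMS:
    print(f"{name}: {relative_cost_per_gb(name):.2f}x the V3 cost per gigabase")
```

On these numbers, V4 comes out at roughly 0.69 times the V3 cost per gigabase, HiSeq 4000 at roughly 0.54 times, and HiSeq X at roughly 0.13 times. Every one of them is below 1.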
Drawing a new graph
Here it is. You’re welcome.
My numbers are accurate, despite what you may hear
It came to my attention last year that some facilities might be offering HiSeq V3 lanes with over 200 million read-pairs/clusters. It is possible to go that high, but often only through a process known as over-clustering. This is a bad thing. It not only reduces sequence quality, but also produces lots of PCR and optical duplicates which can negatively affect downstream analysis.
I haven’t spoken about Ion Torrent or Ion Proton for obvious reasons. I also haven’t included NextSeq 500 or MiSeq – to be honest, though these are very popular, they are not cost-effective ways of producing sequence data (even Illumina will admit that) and if you told your director that they were, well, shame on you 😉
PacBio? Well, it seems the throughput has increased 3 times in the last year, despite the need for an expensive concrete block. So I can’t really see the cost of PacBio sequencing going up either.
MinION and PromethION – costs currently unknown, but very promising platforms and likely to bring the cost of sequencing down further.
Complete Genomics – well, as I said in 2013, they claimed to be able to increase throughput by 30 times.
There is also the BGISEQ-1000, which apparently can do 1 million human genomes per year. Apparently.
All of which means – the cost of sequencing is going to keep coming down.
So why is the genome.gov graph incorrect?
I don’t know for sure, but I have an idea. Firstly, the data only go to July 2014; and secondly, the cost per genome is listed as $4905, which is obviously crap in the era of HiSeq X.
Can we stop this now?