
The cost of sequencing is still going down

There’s been a bit of chatter on Twitter about how the cost of sequencing is not going down anymore, with reference to that genome.gov graph.  I realise that sensationalist statements get more retweets, but I am sorry, this one is complete crap – the cost of sequencing is still on a long-term downward trend.

By way of qualification, I have been helping to run our University's next-generation genomics facility, Edinburgh Genomics, since 2010. We are an academic facility running an FEC (full economic cost) model – which means we recover all of our costs (reagents, staff, lab space etc.) but do not make a profit. If you are interested in Illumina sequencing, especially after reading below, you should contact us.

You can read some relatively up-to-date references here, here and here.

What I am going to talk about below are the medium- to long-term trends in sequencing prices on the Illumina platform. There are fluctuations in the price in any given period (for example, Illumina put prices up across the board a few weeks ago), but these fluctuations are very small in the context of the wider, global trend of cost reduction.

HiSeq V3

Back in 2013/14, the most cost-effective method of sequencing genomes was on the HiSeq 2000/2500 using V3 chemistry. At its best, a lane of sequencing would produce 180 million paired 100bp reads, or 36 gigabases of sequence data. I am going to use this as our baseline.
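As a quick sanity check on that arithmetic – and the same formula applies to every platform below – here is a minimal Python sketch. The function name is mine, and the figures are just the ones quoted in this post:

    def lane_gigabases(read_pairs_millions, read_length_bp):
        # Paired-end output: read pairs x 2 ends x read length, in gigabases
        return read_pairs_millions * 1e6 * 2 * read_length_bp / 1e9

    # HiSeq 2000/2500 V3 at its best: 180 million paired 100bp reads
    print(lane_gigabases(180, 100))  # 36.0 Gb, our baseline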

HiSeq V4

After HiSeq V3 came HiSeq V4, which was introduced last year. All new 2500s could offer this, and many newer 2500s could be upgraded. At its best, V4 produces 250 million paired 125bp reads, or 62.5 gigabases of sequence data.

Of course, Illumina charge more for V4 reagents than they do for V3 reagents, but crucially, the price increase is proportionally smaller than the increase in throughput. So, at Edinburgh Genomics, the cost of a V4 lane was approx. 1.2 times the cost of a V3 lane, but you get 1.7 times the data – in other words, the cost per gigabase fell to roughly 1.2/1.7 ≈ 0.7 times the V3 price. Therefore, the cost of sequencing decreased.

HiSeq X

This is rather trivial, I think! By my calculations, a lane of the HiSeq X will produce around 110 gigabases of sequence data, which is 3 times as much data as a HiSeq V3 lane, while the cost has come down to about 0.4 times. Therefore, the cost of sequencing decreased.

HiSeq 4000

The HiSeq 4000 is a bit of an unknown quantity at present, as we haven't seen one in the wild (yet), nor have we come up with a detailed costing model. However, my back-of-a-post-it calculations tell me that HiSeq 4000 lanes will produce around 93 gigabases of data (about 2.6 times as much data as HiSeq V3) at about 1.4 times the cost. Therefore, the cost of sequencing decreased.
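To make the trend explicit, here is a minimal Python sketch that turns the per-lane figures above into cost per gigabase relative to the V3 baseline. The lane-cost multipliers are the approximate ones quoted in this post, not official prices:

    # Gb per lane and lane cost relative to a HiSeq V3 lane,
    # using the approximate figures quoted above
    platforms = {
        "HiSeq V3":   {"gb": 36.0,  "lane_cost": 1.0},
        "HiSeq V4":   {"gb": 62.5,  "lane_cost": 1.2},
        "HiSeq X":    {"gb": 110.0, "lane_cost": 0.4},
        "HiSeq 4000": {"gb": 93.0,  "lane_cost": 1.4},
    }

    v3_cost_per_gb = platforms["HiSeq V3"]["lane_cost"] / platforms["HiSeq V3"]["gb"]

    for name, p in platforms.items():
        rel = (p["lane_cost"] / p["gb"]) / v3_cost_per_gb
        print(f"{name}: {rel:.2f}x the V3 cost per gigabase")

    # HiSeq V3: 1.00x, HiSeq V4: 0.69x, HiSeq X: 0.13x, HiSeq 4000: 0.54x

Every platform here comes in below the V3 baseline, which is the whole point.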

Drawing a new graph

Here it is.  You’re welcome.

[Figure: biomickwatson_thenewgraph]

My numbers are accurate, despite what you may hear

It came to my attention last year that some facilities might be offering HiSeq V3 lanes with over 200 million read-pairs/clusters.  It is possible to go that high, but often only through a process known as over-clustering.  This is a bad thing.  It not only reduces sequence quality, but also produces lots of PCR and optical duplicates which can negatively affect downstream analysis.

Other platforms

I haven't spoken about Ion Torrent or Ion Proton, for obvious reasons. I also haven't included the NextSeq 500 or MiSeq – to be honest, though these are very popular, they are not cost-effective ways of producing sequence data (even Illumina will admit that), and if you told your director that they were, well, shame on you 😉

PacBio? Well, it seems the throughput has increased 3 times in the last year, despite the need for an expensive concrete block. So I can't really see the cost of PacBio sequencing going up either.

MinION and PromethION – costs currently unknown, but very promising platforms and likely to bring the cost of sequencing down further.

Complete Genomics – well, as I said in 2013, they claimed to be able to increase throughput by 30 times.

There is also the BGISEQ-1000, which apparently can do 1 million human genomes per year. Apparently.

All of which means – the cost of sequencing is going to keep coming down.

So why is the genome.gov graph incorrect?

I don't know for sure, but I have an idea. Firstly, the data only go to July 2014; secondly, the cost per genome is listed as $4905, which is obviously crap in the era of HiSeq X.

Can we stop this now?

15 Comments

  1. The discrepancy between your (Edinburgh Genomics) figures and NHGRI could be this, “Costs accounted for”:
    “Sequencing instruments and other large equipment (amortized over three years)”

    It does strike me that in order to get the best cost/base ratio you need to get the most up-to-date kit. So, absolutely, the reagent cost per genome is going down, but only if you can afford to buy new kit (or can upgrade).
    Someone stuck with an ‘old’ HiSeq2000 can only do V3 chemistry.

  2. Thanks Mick for this analysis.

    Yet 'that genome.gov graph' that you refer to is so often misrepresented, and you also fall into the same trap of ignoring the annotations around http://www.genome.gov/sequencingcosts – not only the amortization, as mentioned in the above comment, but, to quote that page, the following indirect and direct costs:
    • Labor, administration, management, utilities, reagents, and consumables
    • Sequencing instruments and other large equipment (amortized over three years)
    • Informatics activities directly related to sequence production (e.g., laboratory information management systems and initial data processing)
    • Submission of data to a public database
    • Indirect Costs (http://oamp.od.nih.gov/dfas/faq/indirect-costs#difference) as they relate to the above items

    So while you are accurate on the raw reagent cost per Gb, by ignoring the rest we get an unfair comparison. And of course these numbers are one center's (albeit a very large one) analysis and calculation for their own accounting.

    As one who knows a number of the NHGRI folks at both the institute and NISC (their sequencing center), these cost figures are very useful, as you know the fixed costs mentioned above are not going down (and are actually going up in many instances, for example the cost of initial data processing on a per-Gb basis).

  3. I must have misunderstood something here. But with HiSeq X, for example, the cost quoted is only after 4 years of amortisation and constant high-capacity runs. The cost per genome on day one isn't anywhere near this? On day one you (or your national taxpayer) are significantly out of pocket until you 'pay yourself back' by slaving over the machine for 4 years, surely?

  4. Hi Dale

    Thanks for the comment. As I say in the post, the costs I have used are from the facility I help run and are FEC costs – i.e. they include the costs you mention above, and in fact we try to amortise equipment over 2 years, as this seems to be Illumina’s cycle.

    So to be clear – I am not comparing raw reagent cost. I am comparing FULL ECONOMIC COST of sequencing.

    Cheers
    Mick

  5. Hi Chris

    As I said in the post, these are FEC costs and include amortisation of the equipment over two years. They include everything – staff, reagents, core informatics, equipment amortisation, university overheads – everything.

    You are right that some labs get stuck with old kit. My current advice? Don’t buy an Illumina sequencer unless you can i) afford to upgrade it every year; ii) afford to replace it every 2 years.

    Cheers
    Mick

  6. Which begs the question: if you can't feed that machine, i.e. you don't have a pre-agreed mega-project – let's say you are a core, and people stop sending you their samples – what is the cost then? I guess if you just write off the cost of the idle machines, and your funder doesn't know any better, it's all the same to the machine vendors – they keep the capex or lease money and just lose the reagent revenue stream?

  7. — I think getting big capex and then just writing it off, especially when idle, is a scandal waiting to break.

  8. I mean, it's probably tens or hundreds of millions gone that way – how many nurses or teachers would that employ? Instead it's all in the pocket of some big corporation.

  9. Then of course, there is only ever a finite pot of grant money. So you got your capex for the big machines, but other labs didn't get their science funded as a result. So on top of the money lost in idle systems, you also have the opportunity cost of the unfunded alternative projects.

  10. So the real benefit here is to the company selling large capex systems to a relatively small number of central labs, who take a larger and larger slice of public resources.

  11. It’s important to remember that this is a graph of NHGRI’s cost, not the field as a whole. The way it’s presented makes it easy to think it’s intended to be the latter, but interpreting it as anything but the former is misleading.

    AFAIK, they have yet to acquire either a HiSeq X or a HiSeq 4000 (though I’m sure the 4000 will be along as soon as Illumina can get it to them), so cost decreases related to those platforms are irrelevant. They’ve also acquired extra capabilities (optical mapping, PacBio sequencing) that may make the cost per base go up while increasing the quality of those bases enough to more than compensate.

  12. Ah, I didn’t realise FEC also included depreciation.

  13. Anyway, I look forward to seeing the machines at 100% utilisation.
    Out of curiosity, how much extra are you paying for the 'support' contract?

  14. Graph finally updated showing a sharp cost decrease in 2015.

