Opiniomics

bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

Bacterial genomes – 2nd and 3rd generation costs

There are some really cool developments coming out from PacBio, not least of which is the tantalising ability to sequence and assemble single-contig bacterial genomes.

Adam Phillippy and others have published a really cool paper on this at arXiv, and I have to say I am really, really impressed by PacBio and all the advances bioinformaticians are making in this area. It’s really cool.

However, in the hype, it’s possible to lose sight of the advantages of the Illumina system, and there are certainly some uncertainties around cost – in the arXiv paper, we see the phrase:

“While the cost of multiplexed Illumina can be as low $300 per genome, the resulting assemblies are typically in hundreds of contigs”

Whilst I don’t have issue with the latter part of that sentence, the first part is perhaps worth questioning!

Some statistics

The rapid-run mode of the HiSeq 2500 is perfectly capable of producing 150 million 150bp paired-end reads.  This equates to 45Gb of sequence data.

If we are sequencing 5Mb genomes, at 40X, we need 200Mb of sequence.  96 of those will therefore need ~20Gb of sequence, so as you see, a single lane of HiSeq 2500 easily copes.
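As a quick check of the arithmetic above, here is a minimal Python sketch. It uses only the figures quoted in the text, and it assumes that "150 million paired-end reads" counts read pairs (which is what makes the 45Gb figure work out). The variable names are illustrative, and the calculation ignores real-world losses such as uneven multiplexing or low-quality reads.

```python
# Back-of-envelope check of lane yield versus data needed for 96 genomes.
# Figures are those quoted in the post; names are illustrative only.

READ_PAIRS_PER_LANE = 150_000_000  # assumption: "150 million" counts read pairs
READ_LENGTH_BP = 150               # 150bp reads, two per pair
GENOME_SIZE_BP = 5_000_000         # 5Mb bacterial genome
TARGET_COVERAGE = 40               # 40X
N_GENOMES = 96                     # one 96-sample multiplex

# Total bases produced by one rapid-run lane.
lane_yield_bp = READ_PAIRS_PER_LANE * 2 * READ_LENGTH_BP

# Bases needed per genome and for the whole multiplex at the target coverage.
bp_per_genome = GENOME_SIZE_BP * TARGET_COVERAGE
bp_needed = bp_per_genome * N_GENOMES

# Coverage per genome if the whole lane were split evenly across all samples.
max_even_coverage = lane_yield_bp / (GENOME_SIZE_BP * N_GENOMES)

print(f"Lane yield:        {lane_yield_bp / 1e9:.1f} Gb")
print(f"Needed per genome: {bp_per_genome / 1e6:.0f} Mb")
print(f"Needed for {N_GENOMES}:     {bp_needed / 1e9:.1f} Gb")
print(f"Even-split coverage per genome: {max_even_coverage:.2f}X")
```

Under these assumptions the lane yields more than twice the 19.2Gb needed, so an even split would give each genome roughly 94X rather than 40X.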

ARK-Genomics runs a non-profit full cost recovery business model, which means we charge for reagents, staff time and equipment.  So for that lane of sequencing, we would probably charge in the region of £2500.

We need to factor in the cost of libraries.  In reality, we could make this cheaper via automation, but for the sake of ease, let’s say the library prep is £100 per sample.  That’s £9600 on library prep.

That’s a total of £12100, or £126 per genome.

In reality, I think we could get library prep down to £50 per sample. This would bring the cost down to £76 per genome.

At present exchange rates, $300 is about £200, so you can see our costs are significantly cheaper than the costs in Adam’s paper.
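The per-genome figures above can be reproduced with a short Python sketch. The function name `cost_per_genome` is hypothetical, and the exchange rate of 1.5 dollars to the pound is an assumption inferred from "$300 is about £200" rather than a quoted rate.

```python
# Per-genome cost for a multiplexed lane: one shared lane charge plus
# a library prep charge for every sample. Names are illustrative only.

def cost_per_genome(lane_cost: float, library_cost: float, n_samples: int) -> float:
    """Return the total cost divided evenly across the samples on one lane."""
    total = lane_cost + library_cost * n_samples
    return total / n_samples

LANE_COST_GBP = 2500   # quoted charge for one HiSeq 2500 rapid-run lane
N_SAMPLES = 96
USD_PER_GBP = 1.5      # assumption, inferred from "$300 is about £200"

manual_prep = cost_per_genome(LANE_COST_GBP, 100, N_SAMPLES)    # £100 per library
cheaper_prep = cost_per_genome(LANE_COST_GBP, 50, N_SAMPLES)    # £50 per library
paper_figure_gbp = 300 / USD_PER_GBP                            # the paper's $300

print(f"£100 libraries: £{manual_prep:.0f} per genome")
print(f"£50 libraries:  £{cheaper_prep:.0f} per genome")
print(f"Paper's $300:   £{paper_figure_gbp:.0f} per genome")
```

Note how the library prep term dominates: at 96 samples the lane itself contributes only about £26 per genome, so most of any saving has to come from cheaper libraries.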

PacBio costs

I have less of an idea about PacBio costs – Adam’s paper suggests between $900 and $1200, but admits a different recipe is as high as $2200.

We have commissioned some PacBio work and the cost was about £1100 for a single sample.

Perhaps others can comment on this?

Comparison

My conservative estimate is that PacBio is about 10 times more expensive per sample for bacterial genomes than Illumina, and in reality it is probably higher.  Even taking my conservative estimate,  the figure of “10 times” is significantly higher than the comparison implied in Adam’s paper.  My worry is that Adam’s paper compares an expensive Illumina quote with a cheap PacBio quote.
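For reference, the ratios implied by the figures quoted in this post can be worked out directly. This is only a sketch: it uses the single £1100 PacBio quote and the £126 and £76 Illumina estimates from above, plus the paper's dollar figures, and the variable names are illustrative.

```python
# Ratio of PacBio to Illumina per-sample cost, from the figures in this post.
# Names are illustrative; no new prices are introduced here.

PACBIO_QUOTE_GBP = 1100          # the single-sample PacBio quote
ILLUMINA_GBP_MANUAL = 126        # per genome with £100 libraries
ILLUMINA_GBP_CHEAPER = 76        # per genome with £50 libraries

PAPER_PACBIO_USD = (900, 1200)   # range suggested in the paper
PAPER_ILLUMINA_USD = 300         # the paper's multiplexed Illumina figure

# Ratios based on the prices quoted in this post.
ratio_manual = PACBIO_QUOTE_GBP / ILLUMINA_GBP_MANUAL
ratio_cheaper = PACBIO_QUOTE_GBP / ILLUMINA_GBP_CHEAPER

# Ratios implied by the paper's own figures.
paper_ratios = tuple(usd / PAPER_ILLUMINA_USD for usd in PAPER_PACBIO_USD)

print(f"This post: {ratio_manual:.1f}x to {ratio_cheaper:.1f}x")
print(f"Paper:     {paper_ratios[0]:.0f}x to {paper_ratios[1]:.0f}x")
```

On these numbers the gap is roughly 9 to 14 times using the prices in this post, against 3 to 4 times using the paper's figures.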

Pros and Cons

Pros of PacBio are that you get a finished genome.

Pros of Illumina are

  1. Cost per sample is far cheaper
  2. Population level statistics – I’m not sure of the fold coverage one achieves with PacBio, but 40x Illumina coverage certainly lets you begin to see low-level variants in the population of cells being sequenced
  3. Scale – if you want to sequence 96 genomes, the only real option is Illumina – more people have consumables budgets of around £10k than have budgets around £100k

Horses for courses

I love what PacBio are doing, and I love what Adam and others are doing on the Informatics side.  At the end of the day, we must choose the right technology for the right question.  PacBio is great if you want complete genomes; Illumina is still the only viable alternative if you want to sequence hundreds of bacterial genomes at once.

7 Comments

  1. For PacBio costs, I was quoted by a commercial supplier, $700 for library prep and $500 per SMRT cell. I think two SMRT cells are recommended, making for a total cost of £1100-ish per strain. We decided to take a punt on Nextera mate pair instead, no data yet.

  2. Mick,
    Thanks for the plug of our paper, and taking time to give us feedback. I have enjoyed the discussion this kicked up. It’s a nice benefit of releasing pre-prints. I think we’ve covered most of what’s below on twitter, but I’m putting a summary here.

    First, 2nd gen costs weren’t included in the original arXiv version of this paper, nor were 2nd gen assemblies. The primary intent of our paper is to show what’s algorithmically possible with PacBio sequencing, and how long reads significantly drive down the cost of genome *finishing* and enable 1-contig assemblies. Thanks for your kind words on that, the main part of the paper! From our results, I don’t think there is any debate that PacBio is much, much cheaper for genome finishing than any other approach, and is capable of some pretty cool stuff.

    The comparisons to 2nd gen assemblies were recently added for context; essentially, what assembly does each platform produce, at what price. You took particular exception to our $300 figure for a per-genome Illumina cost, but it’s nice to see that we are both in the same ballpark, i.e. £126 vs. $300 for a 5Mbp genome on Illumina. It’s important to note that we are talking about generating enough data to generate a good de novo assembly. You mentioned 40X above; we like to think of 100X as the minimum Illumina coverage required to generate a good de novo assembly (personal experience). So our cost figures are assuming ~100X Illumina and ~150X PacBio. The $2,200 figure you quote is for the hybrid closure approach put forward by the Broad last year and includes both PacBio and Illumina sequencing — that’s the cost we were actually trying to improve upon, and the lowest closure cost quoted prior to our paper. The numbers are all broken out in the “Sequencing Cost Estimate” section of the paper on arXiv. Note our PacBio cost estimate dropped from ~$2,000 to ~$1,000 with the recent release of the RSII, which at least doubled the instrument throughput.

    Total sequencing costs are tricky to estimate — or get people to admit to — so a little background on where ours came from. All “costs” noted in our paper were taken from contract sequencing prices estimated by the Duke University Genome Sequencing Information Manager: https://dugsim.net/ . Duke provides a really neat web UI that allows you to estimate project costs for essentially any platform (Sanger, 454, Illumina, Ion, PacBio). This was a very helpful resource for us because it provided consistent, advertised pricing from a single center. The assumption being that if Duke is willing to advertise these prices, the price includes amortized machine costs, reagents, labor, storage, etc. (i.e. the total cost of producing the sequence). Also, the costs of all platforms are from the same center, hopefully controlling for big variables across centers, like labor. Certainly not perfect, but they are actual, listed prices. So, when we say ~$1,000 for a closed class I or II genome using PacBio, that’s an actual listed price from the Duke website; and not an intentionally “cheap” PacBio quote. Same goes for the Illumina quotes.

    This is all straightforward if we are talking about a microbiologist who wants to sequence his/her favorite genome and contracts the sequencing out to a core facility like Duke. Send the DNA, get a fastq back. The problem is that to fairly compare against Illumina we have to consider multiplexing, since no one would run a full HiSeq lane for a single genome. Not an issue for PacBio and 454, since both are suited to do only 1 or 2 genomes a run. But a big problem for estimating Illumina, because library prep comes to dominate the cost of sequencing. I agree with Nick Loman’s suggestion that comparing reagent costs and bench time leaves much less room for debate, but when you are multiplexing 96 genomes, the per-genome sequencing cost is dominated by labor/overhead costs. For example, Duke, U. Delaware, dnasusequencing.org all want more than $250 per Illumina library prep, while the sequencing costs are less than $50 per genome (assuming multiplexing). So I do like to think in terms of total cost. Labor is not cheap. As you and others noted, the per-genome cost of multiplexed libraries can be brought down with automation, but we were unable to find any multiplexed library price references. So, we’re left with the ~$300 figure (~$250 lib + ~$50 seq), and if you didn’t have 96 genomes to sequence, and asked Duke to do 4 genomes on a MiSeq, the price would be over $500 per genome (~$250 lib + ~$300 seq). You are most skeptical of the library prep costs, but note that the library prep cost included in the PacBio estimates is more than $400 per sample. So both Illumina and PacBio quotes pay this high library prep cost, making the comparison fair in my mind.

    Finally, I agree it is important to keep sequencing applications separated. Nobody debates the fact that for applications like high-throughput SNP studies, multiplexed Illumina is the way to go. We’ll add a comment to the paper that states that fact. In contrast, PacBio currently excels at the lower-throughput applications like genome finishing or sequencing a few genomes at a time. These are different, but important applications. In this context of de novo assembly, multiplexed Illumina will reduce costs by 3-10X (based on our and your estimates), but the number of contigs will increase 100X versus PacBio. If you’re doing many genomes, this tradeoff may make sense. If you want good assemblies, we’re telling you how much it will cost you to do it with PacBio.

    • Thank you Adam, for your reply 🙂

      I guess I should get my PacBio from Duke – I’ve never been quoted anything like those prices even for a single SMRT cell. So what I saw was an expensive Illumina quote vs a cheap PacBio – but perhaps I need to change my idea on how much PacBio costs?

      What would be great is if (i) PacBio introduced multiplexing, and (ii) we had an idea if low X PacBio plus high X Illumina could finish genomes i.e. Illumina to get to contigs and low coverage PacBio to finish it (the costs would again reduce significantly)

      • Yes, the PacBio costs dropped by half with the very recent introduction of the RSII. Those upgrades have been rolling out over the past few months, so whatever you were quoted in the past, just assume it will be half that with the RSII. (i) I have seen some activity wrt barcodes and multiplexing on the PacBio. See here: http://www.pacificbiosciences.com/pdf/TN_Multiplexing_Targeted_Sequencing_Using_Barcodes.pdf . (ii) The Illumina/PacBio hybrid idea is exactly what the Broad published last year and we compared against in our paper. It is more expensive in the end because it requires multiple library preps.

      • I know the professor who runs the Duke facility. Based on my conversation with him a couple of years back, I believe Duke was not ‘full cost’ regarding the amortization cost of new machines. We can check with him, if that is something that distorts the calculation. Also note that universities have internal and external prices, and the external (low volume) subsidizes the internal (higher volume).

        Please note that Duke is one of the most efficiently run sequencing centers. That is one of their reasons to put prices online – because they were early with NGS and managed to offer better than competing centers.

      • That’s really interesting!

        I can confirm, categorically, that we do not distinguish external from internal customers – same price, same queue, no favouritism 🙂


© 2017 Opiniomics
