Top ten lists have taken a bit of a battering in recent times, and so I kind of apologise for adding another one, but I’m not really that sorry.

Some of you will be aware that I manage a genomics facility, ARK-Genomics.  I choose my words carefully – I don’t run it, the excellent staff who work in the facility do that.  If I dropped off the face of the planet, the facility would still “run”.  No, I “manage” the facility, which at the end of the day means that I write documents and submit them to various people – our management team, our funders, our finance team etc.

I may come across as cynical at times, but it’s only because I care.  A lot.  I love science.  I love genomics.  I want to see it done well and I get angry when it isn’t.  So I care about ARK-Genomics.  I care about the data we produce, and I care about the customers who come to us.

This post is really about caring, and why you should take care when you choose an NGS supplier.

As a quick aside, you may be surprised at my use of the word “customer”.  You shouldn’t be.  ARK-Genomics is an entirely academic operation: we do not make any profit and we prefer to work as a scientific collaborator.  But make no mistake, the relationship benefits if we think of our collaborators as customers.  They have money, we have genomics expertise.  They pay us and they have expectations.

Meeting those expectations is what keeps me awake at night!

So here are my top ten things to consider when choosing an NGS supplier:

1. Quality – not all sequencing is equal.

Illumina data from one facility won’t necessarily be as good as Illumina data from another.  These machines are temperamental.  It’s not just a case of doing a bit of pipetting followed by the push of a button.  There is a huge amount of skill, expertise and experience involved in QC-ing and preparing NGS libraries for sequencing.  Then, when the sequencer is running, it needs to be monitored pretty closely – Illumina have a habit of producing pokey batches of reagents, and if a problem develops, you need to act fast.  So lots of skill is involved in the process.

Below is a quality plot from a publicly available dataset.  Would you be happy with that?  You shouldn’t be.

[Figure: bad_qual]

How about this one?  This isn’t a public dataset, nor is it one of ours, but it is one I have come across.  Someone I know paid for the sequencing before checking the quality of the data.  Do I have your attention now? 🙂


So not all data is the same, not all facilities are the same.  Some are better than others, quite frankly.  So choose well!
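If you want to look at quality yourself before you pay, mean quality per read position is a reasonable first check.  The sketch below is only an illustration, not a replacement for a proper QC tool: the function name `mean_quality_per_position` is my own, and it assumes plain four-line FASTQ records with Phred+33 quality encoding (older Illumina pipelines used other offsets, so check with your supplier).

```python
from io import StringIO


def mean_quality_per_position(fastq_handle, offset=33):
    """Return the mean Phred score at each read position.

    Assumptions: four lines per record (no wrapped sequence lines)
    and Phred+33 encoding unless a different offset is passed in.
    """
    totals = []  # sum of scores seen at each position
    counts = []  # number of reads that cover each position
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 3:  # the quality string is every fourth line
            continue
        quality = line.rstrip("\r\n")
        for position, char in enumerate(quality):
            if position == len(totals):  # first read to reach this position
                totals.append(0)
                counts.append(0)
            totals[position] += ord(char) - offset
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]


# Two made-up reads: one good throughout, one that tails off at the 3' end.
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII5#\n")
print(mean_quality_per_position(example))
```

A curve that falls away sharply towards the end of the read, or sits low throughout, is the kind of thing worth raising with the facility before you accept the data.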

2.  Delivery – can they do it?

This is similar to point (1) above.  Sequencing DNA is different to mRNA, which is different again from microRNA.  ChIP-Seq, MeDIP-Seq, Bisulfite sequencing, mate-pairs etc etc etc.  They all take skill.  They’re all different.  Just because a facility has a certain sequencing technology, it doesn’t mean they’ll be able to do the type of sequencing you want.  Once a facility buys a machine, it can take them years to perfect the various different library preparation techniques.  Ask yourself, and ask them – do they have a track record in producing the type of data you want?  If not, then your project might be the one where they find out they can’t make that type of library after all.  This isn’t necessarily a problem, but if it ends up being a “development project” for the facility, you should know about it first!

3. Technology – the sequence is not the DNA

Not all sequencing is equal, and not all technologies are equal.  The sequence you get is the result of a long series of steps involving QC, PCR, library prep, sequencing and bioinformatics.  Each technology has pros and cons.  PacBio have high raw indel rates; SOLiD data is not great for genome assembly; 454 and Ion Torrent tend to have homopolymer errors; and Illumina sequencing can also produce artefacts.  So, think about what you want to do, and choose the appropriate technology.  Be prepared for the fact that the cheapest per-base technology might not be the one you need.

4. Bioinformatics

Bioinformatics is now the most rate-limiting step in genomics research.  Can you handle the data yourself?  If not, what kind of bioinformatics support is available and how much does it cost?  You might find the bioinformatics component costs more than the sequencing – did you budget for that?  Note, not all bioinformaticians are equal, either.  Just because they know Linux and can get the tools to run, it doesn’t mean they are good bioinformaticians.  There is more to bioinformatics than running a few tools and e-mailing you the results to interpret.  Does your chosen facility have a good research track record of producing bioinformatics publications?  If so, it’s probably a good sign.

In my very early NGS years (in my previous role) we commissioned some Illumina sequencing, and the bioinformatics support was “Buy Lasergene”. Hmmmmm, this was not good advice!

5. Time

This is a fairly obvious one – how quickly will your project be run?  And where does it fit in with their priorities?  Is your project ranked as highly as, perhaps, their internal projects?  Will your £5k project be treated the same as the £5million project they just won to sequence 10,000 frightened badgers?  Or will your project be silently forgotten as they chase the next big grant?

6. Customer service

You probably won’t hear this term mentioned by many academic facilities, but this is important.  If you have an issue, or a question, who answers your call or your e-mail?  Does it get answered at all?  Or is it forgotten about/ignored?  What happens when you have a complaint?  At ARK-Genomics, we have a complaints procedure, and I encourage anyone who has an issue to e-mail me directly.  How many facilities encourage you to e-mail the director when things go wrong?

This is really important. You won the grant, you have the money, you have every right to expect your queries and complaints to be dealt with rapidly and professionally.

7. Responsiveness

So you want to know how much something costs, or you want advice on experimental design or choice of technology.  Who do you ask?  How long does it take for an answer?  Who answers you and do they know what they’re talking about?  Is it a sales person, who will basically tell you what you want to hear to make the sale?  Or a scientist, who is honest with you about the limitations?  Which would you prefer, honesty or the hard sell?  Will people who have a £1million consumables budget get a quicker response than someone who has £10k?

8. Processes

Some call this quality assurance, others call them SOPs (standard operating procedures).  Essentially they are talking about the same thing: processes, what they are, and documenting them.  I believe everything should have an SOP – project management, data management, customer complaints, bioinformatics, lab techniques, etc, etc:  everything.  It ensures that you get a good, standard product, no matter who carries out your project.  At ARK-Genomics, we are continuously developing our SOPs, and plan on obtaining ISO-9001 accreditation this year.  Does your chosen facility have processes and QA?  They should do!  If you’re working on human subjects, you may need CLIA accreditation, or you might want to insist on GLP.  These are all the same thing (albeit with different demands) – defining and documenting processes to ensure a consistent standard.

9. Knowledge, research, collaboration

All you need is a sequencing provider that knows how to sequence, right?  Right?! Wrong! Why not choose an NGS supplier that knows about your research interests too?

I’ll give you an example.  Take the pig genome, recently published in Nature.  The first authors are Martien Groenen and Alan Archibald, both internationally recognized experts in pig research and pig genomics.  Martien works at Wageningen University in the Netherlands, and Alan is my boss, and works at The Roslin Institute – home of ARK-Genomics.  You won’t be surprised to know that both institutions run next-generation sequencing facilities.

We know the pig genome well.  We’ve studied it for years.  We helped design the Illumina porcine SNP50 chip and the Affymetrix Porcine expression array (Alan is name-dropped here), and we recently published a Porcine Gene Expression atlas.

So if you’re doing research into pigs, the only question is why you wouldn’t do it in collaboration with a sequencing provider that knows the pig genome inside out?  Why would you choose anyone else?  Other providers might not know crucial details – for example, they might not be aware that the IFITM gene cluster, which encodes flu resistance, is poorly assembled, or that the growth regulating IGF2 is missing from the assembly entirely.  If you choose a provider who knows about the biology you are interested in, that can only benefit your research.

This is starting to sound like an advert, so I’ll mention others.  Mark Blaxter is a world-renowned nematode and evolutionary biologist, and he also happens to run an NGS facility – the GenePool.  If you’re into worms and you’re not doing your NGS with Mark, then you’re doing it wrong.  Likewise, Neil Hall is last author on the Wheat genome paper.  He also runs Liverpool’s Centre for Genomic Research.  Can you guess what I’m going to suggest?!

10. Comparing like with like

Show me a quote, show me any quote, and I’ll beat it.  It’s easy.  I’ll just do single-end instead of paired-end; or I’ll give you fewer reads; or shorter read lengths.  Actually, I won’t, because I have ethics, but you get the general idea.  As an example, for RNA-Seq, we recommend a minimum of 30M reads per sample, and 100bp paired-end sequencing as standard (the longer reads are better at defining splice junctions, and the pairs help reconstruct transcripts).  Now this experiment will incur a certain cost, and any other NGS facility can undercut our quote if they offer you 50bp paired-end, or 36bp single-end etc.  They may miss out crucial details, such as that TopHat, the most popular spliced aligner, is optimized for reads longer than 75bp.  Alternatively, they might give you 15M reads per sample instead of 30M, failing to inform you that ENCODE recommend 30M as a minimum.

You get my point, though – when choosing an NGS supplier, make sure you compare like with like!
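One way to make the comparison concrete is to reduce each quote to the same units.  The sketch below is an illustration only: the `Quote` class, the `meets_spec` helper, the supplier names and the prices are all made up, and it assumes that "reads per sample" means fragments (so a paired-end run yields two reads per fragment).  The default thresholds are the 30M, 100bp paired-end recommendation from the RNA-Seq example above.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    supplier: str            # hypothetical supplier name
    price_per_sample: float  # made-up price, in whatever currency you like
    reads_millions: float    # fragments per sample, in millions (assumption)
    read_length: int         # bases per read
    paired_end: bool

    def bases_per_sample(self) -> float:
        ends = 2 if self.paired_end else 1
        return self.reads_millions * 1e6 * self.read_length * ends

    def price_per_gigabase(self) -> float:
        return self.price_per_sample / (self.bases_per_sample() / 1e9)


def meets_spec(quote, min_reads_millions=30, min_read_length=100,
               require_paired_end=True):
    """True if the quote offers at least the depth, length and layout asked for."""
    return (quote.reads_millions >= min_reads_millions
            and quote.read_length >= min_read_length
            and (quote.paired_end or not require_paired_end))


quotes = [
    Quote("Facility A", 300.0, 30, 100, True),  # illustrative numbers only
    Quote("Facility B", 200.0, 15, 50, True),
]
for q in quotes:
    print(q.supplier, q.price_per_sample,
          round(q.price_per_gigabase(), 2), meets_spec(q))
```

With these invented numbers, the second quote has the lower headline price but costs more per gigabase and does not meet the specification, which is exactly the trap described above.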


So there we have it, my top ten things to consider when choosing an NGS supplier 🙂

The big, fat elephant in the room is what I haven’t mentioned – price.  To be honest, if price is more important to you than any of the above issues, then you’re doing it wrong – step away from the science and come back when you’ve thought about it a little bit more.  Sure, price is important up to a point, no-one likes to be ripped off and we all love a bargain, but in my opinion, it comes a long way down the list compared to the issues mentioned above.

Because good science and good data are the highest priority, right?

Update 11:31am 21/01/13

I should have said, the above points are in no particular order.