This morning, Dan Graur tweeted this explosive article:

I recommend everyone reads it.  tl;dr – lots of cancer cell lines are not what they’re supposed to be, having been contaminated and overtaken by other, perhaps more aggressive cell lines.

With the advent of next-generation sequencing (NGS), this seems like something we could tackle relatively easily.  For example, each cell line will have (i) a signature gene expression profile; (ii) a signature SNP profile; and (iii) a signature CNV profile.

It shouldn’t be too difficult to set up a service, linked to the public databases, that checks all submitted data against known (contaminant) cell lines and flags datasets that appear to come from a different cell line from the one reported.
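
To make the idea concrete, here is a rough sketch of what the SNP-based check might look like. Everything in it is an assumption for illustration: the cell line names (`LINE_A`, `LINE_B`), the SNP identifiers, the genotypes, the function names and the thresholds are all made up, and a real service would need far more sites and would have to cope with mixed populations, sequencing error and genetic drift between stocks of the same line.

```python
from typing import Dict, Optional, Tuple

# A profile maps a SNP ID to the number of alternate alleles (0, 1 or 2).
# None means "no genotype call at this site".
Profile = Dict[str, Optional[int]]


def concordance(a: Profile, b: Profile) -> Tuple[float, int]:
    """Return (fraction of matching genotypes, number of sites compared).

    Only sites that are called in both profiles are compared.
    """
    shared = [s for s in a if a[s] is not None and b.get(s) is not None]
    if not shared:
        return 0.0, 0
    matches = sum(1 for s in shared if a[s] == b[s])
    return matches / len(shared), len(shared)


def check_sample(
    sample: Profile,
    claimed: str,
    references: Dict[str, Profile],
    min_sites: int = 3,        # hypothetical minimum; far too low for real use
    threshold: float = 0.9,    # hypothetical concordance cut-off
) -> dict:
    """Compare a submitted profile against every reference cell line."""
    scores = {}
    for name, ref in references.items():
        frac, n_sites = concordance(sample, ref)
        if n_sites >= min_sites:
            scores[name] = frac

    if not scores:
        return {"status": "insufficient_data", "best_match": None, "scores": scores}

    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        status = "unknown"                      # matches nothing we know about
    elif best == claimed:
        status = "consistent"                   # looks like what was reported
    else:
        status = "possible_misidentification"   # looks like a different line
    return {"status": status, "best_match": best, "scores": scores}


# Toy reference panel with invented genotypes (not real data).
REFERENCES = {
    "LINE_A": {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 0},
    "LINE_B": {"rs1": 2, "rs2": 1, "rs3": 0, "rs4": 1},
}

# A dataset submitted as LINE_A whose genotypes actually match LINE_B.
submitted = {"rs1": 2, "rs2": 1, "rs3": 0, "rs4": 1}
result = check_sample(submitted, claimed="LINE_A", references=REFERENCES)
print(result["status"], result["best_match"])
```

In this toy case the submitted profile agrees with `LINE_B` at every site and with `LINE_A` at only one of four, so it is flagged rather than accepted. The same pattern could in principle be applied to expression or CNV profiles, with a similarity measure suited to continuous data in place of exact genotype matching.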

I propose that the funding agencies immediately fund EBI/NCBI to set up such a service, attached to the major sequence repositories, that can identify possible cell line contamination.