Think Gene

a bio blog about genetics, genomics, and biotechnology

Posts Tagged ‘sequencing’

Helicos Error Rates

Steve Murphy reports consternation regarding Helicos’ “5% total error rate.” I’m no expert (unlike Daniel MacArthur at Genetic Future; I’ll run this by him [done]), but I think that “total” in this context means the error rate for a whole read, not per base pair. However, assuming that errors are independent between reads, the “5%” statistic isn’t that relevant, because one can achieve arbitrarily high confidence by re-running the read. “5%” is just an intermediate variable. The per-read error could be 99% and, hypothetically, one could still have a viable test if reads were cheap and fast enough.
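To make the re-running argument concrete, here is a minimal sketch of how fast the error falls under a simple majority vote. It rests on assumptions that are mine, not Helicos’: each read is either right or wrong, errors are independent between reads, and the per-read error is below 50%. The function name is hypothetical. The 99% case would need a different rule (for example, relying on wrong reads disagreeing with each other), which this sketch does not model.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Probability that a majority of n independent reads are wrong.

    p is the per-read error probability; n must be odd so there are no ties.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of reads to avoid ties")
    need = n // 2 + 1  # smallest number of wrong reads that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


# Error of the majority call at a 5% per-read error, for a few repeat counts.
for n in (1, 3, 5, 9):
    print(n, majority_error(0.05, n))
```

Under these assumptions, three reads already bring a 5% per-read error down to well under 1%, which is why the raw figure matters less than the cost and speed of a read.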

From the Helicos Press Release:

Initial commercial specifications for the Helicos Genetic Analysis Platform were set at 50 Mb per hour; 10 Gb per run in 8 days. Early adopters can expect 8 million reads at length-of-read from 25 to 50 bases in each of the 50 flow cell channels utilized, totaling 400 million reads per run. Aftermarket costs are approximately $1.80 per megabase sequenced or $45 per million reads. Additionally, performance is independent of template sizes anywhere from 25 b to 8 Kb. The total error rate is less than or equal to 5%, with a competitive 0.5% substitution error rate. Further, the error rate is independent of the read length. The HeliScope Sequencer is capable of accurately sequencing samples with 20% to 80% GC content.

Scientific data presented at conferences throughout the third quarter included conclusive demonstration of HeliScope Sequencer performance exceeding 50 Mb per hour using three bacterial genomes of diverse genomic content with limited if any sequence content bias, as well as proof-of-concept on single molecule paired reads maintaining our simple, amplification-free sample prep on human placental RNA and a demonstration of digital gene expression performance demonstrating accuracy, high levels of reproducibility and quantitation.
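The figures quoted from the press release can be cross-checked against each other with a little arithmetic. The sketch below only uses numbers from the release; the assumptions are mine: that the 50 Mb per hour is sustained around the clock for all 8 days, and that 1 Mb means one million bases. The function names are hypothetical.

```python
# Figures as quoted in the Helicos press release.
MB_PER_HOUR = 50
RUN_DAYS = 8
READS_PER_CHANNEL = 8_000_000
CHANNELS = 50
COST_PER_MB = 1.80              # dollars per megabase sequenced
COST_PER_MILLION_READS = 45.00  # dollars per million reads


def run_output_mb() -> int:
    """Megabases per run, assuming 24-hour operation for the whole run."""
    return MB_PER_HOUR * 24 * RUN_DAYS


def total_reads() -> int:
    """Reads per run across all flow cell channels."""
    return READS_PER_CHANNEL * CHANNELS


def implied_bases_per_read() -> float:
    """Read length implied by the two cost figures.

    (dollars per million reads) / (dollars per Mb) = Mb per million reads,
    which is the same number as bases per read.
    """
    return COST_PER_MILLION_READS / COST_PER_MB


print(run_output_mb())   # 9600 Mb, close to the quoted 10 Gb
print(total_reads())     # 400000000, matching the quoted 400 million
print(round(implied_bases_per_read(), 2))
```

The two cost figures only agree if a read averages about 25 bases, the bottom of the quoted 25 to 50 range, and 400 million reads at 25 bases each is the quoted 10 Gb.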

The somewhat troubling statistic here is the size of the read (25 to 50 bases). These sequencers work by shredding many copies of a DNA molecule into small overlapping pieces, reading these pieces, and then reassembling the reads into a sequence using a statistical model and templates. Yes, each individual read may be near 100% accurate, but depending on the sequence to be reassembled (e.g. one with long repeats), it can be difficult to reassemble short reads with consistently high confidence. The longer the reads are, and the more they overlap, the better they can be reassembled. Short reads are less reliable and less accurate in reassembly.
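A toy illustration of the repeat problem, as a sketch only: it counts how many error-free reads of a given length appear at more than one position in a made-up sequence, and so cannot be placed unambiguously. This is a deliberate simplification of what real assemblers do, and the sequence and function name are invented for the example.

```python
from collections import Counter

# Invented toy sequence: an 8-base repeat (ATATATAT) appears twice,
# separated and flanked by short unique stretches.
TOY_GENOME = "CCGT" + "ATATATAT" + "GGAC" + "ATATATAT" + "TTCG"


def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Fraction of read positions whose read also occurs elsewhere in genome."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for read in reads if counts[read] > 1) / len(reads)


# Compare reads shorter than the repeat with reads longer than it.
for read_len in (4, 10):
    print(read_len, ambiguous_fraction(TOY_GENOME, read_len))
```

In this toy, every length-10 read is unique because it always reaches into a distinct flank, while many length-4 reads fall entirely inside the repeat and match several positions.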

But the real story here seems to be that there is no story yet: this is an intra-industry release meant to demonstrate progress to customers and investors. I read it as a set of selective statistics an industry salesman could use to compare favorably against competitors’ statistics, to be conveniently “corroborated” by the press. If Helicos had something to broadcast outside its industry’s bubble, Helicos would have published something like: “We can sequence the human genome (3Gb) in X days for Y dollars with Z accuracy. Yes, it can be used to provide medical advice in the specific ways outlined at LINK (see there for details and caveats). Pre-order now!” I don’t see that.

Otherwise, outside the sequencing industry (e.g. the medical community, including Yale genetics), the answer to “How much do I pay you to sequence my genome, how long will it take, and how accurate will your report be?” is: It Depends. But, as Dilbert knows, “it depends” is tech-speak for “abandon all ye hope of a useful answer.” “Abandon all hope” is probably not the best attitude for a prime-time medical application.

Software developed by Boston College lab delivers speed and accuracy to genome research

It took a global corps of scientists approximately $500 million and 13 years to identify the more than 35,000 genes of the human genome. Five years later, Boston College Biologist Gabor Marth and his research team have developed software that can analyze half a million DNA sequences in 10 minutes. The Marth laboratory’s proprietary PyroBayes software is one of a new breed of computer programs able to accurately process the mountains of genome data flowing from the latest generation of gene decoding machines, which have placed a premium on computational speed and accuracy in data-crunching fields known as bioinformatics and high-throughput biology, said Marth, an associate professor of Biology.

“We’re on the edge of a real technological revolution that I think will help us understand the genetic causes of diseases in humans and how genetic materials determine traits in animals,” said Marth. “It is going to lead to less expensive technologies that will allow researchers to decode any individual.” …