Searching for SNPs... a disaster waiting to happen.
I'm postponing my planned article, because I just don't feel in the mood to work on that tonight. Instead, I figured I'd touch on something a little more important to me this evening: WTSS (whole transcriptome shotgun sequencing) SNP calls. Well, as my committee members would say, they're not SNPs, they're variations or putative mutations. Technically, that makes them Single Nucleotide Variations, or SNVs. (They're only polymorphisms if they're common to a portion of the population.)
In this case, they're from cancer cell lines, so after I filter out all the real SNPs, what's left are SNVs... and they're bloody annoying. This is the second major project I've done where SNP calling has played a central role. The first was based on very early 454 data, where homopolymer miscalls were frequent, and thus finding SNVs was pretty easy: they were all over the place! After much work, it turned out that pretty much all of them were false positives, and I learned to check for homopolymer runs - a simple trick, easily accomplished by visualizing the data.
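For what it's worth, the homopolymer check doesn't have to be done by eye. Here's a minimal sketch of the idea in Python, not the code I actually used: it looks at the reference sequence around a candidate SNV and flags the call if a long run of a single base sits nearby. The function names, the 5-base flank and the 4-base run threshold are all assumptions picked for illustration.

```python
def longest_homopolymer_near(seq, pos, flank=5):
    """Length of the longest single-base run in the window around pos.

    seq   -- reference sequence as a string
    pos   -- 0-based position of the candidate SNV
    flank -- bases to inspect on each side (illustrative default)
    """
    start = max(0, pos - flank)
    window = seq[start:pos + flank + 1].upper()
    longest = 0
    run = 0
    prev = None
    for base in window:
        # Extend the current run, or start a new one on a base change.
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def is_homopolymer_suspect(seq, pos, flank=5, min_run=4):
    """True if the candidate sits near a run of at least min_run bases.

    The min_run threshold is a guess, not a validated cutoff.
    """
    return longest_homopolymer_near(seq, pos, flank) >= min_run


if __name__ == "__main__":
    reference = "ACGTAAAAAGCTAGC"
    # Position 9 is the G immediately after the AAAAA run.
    print(is_homopolymer_suspect(reference, 9))
```

A flagged call isn't automatically wrong, of course; it just goes on the list of things to look at before anyone starts designing primers.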
We moved on to Illumina after that. Actually, it was still Solexa at the time. Yes, this is older data - nearly a year old. It wasn't particularly reliable, and I've since rerun it with several different aligners, references and so on, each time (I thought) improving the data. We came down to a couple of very intriguing variations, and decided to sequence them. After several rounds of primer design, we finally got one that worked... and lo and behold: 0/2. Neither of them is real. So, now comes the post-mortem: Why did we get the false positives this time? Is it bias from the platform? Bad alignments? Or something even more suspicious... do we have evidence of edited RNA? Who knows. The game begins all over again, in the quest to answer the question "why?" Why do we get unexpected results?
Fortunately, I'm a scientist, so that question is really something I like. I don't begrudge the last year's worth of work - which apparently is now more or less down the toilet - but I hope that the why leads to something more interesting this time. (Thank goodness I have other projects on the go, as well!)
Ah, science. Good thing I'm hooked, otherwise I'd have tossed in the towel long ago.
Labels: Bioinformatics, biology, Sequencing, short-reads, SNP calling, SNPS, Solexa/Illumina