Matthew Bainbridge, Baylor College of Medicine - “Human Variant Discovery Using DNA Capture Sequencing”
Overview: technology + pipeline, then genome pilot 3, SNP calling, verification.
Use solid phase capture – Nimblegen array + 454 sequencing.
Map with BLAT and cross_match. SNP calling with ATLAS-SNP.
All manner of SNP filtering:
1. Remove duplicates with the same location.
2. Then filter on p value.
3. More.. [missed it]
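The first two filtering steps can be sketched roughly as below. This is only an illustration of the idea, not the ATLAS-SNP code: the `Read` and `Call` records, the use of strand in the duplicate key, and the 0.05 cutoff (and keeping calls at or below it) are all my assumptions, since the talk didn't give those details.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical mapped read; field names are illustrative only."""
    chrom: str
    start: int
    strand: str
    seq: str


class Call(NamedTuple):
    """Hypothetical SNP call with an attached p value."""
    chrom: str
    pos: int
    alt: str
    p_value: float


def remove_duplicates(reads):
    """Keep only the first read seen at each (chrom, start, strand) location."""
    seen = set()
    unique = []
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


def filter_by_p_value(calls, max_p=0.05):
    """Keep calls whose p value is at or below max_p (cutoff is an assumption)."""
    return [call for call in calls if call.p_value <= max_p]
```

Duplicate removal runs on the mapped reads before calling, and the p value filter runs on the resulting calls.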
226 samples of 400.
Rebalanced arrays: some exons pull down too much, and others grab less. You can change the concentrations to compensate, and then use the rebalanced array.
Average coverage (depth) came down, but overall coverage (how much of the target gets covered) went up. Much less skew with the rebalanced array. 3% of the target region just can't be sequenced. 90% of sequence ends up covered at 10x or better.
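To make the "average came down, overall went up" point concrete, here is a small sketch of the summary numbers involved, computed from per-base depths over the target. The function name and the toy depth lists are made up for illustration; they are not data from the talk.

```python
def coverage_summary(depths, min_depth=10):
    """Summarize per-base depths over a target region.

    Returns the mean depth, the fraction of positions covered at
    min_depth or better, and the fraction with no coverage at all.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("no target positions given")
    return {
        "mean_depth": sum(depths) / n,
        "fraction_at_min_depth": sum(d >= min_depth for d in depths) / n,
        "fraction_uncovered": sum(d == 0 for d in depths) / n,
    }


# Made-up toy depths: a skewed array piles reads onto a few positions,
# while a rebalanced array spreads fewer reads more evenly.
skewed = [100, 80, 2, 0, 3, 5, 90, 1, 4, 15]
rebalanced = [20, 18, 12, 0, 11, 14, 22, 10, 13, 16]

skewed_summary = coverage_summary(skewed)
rebalanced_summary = coverage_summary(rebalanced)
```

In the toy data the rebalanced list has a lower mean depth but more positions at 10x or better, which is the pattern described above.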
Started looking at SNPs – frequency across individuals.
Interested in ataxia, a hereditary neurological disorder. Did 2 runs in the first pilot test on 2 patients. Now do 4. Found 18,000 variants. Found one in the gene named for that disease, which turned out to be novel and non-synonymous. Followed up on it, and it looked good, so they sequenced it in the rest of the family, but it didn't actually exist outside that patient.
So that brings us to validation. Concordance with HapMap, etc., only tells you about false negatives, not false positives. You have to learn about false positives with other methods, but the traditional ones can't do high throughput. So, to verify, they suggest using other platforms: 454 + SOLiD.
When they're done, you get good concordance, and the false positives drop out. The interesting question is “do you need high quality in both techniques?” The answer seems to be no. You just need high quality in one... but do you need even that? Apparently not: you can do this with two low quality runs from different platforms. Call everything a SNP (errors, whatever.. call it all a SNP). When you do that and then build your concordance, you get very good SNP calls! (60% are found in dbSNP.)
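The cross-platform trick boils down to a set intersection, so here is a minimal sketch of it. It assumes each platform's permissive call set is a collection of (chromosome, position, alternate base) tuples; that representation and the function names are mine, not from the talk.

```python
def concordant_calls(calls_a, calls_b):
    """Return the calls made on both platforms.

    Each input is an iterable of (chrom, pos, alt) tuples called
    permissively ("call everything a SNP") on one platform.
    """
    return set(calls_a) & set(calls_b)


def fraction_known(calls, known_sites):
    """Fraction of calls that also appear in a set of known sites (e.g. dbSNP)."""
    calls = set(calls)
    if not calls:
        return 0.0
    return len(calls & set(known_sites)) / len(calls)
```

The idea is that platform-specific errors rarely land on the same site with the same base on two different chemistries, so they fall out of the intersection while real variants survive.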
My Comments: Nifty.
Labels: AGBT 2009