AGBT 2010 - Stacey Gabriel, Broad Institute
Applications of New Sequencing Technology to Medical and Cancer Genetics
Testing the full range of allele frequencies
* High frequency polymorphisms = 90% of heterozygosity
* Low frequency polymorphisms = 9%
* Rare mutations = <1%
As the cost of generating sequence drops, the ability to generate sequence rises quickly, so we can now work on rare variants more effectively.
1. 1000 Genomes
2. Exome sequencing
3. Genome and exome sequencing in cancer - the hardest set
1000 Genomes:
* Shotgun and targeted sequencing in reference samples
* Very large collaborative project
* Should capture all variation down to 0.5% allele frequency.
* Will enable better array development in the future
Analytic pipeline
* From unmapped reads to genetic variation.
1. data production
2. mapping (qc and calibration)
3. pilots (shallow or deep)
4. Bayesian modeling
5. Variant calling
(check out poster on this)
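To make steps 4 and 5 concrete, here is a minimal sketch of Bayesian genotype calling at a single site. It is not the 1000 Genomes pipeline: the binomial read model, the `error_rate` and `het_prior` defaults, and the function names are all assumptions chosen for illustration.

```python
import math


def genotype_posteriors(ref_count, alt_count, error_rate=0.01, het_prior=0.001):
    """Posterior probability of each diploid genotype at one site.

    Assumed model (illustrative only): each read independently shows the
    alt allele with a probability that depends on the genotype.
    """
    # Probability that a single read shows the alt allele, per genotype.
    p_alt = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1.0 - error_rate}
    # Illustrative priors; real pipelines estimate these from data.
    priors = {
        "hom_ref": 1.0 - 1.5 * het_prior,
        "het": het_prior,
        "hom_alt": 0.5 * het_prior,
    }
    log_post = {}
    for genotype, p in p_alt.items():
        # The binomial coefficient is the same for every genotype, so it
        # cancels on normalization and is omitted here.
        log_lik = alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        log_post[genotype] = math.log(priors[genotype]) + log_lik
    # Normalize in log space to avoid underflow at high depth.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / total for g, v in log_post.items()}


def call_genotype(ref_count, alt_count, **kwargs):
    """Return the genotype with the highest posterior probability."""
    post = genotype_posteriors(ref_count, alt_count, **kwargs)
    return max(post, key=post.get)
```

Under these assumed priors, 10 reference and 10 alternate reads give a confident heterozygous call, while one read of each allele still favours homozygous reference. That is one way to see why shallow and deep pilots need different treatment.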
Pilot results:
* SNPs (18M SNPs, 50% are novel)
* FDR of novel variants <10%
* 1.5M short indels (Validation under review)
* CNVs (~10,000)
* All of this will be in dbSNP 131
What does it mean?
* Now, 90-95% of SNPs called in any genome have already been discovered
* dbSNP is growing a LOT in 131.
Moving on to de novo mutations (or rare mutations)
* Exome sequencing ($5k -> $1k per sample in the future)
* Whole genome ($50k -> $10k per sample in future)
There will always be a need to do targeted exomes.
Goals of sequencing exomes at scale
* Cover >95% of the alignable exome at 10% or less of the cost of whole genome sequencing
* Remove sample prep capacity as a bottleneck
Sequencing more individual molecules reduces the chance that PCR errors become high-quality SNP calls.
Targets: v.1.1
* 18,560 Genes
* 188,260 targets
* 32.7 Mb target territory
* 473k baits
* 120-base bait length
* 41 Mb bait territory
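A quick arithmetic check on the v.1.1 numbers above. The figures are the ones in the notes ("473k" is rounded, so results are approximate). Reading the ratio of total bait sequence to bait territory as tiling overlap is my assumption, not something stated in the talk.

```python
def bait_design_summary(
    genes=18_560,
    targets=188_260,
    target_territory_bp=32.7e6,
    baits=473_000,
    bait_length_bp=120,
    bait_territory_bp=41e6,
):
    """Derive simple ratios from the target/bait figures in the notes."""
    total_bait_bp = baits * bait_length_bp
    return {
        # Average size of one target (roughly one exon-sized interval).
        "mean_target_bp": target_territory_bp / targets,
        "targets_per_gene": targets / genes,
        # Sum of all bait lengths, counting overlapping bases repeatedly.
        "total_bait_bp": total_bait_bp,
        # >1 means baits overlap each other on average (assumed reading).
        "bait_redundancy": total_bait_bp / bait_territory_bp,
        # >1 means baits extend beyond the targets themselves.
        "bait_to_target_ratio": bait_territory_bp / target_territory_bp,
    }


summary = bait_design_summary()
```

With the default values this gives about 174 bases per target, about 10 targets per gene, 56.76 Mb of total bait sequence over 41 Mb of bait territory (about 1.4x), and a bait territory about 1.25 times the target territory.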
Exome coverage: mean coverage 150x
* Mean of 15k SNPs, 88% concordance (1 SNP per 1,700 bases)
* Scale-up: 1,500 selections every month
* should go towards 8000 per month
* planning to do 5000 by end of year
Applications
* First example: pilot - Extreme LDL-C
* 6 children with high LDL (95th percentile LDL)
* Previously screened by other labs for the usual genes, nothing found
* Called SNPs: ended up with 16,00; not in dbSNP: 784; not in 1000 Genomes: 512; not in 60 controls: 286
* Of those 286: synonymous 105, missense 170, premature stop 4, splice site 7.
* Looked at the burden on genes: no smoking guns; however, there are interesting glimpses.
* 2 of the individuals had 2 mutations missed in earlier screening
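The successive filtering described above (drop anything seen in dbSNP, then 1000 Genomes, then the 60 controls) can be sketched as repeated set subtraction. This is a hypothetical illustration: the `(chrom, pos, ref, alt)` variant key and the function name are my own, not from the talk. The class tally at the end uses the counts from the notes.

```python
def filter_novel(called, *known_sets):
    """Remove known variants stage by stage.

    `called` and each entry of `known_sets` are sets of hashable variant
    keys, e.g. (chrom, pos, ref, alt). Returns the surviving variants and
    the count remaining after each stage (index 0 is the starting count).
    """
    remaining = set(called)
    counts = [len(remaining)]
    for known in known_sets:
        remaining -= known
        counts.append(len(remaining))
    return remaining, counts


# Functional classes of the 286 surviving variants, as given in the notes.
CLASS_COUNTS = {
    "synonymous": 105,
    "missense": 170,
    "premature_stop": 4,
    "splice_site": 7,
}
# The four classes account for every surviving variant.
assert sum(CLASS_COUNTS.values()) == 286
```

Calling `filter_novel(called, dbsnp, thousand_genomes, controls)` returns the per-stage counts in the same order as the bullet above.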
Another example, 5 families, 6 samples - Low LDL
* Found one gene with 2 stop mutations. Has been implicated in other LDL studies.
* Sequencing data came off this weekend.
Cancer:
* New mutations that occur only in tumour and not in matched normals.
* Whole genome shotgun sequencing.
* Looking at all types of changes
* 120 genomes (60 tumour/normal pairs)
(Circos diagrams!)
* Metrics: mutation rate, copy number, inter-chromosomal, intra-chromosomal
* Won't do well until we start looking at similar tumours of one type
* Need to sequence ~500 samples to get good statistical power
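The core idea in this section, keeping only mutations seen in the tumour and not in the matched normal, can be sketched as below. This is a simplified illustration, not the Broad's caller: the minimum-depth rule, its default of 8 reads, and all names are assumptions.

```python
def somatic_candidates(tumour_calls, normal_calls, normal_depth, min_normal_depth=8):
    """Return tumour variants that are absent from the matched normal.

    tumour_calls / normal_calls: sets of (chrom, pos, ref, alt) tuples.
    normal_depth: dict mapping (chrom, pos) to read depth in the normal.
    A variant is only kept if the normal was covered well enough that a
    germline variant there would plausibly have been seen (assumed rule).
    """
    candidates = set()
    for variant in tumour_calls:
        if variant in normal_calls:
            continue  # present in the normal, so germline rather than somatic
        chrom, pos, _ref, _alt = variant
        if normal_depth.get((chrom, pos), 0) < min_normal_depth:
            continue  # normal too shallow here to rule out a germline variant
        candidates.add(variant)
    return candidates
```

The depth check matters because a site that is simply uncovered in the normal would otherwise look like a tumour-only mutation.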
Multiple Myeloma:
* Have enough samples to do this.
* Incurable - median survival ~4 years
* 26 pairs whole genome
* 17 pairs whole exome
* point mutations: avg 40 mutations per sample (95% validation)
* rearrangements: 200 candidates per sample (30-50% validation)
* Nothing recurrent (rearrangements)
* Mutation count - 1.5 mutations per Mb.
* Significantly mutated genes: Yes.
* NRAS, KRAS, TP53, etc.
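For reference, a mutations-per-Mb figure is just a count divided by the territory examined. The notes do not say which territory the 40-per-sample and 1.5-per-Mb figures refer to, so the 32.7 Mb exome target territory used below is purely an assumption for illustration.

```python
def mutations_per_mb(mutation_count, territory_bp):
    """Mutation rate expressed per megabase of sequenced territory."""
    return mutation_count / (territory_bp / 1e6)


def expected_mutations(rate_per_mb, territory_bp):
    """Number of mutations expected at a given per-Mb rate."""
    return rate_per_mb * (territory_bp / 1e6)


# Assumed territory: the 32.7 Mb exome target set from earlier in the notes.
EXOME_TERRITORY_BP = 32.7e6
rate_from_count = mutations_per_mb(40, EXOME_TERRITORY_BP)
count_from_rate = expected_mutations(1.5, EXOME_TERRITORY_BP)
```

Over that assumed territory, 40 mutations works out to about 1.2 per Mb, and 1.5 per Mb corresponds to about 49 mutations, so the two figures in the notes are at least the same order of magnitude.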
Pathway analysis also shows some good information: coagulation pathway (novel finding)
Found known mutations
* Found activating mutations in BRAF
* Found coagulation pathway
* DIS3 mutations and FAM46C mutations (25% of cases)
* Mutations in HOX modifying genes
* IRF4 mutations (two identical mutations) - known to be required for MM survival
Sequencing is getting easier and cheaper.... Firehose analogy.
Challenges:
* analysis and interpretation
* data quality
Labels: AGBT 2010