AGBT 2010 - Carlos Bustamante - Stanford University School of Medicine
Complete Genome Sequencing and Analysis of a Diploid African-American and Mexican-American Genome: Implications for Personal Ancestry Reconstruction and Multi-Ethnic Medical Genomics
Motivation and Objectives:
* GWAS has been successful
* Many traits, however, are not being explained by GWAS
* Understanding rare and common genetic variants will require multi-ethnic sequencing
Goals:
* Resequence two admixed genomes to high coverage
* compare to population and demographic models
* Understand diversity [?... missed this point]
Recap:
* Establish resource for studying human population genetics, recent demography and admixture
* Using Affymetrix 500k
* 500 samples
You can cluster by ancestry using principal component analysis (PCA).
* Admixing: S. Asian and Mexican.
* Using PC 3 and PC 4, you get a huge amount of diversity from Native Americans that isn't sampled by current chips.
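To make the PCA clustering idea concrete, here is a minimal sketch, not the pipeline used in the talk. It assumes genotypes coded as 0/1/2 allele counts and recovers only the first principal component by power iteration; the function name and the toy data are hypothetical, and real analyses (including the PC 3 / PC 4 plots mentioned above) would use a full eigendecomposition on hundreds of thousands of SNPs.

```python
import math
import random

def first_pc_scores(genotypes, iterations=200, seed=0):
    """Return each sample's coordinate on the first PC (up to scale and sign).

    genotypes: list of samples, each a list of 0/1/2 allele counts per SNP.
    Assumption: a toy power iteration on the sample-by-sample Gram matrix.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Center each SNP column on its mean across samples.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # n x n matrix of dot products between centered samples.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k]))
             for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical toy data: two groups of three samples, six SNPs.
toy = [
    [2, 2, 2, 0, 0, 0], [2, 1, 2, 0, 0, 1], [2, 2, 1, 0, 1, 0],
    [0, 0, 0, 2, 2, 2], [0, 1, 0, 2, 2, 1], [0, 0, 1, 2, 1, 2],
]
scores = first_pc_scores(toy)
```

In this toy case the first three samples land on one side of zero and the last three on the other, which is the sense in which samples "cluster by ancestry".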
Think about approaches that are "dated ribbon?"
* Proportion of African ancestry: P = b / (a + b)
[I'm really missing stuff in this talk - it's very quick, and I know nothing about admixing.... will read up on that later.]
Approach
* PCA along windows across genome
* Use HMM for Admixture Estimation in African Americans
* This identifies "ancestry switchpoints", which seem to be crossover events that skew towards one or the other ancestry within the same chromosome.
* Multiple events in one chromosome are possible.
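The HMM step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the model from the talk: it treats one haploid chromosome, takes a noisy per-window ancestry call (for example from windowed PCA) as the observation, and uses made-up values for the call accuracy (`p_correct`) and per-window switch probability (`p_switch`). Viterbi decoding then smooths the calls and the state changes are reported as switchpoints.

```python
import math

def viterbi_ancestry(calls, states=("AFR", "EUR"), p_correct=0.9, p_switch=0.05):
    """Most likely ancestry per window given noisy per-window calls.

    Assumptions (hypothetical): symmetric emissions and transitions,
    uniform prior on the first window, a single haploid chromosome.
    """
    others = len(states) - 1

    def log_emit(state, call):
        return math.log(p_correct if state == call else (1 - p_correct) / others)

    def log_trans(prev, cur):
        return math.log(1 - p_switch if prev == cur else p_switch / others)

    score = {s: math.log(1 / len(states)) + log_emit(s, calls[0]) for s in states}
    back = []
    for call in calls[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            best_prev = max(states, key=lambda prev: score[prev] + log_trans(prev, cur))
            new_score[cur] = score[best_prev] + log_trans(best_prev, cur) + log_emit(cur, call)
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def switchpoints(path):
    """Window indices where the decoded ancestry differs from the previous window."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]

# Hypothetical calls: an isolated EUR call inside an AFR block, then a real EUR block.
calls = ["AFR"] * 5 + ["EUR"] + ["AFR"] * 4 + ["EUR"] * 8
path = viterbi_ancestry(calls)
print(switchpoints(path))
```

With these parameters the single stray call is absorbed into the surrounding block, while the long run is kept as a genuine switch. Several switchpoints on one chromosome fall out naturally, since nothing limits how often the decoded path changes state.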
Individual ancestry results:
* You get a lot of variation in content across single chromosomes, you can then quantify this amount.
* Latin Americans, however, are all over the place - they are really mosaic.
Great variety in amount of ancestry and location of breakpoints.
Take home message:
Personal ancestry reconstruction, including detection of admixture tracts, is feasible on a genome-wide scale.
How to improve the ability to deconvolute this using sequencing:
* use reference human genome samples sequenced with SOLiD.
* includes 1000 Genomes data
New Tools:
* "STRUCTURE 2.0-like" algorithm -- J. Degenhardt
Use Reconstruction - shows each chromosome in different colors representing which of the ancestries is likely at each window.
* Can see small regions. Are they important? Are they real? Do they matter?
Haplotype-based admixture deconvolution
* can reveal fine-scale admixture.
* Seems the signals (small regions) are real.
* Lots of small regions (segments) of diverse ancestry signatures in the genome.
* Do they happen at hotspots?
Looked at the Mexican-American genome
* Many more switch points than previous example (African-American)
* [Sums up history.... ]
Distribution of Ancestry switches is used to compare
* can look at history of mixture - correlates to length of mix of population
* Scales as (1 + k) / (1 + theta).
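To illustrate why the distribution of switches carries information about admixture history, here is a rough sketch. It does not implement the (1 + k) / (1 + theta) scaling noted above, whose symbols were not captured. Instead it assumes a common single-pulse approximation: if an ancestry contributed a fraction m of the population T generations ago, its tracts are roughly exponentially distributed with rate (1 - m) * T per Morgan, so T is about 1 / ((1 - m) * mean tract length). The function name and the numbers are hypothetical.

```python
def generations_since_admixture(tract_lengths_morgans, ancestry_fraction):
    """Estimate generations since a single admixture pulse.

    Assumption: tract lengths (in Morgans) for one ancestry are exponential
    with rate (1 - ancestry_fraction) * T, so T = 1 / ((1 - m) * mean length).
    """
    mean_len = sum(tract_lengths_morgans) / len(tract_lengths_morgans)
    return 1.0 / ((1.0 - ancestry_fraction) * mean_len)

# Hypothetical tracts: long tracts suggest recent admixture, short tracts older admixture.
recent = generations_since_admixture([0.10, 0.20, 0.30], ancestry_fraction=0.5)
older = generations_since_admixture([0.02, 0.04, 0.06], ancestry_fraction=0.5)
print(round(recent), round(older))
```

Under this simplification, a genome with many more switch points (shorter tracts) points to a longer history of mixing, which matches the qualitative contrast drawn in the notes.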
Can use this information to find the "time to most recent common ancestor" (TMRCA)
Extrapolate this to show lengths of time for whole chromosomes and genomes.
* TMRCA varies dramatically along the genome.
* Also fits nicely with SNP work that people have done (dbSNP mentioned as well.)
Functional implications
* Discovered ~10,000 nonsynonymous (NS) SNPs in each genome (varies by individual)
** Some might be deleterious...
* Functional annotation of nsSNPs using PolyPhen
* Show that admixed populations share more SNPs, I think.
* SNPs that are probably damaging are highest in CEPH and MEX.
[Moving very fast over a bunch of slides showing the same message - no notes here.]
Bottleneck in European founding population - Europeans show more deleterious SNPs.
Demographic models explain difference in dN/dS
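For readers unfamiliar with the statistic, dN/dS compares nonsynonymous to synonymous changes after normalizing each by the number of sites where such a change could occur. The sketch below shows only that standard definition; the counts are invented for illustration and are not from the talk.

```python
def dn_ds(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """Ratio of per-site nonsynonymous to per-site synonymous change rates."""
    dn = nonsyn_changes / nonsyn_sites
    ds = syn_changes / syn_sites
    return dn / ds

# Hypothetical counts: 10 nonsynonymous changes over 1000 sites,
# 20 synonymous changes over 500 sites.
ratio = dn_ds(10, 1000, 20, 500)
print(ratio)
```

A ratio well below 1 indicates that nonsynonymous changes are being removed relative to synonymous ones; a population that went through a bottleneck can carry a higher ratio because selection is less efficient at purging mildly deleterious variants.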
Conclude:
* 3M SNPs for each genome
* 10k nsSNPs
* Thinking about demographic history is important
* They've really only been working on one small bit of the diversity of the human genome. More will be necessary moving forward for medical applications.