AGBT 2010 - Manuel Garber - Broad
Annotating LincRNA Transcripts Using Targeted Sequencing
Goal: Identify functional large ncRNAs in the mammalian genome
* look like mRNA, but non-coding
* Use ChIP-Seq to separate the genome into regions
* use tiling arrays, hybridize RNA...
* Tiling arrays - no information about connectivity, limited resolution
* studying the functions of lincRNAs requires precise sequences for both experimental and computational analyses.
Use RNA-Seq protocol to build transcriptome
What RNA-Seq gives you:
* RNA, map to genome
* introns... junction reads.
* use reads whose mate falls in the poly-A tail to find the transcript end.
Used TopHat to align.
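The poly-A trick above can be sketched roughly as follows. This is a minimal illustration, not the pipeline from the talk: it assumes each read pair is given as the genomic end coordinate of the aligned mate plus the sequence of the other mate, and the thresholds (`min_len`, `min_frac`) and the 50-bp binning window are hypothetical choices.

```python
from collections import Counter


def is_poly_a(seq: str, min_len: int = 15, min_frac: float = 0.9) -> bool:
    """Call a mate 'poly-A' if it is long enough and mostly A (or T, read from the other strand).

    min_len and min_frac are illustrative thresholds, not values from the talk.
    """
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    frac_a = seq.count("A") / len(seq)
    frac_t = seq.count("T") / len(seq)
    return max(frac_a, frac_t) >= min_frac


def candidate_3prime_ends(pairs, window: int = 50) -> Counter:
    """Count poly-A-supported read pairs per genomic bin.

    pairs: iterable of (aligned_mate_end, other_mate_seq) tuples (hypothetical input format).
    Bins with many supporting pairs are candidate transcript 3' ends.
    """
    support = Counter()
    for aligned_end, mate_seq in pairs:
        if is_poly_a(mate_seq):
            support[aligned_end // window * window] += 1
    return support


pairs = [
    (10_020, "A" * 30),          # clean poly-A mate
    (10_045, "A" * 24 + "C"),    # poly-A mate with one mismatch
    (52_300, "ACGT" * 5),        # ordinary mate, ignored
]
print(dict(candidate_3prime_ends(pairs)))
```

Binning is the simplest way to pool nearby evidence; a real implementation would also use strand and read orientation.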
Junction reads:
* Longer reads provide junction evidence
* first, use only reads that align with a gap. (Build connectivity map)
* topology map
* use map with ChIP-Seq data to build "paths"
* use paths to call transcripts
* clean up with paired-end data -> join fragments or kill unlikely isoforms.
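The connectivity-map and path steps can be illustrated with a toy graph. This is a hedged sketch of the general idea, not the statistical method presented in the talk: it assumes exons are already known as `(start, end)` tuples on one strand, treats each spliced (gapped) read as evidence for an edge between two exons, drops edges with fewer than a hypothetical `min_support` reads, and enumerates every source-to-sink path as a candidate isoform. The paired-end clean-up step is not shown, and single-exon transcripts (no junctions) are not represented.

```python
from collections import defaultdict


def build_connectivity_graph(junctions, min_support=2):
    """Build a directed exon graph from junction read counts.

    junctions: dict mapping (donor_exon, acceptor_exon) -> number of gapped reads.
    Edges supported by fewer than min_support reads are dropped (hypothetical filter).
    """
    graph = defaultdict(list)
    for (donor, acceptor), count in junctions.items():
        if count >= min_support:
            graph[donor].append(acceptor)
    for donor in graph:
        graph[donor].sort()  # deterministic, coordinate-ordered successors
    return graph


def enumerate_paths(graph):
    """Return every source-to-sink path through the (acyclic) exon graph.

    Each path is an ordered tuple of exons, i.e. one candidate isoform.
    """
    targets = {acc for succs in graph.values() for acc in succs}
    sources = sorted(set(graph) - targets)
    paths = []

    def walk(node, path):
        succs = graph.get(node, [])  # .get avoids creating new defaultdict entries
        if not succs:
            paths.append(tuple(path))
            return
        for nxt in succs:
            walk(nxt, path + [nxt])

    for src in sources:
        walk(src, [src])
    return paths


e1, e2, e3, e4 = (100, 200), (300, 400), (500, 600), (700, 800)
junctions = {(e1, e2): 10, (e2, e4): 8, (e1, e3): 5, (e3, e4): 6, (e2, e3): 1}
print(enumerate_paths(build_connectivity_graph(junctions)))
```

Enumerating all paths grows exponentially in the worst case, which is one reason a later step that joins or kills isoforms using extra evidence such as paired-end reads is needed.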
Example:
* Mouse ES
* Illumina sequencing (156M 76-bp reads)
* 75% exonic alignment
* correctly reconstructs most expressed known genes at single-nucleotide resolution.
* works even on overlapping genes.
* 81% of genes fully reconstructed
* Good recovery of genes at all expression levels.
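The talk does not define "fully reconstructed", so the following is only one plausible reading, offered as a sketch: a multi-exon reference transcript counts as fully reconstructed if some assembled transcript has exactly the same intron chain, which is what single-nucleotide resolution at splice sites implies. Coordinates are assumed to be `(start, end)` exon intervals with end-exclusive ends; the function names are hypothetical.

```python
def intron_chain(exons):
    """Introns implied by a list of (start, end) exon intervals, as (exon_end, next_exon_start) pairs."""
    exons = sorted(exons)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def fraction_fully_reconstructed(references, assemblies):
    """Fraction of multi-exon reference transcripts whose intron chain appears exactly among the assemblies.

    Single-exon references are skipped: their empty intron chain would match any single-exon assembly.
    Terminal exon ends are allowed to differ, since transcript ends are rarely exact.
    """
    multi_exon = [ref for ref in references if len(ref) > 1]
    if not multi_exon:
        return 0.0
    assembled_chains = {intron_chain(a) for a in assemblies}
    hits = sum(intron_chain(ref) in assembled_chains for ref in multi_exon)
    return hits / len(multi_exon)


references = [
    [(100, 200), (300, 400), (500, 600)],
    [(1000, 1100), (1200, 1300)],
    [(5000, 5100)],  # single exon, ignored
]
assemblies = [
    [(90, 200), (300, 400), (500, 650)],  # same introns, different ends -> match
    [(1000, 1100), (1250, 1300)],         # acceptor off by 50 bp -> no match
]
print(fraction_fully_reconstructed(references, assemblies))
```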
Novel Transcripts discovered:
* 800 loci between genes
** 250 of the 317 previously known ES lincRNAs are reconstructed
* 200 loci overlapping genes
** 131 overlap coding exons (making them antisense, for visualization purposes).
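The split into intergenic, gene-overlapping and antisense loci amounts to interval overlap tests. The sketch below is a hypothetical classifier loosely following the categories in these notes; it assumes "antisense" means overlapping an annotated coding exon on the opposite strand, and uses 0-based, end-exclusive coordinates. It is not the classification code used in the talk.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # end-exclusive
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome (strand ignored)."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_locus(locus, coding_exons, gene_spans):
    """Assign a hypothetical category to a novel locus.

    'antisense'   -> overlaps a coding exon on the opposite strand
    'overlapping' -> otherwise overlaps a gene body (e.g. intronic)
    'intergenic'  -> overlaps no annotated gene
    """
    if any(overlaps(locus, e) and e.strand != locus.strand for e in coding_exons):
        return "antisense"
    if any(overlaps(locus, g) for g in gene_spans):
        return "overlapping"
    return "intergenic"


gene = Interval("chr1", 1000, 5000, "+")
exons = [Interval("chr1", 1000, 1200, "+"), Interval("chr1", 4800, 5000, "+")]
for locus in (
    Interval("chr1", 1100, 1500, "-"),  # crosses exon 1 on the other strand
    Interval("chr1", 2000, 2500, "-"),  # intronic
    Interval("chr1", 9000, 9500, "+"),  # between genes
):
    print(classify_locus(locus, exons, [gene]))
```

For genome-scale data an interval tree or sorted sweep would replace the linear `any(...)` scans.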
Are they protein coding genes?
* LincRNAs are probably too small to produce proteins [Strange assumption, IMHO... maybe I'm missing something.]
* 650 of the 800 lincRNAs have no coding potential
* have lower expression levels than coding regions.
* intergenic transcript conservation: similar to that of previously known lincRNAs
* Antisense transcripts: no coding potential
* Antisense expression: very low
* Antisense conservation: slightly more conserved than sense lincRNAs, because of overlap with gene exons
* antisense exons are not conserved.
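The notes do not say how coding potential was scored (conservation-based codon-substitution methods are common for this). As a much cruder stand-in, and clearly not the talk's method, the sketch below measures the longest ATG-to-stop open reading frame on either strand; the 100-codon cutoff is a conventional but arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def longest_orf_codons(seq: str) -> int:
    """Length, in codons (ATG included, stop excluded), of the longest ATG...stop ORF on either strand."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG in frame gives the longest ORF up to the next stop
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best


def has_coding_potential(seq: str, min_codons: int = 100) -> bool:
    """Crude proxy: call a transcript 'coding' if it contains a long ORF (min_codons is an assumption)."""
    return longest_orf_codons(seq) >= min_codons


short_orf = "ATG" + "GCT" * 5 + "TAA"
print(longest_orf_codons(short_orf), has_coding_potential("ATG" + "GCT" * 120 + "TAG"))
```

Random sequence produces short ORFs by chance, which is why a length cutoff alone is weak evidence and conservation-aware scores are preferred.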
What do overlapping transcripts do?
* expression is low
* little or no conservation
* correlation with overlapping transcripts
* Thus: artifacts, noise, fine tuners? other ideas?
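One way to check the "correlation with overlapping transcripts" point is to correlate an antisense transcript's expression with its overlapping gene's expression across samples. The sketch below is a plain Pearson correlation; the sample values are made up for illustration, and the notes do not say which samples or which correlation measure were used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (one value per sample)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression values for one antisense transcript and its overlapping gene in 4 samples.
antisense = [0.5, 1.0, 2.0, 4.0]
gene = [10.0, 20.0, 40.0, 80.0]
print(pearson(antisense, gene))
```

Because expression spans orders of magnitude, values are often log-transformed (e.g. log2(x + 1)) before correlating; with very low antisense expression, read-count noise alone can weaken any correlation.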
Conclusion
* novel statistical method takes advantage of longer reads
* mouse ES coding gene novelties
* intergenic non coding RNA (lincRNA)
* new family of antisense non coding RNA
* validation of 18/20.
Labels: AGBT 2010