AGBT and Sequencability
First of all, I figured I'd try to do some blogging from AGBT while I'm there. I don't know how effective it'll be, or even how real-time, but we'll give it a shot. (Wireless in Linux on the Vostro 1000 isn't particularly reliable, and I don't even know how accessible internet will be.)
Second, what I wrote yesterday wasn't very clear, so I thought I'd take one more stab at it.
Sequencability (or mappability) is a direct measure of how well you'll be able to sequence a genome using short reads. Thus, by definition, how well de novo sequencing of a genome works is going to be a direct result of that genome's sequencability. Unfortunately, when people talk about sequencability, they usually talk about it in terms of "X% of the genome is sequencable", which really just means "sequencability is not zero for X% of the genome."
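To make that concrete, here's a minimal sketch (in Python, on a made-up toy genome) of one way to compute sequencability: the fraction of read start positions whose read-length k-mer occurs exactly once in the genome. It's deliberately naive: forward strand only, no reverse complements, no sequencing errors, so treat it as an illustration rather than a real mappability tool.

```python
from collections import Counter

def sequencable_fraction(genome, read_len):
    """Fraction of read start positions whose read-length k-mer
    occurs exactly once in the genome (i.e. would map uniquely).
    Toy model: forward strand only, exact matches, no errors."""
    kmers = [genome[i:i + read_len]
             for i in range(len(genome) - read_len + 1)]
    counts = Counter(kmers)
    return sum(1 for k in kmers if counts[k] == 1) / len(kmers)

# Made-up toy genome containing a 4 bp repeat ("ACGT" at two places).
genome = "ACGTTGCAACGTAGGC"
print(f"{100 * sequencable_fraction(genome, 4):.1f}% sequencable at 4 bp")  # 84.6%
```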
Worse, even if sequencability is not zero, that doesn't mean you can actually generate all of the sequences you need (even if you could do 100% random fragments, which we can't), which means that much of the genome inside that magical "X% sequencable" figure is still really not assemblable. (Wow, that's such a bad word.)
Fortunately, sequencability is a function of the length of the reads used, and as the read length increases, so does sequencability.
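Continuing the toy sketch above, sweeping the read length shows this directly: the 4 bp repeat that's ambiguous with short reads stops mattering once the reads are long enough to span it.

```python
# Reuses sequencable_fraction() and the toy genome from the sketch above.
for read_len in (2, 4, 8):
    frac = sequencable_fraction(genome, read_len)
    print(f"read length {read_len:2d}: {100 * frac:5.1f}% sequencable")
# read length  2:  46.7% sequencable
# read length  4:  84.6% sequencable
# read length  8: 100.0% sequencable
```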
Thus, there's hope that if we increase the read length of the Illumina machines, or someone else comes up with a way to do longer sequences with the same throughput (e.g. ABI's SOLiD, or 454's GS FLX), the assemblability of the genome will increase accordingly. All of this goes hand in hand: longer reads and better lab techniques both make a big contribution to the end results.
Personally, I think the real answer lies in using a variety of techniques: Paired-End Tags to span difficult-to-sequence areas (e.g. low or zero sequencability regions), and Single-End Tags to get high coverage... and hey, throw in a few BAC and EST reads for good luck. (=
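To show why the paired-end part helps, here's one more toy sketch (same made-up genome, and a very simplistic pairing model: exact matches, same strand, fixed start-to-start insert size): a read landing in a repeat is ambiguous on its own, but a uniquely-mapping mate pins down its placement.

```python
def map_positions(genome, read):
    """All start positions where `read` occurs exactly (forward strand only)."""
    return [i for i in range(len(genome) - len(read) + 1)
            if genome[i:i + len(read)] == read]

def place_pair(genome, read, mate, insert):
    """Keep only placements of `read` with a `mate` hit at the expected
    start-to-start insert distance (toy model: exact match, same strand)."""
    mate_hits = set(map_positions(genome, mate))
    return [p for p in map_positions(genome, read) if p + insert in mate_hits]

genome = "ACGTTGCAACGTAGGC"                    # same toy genome as above
print(map_positions(genome, "ACGT"))           # [0, 8] -- ambiguous alone
print(place_pair(genome, "ACGT", "AGGC", 12))  # [0]    -- the mate resolves it
```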
Labels: assembly, Bioinformatics, Grad School, mappability, sequencability, Solexa/Illumina