Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: http://blogs.nature.com/fejes - Please come visit my blog there.

Monday, May 25, 2009

Can't we use ChIP-chip controls on *-Seq?

Thanks to Nicholas, who left this comment on my web page this morning, in reference to my post on controls in Second-Gen Sequencing:
Hi Anthony,

Don't you think that controls used for microarray (expression
and ChIP-chip) are well established and that we could use
these controls with NGS?

Cheers!

I think this is a valid question, and one that should be addressed. My committee asked me the same thing during my comprehensive exam, so I've had a chance to think about it. Unfortunately, I'm not a statistics expert, or a ChIP-chip expert, so I would really value other people's opinion on the matter.

Anyhow, I think the answer has to be put in perspective: Yes, we can learn from the statistics used for ChIP-chip and arrays, but no, they're not directly applicable.

Both ChIP-chip and array experiments are based on hybridization to a probe - which makes them cheap and reasonably reliable. Unfortunately, it also leads to a much lower dynamic range, since the signal saturates at the high end and can be undetectable at the low end of the spectrum. This alone is a key difference. What signal would be detected from a single hybridization event on a microarray?
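
As a rough illustration of the saturation argument, here is a toy sketch. The floor and ceiling values, the `molecules_per_read` depth parameter, and both function names are made up for the example; this is not a model of any real array or sequencer.

```python
# Toy comparison of a clipped hybridization signal and a read count.
# All numbers and names here are hypothetical, chosen only for illustration.

def array_signal(abundance, floor=50, ceiling=60000):
    """Hybridization-style readout: clipped to the range [floor, ceiling]."""
    return min(max(abundance, floor), ceiling)

def seq_signal(abundance, molecules_per_read=100):
    """Sequencing-style readout: an integer read count that scales with depth.

    A smaller molecules_per_read stands in for deeper sequencing.
    """
    return abundance // molecules_per_read

abundances = [10, 1_000, 100_000, 1_000_000]
array_values = [array_signal(a) for a in abundances]
seq_values = [seq_signal(a) for a in abundances]

print(array_values)  # the two highest abundances give the same clipped value
print(seq_values)    # the two highest abundances remain distinguishable
```

Note that in this sketch the lowest abundance gives zero reads at the chosen depth, so the low end of the sequencing range depends on how deeply you sequence.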

Additionally, the resolution of a ChIP-chip probe is vastly different from that of a sequencing reaction. In ChIP-Seq or RNA-Seq, we can get unique signals for sequences whose start locations differ by only one base, and these should then be interpreted differently. With ChIP-chip, the resolution is closer to 400bp windows, and thus the statistics take that into account.
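
To make the resolution difference concrete, here is a minimal sketch that counts the same read start positions at single-base resolution and in 400bp windows. The positions are invented, and `bin_positions` is a hypothetical helper, not part of any ChIP-Seq package.

```python
from collections import Counter

def bin_positions(starts, window):
    """Count read starts per window of the given width (in bp)."""
    return Counter(pos // window for pos in starts)

# Hypothetical read start coordinates, all within one 400bp window (800-1199).
read_starts = [800, 801, 801, 950, 1199]

per_base = bin_positions(read_starts, 1)     # one bin per distinct start
per_probe = bin_positions(read_starts, 400)  # one bin per 400bp window

print(len(per_base))   # number of distinct single-base signals
print(len(per_probe))  # number of probe-sized windows with any signal
```

At single-base resolution the five reads fall into four distinct bins, while the 400bp view collapses them into one, which is why statistics tuned to window-level signals don't carry over directly.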

Another reason I think the statistics are vastly different is the way we handle the data itself when setting up an experiment. With arrays, you run the same experiment several times and treat the results as replicates, in order to quantify the variability (deviation and error) between them. With second-generation sequencing, we pool the results from several different lanes, meaning we always have N=1 in our statistical analysis.
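
A small sketch of the N=1 problem, using invented numbers: replicate array measurements give you a standard deviation, while a single pooled read count does not. The variable names and values are assumptions made for the example.

```python
from statistics import StatisticsError, mean, stdev

# Hypothetical array intensities for one probe across three replicate experiments.
array_replicates = [10.2, 9.8, 10.5]
rep_mean = mean(array_replicates)
rep_sd = stdev(array_replicates)  # variability between replicates

# Hypothetical read counts at one site from three lanes, pooled into one number.
lane_counts = [120, 135, 128]
pooled = sum(lane_counts)

# With a single pooled observation there is nothing to estimate a deviation from.
try:
    pooled_sd = stdev([pooled])
except StatisticsError:
    pooled_sd = None

print(rep_mean, rep_sd)
print(pooled, pooled_sd)
```

The standard library refuses to compute a sample standard deviation from one data point, which is the situation pooling puts us in.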

So, yes, I think we can learn from other methods of statistical analysis, but we can't blindly apply the statistics from ChIP-chip and assume they'll correctly interpret our results. The more statistics I learn, the more I realize how many assumptions go into each method - and how much more work it is to get the statistics right for each type of experiment.

At any rate, these are the three most compelling reasons that I have, but they certainly aren't the only ones. If anyone would like to add more reasons, or tell me why I'm wrong, please feel free to add a comment!

2 Comments:

Anonymous said...

Anthony,

i think it is very inaccurate to state that chip-seq has a higher dynamic range than chip-chip.

the dynamic range of chip-seq strongly depends on the sequencing depth or the relative number of mapped reads per genome.

this is obviously not a fixed number and therefore one should not advertise a chipseq feature the way the industry advertises it.

second: the 'validation' of chipseq by chipchip and/or qpcr is nothing but a technical validation (can you detect the same piece of DNA with a different tech). it is important to point out that a biological validation is not at all provided by a control chipchip or qPCR.

August 26, 2009 2:46:00 AM PDT  
Anthony Fejes said...

Hi Anonymous,

I somewhat disagree with your comments. Yes, the dynamic range of ChIP-seq is dependent on the coverage depth, but that becomes less and less of an issue as the sequencing technology becomes better and better. At this point, one lane of sequencing (Illumina) is roughly equivalent to 4-5 lanes of sequencing from about a year ago. Surely that depth is sufficient to give you a good dynamic range.

Point 1b - I'm not advertising anything. This blog is my own personal opinion, and should be considered as such. There are no commercial sponsors for it, and the $50/year it takes to keep it running comes from my own pocket.

Second point: I appreciate your comment, but I'll let the readers decide if they believe qPCR is validation of a binding site or not. While you may not be able to detect the same piece of DNA twice (technical validation), in my mind, confirming the location by two separate techniques may be interpreted as a strong indication of the biological phenomenon.

August 26, 2009 8:59:00 AM PDT  
