
...

Third: Finally, with Brig's help I performed supervised normalization of the DNA methylation data. First, he suggested focusing only on the methylated probes, because the unmethylated probes, like the mismatch probes on old Affymetrix arrays, are confusing and don't provide any additional information.

In fact the unmethylated probes show overall higher intensity than methylated probes (this was first pointed out by Josh Millstein in our discussion of normalization approaches).

...

One can read my Sweave file (filename) for this analysis to see, step by step, what I did and how. In short, I performed PCA on each sub-dataset and identified the technical variables that had the biggest influence on my data. The main one was batch, which was also highly correlated with month and center. After I removed batch, I could still see strange patterns in my data, so I ended up also removing the first principal component. Since I was not sure whether that was appropriate for my further analyses, I proceeded with two datasets: one from which only batch was removed (called mb) and one from which both batch and the first principal component were removed (called mbc). One more thing: I centered the red and green channels before combining them into the final datasets.
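To make the two removal steps concrete, here is a minimal sketch in Python with NumPy (not the original Sweave/R code). It assumes that "removing batch" means subtracting each batch's per-probe mean, which may differ from what the Sweave file actually does, and that the data matrix has samples in rows and probes in columns. The function names and the simulated data are hypothetical.

```python
import numpy as np


def remove_batch_means(x, batch):
    """Subtract each batch's per-probe mean (rows = samples, columns = probes).

    Assumption: batch is removed by simple mean-centering within batch.
    """
    x = np.asarray(x, dtype=float)
    batch = np.asarray(batch)
    out = x.copy()
    for b in np.unique(batch):
        rows = batch == b
        out[rows] -= x[rows].mean(axis=0)
    return out


def remove_first_pc(x):
    """Project the first principal component out of column-centered data."""
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # The rank-one term s[0] * u[:, 0] * vt[0] is the PC1 contribution.
    return centered - s[0] * np.outer(u[:, 0], vt[0])


# Simulated example: 12 samples in 3 batches, 50 probes, with a batch shift.
rng = np.random.default_rng(0)
batch = np.repeat([0, 1, 2], 4)
x = rng.normal(size=(12, 50)) + batch[:, None] * 2.0

mb = remove_batch_means(x, batch)   # batch removed
mbc = remove_first_pc(mb)           # batch and first principal component removed
```

Keeping both mb and mbc, as in the notes above, makes it possible to check later whether removing the first principal component also removed biological signal.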

This is all great, and I agree with Brig's approach because it is intuitive and makes few assumptions. However, now that I am proceeding with data analysis, it is critical for me to figure out which probes are actually methylated and which are not, especially because I don't have any control data. How should I approach it? For M values, methods have been developed for drawing a cutoff (because the data have a distinctive bimodal shape). Should I take the unmethylated probes, process them in the same way as the methylated probes, combine the two to make M values, and apply the existing method (described here) to figure out what is methylated and what is not? Also, Bin has built comethylation networks based on the mb and mbc normalizations. Should I rebuild them with the new M values? Something to think about.
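As a rough illustration of the idea, the sketch below computes M values as the log2 ratio of methylated to unmethylated intensity and then places a cutoff between the two modes. It is written in Python with NumPy and rests on several assumptions: the intensities are positive (raw rather than centered), a small offset is added to stabilize the ratio, and the cutoff comes from a simple one-dimensional two-means split, which is a stand-in and not the existing method referred to above. All names and numbers are hypothetical.

```python
import numpy as np


def m_values(meth, unmeth, offset=1.0):
    """M value: log2 of (methylated + offset) / (unmethylated + offset).

    Assumption: intensities are non-negative; offset avoids division by zero.
    """
    meth = np.asarray(meth, dtype=float)
    unmeth = np.asarray(unmeth, dtype=float)
    return np.log2((meth + offset) / (unmeth + offset))


def two_means_cutoff(values, n_iter=100):
    """Midpoint between the two cluster means of a 1-D two-means split."""
    values = np.asarray(values, dtype=float).ravel()
    lo, hi = values.min(), values.max()
    for _ in range(n_iter):
        cut = (lo + hi) / 2.0
        low_group = values[values <= cut]
        high_group = values[values > cut]
        if low_group.size == 0 or high_group.size == 0:
            break  # all values identical; no meaningful split
        new_lo, new_hi = low_group.mean(), high_group.mean()
        if new_lo == lo and new_hi == hi:
            break  # converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


# Made-up intensities for six probes.
meth = [4000, 3500, 120, 90, 5000, 60]
unmeth = [200, 150, 3000, 2800, 100, 3500]

m = m_values(meth, unmeth)
cutoff = two_means_cutoff(m)
is_methylated = m > cutoff
```

A two-means split only makes sense if the M-value distribution really is bimodal, so it is worth plotting a histogram before trusting the cutoff.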

Important to remember: I didn't adjust the data for age or stage/grade. In the comethylation networks we need to check whether there is any association with age/stage/grade (this is the only biology available to us). I would also like to see a comethylation network built with only the stage III patients, because it is the largest group and I am not sure how much novel information we gain by keeping a few stage I, II and IV outliers.