Analysis of chromosomal positions of CpGs within each module (CpGchromosome)

We want to understand the mechanisms behind the comethylation networks and what drives a CpG to become part of one module rather than another. One hypothesis is chromosomal location: CpGs that belong to the same comethylation module lie on the same chromosome and cluster within it.

When I counted, for each module of a comethylation network, how many of its CpGs fall on each chromosome, I found that in quite a few modules all CpGs come from a single chromosome:

Network   Total # of modules   # of modules with all CpGs from one chr.   # of modules with more than 50% from one chr.
mb        65                   9                                          18
mbc       79                   27                                         25
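The counts in the table can be derived roughly as follows. This is a minimal sketch, not the code used for the table: the data frame `cpg`, its columns `module` and `chr`, and all toy values are hypothetical. The notes do not say whether the "more than 50%" column also counts the single-chromosome modules; here it does.

```r
# Toy annotation: one row per CpG with its module and chromosome (hypothetical data).
cpg <- data.frame(
  module = c("yellow", "yellow", "yellow", "gold", "gold", "gold", "gold", "brown", "brown"),
  chr    = c("X", "X", "X", "X", "X", "X", "1", "1", "2"),
  stringsAsFactors = FALSE
)

# Module-by-chromosome count table.
counts <- table(cpg$module, cpg$chr)

# Largest share of each module's CpGs that comes from a single chromosome.
top.frac <- apply(counts, 1, max) / rowSums(counts)

n.modules  <- nrow(counts)
n.single   <- sum(top.frac == 1)    # all CpGs from one chromosome
n.majority <- sum(top.frac > 0.5)   # more than 50% from one chromosome
                                    # (assumption: includes the single-chromosome modules)
```

Replacing the last line with `sum(top.frac > 0.5 & top.frac < 1)` would count only the modules that are dominated by, but not confined to, one chromosome.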

For each single-chromosome module I also tried to plot where its CpGs lie along that chromosome, compared with all CpGs on that chromosome that are on the platform. My code seems to have some flaws: the CpG frequency per bin (I used 50k) sometimes comes out higher for the module than for the background, which should not be possible. Still, the plots make it obvious that some modules are not only confined to a single chromosome, but their CpGs also cluster within that chromosome (some occupy an entire p or q arm). It is fun to see that chromosome X is covered completely. The yellow module in mb is especially representative (80% of it is chromosome X in both networks), but some chromosome X loci also appear in the gold module (in mb).

mbc:

I also ran a hypergeometric test for enrichment of each chromosome's loci in each module and summarized the results as a heatmap. Each module's membership was tested against each of the 24 chromosomes, and multiple-testing adjustment was done separately for each module rather than across the entire list of modules.
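The test for one module can be sketched as below. This is an illustration under assumptions, not the code behind the heatmap: the function name `enrich.p` and the toy counts are hypothetical, and the notes do not say which adjustment method was used, so Benjamini-Hochberg is assumed here.

```r
# One-sided hypergeometric enrichment of a chromosome in a module.
# k: module CpGs on the chromosome; n: module size;
# K: platform CpGs on the chromosome; N: all platform CpGs.
enrich.p <- function(k, n, K, N) {
  # P(X >= k) when n CpGs are drawn without replacement from N,
  # of which K lie on the chromosome.
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Toy platform: hypothetical CpG counts per chromosome.
platform <- c(chr1 = 500, chr2 = 300, chrX = 200)
# Toy module: hypothetical counts of its CpGs per chromosome.
module   <- c(chr1 = 5, chr2 = 3, chrX = 40)

# One P value per chromosome for this module.
p.raw <- enrich.p(module, sum(module), platform, sum(platform))

# Adjust within the module, i.e. across chromosomes only (BH is an assumption).
p.adj <- p.adjust(p.raw, method = "BH")
```

Repeating this for every module and stacking the `p.adj` vectors as rows gives the module-by-chromosome matrix that a heatmap would display.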

The heatmap on the left shows P values from a Cox proportional hazards model of patient survival with a single module (i.e. its PC1) as the predictor. There are three color categories: red, P value less than or equal to 0.05; orange, P value between 0.05 and 0.1; yellow, P value greater than 0.1.
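A single-module survival model and the color coding could look like this. It is a sketch that assumes the `survival` package; the simulated data and the names `pc1`, `time`, `status` and `p.colour` are hypothetical and do not come from the actual analysis.

```r
library(survival)

set.seed(1)
n <- 100
# Hypothetical data: PC1 of one module, survival time and event indicator.
pc1    <- rnorm(n)
time   <- rexp(n, rate = exp(0.5 * pc1))
status <- rbinom(n, 1, 0.8)

# Cox proportional hazards model with the module's PC1 as the only predictor.
fit <- coxph(Surv(time, status) ~ pc1)
p   <- summary(fit)$coefficients["pc1", "Pr(>|z|)"]

# Map P values to the three heatmap categories:
# red for P <= 0.05, orange for 0.05 < P <= 0.1, yellow for P > 0.1.
p.colour <- function(p) {
  as.character(cut(p, breaks = c(0, 0.05, 0.1, 1),
                   labels = c("red", "orange", "yellow"),
                   include.lowest = TRUE))
}

module.colour <- p.colour(p)
```

Fitting one model per module and passing the vector of P values through `p.colour` yields the color strip for the side heatmap.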

An interesting thing: the platform does have some CpGs on chromosome Y. Besides the grey module, some other modules include these CpGs: brown, honeydew, turquoise and yellow in mb, and blue, green, turquoise and yellow in mbc. What does this mean? Cross-hybridization? Here is a paper that discusses this phenomenon with Illumina's methylation arrays. Something to think about.

To do: fix the CpG density plots. Xia suggested using her code:

#### function to generate bins and counts within bins
gnrt.bin.count <- function(data.set, coord.col, x.min, x.max, x.incr) {
  x <- data.set
  y <- 0
  x.cuts <- floor((x.max - x.min) / x.incr)
  x.pos <- cbind(seq(x.min, x.max, x.incr), seq(x.min, x.max, x.incr) + x.incr)
  y <- apply(x.pos, 1, FUN = function(x, y) { length(y[y >= x[1] & y < x[2]]) }, x[, coord.col])
  return(cbind(x.pos, as.numeric(unlist(y))))
}
## do this for module loci then for all loci
## plot normalized count: module loci bin counts/all loci bin counts
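The normalization described in the last two comments can be sketched as follows. This is not Xia's function: it is an alternative that uses base R's `findInterval` and `tabulate` with the same half-open bins, and the name `bin.counts` and the toy coordinates are hypothetical.

```r
# Count coordinates in fixed-width bins [start, start + width),
# with bin starts at x.min, x.min + width, ..., up to x.max.
bin.counts <- function(pos, x.min, x.max, width) {
  breaks <- seq(x.min, x.max + width, by = width)
  idx <- findInterval(pos, breaks)
  # Drop coordinates that fall before the first or after the last bin.
  idx <- idx[idx >= 1 & idx < length(breaks)]
  tabulate(idx, nbins = length(breaks) - 1)
}

# Toy coordinates: all platform CpGs on a chromosome and the module's subset.
all.pos <- c(10, 20, 30, 60, 70, 120, 130, 140, 160, 210)
mod.pos <- c(10, 20, 120, 210)

all.n <- bin.counts(all.pos, 0, 200, 50)
mod.n <- bin.counts(mod.pos, 0, 200, 50)

# Normalized count per bin; NA where the platform has no CpGs in the bin.
frac <- ifelse(all.n > 0, mod.n / all.n, NA)
```

As long as the module loci are a subset of the platform loci and both are counted with the same bins, `frac` cannot exceed 1, which gives a quick sanity check for the density plots.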

It would also be interesting to take a chromosome and plot CpGs from several modules on it (in different colors) to see whether and how they overlap.
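One way to set this up is a bin-by-module count matrix for a single chromosome. The sketch below makes several assumptions: the data frame `cpg`, its columns, the bin width and the helper `overlap.plot` are hypothetical names and values, and the module names are used directly as plot colors, which only works when they are valid R color names.

```r
# Hypothetical CpGs on one chromosome, with module labels and coordinates.
cpg <- data.frame(
  module = c("yellow", "yellow", "yellow", "gold", "gold", "brown"),
  pos    = c(10, 20, 120, 30, 130, 160)
)
width  <- 50
breaks <- seq(0, 200, by = width)

# Rows are bins [start, end), columns are modules.
bins   <- cut(cpg$pos, breaks = breaks, right = FALSE)
counts <- table(bins, cpg$module)

# Bins in which more than one module has CpGs.
shared <- rowSums(counts > 0) > 1

# Draw one line per module, colored by module name.
overlap.plot <- function(counts, breaks) {
  matplot(head(breaks, -1), unclass(counts), type = "b", pch = 1, lty = 1,
          col = colnames(counts), xlab = "bin start", ylab = "CpG count")
}
```

Calling `overlap.plot(counts, breaks)` draws the per-module profiles on one axis, and `shared` flags the bins where modules overlap.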