Gene overlap between coexpression and comethylation networks

By calculating the correlation between PC1 of the modules in the coexpression and comethylation networks, I found about 50 module pairs that were correlated (Rho cutoff of 0.4). Question: are there any genes in common between the modules of the coexp. and cometh. networks? Will gene membership explain the correlation between coexpression and comethylation modules? To check this, the following steps were performed:

  1. Map Gene Symbols from the coexpression network to ENTREZ Gene IDs using the Bioconductor package org.Hs.eg.db : org.Hs.egSYMBOL2EG. Out of 12042 gene symbols I was able to map 11244. Code for this mapping can be found here: trunk/
  2. Use Bioconductor package IlluminaHumanMethylation.db to map CpG IDs to the nearest gene as ENTREZ Gene ID.
  3. Calculate the overlap between each possible pair of modules from the coexp. and cometh. networks. Count only unique genes in common. The overlap is calculated as the ratio of the unique genes in common to the size of the smaller module in the pair being compared (the module size includes the count of non-unique Gene IDs).
  4. To calculate P values for the significance of each overlap I need to calculate "simulation overlaps". The hypergeometric test won't work in this scenario, because multiple CpGs map to a single ENTREZ Gene ID and, on top of that, CpGs that map to the same gene often end up in different comethylation modules because their methylation status is different. This surprised me, because I read earlier that CpGs within 600 nt of each other are correlated (have similar methylation status). Is there a distance trend from the TSS that would explain these differences in methylation? Anything else? Calculation of the simulated overlap: for each module in the coexp. network, select n genes from the methylation data with replacement, where n is the size of the cometh. module for which the overlap is being calculated. Repeat this sampling 1000 times. The P value is the number of iterations in which the simulated overlap is equal to or higher than the observed overlap for that pair of modules, divided by the number of iterations (1000). The code for this simulation can be found here: trunk/
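
Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration in Python and not the code stored under trunk/. The function names, the `background` argument (assumed here to be the per-CpG list of ENTREZ Gene IDs from the methylation data, repeats included) and the fixed random seed are all assumptions made for the example.

```python
import random


def overlap_ratio(coexp_genes, cometh_genes):
    """Unique shared genes divided by the size of the smaller module.

    Module sizes count repeated Gene IDs, as described in step 3.
    """
    shared = set(coexp_genes) & set(cometh_genes)
    smaller = min(len(coexp_genes), len(cometh_genes))
    return len(shared) / smaller if smaller else 0.0


def simulated_p_value(coexp_genes, cometh_genes, background,
                      n_iter=1000, seed=0):
    """Empirical P value for the observed overlap of one module pair.

    background: hypothetical list of Gene IDs from the methylation data,
    one entry per CpG, sampled with replacement.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    observed = overlap_ratio(coexp_genes, cometh_genes)
    n = len(cometh_genes)  # sample size equals the cometh. module size
    hits = 0
    for _ in range(n_iter):
        sample = rng.choices(background, k=n)  # sampling with replacement
        if overlap_ratio(coexp_genes, sample) >= observed:
            hits += 1
    return observed, hits / n_iter
```

With 1000 iterations the smallest non-zero P value this can report is 0.001, so a result of 0 only means "below the resolution of the simulation".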

Results: mb

Number of overlapping genes between every possible pair of modules from the two networks:


There are very few pairs for which the overlap is relatively large. These pairs represent the largest modules and the "garbage": turquoise and grey.

Percent of overlapping genes (the number of overlapping genes divided by the size of the smaller module in the analyzed pair, multiplied by 100) and associated p values:
 
The majority of the overlaps appeared to be non-significant according to the results of the simulation. This is shown on the right panel with the histogram of the p values calculated for every pair of modules. 

Heatmaps showing percent overlap and the associated p values for every pair:

 

Analysis of the overlap for the correlated modules of the coexpression and comethylation networks:

Rho=0.4

Rho=0.5

Rho=0.6

One thing that is very clear from the last 6 histograms: as the Rho cutoff becomes more stringent, the percentage of overlap increases and the p values tend to be smaller.

The list of correlated coexpression and comethylation modules, together with the number of overlapping genes and associated p values can be found here

Analysis of comethylation modules and genes whose expression is correlated with DNA methylation (mb network)

Next question: of the genes that are shared between highly correlated comethylation and coexpression modules, how many are actually correlated with DNA methylation?

I pulled out the list of gene/CpG cis pairs and found the genes that are correlated with the nearest CpG site. With a P value cutoff of 0.005, the total number of pairs is 6146 (I am not sure about the Rho value and probably need to check it). Next I mapped the CpGs to their comethylation modules: 2344 out of 6146 belong to the grey module. The total number of genes overlapping between correlated (abs(Rho)>=0.4) coexpression and comethylation modules is 904. I found that 872 of them are also correlated with the methylation status of the nearby CpG (323 genes have negative correlation and 549 have positive correlation). This means that the genes in common between correlated coexpression and comethylation modules (a very small number of them) also correlate with the methylation status of CpGs.
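
The counting in the paragraph above can be sketched as follows. This is an illustration only: the tuple layout `(gene, rho, p_value)`, the function name and the use of `<=` at the cutoff are assumptions, and it assumes one cis pair per gene (the nearest CpG).

```python
def count_correlated_shared(shared_genes, cis_pairs, p_cutoff=0.005):
    """Count shared genes with negative and positive gene/CpG correlation.

    shared_genes: Gene IDs common to correlated module pairs.
    cis_pairs: hypothetical iterable of (gene, rho, p_value) tuples,
    one per gene (nearest CpG only).
    """
    shared = set(shared_genes)
    negative, positive = set(), set()
    for gene, rho, p_value in cis_pairs:
        if gene not in shared or p_value > p_cutoff:
            continue  # not a shared gene, or fails the P value cutoff
        if rho < 0:
            negative.add(gene)
        elif rho > 0:
            positive.add(gene)
    return len(negative), len(positive)
```

Applied to the real data, the two counts should add up to the number of shared genes that pass the cutoff (here 323 + 549 = 872).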

I also checked whether these overlapping/correlating genes have high K.in in their comethylation modules. Although I haven't saved the picture, the distribution of K.in values (expressed as percentiles) was pretty much uniform. Conclusion: the overlapping/correlating genes don't tend to be highly connected.
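
Since the picture was not saved, the check could be redone along these lines. This is a sketch under assumptions, not the original analysis: `kin_by_node` is a hypothetical mapping from node (CpG or gene) to its K.in within a single comethylation module, and the percentile is taken as the share of module members with K.in less than or equal to the node's value.

```python
from bisect import bisect_right


def kin_percentiles(kin_by_node):
    """Percentile rank (0-100] of each node's K.in within one module."""
    values = sorted(kin_by_node.values())
    n = len(values)
    return {node: 100.0 * bisect_right(values, k) / n
            for node, k in kin_by_node.items()}


def decile_counts(percentiles):
    """Count percentiles per decile; a flat profile suggests uniformity."""
    counts = [0] * 10
    for p in percentiles:
        counts[min(int(p // 10), 9)] += 1  # 100 goes into the top bin
    return counts
```

Computing the percentiles per module and then passing only the overlapping/correlating genes to `decile_counts` gives a table that should be roughly flat if these genes are not enriched among highly connected nodes.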