Network and network diagnostics with M value based data (center, batch, plate row and column variables removed)

Check if the network is scale free. To do that I need to plot clustering coefficient against network connectivity.

To calculate clustering coefficient:

  1. Use the Bioconductor "graph" library
  2. Calculate the positive adjacency matrix:
    Code Block
    
    adjM<-abs(cor(t(data)))
    
  3. A chunk of the adjacency matrix gives a good idea of the clustering coefficient and significantly speeds up the calculations, so Chris suggested taking a 4k by 4k matrix and using hard thresholding. I used the first 4k columns and rows.

Take a look at the distribution of the small adjacency matrix:
Image Added

From the conversation with Chris, it seems that I have very few strong correlations, and most of them are not that strong. Therefore, when we try to use a scale-free network (raising each correlation to the power of beta), the values become even smaller, which impedes clustering into modules.
The hard threshold was 0.46 on the 4k by 4k matrix to get about 1% of the nodes (adjMat4000).
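
To illustrate why raising weak correlations to a power shrinks them so much, here is a minimal sketch on made-up numbers. The values in `r`, the power `beta_power = 6` and the simulated matrix `x` are assumptions for illustration only, not the real data. The quantile step assumes that "about 1%" refers to pairs of loci, which is my reading rather than something stated above.

```r
# Hypothetical absolute correlations (not taken from the real data set)
r <- c(0.10, 0.30, 0.46, 0.90)
beta_power <- 6                      # illustrative soft-thresholding power

soft <- r^beta_power                 # soft thresholding: weak values shrink toward 0
hard <- as.numeric(r >= 0.46)        # hard thresholding: edge present (1) or absent (0)
print(data.frame(r = r, soft = signif(soft, 3), hard = hard))

# Picking a hard cutoff that keeps roughly the top 1% of pairwise correlations
set.seed(1)
x <- matrix(rnorm(200 * 50), nrow = 200)          # 200 simulated loci x 50 samples
adj <- abs(cor(t(x)))
off_diag <- adj[upper.tri(adj)]                   # each pair counted once
cutoff <- quantile(off_diag, probs = 0.99)
kept <- mean(off_diag >= cutoff)                  # fraction of pairs retained
```

With the soft approach a correlation of 0.3 is reduced to well under 0.001, so a matrix dominated by weak correlations ends up with almost no usable edge weight.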

  1. To use it with the graph library, set the diagonal of the matrix to 0 (no self-looping nodes!)
  2. Create an instance of the class graph and calculate the clustering coefficient:
    Code Block
    
    g4000<-new("graphAM",adjMat=adjMat4000)
    cc4000<-clusteringCoefficient(g4000)
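
As a cross-check that does not need the graph package, the steps above can be sketched in base R. This is a swapped-in calculation of the standard local clustering coefficient (triangles through a node divided by the number of neighbour pairs), not the package's own code. The helper name `local_cc`, the simulated matrix `dat` and the 90% cutoff are assumptions for illustration. Nodes with fewer than two neighbours are returned as NA here; how the graph package reports such nodes is not assumed.

```r
# Local clustering coefficient from a 0/1 symmetric adjacency matrix with zero diagonal
local_cc <- function(A) {
  k <- rowSums(A)                           # connectivity (degree) of each node
  triangles <- diag(A %*% A %*% A) / 2      # closed 3-step walks / 2 = triangles per node
  ifelse(k >= 2, triangles / (k * (k - 1) / 2), NA_real_)
}

set.seed(42)
dat <- matrix(rnorm(100 * 30), nrow = 100)  # 100 simulated loci x 30 samples (hypothetical)
adj <- abs(cor(t(dat)))                     # positive adjacency matrix

cutoff <- quantile(adj[upper.tri(adj)], 0.90)  # illustrative cutoff, not the 0.46 used above
A <- (adj >= cutoff) * 1                    # hard threshold: 1 = edge, 0 = no edge
diag(A) <- 0                                # no self-looping nodes

cc <- local_cc(A)
summary(cc)
```

The matrix product is cubic in the number of nodes, so for a 4k by 4k matrix this is slower than a dedicated graph routine, but it is handy for checking a small subset.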
    

To calculate connectivity, get the sum of each column of the hard-thresholded matrix (adjMat4000):

Code Block
kk<-apply(adjMat4000,2, sum)

Finally, get rid of all values in cc4000 and kk that are equal to 0 (which I didn't do for this plot) and make a scatter plot:

Image Added
If I average cc over k, I get an approximately straight line, which is evidence of a scale-free network (Barabasi, 2004). However, this plot is only part of the evaluation. I actually need to plot P(k) vs k to get a definite answer.
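
A minimal sketch of the P(k) vs k check mentioned above, on simulated connectivities. The vector `kk_sim` is a made-up heavy-tailed stand-in for kk, and binning on the log10 scale with 10 bins is one possible choice, not necessarily what any package does. A roughly linear relationship between log10 P(k) and log10 k with a negative slope is what a power law would look like.

```r
set.seed(7)
# Hypothetical connectivities standing in for kk (heavy-tailed on purpose)
kk_sim <- round(10 * runif(2000)^(-1 / 1.5))

# Empirical P(k): fraction of nodes falling into each connectivity bin
bins <- cut(log10(kk_sim), breaks = 10)
p_k <- as.numeric(table(bins)) / length(kk_sim)
mean_log_k <- as.numeric(tapply(log10(kk_sim), bins, mean))

keep <- p_k > 0                         # empty bins cannot be log-transformed
log_p <- log10(p_k[keep])
log_k <- mean_log_k[keep]

fit <- lm(log_p ~ log_k)                # straight-line fit on the log-log scale
slope <- unname(coef(fit)[2])
r2 <- summary(fit)$r.squared
# plot(log_k, log_p); abline(fit)      # uncomment to view the fit
```

The same binning can be reused for the clustering coefficient, for example `tapply(cc4000, bins, mean, na.rm = TRUE)` once `bins` is built from the real kk.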

Another thing that was interesting to check is the variance of CpG loci vs connectivity. Apparently, in gene expression there is a linear relationship between these properties: more variable genes tend to be more connected. Take the same 4k by 4k matrix, get the var() value for each CpG and plot it against connectivity. Also, plot the mean value of each CpG across all patients in relation to the connectivity:

Image Added Image Added
It is interesting to see that with DNA methylation the most connected genes are actually the least variable across patients. Does it really make biological sense? Do we expect this with this type of data? Could it be a reflection of the technology rather than biology? (DNA methylation in itself is not a continuous trait, although it might become one if we have many cells and a heterogeneous population of them.) What if we take the most variable CpGs and build a network out of them rather than all 27k?
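
The variance and mean comparison described above, and the idea of restricting the network to the most variable CpGs, can be sketched as follows. Everything here runs on a simulated matrix `dat`; the 95% cutoff and the choice of the top 100 CpGs are arbitrary illustration values, not recommendations.

```r
set.seed(3)
dat <- matrix(rnorm(300 * 40), nrow = 300)      # 300 simulated CpGs x 40 patients (hypothetical)

adj <- abs(cor(t(dat)))
A <- (adj >= quantile(adj[upper.tri(adj)], 0.95)) * 1   # illustrative hard threshold
diag(A) <- 0
kk <- colSums(A)                                # connectivity per CpG

cpg_var <- apply(dat, 1, var)                   # variance of each CpG across patients
cpg_mean <- rowMeans(dat)                       # mean of each CpG across patients

# Rank-based association between connectivity and variance / mean
rho_var <- cor(kk, cpg_var, method = "spearman")
rho_mean <- cor(kk, cpg_mean, method = "spearman")
# plot(kk, cpg_var); plot(kk, cpg_mean)         # uncomment to draw the scatter plots

# Keep only the most variable CpGs before building a network
top <- order(cpg_var, decreasing = TRUE)[1:100]
dat_top <- dat[top, ]
```

A rank correlation is used because connectivity is a count and the relationship need not be linear.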

Cytoscape view of the 4k network. Color: CpG variance (darker = more variable), size = number of connections

Image Added

WGCNA

R version 2.13.0 64 bit, SageBionetworksCoex 0.11, running time ~4.5 hours (~27k by 486)

Image Added   Image Added Image Added

beta=6, R=0.89. Number of modules: 18 + 1 (grey):

|| Module || # of probes || Module || # of probes ||
| black | 108 | pink | 97 |
| blue | 1206 | purple | 89 |
| brown | 615 | red | 113 |
| cyan | 43 | salmon | 46 |
| greenyellow | 86 | tan | 85 |
| grey | 22115 | turquoise | 2151 |
| grey60 | 33 | yellow | 469 |
| lightcyan | 34 | lightgreen | 32 |
| magenta | 92 | green | 121 |
| midnightblue | 43 | | |

Analysis of the first PC of each module:

Image Added Image Added

With M values, almost no modules were composed of CpGs from a single chromosome (chromosomal density plots can be found here)

In modules lightcyan, lightgreen, salmon and yellow, >55% of the loci came from the same chromosome (plots are here)
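
One way to compute the share of loci coming from the most represented chromosome in each module is sketched below. The data frame `ann` and its column names `module` and `chr` are hypothetical. The real annotation table is not shown on this page, so the toy rows are made up purely to exercise the code.

```r
# Hypothetical annotation: one row per CpG with its module colour and chromosome
ann <- data.frame(
  module = c("salmon", "salmon", "salmon", "salmon", "cyan", "cyan", "cyan", "cyan"),
  chr    = c("chr6", "chr6", "chr6", "chr1", "chr1", "chr2", "chr3", "chr4"),
  stringsAsFactors = FALSE
)

tab <- table(ann$module, ann$chr)                # loci per module and chromosome
top_share <- apply(prop.table(tab, 1), 1, max)   # largest single-chromosome fraction per module
dominated <- names(top_share)[top_share > 0.55]  # modules above the 55% mark
print(round(top_share, 2))
```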

Gene Ontology analyses of each module (the top 10 categories were selected; KEGG pathways didn't work for me on that day):

 Network and network diagnostics based on M value data (normalization is similar to the one above but center was retained)

The reason for going to the trouble of another round of normalization without removing the center is a comment from Justin, who suggested that center may be associated with patients who come from different genetic backgrounds (different CNV profiles). If this is the case, then by removing the center we suck out the "genetic" component, and this could be the reason why we don't see the characteristic clustering of CpGs within almost every module defined by the comethylation network. This was seen with the data for which only the methylated probes were used (see here)

Image Added Image Added Image Added

Beta is 7, R^2=0.89. Number of modules = 8 

|| Module || # of loci ||
| black | 41 |
| blue | 938 |
| brown | 408 |
| green | 135 |
| pink | 34 |
| red | 45 |
| turquoise | 1750 |
| yellow | 159 |

Percent variance explained by the first PC of each module and variability of the first PC of each module:

Image Added  Image Added
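
For reference, the percent variance explained by the first PC of a module can be obtained with base R `prcomp`, as in the sketch below. The module matrix `module_dat` is simulated. Whether the original analysis centered or scaled the loci before the decomposition is not stated here, so the `center`/`scale.` settings are assumptions.

```r
set.seed(11)
# Hypothetical module: 50 CpGs x 40 patients sharing one latent signal plus noise
signal <- rnorm(40)
module_dat <- t(sapply(1:50, function(i) signal * runif(1, 0.5, 1.5) + rnorm(40, sd = 0.5)))

# prcomp expects observations in rows, so patients go in rows and CpGs in columns
pca <- prcomp(t(module_dat), center = TRUE, scale. = TRUE)

pct_var <- pca$sdev^2 / sum(pca$sdev^2) * 100   # percent variance explained by each PC
pc1 <- pca$x[, 1]                               # first PC: one value per patient

round(pct_var[1], 1)                            # share explained by the first PC
var(pc1)                                        # variability of the first PC across patients
```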

...