After removing the batch effect as it is described here I used the residuals to build a coexpression network. Used Bruce's coexpression package (speed of WGCNA and wisdom of Bin's code), version SageBionetworksCoex_0.10. Obtained .tar.gz file from Bruce. To install on belltown for myself:
- Create ~/R/library
- Copy SageBionetworksCoex_0.10.tar.gz to ~/R/library
- Type: R CMD INSTALL -l "~/R/library" ~/R/library/SageBionetworksCoex_0.10.tar.gz
- To load the package into R: library(SageBionetworksCoex, lib.loc="~/R/library/")
- To get help: ?SageBionetworksCoex
- To run it using a wrapper function: performCoexFromFiles(filename)
Test with an old Affy expression dataset
For the first time I chose automatic selection of the beta value. Next I wanted to compare the results obtained with the package using the smaller Affy expression dataset for which Bin has already constructed the network. He chose to use beta 3.5 which also produces R^2 of 0.84. I used the same beta when I ran Bruce's package: performCoexFromFiles(filename,beta=3.5). Here is just a visual comparison:
Bin's network | Bruce's package |
---|---|
| |
| |
I realize that it is really not a good way of comparing two networks and I need to compare module membership (Bruce has used this partition algorith: http://florence.acadiau.ca/collab/hugh_public/index.php?title=R:compare_partitions), but Bruce in his analyses showed that the difference is minimal between two methods if the same beta is used. One thing that could be improved in Bruce's package is the colors for the correlation matrix.
Evaluating beta
The biggest problem of everyone who starts working with coexpression networks is to choose beta. My understanding is that the higher beta the higher the connectivity between the genes in the modules. With low beta we get more connections but modules are not as "tight". I guess it is a trade off and I need to read up on that. One of the parameters for evaluation of a coexpression network could be percent of variance explained by the first principal component of a module and the variability of the first principal component. I built 3 networks with three different beta values: 3.5 (I chose it because that was the beta used by Bin for the smaller dataset), 4 (chosen automatically) and 4.5 (just for the sake of it). Here is my evaluation:
beta | R^2 | % variance explained | PC1 | number of modules |
---|---|---|---|---|
3.5 | 0.84 | | | 82 |
4 | 0.91 | | | 78 |
4.5 | 0.94 | | | 74 |