...
Our strategy, therefore, is:
- leverage the UCLA-WGCNA package for the "common" steps, 1 to 5, for a significant performance gain
- provide the user a parameter choice at step 5 to do "tree cutting" in the manner of either the Sage algorithm or the UCLA-WGCNA algorithm
- provide two algorithms for step 6 (module merging), allowing the user to choose the Sage or the UCLA-WGCNA algorithm
- leverage the UCLA-WGCNA dendrogram/module plotting algorithm in step 9
- maintain the Sage algorithms for the Sage-specific post-processing, i.e., steps 7 and 8 and the heat maps in step 9.
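Tree cutting and module merging are independent user choices, so the control flow can be sketched as a small dispatcher. This is a minimal Python illustration, not the package's R code. The functions `choose` and `run_pipeline`, the `"sage"`/`"wgcna"` keys, and the `shared`, `cutters` and `mergers` arguments are all hypothetical stand-ins for whatever parameters the package actually exposes.

```python
from typing import Any, Callable, Dict, List

Labels = List[int]


def choose(step: str, method: str,
           registry: Dict[str, Callable[..., Any]]) -> Callable[..., Any]:
    """Look up the implementation the user selected for one pipeline step."""
    if method not in registry:
        raise ValueError(
            f"unknown {step} method {method!r}; choose from {sorted(registry)}")
    return registry[method]


def run_pipeline(expr: Any,
                 shared: Callable[[Any], Any],
                 cutters: Dict[str, Callable[[Any], Labels]],
                 mergers: Dict[str, Callable[[Any, Labels], Labels]],
                 cut_method: str = "wgcna",
                 merge_method: str = "wgcna") -> Labels:
    # Hypothetical dispatcher: real parameter names and method keys may differ.
    # Resolve both choices first, so a bad method name fails before the
    # expensive shared work is done.
    cut = choose("tree-cutting", cut_method, cutters)
    merge = choose("merging", merge_method, mergers)
    tree = shared(expr)          # shared steps: correlation, TOM, clustering
    labels = cut(tree)           # user-selected tree cutting
    return merge(tree, labels)   # user-selected module merging
```

Both choices are resolved before `shared` is called. A misspelled method name therefore fails immediately, rather than after the costly correlation and TOM steps.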
UCLA-WGCNA dependencies
Sage software dependencies
...
External dependencies
WGCNA::cor -- the compiled/accelerated Pearson correlation computation
WGCNA::pickSoftThreshold -- choice of the scale-free (soft-thresholding) exponent
WGCNA::TOMdist -- the compiled/accelerated TOM (topological overlap matrix) dissimilarity computation
flashClust::flashClust -- the compiled/accelerated hierarchical clustering computation
WGCNA::scaleFreePlot -- log-log plot of the network connectivity distribution, with regression line
dynamicTreeCut::cutreeDynamic -- tree cutting / module determination
WGCNA::plotDendroAndColors -- plots a dendrogram with the colored module assignments aligned underneath
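For reference, the quantities computed by `WGCNA::cor` and `WGCNA::TOMdist` can be written out directly. The plain-Python sketch below stands in for the compiled R routines and makes four assumptions:

- The network is unsigned, with soft-threshold adjacency a_ij = |cor(x_i, x_j)|^beta.
- TOM uses the standard topological overlap definition: TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = sum over u != i, j of a_iu * a_uj, and k_i = sum over u != i of a_iu.
- Genes are rows. WGCNA's own functions expect samples as rows.
- It returns the dissimilarity 1 - TOM, which is what the hierarchical clustering consumes.

The sketch is cubic in the number of genes and only shows the arithmetic.

```python
import math
from typing import List

Matrix = List[List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(genes: List[List[float]], beta: float) -> Matrix:
    """Unsigned soft-threshold adjacency |cor|^beta; the diagonal is set to 1."""
    n = len(genes)
    return [[1.0 if i == j else abs(pearson(genes[i], genes[j])) ** beta
             for j in range(n)] for i in range(n)]


def tom_dissimilarity(a: Matrix) -> Matrix:
    """1 - TOM, with TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(a)
    # Connectivity k_i excludes the self-adjacency on the diagonal.
    k = [sum(a[i][u] for u in range(n) if u != i) for i in range(n)]
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # TOM_ii = 1, so the dissimilarity stays 0
            # l_ij: adjacency-weighted count of neighbours shared by i and j
            shared = sum(a[i][u] * a[u][j] for u in range(n) if u not in (i, j))
            tom = (shared + a[i][j]) / (min(k[i], k[j]) + 1.0 - a[i][j])
            d[i][j] = 1.0 - tom
    return d
```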
Sage software dependencies
module merging, based on an analysis of the most highly connected genes in each module
tree cutting to a fixed number of clusters, for defining sample modules
computation of within- and between-module per-gene connectivity statistics
heatmap generation for correlation and TOM matrices
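The per-gene connectivity statistics can be sketched in a few lines of Python. `kTotal`, `kWithin`, `kOut` and `kDiff` follow the column names of WGCNA's `intramodularConnectivity`.

The merging rule at the end is a hypothetical stand-in for the Sage merging step, whose exact criterion is not described here. It merges two modules when the adjacency between their hub genes (the genes with the largest `kWithin`) reaches an assumed `threshold`. The names `connectivity_stats`, `hub_genes` and `merge_by_hubs` are illustrative.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def connectivity_stats(adj: Matrix,
                       labels: Sequence[int]) -> List[Dict[str, float]]:
    """Per-gene total, within-module and between-module connectivity."""
    n = len(adj)
    stats = []
    for i in range(n):
        k_total = sum(adj[i][u] for u in range(n) if u != i)
        k_within = sum(adj[i][u] for u in range(n)
                       if u != i and labels[u] == labels[i])
        k_out = k_total - k_within
        stats.append({"kTotal": k_total, "kWithin": k_within,
                      "kOut": k_out, "kDiff": k_within - k_out})
    return stats


def hub_genes(stats: List[Dict[str, float]],
              labels: Sequence[int]) -> Dict[int, int]:
    """Index of the gene with the largest kWithin per module (ties: first gene)."""
    hubs: Dict[int, int] = {}
    for i, lab in enumerate(labels):
        if lab not in hubs or stats[i]["kWithin"] > stats[hubs[lab]]["kWithin"]:
            hubs[lab] = i
    return hubs


def merge_by_hubs(adj: Matrix, labels: Sequence[int],
                  threshold: float) -> List[int]:
    """Hypothetical rule: merge modules whose hub genes are adjacent >= threshold."""
    hubs = hub_genes(connectivity_stats(adj, labels), labels)
    parent = {m: m for m in hubs}  # union-find over module labels

    def find(m: int) -> int:
        while parent[m] != m:
            m = parent[m]
        return m

    mods = sorted(hubs)
    for x, m1 in enumerate(mods):
        for m2 in mods[x + 1:]:
            if adj[hubs[m1]][hubs[m2]] >= threshold:
                r1, r2 = find(m1), find(m2)
                if r1 != r2:
                    parent[max(r1, r2)] = min(r1, r2)  # keep the smaller label
    return [find(lab) for lab in labels]
```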
Package functions
The main 'points of entry' into the package are:
performCoexFromFiles - run the entire pipeline, taking a file of gene expression data as input
performCoexpressionAnalysis - run the analysis portion of Coexpression, taking a data frame as input
clusterGenes - run the rote, time-consuming portion of Coexpression, taking a data frame as input
modulesFromGeneTree - choose modules from a gene dendrogram, taking the output of 'clusterGenes' as input
clusterSamples, intraModularStatistics - analysis steps auxiliary to module determination
createDiagnosticPlots - create dendrogram, heatmap plots, etc., from the results of the coexpression analysis
Package features
Comparison to the Sage code base
Correlation computation is faster, with identical results.
TOM computation is faster, with identical results.
Hierarchical clustering is faster, with identical results.
Scale-free exponent (beta) determination is similar, with very similar results and regression statistics.
"Dynamic tree cutting" algorithm is the same, with very similar results.
Diagnostic plot set is reduced from 12 to 8, omitting redundant plots.
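The beta comparison rests on a scale-free fit index, computed in three steps:

1. For a candidate exponent, compute each gene's connectivity k_i = sum over j != i of |cor_ij|^beta.
2. Bin the connectivities and regress log10 of the bin frequency p(k) on log10 k.
3. Sign the fit's R^2 so that a negative slope counts as positive.

The Python sketch below is a simplified illustration, not `pickSoftThreshold` itself. It drops empty bins, assumes every connectivity is positive, and uses an illustrative cutoff of 0.85 when picking the smallest qualifying beta.

```python
import math
from typing import List, Optional, Sequence


def connectivity(cor: List[List[float]], beta: float) -> List[float]:
    """k_i = sum over j != i of |cor_ij|^beta (unsigned network)."""
    n = len(cor)
    return [sum(abs(cor[i][j]) ** beta for j in range(n) if j != i)
            for i in range(n)]


def scale_free_fit(k: Sequence[float], n_bins: int = 10) -> float:
    """Signed R^2 of log10 p(k) against log10 k; positive for a negative slope.

    Assumes all k > 0. Simplification: empty bins are dropped.
    """
    lo, hi = min(k), max(k)
    if hi == lo:
        return 0.0  # no spread in connectivity: nothing to fit
    width = (hi - lo) / n_bins
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for v in k:
        bins[min(int((v - lo) / width), n_bins - 1)].append(v)
    xs = [math.log10(sum(b) / len(b)) for b in bins if b]  # mean k per bin
    ys = [math.log10(len(b) / len(k)) for b in bins if b]  # bin frequency p(k)
    if len(xs) < 3:
        return 0.0  # too few populated bins for a meaningful line
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx < 1e-12 or syy < 1e-12:
        return 0.0  # degenerate fit
    r2 = sxy * sxy / (sxx * syy)
    return -r2 if sxy > 0 else r2  # sign(sxy) is the sign of the slope


def pick_beta(cor: List[List[float]], betas: Sequence[float],
              r2_cut: float = 0.85) -> Optional[float]:
    """Smallest beta whose signed scale-free R^2 reaches r2_cut, else None."""
    for beta in sorted(betas):
        if scale_free_fit(connectivity(cor, beta)) >= r2_cut:
            return beta
    return None
```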
Additional features
Option to perform tree cutting and/or subsequent merging by the UCLA-WGCNA algorithm or by the 'Sage classic' algorithm.
Separation of rote, time-consuming steps (correlation, TOM computation) from tree cutting.
Separation of analysis from plotting.
Separation of analysis from the file system, to facilitate Synapse integration.
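Separating the rote clustering from tree cutting can be illustrated with a fixed-number-of-clusters cut. This is in the spirit of R's `cutree(tree, k = ...)` and the Sage fixed-cluster-number cutting. The plain-Python sketch has two steps:

- The expensive step builds an average-linkage merge history once from a dissimilarity matrix, such as 1 - TOM.
- The cheap step replays the first n - k merges to produce k clusters, so it can be rerun with different k without reclustering.

The function names are hypothetical. Neither dynamic tree cutting nor flashClust's optimized algorithm is reproduced here.

```python
from typing import Dict, List, Tuple

Merge = Tuple[int, int, float]  # (surviving cluster id, absorbed cluster id, height)


def average_linkage(dist: List[List[float]]) -> List[Merge]:
    """Expensive step: naive average-linkage (UPGMA) clustering, run once."""
    n = len(dist)
    clusters: Dict[int, List[int]] = {i: [i] for i in range(n)}  # id -> leaves
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x, a in enumerate(ids):
            for b in ids[x + 1:]:
                ma, mb = clusters[a], clusters[b]
                # average of all pairwise dissimilarities between the two clusters
                h = sum(dist[i][j] for i in ma for j in mb) / (len(ma) * len(mb))
                if best is None or h < best[2]:
                    best = (a, b, h)
        a, b, h = best
        clusters[a] = clusters[a] + clusters.pop(b)  # merged cluster keeps id a
        merges.append((a, b, h))
    return merges


def cut_to_k(n: int, merges: List[Merge], k: int) -> List[int]:
    """Cheap step: replay the first n - k merges to obtain exactly k clusters."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of leaves")
    label = list(range(n))
    for a, b, _ in merges[: n - k]:
        label = [a if lab == b else lab for lab in label]
    ids: Dict[int, int] = {}
    return [ids.setdefault(lab, len(ids)) for lab in label]  # renumber 0..k-1
```

Average linkage produces non-decreasing merge heights. Replaying the first n - k merges therefore gives the same partition as cutting the tree at the height that leaves k clusters, ties aside.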
Evaluation
| Dataset | # Probes | # Samples | Sage Time | Sage Space | Package Time | Package Space | Sage Beta | Package Beta | Gene trees same? | Module difference (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Female mouse liver | 3600 | 135 |  |  |  |  |  |  |  |  |
| Methylation (gene subset) | ?? | 555 |  |  |  |  |  |  |  |  |
| Colon cancer (small gene subset) | ?? | 322 |  |  |  |  |  |  |  |  |
| Cranio | 2534 | 249 |  |  |  |  |  |  |  |  |
| Methylation (full set) | 27,578 | 5555 |  |  |  |  |  |  |  |  |
| Colon cancer (large gene subset) |  | 322 |  |  |  |  |  |  |  |  |
| Human liver cohort | 40,102 | 427 |  |  |  |  |  |  |  |  |
...