...

Our strategy, therefore, is:

  • leverage the UCLA-WGCNA package for the "common" steps, 1 through 5, gaining significant performance
  • provide the user a parameter choice at step 5, to do "tree cutting" in the manner of either the Sage algorithm or the UCLA-WGCNA algorithm
  • provide two algorithms for step 6 (module merging), allowing the user to choose the Sage or the UCLA-WGCNA algorithm
  • leverage the UCLA-WGCNA dendrogram/module plotting algorithm in step 9
  • maintain the Sage algorithms for the Sage-specific post-processing, i.e. step 7, step 8, and the heat maps in step 9.

UCLA-WGCNA dependencies

Sage software dependencies

...

External dependencies

WGCNA::cor  -- the compiled/accelerated Pearson correlation computation

WGCNA::pickSoftThreshold -- optimal choice of scale-free exponent

WGCNA::TOMdist -- the compiled/accelerated TOM computation

flashClust::flashClust -- the compiled/accelerated hierarchical clustering computation

WGCNA::scaleFreePlot -- scatter plot of network connectivity, with regression line

dynamicTreeCut::cutreeDynamic -- tree cutting / module determination

WGCNA::plotDendroAndColors -- graphic function to plot dendrogram with colored modules aligned underneath
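To make the role of these dependencies concrete, the following is a minimal base-R sketch of the quantities they compute: Pearson correlation, soft-threshold adjacency, topological overlap (TOM), and hierarchical clustering. It is not the package's code. The toy data, the exponent `beta = 6`, the unsigned adjacency, and the "average" linkage are all assumptions made for illustration; the accelerated routines listed above would take the place of the base-R calls on real data.

```r
# Illustrative sketch only: toy data and all parameter values are assumptions.
set.seed(1)
expr <- matrix(rnorm(20 * 8), nrow = 20, ncol = 8,
               dimnames = list(NULL, paste0("gene", 1:8)))  # rows = samples, cols = genes

beta <- 6                        # assumed soft-threshold (scale-free) exponent
corMat <- stats::cor(expr)       # Pearson correlation between genes
adj <- abs(corMat)^beta          # unsigned soft-threshold adjacency (assumption)
diag(adj) <- 0                   # ignore self-connections

# Topological overlap:
#   TOM[i, j] = (sum_u adj[i, u] * adj[u, j] + adj[i, j]) /
#               (min(k[i], k[j]) + 1 - adj[i, j])
k <- rowSums(adj)                # per-gene connectivity
shared <- adj %*% adj            # shared-neighbour term
tom <- (shared + adj) / (outer(k, k, pmin) + 1 - adj)
diag(tom) <- 1
dissTom <- 1 - tom               # TOM-based dissimilarity

# Gene dendrogram from the TOM dissimilarity (linkage method is an assumption).
geneTree <- stats::hclust(stats::as.dist(dissTom), method = "average")
```

The resulting `geneTree` is the kind of object a tree-cutting step would then consume.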

Sage software dependencies

module merging, by analyzing most-highly-connected genes in each module

fixed-cluster-number tree cutting for sample module definition

computation of within- and between- module per-gene connectivity statistics

heatmap generation for correlation and TOM matrices
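Two of the Sage-side steps above can be sketched in base R: cutting a sample dendrogram into a fixed number of clusters, and splitting each gene's connectivity into within-module and between-module parts. This is only an illustration under stated assumptions. The cluster count, the module labels, the exponent, and the exact definition of the connectivity statistics are hypothetical and may differ from the Sage implementation.

```r
# Illustrative sketch only: data, labels, and parameters are assumptions.
set.seed(2)
expr <- matrix(rnorm(12 * 6), nrow = 12, ncol = 6,
               dimnames = list(paste0("sample", 1:12), paste0("gene", 1:6)))

# Fixed-cluster-number cut of a sample dendrogram (k = 3 is an assumed input).
sampleTree <- stats::hclust(stats::dist(expr), method = "average")
sampleClusters <- stats::cutree(sampleTree, k = 3)

# Per-gene connectivity, split by module membership.
adj <- abs(stats::cor(expr))^6   # assumed soft-threshold adjacency
diag(adj) <- 0
modules <- c("blue", "blue", "blue", "brown", "brown", "brown")  # hypothetical labels
sameModule <- outer(modules, modules, "==")

kTotal <- rowSums(adj)                 # connectivity to all other genes
kWithin <- rowSums(adj * sameModule)   # connectivity to genes in the same module
kBetween <- kTotal - kWithin           # connectivity to genes in other modules
connectivity <- data.frame(module = modules, kTotal, kWithin, kBetween)
```

Each row of `connectivity` describes one gene, so the table can be sorted by `kWithin` to find the most highly connected genes in each module.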

Package functions

The main 'points of entry' into the package are:

performCoexFromFiles - run the entire package using a file of gene expression data as input

performCoexpressionAnalysis - run the analysis portion of Coexpression, taking a data frame as input

clusterGenes - run the rote, time-consuming portion of Coexpression, taking a data frame as input

modulesFromGeneTree - choose modules from a gene dendrogram, taking the output of 'clusterGenes' as input

clusterSamples, intraModularStatistics - analysis steps auxiliary to module determination

createDiagnosticPlots - create dendrogram, heatmap plots, etc., from the results of the coexpression analysis

Package features

Comparison to Sage code base

Correlation computation is faster, with identical results.
TOM computation is faster, with identical results.
Hierarchical clustering is faster, with identical results.
Scale-free exponent (beta) determination is similar, with very similar results and regression statistics.
"Dynamic tree cutting" algorithm is the same, with very similar results.
Diagnostic plot set is reduced from 12 to 8, omitting redundant plots.
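The beta comparison above rests on a scale-free fit: for each candidate exponent, the connectivity distribution is regressed on a log-log scale, and the slope and R-squared of that regression are the "regression statistics" being compared. The base-R sketch below shows one simplified way to compute such a fit. The helper `scaleFreeFit`, the binning scheme, the candidate exponents, and the toy data are assumptions for illustration; they are not the Sage code or the internals of WGCNA::pickSoftThreshold.

```r
# Illustrative sketch only: scaleFreeFit is a hypothetical helper.
scaleFreeFit <- function(k, nBreaks = 10) {
  bins <- cut(k, breaks = nBreaks)                # bin genes by connectivity
  meanK <- as.vector(tapply(k, bins, mean))       # mean connectivity per bin
  pK <- as.vector(table(bins)) / length(k)        # fraction of genes per bin
  keep <- !is.na(meanK) & pK > 0                  # drop empty bins
  logK <- log10(meanK[keep])
  logP <- log10(pK[keep])
  fit <- stats::lm(logP ~ logK)                   # log-log regression
  c(slope = unname(stats::coef(fit)[2]),
    rSquared = summary(fit)$r.squared)
}

set.seed(3)
expr <- matrix(rnorm(30 * 200), nrow = 30)        # toy data: 30 samples, 200 genes
absCor <- abs(stats::cor(expr))
diag(absCor) <- 0

# Evaluate the fit for a few assumed candidate exponents.
fits <- t(sapply(c(2, 4, 6, 8), function(beta) {
  k <- rowSums(absCor^beta)
  c(beta = beta, scaleFreeFit(k))
}))
```

Each row of `fits` holds one candidate exponent with its slope and R-squared, which is the form in which two implementations' choices of beta can be compared side by side.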

Additional Features

Option to do tree cutting and/or subsequent merging by the UCLA-WGCNA algorithm or by the 'Sage classic' algorithm.
Separation of rote, time-consuming steps (correlation, TOM computation) from tree cutting.
Separation of analysis from plotting.
Separation of analysis from the file system, to facilitate Synapse integration.

Evaluation

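| Dataset | # Probes | # Samples | Sage Time | Sage Space | Package Time | Package Space | Sage Beta | Package Beta | Gene trees same? | Module difference (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Female mouse liver | 3600 | 135 | --- | --- | --- | --- | | | | |
| Methylation (gene subset) | ?? | 555 | | | | | | | | |
| Colon cancer (small gene subset) | ?? | 322 | --- | --- | --- | --- | | | | |
| Cranio | 2534 | 249 | --- | --- | --- | --- | | | | |
| Methylation (full set) | 27,578 | 5555 | | | | | | | | |
| Colon cancer (large gene subset) | | 322 | | | | | | | | |
| Human liver cohort | 40,102 | 427 | | | | | | | | |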

...