Goal
1) Make the Sage coexpression software runnable by any data analyst in R.
Document the analysis steps, package functions, and adjustable parameters. Write one or more start-to-finish vignettes with enough detail to guide a new analyst through the software, including choosing parameter settings and making other intermediate decisions. Refactoring the software to improve readability or to make decision points and adjustable parameters explicit is within scope for this goal.
2) Clearly explain the methodology underlying the coexpression algorithms.
This includes contrasting the Sage algorithms with the separately published “WGCNA” algorithms and explaining the rationale for the differences and the decision process for choosing which to use.
3) Make the Sage coexpression software publicly available.
The code must be available through a standard R distribution channel (CRAN or Bioconductor). One option is to merge the Sage-specific algorithms or steps into WGCNA, if the two overlap sufficiently.
4) Make the Sage coexpression software perform well on commonly available hardware.
Currently, special high-capacity hardware is required to run the Sage coexpression software on data sets of commonly encountered size. The goal is to optimize the code to run on commonly available hardware. At a minimum, the code should be runnable on the 68 GB, quad-core servers available through Amazon Web Services. Ideally, the code would also be runnable on a “heavy” Sage laptop (8 GB total RAM).
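To make the hardware targets concrete, the following minimal sketch estimates the memory needed for one dense probes-by-probes matrix of 8-byte doubles, which is how R stores a numeric matrix. The probe counts are hypothetical illustrations, not measurements from any Sage data set, and the function name is invented for this example.

```python
def dense_matrix_gib(n_probes: int, bytes_per_value: int = 8) -> float:
    """Memory in GiB for one dense n_probes x n_probes matrix."""
    return n_probes ** 2 * bytes_per_value / 2 ** 30


# Hypothetical probe counts, chosen only to show how the cost scales.
for n in (10_000, 20_000, 50_000):
    print(f"{n:>6} probes: {dense_matrix_gib(n):6.2f} GiB per matrix")
```

The cost grows with the square of the probe count: roughly 3 GiB per matrix at 20,000 probes and more than 18 GiB at 50,000. A pipeline that holds more than one such matrix at a time (for example a correlation matrix and a topological overlap matrix) multiplies that figure accordingly, which is why the 8 GB target is the harder one.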
Strategy
The steps for Sage Coexpression are:
- Compute correlation coefficient matrix.
- Determine the optimal value for the scale-free exponent, beta.
- Compute the topological overlap matrix (TOM).
- Perform hierarchical clustering of genes, based on TOM.
- Detect and label modules in TOM, using "Dynamic Tree Cutting".
- Merge modules based on hierarchical clustering of representative genes.
- Cluster samples hierarchically.
- Compute intra/inter-module network statistics, per gene.
- Produce diagnostic plots (dendrograms, heat maps, statistical scatter plots).
- Produce tabular output of module membership, network statistics, and scale-free regression statistics.
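The first three steps above can be illustrated with a small, language-neutral sketch. It is not the Sage or WGCNA implementation: it assumes an unsigned adjacency |cor|^beta and the standard unsigned topological overlap definition from the WGCNA literature, fixes beta at an arbitrary value instead of choosing it by a scale-free fit, and uses a made-up toy data set of four genes in two groups. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, beta):
    """Unsigned soft-threshold adjacency |cor|^beta with a zero diagonal.

    expr is a list of genes, each a list of per-sample expression values.
    """
    g = len(expr)
    a = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a[i][j] = a[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return a


def topological_overlap(a):
    """TOM[i][j] = (sum_u a[i][u]*a[u][j] + a[i][j]) / (min(k_i, k_j) + 1 - a[i][j])."""
    g = len(a)
    k = [sum(row) for row in a]  # connectivity; the diagonal is zero
    tom = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            # Terms with u == i or u == j vanish because the diagonal is zero.
            shared = sum(a[i][u] * a[u][j] for u in range(g))
            tom[i][j] = tom[j][i] = (shared + a[i][j]) / (min(k[i], k[j]) + 1.0 - a[i][j])
    return tom


# Toy data (made up): genes 0 and 1 share one pattern, genes 2 and 3 another.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 2.0, 2.9, 4.2, 5.0, 5.8],
    [1.0, -1.0, 0.0, 0.0, -1.0, 1.0],
    [1.2, -0.9, 0.1, 0.0, -1.1, 0.8],
]
beta = 6  # arbitrary illustrative value, not a fitted one
tom = topological_overlap(adjacency(expr, beta))
# Hierarchical clustering would take the dissimilarity 1 - TOM as input.
dissimilarity = [[1.0 - v for v in row] for row in tom]
for row in tom:
    print(" ".join(f"{v:.3f}" for v in row))
```

In the printed matrix, overlap is high within each gene pair and near zero between the pairs, which is the structure the later clustering and tree-cutting steps rely on. The nested loops cost time proportional to the cube of the gene count, so this form is only suitable for illustration.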
UCLA-WGCNA dependencies
Sage software dependencies
...
Dataset | # Probes | # Samples | Sage Time | Sage Space | Package Time | Package Space | Sage Beta | Package Beta | Gene trees same? | Module difference (%) |
---|---|---|---|---|---|---|---|---|---|---|
compute correlation coefficient |
determine optimal value for beta |
compute TOM |
perform hierarchical clustering over TOM (using 'hclust') |
detect and label modules in TOM (using 'cutreeDynamic') |