...
Option to perform tree cutting and/or subsequent merging with either the UCLA-WGCNA algorithm or the 'Sage classic' algorithm.
Separation of rote, time-consuming steps (correlation, TOM computation) from tree cutting.
Separation of analysis from plotting.
Separation of analysis from the file system, to facilitate Synapse integration.
Performance and Limits
Coexpression has O(n^3) ("cubic") time complexity and O(n^2) ("quadratic") space complexity, where n is the number of probes in the dataset. The time complexity is due to the TOM computation. The space complexity is due to the need to hold the n x n correlation and TOM matrices in memory. The R language has an inherent limit on the size of a vector or matrix of about 2 billion elements (2^31 - 1; see http://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html). For a square matrix, this means the maximum is 46340 x 46340. A square matrix in R requires 8n^2 bytes. (Values are double precision, and R has no provision for single-precision representation of floating point values.) A matrix of that maximum size therefore requires about 17.18 GB (16.0 GiB) of contiguous memory.
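The arithmetic behind these limits can be checked with a short sketch. It is written in Python purely for illustration and is not part of the package; the function names are hypothetical, and the only inputs are the figures stated above (a limit of 2^31 - 1 elements and 8 bytes per double). The estimate covers a single n x n matrix, so an actual run, which holds several such matrices plus temporaries, will need considerably more.

```python
import math

# Figures taken from the text above; everything else is illustrative.
R_MAX_ELEMENTS = 2**31 - 1   # R's limit on the length of a vector or matrix
BYTES_PER_DOUBLE = 8         # R stores numeric values as double precision


def max_square_dim(max_elements=R_MAX_ELEMENTS):
    """Largest n such that an n x n matrix has at most max_elements entries."""
    return math.isqrt(max_elements)


def square_matrix_bytes(n):
    """Bytes needed for one dense n x n matrix of doubles (8 * n^2)."""
    return BYTES_PER_DOUBLE * n * n


def fits_in_r_matrix(n, max_elements=R_MAX_ELEMENTS):
    """True if an n x n matrix stays within the element limit."""
    return n * n <= max_elements


n_max = max_square_dim()
print(n_max)                                # 46340
print(square_matrix_bytes(n_max))           # 17179164800 bytes
print(square_matrix_bytes(n_max) / 1e9)     # about 17.18 (GB, decimal)
print(square_matrix_bytes(n_max) / 2**30)   # about 16.0 (GiB, binary)

# Size of ONE n x n matrix for a few probe counts from the evaluation table.
for probes in (5000, 18392, 27578, 40102):
    gb = square_matrix_bytes(probes) / 1e9
    print(probes, round(gb, 2), fits_in_r_matrix(probes))
```

Note that a probe count can sit below the 46340 element limit and still fail in practice, because the limiting factor is then available memory rather than R's vector length.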
Evaluation
Dataset | # Probes | # Samples | Sage Time | Sage Space | Package Time | Package Space | Sage beta | Package beta | Gene trees same? | Module difference (%) |
---|---|---|---|---|---|---|---|---|---|---|
Female mouse liver | 3600 | 135 | --- | --- | --- | --- | 6.5 | 6.5 | TRUE | 3.7% |
Cranio | 2534 | 249 | --- | --- | --- | --- | 4.0 | 4.5 | FALSE / TRUE* | 44% / 0.9%* |
Methylation, top 5K genes | 5000 | 555 | --- | --- | --- | --- | 8.5 | 8.5 | TRUE | 0% |
Colon cancer, top 5K genes | 5000 | 322 | --- | --- | --- | --- | 3 | 3.5 | FALSE / TRUE* | 11% / 0.5%* |
Human liver cohort, top 5K genes | 5000 | 427 | --- | --- | --- | --- | 11 | 11 | TRUE | 1.0% |
PARC | 18,392 | 960 | 5h:55m | 83.9 GB | 2h:06m | 86.2 GB | 8 | 7.5 | ?? / ??* | ?? / ??* |
Methylation (full set)** | 27,578 | 555 | 24h:45m | 180 GB | 13h:20m | 196 GB | 8 | 11.5 | FALSE / FALSE* | 14% / 0.2%* |
Colon cancer, top 40K genes** | 40,000 | 322 | Out of memory*** | --- | Out of memory*** | --- | --- | --- | --- | --- |
Human liver cohort** | 40,102 | 427 | Out of memory*** | --- | Out of memory*** | --- | --- | --- | --- | --- |
...