...
Dataset | # Probes | # Samples | Sage Time | Sage Space | Package Time | Package Space | Sage beta | Package beta | Gene trees identical (each tool's own beta)? | Gene trees identical (same beta)? | Module difference (each tool's own beta) | Module difference (same beta) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Female mouse liver | 3600 | 135 | --- | --- | --- | --- | 6.5 | 6.5 | TRUE | TRUE | 3.7% | 3.7% |
Cranio | 2534 | 249 | --- | --- | --- | --- | 4.0 | 4.5 | FALSE | TRUE | 44% | 0.9% |
Methylation, top 5K genes | 5000 | 555 | --- | --- | --- | --- | 8.5 | 8.5 | TRUE | TRUE | 0% | 0% |
Colon cancer, top 5K genes | 5000 | 322 | --- | --- | --- | --- | 3 | 3.5 | FALSE | TRUE | 11% | 0.5% |
Human liver cohort, top 5K genes | 5000 | 427 | --- | --- | --- | --- | 11 | 11 | TRUE | TRUE | 1.0% | 1.0% |
PARC* | 18,392 | 960 | 5h:55m | 83.9 GB | 1h:40m | 71 GB | 8 | 7.5 | FALSE | FALSE | 4.7% | 0.6% |
Methylation (full set)* | 27,578 | 555 | 24h:45m | 180 GB | 13h:20m | 196 GB | 8 | 11.5 | FALSE | FALSE | 14% | 0.2% |
Colon cancer, top 40K genes* | 40,000 | 322 | Out of memory** | --- | Out of memory** | --- | --- | --- | --- | --- | --- | --- |
Human liver cohort* | 40,102 | 427 | Out of memory** | --- | Out of memory** | --- | --- | --- | --- | --- | --- | --- |
Datasets marked * were run on an Amazon Elastic Compute Cloud (EC2) "High-Memory Quadruple Extra Large" Unix server with 68 GB of RAM.
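You can read the size of the speed-up straight off the two datasets that both implementations completed with timings. This small sketch only re-derives ratios from the table's own numbers. It treats 'Space' as an opaque figure in GB because the table does not say how space was measured.

```python
# Times and space figures copied from the table: the only two datasets
# where both implementations finished and were timed.
RUNS = {
    "PARC": {"sage_time": "5h:55m", "pkg_time": "1h:40m",
             "sage_gb": 83.9, "pkg_gb": 71.0},
    "Methylation (full set)": {"sage_time": "24h:45m", "pkg_time": "13h:20m",
                               "sage_gb": 180.0, "pkg_gb": 196.0},
}

def to_minutes(t: str) -> int:
    """Convert a table time such as '5h:55m' to minutes."""
    hours, minutes = t.split(":")
    return int(hours.rstrip("h")) * 60 + int(minutes.rstrip("m"))

def compare(run: dict) -> tuple[float, float]:
    """Return (speed-up, space ratio) of the package relative to the Sage code."""
    speedup = to_minutes(run["sage_time"]) / to_minutes(run["pkg_time"])
    space_ratio = run["pkg_gb"] / run["sage_gb"]
    return speedup, space_ratio

for name, run in RUNS.items():
    speedup, space_ratio = compare(run)
    print(f"{name}: {speedup:.2f}x faster, {space_ratio:.2f}x the space")
```

On these two runs the package is about 3.5x and 1.9x faster. Its space figure is lower on PARC (about 0.85x) and slightly higher on the full methylation set (about 1.09x).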
...
The details of the differences summarized in the table can be found here: http://sagebionetworks.jira.com/wiki/display/SCICOMP/Package+Comparison+Details
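The table does not define 'Module difference'. The following is a purely hypothetical illustration of one way to get such a percentage. Each package module is matched to the Sage module it shares the most genes with, and the sketch then counts the genes whose assignments disagree. The function name and the module labels below are placeholders, not the package's API.

```python
from collections import Counter

def module_difference(labels_a: list[str], labels_b: list[str]) -> float:
    """Hypothetical metric: percentage of genes whose run-B module is not the
    run-A module that it overlaps most (greedy matching of module labels)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same genes, in the same order")
    # Count how many genes each (run-B module, run-A module) pair shares.
    overlap = Counter(zip(labels_b, labels_a))
    best_match = {}
    for (mod_b, mod_a), _count in overlap.most_common():
        best_match.setdefault(mod_b, mod_a)  # keep the largest overlap only
    mismatched = sum(best_match[b] != a for a, b in zip(labels_a, labels_b))
    return 100.0 * mismatched / len(labels_a)

# Placeholder labels for 8 genes; run B names its modules differently and
# moves one 'blue' gene into the module that matches 'turquoise'.
sage = ["turquoise"] * 4 + ["blue"] * 3 + ["grey"]
package = ["m1"] * 4 + ["m2"] * 2 + ["m1"] + ["grey"]
print(module_difference(sage, package))
```

Matching by overlap makes the metric ignore arbitrary module names and count only genes that actually moved.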
Conclusions
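The new package runs considerably faster than the original code: 1h:40m instead of 5h:55m on PARC, and 13h:20m instead of 24h:45m on the full methylation set. Its space use is comparable (lower on PARC, higher on the full methylation set). Although the two implementations use the same approach to compute beta, the values they choose can differ noticeably (for example, 8 vs. 11.5 on the full methylation set). When beta is the same, the dendrograms and modules are similar or identical. However, module determination is very sensitive to beta (on Cranio, betas of 4.0 and 4.5 give a module difference of 44%). Beta itself can shift with small changes in the regression statistics, as shown on the 'details' page.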
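This page does not spell out how beta is chosen. This sketch assumes a common approach: the WGCNA-style scale-free topology criterion. For each candidate beta, it regresses log10 p(k) on log10 k and takes the smallest beta whose signed R² reaches a cutoff. Both the cutoff (0.8) and the half-step grid of betas are assumptions. The function names are hypothetical, and the binning is simplified. The sketch shows why a small change in the regression statistics can move beta by a half-step.

```python
import numpy as np

def connectivity(expr: np.ndarray, beta: float) -> np.ndarray:
    """Soft-threshold connectivity of each gene (expr is samples x genes)."""
    adjacency = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    return adjacency.sum(axis=0) - 1.0  # drop each gene's self-connection

def scale_free_fit(k: np.ndarray, n_bins: int = 10) -> float:
    """Signed R^2 of the regression of log10 p(k) on log10 k.

    Simplified: bin centers stand in for mean connectivity, empty bins are dropped.
    """
    counts, edges = np.histogram(k, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    keep = counts > 0
    x = np.log10(centers[keep])
    y = np.log10(counts[keep] / counts.sum())
    slope = np.polyfit(x, y, 1)[0]
    r_squared = np.corrcoef(x, y)[0, 1] ** 2
    # Many low-k genes and few hubs (negative slope) give a positive index.
    return float(-np.sign(slope) * r_squared)

def pick_beta(fit_by_beta: dict[float, float], cutoff: float = 0.8) -> float:
    """Smallest beta whose signed fit reaches `cutoff`; else the best-fitting beta."""
    for beta in sorted(fit_by_beta):
        if fit_by_beta[beta] >= cutoff:
            return beta
    return max(fit_by_beta, key=fit_by_beta.get)

# Typical use (hypothetical): fits = {b: scale_free_fit(connectivity(expr, b)) for b in grid}
# Made-up fit values for two runs whose regression statistics differ only slightly:
run_a = {2.5: 0.71, 3.0: 0.801, 3.5: 0.84, 4.0: 0.86}
run_b = {2.5: 0.70, 3.0: 0.799, 3.5: 0.84, 4.0: 0.86}
print(pick_beta(run_a), pick_beta(run_b))  # 3.0 3.5
```

With a hard cutoff, a fit value moving from 0.801 to 0.799 is enough to move beta from 3.0 to 3.5. That is the same kind of half-step gap the table shows for Colon cancer.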
Goals, Revisited
Goal | How we met it |
---|---|
Make the Sage coexpression software runnable by any data analyst in R | Created an easy-to-use, documented R package. (TODO: training class) |
Clearly explain the methodology underlying the coexpression algorithms. | Included links to literature in the R package documentation. |
Make the Sage coexpression software publicly available. | TBD (see below) |
Make the Sage coexpression software perform well on commonly available hardware. | Used UCLA's accelerated algorithms. Accelerated the 'intra-module statistics' computation. Profiled datasets of up to 27,578 probes on inexpensive, high-capacity cloud resources (Amazon EC2). |
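A quick calculation shows why roughly 27,000 probes is near the practical ceiling on a 68 GB machine. Any dense probe-by-probe matrix, such as correlations, adjacency, or a dissimilarity, grows with the square of the probe count. The sketch below assumes 8-byte doubles and counts a single such matrix, so it gives only a lower bound. It does not try to reproduce the measured 'Space' figures, which depend on how many such matrices each implementation keeps at once.

```python
def dense_matrix_gb(n_probes: int, bytes_per_value: int = 8) -> float:
    """Size in GB (10^9 bytes) of one dense n x n matrix of doubles."""
    return n_probes * n_probes * bytes_per_value / 1e9

RAM_GB = 68  # EC2 "High-Memory Quadruple Extra Large", per the table footnote

# Probe counts taken from the table rows.
for n in (18_392, 27_578, 40_000, 40_102):
    one = dense_matrix_gb(n)
    print(f"{n:>6} probes: {one:5.1f} GB per matrix, "
          f"{int(RAM_GB // one)} such matrices fit in {RAM_GB} GB")
```

Going from 27,578 to 40,000 probes roughly doubles (about 2.1x) the size of every such matrix. This fits the out-of-memory failures in the table, though it does not prove their cause.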
...