...

| Dataset | # Probes | # Samples | Sage Time | Sage Space | Package Time | Package Space | Sage beta | Package beta | Gene trees same, independent beta? | Gene trees same, same beta? | Module difference, independent beta | Module difference, same beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Female mouse liver | 3600 | 135 | --- | --- | --- | --- | 6.5 | 6.5 | TRUE | TRUE | 3.7% | 3.7% |
| Cranio | 2534 | 249 | --- | --- | --- | --- | 4.0 | 4.5 | FALSE | TRUE | 44% | 0.9% |
| Methylation, top 5K genes | 5000 | 555 | --- | --- | --- | --- | 8.5 | 8.5 | TRUE | TRUE | 0 | 0 |
| Colon cancer, top 5K genes | 5000 | 322 | --- | --- | --- | --- | 3 | 3.5 | FALSE | TRUE | 11% | 0.5% |
| Human liver cohort, top 5K genes | 5000 | 427 | --- | --- | --- | --- | 11 | 11 | TRUE | TRUE | 1.0% | 1.0% |
| PARC* | 18,392 | 960 | 5h:55m | 83.9 GB | 1h:40m | 71 GB | 8 | 7.5 | FALSE | FALSE | 4.7% | 0.6% |
| Methylation (full set)* | 27,578 | 555 | 24h:45m | 180 GB | 13h:20m | 196 GB | 8 | 11.5 | FALSE | FALSE | 14% | 0.2% |
| Colon cancer, top 40K genes* | 40,000 | 322 | Out of memory** | --- | Out of memory** | --- | --- | --- | --- | --- | --- | --- |
| Human liver cohort* | 40,102 | 427 | Out of memory** | --- | Out of memory** | --- | --- | --- | --- | --- | --- | --- |

* These were run on an Amazon Elastic Compute Cloud (EC2) "High-Memory Quadruple Extra Large" Unix server with 68 GB of RAM.
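The timing columns can be turned into speedup ratios for the two datasets that have measured run times. The following is a minimal Python sketch of that arithmetic; the helper name `to_minutes` is hypothetical, and the durations are copied from the PARC and full methylation rows of the table.

```python
import re


def to_minutes(stamp):
    """Convert a duration written like '5h:55m' into whole minutes."""
    match = re.fullmatch(r"(\d+)h:(\d+)m", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    hours, minutes = (int(part) for part in match.groups())
    return hours * 60 + minutes


# (Sage time, package time) for the rows that report measured run times
timings = {
    "PARC": ("5h:55m", "1h:40m"),
    "Methylation (full set)": ("24h:45m", "13h:20m"),
}

# Ratio > 1 means the package finished faster than the original Sage code
speedups = {
    name: to_minutes(sage) / to_minutes(package)
    for name, (sage, package) in timings.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x faster")
```

Under these inputs the package is about 3.55 times faster on PARC (355 vs. 100 minutes) and about 1.86 times faster on the full methylation set (1485 vs. 800 minutes).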

...

The details of the differences summarized in the table can be found here: http://sagebionetworks.jira.com/wiki/display/SCICOMP/Package+Comparison+Details

Conclusions

The new package is considerably faster than the original code: in the two large runs with recorded timings, it finished in 1h:40m versus 5h:55m (PARC) and in 13h:20m versus 24h:45m (full methylation set). Both implementations take the same approach to computing 'beta', yet the values they select can differ substantially. When beta is the same, the dendrograms and modules are similar or identical. However, module determination is very sensitive to beta, which can itself shift with small changes in the regression statistics, as can be seen on the 'details' page.
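To illustrate why a small change in a regression statistic can move beta, here is a hedged Python sketch. It assumes, since this page does not spell out the rule, that beta is the smallest candidate power whose regression fit statistic reaches a fixed cutoff, and that beta is then applied as an exponent to absolute correlations. The function `pick_beta`, the cutoff of 0.9, and all fit values below are hypothetical and invented purely for illustration; they are not results from either implementation.

```python
def pick_beta(fit_by_beta, cutoff=0.9):
    """Return the smallest candidate beta whose fit statistic reaches the cutoff.

    fit_by_beta maps each candidate beta to a goodness-of-fit value
    (for example an R^2 from a regression). Returns None if no candidate passes.
    """
    for beta in sorted(fit_by_beta):
        if fit_by_beta[beta] >= cutoff:
            return beta
    return None


# Hypothetical fit values: the two runs differ only at beta = 7, by 0.002
fits_a = {6: 0.840, 7: 0.899, 8: 0.903, 9: 0.930}
fits_b = {6: 0.840, 7: 0.901, 8: 0.903, 9: 0.930}

beta_a = pick_beta(fits_a)  # 7 just misses the cutoff, so 8 is chosen
beta_b = pick_beta(fits_b)  # 7 just clears the cutoff, so 7 is chosen

# Assumed soft-threshold weighting: connection strength = |correlation| ** beta
correlation = 0.5
weight_a = abs(correlation) ** beta_a
weight_b = abs(correlation) ** beta_b

print(beta_a, beta_b)      # 8 7
print(weight_a, weight_b)  # 0.00390625 0.0078125
```

In this toy case a difference of 0.002 in one fit value changes beta from 8 to 7, which doubles the weight given to a correlation of 0.5. Shifts of that kind propagate into the clustering, which is one plausible reading of the divergent modules reported in the 'independent beta' columns.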

Goals, Revisited

| Goal | How we met it |
|---|---|
| Make the Sage coexpression software runnable by any data analyst in R. | Created an easy-to-use, documented R package. (TODO: training class) |
| Clearly explain the methodology underlying the coexpression algorithms. | Included links to literature in the R package documentation. |
| Make the Sage coexpression software publicly available. | TBD (see below) |
| Make the Sage coexpression software perform well on commonly available hardware. | Used UCLA's accelerated algorithms. Accelerated the 'intra-module statistics' computation. Profiled datasets of up to 27,000 genes on inexpensive, high-capacity cloud resources. |

...