Coexpression has O(n^3) ("cubic") time complexity and O(n^2) ("quadratic") space complexity, where n is the number of probes in the dataset. The time complexity comes from the topological overlap matrix (TOM) computation. The space complexity comes from the need to hold the n x n correlation and TOM matrices in memory.

The R language has an inherent limit of about 2 billion elements (2^31 - 1) on the size of a vector or matrix (http://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html). For a square matrix, this means the maximum size is 46340 x 46340. An n x n matrix in R requires 8n^2 bytes, because values are stored in double precision and R has no provision for single-precision floating-point values. Such a maximum-size matrix therefore requires about 17.2 GB (16 GiB) of contiguous memory.

The coexpression package holds two n x n matrices (as mentioned above) and at times creates temporary variables equal in size to, or a large fraction of, an n x n matrix. Therefore, to support maximum dataset sizes, machines used to run coexpression should have dozens of GB of RAM. In the empirical evaluations summarized below, we use Amazon Web Services quadruple extra large machines with 68 GB RAM, each of which costs (at the time of writing) $2.88/hour.
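To show where the cubic cost comes from, here is a minimal pure-Python sketch of one common TOM definition. It assumes an unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|^beta (taking the "beta" columns in the table below to be the soft-thresholding power) and the unsigned topological overlap formula; this page does not say which variant the package uses, and the helper names `pearson`, `soft_adjacency` and `tom` are hypothetical. It is an illustration, not the package's implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_adjacency(expr, beta):
    """Hypothetical helper: a_ij = |cor(x_i, x_j)|**beta with a zero diagonal.

    expr holds n probe profiles of m samples each. Filling the n x n matrix
    takes O(n^2 * m) time and O(n^2) space.
    """
    n = len(expr)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = a[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return a


def tom(a):
    """Hypothetical helper: unsigned topological overlap matrix.

    TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with
    l_ij = sum_u a_iu * a_uj and connectivity k_i = sum_u a_iu.
    The result is another n x n matrix, so O(n^2) space.
    """
    n = len(a)
    k = [sum(row) for row in a]
    t = [[1.0] * n for _ in range(n)]  # diagonal is defined as 1
    for i in range(n):
        for j in range(i + 1, n):
            # Shared-neighbour sum: O(n) work for each of O(n^2) pairs -> O(n^3).
            l_ij = sum(a[i][u] * a[u][j] for u in range(n))
            t[i][j] = t[j][i] = (l_ij + a[i][j]) / (min(k[i], k[j]) + 1 - a[i][j])
    return t


expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.8],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
t = tom(soft_adjacency(expr, beta=6))
for row in t:
    print([round(v, 3) for v in row])
```

The explicit triple loop makes the O(n^3) term visible; an optimized implementation would form all l_ij at once as a matrix product, which has the same cubic asymptotic cost.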

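The memory arithmetic above can be checked with a short sketch. It assumes the classic R limit of 2^31 - 1 elements per vector or matrix and 8 bytes per double, and counts only the two n x n matrices (correlation and TOM), so its per-dataset figures are lower bounds that ignore temporaries. The function names are hypothetical; the probe counts come from the evaluation table below.

```python
import math

# Classic R limit on the number of elements in one vector or matrix.
R_MAX_ELEMENTS = 2**31 - 1
BYTES_PER_DOUBLE = 8  # R numeric values are double precision


def max_square_side(max_elements=R_MAX_ELEMENTS):
    """Largest n such that an n x n matrix fits under the element limit."""
    return math.isqrt(max_elements)


def matrix_bytes(n):
    """Bytes needed for one n x n matrix of doubles (8 * n^2)."""
    return BYTES_PER_DOUBLE * n * n


def min_working_set_bytes(n, n_matrices=2):
    """Lower bound for holding the correlation and TOM matrices together.

    Temporaries created during the computation are not counted.
    """
    return n_matrices * matrix_bytes(n)


n_max = max_square_side()
print(f"max side: {n_max}")  # 46340
print(f"one max-size matrix: {matrix_bytes(n_max) / 1e9:.1f} GB, "
      f"{matrix_bytes(n_max) / 2**30:.1f} GiB")

# Probe counts taken from the evaluation table.
for n in (18_392, 27_578, 40_000, 40_102):
    print(f"n = {n:>6}: two matrices need at least "
          f"{min_working_set_bytes(n) / 1e9:.1f} GB")
```

Decimal gigabytes (10^9 bytes) and binary gibibytes (2^30 bytes) are printed separately because mixing the two is an easy way to misstate the size of a maximum-size matrix.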
Evaluation

| Dataset | # Probes | # Samples | Sage Time | Sage Space | Package Time | Package Space | Sage beta | Package beta | Gene trees same? | Module difference (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Female mouse liver | 3,600 | 135 | --- | --- | --- | --- | 6.5 | 6.5 | TRUE | 3.7 |
| Cranio | 2,534 | 249 | --- | --- | --- | --- | 4.0 | 4.5 | FALSE / TRUE* | 44 / 0.9* |
| Methylation, top 5K genes | 5,000 | 555 | --- | --- | --- | --- | 8.5 | 8.5 | TRUE | 0 |
| Colon cancer, top 5K genes | 5,000 | 322 | --- | --- | --- | --- | 3 | 3.5 | FALSE / TRUE* | 11 / 0.5* |
| Human liver cohort, top 5K genes | 5,000 | 427 | --- | --- | --- | --- | 11 | 11 | TRUE | 1.0 |
| PARC | 18,392 | 960 | 5h:55m | 83.9 GB | 2h:06m | 86.2 GB | 8 | 7.5 | ?? / ??* | ?? / ??* |
| Methylation (full set)** | 27,578 | 555 | 24h:45m | 180 GB | 13h:20m | 196 GB | 8 | 11.5 | FALSE / FALSE* | 14 / 0.2* |
| Colon cancer, top 40K genes** | 40,000 | 322 | Out of memory*** | --- | Out of memory*** | --- | --- | --- | --- | --- |
| Human liver cohort** | 40,102 | 427 | Out of memory*** | --- | Out of memory*** | --- | --- | --- | --- | --- |
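The timed rows can be compared directly. The sketch below parses the table's "Nh:MMm" durations and reports the ratio of Sage time to package time, where a ratio above 1 means the package run finished sooner. Only the two rows where both runs completed are included; the helper name `to_minutes` is hypothetical.

```python
import re


def to_minutes(duration):
    """Parse a table duration such as '5h:55m' into whole minutes."""
    match = re.fullmatch(r"(\d+)h:(\d+)m", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    hours, minutes = (int(g) for g in match.groups())
    return hours * 60 + minutes


# (Sage time, package time) for the rows where both runs completed.
timings = {
    "PARC": ("5h:55m", "2h:06m"),
    "Methylation (full set)": ("24h:45m", "13h:20m"),
}

for dataset, (sage, package) in timings.items():
    ratio = to_minutes(sage) / to_minutes(package)
    print(f"{dataset}: Sage {to_minutes(sage)} min, "
          f"package {to_minutes(package)} min, ratio {ratio:.1f}")
```

On these two rows the ratio is about 2.8 for PARC (355 vs. 126 minutes) and about 1.9 for the full methylation set (1485 vs. 800 minutes).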