
Results - Colon cancer

The analysis of this data set fails to complete with both the Sage coexpression code and the UCLA-WGCNA code.  In the former case the error is:

Error in matrix(0, ncol(x), ncol(x)) : too many elements specified

and in the latter case the error is:

Error in matrix(0, nBlockGenes, nBlockGenes) :   too many elements specified

According to

http://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html

"On all builds of R, the maximum length (number of elements) of a vector is 2^31 - 1 ~ 2*10^9, as lengths are stored as signed integers."

Therefore, if an algorithm requires a matrix of size #probes x #probes (e.g. a correlation or TOM 'distance matrix' for clustering), then the number of probes is limited to 46,340, the largest integer whose square does not exceed 2^31 - 1.  This colon cancer data set, with 54,675 probes, exceeds that limit.
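The arithmetic behind this limit can be checked with a short sketch.  It is written in Python purely for illustration (the analysis itself runs in R), and the function names are hypothetical, not part of either code base.

```python
import math

# Maximum vector length quoted above from the R documentation: 2^31 - 1.
R_MAX_VECTOR_LENGTH = 2**31 - 1


def max_square_dimension(max_elements: int = R_MAX_VECTOR_LENGTH) -> int:
    """Largest n such that an n x n matrix has at most max_elements entries."""
    return math.isqrt(max_elements)


def square_matrix_fits(n_probes: int, max_elements: int = R_MAX_VECTOR_LENGTH) -> bool:
    """True if an n_probes x n_probes matrix stays within the element limit."""
    return n_probes * n_probes <= max_elements


if __name__ == "__main__":
    print(max_square_dimension())     # 46340
    print(square_matrix_fits(54675))  # False: 54675^2 = 2,989,355,625 elements
```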

The UCLA-WGCNA code does run to completion when a preprocessing option is used to break the genes into 'blocks' using K-means clustering.  Requesting a block size of 10,000 resulted in 6 blocks and a run time of 2 h 47 min.
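As a rough consistency check on the block count, the sketch below computes the smallest number of blocks that can hold all probes at a given maximum block size, and the number of elements in one block's square matrix.  It assumes only simple arithmetic; it does not reproduce the K-means block assignment, which may produce unequal blocks or more blocks than this lower bound.  The function names are hypothetical.

```python
R_MAX_VECTOR_LENGTH = 2**31 - 1  # element limit quoted from the R documentation


def min_blocks(n_probes: int, max_block_size: int) -> int:
    """Lower bound on the number of blocks (integer ceiling division)."""
    return (n_probes + max_block_size - 1) // max_block_size


def block_matrix_elements(block_size: int) -> int:
    """Number of elements in a block_size x block_size matrix."""
    return block_size * block_size


if __name__ == "__main__":
    print(min_blocks(54675, 10000))                              # 6
    print(block_matrix_elements(10000))                          # 100000000
    print(block_matrix_elements(10000) <= R_MAX_VECTOR_LENGTH)   # True
```

With 54,675 probes and a block size of 10,000, the lower bound is 6 blocks, which matches the number reported above, and each per-block matrix has at most 10^8 elements, well under the limit.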

Other options for handling large probe numbers include pre-filtering to select a subset of probes of interest (e.g. using only the probes with the greatest variation across samples).
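A minimal sketch of variance-based pre-filtering is shown below, again in Python for illustration only.  The data layout (a mapping from probe ID to expression values across samples), the function name, and the toy values are all assumptions; neither code base is known to implement filtering this way.

```python
from statistics import variance


def top_variance_probes(expression: dict, n_keep: int) -> list:
    """Return the IDs of the n_keep probes with the largest sample variance.

    expression maps a probe ID to its list of expression values, one per sample.
    """
    variances = {probe: variance(values) for probe, values in expression.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_keep]


if __name__ == "__main__":
    # Hypothetical toy data: three probes measured across three samples.
    toy = {
        "probe_a": [1.0, 1.0, 1.0],   # variance 0
        "probe_b": [0.0, 5.0, 10.0],  # variance 25
        "probe_c": [2.0, 3.0, 4.0],   # variance 1
    }
    print(top_variance_probes(toy, 2))  # ['probe_b', 'probe_c']
```

Keeping at most 46,340 probes this way would bring a full #probes x #probes matrix back under the vector length limit, at the cost of discarding low-variance probes before network construction.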