Analysis of this data set fails to complete with both the Sage coexpression code and the UCLA-WGCNA code. In the former case the error is:
Error in matrix(0, ncol(x), ncol(x)) : too many elements specified
and in the latter case the error is:
Error in matrix(0, nBlockGenes, nBlockGenes) : too many elements specified
According to
http://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html
"On all builds of R, the maximum length (number of elements) of a vector is 2^31 - 1 ~ 2*10^9, as lengths are stored as signed integers."
Therefore, if an algorithm requires a matrix of size #probes x #probes (e.g. a correlation or TOM 'distance matrix' for clustering), then the number of probes is limited to 46,340, the largest integer whose square does not exceed 2^31 - 1. This colon cancer data set, with 54,675 probes, exceeds that limit.
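The arithmetic behind that limit can be checked directly in R. This is only a sanity check of the numbers quoted above; the variable names are illustrative.

```r
# Largest n for which an n x n matrix fits in a vector of at most 2^31 - 1 elements.
max_elements <- .Machine$integer.max       # 2^31 - 1 = 2147483647
max_probes   <- floor(sqrt(max_elements))  # largest n with n^2 <= max_elements

n_probes <- 54675                          # probe count reported for this data set
n_cells  <- n_probes^2                     # computed as a double, so no integer overflow

max_probes                                 # 46340
n_cells                                    # 2989355625
n_cells > max_elements                     # TRUE: the full matrix cannot be allocated
```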
The UCLA-WGCNA code does run to completion when using a preprocessing option that breaks the genes up into 'blocks' using K-means clustering. Requesting a block size of 10,000 resulted in 6 blocks and a run time of 2 h 47 min.
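To illustrate why blocking helps, here is a minimal base-R sketch on toy data. It is not the UCLA-WGCNA implementation: it uses plain kmeans() with a fixed number of clusters rather than a maximum block size, and it assumes an expression matrix with samples in rows and probes in columns. All names are hypothetical.

```r
set.seed(42)
n_samples <- 20
n_probes  <- 200
# Toy expression matrix: samples in rows, probes in columns (assumed layout).
expr <- matrix(rnorm(n_samples * n_probes), nrow = n_samples)

# Cluster the probes (hence the transpose) into a fixed number of blocks.
n_blocks <- 4
km     <- kmeans(t(expr), centers = n_blocks, nstart = 5)
blocks <- split(seq_len(n_probes), km$cluster)

# One correlation matrix per block instead of a single n_probes x n_probes matrix.
block_cor <- lapply(blocks, function(idx) cor(expr[, idx, drop = FALSE]))

# Matrix cells needed in total: full matrix versus the sum over blocks.
full_cells  <- n_probes^2
block_cells <- sum(lengths(blocks)^2)
```

Each per-block matrix has only (block size)^2 elements, so no single allocation approaches the vector length limit as long as every block stays below 46,340 probes. The trade-off is that relationships between probes in different blocks are not computed.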
Another option for handling large probe numbers is pre-filtering to select a subset of probes of interest (e.g. keeping only the probes with the greatest variation across samples).
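A variance-based pre-filter could look something like the sketch below. The function name filter_top_variance and the samples-in-rows layout are assumptions made for illustration, and the cutoff n_keep would need to be chosen for the data set at hand.

```r
# Keep the n_keep probes with the largest variance across samples.
# expr is assumed to be a numeric matrix with samples in rows and probes in columns.
filter_top_variance <- function(expr, n_keep) {
  probe_var <- apply(expr, 2, var, na.rm = TRUE)
  n_keep    <- min(n_keep, ncol(expr))
  keep      <- order(probe_var, decreasing = TRUE)[seq_len(n_keep)]
  expr[, sort(keep), drop = FALSE]  # sort() keeps the original probe order
}

# Toy example: 20 samples, 100 probes, keep the 10 most variable probes.
set.seed(1)
toy <- matrix(rnorm(20 * 100), nrow = 20,
              dimnames = list(NULL, paste0("probe", 1:100)))
filtered <- filter_top_variance(toy, n_keep = 10)
dim(filtered)
```

For the data set discussed here, any n_keep of at most 46,340 would bring the probe count under the matrix size limit.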