Results - Colon cancer

Analysis of this data set fails to complete with both the Sage coexpression code and the UCLA-WGCNA code. In the former case the error is:

Error in matrix(0, ncol(x), ncol(x)) : too many elements specified

and in the latter case the error is:

Error in matrix(0, nBlockGenes, nBlockGenes) :   too many elements specified

According to

http://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html

"On all builds of R, the maximum length (number of elements) of a vector is 2^31 - 1 ~ 2*10^9, as lengths are stored as signed integers."

Therefore, if an algorithm requires a matrix of size #probes x #probes (e.g. a correlation matrix or a TOM 'distance matrix' for clustering), the number of probes is limited to 46,340, the largest integer whose square does not exceed 2^31 - 1. This colon cancer data set, with 54,675 probes, exceeds that limit.
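The arithmetic behind this limit can be checked directly in R. The snippet below is only an illustration of the calculation, using the probe count quoted above; the variable names are made up for this example.

```r
# Largest n for which an n x n matrix fits in a vector of length 2^31 - 1
max_len    <- 2^31 - 1                # 2147483647
max_probes <- floor(sqrt(max_len))    # 46340

n_probes   <- 54675                   # probes in the colon cancer data set
n_elements <- n_probes^2              # 2989355625, held as a double

# TRUE: a full probes x probes matrix needs more elements than allowed
exceeds_limit <- n_elements > max_len

cat("max probes:", max_probes, "\n")
cat("elements needed:", format(n_elements, big.mark = ","), "\n")
cat("exceeds limit:", exceeds_limit, "\n")
```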

The UCLA-WGCNA code does run to completion when using a preprocessing option that breaks the genes into 'blocks' using K-means clustering. Requesting a block size of 10,000 resulted in 6 blocks and a run time of 2h:47m.
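A rough sketch of what such a blockwise run might look like is given below. It assumes the WGCNA function blockwiseModules with its maxBlockSize argument; the wrapper name run_blockwise, the input datExpr (a samples x probes numeric matrix) and the value power = 6 are placeholders for illustration, not the settings used for the run reported here. The wrapper is only defined, not called, so the arithmetic at the top runs without WGCNA installed.

```r
n_probes   <- 54675
block_size <- 10000

# Fewest blocks that can hold all probes at this block size
min_blocks <- ceiling(n_probes / block_size)

# Each block needs at most block_size^2 matrix elements, which is
# well under the 2^31 - 1 element limit
block_elements <- block_size^2
fits <- block_elements <= 2^31 - 1

# Hypothetical wrapper around WGCNA::blockwiseModules.
# datExpr: numeric matrix, rows = samples, columns = probes.
# power = 6 is a placeholder, not a value taken from the original run.
run_blockwise <- function(datExpr, max_block = block_size, power = 6) {
  WGCNA::blockwiseModules(datExpr,
                          maxBlockSize = max_block,
                          power = power,
                          verbose = 3)
}

cat("minimum number of blocks:", min_blocks, "\n")
cat("block fits in one matrix:", fits, "\n")
```

The minimum block count from this calculation agrees with the 6 blocks reported above, although the preclustering step is not guaranteed to produce exactly the minimum in general.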

Another option for handling large probe numbers is to pre-filter the data to a subset of probes of interest (e.g. using just the probes with the greatest variation across samples).
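As an illustration of variance-based pre-filtering, the base R sketch below keeps the most variable probes. The function name filter_by_variance and the simulated toy matrix are assumptions made for this example; this filter was not applied to the colon cancer data here.

```r
# Keep the n_keep probes with the largest variance across samples.
# expr: numeric matrix, rows = samples, columns = probes.
filter_by_variance <- function(expr, n_keep) {
  probe_var <- apply(expr, 2, var, na.rm = TRUE)
  n_keep <- min(n_keep, ncol(expr))
  keep <- order(probe_var, decreasing = TRUE)[seq_len(n_keep)]
  # sort() keeps the selected probes in their original column order
  expr[, sort(keep), drop = FALSE]
}

# Toy example with simulated data (not the colon cancer data)
set.seed(1)
toy <- matrix(rnorm(20 * 50), nrow = 20,
              dimnames = list(NULL, paste0("probe", 1:50)))
toy[, 7] <- toy[, 7] * 10   # make one probe far more variable than the rest

filtered <- filter_by_variance(toy, n_keep = 10)
print(dim(filtered))
```

For the real data, any n_keep of at most 46,340 would keep the probes x probes matrix within the vector length limit.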
