Forcing clusters in colorectal cancer

Since the analysis of colon cancer described here did not identify clusters correlated with the mutations, I asked another question: could the failure to identify CIMP be due to working with colon rather than colorectal cancer? To test this hypothesis:

  1. Download rectal cancer 27k data from TCGA

  2. Normalize together with the colon cancer data

  3. Test whether we can identify CIMP clusters using either Shen's or Laird's definitions

Normalization

Colon cancer data: 166 patients; rectal cancer data: 70 patients. Mutation data are available for only 128 patients (using Qingying's all_mut.txt table).

I combined the two datasets into one and analysed the batch effect. Numbers above each boxplot represent the number of patients in each batch. Because batch is strongly associated with the principal components (PC1: p-value 2.2e-16) as well as with the tissue source (rectal vs. colon: p-value < 2.2e-16), I selected the most similar batches (all boxplots whose median is above the line), since this selection kept more patients (149). After selecting these patients, PC1 was no longer highly correlated with batch, but PC2 was (p-value = 1.499e-06). I then removed the batch effect from the data and removed 5 outliers.
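The check of whether a principal component tracks batch can be sketched as below. This is only an illustration under stated assumptions: the test actually used to obtain the p-values above is not stated here, so the sketch substitutes a one-way ANOVA F statistic with a permutation p-value. The function names (pc_scores, anova_f, batch_permutation_pvalue) and the samples-by-probes layout of the input are hypothetical.

```python
import numpy as np

def pc_scores(data, n_components=2):
    """Sample scores on the first principal components.

    data: array of shape (n_samples, n_probes); this layout is an assumption.
    """
    centered = data - data.mean(axis=0)  # center each probe
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def anova_f(values, groups):
    """One-way ANOVA F statistic of `values` across the labels in `groups`."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = values.mean()
    between = 0.0
    within = 0.0
    for label in labels:
        member = values[groups == label]
        between += member.size * (member.mean() - grand_mean) ** 2
        within += ((member - member.mean()) ** 2).sum()
    df_between = labels.size - 1
    df_within = values.size - labels.size
    return (between / df_between) / (within / df_within)

def batch_permutation_pvalue(values, groups, n_perm=2000, seed=0):
    """Permutation p-value: how often shuffled batch labels give an F
    statistic at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = anova_f(values, groups)
    hits = 0
    for _ in range(n_perm):
        if anova_f(values, rng.permutation(groups)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Applied to the first and second columns of pc_scores in turn, this gives one p-value per component, which is the kind of comparison made above for PC1 and PC2.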

Shen markers

The analysis was done in the same way as described here.

 

                        BRAF      KRAS        TP53
Shen 3 clusters, CRC    0.8204    0.001973    0.1588
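The values reported for each gene are p-values for the association between cluster membership and mutation status. The test used to obtain them is not stated here, so the following is only a minimal sketch under that caveat: it computes a Pearson chi-square statistic on the cluster-by-mutation table and estimates the p-value by permuting the mutation labels. The function names and the input layout (one cluster label and one mutation flag per patient) are hypothetical.

```python
import random
from collections import Counter

def chi_square_stat(clusters, mutated):
    """Pearson chi-square statistic for the cluster-by-mutation table."""
    n = len(clusters)
    row_totals = Counter(clusters)
    col_totals = Counter(mutated)
    cells = Counter(zip(clusters, mutated))
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat

def association_pvalue(clusters, mutated, n_perm=5000, seed=0):
    """Permutation p-value for association between cluster and mutation."""
    rng = random.Random(seed)
    observed = chi_square_stat(clusters, mutated)
    shuffled = list(mutated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between cluster and mutation
        # small tolerance so that ties with the observed value are counted
        if chi_square_stat(clusters, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A permutation p-value is used here because it needs no distributional assumptions when some clusters contain few mutated patients; with n_perm permutations the smallest attainable p-value is 1 / (n_perm + 1).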

Laird markers

M value, 2 clusters

 

                              BRAF      KRAS       TP53
Laird, 2 clusters, M value    0.2114    0.01141    0.5281

With Beta values I see a very similar trend in p-values, with some improvement in the association with KRAS.
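For reference, the relation between the two methylation scales can be sketched as follows, using the commonly used logit definition M = log2(Beta / (1 - Beta)). Whether exactly this transformation was applied in the analysis above is an assumption; the clipping constant eps and the function names are illustrative.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a Beta value in [0, 1] to an M value.

    Beta is clipped to [eps, 1 - eps] (eps is an assumed constant)
    so that values of exactly 0 or 1 do not produce infinities.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

def m_to_beta(m):
    """Inverse transformation: M value back to a Beta value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

Because the transformation is monotonic but non-linear, clustering on M values and on Beta values can group patients differently, which is consistent with similar but not identical p-values on the two scales.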