De novo discovery of methylation clusters in TCGA CRC data
Data: 148 TCGA colorectal patients, batch normalized (patient selection and normalization are described here)
Clustering and bootstrapping: R/Bioconductor package ConsensusClusterPlus, using Pearson correlation and k-means
Heatmap:
Association (p-values) of the clusters with specific activating mutations KRAS12, KRAS13 and BRAF V600:
| K | KRAS12 | KRAS13 | BRAF600 |
|---|---|---|---|
| 2 | 0.009 | 0.1107143194 | 0.0003663223 |
| 3 | 3.281718e-03 | 1.583375e-01 | 1.713689e-13 |
| 4 | 2.596711e-03 | 3.227223e-01 | 1.983404e-16 |
| 5 | 2.654568e-03 | 8.009702e-02 | 1.045403e-17 |
| 6 | 3.160703e-03 | 3.229406e-01 | 9.916327e-16 |
| 7 | 9.011882e-03 | 4.169644e-01 | 4.213014e-15 |
| 8 | 1.782041e-03 | 3.358756e-01 | 1.620879e-14 |
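The page does not say which test produced these p-values. As a minimal sketch, assuming a test of independence between cluster assignment and mutation status, Fisher's exact test on the cluster-by-mutation contingency table could be run in base R as follows. The vectors `cluster` and `mutated` are made-up toy data, not the TCGA samples, and `assoc_pvalue` is a hypothetical helper name.

```r
# Toy data for illustration only (NOT the TCGA samples):
# cluster assignment of 16 hypothetical samples at K = 2 and a
# logical flag marking whether each sample carries the mutation.
cluster <- factor(rep(c(1, 2), each = 8))
mutated <- c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
             TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

# Hypothetical helper: p-value of Fisher's exact test on the
# cluster x mutation contingency table. The choice of test is an
# assumption; the source does not name the test it used.
assoc_pvalue <- function(cluster, mutated) {
  tab <- table(cluster = cluster, mutated = mutated)
  fisher.test(tab)$p.value
}

print(table(cluster = cluster, mutated = mutated))
p <- assoc_pvalue(cluster, mutated)
print(p)
```

Because `fisher.test` also accepts tables larger than 2 x 2, the same helper could be reused for K = 3 to 8 by passing the cluster assignment for each K.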
Distribution of mutations across clusters:
We don't have any TP53 mutation data, so I didn't test TP53 for association with the clusters (the association based on the general mutation table wasn't significant).
I have a table of epigenetic "writers" and "erasers" collected manually from review articles (81 genes). 60 of them are present in the general mutation table. Genes with significant p-values at different K (p-value in parentheses where recorded):
SMYD2: K2 (0.01), K6
KAT5: K3 (0.003), K4 (0.0219), K5 (0.049)
HDAC7: K3 (0.008), K4 (0.029)
EHMT1: K6 (0.026), K7, K8
MLL3: K6
BMI1: K7, K8
None of these genes shows a striking distribution across clusters. A literature search showed that SMYD2 and KAT5 interact with TP53. SMYD2 is involved in regulating TP53 through methylation, and KAT5 is an acetyltransferase; when its activity is diminished, TP53 promotes cell growth and transforming activity.
Associations with clinical traits are similar to those reported in the Hinoue 2012 paper:
| K | Gender | Age | Histological type |
|---|---|---|---|
| 2 | 0.64 | 0.008 | 0.0007 |
| 3 | 0.001 | 0.002 | |
| 4 | 0.001 | 0.007 | 0.0008 |
| 5 | 0.0008 | 0.003 | 0.0003 |
| 6 | 0.0009 | 0.007 | 0.0007 |
| 7 | 0.002 | | |
| 8 | 0.005 | | |
Gender:
Histological type:
> table(pData(eCRCbatch148)[,17])
Rectal Adenocarcinoma Rectal Mucinous Adenocarcinoma
31 2
Colon Adenocarcinoma Colon Mucinous Adenocarcinoma
93 19

As for the location, Charles and Andrew suggested two classifications. Original data distribution:
> table(pData(eCRCbatch148)[,2])
Rectosigmoid Junction Rectum Sigmoid Colon
1 33 48
Ascending Colon Cecum Descending Colon
18 23 4
Hepatic Flexure Splenic Flexure Transverse Colon
7 1 12

First classification: Rectosigmoid Junction + Rectum + Sigmoid Colon = Rectum; Cecum + Ascending Colon = Ascending; Hepatic Flexure + Splenic Flexure + Transverse Colon = Transverse Colon; Descending Colon on its own. Although that gave very low p-values, the distribution looked different from the one in the paper:
Charles also suggested keeping Rectum separate, combining Cecum with Ascending Colon, and combining Hepatic Flexure, Transverse Colon, Splenic Flexure and Descending Colon. With this classification the distribution of sites looked similar to the published one:
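The two site groupings described above can be sketched as lookup vectors in base R. This is an illustration, not the code used for the analysis: `regroup_counts` is a hypothetical helper, the group label "Transverse/Descending" is made up, and the counts are paired with the site labels by their position in the `table()` output above. The second classification does not say where Rectosigmoid Junction and Sigmoid Colon go, so the sketch assumes they stay as their own categories.

```r
# Site labels and counts, paired by position in the table() output above.
sites  <- c("Rectosigmoid Junction", "Rectum", "Sigmoid Colon",
            "Ascending Colon", "Cecum", "Descending Colon",
            "Hepatic Flexure", "Splenic Flexure", "Transverse Colon")
counts <- c(1, 33, 48, 18, 23, 4, 7, 1, 12)

# First classification, as described in the text.
first_map <- c(
  "Rectosigmoid Junction" = "Rectum",
  "Rectum"                = "Rectum",
  "Sigmoid Colon"         = "Rectum",
  "Cecum"                 = "Ascending",
  "Ascending Colon"       = "Ascending",
  "Hepatic Flexure"       = "Transverse Colon",
  "Splenic Flexure"       = "Transverse Colon",
  "Transverse Colon"      = "Transverse Colon",
  "Descending Colon"      = "Descending Colon"
)

# Second classification. ASSUMPTION: Rectosigmoid Junction and Sigmoid
# Colon are left as separate categories because the text does not assign
# them; "Transverse/Descending" is a made-up label for the combined group.
second_map <- c(
  "Rectosigmoid Junction" = "Rectosigmoid Junction",
  "Rectum"                = "Rectum",
  "Sigmoid Colon"         = "Sigmoid Colon",
  "Cecum"                 = "Ascending",
  "Ascending Colon"       = "Ascending",
  "Hepatic Flexure"       = "Transverse/Descending",
  "Splenic Flexure"       = "Transverse/Descending",
  "Transverse Colon"      = "Transverse/Descending",
  "Descending Colon"      = "Transverse/Descending"
)

# Hypothetical helper: sum the per-site counts within each new group.
regroup_counts <- function(counts, sites, map) {
  grouped <- unname(map[sites])
  vapply(split(counts, grouped), sum, numeric(1))
}

first_counts  <- regroup_counts(counts, sites, first_map)
second_counts <- regroup_counts(counts, sites, second_map)
print(first_counts)
print(second_counts)
```

On the per-sample annotation the same lookup could be applied directly, for example `first_map[as.character(pData(eCRCbatch148)[, 2])]`, assuming column 2 holds exactly these labels.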