Forcing clusters in colon cancer using Shen or Laird markers for CIMP phenotype
The purpose of this exercise is to test whether, when we cluster colon cancer samples on the CIMP markers and force the number of clusters reported in the original papers, the resulting clusters correlate with mutations in BRAF, KRAS or TP53.
Test Shen markers in colon cancer using the 27k platform. Data: M values with batch effects removed, or Beta values (converted from the normalized M values using the equation in Pan, 2010).
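For reference, the usual M-to-Beta conversion is the logistic transform Beta = 2^M / (2^M + 1), with inverse M = log2(Beta / (1 - Beta)). This is the offset-free form of the relationship; whether the conversion used here added any small offset is not stated. The function names below are illustrative, and this is a minimal sketch:

```python
import math


def m_to_beta(m: float) -> float:
    """Convert an M value to a Beta value: Beta = 2**M / (2**M + 1)."""
    # Same quantity written as 1 / (1 + 2**-M); avoids computing 2**M for large positive M.
    return 1.0 / (1.0 + 2.0 ** (-m))


def beta_to_m(beta: float) -> float:
    """Inverse transform: M = log2(Beta / (1 - Beta)), defined only for 0 < Beta < 1."""
    return math.log2(beta / (1.0 - beta))


if __name__ == "__main__":
    # Round trip on a few illustrative M values.
    for m in (-3.0, 0.0, 3.0):
        beta = m_to_beta(m)
        print(f"M = {m:+.1f}  Beta = {beta:.4f}  back to M = {beta_to_m(beta):+.2f}")
```

An M value of 0 maps to a Beta value of 0.5, and the transform is monotonic. Beta values are therefore bounded in (0, 1), while M values are unbounded.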
Test Laird markers in colon cancer using the 27k platform. Data: M values with batch effects removed. Beta values were not tested because Beta and M values gave similar results with the Shen markers.
Shen markers
Clustering was performed following the short K-means tutorial from here. The paper describes 3 clusters: CIMP1, CIMP2 and CIMP-negative. CIMP1 is characterized by MSI (80%) and BRAF mutations (53%) and rare KRAS and p53 mutations (16% and 11%, respectively). CIMP2 is associated with KRAS mutations (92%) and rare MSI, BRAF, or p53 mutations (0%, 4%, and 31%, respectively). CIMP-negative cases have a high rate of p53 mutations (71%) and lower rates of MSI (12%) or mutations of BRAF (2%) or KRAS (33%).
Note: the data were not scaled before clustering.
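K-means uses Euclidean distance, so without scaling, markers with a wider spread of values carry more weight in the clustering. For comparison only, since no scaling was applied in this analysis, here is a minimal sketch of per-marker standardization. It assumes a samples x markers matrix stored as a list of rows; `scale_columns` is a hypothetical helper:

```python
import statistics


def scale_columns(rows):
    """Standardize each column (marker) of a samples x markers matrix.

    Each value becomes (x - column mean) / column sample SD (n - 1 denominator),
    matching the defaults of R's scale(). Columns with zero SD are set to 0.0 here,
    whereas R would return NaN. Needs at least two samples.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        [(x - mu) / sd if sd > 0 else 0.0 for x, mu, sd in zip(row, means, sds)]
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical toy matrix: 3 samples x 2 markers on very different scales.
    toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(scale_columns(toy))
```

After scaling, both toy markers span the same range and contribute equally to the distances.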
Attempt to determine the number of clusters:
Plot of the number of clusters vs. the within-groups sum of squares (the analysis was run 1-7 times; the graph shows the results from a single run). Left: M values; right: Beta values.
The paper identified 3 clusters, so the data were forced into 3 clusters.
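As an illustration of the procedure, and not the code used for the figures, here is a minimal pure-Python K-means sketch. It runs Lloyd's algorithm from several random starts and keeps the run with the lowest within-groups sum of squares, which yields both the elbow curve and a forced 3-cluster solution. The function names and toy data are hypothetical, and the k range 1-7 is an assumption. R's `kmeans()` defaults to the Hartigan-Wong algorithm, so an R run may give slightly different results:

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm started from k randomly chosen samples."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its members (an empty cluster keeps its center).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Within-groups sum of squares of the final partition.
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


def kmeans(points, k, n_start=25, seed=1):
    """Best of n_start random starts (lowest within-groups sum of squares)."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(n_start)), key=lambda run: run[1])


if __name__ == "__main__":
    # Hypothetical toy data: 9 samples x 2 markers in three well-separated groups.
    data = [[-3.0, -2.5], [-2.8, -3.1], [-3.2, -2.9],
            [0.1, 0.3], [0.2, -0.1], [-0.2, 0.0],
            [3.0, 2.7], [2.9, 3.2], [3.1, 3.0]]
    # Elbow curve: within-groups sum of squares for k = 1..7.
    for k in range(1, 8):
        print(k, round(kmeans(data, k)[1], 3))
    # Force 3 clusters, as in the paper.
    labels, wss = kmeans(data, 3)
    print("labels:", labels, "WSS:", round(wss, 3))
```

Passing `k=3` forces three clusters regardless of where the elbow falls. The cluster numbers are arbitrary labels that can change between runs.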
Three clusters based on Shen markers. Left: M value; right: Beta value.
To correlate the clusters with the mutation status of the BRAF, KRAS and TP53 genes, I used the all_mut.txt table constructed by Qingying. All mutation types were merged into one, so each patient is either mutated or not. The association between cluster and mutation status was tested with a chi-square test; the table shows the P values.
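Here is a minimal sketch of this step. It assumes the mutation table can be read as (patient, gene, mutation type) records. The actual layout of all_mut.txt is not described here, so this format and all helper names are hypothetical. The sketch collapses mutation types to a 0/1 status, builds the cluster x status contingency table and computes a Pearson chi-square P value. For 2 x 2 tables it applies Yates' continuity correction, as R's `chisq.test()` does by default. The tail probability is computed in closed form for integer degrees of freedom, so only the standard library is needed:

```python
import math


def mutated_status(mutation_records, gene, patients):
    """0/1 status per patient: 1 if the patient has any mutation of any type in `gene`.

    `mutation_records` is an iterable of (patient_id, gene, mutation_type) tuples,
    a hypothetical stand-in for the rows of the mutation table.
    """
    hit = {pid for pid, g, _ in mutation_records if g == gene}
    return {pid: int(pid in hit) for pid in patients}


def contingency(clusters, status):
    """Counts of patients per cluster (rows, sorted by label) and status (columns 0, 1)."""
    labels = sorted(set(clusters.values()))
    return [
        [sum(1 for pid, c in clusters.items() if c == lab and status[pid] == s) for s in (0, 1)]
        for lab in labels
    ]


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df, via the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) = exp(-y) for even df, or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chisq_test(table, correct=True):
    """Pearson chi-square test of independence on a table of counts.

    Applies Yates' continuity correction to 2 x 2 tables when `correct` is True.
    Assumes every row and column total is positive. Returns (statistic, df, p_value).
    """
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    # (observed, expected) for every cell; expected = row total * column total / n.
    cells = [(o, r * c / n) for row, r in zip(table, row_tot) for o, c in zip(row, col_tot)]
    yates = 0.0
    if correct and len(table) == 2 and len(table[0]) == 2:
        yates = min(0.5, min(abs(o - e) for o, e in cells))
    stat = sum((abs(o - e) - yates) ** 2 / e for o, e in cells)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical toy inputs, not the real cohort.
    records = [("p1", "KRAS", "missense"), ("p1", "KRAS", "nonsense"),
               ("p2", "BRAF", "missense"), ("p4", "KRAS", "missense")]
    clusters = {"p1": 1, "p2": 1, "p3": 2, "p4": 2, "p5": 3, "p6": 3}
    kras = mutated_status(records, "KRAS", clusters)
    table = contingency(clusters, kras)  # 3 clusters x 2 statuses -> df = 2
    print(table, chisq_test(table))
```

With 3 clusters the table is 3 x 2, giving 2 degrees of freedom and no continuity correction. With the 2 Laird clusters the table is 2 x 2, and the Yates correction applies.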
| Gene | Shen, batch removed; M value; 3 clusters | Shen, batch removed; Beta value; 3 clusters |
|---|---|---|
| BRAF | 0.1958 | 0.3837 |
| KRAS | 0.00395 | 0.007297 |
| TP53 | 0.5645 | 0.5769 |
Only KRAS shows a significant association (P < 0.01 with both M and Beta values). The table below gives, for each cluster, the proportion of all patients with (1) or without (0) each mutation.
| Gene | Mutation status | M value: cluster 1 | M value: cluster 2 | M value: cluster 3 | Beta value: cluster 1 | Beta value: cluster 2 | Beta value: cluster 3 |
|---|---|---|---|---|---|---|---|
| BRAF | 0 | 0.17 | 0.26 | 0.22 | 0.29 | 0.19 | 0.17 |
| BRAF | 1 | 0.11 | 0.15 | 0.06 | 0.11 | 0.10 | 0.11 |
| KRAS | 0 | 0.14 | 0.21 | 0.23 | | | |
| KRAS | 1 | 0.13 | 0.20 | 0.05 | | | |
| TP53 | 0 | 0.17 | 0.22 | 0.14 | 0.21 | 0.15 | 0.17 |
| TP53 | 1 | 0.10 | 0.19 | 0.14 | 0.19 | 0.14 | 0.10 |
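K-means cluster numbers are arbitrary labels, so cluster 1 in the M-value columns need not correspond to cluster 1 in the Beta-value columns. Here is a sketch of how such a proportion table can be computed from per-patient cluster labels and 0/1 mutation status. The helper is hypothetical, and it makes one assumed design choice: every clustered patient is in the denominator, so patients without mutation data leave the cells summing to less than 1:

```python
from collections import Counter


def proportion_table(clusters, status):
    """Fraction of all clustered patients in each (mutation status, cluster) cell.

    `clusters` and `status` map patient_id -> cluster label and patient_id -> 0/1.
    Patients missing from `status` stay in the denominator but in no cell.
    """
    total = len(clusters)
    counts = Counter((status[pid], c) for pid, c in clusters.items() if pid in status)
    return {cell: n / total for cell, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical toy labels; patient "p6" has no mutation data.
    clusters = {"p1": 1, "p2": 1, "p3": 2, "p4": 2, "p5": 3, "p6": 3}
    status = {"p1": 0, "p2": 1, "p3": 0, "p4": 0, "p5": 1}
    table = proportion_table(clusters, status)
    # One printed row per mutation status, one column per cluster.
    for s in (0, 1):
        row = [f"{table.get((s, c), 0.0):.2f}" for c in (1, 2, 3)]
        print(s, " | ".join(row))
```

Dividing by the number of patients in each cluster instead would give within-cluster mutation rates. Those rates are easier to compare with the percentages quoted from the paper.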
I also looked at the dataset with batch, age and gender effects removed and did not find any notable associations there either.
Laird markers
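The Laird panel has five genes, and the paper identified 2 clusters, so the data were forced into 2 clusters. Beta values were not analyzed because Beta and M values gave similar results for the Shen markers. Chi-square P values for the association between cluster and mutation status (M values, batch removed, 2 clusters):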
| Markers | BRAF | KRAS | TP53 |
|---|---|---|---|
| Laird markers | 0.4662 | 0.5753 | 0.1088 |