...
- Carefully identify the probes, remove them from the raw data (5060 + ~800).
- Create two datasets: batch-only normalized (125 tumor patients) and gender- and age-normalized (no need to remove batch since they all came from a single batch; 100 patients).
- Repeat ConsensusClusterPlus with the most variable probes (HC, complete linkage, euclidean distance). Use 10% of the original probe number as described in the Sweave document. They followed the vignette directly.
- Repeat ConsensusClusterPlus using K-means, K=2:6, Pearson correlation. Use 10% of the original probe number as described in the Sweave document.
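
The probe-selection step in the list above could be sketched roughly as follows. This is only an illustration under assumptions: the notes do not say which variability measure was used, so ranking by median absolute deviation (MAD) is an assumption, and `select_variable_probes` and the toy matrix are hypothetical names, not objects from the actual analysis. The `ConsensusClusterPlus` call is left commented out because it needs the Bioconductor package, and its `maxK` and `reps` values are placeholders.

```r
# Hypothetical helper: keep the top fraction of probes ranked by MAD.
# `expr` is assumed to be a numeric matrix, probes in rows, samples in columns.
select_variable_probes <- function(expr, fraction = 0.10) {
  stopifnot(is.matrix(expr), fraction > 0, fraction <= 1)
  spread <- apply(expr, 1, mad)                  # per-probe variability
  n_keep <- max(1, floor(nrow(expr) * fraction)) # 10% of the probe count
  keep <- order(spread, decreasing = TRUE)[seq_len(n_keep)]
  expr[keep, , drop = FALSE]
}

# Toy data standing in for the normalized expression matrix
set.seed(1)
toy <- matrix(rnorm(200 * 12), nrow = 200,
              dimnames = list(paste0("probe", 1:200), paste0("s", 1:12)))
top <- select_variable_probes(toy, fraction = 0.10)
dim(top)

# Not run here (requires Bioconductor); parameter values are placeholders:
# library(ConsensusClusterPlus)
# results <- ConsensusClusterPlus(top, maxK = 6, reps = 1000,
#                                 clusterAlg = "hc", distance = "euclidean",
#                                 innerLinkage = "complete",
#                                 finalLinkage = "complete")
```

Ranking by MAD rather than variance makes the selection less sensitive to a few outlying samples, but either measure would fit the description in the list.
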
Batch only normalized, HC, euclidean distance.
Maybe there are 5 or 6 clusters, but not 4. It definitely doesn't look like the clusters identified in the paper.
Gender and age adjusted, HC, euclidean distance.
This looks significantly worse.
Final attempt: K-means clustering, Pearson correlation and the seed value provided in the package. Batch removed:
Code Block | ||
---|---|---|
| ||
> icl[["clusterConsensus"]]
      k cluster clusterConsensus
 [1,] 2       1        0.8790997
 [2,] 2       2        0.8825873
 [3,] 3       1        0.8900365
 [4,] 3       2        0.8615545
 [5,] 3       3        0.8763479
 [6,] 4       1        0.7145161
 [7,] 4       2        0.8060535
 [8,] 4       3        0.9781181
 [9,] 4       4        0.9889430
[10,] 5       1        0.8289404
[11,] 5       2        0.7577152
[12,] 5       3        0.8221909
[13,] 5       4        0.7363796
[14,] 5       5        0.9454712
[15,] 6       1        0.8789223
[16,] 6       2        0.7593188
[17,] 6       3        0.7090342
[18,] 6       4        0.7150963
[19,] 6       5        0.9857516
[20,] 6       6        0.9189523 |
I tested the clusters for association with age and gender (summary table for association with clinical variables for K=2, 3 and 4). The clusters show no association with age but some association with gender.
K = 4:
Code Block | ||
---|---|---|
| ||
> kruskal.test(tumorMeta$Age,consClass4)
Kruskal-Wallis rank sum test
data: tumorMeta$Age and consClass4
Kruskal-Wallis chi-squared = 4.9015, df = 3, p-value = 0.1792
> chisq.test(tumorMeta$Gender,consClass4)
Pearson's Chi-squared test
data: tumorMeta$Gender and consClass4
X-squared = 14.7676, df = 3, p-value = 0.002026 |
Age distribution among clusters:
Test for association with mutation status:
Code Block | ||
---|---|---|
| ||
> chisq.test(k,tumorMeta$BRAF_mutation)
Pearson's Chi-squared test
data: k and tumorMeta$BRAF_mutation
X-squared = 95.1974, df = 3, p-value < 2.2e-16
Warning message:
In chisq.test(k, tumorMeta$BRAF_mutation) :
Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$KRAS_mutation)
Pearson's Chi-squared test
data: k and tumorMeta$KRAS_mutation
X-squared = 26.6428, df = 3, p-value = 6.995e-06
Warning message:
In chisq.test(k, tumorMeta$KRAS_mutation) :
Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$TP53_mutation)
Pearson's Chi-squared test
data: k and tumorMeta$TP53_mutation
X-squared = 12.1586, df = 3, p-value = 0.006859 |
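
The "Chi-squared approximation may be incorrect" warnings above usually mean that some expected cell counts are small. One way to cross-check such p-values, sketched here on made-up data, is a Monte Carlo p-value or Fisher's exact test. The `cluster` and `mutation` vectors below are hypothetical stand-ins for `k` and the `tumorMeta` mutation columns, and this is not a claim about how the reported values were computed.

```r
set.seed(42)
# Toy stand-ins (hypothetical): 4 cluster labels and a rare binary mutation call
cluster <- factor(sample(1:4, 125, replace = TRUE))
mutation <- factor(ifelse(cluster == 4 & runif(125) < 0.5, "mut", "wt"),
                   levels = c("wt", "mut"))
tab <- table(cluster, mutation)

# Small expected counts are what trigger the warning
expected <- suppressWarnings(chisq.test(tab))$expected
any(expected < 5)

# Two alternatives that do not rely on the large-sample approximation
p_sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)$p.value
p_fisher <- fisher.test(tab)$p.value
c(simulated = p_sim, fisher = p_fisher)
```

If the simulated and exact p-values agree in order of magnitude with the asymptotic ones, the warning can reasonably be ignored for that table.
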
K = 3:
Code Block | ||
---|---|---|
| ||
> kruskal.test(tumorMeta$Age,consClass3)
Kruskal-Wallis rank sum test
data: tumorMeta$Age and consClass3
Kruskal-Wallis chi-squared = 2.4866, df = 2, p-value = 0.2884
> chisq.test(tumorMeta$Gender,consClass3)
Pearson's Chi-squared test
data: tumorMeta$Gender and consClass3
X-squared = 5.9141, df = 2, p-value = 0.05197 |
So at K = 3 the clusters are not associated with age (p = 0.29) and only marginally associated with gender (p = 0.052).
Test for association with mutation status:
Summary of p-values for K = 2,3,4,5,6 (only batch is removed):
K | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|
Age | 0.78 | 0.88 | 0.7 | 0.63 | 0.44 |
Gender | 0.12 | 0.05 | 2.02e-03 | 5.35e-03 | 8.10e-03 |
Rectal/colon | 0.0016 | 4.74e-04 | 4.44e-03 | 7.41e-03 | 2.2e-03 |
Tumor stage | 0.36 | 0.43 | 0.23 | 0.3 | 0.58 |
BRAF | 1.33e-05 | 1.03e-11 | 1.67e-20 | 1.01e-19 | 5.15e-19 |
KRAS | 3.87e-03 | 0.04 | 6.7e-06 | 1.22e-04 | 5.97e-06 |
KRAS type | 1.97e-02 | 0.24 | 7.14e-03 | 1.59e-02 | 2.32e-03 |
TP53 | 0.96 | 3.45e-06 | 6.86e-03 | 1.75e-03 | 1.26e-03 |
MLH1 mutation | 1.26e-04 | 5.11e-10 | 1.45e-20 | 8.96e-20 | 4.12e-19 |
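
A summary table like the one above could be assembled along these lines. This is a sketch on simulated data: it assumes Kruskal-Wallis for numeric variables and a chi-squared test for categorical ones, which may not match the tests behind every row, and `assoc_p`, `meta` and `labels_by_k` are hypothetical names. In a real run the labels for each K would come from the ConsensusClusterPlus result, e.g. `results[[k]]$consensusClass`.

```r
# Hypothetical helper: p-value for association between one variable and cluster labels
assoc_p <- function(x, cluster) {
  if (is.numeric(x)) {
    kruskal.test(x, factor(cluster))$p.value
  } else {
    suppressWarnings(chisq.test(table(x, cluster)))$p.value
  }
}

set.seed(7)
n <- 125
# Simulated clinical table (values are made up)
meta <- data.frame(
  Age = round(rnorm(n, mean = 65, sd = 10)),
  Gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  BRAF_mutation = factor(sample(c("wt", "mut"), n, replace = TRUE,
                                prob = c(0.85, 0.15)))
)

# Simulated cluster labels for K = 2..6
labels_by_k <- lapply(2:6, function(k) sample(seq_len(k), n, replace = TRUE))
names(labels_by_k) <- 2:6

# Rows: clinical variables; columns: K
pvals <- sapply(labels_by_k, function(cl) sapply(meta, assoc_p, cluster = cl))
signif(pvals, 3)
```

Computing every cell with the same helper also keeps the number formatting consistent across rows.
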
Gender and age are removed, ConsensusClusterPlus, K-means, Pearson correlation:
Cluster consensus:
Code Block | ||
---|---|---|
| ||
> icl[["clusterConsensus"]]
      k cluster clusterConsensus
 [1,] 2       1        0.9008927
 [2,] 2       2        0.9411635
 [3,] 3       1        0.8680856
 [4,] 3       2        0.8760303
 [5,] 3       3        0.7333630
 [6,] 4       1        0.7745014
 [7,] 4       2        0.7901681
 [8,] 4       3        0.8024310
 [9,] 4       4        0.8247401
[10,] 5       1        0.9153414
[11,] 5       2        0.8141924
[12,] 5       3        0.6573712
[13,] 5       4        0.6708655
[14,] 5       5        0.6319631
[15,] 6       1        0.8695192
[16,] 6       2        0.8568005
[17,] 6       3        0.7128842
[18,] 6       4        0.7407073
[19,] 6       5        0.5585342
[20,] 6       6        0.7426816 |
Test for association of clusters with mutation status. K = 2.
Code Block | ||
---|---|---|
| ||
#w is the data frame with clinical information for 100 tumor patients
> chisq.test(k,w$BRAF_mutation)
Pearson's Chi-squared test with Yates' continuity correction
data: k and w$BRAF_mutation
X-squared = 15.2964, df = 1, p-value = 9.189e-05
> chisq.test(k,w$KRAS_mutation)
Pearson's Chi-squared test with Yates' continuity correction
data: k and w$KRAS_mutation
X-squared = 5.0882, df = 1, p-value = 0.02409
> chisq.test(k,w$TP53_mutation)
Pearson's Chi-squared test with Yates' continuity correction
data: k and w$TP53_mutation
X-squared = 0.12, df = 1, p-value = 0.729 |
K | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|
Tumor stage | 0.36 | 0.44 | 0.66 | 0.83 | 0.76 |
Rectal/colon | 8.69e-03 | 1.68e-02 | 6.12e-02 | 0.15 | 0.13 |
BRAF | 9.18e-05 | 1.46e-12 | 1.81e-12 | 1.16e-13 | 1.63e-10 |
KRAS | 2.41e-02 | 3.36e-02 | 5.51e-02 | 2.53e-04 | 3.32e-03 |
KRAS type | 6.98e-02 | 0.37 | 0.55 | 9.23e-03 | 0.46 |
TP53 | 0.72 | 5.90e-03 | 4.15e-03 | 1.40e-02 | 2.14e-02 |
MLH1 methyl. | 9.19e-05 | 1.46e-12 | 2.01e-10 | 1.03e-13 | 1.63e-10 |