...
- Carefully identify the probes and remove them from the raw data (5060 + ~800).
- Create two datasets: batch-only normalized (125 tumor patients) and gender- and age-normalized (no need to remove batch since they all came from a single batch; 100 patients).
- Repeat ConsensusClusterPlus with the most variable probes (HC, complete linkage, euclidean distance). Use 10% of the original probe number as described in the Sweave document. They followed the vignette directly.
- Repeat ConsensusClusterPlus using K-means, K=2:6, Pearson correlation. Use 10% of the original probe number as described in the Sweave document.
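The probe-selection step shared by the two ConsensusClusterPlus runs above can be sketched as follows. This is a minimal sketch, assuming probes are ranked by median absolute deviation as in the ConsensusClusterPlus vignette. The matrix `expr` is simulated, and the helper `select_variable_probes` and the `reps`, `pItem` and `seed` values are illustrative assumptions, not the settings used for the results on this page. The clustering call is only attempted when the package is installed.

```r
# Simulated stand-in for the normalized expression matrix (probes x samples).
set.seed(1)
expr <- matrix(rnorm(200 * 30), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("s", 1:30)))

# Hypothetical helper: keep the top `fraction` of probes ranked by
# median absolute deviation (MAD).
select_variable_probes <- function(x, fraction = 0.10) {
  mads <- apply(x, 1, mad, na.rm = TRUE)
  n_keep <- max(1, floor(nrow(x) * fraction))
  x[order(mads, decreasing = TRUE)[seq_len(n_keep)], , drop = FALSE]
}

top <- select_variable_probes(expr, fraction = 0.10)

# Median-center each probe before clustering.
top <- sweep(top, 1, apply(top, 1, median, na.rm = TRUE))

# HC, complete linkage, Euclidean distance; reps, pItem and seed are
# illustrative values only.
if (requireNamespace("ConsensusClusterPlus", quietly = TRUE)) {
  res_hc <- ConsensusClusterPlus::ConsensusClusterPlus(
    top, maxK = 6, reps = 50, pItem = 0.8, pFeature = 1,
    clusterAlg = "hc", distance = "euclidean",
    innerLinkage = "complete", finalLinkage = "complete",
    seed = 1, plot = NULL)
}
```

For the K-means run, the same call would take `clusterAlg = "km"` and `distance = "pearson"`; how K-means combines with a non-Euclidean distance depends on the package version, so it is worth checking the help page before comparing results.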
Batch only normalized, HC, euclidean distance.
Maybe there are 5 or 6 clusters, but not 4. It definitely doesn't look like the clusters identified in the paper.
Gender and age adjusted, HC, euclidean distance.
This looks significantly worse.
Final attempt: K-means clustering, Pearson correlation, and the seed value provided in the package. Batch removed:
Code Block | ||
---|---|---|
| ||
> icl[["clusterConsensus"]]
k cluster clusterConsensus
[1,] 2 1 0.8790997
[2,] 2 2 0.8825873
[3,] 3 1 0.8900365
[4,] 3 2 0.8615545
[5,] 3 3 0.8763479
[6,] 4 1 0.7145161
[7,] 4 2 0.8060535
[8,] 4 3 0.9781181
[9,] 4 4 0.9889430
[10,] 5 1 0.8289404
[11,] 5 2 0.7577152
[12,] 5 3 0.8221909
[13,] 5 4 0.7363796
[14,] 5 5 0.9454712
[15,] 6 1 0.8789223
[16,] 6 2 0.7593188
[17,] 6 3 0.7090342
[18,] 6 4 0.7150963
[19,] 6 5 0.9857516
[20,] 6 6 0.9189523 |
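To compare the values of K more easily, the per-cluster consensus values can be summarised per K. The following is a minimal sketch that re-enters the numbers printed above by hand; the names `cc` and `by_k` are illustrative, and taking the mean and minimum is only one possible way to summarise the table.

```r
# Cluster consensus values copied from the icl[["clusterConsensus"]] output above.
cc <- data.frame(
  k = rep(2:6, times = 2:6),
  cluster = sequence(2:6),
  clusterConsensus = c(
    0.8790997, 0.8825873,
    0.8900365, 0.8615545, 0.8763479,
    0.7145161, 0.8060535, 0.9781181, 0.9889430,
    0.8289404, 0.7577152, 0.8221909, 0.7363796, 0.9454712,
    0.8789223, 0.7593188, 0.7090342, 0.7150963, 0.9857516, 0.9189523
  )
)

# Mean and minimum cluster consensus for each K.
by_k <- cbind(
  mean = tapply(cc$clusterConsensus, cc$k, mean),
  min  = tapply(cc$clusterConsensus, cc$k, min)
)
print(round(by_k, 3))
```

The minimum column shows the least stable cluster at each K, which the mean alone can hide.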
I tested the clusters (K=3 and K=4) for association with age and gender. It looks like the clusters are not associated with age at all but show some association with gender.
K = 4:
Code Block | ||
---|---|---|
| ||
> kruskal.test(tumorMeta$Age,consClass4)
Kruskal-Wallis rank sum test
data: tumorMeta$Age and consClass4
Kruskal-Wallis chi-squared = 4.9015, df = 3, p-value = 0.1792
> chisq.test(tumorMeta$Gender,consClass4)
Pearson's Chi-squared test
data: tumorMeta$Gender and consClass4
X-squared = 14.7676, df = 3, p-value = 0.002026 |
Age distribution among clusters:
Test for association with mutation status:
Code Block | ||
---|---|---|
| ||
> chisq.test(k,tumorMeta$BRAF_mutation)
Pearson's Chi-squared test
data: k and tumorMeta$BRAF_mutation
X-squared = 95.1974, df = 3, p-value < 2.2e-16
Warning message:
In chisq.test(k, tumorMeta$BRAF_mutation) :
Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$KRAS_mutation)
Pearson's Chi-squared test
data: k and tumorMeta$KRAS_mutation
X-squared = 26.6428, df = 3, p-value = 6.995e-06
Warning message:
In chisq.test(k, tumorMeta$KRAS_mutation) :
Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$TP53_mutation)
Pearson's Chi-squared test
data: k and tumorMeta$TP53_mutation
X-squared = 12.1586, df = 3, p-value = 0.006859 |
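The "Chi-squared approximation may be incorrect" warnings appear when some expected cell counts are small, which is likely for a rare mutation spread over four clusters. A minimal sketch of two base R alternatives follows, using simulated stand-ins (`k_sim`, `mut_sim`) rather than the real `k` and `tumorMeta` columns; the cluster proportions and mutation frequency are assumptions chosen only to reproduce the situation.

```r
# Simulated stand-ins for the cluster labels and a rare binary mutation call.
set.seed(42)
k_sim   <- factor(sample(1:4, 125, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1)))
mut_sim <- factor(rbinom(125, 1, 0.1), levels = c(0, 1), labels = c("wt", "mut"))
tab <- table(cluster = k_sim, mutation = mut_sim)

# Expected counts below 5 are what trigger the warning.
expected <- suppressWarnings(chisq.test(tab))$expected
print(round(expected, 1))

# Option 1: Fisher's exact test on the r x c table.
p_fisher <- fisher.test(tab)$p.value

# Option 2: chi-squared statistic with a Monte Carlo p-value.
p_sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)$p.value

print(c(fisher = p_fisher, simulated = p_sim))
```

With the real data, `tab` would be `table(k, tumorMeta$BRAF_mutation)` and so on. Neither alternative relies on the large-sample approximation, so they are a useful cross-check on the p-values reported with a warning.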
K = 3:
Code Block | ||
---|---|---|
| ||
> kruskal.test(tumorMeta$Age,consClass3)
Kruskal-Wallis rank sum test
data: tumorMeta$Age and consClass3
Kruskal-Wallis chi-squared = 2.4866, df = 2, p-value = 0.2884
> chisq.test(tumorMeta$Gender,consClass3)
Pearson's Chi-squared test
data: tumorMeta$Gender and consClass3
X-squared = 5.9141, df = 2, p-value = 0.05197 |
So for K = 3 the clusters show no significant association with age (p = 0.29) and only a borderline association with gender (p = 0.052).
Test for association with mutation status:
Code Block | ||
---|---|---|
| ||
> k<-resultsK[[3]][["consensusClass"]]
> chisq.test(k,tumorMeta$BRAF_mutation)
Pearson's Chi-squared test
data: k and tumorMeta$BRAF_mutation
X-squared = 50.5952, df = 2, p-value = 1.031e-11
Warning message:
In chisq.test(k, tumorMeta$BRAF_mutation) :
Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$KRAS_mutation)
Pearson's Chi-squared test
data: k and tumorMeta$KRAS_mutation
X-squared = 6.7096, df = 2, p-value = 0.03492
> chisq.test(k,tumorMeta$TP53_mutation)
Pearson's Chi-squared test
data: k and tumorMeta$TP53_mutation
X-squared = 25.1538, df = 2, p-value = 3.451e-06 |
Gender and age removed, ConsensusClusterPlus, K-means, Pearson correlation:
Cluster consensus:
Code Block | ||
---|---|---|
| ||
> icl[["clusterConsensus"]]
k cluster clusterConsensus
[1,] 2 1 0.9008927
[2,] 2 2 0.9411635
[3,] 3 1 0.8680856
[4,] 3 2 0.8760303
[5,] 3 3 0.7333630
[6,] 4 1 0.7745014
[7,] 4 2 0.7901681
[8,] 4 3 0.8024310
[9,] 4 4 0.8247401
[10,] 5 1 0.9153414
[11,] 5 2 0.8141924
[12,] 5 3 0.6573712
[13,] 5 4 0.6708655
[14,] 5 5 0.6319631
[15,] 6 1 0.8695192
[16,] 6 2 0.8568005
[17,] 6 3 0.7128842
[18,] 6 4 0.7407073
[19,] 6 5 0.5585342
[20,] 6 6 0.7426816 |
Test for association of clusters with mutation status. K = 2.
Code Block | ||
---|---|---|
| ||
#w is the data frame with clinical information for 100 tumor patients
> chisq.test(k,w$BRAF_mutation)
Pearson's Chi-squared test with Yates' continuity correction
data: k and w$BRAF_mutation
X-squared = 15.2964, df = 1, p-value = 9.189e-05
> chisq.test(k,w$KRAS_mutation)
Pearson's Chi-squared test with Yates' continuity correction
data: k and w$KRAS_mutation
X-squared = 5.0882, df = 1, p-value = 0.02409
> chisq.test(k,w$TP53_mutation)
Pearson's Chi-squared test with Yates' continuity correction
data: k and w$TP53_mutation
X-squared = 0.12, df = 1, p-value = 0.729 |