  1. Carefully identify the probes and remove them from the raw data (5060 + ~800).
  2. Create two datasets: batch-only normalized (125 tumor patients) and gender- and age-normalized (100 patients; no batch removal needed since they all came from a single batch).
  3. Repeat ConsensusClusterPlus with the most variable probes (HC, complete linkage, Euclidean distance). Use 10% of the original probe number as described in the Sweave document. They followed the vignette directly.
  4. Repeat ConsensusClusterPlus using K-means, K=2:6, Pearson correlation. Use 10% of the original probe number as described in the Sweave document.
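
Steps 3 and 4 both depend on reducing the matrix to the most variable probes. A minimal sketch of that selection in base R, assuming variability is ranked by median absolute deviation (MAD) as in the ConsensusClusterPlus vignette. The matrix `expr` is simulated, and `select_variable_probes` and `top_fraction` are hypothetical names, not taken from the Sweave document:

```r
# Hypothetical stand-in for the normalized data: probes in rows, patients in columns.
set.seed(1)
expr <- matrix(rnorm(1000 * 20), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), paste0("pt", 1:20)))

# Keep the top fraction of probes ranked by median absolute deviation (MAD).
select_variable_probes <- function(x, top_fraction = 0.10) {
  mads <- apply(x, 1, mad, na.rm = TRUE)
  n_keep <- ceiling(nrow(x) * top_fraction)
  keep <- order(mads, decreasing = TRUE)[seq_len(n_keep)]
  x[keep, , drop = FALSE]
}

expr_top <- select_variable_probes(expr, top_fraction = 0.10)

# Median-center each probe before clustering, as the vignette does.
expr_top <- sweep(expr_top, 1, apply(expr_top, 1, median, na.rm = TRUE))
dim(expr_top)
```

The reduced, centered matrix is what would then be handed to the clustering step.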


Batch-only normalized, HC, Euclidean distance.

There may be 5 or 6 clusters, but not 4. It definitely doesn't look like the clusters identified in the paper.

Gender- and age-adjusted, HC, Euclidean distance.

This looks significantly worse. 

 

Final attempt: K-means clustering, Pearson correlation, and the seed value provided in the package. Batch removed:

> icl[["clusterConsensus"]]
      k cluster clusterConsensus
 [1,] 2       1        0.8790997
 [2,] 2       2        0.8825873
 [3,] 3       1        0.8900365
 [4,] 3       2        0.8615545
 [5,] 3       3        0.8763479
 [6,] 4       1        0.7145161
 [7,] 4       2        0.8060535
 [8,] 4       3        0.9781181
 [9,] 4       4        0.9889430
[10,] 5       1        0.8289404
[11,] 5       2        0.7577152
[12,] 5       3        0.8221909
[13,] 5       4        0.7363796
[14,] 5       5        0.9454712
[15,] 6       1        0.8789223
[16,] 6       2        0.7593188
[17,] 6       3        0.7090342
[18,] 6       4        0.7150963
[19,] 6       5        0.9857516
[20,] 6       6        0.9189523
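
One way to read the table above is to summarize the cluster consensus per K. A rough sketch that re-enters the printed values into a data frame; the names `cc` and `per_k` are illustrative and not from the original session:

```r
# Values copied from the clusterConsensus table above (batch removed, K-means).
cc <- data.frame(
  k = rep(2:6, times = 2:6),
  cluster = sequence(2:6),
  clusterConsensus = c(
    0.8790997, 0.8825873,
    0.8900365, 0.8615545, 0.8763479,
    0.7145161, 0.8060535, 0.9781181, 0.9889430,
    0.8289404, 0.7577152, 0.8221909, 0.7363796, 0.9454712,
    0.8789223, 0.7593188, 0.7090342, 0.7150963, 0.9857516, 0.9189523
  )
)

# Mean and weakest (minimum) cluster consensus for each K.
per_k <- data.frame(
  k = 2:6,
  mean_consensus = as.vector(tapply(cc$clusterConsensus, cc$k, mean)),
  min_consensus  = as.vector(tapply(cc$clusterConsensus, cc$k, min))
)
print(per_k)
```

With these values, the weakest cluster has a consensus of about 0.71 at K=4 and K=6, against about 0.86 at K=3.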

 

I tried to correlate the clusters (K=3 and K=4) with age and gender. The clusters do not appear to be associated with age, but they show some association with gender.

K = 4:

> kruskal.test(tumorMeta$Age,consClass4)
        Kruskal-Wallis rank sum test
data:  tumorMeta$Age and consClass4
Kruskal-Wallis chi-squared = 4.9015, df = 3, p-value = 0.1792
> chisq.test(tumorMeta$Gender,consClass4)
        Pearson's Chi-squared test
data:  tumorMeta$Gender and consClass4
X-squared = 14.7676, df = 3, p-value = 0.002026

Age distribution among clusters:

Test for association with mutation status:

> chisq.test(k,tumorMeta$BRAF_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$BRAF_mutation 
X-squared = 95.1974, df = 3, p-value < 2.2e-16
Warning message:
In chisq.test(k, tumorMeta$BRAF_mutation) :
  Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$KRAS_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$KRAS_mutation 
X-squared = 26.6428, df = 3, p-value = 6.995e-06
Warning message:
In chisq.test(k, tumorMeta$KRAS_mutation) :
  Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$TP53_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$TP53_mutation 
X-squared = 12.1586, df = 3, p-value = 0.006859
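
The warning "Chi-squared approximation may be incorrect" is raised when some expected cell counts are small, which is plausible here if a mutation is rare in some clusters. A hedged sketch of two standard alternatives in base R, Fisher's exact test and a Monte Carlo p-value, on made-up data. The vectors `cluster` and `mutation` are simulated and do not reproduce the study's values:

```r
set.seed(42)
# Simulated stand-ins: 4 cluster labels and a rare binary mutation flag.
cluster  <- factor(sample(1:4, 125, replace = TRUE))
mutation <- factor(rbinom(125, 1, 0.08), levels = c(0, 1))

tab <- table(cluster, mutation)

# Expected counts under independence; small values are what trigger the warning.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(round(expected, 2))

# Alternative 1: Fisher's exact test on the same table.
fisher_p <- fisher.test(tab)$p.value

# Alternative 2: chi-squared statistic with a simulated null distribution.
mc_p <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)$p.value

c(fisher = fisher_p, monte_carlo = mc_p)
```

Either p-value could be reported alongside the asymptotic one for the BRAF and KRAS tests to check whether the conclusion depends on the approximation.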

 

K = 3:

> kruskal.test(tumorMeta$Age,consClass3)
        Kruskal-Wallis rank sum test
data:  tumorMeta$Age and consClass3
Kruskal-Wallis chi-squared = 2.4866, df = 2, p-value = 0.2884
> chisq.test(tumorMeta$Gender,consClass3)
        Pearson's Chi-squared test
data:  tumorMeta$Gender and consClass3
X-squared = 5.9141, df = 2, p-value = 0.05197

So at K = 3 the clusters are not associated with age (p = 0.29) and only marginally associated with gender (p = 0.052).

Test for association with mutation status:

> k<-resultsK[[3]][["consensusClass"]]
> chisq.test(k,tumorMeta$BRAF_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$BRAF_mutation 
X-squared = 50.5952, df = 2, p-value = 1.031e-11
Warning message:
In chisq.test(k, tumorMeta$BRAF_mutation) :
  Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$KRAS_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$KRAS_mutation 
X-squared = 6.7096, df = 2, p-value = 0.03492
> chisq.test(k,tumorMeta$TP53_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$TP53_mutation 
X-squared = 25.1538, df = 2, p-value = 3.451e-06
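
Because three mutations are tested against the same cluster assignment, it may be worth adjusting the p-values for multiple testing. A small sketch using the K = 3 p-values reported above with base R's `p.adjust`. The choice of Holm's method is an assumption, not something the original analysis specifies:

```r
# p-values copied from the K = 3 chi-squared tests above.
p_raw <- c(BRAF = 1.031e-11, KRAS = 0.03492, TP53 = 3.451e-06)

# Holm's step-down adjustment controls the family-wise error rate.
p_holm <- p.adjust(p_raw, method = "holm")
p_holm
```

Under Holm's method the KRAS p-value stays at 0.03492, so all three associations remain below 0.05 after adjustment.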

 

Gender and age removed, ConsensusClusterPlus, K-means, Pearson correlation:

Cluster consensus:

> icl[["clusterConsensus"]]
      k cluster clusterConsensus
 [1,] 2       1        0.9008927
 [2,] 2       2        0.9411635
 [3,] 3       1        0.8680856
 [4,] 3       2        0.8760303
 [5,] 3       3        0.7333630
 [6,] 4       1        0.7745014
 [7,] 4       2        0.7901681
 [8,] 4       3        0.8024310
 [9,] 4       4        0.8247401
[10,] 5       1        0.9153414
[11,] 5       2        0.8141924
[12,] 5       3        0.6573712
[13,] 5       4        0.6708655
[14,] 5       5        0.6319631
[15,] 6       1        0.8695192
[16,] 6       2        0.8568005
[17,] 6       3        0.7128842
[18,] 6       4        0.7407073
[19,] 6       5        0.5585342
[20,] 6       6        0.7426816

Test for association of clusters with mutation status. K = 2.

# w is the data frame with clinical information for the 100 tumor patients
 
> chisq.test(k,w$BRAF_mutation)
        Pearson's Chi-squared test with Yates' continuity correction
data:  k and w$BRAF_mutation 
X-squared = 15.2964, df = 1, p-value = 9.189e-05
> chisq.test(k,w$KRAS_mutation)
        Pearson's Chi-squared test with Yates' continuity correction
data:  k and w$KRAS_mutation 
X-squared = 5.0882, df = 1, p-value = 0.02409
> chisq.test(k,w$TP53_mutation)
        Pearson's Chi-squared test with Yates' continuity correction
data:  k and w$TP53_mutation 
X-squared = 0.12, df = 1, p-value = 0.729