
Probes: 5060 probes in the series matrix are masked as "null". Supposedly these are the probes that have SNPs at the CpG site or within 5 bp of it, that overlap repetitive regions, that are not uniquely aligned over the 20 nt at the 3' end of the probe, that overlap regions of insertions and deletions in the human genome, or that have a P value >= 0.05. Apparently this does not include probes from the X and Y chromosomes, because those probes were removed later, as indicated in the Sweave file. This wasn't clear right away. When I removed the "null" probes and the XY probes I ended up with the same number (?) as indicated in the Sweave document. Outline:


Hinoue data: 125 tumor patients, batch removed, M values, with the 5060 "null" probes identified from the series matrix removed. Take the 10% most variable probes. Use pvclust to assess cluster stability. Parameters: euclidean distance, hierarchical clustering, complete linkage, 10000 bootstraps.
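The probe selection and clustering setup described above could look roughly like the sketch below. It runs on simulated data: `mvals` is a hypothetical probes x samples matrix standing in for the batch-corrected M values, and `nboot` is kept small so the example finishes quickly. The pvclust step is only run if the package is installed.

```r
# Minimal sketch on simulated data; `mvals` is a hypothetical stand-in
# for the batch-corrected M-value matrix (probes in rows, tumors in columns).
set.seed(1)
mvals <- matrix(rnorm(2000 * 20), nrow = 2000,
                dimnames = list(paste0("probe", 1:2000), paste0("tumor", 1:20)))

# Keep the 10% of probes with the highest variance across samples.
probe_var  <- apply(mvals, 1, var)
n_keep     <- ceiling(0.10 * nrow(mvals))
top_probes <- names(sort(probe_var, decreasing = TRUE))[seq_len(n_keep)]
mvals_top  <- mvals[top_probes, , drop = FALSE]

# Plain hierarchical clustering of the samples:
# euclidean distance, complete linkage.
hc <- hclust(dist(t(mvals_top), method = "euclidean"), method = "complete")

# Bootstrap support with pvclust, if available. pvclust clusters the
# columns of its input, so the matrix is passed as probes x samples.
if (requireNamespace("pvclust", quietly = TRUE)) {
  pv <- pvclust::pvclust(mvals_top, method.hclust = "complete",
                         method.dist = "euclidean",
                         nboot = 100)  # the notes above use 10000
  # AU values are stored on a 0-1 scale; the dendrogram plot shows them as percentages.
  print(round(range(pv$edges$au, na.rm = TRUE), 2))
}
```

On random data like this the AU values should be low; the point is only to show where the distance, linkage and bootstrap parameters go.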


The best result I can get is AU values of 69 to 83 for cluster stability, even after 10,000 bootstraps. Does this mean that the clusters are weak and they just accepted them as is?

ConsensusClusterPlus with the same parameters as pvclust above

Will using a different package make any difference? Use the ConsensusClusterPlus package with the same parameters as pvclust: hierarchical clustering, evaluate up to 20 clusters, use 80% of the data in each resampling run. They claimed that they identified 4 clusters.
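A minimal sketch of such a run, assuming the Bioconductor package ConsensusClusterPlus is installed (the call is skipped otherwise). The matrix `d` is simulated, `maxK` and `reps` are reduced to keep the example fast, and the seed and output directory are arbitrary choices, not the values used in the paper.

```r
# Hypothetical stand-in for the filtered M-value matrix (probes x samples).
set.seed(1)
d <- matrix(rnorm(200 * 30), nrow = 200,
            dimnames = list(paste0("probe", 1:200), paste0("tumor", 1:30)))

# Plots are written to this directory instead of the screen.
out_dir <- file.path(tempdir(), "ccp_sketch")

if (requireNamespace("ConsensusClusterPlus", quietly = TRUE)) {
  res <- ConsensusClusterPlus::ConsensusClusterPlus(
    d,
    maxK = 6,                 # the notes above evaluate up to 20
    reps = 50,                # number of resampling runs (arbitrary here)
    pItem = 0.8,              # 80% of samples per run
    pFeature = 1,             # all probes per run
    clusterAlg = "hc",
    distance = "euclidean",
    innerLinkage = "complete",
    finalLinkage = "complete",
    seed = 1,                 # arbitrary seed for this sketch
    plot = "pdf",
    title = out_dir)

  # Cluster and item consensus; writes icl.pdf into the same directory.
  icl <- ConsensusClusterPlus::calcICL(res, title = out_dir, plot = "pdf")

  # Class assignment for K = 4.
  consClass4 <- res[[4]][["consensusClass"]]
  print(table(consClass4))
}
```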


  1. Carefully identify the probes and remove them from the raw data (5060 + ~800).
  2. Create two datasets: batch-only normalized (125 tumor patients) and gender- and age-normalized (no need to remove batch since these all came from a single batch; 100 patients).
  3. Repeat ConsensusClusterPlus with the most variable probes (HC, complete linkage, euclidean distance). Use 10% of the original probe number as described in the Sweave document. They followed the vignette directly.
  4. Repeat ConsensusClusterPlus using K-means, K=2:6, Pearson correlation. Use 10% of the original probe number as described in the Sweave document. Batch only normalized, HC, euclidean distance.
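Step 1 of the list above (dropping the masked probes and the XY probes) could be sketched as follows. Everything here is a toy stand-in: `series` mimics a series matrix that was read with the "null" entries converted to NA, and `probe_chr` mimics the chromosome column of the array annotation.

```r
# Hypothetical stand-ins for the series matrix and the probe annotation.
set.seed(1)
series <- matrix(runif(12 * 4), nrow = 12,
                 dimnames = list(paste0("cg", sprintf("%02d", 1:12)),
                                 paste0("tumor", 1:4)))
series[c(2, 7), ] <- NA   # probes masked as "null" in every sample
probe_chr <- setNames(c(rep("1", 8), "X", "X", "Y", "2"), rownames(series))

# Flag probes that are masked in all samples and probes on X or Y.
is_null <- rowSums(is.na(series)) == ncol(series)
is_xy   <- probe_chr[rownames(series)] %in% c("X", "Y")

# Keep only probes that fall in neither group.
filtered <- series[!(is_null | is_xy), , drop = FALSE]

cat("null:", sum(is_null), " XY:", sum(is_xy), " kept:", nrow(filtered), "\n")
```

Counting the two groups separately makes it easier to check the totals against the numbers reported in the Sweave document.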



ConsensusClusterPlus, XY probes removed, hierarchical/euclidean

Batch removed

Maybe there are 5 or 6 clusters, but not 4. It definitely doesn't look like the clusters identified in the paper.

Gender and age adjusted, HC, euclidean distance. 

This looks significantly worse. 

Final attempt:

ConsensusClusterPlus, K-means clustering, Pearson correlation and the seed value provided in the package.

Batch removed (125 patients):

> icl[["clusterConsensus"]]
      k cluster clusterConsensus
 [1,] 2       1        0.8790997
 [2,] 2       2        0.8825873
 [3,] 3       1        0.8900365
 [4,] 3       2        0.8615545
 [5,] 3       3        0.8763479
 [6,] 4       1        0.7145161
 [7,] 4       2        0.8060535
 [8,] 4       3        0.9781181
 [9,] 4       4        0.9889430
[10,] 5       1        0.8289404
[11,] 5       2        0.7577152
[12,] 5       3        0.8221909
[13,] 5       4        0.7363796
[14,] 5       5        0.9454712
[15,] 6       1        0.8789223
[16,] 6       2        0.7593188
[17,] 6       3        0.7090342
[18,] 6       4        0.7150963
[19,] 6       5        0.9857516
[20,] 6       6        0.9189523
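To compare the values of K more easily, the per-cluster consensus values can be collapsed to a mean and a minimum per K. The sketch below retypes the K = 2 to 4 rows from the output above into a data frame named `cc` (a name chosen for this example); with the real object, `as.data.frame(icl[["clusterConsensus"]])` should give the same layout.

```r
# Values copied from the clusterConsensus output above (k = 2 to 4 only).
cc <- data.frame(
  k       = c(2, 2, 3, 3, 3, 4, 4, 4, 4),
  cluster = c(1, 2, 1, 2, 3, 1, 2, 3, 4),
  clusterConsensus = c(0.8790997, 0.8825873,
                       0.8900365, 0.8615545, 0.8763479,
                       0.7145161, 0.8060535, 0.9781181, 0.9889430))

# Mean and weakest cluster consensus for each k.
summary_by_k <- data.frame(
  k              = sort(unique(cc$k)),
  mean_consensus = as.vector(tapply(cc$clusterConsensus, cc$k, mean)),
  min_consensus  = as.vector(tapply(cc$clusterConsensus, cc$k, min)))

print(summary_by_k)
```

For these rows the mean consensus is close to 0.87 to 0.88 for all three values of K, but the weakest cluster drops to about 0.71 at K = 4, so the minimum is the more informative column here.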

I tried to correlate the clusters with age and gender (summary table of associations with clinical variables for K = 2, 3 and 4). It looks like the clusters don't correlate with age at all but have some correlation with gender.

K = 4:

> kruskal.test(tumorMeta$Age,consClass4)
        Kruskal-Wallis rank sum test
data:  tumorMeta$Age and consClass4
Kruskal-Wallis chi-squared = 4.9015, df = 3, p-value = 0.1792
> chisq.test(tumorMeta$Gender,consClass4)
        Pearson's Chi-squared test
data:  tumorMeta$Gender and consClass4
X-squared = 14.7676, df = 3, p-value = 0.002026

Age distribution among clusters: [figure not included]

Test for association with mutation status:

> chisq.test(k,tumorMeta$BRAF_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$BRAF_mutation 
X-squared = 95.1974, df = 3, p-value < 2.2e-16
Warning message:
In chisq.test(k, tumorMeta$BRAF_mutation) :
  Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$KRAS_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$KRAS_mutation 
X-squared = 26.6428, df = 3, p-value = 6.995e-06
Warning message:
In chisq.test(k, tumorMeta$KRAS_mutation) :
  Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$TP53_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$TP53_mutation 
X-squared = 12.1586, df = 3, p-value = 0.006859
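The "Chi-squared approximation may be incorrect" warning usually means that some expected cell counts are small, which is plausible when a mutation is rare and one of the clusters is small. One possible cross-check, sketched below on simulated data, is to look at the expected counts and to compare against Fisher's exact test or a Monte Carlo p-value. The vectors `cluster` and `mutation` are made up for the example and do not reproduce the real tumor data.

```r
# Simulated cluster labels and a rare mutation enriched in cluster 4.
set.seed(1)
cluster  <- factor(sample(1:4, 125, replace = TRUE,
                          prob = c(0.4, 0.3, 0.2, 0.1)))
mutation <- factor(ifelse(cluster == "4",
                          rbinom(125, 1, 0.7),
                          rbinom(125, 1, 0.05)),
                   levels = c(0, 1), labels = c("wt", "mut"))
tab <- table(cluster, mutation)

# Expected counts below about 5 are what triggers the warning.
expected <- suppressWarnings(chisq.test(tab))$expected
print(round(min(expected), 2))

# Two alternatives that do not rely on the large-sample approximation.
fisher_p <- fisher.test(tab)$p.value
mc_p     <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)$p.value

print(c(fisher = fisher_p, monte_carlo = mc_p))
```

If the exact and simulated p-values agree with the asymptotic one in order of magnitude, the warning is probably harmless for the conclusion.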

 

K = 3:

> kruskal.test(tumorMeta$Age,consClass3)
        Kruskal-Wallis rank sum test
data:  tumorMeta$Age and consClass3
Kruskal-Wallis chi-squared = 2.4866, df = 2, p-value = 0.2884
> chisq.test(tumorMeta$Gender,consClass3)
        Pearson's Chi-squared test
data:  tumorMeta$Gender and consClass3
X-squared = 5.9141, df = 2, p-value = 0.05197

So at K = 3 the clusters show no significant association with age (p = 0.29) and only a borderline association with gender (p = 0.052).

Test for association with mutation status:

> k<-resultsK[[3]][["consensusClass"]]
> chisq.test(k,tumorMeta$BRAF_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$BRAF_mutation 
X-squared = 50.5952, df = 2, p-value = 1.031e-11
Warning message:
In chisq.test(k, tumorMeta$BRAF_mutation) :
  Chi-squared approximation may be incorrect
> chisq.test(k,tumorMeta$KRAS_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$KRAS_mutation 
X-squared = 6.7096, df = 2, p-value = 0.03492
> chisq.test(k,tumorMeta$TP53_mutation)
        Pearson's Chi-squared test
data:  k and tumorMeta$TP53_mutation 
X-squared = 25.1538, df = 2, p-value = 3.451e-06

 

ConsensusClusterPlus, K-means, Pearson correlation, K = 2,3,4,5,6 (only batch is removed)

K               2          3          4          5          6
Age             0.78       0.88       0.7        0.6345     0.444
Gender          0.12       0.05       2.02e-03   5.35e-03   8.097155e-03
Rectal/colon    0.0016     4.74e-04   4.44e-03   7.41e-03   2.2e-03
Tumor stage     0.36       0.43       0.23       0.3        0.58
BRAF            1.33e-05   1.03e-11   1.67e-20   1.01e-19   5.15e-19
KRAS            3.87e-03   0.04       6.7e-06    1.22e-04   5.97e-06
KRAS type       1.97e-02   0.24       7.14e-03   1.59e-02   2.32e-03
TP53            0.96       3.45e-06   6.86e-03   1.75e-03   1.26e-03
MLH1 mutation   1.26e-04   5.11e-10   1.45e-20   8.96e-20   4.12e-19
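A table like the one above does not have to be assembled by hand. The sketch below loops over K and collects the p-values into one matrix. The data frame `meta`, the list `classes` and the helper `assoc_p` are all invented for this example; with a real ConsensusClusterPlus result the class vectors would come from `resultsK[[k]][["consensusClass"]]`.

```r
# Simulated clinical data and random class labels for K = 2 to 6.
set.seed(1)
n <- 125
meta <- data.frame(
  Age           = rnorm(n, 65, 10),
  Gender        = sample(c("F", "M"), n, replace = TRUE),
  BRAF_mutation = sample(c(0, 1), n, replace = TRUE, prob = c(0.85, 0.15)))
classes <- lapply(2:6, function(k) sample(seq_len(k), n, replace = TRUE))
names(classes) <- 2:6

# p-values for one clustering: Kruskal-Wallis for the continuous variable,
# Pearson chi-squared for the categorical ones.
assoc_p <- function(cl) {
  c(Age    = kruskal.test(meta$Age, factor(cl))$p.value,
    Gender = suppressWarnings(chisq.test(table(cl, meta$Gender)))$p.value,
    BRAF   = suppressWarnings(chisq.test(table(cl, meta$BRAF_mutation)))$p.value)
}

# One column per K, one row per clinical variable.
p_table <- sapply(classes, assoc_p)
print(signif(p_table, 3))
```

Printing with `signif()` also keeps the number of digits consistent across cells.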

 

ConsensusClusterPlus, K means, Pearson correlation


Age/gender removed, 100 patients

Cluster consensus:

> icl[["clusterConsensus"]]
      k cluster clusterConsensus
 [1,] 2       1        0.9008927
 [2,] 2       2        0.9411635
 [3,] 3       1        0.8680856
 [4,] 3       2        0.8760303
 [5,] 3       3        0.7333630
 [6,] 4       1        0.7745014
 [7,] 4       2        0.7901681
 [8,] 4       3        0.8024310
 [9,] 4       4        0.8247401
[10,] 5       1        0.9153414
[11,] 5       2        0.8141924
[12,] 5       3        0.6573712
[13,] 5       4        0.6708655
[14,] 5       5        0.6319631
[15,] 6       1        0.8695192
[16,] 6       2        0.8568005
[17,] 6       3        0.7128842
[18,] 6       4        0.7407073
[19,] 6       5        0.5585342
[20,] 6       6        0.7426816

Test for association of clusters with mutation status. K = 2.

#w is the data frame with clinical information for 100 tumor patients
 
> chisq.test(k,w$BRAF_mutation)
        Pearson's Chi-squared test with Yates' continuity correction
data:  k and w$BRAF_mutation 
X-squared = 15.2964, df = 1, p-value = 9.189e-05
> chisq.test(k,w$KRAS_mutation)
        Pearson's Chi-squared test with Yates' continuity correction
data:  k and w$KRAS_mutation 
X-squared = 5.0882, df = 1, p-value = 0.02409
> chisq.test(k,w$TP53_mutation)
        Pearson's Chi-squared test with Yates' continuity correction
data:  k and w$TP53_mutation 
X-squared = 0.12, df = 1, p-value = 0.729

 

 
K               2          3              4          5          6
Tumor stage     0.36       0.44           0.66       0.83       0.76
Rectal/colon    8.69e-03   1.68e-02       6.12e-02   0.15       0.13
BRAF            9.18e-05   1.46e-12       1.81e-12   1.16e-13   1.63e-10
KRAS            2.41e-02   3.36e-02       5.51e-02   2.53e-04   3.32e-03
KRAS type       6.98e-02   3.705241e-01   0.55       9.23e-03   0.46
TP53            0.72       5.90e-03       4.15e-03   1.40e-02   2.14e-02
MLH1 methyl.    9.19e-05   1.46e-12       2.01e-10   1.03e-13   1.63e-10