Will using a different package make any difference? I used the ConsensusClusterPlus package with the same parameters as for pvclust: hierarchical clustering, up to 20 clusters evaluated, and 80% of the data used for bootstrapping. The paper claims that 4 clusters were identified.

Maybe there are 5 or 6 clusters, but not 4.

[4 images]


There seems to be some separation, but the cluster sizes are very uneven, nothing like what was presented in the paper.

[Image]

After receiving the Sweave file, I found that the 5060 "null" probes do not include probes from the X and Y chromosomes. I then took the most variable probes that I had used for clustering and found that 25% of them are from the X or Y chromosome.
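The check described above can be sketched roughly as follows. This is only an illustration on synthetic data: the matrix `expr`, the annotation table `probeAnno` with its `chr` column, the use of MAD as the variability measure, and the 10% cutoff are all assumptions, not the objects or settings from the Sweave file.

```r
# Minimal sketch on synthetic data; all object names are hypothetical.
set.seed(1)
nProbes <- 1000
nSamples <- 20
expr <- matrix(rnorm(nProbes * nSamples), nrow = nProbes,
               dimnames = list(paste0("probe", seq_len(nProbes)), NULL))

# Hypothetical annotation: one chromosome label per probe.
probeAnno <- data.frame(
  probe = rownames(expr),
  chr = sample(c(as.character(1:22), "X", "Y"), nProbes, replace = TRUE),
  stringsAsFactors = FALSE
)

# Rank probes by variability (MAD is an assumption) and keep the top 10%.
probeMad <- apply(expr, 1, mad)
nTop <- ceiling(0.10 * nProbes)
topProbes <- names(sort(probeMad, decreasing = TRUE))[seq_len(nTop)]

# Fraction of the most variable probes that sit on X or Y.
isSex <- probeAnno$chr[match(topProbes, probeAnno$probe)] %in% c("X", "Y")
fracSex <- mean(isSex)

# Drop sex-chromosome probes before clustering.
autosomal <- probeAnno$probe[!probeAnno$chr %in% c("X", "Y")]
exprAuto <- expr[autosomal, , drop = FALSE]
```

With real data, `fracSex` would be the number to compare against the 25% noted above.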

Next: 
  1. Carefully identify the probes and remove them from the raw data (5060 + ~800).
  2. Create two datasets: batch-only normalized (125 tumor patients) and gender- and age-normalized (no need to remove batch since they all came from a single batch; 100 patients).
  3. Repeat ConsensusClusterPlus with the most variable probes (HC, complete linkage, euclidean distance). Use 10% of the original probe number as described in the Sweave document. They followed the vignette directly. 
  4. Repeat ConsensusClusterPlus using K-means, K=2:6, Pearson correlation. Use 10% of the original probe number as described in the Sweave document.
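Steps 3 and 4 might look roughly like the sketch below. It runs on synthetic data; `reps = 50`, `seed = 1`, and the use of MAD to pick the most variable probes are placeholder assumptions rather than the values from the Sweave document. The call is guarded so that the rest of the code still runs when the Bioconductor package is not installed.

```r
# Sketch only: synthetic data stand in for the normalized tumor matrix.
set.seed(1)
expr <- matrix(rnorm(2000 * 30), nrow = 2000,
               dimnames = list(paste0("probe", 1:2000), paste0("s", 1:30)))

# Keep the top 10% most variable probes (MAD is an assumption here).
nTop <- ceiling(0.10 * nrow(expr))
keep <- order(apply(expr, 1, mad), decreasing = TRUE)[seq_len(nTop)]
d <- expr[keep, ]

# Median-center each probe, as the package vignette does.
d <- sweep(d, 1, apply(d, 1, median))

resHC <- NULL
if (requireNamespace("ConsensusClusterPlus", quietly = TRUE)) {
  # HC, complete linkage, euclidean distance, K = 2..6.
  # plot = NULL draws to the current graphics device.
  resHC <- ConsensusClusterPlus::ConsensusClusterPlus(
    d, maxK = 6, reps = 50, pItem = 0.8, pFeature = 1,
    clusterAlg = "hc", innerLinkage = "complete", finalLinkage = "complete",
    distance = "euclidean", seed = 1, plot = NULL)

  # Cluster assignment per sample for K = 4.
  consClass4 <- resHC[[4]]$consensusClass
  print(table(consClass4))
}
```

For step 4, the same call with `clusterAlg = "km"` would be the starting point. It is worth checking in the installed version's documentation which distances the `"km"` option actually supports on a data matrix before relying on the Pearson setting.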


Batch-only normalized, HC, euclidean distance

[5 images]

Maybe there are 5 or 6 clusters, but not 4. It definitely doesn't look like the clusters identified in the paper.

Gender and age adjusted, HC, euclidean distance

[5 images]

This looks significantly worse. 

 

Final attempt: K-means clustering, Pearson correlation, and the seed value provided in the package. Batch removed:

[4 images]

I tested whether the clusters (K=3 and K=4) are associated with age and gender. The clusters do not appear to be associated with age at all, but they show some association with gender (significant for K=4, p = 0.002; borderline for K=3, p = 0.052).

K = 4:

> kruskal.test(tumorMeta$Age,consClass4)
        Kruskal-Wallis rank sum test
data:  tumorMeta$Age and consClass4
Kruskal-Wallis chi-squared = 4.9015, df = 3, p-value = 0.1792
> chisq.test(tumorMeta$Gender,consClass4)
        Pearson's Chi-squared test
data:  tumorMeta$Gender and consClass4
X-squared = 14.7676, df = 3, p-value = 0.002026

Age distribution among clusters:

[Image]

 

K = 3:

> kruskal.test(tumorMeta$Age,consClass3)
        Kruskal-Wallis rank sum test
data:  tumorMeta$Age and consClass3
Kruskal-Wallis chi-squared = 2.4866, df = 2, p-value = 0.2884
> chisq.test(tumorMeta$Gender,consClass3)
        Pearson's Chi-squared test
data:  tumorMeta$Gender and consClass3
X-squared = 5.9141, df = 2, p-value = 0.05197
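
For reference, the association tests above can be reproduced in outline as follows. The data here are synthetic stand-ins (`tumorMeta` and `consClass` are made up for the example), so the p-values will not match the ones reported; the Fisher's exact test is an addition that may be worth running when some gender-by-cluster cells are small.

```r
# Sketch on synthetic data; tumorMeta and consClass are made-up stand-ins.
set.seed(1)
n <- 100
tumorMeta <- data.frame(
  Age = rnorm(n, mean = 50, sd = 10),                  # arbitrary synthetic ages
  Gender = factor(sample(c("F", "M"), n, replace = TRUE))
)
consClass <- factor(sample(1:4, n, replace = TRUE))    # stand-in cluster labels

# Age is continuous: compare its distribution across clusters.
ageTest <- kruskal.test(tumorMeta$Age, consClass)

# Gender is categorical: test independence in the gender-by-cluster table.
genderTab <- table(tumorMeta$Gender, consClass)
genderTest <- chisq.test(genderTab)

# Fisher's exact test as a cross-check for small expected counts.
genderFisher <- fisher.test(genderTab)

# Age distribution per cluster.
boxplot(tumorMeta$Age ~ consClass, xlab = "Cluster", ylab = "Age")
```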