...
Will using a different package make any difference? Use the ConsensusClusterPlus package with the same parameters as pvclust: hierarchical clustering, evaluate up to 20 clusters, use 80% of the data for bootstrapping. They claimed that they identified 4 clusters.
There seems to be some separation, but the cluster sizes are very uneven, nothing like what was presented in the paper.
After receiving the Sweave file, I found that the 5060 "null" probes don't include probes from the X and Y chromosomes. I then took the most variable probes I had used for clustering and found that 25% of them are from the X or Y chromosomes.
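The sex-chromosome check described above can be sketched in base R as follows. This is only an illustration on toy data: the `annot` data frame, its `probe` and `chr` columns, the function name `sex_chr_fraction`, and the use of MAD as the variability measure are all assumptions, not taken from the Sweave file.

```r
# Toy data (hypothetical): 1000 probes x 20 samples, each probe assigned a chromosome.
set.seed(1)
n_probes  <- 1000
n_samples <- 20
annot <- data.frame(
  probe = paste0("probe", seq_len(n_probes)),
  chr   = sample(c(1:22, "X", "Y"), n_probes, replace = TRUE),
  stringsAsFactors = FALSE
)
expr <- matrix(
  rnorm(n_probes * n_samples), nrow = n_probes,
  dimnames = list(annot$probe, paste0("s", seq_len(n_samples)))
)

# Fraction of the top `frac` most variable probes that sit on chrX or chrY.
# Variability is measured by MAD here (an assumption; variance would also work).
sex_chr_fraction <- function(expr, annot, frac = 0.10) {
  stopifnot(identical(rownames(expr), annot$probe))
  variability <- apply(expr, 1, mad)
  n_top <- ceiling(frac * nrow(expr))
  top <- names(sort(variability, decreasing = TRUE))[seq_len(n_top)]
  mean(annot$chr[match(top, annot$probe)] %in% c("X", "Y"))
}

frac_xy <- sex_chr_fraction(expr, annot)
```

On real data, a value of `frac_xy` far above the overall share of X/Y probes on the array would suggest that gender is driving a large part of the "most variable" probe set.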
- Carefully identify the probes and remove them from the raw data (5060 + ~800).
- Create two datasets: batch-only normalized (125 tumor patients) and gender- and age-normalized (no need to remove batch since they all came from a single batch; 100 patients).
- Repeat ConsensusClusterPlus with the most variable probes (HC, complete linkage, euclidean distance). Use 10% of the original probe number as described in the Sweave document. They followed the vignette directly.
- Repeat ConsensusClusterPlus using K-means, K=2:6, Pearson correlation. Use 10% of the original probe number as described in the Sweave document.
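The plan above could look roughly like the sketch below. It assumes an expression matrix with probes in rows and samples in columns, with the "null" and X/Y probes already removed. The helper names `select_top_variable` and `run_consensus_hc`, the `reps` and `seed` defaults, and the output directory name are hypothetical; ranking by MAD follows the ConsensusClusterPlus vignette rather than anything stated in the Sweave document.

```r
# Keep the top `frac` most variable probes, ranked by MAD (as in the package vignette).
select_top_variable <- function(expr, frac = 0.10) {
  variability <- apply(expr, 1, mad)
  n_top <- ceiling(frac * nrow(expr))
  expr[order(variability, decreasing = TRUE)[seq_len(n_top)], , drop = FALSE]
}

# Consensus clustering with hierarchical clustering, complete linkage and
# Euclidean distance, resampling 80% of the samples in each iteration.
# `reps` and `seed` are assumed values, not taken from the paper.
run_consensus_hc <- function(expr, max_k = 6, reps = 100, seed = 1) {
  ConsensusClusterPlus::ConsensusClusterPlus(
    d = expr, maxK = max_k, reps = reps,
    pItem = 0.8, pFeature = 1,
    clusterAlg = "hc", distance = "euclidean",
    innerLinkage = "complete", finalLinkage = "complete",
    seed = seed, plot = "pdf", title = "consensus_hc"
  )
}
```

A typical call would be `results <- run_consensus_hc(select_top_variable(expr))`, after which `results[[4]]$consensusClass` gives the cluster assignment for K = 4. For the K-means run, `clusterAlg = "km"` would be swapped in. It is worth checking how the installed version handles `distance = "pearson"` in that case: as far as I know, recent versions only support Euclidean distance for `"km"` on a data matrix and switch to it with a message.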
Batch only normalized, HC, euclidean distance.
Maybe there are 5 or 6 clusters, but not 4. It definitely doesn't look like the clusters identified in the paper.
Gender and age adjusted, HC, euclidean distance.
This looks significantly worse.
Final attempt: K-means clustering, Pearson correlation and the seed value provided in the package. Batch removed:
I tried to correlate the clusters (K=3 and K=4) with age and gender. It looks like the clusters don't correlate with age at all but have some correlation with gender.
K = 4:
```
> kruskal.test(tumorMeta$Age,consClass4)

	Kruskal-Wallis rank sum test

data:  tumorMeta$Age and consClass4
Kruskal-Wallis chi-squared = 4.9015, df = 3, p-value = 0.1792

> chisq.test(tumorMeta$Gender,consClass4)

	Pearson's Chi-squared test

data:  tumorMeta$Gender and consClass4
X-squared = 14.7676, df = 3, p-value = 0.002026
```
Age distribution among clusters:
K = 3:
```
> kruskal.test(tumorMeta$Age,consClass3)

	Kruskal-Wallis rank sum test

data:  tumorMeta$Age and consClass3
Kruskal-Wallis chi-squared = 2.4866, df = 2, p-value = 0.2884

> chisq.test(tumorMeta$Gender,consClass3)

	Pearson's Chi-squared test

data:  tumorMeta$Gender and consClass3
X-squared = 5.9141, df = 2, p-value = 0.05197
```