...
M value: analysis of red probes, identification of adjustment variables
Percent variance explained by each eigengene of the "red" dataset:
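The percent variance explained by an eigengene is its squared singular value divided by the sum of all squared singular values. Below is a minimal R sketch of that calculation. It assumes that `mredsvd` is the SVD of the row-centred M-value matrix (probes in rows, samples in columns). The random `red` matrix is a hypothetical stand-in, not the real data.

```r
# Hypothetical stand-in for the "red" M-value matrix:
# 200 probes (rows) x 30 samples (columns).
set.seed(1)
red <- matrix(rnorm(200 * 30), nrow = 200, ncol = 30)

# Assumption: eigengenes come from the SVD of the row-centred matrix,
# so each column of mredsvd$v is one eigengene across samples.
mredsvd <- svd(red - rowMeans(red))

# Percent variance explained by eigengene i: 100 * d_i^2 / sum_j d_j^2
pve <- 100 * mredsvd$d^2 / sum(mredsvd$d^2)
round(head(pve, 4), 1)
```

Because `svd()` returns the singular values in decreasing order, `pve` is non-increasing. The values always sum to 100.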
Potential adjustment variables: batch (sixth field in the patient ID barcode); center (second field in the patient ID barcode); day, month and year of shipment; concentration; amount; plate row; plate column. Potential biological variables: tumor stage, tumor grade, age.
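Batch and center can both be read straight from the sample barcodes by splitting on the hyphens. This sketch assumes hyphen-delimited, TCGA-style barcodes. The two IDs below are made up for illustration.

```r
# Made-up, TCGA-style barcodes (illustration only).
ids <- c("TCGA-AA-0001-01A-01D-P001-05",
         "TCGA-BB-0002-01A-01D-P002-05")

# One row per sample, one column per hyphen-separated field.
fields <- do.call(rbind, strsplit(ids, "-", fixed = TRUE))

center <- factor(fields[, 2])  # second field: center
batch  <- factor(fields[, 6])  # sixth field: batch
```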
Association of the first four eigengenes of the red probes with batch (Kruskal-Wallis rank sum test; output shown for eigengenes 2 to 4):
```r
kruskal.test(mredsvd$v[, 2], as.factor(batch))
#>  Kruskal-Wallis rank sum test
#> data: mredsvd$v[, 2] and as.factor(batch)
#> Kruskal-Wallis chi-squared = 10.8903, df = 12, p-value = 0.5383

kruskal.test(mredsvd$v[, 3], as.factor(batch))
#>  Kruskal-Wallis rank sum test
#> data: mredsvd$v[, 3] and as.factor(batch)
#> Kruskal-Wallis chi-squared = 21.7447, df = 12, p-value = 0.04048

kruskal.test(mredsvd$v[, 4], as.factor(batch))
#>  Kruskal-Wallis rank sum test
#> data: mredsvd$v[, 4] and as.factor(batch)
#> Kruskal-Wallis chi-squared = 35.8388, df = 12, p-value = 0.0003439
```
Association of the first four eigengenes with the adjustment and biological variables before batch removal. Values are p-values: Kruskal-Wallis test for the categorical variables, Spearman correlation for age. For comparison, after the batch effect is removed, the p-value for the association of the first eigengene with batch rises to 0.7748.
Eigengene | Batch | Center | Day | Month | Year | Amount | Concentration | Plate row | Plate column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | < 2.2e-16 | < 2.2e-16 | < 2.2e-16 | < 2.2e-16 | < 2.2e-16 | < 2.2e-16 | 0.0004486 | 0.0588 | 0.0688 | 0.32 | 0.27 | 0.1071 |
2 | 0.5383 | 0.3486 | 0.5577 | 0.9876 | 0.2710 | 0.04873 | 0.6482 | 0.286 | 0.479 | 0.31 | 0.10 | 0.006634 |
3 | 0.04048 | 0.05258 | 0.03756 | 0.01480 | 0.1233 | 0.1786 | 0.5335 | 0.556 | 0.253 | 0.50 | 0.35 | 0.5131 |
4 | 0.0003439 | 0.01709 | 0.0008948 | 0.0001387 | 0.5725 | 0.7225 | 0.5267 | 0.0517 | 0.140 | 0.43 | 0.43 | 0.02168 |
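A table like the one above can be built with one loop over eigengenes and covariates. This is a sketch, not the code that produced these numbers. `assoc_pvalues` is a hypothetical helper, and the simulated `v` and `covars` stand in for `mredsvd$v` and the real sample annotation. Any numeric column is tested with Spearman correlation. If amount or concentration should get the Kruskal-Wallis test instead, pass them as factors.

```r
# Hypothetical helper: p-values for the association of the first k
# eigengenes (columns of v) with each covariate (columns of covars).
assoc_pvalues <- function(v, covars, k = 4) {
  out <- sapply(covars, function(x) {
    sapply(seq_len(k), function(i) {
      if (is.numeric(x)) {
        # Numeric covariate (e.g. age): Spearman correlation test
        cor.test(v[, i], x, method = "spearman", exact = FALSE)$p.value
      } else {
        # Categorical covariate (e.g. batch): Kruskal-Wallis test
        kruskal.test(v[, i], as.factor(x))$p.value
      }
    })
  })
  rownames(out) <- paste0("PC", seq_len(k))
  out
}

# Simulated inputs (hypothetical), 60 samples.
set.seed(2)
n <- 60
v <- matrix(rnorm(n * 4), n, 4)  # stand-in for mredsvd$v[, 1:4]
covars <- data.frame(
  batch = factor(sample(paste0("b", 1:5), n, replace = TRUE)),
  age   = round(runif(n, 30, 80))
)
signif(assoc_pvalues(v, covars), 3)
```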
From my previous work with Brig, we found that the day, month and year of shipment, as well as the center, are highly correlated with batch. We therefore start by removing the batch effect: each probe is regressed on batch by ordinary least squares, and the residuals are kept. Percent variance explained after removing the batch effect:
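```r
# Design matrix: intercept plus one indicator column per non-reference batch
X <- model.matrix(~ factor(batch))
# Least-squares coefficients for all probes at once (one column per probe):
# Xbc = (X'X)^-1 X' t(red), where red is probes x samples
Xbc <- solve(t(X) %*% X) %*% t(X) %*% t(red)
# Subtract the fitted batch means; redB holds the residuals (probes x samples).
# The fit includes the intercept, so each probe is also centred.
redB <- red - t(X %*% Xbc)
```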
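Here is a quick sanity check of the closed-form regression above, run on simulated data: a hypothetical matrix of 100 probes by 40 samples, with a different shift for each of four batches. `redB` should match the residuals of a per-probe `lm()` fit on batch, and every probe should have mean zero within each batch.

```r
# Simulated data (hypothetical): 4 batches of 10 samples, each batch shifted.
set.seed(3)
batch <- rep(c("p1", "p2", "p3", "p4"), each = 10)
red <- matrix(rnorm(100 * 40), 100, 40) +
  matrix(rep(c(0, 1, -1, 2), each = 10), 100, 40, byrow = TRUE)

# Same least-squares batch removal as above
X <- model.matrix(~ factor(batch))
Xbc <- solve(t(X) %*% X) %*% t(X) %*% t(red)
redB <- red - t(X %*% Xbc)

# Equivalent: residuals of a multivariate lm (one response per probe)
redB_lm <- t(residuals(lm(t(red) ~ factor(batch))))
all.equal(redB, redB_lm, check.attributes = FALSE)

# Largest absolute within-batch probe mean after removal (should be ~0)
within_max <- max(abs(sapply(split(seq_along(batch), batch),
                             function(j) rowMeans(redB[, j, drop = FALSE]))))
within_max
```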
After batch removal, the first eigengene explains about 20% of the variance. Association of the first four eigengenes with the adjustment and biological variables after removing the batch effect (same tests as above):
Eigengene | Batch | Center | Day | Month | Year | Amount | Concentration | Plate row | Plate column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.7748 | 9.445e-05 | 0.68 | 0.82 | 0.80 | 0.54 | 0.11 | 1.3e-07 | 9.4e-05 | 0.541 | 0.093 | 0.6379 |
2 | 1 | 0.4722 | 1.00 | 1.00 | 0.98 | 0.78 | 0.76 | 0.26 | 0.45 | 0.18 | 0.11 | 0.01303 |
3 | 1 | 0.03917 | 1.00 | 1.00 | 0.94 | 0.98 | 0.45 | 0.59 | 0.15 | 0.60 | 0.24 | 0.06834 |
4 | 1 | 0.3475 | 1.000 | 1.000 | 0.998 | 0.969 | 0.428 | 0.055 | 0.104 | 0.56 | 0.73 | 0.1463 |
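Removing the batch effect did not fully remove the center effect (eigengene 1, p = 9.4e-05; eigengene 3, p = 0.039). It also left the plate row and column effects (eigengene 1, p = 1.3e-07 and 9.4e-05 respectively). The day, month and year of shipment, by contrast, no longer show any association (all p ≥ 0.68).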