Page Comparison

...

M value: analysis of red probes, identification of adjustment variables

Percent variance explained of the "red" dataset:
[Figure: percent variance explained, "red" dataset]
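For readers who want to reproduce a percent-variance plot like the one above, the following is a minimal sketch on simulated data. It assumes the probe matrix is probes x samples and that each probe is centered before the decomposition; neither the centering step nor the name `red_sim` comes from the original analysis.

```r
# Simulated stand-in for the "red" probe matrix (probes x samples); not the real data.
set.seed(1)
red_sim <- matrix(rnorm(200 * 12), nrow = 200, ncol = 12)

# Assumption: center each probe (row) so the SVD describes variation around probe means.
red_centered <- red_sim - rowMeans(red_sim)

# Columns of sv$v are the eigengenes (one value per sample).
sv <- svd(red_centered)

# Squared singular values are proportional to the variance captured by each component.
pct_var <- 100 * sv$d^2 / sum(sv$d^2)
round(pct_var[1:4], 1)
```

Calling `barplot(pct_var)` on the result gives a quick visual check of how quickly the components fall off.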
Potential adjustment variables: batch (sixth field in the patient ID barcode); center (second field in the patient ID barcode); day, month, year of shipment; concentration, amount, plate row, plate column. Potential biology: tumor stage, tumor grade, age.
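As an illustration of how batch and center might be pulled out of the barcode, here is a small sketch. The barcodes below are made up, and the hyphen delimiter is an assumption; only the field positions (second for center, sixth for batch) come from the text above.

```r
# Hypothetical hyphen-delimited barcodes, for illustration only.
barcodes <- c("XX-02-0001-01C-01D-0182-01",
              "XX-06-0125-01A-01D-0182-01",
              "XX-02-0033-01A-01D-0203-01")

# Split each barcode into its fields.
fields <- strsplit(barcodes, "-", fixed = TRUE)

# Second field: center; sixth field: batch.
center <- factor(sapply(fields, `[`, 2))
batch <- factor(sapply(fields, `[`, 6))

# Cross-tabulate to see how batches are distributed over centers.
table(batch, center)
```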

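Association of the first four eigengenes of the red probes with batch (Kruskal-Wallis test). Output for eigengenes 2 to 4 is shown; the p-value for the first eigengene is listed in the table further down: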

kruskal.test(mredsvd$v[,2],as.factor(batch))

        Kruskal-Wallis rank sum test

data:  mredsvd$v[, 2] and as.factor(batch)
Kruskal-Wallis chi-squared = 10.8903, df = 12, p-value = 0.5383

kruskal.test(mredsvd$v[,3],as.factor(batch))

        Kruskal-Wallis rank sum test

data:  mredsvd$v[, 3] and as.factor(batch)
Kruskal-Wallis chi-squared = 21.7447, df = 12, p-value = 0.04048

kruskal.test(mredsvd$v[,4],as.factor(batch))

        Kruskal-Wallis rank sum test

data:  mredsvd$v[, 4] and as.factor(batch)
Kruskal-Wallis chi-squared = 35.8388, df = 12, p-value = 0.0003439
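The repeated calls above could be collapsed into a single loop over eigengenes. The sketch below uses simulated stand-ins (`eig_sim` for `mredsvd$v[, 1:4]`, `batch_sim` for `batch`); 13 simulated batches are used because df = 12 in the output above implies 13 groups.

```r
set.seed(2)
n <- 60
batch_sim <- rep(1:13, length.out = n)               # 13 groups, so df = 12
eig_sim <- matrix(rnorm(n * 4), nrow = n, ncol = 4)  # stand-in for mredsvd$v[, 1:4]

# One Kruskal-Wallis p-value per eigengene.
batch_p <- sapply(1:4, function(k) {
  kruskal.test(eig_sim[, k], as.factor(batch_sim))$p.value
})
names(batch_p) <- paste0("PC", 1:4)
signif(batch_p, 4)
```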

After removing the batch effect, the p-value for the association of the first principal component with batch rises to 0.7748. Before removal, the first four principal components are associated with the adjustment and biological variables as follows (p-values; Kruskal-Wallis test for categorical variables, Spearman correlation for age):

PC	Batch	Center	Day	Month	Year	Amount	Concentration	Row	Column	Stage	Grade	Age
1	2.2e-16	2.2e-16	2.2e-16	2.2e-16	2.2e-16	2.2e-16	0.0004486	5.882547e-02	6.881028e-02	0.32	0.27	0.1071
2	0.5383	0.3486	0.5577	0.9876	0.2710	0.04873	0.6482	0.2862026	0.4786892	0.31	0.10	0.006634
3	0.04048	0.05258	0.03756	0.01480	0.1233	0.1786	0.5335	0.55585676	0.25289498	0.50	0.35	0.5131
4	0.0003439	0.01709	0.0008948	0.0001387	0.5725	0.7225	0.5267	0.0516508987	0.1404578746	0.43	0.43	0.02168
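A table like the one above could be assembled with a small helper that picks the test by variable type. This is only a sketch on simulated data: `assoc_p`, `covars`, and `eig_sim` are hypothetical names, and only three covariates are simulated to keep it short.

```r
set.seed(3)
n <- 60
eig_sim <- matrix(rnorm(n * 4), nrow = n, ncol = 4)  # stand-in for the first four eigengenes
covars <- data.frame(
  Batch = factor(rep(1:6, length.out = n)),
  Stage = factor(sample(c("I", "II", "III"), n, replace = TRUE)),
  Age   = round(runif(n, 30, 80))
)

# Spearman correlation for numeric variables, Kruskal-Wallis for categorical ones.
assoc_p <- function(pc, v) {
  if (is.numeric(v)) {
    cor.test(pc, v, method = "spearman", exact = FALSE)$p.value
  } else {
    kruskal.test(pc, v)$p.value
  }
}

# Rows: principal components; columns: covariates.
p_table <- sapply(covars, function(v) apply(eig_sim, 2, assoc_p, v = v))
rownames(p_table) <- paste0("PC", 1:4)
signif(p_table, 3)
```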

In previous work with Brig, we found that day, month, and year of shipment, as well as center, are highly correlated with batch. We therefore start by removing the batch effect: each probe is regressed on batch and the residuals are kept. Percent variance explained after removing the batch effect:


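> # Design matrix: intercept plus one indicator column per non-reference batch
> X<-model.matrix(~factor(batch))
> # Least-squares coefficients for every probe (red is probes x samples)
> Xbc<-solve(t(X) %*% X) %*% t(X) %*% t(red)
> # Subtract the fitted values; redB holds the residuals (probe means are removed too)
> redB<- red-t(X %*% Xbc)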
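As a sanity check on the matrix formula above, the same residuals can be obtained from `lm`. The sketch below runs both routes on simulated data (`red_sim` and `batch_sim` are stand-ins, with an artificial shift added to one batch) and compares them.

```r
set.seed(4)
n <- 30
batch_sim <- rep(1:3, each = 10)
red_sim <- matrix(rnorm(50 * n), nrow = 50, ncol = n)  # probes x samples

# Add an artificial shift to batch 2 so there is something to remove.
red_sim[, batch_sim == 2] <- red_sim[, batch_sim == 2] + 2

# Route 1: explicit normal equations, as in the code above.
X <- model.matrix(~factor(batch_sim))
Xbc <- solve(t(X) %*% X) %*% t(X) %*% t(red_sim)
redB_manual <- red_sim - t(X %*% Xbc)

# Route 2: multi-response linear model; residuals come back as samples x probes.
fit <- lm(t(red_sim) ~ factor(batch_sim))
redB_lm <- t(residuals(fit))

# Largest absolute difference between the two routes.
max(abs(redB_manual - redB_lm))
```

The `lm` route relies on a QR decomposition rather than an explicit inverse, which tends to be more stable numerically when the design matrix is close to rank deficient.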

[Figure: percent variance explained after removing the batch effect]
The first eigengene now appears to explain ~20% of the variance. Association with the adjustment and biological variables after removing the batch effect (p-values; Kruskal-Wallis test for categorical variables, Spearman correlation for age):

PC	Batch	Center	Day	Month	Year	Amount	Concentration	Row	Column	Stage	Grade	Age
1	0.7748	9.445e-05	6.8e-01	8.2e-01	8.0e-01	5.4e-01	1.1e-01	1.3e-07	9.4e-05	0.541	0.093	0.6379
2	1	0.4722	1.00	1.00	0.98	0.78	0.76	0.26	0.45	0.18	0.11	0.01303
3	1	0.03917	1.00	1.00	0.94	0.98	0.45	0.59	0.15	0.60	0.24	0.06834
4	1	0.3475	1.000	1.000	0.998	0.969	0.428	0.055	0.104	0.56	0.73	0.1463

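Removing the batch effect did not completely remove the center and plate row/column effects: the first principal component is still associated with center (p = 9.445e-05), row (p = 1.3e-07), and column (p = 9.4e-05), and the third with center (p = 0.03917). Day, month, and year of shipment have been taken care of: none of the first four components is associated with them any longer (all p >= 0.68).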

Version	Old Version 4	New Version 5
Changes made by	Vitalina Komashko (Unlicensed)	Vitalina Komashko (Unlicensed)
Saved on	Oct 19, 2011	Oct 19, 2011
