...
Run SVD on the entire M value matrix (27k CpGs by 511 patients), plot the 1st eigenarray (u matrix), and "color" the points according to the dye with which each CpG was labeled:
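A minimal sketch of this step, assuming the matrix is stored with CpGs in rows and patients in columns, and that "percent variance explained" means each component's share of the squared singular values. The function name `svd_summary`, the toy data, and the `dye` labels are hypothetical stand-ins, not the actual dataset. The notes do not say whether the matrix was centered beforehand, so no centering is applied here.

```python
import numpy as np

def svd_summary(m_values):
    """Thin SVD of a CpG-by-patient matrix (rows = CpGs, columns = patients)."""
    # Columns of u are eigenarrays (one value per CpG);
    # rows of vt are eigengenes (one value per patient).
    u, s, vt = np.linalg.svd(m_values, full_matrices=False)
    # Share of the total sum of squares carried by each component, in percent.
    pct_var = 100.0 * s**2 / np.sum(s**2)
    return u, s, vt, pct_var

# Toy stand-in for the 27k x 511 matrix (assumption: illustrative data only).
rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 20))
dye = rng.choice(["red", "green"], size=200)   # hypothetical dye label per CpG
demo[dye == "red"] += 2.0                      # inject a dye-dependent shift

u, s, vt, pct_var = svd_summary(demo)
first_eigenarray = u[:, 0]                     # values to plot, colored by dye
```

Plotting `first_eigenarray` against the CpG index with one color per `dye` level would show whether the leading component separates the two probe types.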
M value: analysis of red probes, identification of adjustment variables
...
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2.2e-16 | 2.2e-16 | 2.2e-16 | 2.2e-16 | 2.2e-16 | 2.2e-16 | 0.0004486 | 5.882547e-02 | 6.881028e-02 | 0.32 | 0.27 | 0.1071 |
2 | 0.5383 | 0.3486 | 0.5577 | 0.9876 | 0.2710 | 0.04873 | 0.6482 | 0.2862026 | 0.4786892 | 0.31 | 0.10 | 0.006634 |
3 | 0.04048 | 0.05258 | 0.03756 | 0.01480 | 0.1233 | 0.1786 | 0.5335 | 0.55585676 | 0.25289498 | 0.50 | 0.35 | 0.5131 |
4 | 0.0003439 | 0.01709 | 0.0008948 | 0.0001387 | 0.5725 | 0.7225 | 0.5267 | 0.0516508987 | 0.1404578746 | 0.43 | 0.43 | 0.02168 |
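The notes do not say which test produced the p-values in the table. As one possible illustration, the sketch below scores the association between a component (one value per patient) and a categorical variable such as batch with a one-way ANOVA F statistic and a permutation p-value. This is a swapped-in technique, not necessarily the one used here, and `f_statistic`, `permutation_pvalue`, and the toy data are hypothetical. A permutation p-value cannot go below 1/(n_perm + 1), so it will never reproduce values like 2.2e-16.

```python
import numpy as np

def f_statistic(scores, groups):
    """One-way ANOVA F statistic of `scores` across the levels of `groups`."""
    levels = np.unique(groups)
    grand_mean = scores.mean()
    between = 0.0
    within = 0.0
    for g in levels:
        member = scores[groups == g]
        between += member.size * (member.mean() - grand_mean) ** 2
        within += np.sum((member - member.mean()) ** 2)
    df_between = levels.size - 1
    df_within = scores.size - levels.size
    return (between / df_between) / (within / df_within)

def permutation_pvalue(scores, groups, n_perm=999, seed=0):
    """Fraction of label-shuffled F statistics at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = f_statistic(scores, groups)
    exceed = 0
    for _ in range(n_perm):
        if f_statistic(rng.permutation(scores), groups) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Toy example: 100 patients in 4 batches, with and without a batch shift.
rng = np.random.default_rng(1)
batch = np.repeat(np.arange(4), 25)
pc_with_batch = rng.normal(size=100) + 2.0 * batch
pc_without_batch = rng.normal(size=100)

p_batch = permutation_pvalue(pc_with_batch, batch)
p_null = permutation_pvalue(pc_without_batch, batch)
```

For continuous variables such as age or amount, a correlation-based statistic would take the place of the F statistic.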
Finally, the center effect needs to go. Variables to adjust for: batch, center, plate row, plate column. Percent Variance explained after the adjustment:
Removing the batch and the plate row helped somewhat with the center effect, but the plate column effect is still significantly higher, so it needs to be removed as well.
...
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.936 | 7.104e-05 | 0.881 | 0.915 | 0.938 | 0.780 | 0.027 | 0.998 | 0.757 | 0.36 | 0.22 | 0.6967 |
2 | 1 | 0.7801 | 1.00 | 1.00 | 0.96 | 0.92 | 0.75 | 0.97 | 0.45 | 0.36 | 0.30 | 0.02479 |
3 | 1 | 0.04068 | 1.00 | 1.00 | 0.93 | 0.97 | 0.51 | 1.00 | 0.18 | 0.30 | 0.27 | 0.1425 |
Finally, the center effect needs to go. Variables to adjust for: batch, center, plate row, plate column. Percent Variance explained after the adjustment:
Under 14%! That is still a large effect, so look at the variables:
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.9986 | 0.9998 | 0.993 | 0.987 | 0.963 | 0.897 | 0.042 | 0.999 | 0.433 | 0.39 | 0.34 | 0.8608 |
2 | 1 | 1 | 1.00 | 1.00 | 0.95 | 0.96 | 0.86 | 0.97 | 0.61 | 0.27 | 0.36 | 0.02406 |
3 | 1 | 1 | 1.00 | 1.00 | 0.84 | 0.96 | 0.50 | 1.00 | 0.75 | 0.33 | 0.45 | 0.3325 |
Now let's take a look at the outliers of the first eigengene (patients 6 vs 367):
I guess it doesn't look too terrible. I also tried removing all the listed variables together with the first principal component; here is the resulting percent variance explained and the outliers:
To me it looks worse than with the first principal component. Final: remove the batch, center, plate row and plate column from the data.
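One common way to "remove" categorical variables such as batch, center, plate row and plate column is to regress each CpG on indicator columns for those variables and keep the residuals. The sketch below assumes that approach; the notes do not state the exact method, and `one_hot`, `remove_covariates`, and the toy data are hypothetical. The per-CpG mean is added back so that only the covariate-related variation is removed.

```python
import numpy as np

def one_hot(labels):
    """Indicator columns for every level except the first (absorbed by the intercept)."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)

def remove_covariates(m_values, covariates):
    """Regress each CpG (row) on per-patient categorical covariates; return adjusted data.

    m_values   : CpGs x patients array
    covariates : list of per-patient label arrays (e.g. batch, center, row, column)
    """
    n_patients = m_values.shape[1]
    design = np.hstack([np.ones((n_patients, 1))] + [one_hot(c) for c in covariates])
    # Least-squares fit for all CpGs at once (response is patients x CpGs).
    coef, *_ = np.linalg.lstsq(design, m_values.T, rcond=None)
    residuals = m_values.T - design @ coef
    # Add the per-CpG mean back so overall methylation levels are preserved.
    return residuals.T + m_values.mean(axis=1, keepdims=True)

# Toy example: 50 CpGs, 40 patients, 4 batches, 5 plate rows.
rng = np.random.default_rng(0)
batch = np.repeat(np.arange(4), 10)
plate_row = np.tile(np.arange(5), 8)
data = rng.normal(size=(50, 40)) + 1.5 * batch   # batch shift added to every CpG
adjusted = remove_covariates(data, [batch, plate_row])
```

After the adjustment, every batch has the same mean for each CpG, which is what drives the batch p-values toward 1 in tables like the ones above.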
M value: analysis of green probes, identification of adjustment variables
Percent Variance explained:
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2.2e-16 | 2.2e-16 | 1.6e-61 | 5.7e-39 | 3.8e-31 | 1.6e-19 | 6.9e-04 | 3.2e-02 | 2.2e-01 | 0.36 | 0.17 | 0.2778 |
Remove the batch, center, plate row, plate column (also mask that one batch), then look at the first eigengene and the eigenarray:
P values after the adjustment:
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.999 | 0.9999 | 0.995 | 0.991 | 0.956 | 0.863 | 0.074 | 0.997 | 0.626 | 0.60 | 0.26 | 0.6206 |
Look at the outliers:
Weird!
Do one more test and remove the first eigengene together with the variables above:
Conclusion
Now scale (center=TRUE, scale=TRUE) both datasets (red and green probes) and combine them for network construction.
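A minimal sketch of the scaling and combining step, written in Python to mirror what R's `scale(x, center=TRUE, scale=TRUE)` does: each column has its mean subtracted and is divided by its sample standard deviation. It assumes both matrices are arranged as patients by CpGs, so the two probe sets are joined column-wise; the orientation, the function name `scale_columns`, and the toy data are assumptions.

```python
import numpy as np

def scale_columns(x):
    """Per-column z-scores using the sample SD (n - 1 denominator), as R's scale() does."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# Toy stand-ins for the adjusted red and green probe matrices (patients x CpGs).
rng = np.random.default_rng(0)
red = rng.normal(loc=5.0, scale=2.0, size=(30, 8))
green = rng.normal(loc=-1.0, scale=3.0, size=(30, 6))

# Scale each dataset separately, then join the CpG columns for the same patients.
combined = np.hstack([scale_columns(red), scale_columns(green)])
```

Scaling each set before combining puts the red and green probes on the same footing, so neither dye dominates the network construction through its variance alone.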