...
SVD on the entire M value matrix (27k CpGs by 511 patients); plot the 1st eigenarray (u matrix) and "color" the points according to the dye with which each CpG was labeled:
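As a rough illustration of what "first eigenarray" means here, the sketch below approximates the leading singular triplet of a tiny matrix by power iteration and checks that the entries of the first column of u separate by dye. This is not the code behind the plot: a real 27k by 511 matrix would go through a library SVD routine, and the function name, the toy values and the dye labels are all made up for the example.

```python
import math

def first_singular_triplet(rows, iterations=200):
    """Approximate the leading singular triplet (u, s, v) by power iteration.

    rows: matrix as a list of lists (here CpGs in rows, patients in columns).
    Caveat: the uniform start vector must not be orthogonal to the true
    leading right singular vector, otherwise the iteration cannot find it.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v
        u = [sum(a * b for a, b in zip(row, v)) for row in rows]
        # v = A^T u, renormalized to unit length
        v = [sum(rows[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(a * b for a, b in zip(row, v)) for row in rows]
    s = math.sqrt(sum(x * x for x in u))
    u = [x / s for x in u]
    return u, s, v

# Hypothetical toy data: 4 CpGs x 3 patients, with made-up dye labels.
toy_m_values = [
    [2.0, 2.1, 1.9],
    [2.2, 2.0, 2.1],
    [-1.0, -0.9, -1.1],
    [-1.1, -1.0, -0.8],
]
toy_dye = ["red", "red", "green", "green"]

u1, s1, v1 = first_singular_triplet(toy_m_values)
# One entry of u1 per CpG; grouping them by dye mimics the colored plot.
u1_by_dye = {dye: [x for x, d in zip(u1, toy_dye) if d == dye] for dye in set(toy_dye)}
```

In this toy case the red and green CpGs end up on opposite sides of zero in `u1`, which is the kind of dye separation the plot is meant to reveal.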
M value: analysis of red probes, identification of adjustment variables
...
Finally, the center effect needs to go. Variables to adjust for: batch, center, plate row, plate column. Percent Variance explained after the adjustment:
Under 14%! Still a large effect; look at the variables:
...
I guess it doesn't look too terrible. I also tried to remove all the listed variables as well as the first principal component and here is what I got in terms of the percent variance explained and the outliers:
To me it looks worse than with the first principal component left in. Final: remove the batch, center, plate row and plate column from the data.
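To make "remove a variable from the data" concrete, here is a deliberately simplified sketch for one probe and one categorical variable: each sample's M value is replaced by its deviation from its group mean. The notes adjust for four variables at once, which would normally mean taking residuals from a linear model with all four as covariates; this one-factor version only shows the idea. The function name, M values and batch labels are hypothetical.

```python
from collections import defaultdict

def remove_group_means(values, groups):
    """Subtract each group's mean from its members (one probe, one factor).

    The result is the residual of a one-way model, so the overall mean
    of the probe is removed as well.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [value - sums[group] / counts[group]
            for value, group in zip(values, groups)]

# Hypothetical M values for a single probe across four samples.
m_values = [1.0, 1.2, 3.0, 3.4]
batch = ["b1", "b1", "b2", "b2"]

adjusted = remove_group_means(m_values, batch)
```

After the adjustment the two batches share the same mean (zero), so a batch-level shift can no longer drive the leading principal component for this probe.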
M value: analysis of green probes, identification of adjustment variables
Percent variance explained and p values before the adjustment:
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2.2e-16 | 2.2e-16 | 1.6e-61 | 5.7e-39 | 3.8e-31 | 1.6e-19 | 6.9e-04 | 3.2e-02 | 2.2e-01 | 0.36 | 0.17 | 0.2778 |
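For reference, one common way to get the percent of variance that a categorical variable such as batch explains in a principal component is the between-group share of the total sum of squares (the R squared of a one-way model). The sketch below assumes that definition; the notes do not say how the numbers were actually computed, and the scores and labels are made up.

```python
from collections import defaultdict

def percent_variance_explained(scores, groups):
    """Percent of the variance in `scores` explained by the grouping.

    Computed as 100 * between-group sum of squares / total sum of squares.
    """
    grand_mean = sum(scores) / len(scores)
    total_ss = sum((x - grand_mean) ** 2 for x in scores)
    members = defaultdict(list)
    for score, group in zip(scores, groups):
        members[group].append(score)
    between_ss = sum(
        len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2
        for xs in members.values()
    )
    return 100.0 * between_ss / total_ss

# Hypothetical first-PC scores for four samples and their batch labels.
pc1_scores = [1.0, 1.2, 3.0, 3.4]
batch_labels = ["b1", "b1", "b2", "b2"]

pct_batch = percent_variance_explained(pc1_scores, batch_labels)
```

A value close to 100 means the component is almost entirely a batch effect; a value close to 0 means the variable has little to do with it.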
Remove the batch, center, plate row and plate column (also mask that one batch), then look at the first eigengene and the eigenarray:
P values after the adjustment:
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.999 | 0.9999 | 0.995 | 0.991 | 0.956 | 0.863 | 0.074 | 0.997 | 0.626 | 0.60 | 0.26 | 0.6206 |
Look at the outliers:
Weird!
Do one more test and remove the first eigengene together with the variables above:
Conclusion
Now scale (center=TRUE, scale=TRUE) both datasets (red and green probes) and combine them for network construction.
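The `center=TRUE, scale=TRUE` arguments match R's `scale()`, which subtracts each column's mean and divides by its sample standard deviation. A minimal Python equivalent of the scale-then-combine step is sketched below. It assumes samples are in rows and probes in columns, and that "combine" means joining the probe columns side by side; neither is stated in the notes, and the tiny matrices are invented.

```python
import statistics

def scale_columns(rows):
    """Center each column to mean 0 and scale it to sample SD 1."""
    scaled_columns = []
    for column in zip(*rows):
        mean = statistics.mean(column)
        sd = statistics.stdev(column)  # sample SD (n - 1 denominator)
        scaled_columns.append([(x - mean) / sd for x in column])
    # Transpose back to samples-in-rows layout.
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical adjusted data: 3 samples x 2 red probes, 3 samples x 1 green probe.
red = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
green = [[5.0], [7.0], [9.0]]

# Scale each dataset separately, then join the probe columns per sample.
combined = [r + g for r, g in zip(scale_columns(red), scale_columns(green))]
```

Scaling each dataset before joining puts every probe on a comparable scale, whichever dye it was measured with.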