...
Run SVD on the entire M value matrix (27k CpGs by 511 patients), plot the 1st eigenarray (u matrix), and "color" the points according to the dye with which each CpG was labeled:
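As a rough illustration of what the first eigenarray and eigengene are, here is a minimal power-iteration sketch in plain Python. It is not the code used for these notes (a full SVD routine was presumably used on the real matrix); the function name `first_singular_triplet` and the toy matrix are hypothetical. With CpGs in rows and patients in columns, `u` has one entry per CpG (eigenarray) and `v` has one entry per patient (eigengene).

```python
import math

def first_singular_triplet(matrix, n_iter=200):
    """Approximate the largest singular value and its vectors by power iteration.

    Assumes the matrix is non-zero and that the all-ones starting vector is
    not orthogonal to the leading right singular vector.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(n_iter):
        # u = M v (one value per row, i.e. per CpG)
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        # w = M^T u (one value per column, i.e. per patient)
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

# Toy rank-1 matrix: 2 "CpGs" by 2 "patients" (made-up numbers).
sigma, eigenarray, eigengene = first_singular_triplet([[3.0, 4.0], [6.0, 8.0]])
```

The sign of the singular vectors is arbitrary, so only the pattern (which points sit far from the bulk) is meaningful when plotting.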
M value: analysis of red probes, identification of adjustment variables
...
Now remove the batch, plate row, and plate column effects and look at the percent variance explained:
Now the first principal component explains a little less than 15% of the overall variability. Check the p-values for association with the adjustment and biological variables to see whether the center effect is still present:
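For reference, percent variance explained is usually derived from the singular values: each component's share is its squared singular value divided by the sum of all squared singular values. The sketch below assumes that definition (the notes do not spell it out) and uses made-up singular values; `percent_variance_explained` is a hypothetical helper name.

```python
def percent_variance_explained(singular_values):
    """Share of total variability per component, in percent.

    Assumes the components come from an SVD of the data matrix, so each
    component's contribution is proportional to its squared singular value.
    """
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [100.0 * sq / total for sq in squared]

# Made-up singular values, largest first.
shares = percent_variance_explained([4.0, 3.0])
```

Note that this equals the PCA "variance explained" only when the matrix was centered before the SVD; otherwise the first component also absorbs the overall mean.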
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.936 | 7.104e-05 | 0.881 | 0.915 | 0.938 | 0.780 | 0.027 | 0.998 | 0.757 | 0.36 | 0.22 | 0.6967 |
2 | 1 | 0.7801 | 1.00 | 1.00 | 0.96 | 0.92 | 0.75 | 0.97 | 0.45 | 0.36 | 0.30 | 0.02479 |
3 | 1 | 0.04068 | 1.00 | 1.00 | 0.93 | 0.97 | 0.51 | 1.00 | 0.18 | 0.30 | 0.27 | 0.1425 |
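The notes do not record which test produced the p-values in these tables. Purely as an illustration of how a principal component can be tested against a categorical variable such as center, here is a permutation test on the one-way ANOVA F statistic. The choice of test, the function names, and the toy data are all assumptions.

```python
import random

def f_statistic(scores, labels):
    """One-way ANOVA F statistic of `scores` grouped by `labels`.

    Needs at least two groups and some within-group variation.
    """
    groups = {}
    for s, g in zip(scores, labels):
        groups.setdefault(g, []).append(s)
    n, k = len(scores), len(groups)
    grand = sum(scores) / n
    between = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
    within = sum((s - sum(v) / len(v)) ** 2 for v in groups.values() for s in v)
    return (between / (k - 1)) / (within / (n - k))

def permutation_p_value(scores, labels, n_perm=999, seed=0):
    """P-value from shuffling the labels and recomputing the F statistic."""
    rng = random.Random(seed)
    observed = f_statistic(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up PC scores for 12 patients from two hypothetical centers.
pc_scores = [0.1, 0.2, 0.15, 0.12, 0.18, 0.11, 5.1, 5.2, 5.0, 5.3, 5.15, 5.05]
centers = ["A"] * 6 + ["B"] * 6
p = permutation_p_value(pc_scores, centers)
```

A small p-value here would mean the component separates the centers more than expected by chance, which is the pattern seen for Center in row 1 of the table.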
The center effect is still there (PC1 p-value 7.104e-05, PC3 p-value 0.04068), so it needs to go as well. Variables to adjust for: batch, center, plate row, plate column. Percent variance explained after the adjustment:
Under 14%! That is still a large effect, so look at the variables again:
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.9986 | 0.9998 | 0.993 | 0.987 | 0.963 | 0.897 | 0.042 | 0.999 | 0.433 | 0.39 | 0.34 | 0.8608 |
2 | 1 | 1 | 1.00 | 1.00 | 0.95 | 0.96 | 0.86 | 0.97 | 0.61 | 0.27 | 0.36 | 0.02406 |
3 | 1 | 1 | 1.00 | 1.00 | 0.84 | 0.96 | 0.50 | 1.00 | 0.75 | 0.33 | 0.45 | 0.3325 |
Now let's take a look at the outliers of the first eigengene (patients 6 vs 367):
I guess it doesn't look too terrible. I also tried removing all the listed variables together with the first principal component; here is what I got in terms of percent variance explained and outliers:
To me this looks worse than keeping the first principal component. Final: remove the batch, center, plate row, and plate column effects from the data.
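One way to picture "removing" categorical variables such as batch, center, plate row, and plate column from a probe is to take residuals from an additive model. The sketch below does this by backfitting (repeatedly subtracting group means, one factor at a time). This is an assumed stand-in for whatever linear-model adjustment was actually run; `remove_factors` and the toy data are hypothetical.

```python
def remove_factors(values, factors, n_iter=50):
    """Residuals of one probe after removing additive categorical effects.

    `values` holds one M value per patient; `factors` is a list of label
    lists (for example batch and center), each aligned with `values`.
    Cycling through the factors and subtracting group means converges to
    the least-squares residuals of the additive model.
    """
    resid = [float(x) for x in values]
    for _ in range(n_iter):
        for labels in factors:
            sums, counts = {}, {}
            for r, g in zip(resid, labels):
                sums[g] = sums.get(g, 0.0) + r
                counts[g] = counts.get(g, 0) + 1
            resid = [r - sums[g] / counts[g] for r, g in zip(resid, labels)]
    return resid

# Toy probe whose values are purely batch effect plus center effect.
batch = ["b1", "b1", "b2", "b2"]
center = ["c1", "c2", "c1", "c2"]
probe = [1.0, 11.0, 3.0, 13.0]
adjusted = remove_factors(probe, [batch, center])
```

In this toy case nothing but technical effects is present, so the adjusted values come out as zero; on real data the residuals keep whatever variation the listed variables cannot explain.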
M value: analysis of green probes, identification of adjustment variables
Percent Variance explained:
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2.2e-16 | 2.2e-16 | 1.6e-61 | 5.7e-39 | 3.8e-31 | 1.6e-19 | 6.9e-04 | 3.2e-02 | 2.2e-01 | 0.36 | 0.17 | 0.2778 |
Remove the batch, center, plate row, and plate column effects (also mask that one batch), then look at the first eigengene and the eigenarray:
P values after the adjustment:
PCs | Batch | Center | Day | Month | Year | Amount | Concentr. | Row | Column | Stage | Grade | Age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.999 | 0.9999 | 0.995 | 0.991 | 0.956 | 0.863 | 0.074 | 0.997 | 0.626 | 0.60 | 0.26 | 0.6206 |
Look at the outliers:
Weird!
Do one more test: remove the first eigengene together with the variables above:
Conclusion
Now scale (center=TRUE, scale=TRUE) both datasets (red and green probes) and combine them for network construction.
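To make the scaling step concrete, here is a small Python sketch of what column-wise scaling with center=TRUE and scale=TRUE amounts to: subtract each column's mean and divide by its sample standard deviation (n - 1 denominator). It assumes patients are in rows and probes in columns, which the notes do not state; `scale_columns`, `combine_columns`, and the toy numbers are hypothetical.

```python
import statistics

def scale_columns(matrix):
    """Center each column and divide it by its sample standard deviation."""
    scaled_cols = []
    for col in zip(*matrix):
        mean = statistics.fmean(col)
        sd = statistics.stdev(col)  # n - 1 denominator; fails on constant columns
        scaled_cols.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*scaled_cols)]

def combine_columns(left, right):
    """Column-bind two matrices that share the same rows (patients)."""
    if len(left) != len(right):
        raise ValueError("both matrices must have the same number of rows")
    return [l + r for l, r in zip(left, right)]

# Toy data: 3 patients, 2 red probes and 1 green probe (made-up values).
red = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
green = [[2.0], [4.0], [6.0]]
combined = combine_columns(scale_columns(red), scale_columns(green))
```

Scaling each dataset before combining puts red and green probes on the same footing, so neither dye dominates the network construction just because of its overall intensity range.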