Comment: Migrated to Confluence 5.3

...

SVD of the entire 27k × 511 M-value matrix (CpGs by patients): plot the first eigenarray (first column of the U matrix) and color the points according to the dye with which each CpG was labeled:

[Figure: first eigenarray, points colored by dye]
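The page does not show the code behind this plot. The following is a minimal NumPy sketch, assuming the M values sit in a CpG × patient array and the dye of each CpG is given as a label array (the labels "Red"/"Grn" and the helper `first_eigenarray_by_dye` are hypothetical). It returns the first column of U, one entry per CpG, split by dye so the two groups can be plotted in different colors.

```python
import numpy as np

def first_eigenarray_by_dye(m_values, dye):
    """SVD of a CpG x patient M-value matrix.

    Returns the first eigenarray (first column of U, one entry per CpG)
    and a dict mapping each dye label to the eigenarray entries of its CpGs.
    """
    # Row-center each CpG first (an assumption: the page does not say
    # whether the matrix was centered before the SVD).
    x = m_values - m_values.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    eigenarray = u[:, 0]
    groups = {label: eigenarray[dye == label] for label in np.unique(dye)}
    return eigenarray, groups

# Toy data: 200 CpGs x 30 patients with a hypothetical dye-specific shift
rng = np.random.default_rng(0)
dye = np.array(["Red"] * 100 + ["Grn"] * 100)
m = rng.normal(size=(200, 30))
m[dye == "Red", :15] += 2.0  # shift the red probes in half of the patients
eigenarray, groups = first_eigenarray_by_dye(m, dye)
```

In the toy data the dye-specific shift dominates the first component, so the red-probe entries of the eigenarray separate clearly from the green ones. The sign of a singular vector is arbitrary, so compare magnitudes rather than signs.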

M value: analysis of red probes, identification of adjustment variables

...

Now remove batch, plate row and plate column, and look at the percent variance explained:

The first principal component now explains a little less than 15% of the overall variability. Next, check the p values for the association of the top PCs with the adjustment and biological variables to see whether the center effect is still present:

PCs Batch  Center     Day    Month  Year   Amount  Concentr.  Row    Column  Stage  Grade  Age
1   0.936  7.104e-05  0.881  0.915  0.938  0.780   0.027      0.998  0.757   0.36   0.22   0.6967
2   1      0.7801     1.00   1.00   0.96   0.92    0.75       0.97   0.45    0.36   0.30   0.02479
3   1      0.04068    1.00   1.00   0.93   0.97    0.51       1.00   0.18    0.30   0.27   0.1425
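The page does not say which test produced these p values. One possibility for the categorical variables is a permutation test of each eigengene (one value per patient) against the variable's labels. The following NumPy sketch uses the between-group sum of squares as the statistic. It is a swapped-in technique, not necessarily the one used here, and `perm_pvalue` and the toy "center" labels are hypothetical.

```python
import numpy as np

def perm_pvalue(eigengene, labels, n_perm=2000, seed=0):
    """Permutation p value for the association between one eigengene
    (one value per patient) and a categorical variable (one label per patient).

    The statistic is the between-group sum of squares (one-way ANOVA
    numerator). The total sum of squares is fixed under permutation, so it
    ranks permutations exactly like the ANOVA F statistic.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    levels = np.unique(labels)
    grand_mean = eigengene.mean()

    def between_ss(lab):
        return sum(
            np.sum(lab == g) * (eigengene[lab == g].mean() - grand_mean) ** 2
            for g in levels
        )

    observed = between_ss(labels)
    hits = sum(between_ss(rng.permutation(labels)) >= observed for _ in range(n_perm))
    # Add-one correction so the p value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy example: 60 patients from three hypothetical centers
rng = np.random.default_rng(1)
center = np.repeat(["A", "B", "C"], 20)
pc_with_center_effect = rng.normal(size=60) + np.repeat([0.0, 1.5, -1.5], 20)
pc_without_effect = rng.normal(size=60)
p_effect = perm_pvalue(pc_with_center_effect, center)
p_null = perm_pvalue(pc_without_effect, center)
```

Continuous variables such as Age, Amount or Concentr. would need a different statistic, for example a correlation with the eigengene.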

The center effect is still there on PC1, so it has to go as well. Variables to adjust for: batch, center, plate row and plate column. Percent variance explained after the adjustment:
[Figures: percent variance explained after the adjustment]
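The adjustment step itself is not shown on the page. The following is a minimal NumPy sketch, assuming "remove" means regressing each CpG on indicator columns for the categorical adjustment variables and keeping the residuals. Percent variance explained is then 100 · s_k² / Σ s² from the SVD of the row-centered result. The helpers `one_hot`, `regress_out` and `percent_variance` and the toy batch/plate labels are hypothetical.

```python
import numpy as np

def one_hot(labels):
    """Indicator columns for a categorical variable, dropping the first level."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)

def regress_out(m_values, covariates):
    """Remove categorical covariate effects from a CpG x patient matrix.

    Each CpG (row) is fitted by least squares on an intercept plus indicator
    columns for every covariate. The residuals are returned with the
    per-CpG mean added back.
    """
    n = m_values.shape[1]
    design = np.column_stack([np.ones(n)] + [one_hot(c) for c in covariates])
    # Solve design @ beta = m_values.T for all CpGs at once.
    beta, *_ = np.linalg.lstsq(design, m_values.T, rcond=None)
    resid = m_values.T - design @ beta
    return (resid + m_values.T.mean(axis=0)).T

def percent_variance(m_values):
    """Percent of total variance carried by each SVD component."""
    x = m_values - m_values.mean(axis=1, keepdims=True)
    s = np.linalg.svd(x, compute_uv=False)
    return 100 * s**2 / np.sum(s**2)

# Toy data: 300 CpGs x 40 patients with a CpG-specific batch shift
rng = np.random.default_rng(2)
batch = np.repeat(["b1", "b2"], 20)
plate_row = np.tile(["r1", "r2", "r3", "r4"], 10)
m = rng.normal(size=(300, 40))
m[:, batch == "b2"] += rng.normal(size=(300, 1))
adjusted = regress_out(m, [batch, plate_row])
pve_before = percent_variance(m)[0]
pve_after = percent_variance(adjusted)[0]
```

Because the intercept and batch indicator are in the design, each CpG's residual means are equal in both batches after the fit, so the batch component drops out of PC1.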

PC1 still explains just under 14% of the variance, which is a large effect, so look at the association with the variables again:

PCs Batch  Center     Day    Month  Year   Amount  Concentr.  Row    Column  Stage  Grade  Age
1   0.9986 0.9998     0.993  0.987  0.963  0.897   0.042      0.999  0.433   0.39   0.34   0.8608
2   1      1          1.00   1.00   0.95   0.96    0.86       0.97   0.61    0.27   0.36   0.02406
3   1      1          1.00   1.00   0.84   0.96    0.50       1.00   0.75    0.33   0.45   0.3325

Now let's look at the outliers of the first eigengene (patients 6 vs 367):

[Figure: outliers of the first eigengene]

I guess it doesn't look too bad. I also tried removing all the listed variables together with the first principal component; here are the resulting percent variance explained and outliers:
[Figures: percent variance explained; outliers of the first eigengene]
To me this looks worse than keeping the first principal component. Final choice: remove batch, center, plate row and plate column from the data.
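Removing the first principal component on top of the covariates, as tried above, amounts to subtracting the rank-1 SVD term s1 · u1 · v1ᵀ (first eigenarray times first eigengene) from the row-centered matrix. The following is a minimal NumPy sketch. `remove_first_component` is a hypothetical helper, and it assumes the covariate adjustment has already been applied to its input.

```python
import numpy as np

def remove_first_component(m_values):
    """Subtract the first SVD component s1 * u1 * v1^T from a CpG x patient matrix.

    The matrix is row-centered before the SVD and the row means are added back,
    so only the pattern carried by the first eigengene is removed.
    """
    row_means = m_values.mean(axis=1, keepdims=True)
    x = m_values - row_means
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return x - s[0] * np.outer(u[:, 0], vt[0]) + row_means

def singular_values(a):
    """Singular values of the row-centered matrix."""
    return np.linalg.svd(a - a.mean(axis=1, keepdims=True), compute_uv=False)

rng = np.random.default_rng(3)
m = rng.normal(size=(100, 25))
cleaned = remove_first_component(m)
```

After the subtraction, the old second component becomes the leading one, so the largest singular value of `cleaned` equals the second singular value of `m`.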

M value: analysis of green probes, identification of adjustment variables

Percent variance explained:

[Figure: percent variance explained]

P values before the adjustment:

PCs Batch    Center   Day      Month    Year     Amount   Concentr.  Row      Column  Stage  Grade  Age
1   2.2e-16  2.2e-16  1.6e-61  5.7e-39  3.8e-31  1.6e-19  6.9e-04    3.2e-02  2.2e-01 0.36   0.17   0.2778

Remove batch, center, plate row and plate column (also mask that one batch), then look at the first eigengene and the first eigenarray:

[Figures: first eigengene and first eigenarray after the adjustment]

P values after the adjustment:

PCs Batch  Center     Day    Month  Year   Amount  Concentr.  Row    Column  Stage  Grade  Age
1   0.999  0.9999     0.995  0.991  0.956  0.863   0.074      0.997  0.626   0.60   0.26   0.6206

Look at the outliers:
[Figure: outliers of the first eigengene]
Weird!
One more test: remove the first eigengene together with the variables above:
[Figures: result after also removing the first eigengene]
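The outliers are judged visually here. One way to flag them numerically is the modified z-score of Iglewicz and Hoaglin, 0.6745 · (x − median) / MAD, with |z| > 3.5 as the usual cutoff. This is a swapped-in technique, not the method used on this page. `flag_outliers` and the toy eigengene are hypothetical.

```python
import numpy as np

def flag_outliers(eigengene, threshold=3.5):
    """Indices of patients whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), so a few
    extreme patients do not inflate the scale estimate.
    """
    med = np.median(eigengene)
    mad = np.median(np.abs(eigengene - med))
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    robust_z = 0.6745 * (eigengene - med) / mad
    return np.flatnonzero(np.abs(robust_z) > threshold)

# Toy eigengene: 50 evenly spread values with one extreme patient at index 10
toy_eigengene = np.linspace(-1.0, 1.0, 50)
toy_eigengene[10] = 20.0
outlier_idx = flag_outliers(toy_eigengene)
```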

Conclusion

Now scale both datasets (red and green probes) with center=TRUE, scale=TRUE, and combine them for network construction.
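The following is a minimal NumPy sketch of this last step. It assumes both matrices are CpG × patient with the same patients in the same column order, and that each CpG is standardized across patients. The page does not state the orientation. Centering and dividing by the sample standard deviation (ddof=1) matches what R's `scale(center=TRUE, scale=TRUE)` does to each column of a patient × CpG matrix. `scale_and_combine` is a hypothetical helper.

```python
import numpy as np

def scale_and_combine(red, green):
    """Standardize each CpG across patients, then stack red- and green-probe CpGs.

    Both inputs are CpG x patient arrays with identical patient columns.
    """
    def zscore_rows(x):
        mu = x.mean(axis=1, keepdims=True)
        sd = x.std(axis=1, ddof=1, keepdims=True)  # sample SD, as in R's sd()
        return (x - mu) / sd

    return np.vstack([zscore_rows(red), zscore_rows(green)])

rng = np.random.default_rng(4)
red = rng.normal(loc=2.0, scale=3.0, size=(50, 20))
green = rng.normal(loc=-1.0, scale=0.5, size=(70, 20))
combined = scale_and_combine(red, green)
```

Standardizing before stacking keeps CpGs from the two channels on a common scale, so neither dye dominates the combined matrix.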