...
Steps outline and decisions:
Split the probes into type I and type II, because the two types will be normalized separately as single-color and two-color arrays, respectively. Demonstrate their differences. Demonstrate the batch effect at the probe level.
The package has a built-in function for calculating the M value, but it produced a matrix containing NA values, which is because log2 of negative values was attempted (I think). I extracted the methylated and unmethylated intensities and calculated the M value as log2((meth+c)/(unmeth+c)), where c is a small constant (0.01 here). The resulting matrix has no NA values:

    > m<-getMeth(Mset.raw)
    > u<-getUnmeth(Mset.raw)
    > dim(m)
    [1] 485577 247
    > dim(u)
    [1] 485577 247
    > mval<-log2((m+0.01)/(u+0.01))
    > x<-apply(mval,2,function(x) which(is.na(x)))
    > length(x)
    [1] 0
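The offset calculation can be illustrated on a toy matrix. This is a minimal base-R sketch, not the package's own routine: the intensities are invented, and `m_value` is a hypothetical helper name. In this toy case the missing value comes from a 0/0 ratio, which is one way a plain log-ratio can produce NA; whether that is what happened in the real data is not established here.

```r
# Toy intensities (probes x samples); all values are invented for illustration.
meth <- matrix(c(1200, 0, 0, 350, 800, 40), nrow = 3,
               dimnames = list(paste0("probe", 1:3), c("s1", "s2")))
unmeth <- matrix(c(300, 0, 500, 0, 800, 960), nrow = 3,
                 dimnames = dimnames(meth))

# Plain ratio: 0/0 gives NaN (which is.na() reports), x/0 and 0/x give +/-Inf.
raw_mval <- log2(meth / unmeth)
n_missing_raw <- sum(is.na(raw_mval))
n_nonfinite_raw <- sum(!is.finite(raw_mval))

# Hypothetical helper: offset M value, log2((meth + c) / (unmeth + c)).
m_value <- function(meth, unmeth, offset = 0.01) {
  stopifnot(identical(dim(meth), dim(unmeth)), offset > 0)
  log2((meth + offset) / (unmeth + offset))
}

mval <- m_value(meth, unmeth)

# Whole-matrix checks that do not depend on how apply() shapes its result.
n_missing <- sum(is.na(mval))
all_finite <- all(is.finite(mval))
```

Counting with `sum(is.na(mval))` gives the number of missing entries directly, and `is.finite()` additionally catches infinite values that `is.na()` would not report.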
Run SVD on the entire matrix and correlate the components with the technical variables. Outliers look very "special".
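A minimal sketch of the SVD step, using simulated data in base R. Everything here is an assumption made for illustration: the matrix sizes, the `batch` variable, and the injected batch shift are invented, and the association measure (R² from a linear model of each right singular vector on the technical variable) is one reasonable choice, not necessarily the one used in this analysis.

```r
set.seed(1)
n_probes <- 500
n_samples <- 24

# Hypothetical technical variable: three batches of eight samples each.
batch <- factor(rep(c("A", "B", "C"), each = 8))

# Simulated M values: pure noise plus a batch shift on the first 100 probes.
mval <- matrix(rnorm(n_probes * n_samples), n_probes, n_samples)
shift <- c(A = 0, B = 2, C = -2)[as.character(batch)]
mval[1:100, ] <- sweep(mval[1:100, ], 2, shift, "+")

# Center each probe (row) so the SVD describes variation between samples.
centered <- mval - rowMeans(mval)
sv <- svd(centered)

# Fraction of total variance carried by each component.
var_explained <- sv$d^2 / sum(sv$d^2)

# Association of the first five right singular vectors (one value per sample)
# with the technical variable, measured as R^2 of a linear model.
assoc <- sapply(1:5, function(k) summary(lm(sv$v[, k] ~ batch))$r.squared)
```

Components with both a large share of `var_explained` and a high `assoc` value are the ones dominated by the technical variable. For a numeric technical variable, `cor()` against `sv$v[, k]` would serve the same purpose.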
- Split type I into four datasets: unmethylated red, methylated red, unmethylated green, methylated green. Remove intensity effects using the snm package. Scale the datasets, then combine them into the M value (log2(meth/unmeth)).
- Normalize type II probes using the snm package, adjusting for intensity- and color-dependent effects. Scale and combine into the M value.
- Combine type I and type II probes into a single matrix. Identify technical batches and their effect. Use the snm package to retain the important biological variables (sample type) and remove technical variables as well as age and gender.
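The idea behind the last step (keep the biological variable, remove the adjustment variables) can be sketched with ordinary least squares in base R. This is a simplified stand-in and not the snm algorithm, which fits a more elaborate model that also handles intensity-dependent effects. All data are simulated, and the variable names and effect sizes are assumptions made for the example.

```r
set.seed(2)
n_probes <- 200
n_samples <- 30

# Hypothetical sample annotation.
sample_type <- factor(rep(c("typeA", "typeB"), length.out = n_samples))
batch <- factor(rep(c("b1", "b2", "b3"), each = 10))
age <- round(runif(n_samples, 30, 80))
age_c <- age - mean(age)  # centered so removing it does not move the baseline

# Simulated M values: biological effect + batch effect + age effect + noise.
bio_effect <- 1.5 * (sample_type == "typeB")
batch_effect <- c(b1 = 0, b2 = 1, b3 = -1)[as.character(batch)]
mval <- matrix(rnorm(n_probes * n_samples, sd = 0.3), n_probes, n_samples)
mval <- sweep(mval, 2, bio_effect + batch_effect + 0.02 * age_c, "+")

# Design: intercept and biological variable first, then adjustment variables.
bio_design <- model.matrix(~ sample_type)
adj_design <- model.matrix(~ batch + age_c)[, -1, drop = FALSE]
design <- cbind(bio_design, adj_design)
adj_idx <- ncol(bio_design) + seq_len(ncol(adj_design))

# One regression per probe (samples in rows, probes in columns).
fit <- lm.fit(design, t(mval))

# Subtract only the fitted contribution of the adjustment variables.
adj_part <- adj_design %*% fit$coefficients[adj_idx, , drop = FALSE]
cleaned <- mval - t(adj_part)

# Refit to confirm: adjustment effects are gone, biological effect is unchanged.
refit <- lm.fit(design, t(cleaned))
```

Because the batch columns are dummy-coded against the first level, this sketch shifts every sample to the level of batch `b1`; a different contrast coding would change the baseline but not the sample-type contrast.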
...