...
Code Block |
---|
library(corpcor) u<-fast.svd(sweep(expr),1,rowMeans(expr)) #to plot relative variance: plot(u$d^2/sum(u$d^2),main="Relative variance") #plot the first 4 eigengenes: par(mfrow=c(2,2)) plot(u$v[,1],main="1") plot(u$v[,2],main="2")plot(u$v[,3],main="3")plot(u$v[,4],main="4") #Identification of the "worst" offenders of the first eigengene: order(u$v[,1])[1] order(-u$v[,1])[1] plot(expr[,c(37,383)],main="37 vs 383") |
So it seems to me that the data on overall doesn't look horrible. Now I wanted to see if the first eigengene is correlated with batch (eyeballing it). Batch is the sixth field in the patient barcode. So just split patient barcodes by "-" and get the sixth field. First four eigengenes in relation to the batch effect:
Code Block |
---|
library(lattice) xyplot(u$v[,1]~factor(batch),main="1") xyplot(u$v[,2]~factor(batch),main="2") xyplot(u$v[,3]~factor(batch),main="3") xyplot(u$v[,4]~factor(batch),main="4") |
Remove batch effect:
Code Block |
---|
X<-model.matrix(~factor(batch))
bch <- solve(t(X) %*% X) %*% t(X) %*% t(expr)
resExpr <- expr-t(X %*% bch) |
Lets do the same diagnostic tests:
The only thing that concerns me after removing the batch effect is how "the worst" offenders look like. These two arrays were identified as the "worst" offenders of the first eigengene after removing the batch effect and here how they look like (first column). The second column shows the same 2 datasets before removing the batch effect:
- batch effect | + batch effect |
---|---|
| |
It seems that removing the batch effect completely reversed their relationship.
Now another comparison for the two datasets that were identified as the worst offenders before removing the batch effect and what happened to them after removing the batch effect:
- batch | + batch |
---|---|
| |
Is it too drastic?