...
Code Block |
---|
x <- u507$v[, 1] |
names(x) <- colnames(exp507) |
# Identification of outliers in the boxplot of the first eigengene grouped by batch |
bp <- boxplot(split(x, shortBatch), xlab = "batch", ylab = "1st eigengene") |
bp$out |
# TCGA-13-0762-01A TCGA-13-0768-01A TCGA-24-1614-01A TCGA-04-1655-01A |
# -0.12132315 0.06403801 -0.10078455 -0.05562895 |
# TCGA-04-1649-01A TCGA-04-1652-01A TCGA-09-2049-01D TCGA-29-2425-01A |
# -0.07523963 -0.15237043 -0.09206651 -0.05197294 |
# Label the outliers on the plot with their sample names |
text(x = bp$group, y = bp$out, labels = names(bp$out)) |
Kruskal-Wallis test for association of the first 4 eigengenes with batch:
...
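A minimal sketch of this kind of test, run on synthetic data rather than the real expression matrix. The names `expr`, `batch`, `sv` and `kw_p` are made up for the illustration, and centring each gene before the SVD is an assumption, not necessarily what was done to obtain `u507`:

```r
# Synthetic stand-ins (assumptions): 'expr' is genes x samples, 'batch' labels the samples
set.seed(1)
n_genes <- 200
n_samples <- 60
batch <- factor(rep(c("b1", "b2", "b3"), each = 20))
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)

# Shift half of the genes in one batch so there is a batch effect to detect
expr[1:100, batch == "b2"] <- expr[1:100, batch == "b2"] + 2

# Eigengenes are the right singular vectors (columns of v) of the gene-centred matrix
sv <- svd(expr - rowMeans(expr))

# Kruskal-Wallis P value for each of the first 4 eigengenes against batch
kw_p <- sapply(1:4, function(i) kruskal.test(sv$v[, i], batch)$p.value)
names(kw_p) <- paste0("eigengene", 1:4)
print(kw_p)
```

With a shift this strong, the first eigengene should pick up the batch structure and get a very small P value.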
I found that the first 150 eigengenes account for 80% of the variance, performed the Kruskal-Wallis test on each of these 150 eigengenes and plotted the P values. I also looked at the correlation between the batch and the center effect, and at the association of the first 150 eigengenes with the center:
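One way to find how many eigengenes are needed to reach 80% of the variance, and to test each of them, is sketched below on synthetic data. All names (`expr`, `batch`, `rel_var`, `cum_var`, `k80`, `kw_p`) are illustrative, and the relative variance is assumed to be the squared singular value divided by the sum of squared singular values:

```r
set.seed(2)
expr <- matrix(rnorm(300 * 50), nrow = 300)   # hypothetical genes x samples matrix
batch <- factor(rep(1:5, each = 10))          # hypothetical batch labels

sv <- svd(expr - rowMeans(expr))

# Relative variance of each eigengene: squared singular value over the total
rel_var <- sv$d^2 / sum(sv$d^2)
cum_var <- cumsum(rel_var)

# Smallest number of eigengenes whose cumulative variance reaches 80%
k80 <- which(cum_var >= 0.80)[1]
print(k80)

# Kruskal-Wallis P value of each of those eigengenes against batch
kw_p <- sapply(seq_len(k80), function(i) kruskal.test(sv$v[, i], batch)$p.value)
```

Calling `plot(-log10(kw_p), type = "h")` on the result gives a P value plot per eigengene. The same loop can be run with a center factor in place of `batch`.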
Remove the batch effect:
Code Block |
---|
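# Design matrix: intercept plus one indicator column per batch |
X <- model.matrix(~factor(batch)) |
# Least-squares coefficients for every gene |
bch <- solve(t(X) %*% X) %*% t(X) %*% t(expr) |
# Residual expression after subtracting the fitted batch effect |
resExpr <- expr - t(X %*% bch) |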
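As a sanity check, the projection above should give the same residuals as an ordinary least-squares fit of every gene on batch. The sketch below compares the two on a small synthetic matrix. The data and the names `resLm` and `batchMeans` are made up for the illustration:

```r
set.seed(3)
batch <- rep(c("A", "B", "C"), times = c(4, 5, 6))     # hypothetical batch labels
expr <- matrix(rnorm(10 * length(batch)), nrow = 10)   # hypothetical genes x samples

# Same three steps as in the code block above
X <- model.matrix(~factor(batch))
bch <- solve(t(X) %*% X) %*% t(X) %*% t(expr)
resExpr <- expr - t(X %*% bch)

# Residuals from lm(), fitting all genes at once (samples in rows)
resLm <- t(residuals(lm(t(expr) ~ factor(batch))))
print(all.equal(unname(resExpr), unname(resLm)))

# Mean residual of every gene within every batch (genes x batches)
batchMeans <- sapply(split(seq_along(batch), batch),
                     function(idx) rowMeans(resExpr[, idx, drop = FALSE]))
print(max(abs(batchMeans)))
```

Because the design matrix contains an intercept, the residuals have each gene's overall mean removed as well as the batch-specific shifts.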
Look at the relative variance and the association of the eigengenes with the batch and the center after removing the batch effect (again, looking at the first 150 eigengenes):
I looked specifically at the p values from the Kruskal-Wallis test for association with the center effect:
...
Note: I also performed a few analyses in which I removed both the batch and the center, and then looked at the distribution of the Kruskal-Wallis test P values for day of shipment, month of shipment, year of shipment, concentration, plate column, plate row and amount. Justin suggested that the center effect is very minor and not worth removing. However, I noticed that removing batch and center also completely removed the day, month and year of shipment effects (in the DNA methylation normalization I saw that these technical factors were highly correlated with the batch effect), and that concentration, plate column, plate row and amount are insignificant. The graphs for these analyses can be found here.
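A rough sketch of what removing batch and center together, then testing the remaining technical factors, could look like. The annotation table `annot` and its column names are hypothetical, the data are synthetic, and `lm()` is used here in place of the explicit matrix formula so that the fit still works if batch and center are partly confounded:

```r
set.seed(4)
n <- 48
# Hypothetical sample annotations; column names are illustrative only
annot <- data.frame(
  batch = factor(rep(1:4, each = 12)),
  center = factor(rep(c("c1", "c2"), times = 24)),
  shipYear = factor(rep(c(2009, 2009, 2010, 2010), each = 12)),
  plateRow = factor(sample(LETTERS[1:8], n, replace = TRUE))
)

# Synthetic genes x samples matrix with a batch shift on half of the genes
expr <- matrix(rnorm(100 * n), nrow = 100)
expr[1:50, ] <- expr[1:50, ] + rep(as.numeric(annot$batch), each = 50)

# Residuals after regressing every gene on batch and center
resExpr <- t(residuals(lm(t(expr) ~ batch + center, data = annot)))

# Kruskal-Wallis P value of the first residual eigengene against each factor
sv <- svd(resExpr)
kw <- sapply(annot, function(f) kruskal.test(sv$v[, 1], f)$p.value)
print(kw)

# Year of shipment is nested in batch here, so its group means are removed too
yearMeans <- tapply(sv$v[, 1], annot$shipYear, mean)
print(yearMeans)
```

In this toy setup the year of shipment is fully determined by the batch, which mirrors the situation where removing the batch also removes the shipment date effects.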