Correlation between BCR batch and the processing batch for 27k arrays (January 20, 2012)

Batch on the download page,"# after ""HumanMethylation27k"" in the file name, Level 2 data",Batch as the sixth field in the patient barcode,Comments
Batch 25,1,"0741, 0742, 0743","Level 1 data is uploaded again as .idat files split into green and red probes; I can't figure out how to get the batch from the file names. However, they now provide the slide number and the array letter!"
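
The batch in the third column is read from the sixth hyphen-separated field of the barcode. A minimal sketch of extracting it in R, assuming barcodes of the usual hyphen-delimited TCGA form; the barcodes below are made-up examples, not samples from this data set:

```r
# Hypothetical barcodes (illustrative only, not real samples)
barcodes <- c("TCGA-AB-0001-03A-01D-0741-05",
              "TCGA-AB-0002-03A-01D-0742-05",
              "TCGA-AB-0003-03A-01D-0743-05")

# Split each barcode on "-" and keep the sixth field, which holds the batch
fields <- strsplit(barcodes, "-", fixed = TRUE)
batch <- vapply(fields, function(f) f[6], character(1))

# Count samples per batch
print(table(batch))
```

Keeping the batch as a character vector preserves the leading zero, which would be lost if the field were converted to a number.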

Analysis of batch vs clinical traits


All patients come from a single center.
Significant trait - batch correlations (all other correlations can be found in a table here)
Wiki Markup
{csv}
"LAML_clinical_traits","DataType","NumberOfNAs","Test","Pvalue"
"days_to_form_completion","integer",2,"Kruskal-Wallis rank sum test",1.22E-26
"year_of_initial_pathologic_diagnosis","integer",2,"Kruskal-Wallis rank sum test",2.53E-26
"days_to_death","integer",82,"Kruskal-Wallis rank sum test",3.94E-04
"prior_diagnosis","factor",2,"Pearson's Chi-squared test",1.63E-03
"vital_status","factor",2,"Pearson's Chi-squared test",5.25E-03
"age_at_initial_pathologic_diagnosis","integer",2,"Kruskal-Wallis rank sum test",1.66E-02
"days_to_birth","integer",2,"Kruskal-Wallis rank sum test",1.90E-02
"hydroxyurea_administration_prior_registration_clinical_study_indicator","factor",2,"Pearson's Chi-squared test",3.05E-02
"pretreatment_history","factor",2,"Pearson's Chi-squared test",3.05E-02
{csv}
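
The table above pairs integer traits with a Kruskal-Wallis test and factor traits with Pearson's chi-squared test. A minimal sketch of how such a screen could be run in base R; the data frame `clin`, its simulated values, and the helper `test_trait` are hypothetical stand-ins, not the original analysis code:

```r
# Toy data standing in for the clinical table (all values are simulated)
set.seed(1)
clin <- data.frame(
  batch        = factor(rep(c("0741", "0742", "0743"), each = 20)),
  age          = c(rnorm(20, 50, 10), rnorm(20, 55, 10), rnorm(20, 65, 10)),
  vital_status = factor(sample(c("LIVING", "DECEASED"), 60, replace = TRUE))
)
clin$age[3] <- NA  # one missing value, to show the NA count

# Hypothetical helper: choose the test from the trait's type
test_trait <- function(name, data, batch) {
  trait <- data[[name]]
  keep <- !is.na(trait) & !is.na(batch)
  res <- if (is.factor(trait)) {
    chisq.test(table(trait[keep], batch[keep]))   # factor trait vs batch
  } else {
    kruskal.test(trait[keep], batch[keep])        # numeric trait vs batch
  }
  data.frame(Trait = name, NumberOfNAs = sum(is.na(trait)),
             Test = res$method, Pvalue = res$p.value)
}

# Test every trait against batch and sort by p-value
traits <- setdiff(names(clin), "batch")
results <- do.call(rbind, lapply(traits, test_trait,
                                 data = clin, batch = clin$batch))
results <- results[order(results$Pvalue), ]
print(results)
```

With this many traits tested at once, a multiple-testing adjustment such as `p.adjust(results$Pvalue, method = "BH")` may be worth applying before calling a trait significant.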

Survival analysis
Code Block
> death<-clinical[,4]
> vital<-clinical[,22]
> fup<-clinical[,7]
> x<-cbind(vital,death,fup)
> rownames(x)<-rownames(clinical)
> dim(x[is.na(x[, 2]) & is.na(x[, 3]), ])
[1] 14  3
> mask <- is.na(x[, 2]) & is.na(x[, 3]) # Exclude patients with neither days to death nor days to the last follow-up (14 patients in total)
> x1 <- x[!mask, ]
> dim(x1)
[1] 188   3
> status <- rep(1, 188) #create censoring indicator
> status[which(is.na(x1[, 2]))] <- 0
> x1[is.na(x1[, 2]), 2] <- x1[is.na(x1[, 2]), 3] # Patients without days to death get days to the last follow-up, and their status is 0
> x2 <- cbind(x1, status)
> k<-match(rownames(x2),meth[,1])
> methK<-meth[k,]
> library(survival)
Loading required package: splines
> surv <- Surv(as.numeric(x2[, 2]), as.numeric(x2[, 4])) #Create survival object
> plot(survfit(surv ~ 1), xlab = "days to death", ylab = "Probability", main = "Kaplan-Meier survival curve \n for TCGA AML (188 patients)")
> plot(survfit(surv ~ methK[, 2]), xlab = "days to death", ylab = "Probability", col = 1:3, main = "TCGA AML survival correlation with batch")
> legend(2000, 0.6, levels(as.factor(methK[, 2])), text.col = 1:3)
> summary(coxph(surv ~ methK[, 2]))
coxph(formula = surv ~ methK[, 2])

  n= 182, number of events= 116
   (6 observations deleted due to missingness)

                  coef exp(coef) se(coef)      z Pr(>|z|)
methK[, 2]0742 -0.2311    0.7937   0.2139 -1.080 0.279990
methK[, 2]0743  1.1784    3.2492   0.3541  3.328 0.000874 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

               exp(coef) exp(-coef) lower .95 upper .95
methK[, 2]0742    0.7937     1.2599    0.5219     1.207
methK[, 2]0743    3.2492     0.3078    1.6233     6.504

Concordance= 0.563  (se = 0.026 )
Rsquare= 0.06   (max possible= 0.997 )
Likelihood ratio test= 11.17  on 2 df,   p=0.003754
Wald test            = 14.35  on 2 df,   p=0.0007656
Score (logrank) test = 16.2  on 2 df,   p=0.0003037
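
The censoring steps in the session above (drop patients with neither value, fall back to days to the last follow-up, set status to 0) can also be written in vectorized form. A minimal sketch on made-up values; `death`, `fup`, and `batch` here are toy vectors, not the TCGA data:

```r
library(survival)

# Toy values: days to death (NA if not observed) and days to last follow-up
death <- c(120, NA, 365, NA, NA, 800, 45, NA)
fup   <- c(NA, 900, NA, NA, 1500, NA, NA, 300)
batch <- factor(c("0741", "0741", "0742", "0742",
                  "0742", "0743", "0743", "0741"))

# Drop patients with neither days to death nor days to last follow-up
keep <- !(is.na(death) & is.na(fup))

# Event time: days to death when known, otherwise days to last follow-up
time <- ifelse(is.na(death), fup, death)[keep]

# Censoring indicator: 1 = death observed, 0 = censored at last follow-up
status <- as.integer(!is.na(death))[keep]

# Survival object and Kaplan-Meier fit per batch
surv <- Surv(time, status)
fit <- survfit(surv ~ batch[keep])
print(fit)
```

Deriving `time` and `status` from the same `is.na(death)` test keeps the two vectors aligned, which the step-by-step version has to maintain by hand.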
