Batch vs clinical traits
Number of batches is 12. Correlation between batch and center:
Significant batch/clinical traits correlations (complete list can be found here):
Trait (LUSC),DataType,NumberOfNAs,Test,Pvalue
tumor_stage,factor,27,Pearson's Chi-squared test,8.78E-14
year_of_initial_pathologic_diagnosis,integer,23,Kruskal-Wallis rank sum test,7.95E-12
days_to_form_completion,integer,30,Kruskal-Wallis rank sum test,1.48E-09
primary_tumor_pathologic_spread,factor,23,Pearson's Chi-squared test,1.96E-09
distant_metastasis_pathologic_spread,factor,29,Pearson's Chi-squared test,3.77E-05
days_to_last_followup,integer,42,Kruskal-Wallis rank sum test,7.68E-05
vital_status,factor,23,Pearson's Chi-squared test,2.37E-03
year_of_tobacco_smoking_onset,integer,116,Kruskal-Wallis rank sum test,3.12E-03
year_of_tobacco_smoking_cessation,integer,88,Kruskal-Wallis rank sum test,5.84E-03
days_to_last_known_alive,integer,75,Kruskal-Wallis rank sum test,7.37E-03
residual_tumor,factor,46,Pearson's Chi-squared test,2.00E-02
lymphnode_pathologic_spread,factor,23,Pearson's Chi-squared test,5.48E-02
age_at_initial_pathologic_diagnosis,integer,30,Kruskal-Wallis rank sum test,9.24E-02
days_to_birth,integer,30,Kruskal-Wallis rank sum test,9.73E-02
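The table pairs each trait's data type with a test: factors get Pearson's Chi-squared test and integers get the Kruskal-Wallis rank sum test. The following is a minimal, standard-library sketch of that pairing, not the code that produced the numbers above. The function names are hypothetical, missing values are assumed to be encoded as None, and no continuity correction is applied, so results for 2x2 tables can differ from software that applies one by default.

```python
import math
from collections import Counter, defaultdict


def chi2_sf(x, df):
    """Upper-tail probability of the chi-squared distribution (integer df)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2 * math.pi)
    return math.erfc(root / math.sqrt(2)) + 2 * density * total


def chi_squared_test(batches, values):
    """Pearson's chi-squared test of independence (batch vs. factor trait)."""
    pairs = [(b, v) for b, v in zip(batches, values) if v is not None]
    n = len(pairs)
    observed = Counter(pairs)
    row = Counter(b for b, _ in pairs)
    col = Counter(v for _, v in pairs)
    stat = 0.0
    for b in row:
        for v in col:
            expected = row[b] * col[v] / n
            stat += (observed[(b, v)] - expected) ** 2 / expected
    df = (len(row) - 1) * (len(col) - 1)
    return stat, chi2_sf(stat, df)


def kruskal_wallis_test(batches, values):
    """Kruskal-Wallis rank sum test (batch vs. numeric trait), tie-corrected."""
    pairs = [(b, v) for b, v in zip(batches, values) if v is not None]
    n = len(pairs)
    ordered = sorted(pairs, key=lambda p: p[1])
    rank_sums, sizes, tie_term = defaultdict(float), Counter(), 0.0
    i = 0
    while i < n:
        j = i
        while j < n and ordered[j][1] == ordered[i][1]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        for b, _ in ordered[i:j]:
            rank_sums[b] += avg_rank
            sizes[b] += 1
        i = j
    if n < 2 or tie_term == n ** 3 - n:
        return 0.0, 1.0  # all values tied: no evidence of a batch difference
    h = 12.0 / (n * (n + 1)) * sum(rank_sums[b] ** 2 / sizes[b] for b in sizes)
    h = (h - 3 * (n + 1)) / (1 - tie_term / (n ** 3 - n))
    return h, chi2_sf(h, len(sizes) - 1)


def batch_association(batches, values, data_type):
    """Pick the test from the trait's data type, as in the table above."""
    if data_type == "factor":
        return ("Pearson's Chi-squared test",) + chi_squared_test(batches, values)
    return ("Kruskal-Wallis rank sum test",) + kruskal_wallis_test(batches, values)
```

Calling batch_association(batches, values, "factor") returns the test name, the statistic and the p-value. Patients whose trait value is None are dropped first, which corresponds to the NumberOfNAs column.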
Batch vs survival
Again, for this type of cancer the clinical traits file contains days_to_last_known_alive, but it has more NAs than days_to_last_followup (75 vs. 42), so I will use the latter to construct the survival object.
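The choice of time column can be sketched as follows: count the NAs in each candidate column, keep the one with fewer, and pair it with an event flag. This is an illustration under assumptions, not the original analysis code. The function names are hypothetical, records are assumed to be dicts with None for NA, and the "DECEASED" coding of vital_status is an assumption about the clinical file.

```python
def count_missing(records, column):
    """Number of records where the column is missing (None)."""
    return sum(1 for r in records if r.get(column) is None)


def build_survival(records, time_columns,
                   status_column="vital_status", event_label="DECEASED"):
    """Return the chosen time column and a list of (time, event) pairs.

    The time column with the fewest missing values is used; records that
    lack either the time or the status are dropped.
    """
    time_column = min(time_columns, key=lambda c: count_missing(records, c))
    survival = []
    for r in records:
        time, status = r.get(time_column), r.get(status_column)
        if time is None or status is None:
            continue
        # event is True for a death, False for a censored observation
        survival.append((float(time), status == event_label))
    return time_column, survival
```

With the NA counts from the table (75 for days_to_last_known_alive, 42 for days_to_last_followup), this selection rule picks days_to_last_followup.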
Overall, the correlation of batch with survival is not significant. One batch (1096) seems to be somewhat more involved, but it has only 11 patients. After I removed all patients from that batch, none of the remaining batches showed any significant correlation with survival.
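One simple way to look at a single suspicious batch such as 1096 is a two-group log-rank test of that batch against all other patients. The sketch below is a simplified stand-in and not necessarily the test used for the results above. The function name is hypothetical, and the inputs are assumed to be parallel lists of follow-up times, event flags (True for death) and batch-membership flags.

```python
import math


def logrank_two_groups(times, events, in_group):
    """Two-group log-rank test; returns (chi-squared statistic, p-value)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed = expected = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, in_group) if u >= t and g)
        # Deaths occurring exactly at time t
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, in_group)
                 if u == t and e and g)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group's death count at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed - expected) ** 2 / variance
    # The statistic has 1 degree of freedom, so the upper tail is erfc(sqrt(x/2))
    return stat, math.erfc(math.sqrt(stat / 2))
```

With only 11 patients in the batch, the chi-squared approximation behind the p-value is rough, so any result for that batch should be read with caution.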