Analysis of batch vs clinical traits
Number of clinical traits: 84
Number of batches based on tumor DNA methylation data (samples retrieved according to this pattern: "TCGA-......-0....D-....-05"): 24
Correlation between center and batch ('two' = center, the second field in the patient barcode):
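For reference, a minimal sketch of how the center label could be read off a barcode, assuming the usual hyphen-separated TCGA barcode layout. The function name `center_from_barcode` and the placeholder barcode are hypothetical and not taken from the data set.

```python
def center_from_barcode(barcode: str) -> str:
    """Return the second hyphen-separated field of a TCGA-style barcode."""
    fields = barcode.split("-")
    # Reject strings that do not start with the TCGA project prefix
    if len(fields) < 2 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return fields[1]


# Placeholder barcode for illustration only (not a real sample)
example = "TCGA-XX-YYYY-01A-11D-ZZZZ-05"
print(center_from_barcode(example))
```

Grouping samples by this value would give the center label that is compared against batch.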
Significant batch-clinical trait correlations (the entire list can be found here):
"BRCA_clinical_traits","DataType","NumberOfNAs","Test","Pvalue"
"tissue_prospective_collection_indicator","factor",35,"Pearson's Chi-squared test",4.47E-62
"tissue_retrospective_collection_indicator","factor",35,"Pearson's Chi-squared test",4.47E-62
"year_of_initial_pathologic_diagnosis","integer",34,"Kruskal-Wallis rank sum test",3.15E-32
"breast_carcinoma_first_surgical_procedure_name","factor",54,"Pearson's Chi-squared test",5.45E-32
"days_to_last_followup","integer",73,"Kruskal-Wallis rank sum test",3.07E-31
"days_to_form_completion","integer",34,"Kruskal-Wallis rank sum test",5.70E-31
"first_pathologic_diagnosis_biospecimen_acquisition_method_type","factor",123,"Pearson's Chi-squared test",3.39E-28
"breast_tumor_clinical_m_stage","factor",35,"Pearson's Chi-squared test",1.06E-22
"axillary_lymph_node_stage_method_type","factor",223,"Pearson's Chi-squared test",9.33E-19
"breast_tumor_pathologic_n_stage","factor",34,"Pearson's Chi-squared test",2.19E-17
"lab_proc_her2_neu_immunohistochemistry_receptor_status","factor",41,"Pearson's Chi-squared test",6.22E-16
"breast_carcinoma_estrogen_receptor_status","factor",34,"Pearson's Chi-squared test",1.85E-13
"breast_carcinoma_progesterone_receptor_status","factor",34,"Pearson's Chi-squared test",8.87E-13
"vital_status","factor",34,"Pearson's Chi-squared test",2.38E-09
"anatomic_site_location_descriptor","factor",119,"Pearson's Chi-squared test",1.03E-07
"age_at_initial_pathologic_diagnosis","integer",34,"Kruskal-Wallis rank sum test",5.87E-06
"days_to_birth","integer",34,"Kruskal-Wallis rank sum test",6.68E-06
"lab_procedure_her2_neu_in_situ_hybrid_outcome_type","factor",194,"Pearson's Chi-squared test",3.18E-05
"person_menopause_status","factor",161,"Pearson's Chi-squared test",5.70E-05
"breast_tumor_pathologic_grouping_stage","factor",40,"Pearson's Chi-squared test",7.40E-05
"her2_immunohistochemistry_level_result","factor",351,"Pearson's Chi-squared test",1.72E-04
"breast_tumor_pathologic_t_stage","factor",34,"Pearson's Chi-squared test",2.82E-04
"pos_finding_lymph_node_hematoxylin_and_eosin_staining_microscopy_count","integer",177,"Kruskal-Wallis rank sum test",6.49E-04
"cytokeratin_immunohistochemistry_staining_method_micrometastasis_indicator","factor",324,"Pearson's Chi-squared test",8.61E-04
"person_neoplasm_cancer_status","factor",284,"Pearson's Chi-squared test",7.95E-03
"breast_cancer_optical_measurement_histologic_type","factor",34,"Pearson's Chi-squared test",1.47E-02
"disease_surgical_margin_status","factor",82,"Pearson's Chi-squared test",3.70E-02
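A minimal sketch of how a table in this CSV layout could be filtered for significant traits. The 0.05 cutoff is an assumption (the notes do not state the threshold used), and `significant_traits` is a hypothetical helper. The embedded rows are copied from the table above.

```python
import csv
import io

# A few rows copied from the table above; a full file would be read the same way.
TABLE = '''"BRCA_clinical_traits","DataType","NumberOfNAs","Test","Pvalue"
"tissue_prospective_collection_indicator","factor",35,"Pearson's Chi-squared test",4.47E-62
"year_of_initial_pathologic_diagnosis","integer",34,"Kruskal-Wallis rank sum test",3.15E-32
"vital_status","factor",34,"Pearson's Chi-squared test",2.38E-09
"disease_surgical_margin_status","factor",82,"Pearson's Chi-squared test",3.70E-02
'''


def significant_traits(text, alpha=0.05):
    """Return (trait, p-value) pairs with p below alpha, smallest p first."""
    rows = csv.DictReader(io.StringIO(text))
    hits = [(r["BRCA_clinical_traits"], float(r["Pvalue"])) for r in rows]
    return sorted((h for h in hits if h[1] < alpha), key=lambda h: h[1])


for trait, p in significant_traits(TABLE):
    print(f"{trait}\t{p:.2e}")
```

Passing a stricter `alpha` (for example `1e-20`) narrows the list to the traits most strongly associated with batch.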
DNA methylation data
December 21st, 2011: 27k and 450k arrays are available. Downloaded Level 1 450k data. It seems that they have started splitting green and red probes into two separate files.