h5. Analysis of batch vs clinical traits

Number of clinical traits: 84

Number of batches based on tumor DNA methylation data (samples retrieved according to this pattern: "TCGA-..-....-0..-..D-....-05"): 24
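The barcode pattern quoted above can be applied directly as a regular expression. The following is only a minimal sketch: the barcodes are made up for illustration, and treating the sixth dash-separated field (the plate) as the batch ID is an assumption on my part, not something stated in these notes.

```r
# Hypothetical example barcodes; real ones would come from the TCGA file names.
barcodes <- c(
  "TCGA-A8-A06N-01A-11D-A00Y-05",  # matches the pattern        -> kept
  "TCGA-BH-A0B3-11A-12D-A032-05",  # fourth field starts with 1 -> dropped
  "TCGA-A2-A04P-01A-31R-A034-07",  # no "D" analyte, not "-05"  -> dropped
  "TCGA-E2-A14N-01A-31D-A10N-05"   # matches the pattern        -> kept
)

# Pattern quoted in the text, anchored so partial matches are rejected.
pattern <- "^TCGA-..-....-0..-..D-....-05$"
tumor <- barcodes[grepl(pattern, barcodes)]

# Assumption: the batch ID is the sixth dash-separated field (the plate).
batchID <- sapply(strsplit(tumor, "-", fixed = TRUE), `[`, 6)

# Number of distinct batches among the retained samples.
length(unique(batchID))
```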

Cross-tabulation of batches against centers ('two' = center, i.e. the second field of the patient barcode):
{code:collapse=true}> table(batchID,two)
       two
batchID A1 A2 A7 A8 AC AN AO AQ AR B6 BH C8 D8 E2 E9 EW GI GM HN
   A00Y  0  3  7 66  0 12  2  0  0  0  4  0  0  0  0  0  0  0  0
   A032  0 22  0 14  0 19 10  1  0 16 10  0  0  0  0  0  0  0  0
   A058  0  2  1  3  0  0 11  0  0  4 26  0  0  0  0  0  0  0  0
   A088  7 14  0  1  0  0  1  0  8  9  7  0  0  0  0  0  0  0  0
   A10A  0 12  0  0  0  9  0  0  5  9  4  0  0  0  0  0  0  0  0
   A10N  0  1  0  0  0  1 12  2  0  1  1  0  0  9  0  0  0  0  0
   A10P  7 16  1  4  0  0 12  0  8 13 32  0  0  0  0  0  0  0  0
   A112  1 11  1  0  0  0  5  0  3  5 18 20  9 15  0  0  0  0  0
   A12E  0  0  0  0  0  0  0  0  1  0 20  2  0 21  0  0  0  0  0
   A12R  0  0  3  0  0  0  0  0 15  0 14  0  0 11  0  0  0  0  0
   A138  0  0  0  0  0  0  0  0  1  0  7  6  0  2  0  0  0  0  0
   A13K  0  7  1  0  0  0  5  2  0  3 18  4 19  6  0  8  0  0  0
   A145  6  0  0  0  0  0  1  0  0  0  0  0  0 10  4 19  0  0  0
   A148  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   A14H  0  1  0  0  0  0  0  0  0  0  1  0  7  5 10  1  0  0  0
   A14N  0  0  0  0  0  0  0  1  0  1  2  0 20  1  6  1  0  0  0
   A161  0  0  0  0  2  0  0  0  0  1  3  0  3  2 17  0  0  0  0
   A16A  0  4  6  0  1  0  0  0 22  0  1  3  0  0  9  0  0  0  0
   A16G  0  3  0  0  0  0  0  0  0  0  1  8 13  0  3  0  1  0  0
   A17F  0  0  0  0  4  0  0  0  0  0  0  0  1  0  0  3  0  0  0
   A17Z  0  0  0  0  2  0  0  0  4  0  0  0  0  0  1  0  0  6  0
   A18O  0  0  0  0  1  0  0  0  5  1  1  0  0  0  1  0  0  7  1
   A19F  0  2  0  1  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0
   A19Z  0  0  0  0  5  0  0  0  1  0  1  0  0  1  0  0  0  0  0
{code}
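A table like the one above could be produced roughly as follows. This is a sketch on made-up barcodes: 'two' is taken from the second barcode field as described, while reading batchID from the sixth field (the plate) is my assumption.

```r
# Hypothetical barcodes that already passed the tumor DNA methylation filter.
barcodes <- c(
  "TCGA-A8-A06N-01A-11D-A00Y-05",
  "TCGA-A8-A06P-01A-11D-A00Y-05",
  "TCGA-BH-A0B3-01A-11D-A032-05",
  "TCGA-E2-A14N-01A-31D-A10N-05"
)

# One row per sample, one column per dash-separated barcode field.
fields <- do.call(rbind, strsplit(barcodes, "-", fixed = TRUE))

two     <- fields[, 2]  # second field, called "center" in the text
batchID <- fields[, 6]  # assumption: sixth field (plate) identifies the batch

# Contingency table of batch (rows) against center (columns).
tab <- table(batchID, two)
tab
```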

Significant correlations between batch and clinical traits (the entire list can be found [here|^BatchClinicalInfoCorrelationsBRCA.csv]):

{csv}
"BRCA_clinical_traits","DataType","NumberOfNAs","Test","Pvalue"
"tissue_prospective_collection_indicator","factor",35,"Pearson's Chi-squared test",4.47E-62
"tissue_retrospective_collection_indicator","factor",35,"Pearson's Chi-squared test",4.47E-62
"year_of_initial_pathologic_diagnosis","integer",34,"Kruskal-Wallis rank sum test",3.15E-32
"breast_carcinoma_first_surgical_procedure_name","factor",54,"Pearson's Chi-squared test",5.45E-32
"days_to_last_followup","integer",73,"Kruskal-Wallis rank sum test",3.07E-31
"days_to_form_completion","integer",34,"Kruskal-Wallis rank sum test",5.70E-31
"first_pathologic_diagnosis_biospecimen_acquisition_method_type","factor",123,"Pearson's Chi-squared test",3.39E-28
"breast_tumor_clinical_m_stage","factor",35,"Pearson's Chi-squared test",1.06E-22
"axillary_lymph_node_stage_method_type","factor",223,"Pearson's Chi-squared test",9.33E-19
"breast_tumor_pathologic_n_stage","factor",34,"Pearson's Chi-squared test",2.19E-17
"lab_proc_her2_neu_immunohistochemistry_receptor_status","factor",41,"Pearson's Chi-squared test",6.22E-16
"breast_carcinoma_estrogen_receptor_status","factor",34,"Pearson's Chi-squared test",1.85E-13
"breast_carcinoma_progesterone_receptor_status","factor",34,"Pearson's Chi-squared test",8.87E-13
"vital_status","factor",34,"Pearson's Chi-squared test",2.38E-09
"anatomic_site_location_descriptor","factor",119,"Pearson's Chi-squared test",1.03E-07
"age_at_initial_pathologic_diagnosis","integer",34,"Kruskal-Wallis rank sum test",5.87E-06
"days_to_birth","integer",34,"Kruskal-Wallis rank sum test",6.68E-06
"lab_procedure_her2_neu_in_situ_hybrid_outcome_type","factor",194,"Pearson's Chi-squared test",3.18E-05
"person_menopause_status","factor",161,"Pearson's Chi-squared test",5.70E-05
"breast_tumor_pathologic_grouping_stage","factor",40,"Pearson's Chi-squared test",7.40E-05
"her2_immunohistochemistry_level_result","factor",351,"Pearson's Chi-squared test",1.72E-04
"breast_tumor_pathologic_t_stage","factor",34,"Pearson's Chi-squared test",2.82E-04
"pos_finding_lymph_node_hematoxylin_and_eosin_staining_microscopy_count","integer",177,"Kruskal-Wallis rank sum test",6.49E-04
"cytokeratin_immunohistochemistry_staining_method_micrometastasis_indicator","factor",324,"Pearson's Chi-squared test",8.61E-04
"person_neoplasm_cancer_status","factor",284,"Pearson's Chi-squared test",7.95E-03
"breast_cancer_optical_measurement_histologic_type","factor",34,"Pearson's Chi-squared test",1.47E-02
"disease_surgical_margin_status","factor",82,"Pearson's Chi-squared test",3.70E-02
{csv}
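The table above pairs factor traits with Pearson's Chi-squared test and integer traits with the Kruskal-Wallis rank sum test. A minimal sketch of that choice is given below; the helper name test_batch_trait and the toy data are hypothetical, and the original script may well have handled NAs or sparse tables differently.

```r
# Pick the test from the trait's type, mirroring the "DataType"/"Test" columns.
test_batch_trait <- function(trait, batch) {
  n_na  <- sum(is.na(trait))            # reported as NumberOfNAs
  keep  <- !is.na(trait) & !is.na(batch)
  trait <- trait[keep]
  batch <- factor(batch[keep])
  if (is.factor(trait) || is.character(trait)) {
    # Categorical trait: chi-squared test on the batch x trait table.
    # Warnings about small expected counts are silenced in this sketch.
    res <- suppressWarnings(chisq.test(table(batch, factor(trait))))
  } else {
    # Numeric trait: Kruskal-Wallis rank sum test across batches.
    res <- kruskal.test(trait, batch)
  }
  data.frame(NumberOfNAs = n_na, Test = res$method, Pvalue = res$p.value)
}

# Toy data: three batches of ten samples each (values are made up).
batch  <- rep(c("A00Y", "A032", "A058"), each = 10)
status <- factor(c(rep("LIVING", 10),
                   rep(c("LIVING", "DECEASED"), 5),
                   rep("DECEASED", 10)))
age    <- c(41:50, 51:60, 61:70)
age[1] <- NA

res_status <- test_batch_trait(status, batch)
res_age    <- test_batch_trait(age, batch)
rbind(res_status, res_age)
```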

h5. Correlation with survival

Relevant clinical traits (column index in the clinical table in parentheses): days to the last follow-up (27), vital status (83), days to death (24), and days to last known alive (28). Summaries:
{code:collapse=true}> summary(clinical[,27]) # days to the last follow up
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    0.0   140.0   457.0   815.8  1194.0  6795.0    73.0
> table(clinical[,83]) #vital status

DECEASED   LIVING
      93      725
> summary(clinical[,24]) # days to death
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    157     811    1563    1744    2520    4456     759
> summary(clinical[,28]) # days to last known alive
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    0.0   293.5   607.5  1068.0  1442.0  6795.0   508.0{code}
As in the combined colon cancer datasets, days to last known alive appears to be similar to days to the last follow-up. However, days to the last follow-up contains more information (73 NAs versus 508), so I use it to construct the survival object. The survival object was created in the same way as for the analyses of the other TCGA cancer datasets. Details are available [here|METHYLATION:Colon cancer], [here|METHYLATION:AML - acute myeloid leukemia], and [here|METHYLATION:LUAD - lung adenocarcinoma].
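For orientation, one common way to build such a survival object is sketched below. The exact construction is documented on the linked pages, not here, so the rule used in this sketch (time to death for deceased patients, censoring at the last follow-up for living ones) is an assumption, and the data frame is made up.

```r
library(survival)  # provides Surv(); a recommended package bundled with most R installs

# Hypothetical rows standing in for clinical[, c(27, 83, 24)].
clinical_toy <- data.frame(
  days_to_last_followup = c(457, NA, 1194, 140),
  vital_status          = c("LIVING", "DECEASED", "LIVING", "DECEASED"),
  days_to_death         = c(NA, 811, NA, 1563)
)

# Assumption: deceased patients are timed by days to death,
# living patients are censored at their last follow-up.
event <- clinical_toy$vital_status == "DECEASED"
time  <- ifelse(event,
                clinical_toy$days_to_death,
                clinical_toy$days_to_last_followup)

# Right-censored survival object: event = TRUE means death was observed.
surv_obj <- Surv(time, event)
surv_obj
```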

h5. DNA methylation data

December 21st, 2011: 27k and 450k arrays are available. I downloaded the Level 1 450k data. It seems that the green and red probes are now split into 2 separate files, and Illumina's idat files, which are the bead level data (not tab delimited files), are now provided as well. I need to find a way to process them; it seems that the Bioconductor beadarray package can be used to read these files and do some bead level normalization (summarization too?). The Level 2 data is already summarized and normalized (tab delimited files with CpG ID, value for methylated probes, and value for unmethylated probes). I also tried to download the 27k arrays available for breast cancer, but the data is available for only \~26 patients (did they stop running those arrays?).
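Since the Level 2 files carry one methylated and one unmethylated value per CpG, a methylation level per probe can be derived from them. The sketch below uses the conventional beta value, M / (M + U + offset). The column names, the signal values, and the offset of 100 are assumptions for illustration and are not taken from the TCGA files.

```r
# Hypothetical Level 2 style table: CpG ID plus methylated/unmethylated signal.
level2 <- data.frame(
  cpg_id       = c("cg00000029", "cg00000108", "cg00000109"),
  methylated   = c(900, 100, 0),
  unmethylated = c(0, 900, 0)
)

# Beta value = M / (M + U + offset); the offset keeps the ratio defined
# and stable when both signals are close to zero.
beta_value <- function(m, u, offset = 100) m / (m + u + offset)

level2$beta <- beta_value(level2$methylated, level2$unmethylated)
level2
```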