
Important update (January 20th, 2011): the data below have been corrected for the BCR batch, which is not necessarily the processing batch. The dataset needs to be reanalyzed.

 

Correlation between BCR batch and the processing batch for 27k arrays (January 20, 2012)

Analysis of batch vs clinical traits

Number of clinical traits: 84

Number of batches based on tumor DNA methylation data (samples retrieved according to this pattern: "TCGA-......-0....D-....-05"): 24
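A minimal R sketch of how aliquot barcodes could be filtered and split into the fields used in this section. The barcodes are made up, and the seven-field layout (with the plate ID in the sixth field standing in for `batchID`) is an assumption rather than something stated on this page; the original selection used the pattern quoted above, not this field-by-field check.

```r
# Hypothetical aliquot barcodes, made up for illustration only.
barcodes <- c("TCGA-A1-A0SB-01A-11D-A142-05",
              "TCGA-A2-A04N-01A-11D-A10N-05",
              "TCGA-A2-A04N-10A-01D-A10N-05",  # sample type 10: not a tumor sample
              "TCGA-A8-A06N-01A-11R-A00Y-07")  # RNA analyte, different center code

# Split every barcode on "-" and stack the pieces into a character matrix.
fields <- do.call(rbind, strsplit(barcodes, "-", fixed = TRUE))
# Assumed field layout; "two" is the second field, as in the text.
colnames(fields) <- c("project", "two", "participant", "sample",
                      "portion", "plate", "center")

# Keep tumor samples (sample code starts with 0), DNA analyte (D), center 05.
is_tumor_dna_05 <- substr(fields[, "sample"], 1, 1) == "0" &
  substr(fields[, "portion"], 3, 3) == "D" &
  fields[, "center"] == "05"

kept <- fields[is_tumor_dna_05, , drop = FALSE]
batchID <- kept[, "plate"]   # assumption: the plate ID is used as the batch ID
two <- kept[, "two"]

# Cross-tabulate batch against the second barcode field.
tab <- table(batchID, two)
print(tab)
```

With real data, `length(unique(batchID))` would give the number of batches reported above.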

Correlation between center and batches ('two'=center (second field in the patient barcode)):

{code:collapse=true}
> table(batchID,two)
       two
batchID A1 A2 A7 A8 AC AN AO AQ AR B6 BH C8 D8 E2 E9 EW GI GM HN
   [rows A00Y to A17F: counts are interleaved with an earlier revision of this table and cannot be read reliably]
   A17Z  0  0  0  0  2  0  0  0  4  0  0  0  0  0  1  0  0  6  0
   A18O  0  0  0  0  1  0  0  0  5  1  1  0  0  0  1  0  0  7  1
   A19F  0  2  0  1  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0
   A19Z  0  0  0  0  5  0  0  0  1  0  1  0  0  1  0  0  0  0  0
{code}

Significant batch-clinical traits correlations (the entire list can be found [here|^BatchClinicalInfoCorrelationsBRCA.csv]):

{csv}
"BRCA_clinical_traits","DataType","NumberOfNAs","Test","Pvalue"
"tissue_prospective_collection_indicator","factor",35,"Pearson's Chi-squared test",4.47E-62
"tissue_retrospective_collection_indicator","factor",35,"Pearson's Chi-squared test",4.47E-62
"year_of_initial_pathologic_diagnosis","integer",34,"Kruskal-Wallis rank sum test",3.15E-32
"breast_carcinoma_first_surgical_procedure_name","factor",54,"Pearson's Chi-squared test",5.45E-32
"days_to_last_followup","integer",73,"Kruskal-Wallis rank sum test",3.07E-31
"days_to_form_completion","integer",34,"Kruskal-Wallis rank sum test",5.70E-31
"first_pathologic_diagnosis_biospecimen_acquisition_method_type","factor",123,"Pearson's Chi-squared test",3.39E-28
"breast_tumor_clinical_m_stage","factor",35,"Pearson's Chi-squared test",1.06E-22
"axillary_lymph_node_stage_method_type","factor",223,"Pearson's Chi-squared test",9.33E-19
"breast_tumor_pathologic_n_stage","factor",34,"Pearson's Chi-squared test",2.19E-17
"lab_proc_her2_neu_immunohistochemistry_receptor_status","factor",41,"Pearson's Chi-squared test",6.22E-16
"breast_carcinoma_estrogen_receptor_status","factor",34,"Pearson's Chi-squared test",1.85E-13
"breast_carcinoma_progesterone_receptor_status","factor",34,"Pearson's Chi-squared test",8.87E-13
"vital_status","factor",34,"Pearson's Chi-squared test",2.38E-09
"anatomic_site_location_descriptor","factor",119,"Pearson's Chi-squared test",1.03E-07
"age_at_initial_pathologic_diagnosis","integer",34,"Kruskal-Wallis rank sum test",5.87E-06
"days_to_birth","integer",34,"Kruskal-Wallis rank sum test",6.68E-06
"lab_procedure_her2_neu_in_situ_hybrid_outcome_type","factor",194,"Pearson's Chi-squared test",3.18E-05
"person_menopause_status","factor",161,"Pearson's Chi-squared test",5.70E-05
"breast_tumor_pathologic_grouping_stage","factor",40,"Pearson's Chi-squared test",7.40E-05
"her2_immunohistochemistry_level_result","factor",351,"Pearson's Chi-squared test",1.72E-04
"breast_tumor_pathologic_t_stage","factor",34,"Pearson's Chi-squared test",2.82E-04
"pos_finding_lymph_node_hematoxylin_and_eosin_staining_microscopy_count","integer",177,"Kruskal-Wallis rank sum test",6.49E-04
"cytokeratin_immunohistochemistry_staining_method_micrometastasis_indicator","factor",324,"Pearson's Chi-squared test",8.61E-04
"person_neoplasm_cancer_status","factor",284,"Pearson's Chi-squared test",7.95E-03
"breast_cancer_optical_measurement_histologic_type","factor",34,"Pearson's Chi-squared test",1.47E-02
"disease_surgical_margin_status","factor",82,"Pearson's Chi-squared test",3.70E-02
{csv}
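The tests named in the Test column of the list of significant traits (Pearson's chi-squared for factors, Kruskal-Wallis for integers) could be run per trait roughly as below. This is only a sketch on made-up data: `test_trait`, the toy `clinical` columns and the three toy batches are hypothetical, and the original script may have differed, for example in how NAs were handled.

```r
# Toy data, made up for illustration; this is not the BRCA clinical table.
set.seed(1)
batch <- factor(rep(c("B1", "B2", "B3"), each = 20))
clinical <- data.frame(
  vital_status = factor(sample(c("DECEASED", "LIVING"), 60, replace = TRUE)),
  age_at_diagnosis = as.integer(round(rnorm(60, mean = 58, sd = 12)))
)
clinical$age_at_diagnosis[c(3, 41)] <- NA

# Test one trait against batch, choosing the test by data type.
# Assumption: NAs are dropped per trait before testing.
test_trait <- function(trait_name, clinical, batch) {
  x <- clinical[[trait_name]]
  keep <- !is.na(x)
  g <- droplevels(batch[keep])
  if (is.factor(x)) {
    res <- suppressWarnings(chisq.test(table(droplevels(x[keep]), g)))
  } else {
    res <- kruskal.test(x[keep], g)
  }
  data.frame(Trait = trait_name,
             DataType = class(x)[1],
             NumberOfNAs = sum(!keep),
             Test = unname(res$method),
             Pvalue = unname(res$p.value))
}

results <- do.call(rbind, lapply(names(clinical), test_trait,
                                 clinical = clinical, batch = batch))
results <- results[order(results$Pvalue), ]   # most significant first
print(results)
```

Choosing the test by `is.factor` mirrors the DataType column in the list; with 84 traits, some multiple-testing correction of the p-values might also be worth considering.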

Correlation with survival

Relevant clinical traits: days to the last follow-up (27), vital status (83), days to death (24), days to last known alive (28), summaries:

{code:collapse=true}
> summary(clinical[,27]) # days to the last follow up
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    0.0   140.0   457.0   815.8  1194.0  6795.0    73.0
> table(clinical[,83]) #vital status

DECEASED   LIVING
      93      725
> summary(clinical[,24]) # days to death
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    157     811    1563    1744    2520    4456     759
> summary(clinical[,28]) # days to last known alive
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    0.0   293.5   607.5  1068.0  1442.0  6795.0   508.0
{code}

As with the combined colon cancer datasets, days to last known alive is similar to days to the last follow-up; however, days to the last follow-up contains more information (fewer NAs), so it is used to construct the survival object. No patient is missing both days to the last follow-up and days to death. The survival object was created in the same way as for the analyses of other TCGA cancer datasets. Info is available (here and here).
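This page does not spell out how the survival object is built, so the following is only a sketch under a stated assumption: deceased patients take days to death as the event time, and everyone else is censored at the last follow-up. The data are made up, and the column names simply mirror the trait names used on this page.

```r
library(survival)

# Toy clinical table, made up for illustration.
clinical <- data.frame(
  days_to_last_followup = c(457, NA, 1194, 140, NA),
  days_to_death         = c(NA, 811, NA, NA, 1563),
  vital_status          = c("LIVING", "DECEASED", "LIVING", "LIVING", "DECEASED")
)

# The check described in the text: no patient is missing both time fields.
stopifnot(!any(is.na(clinical$days_to_last_followup) &
               is.na(clinical$days_to_death)))

# Assumption: event = death; time = days to death for deceased patients,
# days to the last follow-up (censoring time) for all others.
event <- clinical$vital_status == "DECEASED"
time <- ifelse(event, clinical$days_to_death, clinical$days_to_last_followup)

surv <- Surv(time, event)
print(surv)   # censored observations are printed with a trailing "+"
```

A `surv` object built this way can be passed to `survfit` or `coxph` with the batch factor on the right-hand side of the formula.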

Kaplan-Meier curves and survival plots broken down by batch:
[image] [image]
Here is the summary of survival vs batch:

{code:collapse=true}
> summary(coxph(surv~methM[,2]))
Call:
coxph(formula = surv ~ methM[, 2])

  n= 818, number of events= 93
   (34 observations deleted due to missingness)

                     coef  exp(coef)   se(coef)      z Pr(>|z|)
methM[, 2]A032 -6.961e-01  4.985e-01  5.148e-01 -1.352   0.1763
methM[, 2]A058 -2.321e+00  9.819e-02  1.084e+00 -2.140   0.0323 *
methM[, 2]A088 -5.512e-01  5.763e-01  5.413e-01 -1.018   0.3086
methM[, 2]A10A -2.238e-01  7.995e-01  5.565e-01 -0.402   0.6876
methM[, 2]A10N -1.455e+00  2.334e-01  8.217e-01 -1.771   0.0766 .
methM[, 2]A112 -9.643e-01  3.812e-01  5.847e-01 -1.649   0.0991 .
methM[, 2]A12E  9.108e-01  2.486e+00  5.015e-01  1.816   0.0694 .
methM[, 2]A12R -1.794e+00  1.663e-01  1.082e+00 -1.657   0.0975 .
methM[, 2]A138  8.921e-01  2.440e+00  5.487e-01  1.626   0.1040
methM[, 2]A13K  2.542e-01  1.289e+00  4.825e-01  0.527   0.5983
methM[, 2]A145 -9.748e-01  3.773e-01  1.081e+00 -0.902   0.3673
methM[, 2]A14H  3.164e-01  1.372e+00  8.215e-01  0.385   0.7002
methM[, 2]A14N  9.591e-01  2.609e+00  1.089e+00  0.880   0.3786
methM[, 2]A161 -1.382e-01  8.709e-01  7.183e-01 -0.192   0.8475
methM[, 2]A16A -1.347e+00  2.600e-01  8.206e-01 -1.642   0.1007
methM[, 2]A16G -1.714e+01  3.615e-08  5.030e+03 -0.003   0.9973
methM[, 2]A17F -1.731e+01  3.039e-08  1.338e+04 -0.001   0.9990
methM[, 2]A17Z -1.725e+01  3.216e-08  4.025e+03 -0.004   0.9966
methM[, 2]A18O -1.434e+00  2.383e-01  1.088e+00 -1.318   0.1875
methM[, 2]A19Z         NA         NA  0.000e+00     NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

               exp(coef) exp(-coef) lower .95 upper .95
methM[, 2]A032 4.985e-01  2.006e+00   0.18176    1.3672
methM[, 2]A058 9.819e-02  1.018e+01   0.01172    0.8223
methM[, 2]A088 5.763e-01  1.735e+00   0.19946    1.6649
methM[, 2]A10A 7.995e-01  1.251e+00   0.26858    2.3797
methM[, 2]A10N 2.334e-01  4.284e+00   0.04664    1.1685
methM[, 2]A112 3.812e-01  2.623e+00   0.12121    1.1991
methM[, 2]A12E 2.486e+00  4.022e-01   0.93037    6.6440
methM[, 2]A12R 1.663e-01  6.012e+00   0.01993    1.3879
methM[, 2]A138 2.440e+00  4.098e-01   0.83256    7.1522
methM[, 2]A13K 1.289e+00  7.755e-01   0.50082    3.3197
methM[, 2]A145 3.773e-01  2.651e+00   0.04531    3.1409
methM[, 2]A14H 1.372e+00  7.288e-01   0.27424    6.8653
methM[, 2]A14N 2.609e+00  3.832e-01   0.30851   22.0693
methM[, 2]A161 8.709e-01  1.148e+00   0.21309    3.5598
methM[, 2]A16A 2.600e-01  3.846e+00   0.05206    1.2985
methM[, 2]A16G 3.615e-08  2.766e+07   0.00000       Inf
methM[, 2]A17F 3.039e-08  3.290e+07   0.00000       Inf
methM[, 2]A17Z 3.216e-08  3.109e+07   0.00000       Inf
methM[, 2]A18O 2.383e-01  4.196e+00   0.02825    2.0108
methM[, 2]A19Z        NA         NA        NA        NA

Rsquare= 0.069   (max possible= 0.67 )
Likelihood ratio test= 58.61  on 19 df,   p=6.4e-06
Wald test            = 50  on 19 df,   p=0.0001311
Score (logrank) test = 69.46  on 19 df,   p=1.129e-07

Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  16,17,18 ; beta may be infinite.
2: In coxph(surv ~ methM[, 2]) :
  X matrix deemed to be singular; variable 20
{code}

It seems that there are a lot of errors, I wonder why. I also don't understand where the observations that are deleted due to missingness come from. Need to ask someone to help clarify this output. Update (January 5, 2012): there are NAs for some batches because I had factor levels left in the batch vector but no data for those levels. Fixed that problem. "Deleted due to missingness" is also fixed, as I figured out that I need to be more careful about using 'match' for subsetting.
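Both pitfalls from the update can be reproduced on toy data in base R. The sketch below is illustrative only: the patient IDs and batch labels are made up, and it shows the general mechanism rather than the exact fix that was applied.

```r
# Toy data, made up for illustration.
batch_all <- factor(c("A16G", "A17F", "A19Z", "A16G", "A17F"))
names(batch_all) <- c("P1", "P2", "P3", "P4", "P5")

# Pitfall 1: subsetting a factor keeps levels that no longer have any data.
batch <- batch_all[batch_all != "A19Z"]
counts_before <- table(batch)     # "A19Z" is still listed, with count 0
batch <- droplevels(batch)        # removes the empty level
counts_after <- table(batch)

# Pitfall 2: match() returns NA for IDs absent from the lookup vector,
# and indexing with NA produces NA entries.
clinical_ids <- c("P1", "P2", "P4", "P5", "P9")   # "P9" has no batch entry
idx <- match(clinical_ids, names(batch))
batch_bad <- batch[idx]           # last element is NA

# Safer: drop the unmatched IDs before subsetting.
keep <- !is.na(idx)
batch_ok <- batch[idx[keep]]

print(counts_before)
print(counts_after)
print(batch_bad)
print(batch_ok)
```

An empty factor level gives a model-matrix column of zeros, which would be consistent with the NA coefficient and the "singular" warning in the output above; NA entries introduced by `match` are the kind of rows a model fit drops as missing.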

 

DNA methylation data

December 21st, 2011: 27k and 450k arrays are available. Downloaded Level 1 450k data. It seems that they started splitting green and red probes into 2 separate files, and they now also provide Illumina's idat files, which are the bead-level data (not tab-delimited files). I need to find a way to process them; it seems that the Bioconductor beadarray package can be used to read these files and do some bead-level normalization (summarization too?). The Level 2 data contains already summarized and normalized data (tab-delimited files with CpG ID, value for methylated and value for unmethylated probes); however, it is available only for 91 patients. Also tried to download the 27k arrays available for breast cancer; however, the data is available for only ~26 patients (they stopped running those arrays?). I guess I need to figure out how to process Level 1 data.