...
Important update (January 20th, 2011): the data below have been corrected for the BCR batch which is not necessarily the processing batch. The dataset needs to be reanalyzed.
Correlation between BCR batch and the processing batch for 27k arrays (January 20, 2012)
Batch vs clinical traits
Number of batches is 12. Correlation between batch and center:
```
table(two, batchID)
    batchID
two  0689 0848 0979 1096 1198 1440 1551 1633 1818 1871 1947 2043
  18    0    0   12    2    0    2    0    2    0    0    0    0
  21   13    0    0    0    0    0    0    4    0    0    0    0
  22    9    0    0    0    6    3    0   12    2    0    2    0
  33    0    0    0    0    4    5    0    0    1    0    1    0
  34    0    6    0    0    0    2    0    1    7    1    1    0
  37    0    0    2    6    1    0    0    1    0    0    0    0
  39    0    0    0    0    0   12    0    0    4    0    0    0
  43    0    3    2    0    0    0    2    1    4    0    1    0
  46    0    0    5    0    0    0    0    0    2    0    0    0
  51    0    0    0    3    0    0    0    0    0    0    0    1
  56    1    0    0    0    0    0    0    2    2    0    0    6
  60    0   20    0    0    1    0    0    0    1    2    0    2
  63    0    0    0    0    0    2    0    0    1    0    4    0
  66    0   18   15    0    6    0    0    0    0    0    0    0
  70    0    0    0    0    0    0    0    0    2    0    0    0
  77    0    0    0    0    0    0    0    0    0    0    4   10
  79    0    0    0    0    0    0    0    0    0    0    1    0
  85    0    0    0    0    0    0    0    0    3    0    1    0
  [rows 90, 92, 94, 96: values not recoverable from this page export]
  98    0    0    0    0    0    0    0    0    0    0    0    1
```
Significant batch/clinical traits correlations (complete list can be found in the attachment BatchClinicalInfoCorrelationsLUSC.txt):
LUSC | DataType | NumberOfNAs | Test | Pvalue
---|---|---|---|---
tumor_stage | factor | 27 | Pearson's Chi-squared test | 8.78E-14
year_of_initial_pathologic_diagnosis | integer | 23 | Kruskal-Wallis rank sum test | 7.95E-12
days_to_form_completion | integer | 30 | Kruskal-Wallis rank sum test | 1.48E-09
primary_tumor_pathologic_spread | factor | 23 | Pearson's Chi-squared test | 1.96E-09
distant_metastasis_pathologic_spread | factor | 29 | Pearson's Chi-squared test | 3.77E-05
days_to_last_followup | integer | 42 | Kruskal-Wallis rank sum test | 7.68E-05
vital_status | factor | 23 | Pearson's Chi-squared test | 2.37E-03
year_of_tobacco_smoking_onset | integer | 116 | Kruskal-Wallis rank sum test | 3.12E-03
year_of_tobacco_smoking_cessation | integer | 88 | Kruskal-Wallis rank sum test | 5.84E-03
days_to_last_known_alive | integer | 75 | Kruskal-Wallis rank sum test | 7.37E-03
residual_tumor | factor | 46 | Pearson's Chi-squared test | 2.00E-02
lymphnode_pathologic_spread | factor | 23 | Pearson's Chi-squared test | 5.48E-02
age_at_initial_pathologic_diagnosis | integer | 30 | Kruskal-Wallis rank sum test | 9.24E-02
days_to_birth | integer | 30 | Kruskal-Wallis rank sum test | 9.73E-02
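A minimal sketch of how a batch/clinical-trait screen of this kind could be run: a chi-squared test for factor traits and a Kruskal-Wallis test for integer traits, matching the test names listed for this section. The helper `batch_trait_test` and the simulated `clinical` data frame are hypothetical stand-ins, not the code or data actually used for this page.

```r
# Hypothetical helper: choose the test from the trait's type.
batch_trait_test <- function(trait, batch) {
  keep  <- !is.na(trait)                      # drop patients with a missing trait
  trait <- trait[keep]
  batch <- droplevels(factor(batch[keep]))
  if (is.factor(trait) || is.character(trait)) {
    res <- suppressWarnings(chisq.test(table(factor(trait), batch)))
  } else {
    res <- kruskal.test(trait, batch)
  }
  data.frame(DataType    = class(trait)[1],
             NumberOfNAs = sum(!keep),
             Test        = res$method,
             Pvalue      = res$p.value)
}

# Simulated stand-in data (assumption: NOT the LUSC clinical file).
set.seed(1)
batch <- factor(sample(c("0689", "0848", "0979"), 120, replace = TRUE))
clinical <- data.frame(
  tumor_stage           = factor(sample(c("I", "II", "III"), 120, replace = TRUE)),
  days_to_last_followup = as.integer(rexp(120, 1 / 500))
)
clinical$days_to_last_followup[sample(120, 10)] <- NA

# One row per trait, sorted by p-value.
results <- do.call(rbind, lapply(clinical, batch_trait_test, batch = batch))
results <- results[order(results$Pvalue), ]
print(results)
```

Sorting by p-value gives the same layout as the list above; any significance cutoff would be applied afterwards.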
Batch vs survival
Again, for this type of cancer the clinical traits file contains days_to_last_known_alive, but it has more NAs than days_to_last_followup, so I will use the latter to construct the survival object.
[Figure: KaplanMeierCurveLUSC.png]
[Figure: SurvivalByBatchLUSC.png]
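A minimal sketch of building the survival object from days_to_last_followup and fitting a Cox model on batch. The names `survivalObject` and `batchVector` follow the `coxph` call shown in this section; the simulated data and the use of `vital_status == "DECEASED"` as the event indicator are assumptions.

```r
library(survival)

# Simulated stand-in for the clinical file (assumption: not the real data).
set.seed(2)
n <- 200
clin <- data.frame(
  days_to_last_followup = as.integer(rexp(n, 1 / 600)) + 1L,
  vital_status          = sample(c("LIVING", "DECEASED"), n, replace = TRUE),
  batch                 = factor(sample(c("0689", "0848", "1096"), n, replace = TRUE))
)
clin$days_to_last_followup[sample(n, 15)] <- NA

# Keep only patients with a known follow-up time.
ok <- !is.na(clin$days_to_last_followup)

# Assumed event coding: death is the event, everyone else is censored.
survivalObject <- Surv(clin$days_to_last_followup[ok],
                       clin$vital_status[ok] == "DECEASED")
batchVector <- droplevels(clin$batch[ok])

fit <- coxph(survivalObject ~ batchVector)    # first batch level is the reference
print(summary(fit))

km <- survfit(survivalObject ~ batchVector)   # Kaplan-Meier curves per batch
# plot(km) would draw one curve per batch
```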
```
Call:
coxph(formula = survivalObject ~ batchVector)

  n= 223, number of events= 92

                    coef exp(coef) se(coef)      z Pr(>|z|)
batchVector0848 -0.14217   0.86748  0.37975 -0.374  0.70813
batchVector0979 -0.23685   0.78911  0.42661 -0.555  0.57877
batchVector1096  1.66699   5.29619  0.60925  2.736  0.00622 **
batchVector1198 -0.13837   0.87077  0.42245 -0.328  0.74325
batchVector1440 -0.25689   0.77345  0.38754 -0.663  0.50741
batchVector1633  0.27021   1.31025  0.37760  0.716  0.47423
batchVector1818 -0.21253   0.80853  0.46395 -0.458  0.64688
batchVector1947 -0.05172   0.94959  0.48598 -0.106  0.91524
batchVector2043  0.06161   1.06355  1.03684  0.059  0.95261
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                exp(coef) exp(-coef) lower .95 upper .95
batchVector0848    0.8675     1.1528    0.4121     1.826
batchVector0979    0.7891     1.2672    0.3420     1.821
batchVector1096    5.2962     0.1888    1.6046    17.481
batchVector1198    0.8708     1.1484    0.3805     1.993
batchVector1440    0.7735     1.2929    0.3619     1.653
batchVector1633    1.3102     0.7632    0.6251     2.746
batchVector1818    0.8085     1.2368    0.3257     2.007
batchVector1947    0.9496     1.0531    0.3663     2.462
batchVector2043    1.0636     0.9402    0.1394     8.116

Rsquare= 0.041   (max possible= 0.973 )
Likelihood ratio test= 9.36  on 9 df,   p=0.4044
Wald test            = 12.53  on 9 df,   p=0.1849
Score (logrank) test = 15.53  on 9 df,   p=0.07747
```
Overall, the correlation of batch with survival is not significant. There is one batch (1096) that seems to be somewhat more involved, and it has only 11 patients. When I removed all patients from that batch, the correlation of the remaining batches with survival was completely insignificant.
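A minimal sketch of the two checks described above: reading the overall test p-values from the fitted Cox model, then refitting after dropping one batch. The simulated data and the helper `overall_p` are hypothetical; only the batch label 1096 is taken from this page.

```r
library(survival)

# Simulated stand-in data in which batch "1096" is small and has shorter survival.
set.seed(3)
n <- 150
batchVector <- factor(sample(c("0689", "0848", "1096"), n, replace = TRUE,
                             prob = c(0.45, 0.45, 0.10)))
time  <- rexp(n, ifelse(batchVector == "1096", 1 / 200, 1 / 600))
event <- rbinom(n, 1, 0.6)
survivalObject <- Surv(time, event)

# Hypothetical helper: the three overall tests reported by summary.coxph.
overall_p <- function(fit) {
  s <- summary(fit)
  c(likelihood_ratio = unname(s$logtest["pvalue"]),
    wald             = unname(s$waldtest["pvalue"]),
    score            = unname(s$sctest["pvalue"]))
}

full <- coxph(survivalObject ~ batchVector)

# Drop every patient from the suspicious batch and refit.
keep         <- batchVector != "1096"
survReduced  <- survivalObject[keep]
batchReduced <- droplevels(batchVector[keep])
reduced <- coxph(survReduced ~ batchReduced)

print(rbind(full = overall_p(full), without_1096 = overall_p(reduced)))
```

Comparing the two rows shows how much of the overall signal comes from the single batch.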
DNA methylation
27k arrays (normal Level 1 format: methylated intensities are in the first column, unmethylated intensities are in the fourth column). Converted to M values; did not split into the red and green channels. 134 patients in total; 133 of them matched to the technical clinical information, and I work with those. SVD:
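A minimal sketch of the M-value conversion and SVD steps named above, on simulated intensities. The M value is taken as the log2 ratio of methylated to unmethylated intensity; the `offset` of 1 and the centering of each probe before the SVD are assumptions, since the page does not state them.

```r
# Simulated intensities (assumption: stand-in for the Level 1 columns).
set.seed(4)
n_probes  <- 500
n_samples <- 20
meth   <- matrix(rgamma(n_probes * n_samples, shape = 2, scale = 2000), n_probes)
unmeth <- matrix(rgamma(n_probes * n_samples, shape = 2, scale = 2000), n_probes)

# M value: log2 ratio of methylated to unmethylated signal.
offset  <- 1                                   # assumed; guards against log of zero
mvalues <- log2((meth + offset) / (unmeth + offset))

# Center each probe (row), then decompose.
centered <- mvalues - rowMeans(mvalues)
s <- svd(centered)

var_explained <- s$d^2 / sum(s$d^2)            # share of variance per component
pcs <- s$v[, 1:6]                              # sample loadings of the first 6 PCs
colnames(pcs) <- paste0("V", 1:6)
print(round(var_explained[1:6], 3))
```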
After fixing the inconsistencies in the concentration column (I had both 0.13 ug/uL and .13 ug/uL, and likewise for all other concentrations) and converting concentration and plate_column to factors, here is the summary of the technical variables:
```
> summary(tech)
 batchID      amount      concentration  plate_column   plate_row        shortDay
 0689:22   13.3 uL:97   0.13 ug/uL:11   1:37         A      :19   10-3-2010 :46
 0848:46   26.7 uL:36   0.14 ug/uL:35   2:35         B      :19   14-7-2010 :11
 0979:36                0.15 ug/uL:59   3:26         C      :18   18-11-2009:22
 1096:11                0.16 ug/uL:23   4:11         D      :16   30-8-2010 :18
 1198:18                0.17 ug/uL: 5   5:16         G      :16   5-5-2010  :36
                                        6: 8         E      :15
                                                     (Other):30
```
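The concentration clean-up described above can be sketched as follows. The toy `tech` data frame is hypothetical; the only inconsistency handled is the missing leading zero mentioned in the text.

```r
# Toy stand-in for the technical annotation table.
tech <- data.frame(
  concentration = c("0.13 ug/uL", ".13 ug/uL", "0.15 ug/uL", ".15 ug/uL", "0.15 ug/uL"),
  plate_column  = c(1L, 2L, 1L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Add the missing leading zero so ".13 ug/uL" and "0.13 ug/uL" become one level.
tech$concentration <- factor(sub("^\\.", "0.", trimws(tech$concentration)))

# Treat the plate column as a category rather than a number.
tech$plate_column <- factor(tech$plate_column)

summary(tech)
```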
Correlation of the first 6 PCs with the tech variables:
```
> x
        batchID       amount concentration plate_column plate_row     shortDay
V1 1.219915e-14 1.512302e-15    0.22695068  0.004764525 0.5279767 1.219915e-14
V2 1.906561e-01 4.091184e-01    0.96296059  0.292069252 0.1656324 1.906561e-01
V3 7.464626e-02 6.984062e-02    0.22476713  0.467718779 0.5705236 7.464626e-02
V4 2.670836e-01 7.040891e-01    0.78009461  0.638345228 0.2242532 2.670836e-01
V5 5.721394e-01 7.689742e-01    0.42172107  0.348612502 0.6017037 5.721394e-01
V6 2.122370e-13 9.075951e-02    0.07250977  0.015916359 0.7132870 2.122370e-13
```
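A minimal sketch of how a PC-by-variable table of this shape could be produced. The page does not say which test generated the values, so the Kruskal-Wallis test used here is an assumption, and `tech` and `pcs` are simulated stand-ins.

```r
# Simulated technical variables and principal components.
set.seed(5)
n <- 60
tech <- data.frame(
  batchID   = factor(sample(c("0689", "0848", "0979"), n, replace = TRUE)),
  plate_row = factor(sample(LETTERS[1:4], n, replace = TRUE))
)
pcs <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("V", 1:6)))
pcs[, 1] <- pcs[, 1] + 3 * as.integer(tech$batchID)   # make V1 batch-driven

# One p-value per (PC, technical variable) pair; rows are PCs, columns variables.
x <- sapply(tech, function(v) {
  apply(pcs, 2, function(pc) kruskal.test(pc, v)$p.value)
})
print(signif(x, 3))
```

Small values flag components that track a technical variable, as V1 does for batchID in this simulated case.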
Start by removing the batch:
This looks strange. I never looked at the data distribution after normalization. Is it ok?
Correlation with the technical variables:
```
> x
     batchID    amount concentration plate_column  plate_row  shortDay
V1 0.9882080 0.9838389     0.9665557   0.45086339 0.08627734 0.9882080
V2 0.9975700 0.8632969     0.7730067   0.60414536 0.57298681 0.9975700
V3 0.9998216 0.9394499     0.3675916   0.02054405 0.88934477 0.9998216
V4 0.9994017 0.9313953     0.8917591   0.77577312 0.15250595 0.9994017
V5 0.9993497 0.9959595     0.2796044   0.45763689 0.62038204 0.9993497
V6 0.9766655 0.6485570     0.8141023   0.62257929 0.25956284 0.9766655
```
Removing the batch fixed the correlations with the other technical variables. Can the data now be considered normalized?
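The page does not say how the batch was removed. One simple possibility, assumed here purely for illustration, is to regress each probe on batch and keep the residuals plus the probe mean. The sketch below does that on simulated M values and checks the first PC against batch before and after.

```r
# Simulated M values with an additive batch shift (assumption: stand-in data).
set.seed(6)
n_probes <- 200
n <- 60
batch <- factor(sample(c("0689", "0848", "0979"), n, replace = TRUE))
shift <- c(0, 1, -1)[as.integer(batch)]
mvalues <- matrix(rnorm(n_probes * n), n_probes, n) + rep(shift, each = n_probes)

# First-PC association with batch (Kruskal-Wallis p-value).
pc1_batch_p <- function(m) {
  pc1 <- svd(m - rowMeans(m))$v[, 1]
  kruskal.test(pc1, batch)$p.value
}
p_before <- pc1_batch_p(mvalues)

# Regress every probe on batch at once; samples are rows for lm.fit.
design   <- model.matrix(~ batch)
fit      <- lm.fit(design, t(mvalues))
adjusted <- t(fit$residuals) + rowMeans(mvalues)   # restore each probe's mean

p_after <- pc1_batch_p(adjusted)
print(c(before = p_before, after = p_after))
```

After this adjustment every probe has the same mean in every batch, so batch-driven components should disappear; dedicated batch-correction methods make different modelling choices.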
ExpressionSet object is available.