Wiki Markup |
---|
{color:#ff0000}_Important update (January 20th, 2011): the data below have been corrected for the BCR batch, which is not necessarily the processing batch. The dataset needs to be reanalyzed._{color}
h5. Batch vs clinical traits
Batch vs center:
{code}> table(batchID,center)
       center
batchID B7 BR CD CG D7 EQ F1
   1129  0 31  0  0  0  0  0
   1156  0 12  0 23  0  0  0
   1601  2  0  7 16  3  1  0
   1801  0  9  0  3 10  0  1
   1883  0 11  0  0  5  0  1{code}
Most significant correlations (the complete list can be found [here|^BatchClinicalInfoCorrelationsSTAD.txt]):
{csv}STAD,DataType,NumberOfNAs,Test,Pvalue
residual_tumor,factor,8,Pearson's Chi-squared test,8.06E-17
year_of_initial_pathologic_diagnosis,integer,1,Kruskal-Wallis rank sum test,2.88E-13
days_to_form_completion,integer,1,Kruskal-Wallis rank sum test,3.72E-11
days_to_last_followup,integer,1,Kruskal-Wallis rank sum test,5.15E-11
primary_tumor_pathologic_spread,factor,1,Pearson's Chi-squared test,1.68E-06
histological_type,factor,6,Pearson's Chi-squared test,5.82E-06
lymphnode_pathologic_spread,factor,1,Pearson's Chi-squared test,6.09E-05
number_of_lymphnodes_examined,integer,53,Kruskal-Wallis rank sum test,1.25E-04
vital_status,factor,1,Pearson's Chi-squared test,2.95E-03
tumor_stage,factor,31,Pearson's Chi-squared test,3.49E-02{csv}
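The table pairs factor traits with a chi-squared test and integer traits with a Kruskal-Wallis test. A minimal sketch of that choice, assuming a hypothetical helper {{batchTraitTest}} and toy data (neither is from the original analysis):
{code}
# Hypothetical helper: chi-squared test for factor traits,
# Kruskal-Wallis test for numeric traits; NAs in the trait are dropped first.
batchTraitTest <- function(batch, trait) {
  keep  <- !is.na(trait)
  batch <- factor(batch[keep])
  trait <- trait[keep]
  if (is.factor(trait) || is.character(trait)) {
    res <- suppressWarnings(chisq.test(table(batch, factor(trait))))
  } else {
    res <- kruskal.test(trait, batch)
  }
  data.frame(DataType    = class(trait)[1],
             NumberOfNAs = sum(!keep),
             Test        = res$method,
             Pvalue      = res$p.value)
}

# Toy data, not the STAD clinical table
set.seed(1)
batch <- rep(c("1129", "1156", "1601"), each = 20)
year  <- rep(c(2008L, 2009L, 2010L), each = 20) + sample(0:1, 60, replace = TRUE)
year[5] <- NA
res1 <- batchTraitTest(batch, year)
print(res1)
{code}
The warnings from {{chisq.test}} are suppressed here only to keep the sketch quiet; with sparse tables they are worth reading.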
\\
h5. Batch vs survival
!KaplanMeierCurveSTAD.png|thumbnail! !SurvivalByBatchSTAD.png|thumbnail!
{code:collapse=true}Call:
coxph(formula = survivalObject ~ batchVector)

  n= 134, number of events= 14

                     coef exp(coef)  se(coef)     z Pr(>|z|)
batchVector1156 1.722e+01 3.012e+07 7.152e+03 0.002    0.998
batchVector1601 1.786e+01 5.728e+07 7.152e+03 0.002    0.998
batchVector1801 1.663e+01 1.665e+07 7.152e+03 0.002    0.998
batchVector1883        NA        NA 0.000e+00    NA       NA

                exp(coef) exp(-coef) lower .95 upper .95
batchVector1156  30117568  3.320e-08         0       Inf
batchVector1601  57279474  1.746e-08         0       Inf
batchVector1801  16645384  6.008e-08         0       Inf
batchVector1883        NA         NA        NA        NA

Rsquare= 0.026   (max possible= 0.496 )
Likelihood ratio test= 3.58  on 3 df,   p=0.3109
Wald test            = 2.13  on 3 df,   p=0.545
Score (logrank) test = 3.1  on 3 df,   p=0.3762

Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1,2,3 ; beta may be infinite.
2: In coxph(survivalObject ~ batchVector) :
  X matrix deemed to be singular; variable 4{code}
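No significant correlation between batch and survival (likelihood ratio test p = 0.31), although with only 14 events and coefficients that may be infinite the fit should be read with caution. For some reason I got NAs and a singularity warning for the last batch (1883), although it is definitely not because of unused factor levels.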
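One way to investigate a fit like this is to count events per batch: a batch with no events at all can produce the "beta may be infinite" warning. The sketch below uses toy data and hypothetical variable names, not the STAD cohort, and is not a diagnosis of the fit above.
{code}
library(survival)

# Toy data (hypothetical): batch "A" has no events
time   <- c(5, 8, 12, 15, 3, 6, 9, 14, 4, 7, 10, 13)
status <- c(0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0)
batch  <- factor(rep(c("A", "B", "C"), each = 4))

# Events per batch: a zero in the event column flags a batch
# whose hazard ratio cannot be estimated as a finite number
eventsByBatch <- table(batch, status)
print(eventsByBatch)

# The log-rank test compares observed and expected events per group
# and does not need per-batch coefficients
lr <- survdiff(Surv(time, status) ~ batch)
print(lr)
{code}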
h5. DNA methylation
27k arrays, 66 patients. Created M values without splitting between the red and green channels. SVD:
!STAD_Mvalue_noNorm_dataDistribution.png|thumbnail! !STAD_Mvalue_noNorm_RelativeVariance.png|thumbnail! !STAD_Mvalue_noNorm_PC1outliers.png|thumbnail!
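For reference, a minimal sketch of an M-value computation followed by an SVD, on toy intensities. The matrices {{methInt}} and {{unmethInt}}, the offset of 1 and the per-probe centering are assumptions for illustration and may differ from what was done for the plots above.
{code}
# Hypothetical methylated and unmethylated intensity matrices,
# probes in rows and samples in columns; toy values, not the STAD arrays
set.seed(1)
nProbes <- 200
nSamples <- 10
methInt   <- matrix(rgamma(nProbes * nSamples, shape = 2, scale = 2000), nProbes)
unmethInt <- matrix(rgamma(nProbes * nSamples, shape = 2, scale = 2000), nProbes)

# M value as a log2 ratio; the offset is an assumed choice that avoids log(0)
offset <- 1
mValue <- log2((methInt + offset) / (unmethInt + offset))

# Center each probe, then take the SVD; columns of s$v are sample-level components
centered <- mValue - rowMeans(mValue)
s <- svd(centered)
relVar <- s$d^2 / sum(s$d^2)   # fraction of variance carried by each component
print(round(relVar, 3))
{code}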
Summary of the technical variables:
{code}> summary(methS)
 batchID      amount      concentration  plate_column   plate_row
 1129:31   16.9 uL: 1   0.13 ug/uL: 6    1:16         A      :10
 1156:35   26.7 uL:65   0.14 ug/uL:27    2:13         C      : 9
                        0.15 ug/uL:25    3:13         D      : 9
                        0.16 ug/uL: 7    4:10         F      : 9
                        0.17 ug/uL: 1    5: 9         B      : 8
                                         6: 5         E      : 8
                                                      (Other):13
      shortDay
 21-7-2010:31
 28-7-2010:35{code}
So this dataset has only 2 batches. Let's see if they have any correlation with the principal components:
{code:collapse=true}> x
batchID amount concentration plate_column plate_row shortDay
V1 4.999780e-01 0.8132652 0.9636092 0.2126458 0.41035836 4.999780e-01
V2 1.080231e-07 0.1214957 0.8025371 0.2954381 0.91858389 1.080231e-07
V3 6.028215e-01 0.4465735 0.9897603 0.5199681 0.07110241 6.028215e-01
V4 7.947106e-02 0.2818850 0.3579813 0.8230956 0.52338954 7.947106e-02
V5 1.125719e-01 0.9790610 0.5150996 0.3113563 0.29650943 1.125719e-01
V6 5.502164e-01 0.4465735 0.3134523 0.3787485 0.50090145 5.502164e-01
V7 7.922533e-01 0.5117243 0.6591395 0.4459644 0.76917348 7.922533e-01
V8 9.614704e-02 0.1488680 0.2382575 0.3455933 0.94015824 9.614704e-02{code}
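The matrix above reads as one p-value per component and technical variable. A minimal sketch of how such a matrix could be built, assuming a Kruskal-Wallis test for every pair (the test actually used is not stated here); {{pcs}} and {{tech}} are hypothetical toy objects.
{code}
# 'pcs' stands in for the sample scores on the first components (e.g. svd(x)$v)
# and 'tech' for the table of technical variables; both are toy data
set.seed(2)
n <- 40
tech <- data.frame(batchID   = factor(rep(c("1129", "1156"), each = n / 2)),
                   plate_row = factor(sample(LETTERS[1:4], n, replace = TRUE)))
pcs <- cbind(V1 = rnorm(n),
             V2 = rnorm(n) + 3 * (tech$batchID == "1156"))  # V2 carries a batch shift

# Kruskal-Wallis p-value for every (component, variable) pair:
# rows are components, columns are technical variables
pvals <- sapply(tech, function(f)
  apply(pcs, 2, function(pc) kruskal.test(pc, f)$p.value))
print(signif(pvals, 3))
{code}
Two columns of such a matrix come out identical whenever two variables split the samples in the same way, as batchID and shortDay do above.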
Looks like the second PC is highly correlated with the batch (1.08e-07) and, more weakly, so are the 4th and 8th (about 0.08 and 0.10). The second PC explains 10% of the data variance. After removing the batch:
!STAD_Mvalue_batchRemoved_dataDistribution.png|thumbnail! !STAD_Mvalue_batchRemoved_RelativeVariance.png|thumbnail! !STAD_Mvalue_batchRemoved_PC1outliers.png|thumbnail!
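The removal method is not described on this page. As an illustration only, one simple approach is to regress each probe on batch and keep the residuals, sketched here on a toy matrix with hypothetical names:
{code}
# Toy M-value matrix (probes x samples) with an additive shift in the second batch
set.seed(3)
batch <- factor(rep(c("1129", "1156"), each = 5))
x <- matrix(rnorm(50 * 10), 50, 10)
x[, batch == "1156"] <- x[, batch == "1156"] + 2

# Regress every probe on batch, keep the residuals and add back the probe mean
design <- model.matrix(~ batch)
fit <- lm.fit(design, t(x))            # one regression per probe (columns of t(x))
corrected <- t(fit$residuals) + rowMeans(x)

# After correction the two batch means agree for every probe
gap <- rowMeans(corrected[, batch == "1156"]) - rowMeans(corrected[, batch == "1129"])
print(max(abs(gap)))
{code}
This removes only additive per-probe shifts; it does nothing about batch differences in variance, and it will also remove biology that is confounded with batch.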
{code:collapse=true}> x
batchID amount concentration plate_column plate_row shortDay
V1 0.9538949 0.7329525 0.9668135 0.1956406 0.3925206 0.9538949
V2 0.6951568 0.1346448 0.7778342 0.6589222 0.1054539 0.6951568
V3 0.8522117 0.1642106 0.3273640 0.7278436 0.7377284 0.8522117
V4 0.9436648 0.8132652 0.2334584 0.4411353 0.9901676 0.9436648
V5 0.9743762 0.8955925 0.3016907 0.8663039 0.2159179 0.9743762
V6 0.9130370 0.4158556 0.3873149 0.5267212 0.4462888 0.9130370
V7 0.4145840 0.5815169 0.3605256 0.4940810 0.6986479 0.4145840
V8 0.9028540 0.1214957 0.5218528 0.3929218 0.5285360 0.9028540{code}
Removing the batch took care of all the other correlations. I was also wondering about correlation of batch with the clinical traits in this smaller dataset (actual DNA methylation data, not potential). P-values for batch vs histological type: 0.001488 (chi-square test) and 3.0e-05 (Fisher test); residual tumor: 7.465e-07 (chi-square test) and 6.536e-09 (Fisher test); tumor stage: 0.04773 (chi-square test) and 0.009894 (Fisher test). There was no significant correlation with tumor grade.
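Reporting both tests makes sense when some cells are sparse, since the chi-squared approximation becomes questionable for small expected counts while Fisher's test is exact. A small sketch with a hypothetical 2 x 3 table (the counts are made up, not taken from the STAD data):
{code}
# Toy table of batch by trait level (hypothetical counts)
tab <- matrix(c(12, 3, 1,
                 2, 9, 8), nrow = 2, byrow = TRUE,
              dimnames = list(batch = c("1129", "1156"),
                              trait = c("a", "b", "c")))

# Expected counts below 5 are the usual warning sign for the chi-squared test
chi <- suppressWarnings(chisq.test(tab))
print(round(chi$expected, 1))

pChisq  <- chi$p.value
pFisher <- fisher.test(tab)$p.value
print(c(chisq = pChisq, fisher = pFisher))
{code}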
Consider the data to be normalized.
Expression set object is available. |