h5. Batch vs clinical traits

Batch vs center:
{code}
> table(batchID,center)
       center
batchID B7 BR CD CG D7 EQ F1
   1129  0 31  0  0  0  0  0
   1156  0 12  0 23  0  0  0
   1601  2  0  7 16  3  1  0
   1801  0  9  0  3 10  0  1
   1883  0 11  0  0  5  0  1
{code}
Most significant associations of batch with clinical variables (the complete list can be found [here|^BatchClinicalInfoCorrelationsSTAD.txt]):
{csv}Variable,DataType,NumberOfNAs,Test,Pvalue
residual_tumor,factor,8,Pearson's Chi-squared test,8.06E-17
year_of_initial_pathologic_diagnosis,integer,1,Kruskal-Wallis rank sum test,2.88E-13
days_to_form_completion,integer,1,Kruskal-Wallis rank sum test,3.72E-11
days_to_last_followup,integer,1,Kruskal-Wallis rank sum test,5.15E-11
primary_tumor_pathologic_spread,factor,1,Pearson's Chi-squared test,1.68E-06
histological_type,factor,6,Pearson's Chi-squared test,5.82E-06
lymphnode_pathologic_spread,factor,1,Pearson's Chi-squared test,6.09E-05
number_of_lymphnodes_examined,integer,53,Kruskal-Wallis rank sum test,1.25E-04
vital_status,factor,1,Pearson's Chi-squared test,2.95E-03
tumor_stage,factor,31,Pearson's Chi-squared test,3.49E-02{csv}
\\
h5. Batch vs survival

!KaplanMeierCurveSTAD.png|thumbnail! !SurvivalByBatchSTAD.png|thumbnail!
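To make explicit what the Kaplan-Meier curves estimate, here is a minimal product-limit estimator in plain Python. It is an illustrative sketch, not the code that produced these plots (that code is not shown); the function name `kaplan_meier` and the toy data are hypothetical. Ties follow the usual convention: subjects censored at time t are still counted as at risk at t.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function S(t).

    times:  follow-up time of each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns (event_time, survival_probability) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)          # everyone who leaves the risk set at t
    n_at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        # Deaths and censorings at t both leave the risk set after t.
        n_at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Hypothetical toy data, not the STAD cohort.
    for t, s in kaplan_meier([5, 8, 8, 12, 15, 20], [1, 0, 1, 1, 0, 1]):
        print(t, round(s, 4))
```

Censored patients contribute to the denominators until they drop out but never cause a step, which is why a cohort with few events (14 of 134 here) gives a curve with few, large steps.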
{code:collapse=true}
Call:
coxph(formula = survivalObject ~ batchVector)

  n= 134, number of events= 14

                     coef exp(coef)  se(coef)     z Pr(>|z|)
batchVector1156 1.722e+01 3.012e+07 7.152e+03 0.002    0.998
batchVector1601 1.786e+01 5.728e+07 7.152e+03 0.002    0.998
batchVector1801 1.663e+01 1.665e+07 7.152e+03 0.002    0.998
batchVector1883        NA        NA 0.000e+00    NA       NA

                exp(coef) exp(-coef) lower .95 upper .95
batchVector1156  30117568  3.320e-08         0       Inf
batchVector1601  57279474  1.746e-08         0       Inf
batchVector1801  16645384  6.008e-08         0       Inf
batchVector1883        NA         NA        NA        NA

Rsquare= 0.026   (max possible= 0.496 )
Likelihood ratio test= 3.58  on 3 df,   p=0.3109
Wald test            = 2.13  on 3 df,   p=0.545
Score (logrank) test = 3.1  on 3 df,   p=0.3762

Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1,2,3 ; beta may be infinite.
2: In coxph(survivalObject ~ batchVector) :
  X matrix deemed to be singular; variable 4
{code}
No significant association between batch and survival (likelihood ratio test p = 0.31, score (log-rank) test p = 0.38). For reasons I have not tracked down, the coefficients for the last batch (1883) are NA and coxph warns that the X matrix is singular for that variable; this is definitely not caused by unused factor levels. The first warning ("beta may be infinite") matches the huge coefficients and standard errors of the other batches (confidence intervals from 0 to Inf), so the per-batch hazard ratios are not interpretable.

h5. DNA methylation

27k arrays, 66 patients. Compute M values without splitting the data into red and green channels.

SVD:
!STAD_Mvalue_noNorm_dataDistribution.png|thumbnail! !STAD_Mvalue_noNorm_RelativeVariance.png|thumbnail! !STAD_Mvalue_noNorm_PC1outliers.png|thumbnail!

Summary of the technical variables:
{code}
> summary(methS)
 batchID       amount      concentration  plate_column   plate_row
 1129:31   16.9 uL: 1   0.13 ug/uL: 6    1:16         A      :10
 1156:35   26.7 uL:65   0.14 ug/uL:27    2:13         C      : 9
                        0.15 ug/uL:25    3:13         D      : 9
                        0.16 ug/uL: 7    4:10         F      : 9
                        0.17 ug/uL: 1    5: 9         B      : 8
                                         6: 5         E      : 8
                                                      (Other):13
      shortDay
 21-7-2010:31
 28-7-2010:35
{code}
So this dataset has only two batches (1129 and 1156); the processing day (shortDay) shows the same 31/35 split.
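The M value used above is the log2 ratio of methylated to unmethylated signal. A minimal sketch, assuming per-probe methylated and unmethylated intensities (or beta values) are already available, and reading "don't split between red and green" as applying the same transform to every probe regardless of its color channel. The function names, the offset `alpha = 1.0` and the clipping constant `eps` are assumptions for illustration, not values taken from this analysis.

```python
import math


def m_value(meth, unmeth, alpha=1.0):
    """M value of one probe: log2((meth + alpha) / (unmeth + alpha)).

    alpha (assumed 1.0) keeps the ratio finite when an intensity is 0.
    The same formula is used for every probe, whichever color channel
    it was read in.
    """
    return math.log2((meth + alpha) / (unmeth + alpha))


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value (fraction methylated) to an M value.

    beta is clipped to [eps, 1 - eps] (eps is an assumption) so that
    beta = 0 or 1 does not produce -inf or +inf.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


if __name__ == "__main__":
    print(m_value(2047, 511))   # methylated signal 4x the unmethylated one
    print(beta_to_m(0.8))       # beta 0.8 corresponds to M close to 2
```

M values are unbounded and roughly homoscedastic, which is why they are usually preferred over beta values for SVD and linear-model based batch checks like the ones on this page.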
Let's see whether these variables are associated with the principal components (p-values; rows V1 to V8 are the first eight PCs):
{code:collapse=true}
> x
        batchID    amount concentration plate_column  plate_row      shortDay
V1 4.999780e-01 0.8132652     0.9636092    0.2126458 0.41035836 4.999780e-01
V2 1.080231e-07 0.1214957     0.8025371    0.2954381 0.91858389 1.080231e-07
V3 6.028215e-01 0.4465735     0.9897603    0.5199681 0.07110241 6.028215e-01
V4 7.947106e-02 0.2818850     0.3579813    0.8230956 0.52338954 7.947106e-02
V5 1.125719e-01 0.9790610     0.5150996    0.3113563 0.29650943 1.125719e-01
V6 5.502164e-01 0.4465735     0.3134523    0.3787485 0.50090145 5.502164e-01
V7 7.922533e-01 0.5117243     0.6591395    0.4459644 0.76917348 7.922533e-01
V8 9.614704e-02 0.1488680     0.2382575    0.3455933 0.94015824 9.614704e-02
{code}
The second PC is strongly associated with batch (p = 1.1e-07), and the 4th and 8th PCs more weakly (p = 0.079 and 0.096). The second PC explains 10% of the data variance.

After removing the batch:
!STAD_Mvalue_batchRemoved_dataDistribution.png|thumbnail! !STAD_Mvalue_batchRemoved_RelativeVariance.png|thumbnail! !STAD_Mvalue_batchRemoved_PC1outliers.png|thumbnail!
{code:collapse=true}
> x
     batchID    amount concentration plate_column plate_row   shortDay
V1 0.9538949 0.7329525     0.9668135    0.1956406 0.3925206 0.9538949
V2 0.6951568 0.1346448     0.7778342    0.6589222 0.1054539 0.6951568
V3 0.8522117 0.1642106     0.3273640    0.7278436 0.7377284 0.8522117
V4 0.9436648 0.8132652     0.2334584    0.4411353 0.9901676 0.9436648
V5 0.9743762 0.8955925     0.3016907    0.8663039 0.2159179 0.9743762
V6 0.9130370 0.4158556     0.3873149    0.5267212 0.4462888 0.9130370
V7 0.4145840 0.5815169     0.3605256    0.4940810 0.6986479 0.4145840
V8 0.9028540 0.1214957     0.5218528    0.3929218 0.5285360 0.9028540
{code}
Removing the batch took care of all these associations: no PC is associated with batch or with any other technical variable (all p > 0.1). I would therefore remove it.

I was also wondering about the association of batch with the clinical traits in this smaller dataset (the samples that actually have DNA methylation data, rather than all potential samples).
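The batch-versus-trait comparisons on this page (here and in the first section) rely on Pearson's chi-squared test of a batch-by-category contingency table. A minimal sketch of the statistic and its degrees of freedom in plain Python follows; the p-value would then come from the chi-squared distribution (for example via R's `chisq.test`), which is not reimplemented here. The function name and the counts are hypothetical.

```python
def pearson_chi_squared(table):
    """Pearson's chi-squared statistic for an r x c table of observed counts.

    table: list of rows, e.g. one row per batch and one column per category.
    Returns (statistic, degrees_of_freedom); no p-value is computed.
    """
    # Drop all-zero rows and columns (e.g. unused factor levels): their
    # expected counts are 0 and would cause a division by zero.
    table = [row for row in table if sum(row) > 0]
    keep = [j for j, col in enumerate(zip(*table)) if sum(col) > 0]
    table = [[row[j] for j in keep] for row in table]

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of batch and category.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


if __name__ == "__main__":
    # Made-up counts: 2 batches x 3 categories.
    stat, df = pearson_chi_squared([[10, 20, 30], [20, 20, 10]])
    print(f"X-squared = {stat:.4f}, df = {df}")
```

The chi-squared approximation becomes unreliable when many expected counts are small, as in sparse tables like the batch-by-center table; that is one reason to also report Fisher's exact test, as is done for the methylation subset.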
Association of batch with histological type: p = 0.001488 (chi-squared test) and p = 3.0e-05 (Fisher's exact test); with residual tumor: p = 7.465e-07 (chi-squared) and p = 6.536e-09 (Fisher's exact); with tumor stage: p = 0.04773 (chi-squared) and p = 0.009894 (Fisher's exact). There was no significant association with tumor grade. Consider the data to be normalized. An expression set object is available.
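The page does not say which method was used to remove the batch. As a hypothetical stand-in, the simplest additive correction centers each probe within its batch and adds the probe's overall mean back. This is not ComBat (Bioconductor `sva`), which additionally adjusts batch-specific variances using empirical Bayes shrinkage; the function names below are illustrative assumptions.

```python
from collections import defaultdict


def remove_batch_means(values, batches):
    """Remove an additive batch effect from one probe.

    values:  M values of one probe across samples
    batches: batch label of each sample, in the same order
    Each sample's batch mean is subtracted and the probe's overall mean
    is added back, so the probe keeps its original average level.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, b in zip(values, batches):
        sums[b] += v
        counts[b] += 1
    batch_mean = {b: sums[b] / counts[b] for b in sums}
    grand_mean = sum(values) / len(values)
    return [v - batch_mean[b] + grand_mean for v, b in zip(values, batches)]


def remove_batch_matrix(matrix, batches):
    """Apply remove_batch_means to every probe (row) of a probes x samples matrix."""
    return [remove_batch_means(row, batches) for row in matrix]


if __name__ == "__main__":
    # Hypothetical probe measured in two batches of three samples.
    labels = ["1129"] * 3 + ["1156"] * 3
    print(remove_batch_means([1.0, 2.0, 3.0, 5.0, 6.0, 7.0], labels))
```

Because batch is associated with histological type and residual tumor in this subset, any correction of this kind may also remove part of that biological signal; including those traits as covariates in the adjustment is one way to limit this.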