h5. Batch vs clinical traits

Batch vs center:
{code}
> table(batchID,center)
       center
batchID B7 BR CD CG D7 EQ F1
   1129  0 31  0  0  0  0  0
   1156  0 12  0 23  0  0  0
   1601  2  0  7 16  3  1  0
   1801  0  9  0  3 10  0  1
   1883  0 11  0  0  5  0  1
{code}

Most significant correlations (the complete list can be found [here|^BatchClinicalInfoCorrelationsSTAD.txt]):
{csv}
STAD,DataType,NumberOfNAs,Test,Pvalue
residual_tumor,factor,8,Pearson's Chi-squared test,8.06E-17
year_of_initial_pathologic_diagnosis,integer,1,Kruskal-Wallis rank sum test,2.88E-13
days_to_form_completion,integer,1,Kruskal-Wallis rank sum test,3.72E-11
days_to_last_followup,integer,1,Kruskal-Wallis rank sum test,5.15E-11
primary_tumor_pathologic_spread,factor,1,Pearson's Chi-squared test,1.68E-06
histological_type,factor,6,Pearson's Chi-squared test,5.82E-06
lymphnode_pathologic_spread,factor,1,Pearson's Chi-squared test,6.09E-05
number_of_lymphnodes_examined,integer,53,Kruskal-Wallis rank sum test,1.25E-04
vital_status,factor,1,Pearson's Chi-squared test,2.95E-03
tumor_stage,factor,31,Pearson's Chi-squared test,3.49E-02
{csv}
\\
h5. Batch vs survival

!KaplanMeierCurveSTAD.png|thumbnail! !SurvivalByBatchSTAD.png|thumbnail!
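The Kaplan-Meier curves above are product-limit estimates of the survival function. The following is a minimal plain-Python sketch of that estimator. It assumes one follow-up time and one death indicator per patient (1 = died, 0 = censored). The helper name `kaplan_meier` and the toy input are hypothetical and are not the data behind the plots.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  -- follow-up time for each patient (e.g. days to death or last follow-up)
    events -- 1 if the patient died at that time, 0 if censored
    Returns (time, S(time)) for every distinct time with at least one death.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in sorted(set(times)):
        # Deaths and total exits (deaths + censorings) at this time point.
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        exits = sum(1 for ti in times if ti == t)
        if deaths:
            # Patients censored at t still count as at risk for deaths at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= exits
    return curve


# Hypothetical toy input: five patients, two of them censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Calling the function once per batch gives one curve per batch. When a death and a censoring happen at the same time, the sketch keeps the censored patient in the risk set for that time point, which is the usual convention.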
{code:collapse=true}
Call:
coxph(formula = survivalObject ~ batchVector)

  n= 134, number of events= 14 

                     coef exp(coef)  se(coef)     z Pr(>|z|)
batchVector1156 1.722e+01 3.012e+07 7.152e+03 0.002    0.998
batchVector1601 1.786e+01 5.728e+07 7.152e+03 0.002    0.998
batchVector1801 1.663e+01 1.665e+07 7.152e+03 0.002    0.998
batchVector1883        NA        NA 0.000e+00    NA       NA

                exp(coef) exp(-coef) lower .95 upper .95
batchVector1156  30117568  3.320e-08         0       Inf
batchVector1601  57279474  1.746e-08         0       Inf
batchVector1801  16645384  6.008e-08         0       Inf
batchVector1883        NA         NA        NA        NA

Rsquare= 0.026   (max possible= 0.496 )
Likelihood ratio test= 3.58  on 3 df,   p=0.3109
Wald test            = 2.13  on 3 df,   p=0.545
Score (logrank) test = 3.1  on 3 df,   p=0.3762

Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1,2,3 ; beta may be infinite. 
2: In coxph(survivalObject ~ batchVector) :
  X matrix deemed to be singular; variable 4
{code}
There is no association between batch and survival. None of the global tests is significant (likelihood ratio p = 0.31, Wald p = 0.545, logrank p = 0.38).

The per-batch coefficients cannot be interpreted. The evidence is the huge coefficients and standard errors, the confidence intervals running from 0 to Inf, and the "beta may be infinite" warning. With only 14 events in 134 patients, some batches probably had no deaths at all.

The last batch (1883) also gets NA estimates, together with a warning that the X matrix is singular. The reason is unclear, but it is not caused by unused factor levels.

h5. DNA methylation

27k arrays, 66 patients. M values were created without splitting the data between the red and green channels.

SVD:
!STAD_Mvalue_noNorm_dataDistribution.png|thumbnail! !STAD_Mvalue_noNorm_RelativeVariance.png|thumbnail! !STAD_Mvalue_noNorm_PC1outliers.png|thumbnail!

Summary of the technical variables:
{code}
> summary(methS)
 batchID        amount       concentration  plate_column   plate_row 
 1129:31   16.9 uL: 1   0.13 ug/uL: 6   1:16         A      :10  
 1156:35   26.7 uL:65   0.14 ug/uL:27   2:13         C      : 9  
                        0.15 ug/uL:25   3:13         D      : 9  
                        0.16 ug/uL: 7   4:10         F      : 9  
                        0.17 ug/uL: 1   5: 9         B      : 8  
                                        6: 5         E      : 8  
                                                     (Other):13  
      shortDay 
 21-7-2010:31  
 28-7-2010:35  
{code}
So this dataset has only 2 batches.
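The M values used in this section are log2 ratios of methylated to unmethylated signal. The following is a minimal sketch of that transform. It assumes raw methylated and unmethylated intensities per probe and an offset of 1; the offset is a common choice and an assumption here, not something stated in this analysis. Every probe goes through the same formula regardless of the colour channel it was read in, matching the decision not to split red and green. The helper name `m_values` is hypothetical.

```python
import math
from typing import List, Sequence


def m_values(meth: Sequence[float], unmeth: Sequence[float], offset: float = 1.0) -> List[float]:
    """Per-probe M value: log2((M + offset) / (U + offset)).

    Negative intensities are clipped to 0 before the ratio; the offset keeps
    the ratio finite when a signal is 0 (offset = 1 is an assumption).
    Every probe uses the same formula, whatever its colour channel.
    """
    if len(meth) != len(unmeth):
        raise ValueError("meth and unmeth must have the same length")
    return [
        math.log2((max(m, 0.0) + offset) / (max(u, 0.0) + offset))
        for m, u in zip(meth, unmeth)
    ]


# Hypothetical intensities for three probes.
print(m_values([3.0, 1000.0, -5.0], [1.0, 1000.0, 0.0]))
```

Positive M values indicate a probe that is mostly methylated, negative values one that is mostly unmethylated, and 0 an equal signal in both.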
Let's see whether they correlate with the principal components. The table gives p-values for each technical variable against the first eight PCs:
{code:collapse=true}
> x
        batchID    amount concentration plate_column  plate_row     shortDay
V1 4.999780e-01 0.8132652     0.9636092    0.2126458 0.41035836 4.999780e-01
V2 1.080231e-07 0.1214957     0.8025371    0.2954381 0.91858389 1.080231e-07
V3 6.028215e-01 0.4465735     0.9897603    0.5199681 0.07110241 6.028215e-01
V4 7.947106e-02 0.2818850     0.3579813    0.8230956 0.52338954 7.947106e-02
V5 1.125719e-01 0.9790610     0.5150996    0.3113563 0.29650943 1.125719e-01
V6 5.502164e-01 0.4465735     0.3134523    0.3787485 0.50090145 5.502164e-01
V7 7.922533e-01 0.5117243     0.6591395    0.4459644 0.76917348 7.922533e-01
V8 9.614704e-02 0.1488680     0.2382575    0.3455933 0.94015824 9.614704e-02
{code}
batchID and shortDay give identical p-values because each batch corresponds to a single shortDay value (31 and 35 samples), so the two variables are confounded.

The second PC is strongly associated with batch (p = 1.1e-07). PCs 4 and 8 show weaker trends (p = 0.079 and 0.096) that do not reach 0.05. Since PC2 explains about 10% of the variance in the data, removing it seems reasonable.
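Removing PC2 amounts to subtracting its rank-one term from the SVD. The following NumPy sketch does this for any component index. It assumes a probes x samples matrix of M values whose rows are centred before the decomposition. The row centring and the helper name `remove_component` are assumptions and are not taken from this analysis. The sketch also returns the fraction of variance explained by the removed component, the same kind of quantity as the 10% figure above.

```python
from typing import Tuple

import numpy as np


def remove_component(data: np.ndarray, k: int) -> Tuple[np.ndarray, float]:
    """Subtract principal component k (0-based) from a probes x samples matrix.

    Each row (probe) is centred before the SVD and its mean is added back
    afterwards, so only the variation along component k is removed.
    Also returns the fraction of total variance that component explained.
    """
    row_means = data.mean(axis=1, keepdims=True)
    centred = data - row_means
    u, d, vt = np.linalg.svd(centred, full_matrices=False)
    explained = float(d[k] ** 2 / np.sum(d ** 2))
    # Rank-one term of component k: d_k * u_k * v_k^T.
    component = d[k] * np.outer(u[:, k], vt[k, :])
    return centred - component + row_means, explained


# Hypothetical stand-in for the M-value matrix: 200 probes x 66 samples.
rng = np.random.default_rng(1)
mvals = rng.normal(size=(200, 66))
adjusted, frac = remove_component(mvals, 1)  # index 1 = PC2
print(f"PC2 explained {frac:.1%} of the variance")
```

Because the index is 0-based, PC2 is `k = 1`. The centred rows are orthogonal to the constant vector, so removing a component leaves each probe's mean unchanged.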