h5. Batch vs clinical traits

Batch vs center:
{code}> table(batchID,center)
       center
batchID B7 BR CD CG D7 EQ F1
   1129  0 31  0  0  0  0  0
   1156  0 12  0 23  0  0  0
   1601  2  0  7 16  3  1  0
   1801  0  9  0  3 10  0  1
   1883  0 11  0  0  5  0  1{code}
Most significant correlations between batch and clinical variables (the complete list can be found [here|^BatchClinicalInfoCorrelationsSTAD.txt]):
{csv}STAD,DataType,NumberOfNAs,Test,Pvalue
residual_tumor,factor,8,Pearson's Chi-squared test,8.06E-17
year_of_initial_pathologic_diagnosis,integer,1,Kruskal-Wallis rank sum test,2.88E-13
days_to_form_completion,integer,1,Kruskal-Wallis rank sum test,3.72E-11
days_to_last_followup,integer,1,Kruskal-Wallis rank sum test,5.15E-11
primary_tumor_pathologic_spread,factor,1,Pearson's Chi-squared test,1.68E-06
histological_type,factor,6,Pearson's Chi-squared test,5.82E-06
lymphnode_pathologic_spread,factor,1,Pearson's Chi-squared test,6.09E-05
number_of_lymphnodes_examined,integer,53,Kruskal-Wallis rank sum test,1.25E-04
vital_status,factor,1,Pearson's Chi-squared test,2.95E-03
tumor_stage,factor,31,Pearson's Chi-squared test,3.49E-02{csv}
\\
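The table above pairs factor variables with Pearson's chi-squared test and integer variables with the Kruskal-Wallis rank sum test. A minimal sketch of how such a table could be produced is given here; the helper name {{batch_test}} and the toy data are hypothetical, and the script actually used for the STAD table is not shown on this page.
{code}
# Hypothetical helper: choose the test from the variable type,
# mirroring the Test column of the table above.
batch_test <- function(x, batch) {
  keep  <- !is.na(x)                       # drop missing clinical values
  x     <- x[keep]
  batch <- droplevels(factor(batch[keep]))
  if (is.factor(x) || is.character(x)) {
    # categorical trait: contingency table of trait vs batch
    res <- suppressWarnings(chisq.test(table(droplevels(factor(x)), batch)))
  } else {
    # numeric trait: compare its distribution across batches
    res <- kruskal.test(x, batch)
  }
  data.frame(DataType = class(x)[1], NumberOfNAs = sum(!keep),
             Test = res$method, Pvalue = res$p.value)
}

# Toy data (NOT the STAD clinical data)
set.seed(1)
batch <- factor(rep(c("1129", "1156", "1601"), each = 20))
clin  <- data.frame(stage = factor(sample(c("I", "II", "III"), 60, TRUE)),
                    year  = as.integer(sample(2005:2010, 60, TRUE)))
clin$year[3] <- NA

res <- do.call(rbind, lapply(clin, batch_test, batch = batch))
res[order(res$Pvalue), ]                   # most significant first
{code}
The chi-squared approximation can be poor when many cells of the contingency table are small, which is why the warnings are suppressed in this sketch rather than acted on.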

h5. Batch vs survival

!KaplanMeierCurveSTAD.png|thumbnail! !SurvivalByBatchSTAD.png|thumbnail!
{code:collapse=true}Call:
coxph(formula = survivalObject ~ batchVector)

  n= 134, number of events= 14

                     coef exp(coef)  se(coef)     z Pr(>|z|)
batchVector1156 1.722e+01 3.012e+07 7.152e+03 0.002    0.998
batchVector1601 1.786e+01 5.728e+07 7.152e+03 0.002    0.998
batchVector1801 1.663e+01 1.665e+07 7.152e+03 0.002    0.998
batchVector1883        NA        NA 0.000e+00    NA       NA

                exp(coef) exp(-coef) lower .95 upper .95
batchVector1156  30117568  3.320e-08         0       Inf
batchVector1601  57279474  1.746e-08         0       Inf
batchVector1801  16645384  6.008e-08         0       Inf
batchVector1883        NA         NA        NA        NA

Rsquare= 0.026   (max possible= 0.496 )
Likelihood ratio test= 3.58  on 3 df,   p=0.3109
Wald test            = 2.13  on 3 df,   p=0.545
Score (logrank) test = 3.1  on 3 df,   p=0.3762
Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1,2,3 ; beta may be infinite.
2: In coxph(survivalObject ~ batchVector) :
  X matrix deemed to be singular; variable 4{code}
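No significant association between batch and survival (likelihood ratio test p=0.31). For some reason I got NAs and a singularity warning (not an error) for the last batch, 1883, although it is definitely not because of unused factor levels. With only 14 events in 134 patients, the huge coefficients and standard errors for the other batches suggest that at least one batch has no events at all, so the individual estimates are probably not reliable.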
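One way to check where the "beta may be infinite" warning comes from is to count the events per batch before fitting. The sketch below uses toy data in which the reference batch has no events by construction; it reuses the names {{survivalObject}} and {{batchVector}} from the call above, while {{time}}, {{status}} and {{events}} are assumed names that do not appear in the original analysis.
{code}
library(survival)

# Toy data (NOT the STAD data): three batches, the first one without events
set.seed(1)
batchVector <- factor(rep(c("1129", "1156", "1601"), each = 30))
time   <- rexp(90, rate = 0.01)
status <- rbinom(90, 1, 0.3)
status[batchVector == "1129"] <- 0        # reference batch: censored only
survivalObject <- Surv(time, status)

# Events per batch: a zero in the "1" column flags a problematic level
events <- table(batchVector, event = status)
events

# The Cox fit then tends to warn that the coefficients may be infinite
fit <- coxph(survivalObject ~ batchVector)

# The log-rank test does not depend on the coefficient estimates
survdiff(survivalObject ~ batchVector)
{code}
If a batch without events turns out to be the cause, the overall tests at the bottom of the {{coxph}} summary are still usable, but the per-batch hazard ratios are not.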

h5. DNA methylation

27k arrays, 66 patients. Compute M values without splitting the probes into red and green channels, then run SVD:

!STAD_Mvalue_noNorm_dataDistribution.png|thumbnail! !STAD_Mvalue_noNorm_RelativeVariance.png|thumbnail! !STAD_Mvalue_noNorm_PC1outliers.png|thumbnail!
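For reference, a minimal sketch of the M value and relative variance computations behind the plots above. It assumes the usual definition M = log2((methylated + offset) / (unmethylated + offset)) with an offset of 1 and row-centering before the SVD; the function {{mvalue}}, the matrices {{meth}} and {{unmeth}}, and the toy intensities are hypothetical, and the exact preprocessing used for these figures is not recorded on this page.
{code}
# Toy intensities (NOT the STAD data): probes in rows, samples in columns
set.seed(1)
n_probes  <- 200
n_samples <- 10
meth   <- matrix(rexp(n_probes * n_samples, rate = 1 / 2000), n_probes)
unmeth <- matrix(rexp(n_probes * n_samples, rate = 1 / 2000), n_probes)

# M value: log2 ratio of methylated to unmethylated signal;
# the offset (assumed to be 1 here) avoids log2(0) and division by zero
mvalue <- function(meth, unmeth, offset = 1) {
  log2((meth + offset) / (unmeth + offset))
}
M <- mvalue(meth, unmeth)

# Centre each probe, then decompose
centered <- M - rowMeans(M)
s <- svd(centered)

# Fraction of the total variance carried by each component
rel_var <- s$d^2 / sum(s$d^2)
round(rel_var, 3)
{code}
The columns of {{s$v}} hold the per-sample coordinates on each principal component, which is what gets compared with the technical variables.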
Summary of the technical variables:
{code}> summary(methS)
 batchID       amount      concentration plate_column   plate_row
 1129:31   16.9 uL: 1   0.13 ug/uL: 6    1:16         A      :10
 1156:35   26.7 uL:65   0.14 ug/uL:27    2:13         C      : 9
                        0.15 ug/uL:25    3:13         D      : 9
                        0.16 ug/uL: 7    4:10         F      : 9
                        0.17 ug/uL: 1    5: 9         B      : 8
                                         6: 5         E      : 8
                                                      (Other):13
      shortDay
 21-7-2010:31
 28-7-2010:35{code}
So this dataset has only 2 batches. Let's see whether they correlate with the principal components:
{code:collapse=true}> x
        batchID    amount concentration plate_column  plate_row     shortDay
V1 4.999780e-01 0.8132652     0.9636092    0.2126458 0.41035836 4.999780e-01
V2 1.080231e-07 0.1214957     0.8025371    0.2954381 0.91858389 1.080231e-07
V3 6.028215e-01 0.4465735     0.9897603    0.5199681 0.07110241 6.028215e-01
V4 7.947106e-02 0.2818850     0.3579813    0.8230956 0.52338954 7.947106e-02
V5 1.125719e-01 0.9790610     0.5150996    0.3113563 0.29650943 1.125719e-01
V6 5.502164e-01 0.4465735     0.3134523    0.3787485 0.50090145 5.502164e-01
V7 7.922533e-01 0.5117243     0.6591395    0.4459644 0.76917348 7.922533e-01
V8 9.614704e-02 0.1488680     0.2382575    0.3455933 0.94015824 9.614704e-02{code}
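Looks like the second PC is highly correlated with the batch (1.08e-07 for both batchID and shortDay, which coincide here because each batch apparently ran on its own day). The 4th and 8th PCs show at most a weak association (about 0.08 and 0.10). The second PC explains 10% of the data variance. I guess it should be removed.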
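A minimal sketch of the two steps discussed here: testing each of the first 8 components against the batch, then subtracting the batch-associated component from the data. The Kruskal-Wallis test is an assumption (the test behind the table above is not stated), and the toy matrix has a batch effect built in, so the affected component is the first one rather than the second.
{code}
# Toy M values (NOT the STAD data) with an artificial batch shift
set.seed(1)
n_probes  <- 300
n_samples <- 20
batch <- factor(rep(c("1129", "1156"), each = 10))
probe_effect <- rnorm(n_probes)
M <- matrix(rnorm(n_probes * n_samples, sd = 0.5), n_probes) +
  outer(probe_effect, as.numeric(batch == "1156"))

centered <- M - rowMeans(M)
s <- svd(centered)

# p-value of each of the first 8 components against the batch
pvals <- apply(s$v[, 1:8], 2, function(pc) kruskal.test(pc, batch)$p.value)
k <- which.min(pvals)                      # most batch-associated component

# Remove component k: subtract its rank-one contribution d_k * u_k * v_k'
cleaned <- centered - s$d[k] * outer(s$u[, k], s$v[, k])
{code}
After the subtraction the data has no remaining projection on the removed component. Whether removing the whole component is appropriate depends on how much biological signal it shares with the batch, which this sketch does not assess.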