STAD stomach adenocarcinoma

Important update (January 20th, 2011): the data below have been corrected for the BCR batch, which is not necessarily the processing batch. The dataset needs to be reanalyzed.

Batch vs clinical traits

Batch vs center:

> table(batchID,center)
       center
batchID B7 BR CD CG D7 EQ F1
   1129  0 31  0  0  0  0  0
   1156  0 12  0 23  0  0  0
   1601  2  0  7 16  3  1  0
   1801  0  9  0  3 10  0  1
   1883  0 11  0  0  5  0  1

Most significant correlations (complete list can be found here)


Batch vs survival

Call:
coxph(formula = survivalObject ~ batchVector)

  n= 134, number of events= 14

                     coef exp(coef)  se(coef)     z Pr(>|z|)
batchVector1156 1.722e+01 3.012e+07 7.152e+03 0.002    0.998
batchVector1601 1.786e+01 5.728e+07 7.152e+03 0.002    0.998
batchVector1801 1.663e+01 1.665e+07 7.152e+03 0.002    0.998
batchVector1883        NA        NA 0.000e+00    NA       NA

                exp(coef) exp(-coef) lower .95 upper .95
batchVector1156  30117568  3.320e-08         0       Inf
batchVector1601  57279474  1.746e-08         0       Inf
batchVector1801  16645384  6.008e-08         0       Inf
batchVector1883        NA         NA        NA        NA

Rsquare= 0.026   (max possible= 0.496 )
Likelihood ratio test= 3.58  on 3 df,   p=0.3109
Wald test            = 2.13  on 3 df,   p=0.545
Score (logrank) test = 3.1  on 3 df,   p=0.3762
Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1,2,3 ; beta may be infinite.
2: In coxph(survivalObject ~ batchVector) :
  X matrix deemed to be singular; variable 4

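No correlation with survival: the likelihood ratio, Wald, and score tests all give p > 0.3. The fit itself is unstable, though. With only 14 events in 134 patients, the coefficients for batches 1156, 1601, and 1801 diverge (the "beta may be infinite" warning), and the last batch (1883) comes back as NA with a singularity warning. I have not found the reason for the NAs, although it is definitely not because of unused factor levels.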
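One possible way to look into warnings like these is to tabulate events per batch: a Cox coefficient cannot be estimated as a finite number when a batch, or the reference batch, contains no events at all. The sketch below uses made-up toy data; `batchVector` mirrors the name in the call above, while `status` (1 = event, 0 = censored) is a hypothetical name that does not appear in the original analysis.

```r
# Toy data only: these values are invented for illustration.
batchVector <- factor(c("1129", "1129", "1129", "1156", "1156",
                        "1601", "1601", "1883"))
status      <- c(0, 0, 0, 1, 0, 1, 1, 0)   # hypothetical event indicator

# Cross-tabulate batch (rows) against event status (columns).
eventsByBatch <- table(batch = batchVector, event = status)
print(eventsByBatch)

# Batches with zero observed events are the usual suspects
# for diverging or undefined coefficients.
noEventBatches <- rownames(eventsByBatch)[eventsByBatch[, "1"] == 0]
print(noEventBatches)
```

If the real data show a batch with zero events, merging small batches or reporting only the overall logrank test may be more informative than the per-batch coefficients.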

DNA methylation

27k arrays, 66 patients. M values are computed for all probes together, without splitting them into red and green channels. SVD:
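As a rough sketch of this step, M values can be computed as the log2 ratio of methylated to unmethylated intensity, followed by an SVD of the probe-centered matrix. The intensities here are simulated, and the `offset` of 100 and all variable names are assumptions; the exact formula used for this dataset is not recorded on this page.

```r
set.seed(1)
nProbes  <- 200
nSamples <- 10

# Simulated methylated / unmethylated intensities (probes x samples).
meth   <- matrix(rgamma(nProbes * nSamples, shape = 2, scale = 1000),
                 nProbes, nSamples)
unmeth <- matrix(rgamma(nProbes * nSamples, shape = 2, scale = 1000),
                 nProbes, nSamples)

# M value: log2 ratio of the two intensities. The offset (an assumption)
# guards against division by very small values.
offset  <- 100
mValues <- log2((meth + offset) / (unmeth + offset))

# Center each probe, then decompose. Columns of s$v are the sample-level
# principal components; d^2 gives the variance carried by each.
centered     <- mValues - rowMeans(mValues)
s            <- svd(centered)
varExplained <- s$d^2 / sum(s$d^2)
print(round(varExplained, 3))
```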


Summary of the technical variables:

> summary(methS)
 batchID       amount      concentration plate_column   plate_row
 1129:31   16.9 uL: 1   0.13 ug/uL: 6    1:16         A      :10
 1156:35   26.7 uL:65   0.14 ug/uL:27    2:13         C      : 9
                        0.15 ug/uL:25    3:13         D      : 9
                        0.16 ug/uL: 7    4:10         F      : 9
                        0.17 ug/uL: 1    5: 9         B      : 8
                                         6: 5         E      : 8
                                                      (Other):13
      shortDay
 21-7-2010:31
 28-7-2010:35

So this dataset has only 2 batches. Let's see if they have any correlation with the principal components:

> x
        batchID    amount concentration plate_column  plate_row     shortDay
V1 4.999780e-01 0.8132652     0.9636092    0.2126458 0.41035836 4.999780e-01
V2 1.080231e-07 0.1214957     0.8025371    0.2954381 0.91858389 1.080231e-07
V3 6.028215e-01 0.4465735     0.9897603    0.5199681 0.07110241 6.028215e-01
V4 7.947106e-02 0.2818850     0.3579813    0.8230956 0.52338954 7.947106e-02
V5 1.125719e-01 0.9790610     0.5150996    0.3113563 0.29650943 1.125719e-01
V6 5.502164e-01 0.4465735     0.3134523    0.3787485 0.50090145 5.502164e-01
V7 7.922533e-01 0.5117243     0.6591395    0.4459644 0.76917348 7.922533e-01
V8 9.614704e-02 0.1488680     0.2382575    0.3455933 0.94015824 9.614704e-02
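A table of this shape could be produced along the following lines. This is only a sketch on simulated data: the page does not say which test generated the values above, so the Kruskal-Wallis test is an assumption, as are the names `pcs` and `tech`.

```r
set.seed(2)
nSamples <- 20

# Simulated principal components (samples x PCs).
pcs <- matrix(rnorm(nSamples * 3), nSamples, 3,
              dimnames = list(NULL, paste0("V", 1:3)))

# Simulated technical variables, one factor per column.
tech <- data.frame(
  batchID   = factor(rep(c("1129", "1156"), each = nSamples / 2)),
  plate_row = factor(rep(c("A", "B", "C", "D"), times = nSamples / 4))
)

# Make the second PC batch-driven so the effect is visible.
pcs[, "V2"] <- pcs[, "V2"] + 5 * (tech$batchID == "1156")

# One p-value per (PC, technical variable) pair.
x <- sapply(tech, function(v) {
  apply(pcs, 2, function(pc) kruskal.test(pc, v)$p.value)
})
print(x)
```

With many PCs and variables tested at once, a few small p-values are expected by chance, so only very small ones should be read as evidence of a batch effect.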

The values are p-values. The second PC is strongly associated with batch (p = 1.1e-07); the 4th and 8th PCs show much weaker associations (p = 0.08 and 0.10). The second PC explains 10% of the data variance. After removing the batch:

> x
     batchID    amount concentration plate_column plate_row  shortDay
V1 0.9538949 0.7329525     0.9668135    0.1956406 0.3925206 0.9538949
V2 0.6951568 0.1346448     0.7778342    0.6589222 0.1054539 0.6951568
V3 0.8522117 0.1642106     0.3273640    0.7278436 0.7377284 0.8522117
V4 0.9436648 0.8132652     0.2334584    0.4411353 0.9901676 0.9436648
V5 0.9743762 0.8955925     0.3016907    0.8663039 0.2159179 0.9743762
V6 0.9130370 0.4158556     0.3873149    0.5267212 0.4462888 0.9130370
V7 0.4145840 0.5815169     0.3605256    0.4940810 0.6986479 0.4145840
V8 0.9028540 0.1214957     0.5218528    0.3929218 0.5285360 0.9028540
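The method used to remove the batch is not stated here. One simple possibility, sketched below on simulated data, is to regress each probe on the batch factor and keep the residuals plus the probe's overall mean. All names and the artificial batch shift are assumptions made for illustration.

```r
set.seed(3)
nProbes  <- 50
nSamples <- 12
batchID  <- factor(rep(c("1129", "1156"), each = nSamples / 2))

# Simulated M values (probes x samples) with an artificial batch shift.
mValues <- matrix(rnorm(nProbes * nSamples), nProbes, nSamples)
mValues[, batchID == "1156"] <- mValues[, batchID == "1156"] + 1

# lm() accepts a matrix response, fitting one regression per probe
# (probes are the columns of t(mValues)).
fit <- lm(t(mValues) ~ batchID)

# Residuals are batch-free; add back each probe's overall mean.
adjusted <- t(residuals(fit)) + rowMeans(mValues)

# The difference between batch means is now zero for every probe.
batchGap <- rowMeans(adjusted[, batchID == "1156"]) -
            rowMeans(adjusted[, batchID == "1129"])
print(max(abs(batchGap)))
```

Since batch also correlates with some clinical traits in this dataset, an adjustment of this kind can remove biological signal along with the technical one.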

Removing batch took care of all the other correlations. I also wondered whether batch correlates with the clinical traits in this smaller dataset (samples with actual DNA methylation data, not all potential samples). P-values for batch vs histological type: 0.001488 (chi-square test) and 3.0e-05 (Fisher test); batch vs residual tumor: 7.465e-07 (chi-square test) and 6.536e-09 (Fisher test). There was no significant correlation with tumor grade. Batch vs tumor stage: 0.04773 (chi-square test) and 0.009894 (Fisher test).
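Both tests can be run on any batch-by-trait contingency table. The per-trait counts for the methylation subset are not listed on this page, so the sketch below uses the batch-by-center counts from the table at the top instead. The simulated Fisher p-value (`simulate.p.value = TRUE`) is a choice made here to keep the computation cheap on a 5 x 7 table, not necessarily what was done originally.

```r
# Batch-by-center counts copied from the table(batchID, center) output.
counts <- matrix(
  c(0, 31, 0,  0,  0, 0, 0,
    0, 12, 0, 23,  0, 0, 0,
    2,  0, 7, 16,  3, 1, 0,
    0,  9, 0,  3, 10, 0, 1,
    0, 11, 0,  0,  5, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(batchID = c("1129", "1156", "1601", "1801", "1883"),
                  center  = c("B7", "BR", "CD", "CG", "D7", "EQ", "F1"))
)

# Chi-square test; many expected counts are small, so R warns that the
# approximation may be poor (suppressed here).
chisqP <- suppressWarnings(chisq.test(counts)$p.value)

# Fisher test with a Monte Carlo p-value.
set.seed(4)
fisherP <- fisher.test(counts, simulate.p.value = TRUE, B = 10000)$p.value

print(c(chisq = chisqP, fisher = fisherP))
```

With sparse tables like this one, the Fisher result is generally the more trustworthy of the two.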

Consider the data to be normalized. 
Expression set object is available.