STAD (stomach adenocarcinoma)
Important update (January 20th, 2011): the data below have been corrected for the BCR batch, which is not necessarily the same as the processing batch. The dataset needs to be reanalyzed.
Batch vs clinical traits
Batch vs center:
> table(batchID,center)
       center
batchID B7 BR CD CG D7 EQ F1
   1129  0 31  0  0  0  0  0
   1156  0 12  0 23  0  0  0
   1601  2  0  7 16  3  1  0
   1801  0  9  0  3 10  0  1
   1883  0 11  0  0  5  0  1
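The page does not say which test was used for batch vs center. A minimal sketch, assuming a Fisher test is wanted: it rebuilds the counts above by hand and runs a Monte Carlo Fisher test. Most cells are zero, so the chi-square approximation is shown only for comparison. The object names (`counts`, `fisher_p`, `chisq_p`) are hypothetical.

```r
# Rebuild the batch-by-center counts from the table above.
counts <- matrix(
  c(0, 31, 0,  0,  0, 0, 0,
    0, 12, 0, 23,  0, 0, 0,
    2,  0, 7, 16,  3, 1, 0,
    0,  9, 0,  3, 10, 0, 1,
    0, 11, 0,  0,  5, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(batchID = c("1129", "1156", "1601", "1801", "1883"),
                  center  = c("B7", "BR", "CD", "CG", "D7", "EQ", "F1")))

# A simulated p-value avoids the large exact-enumeration workspace
# that a 5 x 7 table would need.
set.seed(1)
fisher_p <- fisher.test(counts, simulate.p.value = TRUE, B = 10000)$p.value

# Chi-square for comparison; it warns about small expected counts.
chisq_p <- suppressWarnings(chisq.test(counts))$p.value

c(fisher = fisher_p, chisq = chisq_p)
```

The simulated p-value depends on the seed and on `B`. Raising `B` tightens the estimate.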
Most significant correlations (complete list can be found here)
Batch vs survival
No correlation with survival was found. For some reason, the last batch produced NAs and an error, although this is definitely not caused by unused factor levels.
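The survival test used here is not stated. A minimal sketch, assuming a log-rank test across batches with the `survival` package (shipped with standard R installs), on hypothetical toy data. It also counts events per batch. A batch with zero events is one common source of NA or non-finite estimates in per-batch survival fits, but whether that explains the error above is not known.

```r
library(survival)

# Hypothetical toy data: follow-up time (days), death indicator, batch.
set.seed(42)
n <- 60
clin <- data.frame(
  time   = round(rexp(n, rate = 1 / 500)) + 1,
  status = rbinom(n, 1, 0.4),
  batch  = factor(sample(c("1129", "1156", "1601"), n, replace = TRUE))
)
clin$batch <- droplevels(clin$batch)

# Log-rank test of survival differences between batches.
fit <- survdiff(Surv(time, status) ~ batch, data = clin)
logrank_p <- pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)

# Number of observed events (deaths) in each batch.
events_per_batch <- tapply(clin$status, clin$batch, sum)
events_per_batch
```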
DNA methylation
27k arrays, 66 patients. M values were computed without splitting the red and green channels, followed by SVD:
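A minimal sketch of the M value and SVD steps, using hypothetical methylated and unmethylated intensity matrices (probes in rows, samples in columns). The M value is the log2 ratio of methylated to unmethylated signal. The offset `alpha = 1` is an assumption, not taken from the original analysis.

```r
# Hypothetical intensities: 500 probes x 66 samples.
set.seed(7)
n_probes <- 500; n_samples <- 66
meth   <- matrix(rgamma(n_probes * n_samples, shape = 2, rate = 1 / 1000), n_probes)
unmeth <- matrix(rgamma(n_probes * n_samples, shape = 2, rate = 1 / 1000), n_probes)

# M value: log2 ratio with a small offset so low intensities stay finite.
alpha <- 1
M <- log2((meth + alpha) / (unmeth + alpha))

# Center each probe (row), then SVD. Columns of s$v are sample-level PCs.
Mc <- M - rowMeans(M)
s  <- svd(Mc)

# Fraction of total variance explained by each component.
var_explained <- s$d^2 / sum(s$d^2)
round(head(var_explained, 5), 3)
```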
Summary of the technical variables:
> summary(methS)
 batchID      amount       concentration   plate_column   plate_row
 1129:31   16.9 uL: 1   0.13 ug/uL: 6    1:16           A      :10
 1156:35   26.7 uL:65   0.14 ug/uL:27    2:13           C      : 9
                        0.15 ug/uL:25    3:13           D      : 9
                        0.16 ug/uL: 7    4:10           F      : 9
                        0.17 ug/uL: 1    5: 9           B      : 8
                                         6: 5           E      : 8
                                                        (Other):13
      shortDay
 21-7-2010:31
 28-7-2010:35
So this dataset has only 2 batches. Let's see whether they correlate with any of the principal components:
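The statistic behind "correlation with the principal components" is not given. A minimal sketch, assuming a one-way ANOVA of each PC's sample scores on batch, with hypothetical M values in which one batch is shifted for 100 probes. The batch sizes (31 and 35) follow the summary above. Everything else is illustrative.

```r
set.seed(11)
n_probes <- 500
batch <- factor(rep(c("1129", "1156"), times = c(31, 35)))

# Hypothetical M values with an artificial batch shift in the first 100 probes.
M <- matrix(rnorm(n_probes * length(batch)), n_probes)
shifted <- 1:100
M[shifted, batch == "1156"] <- M[shifted, batch == "1156"] + 1.5

# SVD of the probe-centered matrix; s$v[, k] holds the sample scores of PC k.
s <- svd(M - rowMeans(M))

# ANOVA p-value for batch vs each of the first 10 PCs.
pc_batch_p <- sapply(1:10, function(k) {
  anova(lm(s$v[, k] ~ batch))[["Pr(>F)"]][1]
})
which(pc_batch_p < 0.01)
```

With a two-level batch, this ANOVA is equivalent to a two-sample t-test. A rank-based test such as `kruskal.test` is an alternative if the PC scores look far from normal.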
The second PC appears to be highly correlated with the batch, as are the 4th and 8th PCs. The second PC explains 10% of the variance in the data. Removing the batch:
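The page does not say how the batch was removed. A minimal sketch, assuming a per-probe linear model on batch: the residuals are kept and each probe's mean is added back. The data are hypothetical. If batch is correlated with clinical traits, regressing it out this way also removes part of that signal. Adding those traits to the design matrix is one way to protect them.

```r
set.seed(11)
batch <- factor(rep(c("1129", "1156"), times = c(31, 35)))

# Hypothetical M values with a batch shift in the first 100 probes.
M <- matrix(rnorm(500 * length(batch)), 500)
M[1:100, batch == "1156"] <- M[1:100, batch == "1156"] + 1.5

# Fit all probes at once: t(M) has samples in rows, so lm.fit returns
# a residual matrix with one column per probe.
design <- model.matrix(~ batch)
fit <- lm.fit(design, t(M))

# Back to probes x samples, restoring each probe's overall mean.
M_corrected <- t(fit$residuals) + rowMeans(M)

# After correction, the two batch means of every probe coincide.
gap <- rowMeans(M_corrected[, batch == "1156"]) -
       rowMeans(M_corrected[, batch == "1129"])
max(abs(gap))
```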
Removing the batch took care of all the other correlations. I also wondered whether batch correlates with the clinical traits in this smaller dataset (patients with actual DNA methylation data, not the full set of potential samples). P-values for batch vs histological type: 0.001488 (chi-square test) and 3.0e-05 (Fisher's exact test); residual tumor: 7.465e-07 (chi-square test) and 6.536e-09 (Fisher's exact test); tumor stage: 0.04773 (chi-square test) and 0.009894 (Fisher's exact test). There was no significant correlation with tumor grade.
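A minimal sketch of running both tests for every clinical trait in one pass. The clinical table here is hypothetical: the column names and levels are placeholders and do not reproduce the real STAD annotations, so the p-values it prints do not correspond to the ones reported above.

```r
set.seed(3)
n <- 66
# Hypothetical clinical table with the two batches from this dataset.
clin <- data.frame(
  batch             = factor(rep(c("1129", "1156"), times = c(31, 35))),
  histological_type = factor(sample(c("type_1", "type_2", "type_3"), n, replace = TRUE)),
  tumor_grade       = factor(sample(c("G1", "G2", "G3"), n, replace = TRUE))
)

# Chi-square and Fisher's exact p-values for one trait vs batch.
test_trait <- function(trait, batch) {
  tab <- table(batch, trait)
  c(chisq  = suppressWarnings(chisq.test(tab))$p.value,
    fisher = fisher.test(tab)$p.value)
}

# One row per trait, one column per test.
traits <- setdiff(names(clin), "batch")
pvals <- t(sapply(clin[traits], test_trait, batch = clin$batch))
pvals
```

Fisher's exact test is the more trustworthy of the two when some cells are small, which is common with only 66 patients.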
The data can now be considered normalized.
An ExpressionSet object is available.