Relationship between various types of batches
For DNA methylation I considered three types of technical batches: TCGA archive name (the processing batch from the .sdrf file), slide, and array. I also considered BCR batch because it is often highly correlated with the clinical variables. It may also correlate with one of the technical variables if the samples weren't properly randomized, which may complicate data analysis. Here I show the relationship between technical batches and BCR batches for the DNA methylation data:
Slide vs array:
> table(sdrfUnique$IlluminaArray,sdrfUnique$IlluminaSlot)
R01C01 R01C02 R02C01 R02C02 R03C01 R03C02 R04C01 R04C02 R05C01 R05C02 R06C01 R06C02
5775041007 1 1 1 0 1 0 1 0 1 0 1 0
5775041065 1 1 1 1 1 1 1 1 1 1 1 1
5775041068 1 1 1 1 1 1 1 0 1 0 1 0
5775041084 1 1 1 1 1 1 1 1 1 1 1 1
5775041088 1 1 1 1 1 1 1 1 1 1 1 1
6042308158 1 1 1 1 1 1 1 1 1 1 1 1
6042308159 1 1 1 1 1 1 1 1 1 1 1 1
6042308165 1 1 1 1 1 1 1 1 1 1 1 1
6042316001 1 1 1 1 1 1 1 1 1 1 1 1
6042316008 1 1 1 1 1 1 1 1 1 0 1 0
6042316010 0 0 0 0 0 0 0 1 0 1 0 1
6042316011 1 0 1 0 1 0 1 0 1 0 1 0
6042316015 1 1 1 1 1 1 1 1 1 1 1 1
6057825002 1 1 1 1 1 1 1 0 1 0 1 0
6057825020 1 1 1 1 1 1 1 1 1 1 1 1
6057825028 1 1 1 1 1 1 1 1 1 1 1 1
6057825035 0 0 0 0 0 0 0 1 0 1 0 1
6264496026 0 0 0 0 0 0 0 0 0 0 0 1
6264509018 1 1 1 1 1 1 1 1 1 1 1 1
6264509087 1 1 1 1 1 1 1 1 1 1 1 1
6264509127 1 0 1 0 1 0 1 0 1 0 1 0
6929718053 1 1 1 1 1 1 1 1 1 1 1 1
6929718054 1 1 1 1 1 1 1 1 1 1 1 1
6929718065 1 1 1 1 1 1 1 1 1 1 1 1
6929718079 1 1 1 1 1 1 1 1 1 1 1 0
6929718086 0 1 0 1 0 0 0 0 0 0 0 0
We can see that not every array from a slide was provided. It is possible that some arrays didn't pass quality control. Each slide holds 12 arrays.
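The incomplete slides can also be listed programmatically instead of being read off the table. The following is a minimal sketch on a toy data frame; the slide IDs `S1` and `S2` and the object names `toy`, `allSlots`, `incomplete` and `missingSlots` are made up for illustration, and only the column names mirror `sdrfUnique`.

```r
# Toy stand-in for sdrfUnique (hypothetical values, not the real TCGA data).
allSlots <- sprintf("R%02dC%02d", rep(1:6, each = 2), rep(1:2, times = 6))

toy <- data.frame(
  IlluminaArray = c(rep("S1", 12), rep("S2", 7)),
  IlluminaSlot  = c(allSlots,
                    "R01C01", "R01C02", "R02C01", "R03C01",
                    "R04C01", "R05C01", "R06C01"),
  stringsAsFactors = FALSE
)

# Number of arrays provided for each slide.
arraysPerSlide <- table(toy$IlluminaArray)

# Slides with fewer than the 12 positions a slide holds.
incomplete <- names(arraysPerSlide)[as.vector(arraysPerSlide) < 12]

# For each incomplete slide, the positions that are absent.
slotsBySlide <- split(toy$IlluminaSlot, toy$IlluminaArray)
missingSlots <- lapply(slotsBySlide[incomplete],
                       function(s) setdiff(allSlots, s))

print(incomplete)
print(missingSlots)
```

On the real data the same steps would be applied to `sdrfUnique$IlluminaArray` and `sdrfUnique$IlluminaSlot`.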
Processing batch vs slide:
> table(sdrfUnique$ArchiveName,sdrfUnique$SamplePlate)
5775041007 5775041065 5775041068 5775041084 5775041088 6042308158 6042308159 6042308165 6042316001 6042316008
1.1.0 7 12 9 12 12 0 0 0 0 0
2.1.0 0 0 0 0 0 12 12 12 12 0
3.1.0 0 0 0 0 0 0 0 0 0 0
4.1.0 0 0 0 0 0 0 0 0 0 10
5.1.0 0 0 0 0 0 0 0 0 0 0
6.1.0 0 0 0 0 0 0 0 0 0 0
7.1.0 0 0 0 0 0 0 0 0 0 0
6042316010 6042316011 6042316015 6057825002 6057825020 6057825028 6057825035 6264496026 6264509018 6264509087
1.1.0 0 0 0 0 0 0 0 0 0 0
2.1.0 0 6 12 0 0 0 0 0 0 0
3.1.0 0 0 0 9 12 12 3 0 0 0
4.1.0 3 0 0 0 0 0 0 0 0 0
5.1.0 0 0 0 0 0 0 0 0 0 0
6.1.0 0 0 0 0 0 0 0 1 12 12
7.1.0 0 0 0 0 0 0 0 0 0 0
6264509127 6929718053 6929718054 6929718065 6929718079 6929718086
1.1.0 0 0 0 0 0 0
2.1.0 0 0 0 0 0 0
3.1.0 0 0 0 0 0 0
4.1.0 0 0 0 0 0 0
5.1.0 0 12 12 12 11 0
6.1.0 6 0 0 0 0 0
7.1.0 0 0 0 0 0 2
It looks like they ran several slides on one day (or per batch). I expect within-slide variation as well as within-batch and between-batch variation.
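The table shows each slide falling into a single processing batch, i.e. slides are nested within batches. A small helper can confirm this kind of nesting without scanning the table by eye. This is only a sketch: `isNested()` is a hypothetical helper, and the toy values below are made up; only the column names follow `sdrfUnique`.

```r
# Hypothetical toy data: two slides in batch 1.1.0, one slide in batch 2.1.0.
toy <- data.frame(
  ArchiveName = c("1.1.0", "1.1.0", "1.1.0", "2.1.0", "2.1.0"),
  SamplePlate = c("S1", "S1", "S2", "S3", "S3"),
  stringsAsFactors = FALSE
)

# isNested() (hypothetical helper): TRUE when every level of `inner`
# co-occurs with exactly one level of `outer`.
isNested <- function(inner, outer) {
  tab <- table(inner, outer)
  all(rowSums(tab > 0) == 1)
}

# Each slide belongs to one batch ...
slidesInBatch <- isNested(toy$SamplePlate, toy$ArchiveName)
# ... but a batch can span several slides, so the reverse does not hold.
batchInSlide <- isNested(toy$ArchiveName, toy$SamplePlate)

# Number of distinct slides per processing batch.
slidesPerBatch <- tapply(toy$SamplePlate, toy$ArchiveName,
                         function(x) length(unique(x)))

print(slidesInBatch)
print(batchInSlide)
print(slidesPerBatch)
```

With nested factors like these, slide can be treated as a level below processing batch rather than as a crossed factor.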
Processing batch vs BCR barcode:
> table(sdrfUnique$ArchiveName,sdrfUnique$BCR)
1407 1551 1721 1772 1837 1926 A153 A17Z
1.1.0 43 9 0 0 0 0 0 0
2.1.0 0 0 66 0 0 0 0 0
3.1.0 0 0 0 36 0 0 0 0
4.1.0 0 0 0 0 0 0 13 0
5.1.0 0 0 0 0 47 0 0 0
6.1.0 0 0 0 0 0 31 0 0
7.1.0 0 0 0 0 0 0 0 2
There was no randomization relative to BCR barcodes: each BCR batch falls entirely within a single processing batch.
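One way to see why this matters for analysis: when BCR batch is nested in processing batch, a linear model that includes both as fixed effects has a rank-deficient design matrix, so the two effects cannot be estimated separately. The sketch below rebuilds the two factors from the counts in the table above; the object names (`counts`, `batch`, `bcr`, `X`) are illustrative and not part of the original analysis.

```r
# Non-zero cells of the processing batch vs BCR table above.
counts <- c(43, 9, 66, 36, 13, 47, 31, 2)
batch  <- factor(rep(c("1.1.0", "1.1.0", "2.1.0", "3.1.0",
                       "4.1.0", "5.1.0", "6.1.0", "7.1.0"), counts))
bcr    <- factor(rep(c("1407", "1551", "1721", "1772",
                       "A153", "1837", "1926", "A17Z"), counts))

# Design matrix with both factors as additive fixed effects.
X <- model.matrix(~ batch + bcr)

# Compare the number of columns with the numerical rank.
designRank <- qr(X)$rank
print(c(columns = ncol(X), rank = designRank))
```

The rank is lower than the number of columns because processing batch is fully determined by BCR batch, so adjusting for one leaves nothing independent to estimate for the other.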
Slide vs BCR barcode:
> table(sdrfUnique$SamplePlate,sdrfUnique$BCR)
1407 1551 1721 1772 1837 1926 A153 A17Z
5775041007 7 0 0 0 0 0 0 0
5775041065 12 0 0 0 0 0 0 0
5775041068 0 9 0 0 0 0 0 0
5775041084 12 0 0 0 0 0 0 0
5775041088 12 0 0 0 0 0 0 0
6042308158 0 0 12 0 0 0 0 0
6042308159 0 0 12 0 0 0 0 0
6042308165 0 0 12 0 0 0 0 0
6042316001 0 0 12 0 0 0 0 0
6042316008 0 0 0 0 0 0 10 0
6042316010 0 0 0 0 0 0 3 0
6042316011 0 0 6 0 0 0 0 0
6042316015 0 0 12 0 0 0 0 0
6057825002 0 0 0 9 0 0 0 0
6057825020 0 0 0 12 0 0 0 0
6057825028 0 0 0 12 0 0 0 0
6057825035 0 0 0 3 0 0 0 0
6264496026 0 0 0 0 0 1 0 0
6264509018 0 0 0 0 0 12 0 0
6264509087 0 0 0 0 0 12 0 0
6264509127 0 0 0 0 0 6 0 0
6929718053 0 0 0 0 12 0 0 0
6929718054 0 0 0 0 12 0 0 0
6929718065 0 0 0 0 12 0 0 0
6929718079 0 0 0 0 11 0 0 0
6929718086 0 0 0 0 0 0 0 2
Array vs BCR barcode:
> table(sdrfUnique$SampleWell,sdrfUnique$BCR)
1407 1551 1721 1772 1837 1926 A153 A17Z
R01C01 4 1 6 3 4 3 1 0
R01C02 4 1 5 3 4 2 1 1
R02C01 4 1 6 3 4 3 1 0
R02C02 3 1 5 3 4 2 1 1
R03C01 4 1 6 3 4 3 1 0
R03C02 3 1 5 3 4 2 1 0
R04C01 4 1 6 3 4 3 1 0
R04C02 3 0 5 3 4 2 2 0
R05C01 4 1 6 3 4 3 1 0
R05C02 3 0 5 3 4 2 1 0
R06C01 4 1 6 3 4 3 1 0
R06C02 3 0 5 3 3 3 1 0
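Unlike slide and processing batch, array position looks evenly spread across BCR batches. As a rough check, one could run a test of independence on this table. The sketch below re-enters the counts shown above and uses a Monte Carlo p-value because many cells are small; the choice of test, the seed, and the number of replicates `B` are assumptions for illustration, not part of the original analysis.

```r
# Array position vs BCR counts, re-entered from the table above.
slots <- sprintf("R%02dC%02d", rep(1:6, each = 2), rep(1:2, times = 6))
bcr   <- c("1407", "1551", "1721", "1772", "1837", "1926", "A153", "A17Z")

tab <- matrix(c(
  4, 1, 6, 3, 4, 3, 1, 0,
  4, 1, 5, 3, 4, 2, 1, 1,
  4, 1, 6, 3, 4, 3, 1, 0,
  3, 1, 5, 3, 4, 2, 1, 1,
  4, 1, 6, 3, 4, 3, 1, 0,
  3, 1, 5, 3, 4, 2, 1, 0,
  4, 1, 6, 3, 4, 3, 1, 0,
  3, 0, 5, 3, 4, 2, 2, 0,
  4, 1, 6, 3, 4, 3, 1, 0,
  3, 0, 5, 3, 4, 2, 1, 0,
  4, 1, 6, 3, 4, 3, 1, 0,
  3, 0, 5, 3, 3, 3, 1, 0
), nrow = 12, byrow = TRUE, dimnames = list(slot = slots, BCR = bcr))

# Chi-squared test of independence with a simulated p-value,
# since the usual approximation is unreliable with sparse cells.
set.seed(1)
res <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
print(res$p.value)
```

A large p-value here would be consistent with array position not being confounded with BCR batch, although it does not show that position effects are absent.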