Important update (January 20th, 2011): the data below have been corrected for the BCR batch, which is not necessarily the processing batch. The dataset needs to be reanalyzed.

Correlation between BCR batch and the processing batch for 27k arrays (January 20, 2012):

{csv}Batch on the download page,"# after ""HumanMethylation27k"" in the file name, Level 2 data",Batch as the sixth field in the patient barcode,Comments
Batch 32,1,0859,"Level 1 data is uploaded again as .idat files split into green and red probes; I can't figure out how to get the batch from the file names. Now, however, they provide the slide number and the array letter!"
Batch 50,2,1186,
Batch 63,no data,,
Batch 64,3,"1287, 1284",
Batch 65,4,1303,
Batch 68,no data,,
Batch 69,5,1332,
Batch 70,no data,,
Batch 82,no data,,
Batch 90,no data,,
Batch 105,no data,,{csv}

Batch vs clinical traits

Clinical traits: 36, number of batches: 13

Batch vs center:

{code:collapse=true}
> table(batchID,two)
       two
batchID A3 AK AS B0 B2 B4 B8 BP CJ CW CZ DV EU
   0859 31  8  2  0  0  0  0  0  0  0  0  0  0
   1186  4  6  0  0  5  0  6  5  9  0  0  0  0
   1275  0 12  0 29  1  0  1  0  0  0  0  0  0
   1284  0  0  0  0  0  0  0 50  0  0  0  0  0
   1303  0  0  0  6  0  0  0 11 24  0  6  0  0
   1323 18  7  0  0  4  0  3  5  9  0  0  0  0
   1332  0  0  0  6  0  0  0 39  2  0  0  0  0
   1418  6  0  0 27  0  0  6  8  0  0  0  0  0
   1424  0  0  0  0  0  0  0 28 16  0  3  0  0
   1500  0  1  0 15  0  2  1  1  0  0 24  0  0
   1536  2  0  0 18  5  0  5  0 13  9  0  9  0
   1551  0  0  0  0  0  0  3  0  0  0  0  0  0
   1670  0  0  0  6  0  7  4  0  7  6  7  0  4{code}
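
A minimal sketch of how `batchID` and `two` could be derived from the barcodes. It assumes the barcode is a dash-separated string, that the batch is its sixth field (as stated in the table above), and that `two` is its second field (an assumption based on the variable name). The barcodes below are made up for illustration.

```r
# Hypothetical barcodes in a dash-separated layout (made up for illustration)
barcodes <- c("TCGA-A3-0001-01A-01D-0859-05",
              "TCGA-AK-0002-01A-01D-0859-05",
              "TCGA-BP-0003-01A-01D-1284-05")

# Split on "-" and stack the fields into a character matrix, one row per barcode
fields <- do.call(rbind, strsplit(barcodes, "-", fixed = TRUE))

two     <- fields[, 2]  # second field, assumed to be the center code
batchID <- fields[, 6]  # sixth field: batch

# Cross-tabulate batch against center
table(batchID, two)
```

Keeping both variables as character strings preserves the leading zero in batch IDs such as 0859.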

Significant batch/trait correlations (the complete table can be found here):

{csv}KIRC_clinical_traits,DataType,NumberOfNAs,Test,Pvalue
white_cell_count_result,factor,82,Pearson's Chi-squared test,2.09E-13
serum_calcium_result,factor,160,Pearson's Chi-squared test,8.31E-13
tumor_stage,factor,21,Pearson's Chi-squared test,2.11E-11
tumor_grade,factor,5,Pearson's Chi-squared test,6.43E-09
vital_status,factor,0,Pearson's Chi-squared test,9.62E-09
days_to_form_completion,integer,0,Kruskal-Wallis rank sum test,1.16E-07
year_of_initial_pathologic_diagnosis,integer,0,Kruskal-Wallis rank sum test,1.38E-07
days_to_last_known_alive,integer,10,Kruskal-Wallis rank sum test,8.41E-07
days_to_last_followup,integer,4,Kruskal-Wallis rank sum test,1.94E-06
distant_metastasis_pathologic_spread,factor,11,Pearson's Chi-squared test,2.23E-06
primary_tumor_pathologic_spread,factor,0,Pearson's Chi-squared test,3.63E-06
person_neoplasm_cancer_status,factor,28,Pearson's Chi-squared test,4.26E-06
hemoglobin_result,factor,71,Pearson's Chi-squared test,2.66E-04
lymphnode_pathologic_spread,factor,2,Pearson's Chi-squared test,7.85E-04
lymphnodes_examined_prior_presentation,factor,43,Pearson's Chi-squared test,2.05E-03
gender,factor,0,Pearson's Chi-squared test,2.10E-02
age_at_initial_pathologic_diagnosis,integer,0,Kruskal-Wallis rank sum test,2.51E-02
days_to_birth,integer,8,Kruskal-Wallis rank sum test,2.87E-02
prior_diagnosis,factor,0,Pearson's Chi-squared test,4.75E-02{csv}
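
The table pairs each data type with a test: Pearson's chi-squared test for factors and the Kruskal-Wallis rank sum test for integers. A minimal sketch of that choice follows; the function name `batch_trait_pvalue` and the toy data are hypothetical, and the handling of missing values (dropping them before the test) is an assumption.

```r
# Pick the test by data type:
# factor -> Pearson's chi-squared test, otherwise -> Kruskal-Wallis rank sum test.
batch_trait_pvalue <- function(trait, batch) {
  keep  <- !is.na(trait)                    # assumption: drop samples with a missing trait
  trait <- trait[keep]
  batch <- droplevels(factor(batch[keep]))
  if (is.factor(trait)) {
    # small expected counts trigger a warning; silenced here for the sketch
    res <- suppressWarnings(chisq.test(table(droplevels(trait), batch)))
  } else {
    res <- kruskal.test(trait, batch)
  }
  c(NumberOfNAs = sum(!keep), Pvalue = unname(res$p.value))
}

# Toy data, made up for illustration only
set.seed(1)
batch  <- rep(c("0859", "1186", "1284"), each = 20)
gender <- factor(sample(c("FEMALE", "MALE"), 60, replace = TRUE))
age    <- c(sample(40:80, 58, replace = TRUE), NA, NA)

res_gender <- batch_trait_pvalue(gender, batch)
res_age    <- batch_trait_pvalue(age, batch)
```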

Survival vs Batch

[Image] [Image]
A summary can be found here; batch is significantly correlated with survival:
Likelihood ratio test = 61.35 on 10 df, p=2.007e-09
Wald test = 64.35 on 10 df, p=5.39e-10
Score (logrank) test = 75.35 on 10 df, p=4.066e-12
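
The three statistics above have the shape of a Cox model summary. A minimal sketch of how such numbers can be obtained, assuming a Cox proportional hazards model with batch as the only covariate and the `survival` package; the data are simulated and the variable names are hypothetical.

```r
library(survival)

# Toy data, made up for illustration only: follow-up time, event indicator, batch
set.seed(1)
n      <- 150
batch  <- factor(sample(c("0859", "1186", "1284"), n, replace = TRUE))
time   <- rexp(n, rate = ifelse(batch == "0859", 0.02, 0.01))
status <- rbinom(n, 1, 0.7)

# Cox proportional hazards model with batch as the only covariate
fit <- coxph(Surv(time, status) ~ batch)
s   <- summary(fit)

s$logtest   # likelihood ratio test: statistic, df, p-value
s$waldtest  # Wald test
s$sctest    # score (logrank) test
```

With k batch levels in the model each test has k - 1 degrees of freedom.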

DNA methylation data analysis

27k dataset, downloaded on December 28, 2011. 219 samples.

Note: TCGA is terrible about their standards. I am extracting the values for methylated and unmethylated probes from the files for each patient. For this dataset they are in the 1st and 4th columns; for GBM, however, they are in the 1st and 2nd columns! Unreliable. It seems that the GBM data were processed differently, because the standard deviation and the number of beads are missing for GBM. However, I noticed that they actually provide the negative control intensities for the green and red dyes.

Technical variables available: batch, amount, concentration, day of shipment, month of shipment, year of shipment, plate row, plate column. Day, month and year are combined into a single variable (dateCombined). Info about the technical variables:

{code:collapse=true}
> head(methNew)
    batchID  amount concentration plate_column plate_row dateCombined
2      0859 26.7 uL    0.14 ug/uL            1         A    17-3-2010
32     0859 26.7 uL    0.17 ug/uL            1         C    17-3-2010
59     0859 26.7 uL    0.15 ug/uL            1         D    17-3-2010
84     0859 26.7 uL    0.15 ug/uL            1         E    17-3-2010
> table(methNew$batchID)

0859 1186 1284 1303 1332
  40   35   50   47   47
> table(methNew$amount)

26.7 uL
    219
> table(methNew$concentration)

0.13 ug/uL 0.14 ug/uL 0.15 ug/uL 0.16 ug/uL 0.17 ug/uL
         7         50        122         30         10
> table(methNew$plate_column)

 1  2  3  4  5  6  7
39 40 40 40 35 23  2
> table(methNew$plate_row)

 A  B  C  D  E  F  G  H
30 28 28 27 27 27 27 25
> table(methNew$plate_column,methNew$plate_row)

    A B C D E F G H
  1 5 4 5 5 5 5 5 5
  2 5 5 5 5 5 5 5 5
  3 5 5 5 5 5 5 5 5
  4 5 5 5 5 5 5 5 5
  5 5 5 5 4 4 4 4 4
  6 4 3 3 3 3 3 3 1
  7 1 1 0 0 0 0 0 0
> table(methNew$dateCombined)

11-10-2010  17-3-2010  25-8-2010  27-9-2010  6-10-2010
        47         40         35         50         47
> table(methNew$dateCombined,methNew$batchID)

             0859 1186 1284 1303 1332
  11-10-2010    0    0    0    0   47
  17-3-2010    40    0    0    0    0
  25-8-2010     0   35    0    0    0
  27-9-2010     0    0   50    0    0
  6-10-2010     0    0    0   47    0{code}
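
A minimal sketch of how day, month and year of shipment can be combined into `dateCombined`. The three input column names are hypothetical; the output follows the "17-3-2010" style shown in the tables above.

```r
# Hypothetical column names for the three shipment fields
ship <- data.frame(day_of_shipment   = c(17, 25, 27),
                   month_of_shipment = c(3, 8, 9),
                   year_of_shipment  = c(2010, 2010, 2010))

# Day-month-year string, e.g. "17-3-2010"
ship$dateCombined <- with(ship, paste(day_of_shipment, month_of_shipment,
                                      year_of_shipment, sep = "-"))

# Optional: a real Date, so that the values sort chronologically
ship$dateParsed <- as.Date(ship$dateCombined, format = "%d-%m-%Y")
```

In this dataset every batch has exactly one shipment date (see the last table above), so `dateCombined` and `batchID` carry the same information.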

Exclude "amount" from the correlations of the first principal components of the data with the technical variables: it is the same (26.7 uL) for all 219 samples.

Created a matrix of M values; didn't split red and green. Relative variance (no normalization) and the outliers:

[Image: relative variance] [Image: outliers]
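
A minimal sketch of the two quantities mentioned here, under stated assumptions: the M value is taken as the log2 ratio of methylated to unmethylated intensity with a small offset (the offset of 1 is an assumption), and "relative variance" is read as the share of total variance carried by each SVD component. The intensities are simulated.

```r
# Toy intensities, made up for illustration: probes in rows, samples in columns
set.seed(1)
meth   <- matrix(rexp(200 * 12, rate = 1 / 2000), nrow = 200)
unmeth <- matrix(rexp(200 * 12, rate = 1 / 2000), nrow = 200)

# M value: log2 ratio of methylated to unmethylated intensity.
# The offset is an assumption; it only guards against log2(0).
offset <- 1
M <- log2((meth + offset) / (unmeth + offset))

# Center each probe, then take the SVD of the probes-by-samples matrix
Mc <- M - rowMeans(M)
sv <- svd(Mc)

# Relative variance: share of the total variance carried by each component
relVar <- sv$d^2 / sum(sv$d^2)
round(relVar[1:8], 3)
```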

Based on the plot, I will look at the first 8 principal components:

{code:collapse=true}
batchID concentration plate_column plate_row dateCombined
V1 2.024556e-22     0.5182919   0.22249235 0.9371285 2.024556e-22
V2 1.777673e-18     0.2878497   0.40175378 0.6195123 1.777673e-18
V3 3.196508e-01     0.3802798   0.27628233 0.5517096 3.196508e-01
V4 1.693859e-30     0.2449447   0.50367703 0.9672545 1.693859e-30
V5 2.435091e-03     0.1812444   0.08644977 0.5581507 2.435091e-03
V6 4.437547e-03     0.9473683   0.15938639 0.8458098 4.437547e-03
V7 1.271181e-03     0.3644802   0.79816984 0.7038321 1.271181e-03
V8 1.051940e-05     0.5905213   0.28713862 0.2173504 1.051940e-05{code}
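
The table above does not say which test produced its values. A minimal sketch of one way to build such a table, assuming a Kruskal-Wallis test of each principal component against each technical variable; the data are simulated with a deliberate batch shift.

```r
# Toy data, made up for illustration: 200 probes x 40 samples with a batch shift
set.seed(1)
tech <- data.frame(batchID   = rep(c("0859", "1186"), each = 20),
                   plate_row = sample(LETTERS[1:8], 40, replace = TRUE))
M <- matrix(rnorm(200 * 40), nrow = 200)
M[, tech$batchID == "1186"] <- M[, tech$batchID == "1186"] + 1

# Sample-level principal components: right singular vectors of the centered matrix
pcs <- svd(M - rowMeans(M))$v[, 1:8]
colnames(pcs) <- paste0("V", 1:8)

# One p-value per (component, technical variable) pair
# Assumption: Kruskal-Wallis test, treating every technical variable as a factor
pvals <- sapply(tech, function(v)
  apply(pcs, 2, function(pc) kruskal.test(pc, factor(v))$p.value))
pvals
```

Rows are components and columns are technical variables, matching the layout of the table above.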

Batch and dateCombined are highly correlated with the first principal components (V1 - V8 are the principal components after performing an SVD on the unnormalized matrix; the table entries are p-values). The batchID and dateCombined columns are identical because every batch has a single shipment date.

Start by removing the batch. Relative variance and the outliers after removing the batch:

[Image: relative variance after removing the batch] [Image: outliers after removing the batch]
Yikes.
Correlation with the first principal components:

{code:collapse=true}
batchID concentration plate_column plate_row dateCombined
V1 0.9717423     0.8262431   0.18591881 0.8304766    0.9717423
V2 0.9976239     0.4612353   0.34203646 0.3816463    0.9976239
V3 0.9578584     0.9056604   0.12948457 0.1792408    0.9578584
V4 0.9043202     0.4152433   0.02150515 0.6264030    0.9043202
V5 0.9991262     0.8505841   0.19052765 0.6834312    0.9991262
V6 0.8956311     0.1123490   0.55257726 0.7618414    0.8956311
V7 0.9991696     0.7699433   0.84761783 0.2805982    0.9991696
V8 0.9939025     0.6395495   0.44489016 0.6334089    0.9939025{code}
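
How the batch was removed is not stated here. A minimal sketch of one simple option, assuming a per-probe linear model on batch whose residuals are kept; this is an illustration on simulated data, not necessarily the method used for the table above.

```r
# Toy data, made up for illustration: 200 probes x 40 samples with a batch shift
set.seed(1)
batch <- factor(rep(c("0859", "1186"), each = 20))
M <- matrix(rnorm(200 * 40), nrow = 200)
M[, batch == "1186"] <- M[, batch == "1186"] + 1

# Regress every probe on batch and keep the residuals.
# lm() accepts a matrix response, so t(M) (samples x probes) fits all probes at once.
fit    <- lm(t(M) ~ batch)
Mclean <- t(residuals(fit))

# Every probe now has zero mean within each batch, and so does every component
pc1 <- svd(Mclean - rowMeans(Mclean))$v[, 1]
tapply(pc1, batch, mean)
```

Any biological signal that is confounded with batch is removed along with the batch effect, which matters here because batch is correlated with survival and several clinical traits.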

Consider the data to be normalized.

eSet object is available.