Important update (January 20th, 2011): the data below have been corrected for the BCR batch, which is not necessarily the processing batch. The dataset needs to be reanalyzed.

 

Correlation between BCR batch and the processing batch for 27k arrays (January 20, 2012)

{csv}Batch on the download page,"# after ""HumanMethylation27k"" in the file name, Level 1",Batch as the sixth field in the patient barcode
Batch 23,1,0689
Batch 31,2,0848
Batch 39,3,0979
Batch 53,4,1096
Batch 60,5,1198
Batch 77,no data,
Batch 101,no data,
Batch 140,no data,
Batch 159,no data,
Batch 181,no data,
Batch 193,no data,{csv}

Batch vs clinical traits

Number of batches is 12. Correlation between batch and center:

{code:collapse=true}
table(two,batchID)
    batchID
two  0689 0848 0979 1096 1198 1440 1551 1633 1818 1871 1947 2043
  18    0    0   12    2    0    2    0    2    0    0    0    0
  21   13    0    0    0    0    0    0    4    0    0    0    0
  22    9    0    0    0    6    3    0   12    2    0    2    0
  33    0    0    0    0    4    5    0    0    1    0    1    0
  34    0    6    0    0    0    2    0    1    7    1    1    0
  37    0    0    2    6    1    0    0    1    0    0    0    0
  39    0    0    0    0    0   12    0    0    4    0    0    0
  43    0    3    2    0    0    0    2    1    4    0    1    0
  46    0    0    5    0    0    0    0    0    2    0    0    0
  51    0    0    0    3    0    0    0    0    0    0    0    1
  56    1    0    0    0    0    0    0    2    2    0    0    6
  60    0   20    0    0    1    0    0    0    1    2    0    2
  63    0    0    0    0    0    2    0    0    1    0    4    0
  66    0   18   15    0    6    0    0    0    0    0    0    0
  70    0    0    0    0    0    0    0    0    2    0    0    0
  77    0    0    0    0    0    0    0    0    0    0    4   10
  79    0    0    0    0    0    0    0    0    0    0    1    0
  85    0    0    0    0    0    0    0    0    3    0    1    0
  90    0    0    0    0    0    0    0    0    0    0    1    0
  92    0    0    0    0    0    0    0    0    0    0    0    2
  94    0    0    0    0    0    0    0    0    0    0    1    0
  96    0    0    0    0    0    0    0    0    0    0    0    2
  98    0    0    0    0    0    0    0    0    0    0    0    1{code}
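A minimal sketch of how `two` and `batchID` could be pulled out of the barcodes, assuming `two` holds the second barcode field (the center code) and that the batch is the sixth field, as stated in the table at the top of the page. The barcodes below are made up for illustration; they are not taken from the dataset.

```r
# Hypothetical barcodes in TCGA format; the real ones come from the data files.
barcodes <- c(
  "TCGA-18-3406-01A-01D-0979-05",
  "TCGA-21-1070-01A-01D-0689-05",
  "TCGA-66-2727-01A-01D-0979-05",
  "TCGA-60-2698-01A-01D-0848-05"
)

# Split each barcode on "-" and collect the fields in a character matrix,
# one row per barcode.
fields <- do.call(rbind, strsplit(barcodes, "-", fixed = TRUE))

two     <- fields[, 2]  # second field: center code (assumed meaning of `two`)
batchID <- fields[, 6]  # sixth field: BCR batch, e.g. "0979"

# Cross-tabulate center against batch, as in the table above.
centerByBatch <- table(two, batchID)
print(centerByBatch)
```

Keeping both fields as character strings preserves the leading zero in batch IDs such as 0689.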

Significant batch/clinical traits correlations (complete list can be found here):


{csv}LUSC,DataType,NumberOfNAs,Test,Pvalue
tumor_stage,factor,27,Pearson's Chi-squared test,8.78E-14
year_of_initial_pathologic_diagnosis,integer,23,Kruskal-Wallis rank sum test,7.95E-12
days_to_form_completion,integer,30,Kruskal-Wallis rank sum test,1.48E-09
primary_tumor_pathologic_spread,factor,23,Pearson's Chi-squared test,1.96E-09
distant_metastasis_pathologic_spread,factor,29,Pearson's Chi-squared test,3.77E-05
days_to_last_followup,integer,42,Kruskal-Wallis rank sum test,7.68E-05
vital_status,factor,23,Pearson's Chi-squared test,2.37E-03
year_of_tobacco_smoking_onset,integer,116,Kruskal-Wallis rank sum test,3.12E-03
year_of_tobacco_smoking_cessation,integer,88,Kruskal-Wallis rank sum test,5.84E-03
days_to_last_known_alive,integer,75,Kruskal-Wallis rank sum test,7.37E-03
residual_tumor,factor,46,Pearson's Chi-squared test,2.00E-02
lymphnode_pathologic_spread,factor,23,Pearson's Chi-squared test,5.48E-02
age_at_initial_pathologic_diagnosis,integer,30,Kruskal-Wallis rank sum test,9.24E-02
days_to_birth,integer,30,Kruskal-Wallis rank sum test,9.73E-02{csv}
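A table like the one above could be produced roughly as follows: Pearson's chi-squared test for factor traits and the Kruskal-Wallis test for integer traits, each against batch. This is only a sketch on synthetic data; the data frame `clin`, the helper `testTrait` and the simulated values are assumptions, not the code actually used for this page.

```r
set.seed(1)  # synthetic data only, so the sketch is reproducible

n <- 120
batchID <- factor(sample(c("0689", "0848", "0979"), n, replace = TRUE))

# Hypothetical clinical table: one factor trait and one integer trait.
clin <- data.frame(
  tumor_stage = factor(sample(c("Stage I", "Stage II", "Stage III"), n, replace = TRUE)),
  year_of_initial_pathologic_diagnosis = sample(2000:2010, n, replace = TRUE)
)
clin$tumor_stage[sample(n, 5)] <- NA  # a few missing values

# Pick the test by data type and return one row of the summary table.
testTrait <- function(name, trait, batch) {
  if (is.factor(trait)) {
    # table() drops NAs; small expected counts may trigger a warning
    res <- suppressWarnings(chisq.test(table(trait, batch)))
  } else {
    res <- kruskal.test(x = trait, g = batch)
  }
  data.frame(Trait = name, DataType = class(trait)[1],
             NumberOfNAs = sum(is.na(trait)),
             Test = res$method, Pvalue = res$p.value)
}

results <- do.call(rbind, lapply(names(clin), function(nm) testTrait(nm, clin[[nm]], batchID)))
results <- results[order(results$Pvalue), ]
print(results)
```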

Batch vs survival

Again, for this type of cancer the clinical traits file contains days to last known alive, but it has more NAs than days to the last follow-up, so I will use the latter to construct the survival object.

[Image] [Image]

{code:collapse=true}
Call:
coxph(formula = survivalObject ~ batchVector)

  n= 223, number of events= 92

                    coef exp(coef) se(coef)      z Pr(>|z|)
batchVector0848 -0.14217   0.86748  0.37975 -0.374  0.70813
batchVector0979 -0.23685   0.78911  0.42661 -0.555  0.57877
batchVector1096  1.66699   5.29619  0.60925  2.736  0.00622 **
batchVector1198 -0.13837   0.87077  0.42245 -0.328  0.74325
batchVector1440 -0.25689   0.77345  0.38754 -0.663  0.50741
batchVector1633  0.27021   1.31025  0.37760  0.716  0.47423
batchVector1818 -0.21253   0.80853  0.46395 -0.458  0.64688
batchVector1947 -0.05172   0.94959  0.48598 -0.106  0.91524
batchVector2043  0.06161   1.06355  1.03684  0.059  0.95261
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                exp(coef) exp(-coef) lower .95 upper .95
batchVector0848    0.8675     1.1528    0.4121     1.826
batchVector0979    0.7891     1.2672    0.3420     1.821
batchVector1096    5.2962     0.1888    1.6046    17.481
batchVector1198    0.8708     1.1484    0.3805     1.993
batchVector1440    0.7735     1.2929    0.3619     1.653
batchVector1633    1.3102     0.7632    0.6251     2.746
batchVector1818    0.8085     1.2368    0.3257     2.007
batchVector1947    0.9496     1.0531    0.3663     2.462
batchVector2043    1.0636     0.9402    0.1394     8.116

Rsquare= 0.041   (max possible= 0.973 )
Likelihood ratio test= 9.36  on 9 df,   p=0.4044
Wald test            = 12.53  on 9 df,   p=0.1849
Score (logrank) test = 15.53  on 9 df,   p=0.07747
{code}
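For reference, a sketch of how a survival object and a Cox model like the one above might be set up. It assumes that death is the event, that `vital_status` is coded as "LIVING"/"DECEASED", and that patients without a follow-up time are dropped; none of this is confirmed by the page. The data are synthetic.

```r
library(survival)  # recommended package, ships with standard R distributions

set.seed(2)  # synthetic data only
n <- 100
clin <- data.frame(
  days_to_last_followup = sample(30:2000, n, replace = TRUE),
  vital_status = sample(c("LIVING", "DECEASED"), n, replace = TRUE),
  batchVector = factor(sample(c("0689", "0848", "0979"), n, replace = TRUE))
)
clin$days_to_last_followup[sample(n, 10)] <- NA

# Assumption: death is the event; patients without follow-up time are dropped.
keep <- !is.na(clin$days_to_last_followup)
survivalObject <- Surv(time = clin$days_to_last_followup[keep],
                       event = clin$vital_status[keep] == "DECEASED")
batchVector <- clin$batchVector[keep]

# Cox proportional hazards model with batch as the only covariate;
# the first factor level (0689) is the reference batch.
fit <- coxph(survivalObject ~ batchVector)
print(summary(fit))
```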

Overall, the correlation of batch with survival is not significant. One batch (1096) seems to be somewhat more involved, and it has only 11 patients. When I removed all patients from that batch, none of the other batches showed a significant correlation with survival.

 

DNA methylation

27k arrays (normal Level 1 format: methylated intensities are in the first column, unmethylated intensities are in the fourth column). I converted the intensities to M values and did not split them into the red and green channels. Of the 134 patients, 133 matched the technical clinical information; I work with those 133. SVD:

[Image] [Image] [Image]
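The conversion to M values and the SVD could be sketched as below. The M value is taken here as the log2 ratio of methylated to unmethylated intensity with a small offset; the offset of 1, the row centering before the SVD and the simulated intensities are assumptions, since the page does not give these details.

```r
set.seed(3)  # synthetic intensities only
nProbes <- 200
nSamples <- 10
meth   <- matrix(rgamma(nProbes * nSamples, shape = 2, scale = 1000), nProbes, nSamples)
unmeth <- matrix(rgamma(nProbes * nSamples, shape = 2, scale = 1000), nProbes, nSamples)

# M value: log2 ratio of methylated to unmethylated intensity.
# The offset (assumed to be 1 here) guards against taking the log of zero.
offset <- 1
mValue <- log2((meth + offset) / (unmeth + offset))

# Center each probe (row) before the decomposition.
centered <- mValue - rowMeans(mValue)
s <- svd(centered)

# Fraction of the total variance captured by each component.
relativeVariance <- s$d^2 / sum(s$d^2)
print(round(relativeVariance[1:6], 3))
```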

After fixing the inconsistencies in the concentration column (I had both 0.13 ug/uL and .13 ug/uL, and likewise for all other concentrations) and converting concentration and plate_column to factors, here is the summary of the technical variables:

{code:collapse=true}
> summary(tech)
 batchID       amount      concentration plate_column   plate_row
 0689:22   13.3 uL:97   0.13 ug/uL:11    1:37         A      :19
 0848:46   26.7 uL:36   0.14 ug/uL:35    2:35         B      :19
 0979:36                0.15 ug/uL:59    3:26         C      :18
 1096:11                0.16 ug/uL:23    4:11         D      :16
 1198:18                0.17 ug/uL: 5    5:16         G      :16
                                         6: 8         E      :15
                                                      (Other):30
       shortDay
 10-3-2010 :46
 14-7-2010 :11
 18-11-2009:22
 30-8-2010 :18
 5-5-2010  :36
{code}
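The clean-up of the concentration column described above might look like this. The toy `tech` data frame is hypothetical; only the two spellings (0.13 ug/uL and .13 ug/uL) come from the text.

```r
# Hypothetical raw values mixing the "0.13 ug/uL" and ".13 ug/uL" spellings.
tech <- data.frame(
  concentration = c("0.13 ug/uL", ".13 ug/uL", "0.14 ug/uL", ".14 ug/uL", "0.15 ug/uL"),
  plate_column = c(1, 2, 2, 3, 1),
  stringsAsFactors = FALSE
)

# Add the missing leading zero, then convert both columns to factors.
tech$concentration <- sub("^\\.", "0.", trimws(tech$concentration))
tech$concentration <- factor(tech$concentration)
tech$plate_column <- factor(tech$plate_column)

print(summary(tech))
```

Normalizing the strings before calling `factor()` keeps the two spellings from ending up as separate levels.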

Correlation of the first 6 PCs with the tech variables:

{code:collapse=true}
> x
        batchID       amount concentration plate_column plate_row     shortDay
V1 1.219915e-14 1.512302e-15    0.22695068  0.004764525 0.5279767 1.219915e-14
V2 1.906561e-01 4.091184e-01    0.96296059  0.292069252 0.1656324 1.906561e-01
V3 7.464626e-02 6.984062e-02    0.22476713  0.467718779 0.5705236 7.464626e-02
V4 2.670836e-01 7.040891e-01    0.78009461  0.638345228 0.2242532 2.670836e-01
V5 5.721394e-01 7.689742e-01    0.42172107  0.348612502 0.6017037 5.721394e-01
V6 2.122370e-13 9.075951e-02    0.07250977  0.015916359 0.7132870 2.122370e-13
{code}
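One way to get a matrix of this shape is to test the sample scores of each PC against each technical variable. The sketch below uses a Kruskal-Wallis test per (PC, variable) pair on synthetic data; the choice of test is an assumption, as the page does not say which one produced the values above.

```r
set.seed(4)  # synthetic data only
nSamples <- 60
tech <- data.frame(
  batchID = factor(sample(c("0689", "0848", "0979"), nSamples, replace = TRUE)),
  plate_row = factor(sample(LETTERS[1:4], nSamples, replace = TRUE))
)

# Synthetic M values (probes x samples) with a batch shift added to every probe.
mValue <- matrix(rnorm(300 * nSamples), 300, nSamples)
mValue <- sweep(mValue, 2, as.integer(tech$batchID), "+")

# Sample scores on the first 6 principal components.
s <- svd(mValue - rowMeans(mValue))
pcs <- s$v[, 1:6]
colnames(pcs) <- paste0("V", 1:6)

# One Kruskal-Wallis p-value per (PC, technical variable) pair.
x <- sapply(tech, function(v) apply(pcs, 2, function(pc) kruskal.test(pc, v)$p.value))
print(x)
```

In this toy example the first PC picks up the simulated batch shift, so its p-value against `batchID` is very small while `plate_row` is unrelated by construction.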

Start by removing the batch:
!LUSC_Mvalue_batchRemoved_dataDistrib.png|thumbnail! !LUSC_Mval_batchedRem_RelativeVariance.png|thumbnail! !LUSC_Mval_batchRemoved_PC1outliers.png|thumbnail!

This looks strange. I never looked at the data distribution after normalization. Is it ok?

Correlation with the technical variables:

{code:collapse=true}
> x
     batchID    amount concentration plate_column  plate_row  shortDay
V1 0.9882080 0.9838389     0.9665557   0.45086339 0.08627734 0.9882080
V2 0.9975700 0.8632969     0.7730067   0.60414536 0.57298681 0.9975700
V3 0.9998216 0.9394499     0.3675916   0.02054405 0.88934477 0.9998216
V4 0.9994017 0.9313953     0.8917591   0.77577312 0.15250595 0.9994017
V5 0.9993497 0.9959595     0.2796044   0.45763689 0.62038204 0.9993497
V6 0.9766655 0.6485570     0.8141023   0.62257929 0.25956284 0.9766655
{code}
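The page does not say how the batch was removed. One simple possibility, shown here purely as an illustration on synthetic data, is to regress each probe on batch and keep the residuals plus the probe's overall mean. This approach and all names in the sketch are assumptions.

```r
set.seed(5)  # synthetic data only
nSamples <- 60
batchID <- factor(sample(c("0689", "0848", "0979"), nSamples, replace = TRUE))

# Synthetic M values (probes x samples) with a batch-specific shift.
mValue <- matrix(rnorm(300 * nSamples), 300, nSamples)
mValue <- sweep(mValue, 2, as.integer(batchID), "+")

# Regress every probe on batch and keep the residuals plus the probe's overall mean.
# lm() accepts a matrix response and fits one model per column, hence the transposes.
fit <- lm(t(mValue) ~ batchID)
batchRemoved <- t(residuals(fit)) + rowMeans(mValue)

# After removal, the per-batch means of each probe coincide with its overall mean.
batchMeans <- sapply(levels(batchID), function(b) rowMeans(batchRemoved[, batchID == b]))
print(max(abs(batchMeans - rowMeans(batchRemoved))))
```

Because the fitted batch means are subtracted exactly, batch p-values close to 1 after removal, like those in the table above, are what this kind of adjustment would be expected to give.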

Removing batch fixed correlations with other technical variables. Consider data to be normalized?

ExpressionSet object is available.