Important update (January 20th, 2011): the data below have been corrected for the BCR batch, which is not necessarily the processing batch. The dataset needs to be reanalyzed.
Batch vs clinical traits
Batch vs center:
{code}
> table(batchID,center)
       center
batchID B7 BR CD CG D7 EQ F1
   1129  0 31  0  0  0  0  0
   1156  0 12  0 23  0  0  0
   1601  2  0  7 16  3  1  0
   1801  0  9  0  3 10  0  1
   1883  0 11  0  0  5  0  1
{code}
Most significant correlations (complete list can be found ...)
{csv}STAD,DataType,NumberOfNAs,Test,Pvalue
residual_tumor,factor,8,Pearson's Chi-squared test,8.06E-17
year_of_initial_pathologic_diagnosis,integer,1,Kruskal-Wallis rank sum test,2.88E-13
days_to_form_completion,integer,1,Kruskal-Wallis rank sum test,3.72E-11
days_to_last_followup,integer,1,Kruskal-Wallis rank sum test,5.15E-11
primary_tumor_pathologic_spread,factor,1,Pearson's Chi-squared test,1.68E-06
histological_type,factor,6,Pearson's Chi-squared test,5.82E-06
lymphnode_pathologic_spread,factor,1,Pearson's Chi-squared test,6.09E-05
number_of_lymphnodes_examined,integer,53,Kruskal-Wallis rank sum test,1.25E-04
vital_status,factor,1,Pearson's Chi-squared test,2.95E-03
tumor_stage,factor,31,Pearson's Chi-squared test,3.49E-02{csv}
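The list above pairs each trait's data type with a test: Pearson's Chi-squared test for factors and the Kruskal-Wallis rank sum test for integers. A minimal sketch of how that choice could be automated is shown here. The helper `batch_trait_test` and the toy vectors are hypothetical and are not taken from the original analysis.

```r
# Hypothetical helper: pick the association test from the trait's data type.
batch_trait_test <- function(trait, batch) {
  keep  <- !is.na(trait) & !is.na(batch)      # samples usable for the test
  batch <- droplevels(factor(batch[keep]))
  if (is.factor(trait) || is.character(trait)) {
    # Categorical trait: Pearson's Chi-squared test on the contingency table.
    # chisq.test() warns when expected counts are small; silenced for the toy data.
    tab <- table(droplevels(factor(trait[keep])), batch)
    res <- suppressWarnings(chisq.test(tab))
  } else {
    # Numeric trait: Kruskal-Wallis rank sum test across batches.
    res <- kruskal.test(trait[keep], batch)
  }
  data.frame(NumberOfNAs = sum(!keep), Test = res$method, Pvalue = res$p.value)
}

# Toy data (assumed, for illustration only): three batches of six samples.
batch    <- factor(rep(c("1129", "1156", "1601"), each = 6))
residual <- factor(c("R0", "R0", "R0", "R0", "R0", "R1",
                     "R1", "R1", "R1", "R1", "R0", NA,
                     "R0", "R1", "R0", "R1", "R0", "R1"))
year     <- c(2005, 2006, 2005, 2006, 2005, 2006,
              2008, 2009, 2008, 2009, 2008, 2009,
              2007, NA,   2007, 2007, 2008, 2006)

res_factor  <- batch_trait_test(residual, batch)
res_integer <- batch_trait_test(year, batch)
print(rbind(residual_tumor = res_factor, year = res_integer))
```

With many traits tested at once, the raw p-values would normally be adjusted for multiple testing (for example with `p.adjust`) before being ranked.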
Batch vs survival
{code}
Call:
coxph(formula = survivalObject ~ batchVector)

n= 134, number of events= 14

                     coef exp(coef)  se(coef)     z Pr(>|z|)
batchVector1156 1.722e+01 3.012e+07 7.152e+03 0.002    0.998
batchVector1601 1.786e+01 5.728e+07 7.152e+03 0.002    0.998
batchVector1801 1.663e+01 1.665e+07 7.152e+03 0.002    0.998
batchVector1883        NA        NA 0.000e+00    NA       NA

                exp(coef) exp(-coef) lower .95 upper .95
batchVector1156  30117568  3.320e-08         0       Inf
batchVector1601  57279474  1.746e-08         0       Inf
batchVector1801  16645384  6.008e-08         0       Inf
batchVector1883        NA         NA        NA        NA

Rsquare= 0.026 (max possible= 0.496 )
Likelihood ratio test= 3.58 on 3 df, p=0.3109
Wald test = 2.13 on 3 df, p=0.545
Score (logrank) test = 3.1 on 3 df, p=0.3762

Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights, :
Loglik converged before variable 1,2,3 ; beta may be infinite.
2: In coxph(survivalObject ~ batchVector) :
X matrix deemed to be singular; variable 4
{code}
No significant correlation with survival (likelihood ratio test p = 0.31, score test p = 0.38). The coefficients of the first three batches did not converge (see the first warning), so only the overall tests are informative. For some reason I got NAs and a warning for the last batch, although it is definitely not because of unused factor levels.
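One possible explanation for coefficients near 17 with standard errors in the thousands is that the reference batch (1129) contains no events, so every other batch has an infinite hazard ratio relative to it. This is an assumption, not something the output above confirms, and it does not by itself explain the NA for batch 1883. A quick check is to cross-tabulate events by batch; the toy vectors below are hypothetical.

```r
# Toy data (assumed): the reference batch "1129" has no events at all.
batchVector <- factor(c(rep("1129", 5), rep("1156", 5), rep("1601", 5)))
event <- c(0, 0, 0, 0, 0,
           1, 0, 0, 1, 0,
           0, 1, 0, 0, 0)

# Events per batch: a row with zero events cannot support a finite coefficient.
events_by_batch <- table(batch = batchVector, event = event)
print(events_by_batch)

no_event_batches <- rownames(events_by_batch)[events_by_batch[, "1"] == 0]

# If the reference level has no events, switch to a level that has some,
# so that the remaining contrasts are estimated against a usable baseline.
if (levels(batchVector)[1] %in% no_event_batches) {
  with_events <- setdiff(levels(batchVector), no_event_batches)
  batchVector <- relevel(batchVector, ref = with_events[1])
}
print(levels(batchVector))
```

Releveling only moves the problem: the coefficient of the zero-event batch itself still diverges, but the contrasts among the other batches become interpretable.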
DNA methylation
27k arrays, 66 patients. M values were created without splitting between the red and green channels, followed by SVD.
Summary of the technical variables:
{code}
> summary(methS)
 batchID       amount      concentration plate_column   plate_row
 1129:31   16.9 uL: 1   0.13 ug/uL: 6    1:16         A      :10
 1156:35   26.7 uL:65   0.14 ug/uL:27    2:13         C      : 9
                        0.15 ug/uL:25    3:13         D      : 9
                        0.16 ug/uL: 7    4:10         F      : 9
                        0.17 ug/uL: 1    5: 9         B      : 8
                                         6: 5         E      : 8
                                                      (Other):13
      shortDay
 21-7-2010:31
 28-7-2010:35
{code}
So this dataset has only 2 batches, each processed on its own day, which means batchID and shortDay carry the same information. Let's see if they have any correlation with the principal components:
{code}
> x
        batchID    amount concentration plate_column  plate_row     shortDay
V1 4.999780e-01 0.8132652     0.9636092    0.2126458 0.41035836 4.999780e-01
V2 1.080231e-07 0.1214957     0.8025371    0.2954381 0.91858389 1.080231e-07
V3 6.028215e-01 0.4465735     0.9897603    0.5199681 0.07110241 6.028215e-01
V4 7.947106e-02 0.2818850     0.3579813    0.8230956 0.52338954 7.947106e-02
V5 1.125719e-01 0.9790610     0.5150996    0.3113563 0.29650943 1.125719e-01
V6 5.502164e-01 0.4465735     0.3134523    0.3787485 0.50090145 5.502164e-01
V7 7.922533e-01 0.5117243     0.6591395    0.4459644 0.76917348 7.922533e-01
V8 9.614704e-02 0.1488680     0.2382575    0.3455933 0.94015824 9.614704e-02
{code}
It looks like the second PC is highly correlated with the batch (1.08e-07 for V2), and so are the 4th and 8th PCs to a much lesser extent (about 0.08 and 0.10). The second PC explains 10% of the data variance.
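The steps so far (M values, SVD, then testing each principal component against a technical variable) can be sketched as follows. The simulated intensities, the offset of 1 in the M value and the choice of the Kruskal-Wallis test are assumptions; the page does not state which test produced the matrix above.

```r
set.seed(1)

# Simulated methylated / unmethylated intensities (assumed toy data).
n_probes  <- 200
n_samples <- 12
meth   <- matrix(runif(n_probes * n_samples, 1000, 5000), n_probes)
unmeth <- matrix(runif(n_probes * n_samples, 1000, 5000), n_probes)
batch  <- factor(rep(c("1129", "1156"), each = 6))

# M value: log2 ratio of methylated to unmethylated intensity.
# The offset of 1 guards against division by zero (an assumed convention).
M <- log2((meth + 1) / (unmeth + 1))

# Inject an artificial batch shift so there is something to detect.
M[, batch == "1156"] <- M[, batch == "1156"] + 2

# Center each probe, then decompose; columns of s$v are the sample-level PCs.
centered      <- M - rowMeans(M)
s             <- svd(centered)
var_explained <- s$d^2 / sum(s$d^2)

# Test the first few PCs against the batch factor.
pc_pvalues <- apply(s$v[, 1:4], 2, function(pc) kruskal.test(pc, batch)$p.value)
names(pc_pvalues) <- paste0("V", 1:4)

print(round(var_explained[1:4], 3))
print(signif(pc_pvalues, 3))
```

The same loop can be repeated for each technical variable (amount, concentration, plate position) to fill one column of the matrix per variable.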
Remove the batch:
{code}
> x
     batchID    amount concentration plate_column plate_row  shortDay
V1 0.9538949 0.7329525     0.9668135    0.1956406 0.3925206 0.9538949
V2 0.6951568 0.1346448     0.7778342    0.6589222 0.1054539 0.6951568
V3 0.8522117 0.1642106     0.3273640    0.7278436 0.7377284 0.8522117
V4 0.9436648 0.8132652     0.2334584    0.4411353 0.9901676 0.9436648
V5 0.9743762 0.8955925     0.3016907    0.8663039 0.2159179 0.9743762
V6 0.9130370 0.4158556     0.3873149    0.5267212 0.4462888 0.9130370
V7 0.4145840 0.5815169     0.3605256    0.4940810 0.6986479 0.4145840
V8 0.9028540 0.1214957     0.5218528    0.3929218 0.5285360 0.9028540
{code}
Removing the batch took care of all the other correlations.
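The page does not say how the batch was removed. One simple option, shown here purely as an assumed illustration, is to center every probe within each batch and add the overall probe mean back, which is equivalent to taking the residuals of a per-probe linear model on batch. The function name `remove_batch` and the toy matrix are hypothetical.

```r
# Hypothetical helper: per-probe mean centering within each batch.
remove_batch <- function(M, batch) {
  batch    <- droplevels(factor(batch))
  adjusted <- M
  for (b in levels(batch)) {
    cols <- batch == b
    # Subtract the batch-specific probe means from the samples of this batch.
    adjusted[, cols] <- M[, cols, drop = FALSE] -
      rowMeans(M[, cols, drop = FALSE])
  }
  # Restore the overall probe means so values stay on the original scale.
  adjusted + rowMeans(M)
}

# Toy M-value matrix with an artificial shift in the second batch (assumed data).
set.seed(2)
batch <- factor(rep(c("1129", "1156"), each = 5))
M     <- matrix(rnorm(50 * 10), nrow = 50)
M[, batch == "1156"] <- M[, batch == "1156"] + 1.5

adj <- remove_batch(M, batch)

# After adjustment both batches share the same per-probe mean.
print(max(abs(rowMeans(adj[, batch == "1129"]) - rowMeans(adj[, batch == "1156"]))))
```

This kind of adjustment also removes any biological signal that is confounded with batch, which matters here because batch is associated with histological type and residual tumor.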
I was also wondering about the correlation of batch with the clinical traits in this smaller dataset (actual DNA methylation data, not potential). Correlation of batch and histological type: p = 0.001488 (Chi-square test) and p = 3.0e-05 (Fisher test); residual tumor: p = 7.465e-07 (Chi-square test) and p = 6.536e-09 (Fisher test). There wasn't any significant correlation with tumor grade. With tumor stage: p = 0.04773 (Chi-square test) and p = 0.009894 (Fisher test).
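The paired Chi-square and Fisher p-values above can be obtained from a single contingency table, as in this minimal sketch. The counts are invented for illustration; only the batch sizes (31 and 35) follow the summary earlier on this page.

```r
# Hypothetical batch-by-trait contingency table (counts are made up).
tab <- matrix(c(20,  5,  6,
                 8, 12, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(batchID = c("1129", "1156"),
                              histological_type = c("typeA", "typeB", "typeC")))

# Pearson's Chi-squared test relies on a large-sample approximation.
chi <- chisq.test(tab)

# Fisher's exact test avoids that approximation and is usually preferred
# when some expected cell counts are small.
fis <- fisher.test(tab)

print(chi$expected)                      # inspect expected counts per cell
print(c(chisq = chi$p.value, fisher = fis$p.value))
```

When the two p-values differ by orders of magnitude, as for histological type above, sparse cells are a likely cause and the Fisher result is generally the safer one to report.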
Consider the data to be normalized.
Expression set object is available.