| Wiki Markup |
|---|
h5. Batch vs clinical traits
Number of clinical traits: 36; number of batches: 13.
Batch vs center:
{code:collapse=true}> table(batchID,two)
       two
batchID A3 AK AS B0 B2 B4 B8 BP CJ CW CZ DV EU
  0859  31  8  2  0  0  0  0  0  0  0  0  0  0
  1186   4  6  0  0  5  0  6  5  9  0  0  0  0
  1275   0 12  0 29  1  0  1  0  0  0  0  0  0
  1284   0  0  0  0  0  0  0 50  0  0  0  0  0
  1303   0  0  0  6  0  0  0 11 24  0  6  0  0
  1323  18  7  0  0  4  0  3  5  9  0  0  0  0
  1332   0  0  0  6  0  0  0 39  2  0  0  0  0
  1418   6  0  0 27  0  0  6  8  0  0  0  0  0
  1424   0  0  0  0  0  0  0 28 16  0  3  0  0
  1500   0  1  0 15  0  2  1  1  0  0 24  0  0
  1536   2  0  0 18  5  0  5  0 13  9  0  9  0
  1551   0  0  0  0  0  0  3  0  0  0  0  0  0
  1670   0  0  0  6  0  7  4  0  7  6  7  0  4
{code}
Clinical traits significantly associated with batch (the complete table can be found [here|^BatchClinicalInfoCorrelationsKIRC.txt]):
{csv}KIRC_clinical_traits,DataType,NumberOfNAs,Test,Pvalue
white_cell_count_result,factor,82,Pearson's Chi-squared test,2.09E-13
serum_calcium_result,factor,160,Pearson's Chi-squared test,8.31E-13
tumor_stage,factor,21,Pearson's Chi-squared test,2.11E-11
tumor_grade,factor,5,Pearson's Chi-squared test,6.43E-09
vital_status,factor,0,Pearson's Chi-squared test,9.62E-09
days_to_form_completion,integer,0,Kruskal-Wallis rank sum test,1.16E-07
year_of_initial_pathologic_diagnosis,integer,0,Kruskal-Wallis rank sum test,1.38E-07
days_to_last_known_alive,integer,10,Kruskal-Wallis rank sum test,8.41E-07
days_to_last_followup,integer,4,Kruskal-Wallis rank sum test,1.94E-06
distant_metastasis_pathologic_spread,factor,11,Pearson's Chi-squared test,2.23E-06
primary_tumor_pathologic_spread,factor,0,Pearson's Chi-squared test,3.63E-06
person_neoplasm_cancer_status,factor,28,Pearson's Chi-squared test,4.26E-06
hemoglobin_result,factor,71,Pearson's Chi-squared test,2.66E-04
lymphnode_pathologic_spread,factor,2,Pearson's Chi-squared test,7.85E-04
lymphnodes_examined_prior_presentation,factor,43,Pearson's Chi-squared test,2.05E-03
gender,factor,0,Pearson's Chi-squared test,2.10E-02
age_at_initial_pathologic_diagnosis,integer,0,Kruskal-Wallis rank sum test,2.51E-02
days_to_birth,integer,8,Kruskal-Wallis rank sum test,2.87E-02
prior_diagnosis,factor,0,Pearson's Chi-squared test,4.75E-02{csv}
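The table uses two tests: Pearson's chi-squared for factor traits and the Kruskal-Wallis rank sum test for integer traits (the test names match the output of R's chisq.test and kruskal.test, which were presumably used). As an illustration only, the stdlib Python sketch below computes both statistics, dropping samples with a missing value (None) the way the NumberOfNAs column suggests. It applies no Yates correction, and the function names are hypothetical. The p-value is the upper tail of a chi-squared distribution with the returned degrees of freedom, e.g. pchisq(stat, df, lower.tail = FALSE) in R.

```python
from collections import defaultdict


def pearson_chisq(x, y):
    """Pearson's chi-squared statistic and degrees of freedom for two
    categorical vectors (no Yates correction). Pairs with a missing value
    (None) are dropped before the contingency table is built."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    cell, row_tot, col_tot = defaultdict(int), defaultdict(int), defaultdict(int)
    for a, b in pairs:
        cell[(a, b)] += 1
        row_tot[a] += 1
        col_tot[b] += 1
    stat = 0.0
    for a, r in row_tot.items():
        for b, c in col_tot.items():
            expected = r * c / n  # expected count under independence
            stat += (cell[(a, b)] - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)


def kruskal_wallis(values, groups):
    """Kruskal-Wallis H statistic with tie correction, and its degrees of
    freedom (number of groups - 1). Missing values (None) are dropped."""
    data = [(v, g) for v, g in zip(values, groups) if v is not None and g is not None]
    n = len(data)
    order = sorted(range(n), key=lambda i: data[i][0])
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and data[order[j + 1]][0] == data[order[i]][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the run
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum, size = defaultdict(float), defaultdict(int)
    for r, (_, g) in zip(ranks, data):
        rank_sum[g] += r
        size[g] += 1
    h = 12 / (n * (n + 1)) * sum(rank_sum[g] ** 2 / size[g] for g in size) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n)), len(size) - 1
```

For a trait such as tumor_stage, the call would be pearson_chisq(batch, tumor_stage); for days_to_birth it would be kruskal_wallis(days_to_birth, batch).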
h5. Survival vs Batch
!KaplanMeierCurveKIRC.png|thumbnail! !SurvivalByBatchKIRC.png|thumbnail!
The summary can be found [here|^SurvivalBatchSummaryStatisticsKIRC.txt]. Batch is significantly associated with survival (Cox proportional hazards model):
Likelihood ratio test = 61.35 on 10 df, p = 2.007e-09
Wald test = 64.35 on 10 df, p = 5.39e-10
Score (logrank) test = 75.35 on 10 df, p = 4.066e-12
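The Kaplan-Meier curves are product-limit estimates of the survival function, one per batch in the second plot. A minimal stdlib Python sketch follows, assuming each patient has a follow-up time and an event indicator (1 = death observed, 0 = censored). Which clinical fields supply the time (for example days_to_last_followup for censored patients) is an assumption, and the function names are hypothetical. The likelihood ratio, Wald and score tests quoted above come from a Cox model and are not reproduced here.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).

    times: follow-up time per patient; events: 1 if death was observed,
    0 if the patient was censored at that time.
    Returns (time, S(time)) at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings at t leave the risk set
    return curve


def kaplan_meier_by_group(times, events, groups):
    """One Kaplan-Meier curve per group label (e.g. per batch)."""
    split = defaultdict(lambda: ([], []))
    for t, e, g in zip(times, events, groups):
        split[g][0].append(t)
        split[g][1].append(e)
    return {g: kaplan_meier(t, e) for g, (t, e) in split.items()}
```

Censored patients still count as at risk for deaths recorded at the same time, which is the usual Kaplan-Meier convention.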
h5. DNA methylation data analysis
Illumina HumanMethylation27 (27k) dataset, downloaded on December 28, 2011; 219 samples. Available technical variables: batch, amount, concentration, day of shipment, month of shipment, year of shipment, plate row and plate column. Day, month and year of shipment were combined into a single variable (dateCombined). Summary of the technical variables:
{code:collapse=true}> head(methNew)
   batchID  amount concentration plate_column plate_row dateCombined
2     0859 26.7 uL    0.14 ug/uL            1         A    17-3-2010
32    0859 26.7 uL    0.17 ug/uL            1         C    17-3-2010
59    0859 26.7 uL    0.15 ug/uL            1         D    17-3-2010
84    0859 26.7 uL    0.15 ug/uL            1         E    17-3-2010
> table(methNew$batchID)
0859 1186 1284 1303 1332
  40   35   50   47   47
> table(methNew$amount)
26.7 uL
    219
> table(methNew$concentration)
0.13 ug/uL 0.14 ug/uL 0.15 ug/uL 0.16 ug/uL 0.17 ug/uL
         7         50        122         30         10
> table(methNew$plate_column)
 1  2  3  4  5  6  7
39 40 40 40 35 23  2
> table(methNew$plate_row)
 A  B  C  D  E  F  G  H
30 28 28 27 27 27 27 25
> table(methNew$plate_column,methNew$plate_row)
    A B C D E F G H
  1 5 4 5 5 5 5 5 5
  2 5 5 5 5 5 5 5 5
  3 5 5 5 5 5 5 5 5
  4 5 5 5 5 5 5 5 5
  5 5 5 5 4 4 4 4 4
  6 4 3 3 3 3 3 3 1
  7 1 1 0 0 0 0 0 0
> table(methNew$dateCombined)
11-10-2010 17-3-2010 25-8-2010 27-9-2010 6-10-2010
        47        40        35        50        47
> table(methNew$dateCombined,methNew$batchID)
             0859 1186 1284 1303 1332
  11-10-2010    0    0    0    0   47
  17-3-2010    40    0    0    0    0
  25-8-2010     0   35    0    0    0
  27-9-2010     0    0   50    0    0
  6-10-2010     0    0    0   47    0
{code}
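Because dateCombined is a day-month-year string, table() lists the dates in lexical rather than chronological order (11-10-2010 appears before 17-3-2010 above). A hedged alternative, sketched in Python: combine the three shipment fields into ISO-8601 dates, which sort chronologically even as plain strings. The function name and the integer-like inputs are assumptions.

```python
from datetime import date


def combine_shipment_date(day, month, year):
    """Combine separate day / month / year fields into one ISO-8601 string
    (YYYY-MM-DD). Unlike 'd-m-Y' strings, ISO dates sort chronologically."""
    return date(int(year), int(month), int(day)).isoformat()


# The five shipment dates from the table above, as (day, month, year).
shipments = [(11, 10, 2010), (17, 3, 2010), (25, 8, 2010), (27, 9, 2010), (6, 10, 2010)]
combined = sorted(combine_shipment_date(d, m, y) for d, m, y in shipments)
```

Sorting combined gives 2010-03-17 first and 2010-10-11 last, matching the actual shipment order of batches 0859 through 1332.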
Because "amount" is constant (26.7 uL for all 219 samples), it is excluded when testing the association of the first principal components with the technical variables.
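Excluding amount follows a general rule: a variable with only one observed level cannot be associated with any principal component. A minimal Python sketch that keeps only technical variables with at least two distinct non-missing values. The dict-of-columns layout and the function name are assumptions.

```python
def informative_variables(columns):
    """Names of variables with at least two distinct non-missing values.

    columns: dict mapping variable name -> list of per-sample values
    (None marks a missing value). Constant variables, such as an amount
    that is identical for every sample, are dropped because they cannot
    be associated with any principal component.
    """
    return [name for name, values in columns.items()
            if len({v for v in values if v is not None}) > 1]
```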
Created a matrix of M-values without splitting the red and green channels. Relative variance of the principal components (no normalization) and the outliers on PC1:
!KIRC_Mval_noNorm_RelativeVariance.png|thumbnail! !KIRC_Mval_unnorm_PC1_outliers.png|thumbnail!
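The two quantities in these plots can be written down directly. An M-value is the base-2 logit of a beta value, M = log2(beta / (1 - beta)). The relative variance of component i is d_i^2 / sum_j d_j^2, where d are the singular values of the matrix. The Python sketch below assumes beta values as input and clips them with a small offset eps (an assumed value) so that beta = 0 or 1 stays finite; the original does not state how its M-values were derived. The singular values would come from an SVD routine such as R's svd(), which is not shown.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """M-value = log2(beta / (1 - beta)).

    beta is clipped to [eps, 1 - eps] (eps is an assumed offset) so that
    fully methylated or unmethylated probes give a finite M-value.
    """
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def relative_variance(singular_values):
    """Share of total variance per component: d_i^2 / sum_j d_j^2.

    For this to be variance explained, the matrix is assumed to have been
    centered per probe before the SVD.
    """
    total = sum(d * d for d in singular_values)
    return [d * d / total for d in singular_values]
```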
Based on the relative variance plot, the first 8 principal components are examined. P-values for the association of each component with the technical variables:
{code:collapse=true}
        batchID concentration plate_column plate_row dateCombined
V1 2.024556e-22     0.5182919   0.22249235 0.9371285 2.024556e-22
V2 1.777673e-18     0.2878497   0.40175378 0.6195123 1.777673e-18
V3 3.196508e-01     0.3802798   0.27628233 0.5517096 3.196508e-01
V4 1.693859e-30     0.2449447   0.50367703 0.9672545 1.693859e-30
V5 2.435091e-03     0.1812444   0.08644977 0.5581507 2.435091e-03
V6 4.437547e-03     0.9473683   0.15938639 0.8458098 4.437547e-03
V7 1.271181e-03     0.3644802   0.79816984 0.7038321 1.271181e-03
V8 1.051940e-05     0.5905213   0.28713862 0.2173504 1.051940e-05
{code}
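The batchID and dateCombined columns above are identical. Two categorical variables that split the samples into exactly the same groups will always give the same test result against a principal component, and this can be checked directly. A minimal Python sketch (the function name is hypothetical):

```python
from collections import defaultdict


def is_one_to_one(a, b):
    """True if each level of a co-occurs with exactly one level of b and
    vice versa, i.e. both variables split the samples into the same groups."""
    a_to_b, b_to_a = defaultdict(set), defaultdict(set)
    for x, y in zip(a, b):
        a_to_b[x].add(y)
        b_to_a[y].add(x)
    return (all(len(s) == 1 for s in a_to_b.values())
            and all(len(s) == 1 for s in b_to_a.values()))
```

Applied to the batchID and dateCombined vectors of methNew, this should return True, consistent with the diagonal dateCombined by batchID table above.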
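batchID and dateCombined are strongly associated with all of the first eight principal components except V3 (V1-V8 are the principal components obtained from an SVD of the unnormalized M-value matrix). Their p-values are identical because the two variables are completely confounded: each batch was shipped on a single date.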
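The next step removes the batch effect, but the page does not say which method is used. Dedicated tools such as ComBat (Bioconductor sva package) are common choices. Purely as an illustrative assumption, the simplest form of batch removal is sketched below in Python: center every probe within each batch, then add back the probe's overall mean. The function name and the list-of-rows layout are hypothetical.

```python
from collections import defaultdict


def center_within_batch(matrix, batches):
    """Remove per-batch location shifts probe by probe.

    matrix: list of rows (probes), each a list of M-values across samples.
    batches: batch label of each sample (same order as the columns).
    Each value has its probe's within-batch mean subtracted and the probe's
    overall mean added back, so probe-level means are preserved.
    """
    corrected = []
    for row in matrix:
        sums, counts = defaultdict(float), defaultdict(int)
        for value, batch in zip(row, batches):
            sums[batch] += value
            counts[batch] += 1
        row_mean = sum(row) / len(row)
        corrected.append([value - sums[b] / counts[b] + row_mean
                          for value, b in zip(row, batches)])
    return corrected
```

Several clinical traits (tumor stage, tumor grade, vital status and others) are themselves associated with batch, so a correction of this kind can also remove part of the biological signal.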
Start by removing the batch effect. Relative variance and outliers after removing the batch: |