h5. Batch vs clinical traits

Clinical traits: 36, number of batches: 13

Batch vs center:
{code:collapse=true}> table(batchID,two)
       two
batchID A3 AK AS B0 B2 B4 B8 BP CJ CW CZ DV EU
   0859 31  8  2  0  0  0  0  0  0  0  0  0  0
   1186  4  6  0  0  5  0  6  5  9  0  0  0  0
   1275  0 12  0 29  1  0  1  0  0  0  0  0  0
   1284  0  0  0  0  0  0  0 50  0  0  0  0  0
   1303  0  0  0  6  0  0  0 11 24  0  6  0  0
   1323 18  7  0  0  4  0  3  5  9  0  0  0  0
   1332  0  0  0  6  0  0  0 39  2  0  0  0  0
   1418  6  0  0 27  0  0  6  8  0  0  0  0  0
   1424  0  0  0  0  0  0  0 28 16  0  3  0  0
   1500  0  1  0 15  0  2  1  1  0  0 24  0  0
   1536  2  0  0 18  5  0  5  0 13  9  0  9  0
   1551  0  0  0  0  0  0  3  0  0  0  0  0  0
   1670  0  0  0  6  0  7  4  0  7  6  7  0  4{code}
Significant batch/trait correlations (complete table can be found [here|^BatchClinicalInfoCorrelationsKIRC.txt]):
{csv}KIRC_clinical_traits,DataType,NumberOfNAs,Test,Pvalue
white_cell_count_result,factor,82,Pearson's Chi-squared test,2.09E-13
serum_calcium_result,factor,160,Pearson's Chi-squared test,8.31E-13
tumor_stage,factor,21,Pearson's Chi-squared test,2.11E-11
tumor_grade,factor,5,Pearson's Chi-squared test,6.43E-09
vital_status,factor,0,Pearson's Chi-squared test,9.62E-09
days_to_form_completion,integer,0,Kruskal-Wallis rank sum test,1.16E-07
year_of_initial_pathologic_diagnosis,integer,0,Kruskal-Wallis rank sum test,1.38E-07
days_to_last_known_alive,integer,10,Kruskal-Wallis rank sum test,8.41E-07
days_to_last_followup,integer,4,Kruskal-Wallis rank sum test,1.94E-06
distant_metastasis_pathologic_spread,factor,11,Pearson's Chi-squared test,2.23E-06
primary_tumor_pathologic_spread,factor,0,Pearson's Chi-squared test,3.63E-06
person_neoplasm_cancer_status,factor,28,Pearson's Chi-squared test,4.26E-06
hemoglobin_result,factor,71,Pearson's Chi-squared test,2.66E-04
lymphnode_pathologic_spread,factor,2,Pearson's Chi-squared test,7.85E-04
lymphnodes_examined_prior_presentation,factor,43,Pearson's Chi-squared test,2.05E-03
gender,factor,0,Pearson's Chi-squared test,2.10E-02
age_at_initial_pathologic_diagnosis,integer,0,Kruskal-Wallis rank sum test,2.51E-02
days_to_birth,integer,8,Kruskal-Wallis rank sum test,2.87E-02
prior_diagnosis,factor,0,Pearson's Chi-squared test,4.75E-02{csv}
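The Test column above follows the data type of each trait: factors are tested against batch with Pearson's Chi-squared test, integer traits with the Kruskal-Wallis rank sum test. A minimal sketch of that dispatch in R is given below. The helper name `batchTraitTest`, the data frame `clin` and the toy data are hypothetical and are not taken from the original script.

```r
# Hypothetical helper: test one clinical trait against batch.
# Factor/character traits -> Pearson's Chi-squared test,
# numeric/integer traits  -> Kruskal-Wallis rank sum test.
batchTraitTest <- function(trait, batch) {
  keep <- !is.na(trait)                      # missing values are dropped and counted
  x <- trait[keep]
  b <- droplevels(factor(batch[keep]))
  if (is.factor(x) || is.character(x)) {
    res <- suppressWarnings(chisq.test(table(b, droplevels(factor(x)))))
  } else {
    res <- kruskal.test(x, b)
  }
  data.frame(DataType    = class(trait)[1],
             NumberOfNAs = sum(!keep),
             Test        = res$method,
             Pvalue      = res$p.value)
}

# Toy data only; this is not the KIRC clinical table.
set.seed(1)
batchID <- factor(rep(c("0859", "1186", "1284"), each = 30))
clin <- data.frame(
  gender = factor(sample(c("MALE", "FEMALE"), 90, replace = TRUE)),
  age    = as.integer(round(rnorm(90, mean = 60, sd = 10)))
)
clin$age[c(3, 40)] <- NA

# One row per trait, sorted by p-value as in the table above.
results <- do.call(rbind, lapply(clin, batchTraitTest, batch = batchID))
results <- results[order(results$Pvalue), ]
print(results)
```

With sparse contingency tables (many batches, rare trait levels) the chi-squared approximation can be poor, which is why the warning is suppressed here rather than acted on; that choice is part of the sketch, not of the source.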

h5. Survival vs Batch

!KaplanMeierCurveKIRC.png|thumbnail! !SurvivalByBatchKIRC.png|thumbnail!
Summary statistics can be found [here|^SurvivalBatchSummaryStatisticsKIRC.txt]. Batch is significantly associated with survival:
Likelihood ratio test= 61.35  on 10 df,   p=2.007e-09
Wald test            = 64.35  on 10 df,   p=5.39e-10
Score (logrank) test = 75.35  on 10 df,   p=4.066e-12
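
The three statistics above (likelihood ratio, Wald, score) have the form of the global tests reported by `summary()` for a Cox proportional hazards fit in the R `survival` package. A minimal sketch of obtaining them with batch as the only covariate follows. The column names `time`, `status`, `batchID` and the simulated data are assumptions for illustration and are not the KIRC data.

```r
library(survival)

# Simulated data only: three batches with different hazards.
set.seed(42)
n <- 150
surv <- data.frame(
  batchID = factor(rep(c("0859", "1186", "1284"), each = n / 3))
)
rate <- c("0859" = 0.001, "1186" = 0.002, "1284" = 0.004)[as.character(surv$batchID)]
eventTime  <- rexp(n, rate = rate)                    # time to death
censorTime <- runif(n, 0, 1500)                       # censoring time
surv$time   <- pmin(eventTime, censorTime)
surv$status <- as.integer(eventTime <= censorTime)    # 1 = death observed

# Cox model with batch as a categorical covariate.
fit <- coxph(Surv(time, status) ~ batchID, data = surv)
s <- summary(fit)

# Global tests; df = number of batch levels in the model minus one.
print(s$logtest)    # likelihood ratio test
print(s$waldtest)   # Wald test
print(s$sctest)     # score (logrank) test

# Kaplan-Meier estimates per batch (the basis of a survival-by-batch plot).
km <- survfit(Surv(time, status) ~ batchID, data = surv)
print(km)
```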

h5. DNA methylation data analysis

27k dataset, downloaded on December 28, 2011; 219 samples. Technical variables available: batch, amount, concentration, day of shipment, month of shipment, year of shipment, plate row, plate column. Day, month and year of shipment are combined into a single variable (dateCombined). Info about technical variables:
{code:collapse=true}> head(methNew)
    batchID  amount concentration plate_column plate_row dateCombined
2      0859 26.7 uL    0.14 ug/uL            1         A    17-3-2010
32     0859 26.7 uL    0.17 ug/uL            1         C    17-3-2010
59     0859 26.7 uL    0.15 ug/uL            1         D    17-3-2010
84     0859 26.7 uL    0.15 ug/uL            1         E    17-3-2010
> table(methNew$batchID)

0859 1186 1284 1303 1332
  40   35   50   47   47
> table(methNew$amount)

26.7 uL
    219
> table(methNew$concentration)

0.13 ug/uL 0.14 ug/uL 0.15 ug/uL 0.16 ug/uL 0.17 ug/uL
         7         50        122         30         10
> table(methNew$plate_column)

 1  2  3  4  5  6  7
39 40 40 40 35 23  2
> table(methNew$plate_row)

 A  B  C  D  E  F  G  H
30 28 28 27 27 27 27 25
> table(methNew$plate_column,methNew$plate_row)

    A B C D E F G H
  1 5 4 5 5 5 5 5 5
  2 5 5 5 5 5 5 5 5
  3 5 5 5 5 5 5 5 5
  4 5 5 5 5 5 5 5 5
  5 5 5 5 4 4 4 4 4
  6 4 3 3 3 3 3 3 1
  7 1 1 0 0 0 0 0 0
> table(methNew$dateCombined)

11-10-2010  17-3-2010  25-8-2010  27-9-2010  6-10-2010
        47         40         35         50         47
> table(methNew$dateCombined,methNew$batchID)

             0859 1186 1284 1303 1332
  11-10-2010    0    0    0    0   47
  17-3-2010    40    0    0    0    0
  25-8-2010     0   35    0    0    0
  27-9-2010     0    0   50    0    0
  6-10-2010     0    0    0   47    0{code}
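A minimal sketch of combining the three shipment fields into one variable and checking it against batch. The column names `day_of_shipment`, `month_of_shipment` and `year_of_shipment` are assumed from the variable list above, and the toy rows are illustrative only.

```r
# Toy rows only; the shipment column names are assumptions.
meth <- data.frame(
  batchID           = c("0859", "0859", "1186", "1284", "1332"),
  day_of_shipment   = c(17, 17, 25, 27, 11),
  month_of_shipment = c(3, 3, 8, 9, 10),
  year_of_shipment  = c(2010, 2010, 2010, 2010, 2010),
  stringsAsFactors  = FALSE
)

# Day-month-year string in the format seen in the tables (e.g. 17-3-2010).
meth$dateCombined <- paste(meth$day_of_shipment,
                           meth$month_of_shipment,
                           meth$year_of_shipment, sep = "-")

# If every row and every column of the cross-table has exactly one
# non-zero cell, shipment date and batch are completely confounded.
tab <- table(meth$dateCombined, meth$batchID)
print(tab)
oneToOne <- all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
print(oneToOne)
```

In the cross-table above each batch maps to exactly one shipment date, so the two variables carry the same information.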
Exclude "amount" from the correlations of the first principal components of the data with the technical variables: it has the same value (26.7 uL) for all 219 samples.

Created a matrix of M values without splitting the red and green channels. Relative variance of the unnormalized data and the outliers on the first principal component:

!KIRC_Mval_noNorm_RelativeVariance.png|thumbnail! !KIRC_Mval_unnorm_PC1_outliers.png|thumbnail!
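
A minimal sketch of the two quantities mentioned above, under stated assumptions. M values are computed here as log2(methylated / unmethylated) with a small offset, which is a common definition; the source does not give the exact formula. Relative variance is taken as the squared singular values of the probe-centered matrix divided by their sum. All variable names and the simulated intensities are hypothetical.

```r
# Simulated intensities only: probes in rows, samples in columns.
set.seed(7)
nProbes  <- 200
nSamples <- 12
methylated   <- matrix(rexp(nProbes * nSamples, rate = 1 / 3000), nProbes, nSamples)
unmethylated <- matrix(rexp(nProbes * nSamples, rate = 1 / 3000), nProbes, nSamples)

# M value: log2 ratio of methylated to unmethylated signal.
# The offset (an assumption) avoids division by zero and log2(0).
offset <- 1
Mval <- log2((methylated + offset) / (unmethylated + offset))

# Center each probe (row) across samples, then decompose.
centered <- Mval - rowMeans(Mval)
sv <- svd(centered)

# Fraction of total variance captured by each principal component.
relVar <- sv$d^2 / sum(sv$d^2)
print(round(relVar, 3))

# Sample coordinates on the first principal component; samples far
# from the bulk of this vector are candidate outliers.
pc1 <- sv$v[, 1]
print(round(pc1, 3))
```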

Based on the plot, look at the first 8 principal components (p-values for the association of each component with each technical variable):
{code:collapse=true}        batchID concentration plate_column plate_row dateCombined
V1 2.024556e-22     0.5182919   0.22249235 0.9371285 2.024556e-22
V2 1.777673e-18     0.2878497   0.40175378 0.6195123 1.777673e-18
V3 3.196508e-01     0.3802798   0.27628233 0.5517096 3.196508e-01
V4 1.693859e-30     0.2449447   0.50367703 0.9672545 1.693859e-30
V5 2.435091e-03     0.1812444   0.08644977 0.5581507 2.435091e-03
V6 4.437547e-03     0.9473683   0.15938639 0.8458098 4.437547e-03
V7 1.271181e-03     0.3644802   0.79816984 0.7038321 1.271181e-03
V8 1.051940e-05     0.5905213   0.28713862 0.2173504 1.051940e-05{code}
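batchID and dateCombined are strongly associated with all of the first 8 principal components except V3 (p = 0.32). Their p-values are identical because each batch was shipped on a single date. V1 - V8 are the principal components after performing an SVD on the unnormalized matrix.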
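
A minimal sketch of how a table like the one above can be produced: take the right singular vectors of the probe-centered M-value matrix as sample-level principal components and test each one against each technical variable. The source does not name the test that was used; Kruskal-Wallis is assumed here, treating every technical variable as categorical. The function `pcTechPvalues` and the simulated data are hypothetical.

```r
# Hypothetical helper: p-value of each principal component (rows)
# against each technical variable (columns).
pcTechPvalues <- function(Mval, tech, nPC = 8) {
  centered <- Mval - rowMeans(Mval)             # center each probe
  V <- svd(centered, nu = 0, nv = nPC)$v        # samples x nPC
  p <- sapply(tech, function(g) {
    apply(V, 2, function(pc) kruskal.test(pc, factor(g))$p.value)
  })
  rownames(p) <- paste0("V", seq_len(nPC))
  p
}

# Simulated data only: 3 batches, each shipped on a single date.
set.seed(3)
nProbes <- 300
batch <- rep(c("0859", "1186", "1284"), each = 10)
dates <- c("0859" = "17-3-2010", "1186" = "25-8-2010", "1284" = "27-9-2010")
tech <- data.frame(
  batchID      = batch,
  dateCombined = unname(dates[batch]),
  plate_row    = sample(LETTERS[1:4], length(batch), replace = TRUE),
  stringsAsFactors = FALSE
)

# Noise plus a batch-dependent shift on the first 100 probes.
shift <- c("0859" = 0, "1186" = 1, "1284" = 2)[batch]
Mval <- matrix(rnorm(nProbes * length(batch)), nrow = nProbes)
Mval[1:100, ] <- Mval[1:100, ] + rep(shift, each = 100)

p <- pcTechPvalues(Mval, tech, nPC = 4)
print(signif(p, 3))
```

Because batch and date define the same grouping of samples in this toy example, their columns come out identical, which mirrors the batchID and dateCombined columns in the table above.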

Start by removing the batch effect. Relative variance and the outliers after removing the batch.