h5. Batch vs clinical traits

Clinical traits: 36, number of batches: 13

Batch vs center:
{code:collapse=true}> table(batchID,two)
       two
batchID A3 AK AS B0 B2 B4 B8 BP CJ CW CZ DV EU
   0859 31  8  2  0  0  0  0  0  0  0  0  0  0
   1186  4  6  0  0  5  0  6  5  9  0  0  0  0
   1275  0 12  0 29  1  0  1  0  0  0  0  0  0
   1284  0  0  0  0  0  0  0 50  0  0  0  0  0
   1303  0  0  0  6  0  0  0 11 24  0  6  0  0
   1323 18  7  0  0  4  0  3  5  9  0  0  0  0
   1332  0  0  0  6  0  0  0 39  2  0  0  0  0
   1418  6  0  0 27  0  0  6  8  0  0  0  0  0
   1424  0  0  0  0  0  0  0 28 16  0  3  0  0
   1500  0  1  0 15  0  2  1  1  0  0 24  0  0
   1536  2  0  0 18  5  0  5  0 13  9  0  9  0
   1551  0  0  0  0  0  0  3  0  0  0  0  0  0
   1670  0  0  0  6  0  7  4  0  7  6  7  0  4{code}
Significant batch/trait correlations (complete table can be found [here|^BatchClinicalInfoCorrelationsKIRC.txt]):
{csv}KIRC_clinical_traits,DataType,NumberOfNAs,Test,Pvalue
white_cell_count_result,factor,82,Pearson's Chi-squared test,2.09E-13
serum_calcium_result,factor,160,Pearson's Chi-squared test,8.31E-13
tumor_stage,factor,21,Pearson's Chi-squared test,2.11E-11
tumor_grade,factor,5,Pearson's Chi-squared test,6.43E-09
vital_status,factor,0,Pearson's Chi-squared test,9.62E-09
days_to_form_completion,integer,0,Kruskal-Wallis rank sum test,1.16E-07
year_of_initial_pathologic_diagnosis,integer,0,Kruskal-Wallis rank sum test,1.38E-07
days_to_last_known_alive,integer,10,Kruskal-Wallis rank sum test,8.41E-07
days_to_last_followup,integer,4,Kruskal-Wallis rank sum test,1.94E-06
distant_metastasis_pathologic_spread,factor,11,Pearson's Chi-squared test,2.23E-06
primary_tumor_pathologic_spread,factor,0,Pearson's Chi-squared test,3.63E-06
person_neoplasm_cancer_status,factor,28,Pearson's Chi-squared test,4.26E-06
hemoglobin_result,factor,71,Pearson's Chi-squared test,2.66E-04
lymphnode_pathologic_spread,factor,2,Pearson's Chi-squared test,7.85E-04
lymphnodes_examined_prior_presentation,factor,43,Pearson's Chi-squared test,2.05E-03
gender,factor,0,Pearson's Chi-squared test,2.10E-02
age_at_initial_pathologic_diagnosis,integer,0,Kruskal-Wallis rank sum test,2.51E-02
days_to_birth,integer,8,Kruskal-Wallis rank sum test,2.87E-02
prior_diagnosis,factor,0,Pearson's Chi-squared test,4.75E-02{csv}
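The choice of test in the table above (Pearson's chi-squared test for factor traits, Kruskal-Wallis for integer traits) can be sketched as follows. This is only an illustration on simulated data, not the script that produced the table; {{batch_trait_pvalue}} is a hypothetical helper name.

```r
# Hypothetical helper (not from the original analysis): test the association
# between batch and a single clinical trait.
# Factor traits  -> Pearson's chi-squared test on the batch x trait table.
# Numeric traits -> Kruskal-Wallis rank sum test across batches.
batch_trait_pvalue <- function(trait, batch) {
  keep  <- !is.na(trait) & !is.na(batch)
  nNA   <- sum(!keep)
  trait <- trait[keep]
  batch <- droplevels(factor(batch[keep]))
  if (is.factor(trait) || is.character(trait)) {
    tab <- table(batch, droplevels(factor(trait)))
    res <- suppressWarnings(chisq.test(tab))
  } else {
    res <- kruskal.test(x = trait, g = batch)
  }
  list(NumberOfNAs = nNA, Test = res$method, Pvalue = res$p.value)
}

# Simulated data (illustration only)
set.seed(1)
batch  <- factor(sample(c("0859", "1186", "1284"), 120, replace = TRUE))
gender <- factor(sample(c("FEMALE", "MALE"), 120, replace = TRUE))
age    <- sample(30:85, 120, replace = TRUE)
age[1:5] <- NA

res_gender <- batch_trait_pvalue(gender, batch)
res_age    <- batch_trait_pvalue(age, batch)
print(res_gender$Test)
print(res_age$Test)
```

Missing values are dropped per trait before testing, which is why the number of NAs is reported next to each p-value.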

h5. Survival vs Batch

!KaplanMeierCurveKIRC.png|thumbnail! !SurvivalByBatchKIRC.png|thumbnail!
The summary can be found [here|^SurvivalBatchSummaryStatisticsKIRC.txt]. Batch is significantly associated with survival:
Likelihood ratio test= 61.35  on 10 df,   p=2.007e-09
Wald test            = 64.35  on 10 df,   p=5.39e-10
Score (logrank) test = 75.35  on 10 df,   p=4.066e-12
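
The three statistics above have the form reported by a Cox proportional hazards fit, so a comparable check can presumably be run with {{coxph}} from the {{survival}} package. The sketch below uses simulated follow-up times and a made-up set of four batches; it is an assumption about the approach, not the original code.

```r
library(survival)  # recommended package distributed with R

set.seed(2)
n     <- 200
batch <- factor(sample(c("0859", "1186", "1284", "1303"), n, replace = TRUE))

# Simulated survival data whose hazard depends on batch (illustration only)
rate   <- c("0859" = 0.01, "1186" = 0.02, "1284" = 0.03, "1303" = 0.015)
time   <- rexp(n, rate = rate[as.character(batch)])
status <- rbinom(n, 1, 0.7)  # 1 = event observed, 0 = censored

# Cox model with batch as the only covariate
fit <- coxph(Surv(time, status) ~ batch)
s   <- summary(fit)

# Likelihood ratio, Wald and score (logrank) tests: statistic, df, p-value
print(s$logtest)
print(s$waldtest)
print(s$sctest)

# Kaplan-Meier estimate per batch, e.g. for plotting with plot(km)
km <- survfit(Surv(time, status) ~ batch)
```

With k batches in the model, each of the three tests has k - 1 degrees of freedom.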

h5. DNA methylation data analysis

27k dataset, downloaded on December 28, 2011; 219 samples. Technical variables available: batch, amount, concentration, day of shipment, month of shipment, year of shipment, plate row, plate column. Day, month and year of shipment were combined into a single variable (dateCombined). Info about technical variables:
{code:collapse=true}> head(methNew)
    batchID  amount concentration plate_column plate_row dateCombined
2      0859 26.7 uL    0.14 ug/uL            1         A    17-3-2010
32     0859 26.7 uL    0.17 ug/uL            1         C    17-3-2010
59     0859 26.7 uL    0.15 ug/uL            1         D    17-3-2010
84     0859 26.7 uL    0.15 ug/uL            1         E    17-3-2010
110    0859 26.7 uL    0.15 ug/uL            1         F    17-3-2010
124    0859 26.7 uL    0.15 ug/uL            1         G    17-3-2010
> table(methNew$batchID)

0859 1186 1284 1303 1332
  40   35   50   47   47
> table(methNew$amount)

26.7 uL
    219
> table(methNew$concentration)

0.13 ug/uL 0.14 ug/uL 0.15 ug/uL 0.16 ug/uL 0.17 ug/uL
         7         50        122         30         10
> table(methNew$plate_column)

 1  2  3  4  5  6  7
39 40 40 40 35 23  2
> table(methNew$plate_row)

 A  B  C  D  E  F  G  H
30 28 28 27 27 27 27 25
> table(methNew$plate_column,methNew$plate_row)

    A B C D E F G H
  1 5 4 5 5 5 5 5 5
  2 5 5 5 5 5 5 5 5
  3 5 5 5 5 5 5 5 5
  4 5 5 5 5 5 5 5 5
  5 5 5 5 4 4 4 4 4
  6 4 3 3 3 3 3 3 1
  7 1 1 0 0 0 0 0 0
> table(methNew$dateCombined)

11-10-2010  17-3-2010  25-8-2010  27-9-2010  6-10-2010
        47         40         35         50         47
> table(methNew$dateCombined,methNew$batchID)

             0859 1186 1284 1303 1332
  11-10-2010    0    0    0    0   47
  17-3-2010    40    0    0    0    0
  25-8-2010     0   35    0    0    0
  27-9-2010     0    0   50    0    0
  6-10-2010     0    0    0   47    0{code}
Exclude "amount" from calculations for the correlations of the first principal components of the data with the technical variables: it is constant (26.7 uL for all 219 samples).
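
Combining day, month and year into one variable, dropping unused factor levels and setting aside constant variables such as "amount" could look roughly like this. The toy data frame, its column names {{day}}, {{month}}, {{year}} and the object {{techUsed}} are assumptions made for the example.

```r
# Toy technical-variable table (illustration only; values are made up
# to resemble the structure of methNew)
tech <- data.frame(
  batchID = factor(c("0859", "0859", "1186", "1284"),
                   levels = c("0859", "1186", "1275", "1284")),  # "1275" unused
  amount  = factor(rep("26.7 uL", 4)),
  day     = c(17, 17, 25, 27),
  month   = c(3, 3, 8, 9),
  year    = c(2010, 2010, 2010, 2010)
)

# Combine day, month and year of shipment into a single factor
tech$dateCombined <- factor(paste(tech$day, tech$month, tech$year, sep = "-"))

# Drop factor levels that do not occur in this dataset
tech <- droplevels(tech)

# A variable with a single value cannot be associated with anything,
# so set constant columns aside, together with the date parts now combined
is_constant <- vapply(tech, function(x) length(unique(x)) < 2, logical(1))
is_datepart <- names(tech) %in% c("day", "month", "year")
techUsed    <- tech[, !is_constant & !is_datepart, drop = FALSE]

print(names(techUsed))
print(table(techUsed$dateCombined, techUsed$batchID))
```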

Created a matrix of M values without splitting the red and green channels. Relative variance of the principal components, no normalization: !KIRC_Mval_noNorm_RelativeVariance.png|thumbnail!
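
For reference, the relative variance of each principal component can be computed from the singular values of the data matrix. The sketch below uses a simulated matrix in place of the real M values; centring each probe before the SVD is an assumption, since the preprocessing is not described here.

```r
set.seed(3)
# Simulated M-value matrix: rows = probes, columns = samples (illustration only)
Mval <- matrix(rnorm(500 * 20), nrow = 500, ncol = 20)
Mval[, 1:10] <- Mval[, 1:10] + 1  # artificial shift in half of the samples

# Centre each probe across samples (assumption, see text above)
Mc <- sweep(Mval, 1, rowMeans(Mval))

# SVD: squared singular values are proportional to the variance per component
sv     <- svd(Mc)
relVar <- sv$d^2 / sum(sv$d^2)

# Columns of sv$v hold one value per sample for each component (V1, V2, ...)
pcs <- sv$v
colnames(pcs) <- paste0("V", seq_len(ncol(pcs)))

print(round(relVar[1:8], 3))
```

Plotting {{relVar}} against the component index gives the kind of plot used to decide how many components to inspect.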

Based on the plot, we will look at the first 8 principal components:
{code:collapse=true}        batchID concentration plate_column plate_row dateCombined
V1 2.024556e-22     0.5182919   0.22249235 0.9371285 2.024556e-22
V2 1.777673e-18     0.2878497   0.40175378 0.6195123 1.777673e-18
V3 3.196508e-01     0.3802798   0.27628233 0.5517096 3.196508e-01
V4 1.693859e-30     0.2449447   0.50367703 0.9672545 1.693859e-30
V5 2.435091e-03     0.1812444   0.08644977 0.5581507 2.435091e-03
V6 4.437547e-03     0.9473683   0.15938639 0.8458098 4.437547e-03
V7 1.271181e-03     0.3644802   0.79816984 0.7038321 1.271181e-03
V8 1.051940e-05     0.5905213   0.28713862 0.2173504 1.051940e-05{code}
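batchID and dateCombined are strongly associated with the first principal components, with the exception of V3 (V1 - V8 are the principal components obtained from an SVD of the unnormalized matrix). The batchID and dateCombined columns are identical because each batch was shipped on a single date, so the two variables are completely confounded.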
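
A table like the one above can be produced by testing every principal component against every technical variable. The test used for the original table is not stated; the sketch below assumes a Kruskal-Wallis test and uses simulated components, so it only illustrates the layout and the confounding of batchID with dateCombined.

```r
set.seed(4)
nSamples <- 60

# Simulated technical variables (illustration only)
batchID      <- factor(rep(c("0859", "1186", "1284"), each = 20))
dateCombined <- factor(c("17-3-2010", "25-8-2010", "27-9-2010")[as.integer(batchID)])
plate_row    <- factor(sample(LETTERS[1:8], nSamples, replace = TRUE))
tech <- data.frame(batchID, plate_row, dateCombined)

# Simulated principal components: V1 carries a batch shift, V2 is noise
pcs <- cbind(V1 = rnorm(nSamples) + 3 * as.integer(batchID),
             V2 = rnorm(nSamples))

# p-value of each component (rows) against each technical variable (columns);
# Kruskal-Wallis is an assumption, not necessarily the test used originally
pvals <- sapply(tech, function(v) {
  apply(pcs, 2, function(pc) kruskal.test(x = pc, g = v)$p.value)
})
print(pvals)
```

Because batchID and dateCombined define the same grouping of samples, their columns of p-values coincide, just as in the table above.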

Start by removing the batch effect: