Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 4 Next »

For the pilot project we want to recreate the results obtained in Shen, 2007 paper. To do that I need to identify IDs of the CpGs that they used in their analyses. Here is the list:

Classical CIMP markers: MINT1, MINT2, MINT27, MINT12, MINT31, MINT17

Type C genes (methylated only in cancer): p14 (9p21), MLH1 (2p22), THBS1 (15q14),  THBS2 (6q), MGMT (10q26), COX2 (1q25.2), Megalin (1q21.1), RIZ1 (1p36), p16 (9p21),  RASSF1A (I remember from my PhD work that this was my very good positive control for methylation in cancer cell lines; 3p21.31), DAPK (9p34.1), TIMP3 (22q12.1), hTERT (5p15.33), Neurog1 (5q31.1), SOCS1 (16p13.13), RUNX3 (1p36.11)

Type A genes (methylated in normal and cancer): ER alpha (6q25.1), MyoD1 (11p15.4), N33 (8p22), SFRP1 (8p12), HPP1 (2q32.3)

They provided the sequences for some of the markers and indicated the publications from which they got the sequence for the other markers for the methylation assays. Since it was too much pain to retrieve the sequences for the primers I ended up retrieving CpGs near the genes listed and then finding only the CpGs within CpG islands using Illumina's HMM predictions (part of the Bioconductor package). The reason for that is because I am pretty sure in 2007 they focused on the promoter regions and CpG islands are only within promoter regions of the genes. For the MINTs I mapped the sequences to hg19 (find the table here) and then found all CpGs on 450k platform within those sequences.

Number of CpGs profiles on 450k platform for every gene listed:

> tapply(x$Symbol,as.factor(x$Symbol),length)
  COX2/PTGS2   DAPK/DAPK1 ERalpha/ESR1  HPP1/TMEEF2         MGMT         MLH1 
          17           29           68           17          176            7 
Megalin/LRP2        MyoD1    N33/TUSC3      Neurog1      RASSF1A   RIZ1/PRDM2 
          21           16           21           31           51           66 
       RUNX3        SFRP1        SOCS1         TERT        THBS1        THBS2 
          91           37           28          100           33           57 
       TIMP3   p14/CDKN2D   p16/CDKN2A 
          NA            3            3 

No CpGs were found for TIMP3. Exclude from further analyses.

Now the number of CpGs within CpG island for every gene based on Illumina's HMM predictions:

> tapply(x$Is_HMM_Island,as.factor(x$Symbol),function(x) length(which(x==1)))
  COX2/PTGS2   DAPK/DAPK1 ERalpha/ESR1  HPP1/TMEEF2         MGMT         MLH1 
          12           11           24            7          142            0 
Megalin/LRP2        MyoD1    N33/TUSC3      Neurog1      RASSF1A   RIZ1/PRDM2 
          13           16            7           24           33           32 
       RUNX3        SFRP1        SOCS1         TERT        THBS1        THBS2 
          59           12           23           87           15           42 
       TIMP3   p14/CDKN2D   p16/CDKN2A 
          NA            3            3

Somehow MLH1 doesn't have any CpGs that are predicted to be within a CpG island. So I manually looked at the CpGs that were pulled down for this gene to see if they are within the promoter region. 

Looking for CpGs within MINT regions:

> mint1<-subset(chrloc, Chromosome_37=="5" & Coordinate_37<75380277 & Coordinate_37>75379746)
> mint1
         Probe_ID Chromosome_37 Coordinate_37
200977 cg10583931             5      75379800
469458 cg26904700             5      75379748
##MINT2:
> mint2
         Probe_ID Chromosome_37 Coordinate_37
185088 cg09671962             2      58655463
188239 cg09859179             2      58655104
456366 cg26154670             2      58655041
##MINT27
 mint27<-subset(chrloc, Chromosome_37=="10" & Coordinate_37>46970744 & Coordinate_37<46970985)
> mint27
         Probe_ID Chromosome_37 Coordinate_37
336461 cg18513313            10      46970868
##MINT12:
> mint12<-subset(chrloc, Chromosome_37=="7" & Coordinate_37>129424950 & Coordinate_37<129425501)
> mint12
         Probe_ID Chromosome_37 Coordinate_37
28893  cg01390574             7     129425451
356420 cg19776616             7     129425330
##MINT31:
> mint31<-subset(chrloc, Chromosome_37=="17" & Coordinate_37>48636113 & Coordinate_37<48636784)
> mint31
         Probe_ID Chromosome_37 Coordinate_37
157895 cg08133931            17      48636626
266041 cg14315444            17      48636344
309320 cg16829453            17      48636646
##MINT17:
> mint17<-subset(chrloc, Chromosome_37=="12" & Coordinate_37>117316933 & Coordinate_37<117317411)
> mint17
         Probe_ID Chromosome_37 Coordinate_37
291793 cg15824316            12     117317338

Total number of CpGs is 582. However there are 4 duplicate CpG IDs probably because they are within the promoter regions of a couple of genes. I decided to kick them out from the analysis. The final working set of CpGs is 578. 

  • No labels