We needed to build a new coexpression network with the same patients whose DNA methylation data was used for comethylation networks and other analyses.

Steps:

  1. Retrieve level 3 data from TCGA, requesting the same patient barcodes that we have in the DNA methylation dataset. Level 3 data contains within-array normalized and summarized data (information from multiple probes is summarized to a single gene).
  2. Perform normalization. Most likely I will only need to remove the batch effect: from the analyses of the DNA methylation data I know that the other variables available to us (day, month and center) are correlated with batch.
  3. Use the normalized data to build the gene coexpression network.
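
This page does not say which network construction method was used for step 3. Purely as an illustration, the sketch below builds an unweighted network from Pearson correlations with a hard threshold. The toy data, the object names and the 0.5 cutoff are all assumptions, not choices taken from the analysis.

```r
# Illustration only: the method actually used is not stated on this page.
# expr: genes x samples matrix of normalized expression (random toy data here).
set.seed(1)
expr <- matrix(rnorm(5 * 20), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:20)))

# Pearson correlation between genes (cor() correlates columns, hence t()).
geneCor <- cor(t(expr), method = "pearson")

# Hypothetical hard threshold on |r| to define edges; 0.5 is arbitrary.
adjacency <- (abs(geneCor) >= 0.5) * 1
diag(adjacency) <- 0  # no self-edges
```

A weighted alternative would keep a power of the absolute correlation instead of thresholding it; which variant fits depends on the downstream comparison with the comethylation networks.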

Data retrieval

  1. Get the patient barcodes from the DNA methylation data (511 patients). Truncate the barcodes so they have only the first 3 fields ("-" separated), then request the data for download from TCGA: OV cancer, level 3, expression-gene, pasting in the 511 truncated barcodes. Platform of choice: BI HT_HG-U133A (Affymetrix). For 3 barcodes Affymetrix data wasn't "applicable".
  2. Get the download link and download the data to belltown: /work/DAT_002__TCGA_Ovarian/01_2011/vita_work/data/AffyExpression_Sep2011. The total number of files obtained is 523.
  3. Get the clinical files for the same 511 patient barcodes. Download them to the same folder.
  4. Data is stored as one file per patient with the following header: barcode, gene symbol, value. Every row of the first column holds the patient barcode. The following code reads the files into a single list, which can then be combined into a single matrix:

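exprProcess <- function() {
  # One expression file per patient; columns: barcode, gene symbol, value.
  # NOTE: any clinical files in this folder ending in .txt would also match.
  files <- list.files(pattern = "\\.txt$")
  # Gene symbols come from the first file (assumes the same gene order in every file)
  y <- read.table(files[1], skip = 1, header = FALSE, stringsAsFactors = FALSE)
  geneSymbols <- y[, 2]
  patientIDs <- vector("list", length(files))
  data <- vector("list", length(files))
  for (i in seq_along(files)) {
    print(files[i])
    patExpr <- read.table(files[i], skip = 1, header = FALSE, stringsAsFactors = FALSE)
    patientIDs[[i]] <- patExpr[1, 1]  # barcode is repeated down the first column
    data[[i]] <- patExpr[, 3]         # expression values
    print(i)                          # progress counter
  }

  return(list(Data = data, patNames = patientIDs, genes = geneSymbols))
}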
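
The function above returns the per-patient values as a list rather than a matrix. A minimal sketch of the combining step might look like the following, assuming every file lists the genes in the same order (worth verifying before trusting the result). The helper names `truncateBarcode` and `exprToMatrix` are hypothetical and not part of the original code, and the barcodes and values in the toy input are made up.

```r
# Hypothetical helper: keep the first nFields "-"-separated fields of a barcode.
truncateBarcode <- function(barcodes, nFields = 3) {
  parts <- strsplit(as.character(barcodes), "-", fixed = TRUE)
  vapply(parts, function(p) paste(head(p, nFields), collapse = "-"), character(1))
}

# Hypothetical helper: combine the output of exprProcess() into a
# genes x samples matrix. Assumes the same gene order in every file.
exprToMatrix <- function(res) {
  lens <- vapply(res$Data, length, integer(1))
  stopifnot(all(lens == length(res$genes)))
  mat <- do.call(cbind, lapply(res$Data, function(v) as.numeric(as.character(v))))
  rownames(mat) <- as.character(res$genes)
  colnames(mat) <- as.character(unlist(res$patNames))
  mat
}

# Toy input with made-up barcodes and values, shaped like exprProcess() output
toy <- list(
  Data = list(c(1.5, 2.0), c(0.5, 3.0)),
  patNames = list("TCGA-01-0001-01A-01R", "TCGA-01-0002-01A-01R"),
  genes = c("GENE1", "GENE2")
)
toyMat <- exprToMatrix(toy)
toyPatients <- truncateBarcode(colnames(toyMat))
```

Full barcodes are kept as column names and the truncated form is computed separately, because 523 files were obtained for 511 requested barcodes, so some truncated barcodes may occur more than once. Something like `table(toyPatients)` would show such cases before matching against the DNA methylation patients.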
