  1. Get the patient barcodes from the DNA methylation data (511 patients). Truncate each barcode to its first 3 "-"-separated fields, then request the data for download from TCGA: OV cancer, level 3, expression-gene, pasting in the 511 truncated barcodes. Platform of choice: BI HT_HG-U133A (Affymetrix). For 3 barcodes, Affymetrix data wasn't "applicable".
  2. Get the download link and download the data to belltown: /work/DAT_002__TCGA_Ovarian/01_2011/vita_work/data/AffyExpression_Sep2011. The total number of files obtained is 523.
  3. Get clinical files for the same 511 patient barcodes. Download to the same folder.
  4. Data is stored as one file per patient with the following columns: barcode, gene symbol, value. Every row of the first column holds the patient barcode. The code to extract the data into a single list, which is then combined into a single matrix:
    Code Block
    exprProcess<-function(){
      # one expression file per patient in the working directory
      files<-list.files(pattern="\\.txt$")
      patientIDs<-list()
      data<-list()
      # gene symbols come from the first file; every file is assumed
      # to list the genes in the same order
      y<-read.table(files[1],skip=1,header=F)
      geneSymbols<-y[,2]
      for (i in seq(along=files)){
        print(files[i])
        patExpr<-read.table(files[i],skip=1,header=F)
        # column 1 repeats the barcode on every row, so row 1 is enough
        patientIDs[[length(patientIDs)+1]]<-patExpr[1,1]
        data[[length(data)+1]]<-patExpr[,3]
        # progress counter
        print(i)
      }
      return(list(Data=data,patNames=patientIDs,genes=geneSymbols))
    }
    After that, I combined the per-patient list into a single matrix, with geneSymbols as row names and patientIDs as column names.
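The barcode truncation in step 1 and the matrix assembly in step 4 can be sketched as follows. This is a minimal illustration, not the code actually used: the `res` list is a toy stand-in for the output of exprProcess(), the barcodes and gene names are made up, and the helper `truncateBarcode` is a hypothetical name. Using the truncated barcodes as column names is also an assumption made here for illustration.

```r
# Hypothetical helper: keep only the first 3 "-"-separated fields of a barcode
truncateBarcode <- function(barcodes) {
  parts <- strsplit(as.character(barcodes), "-", fixed = TRUE)
  sapply(parts, function(p) paste(p[1:3], collapse = "-"))
}

# Toy stand-in for the list returned by exprProcess() (made-up values)
res <- list(
  Data     = list(c(1.2, 3.4, 5.6), c(2.1, 4.3, 6.5)),
  patNames = list("TCGA-XX-0001-01A-01R-0000-01",
                  "TCGA-XX-0002-01A-01R-0000-01"),
  genes    = c("GENE_A", "GENE_B", "GENE_C")
)

# One column per patient, one row per gene
expr <- do.call(cbind, res$Data)
rownames(expr) <- as.character(res$genes)
colnames(expr) <- truncateBarcode(unlist(res$patNames))
```

The result is a genes x patients numeric matrix, so `expr["GENE_B", "TCGA-XX-0002"]` looks up one gene in one patient.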

Normalization

The next step is PCA, which I will use to identify variables that affect the data.

Code Block
library(corpcor)
#center each gene (row) on its mean across patients, then decompose
u<-fast.svd(sweep(expr,1,rowMeans(expr)))
#to plot relative variance:
plot(u$d^2/sum(u$d^2),main="Relative variance")
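
To make the PCA step concrete, here is a small self-contained sketch on random toy data. It uses base R's `svd` in place of `fast.svd` so that it runs without extra packages; the matrix dimensions and the names `centered`, `relVar` and `scores` are chosen for illustration only.

```r
set.seed(1)
# Toy genes x patients matrix (50 genes, 8 patients), random values
expr <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8,
               dimnames = list(paste0("gene", 1:50), paste0("pat", 1:8)))

# Subtract each gene's mean across patients (row-centering)
centered <- sweep(expr, 1, rowMeans(expr))

# Singular value decomposition of the centered matrix
u <- svd(centered)

# Fraction of total variance carried by each component
relVar <- u$d^2 / sum(u$d^2)

# Patient coordinates on the components: columns of V scaled by
# the singular values (one row per patient)
scores <- u$v %*% diag(u$d)
rownames(scores) <- colnames(expr)
```

Plotting `scores[,1]` against `scores[,2]`, colored by a candidate variable such as batch, is one way to see whether that variable separates the patients along the leading components.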