Page Comparison

...

Get the patient barcodes from the DNA methylation data (511 patients). Truncate the barcodes to their first 3 fields ("-" separated), then request the data for download from TCGA: OV cancer, level 3, expression-gene, pasting in the 511 truncated barcodes. Platform of choice: BI HT_HG-U133A (Affymetrix). For 3 barcodes, Affymetrix data wasn't "applicable".
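The truncation step can be sketched in R as follows. This is a minimal sketch, not the code that was actually used: the helper name truncateBarcode and the output file name are assumptions for illustration, and the barcodes are written with "-" separators (column names read in with read.table use "." instead).

```r
# Hypothetical helper: keep the first n "-"-separated fields of a TCGA barcode
truncateBarcode <- function(barcodes, n = 3) {
  fields <- strsplit(barcodes, split = "-", fixed = TRUE)
  sapply(fields, function(f) paste(head(f, n), collapse = "-"))
}

# Two barcodes in the methylation format shown on this page
full <- c("TCGA-04-1331-01A-01D-0432-05", "TCGA-04-1332-01A-01D-0432-05")

# One entry per patient, ready to paste into the download form
short <- unique(truncateBarcode(full))
print(short)

# Optionally write one barcode per line (file name is an assumption)
# writeLines(short, "barcodes_truncated.txt")
```

Using unique() guards against a patient appearing more than once when several vials exist for the same patient.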

Get the download link and download the data to belltown: /work/DAT_002__TCGA_Ovarian/01_2011/vita_work/data/AffyExpression_Sep2011. The total number of files obtained is 522. Sample types represented in this set:

Code Block

> colnames(expr)[1:4]
[1] "TCGA-04-1331-01A-01R-0434-01" "TCGA-04-1332-01A-01R-0434-01"
[3] "TCGA-04-1335-01A-01R-0434-01" "TCGA-04-1336-01A-01R-0434-01"
> sp<-strsplit(colnames(expr),split="-")
> samType<-sapply(sp,"[",4)
> table(samType)
samType
01A 01B 01C 01D 02A
481  24   2   1  14

So all the patients are represented by tumor samples (01-09: tumor types; 01: solid tumor; 02: recurrent solid tumor; A-D: vial, counted per individual patient sample).
For comparison, I also looked at the samples in the methylation dataset:

Code Block

> mb<-read.table("methylated_batch.txt",header=T,row.names=1)
> length(colnames(mb))
[1] 511
> colnames(mb)[1:4]
[1] "TCGA.04.1331.01A.01D.0432.05" "TCGA.04.1332.01A.01D.0432.05"
[3] "TCGA.04.1335.01A.01D.0432.05" "TCGA.04.1336.01A.01D.0432.05"
> sp<-strsplit(colnames(mb),"[.]")
> samType<-sapply(sp,"[",4)
> table(samType)
samType
01A 01B 01C 01D
483  25   2   1

So for the expression analysis I need to drop all samples from recurrent tumors (02). Relative to the methylation dataset, the expression dataset is also short two 01A samples and one 01B sample (481 vs. 483 and 24 vs. 25). So the solution is to match the long patient IDs and work with that.
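One way the filtering and matching could look is sketched below. It is a sketch under assumptions, not the code that was run: sampleID is a hypothetical helper, the toy barcodes (TCGA-XX-...) are made up and stand in for colnames(expr) and colnames(mb), and "long patient ID" is assumed to mean the first four barcode fields (patient plus sample type and vial), since the remaining fields differ between the two platforms.

```r
# Hypothetical helper: first four barcode fields, always joined with "-"
sampleID <- function(barcodes, sep) {
  fields <- strsplit(barcodes, split = sep, fixed = TRUE)
  sapply(fields, function(f) paste(f[1:4], collapse = "-"))
}

# Made-up barcodes standing in for colnames(expr) and colnames(mb)
exprCols <- c("TCGA-XX-0001-01A-01R-0434-01",
              "TCGA-XX-0002-01B-01R-0434-01",
              "TCGA-XX-0002-02A-01R-0434-01",
              "TCGA-XX-0003-01A-01R-0434-01")
methCols <- c("TCGA.XX.0001.01A.01D.0432.05",
              "TCGA.XX.0002.01B.01D.0432.05",
              "TCGA.XX.0004.01A.01D.0432.05")

exprID <- sampleID(exprCols, "-")
methID <- sampleID(methCols, ".")

# Sample type code is the first two characters of the fourth field
typeCode <- substr(sapply(strsplit(exprID, "-", fixed = TRUE), "[", 4), 1, 2)

# Keep expression samples that are not recurrent (02) and have a methylation match
keepExpr <- exprCols[typeCode != "02" & exprID %in% methID]
# Methylation columns in the same order as the kept expression columns
keepMeth <- methCols[match(sampleID(keepExpr, "-"), methID)]

print(keepExpr)
print(keepMeth)
```

With the real data, expr[, keepExpr] and mb[, keepMeth] would then have their columns in matching order.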

Get clinical files for the same 511 patient barcodes. Download to the same folder.

Data is stored as one file per patient with the following header: barcode, gene symbol, value. The whole first column holds the patient barcode. The code to extract the data into a single list, to be combined into a single matrix afterwards:

Code Block

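exprProcess<-function(){
  # One file per patient; columns: barcode, gene symbol, value
  patientIDs<-list()
  data<-list()
  # Escape the dot and anchor the pattern so only names ending in .txt match
  files<-list.files(pattern="\\.txt$")
  # Gene symbols are taken from the first file; all files are assumed
  # to list the genes in the same order
  y<-read.table(files[1],skip=1,header=F,stringsAsFactors=F)
  geneSymbols<-y[,2]
  for (i in seq(along=files)){
    print(files[i])
    patExpr<-read.table(files[i],skip=1,header=F,stringsAsFactors=F)
    # The barcode is repeated on every row, so the first row is enough
    patientIDs[[i]]<-patExpr[1,1]
    data[[i]]<-patExpr[,3]
    # Progress counter
    print(i)}

  return(list(Data=data,patNames=patientIDs,genes=geneSymbols))
}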

After that I combined the list of per-patient datasets into a single matrix; geneSymbols were used as row names and patientIDs as column names.
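The combining step itself is not shown on this page; a minimal sketch of how it could be done is below. The function name combineExpr is hypothetical, the toy input is made up, and the sketch assumes every per-patient vector lists the genes in the same order as the genes element returned by exprProcess().

```r
# Hypothetical helper: bind the per-patient vectors into a genes x patients matrix
combineExpr <- function(res) {
  # Each list element becomes one column
  m <- do.call(cbind, res$Data)
  rownames(m) <- as.character(res$genes)
  colnames(m) <- as.character(unlist(res$patNames))
  m
}

# Toy input in the shape returned by exprProcess() (values are made up)
res <- list(Data = list(c(5.1, 7.2), c(4.9, 7.5)),
            patNames = list("TCGA-XX-0001-01A-01R-0434-01",
                            "TCGA-XX-0002-01A-01R-0434-01"),
            genes = c("GENE1", "GENE2"))

expr <- combineExpr(res)
print(dim(expr))
```

On the real data this would be called as expr <- combineExpr(exprProcess()), after which colnames(expr) gives the barcodes.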

...

Version	Old Version 6	New Version 7
Changes made by	Vitalina Komashko (Unlicensed)	Vitalina Komashko (Unlicensed)
Saved on	Sept 28, 2011	Sept 28, 2011
