...
- Get the patient barcodes from the DNA methylation data (511 patients). Truncate the barcodes so they keep only the first 3 fields ("-" separated), then request data for download from TCGA: OV cancer, level 3, expression-gene, copy-pasting the 511 truncated barcodes. Platform of choice: BI HT_HG-U133A (Affymetrix). For 3 barcodes, Affymetrix data wasn't "applicable".
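A minimal sketch of the barcode truncation step, assuming the full barcodes are available as a character vector. The helper name `truncateBarcode` and the example barcodes are hypothetical and used only for illustration; they are not taken from the actual data.

```r
# Hypothetical helper: keep only the first nFields "-"-separated fields
# of each barcode (e.g. "TCGA-04-1331-01A-..." becomes "TCGA-04-1331").
truncateBarcode <- function(barcodes, nFields = 3) {
  parts <- strsplit(as.character(barcodes), "-", fixed = TRUE)
  vapply(parts,
         function(p) paste(head(p, nFields), collapse = "-"),
         character(1))
}

# Made-up barcodes for illustration; two of them belong to the same patient.
full <- c("TCGA-04-1331-01A-01D-0432-05",
          "TCGA-04-1331-11A-01D-0432-05",
          "TCGA-13-0751-01A-01D-0404-05")

# Drop duplicates so each patient appears once in the download request.
short <- unique(truncateBarcode(full))
print(short)
```

The resulting vector can be written out with `writeLines(short)` and pasted into the download form.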
- Get the download link and download the data to belltown: /work/DAT_002__TCGA_Ovarian/01_2011/vita_work/data/AffyExpression_Sep2011. A total of 523 files were obtained.
- Get clinical files for the same 511 patient barcodes. Download to the same folder.
- Data is stored as one file per patient with the following header: barcode, gene symbol, value. Every row of the first column holds the patient barcode. The code to extract the data into a single list, before combining it into a single matrix:
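exprProcess<-function(){
  patientIDs<-list()
  data<-list()
  # the pattern is a regular expression: match only names ending in ".txt"
  files<-list.files(pattern="\\.txt$")
  # gene symbols are taken from the first file; this assumes every file
  # lists the genes in the same order
  y<-read.table(files[1],skip=1,header=FALSE)
  geneSymbols<-y[,2]
  n<-0
  for (i in seq_along(files)){
    print(files[i])
    # skip the header line; columns are barcode, gene symbol, value
    patExpr<-read.table(files[i],skip=1,header=FALSE)
    # the barcode is repeated down column 1, so the first row is enough;
    # as.character() avoids storing a factor
    patientIDs[[length(patientIDs)+1]]<-as.character(patExpr[1,1])
    data[[length(data)+1]]<-patExpr[,3]
    n<-n+1
    print(n)
  }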
...