Atlassian uses cookies to improve your browsing experience, perform analytics and research, and conduct advertising. Accept all cookies to indicate that you agree to our use of cookies on your device.
Atlassian uses cookies to improve your browsing experience, perform analytics and research, and conduct advertising. Accept all cookies to indicate that you agree to our use of cookies on your device. Atlassian cookies and tracking notice, (opens new window)
[XY1-9] - matches one character that is either X, Y or 1-9
[0-9]* - matches 0 or more times integers 0 to 9
\> - matches the end of the word
How to create eSet object from DNA methylation data
Example: eSet object with TCGA LAML data (see analysis here)
> library(Biobase)
> class(laml)
[1] "matrix" #has to be a matrix
# Useful to add clinical data = information about the samples, which will got to pData. Colnames of which should match colnames of the exprs.
# Need to reformat the columns of the assayData (laml) to match clinical file
> dim(clinical)
[1] 202 23
> one<-sapply(newc,"[",1) #I split the col names by "-" before, now I am about to assemble a short patient ID used in the clinical file
> two<-sapply(newc,"[",2)
> three<-sapply(newc,"[",3)
> shortID<-paste(one, two, three, sep="-")
> colnames(laml)<-shortID
> k<-match(colnames(laml),rownames(clinical))
> length(k)
[1] 190
> clinicalMeth<-clinical[k,] # Get clinical info for the patients for whom DNA methylation data is available
[1] 202 23
> one<-sapply(newc,"[",1)
> two<-sapply(newc,"[",2)
> three<-sapply(newc,"[",3)
> shortID<-paste(one, two, three, sep="-")
> colnames(laml)<-shortID
> k<-match(colnames(laml),rownames(clinical))
> length(k)
[1] 190
> clinicalMeth<-clinical[k,]
> all(rownames(clinicalMeth) == colnames(laml)) #critical step
[1] TRUE
> phenoData <- new("AnnotatedDataFrame", data = clinicalMeth) #critical step
#For the features (CpGs) annotation I download adf files from TCGA: http://tcga-data.nci.nih.gov/tcga/tcgaPlatformDesign.jsp. Read the file into R and
#construct a feature class object
> adf<-read.table("jhu-usc.edu_TCGA_HumanMethylation450.adf.txt",header=T,row.names=1,sep="\t")
> dim(adf)
[1] 486427 11
> dim(laml)
[1] 486411 190
#Next need to match adf and laml. For some weird reason they have different CpG ID (?!). Final datasets are adfMeth and lamlT.
#Need to make sure that the rownames in lamlT and adfMeth are the same using all command.
>feature<-new("AnnotatedDataFrame",data=adfMeth) #create featureData class object: critical step
#Finally create the expression set class object:
> exprs<-new("ExpressionSet",exprs=lamlT, phenoData=phenoData,featureData=feature,annotation=annotation)
> exprs
ExpressionSet (storageMode: lockedEnvironment)
assayData: 483308 features, 190 samples
element names: exprs
protocolData: none
phenoData
sampleNames: TCGA-AB-2802 TCGA-AB-2803 ... TCGA-AB-3012 (190 total)
varLabels: age_at_initial_pathologic_diagnosis
cumulative_agent_total_dose ...
year_of_initial_pathologic_diagnosis (23 total)
varMetadata: labelDescription
featureData
featureNames: cg00000029 cg00000108 ... Target.Removal.2 (483308
total)
fvarLabels: CHR GENESYMBOL ... CONTROL_PROBE (11 total)
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: IlluminaHumanMethylation450k
Plotting density histograms for each patients in a matrix
Apparently if a matrix with DNA methylation or gene expression values is converted to an object of ExpressionSet class function density has methods for plotting density distribution for each patient (column) in the matrix. It is very convenient and I don't need to write a function to do so.
#Create a minimum Expression Set
>methExpr <- new("ExpressionSet",exprs=meth) # require(Biobase), require(lumi)
#To make a density plot and color according to a certain parameter
>plotColor<-seq(1,ncols(meth),1)
#x is the indices of columns that are tumor samples. Created using grep and relying on TCGA barcode
>plotColor[x]<-"red"
>plotColor[-x]<-"blue"
#Now plot the results, lines are colored according to tumor-normal. The same was also done for the line type
> png("Colon_Mvalue_dataDistrubitionNormalTumor.png")
> density(methExp,addLegend=F,col=plotColor,main="Colon M value no normalization",lty=lineType)
> legend("topright",legend=c("Tumor","Normal"),lty="solid",col=c("red","blue"),bty="n") #bty - no box around the legend
> dev.off()
#to show distribution by batch extract batch information then assign colors to each batch by converting it to a factor
> colFields<-strsplit(colnames(meth),split="-")
> batch<-sapply(colFields,"[",6)
> batchC<-factor(batchC,labels=brewer.pal(10,"Set3")) #require(RColorBrewer)
> png("Colon_Mvalue_dataDistributionByBatch.png")
> density(methExp,addLegend=F,col=batchC,main="Colon M value no normalization")
> legend("topright",legend=levels(as.factor(batch)),lty="solid",col=brewer.pal(10,"Set3"),bty="n")
> dev.off()