Datasets


File view driving this table: https://www.synapse.org/#!Synapse:syn9630847/tables/

The following is a brief description of the relevant columns in the table below (by column header):

column name: name as it should appear in data portal

current synapse file view column: corresponding name (e.g., for SQL query) in Synapse table view

eventual synapse file view column: name (e.g., for SQL query) in Synapse table view that we will eventually migrate to

difference between current and eventual columns: as we migrate to GDC, new annotation keys will go under the "eventual" column names. For now, use the "current" names.

facet: yes if the column should be faceted in the data portal.

show on card (card location): no, primary (i.e., top), or secondary (i.e., bottom) key/annotation
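A minimal sketch of querying the file view with the "current" column names while they are still in use. The mapping is copied from the table rows that give both a current and an eventual name. build_query and run_query are hypothetical helper names, the filter value is only illustrative, and run_query assumes the synapseclient package is installed and credentials are configured.

```python
# Eventual -> current column names, copied from the table rows that list both.
# "assay" currently backs two eventual columns (Data Category, Experiment Strategy).
EVENTUAL_TO_CURRENT = {
    "species": "species",
    "data_category": "assay",
    "data_format": "fileFormat",
    "experimental_strategy": "assay",
}

FILE_VIEW_ID = "syn9630847"  # file view linked at the top of this page


def build_query(eventual_columns, where=None):
    """Build a SELECT over the file view using the current column names."""
    current = []
    for name in eventual_columns:
        column = EVENTUAL_TO_CURRENT[name]
        if column not in current:  # select a shared column such as "assay" once
            current.append(column)
    sql = "SELECT {} FROM {}".format(", ".join(current), FILE_VIEW_ID)
    if where:
        clauses = [
            # Double any single quotes so the value stays a valid SQL literal.
            "{} = '{}'".format(EVENTUAL_TO_CURRENT[key], value.replace("'", "''"))
            for key, value in where.items()
        ]
        sql += " WHERE " + " AND ".join(clauses)
    return sql


def run_query(sql):
    """Hypothetical helper: needs synapseclient and a configured Synapse login."""
    import synapseclient  # imported here so build_query works without the package

    syn = synapseclient.login()
    return syn.tableQuery(sql).asDataFrame()
```

For example, build_query(["data_format"], {"data_format": "TSV"}) gives "SELECT fileFormat FROM syn9630847 WHERE fileFormat = 'TSV'". Once the migration is done, only the mapping needs to change.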

The following annotations do not exist on file: 

The following annotations need to be "ported" to GDC:  

The following have been added to the synapse table, but are blank. They need to be filled in:

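Table columns, in order: column name; current synapse file view column; eventual synapse file view column; facet; show on card (no, primary, secondary); concept; example; GDC equivalent; faceted on GDC; facet on CSBC; GDC reference; size; restricted values; comments; in AMP-AD portal; in NF portal

column name: Species
  current synapse file view column: species
  eventual synapse file view column: species
  facet: yes
  show on card: primary
  GDC equivalent: none
  facet on CSBC: yes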

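column name: Scientific Theme
  current synapse file view column: NA
  eventual synapse file view column: theme
  facet: yes
  show on card: primary
  example: tumor-heterogeneity
  GDC equivalent: none
  facet on CSBC: yes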

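column name: Data Category
  current synapse file view column: assay
  eventual synapse file view column: data_category
  facet: yes
  show on card: primary
  concept: Broad categorization of the contents of the data file.
  example:
    • Transcriptome Profiling
  GDC equivalent: data_category
  faceted on GDC: yes
  facet on CSBC: yes
  comments: CSBC will need to add values to those in GDC (which only cover sequencing)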

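column name: Data Type
  current synapse file view column: NA
  eventual synapse file view column: data_type
  facet: no?
  show on card: no
  concept: Specific content type of the data file.
  example:
    • Exon Expression Quantification
    • Gene Expression Quantification
    • Isoform Expression Quantification
    • Splice Junction Quantification
  GDC equivalent: data_type
  faceted on GDC: yes
  facet on CSBC: yes
  comments: CSBC will need to add values to those in GDC (which only cover sequencing)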

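column name: Data Format
  current synapse file view column: fileFormat
  eventual synapse file view column: data_format
  facet: no?
  show on card: no
  concept: Format of the data files.
  example:
    • CSV
    • HDF5
    • TSV
    • TXT
    • SRA XML
    • MAGE-TAB
    • SDRF
    • IDF
    • ADF
  GDC equivalent: data_format
  faceted on GDC: yes
  facet on CSBC: yes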

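column name: Experiment Strategy
  current synapse file view column: assay
  eventual synapse file view column: experimental_strategy
  facet: yes
  show on card: primary
  concept: The sequencing strategy used to generate the data file. REMOVE "sequencing" for CSBC.
  example:
    • RNA-Seq
    • Total RNA-Seq
  GDC equivalent: experimental_strategy
  faceted on GDC: yes
  facet on CSBC: yes
  comments: CSBC will need to add values to those in GDC (which only cover sequencing)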

column name: file_name
  facet: no
  show on card: primary
  concept: The name (or part of a name) of a file (of any type).
  GDC equivalent: file_name
  faceted on GDC: no
  facet on CSBC: no

column name: file_size
  facet: no
  show on card: no
  concept: The size of the data file (object) in bytes.
  GDC equivalent: file_size
  faceted on GDC: no
  facet on CSBC: no

column name: md5sum
  facet: no
  show on card: no
  concept: The 128-bit hash value expressed as a 32 digit hexadecimal number (in lower case) used as a file's digital fingerprint.
  GDC equivalent: md5sum
  faceted on GDC: no
  facet on CSBC: no

column name: platform
  facet: yes
  show on card: no
  GDC equivalent: platform
  faceted on GDC: yes
  facet on CSBC: yes

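column name: workflow_type
  current synapse file view column: NA
  facet: no
  show on card: no
  concept: Generic name for the workflow used to analyze a data set.
  example:
    • BWA
    • BWA with BQSR
    • BWA-aln
    • BWA-mem
    • BWA with Mark Duplicates and BQSR
  faceted on GDC: yes
  facet on CSBC: ??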

column name: Disease Type
  current synapse file view column: disease_type
  facet: yes
  show on card: primary
  concept: The text term used to describe the type of malignant disease, as categorized by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O).
  example:
    • Acinar Cell Neoplasms
    • Adenomas and Adenocarcinomas
    • Adnexal and Skin Appendage Neoplasms
    • Basal Cell Neoplasms
    • Blood Vessel Tumors
  GDC equivalent: case/disease_type
  faceted on GDC: yes
  facet on CSBC: yes
  GDC reference: https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=case

column name: Tissue
  current synapse file view column: tissue_or_organ_of_origin
  facet: yes
  show on card: primary
  concept: The text term used to describe the anatomic site of origin of the patient's malignant disease, as described by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O).
  example:
    • Abdomen, NOS
    • Abdominal esophagus
    • Accessory sinus, NOS
    • Acoustic nerve
    • Adrenal gland, NOS
  GDC equivalent: diagnosis/tissue_or_organ_of_origin
  faceted on GDC: yes
  facet on CSBC: yes
  GDC reference: https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis

column name: tissue_type
  facet: no
  show on card: no
  concept: Text term that represents a description of the kind of tissue collected with respect to disease status or proximity to tumor tissue.
  example:
    • Tumor
    • Normal
    • Abnormal
    • Peritumoral
    • Unknown
  GDC equivalent: sample/tissue_type
  faceted on GDC: no
  facet on CSBC: yes
  GDC reference: https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample
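The file_size and md5sum rows above define their values precisely (size in bytes, and a 128-bit hash written as 32 lower-case hexadecimal digits), so they can be computed directly from a file. The following is a sketch using only the Python standard library. file_size_and_md5 and demo are hypothetical names, and nothing here is taken from the portal's actual tooling.

```python
import hashlib
import os
import tempfile


def file_size_and_md5(path, chunk_size=1 << 20):
    """Return (size in bytes, md5 as 32 lower-case hex digits) for a file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large data files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # hexdigest() already returns lower-case hexadecimal.
    return os.path.getsize(path), digest.hexdigest()


def demo():
    """Compute both values for a small temporary file, then clean it up."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
        path = tmp.name
    try:
        return file_size_and_md5(path)
    finally:
        os.remove(path)
```

The two returned values could then be stored under the file_size and md5sum annotation keys.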




GDC Data Dictionary viewer: https://docs.gdc.cancer.gov/Data_Dictionary/viewer/

GDC Data Dictionary is implemented in YAML files: https://github.com/NCI-GDC/gdcdictionary

GDC submission process (and metadata templates) are described here: https://docs.gdc.cancer.gov/Data_Submission_Portal/Users_Guide/Data_Submission_Overview/

GDC Data Upload Walkthrough: https://docs.gdc.cancer.gov/Data_Submission_Portal/Users_Guide/Data_Submission_Walkthrough/#clinical-data-requirements
