Last updated on 2023-09-15
This page describes the workflow required to build, edit, and update the data model for MODEL-AD.
Summary
Data Modeling at Sage requires using two in-house tools: Schematic and the Data Curator App (DCA).
Schematic
Summary
SCHEMATIC is an acronym for Schema Engine for Manifest Ingress and Curation. The Python-based tool is a schema-based metadata ingress ecosystem intended to streamline biomedical dataset annotation, metadata validation, and submission to a data repository for various data contributors.
...
https://github.com/adknowledgeportal/data-models
Sage Data Models for Reference
https://docs.google.com/spreadsheets/d/1vDdcqt3Lgehyq1iCnlF1H9JZi63pLj-u/edit#gid=1939820452
Recommendations
Draw a diagram. A diagram is a useful reference when developing the model.
Start small with a basic skeleton and then build.
Use schematic in dev mode to convert the model to JSON-LD regularly and check for errors.
...
Github: https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.csv
Formatted for readability:
https://docs.google.com/spreadsheets/d/1Wde5YBFtEa4GhO-smXgbVApGioBGNnc-95n4LY8YB_E/edit#gid=925738608
This model does NOT actually validate as provided.
Schematic DB
...
schematic schema convert model.csv
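Because the recommendation is to convert the model regularly, the command above can be wrapped in a small script. The following is a minimal sketch, assuming the `schematic` executable is installed and on PATH; the function names `build_convert_command` and `convert_model` are hypothetical helpers, not part of schematic.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(model_csv: str) -> list[str]:
    """Return the argument list for the convert command quoted on this page."""
    return ["schematic", "schema", "convert", str(Path(model_csv))]


def convert_model(model_csv: str) -> int:
    """Run the conversion and return the process exit code.

    Assumption: the `schematic` CLI is installed and available on PATH.
    """
    if shutil.which("schematic") is None:
        raise FileNotFoundError("schematic CLI not found on PATH")
    if not Path(model_csv).is_file():
        raise FileNotFoundError(model_csv)
    # check=False so the caller can inspect a non-zero exit code directly.
    completed = subprocess.run(build_convert_command(model_csv), check=False)
    return completed.returncode
```

A non-zero return value from `convert_model("model.csv")` would indicate that the conversion reported a problem worth inspecting before building further on the model.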
What is JSON-LD?
Data models are formatted in JSON-LD (JavaScript Object Notation for Linked Data). One advantage of JSON-LD in schematic is its support by http://schema.org, which aids dataset discoverability in search engines like Dataset Search.
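To make the format concrete, here is a minimal sketch of a JSON-LD document that describes a dataset with the schema.org vocabulary. It is a generic illustration, not the output of schematic; the function name and the example values are hypothetical.

```python
import json


def make_dataset_jsonld(name: str, description: str) -> str:
    """Serialize a minimal schema.org Dataset description as JSON-LD."""
    document = {
        "@context": "https://schema.org/",  # vocabulary that defines the terms below
        "@type": "Dataset",                 # schema.org type used for dataset discovery
        "name": name,
        "description": description,
    }
    return json.dumps(document, indent=2)


# Hypothetical values, for illustration only.
print(make_dataset_jsonld("Example study", "Placeholder description of an example study."))
```

The `@context` key is what turns ordinary JSON into linked data: it tells a consumer which vocabulary gives `name` and `description` their meaning.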
Guide to Developing Data Models in JSON-LD
...
Dublin Core
Friend of a Friend (FOAF)
GoodRelations
GeoNames
MusicBrainz
When developing a JSON-LD data model, it is important to choose the appropriate vocabulary. The vocabulary should be relevant to the type of data that you are modeling.
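As a rough illustration of what choosing a vocabulary means in practice, the sketch below maps short prefixes to the namespace URIs of three of the vocabularies mentioned here and expands a prefixed term into a full IRI. It is a simplified stand-in for JSON-LD context processing, not the full expansion algorithm, and the prefix names are conventional choices rather than requirements.

```python
# Prefix -> namespace URI, in the style of a JSON-LD @context.
CONTEXT = {
    "schema": "http://schema.org/",
    "dcterms": "http://purl.org/dc/terms/",   # Dublin Core terms
    "foaf": "http://xmlns.com/foaf/0.1/",     # Friend of a Friend
}


def expand(term: str, context: dict[str, str] = CONTEXT) -> str:
    """Expand a compact term such as 'foaf:name' into a full IRI.

    Terms whose prefix is not in the context are returned unchanged.
    """
    prefix, separator, local_name = term.partition(":")
    if separator and prefix in context:
        return context[prefix] + local_name
    return term


print(expand("dcterms:title"))
print(expand("foaf:name"))
```

Two models that use the same vocabulary term expand to the same IRI, which is what lets their metadata be compared or merged.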
Metadata Dictionary
AD Knowledge Portal Metadata Dictionary
...
Ontology Resources
...
...
...
Metadata Dictionary
AD Knowledge Portal Metadata Dictionary
https://docs.google.com/spreadsheets/d/1vDdcqt3Lgehyq1iCnlF1H9JZi63pLj-u/edit#gid=1939820452
https://sagebio.shinyapps.io/amp-ad-metadata-dictionary/
https://portal.includedcc.org/dashboard
Data Curator App
http://dca.app.sagebionetworks.org
https://linkml.io/schemasheets/#examples
https://dca-dev.app.sagebionetworks.org
https://docs.google.com/spreadsheets/d/1w6zDfz3_yrCjjrqfpXBGNmd0LZL4B03gr1KfzJtk5Cs/edit#gid=674286209
https://github.com/adknowledgeportal/data_curator
https://docs.google.com/presentation/d/129pSx58qDm7Y1OQmSSHKDq6tsoD3pW_gDRNXiX2rd0w/edit#slide=id.g4d21a8c2ba_0_11
https://github.com/adknowledgeportal/data-models
https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2453176326
/wiki/spaces/SCHEM/pages/2458419217
...
Projects
Folder Structure
https://dca-docs.scrollhelp.site/DCA/Working-version/Project-Agnostic/uploading-data
https://dca-docs.scrollhelp.site/DCA/Working-version/ELITE/validate-and-submit-your-metadata
Glossary
Template
Manifest - metadata table submitted for dataset
...
https://dca-dev.app.sagebionetworks.org/
Abby's request for testing
...
.
├── biospecimen_experiment_1
│   └── manifest1.csv
├── biospecimen_experiment_2
│   └── manifestA.csv
├── single_cell_RNAseq_batch_1
│   ├── manifestX.csv
│   ├── fileA.txt
│   ├── fileB.txt
│   ├── fileC.txt
│   └── fileD.txt
└── single_cell_RNAseq_batch_2
    ├── manifestY.csv
    └── file1.txt
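A layout like this one can be checked locally before upload. The sketch below lists the manifest files found in each top-level dataset folder. It assumes manifests follow the `manifest*.csv` naming seen in the tree; `find_manifests` is a hypothetical helper, not part of schematic or DCA.

```python
from pathlib import Path


def find_manifests(project_root: str) -> dict[str, list[str]]:
    """Map each top-level dataset folder to the manifest CSVs inside it.

    Assumption: manifest files are named manifest*.csv, as in the tree above.
    """
    manifests: dict[str, list[str]] = {}
    for folder in sorted(Path(project_root).iterdir()):
        if folder.is_dir():
            manifests[folder.name] = sorted(
                path.name for path in folder.glob("manifest*.csv")
            )
    return manifests
```

A folder that maps to an empty list has data files but no manifest, and a folder with more than one entry may need its manifests consolidated.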
Study Content
/wiki/spaces/AKP/pages/1057882353
Study Description in wiki
Methods description in each data folder
/wiki/spaces/EPD1/pages/2900819969
AMP-AD
Second Test
AD Portal DCA Test Project
Fileview: AD Portal DCA Test Project - Table
https://github.com/adknowledgeportal/test-data-model/blob/main/model-ad/model-ad.data.model.jsonld
https://sagebio.shinyapps.io/adknowledgeportal-data-curator/
https://www.synapse.org/#!Synapse:syn33582398/wiki/619343
https://github.com/adknowledgeportal/data-models/blob/main/
https://github.com/adknowledgeportal/data_curator
https://github.com/adknowledgeportal/test-data-model README.md#editing-data-models
AD data model → modular
repo:
branch: test-split-csvs
folders:
modules/
..biospecimen/
..mouse/
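With the model split into per-module CSVs, the pieces have to be reassembled into a single model CSV before conversion. The sketch below shows one way that could look, assuming every module CSV shares the same header row. `combine_modules` is a hypothetical helper and is not the build script used in the repository.

```python
import csv
from pathlib import Path


def combine_modules(modules_dir: str, output_csv: str) -> int:
    """Concatenate every CSV under modules_dir into one model CSV.

    Returns the number of attribute rows written. Write output_csv
    outside modules_dir so it is not picked up on the next run.
    """
    header = None
    rows = []
    for path in sorted(Path(modules_dir).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path} has a different header")
            # Keep only rows that contain at least one non-blank cell.
            rows.extend(row for row in reader if any(cell.strip() for cell in row))
    if header is None:
        raise ValueError("no module CSVs found")
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Failing on a header mismatch is a deliberate choice in this sketch: a module with missing or reordered columns would otherwise shift values into the wrong columns of the combined model.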
Term = Attribute in the data model where Parent = DataProperty
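The rule above can be expressed as a simple filter over the model CSV. This is a minimal sketch, assuming the CSV has columns named `Attribute` and `Parent`; the function name and the sample rows are hypothetical.

```python
import csv
import io


def data_property_terms(model_csv_text: str) -> list[str]:
    """Return the attributes whose Parent column equals 'DataProperty'."""
    reader = csv.DictReader(io.StringIO(model_csv_text))
    return [
        row["Attribute"]
        for row in reader
        if (row.get("Parent") or "").strip() == "DataProperty"
    ]


# Hypothetical rows, for illustration only.
sample = (
    "Attribute,Parent\n"
    "specimenID,DataProperty\n"
    "Biospecimen,DataType\n"
    "individualID,DataProperty\n"
)
print(data_property_terms(sample))
```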
test-split-csvs branch
MODEL-AD
ELITE
Annotate study folder with contentType = 'dataset'
Flattened file structure
Create Project
Maintain File permission access easily
Top level: assay folders
All data files of one type in assay folder
These assay folder names will be displayed
data_folder/
Schematic configuration needed: config.yml
master_fileview: ‘synID’
which refers to this:
Fileview - Files and Folders
https://www.synapse.org/#!Synapse:syn36759435/tables/
https://www.synapse.org/#!Synapse:syn51753858/tables/
Add CSV + JSONLD to github – test-data-model
https://github.com/adknowledgeportal/test-data-model
https://github.com/adknowledgeportal/Sage-Bionetworks/data_curator/blob/18dc00723f2e95a98525ff695401ac67e7785475/schematic_config.yml#L31
Data Model Validation Rules
/wiki/spaces/SCHEM/pages/2645262364
...
extract individual and specimen ID from filenames
...
needs to point to this fileview and the data model
fork repo
edit dca-template-config.json
add MODEL-AD folder and edit configuration as needed
send a pull request
ADKP example
Fileview DCA Asset View that DCA uses
folder contentType = ‘dataset’
One project for all of AD
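Setting the `contentType` annotation on a folder can be scripted. The sketch below assumes a logged-in Synapse Python client object that provides `get_annotations` and `set_annotations`, as the `synapseclient` package does in its 2.x and later releases; check this against the installed version. The function name is hypothetical.

```python
def mark_folder_as_dataset(syn, folder_id: str):
    """Set contentType = 'dataset' on a Synapse folder.

    `syn` is assumed to be a logged-in client exposing get_annotations
    and set_annotations. `folder_id` is the folder's Synapse ID.
    """
    annotations = syn.get_annotations(folder_id)
    annotations["contentType"] = "dataset"
    # Existing annotations on the folder are kept; only contentType changes.
    return syn.set_annotations(annotations)
```

The client is passed in rather than created inside the function so that the same helper can be exercised against a test double before touching a real project.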
Templates
...
/wiki/spaces/SCHEM/pages/2473623559
...
...
Resources
...
...
...
...
...
...
“Manifest Templates”
...