Schematic
Glossary
Manifest - metadata table submitted for datasets
Summary
SCHEMATIC is an acronym for Schema Engine for Manifest Ingress and Curation. The Python-based infrastructure provides a schema-based metadata ingress ecosystem meant to streamline biomedical dataset annotation, metadata validation, and submission to a data repository for various data contributors.
Documentation
https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2967568387/Guide+How+to+use+Schematic+for+Data+Model+Development#About
Code in Github
https://github.com/Sage-Bionetworks/schematic
Installation
https://pypi.org/project/schematicpy/
pip install schematicpy
...
Install for data curator app:
Code Block |
---|
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install schematicpy |
Setup Python Environment
Schematic runs on Python 3.10, so the Python environment must be controlled. pyenv is one option: https://fathomtech.io/blog/python-environments-with-pyenv-and-vitualenv/
Code Block |
---|
pyenv install 3.10.11
pyenv virtualenv 3.10.11 schematic_3_10_11
pyenv activate schematic_3_10_11
python -m pip install schematicpy |
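Once the environment is activated, it can help to confirm that the interpreter really is a 3.10 build before installing. The following is a minimal sketch; the helper name `is_supported` is hypothetical and not part of Schematic.

```python
import sys

# Version stated on this page; adjust if the requirement changes.
REQUIRED = (3, 10)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {current}: OK")
    else:
        print(f"Python {current}: expected {REQUIRED[0]}.{REQUIRED[1]}.x")
```

Running this inside the activated virtualenv shows whether pyenv selected the intended interpreter.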
Edit Configuration
The following parameters need to be set in the config.yml
https://github.com/Sage-Bionetworks/schematic/blob/develop/config.yml
Using Schematic
Command Line Reference
https://sage-schematic.readthedocs.io/en/develop/cli_reference.html
...
The JSON-LD schema follows the Schema.org conventions for specifying attributes.
https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2473623559/The+Data+Model+Schema#A.-Schema-properties-and-relationships
/wiki/spaces/SCHEM/pages/2967568387
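To give a feel for what a Schema.org-style JSON-LD node looks like, here is a small sketch that builds one attribute as a class node and prints the document. The attribute name `Patient`, the `bts` prefix, and the exact set of keys are assumptions for illustration; the JSON-LD that Schematic itself generates may contain additional or different fields.

```python
import json


def make_class_node(label: str, description: str, prefix: str = "bts") -> dict:
    """Build one Schema.org-style class node (illustrative keys only)."""
    return {
        "@id": f"{prefix}:{label}",
        "@type": "rdfs:Class",
        "rdfs:label": label,
        "rdfs:comment": description,
    }


# Hypothetical one-attribute model; the context URLs are assumptions.
doc = {
    "@context": {
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "schema": "http://schema.org/",
        "bts": "http://schema.biothings.io/",
    },
    "@graph": [make_class_node("Patient", "A hypothetical example attribute.")],
}

print(json.dumps(doc, indent=2))
```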
Schematic DB
https://linkml.io/linkml/intro/tutorial.html
https://docs.google.com/spreadsheets/d/1vDdcqt3Lgehyq1iCnlF1H9JZi63pLj-u/edit#gid=1939820452
https://portal.includedcc.org/dashboard
https://linkml.io/schemasheets/#examples
https://docs.google.com/spreadsheets/d/1w6zDfz3_yrCjjrqfpXBGNmd0LZL4B03gr1KfzJtk5Cs/edit#gid=674286209
/wiki/spaces/SCHEM/pages/2473623559/The+Data+Model+Schema#Schemas-and-Schematic-DB
Schematic DB is a package used to ingress the manifests created by Schematic into a database.
Schematic DB will use any of these validation rules:
str
float
num
int
date
If an attribute has none of the above rules, it uses a string type. Otherwise, the attribute datatype is determined by the rule.
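The fallback behaviour described above can be sketched as a small lookup. This is not Schematic DB's actual code: the function name `infer_datatype` is hypothetical, and the sketch assumes the first matching type rule wins when several are present.

```python
# Type rules listed on this page.
TYPE_RULES = ("str", "float", "num", "int", "date")


def infer_datatype(validation_rules):
    """Return the first type rule found, or "str" when none is present."""
    for rule in validation_rules:
        if rule in TYPE_RULES:
            return rule
    return "str"  # default: treat the attribute as a string


print(infer_datatype(["int"]))            # int
print(infer_datatype(["someOtherRule"]))  # str
print(infer_datatype([]))                 # str
```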
Build a Data Model
https://docs.google.com/presentation/d/129pSx58qDm7Y1OQmSSHKDq6tsoD3pW_gDRNXiX2rd0w/edit#slide=id.g4d21a8c2ba_0_11
/wiki/spaces/SCHEM/pages/2453176326
/wiki/spaces/SCHEM/pages/2458419217
Install Schematic
Schematic runs on Python 3.10, so the Python environment must be controlled. pyenv is one option: https://fathomtech.io/blog/python-environments-with-pyenv-and-vitualenv/
Code Block |
---|
pyenv install 3.10.11
pyenv virtualenv 3.10.11 schematic_3_10_11
pyenv activate schematic_3_10_11
python -m pip install schematicpy |
Data model visualizer?
...
Recommendations
Draw a diagram for data model
Lucid.app - can use templates like ERD example
Start small - skeleton --> schema
Schema visualization tools?
Useful reference when building
Start from single table
Use schematic in dev mode to convert model to JSON-LD regularly to check for errors
Model Requirements
The data model requires these columns:
Attribute
Description
ValidValues
DependsOn
required
source
parent
properties
dependsOnComponent
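Before converting a model, a quick check that the CSV header contains every column listed above can save a round trip. The sketch below uses only the standard library; the function name `missing_columns` and the sample row are hypothetical, and the column spellings are copied from the list above rather than from Schematic's source.

```python
import csv
import io

# Column names as listed on this page.
REQUIRED_COLUMNS = [
    "Attribute", "Description", "ValidValues", "DependsOn", "required",
    "source", "parent", "properties", "dependsOnComponent",
]


def missing_columns(csv_text: str) -> list:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Hypothetical model that lacks the last two columns.
sample = (
    "Attribute,Description,ValidValues,DependsOn,required,source,parent\n"
    "Patient,A hypothetical row,,,TRUE,,\n"
)
print(missing_columns(sample))  # ['properties', 'dependsOnComponent']
```

For a file on disk, pass the result of `open(path, newline="").read()` as `csv_text`.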
Data Model Validation
/wiki/spaces/SCHEM/pages/24736235592645262364Data
Example Model
...
https://docs.google.com/presentation/d/129pSx58qDm7Y1OQmSSHKDq6tsoD3pW_gDRNXiX2rd0w/edit#slide=id.g13aaf3b8358_0_0
Diagramming - draw out model
Lucid.app - can use templates like ERD example
Can reference diagram when building data model
Schema visualization tool (data viz collaboration opportunity, Rich!!)
Start small - skeleton --> schema
Definitions on /wiki/spaces/SCHEM/pages/2473623559
Manifest - metadata table submitted for datasets
Data Model -
Data Schema -
Start from single table
CSV with basic column set: Attribute, Description, ValidValues, DependsOn, required, source, parent, properties, dependsOnComponent, validationRules
Use schematic in dev mode to convert model to JSON-LD regularly to check for errors
https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.csv
https://ontofox.hegroup.org/
...
`schematic model --config config.yml submit --manifest_path manifest.csv --dataset_id synId --manifest_record_type table`
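When the submit command is driven from a script, building it as an argument list avoids quoting mistakes. The sketch below only assembles and prints the command and does not run it; the helper `build_submit_command` is hypothetical, and the flag spellings `--dataset_id` and `--manifest_record_type` are assumed from the command above, so check them against the CLI reference before use.

```python
import shlex


def build_submit_command(config_path, manifest_path, dataset_id,
                         record_type="table"):
    """Assemble the submit command as an argv-style list (not executed)."""
    return [
        "schematic", "model", "--config", config_path,
        "submit",
        "--manifest_path", manifest_path,
        "--dataset_id", dataset_id,
        "--manifest_record_type", record_type,
    ]


cmd = build_submit_command("config.yml", "manifest.csv", "synId")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the values have been replaced with a real config path, manifest path, and dataset ID.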
Data Model Visualization
https://linkml.io/linkml/intro/tutorial.html
https://docs.google.com/spreadsheets/d/1vDdcqt3Lgehyq1iCnlF1H9JZi63pLj-u/edit#gid=1939820452
https://portal.includedcc.org/dashboard
https://linkml.io/schemasheets/#examples
https://docs.google.com/spreadsheets/d/1w6zDfz3_yrCjjrqfpXBGNmd0LZL4B03gr1KfzJtk5Cs/edit#gid=674286209
https://docs.google.com/presentation/d/129pSx58qDm7Y1OQmSSHKDq6tsoD3pW_gDRNXiX2rd0w/edit#slide=id.g4d21a8c2ba_0_11
/wiki/spaces/SCHEM/pages/2453176326
/wiki/spaces/SCHEM/pages/2458419217
JSON-LD (JSON for Linking Data)
...