Schematic

Glossary

Manifest - metadata table submitted for datasets

Summary

SCHEMATIC is an acronym for Schema Engine for Manifest Ingress and Curation. This Python-based infrastructure provides a novel, schema-based metadata ingress ecosystem intended to streamline biomedical dataset annotation, metadata validation, and submission to a data repository for data contributors.

Documentation

https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2967568387/Guide+How+to+use+Schematic+for+Data+Model+Development#About

Code in Github

https://github.com/Sage-Bionetworks/schematic

Installation

https://pypi.org/project/schematicpy/

pip install schematicpy

...

Install for data curator app:

Code Block
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install schematicpy

Setup Python Environment

Schematic runs on Python 3.10, so the Python environment must be controlled. pyenv is one option: https://fathomtech.io/blog/python-environments-with-pyenv-and-vitualenv/

Code Block
pyenv install 3.10.11
pyenv virtualenv 3.10.11 schematic_3_10_11
pyenv activate schematic_3_10_11
python -m pip install schematicpy
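
Because Schematic targets Python 3.10, it can help to confirm which interpreter is active before installing. The following is a minimal sketch, not part of Schematic: the helper name `is_supported_python` is hypothetical, and it assumes only the major and minor version numbers matter.

```python
import sys


def is_supported_python(version_info=sys.version_info, required=(3, 10)):
    """Return True when the interpreter's major.minor matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    # Report the active interpreter and whether it matches the assumed target.
    status = "OK" if is_supported_python() else "unsupported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

Running this inside the activated environment shows whether the pyenv version took effect.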

Edit Configuration

The following parameters need to be set in the config.yml

https://github.com/Sage-Bionetworks/schematic/blob/develop/config.yml

Using Schematic

Command Line Reference

https://sage-schematic.readthedocs.io/en/develop/cli_reference.html

...

The JSON-LD schema follows the Schema.org conventions for specifying attributes.

https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2473623559/The+Data+Model+Schema#A.-Schema-properties-and-relationships

/wiki/spaces/SCHEM/pages/2967568387
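
To give a feel for what a Schema.org-style attribute looks like in JSON-LD, here is a generic sketch. It is not copied from a Schematic-generated schema: the `ex:` prefix, its URL, and the `Sex` attribute with its values are made up for illustration, and the exact keys Schematic emits may differ.

```python
import json

# Illustrative node only: the "ex:" prefix, its URL, and the attribute are hypothetical.
attribute_node = {
    "@context": {
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "schema": "http://schema.org/",
        "ex": "http://example.org/",
    },
    "@id": "ex:Sex",
    "@type": "rdfs:Class",
    "rdfs:label": "Sex",
    "rdfs:comment": "Sex of the participant",
    # schema:rangeIncludes lists the values this attribute may take.
    "schema:rangeIncludes": [{"@id": "ex:Female"}, {"@id": "ex:Male"}],
}

# Serialize the node to JSON-LD text.
jsonld_text = json.dumps(attribute_node, indent=2)
print(jsonld_text)
```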

Schematic DB

https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2473623559/The+Data+Model+Schema#Schemas-and-Schematic-DB

Schematic DB is a package used to ingress the manifests created by Schematic into a database.

  • Schematic DB will use any of these validation rules:

    • str

    • float

    • num

    • int

    • date

    • If the attribute has none of the above rules, it uses a string type.

    • Otherwise, the attribute datatype is determined by the rule.
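
The fallback behaviour described above can be sketched as follows. This is not Schematic DB's code: the function name `infer_datatype` is hypothetical, and the assumptions are that an attribute's rules arrive as a list of strings and that the first type rule found wins.

```python
# Rule names are taken from the list above; everything else is illustrative.
TYPE_RULES = ("str", "float", "num", "int", "date")


def infer_datatype(validation_rules):
    """Return the first type rule found, or "str" when none is present."""
    for rule in validation_rules:
        name = rule.strip()
        if name in TYPE_RULES:
            return name
    # No type rule was given, so fall back to a string type.
    return "str"
```

For example, `infer_datatype(["int"])` yields `"int"`, while an attribute with no type rule falls back to `"str"`.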

Build a Data Model

https://docs.google.com/presentation/d/129pSx58qDm7Y1OQmSSHKDq6tsoD3pW_gDRNXiX2rd0w/edit#slide=id.g13aaf3b8358_0_0

Documentation

/wiki/spaces/SCHEM/pages/2453176326

/wiki/spaces/SCHEM/pages/2458419217

Data model visualizer?

...

/wiki/spaces/SCHEM/pages/2473623559

Recommendations

  • Draw a diagram for data model

  • Lucid.app - can use templates like ERD example

  • Start small - skeleton --> schema

  • Schema visualization tools?

  • Useful reference when building

  • Start from single table

  • Use schematic in dev mode to convert model to JSON-LD regularly to check for errors

Model Requirements

The data model requires these columns:

  1. Attribute

  2. Description

  3. ValidValues

  4. DependsOn

  5. required

  6. source

  7. parent

  8. properties

  9. dependsOnComponent
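
A quick way to catch a missing column early is to check the header row of the model CSV. The sketch below is an illustration, not part of Schematic: the helper name `missing_columns` is hypothetical, and it assumes the header must match the column names exactly as spelled in the list above.

```python
import csv
import io

# Column names as spelled in the list above (exact spelling is an assumption).
REQUIRED_COLUMNS = [
    "Attribute", "Description", "ValidValues", "DependsOn", "required",
    "source", "parent", "properties", "dependsOnComponent",
]


def missing_columns(csv_text):
    """Return the required column names absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

An empty result means every required column is present; otherwise the list names what to add.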

Data Model Validation

/wiki/spaces/SCHEM/pages/2473623559

/wiki/spaces/SCHEM/pages/2645262364

Example Model

...

https://docs.google.com/presentation/d/129pSx58qDm7Y1OQmSSHKDq6tsoD3pW_gDRNXiX2rd0w/edit#slide=id.g13aaf3b8358_0_0

Diagramming - draw out model

Lucid.app - can use templates like ERD example

Can reference diagram when building data model

Schema visualization tool (data viz collaboration opportunity Rich!!)

Start small - skeleton --> schema

Definitions on /wiki/spaces/SCHEM/pages/2473623559

Manifest - metadata table submitted for datasets
Data Model -
Data Schema -

Start from single table

CSV with basic column set: Attribute, Description, ValidValues, DependsOn, required, source, parent, properties, dependsOnComponent, validationRules

Use schematic in dev mode to convert model to JSON-LD regularly to check for errors

https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.csv

https://docs.google.com/spreadsheets/d/1Wde5YBFtEa4GhO-smXgbVApGioBGNnc-95n4LY8YB_E/edit#gid=925738608

https://ontofox.hegroup.org/

...

schematic model --config config.yml submit --manifest_path manifest.csv --dataset_id synId --manifest_record_type table
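
When the submit command is scripted, assembling the arguments as a list avoids typos in the option names. The sketch below only builds and prints the command and does not run it. The helper name `build_submit_command` is hypothetical, and the option spellings (`--manifest_path`, `--dataset_id`, `--manifest_record_type`) are assumed from the command above; check the command line reference before relying on them.

```python
import shlex


def build_submit_command(config_path, manifest_path, dataset_id, record_type="table"):
    """Assemble the argument list for the submit command without running it."""
    return [
        "schematic", "model", "--config", config_path,
        "submit",
        "--manifest_path", manifest_path,
        "--dataset_id", dataset_id,
        "--manifest_record_type", record_type,
    ]


# Placeholder values mirror the command above; "synId" stands in for a real dataset ID.
command = build_submit_command("config.yml", "manifest.csv", "synId")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` once the values are real.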

command line reference

Data Model Visualization

https://linkml.io/linkml/intro/tutorial.html
https://docs.google.com/spreadsheets/d/1vDdcqt3Lgehyq1iCnlF1H9JZi63pLj-u/edit#gid=1939820452
https://portal.includedcc.org/dashboard
https://linkml.io/schemasheets/#examples
https://docs.google.com/spreadsheets/d/1w6zDfz3_yrCjjrqfpXBGNmd0LZL4B03gr1KfzJtk5Cs/edit#gid=674286209
https://docs.google.com/presentation/d/129pSx58qDm7Y1OQmSSHKDq6tsoD3pW_gDRNXiX2rd0w/edit#slide=id.g4d21a8c2ba_0_11

/wiki/spaces/SCHEM/pages/2453176326

/wiki/spaces/SCHEM/pages/2458419217

JSON for Linking Data (JSON-LD)

...