Data Model Workflow
This page describes the workflow required to build, edit, and update the data model for MODEL-AD.
- 1 Schematic
- 1.1 Summary
- 1.2 Documentation
- 1.3 Code in Github
- 1.4 Installation
- 1.5 Edit Configuration
- 1.6 Using Schematic
- 2 Data Model Development
- 3 Data Curator App
- 4 Projects
- 4.1 Folder Structure
- 4.2 Study Content
- 4.3 AMP-AD
- 4.4 MODEL-AD
- 4.5 ELITE
- 4.6 ADKP example
- 4.7 Templates
- 5 Resources
- 6 Glossary
Schematic
Summary
Data modeling at Sage requires two in-house tools: Schematic and the Data Curator App (DCA). SCHEMATIC is an acronym for Schema Engine for Manifest Ingress and Curation. This Python-based tool is a schema-based metadata ingress ecosystem intended to streamline biomedical dataset annotation, metadata validation, and submission to a data repository for various data contributors.
Documentation
https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2967568387
Code in Github
https://github.com/Sage-Bionetworks/schematic
Installation
https://pypi.org/project/schematicpy/
Install for the Data Curator App:
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install schematicpy
Setup Python Environment
Schematic runs on Python 3.10, so the Python environment must be controlled. pyenv is one option: https://fathomtech.io/blog/python-environments-with-pyenv-and-vitualenv/
pyenv install 3.10.10
pyenv virtualenv 3.10.10 py_3_10_10
pyenv activate py_3_10_10
pip install schematicpy
Edit Configuration
The following parameters need to be set in the config.yml
https://github.com/Sage-Bionetworks/schematic/blob/develop/config.yml
Using Schematic
Command Line Reference
https://sage-schematic.readthedocs.io/en/develop/cli_reference.html
Commands must be run from the ~/schematic directory.
Data Model Development
A data model defines attributes (i.e. data elements) describing metadata associated with any given dataset type. The data model also describes relationships between these attributes.
Documentation
https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2473623559
Create Data Model
https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2967568387/How+to+use+Schematic+for+Data+Model+Development#Create-a-Data-Model
The data model is defined in a table, then stored (i.e. serialized) in a JSON-LD schema which specifies attributes as suggested by Schema.org.
https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2473623559
https://github.com/adknowledgeportal/data-models
Sage Data Models for Reference
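To make the table-to-JSON-LD step concrete, here is a minimal sketch of how rows of a model table could be serialized as JSON-LD nodes. It is not Schematic's converter and does not reproduce its output: the `ex:` prefix, the `row_to_node` helper, and the sample rows are hypothetical, and only the `rdfs` vocabulary is a real namespace.

```python
import json


def row_to_node(row):
    """Turn one model-table row into a simplified JSON-LD style node."""
    # Labels in this sketch are the attribute name with spaces removed.
    label = row["Attribute"].replace(" ", "")
    return {
        "@id": f"ex:{label}",  # "ex:" is a placeholder prefix, not Schematic's
        "@type": "rdfs:Class",
        "rdfs:label": label,
        "rdfs:comment": row.get("Description", ""),
    }


# Hypothetical rows, used only to illustrate the shape of the data.
rows = [
    {"Attribute": "Patient ID", "Description": "Unique participant identifier"},
    {"Attribute": "Sex", "Description": "Reported sex of the participant"},
]

document = {
    "@context": {
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "ex": "http://example.org/",  # placeholder namespace
    },
    "@graph": [row_to_node(row) for row in rows],
}

print(json.dumps(document, indent=2))
```

In practice the conversion is done by Schematic itself; the sketch only shows why each table row ends up as one node in the serialized schema.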
Recommendations
Draw a diagram. A diagram is a useful reference when developing the model.
Start small with a basic skeleton and then build.
Use Schematic in dev mode to convert the model to JSON-LD regularly to check for errors.
Requirements
The data model requires these columns:
Attribute, Description, Valid Values, DependsOn, Required, Source, Parent, Properties, DependsOn Component
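A quick pre-check of a model CSV against the required columns can be sketched with the standard library before running Schematic. The `missing_columns` function and the sample CSV are illustrative assumptions and are not part of Schematic; the exact spelling of each column name should match the header row of your own model.

```python
import csv
import io

# Column names follow the requirements list; spelling and capitalization
# are assumptions and should be adjusted to match your model's header row.
REQUIRED_COLUMNS = [
    "Attribute",
    "Description",
    "Valid Values",
    "DependsOn",
    "Required",
    "Source",
    "Parent",
    "Properties",
    "DependsOn Component",
]


def missing_columns(csv_text):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical two-line model used only to exercise the check.
sample = (
    "Attribute,Description,Valid Values,Required\n"
    "Patient,A study participant,,TRUE\n"
)
print(missing_columns(sample))
```

An empty list means every required column is present; any names returned should be added to the table before converting it to JSON-LD.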
Example Model
Github: https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.csv
Formatted for readability:
This model does NOT validate as provided.
Schematic DB
Schematic DB is a package used to ingress the manifests created by Schematic into a database.
Schematic DB will use any of these validation rules:
str, float, num, int, date
If no rule is provided, the attribute defaults to a string type.
The attribute datatype is based on the rule.
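The rule-to-datatype behavior, including the default to string, can be sketched as a simple lookup. This is an illustration under stated assumptions, not Schematic DB's implementation: the datatype names on the right-hand side and the `datatype_for` helper are hypothetical, and the sketch only inspects the first word of the rule.

```python
# Rules listed for Schematic DB. The datatype names on the right are
# illustrative assumptions, not Schematic DB's actual column types.
RULE_TO_DATATYPE = {
    "str": "string",
    "float": "float",
    "num": "number",
    "int": "integer",
    "date": "date",
}

DEFAULT_DATATYPE = "string"


def datatype_for(rule):
    """Pick a datatype from a validation rule, defaulting to string."""
    tokens = (rule or "").split()
    if not tokens:
        # No rule provided: fall back to the string type.
        return DEFAULT_DATATYPE
    # Unrecognized rules also fall back to string in this sketch.
    return RULE_TO_DATATYPE.get(tokens[0], DEFAULT_DATATYPE)


for rule in ["int", "date", "", None]:
    print(repr(rule), "->", datatype_for(rule))
```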
Data Model Validation
https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/2645262364