Overview of the Data Model and Schema
The data model for this database is described using the entity-relationship diagram below (for more on these diagrams, please check out this guide). This diagram shows the structure and organization of the Neurofibromatosis Research Tools Database. Each table (e.g. resource) has a defined set of attributes (e.g. rrid) that will be captured in that table. These tables are often linked to other tables via primary keys (PKs), which identify individual records in a table, and foreign keys (FKs), which reference those records from other tables.
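As a rough illustration of how primary and foreign keys link tables, the following sketch builds two tables in an in-memory SQLite database. Only the resource table and the rrid attribute are taken from the description above; the observation table, the resource_id and note columns, and the placeholder values are hypothetical and are not drawn from the actual schema.

```python
import sqlite3

# Hypothetical two-table sketch. Only `resource` and `rrid` come from the
# text above; every other table, column, and value is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE resource (
    resource_id TEXT PRIMARY KEY,   -- PK: identifies one resource record
    rrid        TEXT
);
CREATE TABLE observation (
    observation_id TEXT PRIMARY KEY,
    resource_id    TEXT NOT NULL REFERENCES resource(resource_id),  -- FK
    note           TEXT
);
""")

conn.execute("INSERT INTO resource VALUES (?, ?)",
             ("res-1", "RRID:EXAMPLE_0001"))
conn.execute("INSERT INTO observation VALUES (?, ?, ?)",
             ("obs-1", "res-1", "example note"))

# Follow the FK from observation back to the resource it describes.
rows = conn.execute("""
    SELECT o.observation_id, r.rrid
    FROM observation AS o
    JOIN resource AS r ON r.resource_id = o.resource_id
""").fetchall()

# A record that points at a nonexistent resource is rejected by the FK.
try:
    conn.execute("INSERT INTO observation VALUES (?, ?, ?)",
                 ("obs-2", "no-such-resource", "orphan"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The join shows how a foreign key lets one table reuse a record defined in another, and the rejected insert shows the integrity guarantee that the PK/FK link provides.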
In addition to the data model diagram, a schema for this database is maintained as a Google spreadsheet and regularly released as machine-readable JSON-LD files. Changes to the schema are tracked in both the Google spreadsheet changelog and the release notes.
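Because JSON-LD is ordinary JSON, a released schema file can be inspected with standard tooling. The sketch below assumes the file follows the common JSON-LD pattern of a top-level "@graph" array whose nodes carry "rdfs:label" and "rdfs:comment" entries. The embedded sample is a made-up fragment, not copied from a release, and the list_terms helper is hypothetical.

```python
import json

# Made-up JSON-LD fragment for illustration; the real schema files may use
# different prefixes, keys, and many more properties per node.
SAMPLE = """
{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/"
  },
  "@graph": [
    {"@id": "ex:Resource", "@type": "rdfs:Class",
     "rdfs:label": "Resource",
     "rdfs:comment": "Placeholder description of a table."},
    {"@id": "ex:rrid", "@type": "rdfs:Class",
     "rdfs:label": "rrid",
     "rdfs:comment": "Placeholder description of an attribute."}
  ]
}
"""


def list_terms(doc):
    """Map each labelled node in the @graph array to its comment text."""
    return {
        node["rdfs:label"]: node.get("rdfs:comment", "")
        for node in doc.get("@graph", [])
        if "rdfs:label" in node
    }


terms = list_terms(json.loads(SAMPLE))
for label, comment in terms.items():
    print(f"{label}: {comment}")
```

To try this against a real release, replace the embedded sample with the contents of a downloaded JSON-LD file and adjust the key names if they differ.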
Implementation of the Data Model
Our current plan is to use the schematic Python package developed by Sage Bionetworks to generate the database tables and data intake spreadsheets (manifests).
The data will be housed on Synapse.org. If certain database functions are needed that Synapse does not currently support, the data will also be kept as a relational database in our private cloud-computing environment, and the resulting data will be returned to Synapse.
Data will be added to the database using schematic-generated spreadsheets. We have generated draft data intake manifests using v1.0.1 of the schema:
Please feel free to navigate to File → Make a Copy to create an editable copy of these manifests to explore how they work.
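When exploring a copied manifest, it can help to export it as CSV and run a quick completeness check before treating it as intake data. The following is a minimal sketch of such a check and is not the validation logic of the schematic package. The required column list, the resourceName column, and the sample rows are hypothetical; only rrid appears in the data model description above.

```python
import csv
import io

# Hypothetical required columns; only `rrid` is named in this document.
REQUIRED_COLUMNS = ["resourceName", "rrid"]

# Made-up manifest export with one incomplete row.
MANIFEST_CSV = """resourceName,rrid
Example cell line,RRID:EXAMPLE_0001
,RRID:EXAMPLE_0002
"""


def find_problems(csv_text, required):
    """Return messages for missing columns and empty required cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {col}" for col in required if col not in header]
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in required:
            if col in header and not (row[col] or "").strip():
                problems.append(f"row {line_no}: empty {col}")
    return problems


problems = find_problems(MANIFEST_CSV, REQUIRED_COLUMNS)
for message in problems:
    print(message)
```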
Additional Resources
Data Model Diagram
Description: This is the source diagram of the Data Model for the NF Research Tool Database, generated and maintained in LucidChart (note: please reference the “New Layout Database Schema” sheet). The latest version of this diagram will be updated in this document with each update and release of the database schema.
Database Schema
Description: The machine-readable (JSON-LD on GitHub) and human-readable (Google Sheet) versions of the schema encode the data model for the database as well as the dictionary for the attributes used in the database.
Link to latest version of schema in GitHub: https://github.com/nf-osi/nf-research-tools-schema
Link to latest version of schema in shared Google Drive: https://docs.google.com/spreadsheets/d/15fwIhZw7YfhPkzOfQhj6zU_5-_7ywS05-L8qOjMIVdM/edit#gid=0
Intake Manifests
Description: Draft data intake manifests that use v1.0.1 of the schema:
Contact Information
Please feel free to contact Robert Allaway (robert.allaway@sagebase.org) and Ashley Clayton (ashley.clayton@sagebase.org) with any questions or comments related to the Neurofibromatosis Research Tools Database, data model, or any of the resources linked in this document.
Acknowledgements
Thank you to Brynn Zalmanek, James Eddy, Mialy DeFelice, Milen Nikolov, Kaitlin Throgmorton, and Robert Allaway at Sage Bionetworks for contributions to the development and implementation of the schema and data model, and to the Gilbert Family Foundation for valuable contributions and feedback.