Introduction
Most of the basic objects that Synapse currently supports are Entities. Each Entity has first class data that makes up the fields of an entity. All Entities also have Annotations that store additional data about an entity.
The following are the current Synapse Entities:
- Project
- Folder
- Dataset
- Layer
- Location
- EULA
Currently, all Entities are defined by "hard-coded" Java objects. The fields of these Java objects define the first class data of each entity. The only mechanism we have for constraining data of an Entity is to write Java code to do the validation. We also lack a mechanism to constrain or define annotations.
While defining entities using Java allowed us to quickly get a first version of Synapse built, we always planned on supporting a more dynamic approach to object definitions. Ideally we would like our users to define entities without writing Java code. As it stands now, if our users want to add a field to an entity, an engineering task must be scheduled to get the change implemented. In theory, if we used a schema like JSON Schema 03 for both entity definitions and data constraints, we could make changes to a schema with little or no engineering effort. Engineering would no longer be the bottleneck for the evolution of Synapse Entities and data.
Proposal
We are proposing to use JSON Schema 03 to define Entity types. The JSON Schema breaks an object definition into two major categories: properties and additional properties.
An example JSON Schema that describes products might look like the following (the "additionalProperties" section is left empty because it is not used):
Code Block |
---|
{
  "name":"Product",
  "properties":{
    "id":{
      "type":"number",
      "description":"Product identifier",
      "required":true
    },
    "name":{
      "description":"Name of the product",
      "type":"string",
      "required":true
    },
    "price":{
      "type":"number",
      "minimum":0,
      "required":true
    },
    "tags":{
      "type":"array",
      "items":{
        "type":"string"
      }
    },
    "releaseStatus":{
      "type":"string",
      "description":"The release status of a product",
      "enum":[ "PROTOTYPE", "RELEASED", "RECALLED", "DEPRECIATED"]
    }
  },
  "additionalProperties":{
  }
}
{code}
|
In the above example, we can see how various types of data can be defined for a Product using the JSON Schema. For example, "id" is a number and required, while "releaseStatus" is an enumeration of strings.
We are proposing to use the "properties" to define the primary fields of a Synapse Entity. These primary fields can be considered the expected data of all instances of a given entity. Using the Product example from above, this implies that all instances of Product would have "id", "name", "price" and "tags".
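To make the constraints in the Product schema concrete, the following is a minimal Java sketch that checks a product, represented as a plain Map, against the same rules (required fields, types, the price minimum, and the releaseStatus enumeration). The class and method names (ProductCheck, validate) are hypothetical and are not part of Synapse or of any JSON Schema library; a real implementation would most likely delegate to a schema validator rather than hard-code the rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: hard-codes the constraints of the Product schema above.
final class ProductCheck {
    // Same values as the "enum" of "releaseStatus" in the schema.
    static final List<String> RELEASE_STATUSES =
            List.of("PROTOTYPE", "RELEASED", "RECALLED", "DEPRECIATED");

    /** Returns the list of violations; an empty list means the product is valid. */
    static List<String> validate(Map<String, Object> product) {
        List<String> errors = new ArrayList<>();
        // "id", "name" and "price" are marked "required":true.
        for (String key : List.of("id", "name", "price")) {
            if (!product.containsKey(key)) {
                errors.add("missing required property: " + key);
            }
        }
        Object id = product.get("id");
        if (id != null && !(id instanceof Number)) {
            errors.add("id must be a number");
        }
        Object name = product.get("name");
        if (name != null && !(name instanceof String)) {
            errors.add("name must be a string");
        }
        Object price = product.get("price");
        if (price != null) {
            if (!(price instanceof Number)) {
                errors.add("price must be a number");
            } else if (((Number) price).doubleValue() < 0) {
                errors.add("price must be >= 0"); // "minimum":0
            }
        }
        Object tags = product.get("tags");
        if (tags != null) {
            if (!(tags instanceof List)) {
                errors.add("tags must be an array");
            } else {
                for (Object tag : (List<?>) tags) {
                    if (!(tag instanceof String)) {
                        errors.add("tags items must be strings");
                    }
                }
            }
        }
        Object status = product.get("releaseStatus");
        if (status != null && !RELEASE_STATUSES.contains(status)) {
            errors.add("releaseStatus must be one of " + RELEASE_STATUSES);
        }
        return errors;
    }
}
```

Under these assumptions, a product with an "id", a "name" and a non-negative "price" yields an empty list, while a negative price or a missing "id" produces one violation each.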
Initially we were planning to use "additionalProperties" to define the Annotations of a Synapse Entity, but this raised a fundamental issue. If the Annotations of an entity are provided for ad hoc user data, then formally defining them in the entity schema for all instances of a type seems like a poor fit. That said, we still have many use cases where we want to constrain the data of an annotation when it is added to an instance of an entity. Therefore, we are proposing that these annotation types be set on a per-instance basis rather than at the entity schema level. Annotation types are covered in a separate document: ...
Schema Life-cycle
For the initial implementation we are proposing that an Entity Schema can only be defined and edited as part of the Synapse build. This means run-time edits or additions to a schema will not be possible. The reason for this limitation is to keep the life-cycle of the schema as simple as possible. As we will see, the life-cycle is already complicated even with this limitation.
Define Entities
A new entity will be created by first creating a new JSON text file in the lib-auto-generated project's src/main/resources folder. Folder hierarchies should be used to represent the equivalent of "packages" for each entity. The following example shows where an Example entity might be created:
Code Block |
---|
/lib-auto-generated/src/main/resources/org/sagebionetworks/entity/type/Example.json
{code}
|
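As an illustration of how a folder hierarchy can stand in for a package, here is a small Java sketch that turns a schema path, taken relative to the resources folder, into a fully qualified class name. The helper (SchemaPaths.toClassName) is hypothetical, and whether the real build derives Java package names in exactly this way is an assumption.

```java
// Hypothetical helper: maps a schema file path to a Java-style qualified name.
final class SchemaPaths {
    private static final String SUFFIX = ".json";

    /**
     * Converts a path relative to the resources folder, for example
     * "org/sagebionetworks/entity/type/Example.json", into a dotted name.
     */
    static String toClassName(String relativeSchemaPath) {
        if (!relativeSchemaPath.endsWith(SUFFIX)) {
            throw new IllegalArgumentException(
                    "not a JSON schema file: " + relativeSchemaPath);
        }
        // Drop the file extension, then treat each folder as a package segment.
        String withoutSuffix = relativeSchemaPath.substring(
                0, relativeSchemaPath.length() - SUFFIX.length());
        return withoutSuffix.replace('/', '.');
    }
}
```

With this sketch, the Example.json path above (minus the project and resources prefix) would map to org.sagebionetworks.entity.type.Example.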
Let's say we also want to define an Annotation type and use it to help define our Example.json. This annotation type definition JSON text file might be created in the following location:
Code Block |
---|
/lib-auto-generated/src/main/resources/org/sagebionetworks/annotation/types/VertebrateOrganType.json
{code}
|
Before we look at the definition of our Example.json, let's first look at the definition of our new VertebrateOrganType.json. For this example we want to use the ... ontology to define the valid values for Organs:
VertebrateOrganType.json
Code Block |
---|
{
"type":"string",
"format":"uri",
"enum":["XQUERY":
"doc(http://rest.bioontology.org/bioportal/concepts/4531?conceptid=tbio:Organ&light=1&apikey=2fb9306a-7f3f-477a-821e-e3ccd7356a18)/success/data/classBean/relations/entry[string=Subclass]/list/classBean/fullId"
]
}
{code}
|
In this example, the enumeration values are defined by an XQuery that is used to get the "fullId" (URIs) of all sub-classes of the term "Organ" using the XML returned from NCBO's BioPortal Term services. Here is the XML returned by the term service for this example: ...
Assuming the XQuery is set up correctly, the effective enum definition for this type would be:
Code Block |
---|
"enum":[
"http://www.co-ode.org/ontologies/basic-bio/basic-vertebrate-gross-anatomy.owl#Heart",
"http://www.co-ode.org/ontologies/basic-bio/basic-vertebrate-gross-anatomy.owl#Pericardium",
"http://www.co-ode.org/ontologies/basic-bio/basic-vertebrate-gross-anatomy.owl#Brain",
"http://www.co-ode.org/ontologies/basic-bio/basic-vertebrate-gross-anatomy.owl#Stomach",
"http://www.co-ode.org/ontologies/basic-bio/basic-vertebrate-gross-anatomy.owl#Lung",
"http://www.co-ode.org/ontologies/basic-bio/basic-vertebrate-gross-anatomy.owl#Liver"
]
{code}
|
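The step from the term-service XML to the effective enum can be sketched in Java. The example below uses the JDK's built-in XPath support rather than an XQuery engine, and it quotes the 'Subclass' literal in the predicate. The sample XML is a hand-made stand-in that only mirrors the element names appearing in the query path; it is not actual BioPortal output, and the urn:example identifiers are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical sketch: extracts the "fullId" of every sub-class from term XML.
final class OrganEnumSketch {
    // Path portion of the query, with the string literal quoted for XPath.
    static final String SUBCLASS_IDS =
            "/success/data/classBean/relations/entry[string='Subclass']"
            + "/list/classBean/fullId";

    // Assumed XML shape, derived only from the path above (not real output).
    static final String SAMPLE_XML =
            "<success><data><classBean><relations>"
            + "<entry><string>Subclass</string><list>"
            + "<classBean><fullId>urn:example:Heart</fullId></classBean>"
            + "<classBean><fullId>urn:example:Brain</fullId></classBean>"
            + "</list></entry>"
            + "<entry><string>Superclass</string><list>"
            + "<classBean><fullId>urn:example:Thing</fullId></classBean>"
            + "</list></entry>"
            + "</relations></classBean></data></success>";

    /** Returns the text of every matching fullId element, in document order. */
    static List<String> extractFullIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(SUBCLASS_IDS, doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent().trim());
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate enum query", e);
        }
    }
}
```

Only entries whose "string" child equals "Subclass" contribute values, so the "Superclass" entry in the sample is ignored. The resulting list plays the role of the effective "enum" array.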
Now that we have defined an Annotation Type for Organ using the ontology, we can use this type in the definition of the entity. Here is the definition of our example Entity:
Example.json
Code Block |
---|
{
  "extends":"org/sagebionetworks/entity/type/Entity.json",
  "name":"Example",
  "properties":{
    "id":{
      "type":"number",
      "description":"Example identifier",
      "required":true
    },
    "name":{
      "description":"Name of the Example",
      "type":"string",
      "required":true
    },
    "organ":{
      "$ref":"org/sagebionetworks/annotation/types/VertebrateOrganType.json"
    }
  }
}
{code}
|
The first thing to point out about our Example.json is that it extends Entity.json, which makes it a Synapse Entity. This implies it inherits all of its values from the base Entity. The second thing to point out is that the "organ" property is defined using the annotation type we created earlier.
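One way to read "inherits all of its values from the base Entity" is that the effective properties of Example.json are the base schema's properties with the child's properties layered on top. The Java sketch below shows that merge on schemas represented as plain Maps. The class name (SchemaInheritance) is hypothetical, the contents of Entity.json are not shown in this document, and the choice to let the child win on a name clash is an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: resolves "extends" by merging "properties" maps.
final class SchemaInheritance {
    /** Returns parent properties overlaid with child properties. */
    static Map<String, Object> effectiveProperties(
            Map<String, Object> parentSchema, Map<String, Object> childSchema) {
        // Start from the parent so inherited properties keep their order.
        Map<String, Object> merged = new LinkedHashMap<>(propertiesOf(parentSchema));
        // Assumption: a child definition replaces an inherited one of the same name.
        merged.putAll(propertiesOf(childSchema));
        return merged;
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> propertiesOf(Map<String, Object> schema) {
        Object properties = schema.get("properties");
        if (properties instanceof Map) {
            return (Map<String, Object>) properties;
        }
        return Collections.emptyMap(); // schema declares no properties
    }
}
```

A "$ref" such as the one on "organ" would be resolved in a separate step by loading the referenced schema file; that step is left out of this sketch.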
Compile POJOs (first time)
Since we still want Java POJOs to represent all entities, we will use the schema-to-pojo-maven-plugin to build these POJOs. This is done by simply adding the following to the lib-auto-generated/pom.xml file:
Code Block |
---|
<!-- This plugin builds the POJOs from JSON schemas. -->
<plugins>
  <plugin>
    <groupId>org.sagebionetworks</groupId>
    <artifactId>schema-to-pojo-maven-plugin</artifactId>
    <version>${schema-to-pojo.version}</version>
    <executions>
      <execution>
        <goals>
          <goal>generate</goal>
        </goals>
        <configuration>
          <sourceDirectory>src/main/resources</sourceDirectory>
          <packageName>org.sagebionetworks</packageName>
          <outputDirectory>target/auto-generated-pojos</outputDirectory>
        </configuration>
      </execution>
    </executions>
  </plugin>
</plugins>
{code}
|
The plugin will automatically create a POJO class for each JSON schema found in the resource directory. These POJOs will be placed in the target/auto-generated-pojos directory.
Synapse Deploy (first time)
The first time Synapse is deployed after creating Entities, the org.sagebionetworks.repo.model.bootstrap.EntityBootstrapper will read all JSON schema files found in the lib-auto-generated.jar file and create a Synapse SchemaEntity (to be defined) for each, using the directory structure to create each path. All schema entities will be placed in the folder:
Code Block |
---|
root/schemas
{code}
|
The resulting SchemaEntity objects from the two examples above would have the following paths:
Code Block |
---|
root/schemas/org/sagebionetworks/entity/type/Example.json
root/schemas/org/sagebionetworks/annotation/types/VertebrateOrganType.json
{code}
|
Folder entities will be created as needed to create each path.
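The path handling described here can be sketched in Java as follows: given a schema's path inside the jar, compute the SchemaEntity path under root/schemas and the list of folders that would need to exist first. The class and method names are hypothetical; the real EntityBootstrapper is not shown in this document, and this sketch only computes strings rather than creating entities.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the path bookkeeping done at bootstrap time.
final class SchemaEntityPaths {
    static final String SCHEMA_ROOT = "root/schemas";

    /** Path of the SchemaEntity for a schema file inside the jar. */
    static String entityPath(String resourcePath) {
        return SCHEMA_ROOT + "/" + resourcePath;
    }

    /** Folder paths needed for the schema, parents first. */
    static Set<String> requiredFolders(String resourcePath) {
        Set<String> folders = new LinkedHashSet<>();
        String[] segments = resourcePath.split("/");
        StringBuilder current = new StringBuilder(SCHEMA_ROOT);
        // The last segment is the schema file itself, so stop before it.
        for (int i = 0; i < segments.length - 1; i++) {
            current.append('/').append(segments[i]);
            folders.add(current.toString());
        }
        return folders;
    }
}
```

Because the folders come back parents first, a caller could walk the set in order and create only those folders that do not exist yet.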
By giving each SchemaEntity a unique path, we can use this path to reference a schema before we have an entity to represent it. The API user will be able to get the SchemaEntity objects, but they will be READ-ONLY copies. This is important because the "truth" of each entity is the JSON text file from the lib-auto-generated project. Hopefully, this will make more sense as the rest of the life-cycle is outlined.
Edit a Schema
Imagine that we want to add a new primary field to our Example.json Entity. To do this we need to modify the original JSON file in the lib-auto-generated project:
Code Block |
---|
/lib-auto-generated/src/main/resources/org/sagebionetworks/entity/type/Example.json
{code}
|
We want to add a new required primary field called "status". Since "status" is required, we must provide a default value. This is a requirement because we already have instances of Example entities deployed to Synapse, and each of these must be given a default value. We will cover how these default values are applied shortly. Here is our new Example.json:
Example.json
Code Block |
---|
{
  "extends":"org/sagebionetworks/entity/type/Entity.json",
  "name":"Example",
  "properties":{
    "id":{
      "type":"number",
      "description":"Example identifier",
      "required":true
    },
    "name":{
      "description":"Name of the Example",
      "type":"string",
      "required":true
    },
    "status":{
      "type":"string",
      "required":true,
      "enum":[ "PROTOTYPE", "RELEASED", "RECALLED", "DEPRECIATED"],
      "default":"PROTOTYPE"
    }
  },
  "additionalProperties":{
    "tissue":{
      "type":"object",
      "$ref":"org/sagebionetworks/annotation/types/BioOntologyTissueType.json"
    }
  }
}
{code}
|
Compile POJOs (Nth Time)
This time when we compile the new Example.java POJO, the resulting POJO will have a new field called "status" with a default value of "PROTOTYPE".
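For orientation, the following is a hand-written Java approximation of what a POJO with the new "status" field and its default could look like. It is not the output of the schema-to-pojo-maven-plugin, which is not shown in this document; the field types, the nested enum, and the accessor names are all assumptions.

```java
// Hand-written approximation; the actual generated code may differ.
final class Example {
    // Mirrors the "enum" values of "status" in Example.json.
    enum Status { PROTOTYPE, RELEASED, RECALLED, DEPRECIATED }

    private Double id;      // "type":"number" (assumed to map to Double)
    private String name;    // "type":"string"
    // "default":"PROTOTYPE" becomes the initial value of the field.
    private Status status = Status.PROTOTYPE;

    Double getId() { return id; }
    void setId(Double id) { this.id = id; }

    String getName() { return name; }
    void setName(String name) { this.name = name; }

    Status getStatus() { return status; }
    void setStatus(Status status) { this.status = status; }
}
```

A freshly constructed Example therefore reports PROTOTYPE as its status until a caller sets another value.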
Backup Deployed Synapse
Before we can deploy our updated schema we must create a backup of the deployed Synapse. See: ...
This is an important step. We will use this backup to deploy our changes to the repository.
Synapse Deploy (Nth Time)
Just like before, the bootstrap system will pre-populate all SchemaEntities on the new empty repository. At this point we have an empty Synapse that is up-to-date with regard to the current schema.
Restore Synapse from Backup
After we have a clean repository, we can restore the backup from the earlier step. See: Repository+Administration
The restore daemon will start off by deleting all of the data in Synapse. It will then restore all entities, including the SchemaEntities. One of the main tasks of the restore daemon is to migrate data to the current version during the restoration process. This means we need to detect that a new property was added to the Example.json schema, and ensure that migrated Example entities have this new field with the default value.
Once all data has been migrated to the current schema, the old SchemaEntity objects can be replaced using the new JSON schemas from the lib-auto-generated.jar.
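The migration step described above, filling in a newly added property with its schema default, can be sketched in Java on entities and schema properties represented as Maps. The class and method names (DefaultMigration, applyDefaults) are hypothetical, and the restore daemon's actual interfaces are not described in this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of applying schema defaults while restoring an entity.
final class DefaultMigration {
    /**
     * Returns a copy of the entity in which every schema property that is
     * absent from the entity and declares a "default" is set to that default.
     */
    static Map<String, Object> applyDefaults(
            Map<String, Object> entity,
            Map<String, Map<String, Object>> schemaProperties) {
        Map<String, Object> migrated = new LinkedHashMap<>(entity);
        for (Map.Entry<String, Map<String, Object>> property
                : schemaProperties.entrySet()) {
            String name = property.getKey();
            Map<String, Object> definition = property.getValue();
            // Existing values are never overwritten; only gaps are filled.
            if (!migrated.containsKey(name) && definition.containsKey("default")) {
                migrated.put(name, definition.get("default"));
            }
        }
        return migrated;
    }
}
```

Applied to a backed-up Example entity that predates the "status" field, this would add "status" with the value "PROTOTYPE" and leave "id" and "name" untouched.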