Background & Motivation

See Meredith Slota's 1-pager and the epic issue: PLFM-4063

In summary, we should aim to do these now:

...

  1. Transition to a new DOI provider, DataCite (necessary because the old provider is discontinuing service)
  2. Overhaul how we handle asynchronous DOI management requests so that they match how we handle other asynchronous requests
  3. Overhaul how we handle metadata submission to comply with the new provider's standards
    1. At the moment, we submit a lot of "dummy" information with no way to change it, which has caused practical issues for users who rely on DOIs

And prepare to do the following in the future:

  1. Be able to easily extend the metadata interface
    1. Submission of metadata can be extended to include optional fields
    2. DataCite occasionally updates their metadata schema. If we can easily adjust to newer schemas as they are released, then we can more easily avert the risk of using a deprecated schema.

Current API + Notes for change

...

  1. Can enhance usability by allowing clients to set and update descriptive metadata stored by the DOI provider.
  2. Can redesign how we handle DOI creation to better fit with existing software design paradigms in Synapse (including changing our API, handling asynchronous workers differently, etc.)

Current API 

We should strongly consider deprecating these calls in favor of the new API; see the section 'API Additions' below.

| URL | HTTP Type | Description | Response Object | Notes |
|---|---|---|---|---|
| /entity/{id}/doi | PUT | Creates a DOI for the specified entity. The DOI will be associated with the most recent version where applicable. | Doi | Deprecate in favor of the corresponding POST |
| /entity/{id}/version/{versionNumber}/doi | PUT | Creates a new DOI for the specified entity version. | Doi | Deprecate, ditto |
| /entity/{id}/doi | GET | Gets the DOI status for the specified entity. | Doi | Maintain, as this will not require a call to another service and should be relatively quick |
| /entity/{id}/version/{versionNumber}/doi | GET | Gets the DOI status for the specified entity version. | Doi | Maintain, ditto |

Proposed API Changes

To be revised

Object changes:
  • Creation of DoiMetadata object
    • This object abstracts DataCite's more complex and granular metadata API by including only fields that are required and cannot be automatically populated.
    • This object is easily extensible to further support DataCite's metadata schema to include optional fields or the introduction of new required fields
  • Doi and DoiMetadata are proposed to be uncoupled because
    • We store the data in Doi objects; it allows us to quickly identify if a DOI has been registered and report that to the client
    • We do NOT store the data in the DoiMetadata objects; this is stored by the DOI provider and retrieved when necessary
      • Caching this data seems unintuitive; retrieving this data should only be expected when a user considers updating it (see notes in table below), which requires the external service to be available anyway.

  • new Doi (v2) data transfer object that implements new interface DataciteMetadata and new class DoiAssociation
    • DataciteMetadata
      • Separates DOI metadata (that is stored by DataCite) from Synapse-associated metadata
      • The client never uses this object, but it clearly delineates metadata that DataCite stores and metadata Synapse stores (in DoiAssociation)
    • DoiAssociation
      • Stores data that describes a Synapse object's association with a DOI object (like the object ID, type, version, etag, etc.)
      • Client can request this object to quickly get a DOI without relying on DataCite service availability to collect additional metadata.
      • We can return the DOI URI and URL to the client (currently, the URI is "guessed" by the client)
    • By using Doi (v2):
      • The client can supply and update metadata
      • We have the option to extend support to additional metadata fields
      • We decouple our API URI from /entity/{id}
      • We have the option to extend support to mint DOIs for non-entity objects in Synapse
  • Creation of DoiRequest object to retrieve a DOI and all of its metadata
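
The relationship between the objects described above can be sketched as follows. This is only an illustration, written in Python for brevity: the field names (`object_id`, `doi_uri`, `creators`, and so on) are assumptions chosen to mirror the bullets above, not the agreed-upon schema, and the `DataciteMetadata` interface is folded into `Doi` as plain fields.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DoiAssociation:
    """Data that Synapse stores: how a Synapse object is tied to a DOI."""
    object_id: str
    object_type: str                      # e.g. "ENTITY" (assumed value)
    object_version: Optional[int] = None
    etag: Optional[str] = None
    doi_uri: Optional[str] = None         # returned to the client, no longer "guessed"
    doi_url: Optional[str] = None


@dataclass
class Doi(DoiAssociation):
    """A DoiAssociation plus the descriptive metadata that DataCite stores."""
    creators: List[str] = field(default_factory=list)
    titles: List[str] = field(default_factory=list)
    publication_year: Optional[int] = None
    resource_type_general: Optional[str] = None


def to_association(doi: Doi) -> DoiAssociation:
    """Keep only the Synapse-stored fields, dropping DataCite-held metadata."""
    stored = {f.name: getattr(doi, f.name) for f in fields(DoiAssociation)}
    return DoiAssociation(**stored)
```

Because `Doi` only adds fields on top of `DoiAssociation`, a client that needs just the DOI itself can work with the smaller object, which does not depend on DataCite being available.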

The new fields in the Doi object are a direct mapping of the most recent version (v4.1) of DataCite's metadata schema, simplified to contain only the required fields and a small number of optional fields that we curate. This means we do not need to deprecate our API if we later wish to support more of their optional metadata fields; we can simply extend our object.

Similarly, this is likely to simplify future transitions if DataCite deprecates the schema that we are configured to use.


DataCite-imposed constraints for implementing clients to consider:

  • There cannot be more than 8000-10000 creators
  • Publication year must be in 'YYYY' format (regex: /[\d]{4}/)
  • There must be at least one creator
    • Each creator must have a creatorName that is at least 1 character long
    • nameIdentifier is not required, but if an identifier is provided, the scheme must also be provided
  • There must be at least one title
    • The title should be at least one character long

...

  • There must be a resourceTypeGeneral
    • One of the following
      • Audiovisual
      • Collection
      • Data Paper
      • Dataset
      • Event
      • Image
      • Interactive Resource
      • Model
      • Physical Object
      • Service
      • Software
      • Sound
      • Text
      • Workflow
      • Other
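
A client could check these constraints before submitting, so that a request is not rejected after a round trip. The sketch below covers only the constraints listed above (the creator-count limit is omitted); the dictionary keys such as `creators`, `titles`, and `nameIdentifierScheme` are assumptions about how a client might represent the metadata, not a defined Synapse object.

```python
import re
from typing import Any, Dict, List

RESOURCE_TYPES = {
    "Audiovisual", "Collection", "Data Paper", "Dataset", "Event", "Image",
    "Interactive Resource", "Model", "Physical Object", "Service", "Software",
    "Sound", "Text", "Workflow", "Other",
}


def validate_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []

    creators = metadata.get("creators") or []
    if not creators:
        errors.append("at least one creator is required")
    for i, creator in enumerate(creators):
        if not creator.get("creatorName"):
            errors.append(f"creator {i}: creatorName must be at least 1 character")
        # An identifier is optional, but it must come with its scheme.
        if creator.get("nameIdentifier") and not creator.get("nameIdentifierScheme"):
            errors.append(f"creator {i}: nameIdentifier requires a scheme")

    titles = metadata.get("titles") or []
    if not titles:
        errors.append("at least one title is required")
    for i, title in enumerate(titles):
        if not title:
            errors.append(f"title {i}: must be at least 1 character")

    # Exactly four ASCII digits, i.e. the 'YYYY' format.
    if not re.fullmatch(r"[0-9]{4}", str(metadata.get("publicationYear", ""))):
        errors.append("publicationYear must be in 'YYYY' format")

    if metadata.get("resourceTypeGeneral") not in RESOURCE_TYPES:
        errors.append("resourceTypeGeneral must be one of the allowed values")

    return errors
```

Returning every problem at once, rather than raising on the first one, lets a form highlight all invalid fields in a single pass.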

API Additions

| URL | HTTP Verb | Description | Request Object | Response Object | Notes |
|---|---|---|---|---|---|
| /entity/{id}/doi/async/start | POST | Asynchronously create or update a DOI. If the DOI does not exist, start a DOI creation job that will attempt to register a DOI with the DOI provider with the supplied metadata. If the DOI does exist, then it will simply update the DOI with the supplied metadata. Note: The caller must have the ACCESS_TYPE.UPDATE permission on the Entity to make this call. | DoiMetadata / Doi (application/json) | AsyncJobId | Shift the work to an asynchronous worker queue (as we have been doing with other asynchronous services). We combine the create and update calls because they require the same information and are both idempotent. The workflow for the business logic required to register and update a DOI is similar. The Doi object must contain all fields required to mint or update a DOI with DataCite. If no DoiMetadata object is provided, we may choose to submit "N/A" fields (pending discussion on whether this is appropriate). |
| /entity/{id}/version/{versionNumber}/doi/async/start | POST | Ditto; for a specific entity version | DoiMetadata | AsyncJobId | Ditto |
| /entity/{id}/doi/async/get/{asyncToken} | GET | Asynchronously get the results of a DOI transaction started with POST /entity/{id}/doi/async/start. Note: When the result is not ready yet, this method will return a status code of 202 (ACCEPTED) and the response body will be an AsynchronousJobStatus object. | None | AsynchronousJobStatus / Doi | Get the status of the asynchronous job. If complete, returns the Doi object created by the job. After the job completes, this should be identical in function to the existing GET calls. |
| /entity/{id}/version/{versionNumber}/doi/async/get/{asyncToken} | GET | Ditto; for a specific entity version | None | AsynchronousJobStatus / Doi | Ditto |
| /entity/{id}/doi/metadata | GET | Get the metadata associated with a DOI Object, if it exists. Note: The caller must have the ACCESS_TYPE.UPDATE permission on the Entity to make this call. | None | DoiMetadata | Can be used to populate the metadata fields of an object that has a DOI, since that data is stored on DataCite. We restrict this to users that can update the data because it should only be used for update purposes; if an unprivileged user wants to retrieve the metadata for an object, they should use the DOI provider's public API. |
| /doi | GET | Get a Doi object (including associated metadata) for the object referred to in DoiRequest, if the Doi object exists. | None (supply required path parameters id, objectType, and optional objectVersion) | Doi | This call relies on availability of DataCite's API to return the metadata, as we do not store it. |
| /doi/association | GET | Get a DoiAssociation object that contains the DOI of an object referred to in DOI. | None (supply required path parameters id, objectType, and optional objectVersion) | DoiAssociation | By making this call, a client can get the DOI of an object without certain metadata that relies on availability of DataCite's API. |
| /doi/locate | GET | Redirect to a specific object in the web portal, or a "Not found" page if the object cannot be located. | None (supply required path parameters id, objectType, and optional objectVersion) | Redirect | This redirects a URL that we supply to DataCite to the current URL of an associated object. This allows us to register a more "permanent" URL with DataCite in case the URL to an object changes. For example, http://synapse.org/#!Synapse:syn123 may change and become http://sage.gov/syn123. This service can be updated to support that and, as a result, the DOI does not break. |

Note: the Doi object has a DoiStatus field; we need to evaluate how that should be handled with asynchronous workers (we would probably just deprecate that field in favor of using AsynchronousJobStatus).
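
From a client's point of view, the asynchronous start and get calls above amount to a polling loop. The following is a minimal sketch of that loop, not a real client: `FakeDoiService` is a hypothetical in-memory stand-in for the backend, the `fetch` callable abstracts the HTTP GET, and the use of status 200 for a finished job is an assumption (the table only specifies 202 while the result is not ready).

```python
import time
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]  # (HTTP status code, parsed JSON body)


def wait_for_doi(fetch: Callable[[str], Response], async_token: str,
                 max_attempts: int = 10, delay_seconds: float = 1.0) -> Dict[str, Any]:
    """Poll the async 'get' call until the job finishes, then return the Doi."""
    for _ in range(max_attempts):
        status, body = fetch(async_token)
        if status == 202:          # body is an AsynchronousJobStatus; not ready yet
            time.sleep(delay_seconds)
            continue
        if status == 200:          # assumed success code; body is the Doi
            return body
        raise RuntimeError(f"DOI job failed with HTTP status {status}")
    raise TimeoutError(f"DOI job {async_token} did not finish in time")


class FakeDoiService:
    """Hypothetical stand-in for the backend, used to make the sketch runnable."""

    def __init__(self, polls_until_done: int) -> None:
        self.remaining = polls_until_done
        self.get_calls = 0
        self.submitted: Dict[str, Any] = {}

    def start(self, doi: Dict[str, Any]) -> str:
        self.submitted = dict(doi)
        return "job-1"             # stands in for the AsyncJobId

    def get(self, async_token: str) -> Response:
        self.get_calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            return 202, {"jobState": "PROCESSING"}
        return 200, self.submitted
```

A caller would start the job once, keep the returned token, and pass the service's `get` method to `wait_for_doi`. Bounding the number of attempts keeps a stuck job from blocking the client indefinitely.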

Internal design notes

API Choice

DataCite has two APIs that we can use. They have a standard "MDS" API that they recommend for users, and they have a new (but also seemingly temporary) "EZ" API that is designed for orgs like us who are transitioning from EZID.

For the sake of not having to do more work later, we are opting to not use DataCite's temporary EZ API that is designed to mock the EZID API.

Instead, we will use their standard MDS API, as we would need to transition to it eventually anyway.

HTTP Client

The current EZID client interfaces with EZID using now-deprecated implementations of Apache's HTTP client. We will replace this client with a new client that will use our SimpleHttpClient to make requests to the DataCite MDS API.

CRUD Workflows

With the MDS API the basic workflow to create a DOI is to

...

Simply updating the metadata requires just step 1. Both of the above calls are idempotent, so we can combine the create and update calls and simply treat the implementation as a create. This would simplify the implementation, though an unnecessary outgoing PUT call would be made when existing DOIs are updated.
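
The trade-off can be illustrated with a small in-memory sketch. `FakeProvider` and its `put_metadata` and `put_registration` methods are hypothetical stand-ins, not the DataCite MDS API; the sketch assumes only what the paragraph above implies, namely that step 1 submits metadata and that a later step is an idempotent PUT whose payload is not modeled here.

```python
from typing import Dict, List, Tuple


class FakeProvider:
    """Hypothetical in-memory stand-in for the DOI provider."""

    def __init__(self) -> None:
        self.metadata: Dict[str, Dict[str, str]] = {}
        self.registered: Dict[str, bool] = {}
        self.calls: List[Tuple[str, str]] = []

    def put_metadata(self, doi: str, metadata: Dict[str, str]) -> None:
        # Step 1: store (or overwrite) the metadata for this DOI.
        self.calls.append(("metadata", doi))
        self.metadata[doi] = dict(metadata)

    def put_registration(self, doi: str) -> None:
        # Later step: idempotent, so repeating it leaves the state unchanged.
        self.calls.append(("registration", doi))
        self.registered[doi] = True


def create_or_update(provider: FakeProvider, doi: str,
                     metadata: Dict[str, str]) -> None:
    """Treat every request as a create; safe because both steps are idempotent."""
    provider.put_metadata(doi, metadata)
    provider.put_registration(doi)  # redundant when the DOI already exists
```

Calling `create_or_update` twice for the same DOI leaves the provider in the same state as a dedicated update would, at the cost of one extra outgoing call on the second invocation.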

Retrieval and

...

Translation of Metadata

DataCite requires that new DOIs have associated metadata that adheres to a schema that they revise occasionally. The current version of their schema is v4.1 (Oct 2017). Metadata that we have registered through EZID is adherent to v2.2 (June 2011), v3, and v4. It is unclear if/when DataCite will deprecate v2.2 and no longer accept it. In another attempt to future-proof our DOI minting service, we will only submit metadata adherent to v4.1.

As a result, we must be able to retrieve metadata adherent to schemas 2.2+ in order for the client to update it. We can create a translator tool to convert data from these schemas to an intermediate object (see the new fields in Doi above) that can hold the appropriate metadata. This object can then be translated into the most recent version of the schema. The client can retrieve and submit this object by interfacing with our API.
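
A rough sketch of the retrieval half of such a translator is shown below. It reads a handful of elements by local name so that the schema version in the XML namespace does not matter. The sample documents are abbreviated illustrations rather than schema-validated records, and the shape of the intermediate dictionary is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def to_intermediate(xml_text: str) -> Dict[str, Any]:
    """Collect the fields we care about, whatever the schema version."""
    result: Dict[str, Any] = {
        "creators": [], "titles": [], "publisher": None,
        "publicationYear": None, "resourceTypeGeneral": None,
    }
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        text = (element.text or "").strip()
        if name == "creatorName" and text:
            result["creators"].append(text)
        elif name == "title" and text:
            result["titles"].append(text)
        elif name in ("publisher", "publicationYear") and text:
            result[name] = text
        elif name == "resourceType":
            result["resourceTypeGeneral"] = element.get("resourceTypeGeneral")
    return result


# Abbreviated, illustrative records; 10.5072 is used here as a test prefix.
V2_2_SAMPLE = (
    '<resource xmlns="http://datacite.org/schema/kernel-2.2">'
    '<identifier identifierType="DOI">10.5072/example</identifier>'
    "<creators><creator><creatorName>(Author not available)</creatorName></creator></creators>"
    "<titles><title>syn123</title></titles>"
    "<publisher>Synapse</publisher><publicationYear>2016</publicationYear>"
    "</resource>"
)
V4_SAMPLE = (
    '<resource xmlns="http://datacite.org/schema/kernel-4">'
    '<identifier identifierType="DOI">10.5072/example-v4</identifier>'
    "<creators><creator><creatorName>Doe, Jane</creatorName></creator></creators>"
    "<titles><title>Example dataset</title></titles>"
    "<publisher>Synapse</publisher><publicationYear>2018</publicationYear>"
    '<resourceType resourceTypeGeneral="Dataset">Dataset</resourceType>'
    "</resource>"
)
```

Fields that an older record lacks simply stay empty in the intermediate object, which is where the client would be asked to supply the values now required by v4.1.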

...

Work to do

Outside of our control

DataCite has now approved us to transition to their service, so we should coordinate to transition as soon as possible.

Platform

  • Create, test, and implement a DataCite Java client that simplifies creating/updating DOIs and their metadata.
    • Should begin as soon as we agree upon the API
    • This can be done without coordination if we preserve existing behavior, but it would be much easier if we create this client intending to only support proposed and agreed-upon behavior.
    • Implementation can use test credentials until we are ready to switch to DataCite in prod
  • Create and route new API changes

Clients

  • Support asynchronous API + metadata submission
    • Can begin as soon as we agree upon the API
    • Can implement as soon as it is tested and implemented on backend

UX

  • User-facing design of DOI minting process and metadata submission

Questions that need Input

  • Should we permit creating DOIs for any object? Or just entities?
    • Shifting to support DOIs for non-entity objects is non-trivial, but it would be easier to support them sooner rather than later
  • Schema enforcement
    • Should we force users to provide required metadata to mint a DOI?
      • Permit and submit no metadata (this is currently the only way to mint a DOI in Synapse)
        • In DataCite, it is only possible to mint a DOI without metadata with the temporary EZID bridge API, i.e. it will likely not be possible to mint without metadata in the near future
      • Should we allow users to not supply required metadata? We could fill required fields with "mock" data. (For example, permit submitting a blank author field, and then the backend can submit "(Author not available)" to DataCite as we currently do)
      • One required metadata field is ResourceTypeGeneral, which has specified categories for the type of resource a DOI refers to. Should we omit categories of resources that are unlikely to be in Synapse, like "Audiovisual" or "Physical Object"? There is no technical benefit to excluding these categories.
    • Future feature expansion: which recommended/optional metadata fields should we permit or require?
      • Synapse could theoretically support all metadata fields, but for scope/UX reasons, maybe we shouldn't. Input from UX, users, anyone would be helpful.
  • Which fields should be immutable? 
    • DOI ID (this can actually be retrieved from the API call rather than the request body, so the client doesn't need to worry about this)
    • Publisher: "Synapse"
    • Publication Year?
    • Do we hide these from the client, or just automatically overwrite them if they try to change them?

Misc. Notes

...

  • What to show users when reaching
    • embargo/restricted pages?
    • tombstone pages (deleted content)?

Mockups

Mockup of what the DOI minting form could look like. This form could be presented when a user clicks "Mint DOI for this Project/File" (as they currently do)

[Image: mockup of the DOI minting form]

More mockups may be added later

Points of Discussion

  • General feedback
  • Proposed alterations
  • Transition steps
  • Displaying metadata/contact information on deleted/restricted pages
    • Tombstone pages
    • Embargo pages
    • Perhaps we could show this on a 'Not Found' page when referred by the /doi/locate API?

Optional fields we can consider supporting:

Creator sub-fields
  • Name type (organizational or personal)
  • Given Name; Family Name
  • Name Identifiers like ORCID and ISNI
  • Affiliation
  • Title sub-field: title type (one of: alternative title, subtitle, translated title)
Contributor - The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource.
  • Contributor type - mandatory. One of:
    • ContactPerson
    • DataCollector
    • DataCurator
    • DataManager
    • Distributor
    • Editor
    • HostingInstitution
    • Producer
    • ProjectLeader
    • ProjectManager
    • ProjectMember
    • RegistrationAgency
    • RegistrationAuthority
    • RelatedPerson
    • Researcher
    • ResearchGroup
    • RightsHolder
    • Sponsor
    • Supervisor
    • WorkPackageLeader
    • Other
  • Contributor name (mandatory)
  • Name type (organizational/personal)
  • Given name; family name
  • Name identifiers
  • Affiliation


  • Subject - Subject, keyword, classification code, or key phrase describing the resource.
Date
  • Mandatory dateType. One of:
    • Accepted
    • Available
    • Copyrighted
    • Collected
    • Created
    • Issued
    • Submitted
    • Updated
    • Valid
    • Other
  • Date information
  • Language
  • AlternateIdentifier. This may be used for local identifiers
    • alternateIdentifierType - mandatory
RelatedIdentifier


RelatedIdentifierType - mandatory. One of:
  • ARK
  • arXiv
  • bibcode
  • DOI
  • EAN13
  • EISSN
  • Handle
  • IGSN
  • ISBN
  • ISSN
  • ISTC
  • LISSN
  • LSID
  • PMID
  • PURL
  • UPC
  • URL
  • URN


RelationType - mandatory. One of:
  • IsCitedBy
  • Cites
  • IsSupplementTo
  • IsSupplementedBy
  • IsContinuedBy
  • Continues
  • IsDescribedBy
  • Describes
  • HasMetadata
  • IsMetadataFor
  • HasVersion
  • IsVersionOf
  • IsNewVersionOf
  • IsPreviousVersionOf
  • IsPartOf
  • HasPart
  • IsReferencedBy
  • References
  • IsDocumentedBy
  • Documents
  • IsCompiledBy
  • Compiles
  • IsVariantFormOf
  • IsOriginalFormOf
  • IsIdenticalTo
  • IsReviewedBy
  • Reviews
  • IsDerivedFrom
  • IsSourceOf
  • IsRequiredBy
  • Requires


  • Size
  • Format
  • Version
  • Rights (include embargo information if applicable)
  • Description
    • Description Type (mandatory). One of:
      • Abstract
      • Methods
      • SeriesInformation
      • TableOfContents
      • TechnicalInfo
      • Other
  • GeoLocation (and many subfields)
  • Funding Reference
    • Funder Name
    • Funder Identifier
    • Funder Identifier Type
    • Award Number
    • Award Title