Background & Motivation
See Meredith Slota's 1-pager and the epic issue:
- PLFM-4063
In summary, we
- Must transition to a new DOI provider
- Can enhance usability by allowing clients to set and update descriptive metadata stored by the DOI provider.
- Can redesign how we handle DOI creation to better fit with existing software design paradigms in Synapse (including changing our API, handling asynchronous workers differently, etc.)
Current API
We should strongly consider deprecating these calls in favor of the new API described below.
URL | HTTP Type | Description | Response Object | Notes |
---|---|---|---|---|
/entity/{id}/doi | PUT | Creates a DOI for the specified entity. The DOI will be associated with the most recent version where applicable. | Doi | Deprecate in favor of corresponding POST |
/entity/{id}/version/{versionNumber}/doi | PUT | Creates a new DOI for the specified entity version. | Doi | Deprecate, ditto |
/entity/{id}/doi | GET | Gets the DOI status for the specified entity. | Doi | Maintain, as this will not require a call to another service and should be relatively quick |
/entity/{id}/version/{versionNumber}/doi | GET | Gets the DOI status for the specified entity version. | Doi | Maintain, ditto |
Proposed Changes
Object changes:
- Extension of Doi object (and creation of child objects)
- Extensions serve two purposes
- Allow the client to supply and update metadata
- Decouple our API URI from /entity/{id}
- Additionally we should deprecate the doiStatus field (in favor of GET /doi/async/get/{asyncToken})
- Creation of DoiRequest object to retrieve a DOI and all of its metadata
- Creation of DoiId object to quickly retrieve a DOI without reliance on collecting metadata from DataCite's service
The new fields in the Doi object are a direct mapping of the most recent version (v4.1) of DataCite's metadata schema, simplified to contain only the required fields and a small number of optional fields that we curate. This ensures that we don't need to deprecate our API if we wish to support more of their optional metadata fields. If we do wish to support new optional metadata fields, we can easily extend our object.
Similarly, this is likely to simplify future transitions if DataCite deprecates the schema that we are configured to use.
Existing fields in italics
DataCite-imposed constraints for implementing clients to consider:
- There cannot be more than 8000-10000 creators
- Publication year must be in 'YYYY' format (regex: /[\d]{4}/)
- There must be at least one creator
- Each creator must have a creatorName that is at least 1 character long
- nameIdentifier is not required, but if an identifier is provided, the scheme must also be provided
- There must be at least one title
- The title should be at least one character long
- There must be a resourceTypeGeneral
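A client could check these constraints locally before submitting a Doi. The following is a minimal sketch, not the actual validation logic. It assumes the Doi is a plain dictionary with hypothetical keys `creators`, `titles`, `publicationYear`, and `resourceTypeGeneral`, that each creator is a dictionary with `creatorName`, `nameIdentifier`, and `nameIdentifierScheme`, and that titles are plain strings. The creator limit uses the conservative end of the 8000-10000 range, which is also an assumption.

```python
import re

# Assumption: use the conservative end of the 8000-10000 range quoted above.
MAX_CREATORS = 8000
YEAR_PATTERN = re.compile(r"\d{4}")


def validate_doi_metadata(doi):
    """Return a list of constraint violations; an empty list means the metadata passes."""
    errors = []

    creators = doi.get("creators") or []
    if not creators:
        errors.append("there must be at least one creator")
    elif len(creators) > MAX_CREATORS:
        errors.append(f"there cannot be more than {MAX_CREATORS} creators")
    for index, creator in enumerate(creators):
        if not creator.get("creatorName"):
            errors.append(f"creator {index}: creatorName must be at least 1 character")
        # An identifier is optional, but it may not appear without its scheme.
        if creator.get("nameIdentifier") and not creator.get("nameIdentifierScheme"):
            errors.append(f"creator {index}: nameIdentifier requires a scheme")

    titles = doi.get("titles") or []
    if not titles:
        errors.append("there must be at least one title")
    for index, title in enumerate(titles):
        if not title:
            errors.append(f"title {index}: must be at least 1 character")

    # fullmatch rejects values such as "18" or "20180".
    if not YEAR_PATTERN.fullmatch(str(doi.get("publicationYear", ""))):
        errors.append("publicationYear must be in 'YYYY' format")

    if not doi.get("resourceTypeGeneral"):
        errors.append("there must be a resourceTypeGeneral")

    return errors
```

Returning every violation at once, rather than failing on the first one, lets a form display all problems to the user in a single pass.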
API Additions
URL | HTTP Verb | Description | Request Object | Response Object | Notes |
---|---|---|---|---|---|
/doi/async/start | POST | Asynchronously create or update a DOI. If the DOI does not exist, start a DOI creation job that will attempt to register a DOI with the DOI provider with the supplied metadata. If the DOI does exist, then it will simply update the DOI with the supplied metadata. Note: The caller must have the ACCESS_TYPE.UPDATE permission on the Entity to make this call. | Doi (application/json) | | Shift the work to an asynchronous worker queue (as we have been doing with other asynchronous services). We combine the create and update calls because they require the same information and are both idempotent. The Doi object must contain all fields required to mint or update a DOI with DataCite. |
/doi/async/get/{asyncToken} | GET | Asynchronously get the results of a DOI transaction started with POST /doi/async/start. Note: When the result is not ready yet, this method will return a status code of 202 (ACCEPTED) and the response body will be an AsynchronousJobStatus object. | None | Doi (AsynchronousJobStatus while the job is still running) | Get the status of the asynchronous job. If complete, returns the Doi object created by the job. |
/doi | GET | Get a Doi object (including associated metadata) for the object referred to in DoiRequest, if the Doi object exists. | DoiRequest | Doi | This call relies on availability of DataCite's API to return the metadata, as we do not store it. |
/doi/id | GET | Get a DoiId object that contains the DOI of the object referred to in DoiRequest. | DoiRequest | DoiId (reduction of Doi) | By making this call, a client can get the DOI of an object without relying on DataCite's API. |
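From a client's point of view, the two asynchronous calls above combine into a start-then-poll loop. The following is a rough sketch under stated assumptions: `post` and `get` are hypothetical transport callables that return a `(status_code, json_body)` tuple, and the start call is assumed to return a body with a `token` key. Neither is specified by this design.

```python
import time

HTTP_ACCEPTED = 202


def mint_or_update_doi(post, get, doi, poll_interval=1.0, max_polls=30, sleep=time.sleep):
    """Start a DOI create/update job and poll until it finishes.

    `post(path, body)` and `get(path)` are hypothetical transport callables
    that return a (status_code, json_body) tuple.
    """
    _, started = post("/doi/async/start", doi)
    token = started["token"]  # assumption: the start call returns a job token

    for _ in range(max_polls):
        status, body = get(f"/doi/async/get/{token}")
        if status == HTTP_ACCEPTED:
            # Body is an AsynchronousJobStatus; the job is still running.
            sleep(poll_interval)
            continue
        if status >= 400:
            raise RuntimeError(f"DOI job {token} failed with status {status}: {body}")
        return body  # the Doi object created or updated by the job

    raise TimeoutError(f"DOI job {token} did not finish after {max_polls} polls")
```

Injecting the transport and the sleep function keeps the loop independent of any particular HTTP library and makes it straightforward to exercise with fakes.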
Internal design notes
API Choice
DataCite has two APIs that we can use. They have a standard "MDS" API that they recommend for users, and they have a new (but also seemingly temporary) "EZ" API that is designed for orgs like us who are transitioning from EZID.
To avoid more work later, we are opting not to use DataCite's temporary EZ API, which is designed to mock the EZID API. Instead, we will use their standard MDS API, as we would need to transition to it eventually anyway.
HTTP Client
The current EZID client interfaces with EZID using now-deprecated implementations of Apache's HTTP client. We will replace this client with a new client that will use our SimpleHttpClient to make requests to the DataCite MDS API.
CRUD Workflows
With the MDS API, the basic workflow to create a DOI is to
1. (POST) Register metadata (including the DOI symbol e.g. 10.####/syn01234)
2. (PUT) Register the DOI symbol and tie it to a URL (synapse.org/#!Synapse:syn01234)
Simply updating the metadata requires just step 1. Both of the above calls are idempotent, so we can combine create and update calls and simply treat the implementation as a create. This would simplify the implementation, though an unnecessary outgoing PUT call would be made when existing DOIs are updated.
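The combined create/update path can be sketched as building the same two outgoing requests every time. This is an illustration only: the `/metadata` and `/doi/{doi}` paths, the content types, and the `doi=...`/`url=...` body format reflect our understanding of the MDS API and should be confirmed against DataCite's documentation. `MdsRequest` and `build_create_or_update_requests` are hypothetical names, and no request is actually sent.

```python
from dataclasses import dataclass

# Assumption: production MDS host; a test host would be swapped in via base_url.
MDS_BASE_URL = "https://mds.datacite.org"


@dataclass(frozen=True)
class MdsRequest:
    """A description of one outgoing call; sending it is left to the HTTP client."""
    method: str
    url: str
    content_type: str
    body: str


def build_create_or_update_requests(doi, target_url, metadata_xml, base_url=MDS_BASE_URL):
    """Return the two requests, in order, that create or update a DOI."""
    # Step 1: register (or overwrite) the metadata, which embeds the DOI symbol.
    register_metadata = MdsRequest(
        method="POST",
        url=f"{base_url}/metadata",
        content_type="application/xml;charset=UTF-8",
        body=metadata_xml,
    )
    # Step 2: register the DOI symbol and tie it to its landing URL.
    # For an existing DOI this call is redundant but harmless, since it is idempotent.
    register_url = MdsRequest(
        method="PUT",
        url=f"{base_url}/doi/{doi}",
        content_type="text/plain;charset=UTF-8",
        body=f"doi={doi}\nurl={target_url}",
    )
    return [register_metadata, register_url]
```

Order matters here: the metadata is registered first, so the second call never points a DOI at a URL before its metadata exists.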
Retrieval and Translation of Metadata
DataCite requires that new DOIs have associated metadata that adheres to a schema that they revise occasionally. The current version of their schema is v4.1 (Oct 2017). Metadata that we have registered through EZID is adherent to v2.2 (June 2011), v3, and v4. It is unclear if/when DataCite will deprecate v2.2 and no longer accept it. In another attempt to future-proof our DOI minting service, we will only submit metadata adherent to v4.1.
As a result of this, we must be able to retrieve metadata adherent to schemas 2.2+ in order for the client to update it. We can create a translator tool to convert data from these schemas to an intermediate object (see new fields in Doi above) that can hold the appropriate metadata. This object can be translated into the most recent version of the schema. The client can retrieve and submit this object by interfacing with our API.
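One way to approach the translator is to ignore the XML namespace, which differs between schema versions, and read elements by their local names. The sketch below assumes the element names `creatorName`, `title`, `publicationYear`, and `resourceType` (with a `resourceTypeGeneral` attribute) are shared across the versions we have registered. The dictionary keys of the intermediate object are hypothetical stand-ins for the new Doi fields.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_datacite_xml(xml_text):
    """Translate DataCite metadata XML of any kernel version into an intermediate dict."""
    doi = {
        "creators": [],
        "titles": [],
        "publicationYear": None,
        "resourceTypeGeneral": None,
    }
    for element in ET.fromstring(xml_text).iter():
        name = _local_name(element.tag)
        text = (element.text or "").strip()
        if name == "creatorName":
            doi["creators"].append({"creatorName": text})
        elif name == "title":
            doi["titles"].append(text)
        elif name == "publicationYear":
            doi["publicationYear"] = text
        elif name == "resourceType":
            # The general type is an attribute, not element text.
            doi["resourceTypeGeneral"] = element.get("resourceTypeGeneral")
    return doi
```

Older records may come back without a `resourceTypeGeneral`, in which case the client would need to supply one before the metadata can be resubmitted under v4.1.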
Required Involvement and Timeline
Outside of our control
DataCite has now approved us to transition to their service, so we should transition as soon as possible.
Platform
- Create, test, and implement a DataCite Java client that simplifies creating/updating DOIs and their metadata.
- Agree on API
- This can be done without coordination if we preserve existing behavior, but it would be much easier if we create this client intending to only support proposed and agreed-upon behavior.
- Implementation can use test credentials until we are ready to switch to DataCite in prod
- Create and route new API changes
Clients
- Support asynchronous API + metadata submission
- Can begin as soon as we agree upon the API
- Can implement as soon as it is tested and implemented on backend
UX
- User-facing design of DOI minting process and metadata submission
Mockups
Mockup of what the DOI minting form could look like. This form could be presented when a user clicks "Mint DOI for this Project/File" (as they currently do)
More mockups may be added later