
TypeScript SDK Generation Using the Synapse OpenAPI Definition

The Synapse backend repository now publishes an OpenAPI definition that describes how a programmatic client interacts with the Synapse REST API. We aim to use the definition to generate TypeScript type definitions of the object models used by Synapse, as well as generate an HTTP client that handles issuing requests to the REST API endpoints, and uses the generated models. This is valuable because it reduces the amount of boilerplate code that client developers have to spend time writing, maintaining, and fixing when errors are inadvertently introduced.

Summary

While it is possible to generate a TypeScript SDK using the Synapse OpenAPI definition, key issues prevent the SDK from being more useful than manually writing and maintaining the mostly-boilerplate code required to programmatically interact with the Synapse REST API. The usability concerns relate to enhancing code quality and developer ease-of-use. Certain solutions to these issues may require changes to our OpenAPI translator, while other required changes may include curating unique identifiers for our backend controller methods.

Background

The OpenAPI Specification describes how an OpenAPI definition document should be written. When a document properly follows the specification, there are many tools that can be used to process the definition for various purposes, such as validation, documentation generation, and client SDK generation. This document shares findings from attempts to use the definition to generate a TypeScript SDK to use in our frontend web applications.

The Synapse REST OpenAPI definition is generated using a custom translator that builds the definition by referencing the Spring Controller implementations. Now that the definition is published, client developers can build and use tools that consume the definition. For more information about the Synapse OpenAPI definition, see PLFM-7768.

Choosing an SDK Generator

There are many SDK generators for the OpenAPI specification. A list can be found here. In general, my criteria for choosing an SDK generator are:

  • Regularly maintained

  • Support for more recent versions of the OpenAPI Specification

  • Prefer software that does not require sign-up or payment, avoid SaaS offerings

One of the most popular and most versatile options in that list is the OpenAPI Generator, which is highly configurable, supports many languages, has powerful template customization, and has an active GitHub community. This investigation used the typescript-fetch generator from that project.

OpenAPI Generator also has an NPM shim that makes it easy to write a package in the synapse-web-monorepo that handles the entire codegen pipeline, and publish our generated client to the NPM repository.

Some generators that I tried and opted against using:

  • Kiota - Microsoft’s OpenAPI SDK generator does not yet support oneOf/allOf

  • Autorest - Also developed by Microsoft. Autorest was developed for Azure and it seems the general focus of maintenance will move to Kiota. Additionally, it seemed to be less configurable, so it would be more difficult to overcome certain challenges (outlined later) compared to the OpenAPI Generator.

  • NSwag - Popular C#/TypeScript client generator, challenging to configure on non-Windows machines and no official Docker image

Issues with Generated SDK

The following are some of the issues I have encountered, with a description of how each one affects the usability of the generated SDK. I indicate a potential solution for each issue and link to Jira issues that track relevant work.

Definition does not indicate required fields

Required fields are not indicated, even when they are semantically required. For example org.sagebionetworks.repo.model.principal.PrincipalAliasResponse always returns a principalId. However, principalId is not indicated as required in the specification.

The generated TypeScript interface indicates that the principalId field may be of type number or undefined:

    export interface PrincipalAliasResponse {
      /**
       *
       * @type {number}
       * @memberof PrincipalAliasResponse
       */
      principalId?: number;
    }

When client code handles a PrincipalAliasResponse object, we have to do an unnecessary null check on principalId to use it as a number. This is a common issue across many defined object types in the specification.
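The difference can be illustrated with a minimal sketch. The two interface names and both helper functions below are hypothetical and exist only for this comparison; the first interface mirrors what is generated today, and the second is what I would expect if principalId were marked as required.

```typescript
// Shape produced today: principalId is optional because the definition
// does not list it as required.
interface PrincipalAliasResponseGenerated {
  principalId?: number;
}

// Hypothetical shape if the definition marked principalId as required.
interface PrincipalAliasResponseRequired {
  principalId: number;
}

// With the optional field, callers must narrow the type before using the value.
function getPrincipalIdToday(response: PrincipalAliasResponseGenerated): number {
  if (response.principalId === undefined) {
    throw new Error('principalId was unexpectedly missing');
  }
  return response.principalId;
}

// With the required field, the value can be used directly.
function getPrincipalIdIfRequired(response: PrincipalAliasResponseRequired): number {
  return response.principalId;
}
```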

I think the only way to solve this is to manually (and gradually) add required properties to our schema models in lib-auto-generated, and ensure the required designation is included in the translated OpenAPI definition.

This may be more challenging to solve for types used in both requests and responses where certain fields may not be provided in requests to create a resource, but may always be required/provided in other contexts (e.g. object IDs generated by the system).

Without a solution, the generated TypeScript models would be more challenging to use than our manually curated models.

Excessively long method names for API calls

The generator creates one request method for each operation and derives the method name from the operation's operationId. The translator builds each operationId from the HTTP method and path (e.g. GET /repo/v1/entity/{id}), which causes the generator to create request methods with excessively long names. For example, GET /repo/v1/entity/{id}/table/transaction/async/get/{asyncToken} has the operationId get-/repo/v1/entity/{id}/table/transaction/async/get/{asyncToken}, and the generator creates the corresponding method getRepoV1EntityIdTableTransactionAsyncGetAsyncToken. We cannot strip the /(repo|file|auth)/v1/ path from all operationIds because removal of that substring leads to some collisions.

A few possible solutions:

  • Identify a new programmatic scheme for generating unique operationId values

  • Manually curate operationId values in the controllers and include the value in the translator. This is not feasible to do all at once, so these would be incrementally curated over time. The translator should also check the operationIds for uniqueness.

The TypeScript client may be usable without a solution to this problem, but attempts to solve this problem will almost certainly result in breaking changes for generated client code.
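The naming and collision problems can be sketched as follows. This is only an approximation of how a generator might camel-case an operationId, not the actual typescript-fetch implementation, and the function names are my own. The two "example" paths in the usage note are hypothetical and were chosen only to demonstrate a collision.

```typescript
// Approximates the name derivation: split the operationId on any
// non-alphanumeric characters and camel-case the remaining pieces.
function toMethodName(operationId: string): string {
  const words = operationId.split(/[^A-Za-z0-9]+/).filter((w) => w.length > 0);
  return words
    .map((w, i) =>
      i === 0
        ? w.charAt(0).toLowerCase() + w.slice(1)
        : w.charAt(0).toUpperCase() + w.slice(1),
    )
    .join('');
}

// Removes the first /(repo|file|auth)/v1/ segment from an operationId.
function stripServicePrefix(operationId: string): string {
  return operationId.replace(/\/(repo|file|auth)\/v1\//, '/');
}

// Returns every operationId that appears more than once. A check like this
// could back the uniqueness validation suggested for the translator.
function findDuplicates(operationIds: string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const id of operationIds) {
    if (seen.has(id)) {
      duplicates.add(id);
    }
    seen.add(id);
  }
  return Array.from(duplicates);
}
```

For instance, the hypothetical operationIds get-/repo/v1/example/{id} and get-/file/v1/example/{id} are unique as written, but findDuplicates reports a collision once stripServicePrefix has been applied to both.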

Generated instanceOf... methods are not reliable

Generated instanceOf<Model> methods do not respect concreteType. The methods that the typescript-fetch generator creates currently check only the fields that are required.

As a partial solution, I have been able to override a template in the typescript-fetch generator so that the generated methods also validate the values of required enumerations.

The OpenAPI definition would require these changes:

  1. The concreteType property must be listed as required in every object which includes it

  2. The concreteType definition must be defined as an enumeration, where the concrete type value is the sole valid enumeration value, e.g. the org.sagebionetworks.repo.model.Foo schema should contain:

    {
      "properties": {
        "concreteType": {
          "type": "string",
          "enum": ["org.sagebionetworks.repo.model.Foo"]
        }
      }
    }

Version 3.0 of the OpenAPI specification does not allow the JSON Schema keyword const, which may seem like a more natural fit here (support for const was added in version 3.1). A single enum value is equivalent to a const value, and is permitted by the OpenAPI specification.

Solving this issue is necessary to use the generated client, because our client code MUST be able to easily identify the concrete type of a data object, especially for polymorphic types.
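A minimal sketch of the difference, assuming the two definition changes above were made for the Foo example schema. Both functions are hand-written illustrations, not output copied from the generator or from my overridden template.

```typescript
const FOO_CONCRETE_TYPE = 'org.sagebionetworks.repo.model.Foo';

interface Foo {
  concreteType: typeof FOO_CONCRETE_TYPE;
}

// Presence-only check: passes for any object that has a concreteType,
// regardless of which concrete type it names.
function instanceOfFooPresenceOnly(value: object): boolean {
  return 'concreteType' in value;
}

// Value check: passes only when concreteType matches the single enum value.
function instanceOfFoo(value: unknown): value is Foo {
  return (
    typeof value === 'object' &&
    value !== null &&
    (value as { concreteType?: unknown }).concreteType === FOO_CONCRETE_TYPE
  );
}
```

An object whose concreteType names a different model would pass the presence-only check but fail the value check, which is the behavior client code needs when narrowing polymorphic types.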

Polymorphic types are missing the discriminator keyword

Synapse uses polymorphism for various types across the system. One example of this is the FileHandle interface. The current FileHandle definition is as follows:

    {
      "org.sagebionetworks.repo.model.file.FileHandle": {
        "type": "object",
        "properties": {
          "id": { "type": "string" },
          "etag": { "type": "string" },
          "createdBy": { "type": "string" },
          "createdOn": { "type": "string" },
          "modifiedOn": { "type": "string" },
          "concreteType": { "type": "string" },
          "contentType": { "type": "string" },
          "contentMd5": { "type": "string" },
          "fileName": { "type": "string" },
          "storageLocationId": { "type": "integer", "format": "int32" },
          "contentSize": { "type": "integer", "format": "int32" },
          "status": { "type": "string" }
        },
        "description": "The FileHandle interface defines all of the fields that are common to all implementations.",
        "oneOf": [
          { "$ref": "#/components/schemas/org.sagebionetworks.repo.model.file.ExternalObjectStoreFileHandle" },
          { "$ref": "#/components/schemas/org.sagebionetworks.repo.model.file.GoogleCloudFileHandle" },
          { "$ref": "#/components/schemas/org.sagebionetworks.repo.model.file.ProxyFileHandle" },
          { "$ref": "#/components/schemas/org.sagebionetworks.repo.model.file.ExternalFileHandle" },
          { "$ref": "#/components/schemas/org.sagebionetworks.repo.model.file.S3FileHandle" }
        ]
      }
    }

When deserializing JSON fetched from the API, the generated client does not know which implementation of FileHandle to use. The generated TypeScript code attempts to include properties from all potential implementations. This is acceptable in TypeScript, but may not work for other languages.

If we add a discriminator object to the model schema, with its propertyName set to concreteType, generators should be able to identify implementation schemas based on concreteType.

The discriminator property may also include a mapping object that maps discriminator values to model IDs. This may be unnecessary for the Synapse REST API because the concreteType discriminator values are identical to the IDs of the corresponding model schemas.
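The following sketch shows how a consumer of the definition could use a discriminator to pick an implementation schema. The PolymorphicSchema type and the resolveSchemaRef function are hypothetical and only model the parts of the definition relevant here. The oneOf list is shortened to two of the implementations for brevity, and the fallback to the schema name reflects my understanding of the implicit mapping described in the OpenAPI specification.

```typescript
// Minimal model of a polymorphic schema: only the keywords used below.
interface PolymorphicSchema {
  oneOf: { $ref: string }[];
  discriminator: { propertyName: string; mapping?: Record<string, string> };
}

// FileHandle with the discriminator added (oneOf shortened for brevity).
const fileHandleSchema: PolymorphicSchema = {
  oneOf: [
    { $ref: '#/components/schemas/org.sagebionetworks.repo.model.file.S3FileHandle' },
    { $ref: '#/components/schemas/org.sagebionetworks.repo.model.file.ExternalFileHandle' },
  ],
  discriminator: { propertyName: 'concreteType' },
};

// Finds the schema reference for a payload: an explicit mapping entry wins,
// otherwise the discriminator value is treated as the schema name.
function resolveSchemaRef(
  schema: PolymorphicSchema,
  payload: Record<string, unknown>,
): string | undefined {
  const value = payload[schema.discriminator.propertyName];
  if (typeof value !== 'string') {
    return undefined;
  }
  const mapping = schema.discriminator.mapping;
  if (mapping !== undefined && mapping[value] !== undefined) {
    return mapping[value];
  }
  const implicitRef = '#/components/schemas/' + value;
  return schema.oneOf.some((s) => s.$ref === implicitRef) ? implicitRef : undefined;
}
```

Because the concreteType values match the schema IDs, the sketch resolves the S3FileHandle schema without any mapping entries.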

I do not think that this issue must be solved to use a generated TypeScript client, but that may change as we attempt to use the client in more complicated scenarios. Resolution of this issue may be required to generate client code for other languages.

Resolved issues

This section describes issues I encountered for which I identified solutions that do not rely on changes to the OpenAPI definition provided by the translator.

Excessively Long Model Interfaces

The current model names lead to having TypeScript interface names that are excessively long. Models use the Java canonical name, such as org.sagebionetworks.repo.model.RestrictionInformationRequest. The generator creates a corresponding model called OrgSagebionetworksRepoModelRestrictionInformationRequest.

One possible solution is to use openapi-generator’s model-name-mappings option to replace the model names with the shortest unique name for each model. The shortest unique name for each model can be determined programmatically in the project that will run the generator. For example, if the system contained the following three models, we could map them to the following model names:

Canonical name                       Unique model name mapping result
org.sagebionetworks.model.abc.Foo    Foo
org.sagebionetworks.model.abc.Bar    abc.Bar
org.sagebionetworks.model.xyz.Bar    xyz.Bar

This mapping should be sufficient for client SDK usage, and no change would be needed in the API or controller-to-specification translator.
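One way to compute such a mapping is sketched below. The function names are hypothetical, and the approach (lengthening each name's package suffix until no other model shares it) is an assumption about how "shortest unique name" could be defined, not the exact code used in this investigation.

```typescript
// Returns the last `segments` dot-separated parts of a canonical name,
// or the whole name if it has fewer parts than requested.
function nameSuffix(canonicalName: string, segments: number): string {
  const parts = canonicalName.split('.');
  return parts.slice(Math.max(0, parts.length - segments)).join('.');
}

// Maps each canonical name to its shortest suffix that no other model shares.
// Assumes the canonical names themselves are distinct.
function shortestUniqueNames(canonicalNames: string[]): Record<string, string> {
  const result: Record<string, string> = {};
  for (const name of canonicalNames) {
    const maxSegments = name.split('.').length;
    for (let k = 1; k <= maxSegments; k++) {
      const candidate = nameSuffix(name, k);
      const isUnique = canonicalNames.every(
        (other) => other === name || nameSuffix(other, k) !== candidate,
      );
      // Fall back to the full canonical name if no shorter suffix is unique.
      if (isUnique || k === maxSegments) {
        result[name] = candidate;
        break;
      }
    }
  }
  return result;
}
```

Applied to the three models in the table, this yields Foo, abc.Bar, and xyz.Bar. The resulting pairs could then be supplied to the generator's model name mapping option.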

Retry logic for Errors

Our manually-curated Synapse TypeScript client has built-in refetching logic based on the status code returned in the HTTP response. For example, if the service returns a 400 (Bad Request) error, we do not retry the request. If the service returns a 429 (Too Many Requests), 502 (Bad Gateway), 503 (Service Unavailable), or 504 (Gateway Timeout), then we retry the request with exponential backoff.

The generated client allows overriding the fetch implementation with any API-compatible drop-in. Our retry logic is already written to work this way, so we can use it in the client configuration generated at runtime.
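The retry behavior described above can be sketched as a wrapper around any fetch-like function. This is not our existing implementation; the withRetry name, the default of five retries, and the one-second base delay are assumptions made for illustration.

```typescript
// Status codes that trigger a retry: 429, 502, 503, and 504.
const RETRYABLE_STATUS_CODES = [429, 502, 503, 504];

function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Wraps a fetch-like function so retryable responses are re-requested with
// exponential backoff. The wrapper keeps the wrapped function's signature,
// so it can be used wherever the original function is accepted.
function withRetry<Args extends unknown[], Res extends { status: number }>(
  fetchImpl: (...args: Args) => Promise<Res>,
  maxRetries = 5, // assumed default
  baseDelayMs = 1000, // assumed default
): (...args: Args) => Promise<Res> {
  return async (...args: Args): Promise<Res> => {
    let attempt = 0;
    for (;;) {
      const response = await fetchImpl(...args);
      const shouldRetry = RETRYABLE_STATUS_CODES.indexOf(response.status) !== -1;
      // Non-retryable responses (e.g. 400) and exhausted retries are
      // returned to the caller as-is.
      if (!shouldRetry || attempt >= maxRetries) {
        return response;
      }
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      await delay(baseDelayMs * 2 ** attempt);
      attempt += 1;
    }
  };
}
```

A wrapped function such as withRetry(fetch) could then be supplied as the fetch implementation in the runtime configuration; as far as I can tell, the typescript-fetch runtime exposes this as the fetchApi configuration parameter.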

Changing the endpoint to staging, dev, local instances of the backend

For the client to work with other backend stacks (staging, dev, local), the client must be configurable so requests can be sent to other endpoints.

The runtime configuration allows overriding a property called basePath, which provides this functionality.
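As a rough sketch, the stack could be selected with a simple lookup. The Stack type, the buildClientConfig function, and all of the URLs below are placeholders invented for this example; they are not real Synapse endpoints.

```typescript
type Stack = 'production' | 'staging' | 'dev' | 'local';

// Placeholder URLs for illustration only.
const BASE_PATHS: Record<Stack, string> = {
  production: 'https://prod.example.org',
  staging: 'https://staging.example.org',
  dev: 'https://dev.example.org',
  local: 'http://localhost:8080',
};

// Builds the portion of the runtime configuration that selects the backend.
function buildClientConfig(stack: Stack): { basePath: string } {
  return { basePath: BASE_PATHS[stack] };
}
```

The returned object would be merged into the parameters used to construct the generated client's configuration.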

Asynchronous job polling

The web client often has to monitor the status of asynchronous jobs, such as table queries, table updates, creation of DOIs, and many other features in Synapse. The web client accomplishes this by polling services that provide the status of an asynchronous job.

We can likely adapt our existing logic for polling asynchronous job status so that it calls the generated client instead of our manually written client.
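A generic polling helper along these lines might look like the sketch below. It is not our existing polling logic; the JobStatus type, the pollUntilDone name, and the default interval and attempt limit are assumptions. The checkStatus callback stands in for whichever generated client method returns the status of a given asynchronous job.

```typescript
// Result of a single status check: either still running or finished.
type JobStatus<T> = { done: false } | { done: true; result: T };

// Calls checkStatus repeatedly until the job reports completion,
// waiting intervalMs between checks and giving up after maxAttempts.
async function pollUntilDone<T>(
  checkStatus: () => Promise<JobStatus<T>>,
  intervalMs = 1000, // assumed default
  maxAttempts = 60, // assumed default
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.done) {
      return status.result;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Job did not complete after ' + maxAttempts + ' status checks');
}
```

Keeping the status check behind a callback means the helper does not depend on the generated method names, which may change if operationId values are later curated.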