
This page proposes two ways to maintain a validation JSON schema index.

Option A

We can update the index lazily. When a JSON schema is created or updated, we do not touch the index. Instead, we update it only when the index is read and the target schema is missing from it (either outdated or absent). In that case we build the validation schema for the current request and put it in the index. Any schemas that depend on this schema are written to the index when they are themselves needed.

Example: Suppose we create 2 schemas, where one schema is a child (a dependency) of the other. When the schemas are created, the index does not contain their validation schemas. Suppose we then ask for both validation schemas. Because neither is present in the index, we build both and add them to the index, possibly inside a synchronous API call. If we then change the child schema, we do not update the index until its validation schema is asked for. When the validation schema for the child is asked for, we update the child schema in the index (adding the new version). However, the parent schema is not updated in the index until it is asked for.
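The example above can be sketched as a small in-memory index. This is only an illustration: the names `Schema`, `LazyValidationIndex`, and `get` are hypothetical, and detecting an outdated entry by comparing a fingerprint of the schema's version and its dependencies' versions is an assumption, since this page does not say how staleness is detected. Schemas are assumed to have no circular references.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    version: int
    body: dict
    refs: list = field(default_factory=list)  # ids of child schemas (dependencies)


class LazyValidationIndex:
    """Hypothetical lazy index: entries are built only when they are read."""

    def __init__(self, schemas):
        self.schemas = schemas  # schema id -> Schema (source of truth)
        self.entries = {}       # schema id -> (fingerprint, validation schema)
        self.builds = 0         # number of builds triggered by reads

    def _fingerprint(self, schema_id):
        # Assumption: an entry is outdated when the version of the schema
        # or of any of its dependencies differs from what it was built with.
        s = self.schemas[schema_id]
        return (s.version, tuple(self._fingerprint(r) for r in s.refs))

    def _inline(self, schema_id):
        # Stand-in for the real build: inline every dependency.
        s = self.schemas[schema_id]
        return {**s.body, "definitions": {r: self._inline(r) for r in s.refs}}

    def get(self, schema_id):
        fingerprint = self._fingerprint(schema_id)
        cached = self.entries.get(schema_id)
        if cached is None or cached[0] != fingerprint:
            # Absent or outdated: build now, inside the caller's request.
            self.builds += 1
            cached = (fingerprint, self._inline(schema_id))
            self.entries[schema_id] = cached
        return cached[1]


schemas = {
    "child": Schema(1, {"type": "string"}),
    "parent": Schema(1, {"type": "object"}, ["child"]),
}
index = LazyValidationIndex(schemas)
index.get("parent")
index.get("child")                                 # both built on first access
schemas["child"] = Schema(2, {"type": "integer"})  # index is not touched here
index.get("child")                                 # child rebuilt on access
```

At this point the stored parent entry still embeds the old child; it is rebuilt only when `index.get("parent")` is called, which is the cost Option A pays inside the request.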

Option B

In this option we will have 2 workers. The first worker handles a single instance of a schema being updated in the index: it builds the validation schema and writes it to the index.

The second worker broadcasts all dependent schemas to the first worker. We define a dependent schema as a schema that has dependencies. Given a schema as input, this worker sends one message for every schema that references the given schema, so that all of these schemas eventually reflect the change to the referenced schema in the index.

The idea is that when a JSON schema is created or updated (an asynchronous job), the following happens.

  1. Build the validation JSON schema of this newly created/updated schema, and add it to the index.

  2. Send a notification to the 2nd worker to broadcast changes.

  3. This 2nd worker will find all dependent schemas and send a notification message for each schema to the 1st worker.
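The three steps above can be sketched with two in-memory queues standing in for the worker messages. This is a minimal sketch, not the actual implementation: `OptionBPipeline`, the queue names, and the way references are stored are all hypothetical, and the create/update job is assumed not to wait for either worker. Only direct references are broadcast; whether dependents of dependents are also notified is left open here.

```python
from collections import deque


class OptionBPipeline:
    """Hypothetical in-memory model of the create/update job and the 2 workers."""

    def __init__(self):
        self.schemas = {}               # schema id -> schema body (source of truth)
        self.references = {}            # schema id -> ids it references directly
        self.index = {}                 # schema id -> validation schema
        self.broadcast_queue = deque()  # messages for the 2nd worker
        self.build_queue = deque()      # messages for the 1st worker

    def build_validation_schema(self, schema_id):
        # Stand-in for the real build: inline every referenced schema.
        body = dict(self.schemas[schema_id])
        body["definitions"] = {
            ref: self.build_validation_schema(ref)
            for ref in sorted(self.references.get(schema_id, ()))
        }
        return body

    def create_or_update(self, schema_id, body, refs=()):
        self.schemas[schema_id] = body
        self.references[schema_id] = set(refs)
        # Step 1: build and index the changed schema itself.
        self.index[schema_id] = self.build_validation_schema(schema_id)
        # Step 2: notify the 2nd worker; the job does not wait for it.
        self.broadcast_queue.append(schema_id)

    def run_broadcast_worker(self):
        # Step 3 (2nd worker): one message per schema that references the change.
        while self.broadcast_queue:
            changed = self.broadcast_queue.popleft()
            for schema_id, refs in self.references.items():
                if changed in refs:
                    self.build_queue.append(schema_id)

    def run_build_worker(self):
        # 1st worker: rebuild and index a single schema per message.
        while self.build_queue:
            schema_id = self.build_queue.popleft()
            self.index[schema_id] = self.build_validation_schema(schema_id)


pipeline = OptionBPipeline()
pipeline.create_or_update("child", {"type": "string"})
pipeline.create_or_update("parent", {"type": "object"}, refs=["child"])
pipeline.run_broadcast_worker()
pipeline.run_build_worker()

pipeline.create_or_update("child", {"type": "integer"})
stale = pipeline.index["parent"]["definitions"]["child"]["type"]  # workers not run yet
pipeline.run_broadcast_worker()
pipeline.run_build_worker()
fresh = pipeline.index["parent"]["definitions"]["child"]["type"]  # after the workers ran
```

In this run the updated child is visible in the index as soon as its job completes, while the parent keeps the old child definition until both workers have processed their messages.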

The result is that the index is precomputed: an access to the index always has a validation schema to return. When a JSON schema is created or updated, the change is reflected in the index as soon as the job completes. The schemas that depend on the updated schema are not refreshed immediately, and this is the trade-off. If someone creates a new schema and accesses the index, it will be there. However, if someone updates a schema and then accesses the index for a parent schema that depends on it, the index may return an outdated schema. The workers should eventually get around to updating those schemas.

The asynchronous job that creates or updates the JSON schema is also a natural place to create the validation schema, because the job already builds a validation schema as part of its routine to ensure that the schema can be created.

Pros/Cons

Option A

Pros

  • Simple, no additional workers.

Cons

  • We still have to build a validation schema if it is not present in the index (this may take longer than 30 seconds).

Option B

Pros

  • We will always have a validation schema to return on an access to the index, with no wait.

  • A create or update of a schema is always consistent in the index for the updated schema.

Cons

  • The index may not be up to date immediately (it may return an old validation schema if a dependency schema was recently changed).

  • More complicated, 2 new workers.

Conclusion

Option B is the better option because we want to avoid building validation schemas during a synchronous API call.