...
Currently, when a user downloads a FileEntity via the packaging option of their download list (POST /download/list/package/async/start), the DownloadListPackageRequest includes an option to include a manifest. When “includeManifest=true”, the package will include a CSV file containing all of the annotations for every FileEntity included in the download. We propose extending this manifest to automatically include all derived annotations.
AccessRequirement API Changes
New AccessRequirement Types
Currently, an AccessRequirement (AR) includes a list of “subjectIds” that defines which Entities (or Teams) the AR applies to. There are currently six types of ARs:
...
The GET /accessRequirement/{requirementId} API returns the full list of ‘subjectIds’ for existing ARs. This means that the entire subject list must fit in both client-side and server-side memory. Considering that existing ARs are managed by hand, it is reasonable to assume that the full list will be small enough to avoid memory problems. In fact, it is common for the ‘subjectIds’ to be container IDs (Projects & Folders), to minimize the micromanagement required to maintain an AR. As a result, a short ‘subjectIds’ list can restrict thousands of Entities, since a container can contain up to 40K children. This type of data compression is not likely to extend to new ARs with subjectsDefinedByAnnotations = true. While it will be possible to bind _ar# annotations to containers, it is far more likely that these annotations will be bound to individual files. After all, the new derived annotations feature makes it easy to apply annotations to millions of entities with only a few lines of schema code. This means we must assume that the subjects of ARs with subjectsDefinedByAnnotations = true might not fit in memory. Therefore, we cannot return all of the subjects of such ARs for calls that GET the AR. However, since the subjects of such ARs are controlled by JSON schemas, it is not clear that listing the subjects will even be needed. If we find that we do need to provide all of the subjects of these new ARs, then we will need to add a new API that provides a paginated list of subjectIds to avoid out-of-memory problems.
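If a paginated subject listing does turn out to be needed, its shape could resemble the following minimal sketch. Everything here is hypothetical and for illustration only: the SubjectPage type, the list_subjects_page and iterate_subjects functions, and the offset-based page token are assumptions, not part of the existing API. An in-memory list stands in for what would be a bounded database query in a real service.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Sequence


@dataclass
class SubjectPage:
    """One page of subject IDs (hypothetical response shape)."""
    subject_ids: List[str]
    next_page_token: Optional[str]  # None when there are no more pages


def list_subjects_page(
    all_subject_ids: Sequence[str],
    page_size: int,
    page_token: Optional[str] = None,
) -> SubjectPage:
    """Return a single page of subjects.

    Assumption: the token is simply the offset of the next page. A real
    service would run a LIMIT/OFFSET (or keyset) query instead of slicing
    a sequence that is already in memory.
    """
    offset = int(page_token) if page_token is not None else 0
    end = offset + page_size
    page = list(all_subject_ids[offset:end])
    next_token = str(end) if end < len(all_subject_ids) else None
    return SubjectPage(subject_ids=page, next_page_token=next_token)


def iterate_subjects(
    fetch_page: Callable[[Optional[str]], SubjectPage],
) -> Iterator[str]:
    """Client-side helper: walk every page while holding only one in memory."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.subject_ids
        token = page.next_page_token
        if token is None:
            break
```

With this shape, neither the client nor the server ever needs to hold more than one page of subjectIds at a time, which is the property the paragraph above asks for.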
The _ar# annotation Lifecycle
The above examples demonstrate the need for _ar# annotations as derived annotations. Do we also want to support _ar# annotations as actual annotations that are directly set by users? If so, who would be allowed to set these annotations? It is clear that we would not want a user to unbind a file from an AR simply by removing or editing an _ar# annotation on the file. On the other hand, it might be useful for a member of the ACT to add an _ar# annotation directly to a project or folder. However, we do not currently have an actual use case for such a feature. Should we block all users from adding/removing/editing _ar# annotations?
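If the answer to that last question is yes, one way to enforce it would be to reject any user-submitted annotation whose key is reserved for AR bindings. The sketch below is only an illustration of that option. It assumes that AR annotation keys can be recognized by the prefix "_ar#", and the function names are hypothetical.

```python
from typing import Iterable, List

# Assumption: keys reserved for AR bindings start with this prefix.
AR_ANNOTATION_PREFIX = "_ar#"


def find_reserved_keys(annotation_keys: Iterable[str]) -> List[str]:
    """Return the submitted keys that fall in the reserved AR namespace."""
    return [key for key in annotation_keys if key.startswith(AR_ANNOTATION_PREFIX)]


def validate_user_annotation_keys(annotation_keys: Iterable[str]) -> None:
    """Reject a user update that tries to set a reserved AR annotation.

    Under this option, AR annotations exist only as derived annotations,
    so any user-supplied key in the reserved namespace is an error.
    """
    reserved = find_reserved_keys(annotation_keys)
    if reserved:
        raise ValueError(
            "Annotations with reserved keys cannot be set directly: "
            + ", ".join(reserved)
        )
```

Because the check applies to every caller, it would also block the ACT use case described above. Supporting that case would require an additional permission check, which this sketch does not attempt.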
Invalid Annotations
Currently, derived annotations are reevaluated for any type of Entity change event. This includes JSON schema binding change events and annotation change events. We would need to check the AR binding of an Entity each time the derived annotations are reevaluated. However, what happens if a change puts an Entity into an invalid state? In such a case, we would not be able to determine what the correct derived annotations should be. By extension, we would not be able to determine the correct AR bindings of an invalid Entity. How do we handle this case?
One solution to this problem might be to bind all invalid Entities to a catch-all AR that would block downloads, similar to how the LockAccessRequirement blocks downloads of reported files. This binding would remain until the Entity is restored to a valid state and the correct AR binding can be established.
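The catch-all option can be sketched as a small resolution step that runs whenever derived annotations are reevaluated. This is an illustration under stated assumptions, not a proposed implementation: CATCH_ALL_AR_ID and resolve_ar_bindings are hypothetical names, and the sketch assumes the caller already knows whether the Entity is valid and, if so, which AR IDs its derived annotations produce.

```python
from typing import Iterable, Optional, Set

# Hypothetical ID of the catch-all AR that blocks downloads of invalid Entities.
CATCH_ALL_AR_ID = -1


def resolve_ar_bindings(
    entity_is_valid: bool,
    derived_ar_ids: Optional[Iterable[int]],
) -> Set[int]:
    """Decide which ARs an Entity should be bound to after reevaluation.

    An invalid Entity has no trustworthy derived annotations, so it is
    bound only to the catch-all AR. A valid Entity is bound to exactly
    the ARs named by its derived annotations.
    """
    if not entity_is_valid:
        return {CATCH_ALL_AR_ID}
    return set(derived_ar_ids or [])
```

Because the result is recomputed on every reevaluation, the catch-all binding is replaced by the correct bindings as soon as the Entity becomes valid again.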