...
Association Type | Table | Foreign Key (ON DELETE action) | Description | Current Size | Unlinking
---|---|---|---|---|---
FileEntity | JDOREVISION | FILE_HANDLE_ID (RESTRICT) | Each file entity revision has a FK back to the referenced file handle. Multiple revisions can reference a single file entity (e.g. one node might be linked to multiple file handles through its revisions). | ~11M | The association can be broken in several ways. Note that even if a revision is deleted or a file handle is changed, other revisions might still refer to the file handle.
TableEntity | Each table entity has an associated table with the file handles. These tables are not migratable and not consistent. The data is also stored in the various transactions used to build tables, in S3 in a dedicated bucket. | No, the tables are stored in a separate DB | Each table might reference multiple file handles. When a table is built each transaction is processed, and if a file handle is in the transaction it is added to a dedicated table, one for each Synapse table. Unfortunately this table is not migratable and is rebuilt every week. We keep a migratable table with all the table transactions, and the table row changes are packaged in a zip file and stored in a dedicated S3 bucket. | ~36M, distributed across ~10K tables (~1.3K non-empty) | The association can be broken only when the table is deleted (removed from the trashcan).
WikiAttachment | V2_WIKI_ATTACHMENT_RESERVATION, V2_WIKI_MARKDOWN | V2_WIKI_ATTACHMENT_RESERVATION: FILE_HANDLE_ID (RESTRICT). V2_WIKI_MARKDOWN: No, contained in the ATTACHMENT_ID_LIST blob | The attachments to a wiki page. The table includes both the file handles storing the wiki page and its attachments. The list of attachments is also stored in the V2_WIKI_MARKDOWN table in a blob with the list of ids. | ~1M | The association might be broken in several ways.
WikiMarkdown | V2_WIKI_MARKDOWN | FILE_HANDLE_ID (RESTRICT) | The markdown of a wiki page. | ~770K | See above
UserProfileAttachment | JDOUSERPROFILE | PICTURE_ID (SET NULL) | The user profile image. | ~60K | The association can be broken when the profile image is changed.
TeamAttachment | TEAM | No, contained in the PROPERTIES blob that stores a serialized version of the team object (icon property) | The team picture. | ~4.5K | The association can be broken when the team picture is changed.
MessageAttachment | MESSAGE_CONTENT | FILE_HANDLE_ID (RESTRICT - NO ACTION) | The content of messages sent to users. | ~460K | The association can be broken if the message is deleted (admins only).
SubmissionAttachment | JDOSUBMISSION_FILE | FILE_HANDLE_ID (RESTRICT - NO ACTION) | The file handles that are part of an evaluation submission. In particular, these are the file handles associated with a file entity that is part of a submission (e.g. all the versions or a specific version). | ~110K | The association can be broken when the submission is deleted or when the evaluation is deleted.
VerificationSubmission | VERIFICATION_FILE | FILE_HANDLE_ID (RESTRICT) | The files that are submitted as part of the user verification. Note that when a user is approved or rejected the association is removed. | <10 | The association is broken when the submission is approved or rejected.
AccessRequirementAttachment | ACCESS_REQUIREMENT_REVISION | No, a file handle might be contained in the SERIALIZED_ENTITY blob that stores a managed access requirement (the ducTemplateFileHandleId property) | A managed access requirement might have a file handle pointing to a DUC template. | ~5K | The association is broken when the access requirement is deleted or updated with a new file handle.
DataAccessRequestAttachment | DATA_ACCESS_REQUEST | No, various file handles are referenced in the REQUEST_SERIALIZED blob that stores a serialized version of the access request. | A data access request might have multiple files attached for the approval phase (e.g. DUC, IRB approval and other attachments). | ~2K | The association is broken when the request is updated with different file handles.
DataAccessSubmissionAttachment | DATA_ACCESS_SUBMISSION | No, various file handles are referenced in the SUBMISSION_SERIALIZED blob that stores a serialized version of the submission. | Same as above, but for the actual submission. | ~3K | Never?
FormData | FORM_DATA | FILE_HANDLE_ID (RESTRICT) | The data of a form. | ~300 | When the form is deleted.
Note: an interesting observation (also from the S3 Bucket Analysis) is that most of the data is referenced by file entities, as expected. Most of our concerns therefore revolve around un-linking entities rather than other objects. For this reason it might actually be worth implementing option 2 above instead: we could simply register the links when we encounter them and only take care of un-registering table and entity links for cleanup.
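A minimal sketch of what registering and un-registering links could look like, assuming an in-memory registry. The class name `FileHandleLinkRegistry`, its methods, and the object id format (`"syn1.v1"`) are hypothetical and only illustrate the idea; they are not part of the existing code base. The sketch also reflects the note on file entities in the table: a file handle stays linked until every object referencing it has been un-registered.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

# (association type, id of the object holding the reference); both hypothetical.
Link = Tuple[str, str]


class FileHandleLinkRegistry:
    """Hypothetical in-memory registry of file handle links."""

    def __init__(self) -> None:
        self._links: DefaultDict[int, Set[Link]] = defaultdict(set)

    def register(self, file_handle_id: int, association_type: str, object_id: str) -> None:
        # Idempotent: registering the same link twice has no extra effect.
        self._links[file_handle_id].add((association_type, object_id))

    def unregister(self, file_handle_id: int, association_type: str, object_id: str) -> None:
        links = self._links.get(file_handle_id)
        if links is None:
            return
        links.discard((association_type, object_id))
        if not links:
            # No remaining references: drop the file handle from the registry.
            del self._links[file_handle_id]

    def is_linked(self, file_handle_id: int) -> bool:
        return file_handle_id in self._links
```

With this shape, deleting one revision of an entity only removes that revision's link, and the file handle becomes a cleanup candidate only once `is_linked` returns `False`.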
The idea would be to periodically scan over all the associations and record the last time a file handle was “seen” (i.e. found to be associated). In this way we can build an index that can be queried for the last time a file handle association was seen, so that file handles not seen in recent scans can be flagged as un-linked.
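The scan could be sketched roughly as follows. This is an illustration under stated assumptions, not an actual implementation: `LastSeenIndex`, the per-association "scanner" callables, and the `max_age` threshold are all hypothetical names, and the index is kept in memory where a real one would presumably live in a database table.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical scanner: returns the file handle ids currently referenced by
# one association type (e.g. by reading an FK column or parsing a blob).
Scanner = Callable[[], Iterable[int]]


class LastSeenIndex:
    """In-memory stand-in for the index of 'last seen' timestamps."""

    def __init__(self) -> None:
        self._last_seen: Dict[int, datetime] = {}

    def record_scan(self, scanners: Dict[str, Scanner], now: datetime) -> None:
        # Visit every association type and stamp each file handle found.
        for scan in scanners.values():
            for file_handle_id in scan():
                self._last_seen[file_handle_id] = now

    def last_seen(self, file_handle_id: int) -> Optional[datetime]:
        return self._last_seen.get(file_handle_id)

    def unlinked_candidates(
        self, all_ids: Iterable[int], now: datetime, max_age: timedelta
    ) -> List[int]:
        # A file handle is a candidate if it was never seen, or if it was
        # last seen before the cutoff.
        cutoff = now - max_age
        return sorted(
            fid
            for fid in all_ids
            if self._last_seen.get(fid) is None or self._last_seen[fid] < cutoff
        )
```

The threshold should be longer than the interval between scans, otherwise a file handle that is still linked could be flagged simply because the next scan has not run yet.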
...