...

Association Type

Table

Foreign Key (ON DELETE CASCADE)

Description

Current Size

Unlinking

FileEntity

JDOREVISION

FILE_HANDLE_ID (RESTRICT)

Each file entity revision has a FK back to the referenced file handle. Multiple revisions can reference a single file entity (e.g. one node might be linked to multiple file handles through its revisions).

~11M

The association can be broken in several ways:

  • The entity is deleted and purged from the trashcan

  • A revision is deleted

  • The file handle id is updated

Note that even if a revision is deleted or its file handle is changed, other revisions might still refer to the same file handle.

TableEntity

Each table entity has an associated table with the file handles. They are not migratable tables and not consistent. The data is also stored in the various transactions used to build tables in S3 in a dedicated bucket.

No, the tables are stored in a separate DB

Each table might reference multiple file handles. When a table is built, each transaction is processed, and if a file handle is in the transaction it is added to a dedicated table, one for each Synapse table. Unfortunately this table is not migratable and is rebuilt every week. We keep a migratable table with all the table transactions, while the table row changes are packaged in a zip file and stored in a dedicated S3 bucket.

~36M, distributed across ~10K tables (~1.3K non-empty)

The association can be broken only when the table is deleted (removed from the trashcan).

WikiAttachment

V2_WIKI_ATTACHMENT_RESERVATION

V2_WIKI_MARKDOWN

FILE_HANDLE_ID (RESTRICT)

No, contained in the ATTACHMENT_ID_LIST blob

The attachments of a wiki page. The table includes both the file handles storing the wiki page and its attachments. The list of attachments is also stored in the V2_WIKI_MARKDOWN table, in a blob with the list of ids.

~1M

The association might be broken in several ways:

  • A wiki page is deleted

  • The attachments are updated

WikiMarkdown

V2_WIKI_MARKDOWN

FILE_HANDLE_ID (RESTRICT)

The markdown of a wiki page.

~770K

See above

UserProfileAttachment

JDOUSERPROFILE

PICTURE_ID (SET NULL)

The user profile image.

~60K

The association can be broken when the profile image is changed.

TeamAttachment

TEAM

No, contained in the PROPERTIES blob that stores a serialized version of the team object (icon property)

The team picture.

~4.5K

The association can be broken when the team picture is changed.

MessageAttachment

MESSAGE_CONTENT

FILE_HANDLE_ID (RESTRICT - NO ACTION)

The content of the messages sent to users.

~460K

The association can be broken if the message is deleted (only admins).

SubmissionAttachment

JDOSUBMISSION_FILE

FILE_HANDLE_ID (RESTRICT - NO ACTION)

The file handles that are part of an evaluation submission. In particular, these are the file handles associated with a file entity that is part of a submission (e.g. all the versions or a specific version).

~110K

The association can be broken when the submission is deleted or when the evaluation is deleted.

VerificationSubmission

VERIFICATION_FILE

FILE_HANDLE_ID (RESTRICT)

The files that are submitted as part of the user verification. Note that when a user is approved or rejected the association is removed.

<10

The association is broken when the submission is approved or rejected.

AccessRequirementAttachment

ACCESS_REQUIREMENT_REVISION

No, a file handle might be contained in the SERIALIZED_ENTITY blob that stores a managed access requirement (the ducTemplateFileHandleId property)

A managed access requirement might have a file handle pointing to a DUC template.

~5K

The association is broken when the access requirement is deleted or updated with a new file handle.

DataAccessRequestAttachment

DATA_ACCESS_REQUEST

No, various file handles are referenced in the REQUEST_SERIALIZED blob that stores a serialized version of the access request.

A data access request might have multiple files attached for the approval phase (e.g. DUC, IRB approval and other attachments).

~2K

The association is broken when the request is updated with different file handles.

DataAccessSubmissionAttachment

DATA_ACCESS_SUBMISSION

No, various file handles are referenced in the SUBMISSION_SERIALIZED blob that stores a serialized version of the submission.

Same as above, but for the actual submission.

~3K

Never?

FormData

FORM_DATA

FILE_HANDLE_ID (RESTRICT)

The data of a form.

~300

When the form is deleted.
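The FileEntity row above notes that deleting a revision does not necessarily unlink a file handle, since other revisions might still refer to it. A minimal sketch of that check follows, using an in-memory SQLite database. The JDOREVISION table and FILE_HANDLE_ID column come from the table above; the OWNER_NODE_ID and REVISION_NUMBER columns and the is_file_handle_linked helper are hypothetical names used only for illustration.

```python
import sqlite3


def is_file_handle_linked(conn: sqlite3.Connection, file_handle_id: int) -> bool:
    """Return True if at least one revision still references the file handle."""
    row = conn.execute(
        "SELECT COUNT(*) FROM JDOREVISION WHERE FILE_HANDLE_ID = ?",
        (file_handle_id,),
    ).fetchone()
    return row[0] > 0


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: only FILE_HANDLE_ID is taken from the document.
conn.execute(
    "CREATE TABLE JDOREVISION ("
    " OWNER_NODE_ID INTEGER NOT NULL,"
    " REVISION_NUMBER INTEGER NOT NULL,"
    " FILE_HANDLE_ID INTEGER,"
    " PRIMARY KEY (OWNER_NODE_ID, REVISION_NUMBER))"
)
# Node 1 has three revisions; the first two share file handle 100.
conn.executemany(
    "INSERT INTO JDOREVISION VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 2, 100), (1, 3, 200)],
)

# Deleting one revision leaves file handle 100 linked through revision 2.
conn.execute("DELETE FROM JDOREVISION WHERE OWNER_NODE_ID = 1 AND REVISION_NUMBER = 1")
still_linked = is_file_handle_linked(conn, 100)

# Only after the last referencing revision is gone is the handle un-linked.
conn.execute("DELETE FROM JDOREVISION WHERE OWNER_NODE_ID = 1 AND REVISION_NUMBER = 2")
now_unlinked = not is_file_handle_linked(conn, 100)
```

The same "count remaining references" check would not work for the associations stored in serialized blobs (e.g. TEAM or DATA_ACCESS_REQUEST), since there is no column to query.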

Note: An interesting observation (also from the S3 Bucket Analysis) is that most of the data is referenced by file entities (as expected). Most of our concerns could revolve around un-linking entities rather than other objects. For this it might actually be worth implementing option 2 above instead. We could simply register the links when we encounter them and only take care of un-registering table and entity links for cleanup.

The idea would be to periodically scan over all the associations and record the last time a file handle was “seen” (i.e. found in an association). In this way we can build an index that can be queried to fetch the last time a file handle association was seen, so that file handles not seen recently can be flagged as un-linked.
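The periodic scan could be sketched roughly as follows. This is an in-memory illustration only, not the actual design: the LastSeenIndex class, its method names and the weekly scan interval are assumptions, and a real index would presumably live in a database table.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional


class LastSeenIndex:
    """Hypothetical index mapping a file handle id to the last scan that saw it."""

    def __init__(self) -> None:
        self._last_seen: Dict[int, datetime] = {}

    def record_scan(self, file_handle_ids: Iterable[int], scan_time: datetime) -> None:
        """Record that the given file handles were found in some association."""
        for file_handle_id in file_handle_ids:
            previous = self._last_seen.get(file_handle_id)
            if previous is None or scan_time > previous:
                self._last_seen[file_handle_id] = scan_time

    def last_seen(self, file_handle_id: int) -> Optional[datetime]:
        return self._last_seen.get(file_handle_id)

    def find_unlinked(self, candidates: Iterable[int], cutoff: datetime) -> List[int]:
        """Return candidates never seen, or last seen strictly before the cutoff."""
        return sorted(
            fh for fh in candidates
            if fh not in self._last_seen or self._last_seen[fh] < cutoff
        )


index = LastSeenIndex()
first_scan = datetime(2021, 1, 1, tzinfo=timezone.utc)
second_scan = first_scan + timedelta(days=7)  # assumed weekly scan

# First scan: handles 1 and 2 are found across the association tables.
index.record_scan([1, 2], first_scan)
# Second scan: handle 2 is no longer referenced anywhere, handle 3 never was.
index.record_scan([1], second_scan)

unlinked = index.find_unlinked([1, 2, 3], cutoff=second_scan)
```

One design point to keep in mind with this approach: a file handle created after the latest scan has never been seen, so in practice the query would also need to exclude handles newer than the last completed scan before flagging them.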

...