...
- A new API to move an entity into a trash can instead of deleting it outright. Once an entity is in the trash can, it becomes invisible to normal CRUD operations. The current delete API will be kept as an alternative for permanently deleting an entity.
- Each user has a dedicated trash can. The user can view deleted entities in his/her trash can. (Note: The current trash folder approach is in fact a global trash can.)
- The user can restore deleted entities in the trash can.
- The user can purge the whole trash can or purge individual entities.
- A worker periodically scans the trash cans and purges entities that have been in a trash can for more than a month.
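The requirements above can be sketched as a small in-memory model, purely for illustration. `EntityStore`, `TrashItem`, and all method names are hypothetical and are not Synapse classes; real storage would be database tables. The sketch only shows the intended semantics: trashed entities disappear from normal reads, each user sees only his/her own trash can, and restore, single purge, full purge, and the existing permanent delete all coexist.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TrashItem:
    entity_id: str
    deleted_by: str
    deleted_on: datetime


@dataclass
class EntityStore:
    """Hypothetical in-memory stand-in for the entity tables (illustration only)."""
    live: dict = field(default_factory=dict)     # entity_id -> entity data, visible to CRUD
    trashed: dict = field(default_factory=dict)  # entity_id -> (entity data, TrashItem)

    def get(self, entity_id):
        # Normal CRUD reads only consult live entities, so trashed ones are invisible.
        return self.live[entity_id]

    def move_to_trash(self, entity_id, user_id):
        data = self.live.pop(entity_id)
        item = TrashItem(entity_id, user_id, datetime.now(timezone.utc))
        self.trashed[entity_id] = (data, item)

    def view_trash(self, user_id):
        # Each user sees only the entities he/she deleted (a per-user trash can).
        return [item for _, item in self.trashed.values() if item.deleted_by == user_id]

    def _own_item(self, entity_id, user_id):
        # Only the user who trashed an entity may restore or purge it.
        data, item = self.trashed[entity_id]
        if item.deleted_by != user_id:
            raise PermissionError(f"{entity_id} is not in {user_id}'s trash can")
        return data

    def restore(self, entity_id, user_id):
        data = self._own_item(entity_id, user_id)
        del self.trashed[entity_id]
        self.live[entity_id] = data

    def purge(self, entity_id, user_id):
        self._own_item(entity_id, user_id)
        del self.trashed[entity_id]

    def purge_all(self, user_id):
        # view_trash returns a new list, so deleting while iterating is safe.
        for item in self.view_trash(user_id):
            del self.trashed[item.entity_id]

    def delete_permanently(self, entity_id):
        # The existing delete API is kept and bypasses the trash can.
        del self.live[entity_id]
```

For example, after `store.move_to_trash("syn1", "alice")`, `store.get("syn1")` raises `KeyError`, while `store.view_trash("alice")` lists it and `store.view_trash("bob")` does not.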
...
- Need to define a reasonable boundary for what data goes into the trash can so that an entity can be restored as completely as possible. The boundary clearly extends beyond the entity/node itself: we also need to trash the revisions (which hold the annotations, references, and version numbers) and the ACLs together with the entity. What else? When deciding what data to trash, also keep in mind the impact on maintenance and future development.
- Need to handle hierarchies. In particular, each entity has a parent and a benefactor. Deleting a node has the cascading effect of also deleting its descendants and its beneficiaries. We must also trash these dependents so that they can be restored together with the node being deleted. This requirement can badly hurt performance if we end up deleting a large tree of nodes. One way to address this is to limit the number of entities that can be moved to the trash can. If, for example, the count exceeds 100, we throw an exception ("Too large to fit into the trash can.") and prompt the user to permanently delete the entities instead.
- Need to cope with changes. Once an entity is in the trash can, it is frozen from changes. Meanwhile, the data in Synapse keeps changing, so when we restore an entity from the trash can, its surroundings may have already changed. For example, its parent may no longer exist, or the access requirements may have changed. Schemas change too, not just data. Moreover, the schemas are not standardized, not versioned, not persisted, and not well decoupled from the code logic. Consequently, there is a risk that items put into the trash can today may fail to restore a month later because of incompatible schemas. At a minimum, we should be able to detect such conflicts and fail the restore with an exception. A better option may be to cut off the conflicting parts and let the user restore them manually.
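The size limit and the restore-time conflict check described above could look roughly like the sketch below. It assumes the parent and benefactor relationships are available as plain adjacency maps (`children_of`, `beneficiaries_of`); these names, the exception classes, and `check_restore` are hypothetical. The walk stops as soon as the count exceeds the limit, so a huge tree is rejected without being fully enumerated. Only one kind of conflict (a missing parent) is checked; schema incompatibilities would need their own detection.

```python
from collections import deque

MAX_TRASH_SIZE = 100  # the example limit from the text; the real value is a design decision


class TooLargeForTrashCan(Exception):
    pass


class RestoreConflict(Exception):
    pass


def collect_for_trash(root_id, children_of, beneficiaries_of, limit=MAX_TRASH_SIZE):
    """Breadth-first walk over the parent and benefactor relationships.

    Returns the set of entity ids (including root_id) that must be trashed
    together, and fails fast once the count exceeds `limit`.
    """
    found = {root_id}
    queue = deque([root_id])
    while queue:
        node_id = queue.popleft()
        # Dependents are both descendants and beneficiaries; duplicates are skipped.
        dependents = children_of.get(node_id, []) + beneficiaries_of.get(node_id, [])
        for dep in dependents:
            if dep in found:
                continue
            found.add(dep)
            if len(found) > limit:
                raise TooLargeForTrashCan(
                    f"Too large to fit into the trash can (more than {limit} entities); "
                    "please delete permanently instead.")
            queue.append(dep)
    return found


def check_restore(original_parent_id, existing_ids):
    """Detect one conflict named in the text: the original parent no longer exists."""
    if original_parent_id not in existing_ids:
        raise RestoreConflict(f"Original parent {original_parent_id} no longer exists")
```

Failing fast inside the walk keeps the cost of rejecting a large delete bounded by the limit rather than by the size of the tree.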
...
Move the entities being deleted to a trash folder. Every entity within the trash folder will have its benefactor set to the trash folder. If only administrators can access the trash folder, the trash can is effectively hidden from normal users. We then use an additional table to track individual trashed entities (i.e. who deleted what and when) so that a user can view only the items he/she deleted and restore them if needed. This approach fits naturally with our current design around folders and files, and it requires the least work compared with the other two approaches. The main disadvantage is the loss of the original ACLs on restore. Once an entity is moved to the trash folder, its descendants are set to inherit the ACL of the trash folder and the original ACLs are lost. When the tree rooted at the entity is restored, every node is reset to inherit the ACL of the new parent.
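The mechanics of the trash folder approach, including where the original ACLs are lost, can be illustrated with the simplified model below. Everything here is an assumption for illustration: `Node`, `TrashRecord`, `TrashFolderRepo`, the `"trash"` folder id, and representing an ACL as a set of principals keyed by benefactor id. Moving a subtree re-points every node's benefactor at the trash folder and drops their own ACLs; restoring re-points them at the new parent's benefactor, so the original ACLs are not recovered.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

TRASH_FOLDER_ID = "trash"  # hypothetical id of the admin-only trash folder


@dataclass
class Node:
    id: str
    parent_id: str
    benefactor_id: str  # node whose ACL governs access to this node


@dataclass
class TrashRecord:  # one row of the hypothetical tracking table
    node_id: str
    deleted_by: str
    deleted_on: datetime
    original_parent_id: str


class TrashFolderRepo:
    def __init__(self, nodes, acls):
        self.nodes = nodes       # node id -> Node
        self.acls = acls         # benefactor id -> ACL (here: a set of principals)
        self.trash_records = {}  # node id -> TrashRecord

    def _subtree(self, root_id):
        # Depth-first collection of root_id and all of its descendants.
        ids, stack = [], [root_id]
        while stack:
            nid = stack.pop()
            ids.append(nid)
            stack.extend(n.id for n in self.nodes.values() if n.parent_id == nid)
        return ids

    def move_to_trash(self, node_id, user_id):
        root = self.nodes[node_id]
        self.trash_records[node_id] = TrashRecord(
            node_id, user_id, datetime.now(timezone.utc), root.parent_id)
        root.parent_id = TRASH_FOLDER_ID
        for nid in self._subtree(node_id):
            self.acls.pop(nid, None)  # the original ACLs are lost here
            self.nodes[nid].benefactor_id = TRASH_FOLDER_ID

    def view_trash(self, user_id):
        return [r for r in self.trash_records.values() if r.deleted_by == user_id]

    def restore(self, node_id, user_id, new_parent_id=None):
        record = self.trash_records[node_id]
        if record.deleted_by != user_id:
            raise PermissionError(f"{node_id} was not deleted by {user_id}")
        parent_id = new_parent_id or record.original_parent_id
        parent = self.nodes.get(parent_id)
        if parent is None:
            raise LookupError(f"Parent {parent_id} no longer exists")
        self.nodes[node_id].parent_id = parent_id
        for nid in self._subtree(node_id):
            # Every restored node inherits from the new parent's benefactor.
            self.nodes[nid].benefactor_id = parent.benefactor_id
        del self.trash_records[node_id]
```

In this model, a node `a` that had its own ACL before being trashed comes back inheriting from its parent's benefactor, which is exactly the disadvantage noted above.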
The Trash Folder Approach in Detail
Use case diagram
Sequence diagram
Proposed REST APIs
For authenticated users:
...
- Move an entity to the trash can (PLFM-1688)
- View entities in the trash can (PLFM-1688)
- Restore an entity in the trash can (PLFM-1688)
- Purge an entity in the trash can (PLFM-1700)
- Purge the trash can (PLFM-1700)
For administrators:
- View entities in the trash can (PLFM-1698)
- Restore an entity in the trash can (PLFM-1698)
- Purge an entity in the trash can (PLFM-1698)
For daemon workers:
- Purge entities that have been in the trash can for more than 1 month (PLFM-1699)
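The daemon worker's purge step could be sketched as below. This assumes the tracking table can be read as a mapping from entity id to deletion time, approximates "1 month" as 30 days, and takes a hypothetical `purge` callback standing in for the permanent delete; none of these names come from the design. Passing `now` in explicitly keeps the cutoff computation deterministic and easy to test.

```python
from datetime import datetime, timedelta, timezone

PURGE_AFTER = timedelta(days=30)  # "1 month" approximated as 30 days (assumption)


def find_expired(trash_records, now, max_age=PURGE_AFTER):
    """Return ids of entities that were trashed more than `max_age` before `now`."""
    cutoff = now - max_age
    return [entity_id for entity_id, deleted_on in trash_records.items()
            if deleted_on < cutoff]


def purge_expired(trash_records, now, purge, max_age=PURGE_AFTER):
    """Permanently delete expired entities and drop them from the tracking records.

    trash_records: dict of entity id -> deletion time (hypothetical tracking table).
    purge: hypothetical callback that permanently deletes one trashed entity.
    """
    expired = find_expired(trash_records, now, max_age)
    for entity_id in expired:
        purge(entity_id)
        del trash_records[entity_id]
    return expired
```

A worker would call `purge_expired(records, datetime.now(timezone.utc), purge)` on each scheduled run.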