advise BMGF how best to delete old files

Description

The original request is below.

Is there a way to do the deletes such that the underlying S3 objects will also be deleted? (Might ask )

Is there a way to do the deletes so as to minimize the impact on Synapse migration? (Might ask and/or )

How might the process be carried out to generally maximize efficiency (running as fast as possible, minimizing impact on Synapse)?

Hi Brian,

We have completed moving all the projects to the Gates storage location and need to clean up the old version(s) of the files that are on the default Synapse bucket.

I updated my original script that does the move to handle the deletes.
The code can be found here (starting on line #115).

Could you have someone review this and make sure I'm not doing anything wrong and if there is a better way to handle this?

Thanks,
Patrick Stout
Senior Technical Architect
Preva Group

Environment

None

Activity

Jordan Kiang
November 10, 2020, 10:07 PM

Yes, okay, I guess that makes sense: if the newest version of each file entity is in the new storage location, then no folders would be deleted anyway.

Did we ever enact the rate limit that we talked about? It shouldn't really matter, since all calls will go through the Synapse client, which retries on 429s, but the above involves a lot of entity bundle fetches and version deletion calls. Also, let me know if anything in the proposed script above concerns you or would otherwise exacerbate any migration difficulties.

Xavier Schildwachter
November 10, 2020, 10:29 PM

We did (they still allow 1440 calls / minute overall). Just wondering: if we put in logic to stop during the weekend, is it still worth sorting by id? Sorting would also help when migrating during the week, but then we have 2 days of changes by other users that dilute the effect.
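
For illustration, a client-side throttle that stays under the 1440 calls / minute figure quoted above could look like the sketch below. This is not part of the script under review, and the Synapse client already retries on 429s. The class name `MinuteThrottle` and its parameters are hypothetical.

```python
import time


class MinuteThrottle:
    """Hypothetical helper: spaces calls evenly to stay under a per-minute cap."""

    def __init__(self, max_per_minute=1440, clock=time.monotonic, sleep=time.sleep):
        # 1440 calls / minute works out to one call every 60/1440 seconds.
        self.interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is None:
            start = now
        else:
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
            start = max(now, self._next_allowed)
        self._next_allowed = start + self.interval
```

Calling `wait()` before each entity bundle fetch or version deletion would space the requests out; whether that is worthwhile on top of the client's own 429 retries is a judgment call.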

Jordan Kiang
November 10, 2020, 10:35 PM

If only one of the two (ordering vs sleeping on weekends) is necessary I’m happy to just do one. Only ordering might be nice if that works (a bit awkward to have someone run a script that might immediately sleep for days at a time).

Jordan Kiang
November 19, 2020, 9:24 PM

I’m going to send Patrick from Gates this fork of the original script which has the changes discussed above:

  1. It records all the file entities to be moved into a sqlite db and then orders the operations by Synapse id, rather than issuing the moves/deletes in the order returned by a recursive Synapse walk of the input project, which would otherwise have normally distributed Synapse ids.

  2. It records the file handles of all deleted entities so that these can be removed. The script has a new option to remove the file handles in a second pass after the deletion. Per our discussion in the stack review last week, however, I will not be asking Patrick to run that step. Instead I will ask him to upload the resulting sqlite db, which will contain all the file handle ids; we can then decide whether to remove the file handles specially or leave them to the later consolidation of unlinked file handles in general.

Per Xa’s last comment, it does not sleep during the migration window.
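
As a rough sketch of the approach in the two points above (not the actual fork), the bookkeeping could look something like the following. The table layout, column names, and function names are all hypothetical, and it assumes one recorded file handle per entity for simplicity.

```python
import sqlite3


def synapse_id_number(syn_id):
    """Return the numeric part of an id such as 'syn24173451'."""
    text = str(syn_id)
    return int(text[3:]) if text.lower().startswith("syn") else int(text)


def record_entities(conn, entities):
    """Store (syn_id, file_handle_id) pairs discovered while walking the project."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_entity ("
        " id_num INTEGER PRIMARY KEY,"
        " syn_id TEXT NOT NULL,"
        " file_handle_id TEXT,"
        " deleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO file_entity (id_num, syn_id, file_handle_id)"
        " VALUES (?, ?, ?)",
        [(synapse_id_number(s), s, fh) for s, fh in entities],
    )
    conn.commit()


def pending_in_id_order(conn):
    """Entities not yet processed, ordered by numeric Synapse id (not walk order)."""
    rows = conn.execute(
        "SELECT syn_id FROM file_entity WHERE deleted = 0 ORDER BY id_num"
    )
    return [row[0] for row in rows]


def mark_deleted(conn, syn_id):
    """Flag an entity as deleted; its file handle id stays in the db for later use."""
    conn.execute(
        "UPDATE file_entity SET deleted = 1 WHERE id_num = ?",
        (synapse_id_number(syn_id),),
    )
    conn.commit()


def deleted_file_handles(conn):
    """File handle ids of deleted entities, e.g. for a later cleanup or validation."""
    rows = conn.execute(
        "SELECT file_handle_id FROM file_entity WHERE deleted = 1 ORDER BY id_num"
    )
    return [row[0] for row in rows]
```

Sorting on the numeric part rather than the raw string keeps `syn300` ahead of `syn2000`. Keeping the file handle ids in the same db is what allows the resulting file to be uploaded afterwards and the handle cleanup decided separately.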

Do you have a preference for when I should ask Patrick to run it? Would it make sense for him to start after the next stack deployment?

Jordan Kiang
January 14, 2021, 8:21 PM

Patrick has run the deletion script and saved the listing of deleted files with their file handles to project syn24173451.

As per above the file handles themselves have not been deleted, as they’ll be deleted naturally with the orphan file cleanup that is doing, and we could use the file handle info as validation that the orphaned files were detected.

Fixed

Assignee

Jordan Kiang

Reporter

Bruce Hoff

Labels

None

Validator

Bruce Hoff

Development Area

None

Release Version History

None

Priority

Major