Synapse Cost Allocations: API Design Document


Jira ticket: PLFM-5227

Background

See Use Case 1 in "Service to migrate content at the S3/Storage Level: Use Cases" for use case notes.

In Synapse, the top 10 projects by file size account for about three quarters of the total size of our S3 bucket. These projects can be very expensive, so we need a way to determine the cost of individual projects. File metadata gives us the approximate size of a project very easily. Egress is harder to determine, but per the analysis in PLFM-5009, storage accounts for approximately 80% of our bill. We do not currently need to be very precise, so we can ignore egress for now and assume that the cost of egress is distributed in proportion to the cost of storage.
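The proportional assumption above amounts to simple arithmetic: each project's share of the bill equals its share of stored bytes. The sketch below illustrates this. The function name, project IDs, sizes, and bill amount are all hypothetical.

```python
from typing import Dict


def estimate_costs(total_bill: float, size_bytes: Dict[str, int]) -> Dict[str, float]:
    """Split a bill across projects in proportion to their stored bytes.

    Egress is not measured; it is assumed to follow the same proportions.
    """
    total_size = sum(size_bytes.values())
    if total_size == 0:
        # Nothing is stored, so nothing can be attributed.
        return {project: 0.0 for project in size_bytes}
    return {
        project: total_bill * size / total_size
        for project, size in size_bytes.items()
    }


# Hypothetical figures, for illustration only.
sizes = {"syn1": 750, "syn2": 200, "syn3": 50}
costs = estimate_costs(10_000.0, sizes)
```

Here `syn1` holds 75% of the stored bytes, so it is attributed 75% of the bill.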

API

We can create a new Cost Allocation service in Synapse that can be used by members of a "Synapse Cost Allocation Team", a bootstrapped team with users that have the authority to manage these cost allocations.

The members of the Cost Allocation Team can create new cost allocations with a descriptive name (e.g. ampad, nih-r01-12345) that matches how costs should be broken down in Synapse. When a new cost allocation is created, project sizes can be aggregated by cost allocation to determine that cost allocation's total impact.

After a cost allocation is created, a project can be assigned to it. Cost allocations can have many projects, but a project can be assigned to at most one cost allocation. When a project is assigned to a cost allocation, the underlying files associated with that project (that is, the file handles pointed to by all versions of all file entities in a project) will be included in the total cost of that cost allocation.
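The relationship described above (many projects per cost allocation, at most one cost allocation per project) can be modelled in memory as a mapping keyed by project ID. This is a minimal sketch, not the service implementation. The class and function names are hypothetical. Counting a file handle once even when several versions point to it is an assumption, not something the design states.

```python
from typing import Dict, Optional, Set


class CostAllocationRegistry:
    """Many projects per allocation, at most one allocation per project."""

    def __init__(self) -> None:
        # project ID -> cost allocation name
        self._allocation_of: Dict[str, str] = {}

    def assign(self, project_id: str, allocation: str) -> None:
        # Keying by project ID means a later assignment replaces the earlier one.
        self._allocation_of[project_id] = allocation

    def allocation_of(self, project_id: str) -> Optional[str]:
        return self._allocation_of.get(project_id)

    def projects_in(self, allocation: str) -> Set[str]:
        return {p for p, a in self._allocation_of.items() if a == allocation}


def allocation_size(
    registry: CostAllocationRegistry,
    allocation: str,
    file_handles_by_project: Dict[str, Set[str]],
    size_by_file_handle: Dict[str, int],
) -> int:
    """Sum the sizes of all file handles in the allocation's projects.

    Assumption: each file handle is counted once, even if several
    versions or entities point to it.
    """
    handles: Set[str] = set()
    for project in registry.projects_in(allocation):
        handles |= file_handles_by_project.get(project, set())
    return sum(size_by_file_handle[h] for h in handles)


registry = CostAllocationRegistry()
registry.assign("syn1", "ampad")
registry.assign("syn2", "ampad")
registry.assign("syn2", "nih-r01-12345")  # reassignment replaces "ampad"

handles_by_project = {"syn1": {"fh1", "fh2"}, "syn2": {"fh3"}}
handle_sizes = {"fh1": 100, "fh2": 50, "fh3": 7}
ampad_size = allocation_size(registry, "ampad", handles_by_project, handle_sizes)
```

After the reassignment, only `syn1` remains in `ampad`, so `ampad_size` covers `fh1` and `fh2` only.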

Each endpoint is listed with its verb and URI, request body, response body, and notes.

GET /costAllocation
  • Request body: None
  • Response body: CostAllocationPage
    • body: Array<CostAllocation>
    • nextPageToken: String
  • Notes: Lists all existing CostAllocations.

POST /costAllocation/report/csv/async/start
  • Request body: CostAllocationReportRequest
    • numberOfResults: Long
    • allocated: Boolean
  • Response body: AsyncJobId
  • Notes:
    • Initiates a job to create a CSV report of the sizes of cost allocations or unallocated projects in Synapse.
    • The results will contain the top <numberOfResults> cost allocations or unallocated projects by descending size.
    • If allocated is true, the report will include the largest cost allocations. If allocated is false, the report will include the largest projects that are not currently assigned to a cost allocation.
    • This request can only be made by a member of the Synapse Cost Allocation Team.

GET /costAllocation/report/csv/async/get/{token}
  • Request body: None
  • Response body: CostAllocationReportResult
    • resultsFileHandleId: String
    • timestamp: Date
  • Notes: Gets an object containing a file handle that points to a Cost Allocation Report CSV.

POST /entity/{entityId}/costAllocation
  • Request body:
    • name: String
  • Response body: CostAllocation
    • id: String
    • name: String
    • bucket: String
    • projects: Array<String>
    • eTag: String
    • createdBy: Long
    • createdOn: Date
  • Notes:
    • Associates a project with a cost allocation. If the cost allocation doesn't exist, a new one is created. If the project is currently associated with a different cost allocation, that association is replaced by the new one.
    • Name is case-insensitive (it will be coerced to lowercase) and can include alphanumeric characters, "-", ".", and "_".

GET /entity/{entityId}/costAllocation
  • Request body: None
  • Response body: CostAllocation
  • Notes: Gets the cost allocation for a specific project.

DELETE /entity/{entityId}/costAllocation
  • Request body: None
  • Response body: None
  • Notes:
    • Removes the cost allocation tied to a project. The contents of the project that are in the cost allocation storage location will be moved to the default Synapse storage.
    • After all of the contents have been moved, the project is removed from the cost allocation.
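The naming rule for POST /entity/{entityId}/costAllocation (case-insensitive, coerced to lowercase, limited to alphanumeric characters, "-", ".", and "_") could be checked along the following lines. This is a sketch only. The function name is hypothetical. Treating "alphanumeric" as ASCII letters and digits, and rejecting an empty name, are both assumptions.

```python
import re

# ASCII letters, digits, "-", "." and "_" (ASCII-only is an assumption).
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9._-]+")


def normalize_allocation_name(name: str) -> str:
    """Validate a cost allocation name and coerce it to lowercase."""
    if not _ALLOWED_NAME.fullmatch(name):
        # Also rejects the empty string, which is an assumption.
        raise ValueError(f"invalid cost allocation name: {name!r}")
    return name.lower()


normalized = normalize_allocation_name("NIH-R01-12345")
```

Because names are lowercased before use, "AmpAD" and "ampad" would refer to the same cost allocation.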

Sample Reports

CostAllocation report (default, allocated=TRUE)

Cost Allocation ID | Name | Size (B) | Proportion of Synapse Storage
18 | cost_alloc_1 | 424483013985391 | 0.6534
1 | amp-ad | 53579813875383 | 0.0824
4 | grant123 | 27148247242841 | 0.0414
... | ... | ... | ...
0 | unallocated | 89573285798719 | 0.2139

Unallocated projects report (allocated=FALSE)

Project ID | Name | Size (B) | Proportion of Synapse Storage
syn123456 | Cool Project 123 | 14483013985391 | 0.1244
syn999999 | Research Group Data | 7579813875383 | 0.0644
syn583725 | Smith Lab Repository | 3148247242841 | 0.0311
... | ... | ... | ...
0 | allocated | 424483013985391 | 0.8538
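The two sample reports share one shape: the top entries by descending size, followed by a single summary row for the other category. A report like this might be assembled as sketched below. The function names and input values are hypothetical. Computing each proportion against the combined size of both categories is an assumption about what "Proportion of Synapse Storage" means.

```python
import csv
import io
from typing import List, Tuple

Entry = Tuple[str, str, int]        # (id, name, size in bytes)
Row = Tuple[str, str, int, float]   # entry plus proportion of total storage


def build_report_rows(
    entries: List[Entry], other_name: str, other_size: int, number_of_results: int
) -> List[Row]:
    """Top entries by descending size, then one summary row for the rest."""
    # Assumption: total storage is this category plus the other category.
    total = sum(size for _, _, size in entries) + other_size
    if total == 0:
        return []
    top = sorted(entries, key=lambda e: e[2], reverse=True)[:number_of_results]
    rows = [(i, n, s, round(s / total, 4)) for i, n, s in top]
    # The samples use ID 0 for the summary row.
    rows.append(("0", other_name, other_size, round(other_size / total, 4)))
    return rows


def to_csv(header: List[str], rows: List[Row]) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sizes, chosen so the proportions are easy to check by hand.
allocations = [("18", "cost_alloc_1", 600), ("4", "grant123", 150), ("7", "small", 50)]
rows = build_report_rows(allocations, "unallocated", 200, number_of_results=2)
report = to_csv(
    ["Cost Allocation ID", "Name", "Size (B)", "Proportion of Synapse Storage"], rows
)
```

With `allocated=false`, the same functions would take unallocated projects as `entries` and the combined size of all cost allocations as the summary row.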

Implementation Details

This section is unrelated to the API. Feel free to ignore it if it is not within your scope of concern.

Detailed Requirements 

  • Creation of a new bootstrapped "Cost Allocation Team" to have access to these APIs.
  • Retrieval of file handle metadata, particularly project association and file size 
    • File Replication Table
      • Columns from File Table: ID, ETAG, PREVIEW_ID, CREATED_ON, CREATED_BY, METADATA_TYPE, CONTENT_TYPE, CONTENT_SIZE, CONTENT_MD5, BUCKET_NAME, NAME, KEY, STORAGE_LOCATION_ID, ENDPOINT
      • Primary key ID
    • Creation of this table allows retrieval of file metadata by joining it with the entity replication table. This allows us to find all of the file handles and metadata for a particular project in one database call. Without this table, we must query the tables database to find the entities in a project, and then separately query the repo database to retrieve the metadata of those files.
    • Entity Replication Table
      • New column COST_ALLOCATION_ID
    • PLFM-4148 is another issue that may benefit from this
  • Enumeration of cost allocations
    • Cost allocation table: ID, NAME, CREATED_BY, CREATED_ON
  • Associate cost allocations and projects
    • Cost Allocation association table
      • Columns: COST_ALLOCATION_ID, PROJECT_ID
      • Primary key: PROJECT_ID (a project may have no more than one cost allocation)
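The tables listed above can be sketched with an in-memory SQLite database. SQLite is only a stand-in here, not necessarily the production database. The table names and the reduced replication tables (only the columns needed for the join) are hypothetical, and `INSERT OR REPLACE` is SQLite-specific syntax. The query does not de-duplicate file handles that are shared between entities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE COST_ALLOCATION (
        ID INTEGER PRIMARY KEY,
        NAME TEXT NOT NULL,
        CREATED_BY INTEGER NOT NULL,
        CREATED_ON TEXT NOT NULL
    );
    -- PROJECT_ID as primary key: a project has at most one cost allocation.
    CREATE TABLE COST_ALLOCATION_PROJECT (
        COST_ALLOCATION_ID INTEGER NOT NULL REFERENCES COST_ALLOCATION (ID),
        PROJECT_ID INTEGER PRIMARY KEY
    );
    -- Reduced stand-ins for the entity and file replication tables.
    CREATE TABLE ENTITY_REPLICATION (
        ID INTEGER PRIMARY KEY,
        PROJECT_ID INTEGER NOT NULL,
        FILE_ID INTEGER
    );
    CREATE TABLE FILE_REPLICATION (
        ID INTEGER PRIMARY KEY,
        CONTENT_SIZE INTEGER NOT NULL
    );
    """
)


def assign_project(project_id: int, allocation_id: int) -> None:
    # Replaces any existing row for the project (SQLite-specific syntax).
    conn.execute(
        "INSERT OR REPLACE INTO COST_ALLOCATION_PROJECT "
        "(COST_ALLOCATION_ID, PROJECT_ID) VALUES (?, ?)",
        (allocation_id, project_id),
    )


# Hypothetical data.
conn.executemany(
    "INSERT INTO COST_ALLOCATION VALUES (?, ?, ?, ?)",
    [(1, "ampad", 1, "2019-01-01"), (2, "grant123", 1, "2019-01-02")],
)
conn.executemany(
    "INSERT INTO ENTITY_REPLICATION VALUES (?, ?, ?)",
    [(1, 100, 10), (2, 100, 11), (3, 200, 12)],
)
conn.executemany(
    "INSERT INTO FILE_REPLICATION VALUES (?, ?)", [(10, 500), (11, 300), (12, 50)]
)
assign_project(100, 1)
assign_project(200, 1)
assign_project(200, 2)  # project 200 moves from ampad to grant123

# One call returns the total size of every cost allocation.
totals = conn.execute(
    """
    SELECT a.NAME, SUM(f.CONTENT_SIZE) AS TOTAL_SIZE
    FROM COST_ALLOCATION a
    JOIN COST_ALLOCATION_PROJECT p ON p.COST_ALLOCATION_ID = a.ID
    JOIN ENTITY_REPLICATION e ON e.PROJECT_ID = p.PROJECT_ID
    JOIN FILE_REPLICATION f ON f.ID = e.FILE_ID
    GROUP BY a.ID, a.NAME
    ORDER BY TOTAL_SIZE DESC
    """
).fetchall()
```

The primary key on PROJECT_ID is what lets a reassignment replace the earlier row instead of adding a second one.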

Concerns

  • This method will not accurately capture egress. It simply calculates proportions of cost based on storage.