Page Comparison

...

Related Jira issues: PLFM-5009, PLFM-5082, PLFM-5085, PLFM-5108, PLFM-5201

Background

See Use Case 1 in Service to migrate content at the S3/Storage Level: Use Cases for use case notes.

In Synapse, the top 10 projects by file size account for about three quarters of the data stored in our S3 bucket. These projects can be very expensive, so we need a way to determine the cost of each project. Using file metadata, we can determine approximate size very easily. Egress is more difficult to determine, but per the analysis in PLFM-5009, storage accounts for approximately 80% of our bill. There is currently no need to be highly precise, so we may simply ignore egress for now. At the moment, the cost of egress can be assumed to be distributed proportionally to the cost of storage.
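The proportional assumption above can be written out as a short calculation. This is only a sketch: the function name and parameters are hypothetical, and it assumes that every cost on the bill (egress included) is split in proportion to stored bytes, as described in the paragraph above.

```python
def estimate_project_cost(project_bytes: int, total_bytes: int, total_bill: float) -> float:
    """Estimate a project's share of the bill from its share of stored bytes.

    Assumption (from the text above): egress and other costs are distributed
    proportionally to storage, so the share of the whole bill equals the
    share of stored bytes.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 <= project_bytes <= total_bytes:
        raise ValueError("project_bytes must be between 0 and total_bytes")
    return total_bill * project_bytes / total_bytes


if __name__ == "__main__":
    # A project holding a quarter of the stored bytes is charged a quarter of the bill.
    print(estimate_project_cost(250, 1000, 10000.0))
```

If egress is later measured directly, only this function would need to change; the size data collected for the report stays the same.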

API

We can create a new asynchronous Cost Allocation Storage Report service in Synapse that can be used by members of a "Synapse Cost Allocation Reports Team", a bootstrapped team of users who have the authority to manage cost allocations and create these reports.

Cost allocations in Synapse are groupings of projects whose costs should be pooled. A project can have at most one cost allocation.

Cost allocations are uniquely identified by their name and can be created by a member of the Cost Allocation Team by assigning one to a project.

When a project is assigned to a cost allocation, the underlying files associated with that project (that is, the file handles used by all versions of all file entities in the project) are included in the total cost of that cost allocation.
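Because a cost allocation is identified by its name, the naming rule given in the API notes (case-insensitive, coerced to lowercase, limited to alphanumeric characters, "-", ".", and "_") decides whether two requests refer to the same allocation. A minimal sketch of that rule follows. The function name is hypothetical, and reading "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
_ALLOWED_NAME = re.compile(r"[a-z0-9._-]+")


def normalize_cost_allocation_name(name: str) -> str:
    """Lowercase the name and reject characters outside the allowed set."""
    normalized = name.lower()
    if not _ALLOWED_NAME.fullmatch(normalized):
        raise ValueError(f"invalid cost allocation name: {name!r}")
    return normalized


if __name__ == "__main__":
    # "NIH-Grant_5.3" and "nih-grant_5.3" resolve to the same allocation.
    print(normalize_cost_allocation_name("NIH-Grant_5.3"))
```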

A member of the Cost Allocation Reports team can make an asynchronous query to retrieve a CSV report about usage of the Synapse S3 bucket with project-level resolution across all projects. The report can cover, at the caller's choice, all cost allocations, all allocated projects, or all unallocated projects. Requests can only be made by members of the Synapse Cost Allocation Team.

Verb	URI	Request body	Response body	Notes

POST	/entity/{entityId}/costAllocation	name: String	CostAllocation (id: String, name: String, bucket: String, projects: Array<String>, eTag: String, createdBy: Long, createdOn: Date)	Associates a project with a cost allocation. If the cost allocation doesn't exist, a new one is created. If the project is currently associated with a different cost allocation, it is reassigned to the new one. The name is case-insensitive (it will be coerced to lowercase) and can include alphanumeric characters, "-", ".", and "_".

GET	/entity/{entityId}/costAllocation	None	CostAllocation	Gets the cost allocation for a specific project. This request can only be made by a member of the Synapse Cost Allocation Team.

DELETE	/entity/{entityId}/costAllocation	None	None	Removes the cost allocation tied to a project. The contents of the project that are in the cost allocation storage location will be moved to the default Synapse storage. After all of the contents have been moved, the project is removed from the cost allocation.

GET	/storageReport/csv/async/get/{token} (Version 13: /costAllocation/report/csv/async/get/{token})	None	DownloadStorageReportResponse (Version 13: CostAllocationReportResult): resultsFileHandleId: String, timestamp: Date	Gets an object containing a file handle that points to a Storage Report CSV (Version 13: Cost Allocation Report CSV). The caller can download the CSV with the file handle ID.

POST	/storageReport/csv/async/start (Version 13: /costAllocation/report/csv/async/start)	DownloadStorageReportRequest (Version 13: CostAllocationReportRequest): type: Enum (ALL_PROJECTS; Version 13: COST_ALLOCATIONS, ALLOCATED_PROJECTS, UNALLOCATED_PROJECTS)	AsyncJobId	Initiates a job to create a CSV report of the sizes of projects in Synapse (Version 13: of cost allocations or unallocated projects), where size is usage of the Synapse S3 bucket. In Version 13, the type describes which report will be generated: COST_ALLOCATIONS generates a report of all cost allocations, ALLOCATED_PROJECTS generates a report of the largest projects that are assigned to cost allocations, and UNALLOCATED_PROJECTS generates a report of the largest projects that are not assigned to cost allocations. See examples below.
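The start and get endpoints above form a start-then-poll workflow. The sketch below shows one way a client might drive it. The two URIs and the `resultsFileHandleId` field come from the table; everything else is an assumption made for illustration: the injectable `transport` callable, the `token` field on AsyncJobId, and the use of HTTP 202 to signal a job that is still running.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical transport: (method, path, json_body) -> (status_code, json_response).
Transport = Callable[[str, str, Optional[dict]], Tuple[int, dict]]


def fetch_storage_report(
    transport: Transport,
    report_type: str = "ALL_PROJECTS",
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> str:
    """Start a storage report job and poll until a file handle ID is available."""
    status, body = transport("POST", "/storageReport/csv/async/start", {"type": report_type})
    if status not in (200, 201):
        raise RuntimeError(f"could not start report job (status {status})")
    token = body["token"]  # assumption: AsyncJobId exposes a "token" field

    for _ in range(max_polls):
        status, body = transport("GET", f"/storageReport/csv/async/get/{token}", None)
        if status == 200:
            return body["resultsFileHandleId"]
        if status != 202:  # assumption: 202 means the job is still running
            raise RuntimeError(f"report job failed (status {status})")
        time.sleep(poll_seconds)
    raise TimeoutError("report job did not finish in time")
```

Passing the transport in as a callable keeps the sketch independent of any particular HTTP library and lets the polling logic be exercised without a live server.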

Sample Reports

Type: COST_ALLOCATION

...

The request will create a report about all projects when specifying ALL_PROJECTS. The enum allows requests for different types of reports (for example, project groups, if that gets implemented).

Sample Report

Type: ALL_PROJECTS

Project ID	Name	Size (B)	Proportion of Synapse Storage
syn123456	Cool Project 123	14483013985391	0.1829
syn999999	Research Group Data	8579813875383	0.1041
syn583725	Smith Lab Repository 2	5148247242841	0.0612
...	...	...	...
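A report with the columns shown above could be produced from per-project sizes roughly as follows. This is a sketch, not the service implementation: the function name and input shape are hypothetical, and when `total_bytes` is omitted the denominator defaults to the sum of the listed projects, which is only a stand-in for the total Synapse storage.

```python
import csv
import io
from typing import Iterable, Optional, Tuple


def build_storage_report_csv(
    projects: Iterable[Tuple[str, str, int]],
    total_bytes: Optional[int] = None,
) -> str:
    """Render (project_id, name, size_bytes) rows as CSV, largest project first."""
    rows = sorted(projects, key=lambda p: p[2], reverse=True)
    if total_bytes is None:
        # Assumption: fall back to the sum of the listed projects.
        total_bytes = sum(size for _, _, size in rows)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Project ID", "Name", "Size (B)", "Proportion of Synapse Storage"])
    for project_id, name, size in rows:
        proportion = size / total_bytes if total_bytes else 0.0
        writer.writerow([project_id, name, size, f"{proportion:.4f}"])
    return out.getvalue()


if __name__ == "__main__":
    print(build_storage_report_csv([("syn2", "B", 250), ("syn1", "A", 750)]))
```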

Project ID	Name	Size (B)	Proportion of Synapse Storage
syn154314	Project Onc-RNA	124483013985391	0.1244
syn428582	Super Cool Data	74579813875383	0.0644
syn523913	Smith Lab Repository 1	31482472428417	0.0311
...	...	...	...
0	unallocated	424483013985391	0.8538

Type: UNALLOCATED_PROJECTS

syn5382532	Cool Project 1	424483013985391
syn635535	NIH-Grant 53532 Public Data Repository	53579813875383
syn9359135	Dr. Smith's Private FASTQ files	31482472428417
...	...	...
0	allocated	424483013985391	0.7538

Implementation Details

Note
This section is unrelated to the API. Feel free to ignore it if it is not within your scope of concern.

Detailed Requirements

Creation of a new bootstrapped "Cost Allocation Reports Team" to have access to these APIs.
Retrieval of file handle metadata, particularly project association and file size
- File Replication Table
  - Columns from File Table: ID, ETAG, PREVIEW_ID, CREATED_ON, CREATED_BY, METADATA_TYPE, CONTENT_TYPE, CONTENT_SIZE, CONTENT_MD5, BUCKET_NAME, NAME, KEY, STORAGE_LOCATION_ID, ENDPOINT
  - Primary key ID
- Creation of this table allows retrieval of file metadata by joining it with the entity replication table. This allows us to find all of the file handles and metadata for a particular project in one database call. Without this table, we must query the tables database to find the entities in a project, and then separately query the repo database to retrieve the metadata of those files.
- Entity Replication Table
  - New column COST_ALLOCATION_ID
- PLFM-4148 is another issue that may benefit from this.
Enumeration of cost allocations
- Cost allocation table: ID, NAME, CREATED_BY, CREATED_ON
Associate cost allocations and projects
- Cost Allocation association table
- Columns: COST_ALLOCATION_ID, PROJECT_ID
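The single-call lookup that the file replication table enables can be illustrated with an in-memory SQLite database. This is a sketch under assumptions: the table names `FILE_REPLICATION` and `ENTITY_REPLICATION` and the `PROJECT_ID` and `FILE_ID` columns on the entity side are hypothetical, while `ID` and `CONTENT_SIZE` come from the file table columns listed above. A file handle shared by several versions of an entity is counted once per project.

```python
import sqlite3


def create_demo_schema(conn: sqlite3.Connection) -> None:
    """Create minimal stand-ins for the two replication tables (hypothetical names)."""
    conn.execute("CREATE TABLE FILE_REPLICATION (ID INTEGER PRIMARY KEY, CONTENT_SIZE INTEGER)")
    conn.execute("CREATE TABLE ENTITY_REPLICATION (PROJECT_ID INTEGER, FILE_ID INTEGER)")


def project_sizes(conn: sqlite3.Connection) -> list:
    """Return (project_id, total_bytes) pairs, largest first, in one query."""
    query = """
        SELECT e.PROJECT_ID, SUM(f.CONTENT_SIZE) AS SIZE_BYTES
        FROM (SELECT DISTINCT PROJECT_ID, FILE_ID FROM ENTITY_REPLICATION) AS e
        JOIN FILE_REPLICATION AS f ON f.ID = e.FILE_ID
        GROUP BY e.PROJECT_ID
        ORDER BY SIZE_BYTES DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_demo_schema(conn)
    conn.executemany("INSERT INTO FILE_REPLICATION VALUES (?, ?)", [(1, 100), (2, 50), (3, 25)])
    # File 1 is referenced by two versions of the same entity in project 10.
    conn.executemany(
        "INSERT INTO ENTITY_REPLICATION VALUES (?, ?)",
        [(10, 1), (10, 1), (10, 2), (20, 3)],
    )
    print(project_sizes(conn))
```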

Concerns

This method will not accurately capture egress. It simply calculates proportions of cost based on storage.

...

Versions Compared

Old Version 13

New Version Current
