Entity Type Migration

Synapse currently supports eighteen entity types, but most of them are old and have been unofficially deprecated for a few years.  We have not provided a means to convert these old entities to the new types, so many of these old objects remain in Synapse (see PLFM-1879).

Since each client can encounter one of these old objects at any time, client developers have been forced to support both the 'old' way and the 'new' way of working with entities for years now.  This adds complexity and increases the support cost for each client.  This document outlines a plan for migrating all old types into the types we currently support in Synapse.

 

Entity types as of 2/2/15:

| Alias(es) | Full Name | Container | Locationable | Status | Count | Last Modified | Valid Parents | Deprecated Entity Fields |
|---|---|---|---|---|---|---|---|---|
| dataset, study | org.sagebionetworks.repo.model.Study | true | true | Deprecated | 11479 | 2015-01-13 04:05:16 | folder, project | numSamples, species, disease, tissueType, platform |
| layer, data | org.sagebionetworks.repo.model.Data | false | true | Deprecated | 835532 | 2015-01-28 23:10:06 | folder, project, study | numSamples, species, disease, tissueType, platform, type |
| project | org.sagebionetworks.repo.model.Project | true | false | Supported | 8964 | 2015-01-28 23:25:29 | root only | |
| preview | org.sagebionetworks.repo.model.Preview | false | false | Deprecated | 93 | 2012-01-19 05:52:23 | folder, data | |
| folder | org.sagebionetworks.repo.model.Folder | true | false | Supported | 42101 | 2015-01-28 23:10:11 | project, folder, study, analysis | |
| analysis | org.sagebionetworks.repo.model.Analysis | true | false | Deprecated | 29 | 2013-05-09 12:44:15 | project, folder | properties |
| step | org.sagebionetworks.repo.model.Step | false | false | Deprecated | 34 | 2012-09-16 15:31:39 | folder, analysis | code, input, output, environmentDescriptors, startDate, endDate, commandLine |
| code | org.sagebionetworks.repo.model.Code | false | true | Deprecated | 769 | 2015-01-28 23:09:42 | project, folder | startDate, endDate |
| link | org.sagebionetworks.repo.model.Link | false | false | Supported | 253 | 2015-01-14 15:44:07 | project, folder, study, data, step, analysis | |
| phenotypedata | org.sagebionetworks.repo.model.PhenotypeData | false | true | Deprecated | 569 | 2013-08-09 21:12:03 | project, folder, study | numSamples, species, disease |
| genotypedata | org.sagebionetworks.repo.model.GenotypeData | false | true | Deprecated | 1642 | 2014-05-08 06:21:11 | project, folder, study | numSamples, species, disease, platform |
| expressiondata | org.sagebionetworks.repo.model.ExpressionData | false | true | Deprecated | 18724 | 2013-10-10 16:32:21 | project, folder, study | numSamples, species, disease, tissueType, platform |
| robject | org.sagebionetworks.repo.model.RObject | false | true | Deprecated | 52 | 2013-03-02 22:28:36 | project, folder, study | properties |
| summary | org.sagebionetworks.repo.model.Summary | false | false | Supported | 102 | 2014-11-06 21:56:38 | project, folder, study | |
| genomicdata | org.sagebionetworks.repo.model.GenomicData | false | true | Deprecated | 2 | 2013-01-25 22:50:59 | project, folder, study | numSamples, species, technology, dataType, molecularFeatureType, status, tissue, platform |
| page | org.sagebionetworks.repo.model.Page | true | false | Deprecated | 0 | | folder, page | |
| file | org.sagebionetworks.repo.model.FileEntity | false | false | Supported | 198820 | 2015-01-28 23:10:26 | project, folder, study | |
| table | org.sagebionetworks.repo.model.table.TableEntity | false | false | Supported | 2260 | 2015-01-28 23:17:52 | project, folder, study | |
| community | org.sagebionetworks.bridge.model.Community | true | false | Deprecated | 1 | 2014-02-06 20:47:44 | root only | teamId, welcomePageWikiId, indexPageWikiId |

Table 1.

Locationable

Locationable entities can have one or more LocationData objects in their list of locations.  LocationData is the precursor to our currently supported FileHandle.  Each LocationData object points to either an object in S3 (like an S3FileHandle) or an external URL (like an ExternalFileHandle).  Most of the client support burden comes from supporting Locationable entities.  Table 1 shows which entity types are Locationable.  Even though a Locationable can have more than one LocationData, it is assumed that each LocationData represents the same file, just stored in an alternate location, and that all "copies" have the same md5 (see Figure 1).

Figure 1.

Containers

In the past we supported many types of entity containers.  Each container can have one or more child entities.  Table 1 shows which entity types are allowed to be containers of other entities.  Our users were either confused by all of these container types or wanted a new type for their special cases.  We decided that the only containers we want to support are Folders and Projects.

 

Locationable & Containers

Table 1 shows that datasets/studies are both containers and locationable.  None of the currently supported objects can have both files and child entities.  Projects and Folders can have children but cannot have files.  Files can have files but cannot have any children.

Links In Everything

Links are a special case: they can be contained in many types (see Table 1, 'Valid Parents').  Since a Link can have a study, data, step, or analysis as a parent, any of these objects that has a Link child cannot simply be changed to a File (a File cannot have children).  This will force us to make difficult decisions about how to convert old types that have Links as children.

Convert Entity Services

In order to fully deprecate the unsupported types, we need to convert each object of an old type into an object of a supported type (without changing the entity ID).  The plan is to add a new service that will convert an old entity to a new type.  This service will make all changes to an old Entity in a single transaction.  This means an Entity type change will either succeed (making it permanent if done on production), or the transaction will be rolled back, leaving the entity exactly as it was before the call.  Many conversions will be simple.  For example, it will be simple to convert a data object with a single file.  Conversion will not be simple in cases where there is no one-to-one relationship between the old type and the new type.

| Response Body | Method | URL | Request Body | Authorization |
|---|---|---|---|---|
| Entity | PUT | /entity/{entityId}/convertType | Entity | Caller must have the UPDATE permission on the Entity. |

Response codes:

| Return code | Name | Condition |
|---|---|---|
| 201 | Created | The entity type conversion was successful. |
| 412 | Precondition Failed | Returned if the passed entity.etag does not match the etag of the entity (ConflictingUpdateException). |
| 400 | Bad Request | The entity cannot be converted for any reason.  The message should explain why the type change failed. |

Table 2.
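
As a rough illustration of how a client might call this service, the sketch below builds the PUT request and maps the response codes from Table 2 to outcomes.  The base URL, the helper names, the `sessionToken` header, and the `id`/`etag` keys of the entity JSON are assumptions made for illustration; this document only specifies the method, the path, the request body (an Entity), and the response codes.

```python
import json
import urllib.request

# Hypothetical base URL; this document does not name the repository endpoint.
BASE_URL = "https://repo.example.org/repo/v1"


def build_convert_request(entity, session_token):
    """Build (but do not send) the PUT request for the convertType service."""
    url = f"{BASE_URL}/entity/{entity['id']}/convertType"
    body = json.dumps(entity).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Header name is an assumption; substitute the real auth scheme.
        "sessionToken": session_token,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


def interpret_status(status):
    """Map the response codes listed in Table 2 to a short outcome label."""
    outcomes = {
        201: "converted",
        412: "stale etag: re-fetch the entity and retry",
        400: "cannot convert: read the error message for the reason",
    }
    return outcomes.get(status, "unexpected response")
```

Because the whole change runs in one transaction, a client that receives a 412 or 400 can assume the entity is unchanged and decide whether to retry.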

The following table lists how each old type will be converted into a new type. 

Note: Any deprecated entity fields with values (see Table 1 'Deprecated Entity Fields')  will be preserved as annotations on the new entity.   Any non-primitive deprecated field will be saved as a string annotation.

| Original Entity | Converted To | Details |
|---|---|---|
| Non-container Locationable with a single LocationData of type awss3 | FileEntity | An S3FileHandle will be created for the location.  The type will be changed to a FileEntity pointing to the new FileHandle. |
| Non-container Locationable with a single LocationData of type external | FileEntity | An ExternalFileHandle will be created for the location.  The type will be changed to a FileEntity pointing to the new FileHandle. |
| Non-container Locationable with a single LocationData of type awsebs, sage, github | Unsupported (400) | There is no way to represent these types with FileHandles.  The caller will need to convert the type to either awss3 or external and then try again. |
| Non-container Locationable with more than one LocationData of any type | Unsupported (400) | There is no way to represent multiple locations with a single file.  The caller will need to pick a single location to keep, remove the rest, and then try again. |
| Non-container Locationable that has one or more Link children | Unsupported (400) | A File cannot have children.  The caller must move or delete the Links and try again. |
| Container non-locationable | Folder | The container will simply be converted to a Folder. |
| Container locationable with n LocationData of type awss3 or external | Folder + n child FileEntity | The original entity will be converted to a Folder, and a child FileEntity will be added for each valid LocationData. |
| Container locationable with n LocationData of type awsebs, sage, github | Unsupported (400) | There is no way to represent these types with FileHandles.  The caller will need to convert the type to either awss3 or external and then try again. |

Table 3
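
The rules in Table 3 can be read as a small decision procedure.  The sketch below is one possible encoding, not the service implementation: `OldEntity`, `ConversionError`, and `plan_conversion` are hypothetical names.  Two behaviours are assumptions because Table 3 does not cover them: a container that mixes convertible and non-convertible location types is rejected, and a non-container with no locations is rejected.

```python
from dataclasses import dataclass, field
from typing import List

# Location types that map onto a FileHandle, per Table 3.
CONVERTIBLE_LOCATION_TYPES = {"awss3", "external"}


class ConversionError(Exception):
    """Stands in for the 400 response described in Table 2."""


@dataclass
class OldEntity:
    is_container: bool
    location_types: List[str] = field(default_factory=list)  # one per LocationData
    has_link_children: bool = False


def plan_conversion(entity: OldEntity) -> str:
    """Return a description of the target type(s), or raise ConversionError."""
    unsupported = [t for t in entity.location_types
                   if t not in CONVERTIBLE_LOCATION_TYPES]
    if unsupported:
        # Assumption: any unsupported location type blocks the whole change.
        raise ConversionError(
            f"location types {unsupported} cannot be represented as FileHandles")

    if entity.is_container:
        n = len(entity.location_types)
        return "Folder" if n == 0 else f"Folder + {n} child FileEntity"

    # Non-container cases.
    if entity.has_link_children:
        raise ConversionError("a File cannot have children; move or delete the Links")
    if len(entity.location_types) > 1:
        raise ConversionError("multiple locations cannot be represented by one file")
    if len(entity.location_types) == 1:
        return "FileEntity"
    # Assumption: Table 3 has no row for a non-container without locations.
    raise ConversionError("no conversion rule covers this entity")
```

For example, `plan_conversion(OldEntity(is_container=True, location_types=["awss3", "external"]))` describes a Folder with two child FileEntity objects, while a non-container with the same two locations raises `ConversionError`.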

Versionable

All locationable objects are also versionable.  This means there can be multiple versions of the same file(s).  When an old entity with multiple versions is changed to a new type, each version will be converted the same way as the current version.  If any version of an entity cannot be changed, then the change will fail (leaving the entity in its original state).  The failure report will indicate exactly what went wrong.  Since versions of an entity are not editable, the only way to "fix" an old entity with an incompatible version will be to delete the incompatible version.  Once all incompatible versions are deleted, another attempt can be made to convert the type.
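
The all-or-nothing rule for versions might look like the following sketch.  It assumes, purely for illustration, that versions are held in a dict keyed by version number and that `can_convert` is a caller-supplied predicate; neither is defined by this document.

```python
def incompatible_versions(versions, can_convert):
    """Return the version numbers that would block a type change."""
    return [number for number in sorted(versions)
            if not can_convert(versions[number])]


def check_all_versions(versions, can_convert):
    """Succeed only if every version is convertible; otherwise report them all."""
    blocked = incompatible_versions(versions, can_convert)
    if blocked:
        raise ValueError(
            f"versions {blocked} cannot be converted; delete them and retry")
    return sorted(versions)
```

Reporting every blocking version at once, rather than stopping at the first, lets an owner delete all incompatible versions before the next attempt.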

 

Entity Query

Entity aliases can be used in the "from" clause of the entity query service (see Table 1, "Alias(es)").  For example, "select * from dataset" would list all Studies that a user can see.  After an old entity's type has changed, it will no longer appear in query results that filter by its old type alias.  Instead, changed types will show up only under the new type alias.  For example, a Data object is currently listed by "select * from layer"; if the Data object is converted to a file, it will then be listed by "select * from file".  The type change should have no other effects on entity query.  Any deprecated entity field can still be used to filter entity query results even though the property values will be moved to annotations upon a type change.  The entity query service already treats entity fields and annotations the same.

Migration plan

  1. Once the new entity conversion service is in place, we will set up a script to call it for every old entity on staging.  The script will record all entities that fail to convert for any reason.  The normal migration process will "undo" all type changes on staging.

  2. We will need to work with the original entity owners to find fixes for all entities that cannot be changed.  We will also need the original owners to review the type changes on staging and confirm the migration went as expected.

  3. For all successful type conversions on staging we can repeat the conversion on production (making the type change permanent).

  4. Repeat steps 1-3 until all old types are converted.
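
A single pass of the script described in step 1 could be sketched as follows.  The function name and the `convert` callable are hypothetical; in practice `convert` would wrap the call to the conversion service and raise when the service reports a failure.

```python
def run_conversion_pass(entity_ids, convert):
    """Call `convert` for each entity and collect successes and failures."""
    converted, failed = [], {}
    for entity_id in entity_ids:
        try:
            convert(entity_id)
            converted.append(entity_id)
        except Exception as error:  # record every failure, whatever the reason
            failed[entity_id] = str(error)
    return converted, failed
```

The `failed` mapping is what would be reviewed with the original entity owners in step 2, and the `converted` list is what would be replayed on production in step 3.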