Entity Type Migration

Currently, there are eighteen entity types supported in Synapse, but most of those types are old and have been unofficially deprecated for a few years now.  We have not provided a means to convert these old entities to the new types, so there are many of these old objects in Synapse (see PLFM-1879).

Since each client can encounter one of these old objects at any time, client developers have been forced to support both the 'old' way and the 'new' way of working with entities for years now.  This adds complexity and increases the support cost for each client.  This document outlines a plan for migrating all old types into the types we currently support in Synapse.

 

Entity types as of 2/2/15:

| Alias(es) | Full Name | Container | Locationable | Status | Count | Last Modified | Valid Parents | Deprecated Entity Fields |
|---|---|---|---|---|---|---|---|---|
| dataset, study | org.sagebionetworks.repo.model.Study | true | true | Deprecated | 11479 | 2015-01-13 04:05:16 | folder, project | numSamples, species, disease, tissueType, platform |
| layer, data | org.sagebionetworks.repo.model.Data | false | true | Deprecated | 835532 | 2015-01-28 23:10:06 | folder, project, study | numSamples, species, disease, tissueType, platform, type |
| project | org.sagebionetworks.repo.model.Project | true | false | Supported | 8964 | 2015-01-28 23:25:29 | root only | |
| preview | org.sagebionetworks.repo.model.Preview | false | false | Deprecated | 93 | 2012-01-19 05:52:23 | folder, data | |
| folder | org.sagebionetworks.repo.model.Folder | true | false | Supported | 42101 | 2015-01-28 23:10:11 | project, folder, study, analysis | |
| analysis | org.sagebionetworks.repo.model.Analysis | true | false | Deprecated | 29 | 2013-05-09 12:44:15 | project, folder | properties |
| step | org.sagebionetworks.repo.model.Step | false | false | Deprecated | 34 | 2012-09-16 15:31:39 | folder, analysis | code, input, output, environmentDescriptors, startDate, endDate, commandLine |
| code | org.sagebionetworks.repo.model.Code | false | true | Deprecated | 769 | 2015-01-28 23:09:42 | project, folder | startDate, endDate |
| link | org.sagebionetworks.repo.model.Link | false | false | Supported | 253 | 2015-01-14 15:44:07 | project, folder, study, data, step, analysis | |
| phenotypedata | org.sagebionetworks.repo.model.PhenotypeData | false | true | Deprecated | 569 | 2013-08-09 21:12:03 | project, folder, study | numSamples, species, disease |
| genotypedata | org.sagebionetworks.repo.model.GenotypeData | false | true | Deprecated | 1642 | 2014-05-08 06:21:11 | project, folder, study | numSamples, species, disease, platform |
| expressiondata | org.sagebionetworks.repo.model.ExpressionData | false | true | Deprecated | 18724 | 2013-10-10 16:32:21 | project, folder, study | numSamples, species, disease, tissueType, platform |
| robject | org.sagebionetworks.repo.model.RObject | false | true | Deprecated | 52 | 2013-03-02 22:28:36 | project, folder, study | properties |
| summary | org.sagebionetworks.repo.model.Summary | false | false | Supported | 102 | 2014-11-06 21:56:38 | project, folder, study | |
| genomicdata | org.sagebionetworks.repo.model.GenomicData | false | true | Deprecated | 2 | 2013-01-25 22:50:59 | project, folder, study | numSamples, species, technology, dataType, molecularFeatureType, status, tissue, platform |
| page | org.sagebionetworks.repo.model.Page | true | false | Deprecated | 0 | | folder, page | |
| file | org.sagebionetworks.repo.model.FileEntity | false | false | Supported | 198820 | 2015-01-28 23:10:26 | project, folder, study | |
| table | org.sagebionetworks.repo.model.table.TableEntity | false | false | Supported | 2260 | 2015-01-28 23:17:52 | project, folder, study | |
| community | org.sagebionetworks.bridge.model.Community | true | false | Deprecated | 1 | 2014-02-06 20:47:44 | root only | teamId, welcomePageWikiId, indexPageWikiId |

Table 1.

Locationable

Locationable entities can have one or more LocationData objects in their list of locations.  LocationData is the precursor to our currently supported FileHandle.  Each LocationData object points to either an object in S3 (like an S3FileHandle) or an external URL (like an ExternalFileHandle).  Most of the client support burden comes from supporting Locationable entities.  Table 1 shows which entity types are Locationable.  Even though a Locationable can have more than one LocationData, it is assumed that each LocationData represents the same file, just stored in an alternate location, and that all "copies" have the same md5 (see Figure 1).

Figure 1.

Containers

In the past we supported many types of entity containers.  Each container can have one or more child entities.  Table 1 shows which entity types are allowed to be containers of other entities.  Our users were either confused by all of these container types or wanted a new type for their special cases.  We decided that the only containers we wanted to support were Folders and Projects.

 

Locationable & Containers

Table 1 shows that datasets/studies are both containers and locationable.  None of the currently supported objects can have both files and child entities.  Projects and Folders can have children but cannot have files.  Files can have files but cannot have any children.

Links In Everything

Links are a special case that can be contained in many types (see Table 1 'Valid Parents').  Since a Link can have a study, data, step, or analysis as a parent, any of these objects that has a Link child cannot simply be changed to a File (a File cannot have children).  This is going to force us to make difficult decisions about how to convert old types that have Links as children.

Convert Entity Services

In order to fully deprecate the unsupported types, we need to convert each object of an old type into an object of a supported type (without changing the entity ID).  The plan is to add a new service that will convert an old entity to a new type.  This service makes all changes to an old Entity in a single transaction.  This means an Entity type change will either succeed (making it permanent if done on production), or the transaction will be rolled back, leaving the entity exactly as it was before the call.  Many conversions will be simple.  For example, it will be simple to convert a data object with a single file.  Conversion will not be simple in cases where there is no one-to-one relationship between the old and the new.

| Response Body | Method | URL | Request Body | Authorization |
|---|---|---|---|---|
| Entity | PUT | /entity/{entityId}/convertType | Entity | Caller must have the UPDATE permission on the Entity. |

Response codes:

| Return code | Name | Condition |
|---|---|---|
| 201 | Created | The entity type conversion was successful. |
| 412 | Precondition Failed | Returned if the passed entity.etag does not match the etag of the entity (ConflictingUpdateException). |
| 400 | Bad Request | The entity cannot be converted for any reason.  The message should explain why the type change failed. |

Table 2.
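As a rough illustration of how a client might call the proposed service, the sketch below builds the PUT request and maps the response codes listed above to a short description. The base URL and the `sessionToken` header name are assumptions made for the example; only the path, the method, the Entity request body, and the three response codes come from this document.

```python
import json
import urllib.request

# Hypothetical base URL; this document only specifies the path
# /entity/{entityId}/convertType.
BASE_URL = "https://repo.example.org/repo/v1"

# Response codes as listed in the table above.
OUTCOMES = {
    201: "converted",
    412: "etag mismatch: re-fetch the entity and retry",
    400: "cannot be converted: see the response message",
}


def build_convert_request(entity, session_token):
    """Build (but do not send) the PUT request for a type conversion."""
    url = f"{BASE_URL}/entity/{entity['id']}/convertType"
    return urllib.request.Request(
        url,
        data=json.dumps(entity).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "sessionToken": session_token,  # assumed auth header name
        },
    )


def describe_outcome(status_code):
    """Translate a response code into a short, human-readable outcome."""
    return OUTCOMES.get(status_code, f"unexpected status {status_code}")
```

The entity passed in must carry its current etag, since a stale etag is what triggers the 412 response.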

The following table lists how each old type will be converted into a new type. 

Note: Any deprecated entity fields with values (see Table 1 'Deprecated Entity Fields')  will be preserved as annotations on the new entity.   Any non-primitive deprecated field will be saved as a string annotation.

| Original Entity | Converted To | Details |
|---|---|---|
| Non-container Locationable with a single LocationData of type awss3 | FileEntity | An S3FileHandle will be created for the location.  The type will be changed to a FileEntity pointing to the new FileHandle. |
| Non-container Locationable with a single LocationData of type external | FileEntity | An ExternalFileHandle will be created for the location.  The type will be changed to a FileEntity pointing to the new FileHandle. |
| Non-container Locationable with a single LocationData of type awsebs, sage, github | Unsupported (400) | There is no way to represent these types with FileHandles.  The caller will need to convert the type to either awss3 or external and then try again. |
| Non-container Locationable with more than one LocationData of any type | Unsupported (400) | There is no way to represent multiple locations with a single file.  The caller will need to pick a single location to keep, remove the rest, and then try again. |
| Non-container Locationable that has one or more Link children | Unsupported (400) | A File cannot have children.  The caller must move or delete the Links and try again. |
| Container non-locationable | Folder | The container will simply be converted to a Folder. |
| Container locationable with n LocationData of type awss3 or external | Folder + n child FileEntity | The original entity will be converted to a Folder, and a child FileEntity will be added for each valid LocationData. |
| Container locationable with n LocationData of type awsebs, sage, github | Unsupported (400) | There is no way to represent these types with FileHandles.  The caller will need to convert the type to either awss3 or external and then try again. |

Table 3
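The rules in Table 3 can be read as a small decision procedure. The following is a minimal sketch of that procedure, not the service implementation: the `OldEntity` class and `plan_conversion` function are hypothetical names, a `ValueError` stands in for a 400 response, and two cases the table does not cover (a container with a mix of supported and unsupported location types, and a non-container with no locations) are assumed here to be rejected.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Location types that Table 3 says can be represented as FileHandles.
FILE_HANDLE_TYPES = {"awss3", "external"}


@dataclass
class OldEntity:
    """Hypothetical summary of the properties Table 3 depends on."""
    is_container: bool
    is_locationable: bool
    location_types: List[str] = field(default_factory=list)
    link_children: int = 0


def plan_conversion(entity: OldEntity) -> Tuple[str, int]:
    """Return (new type, number of child FileEntities) or raise ValueError."""
    bad = sorted(set(entity.location_types) - FILE_HANDLE_TYPES)
    if bad:
        # Assumption: any unsupported location rejects the whole entity.
        raise ValueError(f"location types {bad} cannot become FileHandles")
    if entity.is_container:
        if not entity.is_locationable:
            return ("Folder", 0)
        return ("Folder", len(entity.location_types))
    if not entity.is_locationable:
        raise ValueError("case not covered by Table 3")
    if entity.link_children > 0:
        raise ValueError("a FileEntity cannot have children; move or delete the Links")
    if len(entity.location_types) != 1:
        raise ValueError(
            f"expected exactly one LocationData, found {len(entity.location_types)}"
        )
    return ("FileEntity", 0)
```

For example, `plan_conversion(OldEntity(True, True, ["awss3", "external"]))` yields a Folder with two child FileEntities, while a Data object with two locations raises an error that tells the caller to keep only one.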

Versionable

All locationable objects are also versionable.  This means there can be multiple versions of the same file(s).  When an old entity with multiple versions is changed to a new type, each version will be converted the same way as the current version.  If any version of an entity cannot be changed, then the change will fail (leaving the entity in its original state).  The failure report will indicate exactly what went wrong.  Since versions of an entity are not editable, the only way to "fix" an old entity with an incompatible version will be to delete the incompatible version.  Once all incompatible versions are deleted, another attempt can be made to convert the type.
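The all-or-nothing behavior across versions can be sketched as follows. This is an illustration under simplifying assumptions: each version is reduced to its list of location types, the per-version check (`check_version`) is a hypothetical stand-in for the full rules in Table 3, and every convertible version is assumed to become a FileEntity.

```python
def check_version(location_types):
    """Hypothetical per-version check: exactly one awss3/external location."""
    if len(location_types) != 1:
        raise ValueError(f"expected exactly one location, found {len(location_types)}")
    if location_types[0] not in ("awss3", "external"):
        raise ValueError(f"unsupported location type {location_types[0]}")


def incompatible_versions(versions):
    """Map version number -> failure reason for each version that cannot convert."""
    failures = {}
    for number, location_types in sorted(versions.items()):
        try:
            check_version(location_types)
        except ValueError as err:
            failures[number] = str(err)
    return failures


def convert_all_versions(versions):
    """Convert every version, or raise without converting any of them."""
    failures = incompatible_versions(versions)
    if failures:
        raise ValueError(f"conversion failed: {failures}")
    return {number: "FileEntity" for number in versions}
```

Given versions `{1: ["awss3"], 2: ["awsebs"], 3: ["external"]}`, the failure report names version 2; once that version is deleted, the remaining two versions convert together.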

 

Entity Query

Entity aliases can be used in the "from" clause of the entity query services (see Table 1, "Alias(es)").  For example, "select * from dataset" would list all Studies that a user can see.  After an old entity's type has changed, it will no longer appear in query results that filter by its old type alias.  Instead, changed types will show up only under the new type alias.  For example, a Data object is currently listed by "select * from layer"; if the Data object is converted to a file, it will then be listed by "select * from file".  The type change should have no other effects on entity query.  Any deprecated entity field can still be used to filter entity query results even though the property values will be moved to annotations upon a type change.  The entity query service already treats entity fields and annotations the same.
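The effect of a type change on entity query can be illustrated with a toy in-memory model. Everything here is hypothetical (the dictionary layout, `convert_to_file`, `query`, and the small alias map); it only mirrors the behavior described above: the entity moves to the new alias, deprecated fields become annotations (non-primitive values as strings), and filters read fields and annotations alike.

```python
# Small subset of the aliases from Table 1, mapped to a canonical type name.
ALIASES = {"layer": "data", "data": "data", "dataset": "study",
           "study": "study", "file": "file"}


def convert_to_file(entity, deprecated_fields):
    """Return a copy typed 'file' with deprecated fields moved to annotations."""
    fields = dict(entity["fields"])
    annotations = dict(entity["annotations"])
    for name in deprecated_fields:
        if name in fields:
            value = fields.pop(name)
            # Non-primitive values are kept as string annotations.
            if not isinstance(value, (str, int, float, bool)):
                value = str(value)
            annotations[name] = value
    return {**entity, "type": "file", "fields": fields, "annotations": annotations}


def query(entities, alias, **filters):
    """Toy 'select * from <alias> where ...' over fields and annotations."""
    wanted = ALIASES[alias]
    results = []
    for entity in entities:
        merged = {**entity["annotations"], **entity["fields"]}
        if entity["type"] == wanted and all(
            merged.get(key) == value for key, value in filters.items()
        ):
            results.append(entity["id"])
    return results
```

A Data entity with `species` set to `"human"` matches `query(entities, "layer", species="human")` before conversion and `query(entities, "file", species="human")` afterwards, but no longer matches under "layer".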

Migration plan

  1. Once the new entity conversion service is in place, we will set up a script to call it for every old entity on staging.  The script will record all entities that fail to convert for any reason.  The normal migration process will "undo" all type changes on staging.
  2. We will need to work with the original entity owners to find fixes for all entities that cannot be changed.  We will also need the original owners to review the type changes on staging and confirm the migration went as expected.
  3. For all successful type conversions on staging we can repeat the conversion on production (making the type change permanent).
  4. Repeat steps 1-3 until all old types are converted.
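The scripted parts of this plan (steps 1 and 3) might look roughly like the sketch below. The function names and the injected `convert` callables are hypothetical; in practice each callable would wrap a call to the conversion service on the corresponding stack. Step 2, the review with entity owners, remains a manual step between rounds.

```python
def run_conversion_pass(entity_ids, convert):
    """Call convert(entity_id) for each id; return (converted ids, failures)."""
    converted, failures = [], {}
    for entity_id in entity_ids:
        try:
            convert(entity_id)
        except Exception as err:  # record every failure, whatever the reason
            failures[entity_id] = str(err)
        else:
            converted.append(entity_id)
    return converted, failures


def migrate(old_entity_ids, convert_on_staging, convert_on_production):
    """One round: try everything on staging, repeat the successes on production."""
    staged, staging_failures = run_conversion_pass(old_entity_ids, convert_on_staging)
    done, production_failures = run_conversion_pass(staged, convert_on_production)
    remaining = {**staging_failures, **production_failures}
    return done, remaining
```

The `remaining` mapping is the list to take back to the entity owners; once they have fixed those entities, the same round can be run again on the ids that are left.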