
Proposal for typed layers

Currently, there is one "Layer" entity that is used to represent the different types of layers (phenotype, genotype, expression, media, ...).

The differences in types are indicated by the "type" field, and layer-specific data is stored as annotations (and is therefore neither constrained nor consistent).

The goal of this proposal is to formalize different types of layers with specific information that a Synapse client will be able to use.

The proposal is to take the existing layer POJO and make it an interface on top of which we build the different layer types:

LayerBase:

{

	"implements": [
		{"$ref": "org.sagebionetworks.repo.model.Locationable"},
		{"$ref": "org.sagebionetworks.repo.model.HasPreviews"}
	],
	"properties": {
		"releaseNotes": {
			"type": "string",
			"description": "Notes associated with this layer"
		},
		"numSamples": {
			"type": "integer",
			"description": "Number of samples in this layer."
		},
		"status": {
			"type": "string",
			"description": "Status of this layer"
		},
		"formats": {
			"type": "array",
			"items": {
				"type": "string"
			},
			"description": "Available formats for this layer."
		}
	}
}

PlatformData:

{
	"properties": {
		"platformVendor": {
			"type": "string",
			"description": "Name of vendor of chip used."
		},
		"platform": {
			"type": "string",
			"description": "Chip platform name"
		}
	}
}

A couple of notes from early feedback on LayerBase:

- "status" is meant to differentiate between 'raw', 'curated', 'qced', etc. The alternative would be to have specialized layers for each status (i.e. RawPhenotypeLayer, QCedPhenotypeLayer, ...).

- "releaseNotes" should only pertain to 'raw' layers.

- Some string fields should be made enums.
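
As an illustration of the enum note, a client could model "status" roughly as follows. This is only a sketch: the three values are the ones named in the feedback above, the final set is still open, and the names LayerStatus, parse_status and release_notes_allowed are hypothetical.

```python
from enum import Enum


class LayerStatus(Enum):
    """Hypothetical enum for the "status" field; the value set is not final."""
    RAW = "raw"
    CURATED = "curated"
    QCED = "qced"


def parse_status(value):
    """Convert a free-form status string to a LayerStatus.

    Raises ValueError for a value that is not one of the known statuses.
    """
    return LayerStatus(value.strip().lower())


def release_notes_allowed(status):
    """Per the feedback above, releaseNotes only pertains to 'raw' layers."""
    return status is LayerStatus.RAW
```

With an enum, an unknown status fails at parse time instead of being stored silently as an arbitrary string.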

GenericLayer (an 'untyped' layer, i.e. the same as we have now, to use for layer types that are not yet defined):

{
	"implements": [
		{"$ref": "org.sagebionetworks.repo.model.LayerBase"}
	],
	"properties": {
		"layerType": {
			"type": "string",
			"description": "Type of the generic layer."
		}
	}
}

PhenotypeLayer:

{

	"implements": [
		{"$ref": "org.sagebionetworks.repo.model.LayerBase"}
	],
	"properties": {
		"processingFacility": {
			"type": "string",
			"description": "Information about facility where phenotype samples were processed"
		}
	}
}

GenotypeLayer:

{
	"implements": [
		{"$ref": "org.sagebionetworks.repo.model.LayerBase"}
	],
	"properties": {
		"platformData": {
			"$ref": "org.sagebionetworks.repo.model.PlatformData"
		}
	}
}

CnvLayer:

{
	"implements": [
		{"$ref": "org.sagebionetworks.repo.model.LayerBase"}
	],
	"properties": {
		"platformData": {
			"$ref": "org.sagebionetworks.repo.model.PlatformData"
		}
	}
}

SnpLayer:

{
	"implements": [
		{"$ref": "org.sagebionetworks.repo.model.LayerBase"}
	],
	"properties": {
		"platformData": {
			"$ref": "org.sagebionetworks.repo.model.PlatformData"
		}
	}
}

Note: CnvLayer and SnpLayer could implement GenotypeLayer instead of LayerBase, since all three declare the same "platformData" property.
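
To illustrate what that alternative would mean, here is a minimal sketch of resolving the effective properties of a layer type through its "implements" chain. The schemas are simplified stand-ins (short names, property bodies elided, Locationable and HasPreviews left out), and effective_properties is a hypothetical helper, not part of any existing schema tooling.

```python
# Simplified stand-ins for the schemas in this proposal. In this variant
# only GenotypeLayer declares platformData; CnvLayer and SnpLayer inherit it.
SCHEMAS = {
    "LayerBase": {
        "implements": [],
        "properties": ["releaseNotes", "numSamples", "status", "formats"],
    },
    "GenotypeLayer": {"implements": ["LayerBase"], "properties": ["platformData"]},
    "CnvLayer": {"implements": ["GenotypeLayer"], "properties": []},
    "SnpLayer": {"implements": ["GenotypeLayer"], "properties": []},
}


def effective_properties(name, schemas=SCHEMAS):
    """Collect property names from a schema and everything it implements.

    Inherited properties come first, in declaration order, without duplicates.
    """
    schema = schemas[name]
    props = []
    for parent in schema["implements"]:
        for prop in effective_properties(parent, schemas):
            if prop not in props:
                props.append(prop)
    for prop in schema["properties"]:
        if prop not in props:
            props.append(prop)
    return props
```

Under these assumptions, CnvLayer and SnpLayer end up with the same property set as GenotypeLayer without redeclaring "platformData" themselves.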

ExpressionLayer:

{
	"implements": [
		{"$ref": "org.sagebionetworks.repo.model.LayerBase"}
	],
	"properties": {
		"platformData": {
			"$ref": "org.sagebionetworks.repo.model.PlatformData"
		}
	}
}

MediaLayer:

{
	"implements": [
		{"$ref": "org.sagebionetworks.repo.model.LayerBase"}
	],
	"properties": {
	}
}

NetworkLayer:

{
	"implements": [
		{"$ref": "org.sagebionetworks.repo.model.LayerBase"}
	],
	"properties": {
		"numNodes": {
			"type": "integer",
			"description": "Number of nodes in network."
		}
	}
}

Notes about searching for and selecting layers:

- There is a hierarchy between layers (e.g. CnvLayer and SnpLayer are derived from GenotypeLayer). A search for genotype data should include the subtypes.

- A simple select needs to bring back all types of layers (i.e. "/layer" should return every xxxLayer flavor).

- It must also be possible to select one specific type (e.g. "select * from layer where layer.type == 'SnpLayer'").
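
The selection rules above can be sketched as follows. This is not the query implementation, only an illustration of the intended semantics. The PARENT map assumes that CnvLayer and SnpLayer derive from GenotypeLayer, and the "type" key on each layer record, is_a and select_layers are hypothetical names.

```python
# Hypothetical parent map for the layer types in this proposal.
PARENT = {
    "GenericLayer": "LayerBase",
    "PhenotypeLayer": "LayerBase",
    "GenotypeLayer": "LayerBase",
    "CnvLayer": "GenotypeLayer",
    "SnpLayer": "GenotypeLayer",
    "ExpressionLayer": "LayerBase",
    "MediaLayer": "LayerBase",
    "NetworkLayer": "LayerBase",
}


def is_a(layer_type, wanted):
    """Return True if layer_type equals wanted or derives from it."""
    while layer_type is not None:
        if layer_type == wanted:
            return True
        layer_type = PARENT.get(layer_type)
    return False


def select_layers(layers, wanted="LayerBase", exact=False):
    """Select layers by type.

    By default subtypes are included, so the default call returns every
    flavor of layer. With exact=True only the named type is returned.
    """
    if exact:
        return [layer for layer in layers if layer["type"] == wanted]
    return [layer for layer in layers if is_a(layer["type"], wanted)]
```

Here select_layers(layers) plays the role of "/layer", select_layers(layers, "GenotypeLayer") includes CnvLayer and SnpLayer records, and exact=True corresponds to a query on one specific type.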

Migration:

Existing layers will be migrated to typed layers by a LayerMigrator (based either on the existing "type" annotation or on the layer id).
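
A minimal sketch of the mapping step such a migrator might perform, under loudly stated assumptions: the "type" values in TYPE_TO_LAYER, the "entityType" output field and the migrate_layer function are all hypothetical, and the real LayerMigrator may work differently. Unknown types fall back to GenericLayer and keep their old type in "layerType".

```python
# Hypothetical mapping from existing "type" annotation values to typed layers.
# The real annotation values may differ.
TYPE_TO_LAYER = {
    "phenotype": "PhenotypeLayer",
    "genotype": "GenotypeLayer",
    "expression": "ExpressionLayer",
    "media": "MediaLayer",
}


def migrate_layer(old_layer, id_overrides=None):
    """Return a copy of old_layer tagged with its target typed layer.

    id_overrides maps a layer id to a target type and takes precedence over
    the "type" annotation (the id-based variant). Layers with an unmapped
    type become GenericLayer and keep the old type in "layerType".
    """
    id_overrides = id_overrides or {}
    old_type = old_layer.get("type")
    target = id_overrides.get(old_layer["id"]) or TYPE_TO_LAYER.get(old_type)
    new_layer = {k: v for k, v in old_layer.items() if k != "type"}
    if target is None:
        target = "GenericLayer"
        new_layer["layerType"] = old_type
    new_layer["entityType"] = target  # hypothetical field naming the new type
    return new_layer
```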

Layer lifecycle:

The goal is to provide most of the needed layer types up front. New layer types should first be implemented using GenericLayer and annotations. As needed, new layer schemas can be added, along with an associated migration process.