Project Description

The phenotype editor is a project that helps curators track the clinical phenotype information in a dataset more effectively.

Use Cases (in order of importance)

  1. Make sure all values in a column exist in a user-defined enumeration.
  2. Set units and a description for a column.
  3. Make sure all values in a column match an existing ontology.
  4. Standardize clinical variable (column) names across studies.
  5. Complete ontology for Sage use: EFO is a partial solution (Brig).
  6. Clean up misspellings, synonyms, and capitalization (Google Refine).
  7. Generate a script to curate the data and apply the same transformations to new or updated datasets (Google Refine).
  8. Show the description of a term.
  9. Keep a record of what was changed, and from what to what.
  10. Unit conversion.
  11. Link records across studies by ID, e.g. cell lines, or the same patients appearing in multiple studies.
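Use cases 6 and 9 fit together naturally: each cleanup rule maps an observed value to a corrected one, and each rule that actually fires can be logged as a from/to pair. The sketch below is a minimal illustration that assumes a column is a list of non-null strings and that corrections are a plain lookup map; `CellChange` and `ColumnCleaner` are hypothetical names, not existing repository-service types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical record of one edited cell: where it was and what it changed from and to.
record CellChange(int row, String column, String oldValue, String newValue) {}

// Hypothetical helper that applies a corrections map to one column and logs every change.
// Assumes column values are non-null strings.
class ColumnCleaner {
    private final List<CellChange> log = new ArrayList<>();

    /** Returns a cleaned copy of the column; values missing from `corrections` are kept as-is. */
    List<String> clean(String column, List<String> values, Map<String, String> corrections) {
        List<String> cleaned = new ArrayList<>(values.size());
        for (int row = 0; row < values.size(); row++) {
            String oldValue = values.get(row);
            String newValue = corrections.getOrDefault(oldValue, oldValue);
            if (!newValue.equals(oldValue)) {
                // Record the "from what to what" for use case 9.
                log.add(new CellChange(row, column, oldValue, newValue));
            }
            cleaned.add(newValue);
        }
        return cleaned;
    }

    /** Returns an immutable snapshot of all changes recorded so far. */
    List<CellChange> changes() {
        return List.copyOf(log);
    }
}
```

Because the corrections map is plain data, the same map could be reapplied to a new or updated dataset (use case 7), although Google Refine's own operation history may make that unnecessary.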

Server Design

It appears that we can satisfy the most important use cases using nearly all of the existing features of the repository service. This includes:

What we need but do not yet have is a validator that takes an ObjectSchema object and a populated JSONAdaptor (i.e. a JSON object) and checks that the adaptor object conforms to the requirements of the ObjectSchema. (John)
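A minimal sketch of such a validator, assuming a heavily simplified schema model: each property has an expected Java type, a required flag, and an optional enumeration of allowed values (use case 1), and the JSON object has already been parsed into a `Map`. `PropertySchema` and `SchemaValidator` are hypothetical; the real ObjectSchema and JSONAdaptor classes have their own APIs, which are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for one ObjectSchema property.
// allowedValues may be null, meaning "no enumeration constraint".
record PropertySchema(Class<?> type, boolean required, Set<String> allowedValues) {}

// Hypothetical validator: checks a parsed JSON object (modeled as a Map) against a schema.
class SchemaValidator {
    /** Returns one message per violation; an empty list means the object conforms. */
    static List<String> validate(Map<String, PropertySchema> schema, Map<String, Object> json) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, PropertySchema> entry : schema.entrySet()) {
            String name = entry.getKey();
            PropertySchema prop = entry.getValue();
            Object value = json.get(name);
            if (value == null) {
                if (prop.required()) errors.add(name + ": required property is missing");
                continue;
            }
            if (!prop.type().isInstance(value)) {
                errors.add(name + ": expected " + prop.type().getSimpleName());
                continue;
            }
            // Use case 1: the value must come from the user-defined enumeration, if one is given.
            if (prop.allowedValues() != null && !prop.allowedValues().contains(value.toString())) {
                errors.add(name + ": '" + value + "' is not in the allowed enumeration");
            }
        }
        // Flag properties present in the object but not defined by the schema.
        for (String name : json.keySet()) {
            if (!schema.containsKey(name)) errors.add(name + ": not defined in schema");
        }
        return errors;
    }
}
```

Returning a list of messages rather than failing on the first violation lets the UI report every problem in a column at once.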

Google Refine looks very promising if we create an extension for reconciling data. The other problems are:

See the Data Model section below for details on the shape of the objects listed above.

NCBO Web Services

The plan is to call the NCBO services directly, without routing through our service layers. This means that REST requests will be made directly from the GWT server side. If we later find that the NCBO services are too slow or do not support batch requests, we can add a local caching/batching layer in front of them.
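The caching half of that fallback could be as simple as a memoizing wrapper around whatever function performs the NCBO request. In this sketch the remote call is an injected `Function<String, String>` mapping a term to a concept ID (an assumption; the actual NCBO request and response shapes are not shown), and `CachingTermLookup` is a hypothetical name. True batching would additionally depend on a batch-capable NCBO endpoint, which this sketch does not assume.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical local cache in front of an ontology term lookup (e.g. a REST call to NCBO).
class CachingTermLookup {
    private final Function<String, String> remoteLookup; // term -> concept ID; assumed to be the slow call
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingTermLookup(Function<String, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Returns the cached result, calling the remote service only on a cache miss. */
    String lookup(String term) {
        // If remoteLookup returns null (no match), nothing is cached and null is returned.
        return cache.computeIfAbsent(term, remoteLookup);
    }

    /** Resolves many terms; repeated and previously seen terms are served from the cache. */
    Map<String, String> lookupAll(Collection<String> terms) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String term : terms) {
            results.put(term, lookup(term));
        }
        return results;
    }
}
```

Because unmatched terms are not cached, they are retried on the next lookup; whether that is desirable depends on how often the ontologies change.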

We're planning to use the following NCBO services:

Client

The web UI will hold a significant amount of the business logic for the Phenotype editor. Wherever possible, persistence will happen on the server to keep the CRUD operations decoupled from the UI (e.g. a column's ObjectSchema will be created in the client as an ObjectSchema POJO, but it will be serialized and persisted into the Blob annotations on the server side behind a "saveColumnDefinition"-like method).
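A sketch of that split, assuming a column definition reduced to units, a description, and an allowed-value list (use cases 1 and 2), stored as flat key/value annotations. `ColumnDefinition`, `ColumnDefinitionService`, the `column.<name>.` key prefix, and the in-memory map standing in for the Blob annotations are all hypothetical; the client only ever calls `saveColumnDefinition` and `loadColumnDefinition`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical POJO built in the client: the column definition as the curator edited it.
record ColumnDefinition(String name, String units, String description, List<String> allowedValues) {}

// Hypothetical server-side service; the client never sees how definitions are stored.
class ColumnDefinitionService {
    // In-memory stand-in for the Blob annotations held by the repository service.
    private final Map<String, String> annotations = new HashMap<>();

    /** Persists a definition as flat annotations. Assumes allowed values never contain '|'. */
    void saveColumnDefinition(ColumnDefinition def) {
        String prefix = "column." + def.name() + ".";
        annotations.put(prefix + "units", def.units());
        annotations.put(prefix + "description", def.description());
        annotations.put(prefix + "allowedValues", String.join("|", def.allowedValues()));
    }

    /** Rebuilds a definition from its annotations, or returns null if it was never saved. */
    ColumnDefinition loadColumnDefinition(String name) {
        String prefix = "column." + name + ".";
        String allowed = annotations.get(prefix + "allowedValues");
        if (allowed == null) return null;
        List<String> values = allowed.isEmpty() ? List.of() : List.of(allowed.split("\\|"));
        return new ColumnDefinition(name, annotations.get(prefix + "units"),
                annotations.get(prefix + "description"), values);
    }
}
```

Because the storage format lives entirely behind the service, it could later change (for example, to a serialized ObjectSchema) without touching client code.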

Data Model