syncToSynapse adds empty annotation values

Description

When users upload with a manifest that contains missing values (see attached example, e.g. specimenID etc.), keys with empty values are assigned to the entity.

There are several cases where empty values will end up in the manifest. For one, the schema might only be applicable to a subset of entities in the manifest.

Before creating a dictionary from the manifest, keys with empty values should be removed.

Environment

mac Os 10.15.6

Activity

Show:
Kelsey Montgomery
October 23, 2020, 5:53 PM

is this a bug to fix or do you think syncToSynapse should continue to function this way?

Jordan Kiang
October 23, 2020, 8:36 PM

Changing the client such that it won't apply an annotation if there is no value for a particular item in the manifest (rather than setting an empty annotation) makes sense to me and isn't a difficult change. The only question would be if there is any existing usage that depends on the presence of an empty annotation.

We discussed this briefly at the Code Review session today and nobody could think of any such cases, but suggested we might ping and to see if they had any thoughts.

Milen Nikolov
October 23, 2020, 9:36 PM

thanks for looping me in. It looks like there are two ways to answer this question depending on whether we talk about the current Synapse annotation system vs the upcoming Synapse annotation system that supports validation schemas associated with entities.

In the case of the current Synapse annotation system,

  • after the change, would you still be able to use set_annotations() with key:value where value = ““/None (i.e. non-bulk case)? This plays into some of annotations' usage in HTAN; we could easily change this usage if needed, so it’s not necessarily a blocker here

  • what is the current behavior of file views where some subset of files in the view have a set of annotation keys A, while other files in the view have a set of annotation keys B, where B is subset of A? E.g. We might have two files x and y in a fileview; we might have a value for key ‘a' in set A for file x, but have an empty value for key ‘a’ for file y; if I understand correctly, with the proposed solution, the latter would cause key 'a’ to be dropped from the annotation set of file y?

  • I can see allowing empty values for annotation keys provides flexibility/convenience in some use cases. Nothing I can see right now that can’t be surmounted by using NaN, Unknown, None, etc..

In the case of the upcoming validation-schema-aware system

  • in general, shouldn’t we let the schema determine if empty annotation values are allowed for a given key (if a schema is associated with the given entity)? Client logic probably shouldn’t conflict with or reimplement schema logic.

  • I think there is a case for a default/generic schema associated with every Synapse entity. The default schema could have logic that does not allow empty values for any annotation key; the default schema can be overridden by a user-submitted schema. The client logic follows whichever is specified in the schema for each annotation key. (I.e. no annotation keys are outside the default schema)

  • btw, there might be other schema-related logic in the client that could be abstracted in a (default) schema; that could simplify the client logic: the client just follows whichever schema (default or not) is associated with each Synapse entity

Jordan Kiang
October 23, 2020, 9:54 PM

This is in the context of the existing behavior for the current client’s syncToSynapse manifest upload only, where there is some ambiguity in a TSV file upload if an empty column in a file means apply no annotation or apply an empty annotation.

If we made this change, it would only apply in that context and other existing behavior would remain unchanged, i.e. you’d still be able to explicitly set_annotations with an empty value.

This change would not affect any plans for the upcoming newer annotation system.

Milen Nikolov
October 23, 2020, 10:26 PM

Great, in this case I can confirm there aren’t any stoppers on my end for implementing the change.

Fixed

Assignee

Jordan Kiang

Reporter

Kelsey Montgomery

Labels

None

Validator

Kelsey Montgomery

Development Area

None

Release Version History

None

Fix versions

Affects versions

Priority

Major
Configure