Special values in Synapse
This document captures how special values such as None, NULL, NA, NaN, Inf and -Inf are handled by PLFM and by the various Synapse clients, as of October 2017.
Open question: should this information also be published in the REST API and client documentation, and kept up to date there?
Background
Several Synapse features support floating point values, such as Entity annotations, Submission annotations, and Tables.
Because File Views and Project Views display the annotations of Files and Projects, Entity annotation types are converted to Table column types.
Java
PLFM, the Synapse backend, is written in Java, so it depends on what Java supports.
Java supports null for all reference types (including the boxed Double and Long), but not for primitive types such as double.
Java has Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, and Double.NaN.
Python
Python uses None (NoneType) for NULL values.
Python supports NaN, Inf and -Inf for floating point values.
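The following standard-library illustration shows the distinction: `None` is a separate object with its own type. NaN and the infinities are ordinary `float` values with their own comparison rules.

```python
import math

missing = None                 # Python's NULL
nan = float("nan")             # NaN, Inf and -Inf are ordinary floats
pos_inf = float("inf")
neg_inf = float("-inf")

print(type(missing).__name__, type(nan).__name__)  # NoneType float

# NaN never compares equal, not even to itself, so test it with math.isnan().
print(nan == nan, math.isnan(nan))                 # False True

# Infinities compare normally; math.isinf() detects either sign.
print(math.isinf(pos_inf), math.isinf(neg_inf))    # True True
print(neg_inf < 0 < pos_inf)                       # True

# None is not a number: arithmetic on it fails instead of propagating.
try:
    missing + 1
except TypeError:
    print("None + 1 raises TypeError")
print(nan + 1)                                     # nan
```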
R
R uses NULL for NULL values.
R supports NaN, Inf, and -Inf for numeric values.
R also uses NA for "Not Available" (missing) values.
References:
- http://www.residentmar.io/2016/06/12/null-and-missing-data-python.html
- https://stat.ethz.ch/R-manual/R-devel/library/base/html/NA.html
Annotations
API
At the API level, an annotation on an entity can have the following types:
- String
- byte[]
- Long
- Double
- Date
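Of the five types above, only Double is floating point. NaN, Inf and -Inf therefore fall under Double, and None has no type of its own. The sketch below is a hypothetical helper, not synapseclient code. It illustrates one way a Python value could map onto these types.

```python
import datetime

def annotation_type(value):
    """Hypothetical helper: name the Entity annotation type a Python value maps to.

    Illustration only; this is not how synapseclient decides.
    Returns None for None, because NULL carries no type.
    """
    if value is None:
        return None
    if isinstance(value, str):
        return "String"
    if isinstance(value, (bytes, bytearray)):
        return "byte[]"
    if isinstance(value, int):              # note: bool is a subclass of int
        return "Long"
    if isinstance(value, float):            # includes NaN, Inf and -Inf
        return "Double"
    if isinstance(value, datetime.date):    # datetime.datetime is a subclass
        return "Date"
    raise TypeError(f"no annotation type for {value!r}")

for v in [None, "abc", b"\x00", 42, float("nan"), float("-inf"), datetime.date(2017, 10, 20)]:
    print(repr(v), "->", annotation_type(v))
```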
Client Behaviors
The experiments below were run on Double annotations.
Client | Empty value | NULL | None | NA | NaN | Inf | -Inf | Date tested |
---|---|---|---|---|---|---|---|---|
curl (repo-201) | PUT, GET (empty list) | PUT, GET | NOT APPLICABLE | NOT APPLICABLE | PUT, GET | PUT, GET ("Infinity" or "+Infinity") | PUT, GET ("-Infinity") | 10/19/2017 |
web client (201) | PUT, GET | PUT | NOT APPLICABLE | NOT APPLICABLE | PUT, GET | PUT, GET ("Infinity" or "+Infinity") | PUT, GET ("-Infinity") | 10/20/2017 |
Python (synapseclient 1.7.2) | NOT APPLICABLE | NOT APPLICABLE | PUT, GET (becomes a string) | NOT APPLICABLE | PUT, GET | PUT, GET | PUT, GET | 10/20/2017 |
old R client | NOT APPLICABLE | PUT | NOT APPLICABLE | PUT | PUT | PUT | PUT | 10/23/2017 |
synapser | NOT APPLICABLE | PUT | NOT APPLICABLE | PUT, GET (becomes "None") | PUT, GET | PUT, GET | PUT, GET | 10/23/2017 |
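In the curl and web client rows, infinities travel as the strings "Infinity", "+Infinity" and "-Infinity". These match the spellings Java's `Double.toString` produces and `Double.parseDouble` accepts. The sketch below normalizes such values on the Python side. `parse_double_values` is a hypothetical helper, not part of any client. It relies on Python's `float()` already accepting these spellings.

```python
def parse_double_values(values):
    """Hypothetical helper: normalize the values of one Double annotation.

    Special values may arrive as Java-style strings such as "Infinity",
    "+Infinity", "-Infinity" or "NaN"; float() accepts all of these
    (case-insensitively) as well as ordinary numbers. float(None) would
    raise TypeError, so NULL is passed through unchanged. An empty list
    stays empty.
    """
    return [None if v is None else float(v) for v in values]

parsed = parse_double_values(["Infinity", "+Infinity", "-Infinity", "NaN", 2.5])
print(parsed)   # [inf, inf, -inf, nan, 2.5]
```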
Submission Annotations
API
At the API level, an annotation on a submission has the following types:
- StringAnnotation
- DoubleAnnotation
- LongAnnotation
Client Behaviors
The experiments below were run on DoubleAnnotation.
Client | Empty value | NULL | None | NA | NaN | Inf | -Inf | Date tested |
---|---|---|---|---|---|---|---|---|
curl (repo-201) | PUT, GET ('value' is not set) | PUT, GET | NOT APPLICABLE | NOT APPLICABLE | PUT, GET | PUT, GET ("Infinity" or "+Infinity") | PUT, GET ("-Infinity") | 10/19/2017 |
web client (201) | Not supported | Not supported | Not supported | Not supported | Not supported | Not supported | Not supported | 10/20/2017 |
Python (synapseclient 1.7.2) | PUT, GET | NOT APPLICABLE | PUT, GET (becomes empty) | NOT APPLICABLE | PUT, GET | PUT, GET ("Infinity" or "+Infinity") | PUT, GET ("-Infinity") | 10/23/2017 |
old R client | PUT | PUT | NOT APPLICABLE | PUT | PUT | PUT | PUT | 10/23/2017 |
synapser | PUT, GET | PUT | NOT APPLICABLE | PUT, GET (becomes empty) | PUT, GET | PUT, GET ("Infinity" or "+Infinity") | PUT, GET ("-Infinity") | 10/23/2017 |
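The REST API exchanges these annotations as JSON, and strict JSON (RFC 8259) has no literals for NaN or the infinities. The sketch below uses Python's standard `json` module to show what a client emits by default. The `double_annotation` helper is hypothetical. Its `key`/`value` field names are assumptions, chosen to echo the curl result where an empty value left `value` unset.

```python
import json

def double_annotation(key, value):
    """Hypothetical helper: build a DoubleAnnotation-shaped dict.

    The field names "key" and "value" are assumptions for illustration.
    A None value is left out entirely rather than sent as null.
    """
    annotation = {"key": key}
    if value is not None:
        annotation["value"] = value
    return annotation

# By default the json module writes NaN/Inf as bare tokens that strict JSON forbids.
print(json.dumps(double_annotation("score", float("inf"))))
# {"key": "score", "value": Infinity}

# allow_nan=False makes the same call fail loudly instead.
try:
    json.dumps(double_annotation("score", float("nan")), allow_nan=False)
except ValueError:
    print("NaN rejected under allow_nan=False")

print(json.dumps(double_annotation("score", None)))
# {"key": "score"}
```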
Table
API
At the API level, a table column has the following types:
- STRING
- DOUBLE
- INTEGER
- BOOLEAN
- DATE
- FILEHANDLEID
- ENTITYID
- LINK
- LARGETEXT
- USERID
Client Behaviors
The experiments below were run on DOUBLE columns through the following services:
- POST /entity/{id}/table/append/async/start
- GET /entity/{id}/table/append/async/get/{asyncToken}
- POST /entity/{id}/table/query/async/start
- GET /entity/{id}/table/query/async/get/{asyncToken}
Client | Empty value | NULL | None | NA | NaN | Inf | -Inf | Date tested |
---|---|---|---|---|---|---|---|---|
curl (repo-201) | UPDATE, QUERY ("" becomes NULL) | UPDATE, QUERY | NOT APPLICABLE | NOT APPLICABLE | UPDATE, QUERY | UPDATE, QUERY (becomes "Infinity" or "+Infinity") | UPDATE, QUERY (becomes "-Infinity") | 10/23/2017 |
web client (201) | UPDATE, QUERY | UPDATE | NOT APPLICABLE | NOT APPLICABLE | UPDATE, QUERY | UPDATE, QUERY ("Infinity" or "+Infinity") | UPDATE, QUERY ("-Infinity") | 10/20/2017 |
Python (synapseclient 1.7.2) | NOT APPLICABLE | NOT APPLICABLE | UPDATE, QUERY | NOT APPLICABLE | UPDATE, QUERY | UPDATE, QUERY | UPDATE, QUERY | 10/23/2017 |
old R client | | | | | | | | |
synapser | NOT APPLICABLE | NOT APPLICABLE | NOT APPLICABLE | UPDATE, QUERY (becomes NULL) | UPDATE, QUERY | UPDATE, QUERY (becomes "Infinity" or "+Infinity") | UPDATE, QUERY (becomes "-Infinity") | 10/23/2017 |
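Round-trip comparison is the natural way to check UPDATE/QUERY results, but plain `==` is not enough. NaN never equals itself, and a row can come back changed or missing. The sketch below shows hypothetical helpers for DOUBLE cells, with None standing for NULL. If a sent "" is expected to come back as NULL (as in the curl row), normalize it to None before comparing.

```python
import math
from itertools import zip_longest

_ABSENT = object()  # marks a row present on only one side of the round trip

def same_double(sent, received):
    """Hypothetical helper: compare one DOUBLE cell before and after a round trip.

    None stands for NULL, NaN counts as equal to NaN, and infinities
    compare by sign through ordinary ==.
    """
    if sent is None or received is None:
        return sent is None and received is None
    if math.isnan(sent) or math.isnan(received):
        return math.isnan(sent) and math.isnan(received)
    return sent == received

def round_trip_mismatches(sent, received):
    """Return the indexes of rows that changed, appeared, or disappeared."""
    return [
        i
        for i, (s, r) in enumerate(zip_longest(sent, received, fillvalue=_ABSENT))
        if s is _ABSENT or r is _ABSENT or not same_double(s, r)
    ]

print(round_trip_mismatches([1.0, float("nan"), None], [1.0, float("nan"), None]))  # []
print(round_trip_mismatches([float("inf"), None], [float("inf")]))                 # [1]
```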
The experiments below were run on DOUBLE columns through the following CSV upload and download services:
- POST /entity/{id}/table/upload/csv/async/start
- GET /entity/{id}/table/upload/csv/async/get/{asyncToken}
- POST /entity/{id}/table/download/csv/async/start
- GET /entity/{id}/table/download/csv/async/get/{asyncToken}
Client | Empty value | NULL | None | NA | NaN | Inf | -Inf | Date tested |
---|---|---|---|---|---|---|---|---|
curl (repo-201) | | | | | | | | |
web client (201) | UPDATE, QUERY (empty rows are ignored) | UPDATE | NOT APPLICABLE | NOT APPLICABLE | UPDATE, QUERY | UPDATE, QUERY (query returns "Infinity") | UPDATE, QUERY (when retrieved, returns "-Infinity") | 10/20/2017 |
Python (synapseclient 1.7.2) | UPDATE (can be uploaded from CSV, but cannot be read with Pandas) | NOT APPLICABLE | UPDATE (a row containing only None is lost in Pandas; None becomes NaN in a float column) | NOT APPLICABLE | UPDATE, QUERY (Pandas writes NaN to CSV as an empty value, but can read NaN from CSV) | UPDATE, QUERY (float inf becomes "Infinity") | UPDATE, QUERY (float -inf becomes "-Infinity") | 10/20/2017 |
old R client | NOT APPLICABLE | NOT APPLICABLE | NOT APPLICABLE | UPDATE, QUERY | UPDATE, QUERY | UPDATE, QUERY | UPDATE, QUERY | 10/23/2017 |
synapser | NOT APPLICABLE | NOT APPLICABLE | NOT APPLICABLE | UPDATE, QUERY (cannot save a CSV that contains 'NA'; however, 'NA' in an R data.frame made the round trip) | UPDATE, QUERY | UPDATE, QUERY | UPDATE, QUERY | 10/31/2017 |
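The CSV results depend on how each tool spells missing and special values. Python's `csv` module writes `None` as an empty field and turns floats into text such as `nan`, `inf` and `-inf`. The sketch below uses only the standard library. It writes the spellings seen in the tables above instead: an empty field for NULL, plus `NaN`, `Infinity` and `-Infinity`. `cell_token` and `to_csv` are hypothetical helpers. Accepting these tokens on upload is an assumption; the tables only confirm "Infinity"/"-Infinity" on the query side.

```python
import csv
import io
import math

def cell_token(value):
    """Hypothetical helper: spell one DOUBLE cell for a CSV upload.

    Assumption: an empty field means NULL and the Java-style spellings
    NaN, Infinity and -Infinity are accepted on upload.
    """
    if value is None:
        return ""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    return repr(value)

def to_csv(rows):
    """Write (id, score) pairs as CSV text; the id column keeps rows non-empty."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "score"])
    for row_id, score in rows:
        writer.writerow([row_id, cell_token(score)])
    return buffer.getvalue()

rows = [(1, 1.5), (2, None), (3, float("nan")), (4, float("inf")), (5, float("-inf"))]
print(to_csv(rows), end="")
# id,score
# 1,1.5
# 2,
# 3,NaN
# 4,Infinity
# 5,-Infinity
```

The `id` column is always filled because the web client and Pandas rows above show rows made only of empty values being dropped.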