Is it possible to map Synapse column types to pandas dtypes and enforce them in the view.asDataFrame() call?

Description

Noted that when view.asDataFrame() is called in the Python client, pandas converts columns of type int to float if the column contains NaN or NA values.
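The upcast can be reproduced with pandas alone, without Synapse. The snippet below is only a sketch: the CSV text and column names are made up, and the nullable "Int64" dtype is a pandas feature added after this ticket was filed, shown here as one possible way to keep integers intact.

```python
import io

import pandas as pd

# Made-up CSV standing in for a downloaded view with an integer column.
csv_text = "id,age\n1,34\n2,\n3,51\n"

# Default inference: the empty cell becomes NaN, so "age" is stored as float64.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred.dtypes)

# Requesting the nullable integer dtype keeps the values as integers
# and represents the missing cell as <NA>.
nullable = pd.read_csv(io.StringIO(csv_text), dtype={"age": "Int64"})
print(nullable.dtypes)
```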

I was hoping the backend could improve this by mapping Synapse schema types to pandas dtypes and enforcing those dtypes in the view.asDataFrame() call. ex.

This would benefit users of both the Python and R clients, since we currently go through repetitive type-conversion steps.

Environment

None

Activity

Nick Grosenbacher
September 24, 2018, 10:31 PM

Not related to Python, but the old rSynapseClient handled this, so if consistency with the legacy client is worth considering, we can emulate that behavior.

Nick Grosenbacher
September 24, 2018, 10:27 PM

These issues are all caused by R inferring types in the read.csv function. They could be resolved by mapping Synapse column types to R types.

Kimyen Truong
September 24, 2018, 9:45 PM

, could you link the issues you are working on to this one? Thank you.

Kimyen Truong
September 24, 2018, 9:45 PM

I think I understand the question. It's related to Table/View issues that is working on in the synapser client.

Tables and Views are downloaded as CSV by the client. Client users then want to read the file into a language-specific format to work on the data. For example, Python users want to read the data into a pandas DataFrame, while R users want to read it into an R data.frame. However, the data stored in the backend is a superset of what the client data types can represent. Currently, in both the Python client and synapser, we try to map the data to the "correct" type based on the values of the data, not on the column types in the schema (defined with the Table/View).

Ideally, we want the client to present the data in a pandas DataFrame or an R data.frame that matches the Table/View schema. However, in some cases this will not be possible for a number of reasons, including that a data type in one language doesn't mean the same thing in another.
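For the column types that do map cleanly, a schema-driven read might look roughly like the sketch below. The SYNAPSE_TO_PANDAS mapping and the read_with_schema helper are hypothetical and not part of any client; the schema is passed as a plain dict rather than fetched from Synapse, and any type missing from the mapping falls back to string.

```python
import io

import pandas as pd

# Hypothetical mapping from Synapse column types to pandas dtypes.
# Only types with an obvious pandas counterpart are listed.
SYNAPSE_TO_PANDAS = {
    "STRING": str,
    "INTEGER": "Int64",   # nullable integer, so missing values do not force float
    "DOUBLE": "float64",
}


def read_with_schema(csv_source, schema):
    """Read a CSV using {column name: Synapse column type} to choose dtypes."""
    dtypes = {
        name: SYNAPSE_TO_PANDAS.get(col_type, str)  # unmapped types stay strings
        for name, col_type in schema.items()
    }
    return pd.read_csv(csv_source, dtype=dtypes)


# Made-up schema and data for illustration.
schema = {"code": "STRING", "count": "INTEGER", "score": "DOUBLE"}
csv_text = "code,count,score\n007,3,1.5\n010,,2\n"

df = read_with_schema(io.StringIO(csv_text), schema)
print(df.dtypes)
```

With value-based inference the "code" column would be read as integers and lose its leading zeros; reading it by schema type keeps "007" as a string.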

Possibly, we could provide an option for users to read all data as strings and then convert the column types however they want. has talked to a few key users and this is what they suggested.
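In plain pandas, the "read everything as string" option could look like the following sketch. The CSV content is made up, and how such an option would be exposed in the clients is left open.

```python
import io

import pandas as pd

# Made-up CSV standing in for a downloaded Table/View.
csv_text = "id,age\n1,34\n2,\n3,51\n"

# Read every column as text; no type inference is applied to the values.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)

# The user then converts only the columns they care about.
# "Int64" is the pandas nullable integer dtype, which tolerates missing values.
age = pd.to_numeric(raw["age"]).astype("Int64")
print(age)
```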

We should sync up on this topic and make sure that we are on the same page on the solution.

Meredith Slota
September 24, 2018, 7:46 PM

Do you feel like this ticket as-written is the right thing to do? I'll admit I don't understand it fully.

Assignee

Unassigned

Reporter

Nasim Sanati

Labels

None

Validator

Kenneth Daily

Development Area

None

Release Version History

None

Slack Channel

None

Epic Link

Components

Priority

Minor