extend table.py to handle dict object

Description

A user should be able to pass a dict object to `Table()` and get back a representation of a Synapse table. For example:
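The original example did not survive on this page. As a rough illustration of the idea only, the sketch below writes a column-oriented dict to a CSV file using the standard library. `dict_to_csv` and the sample data are hypothetical and are not part of synapseclient; the assumption is that a dict-backed table would end up as a local CSV file whose path the user can reference.

```python
import csv
import os
import tempfile


def dict_to_csv(data, path=None):
    """Write {column name: list of values} to a CSV file and return its path.

    Hypothetical helper for illustration; not a synapseclient function.
    """
    # Every column must describe the same number of rows.
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")

    if path is None:
        fd, path = tempfile.mkstemp(suffix=".csv")
        os.close(fd)

    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(data.keys())            # header row: column names
        writer.writerows(zip(*data.values()))   # one CSV row per row index
    return path


# Made-up sample data for the sketch.
data = {"gene": ["TP53", "BRCA1"], "score": [0.91, 0.42]}
csv_path = dict_to_csv(data)
```

Going through a plain CSV file keeps the dict input on the same footing as the existing CSV file path input, and it does not involve Pandas at any point.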

From there, the user should be able to reference the CSV file path and call `store()` and `asRowSet()` (if we continue to support `asRowSet()`):

When querying Synapse, the user should be able to get a dictionary from the query result:
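The original example is missing here as well. A minimal sketch of the reverse direction, assuming the query result is available as CSV text with a header row: `csv_to_dict` is a hypothetical helper, not a synapseclient function, and the sample text is made up.

```python
import csv
import io


def csv_to_dict(handle):
    """Read CSV text with a header row into {column name: list of values}.

    Hypothetical helper for illustration; not a synapseclient function.
    """
    reader = csv.reader(handle)
    header = next(reader, [])               # an empty input yields no columns
    columns = {name: [] for name in header}
    for row in reader:
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns


# Stand-in for a downloaded query result.
query_result = io.StringIO("gene,score\nTP53,0.91\nBRCA1,0.42\n")
result = csv_to_dict(query_result)
```

In this sketch every value comes back as a string. Restoring numeric or date types would presumably have to rely on the table's column types, which the sketch does not model.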

Also, this round trip will not inherit existing Pandas limitations.

Environment

None

Activity

Kimyen Truong
May 25, 2018, 9:38 PM

Per discussion with , we may be able to keep the name `RowSet` and add functionality that lets Python users handle table data in native "dict-like" objects.
TODO: I will look further into the pain points of using RowSet and figure out how Python users naturally use dict-like objects to manipulate rectangular data.

Kimyen Truong
May 14, 2018, 8:58 PM
Edited

After discussing the document above with & , I opened the following issue:

And is blocked by this issue.

Kimyen Truong
May 7, 2018, 7:02 PM
Edited

I'm writing this document to formalize the discussion and record the suggested solution: https://sagebionetworks.jira.com/wiki/spaces/SYNPY/pages/529399809/Table+Interface

Kimyen Truong
May 2, 2018, 12:44 AM
Edited

Per 's suggestion, I talked to multiple Python users to gather feedback on the Table interface in the Python client. The people I have talked to are , , , , (via the Python Slack channel; I still need to sync up with Phil in person).

Below is the summary:

  • To upload a local Pandas DataFrame (and create a Synapse Table), one must type multiple lines:

    This is inconvenient and confusing because the user must manage two separate objects (Schema and Table). It differs from how a user handles a File.

  • A user wants to create a template (a Schema that can be reused to create multiple tables). The current workaround is to get the Schema of an existing table and use its columns to create another Schema.

  • To download a Synapse File, one uses `syn.get()`, which by default returns both the File metadata and the file itself. `syn.get()` on a Synapse Table, however, doesn't download the actual data. Most files are larger than most tables, so a programmatic client user would already expect `syn.get()` to take time.

  • RowSet: most people I talked to have not interacted with RowSet or avoid working with it.

  • Most people do not care about the type of the object, as long as it has the functions they need and behaves consistently with other types of entities.
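To make the template request from the feedback above concrete, here is one possible shape for a reusable column template, sketched in plain Python. `ColumnSpec`, `SchemaTemplate`, `instantiate`, the column type strings, and the parent IDs are all hypothetical and chosen for illustration; none of this is existing synapseclient API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical description of one table column."""
    name: str
    column_type: str


@dataclass(frozen=True)
class SchemaTemplate:
    """Hypothetical reusable set of columns, independent of any one table."""
    columns: Tuple[ColumnSpec, ...]

    def instantiate(self, name, parent):
        # Build fresh column dicts each time so that tables created from
        # the same template never share mutable state.
        return {
            "name": name,
            "parent": parent,
            "columns": [
                {"name": c.name, "columnType": c.column_type}
                for c in self.columns
            ],
        }


template = SchemaTemplate(columns=(
    ColumnSpec("gene", "STRING"),
    ColumnSpec("score", "DOUBLE"),
))

# Placeholder parent IDs, not real Synapse entities.
table_a = template.instantiate("Cohort A", parent="syn000001")
table_b = template.instantiate("Cohort B", parent="syn000002")
```

The point of the sketch is that the column definitions live in one place and each new table only supplies its own name and parent, instead of copying columns out of an existing table's Schema.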

Kimyen Truong
April 26, 2018, 12:51 AM

Note that talking to users does not block me from attempting to make `Table()` behave the same for both types of input (Pandas DataFrame and CSV file path). However, having a clear picture of the functionality we are going to support will help guide the implementation.

Assignee

Unassigned

Reporter

Kimyen Truong

Labels

None

Validator

Kenneth Daily

Development Area

None

Release Version History

None

Slack Channel

None

Priority

Major