Validation step for sync function doesn't provide an informative error when there is an issue with tab delimitation in the manifest file

Description

Some text editors don't map the Tab key to the Unicode tab character (U+0009), so the files they produce are not parsed correctly by the sync validator. It would be useful to tell users when their delimited file does not actually contain tabs (e.g. if the editor inserts multiple spaces instead of a tab), so that they can fix it.
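A minimal sketch of the kind of check that could produce a more informative error, assuming the manifest's header row is expected to contain at least two tab-separated columns. The function name `check_tab_delimited` is hypothetical and not part of synapseutils; the column names in the usage lines are illustrative only.

```python
import re


# Hypothetical helper (not part of synapseutils): inspects the header line of a
# manifest that is expected to be tab-delimited and explains what went wrong.
def check_tab_delimited(text: str) -> None:
    """Raise ValueError with an actionable message if the header has no tabs."""
    lines = text.splitlines()
    header = lines[0] if lines else ""
    if "\t" in header:
        return  # at least one real tab (U+0009): looks tab-delimited
    if re.search(r" {2,}", header):
        # Runs of spaces are a common sign the editor replaced tabs with spaces.
        raise ValueError(
            "Manifest header contains no tab characters but has runs of "
            "spaces; your editor may be inserting spaces instead of tabs. "
            "Please re-save the file with real tab characters."
        )
    raise ValueError(
        "Manifest header contains no tab characters; the manifest must be "
        "tab-delimited."
    )
```

For example, `check_tab_delimited("path\tparent\n/a.txt\tsyn123\n")` returns silently, while the same header typed with four spaces instead of a tab raises the error that mentions spaces. The check only looks at the header, which keeps it cheap and gives a targeted hint before the full parse runs.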

Environment

None

Activity

Kenneth Daily
January 11, 2018, 9:23 PM

Also worth noting in favor of solution #1: comma-separated is the default output of major spreadsheet editors, which could reduce confusion and the burden on annotation and data contributors.

Kenneth Daily
January 11, 2018, 8:46 PM

I would advocate for two solutions:

1. Use commas, not tabs. There is less ambiguity in the editor since a comma is not a 'hidden' character. There are probably not enough people using this yet for it to be a badly breaking change, but usage stats would need to confirm that.

2. We rely on pandas.read_csv to read in the file with a strict delimiter set (sep="\t"):

https://github.com/Sage-Bionetworks/synapsePythonClient/blob/master/synapseutils/sync.py#L196

We could leave the delimiter unset (sep=None) and let pandas use Python's csv.Sniffer to infer it. That has its own issues. We could still require and document that tabs (or commas) should be used; for anything else, YMMV.
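A rough sketch combining both suggestions, assuming the validator would accept either tabs or commas: `csv.Sniffer().sniff()` takes a `delimiters` argument that restricts the candidates, so anything other than a tab or comma fails with a clear message instead of a confusing single-column parse. The name `detect_manifest_delimiter` and the 20-line sample size are hypothetical choices, not existing synapseutils behavior.

```python
import csv

# Only the documented delimiters are considered (assumption: tab or comma).
ALLOWED_DELIMITERS = "\t,"


# Hypothetical helper (not part of synapseutils).
def detect_manifest_delimiter(text: str) -> str:
    """Return the manifest's delimiter ('\\t' or ','), or raise ValueError."""
    sample = "\n".join(text.splitlines()[:20])  # sniff only the first rows
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=ALLOWED_DELIMITERS)
    except csv.Error:
        # Sniffer could not find a consistent tab or comma in the sample.
        raise ValueError(
            "Could not detect a tab or comma delimiter in the manifest. "
            "If your editor inserts spaces instead of tabs, re-save the "
            "file with real tab characters or as comma-separated."
        ) from None
    return dialect.delimiter
```

The returned value could then be passed explicitly as `sep=` to `pandas.read_csv`, so pandas keeps a strict delimiter while the sniffing step is limited to the two supported formats.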


Assignee

Jordan Kiang

Reporter

Ben Logsdon

Labels

Validator
