syn.getColumns does not terminate

Description

Environment

None

Activity

Show:
Bruce Hoff
February 26, 2018, 6:31 PM

I don't think we should have written a Python client function that takes a list of 'n' column IDs and makes 'n' service calls to get each column model. If you want the columns for a schema the right thing to do is to call getTableColumns() as said. Unless there is a compelling reason to support passing a list of column IDs to getColumns(), I'd prefer to deprecate and remove that part of getColumns(). Another approach is to make the Python client handle being throttled more gracefully, e.g. by slowing down and retrying (perhaps printing a warning message to let the user know why things are slowing down). Perhaps we should do both.

Kenneth Daily
February 26, 2018, 7:05 PM

I mentioned the above for backwards compatibility (as you alluded to). The reason is 'getColumns' does other things as well if something besides a Table Schema Synapse ID is passed in. So, in the case that a schema ID is passed, the appropriate and scalable solution would be to call 'getTableColumns' internally. Then it wouldn't break any existing code.

From the user perspective (in this case, my perspective) used this function because it is alphabetically first in the list when looking for what I want. I want to get columns, and know that most functions are called 'synGetSomething'. I started typing 'synGet' and tab complete, and there near the top of the list is 'synGetColumns' - 'synGetTableColumns' is way down!

Bruce Hoff
February 26, 2018, 7:15 PM

> in the case that a schema ID is passed, the appropriate and scalable solution would be to call 'getTableColumns' internally.

The code appears to do that already.

> The reason is 'getColumns' does other things as well if something besides a Table Schema Synapse ID is passed in.

Agreed. That's why I said "I'd prefer to deprecate and remove that part of getColumns()." (just part, not the whole function). Again, we'd have to be careful not to break existing code.

> I started typing 'synGet' and tab complete ...

Good "usability" feedback, thanks!

Ziming Dong
April 5, 2018, 12:14 AM

Moving this to Critical because it is an infinite loop. Should be an easy fix though because we already do this correctly in getTableColumns()

Explanation of infinite loop below:

So discovered that the timeout is due to an infinite loop of constantly getting the same list of column models. PaginatedColumnModels's name is misleading; it is not actually paginated and will always return the full list of column models, but it does contain the 'totalNumberOfResults' field. We used to check the 'totalNumberOfResults' field and stop trying to get the next page once we have everything, but that field is being deprecated so the client moved away from using it (). Instead, we just use limit and offset until no more results are returned. the /entity/{id}/column api does not support limit and offset (since it always returns everything) so we repeatedly ask for all of the columns in a table until we get throttled.

Kenneth Daily
April 24, 2018, 9:25 PM

verified fixed.

Assignee

Ziming Dong

Reporter

Kenneth Daily

Labels

None

Validator

Kenneth Daily

Development Area

None

Release Version History

py-1.8.1

Fix versions

Priority

Critical
Configure