Table schema changes

There are a number of use cases where we want table schema changes as part of the lifecycle of a table. The most requested is the easy case of increasing the size of a string column. More complex cases are changing a string column to an int column, or even changing a string column to a file handle. The other use case is appending a partially different dataset to an existing table and having it magically merge and rinse the data into the new table format.

Magic aside, I propose the following:
- In addition to the schema changes currently possible (you can update a table entity with a new set of columns, and we presume that non-matching column ids mean you meant to append or delete those columns), we add a new call to explicitly change the columns of a table. A structure like:
{
  tableId = 1,
  [
    { change = DELETE, columnId = 12 },
    { change = APPEND, columnId = 13 },
    { change = CONVERT, fromColumnId = 14, toColumnId = 15 }
  ]
}
is passed to an async call (because validation can take a long time). If an error occurs, the result is a (truncated!) list of the row values that did not convert. Conversion only happens if all values can be converted to the new type.
- For CSV uploads, we add this same change list to the upload request, which allows a table to be created or modified at the same time the data is appended.
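The all-or-nothing CONVERT rule can be sketched roughly as follows. This is only an illustration of the proposed behaviour, not existing code: the names `convert_column`, `ConversionResult`, and `MAX_REPORTED_FAILURES` are hypothetical, and a string-to-int conversion is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical cap on how many failing values are reported back.
MAX_REPORTED_FAILURES = 3


@dataclass
class ConversionResult:
    converted: Optional[List[int]]   # None when the conversion was rejected
    failures: List[Tuple[int, str]]  # (row index, offending value), capped
    truncated: bool                  # True if more failures exist than reported


def convert_column(
    values: List[str],
    parse: Callable[[str], int] = int,
    max_failures: int = MAX_REPORTED_FAILURES,
) -> ConversionResult:
    """Convert every value, or convert nothing and report a capped failure list."""
    converted: List[int] = []
    failures: List[Tuple[int, str]] = []
    total_failures = 0
    for row_index, value in enumerate(values):
        try:
            converted.append(parse(value))
        except (TypeError, ValueError):
            total_failures += 1
            if len(failures) < max_failures:
                failures.append((row_index, value))
    if total_failures:
        # At least one value did not convert, so the column is left untouched.
        return ConversionResult(None, failures, total_failures > len(failures))
    return ConversionResult(converted, [], False)
```

In this sketch a caller would apply the column change only when `converted` is not `None`; otherwise it would return `failures` (and the `truncated` flag) as the result of the async job.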
Internally, we currently store rows as a series of rowset deltas. We augment that by adding column modifications to that same list of deltas, so that we can replay the data and schema changes in the order they were originally received. (This implies that every column change must have an equivalent SQL modification of a table, which I don't think poses any restrictions.)
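A minimal sketch of what replaying such a mixed list of deltas could look like is given below. The delta shapes (`kind`, `rows`, `columnId`, `fromColumnId`, `toColumnId`) and the `replay` function are assumptions made for illustration, not the actual storage format, and CONVERT is assumed to be a string-to-int conversion that was already validated when the change was accepted.

```python
from typing import Any, Dict, List, Tuple


def replay(deltas: List[Dict[str, Any]]) -> Tuple[List[int], List[Dict[int, Any]]]:
    """Rebuild the column list and row data by applying deltas in arrival order."""
    columns: List[int] = []
    rows: List[Dict[int, Any]] = []
    for delta in deltas:
        kind = delta["kind"]
        if kind == "ROWS":
            # Row data is interpreted against the schema in force at this point.
            for row in delta["rows"]:
                rows.append({col: row.get(col) for col in columns})
        elif kind == "APPEND":
            columns.append(delta["columnId"])
            for row in rows:
                row[delta["columnId"]] = None  # existing rows get an empty cell
        elif kind == "DELETE":
            columns.remove(delta["columnId"])
            for row in rows:
                row.pop(delta["columnId"], None)
        elif kind == "CONVERT":
            src, dst = delta["fromColumnId"], delta["toColumnId"]
            columns[columns.index(src)] = dst  # new column takes the old position
            for row in rows:
                value = row.pop(src)
                row[dst] = None if value is None else int(value)
        else:
            raise ValueError(f"unknown delta kind: {kind}")
    return columns, rows
```

Because each schema change is applied at its position in the log, rows received before a change are transformed by it, while rows received afterwards are simply read against the new columns.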
Some questions:
1. Should we allow nulling out of non-convertible cell values? (Default answer: no, too dangerous.)
2. Should we allow column modifications as part of the CSV, i.e. specially formatted lines at the top of the CSV, before (or instead of) the header? (Default answer: no, it makes the CSV non-standard.)
Let me know if you have questions or concerns, or if this needs a meeting.