Instant Exporting

This document describes the Bridge Instant Exporting service.

 

Scenarios for Bridge Exporter

Example SQS request message for a re-export:

{

  "endDateTime":"2016-10-04T23:59:59Z",

  "startDateTime":"2016-10-03T23:59:59Z",

  "exportType":"DAILY",

  "ignoreLastExportTime":"true",

  "tag":"test exporter"

}

  • re-drive:

    • for redriving tables, set ignoreLastExportTime flag to true;

      • the rest is identical to the re-export logic;

      • requires a code change in ExportWorkerManager;

      • uses the same SQS message as shown in the re-export example above;

    • for redriving records, no change is needed (since record re-drives go through the s3 override path);

  • failed export:

    • a failed export cannot update the export time table in ddb, so the next export task, whatever its type (one-time or daily), will export all records from the last export date time to its given end date time. That range includes every record the failed task should have exported, so a later retry of the failed export will export 0 records;

    • if the retried failed export succeeds before the next new request arrives, it updates lastExportDateTime and the next request proceeds as normal;

    • no code changes are needed;

  • identical export tasks:

    • we assume Exporter is a single-machine server that handles one request at a time, so if it receives two identical requests at the same time it processes them one after the other. Once the first has been exported successfully, lastExportDateTime has already been updated, so the second identical request exports 0 records;

    • no code changes are needed;

  • normal override export:

    • because an override export does not use a time range at all, it does not use lastExportDateTime by default;

    • it does not update the last export date time;

    • it proceeds with the original logic;

  • special cases: delete study:

    • the corresponding entry in the export time table should be removed as well;

    • requires a code change in BridgePF deleteStudy;
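
The failed-export and identical-request scenarios above can be traced with a small in-memory simulation. This is only a sketch: the class `ExportTimeSimulation`, the map standing in for the export time table, and the boundary handling (records after the last export time, up to and including the end time) are assumptions made for illustration and are not taken from the Exporter code. Leaving the table untouched when the requested range is empty is also an assumption of this sketch.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the export time table (studyId -> lastExportDateTime).
final class ExportTimeSimulation {
    private final Map<String, Instant> lastExportTimes = new HashMap<>();

    void seed(String studyId, Instant lastExportDateTime) {
        lastExportTimes.put(studyId, lastExportDateTime);
    }

    Instant lastExportTime(String studyId) {
        return lastExportTimes.get(studyId);
    }

    // Returns the number of records exported for one request.
    // A failed export, or a request whose end time is not after the stored
    // last export time, exports nothing and leaves the table unchanged.
    int export(String studyId, List<Instant> recordTimes, Instant endDateTime, boolean succeeds) {
        Instant start = lastExportTimes.getOrDefault(studyId, Instant.EPOCH);
        if (!succeeds || !endDateTime.isAfter(start)) {
            return 0;
        }
        int exported = 0;
        for (Instant recordTime : recordTimes) {
            // Assumed window: (start, endDateTime]
            if (recordTime.isAfter(start) && !recordTime.isAfter(endDateTime)) {
                exported++;
            }
        }
        // Only a successful export advances lastExportDateTime.
        lastExportTimes.put(studyId, endDateTime);
        return exported;
    }
}
```

Seeding one study, failing an export up to some end time, and then running a later successful export shows the later task picking up the records the failed one missed. Retrying the failed request, or repeating the successful one, then exports 0 records.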

 

General requirements

  1. add an SQS dependency to BridgePF -- refer to UDD for an example;

  2. export only from last export date time to now:

    1. create a table separate from the Study table -- the last export time is updated frequently, which conflicts with how we want the Study table to be used;

    2. the new table contains only two columns: studyId, lastExportDateTime;

  3. every time the exporter finishes exporting, update the new table above;

  4. note: SQS requests will NOT contain startDateTime, but carry an extra field “exportType” to indicate which type the request is -- ‘instant’, ‘daily’ or ‘hourly’ --

    1. so that when we need to re-export (i.e. set ignoreLastExportTime to true), we can determine what time range the exporter needs to query for that request;

    2. normal cases do not use ‘exportType’ at all and always fetch lastExportDateTime from the ddb table;
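
The start-time rule in item 4 can be sketched as follows. This is a hedged illustration, not the Exporter's code: `ExportType`, `StartTimeResolver` and `resolveStart` are hypothetical names, the map stands in for the export time table, and the window lengths are assumptions. The one-day DAILY window matches the re-export example above; the one-hour HOURLY window and especially the INSTANT placeholder are guesses.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

// Hypothetical export types with an assumed look-back window for each.
enum ExportType {
    DAILY(Duration.ofDays(1)),
    HOURLY(Duration.ofHours(1)),
    INSTANT(Duration.ofDays(1)); // placeholder only; the real rule is not specified here

    private final Duration window;

    ExportType(Duration window) {
        this.window = window;
    }

    Duration window() {
        return window;
    }
}

final class StartTimeResolver {
    private StartTimeResolver() {
    }

    // Picks the start of the query range for one study.
    static Instant resolveStart(Map<String, Instant> exportTimeTable, String studyId,
            ExportType exportType, Instant endDateTime, boolean ignoreLastExportTime) {
        // Start time generated from exportType, used for re-exports and as a fallback.
        Instant generatedStart = endDateTime.minus(exportType.window());
        if (ignoreLastExportTime) {
            return generatedStart;
        }
        // Normal case: use lastExportDateTime when the study has an entry.
        Instant lastExportDateTime = exportTimeTable.get(studyId);
        return lastExportDateTime != null ? lastExportDateTime : generatedStart;
    }
}
```

Keeping the generated start time in one place means the re-export path and the "no entry yet" fallback described under Exporter changes cannot drift apart.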

Exporter changes

  1. mostly in RecordIdSourceFactory:

    1. all daily, hourly and instant export requests have the field ‘exportType’;

    2. daily and hourly requests without override have an end date time;

    3. check whether the request has a study whitelist:

      1. if yes: just use the study whitelist as the study id list;

      2. if no: scan the whole Study table to get the study id list;

    4. then get the last export date time from the export time table:

      1. if there is no entry for the given study id: just use the generated start date time as the last export date time;

    5. query items by iterating over the study id list with the given last export date time and end date time;

      1. if endDateTime is before lastExportDateTime, just return an empty list for that study;

  2. BridgeExporterRecordProcessor:

    1. update export time table with new last export date time if the export succeeds (and if ignoreLastExportTime is false);

  3. modify ExportWorkerManager:

    1. build the request with the ignoreLastExportTime flag set if it is a re-drive;
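
The Exporter flow above, from choosing the study list to updating the export time table, can be sketched as below. Everything named here is hypothetical: `ExportPlanSketch`, `planStartTimes` and `recordSuccess` are illustrative, plain maps and lists stand in for the Study table and export time table, and writing endDateTime as the new last export date time is an assumption rather than something taken from RecordIdSourceFactory or BridgeExporterRecordProcessor.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExportPlanSketch {
    private ExportPlanSketch() {
    }

    // Returns studyId -> start time for every study that should be queried.
    // Studies whose last export time is after endDateTime are left out,
    // which corresponds to returning an empty record list for them.
    static Map<String, Instant> planStartTimes(List<String> whitelist, List<String> allStudyIds,
            Map<String, Instant> exportTimeTable, Instant generatedStart, Instant endDateTime) {
        // Use the whitelist when present, otherwise every study id.
        List<String> studyIds = (whitelist != null && !whitelist.isEmpty()) ? whitelist : allStudyIds;
        Map<String, Instant> startByStudy = new LinkedHashMap<>();
        for (String studyId : studyIds) {
            // Fall back to the generated start time when the study has no entry yet.
            Instant start = exportTimeTable.getOrDefault(studyId, generatedStart);
            if (endDateTime.isBefore(start)) {
                continue;
            }
            startByStudy.put(studyId, start);
        }
        return startByStudy;
    }

    // After a successful export, advance the table unless ignoreLastExportTime is set.
    static void recordSuccess(Map<String, Instant> exportTimeTable, Iterable<String> exportedStudyIds,
            Instant endDateTime, boolean ignoreLastExportTime) {
        if (ignoreLastExportTime) {
            return;
        }
        for (String studyId : exportedStudyIds) {
            exportTimeTable.put(studyId, endDateTime); // assumed new lastExportDateTime
        }
    }
}
```

A caller would pass the key set of the plan to `recordSuccess` once the export completes, so studies that were skipped keep their existing entry.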

Scheduler changes

  1. for daily exporting:

    1. add endDateTime;

    2. add field ‘exportType’ set to ‘DAILY’;

    3. remove ‘Date’ field;

  2. for hourly exporting:

    1. remove startDateTime;

    2. add field ‘exportType’ set to ‘HOURLY’;
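
As an illustration of the request shape after these Scheduler changes, the sketch below builds a message body with endDateTime and exportType and no startDateTime. `SchedulerRequestSketch` and `buildRequest` are hypothetical names, the "tag" field is copied from the re-export example, and the hand-built JSON assumes the tag contains no characters that need escaping. The real Scheduler may construct its messages differently.

```java
import java.time.Instant;

final class SchedulerRequestSketch {
    private SchedulerRequestSketch() {
    }

    // Builds a request body without startDateTime; the exporter derives the start itself.
    static String buildRequest(String exportType, Instant endDateTime, String tag) {
        // Instant.toString() yields ISO-8601 in UTC, e.g. 2016-10-04T23:59:59Z.
        return String.format(
                "{\"endDateTime\":\"%s\",\"exportType\":\"%s\",\"tag\":\"%s\"}",
                endDateTime, exportType, tag);
    }
}
```

How endDateTime is chosen for each schedule is left to the caller here, since this section does not specify it.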

 

Bridge Docs and JDK

  1. add the new API in the usual way;

 

BridgePF integration test

  1. add an extra instant exporting test;



Test Cases

  1. normal case: daily export without ignoreLastExportTime:

    1. check if it uses both the date and the datetime range;

    2. lastExportDateTime exists -- check if it uses lastExportDateTime;

    3. lastExportDateTime does not exist -- check if it uses the correct startDateTime;

  2. normal case: one-time export without ignoreLastExportTime:

    1. lastExportDateTime exists -- check if it uses lastExportDateTime;

    2. lastExportDateTime does not exist -- check if it uses the correct startDateTime;

  3. normal case: hourly export without ignoreLastExportTime:

    1. lastExportDateTime exists -- check if it uses lastExportDateTime;

    2. lastExportDateTime does not exist -- check if it uses the correct startDateTime;

  4. special case: daily export with ignoreLastExportTime:

    1. check if it uses the correct startDateTime instead of lastExportDateTime;

    2. check that it does not modify the ddb table;

  5. special case: hourly export with ignoreLastExportTime:

    1. check if it uses the correct startDateTime instead of lastExportDateTime;

    2. check that it does not modify the ddb table;

  6. special case: s3-override:

    1. check that it does not modify the ddb table;

  7. special case: re-drive:

    1. for redriving records -- check that it does not modify the ddb table;

    2. for redriving tables:

      1. check if it uses the correct startDateTime instead of lastExportDateTime;

      2. check that it does not modify the ddb table;

  8. special case: endDateTime is before lastExportDateTime:

    1. check whether it modifies the export time table to endDateTime;

  9. special case: failed export: the only way to test this is manually, in a local environment;