This is the documentation for the Bridge Instant Exporting service.

...

    1. Migration strategy:

      1. As a bootstrapping step, modify the existing code so that the daily and hourly exporters update lastExportDateTime for each exported study. This ensures the DDB exportTime table contains the correct, most up-to-date last export date time;

      2. Also, add two new fields to the SQS request: "exportType" and "ignoreLastExportTime":

        1. exportType: specifies what kind of task the request is for: DAILY, HOURLY, INSTANT or s3override (note: for s3override this field is null);
        2. ignoreLastExportTime: in v1 Instant Exporting, it determines whether the exporter modifies the exportTime table. If it is set to true (as for re-export and table re-drive), the exporter does not modify that table at all;
      3. Note: for v1, we need startDateTime and endDateTime for both DAILY and HOURLY, but we will not remove the 'date' field from the daily export, to limit side effects;
      4. Then we proceed with the migration. The next time it exports under v2 instant exporting, the exporter will be able to use the correct time range;

    2. normal daily exporting:

      1. get the study id list by scanning the ddbStudyTable (extract the study id from each study), so that the list is guaranteed to contain the newest studies;

      2. look up the last export date time for each study id in the export time table and put the study id into a new map with the study id as key and the last export date time as value;

        1. if there is no such study in the export time table, determine the startDateTime from the ‘exportType’ field in the request:

          1. if it is ‘daily’, set startDateTime to 24 hours before the given endDateTime

      3. then rangekey query the studyUploadedOnIndex and return records to export;

      4. finally update the export time table with the new export date time for each study;

      5. Note: if a one-time export runs in the middle of the day, it updates the lastExportDateTime value in the exportTimeTable like every other export task, so the daily export will only query records from that lastExportDateTime to the given endDateTime. The daily export updates the exportTimeTable in turn, so every later export task only needs to query from the previous endDateTime (stored as lastExportDateTime in the exportTimeTable) to its own endDateTime;

  • Daily export SQS message example:

    {

      "endDateTime":"2016-10-04T23:59:59Z",

      "exportType":"DAILY",

      "tag":"test exporter"

    }
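The start-time lookup in the steps above can be sketched roughly as follows. This is a minimal illustration, not the exporter's actual code: the class and method names are hypothetical, a plain Map stands in for the DDB export time table, and the one-hour HOURLY default is taken from the hourly section of this page.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; the names are illustrative and not taken from Bridge Exporter. */
final class ExportStartTimeResolver {
    /** Default look-back window used when a study has no entry in the export time table. */
    static Duration defaultWindow(String exportType) {
        switch (exportType) {
            case "DAILY":
                return Duration.ofHours(24);
            case "HOURLY":
                return Duration.ofHours(1);
            default:
                throw new IllegalArgumentException("No default window for " + exportType);
        }
    }

    /** Builds the studyId -> startDateTime map; exportTimeTable is a stand-in for the DDB table. */
    static Map<String, Instant> resolveStartTimes(List<String> studyIds,
            Map<String, Instant> exportTimeTable, String exportType, Instant endDateTime) {
        Map<String, Instant> startTimes = new HashMap<>();
        for (String studyId : studyIds) {
            Instant lastExport = exportTimeTable.get(studyId);
            // Fall back to the default window only when the study has never been exported.
            startTimes.put(studyId, lastExport != null
                    ? lastExport
                    : endDateTime.minus(defaultWindow(exportType)));
        }
        return startTimes;
    }
}
```

With an endDateTime of 2016-10-04T23:59:59Z and exportType DAILY, a study missing from the table would start at 2016-10-03T23:59:59Z, while a study already in the table would start at its stored lastExportDateTime.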

    3. normal hourly exporting (will have a study whitelist):

      1. look up the last export date time for each study id in the export time table and put the study id into a new map with the study id as key and the last export date time as value;

        1. if there is no such study in the export time table, look up the ‘exportType’ field in the SQS request: if it is ‘hourly’, set startDateTime to 1 hour before the endDateTime;

      2. then rangekey query the studyUploadedOnIndex and return records to export;

      3. finally update the export time table with the new export date time for each study;

      4. Note: a scheduled one-hour export may run after the export scheduled to follow it, e.g. the export for 3-4pm runs first and then the export for 2-3pm. Since each export covers all records since the last export date time, the 3-4pm export will export all records from 2-4pm, and the out-of-order 2-3pm export will export nothing if it runs (it does not throw an exception). The same applies to daily export;

      5. Note: similarly, if a one-time export happens between two one-hour exports, e.g. a one-hour export for 1-2pm, then a one-time export at 2:40pm, and then a one-hour export for 2-3pm, the second one-hour export will export all records from 2:40pm to 3pm but no records before 2:40pm;


    Hourly export SQS request example:

    {

      "endDateTime":"2016-10-04T23:59:59Z",

      "exportType":"HOURLY",

      "studyWhitelist":["api"],

      "tag":"test exporter"

    }
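The out-of-order and one-time cases in the two notes above can be checked with a small simulation. This is only a sketch under assumptions: the class name is hypothetical, an in-memory Map stands in for the export time table, and leaving the table untouched when there is nothing to export is an assumption, since the page only says that nothing is exported and no exception is thrown.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of how lastExportDateTime drives the query range. */
final class ExportTimeTableSimulation {
    // Stand-in for the DDB export time table: studyId -> lastExportDateTime.
    private final Map<String, Instant> exportTimeTable = new HashMap<>();

    /**
     * Returns [start, end] of the range that would be queried, or an empty list
     * when the last export already covers endDateTime.
     */
    List<Instant> export(String studyId, Instant defaultStart, Instant endDateTime) {
        Instant start = exportTimeTable.getOrDefault(studyId, defaultStart);
        if (!start.isBefore(endDateTime)) {
            // Out-of-order request: export nothing and do not throw.
            // Assumption: the table is left unchanged in this case.
            return Collections.emptyList();
        }
        exportTimeTable.put(studyId, endDateTime);
        return Arrays.asList(start, endDateTime);
    }
}
```

Running exports for 1-2pm, then 3-4pm, then 2-3pm against one simulation instance yields the ranges 1-2pm, 2-4pm, and an empty list, which matches the first note. Replacing the middle call with a one-time export ending at 2:40pm makes the following 2-3pm export cover 2:40pm to 3pm, as in the second note.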

    • one-time exporting (only one study id in the study whitelist):

      • Data workflow for instant-exporting

      • BSM: an instant export button in BSM; clicking it exports only the current study’s data to Synapse;

      • BridgePF:

        • API: POST /v3/instantExport

          • both RESEARCHER and DEVELOPER can call this API

        • Controller: InstantExportController extends BaseController

          • method: setInstantExportService(InstantExportService instantExportService);

          • method: requestInstantExport(String endDateTimeStr);

            • return Result object;

        • Service: interface InstantExportService

        • Service: InstantExportViaSqsService implements InstantExportService:

          • method: void export(@Nonnull StudyIdentifier studyIdentifier, @Nonnull DateTime endDateTime);

          • logic:

            • wrap the current studyId in a JsonArray node, put it into a JSON node, and send it to the given SQS URL, as in the example below:


    One-time export SQS message example:

    {

      "studyWhitelist":["api"],

      "exportType":"INSTANT",

      "tag":"test exporter"

    }

        • Config:

          • add a bean ddbExportTimeTable

      • Bridge Exporter:

        • look up the last export date time for the given study id in the export time table and use it as startDateTime;

        • put this study id into a new map with the study id as key and the last export date time as value;

          • if there is no such study in the export time table, use a default startDateTime as the map value (such as midnight of the given endDateTime);

        • set the query endDateTime to 1 minute before the current time, to avoid clock skew issues in distributed systems;

        • then rangekey query the studyUploadedOnIndex and return records to export

        • finally update export time table with new study export date time;
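The time range used for a one-time export, as described in the Bridge Exporter steps above, might be computed along these lines. The names are hypothetical, and treating "midnight of the given endDateTime" as UTC midnight is an assumption, since the page does not name a time zone.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper for the one-time (instant) export time range. */
final class InstantExportRange {
    /** End of the query range: one minute before "now", to tolerate clock skew. */
    static Instant endDateTime(Instant now) {
        return now.minus(Duration.ofMinutes(1));
    }

    /** Default start when the study has no entry yet. Assumption: midnight means UTC midnight. */
    static Instant defaultStart(Instant endDateTime) {
        return endDateTime.truncatedTo(ChronoUnit.DAYS);
    }

    /** Uses the stored lastExportDateTime when present, otherwise the default start. */
    static Instant startDateTime(Instant lastExportDateTime, Instant endDateTime) {
        return lastExportDateTime != null ? lastExportDateTime : defaultStart(endDateTime);
    }
}
```

Passing the current time in as a parameter, rather than calling Instant.now() inside the helper, keeps the sketch deterministic and easy to test.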

    • re-export:

      • set the ignoreLastExportTime flag to true;

      • add an optional startDateTime field to indicate the time range;

      • startDateTime cannot coexist with exportType in one request;

      • note: if no startDateTime is given, the default start date time is derived from the given exportType;

      • for daily re-export:

        • similar to normal daily export, but instead of looking up lastExportDateTime, it uses the given startDateTime or, if none is given, sets startDateTime to 24 hours before the given endDateTime;

      • for hourly re-export:

        • similar to normal hourly export, but instead of looking up lastExportDateTime, it uses the given startDateTime or, if none is given, sets startDateTime to 1 hour before the given endDateTime;

      • there is no use case for one-time re-export;

      • then rangekey query the studyUploadedOnIndex and return records to export;

      • do not update the export time table at all;

Re-export SQS request message example:

{

  "endDateTime":"2016-10-04T23:59:59Z",

  "startDateTime":"2016-10-03T23:59:59Z",

  "exportType":"DAILY",

  "ignoreLastExportTime":"true",

  "tag":"test exporter"

}
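The re-export rules above could be sketched as below. This is an illustration only: the class and method names are hypothetical, the request fields are passed as plain parameters rather than parsed from the SQS message, and rejecting INSTANT is an assumption based on the statement that one-time re-export has no use case.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical planner for re-export requests (ignoreLastExportTime set to true). */
final class ReExportPlanner {
    /** Uses the requested startDateTime if present, otherwise a default derived from exportType. */
    static Instant startDateTime(Instant requestedStart, String exportType, Instant endDateTime) {
        if (requestedStart != null) {
            return requestedStart;
        }
        switch (exportType) {
            case "DAILY":
                return endDateTime.minus(Duration.ofHours(24));
            case "HOURLY":
                return endDateTime.minus(Duration.ofHours(1));
            default:
                // Assumption: other export types are rejected for re-export.
                throw new IllegalArgumentException("Re-export not supported for " + exportType);
        }
    }

    /** The export time table is only updated when lastExportDateTime is not being ignored. */
    static boolean shouldUpdateExportTimeTable(boolean ignoreLastExportTime) {
        return !ignoreLastExportTime;
    }
}
```

For the request shown above, the explicit startDateTime of 2016-10-03T23:59:59Z is used as given, and the export time table is left untouched.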

  • re-drive:

    • for redriving tables, set the ignoreLastExportTime flag to true;

      • the logic is then identical to re-export;

      • change the code in ExportWorkerManager;

      • the SQS message is identical to the one shown for re-export;

    • for redriving records, do not change anything (since it goes with s3override);

  • failed export:

...