Demographics Values Validation

Jira issue: BRIDGE-3030

Overview

Researchers and app developers can submit custom configurations for demographics validation. Submitted demographics are validated against these configurations, if they exist. This validation is separate from, and occurs after, the standard Bridge validation to determine if input is malformed.

Validation is available for both app-level and study-level demographics.

Each combination of app, study, and demographics category requires a separate configuration. The configuration specifies the type of validation to perform (currently “enum” or “number_range”) and the additional data that validation requires.

“Enum” validation checks that every submitted value for the specified category matches a value provided in the configuration. “Number_range” validation checks that every submitted value for the specified category is a valid number that is no less than an optional minimum and no greater than an optional maximum. If the minimum or maximum is not specified, that side of the range is unbounded.

When demographics are rejected by validation, no error is returned to the caller; the demographics are still processed and stored normally. Instead, each invalid value has an error message stored in its “invalidity” field. When demographics are exported to Synapse, that error message is stored in the “demographicInvalidity” column of the Participant Versions Demographics table.
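
The non-rejecting behavior described above can be sketched as follows. This is a minimal illustration and not Bridge's implementation: the function names, the dict layout of a value record, and the error message text are all assumptions. Only the “invalidity” field name comes from the description above.

```python
from typing import Callable, Optional


def annotate_values(values: list, check: Callable[[str], Optional[str]]) -> list:
    """Return one record per value; invalid values carry an error message.

    `check` returns None for a valid value, or an error message otherwise.
    Nothing is raised and nothing is dropped: every value is kept.
    """
    records = []
    for value in values:
        record = {"value": value}
        message = check(value)
        if message is not None:
            # Hypothetical record layout; only the "invalidity" name is from the doc.
            record["invalidity"] = message
        records.append(record)
    return records


def allowed_only(value: str) -> Optional[str]:
    # Hypothetical check and message text, for illustration only.
    return None if value in {"Asian", "White"} else "invalid enum value"


records = annotate_values(["Asian", "martian"], allowed_only)
```

Both values remain in `records`; only the record for "martian" gains an “invalidity” entry.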

API

Endpoints exist for saving, fetching, and deleting validation configs.

POST /v5/studies/{studyId}/participants/demographics/validation/{categoryName}

Save a validation config for validating study-level demographics.

POST /v3/participants/demographics/validation/{categoryName}

Save a validation config for validating app-level demographics.

GET /v5/studies/{studyId}/participants/demographics/validation/{categoryName}

Fetch a validation config used for validating study-level demographics.

GET /v3/participants/demographics/validation/{categoryName}

Fetch a validation config used for validating app-level demographics.

DELETE /v5/studies/{studyId}/participants/demographics/validation/{categoryName}

Delete a validation config used for validating study-level demographics.

DELETE /v3/participants/demographics/validation/{categoryName}

Delete a validation config used for validating app-level demographics.
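
The six endpoints share two path shapes, one per level. The sketch below builds the URL for either level; the same URL would then be used with POST, GET, or DELETE. The host name and the helper name `validation_config_url` are hypothetical, and authentication is omitted entirely.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical host; substitute the Bridge server you actually use.
BASE_URL = "https://bridge.example.org"


def validation_config_url(category_name: str, study_id: Optional[str] = None) -> str:
    """Build the validation-config URL for a category.

    With a study_id the study-level (v5) path is used; without one,
    the app-level (v3) path is used.
    """
    category = quote(category_name, safe="")
    if study_id is not None:
        study = quote(study_id, safe="")
        return (
            f"{BASE_URL}/v5/studies/{study}"
            f"/participants/demographics/validation/{category}"
        )
    return f"{BASE_URL}/v3/participants/demographics/validation/{category}"


study_url = validation_config_url("ethnicity", study_id="study1")
app_url = validation_config_url("year-of-birth")
```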

Model

The data for the configuration should be an object with “validationType” and “validationRules.”

If “validationType” is “enum,” “validationRules” should be an object mapping language codes to arrays of possible values. If “validationType” is “number_range,” “validationRules” should be an object with optional “min” and “max” specifying the bounds of the range.
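
As a rough illustration of the model described above, the following sketch checks the shape of a config before it is submitted. It is a client-side approximation under stated assumptions: the function name `check_config_shape` and the problem messages are hypothetical, and the server's own checks may be stricter or differ in detail.

```python
def check_config_shape(config: dict) -> list:
    """Return a list of problems with a validation config; empty means OK."""
    problems = []
    validation_type = config.get("validationType")
    rules = config.get("validationRules")
    if not isinstance(rules, dict):
        return ["validationRules must be an object"]
    if validation_type == "enum":
        # Rules map language codes to arrays of allowed values.
        for language, values in rules.items():
            if not isinstance(values, list):
                problems.append(f"{language}: expected an array of values")
    elif validation_type == "number_range":
        for bound in ("min", "max"):
            # Both bounds are optional; when present they must be numbers.
            if bound in rules and (
                isinstance(rules[bound], bool)
                or not isinstance(rules[bound], (int, float))
            ):
                problems.append(f"{bound}: expected a number")
    else:
        problems.append(f"unknown validationType: {validation_type!r}")
    return problems


enum_problems = check_config_shape(
    {"validationType": "enum", "validationRules": {"en": ["Asian", "White"]}}
)
range_problems = check_config_shape(
    {"validationType": "number_range", "validationRules": {"min": "1900"}}
)
```

Here `enum_problems` is empty, while `range_problems` reports that “min” was given as a string rather than a number.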

Enum validation example

The following is POSTed to /v5/studies/{studyId}/participants/demographics/validation/{categoryName} with categoryName = "ethnicity".

{
  "validationType": "enum",
  "validationRules": {
    "en": [
      "American Indian/Alaska Native",
      "Asian",
      "Native Hawaiian or Other Pacific Islander",
      "Black or African American",
      "White",
      "More Than One Race",
      "Prefer Not to Answer"
    ]
  }
}

When study-level demographics are submitted, each submitted demographic’s category name is used to see if there is a validation config matching the app, study, and category. If there is, each value in that submitted demographic is checked to make sure that it is present in the array of possible values. If any values are not present in the array of possible values, the demographic is rejected.

Example:

User submits "martian" for category "ethnicity". Validation configs are checked to see if one exists matching the corresponding app, the corresponding study, and the category "ethnicity", and one does exist. However, "martian" is not an available option, so the demographic is rejected.
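
The "martian" example can be reproduced with a small sketch of the enum check. Two points are assumptions rather than documented behavior: values are compared by exact, case-sensitive string match, and the "en" list is the one consulted. The function name `invalid_enum_values` is hypothetical.

```python
config = {
    "validationType": "enum",
    "validationRules": {
        "en": [
            "American Indian/Alaska Native",
            "Asian",
            "Native Hawaiian or Other Pacific Islander",
            "Black or African American",
            "White",
            "More Than One Race",
            "Prefer Not to Answer",
        ]
    },
}


def invalid_enum_values(submitted: list, rules: dict, language: str = "en") -> list:
    """Return the submitted values that are not in the allowed list."""
    # Assumption: exact string match against the list for one language.
    allowed = set(rules.get(language, []))
    return [value for value in submitted if value not in allowed]


rejected = invalid_enum_values(["martian"], config["validationRules"])
accepted = invalid_enum_values(["Asian"], config["validationRules"])
```

`rejected` contains "martian", so that demographic would be rejected; `accepted` is empty.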

Number range validation example

The following is POSTed to /v3/participants/demographics/validation/{categoryName} with categoryName = "year-of-birth".

{
  "validationType": "number_range",
  "validationRules": {
    "min": 1900,
    "max": 2050
  }
}

When app-level demographics are submitted, each submitted demographic’s category name is used to see if there is a validation config matching the app and category (app-level configs have no study). If there is, each value is checked to make sure that it is a number. If any value is not a number, the demographic is rejected. If the min is specified and any value is less than the min, the demographic is rejected. If the max is specified and any value is greater than the max, the demographic is rejected.

Example:

User submits 1800 for category "year-of-birth". Validation configs are checked to see if one exists matching the corresponding app and the category "year-of-birth", and one does exist. However, 1800 is not between 1900 and 2050, so the demographic is rejected.
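
The three number_range checks (is a number, not below min, not above max) can be sketched as below. The function name and the returned reason strings are hypothetical, and treating NaN as "not a number" is an assumption, not documented behavior.

```python
import math
from typing import Optional


def number_range_error(value, rules: dict) -> Optional[str]:
    """Return None if the value passes, else a short reason it fails."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return "not a number"
    if math.isnan(number):
        # Assumption: NaN is treated the same as unparseable input.
        return "not a number"
    minimum = rules.get("min")
    maximum = rules.get("max")
    # A missing bound leaves that side of the range open.
    if minimum is not None and number < minimum:
        return "less than min"
    if maximum is not None and number > maximum:
        return "greater than max"
    return None


rules = {"min": 1900, "max": 2050}
too_early = number_range_error("1800", rules)
in_range = number_range_error("1990", rules)
unbounded_above = number_range_error("3000", {"min": 1900})
```

`too_early` is "less than min", matching the 1800 example. `in_range` and `unbounded_above` are both None; the latter passes because no max was given.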

NIH Minimum for MTB

See https://sagebionetworks.jira.com/wiki/spaces/MTB/pages/1934819724

year-of-birth

{
  "validationType": "number_range",
  "validationRules": {
    "min": 1900,
    "max": 2050
  }
}

biological-sex

ethnicity

highest-education

Internals

Validation configs are stored in Dynamo. The hash key is appId + “:” + studyId (studyId is blank if null). The range key is categoryName.
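
The key scheme can be written out as a short sketch. The helper name `validation_config_keys` and the sample IDs are hypothetical; the key format itself follows the description above.

```python
from typing import Optional, Tuple


def validation_config_keys(
    app_id: str, study_id: Optional[str], category_name: str
) -> Tuple[str, str]:
    """Return (hash_key, range_key) for a validation config."""
    # A null studyId (app-level config) becomes an empty string after the colon.
    study_part = "" if study_id is None else study_id
    return f"{app_id}:{study_part}", category_name


study_keys = validation_config_keys("my-app", "study1", "ethnicity")
app_keys = validation_config_keys("my-app", None, "year-of-birth")
```

With these sample IDs, the study-level hash key is "my-app:study1" and the app-level hash key is "my-app:".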

Validation configs need to be deserialized in two separate places: once when saved, so that the config itself can be validated, and once when fetched from Dynamo, so that a demographic can be validated against it. The deserialization logic should be shared between the two. Therefore, the design I went with adds a method to the validation type enum that returns a DemographicValuesValidator. DemographicValuesValidator is an interface for an opaque type that first accepts validation rules as a JsonNode and can then either validate the rules themselves or validate a demographic using the rules.
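
The shape of this design can be sketched as follows. The sketch is in Python for brevity and is not the actual Bridge code: apart from the name DemographicValuesValidator, every class, method, and message here is hypothetical, and a plain dict stands in for the JsonNode.

```python
from abc import ABC, abstractmethod
from enum import Enum


class DemographicValuesValidator(ABC):
    """Accepts rules once, then validates either the rules or submitted values."""

    def __init__(self, rules: dict):
        self.rules = rules  # stand-in for the JsonNode of validation rules

    @abstractmethod
    def rules_errors(self) -> list:
        """Problems with the rules themselves (used when a config is saved)."""

    @abstractmethod
    def invalid_values(self, values: list) -> list:
        """Submitted values that fail the rules (used on submission)."""


class EnumValidator(DemographicValuesValidator):
    def rules_errors(self) -> list:
        return [
            f"{language}: expected an array"
            for language, allowed in self.rules.items()
            if not isinstance(allowed, list)
        ]

    def invalid_values(self, values: list) -> list:
        allowed = set(self.rules.get("en", []))
        return [value for value in values if value not in allowed]


class NumberRangeValidator(DemographicValuesValidator):
    def rules_errors(self) -> list:
        return [
            f"{bound}: expected a number"
            for bound in ("min", "max")
            if bound in self.rules and not isinstance(self.rules[bound], (int, float))
        ]

    def invalid_values(self, values: list) -> list:
        return [value for value in values if not self._in_range(value)]

    def _in_range(self, value) -> bool:
        try:
            number = float(value)
        except (TypeError, ValueError):
            return False
        low, high = self.rules.get("min"), self.rules.get("max")
        return (low is None or number >= low) and (high is None or number <= high)


class ValidationType(Enum):
    ENUM = "enum"
    NUMBER_RANGE = "number_range"

    def validator(self, rules: dict) -> DemographicValuesValidator:
        """Single entry point, so both call sites share the rule handling."""
        classes = {
            ValidationType.ENUM: EnumValidator,
            ValidationType.NUMBER_RANGE: NumberRangeValidator,
        }
        return classes[self](rules)


validator = ValidationType("number_range").validator({"min": 1900, "max": 2050})
save_time_errors = validator.rules_errors()
submit_time_invalid = validator.invalid_values(["1800", "1990"])
```

Because both the save path and the submission path obtain their validator from the enum, the rule-handling code lives in one place per validation type.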

Previous Discussion (now outdated)

  • App developers submit app config elements containing possible enum values or ranges of numbers for app-level demographics, and demographics will be validated with these rules upon submission.

  • Only app-level demographics can be validated this way because only admins/developers can edit app configs.

  • Does anything else need to be validated besides demographics?

    • Not currently

  • This would be especially (and maybe only) useful if demographics are submitted outside of demographics surveys since surveys with enum questions will only have the enum values as the possible answers anyway.

  • Bridge does not currently support multiple languages for demographics because they should probably be converted to 1 language for analysis and reporting. If demographics are converted to a single language by the app already, is validation for multiple languages necessary?

    • Not currently, just do English and add other languages later