Synapse Dependency Vulnerability Alerts Strategy

This document will address PLFM-6419.

1 Summary of Issue
2 Considerations
3 Proposals
4 Technical Requirements
- 4.1 GitHub App
- 4.2 Upgrade Redash
5 Other Suggestions

Summary of Issue

GitHub provides the “Dependabot” service on our repositories, which scans a project’s dependencies for known security vulnerabilities. If a security issue is discovered, engineers with sufficient permissions will see an alert on the repository page, and may receive an email notification. Where possible, Dependabot may also create a pull request.

Because the alerts are automated and based only on a project’s dependency tree, a given security issue may not affect our products for various reasons. For example, we may not be using a vulnerable feature, or we may pass only restricted input to a dependency whose vulnerability can be exploited only through unrestricted input. Therefore, assessing the true risk of a security alert may require an investigation by an engineer to determine an appropriate response.

I’ve tried to collect current (Feb. 2022) stats on Dependabot in our repos, but this information may be incomplete:

Considerations

We have many active repositories, so it would be more valuable to see vulnerability alerts for multiple projects at once.

Synapse-Repository-Services	https://github.com/Sage-Bionetworks/Synapse-Repository-Services
Synapse-Stack-Builder	https://github.com/Sage-Bionetworks/Synapse-Stack-Builder
Synapse-Warehouse-Workers	https://github.com/Sage-Bionetworks/Synapse-Warehouse-Workers
worker-utilities	https://github.com/Sage-Bionetworks/worker-utilities
Synapse-Migration-Utility	https://github.com/Sage-Bionetworks/Synapse-Migration-Utility
SynapseWorkflowOrchestrator	https://github.com/Sage-Bionetworks/SynapseWorkflowOrchestrator
file-proxy	https://github.com/Sage-Bionetworks/file-proxy
Synapse-Sftp-Proxy	https://github.com/Sage-Bionetworks/Synapse-Sftp-Proxy
aws-utilities	https://github.com/Sage-Bionetworks/aws-utilities
database-semaphore	https://github.com/Sage-Bionetworks/database-semaphore
schema-to-pojo	https://github.com/Sage-Bionetworks/schema-to-pojo
Synapse-User-Geolocation	https://github.com/Sage-Bionetworks/Synapse-User-Geolocation
SimpleHttpClient	https://github.com/Sage-Bionetworks/SimpleHttpClient
JSON-java	https://github.com/Sage-Bionetworks/JSON-java
EvaluationStatistics	https://github.com/Sage-Bionetworks/EvaluationStatistics
common-utilities	https://github.com/Sage-Bionetworks/common-utilities
ChallengeDockerAgent	https://github.com/Sage-Bionetworks/ChallengeDockerAgent
csv-utilities	https://github.com/Sage-Bionetworks/csv-utilities
database-utils	https://github.com/Sage-Bionetworks/database-utils
QuizTextToJSONConverter	https://github.com/Sage-Bionetworks/QuizTextToJSONConverter
service-redirect	https://github.com/Sage-Bionetworks/service-redirect
url-signer	https://github.com/Sage-Bionetworks/url-signer
portals	https://github.com/Sage-Bionetworks/portals
Synapse-React-Client	https://github.com/Sage-Bionetworks/Synapse-React-Client
synapse-oauth-signin	https://github.com/Sage-Bionetworks/synapse-oauth-signin
markdown-it-synapse-server	https://github.com/Sage-Bionetworks/markdown-it-synapse-server
markdown-it-synapse	https://github.com/Sage-Bionetworks/markdown-it-synapse
SynapseWebClient	https://github.com/Sage-Bionetworks/SynapseWebClient
GwtVisualizationWrappers	https://github.com/Sage-Bionetworks/GwtVisualizationWrappers
react-base-table	https://github.com/Sage-Bionetworks/react-base-table
nbconvert-webapp	https://github.com/Sage-Bionetworks/nbconvert-webapp
gwtbootstrap3	https://github.com/Sage-Bionetworks/gwtbootstrap3
gwtbootstrap3-extras	https://github.com/Sage-Bionetworks/gwtbootstrap3-extras

We would like to integrate the strategy into our existing SDLC cadences (e.g. addressing vulnerabilities at the weekly Stack Release Meeting), rather than sending additional notifications that could just be ignored.
It would be valuable for our approach to be easily adopted by other teams at Sage. Most of the technical approaches below could be easily modified to look at a different collection of repositories or a specific GitHub team.

Proposals

I’ve proposed a few different options and summarized what I view to be the work required to accomplish each proposal. These are not detailed estimates and are subject to change.

Option 1: Build “Dependabot Dashboard” to be reviewed in weekly Stack Release meeting

In our Redash Stack Review Dashboard, add a table that is a collection of all of the un-dismissed Dependabot alerts across all repos. In the meeting, create Jira tickets to address any un-dismissed alerts, and then dismiss the alerts so they will be removed from the view in the future.

Prioritization of these tickets is at the discretion of the product manager, who may use the issue severity reported by GitHub or Synapse Engineers' estimated risk.
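As a rough sketch of what the dashboard table could contain, the snippet below flattens raw alerts into one row per alert and sorts them by severity. It assumes each alert is a dict shaped like the GitHub REST response for Dependabot alerts (fields such as `state`, `security_advisory.severity`, and `dependency.package.name`); those field names should be checked against the current API documentation. The function names and the row keys are hypothetical, and "un-dismissed" is interpreted here as `state == "open"`.

```python
# Lower rank sorts first; unknown severities sort last.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def flatten_alert(repo: str, alert: dict) -> dict:
    """Reduce one raw alert (assumed REST shape) to a flat table row."""
    advisory = alert.get("security_advisory", {})
    package = alert.get("dependency", {}).get("package", {})
    return {
        "repo": repo,
        "package": package.get("name", ""),
        "severity": advisory.get("severity", "unknown"),
        "ghsa_id": advisory.get("ghsa_id", ""),
        "summary": advisory.get("summary", ""),
        "url": alert.get("html_url", ""),
    }


def dashboard_rows(alerts_by_repo: dict) -> list:
    """Collect open alerts from every repo, most severe first."""
    rows = [
        flatten_alert(repo, alert)
        for repo, alerts in alerts_by_repo.items()
        for alert in alerts
        if alert.get("state") == "open"
    ]
    return sorted(
        rows,
        key=lambda r: (SEVERITY_RANK.get(r["severity"], len(SEVERITY_RANK)), r["repo"]),
    )
```

Sorting by GitHub's reported severity is only a starting point; the product manager could still reorder tickets based on the engineers' estimated risk.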

Technical Requirements:

GitHub App + granted permissions on Sage-Bionetworks GitHub Organization
Redash upgraded to recent version
Code to fetch and process security alert data on GitHub

Option 2: Assemble all Alerts in Email Digest

Send an email to Synapse Engineers at regular intervals (e.g. weekly) containing a digest of all security alerts that have not been dismissed on relevant repositories.

Engineers or the PM would be responsible for creating tickets based on the email content.
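To illustrate the digest, here is a minimal sketch that builds the email with Python's standard `email.message.EmailMessage`. It assumes the alerts have already been reduced to flat dicts with the hypothetical keys `repo`, `package`, `severity`, and `url`; the function name, subject line, and addresses are placeholders as well.

```python
from email.message import EmailMessage


def build_digest(rows: list, sender: str, recipients: list) -> EmailMessage:
    """Build a plain-text digest with one line per open alert."""
    msg = EmailMessage()
    msg["Subject"] = f"Dependabot digest: {len(rows)} open alert(s)"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = [
        f"[{r['severity']}] {r['repo']}: {r['package']} ({r['url']})"
        for r in rows
    ]
    # Send a message even when there is nothing to report, so silence
    # is never mistaken for a broken job.
    msg.set_content("\n".join(lines) or "No open alerts.")
    return msg
```

The resulting message could be handed to `smtplib.SMTP(...).send_message(msg)` or to whichever mail service the job ends up using; that choice is outside the scope of this sketch.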

Technical Requirements:

GitHub App + granted permissions on Sage-Bionetworks GitHub Organization
Code to fetch and process vulnerability data on GitHub, send email digest

Option 3: Automatically File Jira Issues for new Alerts

This is similar to Option 2, but instead of sending an email, a job would automatically create issues in Jira. In this case we may want a more refined process for handling the vulnerability alerts (e.g. 10 alerts regarding the same shared vulnerability across 10 repos should result in one Jira ticket instead of 10 tickets). The job could run daily or weekly.
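One way to collapse duplicate alerts is to group them by advisory identifier before filing anything. The sketch below assumes flat alert dicts with the hypothetical keys `ghsa_id` and `repo`; grouping by GHSA ID is my assumption about what "the same shared vulnerability" would mean in practice.

```python
from collections import defaultdict


def group_by_advisory(rows: list) -> dict:
    """Map each advisory ID to the sorted list of repos it affects."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["ghsa_id"]].append(row["repo"])
    # De-duplicate in case a repo reports the same advisory more than once
    # (e.g. through several manifests).
    return {ghsa: sorted(set(repos)) for ghsa, repos in grouped.items()}
```

Each key in the returned dict would then correspond to one Jira ticket, with the list of repos recorded in the ticket description.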

Technical Requirements:

GitHub App + granted permissions on Sage-Bionetworks GitHub Organization
Code to fetch and process vulnerability data on GitHub, create Jira issues

(https://blog.developer.atlassian.com/creating-a-jira-cloud-issue-in-a-single-rest-call/)
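For reference, a minimal sketch of the request body such a job might send when creating an issue. It assumes the Jira Cloud REST API v2 issue-creation endpoint (`POST /rest/api/2/issue`), which accepts a plain-string description; the issue type "Bug", the summary format, and the function name are placeholders rather than decisions.

```python
def jira_issue_payload(project_key: str, ghsa_id: str, summary: str, repos: list) -> dict:
    """Build the JSON body for one issue covering every affected repo."""
    description = "Affected repositories:\n" + "\n".join(f"* {r}" for r in repos)
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Bug"},  # assumption: any valid issue type works here
            "summary": f"[{ghsa_id}] {summary}",
            "description": description,
        }
    }
```

The dict would be serialized with `json.dumps` and posted using Basic auth with an API token, as the linked article describes.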

Option 4: Manually Inspect Alerts at Regular Cadence

This solution would not require any technical overhead.

For all active repositories, one or more members of the team will inspect the Dependabot alerts and file issues in Jira to address each alert. Issues could be prioritized by the product manager based on preliminary information, such as vulnerability severity score.

Since this would take a fair amount of time, this would likely happen less frequently than the other proposals. It is likely that we would not be able to respond as quickly to severe issues.

Technical Requirements

Brief description and justification of the individual technical requirements mentioned in the above proposals.

GitHub App

To collate all of the vulnerability data, our best option would be to create a GitHub App that can be installed on the Sage-Bionetworks GitHub Organization. The GitHub App should be scoped to have only the permissions required to view security alerts.

We should use a GitHub App rather than the GitHub OAuth integration to ensure we can see all security alerts, and not just the alerts visible to the logged-in user. We should use a GitHub App over a Personal Access Token for the same reason, plus we should not tie an individual’s account to this process, in case the individual leaves the organization.
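The following sketch shows roughly how a job could read alerts once it holds an installation access token for the GitHub App. It assumes the REST endpoint `GET /repos/{owner}/{repo}/dependabot/alerts` and a token obtained elsewhere (exchanging the App's private key for an installation token is omitted). The function names are hypothetical, and only the first page of results is fetched.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_alerts_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build a GET request for a repository's open Dependabot alerts."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/dependabot/alerts?state=open&per_page=100"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def fetch_open_alerts(owner: str, repo: str, token: str) -> list:
    """Fetch the first page of open alerts (pagination omitted for brevity)."""
    with urllib.request.urlopen(build_alerts_request(owner, repo, token)) as resp:
        return json.load(resp)
```

Because the token belongs to the App installation rather than to a person, the job keeps working when team membership changes.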

Upgrade Redash

In newer versions of Redash (the visualization tool that is used for the Data Warehouse), you can create a data source for running arbitrary Python code, which we could use to fetch and process vulnerability data from the GitHub API.

We need to upgrade Redash at some point, so while this isn’t additional work, the priority of this task may be affected by relying on it here.
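As I understand the Redash Python data source, a script hands its output back as a dict of `columns` and `rows` assigned to a variable named `result`. That contract should be confirmed against the Redash version we upgrade to. Under that assumption, a helper like the hypothetical one below could turn flat alert dicts into a table; every column is typed as a string for simplicity.

```python
def to_redash_result(rows: list, column_names: list) -> dict:
    """Shape flat alert dicts into a columns/rows table structure."""
    return {
        "columns": [
            {"name": name, "friendly_name": name, "type": "string"}
            for name in column_names
        ],
        # Keep only the requested columns and fill gaps with empty strings.
        "rows": [{name: row.get(name, "") for name in column_names} for row in rows],
    }
```

In a Redash query this would be used along the lines of `result = to_redash_result(rows, ["repo", "package", "severity"])`.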

We must spend more time considering the risks of allowing arbitrary Python code execution in Redash. Here are some relevant details:

GitHub App credentials will likely be supplied as environment variables (which could simply be printed by a user).
Redash is in the VPN, so all users will be employees.
We may be able to configure permissions in Redash to restrict access to the Python data source.

A possible alternative to a Python data source is the JSON Data Connector. We could move all of the sensitive credentials and operations to a more secure environment (e.g. Jenkins or AWS), and then publish a JSON document containing Dependabot alert information. The document could then be loaded into Redash. Note that if the file needs to be secured while hosted on the internet, the JSON data connector supports only HTTP Basic auth.
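A minimal sketch of the publishing side of this alternative: a job running in the more secure environment serializes the alert rows to a JSON document, which would then be uploaded wherever Redash can read it. The document layout (`generated_at` plus an `alerts` list) and the function name are assumptions, not a format required by Redash.

```python
import json
from datetime import datetime, timezone


def build_alert_document(rows: list, generated_at: datetime = None) -> str:
    """Serialize alert rows plus a timestamp into a JSON document."""
    generated_at = generated_at or datetime.now(timezone.utc)
    return json.dumps(
        {"generated_at": generated_at.isoformat(), "alerts": rows},
        indent=2,
    )
```

Including a timestamp lets anyone viewing the dashboard tell whether the publishing job has stalled.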

Other Suggestions

I am including the following suggestions, which are relevant but do not directly achieve the goal of addressing Dependabot issues:

Maintain a list of active repositories that a particular team is responsible for
- We could do this by splitting “Synapse-Developers” GitHub team into “Synapse-Developers” and “Bridge-Developers”. Consider all non-archived repos owned by “Synapse-Developers” to be the active list (we can get this info from the GitHub API).
Consider enabling CI runs for pull requests made by Dependabot. This could make it easier to merge the upgrades that resolve certain alerts, with confidence that an upgrade is unlikely to cause a regression.
Consider enabling GitHub CodeQL code scanning on all repositories. Here’s an example of a CodeQL alert on our Portals repository that seemed helpful, though happened to not be applicable.
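Returning to the first suggestion above, deriving the active list from team ownership could be as simple as the sketch below. It assumes the input is the list of repository objects returned by GitHub's team-repositories endpoint (`GET /orgs/{org}/teams/{team_slug}/repos`), each carrying `name` and `archived` fields; the function name is hypothetical.

```python
def active_repo_names(team_repos: list) -> list:
    """Return the names of a team's non-archived repositories, sorted."""
    return sorted(
        repo["name"] for repo in team_repos if not repo.get("archived", False)
    )
```

The same function would work unchanged for a “Bridge-Developers” team or any other team that wanted to adopt the approach.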