Sage Scientific Compute is a secure compute workspace for Sage Bionetworks' workers and their collaborators. The offerings are:
...
On the left is a navigation sidebar. If you do not see it, look for a hamburger icon in the upper left and click on it to expand the navigation sidebar. The options in the navigation bar are “Products” and “Provisioned Products”. After you provision a product selected from “Products”, you will be able to see it under “Provisioned Products”.
...
Compute Instances
...
Compute Instance Products
We currently offer three varieties of virtual machines, each preconfigured for a given purpose, described below.
...
This product is a Microsoft Windows instance.
Creating
...
Compute Products
To create an instance, select “Products List” from the navigation panel on the left. Next, select from the list one of the EC2 compute products described above. On the product page, click the orange “LAUNCH PRODUCT” button under the product description, then fill out the wizard as follows:
...
Name: this names the product and the instance. You’ll use this to manage the product later. Please include your name in the product name, e.g. if your name is Jane Doe and you are provisioning a Linux instance for your project Foo, you could name it jdoe-linux-foo.
Version: choose a version of the product to provision.
Parameters
Use Private Network: This puts the instance on a private subnet. We strongly recommend using the default value, true.
EC2 Instance Type: there are many instance types to choose from. To learn about their details, including CPU and memory, see https://aws.amazon.com/ec2/instance-types/. To learn about their costs, see https://aws.amazon.com/ec2/pricing/on-demand/ or use the AWS pricing calculator.
Linux Distribution: (EC2 Linux product only) the variety of Linux OS that will be installed.
Disk Size: the amount of local storage, in gigabytes. Please treat the disk as temporary storage. Long-term storage of data should be in a bucket (see below).
...
CostCenter: bill the product to this cost center code. If the appropriate cost center code is not in the list, select “Other / 000001” and create a custom “CostCenterOther” tag. Set the tag to a value from our official list of cost center codes /wiki/spaces/IT/pages/2553544733. Example:
...
Note: The owner email tag is automatically set to <Synapse Username>@synapse.org
Notifications
Please skip the Notifications pane. SNS notifications are not operational at this time.
...
A new instance takes a few minutes to be created. Once complete, it will appear in the “Provisioned Products” list, showing status Available. Select “Provisioned Product Details” from the navigation panel on the left, and click on your product. A product that has a “Succeeded” event will have outputs that include links for connecting. Click on the "Events" tab, find the "PROVISION_PRODUCT" card (which may require sorting Events by Date, oldest first) and expand the card to see the following links.
ConnectionURI: if your product has a ConnectionURI link, this will open a shell prompt in a browser tab. When you are done with your session click “Terminate” in the upper right corner.
NotebookConnectionURI: Notebook products contain a NotebookConnectionURI link, which will open a notebook in the browser.
ConnectionInstructions: For Windows products, click on the ConnectionInstructions link and follow the steps provided there.
The following instructions guide you through setting up command line (“shell”) access via AWS SSM. Windows users can add remote desktop on top of SSM access; detailed instructions are below. Sage Bionetworks workers can skip the AWS SSM setup and instead request that Sage IT provide access via the Sage VPN. Instructions for doing this are here.
Create a Synapse personal access token
AWS SSM allows direct access to private instances from your own computer terminal. To set up access with AWS SSM, you need to create a special Synapse personal access token (PAT) that works with the Sage Service Catalog. This special PAT can only be created using this workflow; creating a PAT from the Synapse personal token manager web page will NOT work.
Request a Synapse PAT by visiting https://sc.sageit.org/personalaccesstoken (for Sage employees) or https://ad.strides.sc.sageit.org/personalaccesstoken (for AMP-AD members). (You may need to log in to Synapse.) If you have already created a PAT through this mechanism and are repeating the process, you must first visit the token management page in Synapse and delete the existing one with the same name.
After you log in to Synapse, a file containing the PAT, which is a long character string (e.g. eyJ0eXAiOiJ...Z8t9Eg), is returned to you. Save the file to your local machine, note where you saved it, then close the browser session.
Note: At this point you can verify that the PAT for the Service Catalog was successfully created by viewing the Synapse token management page. When the PAT expires you will need to repeat these steps to create a new PAT. The PAT should look something like this
...
8. If you plan to use Docker with your instance (for example, with RStudio or Jupyter notebooks), complete the instructions in SSM access to applications.
...
Run an application on the EC2 instance (e.g. docker run -p 80:80 httpd)
Code Block
[ec2-user@ip-10-49-26-50 ~]$ docker run -p 80:80 httpd
Unable to find image 'httpd:latest' locally
latest: Pulling from library/httpd
33847f680f63: Pull complete
d74938eee980: Pull complete
963cfdce5a0c: Pull complete
8d5a3cca778c: Pull complete
e06a573b193b: Pull complete
Digest: sha256:71a3a8e0572f18a6ce71b9bac7298d07e151e4a1b562d399779b86fef7cf580c
Status: Downloaded newer image for httpd:latest
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
[Thu Jul 22 23:54:12.106344 2021] [mpm_event:notice] [pid 1:tid 140706544895104] AH00489: Apache/2.4.48 (Unix) configured -- resuming normal operations
[Thu Jul 22 23:54:12.107307 2021] [core:notice] [pid 1:tid 140706544895104] AH00094: Command line: 'httpd -D FOREGROUND'
To access that app, a Service Catalog (SC) user can use the port forwarding feature by running the following AWS SSM CLI command:
Code Block
aws ssm start-session --profile service-catalog \
    --target i-0fd5c9ff0ef675ceb \
    --document-name AWS-StartPortForwardingSession \
    --parameters '{"portNumber":["80"],"localPortNumber":["9090"]}'
In the Windows Command Prompt, use this syntax instead (the Command Prompt continues a line with ^ rather than \, and the JSON quotes must be escaped):
Code Block
aws ssm start-session --profile service-catalog ^
    --target i-0fd5c9ff0ef675ceb ^
    --document-name AWS-StartPortForwardingSession ^
    --parameters "{\"portNumber\":[\"80\"],\"localPortNumber\":[\"9090\"]}"
Now you should be able to access that app on your local machine at http://localhost:9090.
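The two commands above differ only in how each shell quotes the --parameters JSON. One way to sidestep shell quoting is to build the argument list in Python and let json.dumps produce the JSON. This is a minimal sketch rather than part of the Service Catalog tooling: the function name build_port_forward_command is hypothetical, and the profile name and instance ID are simply the example values used above.

```python
import json


def build_port_forward_command(instance_id, remote_port, local_port,
                               profile="service-catalog"):
    """Return the argument list for an SSM port-forwarding session."""
    # The port values are passed as strings inside lists, as in the commands above.
    parameters = json.dumps({
        "portNumber": [str(remote_port)],
        "localPortNumber": [str(local_port)],
    })
    return [
        "aws", "ssm", "start-session",
        "--profile", profile,
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSession",
        "--parameters", parameters,
    ]


# Example usage (requires the AWS CLI and the Session Manager plugin):
#   import subprocess
#   subprocess.run(build_port_forward_command("i-0fd5c9ff0ef675ceb", 80, 9090),
#                  check=True)
```

Passing the arguments as a list, without a shell, means the same script should work unchanged on Linux, macOS and Windows.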
...
Connect to the Windows shell.
Create a new user and add it to the “Administrators” group
Code Block
$Password = ConvertTo-SecureString "P@ssW0rD!" -AsPlainText -Force
New-LocalUser "admin" -Password $Password -PasswordNeverExpires
Add-LocalGroupMember -Group "Administrators" -Member "admin"
Follow the SSM access to applications instructions to set up port forwarding to Windows RDP
Code Block
aws ssm start-session --profile service-catalog \
    --target i-0fd5c9ff0ef675ceb \
    --document-name AWS-StartPortForwardingSession \
    --parameters '{"portNumber":["3389"],"localPortNumber":["3389"]}'
Install the Microsoft Remote Desktop client on your computer.
Click “+” to add a new PC. In the “PC Name” field, enter “localhost”.
Log in with username “admin” and password "P@ssW0rD!"
...
Once an instance is provisioned, locate its instance ID (e.g. i-06531e8f977ca20ea).
Create a Jira IT issue requesting that your JumpCloud user be associated with that instance ID.
Once Sage IT has made the association, you can log in to the VPN and use remote desktop to log in to the instance, using its IP address (seen in the Outputs tab of the Service Catalog provisioned product) as the PC name and your JumpCloud credentials as the username and password.
Provisioning and Using a Notebook
...
Once you provision the product, visit the page for the provisioned product. Click on “Events”, find the "PROVISION_PRODUCT" card (which may require sorting Events by Date, oldest first) and expand the card to see a set of links. Now, click on the link for “NotebookConnectionURI”. This will open a new browser tab with the notebook running.
...
Connecting to Synapse
Both notebooks feature passwordless login to Synapse. In RStudio, simply type:
...
Using the update action allows you to change parameters or update to a new version of the product. WARNING: changes to configuration parameters usually cause the instance to be recreated (“replaced”), in which case any data saved on the instance is lost. How Amazon applies an update is difficult to predict. We recommend that you save any important data to S3, provision a new instance, and terminate the original.
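Saving important data to S3 before an update can be scripted. The following is a minimal sketch, assuming boto3 is installed on the instance and that you have write access to a bucket; the function name backup_directory and the paths, bucket and prefix in the usage comment are hypothetical.

```python
from pathlib import Path


def backup_directory(local_dir, bucket, prefix, s3_client):
    """Upload every file under local_dir to the bucket below prefix; return the keys."""
    root = Path(local_dir)
    uploaded = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Preserve the directory layout below the prefix.
            key = f"{prefix.rstrip('/')}/{path.relative_to(root).as_posix()}"
            s3_client.upload_file(str(path), bucket, key)
            uploaded.append(key)
    return uploaded


# Example usage on the instance:
#   import boto3
#   backup_directory("/home/ec2-user/results", "my-bucket",
#                    "backups/jdoe-linux-foo", boto3.client("s3"))
```

The S3 client is passed in as an argument so that the function can be exercised with a stand-in object before it is pointed at a real bucket.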
Terminate
The terminate action deletes the instance permanently.
...
You can access a provisioned Synapse bucket using one of the Synapse clients to push and pull data. To complete the setup of the Synapse bucket, you must manually perform the final step of setting the S3 bucket as a Synapse upload location and linking it to a project. The project’s sharing settings can then be used to control which Synapse users can access the bucket.
...
Transferring existing data into a Synapse-linked bucket
The most straightforward way to transfer files into a Synapse-linked bucket is to pull the files from the source bucket (e.g. using the S3 client for an S3 bucket) and then push them using the Synapse client. Running this process on an EC2 instance provisioned through Service Catalog will speed up the transfer and minimize data transfer costs. Another approach is to use the S3 client to move files directly between buckets and afterwards index them in Synapse. While this may be faster, it is more complex because (1) the S3 client must be authorized to access both the source and destination buckets, and (2) indexing in Synapse requires the MD5 hash of each file being indexed. If you wish to pursue this approach, please contact us for further guidance.
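For the second approach, the MD5 hash of each file can be computed with the Python standard library. The sketch below only illustrates the hashing step; the function name md5_hex is hypothetical, and how the hash is then supplied to Synapse is not covered here.

```python
import hashlib


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks rather than loading the whole file matters for the large data files typically stored in these buckets.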
Managing an S3 Bucket
On the product page for each provisioned product, there is an orange “ACTIONS” button in the upper right that allows you to manage the bucket.
Update
Using the update action allows you to change parameters or update to a new version of the product.
Terminate
The terminate action removes the bucket product from the Service Catalog; however, the bucket is not immediately deleted. The bucket and the data in it are placed in an archived state in which no users have access to the bucket. When the archive period (30 days) expires, the bucket and its data are automatically purged from AWS. The owner of the bucket may ask Sage IT to restore access to the bucket before it is purged.
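As a small worked example of the archive window, the date after which a terminated bucket can no longer be restored can be estimated from the termination date. This sketch assumes the 30 days are counted in calendar days from the day of termination; the exact time of the purge is not stated here, so treat the result as approximate. The name purge_date is hypothetical.

```python
from datetime import date, timedelta

# Archive period stated above.
ARCHIVE_PERIOD = timedelta(days=30)


def purge_date(terminated_on):
    """Return the estimated date on which an archived bucket is purged."""
    return terminated_on + ARCHIVE_PERIOD


# Example: a bucket terminated on 15 January 2024.
#   purge_date(date(2024, 1, 15))  -> date(2024, 2, 14)
```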
Change Owner
Please do not use this action. If a transfer of ownership is deemed necessary, ask #sageit for help instead.
Scheduled Jobs
Note: Scheduled Jobs products currently are available only to Sage employees.
Scheduled Job Products
Scheduled jobs are essentially cron jobs that we’ve set up to run in AWS Batch. This product allows you to run an arbitrary task in the cloud to process your workload. The task must be run using a Docker image. The task can be manually triggered or it can be set up to run on a schedule. A scheduled job is limited to 20 GB of storage, to the memory/CPU chosen when launching the product, and to a duration of one hour.
Creating Scheduled Job Products
To create a scheduled job, select the “Products List” from the navigation panel on the left. Next, select “Scheduled Jobs” from the list. On the product page, click the orange “LAUNCH PRODUCT” button under the product description, then fill out the wizard. Most of the parameters contain helpful information describing the valid inputs.
...
Notes:
The “Environment” parameters are required fields. You can replace the default values, but please do not leave these fields empty. Also pay special attention to the formatting that’s required for the values. The deployment will fail if the formatting isn’t correct.
There is an AWS bug that prevents disabling the scheduled job after it has been enabled. The workaround is to either (1) terminate the job and create a new one or (2) set the rate to some distant time in the future (e.g. 3650 days).
Tag options, Notifications, and Review
The final three screens of the wizard are the same as in Creating an EC2 Instance above.
Manually Run Scheduled Jobs
Once provisioning is complete, your Scheduled Jobs product will appear in the “Provisioned Products” list, showing status Available. Select “Provisioned Product Details” from the navigation panel on the left, and click on your product. A product that has a “Succeeded” event will have outputs that include a “SubmitJobApi” link. The link is a URL that triggers the job, which can be entered in your web browser or passed to a curl command. Entered into your web browser, the result will look like:
...
NOTE: Since invoking the URL requires no authentication, anyone obtaining the URL can trigger the job. So the URL created for the provisioned product must be treated as a secret, just like any password or key. If its secrecy is compromised the provisioned product must be terminated.
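Besides a browser or curl, the job can be triggered from a script. The sketch below uses only the Python standard library and assumes a plain GET request is sufficient, since the URL works when entered in a browser. The function name trigger_job and the environment variable SUBMIT_JOB_API_URL are hypothetical; reading the URL from the environment is one way to keep it out of source control, in line with the note above.

```python
import urllib.request


def trigger_job(url, opener=urllib.request.urlopen, timeout=30):
    """Send a GET request to the SubmitJobApi URL and return the response body."""
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example usage:
#   import os
#   print(trigger_job(os.environ["SUBMIT_JOB_API_URL"]))
```

The opener argument exists only so the function can be tested without network access; in normal use the default is fine.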
View Scheduled Job Status
Click on the “Jobs” link to view the batch job status. Once triggered the job should transition from STARTING → RUNNING → SUCCEEDED.
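The status can also be polled from a script instead of the console. The following is a sketch under assumptions: it presumes your AWS credentials are permitted to call the AWS Batch DescribeJobs API and that you know the job ID, neither of which is guaranteed by this product. The function name wait_for_job is hypothetical; SUCCEEDED and FAILED are the two terminal AWS Batch job states.

```python
import time

# Terminal states of an AWS Batch job.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_job(fetch_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call fetch_status() until the job reaches a terminal state; return that state."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal state within the polling budget")


# Example fetch_status built on boto3 (job_id is an assumption):
#   import boto3
#   batch = boto3.client("batch")
#   fetch = lambda: batch.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
#   print(wait_for_job(fetch))
```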
...
Access Scheduled Job Logs
The job will send logs to AWS CloudWatch. To access the logs, click on the “Logs” link in the PROVISION_PRODUCT outputs. Below is an example of a log for a job that ran the command “printenv”.
...
Aside from these logs, Service Catalog does not monitor or respond to job failures. It is the responsibility of the job owner to check CloudWatch and/or instrument their code to send alerts when failures occur.
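One way to instrument a job is to wrap its entry point so that any failure is logged with a traceback, optionally reported through an alert hook, and turned into a non-zero exit code. This is only a sketch: run_job and the notify callback are hypothetical, and what notify does (email, chat message, and so on) is left to the job owner.

```python
import logging

logger = logging.getLogger("scheduled_job")


def run_job(task, notify=None):
    """Run task(); on failure log the traceback, call notify, and return exit code 1."""
    try:
        task()
    except Exception as error:
        # logger.exception records the traceback in the job's log output.
        logger.exception("Scheduled job failed")
        if notify is not None:
            notify(f"Scheduled job failed: {error!r}")
        return 1
    return 0


# Example usage at the bottom of the containerized script:
#   import sys
#   logging.basicConfig(level=logging.INFO)
#   sys.exit(run_job(main))
```

Exiting with a non-zero code matters because a container that exits with an error is what causes the batch job to be reported as failed rather than succeeded.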
Scheduled Job Secrets
Secrets are stored in AWS Secrets Manager and exposed to the job as environment variables. The logs above print out the environment variables from the job. Take note of the “SCHEDULED_JOB_SECRETS” parameter in the logs: the secrets passed into this product were written to the logs by the “printenv” command. Please make sure never to expose secrets in this way. DO NOT PRINT ENVIRONMENT VARIABLES.
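The safest course is not to log the environment at all. If a job genuinely needs to record its environment for debugging, one possible approach is to mask the secret values first. The sketch below is illustrative only: redacted_environment is a hypothetical helper, and other variables in your container may also be sensitive and need adding to the set.

```python
import os

# Names whose values must never be written to the logs.
SECRET_VARIABLES = frozenset({"SCHEDULED_JOB_SECRETS"})


def redacted_environment(environ=None, secret_names=SECRET_VARIABLES):
    """Return a copy of the environment with the values of secret variables masked."""
    source = os.environ if environ is None else environ
    return {
        name: "<redacted>" if name in secret_names else value
        for name, value in source.items()
    }
```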
Accessing Scheduled Job Secrets
Job secrets can be accessed in a number of different ways. The first way is simply to read them from the Docker container environment variable SCHEDULED_JOB_SECRETS.
Environment variable example:
Code Block
printenv SCHEDULED_JOB_SECRETS | jq .SECRET1
"Shh1"
Passing Synapse access token as a secret using environment variables:
You can use a scheduled job to access data in Synapse. To do so:
Create a personal access token (PAT) as explained here.
In the Secrets parameter of your scheduled job, include the PAT, e.g.
Code Block "PAT":"eyJ0eXAiOiJKV1QiLCJraWQiOiJXN05OOldMSlQ..."
You may include other secrets if needed, each name/value pair separated by a comma.
Add code in your containerized script to parse the “SCHEDULED_JOB_SECRETS” JSON object, extract the PAT and use it to authenticate. Here is a Python example, synapse_login.py:
Code Block
import json
import os

import synapseclient

# SCHEDULED_JOB_SECRETS holds a JSON object of name/value pairs.
secrets = json.loads(os.getenv("SCHEDULED_JOB_SECRETS"))
auth_token = secrets["PAT"]

syn = synapseclient.Synapse()
syn.login(authToken=auth_token)
Create a container image which includes the Synapse client, this script, and the command to invoke the script. Here is a Dockerfile to create such a container image:
Code Block
FROM sagebionetworks/synapsepythonclient
COPY synapse_login.py synapse_login.py
CMD python3 synapse_login.py
Now you can build the container image, push to a public Docker registry and use in your Scheduled Job.
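Before building the image, the secret-parsing step can be exercised locally by supplying a test value for SCHEDULED_JOB_SECRETS. The helper below is a sketch with clearer error messages than a bare dictionary lookup; load_secret is a hypothetical name, and the helper deliberately never includes secret values in its error messages.

```python
import json
import os


def load_secret(name, environ=None):
    """Return one named secret from the SCHEDULED_JOB_SECRETS JSON object."""
    source = os.environ if environ is None else environ
    raw = source.get("SCHEDULED_JOB_SECRETS")
    if raw is None:
        raise RuntimeError("SCHEDULED_JOB_SECRETS is not set")
    secrets = json.loads(raw)
    if name not in secrets:
        # Report only the missing name; never echo the secret values.
        raise KeyError(f"secret {name!r} not found in SCHEDULED_JOB_SECRETS")
    return secrets[name]


# Example local check with a dummy value:
#   load_secret("PAT", {"SCHEDULED_JOB_SECRETS": '{"PAT": "dummy-token"}'})
```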
Retrieving job secrets using the AWS client
To access the job secrets using either the AWS CLI or one of the AWS SDKs, you must copy the “JobSecretArn” from the Service Catalog PROVISION_PRODUCT outputs. Then provide the JobSecretArn to either the AWS Secrets Manager CLI or one of the AWS SDKs to retrieve the secret.
AWS Secrets Manager CLI example:
Code Block
aws secretsmanager --output json get-secret-value --secret-id arn:aws:secretsmanager:us-east-1:465877038949:secret:JobSecrets-rEx1eKL9pokj-h7hCGO
{
    "ARN": "arn:aws:secretsmanager:us-east-1:465877038949:secret:JobSecrets-rEx1eKL9pokj-h7hCGO",
    "Name": "JobSecrets-rEx1eKL9pokj",
    "VersionId": "09904c83-f4ea-4664-a773-eded857ab5a0",
    "SecretString": "{ \"SECRET1\":\"Shh1\" }",
    "VersionStages": [
        "AWSCURRENT"
    ],
    "CreatedDate": "2021-12-18T09:01:23.690000-08:00"
}
Code Block
# Use this code snippet in your app.
# If you need more information about configurations or implementing the sample code, visit the AWS docs:
# https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html

import base64

import boto3
from botocore.exceptions import ClientError


def get_secret():
    secret_name = "arn:aws:secretsmanager:us-east-1:465877038949:secret:JobSecrets-rEx1eKL9pokj-h7hCGO"
    region_name = "us-east-1"

    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    # In this sample we only handle the specific exceptions for the 'GetSecretValue' API.
    # We rethrow the exception by default.
    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError as e:
        if e.response['Error']['Code'] == 'DecryptionFailureException':
            # Secrets Manager can't decrypt the protected secret text using the provided KMS key.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'InternalServiceErrorException':
            # An error occurred on the server side.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'InvalidParameterException':
            # You provided an invalid value for a parameter.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'InvalidRequestException':
            # You provided a parameter value that is not valid for the current state of the resource.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'ResourceNotFoundException':
            # We can't find the resource that you asked for.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
    else:
        # Decrypts secret using the associated KMS key.
        # Depending on whether the secret is a string or binary, one of these fields will be populated.
        if 'SecretString' in get_secret_value_response:
            secret = get_secret_value_response['SecretString']
        else:
            decoded_binary_secret = base64.b64decode(get_secret_value_response['SecretBinary'])

    # Your code goes here.
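Whichever client retrieves the secret, the “SecretString” field is itself a JSON object encoded as a string, as the CLI output above shows, so it needs one more parsing step before an individual secret can be used. A minimal sketch, with the hypothetical helper name secret_value:

```python
import json


def secret_value(get_secret_value_response, name):
    """Extract one named secret from a GetSecretValue response."""
    # SecretString holds the job secrets as a JSON object encoded in a string.
    secrets = json.loads(get_secret_value_response["SecretString"])
    return secrets[name]


# Example with the response shown in the CLI output above:
#   secret_value({"SecretString": "{ \"SECRET1\":\"Shh1\" }"}, "SECRET1")
```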
Best Practices
When to Use the Service Catalog
...
Frequently Asked Questions
What are my responsibilities regarding security, as a Sage Bionetworks worker?
You are responsible for securing cloud resources and the data they contain, much as you are for securing your company-issued laptop and its data. Your responsibilities include those described by the Sage Bionetworks Information Security Policy, including the Acceptable Use Agreement: https://sites.google.com/sagebase.org/intranet/hr/policies. Note that Service Catalog serves as a secure space for storing and processing sensitive data, but consult the Sage Governance team about the restrictions and requirements for your specific data.
Why didn’t I receive an email about my new product?
...