The Sage Scientific Compute workspace is a secure compute workspace for Sage Bionetworks’ workers and their collaborators. The offerings are:

  • Dedicated virtual machines (“instances”), either Linux or Windows, with CPU, memory, and disk space of your choice;

  • Notebook instances, which are dedicated virtual machines running R-Studio or Jupyter. With a single click you have an interactive analysis environment, accessed through your web browser;

  • Bucket storage, suitable for linking to a Synapse project. This is where data should be kept long term;

  • Scheduled jobs: Lightweight compute tasks that you would like to run periodically or when triggered by a web request. You provide the code to run in the form of a Docker container.

The following provides instructions on how to log in using your Synapse credentials, and how to setup or modify the various products.

For technical support, please use our Service Desk.

Table of Contents

Gaining Access

Access to Sage Scientific Compute workspace is organized by communities. Community membership is defined by a Synapse Team and managed by its community manager. Each community also has a defined entry point URL, as shown below:

...

For Sage Bionetworks employees, access is granted during employee onboarding. For other groups, the community manager will add your Synapse account to the list of allowed users for the compute workspace. The community manager also receives reports on cloud expenditures and will contact community members to review the need for any costs that exceed expectations. For technical support, please contact our Service Desk.

Login

To begin, visit the appropriate entry point URL for your community, listed above, and log in with your Synapse credentials. You will be prompted to allow access to some information from your Synapse profile.

Note: The Service Catalog products discussed below are owned by the Synapse account under which you log in. Once created, a product appears in the console only when you are logged in to that account, and only that account can update or remove the product. Products like S3 buckets may have a life cycle beyond the project participation of any single person. To support such a case you may create and use a so-called service account in Synapse (i.e., an account meant for automation that may outlive one person’s commitment to a project). If so, then to meet regulatory requirements the credentials for the service account must be placed in a secure store whose access is limited and can be reviewed (see /wiki/spaces/IT/pages/1200816129).

AWS Service Catalog Portal

Once logged in, you will see a list of “Products” you can provision. These fall into two categories: EC2 Instances (virtual machines) and S3 Buckets used for data storage. Not all products are available to all communities.

On the left is a navigation sidebar. If you do not see it, look for a hamburger icon in the upper left and click it to expand the sidebar. The options in the navigation bar are “Products” and “Provisioned Products”. After you provision a product selected from “Products”, it will appear under “Provisioned Products”.

...

Compute Instances

...

Compute Instance Products

We currently offer three varieties of virtual machines, each preconfigured for a given purpose, described below.

Linux Docker

This product provides a basic EC2 instance with Docker installed.

EC2 with Notebook Software

This product is a Linux EC2 instance with R-Studio or Jupyter notebook software installed.

Windows

This product is a Microsoft Windows instance.

Creating

...

Compute Products

To create an instance, select “Products List” from the navigation panel on the left. Next, select from the list one of the EC2 compute products described above. On the product page, click the orange “LAUNCH PRODUCT” button under the product description, then fill out the wizard as follows:

Product Version

  • Name: this names the product and the instance. You’ll use this to manage the product later. Please include your name in the product name, e.g. if your name is Jane Doe and you are provisioning a Linux instance for your project Foo, you could name it jdoe-linux-foo.

  • Version: choose a version of the product to provision.

Parameters

  • Use Private Network: This puts the instance on a private subnet. We strongly recommend using the default value, true.

  • EC2 Instance Type: there are many instance types to choose from. To learn about their details, including CPU and memory, see https://aws.amazon.com/ec2/instance-types/. To learn about their costs, see https://aws.amazon.com/ec2/pricing/on-demand/ or use the AWS pricing calculator.

  • Linux Distribution: (EC2 Linux product only) the variety of Linux OS that will be installed.

  • Disk Size: the amount of local storage, in gigabytes. Please treat the disk as temporary storage. Long-term storage of data should be in a bucket (see below).

Tag Options

...

Note: The owner email tag is automatically set to <Synapse Username>@synapse.org

Notifications

Please skip the Notifications pane. SNS notifications are not operational at this time.

Review

On the review pane, you have the opportunity to review all the choices you have made prior to clicking the “Launch” link in the lower right-hand corner.

Connecting to an Instance

A new instance takes a few minutes to be created. Once complete, it will appear in the “Provisioned Products” list, showing status Available. Select “Provisioned Product Details” from the navigation panel on the left, and click on your product. A product that has a “Succeeded” event will have outputs that include links for connecting. Click on the “Events” tab, find the “PROVISION_PRODUCT” card (which may require sorting Events by Date, oldest first), and expand the card to see the following links.

  • ConnectionURI: if your product has a ConnectionURI link, this will open a shell prompt in a browser tab. When you are done with your session click “Terminate” in the upper right corner.

  • NotebookConnectionURI: Notebook products contain a NotebookConnectionURI link, which will open a notebook in the browser.

  • ConnectionInstructions: For Windows products, click on the ConnectionInstructions link and follow the steps provided there.

The following instructions guide you through setting up command-line (“shell”) access via AWS SSM. Windows users can add remote desktop on top of SSM access; detailed instructions are below. Sage Bionetworks workers can skip the AWS SSM setup and instead request that Sage IT provide access via the Sage VPN. Instructions for doing this are here.

Create a Synapse personal access token

AWS SSM allows direct access to private instances from your own computer terminal. To set up access with AWS SSM, we need to create a special Synapse personal access token (PAT) that will work with the Sage Service Catalog. This special PAT can only be created using the workflow below; creating a PAT from the Synapse personal token manager web page will NOT work.

  1. Request a Synapse PAT by visiting https://sc.sageit.org/personalaccesstoken for Sage employees, or https://ad.strides.sc.sageit.org/personalaccesstoken for AMP-AD members. (You may need to log in to Synapse.) If you have already created a PAT through this mechanism and are repeating the process, you must first visit the token management page in Synapse and delete the existing one with the same name.

  2. After you log in to Synapse, a file containing the PAT, which is a long character string (e.g. eyJ0eXAiOiJ...Z8t9Eg), is returned to you. Save the file to your local machine, note the location where you saved it, then close the browser session.

Note: At this point you can verify that the PAT for the Service Catalog was successfully created by viewing the Synapse token management page. When the PAT expires, you will need to repeat these steps to create a new one. The PAT should look something like this:

...
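Because the PAT is a JSON Web Token (JWT), a quick local sanity check is to base64-decode its first dot-separated segment. The sketch below uses a stand-in token, not a real credential:

```python
import base64
import json

# A PAT has the JWT shape header.payload.signature; decoding the header
# segment confirms the string at least parses as a token. The sample
# below is a placeholder -- substitute your saved PAT.
sample_pat = "eyJ0eXAiOiJKV1QifQ.e30.sig"
header_b64 = sample_pat.split(".")[0]
header_b64 += "=" * (-len(header_b64) % 4)  # restore stripped base64 padding
header = json.loads(base64.urlsafe_b64decode(header_b64))
print(header)  # -> {'typ': 'JWT'}
```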

SSM access to an Instance

To set up access to the AWS EC2 instances with AWS SSM, we need to install the AWS CLI and configure it to source credentials from an external process.

...
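The elided steps amount to defining an AWS CLI profile that sources temporary credentials from the synapse_creds.sh helper (the script shown under Debugging Access below). A sketch of the resulting ~/.aws/config entry follows; the script path, endpoint URL, and region are assumptions to adapt to your environment:

```ini
# ~/.aws/config (sketch, not the authoritative setup)
[profile service-catalog]
region = us-east-1
credential_process = "/absolute/path/to/synapse_creds.sh" "https://sc.sageit.org" "<your-PAT>"
```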

8. If you plan to use Docker with your instance (for example, with RStudio or Jupyter notebooks), complete the instructions in SSM access to applications.

Debugging Access

  • If you encounter errors, try running the AWS start-session command with the --debug option.

  • If you use an invalid personal access token you will get an error similar to this

    Code Block
    ➜ aws ssm start-session --profile service-catalog \
                            --target i-0fd5c9ff0ef675ceb
    
    Expecting value: line 1 column 1 (char 0)

    To check whether your token is valid run the following command

    Code Block
    ➜ curl -I --location-trusted \
          -H Authorization:"Bearer ${SYNAPSE_PAT}" https://sc.sageit.org/ststoken

    If the HTTP response status is 2xx then the PAT is valid. If the PAT is invalid the response will be 4xx

  • If you continue to get errors similar to Expecting value: line X column X (char X), it could mean that your synapse_creds.sh file is invalid. Try verifying your synapse_creds.sh script independently of the AWS command by executing just the script. A successful execution should return valid JSON and look something like this

    Code Block
    ➜ ~/synapse_creds.sh "https://sc.sageit.org" "eyJ0eXAiO...2GLQg"
    {"SessionToken":"FwoGZXIvYXdzEN7//////////wEaDP2imuwAK+...13GnBrJc9SlOW6uY=","Version":1,"AccessKeyId":"XXXXXXX","SecretAccessKey":"XXXXXXXXXXXXXXXX","Expiration":"2021-07-21T22:02:17Z"}
  • Another problem could be that your ~/.aws/config file is invalid. For debugging, we recommend backing up your current config file, creating a new one containing just the service-catalog profile, and then re-running the start-session command.

  • If you get a message similar to “.. AccessDeniedException when calling the TerminateSession operation..”, it could mean that the AWS SSM session manager plugin was not installed correctly. Please verify that it was successfully installed.

SSM access with custom commands

By default the ssm start-session command starts a session with the SSM-SessionManagerRunShell document, which logs you in as the ssm-user with an sh shell. If you prefer to start your session with a bash shell, you can use the AWS-StartInteractiveCommand document.

...

Code Block
aws ssm start-session --profile service-catalog \
                      --target i-0fd5c9ff0ef675ceb \
                      --document-name AWS-StartInteractiveCommand \
                      --parameters command="sudo su - ec2-user"  

SSM with SSH

You can use the AWS Command Line Interface (AWS CLI) to establish Secure Shell (SSH) connections to instances using AWS Systems Manager Session Manager. Users who connect using SSH can copy files between their local machines and the EC2 instance using Secure Copy Protocol (SCP).

...

  1. Use the ssm start-session command to connect to the instance

    Code Block
    aws ssm start-session --profile service-catalog \
                          --target i-0fd5c9ff0ef675ceb \
                          --document-name AWS-StartInteractiveCommand \
                          --parameters command="sudo su - ec2-user"  
  2. Copy the public portion of your ssh key (on your local computer) to the instance’s ~/.ssh/authorized_keys file.

  3. Set the permission of the authorized_keys file to 600 (i.e. chmod 600 ~/.ssh/authorized_keys)

  4. Add the following to your local machine’s ~/.ssh/config file

    Code Block
    # SSH over Session Manager
    host i-* mi-*
        ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
    
  5. From your local machine, execute the ssh command to access the instance

    Code Block
    ➜ AWS_PROFILE=service-catalog ssh -i ~/.ssh/id_rsa ec2-user@i-0fd5c9ff0ef675ceb
    Last login: Thu Jun 17 21:25:56 2021
    
           __|  __|_  )
           _|  (     /   Amazon Linux 2 AMI
          ___|\___|___|
    
    https://aws.amazon.com/amazon-linux-2/
    [ec2-user@ip-10-41-23-76 ~]$
  6. From your local machine, execute the scp command to copy files directly to the instance

    Code Block
    ➜ AWS_PROFILE=service-catalog scp -i ~/.ssh/id_rsa README.md ec2-user@i-07eeb59282fafe244:~/.
    README.md                                             100%  814     9.2KB/s   00:00
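Steps 2 and 3 above, run on the instance, can be sketched as follows (the key text is a placeholder for the contents of your local public key file, e.g. ~/.ssh/id_rsa.pub):

```shell
# On the EC2 instance: append your public key and set the permissions
# sshd requires. The key string is a placeholder, not a working key.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
echo "ssh-rsa AAAAB3Nza...your-key... you@laptop" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```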

SSM access to applications

When running apps in the instance you may want to run the apps on specific ports. The AWS SSM allows you to expose those ports to your local computer using a technique called port forwarding. Here’s an example of how to enable port forwarding to an application:

  1. Setup profile for SSM access

  2. Run an application on the EC2 (i.e. docker run -p 80:80 httpd)

    Code Block
    [ec2-user@ip-10-49-26-50 ~]$ docker run -p 80:80 httpd
    Unable to find image 'httpd:latest' locally
    latest: Pulling from library/httpd
    33847f680f63: Pull complete
    d74938eee980: Pull complete
    963cfdce5a0c: Pull complete
    8d5a3cca778c: Pull complete
    e06a573b193b: Pull complete
    Digest: sha256:71a3a8e0572f18a6ce71b9bac7298d07e151e4a1b562d399779b86fef7cf580c
    Status: Downloaded newer image for httpd:latest
    AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
    AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
    [Thu Jul 22 23:54:12.106344 2021] [mpm_event:notice] [pid 1:tid 140706544895104] AH00489: Apache/2.4.48 (Unix) configured -- resuming normal operations
    [Thu Jul 22 23:54:12.107307 2021] [core:notice] [pid 1:tid 140706544895104] AH00094: Command line: 'httpd -D FOREGROUND'
  3. To access that app from your local machine, use the port forwarding feature by running the AWS SSM CLI command:

    Code Block
    aws ssm start-session --profile service-catalog \
                          --target i-0fd5c9ff0ef675ceb \
                          --document-name AWS-StartPortForwardingSession \
                          --parameters '{"portNumber":["80"],"localPortNumber":["9090"]}'
      
  4. To do the same from the Windows Command Prompt, use this syntax:

    Code Block
    aws ssm start-session --profile service-catalog \
                          --target i-0fd5c9ff0ef675ceb \
                          --document-name AWS-StartPortForwardingSession \
                          --parameters "{\"portNumber\":[\"80\"],\"localPortNumber\":[\"9090\"]}"
      
  5. Now you should be able to access that app on your local machine at http://localhost:9090.

Connecting to Windows Instances

Info

Windows will return an error when connecting via RDP; you may ignore it.

“You are connecting to the RDP host "<IP Address>". The certificate couldn't be verified back to a root certificate. Your connection may not be secure. Do you want to continue?”

Connect to Windows shell

Connecting to the Windows instance’s shell is similar to accessing a Linux instance’s shell. Just follow the instructions in SSM access to an Instance.

Connect to Windows desktop using SSM session manager

Connecting to the Windows desktop requires a few more steps.

  1. Connect to the Windows shell.

  2. Create a new user and add it to the “Administrators” group

    Code Block
    $Password = ConvertTo-SecureString "P@ssW0rD!" -AsPlainText -Force
    New-LocalUser "admin" -Password $Password -PasswordNeverExpires
    Add-LocalGroupMember -Group "Administrators" -Member "admin"
  3. Follow the SSM access to applications instructions to set up port forwarding to Windows RDP

    Code Block
    aws ssm start-session --profile service-catalog \
                          --target i-0fd5c9ff0ef675ceb \
                          --document-name AWS-StartPortForwardingSession \
                          --parameters '{"portNumber":["3389"],"localPortNumber":["3389"]}'
  4. Install the Microsoft Remote Desktop client on your computer.

    1. Click “+” to add a new PC. In the “PC Name” field, enter “localhost”. 

  5. Log in with username “admin” and password "P@ssW0rD!"

Connect to Windows desktop using VPN and Jumpcloud (Sage Staff Only)

Sage staff have the option to access the Windows desktop using their Jumpcloud credentials. Here are the steps:

  1. Once an instance is provisioned, locate its instance ID (e.g. i-06531e8f977ca20ea)

  2. Create a Jira IT issue requesting that your Jumpcloud user be associated with that instance ID

  3. Once Sage IT makes the association, you can log in to the VPN and use remote desktop to log in to the instance with your Jumpcloud credentials.

...

  1. using its IP address (seen in the Outputs tab of the Service Catalog provisioned product) as the PC name and your Jumpcloud credentials as user/password.

Provisioning and Using a Notebook

We provide a product which runs an R-Studio or Jupyter notebook.

Launching the Notebook

To create a notebook instance, select “Products List” from the navigation panel on the left. Next, select EC2 with Notebook Software, fill in the requested choices (including selecting R-Studio or Jupyter) and launch.


...


Note that there are many instance types to choose from. To learn about their details, including CPU and memory, see https://aws.amazon.com/ec2/instance-types/. To learn about their costs, see https://aws.amazon.com/ec2/pricing/on-demand/ or use the AWS pricing calculator.

After filling out the product choices, click “Launch.”

Once you provision the product, visit the page for the provisioned product. Scroll halfway down and click on “Events”, find the “PROVISION_PRODUCT” card (which may require sorting Events by Date, oldest first) and expand the card to see a set of links. Now click on the “NotebookConnectionURI” link. This will open a new browser tab with the notebook running.


Connecting to Synapse

Both notebooks feature passwordless login to Synapse. In RStudio, simply type:

Code Block
synapser::synLogin()

...

In Jupyter, first install these dependencies:

Code Block
pip install boto3 synapseclient

and then:

Code Block
import synapseclient
syn = synapseclient.Synapse()
syn.login()

...

Managing an Instance

On the product page for each provisioned product, there is an orange “ACTIONS” button in the upper right that allows you to manage the instance. You can also visit the EC2 Console to check on the status of your instance.

Update

Using the update action allows you to change parameters or update to a new version of the product. WARNING: changes to configuration parameters usually result in a recreation (“replacement”) of the instance, any data saved on the instance will be lost, and the nature of the update by Amazon is difficult to predict. We recommend that you save any important data to S3, provision a new instance and terminate the original.

Terminate

The terminate action deletes the instance permanently.

Stop

Stops the instance. We recommend doing this when the instance is not in use to save cloud costs.

Start

Starts the instance after it has been stopped.

Restart

Stops and restarts the instance in a single step. This can help if an instance is in a bad state.

Update Plan

The Service Catalog allows you to create a plan (also known as a changeset) when updating your product. A plan shows you what AWS will do before you execute the planned update. After a plan is created you can delete, modify, or execute it. The update does not happen until you execute the plan.

Change Owner

Please ask #sageit for help transferring ownership, if that is deemed necessary, rather than using this action. Otherwise, you may find that you cannot connect to your instance.

Cloud Storage (S3)

Note: S3 storage products currently are available only to Sage employees.

Cloud Storage Products

To understand the cost of S3 buckets, see https://aws.amazon.com/s3/pricing/ or use the AWS pricing calculator. Note that while data egress can be a substantial cost, our Service Catalog provisions buckets and EC2 instances in the same AWS region. Since AWS does not charge for egress to a location within a bucket’s region, accessing data from an instance provisioned by our Service Catalog avoids such costs.

S3 Private Encrypted Bucket

This product builds an encrypted AWS S3 bucket with private access. To access the bucket, see the section Using an S3 Bucket, below. Note that only the user who provisioned the bucket can access it. To share a bucket, use the Synapse Bucket instead.

S3 Synapse Bucket

This product builds an AWS S3 bucket with private access for Synapse. The additional configuration parameter is the name of the Synapse user who will be allowed to link the bucket to a Synapse project. Once linked to a project, Synapse can be used to share the bucket with other users.

Creating S3 Products

Product Version and Parameters

When provisioning, you are prompted for two names. On the “Product Version” screen of the wizard, you must name your product. This is the name you will see listed in the Service Catalog under “Provisioned Products” later. Please include your name in the product name, e.g. if your name is Jane Doe and you are provisioning a bucket for your project Foo, you could name it jdoe-bucket-foo. On the “Parameters” screen, you have the option of naming the bucket itself (otherwise a name will be assigned). That is the name to use when accessing the bucket through the Amazon S3 client or via the Amazon S3 console.

...

There is an additional field in S3 Synapse Bucket, “SynapseProdARN”, that should be left as its default value.

Tag options, Notifications, and Review

The final three screens of the wizard are the same as in Creating an EC2 Instance above.

Using an S3 Bucket

As with EC2 products, once provisioning is complete your S3 product will appear in the “Provisioned Products” list, showing status Available. Select “Provisioned Product Details” from the navigation panel on the left, and click on your product. A product that has a “Succeeded” event will have outputs that include a “BucketUrl” link.

Using a bucket from the AWS Console

Clicking on the “BucketUrl” link from the provisioned product takes you to the S3 console where you can upload and download files.

Using a bucket with the S3 client

To authenticate the S3 client for bucket access, follow the set-up steps under SSM access to an Instance, above. You can then access the bucket by including “--profile service-catalog” in the command, e.g. to download a file the syntax is:

Code Block
aws --profile service-catalog s3 cp s3://<your-bucket-name>/file.txt ./file.txt

Using a bucket with the Synapse client

You can access a provisioned Synapse bucket using one of the Synapse clients to push and pull data. To complete the setup of the Synapse bucket, you must manually set the S3 bucket as a Synapse upload location and link it to a project. The project’s sharing settings can then be used to control which Synapse users can access the bucket.

Managing an S3 Bucket

On the product page for each provisioned product, there is an orange “ACTIONS” button in the upper right that allows you to manage the instance.

Update

Using the update action allows you to change parameters or update to a new version of the product.

Terminate

The terminate action removes the bucket product from the Service Catalog; however, the bucket is not immediately deleted. The bucket and the data in it are placed in an archived state in which no users have access. Upon expiration of the archive period (30 days), the bucket and data are automatically purged from AWS. The owner of the bucket may ask Sage IT to restore access to the bucket before it is purged.

Change Owner

Please ask #sageit for help transferring ownership if that is deemed to be necessary rather than using this action.

Scheduled Jobs

Note: Scheduled Jobs products currently are available only to Sage employees.

Scheduled Job Products

Scheduled jobs are essentially cron jobs that we’ve set up to run in AWS Batch. This product allows you to run an arbitrary task in the cloud to process your workload. The task must be run using a Docker image and can be triggered manually or set up to run on a schedule. A scheduled job is limited to 20GB of storage, to the memory/CPU chosen when launching the product, and to one hour of run time.

Creating Scheduled Job Products

To create a scheduled job, select the “Products List” from the navigation panel on the left. Next, select “Scheduled Jobs” from the list. On the product page, click the orange “LAUNCH PRODUCT” button under the product description, then fill out the wizard. Most of the parameters contain helpful information describing the valid inputs.

...

Notes:

  • The “Environment” parameters are required fields. You can replace the default values; however, please do not leave these fields empty. Also pay special attention to the formatting required for the values; the deployment will fail if the formatting isn’t correct.

  • There is an AWS bug that prevents disabling the scheduled job after it has been enabled. The workaround is to either (1) terminate the job and create a new one, or (2) set the rate to some distant time in the future (e.g. 3650 days).

Tag options, Notifications, and Review

The final three screens of the wizard are the same as in Creating an EC2 Instance above.

Manually Run Scheduled Jobs

Once provisioning is complete, your Scheduled Jobs product will appear in the “Provisioned Products” list, showing status Available. Select “Provisioned Product Details” from the navigation panel on the left, and click on your product. A product that has a “Succeeded” event will have outputs that include a “SubmitJobApi” link. The link is a URL that triggers the job; it can be entered in your web browser or used in a curl command. Entered into your web browser, the result will look like:

...

NOTE: Since invoking the URL requires no authentication, anyone who obtains the URL can trigger the job. The URL created for the provisioned product must therefore be treated as a secret, just like any password or key. If its secrecy is compromised, the provisioned product must be terminated.

View Scheduled Job Status

Click on the “Jobs” link to view the batch job status. Once triggered the job should transition from STARTING → RUNNING → SUCCEEDED.

...


Access Scheduled Job Logs

The job will send logs to AWS CloudWatch. To access the logs, click on the “Logs” link in the PROVISION_PRODUCT outputs. Below is an example of a log for a job that ran the command “printenv”.

...

Scheduled Job Secrets

Secrets are stored in AWS Secrets Manager and exposed to the job as environment variables. The logs above print out the environment variables from the job; take note of the “SCHEDULED_JOB_SECRETS” parameter, whose value was exposed in the logs by the “printenv” command. Please make sure never to expose secrets in this way. DO NOT PRINT ENVIRONMENT VARIABLES.

Accessing Scheduled Job Secrets

Job secrets can be accessed in a number of different ways. The first is simply to read the SCHEDULED_JOB_SECRETS environment variable from the Docker container.

Environment variable example:

Code Block
printenv SCHEDULED_JOB_SECRETS|jq .SECRET1
"Shh1"
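The same lookup can be done in Python. This self-contained sketch simulates the SCHEDULED_JOB_SECRETS variable, which in a real job is already set by the product:

```python
import json
import os

# Simulate the secrets the Scheduled Jobs product injects; in a real job
# this environment variable is already present, so the setdefault is a no-op.
os.environ.setdefault("SCHEDULED_JOB_SECRETS", '{"SECRET1": "Shh1"}')

secrets = json.loads(os.environ["SCHEDULED_JOB_SECRETS"])
print(secrets["SECRET1"])  # -> Shh1
```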

Passing Synapse access token as a secret using environment variables:

You can use a scheduled job to access data in Synapse. To do so:

  1. Create a personal access token (PAT) as explained here.

  2. In the Secrets parameter of your scheduled job, include the PAT, e.g.

    Code Block
    "PAT":"eyJ0eXAiOiJKV1QiLCJraWQiOiJXN05OOldMSlQ..."

    You may include other secrets if needed, each name/value pair separated by a comma.

  3. Add code in your containerized script to parse the “SCHEDULED_JOB_SECRETS” JSON object, extract the PAT, and use it to authenticate. Here is a Python example, synapse_login.py:

    Code Block
    import json, os, synapseclient
    secrets=json.loads(os.getenv("SCHEDULED_JOB_SECRETS"))
    auth_token = secrets["PAT"]
    syn=synapseclient.Synapse()
    syn.login(authToken=auth_token)
  4. Create a container image which includes the Synapse client, this script, and the command to invoke the script. Here is a Dockerfile to create such a container image:

    Code Block
    FROM sagebionetworks/synapsepythonclient
    COPY synapse_login.py synapse_login.py
    CMD python3 synapse_login.py

Now you can build the container image, push it to a public Docker registry, and use it in your Scheduled Job.

Retrieving job secrets using the AWS client

To access the job secrets using either the AWS CLI or one of the AWS SDKs, copy the “JobSecretArn” from the Service Catalog PROVISION_PRODUCT outputs, then provide it to the AWS Secrets Manager CLI or one of the AWS SDKs to retrieve the secret.

AWS Secrets Manager CLI example:

...

Transferring existing data into a Synapse-linked bucket

The most straightforward way to transfer files into a Synapse-linked bucket is to pull the files from the source bucket (e.g. using the S3 client for an S3 bucket) and then push them using the Synapse client. Running this process on an EC2 instance provisioned through the Service Catalog will speed transfer and minimize data transfer costs. Another approach is to use the S3 client to move files directly between buckets and afterwards index them in Synapse. While this may be faster, it is more complex because (1) the S3 client must be authorized to access both the source and destination buckets, and (2) indexing in Synapse requires the MD5 hash of each file being indexed. If you wish to pursue this approach, please contact us for further guidance.

Managing an S3 Bucket

On the product page for each provisioned product, there is an orange “ACTIONS” button in the upper right that allows you to manage the instance.

Update

Using the update action allows you to change parameters or update to a new version of the product.

Terminate

The terminate action removes the bucket product from the Service Catalog; however, the bucket is not immediately deleted. The bucket and its data are placed in an archived state where no users have access to the bucket. When the 30-day archive period expires, the bucket and its data are automatically purged from AWS. The owner of the bucket may ask Sage IT to restore access to the bucket before it is purged.

Change Owner

If transferring ownership is deemed necessary, please ask #sageit for help rather than using this action.

Scheduled Jobs

Note: Scheduled Jobs products currently are available only to Sage employees.

Scheduled Job Products

Scheduled jobs are essentially cron jobs that run in AWS Batch. This product allows you to run an arbitrary task in the cloud to process your workload. The task must run from a Docker image, and it can be triggered manually or set up to run on a schedule. A scheduled job is limited to 20 GB of storage, to the memory/CPU chosen when launching the product, and to a one-hour duration.

Creating Scheduled Job Products

To create a scheduled job, select the “Products List” from the navigation panel on the left. Next, select “Scheduled Jobs” from the list. On the product page, click the orange “LAUNCH PRODUCT” button under the product description, then fill out the wizard. Most of the parameters contain helpful information describing the valid inputs.

...

Notes:

  • The “Environment” parameters are required fields. You can replace the default values, however please do not leave these fields empty. Also pay special attention to the formatting that’s required for the values. The deployment will fail if the formatting isn’t correct.

  • There is an AWS bug that prevents disabling a scheduled job after it has been enabled. The workaround is either to (1) terminate the job and create a new one, or (2) set the rate to some distant time in the future (e.g., 3650 days).

Tag options, Notifications, and Review

The final three screens of the wizard are the same as in Creating an EC2 Instance above.

Manually Run Scheduled Jobs

Once provisioning is complete, your Scheduled Jobs product will appear in the “Provisioned Products” list with status Available. Select “Provisioned Product Details” from the navigation panel on the left, and click on your product. A product with a “Succeeded” event will have outputs that include a “SubmitJobApi” link. The link is a URL that triggers the job; it can be entered in your web browser or passed to a curl command. Entered into your web browser, the result will look like:

...

NOTE: Invoking the URL requires no authentication, so anyone who obtains the URL can trigger the job. The URL created for the provisioned product must therefore be treated as a secret, just like any password or key. If its secrecy is compromised, the provisioned product must be terminated.

View Scheduled Job Status

Click on the “Jobs” link to view the batch job status. Once triggered the job should transition from STARTING → RUNNING → SUCCEEDED.

...


Access Scheduled Job Logs

The job sends its logs to Amazon CloudWatch. To access the logs, click the “Logs” link in the PROVISION_PRODUCT outputs. Below is an example of a log for a job that ran the command “printenv”.

...

Aside from these logs, the Service Catalog does not monitor or respond to job failures. It is the responsibility of the job owner to check CloudWatch and/or instrument their code to send alerts when failures occur.

Scheduled Job Secrets

Secrets are stored in AWS Secrets Manager and exposed to the job as environment variables. Note the “SCHEDULED_JOB_SECRETS” entry in the logs above: because the example job ran “printenv”, every secret passed into the product was written to the logs. Please never expose secrets this way. DO NOT PRINT ENVIRONMENT VARIABLES.

Accessing Scheduled Job Secrets

Job secrets can be accessed in a number of ways. The simplest is to read the SCHEDULED_JOB_SECRETS environment variable inside the Docker container.

Environment variable example:

Code Block
printenv SCHEDULED_JOB_SECRETS|jq .SECRET1
"Shh1"
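Inside the container, the same value can be parsed programmatically. A minimal Python sketch (the secret names and values below are made-up placeholders, and the environment variable is set here only to simulate what the job provides):

```python
import json
import os

# Simulate the environment variable a Scheduled Job would provide.
# These values are placeholders, not real secrets.
os.environ["SCHEDULED_JOB_SECRETS"] = '{"SECRET1": "Shh1", "SECRET2": "Shh2"}'

# The variable holds a JSON object, so parse it once to get all name/value pairs.
secrets = json.loads(os.environ["SCHEDULED_JOB_SECRETS"])
print(secrets["SECRET1"])  # prints: Shh1
```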

Passing Synapse access token as a secret using environment variables:

You can use a scheduled job to access data in Synapse. To do so:

  1. Create a personal access token (PAT) as explained here.

  2. In the Secrets parameter of your scheduled job, include the PAT, e.g.

    Code Block
    "PAT":"eyJ0eXAiOiJKV1QiLCJraWQiOiJXN05OOldMSlQ..."

    You may include other secrets if needed, each name/value pair separated by a comma.

  3. Add code in your containerized script to parse the “SCHEDULED_JOB_SECRETS” JSON object, extract the PAT, and use it to authenticate. Here is a Python example, synapse_login.py:

    Code Block
    import json, os, synapseclient
    secrets = json.loads(os.environ["SCHEDULED_JOB_SECRETS"])
    auth_token = secrets["PAT"]
    syn = synapseclient.Synapse()
    syn.login(authToken=auth_token)
  4. Create a container image that includes the Synapse client, this script, and the command to invoke the script. Here is a Dockerfile that creates such an image:

    Code Block
    FROM sagebionetworks/synapsepythonclient
    COPY synapse_login.py synapse_login.py
    CMD python3 synapse_login.py

Now you can build the container image, push it to a public Docker registry, and reference it in your Scheduled Job.

Retrieving job secrets using the AWS client

To access the job secrets using the AWS CLI or one of the AWS SDKs, first copy the “JobSecretArn” from the Service Catalog PROVISION_PRODUCT outputs. Then provide the JobSecretArn to the AWS Secrets Manager CLI or one of the AWS SDKs to retrieve the secret.

AWS Secrets Manager CLI example:

Code Block
aws secretsmanager --output json get-secret-value --secret-id arn:aws:secretsmanager:us-east-1:465877038949:secret:JobSecrets-rEx1eKL9pokj-h7hCGO
{
    "ARN": "arn:aws:secretsmanager:us-east-1:465877038949:secret:JobSecrets-rEx1eKL9pokj-h7hCGO",
    "Name": "JobSecrets-rEx1eKL9pokj",
    "VersionId": "09904c83-f4ea-4664-a773-eded857ab5a0",
    "SecretString": "{ \"SECRET1\":\"Shh1\" }",
    "VersionStages": [
        "AWSCURRENT"
    ],
    "CreatedDate": "2021-12-18T09:01:23.690000-08:00"
}
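Note that the SecretString field in the response is itself a JSON document, so it must be parsed twice to reach an individual secret. A minimal Python sketch using an abbreviated form of the example output above:

```python
import json

# Abbreviated response from `aws secretsmanager get-secret-value` (see example above).
cli_output = '{"SecretString": "{ \\"SECRET1\\":\\"Shh1\\" }"}'

# First parse: the overall CLI response.
response = json.loads(cli_output)
# Second parse: the SecretString field, which is JSON-encoded text.
secrets = json.loads(response["SecretString"])
print(secrets["SECRET1"])  # prints: Shh1
```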

...

Code Block
# Use this code snippet in your app.
# If you need more information about configurations or implementing the sample code, visit the AWS docs:   
# https://aws.amazon.com/developers/getting-started/python/

import boto3
import base64
from botocore.exceptions import ClientError


def get_secret():

    secret_name = "arn:aws:secretsmanager:us-east-1:465877038949:secret:JobSecrets-rEx1eKL9pokj-h7hCGO"
    region_name = "us-east-1"

    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    # In this sample we only handle the specific exceptions for the 'GetSecretValue' API.
    # See https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html
    # We rethrow the exception by default.

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError as e:
        if e.response['Error']['Code'] == 'DecryptionFailureException':
            # Secrets Manager can't decrypt the protected secret text using the provided KMS key.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'InternalServiceErrorException':
            # An error occurred on the server side.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'InvalidParameterException':
            # You provided an invalid value for a parameter.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'InvalidRequestException':
            # You provided a parameter value that is not valid for the current state of the resource.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
        elif e.response['Error']['Code'] == 'ResourceNotFoundException':
            # We can't find the resource that you asked for.
            # Deal with the exception here, and/or rethrow at your discretion.
            raise e
    else:
        # Decrypts secret using the associated KMS key.
        # Depending on whether the secret is a string or binary, one of these fields will be populated.
        if 'SecretString' in get_secret_value_response:
            return get_secret_value_response['SecretString']
        else:
            return base64.b64decode(get_secret_value_response['SecretBinary'])

Best Practices

When to Use the Service Catalog

The benefits of the Service Catalog are that it is self-service, that it fulfills the most common needs for compute and storage, and that it creates resources in a PHI-safe environment. We encourage you to use it preferentially. However, it will not fulfill all needs. For custom development in a PHI-safe environment, the “scicomp” account remains the preferred option for Sage employees. For custom development that does not involve PHI, Sage employees can use the “sandbox” account. For more information, see the Sage Bionetworks intranet article on computing. If you have any questions about which environment is most suitable, ask in the #sageit Slack channel!

Ephemeral Instances, Persistent Data

We encourage you to treat your instances as ephemeral. The Service Catalog makes it very easy to create new instances, and since updating an instance’s parameters frequently results in its recreation, it is best not to get too attached to any one instance. Get in the habit of storing your data in Synapse or S3. If you leave the organization, store any data your colleagues will need in Synapse or S3, then terminate your instances. Any instances left running will be terminated upon your departure.

EC2 SSM Session Timeouts

There is a short idle timeout on SSM sessions started in the browser by clicking the EC2 ConnectionURI link described above: the session ends after roughly twelve minutes of inactivity. The browser session is designed for quick access to EC2 instances. If you prefer longer sessions, we recommend setting up command-line SSM access to the instance.

...


Frequently Asked Questions

What are my responsibilities regarding security, as a Sage Bionetworks worker?

You are responsible for securing cloud resources and the data they contain, much as you are responsible for securing your company-issued laptop and its data. Your responsibilities include those described by the Sage Bionetworks Information Security Policy, including the Acceptable Use Agreement (https://sites.google.com/sagebase.org/intranet/hr/policies). Note that the Service Catalog serves as a secure space for storing and processing sensitive data, but consult the Sage Governance team for the restrictions and requirements that apply to your specific data.

Why didn’t I receive an email about my new product?

Emails are not enabled in the current system. You can get status updates on the provisioned products pages, including connection instructions when the provisioning succeeds.

Note: you may need to expand the product events to view connection information.

...

Which OS versions are currently available?

  • Amazon Linux 2 AMI 2.0.20190823.1 x86_64 HVM gp2

  • Ubuntu 18.04 (Bionic)

  • Windows Server 2019

Why can’t I set my own AMI ID?

The current product templates are meant to cover the most common use cases. You may request a new product if your use case is not covered. Sage employees can do this by filing a Jira issue. Others should contact their community manager.

Why does the status show “available” even after I stop my EC2?

The status on the Service Catalog UI is either “available” or “under change”. That status pertains only to the Service Catalog product, which can be any provisioned AWS resource or bundle of resources (e.g., DB, ALB, S3, ECS). It does not reflect an EC2 instance’s activity status, and unfortunately the Service Catalog UI does not provide EC2 status. To view your EC2 instance’s status, click the LinuxInstance link and view the status in the AWS EC2 console.

...

What is a User ARN?

ARN stands for Amazon Resource Name, a unique identifier for a particular resource in AWS.

Why can’t I access my windows instance?

Windows instances take much longer than Linux instances to initialize; the process can take 5 to 15 minutes. If you provision a Windows instance, you may need to wait up to 15 minutes after the Service Catalog completes before you can RDP into it.

Why can’t I access my provisioned EC2 instance anymore?

If you previously had terminal access to your provisioned EC2 instance but can no longer access it, we suggest the following verification:

  1. Go to the EC2 console page and verify that the EC2 instance is in the running state. If it is not, go to the Service Catalog console, select the EC2 instance, then select Actions → Start.

  2. If you still cannot access your EC2 instance, the cause might be a completely full disk volume. If this happens, contact Sage IT for help and provide your EC2 instance ID.

How can I ask a question or otherwise get help?

Sage employees should use the #sageit Slack channel. Members of other communities can contact their community manager or reach out via the Synapse help forum.

...