The following provides instructions on how to log on to the Sage Scientific Compute workspace using your Synapse credentials, and how to use the products provided in the AWS Service Catalog to set up or modify EC2 instances and S3 buckets.

...

Access to Sage Scientific Compute workspace is organized by communities. Community membership is defined by a Synapse Team and managed by its community manager. Each community also has a defined entry point URL, as shown below:

| Community | Synapse Team | Service Catalog entry point | Community Manager |
| --- | --- | --- | --- |
| Sage Bionetworks | Sage Bionetworks Employees | https://sc.sageit.org | Sage IT (scipoolsupport@synapse.org) |
| RECOVER Data Analysts | RECOVER Service Catalog External Users | https://sc.sageit.org | Solveig Sieberts (sieberts@synapse.org) |
| Schwannomatosis Open Research Collaborative | SWTNS | https://sc.sageit.org | Robert Allaway (allawayr@synapse.org) |
| Accelerating Medicines Partnership - Alzheimer’s Disease (AMP-AD) Consortium | AMPAD WG ServiceCatalogUsers | https://ad.strides.sc.sageit.org | Jessica Britton (jessica.britton@synapse.org) |
| Bill & Melinda Gates - Ki Team | BMGFKI ServiceCatalogUsers | https://bmgfki.sc.sageit.org | Ryan Hafen (rhafen@synapse.org) |

For Sage Bionetworks employees, access is granted during employee onboarding. For other groups, the community manager will add your Synapse account to the list of allowed users for the compute workspace. The community manager also receives reports on cloud expenditures and will contact community members to review the need for the expenditures when costs exceed expectations.

...

This product provides a basic EC2 instance with Docker installed.

...

EC2 with Notebook Software

This product is an Ubuntu Linux EC2 instance with R-Studio or Jupyter notebook software installed.

...

  • CostCenter: bill the product to this cost center code. If the appropriate cost center code is not in the list, select “Other / 000001” and create a custom “CostCenterOther” tag, setting it to a value from our official list of cost center codes. Example:

...

Note: The owner email tag is automatically set to <Synapse Username>@synapse.org

Notifications

Please skip the Notifications pane. SNS notifications are not operational at this time.

...

AWS SSM (Systems Manager) allows direct access to private instances from your own computer terminal. To set up access with AWS SSM, you need to create a special Synapse personal access token (PAT) that works with the Sage Service Catalog. This special PAT can only be created using the workflow below; creating a PAT from the Synapse personal token manager web page will NOT work.

  1. Request a Synapse PAT by visiting https://sc.sageit.org/personalaccesstoken (for Sage employees) or https://ad.strides.sc.sageit.org/personalaccesstoken (for AMP-AD members). (You may need to log in to Synapse.) If you have already created a PAT through this mechanism and are repeating the process, you must first visit the token management page in Synapse and delete the existing token with the same name.

  2. After logging into Synapse, a file containing the PAT, which is a long character string (e.g. eyJ0eXAiOiJ...Z8t9Eg), is returned to you. Save the file to your local machine, note where you saved it, and then close the browser session.

Note: At this point you can verify that the PAT for the Service Catalog was successfully created by viewing the Synapse token management page. When the PAT expires you will need to repeat these steps to create a new one. The PAT should look something like this:

...

  1. Set up a profile for SSM access

  2. Run an application on the EC2 instance (e.g. docker run -p 80:80 httpd)

    Code Block
    [ec2-user@ip-10-49-26-50 ~]$ docker run -p 80:80 httpd
    Unable to find image 'httpd:latest' locally
    latest: Pulling from library/httpd
    33847f680f63: Pull complete
    d74938eee980: Pull complete
    963cfdce5a0c: Pull complete
    8d5a3cca778c: Pull complete
    e06a573b193b: Pull complete
    Digest: sha256:71a3a8e0572f18a6ce71b9bac7298d07e151e4a1b562d399779b86fef7cf580c
    Status: Downloaded newer image for httpd:latest
    AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
    AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
    [Thu Jul 22 23:54:12.106344 2021] [mpm_event:notice] [pid 1:tid 140706544895104] AH00489: Apache/2.4.48 (Unix) configured -- resuming normal operations
    [Thu Jul 22 23:54:12.107307 2021] [core:notice] [pid 1:tid 140706544895104] AH00094: Command line: 'httpd -D FOREGROUND'
  3. To access that app from your local machine, use the port forwarding feature by running the AWS SSM CLI command:

    Code Block
    aws ssm start-session --profile service-catalog \
                          --target i-0fd5c9ff0ef675ceb \
                          --document-name AWS-StartPortForwardingSession \
                          --parameters '{"portNumber":["80"],"localPortNumber":["9090"]}'
      
  4. To do the same from the Windows Command Prompt, use this syntax:

    Code Block
    aws ssm start-session --profile service-catalog \
                          --target i-0fd5c9ff0ef675ceb \
                          --document-name AWS-StartPortForwardingSession \
                          --parameters "{\"portNumber\":[\"80\"],\"localPortNumber\":[\"9090\"]}"
      
  5. Now you should be able to access that app on your local machine at http://localhost:9090.
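
    For example, a quick check from your local terminal (optional; this assumes the httpd container from step 2 is still running and the port forwarding session from step 3 or 4 is active) should return the Apache default page:

    Code Block
    # Request the forwarded port; httpd's default page says "It works!"
    curl http://localhost:9090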

...

Provisioning and Using a Notebook

We provide a simple-to-use product which runs an R-Studio or Jupyter notebook. We also provide a way to run notebooks in general (a “custom” notebook), allowing you to run a Jupyter notebook or an R-Studio notebook which you have customized.

Simple R-Studio Notebook

Launching the Notebook

To create a notebook instance, select “Products List” from the navigation panel on the left. Next, select EC2 : Ubuntu Linux with Notebook Software, fill in the requested choices (including selecting R-Studio or Jupyter) and launch.

Once you provision the product, visit the page for the provisioned product. Scroll halfway down and click on “Outputs”. At the bottom of the page, click on “NotebookConnectionURI”. This will open a new browser tab with the notebook running.

...

Custom Notebook

Provision and Connect to the Notebook

Following the instructions for provisioning a machine, provision an EC2: Linux Docker instance.

Follow the steps under SSM Access to an Instance to connect to the machine as the ec2-user user.

Start up the Notebook Software

Log in to your Synapse account in your web browser and, under your Settings tab, create a Personal Access Token with View, Modify, and Download scope. Returning to the command line of the provisioned instance, run:

Code Block
nano ~/.bash_profile

At the bottom of the file add the lines:

Code Block
SYNAPSE_AUTH_TOKEN=<token here>

export SYNAPSE_AUTH_TOKEN

Now save the file and run

Code Block
source ~/.bash_profile

Going forward, your Synapse access token will be available as an environment variable. 
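
If you want to confirm the variable is set without printing the token itself, a quick optional check (not part of the official setup) is:

Code Block
# Prints a confirmation message only if SYNAPSE_AUTH_TOKEN is non-empty
echo "${SYNAPSE_AUTH_TOKEN:+SYNAPSE_AUTH_TOKEN is set}"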

We now show how to access a Jupyter Notebook. Instructions for the R Studio Notebook follow.

Run a Jupyter Notebook

Start the notebook, passing the access token:

Code Block
docker run -d -p 8888:8888 -v ~/work:/home/jovyan/work -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN \
--name jupyter jupyter/scipy-notebook

Note: Passing the access token is a convenience, allowing you to connect to Synapse in your notebook without typing your credentials.

Note: The Docker image jupyter/scipy-notebook is just an example; Docker gives you the flexibility to use any image you wish.

Now add the Synapse Python client to your notebook.

Code Block
docker exec jupyter pip install synapseclient
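
To verify the installation (an optional check), you can print the installed version from inside the container:

Code Block
# Runs Python inside the "jupyter" container and prints the synapseclient version
docker exec jupyter python -c "import synapseclient; print(synapseclient.__version__)"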

Connect to the notebook in your browser

Get the connection URL. Run:

Code Block
docker logs jupyter

You will see a line something like:

Code Block
http://127.0.0.1:8888/?token=8bd05708879ee8faa23813807874291267281cac9efe6e7b

Return to your laptop console and connect to the notebook, following the instructions in SSM Access to Applications, taking care that “portNumber” matches the value used when starting up the notebook with Docker. The command will look something like:

Code Block
aws ssm start-session --profile service-catalog --target i-01289326a8b08c72d  \
--document-name AWS-StartPortForwardingSession  \
--parameters '{"portNumber":["8888"],"localPortNumber":["8888"]}'

(We call this “port forwarding” because it connects a port on your local machine to a port on the remote EC2 instance.) Now, in your browser, visit the URL shown in the ‘docker logs’ output. You will see the Jupyter console.

Connect to Synapse from a Jupyter Notebook

In the Jupyter console, open a new notebook and run the following:

Code Block
import synapseclient
syn=synapseclient.Synapse()
# Note, you do not need to pass credentials when logging in.
syn.login()
# download this sample notebook
syn.get("syn25931378", downloadLocation="/home/jovyan/work")

Now return to the Files menu and you will see the downloaded notebook in the ‘work’ folder.

Clicking ‘Save’ in Jupyter only saves a copy to the provisioned EC2 machine, which you may later terminate. To preserve important results, push a copy to Synapse, like so:

Code Block
myOwnProject = "syn25931363"
myCopy = synapseclient.File("/home/jovyan/work/sample_notebook.ipynb", parent=myOwnProject)
syn.store(myCopy)

Stopping your notebook

To stop port-forwarding, in the terminal where you started port forwarding, hit Control-C.

To stop the notebook, return to the terminal in which you have a connection to the EC2 machine or start a new connection as earlier:

Code Block
aws ssm start-session --profile service-catalog --target i-0fd5c9ff0ef675ceb --document-name AWS-StartInteractiveCommand --parameters command="sudo su - ec2-user"

where i-0fd5c9ff0ef675ceb is the instance ID of your EC2 instance. At the EC2 command line you can stop and remove the notebook container:

Code Block
docker stop jupyter
docker rm jupyter
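
To confirm the container has been removed (an optional check), list any remaining containers named “jupyter”; the command should print only the column headers:

Code Block
# Lists containers (running or stopped) whose name matches "jupyter"
docker ps -a --filter name=jupyter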

...

On the EC2, run the notebook:

Code Block
sudo docker run -d --rm -p 8787:8787 -e PASSWORD=<choose a password> rocker/rstudio

which runs an R-Studio notebook listening on port 8787. (You can customize the command based on the instructions; note that the rocker/rstudio image requires a password to be set.) Now, start a port forwarding session, forwarding port 8787:

Code Block
aws ssm start-session --profile service-catalog --target i-0fd5c9ff0ef675ceb \
 --document-name AWS-StartPortForwardingSession  \
 --parameters '{"portNumber":["8787"],"localPortNumber":["8787"]}'

As before, the value for ‘target’ is just an example.

...

Connecting to Synapse

Both notebooks feature passwordless login to Synapse. In RStudio, simply type:

Code Block
synapser::synLogin()

...

In Jupyter, first install these dependencies:

Code Block
pip install boto3 synapseclient

and then:

Code Block
import synapseclient
syn = synapseclient.Synapse()
syn.login()

...

Managing an Instance

On the product page for each provisioned product, there is an orange “ACTIONS” button in the upper right that allows you to manage the instance. You can also visit the EC2 Console to check on the status of your instance.

...

Using the update action allows you to change parameters or update to a new version of the product. WARNING: changes to configuration parameters usually result in recreation (“replacement”) of the instance; any data saved on the instance will be lost, and the exact effect of the update is difficult to predict. We recommend that you save any important data to S3, provision a new instance, and terminate the original.
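
For example, assuming the instance has the AWS CLI available and permission to write to an S3 bucket you have provisioned (the bucket name and file path below are hypothetical), you could copy your data off the instance before updating or terminating it:

Code Block
# Copy a results file to a provisioned S3 bucket (replace the path and bucket name with your own)
aws s3 cp ~/work/results.csv s3://my-provisioned-bucket/results.csv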

Terminate

The terminate action deletes the instance permanently.

...

  • The “Environment” parameters are required fields. You can replace the default values, but please do not leave these fields empty. Also pay special attention to the formatting required for the values; the deployment will fail if the formatting isn’t correct.

  • There is an AWS bug that prevents disabling the scheduled job after it has been enabled. The workaround is to either (1) terminate the job and create a new one, or (2) set the rate to some time far in the future (e.g. 3650 days).

...

Secrets are stored in AWS Secrets Manager and exposed to the job as environment variables. The logs above print out the environment variables from the job; take note of the “SCHEDULED_JOB_SECRETS” parameter. The secrets passed into this product were exposed in the logs by the “printenv” command. Please make sure never to expose secrets this way. DO NOT PRINT ENVIRONMENT VARIABLES.

Accessing Scheduled Job Secrets

Job secrets can be accessed in a number of different ways. The first is simply to read the SCHEDULED_JOB_SECRETS environment variable inside the Docker container.

Environment variable example:

...