The following provides instructions on how to log on to the Sage Scientific Compute workspace using your Synapse credentials, and how to use the products provided in the AWS Service Catalog to set up or modify EC2 instances and S3 buckets.
...
Access to Sage Scientific Compute workspace is organized by communities. Community membership is defined by a Synapse Team and managed by its community manager. Each community also has a defined entry point URL, as shown below:
Community | Synapse Team | Service Catalog entry point | Community Manager |
---|---|---|---|
Sage Bionetworks | Sage IT (scipoolsupport@synapse.org) | ||
RECOVER Data Analysts | Solveig Sieberts (sieberts@synapse.org) | ||
Schwannomatosis Open Research Collaborative | Robert Allaway (allawayr@synapse.org) | ||
Accelerating Medicines Partnership - Alzheimer’s Disease (AMP-AD) Consortium | Jessica Britton (jessica.britton@synapse.org) | ||
Bill & Melinda Gates - Ki Team | Ryan Hafen (rhafen@synapse.org) |
For Sage Bionetworks employees, access is granted during employee onboarding. For other groups, the community manager will add your Synapse account to the list of allowed users for the compute workspace. The community manager will also receive reports on cloud expenditures and will contact community members when costs exceed expectations, to review the need for the expenditures.
...
This product provides a basic EC2 instance with Docker installed.
...
EC2 with Notebook Software
This product is an Ubuntu Linux EC2 instance with R Studio or Jupyter notebook software installed.
...
CostCenter: bill the product to this cost center code. If the appropriate cost center code is not in the list, select “Other / 000001” and create a custom “CostCenterOther” tag. Set the tag to a value from our official list of cost center codes. Example:
...
Note: The owner email tag is automatically set to <Synapse Username>@synapse.org
Notifications
Please skip the Notifications pane. SNS notifications are not operational at this time.
...
AWS SSM allows direct access to private instances from your own computer terminal. To set up access with AWS SSM, we need to create a special Synapse personal access token (PAT) that will work with the Sage Service Catalog. This is a special PAT that can only be created using this workflow; creating a PAT from the Synapse personal token manager web page will NOT work.
Request a Synapse PAT by visiting https://sc.sageit.org/personalaccesstoken for Sage employees, or https://ad.strides.sc.sageit.org/personalaccesstoken for AMP-AD members. (You may need to log in to Synapse.) If you have already created a PAT through this mechanism and are repeating the process, you must first visit the token management page in Synapse and delete the existing one with the same name.
After logging into Synapse, a file containing the PAT, which is a long character string (e.g. eyJ0eXAiOiJ...Z8t9Eg), is returned to you. Save the file to your local machine, note where you saved it, then close the browser session.
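Before moving on, it can help to confirm that the saved file holds a plausible token. The following is a minimal Python sketch, not part of the official workflow. It assumes the file contains only the token string and that the token is JWT-shaped (three dot-separated base64url segments, which the `eyJ0...` prefix suggests). The function names `looks_like_pat` and `check_pat_file` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the PAT is JWT-shaped, i.e. three base64url segments joined by dots.
_SEGMENT = r"[A-Za-z0-9_-]+"
_TOKEN_PATTERN = re.compile(rf"^{_SEGMENT}\.{_SEGMENT}\.{_SEGMENT}$")


def looks_like_pat(text: str) -> bool:
    """Return True if text (ignoring surrounding whitespace) is shaped like a JWT."""
    return bool(_TOKEN_PATTERN.match(text.strip()))


def check_pat_file(path: str) -> bool:
    """Read the saved file and report whether its contents look like a token.

    The token itself is never printed.
    """
    return looks_like_pat(Path(path).read_text(encoding="utf-8"))
```

Calling `check_pat_file` with the location you noted above returns `False` for an empty or truncated download. It does not check that the token is valid or unexpired.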
Note: At this point you can verify that the PAT for the Service Catalog was successfully created by viewing the Synapse token management page. When the PAT expires you will need to repeat these steps to create a new PAT. The PAT should look something like this
...
Run an application on the EC2 instance (e.g. docker run -p 80:80 httpd)
Code Block |
---|
[ec2-user@ip-10-49-26-50 ~]$ docker run -p 80:80 httpd
Unable to find image 'httpd:latest' locally
latest: Pulling from library/httpd
33847f680f63: Pull complete
d74938eee980: Pull complete
963cfdce5a0c: Pull complete
8d5a3cca778c: Pull complete
e06a573b193b: Pull complete
Digest: sha256:71a3a8e0572f18a6ce71b9bac7298d07e151e4a1b562d399779b86fef7cf580c
Status: Downloaded newer image for httpd:latest
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
[Thu Jul 22 23:54:12.106344 2021] [mpm_event:notice] [pid 1:tid 140706544895104] AH00489: Apache/2.4.48 (Unix) configured -- resuming normal operations
[Thu Jul 22 23:54:12.107307 2021] [core:notice] [pid 1:tid 140706544895104] AH00094: Command line: 'httpd -D FOREGROUND' |
To reach that app from your own machine, a Service Catalog user can use the SSM port forwarding feature by running the following AWS CLI command:
Code Block |
---|
aws ssm start-session --profile service-catalog \
--target i-0fd5c9ff0ef675ceb \
--document-name AWS-StartPortForwardingSession \
--parameters '{"portNumber":["80"],"localPortNumber":["9090"]}' |
To provide access to that app in the Windows Command Prompt use this syntax:
Code Block |
---|
aws ssm start-session --profile service-catalog ^
--target i-0fd5c9ff0ef675ceb ^
--document-name AWS-StartPortForwardingSession ^
--parameters "{\"portNumber\":[\"80\"],\"localPortNumber\":[\"9090\"]}" |
Now you should be able to access that app on your local machine at http://localhost:9090.
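The two commands above differ only in how the JSON for `--parameters` is quoted for each shell. As a hedged alternative, the sketch below builds the same argument list in Python, where `json.dumps` produces the JSON and no shell quoting is involved. The function name `build_port_forward_command` is hypothetical; the profile name, document name and parameter keys are taken from the commands above.

```python
import json
import shlex


def build_port_forward_command(instance_id, remote_port, local_port,
                               profile="service-catalog"):
    """Return the argument list for an SSM port-forwarding session."""
    # The commands above pass each port number as a string inside a list.
    parameters = json.dumps(
        {"portNumber": [str(remote_port)], "localPortNumber": [str(local_port)]}
    )
    return [
        "aws", "ssm", "start-session",
        "--profile", profile,
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSession",
        "--parameters", parameters,
    ]


if __name__ == "__main__":
    # Example instance ID copied from the commands above; replace with your own.
    command = build_port_forward_command("i-0fd5c9ff0ef675ceb", 80, 9090)
    # Print the command for review instead of running it.
    print(shlex.join(command))
```

To start the session from Python, the returned list can be passed to `subprocess.run(command, check=True)`. Because the arguments are a list, the same code works on Linux, macOS and Windows.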
...
Provisioning and Using a Notebook
We provide a simple-to-use product which runs an R-Studio or Jupyter notebook. We also provide a way to run notebooks in general (a “custom” notebook), allowing you to run a Jupyter notebook or to run an R-Studio notebook which you have customized.
Simple R-Studio Notebook
Launching the Notebook
To create a notebook instance, select “Products List” from the navigation panel on the left. Next, select EC2 : Ubuntu Linux with Notebook Software, fill in the requested choices (including selecting R-Studio or Jupyter) and launch.
Once you provision the product, visit the page for the provisioned product. Scroll halfway down and click on “Outputs”. At the bottom of the page, click on “NotebookConnectionURI”. This will open a new browser tab with the notebook running.
...
Custom Notebook
Provision and Connect to the Notebook
Following the instructions for provisioning a machine, provision an EC2: Linux Docker instance.
Follow the steps under SSM Access to an Instance to connect to the machine as the ec2-user user.
Start up the Notebook Software
Log in to your Synapse account in your web browser and, under your Settings tab, create a Personal Access Token with View, Modify, and Download scope. Returning to the command line of the provisioned instance, run:
Code Block |
---|
nano ~/.bash_profile |
At the bottom of the file add the lines:
Code Block |
---|
SYNAPSE_AUTH_TOKEN=<token here>
export SYNAPSE_AUTH_TOKEN |
Now save the file and run
Code Block |
---|
source ~/.bash_profile |
Going forward, your Synapse access token will be available as an environment variable.
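If you later write Python scripts on the instance, a small guard can catch the case where the variable was never exported (for example, in a shell that has not sourced `~/.bash_profile`). This is a minimal sketch; the function name `get_synapse_token` is hypothetical and only the variable name `SYNAPSE_AUTH_TOKEN` comes from the steps above.

```python
import os


def get_synapse_token(environ=os.environ):
    """Return the Synapse token from the environment, or raise a clear error."""
    token = environ.get("SYNAPSE_AUTH_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "SYNAPSE_AUTH_TOKEN is not set; add it to ~/.bash_profile and "
            "run 'source ~/.bash_profile'."
        )
    # Return the token to the caller; never print or log it.
    return token
```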
We now show how to access a Jupyter Notebook. Instructions for the R Studio Notebook follow.
Run a Jupyter Notebook
Start the notebook, passing the access token:
Code Block |
---|
docker run -d -p 8888:8888 -v ~/work:/home/jovyan/work -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN \
--name jupyter jupyter/scipy-notebook |
Note: Passing the access token is a convenience, allowing you to connect to Synapse in your notebook without typing your credentials.
Note: The docker image jupyter/scipy-notebook is just an example. The flexibility of Docker is that you can use any image you wish.
Now add the Synapse Python client to your notebook.
Code Block |
---|
docker exec jupyter pip install synapseclient |
Connect to the notebook in your browser
Get the connection URL. Run:
Code Block |
---|
docker logs jupyter |
You will see a line something like:
Code Block |
---|
http://127.0.0.1:8888/?token=8bd05708879ee8faa23813807874291267281cac9efe6e7b |
Return to your laptop console and connect to the notebook, following the instructions under SSM access to applications, taking care that “portNumber” matches the value used when starting up the notebook with Docker. The command will look something like:
Code Block |
---|
aws ssm start-session --profile service-catalog --target i-01289326a8b08c72d \
--document-name AWS-StartPortForwardingSession \
--parameters '{"portNumber":["8888"],"localPortNumber":["8888"]}' |
(We call this “port forwarding” because it connects a port on your local machine to a port on the remote EC2 instance.) Now, in your browser, visit the URL shown in the ‘docker logs’ output. You will see the Jupyter console.
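If you forward to a local port other than 8888, the URL from the ‘docker logs’ output needs its port changed before you open it. The sketch below, under the assumption that the log contains a line shaped like the example above, pulls out the tokenized URL and rewrites it for the local port. The function name `local_notebook_url` is hypothetical.

```python
import re

# Matches lines such as http://127.0.0.1:8888/?token=<hex string>
_URL_PATTERN = re.compile(r"http://127\.0\.0\.1:(\d+)(/\S*\?token=[0-9a-f]+)")


def local_notebook_url(log_text, local_port):
    """Return the notebook URL rewritten for the forwarded local port, or None."""
    match = _URL_PATTERN.search(log_text)
    if match is None:
        return None
    # Keep the path and token, swap in localhost and the local port.
    return f"http://localhost:{local_port}{match.group(2)}"
```

Paste the text printed by `docker logs jupyter` into `log_text`. A return value of `None` means no tokenized URL was found, for example because the container has not finished starting.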
Connect to Synapse from a Jupyter Notebook
In the Jupyter console, open a new notebook and run the following:
Code Block |
---|
import synapseclient
syn=synapseclient.Synapse()
# Note, you do not need to pass credentials when logging in.
syn.login()
# download this sample notebook
syn.get("syn25931378", downloadLocation="/home/jovyan/work") |
Now return to the Files menu and you will see the downloaded notebook in the ‘work’ folder.
Clicking ‘Save’ in Jupyter will only save a copy to the provisioned EC2 machine, which you may terminate. If you want to save important results, push a copy to Synapse, like so:
Code Block |
---|
myOwnProject = "syn25931363"
myCopy = synapseclient.File("/home/jovyan/work/sample_notebook.ipynb", parent=myOwnProject)
syn.store(myCopy) |
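When the ‘work’ folder holds several notebooks, the same call can be repeated for each file. The following is a sketch only: the helper names `find_notebooks` and `store_notebooks` are hypothetical, and it relies on the `synapseclient.File` and `syn.store` calls shown above. Skipping the `.ipynb_checkpoints` folders that Jupyter creates is a design choice you may not want.

```python
from pathlib import Path


def find_notebooks(work_dir):
    """Return notebook files under work_dir, skipping Jupyter checkpoint copies."""
    return sorted(
        path for path in Path(work_dir).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


def store_notebooks(syn, work_dir, parent_id):
    """Store each notebook in the Synapse project or folder given by parent_id."""
    # Imported here so find_notebooks can be used without synapseclient installed.
    import synapseclient

    for path in find_notebooks(work_dir):
        syn.store(synapseclient.File(str(path), parent=parent_id))
```

In the notebook, this would be called as `store_notebooks(syn, "/home/jovyan/work", myOwnProject)`, reusing the logged-in `syn` object and the project ID from the example above.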
Stopping your notebook
To stop port-forwarding, in the terminal where you started port forwarding, hit Control-C.
To stop the notebook, return to the terminal in which you have a connection to the EC2 machine or start a new connection as earlier:
Code Block |
---|
aws ssm start-session --profile service-catalog --target i-0fd5c9ff0ef675ceb --parameters command="sudo su - ec2-user" |
where i-0fd5c9ff0ef675ceb is the instance ID of your EC2 instance. At the EC2 command line you can stop and remove the notebook container:
Code Block |
---|
docker stop jupyter
docker rm jupyter |
...
On the EC2 instance, run the notebook:
Code Block |
---|
sudo docker run -d --rm -p 8787:8787 rocker/rstudio |
which connects an R-Studio notebook to port 8787. (You can customize the command based on the instructions.) Now, start a port forwarding session, forwarding port 8787:
Code Block |
---|
aws ssm start-session --profile service-catalog --target i-0fd5c9ff0ef675ceb \
--document-name AWS-StartPortForwardingSession \
--parameters '{"portNumber":["8787"],"localPortNumber":["8787"]}' |
As before, the value for ‘target’ is just an example.
...
Connecting to Synapse
Both notebooks feature passwordless login to Synapse. In RStudio, simply type:
Code Block |
---|
synapser::synLogin() |
...
In Jupyter, first install these dependencies:
Code Block |
---|
pip install boto3 synapseclient |
and then:
Code Block |
---|
import synapseclient
syn = synapseclient.Synapse()
syn.login() |
...
Managing an Instance
On the product page for each provisioned product, there is an orange “ACTIONS” button in the upper right that allows you to manage the instance. You can also visit the EC2 Console to check on the status of your instance.
...
Using the update action allows you to change parameters or update to a new version of the product. WARNING: Changes to configuration parameters usually result in a recreation (“replacement”) of the instance, in which case any data saved on the instance will be lost. The nature of the update by Amazon is difficult to predict. We recommend that you save any important data to S3, provision a new instance, and terminate the original.
Terminate
The terminate action deletes the instance permanently.
...
The “Environment” parameters are required fields. You can replace the default values, however please do not leave these fields empty. Also pay special attention to the formatting that’s required for the values. The deployment will fail if the formatting isn’t correct.
There is an AWS bug that prevents disabling the scheduled job after it has been enabled. The workaround is to either (1) terminate the job and create a new one or (2) set the rate to some distant time in the future (e.g. 3650 days).
...
Secrets are stored in the AWS secrets manager and exposed to the job as environment variables. The logs above print out the environment variables from the job. Take note of the “SCHEDULED_JOB_SECRETS” parameter in the logs. The secrets that are passed into this product are exposed as environment variables in the logs by the “printenv” command. Please make sure to never expose secrets in this way. DO NOT PRINT ENVIRONMENT VARIABLES.
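If a job needs to log its environment for debugging, one option is to mask sensitive values first rather than calling “printenv”. The sketch below is an illustration under stated assumptions, not part of the product: the function name `redacted_environment` and the list of sensitive words are hypothetical, and the word list should be adjusted to your own variable names.

```python
import os

# Assumption: variable names containing these words hold sensitive values.
SENSITIVE_WORDS = ("SECRET", "TOKEN", "PASSWORD", "KEY")


def redacted_environment(environ=os.environ):
    """Return a copy of the environment with sensitive values masked."""
    redacted = {}
    for name, value in environ.items():
        if any(word in name.upper() for word in SENSITIVE_WORDS):
            # Keep the name so it is clear the variable exists, hide the value.
            redacted[name] = "***"
        else:
            redacted[name] = value
    return redacted
```

With this approach, “SCHEDULED_JOB_SECRETS” still appears in the output, so you can confirm it was passed to the job, but its value does not reach the logs.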
Accessing Scheduled Job Secrets
Job secrets can be accessed in a number of different ways. The first way is simply to get them from the docker container environment variable SCHEDULED_JOB_SECRETS.
Environment variable example:
...