Note that the Elastic MapReduce client on the shared servers has moved. See the updated instructions below.

First-Time Setup

Sign up to AWS

  1. Create an AWS Account
    • Use your firstName.lastName@sagebase.org email address for the account name
    • Enter Sage Bionetworks' physical address for the address
    • You will need to use your own credit card temporarily
  2. Send Mike Kellen an email to have your new AWS account added to the consolidated bill
    • Once this is done, you will no longer be billed on your own credit card
  3. Log on to the AWS Console and sign up for S3, EC2, and Elastic MapReduce

Subscribe to services

  1. Sign up for EC2 http://aws.amazon.com/ec2/
  2. Sign up for S3 http://aws.amazon.com/s3/
  3. Sign up for ElasticMapReduce http://aws.amazon.com/elasticmapreduce/
  4. Sign up for Simple DB http://aws.amazon.com/simpledb/

Configure EC2

  1. Use the AWS console to create and download a new SSH key named SageKeyPair and store it in your home directory
  2. Download it to ~/.ssh on the shared servers
  3. ssh to belltown
  4. Fix the permissions on it
    Code Block
    
    ~>chmod -v 600 ~/.ssh/SageKeyPair.pem
    mode of `/home/ndeflaux/.ssh/SageKeyPair.pem' retained as 0600 (rw-------)
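ssh refuses to use a private key file that other users can read, so it can help to confirm the mode after copying the key. The following is a minimal sketch; the helper name key_is_private is hypothetical, and it assumes GNU stat (the `-c` option), as found on Linux.

```shell
# Hypothetical helper (not part of any AWS tool): succeed only if the
# file exists and its permission bits are exactly 600.
# Assumes GNU coreutils stat, which supports -c '%a' (octal mode).
key_is_private() {
    [ -f "$1" ] && [ "$(stat -c '%a' "$1")" = "600" ]
}
```

For example, `key_is_private ~/.ssh/SageKeyPair.pem || chmod 600 ~/.ssh/SageKeyPair.pem` fixes the mode only when needed.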
    

Configure S3

  1. Use the AWS console to make a new S3 bucket named sagebio-YourUnixUsername. Note: Do not put any underscores in your bucket name. Only use hyphens, lowercase letters, and numbers.
  2. Make these five subdirectories
    1. scripts
    2. input
    3. output
    4. results
    5. logs
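The bucket naming rule from step 1 can be checked before creating the bucket. This is a minimal sketch that encodes only the rule stated on this page (hyphens, lowercase letters, and numbers); the helper name is_valid_bucket_name is hypothetical, and any further restrictions that S3 itself imposes are not checked.

```shell
# Hypothetical helper: succeed if the name is non-empty and contains only
# lowercase letters, digits, and hyphens (the rule given on this page).
# The allowed characters are listed one by one so that the match does not
# depend on locale-specific range handling.
is_valid_bucket_name() {
    case "$1" in
        "" | *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, `is_valid_bucket_name "sagebio-$USER" && echo ok` prints ok only when your username fits the rule.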

...

Get your AWS credentials

Get your security credentials from your AWS Account

  • Access Key ID
  • Secret Access Key

Set up the Elastic MapReduce command line tool

Set up your configuration files for the Elastic MapReduce AWS tool installed on the shared servers (belltown, sodo, ballard, ...)

  1. ssh to belltown
  2. Edit your ~/.profile and add the line module load aws/elastic-mapreduce-cli to get the Elastic MapReduce command line tools into your PATH
    Code Block
    
    ~>cat ~/.profile
    # Sample .profile for SuSE Linux
    #
    # This file is read each time a login shell is started.
    # All other interactive shells will only read .bashrc; this is particularly
    # important for language settings, see below.
    
    test -z "$PROFILEREAD" && . /etc/profile || true
    
    # User-specific settings
    module load aws/elastic-mapreduce-cli
    
  3. Create the configuration file for the Elastic Map Reduce command line tool
    Code Block
    
    ~>cat ~/.ssh/$USER-credentials.json
    
    {
    "access_id": "YourAWSAccessKeyID",
    "private_key": "YourAWSSecretAccessKey",
    "keypair": "SageKeyPair",
    "key-pair-file": "/home/YourUnixUsername/.ssh/SageKeyPair.pem",
    "log_uri": "s3n://sagebio-YourUnixUsername/logs/",
    "region": "us-east-1"
    }
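If you prefer not to type the file by hand, something like the following could generate it. This is a sketch, not part of the Elastic MapReduce tool: the function name write_emr_credentials is hypothetical, and it assumes that your key is at $HOME/.ssh/SageKeyPair.pem and that your bucket is named sagebio-$USER. Because the file holds your secret key, the sketch also restricts it to mode 600 as a precaution.

```shell
# Hypothetical helper: write the credentials file for elastic-mapreduce.
#   $1 = AWS access key ID, $2 = AWS secret access key, $3 = output path
# The body runs in a subshell so the umask change does not leak out.
write_emr_credentials() (
    umask 077    # new files are readable by the owner only
    cat > "$3" <<EOF
{
"access_id": "$1",
"private_key": "$2",
"keypair": "SageKeyPair",
"key-pair-file": "$HOME/.ssh/SageKeyPair.pem",
"log_uri": "s3n://sagebio-$USER/logs/",
"region": "us-east-1"
}
EOF
    chmod 600 "$3"    # also covers the case where the file already existed
)
```

It would be called as `write_emr_credentials YourAWSAccessKeyID YourAWSSecretAccessKey ~/.ssh/$USER-credentials.json`.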
    
  4. Test that you can run it
    Code Block
    
    ~>elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --help
    Usage: elastic-mapreduce [options]
    
      Creating Job Flows
            --create                     Create a new job flow
            --name NAME                  The name of the job flow being created
            --alive                      Create a job flow that stays running even though it has executed all its steps
            --with-termination-protection
                                         Create a job with termination protection (default is no termination protection)
            --num-instances NUM          Number of instances in the job flow
    ...
    
  5. For less typing, you can make an alias to this command. If you use bash, you can put the following in your .bashrc:
    Code Block
    
    alias emr='elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json'
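Bash does not expand aliases in non-interactive scripts by default, so if you also want the shortcut inside scripts, a shell function is one alternative. This is a sketch that assumes the credentials file sits at the same path as above:

```shell
# Function equivalent of the emr alias; unlike an alias, it also works
# in scripts that define or source it. All arguments are passed through.
emr() {
    elastic-mapreduce --credentials "$HOME/.ssh/$USER-credentials.json" "$@"
}
```

With either form, `emr --help` should print the same usage text as the test in step 4.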
    

Other useful tools

s3curl

You can use the AWS Console to upload and download files to S3, but it is sometimes handy to do this from the command line; s3curl lets you do that.

To set up your configuration file for the s3curl AWS tool installed on the shared servers (belltown, sodo, ballard, ...):

  1. ssh to belltown
  2. Create the configuration file for s3curl command line tool
    Code Block
    
    ~>cat ~/.ssh/s3curl
    #!/bin/perl
    %awsSecretAccessKeys = (
        YourUnixUsername => {
            id => 'YourAccessKeyID',
            key => 'YourSecretAccessKey',
        },
    );
    
  3. Make a symlink to it in your home directory
    Code Block
    ~>ln -s ~/.ssh/s3curl ~/.s3curl
  4. Test that you can run s3curl
    Code Block
    
    ~> chmod 600 /home/$USER/.s3curl
    ~>/work/platform/bin/s3curl.pl --id $USER https://s3.amazonaws.com/sagebio-$USER/ | head -c 200
    <?xml version="1.0" encoding="UTF-8"?>
    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>sagebio-ndeflaux</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><IsTruncated>
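Once the listing works, an upload could be wrapped in a small function such as the one below. This is a sketch under assumptions: the helper name s3put and the S3CURL variable are hypothetical, the bucket is assumed to be sagebio-$USER, and the `--put=` option and the `--` separator are used as described in the s3curl README. Check them against `s3curl.pl --help` on the shared servers before relying on this.

```shell
# Path to s3curl on the shared servers; set S3CURL to override it.
S3CURL=${S3CURL:-/work/platform/bin/s3curl.pl}

# Hypothetical helper: upload local file $1 to key $2 in the
# sagebio-$USER bucket. Everything after "--" is handed on to curl.
s3put() {
    "$S3CURL" --id "$USER" --put="$1" -- "https://s3.amazonaws.com/sagebio-$USER/$2"
}
```

For example, `s3put data.txt input/data.txt` would place the file under the input subdirectory created earlier.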
    

EC2 Command Line Tools

Add module load aws/ec2-api-tools/1.4.3.0 to ~/.profile and configure the tool as stated in the EC2 Getting Started Guide documentation.

Other software available on the shared servers

nano text editor

The nano editor is available on the shared servers (sodo, ballard, belltown, ...) and on the miami cluster. It does not use X windows. If you need a simple text editor and are not familiar with vi or emacs, nano is a good choice; it is installed by default on many Linux systems.

Code Block

~>ssh ndeflaux@pegasus.ccs.miami.edu
ndeflaux@pegasus.ccs.miami.edu's password:
Last login: Thu May 19 18:59:51 2011 from dhcp149019.fhcrc.org


***********************************************************************
*                                                                     *
* Welcome to Pegasus Linux Cluster at CCS/University of Miami.        *
*                                                                     *
* ....
*                                                                     *
***********************************************************************

[ndeflaux@u01 ~]$ nano file.txt
[ndeflaux@u01 ~]$


Where to go next?