Content Comparison


Note that the Elastic MapReduce client on the shared servers has moved. See the updated instructions below.

First-Time Setup

Sign up to AWS

...

Use the AWS console to create a new SSH key named SageKeyPair
Download it to ~/.ssh on the shared servers
ssh to belltown

Fix the permissions on it

Code Block
~>chmod 600 ~/.ssh/SageKeyPair.pem
mode of `/home/ndeflaux/.ssh/SageKeyPair.pem' retained as 0600 (rw-------)
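
The permission fix above can be verified with a small check before the key is used. This is only a sketch: the function name check_key_perms is hypothetical, and it assumes GNU stat as found on Linux servers.

```shell
# Hypothetical helper (not part of any AWS tool): succeed only if the
# key file exists and its mode is exactly 600 (owner read/write only).
# Assumes GNU stat; BSD/macOS stat uses different flags.
check_key_perms() {
    key_file="$1"
    if [ ! -f "$key_file" ]; then
        echo "missing: $key_file" >&2
        return 2
    fi
    mode=$(stat -c '%a' "$key_file")
    if [ "$mode" = "600" ]; then
        echo "ok: $key_file is mode 600"
    else
        echo "fix needed: $key_file is mode $mode (run: chmod 600 $key_file)" >&2
        return 1
    fi
}
```

For example, `check_key_perms ~/.ssh/SageKeyPair.pem` would report whether the downloaded key still needs its permissions tightened.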

...

Use the AWS console to make a new S3 bucket named sagebio-YourUnixUsername
Note: Do not put any underscores in your bucket name. Only use hyphens, lowercase letters and numbers.
Make these five subdirectories
1. scripts
2. input
3. output
4. results
5. logs
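
The bucket naming note above can be checked before creating the bucket in the console. The sketch below enforces only the rule stated on this page (hyphens, lowercase letters and numbers); it is not a full implementation of the S3 naming rules, and the function name is_valid_bucket_name is hypothetical.

```shell
# Hypothetical helper: accept a name only if it is non-empty and made up
# solely of lowercase letters, digits, and hyphens. The characters are
# listed explicitly so the check does not depend on the current locale.
is_valid_bucket_name() {
    name="$1"
    [ -n "$name" ] || return 1
    case "$name" in
        *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
    esac
    return 0
}
```

A possible use is `is_valid_bucket_name "sagebio-$USER" || echo "pick a different bucket name"`, which flags Unix usernames containing underscores or uppercase letters.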

Set up your config file for the AWS Elastic MapReduce command line tool installed on the shared servers

...

Get your AWS credentials

Get your security credentials from your AWS Account

Access Key ID
Secret Access Key

Set up the Elastic MapReduce command line tool

Set up your configuration files for the Elastic MapReduce AWS tool installed on the shared servers (belltown, sodo, ballard, ...)

ssh to belltown

Edit your ~/.profile and add the line module load aws/elastic-mapreduce-cli to get the Elastic MapReduce command line tools into your PATH

Code Block


~>cat ~/.profile
# Sample .profile for SuSE Linux
#
# This file is read each time a login shell is started.
# All other interactive shells will only read .bashrc; this is particularly
# important for language settings, see below.

test -z "$PROFILEREAD" && . /etc/profile || true

# User-specific settings
module load aws/elastic-mapreduce-cli
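
If you prefer not to edit ~/.profile by hand, the line can be appended from the shell. The following is a minimal sketch, not an official setup script; the function name ensure_profile_line is hypothetical. It adds the line only when it is not already present, so running it twice does not duplicate the entry.

```shell
# Hypothetical helper: append a line to a profile file unless an
# identical line is already there.
ensure_profile_line() {
    profile="$1"
    line="$2"
    touch "$profile"
    # -x: match the whole line, -F: treat the text literally, -q: no output.
    if ! grep -qxF -- "$line" "$profile"; then
        # If the file does not end in a newline, add one first so the
        # new line is not glued onto the previous one.
        if [ -n "$(tail -c 1 "$profile")" ]; then
            printf '\n' >> "$profile"
        fi
        printf '%s\n' "$line" >> "$profile"
    fi
}
```

It could be called as `ensure_profile_line ~/.profile "module load aws/elastic-mapreduce-cli"`; the change takes effect at the next login shell.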

Create the configuration file for the Elastic MapReduce command line tool

Code Block

~>cat ~/.ssh/$USER-credentials.json
{
"access_id": "YourAWSAccessKeyID",
"private_key": "YourAWSSecretAccessKey",
"keypair": "SageKeyPair",
"key-pair-file": "/home/ndeflaux/.ssh/SageKeyPair.pem",
"log_uri": "s3n://sagebio-YourUnixUsername/logs/",
"region": "us-east-1"
}
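
The credentials file shown above can also be generated from a template. This is a sketch under stated assumptions: the function name write_emr_credentials_template is hypothetical, the home directory is assumed to be /home/YourUnixUsername as in the example, and restricting the file to mode 600 is a precaution added here rather than a step from the original instructions. The placeholder key values still have to be replaced by hand.

```shell
# Hypothetical helper: write a credentials template for the given Unix
# username to the given path, readable and writable by the owner only.
write_emr_credentials_template() {
    dest="$1"
    user="$2"
    # Create the file with owner-only permissions before anything is written.
    ( umask 077 && : > "$dest" )
    cat > "$dest" <<EOF
{
"access_id": "YourAWSAccessKeyID",
"private_key": "YourAWSSecretAccessKey",
"keypair": "SageKeyPair",
"key-pair-file": "/home/$user/.ssh/SageKeyPair.pem",
"log_uri": "s3n://sagebio-$user/logs/",
"region": "us-east-1"
}
EOF
    # Tighten permissions in case the file already existed with a looser mode.
    chmod 600 "$dest"
}
```

For example, `write_emr_credentials_template ~/.ssh/$USER-credentials.json "$USER"` would produce the file at the path used in the rest of this page.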

Test that you can run it

Code Block

~>elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --help
Usage: elastic-mapreduce [options]

  Creating Job Flows
        --create                     Create a new job flow
        --name NAME                  The name of the job flow being created
        --alive                      Create a job flow that stays running even though it has executed all its steps
        --with-termination-protection
                                     Create a job with termination protection (default is no termination protection)
        --num-instances NUM          Number of instances in the job flow
...

For less typing, you can make an alias to this command. If you use bash, you can put the following in your .bashrc:
Code Block
alias emr='/work/platform/bin/elastic-mapreduce-cli/elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json'

...

To set up your configuration file for the s3curl AWS tool installed on the shared servers (belltown, sodo, ballard, ...):

ssh to belltown

Create the configuration file for s3curl command line tool

Code Block
~>cat ~/.ssh/s3curl
#!/bin/perl
%awsSecretAccessKeys = (
    YourUnixUsername => {
        id => 'YourAccessKeyID',
        key => 'YourSecretAccessKey',
    },
);

Make a symlink to it in your home directory
Code Block
~>ln -s ~/.ssh/s3curl ~/.s3curl

Test that you can run s3curl

Code Block


~> chmod 600 /home/$USER/.s3curl
~>/work/platform/bin/s3curl.pl --id $USER https://s3.amazonaws.com/sagebio-$USER/ | head -c 200
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>sagebio-ndeflaux</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><IsTruncated>

EC2 Command Line Tools

Add module load aws/ec2-api-tools/1.4.3.0 to ~/.profile and configure the tool as stated in the EC2 Getting Started Guide documentation.

Other software available on the shared servers

nano text editor

The nano editor is available on the shared servers (sodo, ballard, belltown, ...) and on the Miami cluster. It does not require X Windows. If you need a simple text editor and are not familiar with vi or emacs, nano is a good choice; it is installed by default on many Linux systems.

Code Block

~>ssh ndeflaux@pegasus.ccs.miami.edu
ndeflaux@pegasus.ccs.miami.edu's password:

Last login: Thu May 19 18:59:51 2011 from dhcp149019.fhcrc.org


***********************************************************************
*                                                                     *
* Welcome to Pegasus Linux Cluster at CCS/University of Miami.        *
*                                                                     *

* ....
*                                                                     *
***********************************************************************

[ndeflaux@u01 ~]$ nano file.txt
[ndeflaux@u01 ~]$

Where to go next?

Try a simple example of an R MapReduce job: Getting Started with R on Elastic Map Reduce
Take a look at the AWS documentation for Elastic MapReduce

Version	Old Version 16	New Version Current
Changes made by	Nicole Deflaux (Unlicensed)	Bruce Hoff
Saved on	Jun 30, 2011	Nov 22, 2021
