Computing squares in R

...

The following script will download and install the latest version of R on each of your Elastic MapReduce hosts. (The default version of R is very old.)

Name this bootstrap script bootstrapLatestR.sh; it should contain the following code:

Iframe
src: http://sagebionetworks.jira.com/source/browse/~raw,r=HEAD/PLFM/users/deflaux/scripts/EMR/rWordCountExample/bootstrapLatestR.sh
style: height:250px;width:80%;

...

What is going on in this script?

...
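The explanation is abbreviated above, but the general shape of a bootstrap action like this is simple: point apt at a CRAN mirror that carries current R builds for the Debian release EMR runs (Debian/Lenny, per the welcome banner further down), then install r-base. The block below is only a hypothetical sketch for illustration; the actual script is the one at the URL above.

Code Block
#!/bin/bash
# Hypothetical sketch only -- the real bootstrapLatestR.sh lives at the URL above.
# Add a CRAN mirror that carries current R packages for Debian Lenny,
# refresh the package index, and install R.
echo "deb http://cran.r-project.org/bin/linux/debian lenny-cran/" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
# --force-yes because the CRAN repository key has not been imported in this sketch.
sudo apt-get install -y --force-yes r-base r-base-dev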

The following script will download and install several packages needed for RHadoop.

Name this bootstrap script bootstrapRHadoop.sh; it should contain the following code:

Iframe
src: http://sagebionetworks.jira.com/source/browse/~raw,r=HEAD/PLFM/users/deflaux/scripts/EMR/rmrExample/bootstrapRHadoop.sh
style: height:250px;width:80%;

...
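Again the script is linked rather than inlined. Judging from the packages that get loaded later in this walkthrough (RJSONIO, itertools, iterators, digest, rJava), a hypothetical sketch of what such a bootstrap action needs to do is roughly:

Code Block
#!/bin/bash
# Hypothetical sketch only -- the real bootstrapRHadoop.sh lives at the URL above.
# Install the CRAN packages that rmr and rhdfs depend on ...
sudo R --no-save -e 'install.packages(c("RJSONIO", "itertools", "digest", "rJava"), repos = "http://cran.r-project.org")'
# ... then install the rmr and rhdfs source packages from the RHadoop project
# (download locations omitted here), e.g.:
# sudo R CMD INSTALL rmr_<version>.tar.gz
# sudo R CMD INSTALL rhdfs_<version>.tar.gz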

Upload your scripts to S3

...
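The job-flow command below assumes both scripts live under s3://sagebio-$USER/scripts/. If you have s3cmd (or another S3 client) configured with your AWS credentials, getting them there looks roughly like this:

Code Block
# Copy both bootstrap scripts to your S3 bucket (s3cmd shown here; any S3 client works).
s3cmd put bootstrapLatestR.sh s3://sagebio-$USER/scripts/bootstrapLatestR.sh
s3cmd put bootstrapRHadoop.sh s3://sagebio-$USER/scripts/bootstrapRHadoop.sh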

Code Block
~>elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --create \
--master-instance-type=m1.small --slave-instance-type=m1.small \
--num-instances=1 --enable-debugging \
--bootstrap-action s3://sagebio-$USER/scripts/bootstrapLatestR.sh \
--bootstrap-action s3://sagebio-$USER/scripts/bootstrapRHadoop.sh \
--name rmrTry1 --alive

Created job flow j-79VXH9Z07ECL

...

Code Block
~>elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --ssh --jobflow j-79VXH9Z07ECL
ssh -i /home/ndeflaux/.ssh/SageKeyPair.pem hadoop@ec2-107-20-44-27.compute-1.amazonaws.com 
Linux domU-12-31-39-04-08-C8 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686
--------------------------------------------------------------------------------

Welcome to Amazon Elastic MapReduce running Hadoop and Debian/Lenny.
 
Hadoop is installed in /home/hadoop. Log files are in /mnt/var/log/hadoop. Check
/mnt/var/log/hadoop/steps for diagnosing step failures.

The Hadoop UI can be accessed via the following commands: 

  JobTracker    lynx http://localhost:9100/
  NameNode      lynx http://localhost:9101/
 
--------------------------------------------------------------------------------
hadoop@domU-12-31-39-04-08-C8:~$ 

Set JAVA_HOME and start R

Code Block

hadoop@ip-10-114-89-121:/mnt/var/log/bootstrap-actions$ export JAVA_HOME=/usr/lib/jvm/java-6-sun/jre
hadoop@ip-10-114-89-121:/mnt/var/log/bootstrap-actions$ R

R version 2.14.0 (2011-10-31)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i486-pc-linux-gnu (32-bit)

Initialize RHadoop

Code Block
> Sys.setenv(HADOOP_HOME="/home/hadoop", HADOOP_CONF="/home/hadoop/conf", JAVA_HOME="/usr/lib/jvm/java-6-sun/jre"); library(rmr); library(rhdfs);  hdfs.init();

Loading required package: RJSONIO
Loading required package: itertools
Loading required package: iterators
Loading required package: digest
Loading required package: rJava

...
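The map-reduce call that produces out is elided above. With rmr loaded as shown, a minimal version of the squares job looks like the sketch below (small.ints is just an illustrative variable name): write the integers 1 through 10 into HDFS, then run a map-only job whose mapper emits each value as the key and its square as the value.

Code Block
> # Write 1..10 into HDFS, then square each value in a map-only job.
> small.ints <- to.dfs(1:10)
> out <- mapreduce(input = small.ints,
+                  map = function(k, v) keyval(v, v^2))

Reading the result back with from.dfs(out) then returns the key/value pairs shown next.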

Code Block
> from.dfs(out)

[[1]]
[[1]]$key
[1] 1

[[1]]$val
[1] 1

attr(,"keyval")
[1] TRUE

[[2]]
[[2]]$key
[1] 2

[[2]]$val
[1] 4

attr(,"keyval")
[1] TRUE

[[3]]
[[3]]$key
[1] 3

[[3]]$val
[1] 9

attr(,"keyval")
[1] TRUE

[[4]]
[[4]]$key
[1] 4

[[4]]$val
[1] 16

attr(,"keyval")
[1] TRUE

[[5]]
[[5]]$key
[1] 5

[[5]]$val
[1] 25

attr(,"keyval")
[1] TRUE

[[6]]
[[6]]$key
[1] 6

[[6]]$val
[1] 36

attr(,"keyval")
[1] TRUE

[[7]]
[[7]]$key
[1] 7

[[7]]$val
[1] 49

attr(,"keyval")
[1] TRUE

[[8]]
[[8]]$key
[1] 8

[[8]]$val
[1] 64

attr(,"keyval")
[1] TRUE

[[9]]
[[9]]$key
[1] 9

[[9]]$val
[1] 81

attr(,"keyval")
[1] TRUE

[[10]]
[[10]]$key
[1] 10

[[10]]$val
[1] 100

attr(,"keyval")
[1] TRUE    

...

Quit R, exit ssh, and stop the cluster:

Code Block
> q()
Save workspace image? [y/n/c]: n
hadoop@ip-10-114-89-121:/mnt/var/log/bootstrap-actions$ exit
logout
Connection to ec2-107-20-108-57.compute-1.amazonaws.com closed.
~>elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --terminate --jobflow j-79VXH9Z07ECL
Terminated job flow j-79VXH9Z07ECL

What next?