Computing squares in R
...
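For reference, the computation we are about to distribute — squaring the integers 1 through 10 — is a one-liner in plain R. The Hadoop job built later in this tutorial produces exactly these values, as key/value pairs:

```r
# Square each integer from 1 to 10 locally, without Hadoop.
sapply(1:10, function(x) x^2)
# returns 1 4 9 16 25 36 49 64 81 100
```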
The following script downloads and installs the latest version of R on each of your Elastic MapReduce hosts. (The default version of R on these hosts is very old.)
Save this bootstrap script as bootstrapLatestR.sh; it should contain the following code:
[embedded script contents: bootstrapLatestR.sh]
...
What is going on in this script?
...
The following script downloads and installs several packages needed for RHadoop.
Save this bootstrap script as bootstrapRHadoop.sh; it should contain the following code:
[embedded script contents: bootstrapRHadoop.sh]
...
Upload your scripts to S3
...
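The upload itself can be done with s3cmd — a sketch, assuming s3cmd is installed and configured with your AWS credentials, and that the bucket s3://sagebio-$USER already exists:

```shell
# Copy both bootstrap scripts to the scripts folder of your S3 bucket.
s3cmd put bootstrapLatestR.sh s3://sagebio-$USER/scripts/bootstrapLatestR.sh
s3cmd put bootstrapRHadoop.sh s3://sagebio-$USER/scripts/bootstrapRHadoop.sh

# Verify that both files arrived.
s3cmd ls s3://sagebio-$USER/scripts/
```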
```shell
~>elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --create \
  --master-instance-type=m1.small --slave-instance-type=m1.small \
  --num-instances=1 --enable-debugging \
  --bootstrap-action s3://sagebio-$USER/scripts/bootstrapLatestR.sh \
  --bootstrap-action s3://sagebio-$USER/scripts/bootstrapRHadoop.sh \
  --name rmrTry1 --alive
Created job flow j-79VXH9Z07ECL
```
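The bootstrap actions take several minutes to run, so the job flow will not be ready for ssh immediately. You can poll its state with the same legacy elastic-mapreduce Ruby CLI — a sketch; the --describe flag is from that CLI, so adjust if your version differs:

```shell
# Show the job flow's state (STARTING, BOOTSTRAPPING, WAITING, ...).
# Wait for WAITING before trying to ssh in.
~>elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --describe --jobflow j-79VXH9Z07ECL
```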
...
```shell
~>elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --ssh --jobflow j-79VXH9Z07ECL
ssh -i /home/ndeflaux/.ssh/SageKeyPair.pem hadoop@ec2-107-20-44-27.compute-1.amazonaws.com
Linux domU-12-31-39-04-08-C8 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686
--------------------------------------------------------------------------------
Welcome to Amazon Elastic MapReduce running Hadoop and Debian/Lenny.

Hadoop is installed in /home/hadoop. Log files are in /mnt/var/log/hadoop. Check
/mnt/var/log/hadoop/steps for diagnosing step failures.

The Hadoop UI can be accessed via the following commands:

  JobTracker  lynx http://localhost:9100/
  NameNode    lynx http://localhost:9101/

--------------------------------------------------------------------------------
hadoop@domU-12-31-39-04-08-C8:~$
```
Set JAVA_HOME and start R
```shell
hadoop@ip-10-114-89-121:/mnt/var/log/bootstrap-actions$ export JAVA_HOME=/usr/lib/jvm/java-6-sun/jre
hadoop@ip-10-114-89-121:/mnt/var/log/bootstrap-actions$ R

R version 2.14.0 (2011-10-31)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i486-pc-linux-gnu (32-bit)
```
Initialize RHadoop
```r
> Sys.setenv(HADOOP_HOME="/home/hadoop", HADOOP_CONF="/home/hadoop/conf", JAVA_HOME="/usr/lib/jvm/java-6-sun/jre"); library(rmr); library(rhdfs); hdfs.init();
Loading required package: RJSONIO
Loading required package: itertools
Loading required package: iterators
Loading required package: digest
Loading required package: rJava
```
...
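The job that produced out is elided above. In the RHadoop tutorial, the squares computation is typically written along the following lines — a sketch against the rmr API of that era (to.dfs, mapreduce, keyval); the exact code from the original page is not shown here:

```r
# Write the integers 1 through 10 into HDFS as values with no keys.
small.ints <- to.dfs(1:10)

# Map each value v to the key/value pair (v, v^2); no reduce step is needed.
out <- mapreduce(input = small.ints,
                 map = function(k, v) keyval(v, v^2))
```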
```r
> from.dfs(out)
[[1]]
[[1]]$key
[1] 1
[[1]]$val
[1] 1
attr(,"keyval")
[1] TRUE

[[2]]
[[2]]$key
[1] 2
[[2]]$val
[1] 4
attr(,"keyval")
[1] TRUE

[[3]]
[[3]]$key
[1] 3
[[3]]$val
[1] 9
attr(,"keyval")
[1] TRUE

[[4]]
[[4]]$key
[1] 4
[[4]]$val
[1] 16
attr(,"keyval")
[1] TRUE

[[5]]
[[5]]$key
[1] 5
[[5]]$val
[1] 25
attr(,"keyval")
[1] TRUE

[[6]]
[[6]]$key
[1] 6
[[6]]$val
[1] 36
attr(,"keyval")
[1] TRUE

[[7]]
[[7]]$key
[1] 7
[[7]]$val
[1] 49
attr(,"keyval")
[1] TRUE

[[8]]
[[8]]$key
[1] 8
[[8]]$val
[1] 64
attr(,"keyval")
[1] TRUE

[[9]]
[[9]]$key
[1] 9
[[9]]$val
[1] 81
attr(,"keyval")
[1] TRUE

[[10]]
[[10]]$key
[1] 10
[[10]]$val
[1] 100
attr(,"keyval")
[1] TRUE
```
...
Quit R, exit ssh, and stop the cluster:
```shell
> q()
Save workspace image? [y/n/c]: n
hadoop@ip-10-114-89-121:/mnt/var/log/bootstrap-actions$ exit
logout
Connection to ec2-107-20-108-57.compute-1.amazonaws.com closed.
~>elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --terminate --jobflow j-79VXH9Z07ECL
Terminated job flow j-79VXH9Z07ECL
```
What next?
- Try the more complicated examples, such as Logistic Regression and K-means, in https://github.com/RevolutionAnalytics/RHadoop/wiki/Tutorial.
- Take a look at the Elastic MapReduce FAQ for how to SCP files to the Hadoop master host.
- Take a look at the other Computation Examples