...
- Start your MapReduce cluster. When you are trying out new jobs for the first time, specifying --alive will keep your hosts alive while you work through any bugs. In general, though, you do not want to run jobs with --alive, because you will need to remember to explicitly shut the hosts down when the job is done.
Code Block
    ~/WordCount>/work/platform/bin/elastic-mapreduce-cli/elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --create --master-instance-type=m1.small \
        --slave-instance-type=m1.small --num-instances=3 --enable-debugging --bootstrap-action s3://sagebio-$USER/scripts/bootstrapLatestR.sh --name RWordCount --alive
    Created job flow j-1H8GKG5L6WAB4
    ~/WordCount>/work/platform/bin/elastic-mapreduce-cli/elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --list
    j-1H8GKG5L6WAB4     STARTING       RWordCount
       PENDING        Setup Hadoop Debugging
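The job flow ID printed by --create is needed by every later command, so it can be handy to capture it in a shell variable. A minimal sketch, assuming the CLI prints the "Created job flow" line in the form shown above; the function name extract_jobflow_id is hypothetical and not part of the CLI:

```shell
# Hypothetical helper: pull the job flow ID out of the "Created job flow <id>" line.
# Assumes the ID is the last whitespace-separated field on that line.
extract_jobflow_id() {
    awk '/^Created job flow / { print $NF }'
}

# Example using the output shown above; in practice, pipe the --create command into the helper.
JOBFLOW=$(echo "Created job flow j-1H8GKG5L6WAB4" | extract_jobflow_id)
echo "$JOBFLOW"
```

The same ID is what you pass when shutting an --alive cluster down. The CLI has a --terminate option for this, used together with --jobflow; check the CLI's --help output for the exact form in your version.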
- Look around on the AWS Console:
  - See your new job listed in the Elastic MapReduce tab
  - See the individual hosts listed in the EC2 tab
- Create your job step file
Code Block
    ~/WordCount>cat wordCount.json
    [
      {
        "Name": "R Word Count MapReduce Step 1: small input file",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
          "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
          "Args": [
            "-input", "s3n://sagebio-ndeflaux/input/AnInputFile.txt",
            "-output", "s3n://sagebio-ndeflaux/output/wordCountTry1",
            "-mapper", "s3n://sagebio-ndeflaux/scripts/mapper.R",
            "-reducer", "s3n://sagebio-ndeflaux/scripts/reducer.R"
          ]
        }
      },
      {
        "Name": "R Word Count MapReduce Step 2: lots of input",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
          "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
          "Args": [
            "-input", "s3://elasticmapreduce/samples/wordcount/input",
            "-output", "s3n://sagebio-ndeflaux/output/wordCountTry2",
            "-mapper", "s3n://sagebio-ndeflaux/scripts/mapper.R",
            "-reducer", "s3n://sagebio-ndeflaux/scripts/reducer.R"
          ]
        }
      }
    ]
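Hand-edited step files are easy to break with a stray comma or a missing quote, so a quick syntax check before submitting can save a round trip. A minimal sketch, assuming python3 is available on your PATH (json.tool ships with the Python standard library); the function name validate_steps is hypothetical:

```shell
# Hypothetical helper: succeed only if the given file parses as strict JSON.
# Assumes python3 is on the PATH; json.tool is part of the Python standard library.
validate_steps() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Usage: check the step file before passing it to --json.
if validate_steps wordCount.json; then
    echo "wordCount.json is valid JSON"
else
    echo "wordCount.json is missing or not valid JSON (look for trailing commas or missing quotes)"
fi
```

This only checks JSON syntax. It does not verify that the S3 paths exist or that the step fields are ones the CLI accepts.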
- Add the steps to your jobflow
Code Block
    ~/WordCount>/work/platform/bin/elastic-mapreduce-cli/elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --json wordCount.json --jobflow j-1H8GKG5L6WAB4
    Added jobflow steps
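Once the steps are added, re-running the --list command from the first step shows how the job flow is progressing. A minimal sketch for pulling the state of one job flow out of that listing, assuming each job flow line starts with its ID followed by the state, as in the listing earlier on this page; the function name jobflow_state is hypothetical:

```shell
# Hypothetical helper: print the state column for one job flow, reading --list output on stdin.
# Assumes each job flow line starts with its ID followed by the state.
jobflow_state() {
    awk -v id="$1" '$1 == id { print $2 }'
}

# Example using the listing shown earlier; in practice, pipe the --list command into the helper.
printf '%s\n' \
    'j-1H8GKG5L6WAB4     STARTING       RWordCount' \
    '   PENDING        Setup Hadoop Debugging' \
    | jobflow_state j-1H8GKG5L6WAB4
```

With the sample listing, this prints STARTING. The indented step lines are skipped because their first field is a step state, not a job flow ID.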
What next?
...