- Start your MapReduce cluster. When you are trying out new jobs for the first time, specifying --alive
will keep your hosts alive as you work through any bugs. But in general you do not want to run jobs with --alive
because you'll need to remember to explicitly shut the hosts down when the job is done.
Code Block
    ~/WordCount>/work/platform/bin/elastic-mapreduce-cli/elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --create --master-instance-type=m1.small \
        --slave-instance-type=m1.small --num-instances=3 --enable-debugging --bootstrap-action s3://sagebio-$USER/scripts/bootstrapLatestR.sh --name RWordCount --alive
    Created job flow j-1H8GKG5L6WAB4
    ~/WordCount>/work/platform/bin/elastic-mapreduce-cli/elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --list
    j-1H8GKG5L6WAB4     STARTING     RWordCount
       PENDING     Setup Hadoop Debugging
- Look around on the AWS Console:
  - See your new job listed in the Elastic MapReduce tab
  - See the individual hosts listed in the EC2 tab
- Create your job step file
Code Block
    ~/WordCount>cat wordCount.json
    [
      {
        "Name": "R Word Count MapReduce Step 1: small input file",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
          "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
          "Args": [
            "-input","s3n://sagebio-ndeflaux/input/AnInputFile.txt",
            "-output","s3n://sagebio-ndeflaux/output/wordCountTry1",
            "-mapper","s3n://sagebio-ndeflaux/scripts/mapper.R",
            "-reducer","s3n://sagebio-ndeflaux/scripts/reducer.R"
          ]
        }
      },
      {
        "Name": "R Word Count MapReduce Step 2: lots of input",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
          "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
          "Args": [
            "-input","s3://elasticmapreduce/samples/wordcount/input",
            "-output","s3n://sagebio-ndeflaux/output/wordCountTry2",
            "-mapper","s3n://sagebio-ndeflaux/scripts/mapper.R",
            "-reducer","s3n://sagebio-ndeflaux/scripts/reducer.R"
          ]
        }
      }
    ]
- Add the steps to your jobflow
Code Block
    ~/WordCount>/work/platform/bin/elastic-mapreduce-cli/elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --json wordCount.json --jobflow j-1H8GKG5L6WAB4
    Added jobflow steps
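Before adding steps to a running jobflow, it can save a round trip to check the step file locally. The following is a minimal sketch, not part of the elastic-mapreduce client: the function name check_steps and the EXPECTED_ARGS list are made up for illustration, and it assumes every step is a Hadoop streaming step that uses the same four flags as the steps in wordCount.json. It also uses Python's strict JSON parser, which rejects things like a stray trailing comma in an Args list.

```python
import json

# Assumption: each streaming step should carry these four flags, as the
# steps in wordCount.json do. Adjust the tuple if your steps differ.
EXPECTED_ARGS = ("-input", "-output", "-mapper", "-reducer")


def check_steps(text):
    """Return a list of problems found in a job step file (empty if none)."""
    try:
        steps = json.loads(text)
    except ValueError as err:  # json.JSONDecodeError is a ValueError
        return ["not valid JSON: %s" % err]
    if not isinstance(steps, list):
        return ["top level must be a list of steps"]

    problems = []
    for number, step in enumerate(steps, start=1):
        if not isinstance(step, dict):
            problems.append("step %d: not a JSON object" % number)
            continue
        label = "step %d (%s)" % (number, step.get("Name", "<unnamed>"))
        jar_step = step.get("HadoopJarStep")
        if not isinstance(jar_step, dict):
            problems.append("%s: missing HadoopJarStep" % label)
            continue
        if "Jar" not in jar_step:
            problems.append("%s: missing Jar" % label)
        args = jar_step.get("Args", [])
        if not isinstance(args, list):
            problems.append("%s: Args must be a list" % label)
            continue
        for flag in EXPECTED_ARGS:
            if flag not in args:
                problems.append("%s: missing %s" % (label, flag))
            elif args.index(flag) + 1 >= len(args):
                # The flag is the last element, so no value follows it.
                problems.append("%s: %s has no value" % (label, flag))
    return problems
```

To try it, read the file and print whatever comes back, for example print(check_steps(open("wordCount.json").read())). An empty list means the file parsed and every step has the expected flags; it says nothing about whether the S3 paths exist.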
What next?