...

  1. Start your MapReduce cluster. When you are trying out new jobs for the first time, specifying --alive will keep your hosts alive as you work through any bugs. In general, though, you do not want to run jobs with --alive, because you will need to remember to explicitly shut the hosts down (via the CLI's --terminate option) when the job is done.
    Code Block
    ~/WordCount>/work/platform/bin/elastic-mapreduce-cli/elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --create --master-instance-type=m1.small \
    --slave-instance-type=m1.small --num-instances=3 --enable-debugging --bootstrap-action s3://sagebio-$USER/scripts/bootstrapLatestR.sh --name RWordCount --alive
    
    Created job flow j-1H8GKG5L6WAB4
    
    ~/WordCount>/work/platform/bin/elastic-mapreduce-cli/elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --list
    j-1H8GKG5L6WAB4     STARTING                                                         RWordCount
       PENDING        Setup Hadoop Debugging   
    
  2. Look around on the AWS Console:
    • See your new job listed in the Elastic MapReduce tab
    • See the individual hosts listed in the EC2 tab
  3. Create your job step file (minimal sketches of the mapper.R and reducer.R scripts it references appear after this list)
    Code Block
    ~/WordCount>cat wordCount.json
    [
      {
        "Name": "R Word Count MapReduce Step 1: small input file",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
          "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
          "Args": [
            "-input", "s3n://sagebio-ndeflaux/input/AnInputFile.txt",
            "-output", "s3n://sagebio-ndeflaux/output/wordCountTry1",
            "-mapper", "s3n://sagebio-ndeflaux/scripts/mapper.R",
            "-reducer", "s3n://sagebio-ndeflaux/scripts/reducer.R"
          ]
        }
      },
      {
        "Name": "R Word Count MapReduce Step 2: lots of input",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
          "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
          "Args": [
            "-input", "s3://elasticmapreduce/samples/wordcount/input",
            "-output", "s3n://sagebio-ndeflaux/output/wordCountTry2",
            "-mapper", "s3n://sagebio-ndeflaux/scripts/mapper.R",
            "-reducer", "s3n://sagebio-ndeflaux/scripts/reducer.R"
          ]
        }
      }
    ]
    
  4. Add the steps to your jobflow
    Code Block
    ~/WordCount>/work/platform/bin/elastic-mapreduce-cli/elastic-mapreduce --credentials ~/.ssh/$USER-credentials.json --jobflow j-1H8GKG5L6WAB4 --json wordCount.json
    Added jobflow steps
    
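The job step file above references mapper.R and reducer.R, whose contents do not appear on this page. As a minimal sketch (not the project's actual script), a Hadoop-streaming-compatible word count mapper in R could look like the following; the tokenization rule is an assumption:

Code Block
#!/usr/bin/env Rscript
# mapper.R (hypothetical sketch): read raw text from stdin and emit one
# "word<TAB>1" key/value line per word, the form Hadoop streaming expects.
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  # Assumed tokenization: lowercase, then split on runs of non-letters.
  words <- unlist(strsplit(tolower(line), "[^a-z]+"))
  for (w in words[words != ""]) {
    cat(w, "\t1\n", sep = "")
  }
}
close(con)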

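The matching reducer relies on Hadoop streaming sorting the mapper output by key, so all counts for a word arrive on consecutive lines and can be summed in a single pass. Again, this is a hedged sketch, not the script used above:

Code Block
#!/usr/bin/env Rscript
# reducer.R (hypothetical sketch): sum the counts for each word. Input
# lines arrive sorted by key, e.g. "apple<TAB>1".
con <- file("stdin", open = "r")
current <- NULL  # word currently being accumulated
count <- 0
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  fields <- unlist(strsplit(line, "\t"))
  if (length(fields) != 2) next  # skip malformed lines
  if (!is.null(current) && fields[1] != current) {
    cat(current, "\t", count, "\n", sep = "")  # flush the finished word
    count <- 0
  }
  current <- fields[1]
  count <- count + as.integer(fields[2])
}
if (!is.null(current)) cat(current, "\t", count, "\n", sep = "")
close(con)

Both scripts need to be executable (chmod +x) before being uploaded to S3. A local pipeline such as cat AnInputFile.txt | ./mapper.R | sort | ./reducer.R approximates the streaming sort-and-reduce behavior and is a quick sanity check before adding the job steps.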
What next?

...