...

Once the cluster is active, SSH into the master node using the prod-key-pair (see AccessingInstancesLinux for more information). Note: the user name must be "hadoop":

Code Block
ssh -i prod-key-pair.pem hadoop@ec2-54-242-184-13.compute-1.amazonaws.com

...

Code Block
hadoop@ip-10-28-72-37:~$ hive

Creating Hive Tables

Once you have an interactive Hive session with the master node of the cluster, you are ready to set up the tables that will be used for analysis. First, we must create the external table for the access record CSV data in S3:

Code Block
linenumberstrue
CREATE EXTERNAL TABLE access_record_s3 ( 
returnObjectId string,
elapseMS int,
timestamp int,
via string,
host string,
threadId int,
userAgent string,
queryString string,
sessionId string,
xForwardedFor string,
requestURL string,
userId int,
origin string,
date string,
method string,
vmId string,
instance string,
stack string,
success string
 )
PARTITIONED BY ( datep string )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION 's3://prod.access.record.sagebase.org/000000012';

In this example, we created a table using prod-12 data as the source (see line 24: LOCATION 's3://prod.access.record.sagebase.org/000000012'). Make sure to set the location to the data of the stack you want to analyze.
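
As a quick check from the same Hive session, you can confirm the table definition and the S3 location it points to. This is a minimal sketch; it assumes only the access_record_s3 table created above:

Code Block
-- Show the full table definition, including the S3 LOCATION and the datep partition column
DESCRIBE FORMATTED access_record_s3;
-- List the partitions registered so far (this will be empty until partitions are added)
SHOW PARTITIONS access_record_s3;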