Once you have HDFS and Hadoop configured, Hive is a data warehousing solution that runs on top of them. I have taken the same input file as before and fired Hive queries against it, which in turn run Hadoop MapReduce jobs.
The following steps were done to install Hive:
- Assume you have a Hadoop installation up and running (described in an earlier post)
- Download the Hive binaries from the Apache site.
- Unzip hive-0.9.0-bin.tar.gz into a directory
- cd to the unzipped directory and run the following commands
- export HIVE_HOME=$PWD
- export PATH=$PWD/bin:$PATH
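The install steps above can be sketched as a short shell session; the tarball name matches the version used in this post, and the working directory is an example:

```shell
# Unpack the Hive release and put its bin directory on the PATH
tar -xzf hive-0.9.0-bin.tar.gz
cd hive-0.9.0-bin

export HIVE_HOME=$PWD            # Hive's launcher scripts look up this variable
export PATH=$HIVE_HOME/bin:$PATH # makes the hive command available in this session
```

These exports only last for the current shell session; add them to your shell profile to make them permanent.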
Once all the above steps are done, we are ready to enter the Hive shell, which lets us run Hive commands.
- enter command : hive
Once you are in the Hive shell, you are ready to fire HiveQL commands.
Since the earlier post left us with a CSV file, we will create a Hive table for it and load the data into it. The data will be stored in HDFS, distributed across all the nodes.
- CREATE TABLE person (PERSON_ID INT, NAME STRING, FIRST_NAME STRING, LAST_NAME STRING, MIDDLE_NAMES STRING, TITLE STRING, STREET_ADDRESS STRING, CITY STRING, COUNTRY STRING, POST_CODE STRING, HOME_PHONE STRING, WORK_PHONE STRING, MOBILE_PHONE STRING, NI_NUMBER STRING, CREDITLIMIT STRING, CREDIT_CARD STRING, CREDITCARD_START_DATE STRING, CREDITCARD_END_DATE STRING, CREDITCARD_CVC STRING, DOB STRING) row format delimited fields terminated by ',';
Then we will load the data from the CSV file.
- load data inpath '<PATH_TO_FILE>/export.csv' overwrite into table person;
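Note that load data inpath (without the local keyword) reads from HDFS, so the CSV has to be copied there first. A minimal sketch, where /user/hduser/input is an example path:

```shell
# Copy the local CSV into HDFS so LOAD DATA INPATH can find it
hadoop fs -mkdir /user/hduser/input
hadoop fs -put export.csv /user/hduser/input/

# Verify the file is in place
hadoop fs -ls /user/hduser/input
```

Also be aware that LOAD DATA INPATH moves the file into Hive's warehouse directory rather than copying it, so the source file disappears from its original HDFS location.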
Now we are ready to fire some HiveQL queries, which will run the corresponding MapReduce jobs:
- select * from person where person_id=1;
- select count(1) from person;
- select * from person where name like '%Bob%';
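The same queries can also be run non-interactively from the OS shell, which is handy for scripting; a sketch assuming the person table above exists (queries.hql is a hypothetical script file):

```shell
# Run a single HiveQL statement without entering the interactive shell
hive -e "select count(1) from person;"

# Or keep several statements in a file and run them in one go
hive -f queries.hql
```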
Hive makes MapReduce programming simpler by layering warehousing and SQL capabilities on top of Hadoop.