Once you have Hadoop and HDFS configured, you can add Hive, a data warehousing solution that runs on top of them. I have used the same input file as in the earlier post and fired Hive queries against it, which in turn run Hadoop MapReduce jobs.
The following steps install Hive.
- Make sure your Hadoop installation is up and running (described in the earlier post).
- Download the Hive binaries from the Apache site.
- Unzip hive-0.9.0-bin.tar.gz into a directory.
- cd to the unzipped directory and run the following commands:
- export HIVE_HOME=$PWD
- export PATH=$PWD/bin:$PATH
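Taken together, the steps above can be sketched as one shell session. This is a sketch, not a verbatim transcript: the scratch directory and the stand-in tarball (created here only so the snippet is self-contained) are assumptions; with a real hive-0.9.0-bin.tar.gz downloaded from Apache, you would start at the tar -xzf line.

```shell
# Work in a scratch directory; with a real download you would instead
# cd to wherever hive-0.9.0-bin.tar.gz was saved.
cd "$(mktemp -d)"

# Stand-in tarball so this sketch runs end to end -- skip these two
# lines when you have the real release archive.
mkdir -p hive-0.9.0-bin/bin
tar -czf hive-0.9.0-bin.tar.gz hive-0.9.0-bin && rm -r hive-0.9.0-bin

tar -xzf hive-0.9.0-bin.tar.gz   # unzip the release
cd hive-0.9.0-bin                # the unzipped directory
export HIVE_HOME=$PWD            # Hive's scripts expect this variable
export PATH=$PWD/bin:$PATH       # puts the `hive` launcher on the PATH
echo "HIVE_HOME=$HIVE_HOME"
```

Because `export` only affects the current shell, these two variables must be set again in every new terminal (or added to your shell profile).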
Once all the above steps are done, we are ready to enter the Hive shell, where we can run Hive commands.
- enter the command:
- hive
Once you are in the Hive shell, you are ready to fire HiveQL commands.
Since the earlier post left us with a CSV file, we will create a table for it. The data we load into this Hive table will be stored in HDFS and distributed across all the nodes.
- CREATE TABLE person (PERSON_ID INT, NAME STRING, FIRST_NAME STRING, LAST_NAME STRING, MIDDLE_NAMES STRING, TITLE STRING, STREET_ADDRESS STRING, CITY STRING, COUNTRY STRING, POST_CODE STRING, HOME_PHONE STRING, WORK_PHONE STRING, MOBILE_PHONE STRING, NI_NUMBER STRING, CREDITLIMIT STRING, CREDIT_CARD STRING, CREDITCARD_START_DATE STRING, CREDITCARD_END_DATE STRING, CREDITCARD_CVC STRING, DOB STRING) row format delimited fields terminated by ',';
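Because the table is declared with fields terminated by ',', every line of the CSV must carry exactly 20 comma-separated fields in the column order above. A quick sanity check before loading (the sample row and the file path are made-up illustrations):

```shell
# One made-up row with the 20 fields the person table declares.
cat > /tmp/person_sample.csv <<'EOF'
1,Bob Smith,Bob,Smith,,Mr,1 High Street,London,UK,N1 1AA,020-0000,020-0001,0770-0000,AB123456C,10000,4111111111111111,01/10,01/13,123,01/01/1980
EOF

# Flag any line whose field count differs from the 20 columns in the table.
awk -F',' 'NF != 20 { print "line " NR " has " NF " fields"; bad = 1 }
           END { if (!bad) print "all lines have 20 fields"; exit bad }' \
    /tmp/person_sample.csv
```

Note that this check (like Hive's delimited format itself) is naive about commas inside quoted fields, so it matches what Hive will actually do when splitting each row.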
Then we will load the data from the CSV file. Note that LOAD DATA INPATH (without LOCAL) moves the file from its current HDFS location into Hive's warehouse directory.
- load data inpath '<PATH_TO_FILE>/export.csv' overwrite into table person;
Now we are ready to fire some HiveQL queries, which will run the corresponding MapReduce jobs:
- select * from person where person_id=1;
- select count(1) from person;
- select * from person where name like '%Bob%';
Hive makes MapReduce programming simpler by adding warehousing and SQL capabilities on top of Hadoop.
Peace.
Sanket Raut