Setting up Hadoop

Setting up Hadoop 2.4 and Pig 0.12 on OSX locally

This is the first of many blog posts to come from our dev bootcamp. Oftentimes you want to test your scripts and run code locally before you hit the push button. As we run around the web figuring out how to do things ourselves, we want to share the findings we think will be helpful to the wider world.

Setting up Hadoop, Hive, and Pig can be a hassle on your MacBook Pro. Here are the steps that worked for us.

1) First install brew, the easiest and safest way to install and manage many kinds of packages

2) Next make sure Java 1.7 or later is installed. For OSX 10.9+ you can download it from here

3) Set JAVA_HOME and add it to your .bashrc for future use

$ export JAVA_HOME=$(/usr/libexec/java_home)

4) Install hadoop with brew, as of this writing it will download and install 2.4.1

$ brew install hadoop

5) To make hadoop work on a single node cluster you have to go through several steps outlined here, here are the steps in brief

a) setup ssh to connect to localhost without login

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

b) Test that you can log in; if you cannot, you have to turn on Remote Login in System Preferences -> Sharing

$ ssh localhost

c) brew usually installs Hadoop in /usr/local/Cellar/hadoop/

$ cd /usr/local/Cellar/hadoop/2.4.1

d) Edit the following config files in the directory /usr/local/Cellar/hadoop/2.4.1/libexec/etc/hadoop

$ vi hdfs-site.xml
$ vi core-site.xml
$ vi mapred-site.xml
$ vi yarn-site.xml

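For a single-node (pseudo-distributed) setup, the minimal settings for these four files, following the Apache Hadoop 2.4 single-node guide, look like this (one `<configuration>` block per file):

```xml
<!-- hdfs-site.xml: a single-node cluster only keeps one replica of each block -->
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>

<!-- core-site.xml: point the default filesystem at the local NameNode -->
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>
</configuration>

<!-- mapred-site.xml: run MapReduce jobs on YARN -->
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>

<!-- yarn-site.xml: enable the shuffle service MapReduce jobs need -->
<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
```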
6) Format and start HDFS and YARN

$ cd /usr/local/Cellar/hadoop/2.4.1
$ ./bin/hdfs namenode -format
$ ./sbin/start-dfs.sh
$ ./bin/hdfs dfs -mkdir /user
$ ./bin/hdfs dfs -mkdir /user/<username>
$ ./sbin/start-yarn.sh

7) Test the example code that came with this Hadoop version

$ ./bin/hdfs dfs -put libexec/etc/hadoop input
$ ./bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
$ ./bin/hdfs dfs -get output output
$ cat output/*

8) remove tmp files

$ ./bin/hdfs dfs -rmr /user/<username>/input
$ ./bin/hdfs dfs -rmr /user/<username>/output
$ rm -rf output/

9) Stop HDFS and YARN after you are done

$ ./sbin/stop-yarn.sh
$ ./sbin/stop-dfs.sh

10) Add HADOOP_HOME and HADOOP_CONF_DIR to your .bashrc for future use

$ export HADOOP_HOME=/usr/local/Cellar/hadoop/2.4.1
$ export HADOOP_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop

11) Install Pig. The current Formula in brew is not compatible with Hadoop 2.4.1 and you will see errors; you can use this one instead (ht akiatoji)

$ brew install ant
$ brew install

12) Add PIG_HOME to your .bashrc for future use

$ export PIG_HOME=/usr/local/Cellar/pig/0.12.0
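Once Pig is installed, a quick way to smoke-test it is a word count in local mode. This is a minimal sketch in Pig Latin; `input.txt` and `wordcount.pig` are placeholder names, and you run it with `pig -x local wordcount.pig` against any small local text file:

```
-- wordcount.pig: count word occurrences in a local text file
-- 'input.txt' is a placeholder path; substitute any small text file
lines   = LOAD 'input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
DUMP counts;
```

Running in local mode (`-x local`) reads and writes the local filesystem, so it exercises Pig without touching the HDFS setup above.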

Yay, you should be all set!! Enjoy hadooping and take a swig 🙂