TimeLinux1

Sunday, December 28, 2014

Installing Standalone Hadoop on Ubuntu 14.04


Today we discuss the process of Installing Standalone Hadoop on Ubuntu 14.04.

Here is the short of it.

Pre-requisites:

1- Java binaries  - Hadoop needs Java to run its processes
2- Hadoop tarball - That's a no-brainer; you need Hadoop to run Hadoop. :)
Note: For standalone Hadoop (one node only), ssh is not mandatory.

To get Java, you can rely on the good old OpenJDK (like I did myself) or you can get the 'official' Java from https://www.oracle.com/java/index.html.
In my case, a long time ago for some other work, I had installed OpenJDK (the open source implementation of Java) version 7 (or Java 1.7, as some like to say) according to the instructions on this page - http://openjdk.java.net/install/
We will come back to this in the configuration part later in the discussion.
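If you need to install Java fresh, here is a minimal sketch for Ubuntu 14.04, assuming the stock openjdk-7-jdk package from the Ubuntu repositories (the full JDK, not just the JRE, since tools like jps come in handy later):

```shell
# Install OpenJDK 7 (runtime plus development tools) from the Ubuntu repos
sudo apt-get update
sudo apt-get install -y openjdk-7-jdk

# Confirm the install worked
java -version
```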

As for Hadoop, you can simply download it from the Apache site. I picked a mirror from http://www.apache.org/dyn/closer.cgi/hadoop/common/ and downloaded the (slightly older) version 1.2.1 tarball (hadoop-1.2.1.tar.gz).

The first thing to know is that you DON'T need to be root to install Hadoop (as long as you have sudo).
So I was logged in as my own non-root user (mrinal).



Then I created a directory under my home directory, moved the hadoop tarball I downloaded earlier from the Apache mirror (see above) into it, and unpacked it using the following commands:

mrinal@ms-dell:~/hadoop$ mv  /home/mrinal/Downloads/hadoop-1.2.1.tar.gz   /home/mrinal/hadoop/
mrinal@ms-dell:~/hadoop$ tar xvzf hadoop-1.2.1.tar.gz

This creates a new directory (hadoop-1.2.1) in the same place, and that becomes my Hadoop install location.
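As a side note, the tar flags used above are c/x (create/extract), v (verbose), z (gzip), and f (archive file). A throwaway round trip in /tmp illustrates them without touching the real install:

```shell
# Build a dummy directory, pack it, delete it, then unpack it again
mkdir -p /tmp/tar-demo/hadoop-1.2.1/conf
tar czf /tmp/tar-demo/hadoop-1.2.1.tar.gz -C /tmp/tar-demo hadoop-1.2.1

rm -rf /tmp/tar-demo/hadoop-1.2.1
tar xvzf /tmp/tar-demo/hadoop-1.2.1.tar.gz -C /tmp/tar-demo

# Both the tarball and the re-extracted directory are now present
ls /tmp/tar-demo
```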


Now, referring back to the Java install process above: on Ubuntu, apt-get installs Java under the /usr/lib/jvm folder. Sometimes older preinstalled versions of Java are present there, but you can ignore them; we are interested in version 7.

Then I set my environment variables in my .bashrc file (so they are set properly on subsequent logins). The .bashrc file resides in my home directory (/home/mrinal in my case), as follows:

mrinal@ms-dell:~/hadoop$ vim ~/.bashrc 

and at the end of the file enter these lines (the paths match my OpenJDK and Hadoop locations, which you can see confirmed in the checks below; adjust them to yours):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
export HADOOP_INSTALL=/home/mrinal/hadoop/hadoop-1.2.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_INSTALL/bin

Note: It is crucial that the PATH env variable includes the JAVA_HOME/bin and HADOOP_INSTALL/bin directories.
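To sanity-check the result without logging out, you can source the file in the current shell and test that PATH really picked up the Hadoop bin directory. This sketch re-creates the exports inline (values from my setup, with $HOME standing in for the literal /home/mrinal) and then checks:

```shell
# Same exports as in .bashrc (adjust paths to your machine)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
export HADOOP_INSTALL=$HOME/hadoop/hadoop-1.2.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_INSTALL/bin

# Verify that PATH now contains $HADOOP_INSTALL/bin as one of its entries
case ":$PATH:" in
  *":$HADOOP_INSTALL/bin:"*) echo "PATH ok" ;;
  *)                         echo "PATH missing hadoop bin" ;;
esac
```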

For Hadoop in distributed mode in a cluster, the ssh service is required, but in our case, being standalone mode, ssh is not mandatory. Nevertheless, it's good to know that in Ubuntu (and Linux in general) the ssh client is typically preinstalled.
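That said, the start/stop scripts do ssh to localhost even on a single node, which is why you will see password prompts in the start-up output further down. If those prompts annoy you, the standard OpenSSH passwordless-key setup avoids them (skip the keygen step if you already have a key):

```shell
# Generate a passwordless RSA key (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Authorize the key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# This should now succeed without asking for a password
ssh localhost true
```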

The last thing before the Hadoop config checks is to set the JAVA_HOME variable in the hadoop-env.sh file.
This file resides in the $HADOOP_INSTALL/conf directory (/home/mrinal/hadoop/hadoop-1.2.1/conf in my case). If you don't set this variable, you will get weird errors like "localhost: Error: JAVA_HOME is not set." when you try to start / stop / do anything with your Hadoop.
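The hadoop-env.sh shipped in the 1.2.1 tarball has the JAVA_HOME line commented out, so either uncomment and edit it, or simply append your own line at the end (the path here is from my OpenJDK 7 install; adjust both paths to yours):

```shell
# Point the Hadoop daemons at the JVM by appending to hadoop-env.sh
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' \
    >> /home/mrinal/hadoop/hadoop-1.2.1/conf/hadoop-env.sh
```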

Anyhow, once you are done with all the above steps, you should be able to do basic config checks to see if your Hadoop is installed and functioning OK:

mrinal@ms-dell:~$ echo $JAVA_HOME 
/usr/lib/jvm/java-7-openjdk-amd64/
mrinal@ms-dell:~$ echo $HADOOP_INSTALL
/home/mrinal/hadoop/hadoop-1.2.1
mrinal@ms-dell:~$ java -version
java version "1.7.0_55"
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
mrinal@ms-dell:~$ hadoop version
Hadoop 1.2.1
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152
Compiled by mattf on Mon Jul 22 15:23:09 PDT 2013
From source with checksum 6923c86528809c4e7e6f493b6b413a9a
This command was run using /home/mrinal/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar
mrinal@ms-dell:~$ start-dfs.sh 
starting namenode, logging to /home/mrinal/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-mrinal-namenode-ms-dell.out
mrinal@localhost's password:
localhost: starting datanode, logging to /home/mrinal/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-mrinal-datanode-ms-dell.out
mrinal@localhost's password:
localhost: starting secondarynamenode, logging to /home/mrinal/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-mrinal-secondarynamenode-ms-dell.out
mrinal@ms-dell:~$ start-mapred.sh
starting jobtracker, logging to /home/mrinal/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-mrinal-jobtracker-ms-dell.out
mrinal@localhost's password:
localhost: starting tasktracker, logging to /home/mrinal/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-mrinal-tasktracker-ms-dell.out
mrinal@ms-dell:~$
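As a final sanity check, the jps tool that ships with the JDK lists the running JVM processes; after the two start scripts above, you should see the five Hadoop 1.x daemons (the PIDs will vary from machine to machine):

```shell
# List running Java processes; expect NameNode, DataNode, SecondaryNameNode,
# JobTracker, and TaskTracker (plus the Jps process itself)
jps
```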




