Hadoop
Single Node Setup
This blog describes how to set up and configure a Hadoop 1.2.1 single node installation on the Linux platform.
GNU/Linux is supported as a development and production platform.
Required Software:
Java 1.6.x must be installed and JAVA_HOME must be set.
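You can check the installed Java and set JAVA_HOME from the shell; the JDK path below is only an example, so use your own installation path –
$ java -version
$ export JAVA_HOME=/usr/home/java/jdk1.7.0_03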
ssh must be installed and sshd must be running. To install ssh on Ubuntu Linux, use the command below –
$ sudo apt-get install openssh-server
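On Ubuntu you can also confirm that sshd is running (assuming the default openssh-server service name) –
$ sudo service ssh status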
Installing Hadoop:
1. Download the Hadoop 1.2.1 distribution from an Apache Hadoop download mirror: http://hadoop.apache.org/releases.html
2. Extract the distribution into a local folder. The extracted directory will be called hadoop-1.2.1, and HADOOP_HOME will refer to this directory.
3. Extract the downloaded tar file using the following command –
$ tar -xvf hadoop-1.2.1.tar.gz
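The remaining steps refer to this extracted folder as HADOOP_HOME, so change into it after extraction –
$ cd hadoop-1.2.1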
4. From the extracted hadoop-1.2.1 folder, edit the file conf/hadoop-env.sh to define JAVA_HOME as below –
# The java implementation to use. Required.
export JAVA_HOME=/usr/home/java/jdk1.7.0_03    # set this to your JAVA_HOME path
Now we are ready to start Hadoop in Local (Standalone) Mode.
Local (Standalone) Mode – In this mode Hadoop runs in non-distributed mode as a single Java process. Hadoop runs on the local machine without any cluster environment. We can test Local (Standalone) mode with the following example –
$ bin/hadoop jar hadoop-examples-*.jar wordcount /home/input/test.txt /home/output
By executing the above line, you will get an output folder under /home, and inside that output folder you will find the actual wordcount output (based on your input file).
NOTE: You must have an input file in the "/home/input/" path. The file name can be anything, but in this example I have used "test.txt"; if your file name is different, change it in the above example as well.
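As a quick sketch (the sample input contents below are just an illustration), you can create the input file and then inspect the result; in Hadoop 1.x the wordcount result file is typically named part-r-00000, and the output folder must not already exist when the job is submitted –
$ mkdir -p /home/input
$ echo "hello hadoop hello world" > /home/input/test.txt
$ bin/hadoop jar hadoop-examples-*.jar wordcount /home/input/test.txt /home/output
$ cat /home/output/part-r-00000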
We will continue with Pseudo-Distributed Mode setup with the following steps –
5. From the extracted HADOOP_HOME folder, edit the file conf/core-site.xml as below to define the HDFS file system –
<configuration>
  <property>
    <!-- Name for the default file system -->
    <name>fs.default.name</name>
    <!-- HDFS will run on localhost at the given port -->
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
6. From the extracted HADOOP_HOME folder, edit the file conf/hdfs-site.xml to define the replication value –
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- With this value each block is stored only once in HDFS. Increase it if you want more replicas. -->
    <value>1</value>
  </property>
</configuration>
7. From the extracted HADOOP_HOME folder, edit the file conf/mapred-site.xml to define the JobTracker –
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
SSH Configuration:
The Hadoop node should be able to ssh to localhost without a passphrase for data communication. To achieve this, execute the following commands –
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Now try with – $ ssh localhost
Now you should be able to connect to localhost without being asked for a passphrase.
Testing Hadoop:
1. During the initial setup we have to format the NameNode. Remember this is a one-time activity and should not be done on every start. From the extracted HADOOP_HOME folder, execute the following command from the terminal –
$ bin/hadoop namenode -format
2. Start the Hadoop daemons –
$ bin/start-all.sh
If everything goes well, then by executing the 'jps' command you should get the following output on the console –
$ jps
NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker
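To verify the pseudo-distributed setup end to end, you can run the same wordcount example against HDFS; the HDFS paths below are just an illustration –
$ bin/hadoop fs -mkdir /input
$ bin/hadoop fs -put /home/input/test.txt /input
$ bin/hadoop jar hadoop-examples-*.jar wordcount /input /output
$ bin/hadoop fs -cat /output/part-r-00000
You can also check the NameNode web UI at http://localhost:50070 and the JobTracker web UI at http://localhost:50030. When you are done, stop all daemons with –
$ bin/stop-all.sh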
Please share this content if you like it! Feel free to ask questions!