Sunday, January 8, 2012

Correcting Hadoop's HDFS Errors

This past Christmas and New Year, as in the last three years, I accompanied my wife to Bogota, Colombia for the holidays. What was different about this trip was that our pet Mickey was coming with us, we had no side trips planned, and I was taking my new Lenovo T420 with me. We arrived in Bogota late at night on December 22nd, and after being greeted by family, we took a cab to my mother-in-law's house. There’s something both exciting and terrifying about cab rides in Bogota, but after a while, for me at least, it’s just fun.

I woke up the next morning well rested, and we began our Christmas vacation. I started by setting up a new wireless router that I had brought with me so that I could work remotely from wherever was most comfortable. Once the router was installed, I connected with my new laptop and set priorities for the tasks I needed to complete. They were the typical ones: update my time, respond to some emails, finalize peer reviews, and check up on database backups. After these tasks were completed, I had time to relax.

Our plans were made day to day, but mostly we would go out for lunch or dinner with friends and family, then return home. On a few nights, we engaged in consuming heavy amounts of adult beverages and dancing, which is the custom in Colombia. Most of our time, however, was spent at home. I would wake up early, before anyone else, and sit at the dining room table next to the window with my laptop, watching the sun rise over the mountains and enjoying some fresh Colombian coffee.

With nothing to do so early in the morning, I decided to log onto my desktop back in DC. When I connected, I saw that I had left open an SSH connection to a small Hadoop cluster I had set up to do some testing, except HDFS was not working properly. This was the perfect opportunity to find out what went wrong in my install and configuration of the cluster. The only catch was that I would have to do everything from the command shell, with no GUI. I had followed Michael Noll’s “Running Hadoop on Ubuntu”, but now I was getting errors in the namenode logs.

In Michael Noll’s how-to, he describes how he addressed this error by reformatting the cluster: he stopped all running daemons, deleted the /app/hadoop/tmp/dfs/name/data directory, and then ran bin/hadoop namenode -format. Somewhere in troubleshooting my errors and researching online, I found that it was also a good idea to add the following properties to the hdfs-site.xml configuration file.

<!-- Adding 1/1/2012. -->
<property>
<name></name>
<value>/app/hadoop/tmp/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name></name>
<value>/app/hadoop/tmp/dfs/name</value>
<final>true</final>
</property>

Also, if you get “Permission denied (publickey,password)”, you may want to check that the paths for the properties you added to the hdfs-site.xml file are correct. If this problem persists, you might try running the following on all nodes:

sudo chown -R hduser:hadoop /app/hadoop
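
To confirm that the ownership change reached every file, something like the following sketch could be run on each node afterwards. The function name not_owned_by is made up for illustration; it only wraps the standard find -user test.

```shell
# Hypothetical helper: list anything under a directory that is NOT owned
# by the given user (name or numeric uid). Empty output means every entry
# under the directory belongs to that user.
not_owned_by() {
    find "$2" ! -user "$1" -print
}
```

For example, not_owned_by hduser /app/hadoop should print nothing once the chown has been applied everywhere.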

Some of the other errors that I ran into were as follows:

Cannot lock storage /app/hadoop/tmp/dfs/name. The directory is already locked.
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Fatal Error : All storage directories are inaccessible.
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Problem binding to Address already in use
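
One possible cause of the "already locked" and "Address already in use" errors is a daemon left running from an earlier start. That is an assumption on my part, not something my notes confirm. A minimal sketch for spotting leftover daemons filters the output of jps (which ships with the JDK) down to the Hadoop 0.20 daemon class names. The function name is hypothetical.

```shell
# Hypothetical helper: read `jps` output on stdin and keep only the lines
# for Hadoop 0.20-era daemons. Prints nothing if none are running.
running_hadoop_daemons() {
    grep -E '^[0-9]+ (NameNode|SecondaryNameNode|DataNode|JobTracker|TaskTracker)$' || true
}
```

Running jps | running_hadoop_daemons on each node after stopping the cluster should print nothing; any line that remains names a process that is still alive.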

I did not do a good job of keeping notes while troubleshooting these issues, but it seemed that whenever I tried to fix one thing, a different error would pop up. I did find that I had made a typo in the hdfs-site.xml file: at the end of each path, I had added a /. So instead of /app/hadoop/tmp/dfs/name, I had /app/hadoop/tmp/dfs/name/. Once I corrected that, deleted all data in the HDFS directory, and ran the format, everything worked! So here is how that went.
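
As an aside, the trailing-slash typo is easy to check for mechanically. The sketch below assumes each <value>...</value> element sits on a single line of the config file; the function name is invented for this example.

```shell
# Hypothetical helper: print every <value> in a Hadoop config file whose
# content ends with a trailing slash. Assumes one <value> element per line.
find_trailing_slash_values() {
    sed -n -E 's|.*<value>[[:space:]]*([^<]*/)[[:space:]]*</value>.*|\1|p' "$1"
}
```

Calling find_trailing_slash_values conf/hdfs-site.xml (adjust the path to your install) prints the offending paths, or nothing if all values are clean.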

After stopping all daemons and correcting the paths in the hdfs-site.xml file, I then deleted all data in the HDFS directory on all nodes.

hduser@bigdata1:/app/hadoop/tmp/dfs$ sudo rm -rf *

Then, I ran the format.

hduser@bigdata1:/usr/local/hadoop/hadoop$ bin/hadoop namenode -format

The output looked like this:

12/01/04 07:59:15 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = bigdata1/
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
12/01/04 07:59:15 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop,adm,dialout,fax,cdrom,floppy,tape,audio,dip,video,plugdev,fuse,lpadmin,netdev,admin,sambashare
12/01/04 07:59:15 INFO namenode.FSNamesystem: supergroup=supergroup
12/01/04 07:59:15 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/01/04 07:59:15 INFO common.Storage: Image file of size 96 saved in 0 seconds.
12/01/04 07:59:15 INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.
12/01/04 07:59:15 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at bigdata1/

Next, I started HDFS.

hduser@bigdata1:/usr/local/hadoop/hadoop$ bin/

Its output was:

starting namenode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hduser-namenode-bigdata1.out
bigdata2: starting datanode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hduser-datanode-bigdata2.out
bigdata3: starting datanode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hduser-datanode-bigdata3.out
bigdata4: starting datanode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hduser-datanode-bigdata4.out
bigdata1: starting secondarynamenode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-bigdata1.out

I ran the HDFS Admin Report to see the status of my cluster.

hduser@bigdata1:/usr/local/hadoop/hadoop$ bin/hadoop dfsadmin -report

The report displayed the following:

Configured Capacity: 206701436928 (192.51 GB)
Present Capacity: 186873368576 (174.04 GB)
DFS Remaining: 186873294848 (174.04 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

Datanodes available: 3 (3 total, 0 dead)

Decommission Status : Normal
Configured Capacity: 68900478976 (64.17 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6721228800 (6.26 GB)
DFS Remaining: 62179225600(57.91 GB)
DFS Used%: 0%
DFS Remaining%: 90.24%
Last contact: Wed Jan 04 08:00:20 EST 2012

Decommission Status : Normal
Configured Capacity: 68900478976 (64.17 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6732939264 (6.27 GB)
DFS Remaining: 62167515136(57.9 GB)
DFS Used%: 0%
DFS Remaining%: 90.23%
Last contact: Wed Jan 04 08:00:20 EST 2012

Decommission Status : Normal
Configured Capacity: 68900478976 (64.17 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6373900288 (5.94 GB)
DFS Remaining: 62526554112(58.23 GB)
DFS Used%: 0%
DFS Remaining%: 90.75%
Last contact: Wed Jan 04 08:00:17 EST 2012
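
If you want to check the report from a script rather than by eye, the summary line "Datanodes available: 3 (3 total, 0 dead)" is the easiest thing to parse. The following is a rough sketch that relies on that line keeping the format shown above; the function name is hypothetical.

```shell
# Hypothetical helper: read a `dfsadmin -report` dump on stdin and print
# the number of dead datanodes taken from the
# "Datanodes available: N (T total, D dead)" summary line.
dead_datanodes() {
    sed -n -E 's/^Datanodes available: [0-9]+ \([0-9]+ total, ([0-9]+) dead\).*/\1/p'
}
```

It could be used as bin/hadoop dfsadmin -report | dead_datanodes, with anything other than 0 treated as a reason to look at the datanode logs.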

After this I started MapReduce.

hduser@bigdata1:/usr/local/hadoop/hadoop$ bin/

Its output was this:

starting jobtracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hduser-jobtracker-bigdata1.out
bigdata3: starting tasktracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hduser-tasktracker-bigdata3.out
bigdata2: starting tasktracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hduser-tasktracker-bigdata2.out
bigdata4: starting tasktracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hduser-tasktracker-bigdata4.out


Now there are three web interface URLs that you can use to check up on your cluster's health. They are:

http://localhost:50030/ – web UI for MapReduce job tracker(s)
http://localhost:50060/ – web UI for task tracker(s)
http://localhost:50070/ – web UI for HDFS name node(s)
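
Since I was working without a GUI, a command-line probe of these ports would have been handy. The sketch below assumes curl is installed and that the default ports listed above are in use; both function names are made up for illustration. Keep in mind that the task tracker UI is served by whichever nodes run a task tracker, so the host to probe may differ per port.

```shell
# Hypothetical helper: print the three web UI URLs for a host
# (defaults to localhost), using the ports listed above.
hadoop_ui_urls() {
    host=${1:-localhost}
    for port in 50030 50060 50070; do
        printf 'http://%s:%s/\n' "$host" "$port"
    done
}

# Hypothetical helper: probe each URL with curl and print the HTTP status
# code next to it (curl reports 000 when the connection fails).
check_hadoop_uis() {
    hadoop_ui_urls "${1:-localhost}" | while read -r url; do
        code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url" || true)
        printf '%s %s\n' "$code" "$url"
    done
}
```

Running check_hadoop_uis bigdata1 from any machine that can reach the node prints one line per UI; a 200 suggests the corresponding daemon is up and serving its status page.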


  1. /usr/local/hadoop/hadoop-
    starting namenode, logging to /usr/local/hadoop/hadoop-
    localhost: starting datanode, logging to /usr/local/hadoop/hadoop-
    localhost: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-
    hduser@hadoop-ThinkCentre-A51:~$ jps
    12799 SecondaryNameNode
    12837 Jps

  2. Hi Clay,
    I tried all the steps you've mentioned,but no luck....

    i have been stuck here for 2days..any help would be very much appreciated