hdfs

The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in a file called the FsImage. The FsImage is stored as a file in the NameNode’s local file system too.
EditLog + FsImage when NamdNode bootstrap.

Architecture

hdfsarchitecture.gif

Secondary Namenode

The secondary NameNode stores the latest checkpoint in a directory which is structured the same way as the primary NameNode's directory. So that the check pointed image is always ready to be read by the primary NameNode if necessary.

The latest checkpoint can be imported to the primary NameNode if all other copies of the image and the edits files are lost. In order to do that one should:

Create an empty directory specified in the dfs.name.dir configuration variable;
Specify the location of the checkpoint directory in the configuration variable fs.checkpoint.dir;
and start the NameNode with -importCheckpoint option.
The NameNode will upload the checkpoint from the fs.checkpoint.dir directory and then save it to the NameNode directory(s) set in dfs.name.dir. The NameNode will fail if a legal image is contained in dfs.name.dir. The NameNode verifies that the image in fs.checkpoint.dir is consistent, but does not modify it in any way.

namenode fsimage and edits recovery
  1. original namenode folder
  2. NFS folder
  3. secondary node's ${fs.checkpoint.dir} ($ hadoop namenode -importCheckpoint)
    1. previous.checkpoint
    2. current

DRBD

Avatar node

Checkpoint node

The Checkpoint node periodically creates checkpoints of the namespace. It downloads fsimage and edits from the active NameNode, merges them locally, and uploads the new image back to the active NameNode. The Checkpoint node usually runs on a different machine than the NameNode since its memory requirements are on the same order as the NameNode.
The role of the Checkpoint node to checkpoint name-node meta-data by merging image and edits files.

Backup Node

The Backup node provides the same checkpointing functionality as the Checkpoint node, as well as maintaining an in-memory, up-to-date copy of the file system namespace that is always synchronized with the active NameNode state.
The Backup node does not need to download fsimage and edits files from the active NameNode in order to create a checkpoint. The NameNode supports one Backup node at a time. No Checkpoint nodes may be registered if a Backup node is in use.

The Backup node extends functionality of the Checkpointer by that it can receive online updates of the file system meta-data, apply them to its memory state and persist them on disks just like the name-node does. Thus at any time the Backup node contains an up-to-date image of the namespace both in memory and on local disk(s).
This also results in much more efficient checkpointing because backup node does not need to transfer files from the active name-node and does not need to replay (merge) edits.

Use case

Facebook

Yahoo

set two dfs.name.dir for replicate meta-data. A backup machine export NFS to NameNode. The NameNode will write its metadata to each directory in the comma-separated list of dfs.name.dir. If /mnt/namenode-backup is NFS-mounted from the backup machine, this will ensure that a redundant copy of HDFS metadata is available. The backup node should serve /mnt/namenode-backup from /home/hadoop/dfs/name on its own drive.

To switch the NameNode and backup nodes, the backup machine should have its IP address changed to the original NameNode's IP address, and the server daemons should be started on that machine. The IP address must be changed to allow the DataNodes to recognize it as the "original" NameNode for HDFS.
hadoop-site.xml:

<property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/dfs/name,/mnt/namenode-backup</value>
    <final>true</final>
  </property>

NFS for remote directory and standby namenode running.

Cloudera

Hortonworks

Reference

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License