Task — 7.1: Elasticity Task
In this article, I am going to discuss how we can integrate LVM with Hadoop to provide elasticity to DataNode storage, and how to automate the LVM workflow with a Python script.
INTRODUCTION:
Hadoop: Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on.
Logical Volume Management: LVM is a tool for managing logical volumes, which includes allocating disks, striping, mirroring, and resizing logical volumes.
So, what is a logical volume?
With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes. LVM physical volumes can be placed on other block devices which might span two or more disks.
The physical volumes are combined into volume groups. Since a physical volume cannot span over multiple drives, to span over more than one drive, create one or more physical volumes per drive.
The volume groups can be divided into logical volumes, which are assigned mount points, such as /home and /, and file system types, such as ext2 or ext3.
When “partitions” reach their full capacity, free space from the volume group can be added to the logical volume to increase the size of the partition. When a new hard drive is added to the system, it can be added to the volume group, and partitions that are logical volumes can be increased in size.
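To see this PV, VG, and LV hierarchy on a live system, the LVM reporting commands can be driven from Python. A minimal sketch, assuming the lvm2 tools are installed and the script runs as root (the show helper is our own):

import subprocess

def show(cmd):
    # Run an LVM reporting command and print its table output.
    print(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)

show(["pvs"])  # physical volumes and the VG each one belongs to
show(["vgs"])  # volume groups with their total and free size
show(["lvs"])  # logical volumes carved out of each VG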
Step 1: Start the NameNode Service
In this step, we start the NameNode service.
#hadoop-daemon.sh start namenode
Step 2: Add Hard Disk to the DataNode
Attach one new hard disk to the DataNode; the DataNode will share storage from this disk.
#fdisk -l
The new hard disk is 50GiB (device name: /dev/sdb).
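If we want the automation script to discover attached disks instead of reading fdisk -l by hand, lsblk gives a compact listing. A small sketch, assuming the new disk shows up as a whole disk (here /dev/sdb):

import subprocess

# -d: whole disks only (no partitions), -n: suppress the header line
out = subprocess.run(
    ["lsblk", "-dn", "-o", "NAME,SIZE,TYPE"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    name, size, dev_type = line.split()
    print(f"/dev/{name}  {size}  {dev_type}")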
Step 3: Create the Physical Volume from /dev/sdb
#pvcreate /dev/sdb (create the PV)
#pvdisplay /dev/sdb (display the PV)
Now, we have to allocate this physical volume to a Volume Group.
Step 4: Create the Volume Group
#vgcreate dnvg /dev/sdb (create the VG and allocate the PV to it)
#vgdisplay dnvg
Step 5: Create Logical Volume of Size 30GiB
#lvcreate --size 30G --name dnlv dnvg
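Steps 3 to 5 are the natural unit to script together: one PV on the disk, one VG on the PV, one LV inside the VG. A minimal sketch using the same names and sizes as above (the run and create_lv helpers are our own, not part of LVM):

import subprocess

def run(cmd):
    # Echo the command, then run it; raise immediately on any failure.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def create_lv(disk, vg, lv, size):
    # PV on the disk -> VG on the PV -> LV carved out of the VG.
    run(["pvcreate", disk])
    run(["vgcreate", vg, disk])
    run(["lvcreate", "--size", size, "--name", lv, vg])

create_lv("/dev/sdb", "dnvg", "dnlv", "30G")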
Step 6: Format the Logical Volume
#mkfs.ext4 /dev/dnvg/dnlv
Step 7: Mount the Logical Volume with the DataNode directory
#mount /dev/dnvg/dnlv /dn
#df -h
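Steps 6 and 7 can be scripted the same way. A short sketch, assuming /dn is the directory configured as the DataNode storage directory (format_and_mount is our own helper):

import os
import subprocess

def format_and_mount(lv_path, mount_point):
    # Format the LV with ext4, make sure the mount point exists, then mount it.
    subprocess.run(["mkfs.ext4", lv_path], check=True)
    os.makedirs(mount_point, exist_ok=True)
    subprocess.run(["mount", lv_path, mount_point], check=True)

format_and_mount("/dev/dnvg/dnlv", "/dn")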
Step 8: Start the DataNode Service
#hadoop-daemon.sh start datanode
#hadoop dfsadmin -report
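To confirm from a script that the DataNode is now contributing the 30GiB volume, we can filter the report for its capacity lines. A hedged sketch: the exact wording of the report can vary between Hadoop versions, so matching on "Configured Capacity" is an assumption:

import subprocess

report = subprocess.run(
    ["hadoop", "dfsadmin", "-report"],
    capture_output=True, text=True, check=True,
).stdout

for line in report.splitlines():
    if "Configured Capacity" in line:
        print(line.strip())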
Now we have to increase the storage online (elastically: the storage grows without stopping the DataNode).
Step 9: Increase the Logical Volume Size
#lvextend --size +10G /dev/dnvg/dnlv
The logical volume size has been increased from 30GiB to 40GiB.
However, the file system mounted on the /dn directory still reports 30GiB.
Now, we have to resize the file system of the partition (logical volume) so that the new space becomes usable.
Step 10: Resize the File System on the Extended Logical Volume
#resize2fs /dev/dnvg/dnlv
Python App for Increasing the LV Size Dynamically
Python code:
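The original script is not reproduced here, so below is a minimal sketch of what such a script could look like, assuming the VG/LV names from the steps above and root privileges; the menu text, prompts, and helper names are our own illustration:

# lvm_elastic.py - sketch of the automation script.
# Assumes lvm2 is installed and /dev/dnvg/dnlv is already mounted on /dn
# (Steps 1 to 8 above).
import subprocess

LV_PATH = "/dev/dnvg/dnlv"   # logical volume created in Step 5 (assumed)
MOUNT_POINT = "/dn"          # DataNode directory from Step 7 (assumed)

def run(cmd):
    # Echo the command, then run it; raise on failure.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def extend(size_gib):
    # Grow the LV, then grow the ext4 file system online with resize2fs;
    # the DataNode keeps running throughout (Steps 9 and 10 above).
    run(["lvextend", "--size", f"+{size_gib}G", LV_PATH])
    run(["resize2fs", LV_PATH])

def main():
    print("1. Increase DataNode storage")
    print("2. Show current size of", MOUNT_POINT)
    choice = input("Enter choice: ").strip()
    if choice == "1":
        gib = int(input("How many GiB to add? ").strip())
        extend(gib)
    elif choice == "2":
        run(["df", "-h", MOUNT_POINT])
    else:
        print("Unknown choice")

if __name__ == "__main__":
    main()

As a design note, lvextend also accepts -r/--resizefs, which resizes the file system in the same command; the two-step form above mirrors Steps 9 and 10.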
This is how we can automate LVM with Hadoop by using a Python script.
Thank you!