What's New in Hadoop 3
Hadoop 3 introduces several significant enhancements over its predecessor, Hadoop 2, focusing on improved storage efficiency and scalability.
Erasure Coding in HDFS
Hadoop 3 adds erasure coding to HDFS, a method that provides fault tolerance comparable to traditional replication with significantly lower storage overhead. For instance, a Reed-Solomon (10,4) scheme stores 14 units for every 10 units of data, roughly 1.4x the raw data size, compared with 3x under standard three-way replication. This makes it more efficient for storing large volumes of data, especially cold or infrequently accessed data.
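The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration of the storage math, not HDFS code; the function names are made up for this example.

```python
def storage_factor_replication(replicas: int) -> float:
    """Raw bytes stored per byte of user data under n-way replication."""
    return float(replicas)


def storage_factor_erasure(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data under a Reed-Solomon
    (data_units, parity_units) layout: every stripe of data_units cells
    is stored together with parity_units extra parity cells."""
    return (data_units + parity_units) / data_units


def tolerated_losses_replication(replicas: int) -> int:
    """A block survives as long as one replica is left."""
    return replicas - 1


def tolerated_losses_erasure(parity_units: int) -> int:
    """A stripe can be rebuilt after losing up to parity_units cells."""
    return parity_units


if __name__ == "__main__":
    print("3x replication:", storage_factor_replication(3),
          "tolerates", tolerated_losses_replication(3), "lost copies")
    print("RS(10,4):", storage_factor_erasure(10, 4),
          "tolerates", tolerated_losses_erasure(4), "lost cells")
```

For one petabyte of cold data, the difference is roughly 3 PB of raw disk with replication versus about 1.4 PB with RS(10,4).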
YARN Timeline Service v.2
The updated YARN Timeline Service v.2 addresses scalability and reliability issues present in the earlier version. It introduces a distributed writer architecture and scalable backend storage, enhancing the collection and retrieval of application metrics and logs.
Support for Opportunistic Containers
Hadoop 3 introduces opportunistic containers, allowing the system to schedule low-priority tasks on available resources without waiting for guaranteed resources. This improves cluster utilization and throughput.
Multiple Standby NameNodes
Hadoop 2's high-availability setup supports one active NameNode and a single standby. Hadoop 3 still runs one active NameNode at a time but allows more than one standby alongside it. This enhancement increases fault tolerance and availability, enabling the system to continue operating even if more than one NameNode fails.
Intra-DataNode Balancer
Hadoop 3 includes an intra-DataNode balancer that redistributes data blocks within a DataNode to ensure even disk utilization. This is particularly useful when new disks are added to a DataNode, helping maintain balanced storage across all disks.
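To make the idea concrete, here is a minimal sketch of rebalancing blocks across the disks of one node. It is a simple greedy strategy chosen for illustration, not the algorithm HDFS actually uses; the data layout, the function names, and the 10% threshold are all assumptions.

```python
def utilization(disk):
    """Fraction of a disk's capacity occupied by its blocks."""
    return sum(disk["blocks"]) / disk["capacity"]


def plan_moves(disks, threshold=0.10):
    """Greedily move blocks from the fullest disk to the emptiest one
    until their utilization differs by at most `threshold`.

    `disks` maps a disk name to {"capacity": int, "blocks": [sizes]}.
    The dict is updated in place; the list of moves is returned as
    (block_size, source_disk, destination_disk) tuples.
    """
    moves = []
    while True:
        src = max(disks, key=lambda name: utilization(disks[name]))
        dst = min(disks, key=lambda name: utilization(disks[name]))
        if utilization(disks[src]) - utilization(disks[dst]) <= threshold:
            return moves
        # Try the largest block first, but never overshoot: after the
        # move the source must not be emptier than the destination.
        for size in sorted(disks[src]["blocks"], reverse=True):
            src_after = (sum(disks[src]["blocks"]) - size) / disks[src]["capacity"]
            dst_after = (sum(disks[dst]["blocks"]) + size) / disks[dst]["capacity"]
            if src_after >= dst_after:
                disks[src]["blocks"].remove(size)
                disks[dst]["blocks"].append(size)
                moves.append((size, src, dst))
                break
        else:
            # No block can be moved without overshooting; stop here.
            return moves


if __name__ == "__main__":
    # Two older disks that are 75% full and one newly added, empty disk.
    node = {
        "disk0": {"capacity": 1024, "blocks": [128] * 6},
        "disk1": {"capacity": 1024, "blocks": [128] * 6},
        "disk2": {"capacity": 1024, "blocks": []},
    }
    for move in plan_moves(node):
        print(move)
```

In this example the newly added disk ends up with the same share of data as the two older disks.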
Reworked Daemon and Task Heap Management
The heap management for daemons and MapReduce tasks has been restructured. New configuration options allow for better tuning of heap sizes, and the system can auto-tune based on the host's memory, improving performance and resource utilization.
Generalized YARN Resource Model
YARN's resource model has been generalized to support user-defined resources beyond CPU and memory, such as GPUs and software licenses. This flexibility allows for more efficient scheduling of diverse workloads, including those requiring specialized hardware.
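One way to picture a generalized resource model is to treat a node's capacity and a container request as open-ended maps from resource name to amount. The sketch below is not YARN's implementation; the helper names are hypothetical and the resource names (including "license") are used purely as examples.

```python
def fits(available, request):
    """True if every requested resource is present in sufficient amount.
    A resource type the node does not advertise counts as zero."""
    return all(available.get(name, 0) >= amount
               for name, amount in request.items())


def allocate(available, request):
    """Return the node's remaining resources after granting `request`,
    or None if the request does not fit. `available` is not modified."""
    if not fits(available, request):
        return None
    return {name: amount - request.get(name, 0)
            for name, amount in available.items()}


if __name__ == "__main__":
    node = {"memory-mb": 8192, "vcores": 8, "yarn.io/gpu": 2}
    training_job = {"memory-mb": 4096, "vcores": 2, "yarn.io/gpu": 1}
    licensed_job = {"memory-mb": 1024, "vcores": 1, "license": 1}

    print(allocate(node, training_job))  # fits: one GPU remains
    print(allocate(node, licensed_job))  # None: node has no "license" resource
```

Because the scheduler only compares named quantities, adding a new resource type does not require changing the matching logic.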
S3A Client Enhancements with S3Guard
The S3A client now supports S3Guard, which uses DynamoDB to store metadata, providing consistent and faster access to files stored in Amazon S3. This addresses issues related to eventual consistency in S3, ensuring more reliable file operations.
⚙️ How Hadoop Works Internally
Hadoop's architecture comprises three core components: HDFS, MapReduce, and YARN.
HDFS (Hadoop Distributed File System)
NameNode: Acts as the master server, managing the file system namespace and regulating access to files by clients. It maintains metadata about the file system, such as the directory structure and file-to-block mapping.
DataNode: Runs on worker nodes, storing the actual data blocks. DataNodes handle read and write requests from clients and perform block creation, deletion, and replication upon instruction from the NameNode.
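The division of labor between the two roles can be sketched as follows: the NameNode side only tracks which blocks make up a file and where their replicas live, while the DataNodes hold the bytes. This is a simplified model, assuming a 128 MB block size and three replicas; the round-robin placement is a stand-in for illustration and is not HDFS's rack-aware placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes occupies.
    Every block is full except possibly the last one."""
    sizes = []
    remaining = file_size
    while remaining > 0:
        sizes.append(min(block_size, remaining))
        remaining -= sizes[-1]
    return sizes


def place_replicas(num_blocks, datanodes, replication=3):
    """Map each block index to the DataNodes holding its replicas,
    using plain round-robin (illustrative only)."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    return {
        block: [datanodes[(block + r) % len(datanodes)]
                for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    metadata = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    for block, nodes in metadata.items():
        print(f"block {block} ({blocks[block]} bytes) -> {nodes}")
```

A 300 MB file therefore occupies three blocks, and the metadata kept on the master is tiny compared with the data itself.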
MapReduce
Mapper: Processes input data and produces a set of intermediate key-value pairs.
Reducer: Aggregates the intermediate data and produces the final output.
Execution Flow: The process involves reading input data, dividing it into input splits, mapping, shuffling and sorting intermediate data, and reducing it to produce the final result.
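The map, shuffle-and-sort, and reduce phases can be imitated in a single Python process with the classic word-count example. This sketch only mirrors the data flow; it does not use the Hadoop API, and all function names are chosen for this illustration.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit an intermediate (word, 1) pair for each word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle and sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key, values):
    """Reduce phase: collapse the values for one key into a final count."""
    return key, sum(values)


def run_job(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    intermediate = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(intermediate)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["Hadoop stores data", "Hadoop processes data"]))
```

In a real cluster, each mapper runs against its own input split and the shuffle moves data between machines, but the contract between the phases is the same.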
YARN (Yet Another Resource Negotiator)
ResourceManager (RM): Manages the allocation of resources across all applications in the system.
NodeManager (NM): Runs on each node, monitoring resource usage and reporting to the RM.
ApplicationMaster (AM): Manages the lifecycle of applications, negotiating resources from the RM and working with the NM to execute and monitor tasks.
This architecture allows Hadoop to efficiently store and process large datasets across a distributed cluster, providing scalability, fault tolerance, and high throughput.