YARN原论文笔记摘抄

原文:Apache Hadoop YARN: Yet Another Resource Negotiator (2013) pdf, acm

Overview

YARN was designed to address:

  1. tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model;
  2. centralized handling of jobs’ control flow, which resulted in endless scalability concerns for the scheduler.

It achieves: decoupling the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault tolerance) to per-application components.

Guiding Questions

  • What is the relationship between the Resource Manager, Application Master, and Node Manager?
  • How does YARN achieve each of its requirement goals?

Requirements

  • Scalability
  • Multi-tenancy
  • Serviceability
  • Locality awareness
  • High cluster utilization
  • Reliability, availability
  • Secure and auditable operation
  • Support for programming model diversity
  • Flexible resource model
  • Backward compatibility

Architecture

Resource Manager

Basically maps available resources from Node Managers via heartbeats and leases them out to Application Masters as containers. Manages resources.

  • 1 Resource Manager per cluster:
    • track resource usage and node liveliness via heartbeats received from Node Managers
    • enforce allocation invariants
    • arbitrate contention
    • allocates leases (i.e. containers) to applications to run on particular nodes.
  • can also revoke leases or request resources back from an Application Master

Application Master

A job coordinator. Requests resources from the Resource Manager and executes the job on the containers it gets. Basically it will adjust execution according to resource availability and the nature of the job itself.

  • Application Master (1 per job) responsible for:
    • requesting resources from RM by submitting 1+ ResourceRequests
    • generating physical plan from resources received
    • coordinating execution around faults
    • resource management
    • manage flow of execution (mappers -> reducers etc.)
    • handling skew, faults, optimizations ** (what kind of optimization is done?)
    • needs to manage CPU, RAM, disk on multiple nodes
  • sends heartbeats to RM to update record of its demand.
  • if a task fails, the AM is responsible for requesting more resources.

Node Manager

It authenticates container leases, manages containers’ dependencies,
monitors their execution, and provides a set of services to containers.

  • Worker daemon
  • monitors resource availability
  • reports faults
  • container lifecycle management
  • validates lease authenticity

To launch the container, the NM copies all the necessary dependencies– data files, executables, tarballs– to local storage.

YARN framework/application writers

YARN is responsible for:

  1. Submitting the application by passing a CLC for the ApplicationMaster to the RM.
  2. When RM starts the AM, it should register with the RM and periodically advertise its liveness and requirements over the heartbeat protocol
  3. Once the RM allocates a container, AM can construct a CLC to launch the container on the corresponding NM. It may also monitor the status of the running container and stop it when the resource should be reclaimed. Monitoring the progress of work done inside the container is strictly the AM’s responsibility.
  4. Once the AM is done with its work, it should unregister from the RM and exit cleanly.

Fault tolerance and availability

中文词汇

  • Resource Manager: 资源管理器,管理整个系统的资源分配。
  • Application Master: 应用程序管理器
  • Node Manager: 节点管理器
  • Scheduler: 调度器

Terminology

  • Container launch context: map of env variables, dependencies stored in remotely accessible storage, security tokens, payloads for NM services, and the command necessary to create the process.
  • ResourceRequest: a request from the AM to the RM containing:
    • number of containers
    • resources per container (e.g. RAM, CPU)
    • locality preferences
    • priority of requests within application

扩展阅读

http://dl.acm.org/citation.cfm?id=3026195
//www.greatytc.com/p/c21c3a47dde2
//www.greatytc.com/p/7151ba6c5e03

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容