A Spark job consists of a Spark action plus the tasks that must be launched to execute that action. Spark's scheduler is fully thread-safe, so a single application can serve multiple concurrent requests. By default, Spark schedules jobs in FIFO (first-in, first-out) order. The Master is responsible for cluster resource management and scheduling: it first launches the Drivers of the applications in the waiting list, spreading them across the cluster's Workers where possible, and allocates cluster resources on a first-come, first-served basis. Applications that arrive earlier receive more of the resources that satisfy their requirements; later arrivals can only draw from what remains, and if no suitable resources are left they must wait until other applications release theirs.
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Drivers take strict precedence over executors
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
    // start from the last worker that was assigned a driver, and continue onwards until we have
    // explored all alive workers.
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  startExecutorsOnWorkers()
}
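The round-robin driver placement above can be sketched as a self-contained simulation. The `Worker` and `Driver` case classes and the `placeDrivers` helper below are simplified stand-ins for illustration, not Spark's real Master data structures:

```scala
// Sketch of round-robin driver placement, assuming simplified
// Worker/Driver records (not Spark's actual classes).
case class Worker(id: String, var coresFree: Int, var memoryFree: Int)
case class Driver(id: String, cores: Int, mem: Int)

// Visits workers round-robin; a driver that no worker can currently
// satisfy is simply left unplaced (it would stay in the waiting list).
def placeDrivers(workers: Seq[Worker], drivers: Seq[Driver]): Map[String, String] = {
  val numAlive = workers.size
  var curPos = 0
  var placed = Map.empty[String, String]
  for (driver <- drivers) {
    var launched = false
    var visited = 0
    while (visited < numAlive && !launched) {
      val w = workers(curPos)
      visited += 1
      if (w.memoryFree >= driver.mem && w.coresFree >= driver.cores) {
        w.coresFree -= driver.cores   // reserve the driver's resources
        w.memoryFree -= driver.mem
        placed += (driver.id -> w.id)
        launched = true
      }
      curPos = (curPos + 1) % numAlive
    }
  }
  placed
}

val workers = Seq(Worker("w1", 4, 8192), Worker("w2", 4, 8192))
val drivers = Seq(Driver("d1", 2, 4096), Driver("d2", 2, 4096), Driver("d3", 8, 4096))
val placed = placeDrivers(workers, drivers)
println(placed) // d1 and d2 land on different workers; d3 stays waiting (no worker has 8 free cores)
```

Advancing `curPos` even after a successful launch is what spreads consecutive drivers across different workers instead of packing them onto the first one that fits.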
The fair scheduler also supports grouping jobs into pools, with a different scheduling policy or priority for each pool. You can create a high-priority pool for more important jobs, or group each user's jobs into a pool of their own so that every user receives an equal share of scheduling resources regardless of how many concurrent jobs they run.
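Pools are enabled by setting `spark.scheduler.mode` to `FAIR` and describing them in an allocation file pointed to by `spark.scheduler.allocation.file`. A minimal `fairscheduler.xml` might look like the following (the pool names and values are illustrative):

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>4</minShare>
  </pool>
  <pool name="adhoc">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

A thread submits its jobs into a pool with `sc.setLocalProperty("spark.scheduler.pool", "production")`; jobs whose thread sets no pool go to the default pool.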