Amazon is working on ways to link its Docker-centric container service into Apache Mesos and the popular Marathon services scheduler framework to widen users’ cluster-management options.
Launched last November, Amazon EC2 Container Service, or Amazon ECS, has just unveiled an Apache Mesos scheduler driver as a proof-of-concept integration with Marathon.
The open-source driver, which sends Mesos management commands directly to ECS, is designed to show how Marathon could schedule workloads on ECS.
It is also aimed at demonstrating the core design principles behind the Amazon service, which separates scheduling logic from state management, according to Deepak Singh, who founded and leads ECS.
“This allows you to use the ECS schedulers, write your own schedulers, or integrate with third-party schedulers,” Singh said in a blog post.
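The separation Singh describes can be pictured with a small Python sketch. It is illustrative only and does not use the ECS or Mesos APIs: `ClusterState`, `first_fit`, `most_free` and `run` are hypothetical names invented here. One object tracks cluster state, and any scheduler function can be plugged in on top of it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ClusterState:
    """Hypothetical state manager: tracks free CPU per machine, no placement policy."""
    free_cpu: Dict[str, int]
    placements: Dict[str, str] = field(default_factory=dict)

    def start_task(self, task: str, machine: str, cpu: int) -> None:
        # Refuse placements that the recorded state cannot satisfy.
        if self.free_cpu[machine] < cpu:
            raise ValueError(f"{machine} lacks capacity for {task}")
        self.free_cpu[machine] -= cpu
        self.placements[task] = machine


# A scheduler is any function that reads state and proposes a machine.
Scheduler = Callable[[Dict[str, int], int], Optional[str]]


def first_fit(free_cpu: Dict[str, int], cpu: int) -> Optional[str]:
    """Pick the first machine (by name) with enough free CPU."""
    for machine in sorted(free_cpu):
        if free_cpu[machine] >= cpu:
            return machine
    return None


def most_free(free_cpu: Dict[str, int], cpu: int) -> Optional[str]:
    """Pick the machine with the most free CPU that can fit the task."""
    candidates = [m for m in free_cpu if free_cpu[m] >= cpu]
    return max(candidates, key=lambda m: free_cpu[m]) if candidates else None


def run(state: ClusterState, scheduler: Scheduler, task: str, cpu: int) -> bool:
    """Ask the scheduler for a placement, then let the state manager record it."""
    machine = scheduler(state.free_cpu, cpu)
    if machine is None:
        return False
    state.start_task(task, machine, cpu)
    return True
```

Swapping `first_fit` for `most_free` changes where a task lands without touching `ClusterState`, which is the point of keeping the two concerns apart.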
Cluster management is becoming an important issue for developers who are building distributed applications in the cloud.
“Both these systems typically manage a coordinated cluster of machines working together to perform a large task. In the case of Hadoop or Spark, these tasks are most often data-analysis jobs or machine learning.”
Last week version 0.8.0 of Marathon was released. Mesosphere, a major contributor to the Mesos open-source project, describes it as the most popular framework on Mesos and as being used in large-scale production at a number of major companies worldwide.
Singh said cluster management systems face two challenges. The first is the complexity of managing the state of the cluster.
“Software like Hadoop and Spark typically has a Leader, or a part of the software that runs in one place and is in charge of coordination. They’ll then have many, often hundreds or even thousands of Followers, or a part of the software that receives commands from the Leader, executes them, and reports state of their sub-task,” he said.
“When machines fail, the Leader must detect these failures, replace machines, and restart the Followers that receive commands. This can be a significant portion of code written for applications which need access to a large pool of resources.”
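As a rough illustration of the bookkeeping Singh is describing, the Python sketch below shows a Leader that tracks Follower heartbeats and flags those that have gone quiet. It is a minimal sketch under assumed rules (a fixed timeout, timestamps passed in by the caller), not how Hadoop, Spark, Mesos or ECS actually implement failure detection, and the `Leader` class and its methods are hypothetical.

```python
from typing import Dict, List


class Leader:
    """Hypothetical leader that detects failed followers via missed heartbeats."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, follower: str, now: float) -> None:
        # Record the most recent time this follower reported in.
        self.last_seen[follower] = now

    def failed(self, now: float) -> List[str]:
        # A follower counts as failed once it has been silent longer than the timeout.
        return sorted(f for f, t in self.last_seen.items() if now - t > self.timeout)

    def replace(self, follower: str, replacement: str, now: float) -> None:
        # Drop the failed follower and start tracking its replacement.
        del self.last_seen[follower]
        self.last_seen[replacement] = now


leader = Leader(timeout=10.0)
leader.heartbeat("f1", now=0.0)
leader.heartbeat("f2", now=0.0)
leader.heartbeat("f1", now=8.0)   # f1 keeps reporting, f2 goes silent
for dead in leader.failed(now=12.0):
    leader.replace(dead, replacement=dead + "-new", now=12.0)
```

Even this toy version needs timeouts, replacement and state tracking, which hints at why such code can become a significant part of an application.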
The second challenge for cluster management systems is that each application typically assumes full ownership of the machine where its tasks are running.
“You will often end up with multiple clusters of machines, each dedicated fully to the management system in use. This can lead to inefficient distribution of resources, and jobs taking longer to run than if a shared pool of resources could be used,” Singh said.
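A back-of-the-envelope Python calculation shows the effect. It assumes, purely for illustration, that every task occupies one machine for one equal-length round; the function names and the workload figures are hypothetical.

```python
import math
from typing import Sequence


def rounds_needed(tasks: int, machines: int) -> int:
    """Rounds to finish `tasks` unit-length tasks on `machines` machines."""
    return math.ceil(tasks / machines)


def dedicated_makespan(tasks_per_app: Sequence[int], machines_per_app: Sequence[int]) -> int:
    """Each application may only use its own cluster; the slowest one sets the finish time."""
    return max(rounds_needed(t, m) for t, m in zip(tasks_per_app, machines_per_app))


def shared_makespan(tasks_per_app: Sequence[int], total_machines: int) -> int:
    """All applications draw on one shared pool of machines."""
    return rounds_needed(sum(tasks_per_app), total_machines)


# Two applications with two machines each: one is busy, the other nearly idle.
tasks = [8, 2]
dedicated = dedicated_makespan(tasks, [2, 2])   # 4 rounds: the busy app cannot borrow idle machines
shared = shared_makespan(tasks, 4)              # 3 rounds: the same four machines, pooled
```

Under these assumptions the dedicated clusters take four rounds and the shared pool three, because the lightly loaded cluster's spare machines sit idle in the first case.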
In the GitHub repository for the Marathon driver, Amazon points out that the software is for demonstration purposes and is “not recommended for production use”.
The company goes on to say: “We are working with the Mesos community to develop a more robust integration between Apache Mesos and Amazon ECS.”