Cluster management is the set of tools and processes that Google uses to control the computing infrastructure in our data centers. It includes allocating resources to different applications on our fleet of computers, looking after software installations and hardware, monitoring, and many other things. I'll provide an overview of what we can do, and explain how the lessons we've learned have driven the design of Kubernetes, our new open-source cluster management system. We certainly don't have all the answers, but we do have some pretty impressive systems.
Principal Software Engineer, Technical Infrastructure, Google. John Wilkes has been at Google since 2008, where he is working on cluster management for Google's compute infrastructure; he was one of the architects of Omega. He is interested in far too many aspects of distributed systems, but a recurring theme has been technologies that allow systems to manage themselves.