Automated Performance Tuning with Bayesian Optimization - Joshua Cohen & Ramki Ramakrishna, Twitter
Managing resource utilization is one of the hardest aspects of operating Twitter’s Mesos clusters. As the number of services grows and their resource shapes diversify, the bin-packing problem becomes increasingly difficult. Tuning for optimal performance would reduce resource usage and ease the bin-packing burden. However, the multitude of available knobs, heterogeneous hardware, the large number of services, and ongoing software and hardware upgrades together make the tuning problem combinatorially intractable.
At Twitter we are developing a system that continuously performs automated tuning of services running in our Mesos clusters, using a machine learning technique called Bayesian optimization. This technique allows us to efficiently search very large parameter spaces to optimize specific performance metrics. We describe our system and share initial results.
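To make the idea concrete, here is a minimal sketch of a Bayesian optimization loop. It is not the system described in this talk. It assumes a single tuning knob normalized to [0, 1], a hypothetical stand-in objective (`simulated_cost`), a Gaussian-process surrogate with an RBF kernel, and the expected-improvement acquisition rule. Every name and constant in it is illustrative.

```python
import math

def rbf(a, b, length=0.2):
    # Squared-exponential kernel; the length scale is an assumed value.
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    # Lower-triangular factor L with A = L * L^T.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution for L y = b.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper(L, y):
    # Back substitution for L^T x = y.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-4):
    # Posterior mean and standard deviation of the surrogate at x_new.
    mean_y = sum(ys) / len(ys)
    centered = [y - mean_y for y in ys]
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, centered))
    k_star = [rbf(a, x_new) for a in xs]
    mean = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    # Expected amount by which a candidate beats the best cost so far
    # (minimization form).
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf

def bayes_opt(objective, candidates, init, n_iter=10):
    xs = list(init)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best = min(ys)
        untried = [c for c in candidates if c not in xs]
        if not untried:
            break

        def score(c):
            mean, std = gp_posterior(xs, ys, c)
            return expected_improvement(mean, std, best)

        # Evaluate the candidate the surrogate considers most promising.
        x_next = max(untried, key=score)
        xs.append(x_next)
        ys.append(objective(x_next))
    i = ys.index(min(ys))
    return xs[i], ys[i], len(xs)

def simulated_cost(x):
    # Hypothetical stand-in for a measured metric; a real objective would
    # come from running the service with the knob set to x.
    return (x - 0.3) ** 2

candidates = [i / 50 for i in range(51)]
init = [candidates[0], candidates[25], candidates[50]]
best_x, best_y, n_evals = bayes_opt(simulated_cost, candidates, init)
print(best_x, best_y, n_evals)
```

The point of the surrogate is that each real measurement is expensive, so the loop spends cheap model computation deciding where to measure next instead of sweeping the whole grid. A production setup would involve many knobs, noisy measurements, and an established optimization library rather than this hand-rolled Gaussian process.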
Joshua Cohen is a Senior Software Engineer at Twitter on the VM Team, working on performance optimization of JVM services. He is also a committer and PMC member for the Apache Aurora project, where he has focused on deploy tooling and filesystem isolation. Previously he worked, among other places, at Flickr on notifications infrastructure and the image processing pipeline.
Staff Software Engineer
San Francisco, CA
Ramki Ramakrishna is a staff software engineer in the Infrastructure Engineering Division of Twitter. He is a member of the JVM Platform team and of the Twitter Architecture Group. Before Twitter, Ramki worked on several generations of the JVM at Sun and Oracle. He has been a committer and reviewer for the HotSpot group in OpenJDK. His principal contributions have been in the areas of performance analysis, tuning and adaptive optimization, parallel and concurrent garbage collection, and the synchronization infrastructure within the JVM. Before joining industry, Ramki worked at SUNY Stony Brook, the Tata Institute of Fundamental Research in India, and Aalborg University in Denmark, dividing his time between teaching and research into the formal verification of concurrent systems using process algebras, temporal logics, and automatic theorem proving. Ramki holds a Ph.D. in Electrical and Computer Engineering from the University of California at Santa Barbara, and a B.Tech. in Electrical Engineering from IIT Kanpur in India.