Meet up

Resilience and Chaos Engineering at Hotels.com

Monday, 23rd September at Skills Matter, London

This meetup was organised by the London Chaos Engineering Community in September 2019.

Overview

At Hotels.com we run a bunch of microservices and infrastructure in production. Our applications previously ran on fixed hosts for their whole lifetime. Now that we are moving our services to AWS and Kubernetes, there are two new types of change we must be prepared for: Kubernetes dynamically manages the lifecycle of our applications, and the EC2 servers that underlie our platform are ephemeral and may fail or be replaced at any time. Each incident impacts not only our revenue but also our customers' trust. In an effort to build resilience into our services, we've explored processes and tools such as Toxiproxy and Kubemonkey to stress and "break" our systems on purpose, without impacting production.

In this talk we'll walk you through our work on resilience and chaos testing: why we need resilience, what it means for us, and which tools we have explored and used so far.

Daniel Albuquerque

Daniel is a Software Engineer with over 12 years of experience. He specialises in designing microservices and high-load, scalable, fault-tolerant systems, and he's also an open source contributor.

Nikos Katirtzis

Nikos is a Software Engineer at Hotels.com (Expedia Group). He works on a team that explores new technologies that can improve the Hotels.com and Expedia Group platforms, and he's part of the Open-Source and InnerSource groups there. His team recently started experimenting with Resilience and Chaos Engineering, mainly from an application's perspective.
