Building Fault-Tolerant Microservices

4th November 2014 in London at Skills Matter

Are you developing applications that communicate over a network? Of course you are! This talk will take you through the ways you can build fault-tolerant applications and show how, once your team is in the mindset that everything will eventually fail, dealing with failures gracefully is no more work than building fragile applications.

The topics covered will be:

  • What to do when one of the dependencies fails to respond in time
  • When to use network-level timeouts vs application-level timeouts
  • What to monitor and how to monitor it, e.g. connection pools, thread pools, queue sizes, latency
  • How to test for when the network is slow or saturated
  • How to test for when traffic is lost in transit
  • How to train your stakeholders to expect failure and get them to agree to fallbacks, so that they can choose availability over other requirements
  • When to use automated circuit breakers vs manual kill switches
  • Tips, hints and tricks for doing all of the above in Java
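
To give a flavour of the first two topics, here is a minimal sketch of an application-level timeout with a fallback in plain Java. It is an illustration under stated assumptions, not material from the talk: the class name TimeoutWithFallback, the method callWithFallback and the "recommendations" strings are all hypothetical, and a real service would also configure network-level timeouts on its client.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: names are illustrative, not from the talk.
class TimeoutWithFallback {

    // Runs the call on the given pool and waits at most timeoutMillis for it.
    // Returns the fallback if the call is too slow, fails, or is interrupted.
    static <T> T callWithFallback(ExecutorService pool, Callable<T> call,
                                  long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow call so its thread is freed
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the dependency failed outright
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            future.cancel(true);
            return fallback;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Simulated slow dependency: sleeps far longer than the timeout.
            String slow = callWithFallback(pool, () -> {
                Thread.sleep(2000);
                return "live recommendations";
            }, 100, "cached recommendations");
            System.out.println(slow);

            // Simulated healthy dependency: answers immediately.
            String fast = callWithFallback(pool,
                    () -> "live recommendations", 1000, "cached recommendations");
            System.out.println(fast);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The bounded thread pool matters as much as the timeout: it caps how many callers can be stuck waiting on one slow dependency, and its size and queue length are exactly the kind of thing worth monitoring.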

The topics covered are especially relevant if your application has a lot of dependencies that it communicates with over a network, i.e. microservices. They are even more applicable if your application is deployed to an environment that is prone to failure, e.g. a "cloud".

With supporting PowerPoint slides, I'll cover the theory and motivation behind moving to a more distributed architecture, then go through the pitfalls and the strategies for improving fault tolerance, backed up with real examples from Sky.

Who should attend:

Developers, Testers, Architects. Junior developers should be able to follow it as well.


Christopher Batey

Christopher Batey is a freelance Software Engineer/Architect/Trainer. His speciality is large-scale operational systems, and he has worked on trading systems and online television services, as well as building off-the-shelf software at IBM. Likes: Scala, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership.