The topics covered will be:
- What to do when one of the dependencies fails to respond in time
- When to use network-level timeouts vs application-level timeouts
- What to monitor and how to monitor it, e.g. connection pools, thread pools, queue sizes, latency
- How to test for when the network is slow or saturated
- How to test for when traffic is lost in transit
- How to train your stakeholders to expect failure and agree to fallbacks, so that they can choose availability over other requirements
- When to use automated circuit breakers vs manual kill switches
- Tips, hints and tricks for doing all of the above in Java
The topics covered are especially relevant if your application has a lot of dependencies that it communicates with over a network, i.e. microservices. They are even more relevant if your application is deployed to an environment that is prone to failure, e.g. a "cloud".
With supporting PowerPoint slides, I'll cover the theory and motivation behind moving to a more distributed architecture, then go through the pitfalls and the strategies for improving fault tolerance, backed up with real examples from Sky.
Who should attend:
Developers, testers and architects. Junior developers should be able to follow it as well.
Christopher is a Senior Engineer at Lightbend. He is currently on the core Akka team, responsible for developing Akka (https://akka.io/), Akka HTTP, Akka Streams, Reactive Kafka and Alpakka (https://github.com/akka/alpakka). He has previously built trading systems and online television platforms, and has worked extensively with Apache Cassandra. Likes: Scala, Java, the JVM, Akka, distributed databases, XP, TDD, pairing. Dislikes: untested software and code ownership.