The complexity in distributed systems isn’t in the code; it’s in the interactions between the services or functions. Many failures are hard to predict, and some are hard even to detect.
When your system is made up of multiple microservices or a bunch of lambdas and some queues, how do you know whether it’s working the way you think it should?
Quality in these systems isn’t so much about testing up front: if you’re releasing 20 times a day, you can’t pay the cost of running full regression tests every time. You need to have a risk-based approach and focus your testing effort on the things where it really matters. And more importantly, you need to be able to quickly find out when things are going wrong, and quickly fix them.
Your production system is the only place the full complexity comes into play, so you should be doing a lot of your quality work there. Make sure you can find out about problems as early as possible and do as much ‘testing’ here as you can.
During Sarah's keynote you will learn about:
- the importance of observability: building in log aggregation, metrics and tracing so you can tell what’s up
- business-focussed monitoring, including synthetic monitoring
- why documentation is important and how to encourage people to keep it up to date
- how chaos experiments help
You should go away knowing more about what it takes to make your complex distributed systems easier to build and to run with high quality and stability.
Keynote: Quality for 'Cloud Natives': What Changes When Your Systems Are Complex And Distributed?
I've been a developer for 15 years, leading delivery teams across consultancy, financial services and media. Over the last few years I have developed a deep interest in operability, observability and DevOps, and at the beginning of 2018 this led to my taking over responsibility for Operations and Reliability at the Financial Times.