The complexity in complex distributed systems isn’t in the code; it’s in the interactions between the services or functions. Many failures are hard to predict, and some may even be hard to detect.
When your system is made up of multiple microservices or a bunch of lambdas and some queues, how do you know whether it’s working the way you think it should?
Quality in these systems isn’t so much about testing up front: if you’re releasing 20 times a day, you can’t afford to run full regression tests every time. You need a risk-based approach that focuses your testing effort on the things where it really matters. More importantly, you need to be able to find out quickly when things are going wrong, and fix them quickly.
Your production system is the only place where the full complexity comes into play, so you should be doing a lot of your quality work there. Make sure you can find out about problems as early as possible, and do as much ‘testing’ there as you can.
During Sarah's keynote you will learn about:
- the importance of observability: building in log aggregation, metrics and tracing so you can tell what’s going on
- business-focussed monitoring, including synthetic monitoring
- why documentation is important and how to encourage people to keep it up to date
- how chaos experiments help
You should go away knowing more about what it takes to make your complex distributed systems easier to build and to run with high quality and stability.