As Uber scales its business to new products in new cities, the requirements for high availability and scalability increase. As the engineering team scales, doubling every 6 months, the challenges of building a reliable system grow with it. At our current scale, even brief outages in the service are very costly, both in dollars to the company and in real-world impact on people’s lives.
To get better at handling failure and designing for it, we’ve had to make failures more common. Every new system that we build is subjected to regular failure testing, even databases. This requires moving away from some of the more comfortable technology choices that worked when we were smaller.
The shift from a smaller service with a few hardened components to a global operation with hundreds of services is as much cultural as it is technical. This talk will cover the Uber architecture and how it handles every failure we can think of. It’ll also cover some real outages and how they’ve influenced our new design.
Designing for Failure: Scaling Uber’s Backend by Breaking Everything
Matt Ranney
Principal Engineer, DoorDash