Knowing what's happening in your system is key to effective monitoring, troubleshooting, and crisis resolution. Unfortunately, when your ecosystem grows to dozens or hundreds of microservices, and every user action involves 10 microservices before it completes, that visibility becomes incredibly difficult to achieve. At Jet, the team wants to know the current state of every distributed process, a few hundred million of them per day. To gain this visibility, they combined two things: a common communication protocol that provides an ID correlating all the messages in a single process, and telemetry collection for every act of communication between microservices. Pulling this data together produces a stream from which the current state of their 100 million daily processes can be viewed with ease.
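The mechanism above, where every message carries a shared process ID and every hop emits telemetry, can be illustrated by folding an event stream into a per-process state keyed by that ID. This is a minimal sketch only, assuming a hypothetical event shape (`TelemetryEvent` with `correlation_id`, `service`, `kind`, `timestamp`) and a hypothetical `"complete"` marker; it is not Jet's telemetry schema or the XRay implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class TelemetryEvent:
    # Hypothetical event shape (assumption), not Jet's actual schema.
    correlation_id: str   # shared by every message in one distributed process
    service: str          # microservice that emitted the event
    kind: str             # assumed vocabulary: "start", "hop", "complete"
    timestamp: float      # seconds since epoch


@dataclass
class ProcessState:
    started_at: float
    last_seen: float
    services: Set[str] = field(default_factory=set)
    completed: bool = False


def fold_events(events: Iterable[TelemetryEvent]) -> Dict[str, ProcessState]:
    """Reduce a telemetry stream into the current state of each process."""
    states: Dict[str, ProcessState] = {}
    for ev in events:
        state = states.get(ev.correlation_id)
        if state is None:
            state = ProcessState(started_at=ev.timestamp, last_seen=ev.timestamp)
            states[ev.correlation_id] = state
        # Events may arrive out of order, so keep the earliest and latest times.
        state.started_at = min(state.started_at, ev.timestamp)
        state.last_seen = max(state.last_seen, ev.timestamp)
        state.services.add(ev.service)
        if ev.kind == "complete":
            state.completed = True
    return states
```

Because the state is a pure fold over events, the same stream can be replayed to rebuild any view, which is one reason a single correlated stream is a convenient foundation for many different views.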
This stream of data allows the Jet team to build metaprograms that operate on the state of the distributed system, for example: monitoring end-to-end SLAs, checking the status of any single process, powering their Ops platform, and running automated integration tests across an entire distributed system.
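As one illustration of such a metaprogram, an end-to-end SLA monitor can scan the per-process state for processes that have not completed within a deadline. This is a hypothetical sketch: `ProcessState`, `sla_breaches`, and the single fixed `sla_seconds` threshold are assumptions for illustration, not part of Jet's platform.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProcessState:
    # Hypothetical per-process view (assumption), e.g. built from telemetry.
    started_at: float   # seconds since epoch of the earliest event seen
    completed: bool


def sla_breaches(
    states: Dict[str, ProcessState], now: float, sla_seconds: float
) -> List[str]:
    """Return correlation IDs of unfinished processes older than the SLA."""
    return sorted(
        cid
        for cid, st in states.items()
        if not st.completed and now - st.started_at > sla_seconds
    )
```

A real monitor would likely use different deadlines per process type; a single threshold keeps the sketch small.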
This talk will share what the Jet team has done to build this real-time, holistic view of their 700+ microservice architecture, so that they can monitor every single process for completion, validate that every process behaves as expected, and empower their operations team to investigate and triage long-running processes (e.g. catalog management and clean-up). The talk will cover the DrOrpheus communication protocol they use to create their distributed process context, the telemetry data collection architecture, and the XRay real-time telemetry processing platform, which converts billions of telemetry events per day into many different, but accurate, views of their distributed system's state.
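The abstract does not describe DrOrpheus's wire format, but the general idea of a distributed process context can be sketched as a message envelope that each service copies forward. In this minimal sketch the envelope fields (`correlation_id`, `message_id`, and a `causation_id` linking each message to the one that triggered it) and the helpers `start_process` and `follow_up` are hypothetical; only the shared correlation ID is implied by the talk description.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Envelope:
    # Hypothetical envelope (assumption); field names are not DrOrpheus's.
    correlation_id: str           # identifies the whole distributed process
    message_id: str               # unique per message
    causation_id: Optional[str]   # message that triggered this one, if any
    payload: Dict[str, Any] = field(default_factory=dict)


def start_process(payload: Dict[str, Any]) -> Envelope:
    """First message of a new process: mint a fresh correlation ID."""
    return Envelope(
        correlation_id=str(uuid.uuid4()),
        message_id=str(uuid.uuid4()),
        causation_id=None,
        payload=payload,
    )


def follow_up(parent: Envelope, payload: Dict[str, Any]) -> Envelope:
    """Downstream message: inherit the correlation ID and record the cause."""
    return Envelope(
        correlation_id=parent.correlation_id,
        message_id=str(uuid.uuid4()),
        causation_id=parent.message_id,
        payload=payload,
    )
```

Because every downstream message inherits `correlation_id`, telemetry emitted at each hop can later be grouped by that one field, while `causation_id` additionally preserves the order of hops within a process.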
Monitoring highly distributed systems
Erich Ess
Director of engineering at Jet.com. Building distributed systems and microservice platforms.