This stream of data allows the Jet team to build metaprograms that operate on the state of the distributed system, for example: monitoring end-to-end SLAs, checking the status of any single process, powering their operations platform, and running automated integration tests across the entire distributed system.
This talk will share what the Jet team has done to build this real-time, holistic view of their 700+ microservice architecture, so that they can monitor every single process for completion, validate that every single process is behaving as expected, and empower their operations team to investigate and triage long-running processes (e.g. catalog management and cleanup). The talk will cover the DrOrpheus communication protocol they use to create their distributed process context, the telemetry data collection architecture, and the XRay real-time telemetry processing platform, which enables them to convert billions of telemetry events per day into many different, but accurate, views of their distributed system's state.
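To make the idea of monitoring every process for completion concrete, here is a minimal sketch in Python. It is not the DrOrpheus protocol or the XRay platform: the `TelemetryEvent` fields, the "started"/"completed" event kinds, and the `overdue_processes` helper are all assumptions made for illustration. It only shows how events that share a process ID could be folded into a view of which processes have exceeded an SLA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TelemetryEvent:
    # All fields are hypothetical; the real event format is not described in the abstract.
    process_id: str   # correlation ID shared by every service that touches one process
    service: str      # name of the emitting microservice
    kind: str         # assumed event kinds: "started" or "completed"
    timestamp: float  # seconds since some epoch


def overdue_processes(events, now, sla_seconds):
    """Return sorted IDs of processes that started but have not completed within the SLA."""
    started = {}       # process_id -> earliest start timestamp seen
    completed = set()  # process_ids with at least one completion event
    for event in events:
        if event.kind == "started":
            earliest = started.get(event.process_id, event.timestamp)
            started[event.process_id] = min(earliest, event.timestamp)
        elif event.kind == "completed":
            completed.add(event.process_id)
    return sorted(
        pid
        for pid, start in started.items()
        if pid not in completed and now - start > sla_seconds
    )


events = [
    TelemetryEvent("order-1", "checkout", "started", 0.0),
    TelemetryEvent("order-1", "fulfillment", "completed", 40.0),
    TelemetryEvent("order-2", "checkout", "started", 10.0),
]
overdue = overdue_processes(events, now=100.0, sla_seconds=60.0)
print(overdue)
```

In this sketch the process context is reduced to a single shared ID; a production system handling billions of events per day would process the stream incrementally rather than scanning a list in memory.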
YOU MAY ALSO LIKE:
Monitoring highly distributed systems
Erich Ess
Director of Engineering at Jet.com. Building distributed systems and microservice platforms.