Knowing what's happening in your system is key to effective monitoring, troubleshooting, and crisis resolution. Unfortunately, once your ecosystem scales to dozens or hundreds of microservices and every user action involves 10 of them to complete, that visibility and insight become incredibly difficult to maintain. At Jet, the team wants to know the current state of every distributed process, and there are a few hundred million of them per day. To gain this visibility, the team coupled two things: a common communication protocol, which provides an ID to correlate all the messages in a single process, and telemetry collection for every act of communication between microservices. Pulling this data together yields a stream from which the current state of their 100 million daily processes can be viewed with ease.
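As a rough illustration of the idea, the sketch below tags every telemetry event with a shared correlation ID and folds a stream of such events into a per-process "current state". It is a minimal sketch, not Jet's implementation: the `TelemetryEvent` fields, the `new_correlation_id` helper, and the `current_state` function are all hypothetical names chosen for this example.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TelemetryEvent:
    """One act of communication between microservices (hypothetical shape)."""
    correlation_id: str  # shared by every message in one distributed process
    service: str         # service that emitted the event
    action: str          # e.g. "received" or "sent"
    timestamp: float     # seconds since some epoch


def new_correlation_id() -> str:
    """Create the ID that the first message of a process would carry."""
    return uuid.uuid4().hex


def group_by_process(events):
    """Group events by correlation ID, each group ordered by timestamp."""
    processes = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        processes[event.correlation_id].append(event)
    return dict(processes)


def current_state(events):
    """Map each process to the (service, action) of its latest event."""
    return {
        cid: (evs[-1].service, evs[-1].action)
        for cid, evs in group_by_process(events).items()
    }
```

Given events from two processes, `current_state` returns one entry per correlation ID, showing the last service each process reached. A real pipeline would do this incrementally over a stream rather than over an in-memory list.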
This stream of data allows the Jet team to build metaprograms that operate on the state of the distributed system. Examples include monitoring end-to-end SLAs, checking the status of any single process, powering the Ops platform, and running automated integration tests of the entire distributed system.
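One such metaprogram, end-to-end SLA monitoring, could look roughly like the following. This is only a sketch under stated assumptions: events are plain `(correlation_id, service, timestamp)` tuples, a process counts as complete when a designated terminal service reports in, and the function name `find_sla_breaches` and its parameters are hypothetical.

```python
def find_sla_breaches(events, terminal_service, sla_seconds, now):
    """Return the correlation IDs of processes that exceeded the SLA.

    events: iterable of (correlation_id, service, timestamp) tuples.
    A process starts at its earliest event and completes at the earliest
    event emitted by terminal_service. Unfinished processes are measured
    against `now`.
    """
    first_seen = {}
    completed = {}
    for cid, service, ts in events:
        first_seen[cid] = min(ts, first_seen.get(cid, ts))
        if service == terminal_service:
            completed[cid] = min(ts, completed.get(cid, ts))

    breaches = []
    for cid, start in first_seen.items():
        end = completed.get(cid, now)
        if end - start > sla_seconds:
            breaches.append(cid)
    return sorted(breaches)
```

The same per-process grouping could also answer "what is the status of process X" or drive an integration test that asserts every started process eventually reaches the terminal service.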
This talk will share what the Jet team has done to build this real-time, holistic view of their 700+ microservice architecture, so that they can monitor every single process for completion, validate that every single process is behaving as expected, and empower their operations team to investigate and triage long-running processes (e.g. catalog management and clean-up). The talk will cover the DrOrpheus communication protocol they use to create their distributed process context, the telemetry data collection architecture, and the XRay real-time telemetry processing platform, which enables them to convert billions of telemetry events per day into many different, but accurate, views of their distributed system's state.