Phoenix 1.5 is here and it comes powered up with out-of-the-box instrumentation and visualization thanks to Telemetry and Live Dashboard.
Phoenix now integrates Erlang's Telemetry library to aggregate and report on standard Phoenix, Ecto and Elixir VM Telemetry events, as well as any custom events you care to emit from your own application. The Live Dashboard library lets us visualize, in real time, the metrics, performance and behavior of our app as described by these events. Together, these two offerings make it easy for every Elixir developer to meet observability goals by writing and shipping fully instrumented code.
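As a rough illustration of the "custom events" idea, here is a minimal sketch of emitting an event with Erlang's `:telemetry` library and attaching a handler to it. It is not code from the talk. The module `MyApp.CheckoutInstrumentation`, the event name `[:my_app, :checkout, :stop]`, the handler id and the `checkout/1` function are all hypothetical, and the script assumes Elixir 1.12+ (for `Mix.install/1`) plus access to Hex to fetch `:telemetry`.

```elixir
# Sketch only: assumes Elixir 1.12+ (for Mix.install) and that :telemetry
# can be fetched from Hex. All names below are hypothetical.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule MyApp.CheckoutInstrumentation do
  # Hypothetical event name; Telemetry event names are lists of atoms.
  @event [:my_app, :checkout, :stop]

  # Attach a handler that forwards each event to `pid` (passed as handler config).
  # The handler id "forward-checkout-stop" is hypothetical.
  def attach(pid) do
    :telemetry.attach("forward-checkout-stop", @event, &__MODULE__.handle_event/4, pid)
  end

  # Handlers run synchronously in the process that calls :telemetry.execute/3.
  def handle_event(@event, %{duration: duration}, %{order_id: order_id}, pid) do
    send(pid, {:checkout_stop, order_id, duration})
  end

  # Hypothetical business operation that times itself and emits an event.
  def checkout(order_id) do
    start = System.monotonic_time()
    # ... the real checkout work would happen here ...
    duration = System.monotonic_time() - start
    :telemetry.execute(@event, %{duration: duration}, %{order_id: order_id})
  end
end
```

To chart an event like this in Live Dashboard, one would typically add a metric definition such as `summary("my_app.checkout.stop.duration", unit: {:native, :millisecond})` to the `metrics/0` list in the Telemetry module that Phoenix 1.5 generates. The dotted metric name maps onto the event `[:my_app, :checkout, :stop]` and its `:duration` measurement.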
In this talk, we'll take a tour through Live Dashboard's usage and features and we'll dive under the hood to understand how it leverages Erlang and Elixir's Telemetry libraries to capture and visualize events as metrics. There will be an obligatory picture of Frankenstein.
Q&A
Question: Can you aggregate metrics from multiple Phoenix instances behind a load balancer in the dashboard?
Answer: LiveDashboard does recognize the node from which metrics/events are being collected. I'm not sure whether it supports aggregating events across nodes out of the box, but since LiveDashboard treats nodes as first-class citizens, I have a feeling you'll be able to find out more in the docs.
Question: If you're in production, what about the privacy aspects, given that some of the telemetry might expose sensitive user data? Could there be a general mechanism for "tainting" some data as sensitive, so that it's handled differently, perhaps with suitable anonymization? It would at least (in some circumstances) take the burden of worrying about user privacy off the developer.
Answer: First off, I would definitely recommend putting your LiveDashboard routes behind authentication. I don't think LiveDashboard supports anonymization/obfuscation at this time. There is some discussion of auth in the docs here: https://hexdocs.pm/phoenix_live_dashboard/Phoenix.LiveDashboard.html#module-extra-add-dashboard-access-on-all-environments-including-production
Question: It's more than just authentication, though. Just a thought: sometimes for monitoring you don't need to know all the (possibly sensitive) details. Maybe that's another project: a well-conceived framework to filter and aggregate what's shown to developers appropriately.
Answer: Yeah, I think this is a bigger question on the topic of observability. At GitHub, for example, we have a hand-rolled whitelisting system that ensures we don't send overly sensitive data to exception reporters like Sentry.
Question: My first impression is that it's focused on real-time telemetry, i.e. what's going on right now. But is it possible to go back in time with LiveDashboard to see data from previous days/weeks?
Answer: In fact, there is a VERY new addition to LiveDashboard that deals with metrics history! https://hexdocs.pm/phoenix_live_dashboard/metrics_history.html#content
Question: Is there a rule of thumb for the overhead of running monitoring? Netflix, for example, allocates a Prometheus instance for every 10 VMs.
Answer: The question of scaling your monitoring tooling appropriately for your infrastructure/load is definitely a tough one! Who is monitoring the monitor?? Sadly, I don't have a rule-of-thumb answer, since it depends so much on the load your applications are under.
For something like StatsD, a good approach is to use nginx as a UDP proxy so that you can load-balance your StatsD traffic.
Question: If you use it in production, what’s been the single biggest challenge of running Elixir/Erlang systems at scale?
Answer: I wish I could say we were using Elixir in production! I was shipping Elixir to production at my previous company, The Flatiron School, though, and I'd say the biggest challenge of using Elixir at scale was actually a good problem to have. Because of Elixir's concurrency, fault tolerance and speed, whenever our Elixir app was responsible for communicating with third parties or with other non-Elixir apps in our ecosystem, we often hammered them: we exceeded rate limits on third-party APIs or put our other apps under unexpected load.
Also, a few years ago, when we first started putting Elixir into production, I'd say the hardest part was that the release lifecycle for Elixir was not as mature as it is now. Working with edeliver or Distillery could be challenging, but now that releases are built into Elixir, it's much easier.
Also (one more thing!): coming from Ruby and other non-compiled languages and frameworks, I was sometimes frustrated by issues that occurred in production builds but that I wasn't seeing locally/in dev.
Question: Do you get telemetry events for how long it actually takes for the patch to be applied to the user's DOM? That would be a really great stat to keep an eye on, to see whether anyone is suffering from a sluggish-feeling UI, for example if you had massive diffs going over the wire.
Answer: I'm looking through the docs here: https://hexdocs.pm/phoenix_live_view/telemetry.html. I feel like that would have to involve the JS side of things, though.
Question: Does anyone have a recommendation for a metrics reporting system that can show frontend, backend and even higher-level business events?
The closest I've used was Azure AppInsights, but it wasn't exactly fun to use.
Answer: This doesn't quite fit the bill, but we use Lightstep for distributed tracing at GitHub: https://lightstep.com/
It's Alive!!! Instrumenting Phoenix 1.5 with Telemetry and Live Dashboard
Sophie DeBenedetto
Engineer, GitHub