A couple of terabytes of data is not impressive by today's standards. A hard drive of that capacity costs about a hundred dollars. But things quickly get complicated when you need to draw insights from a corpus of unstructured game scenarios that grows at a rate of a terabyte a year.
You will hear lessons learned by a data scientist wearing the extra hat of data engineer on this fun side project. The talk will cover topics ranging from using the Apache Spark distributed computing framework and optimizing Delta tables to making sense of the resulting mega-dataset with graph theory and an interactive Streamlit application.