In this highly interactive session, you will discover how to use Spark to rapidly mine a large real-world data set. We will conduct the analysis entirely live in an IPython Notebook to show you how easy it can be to get to grips with these technologies. In the first part of the session, we will use a sample of data from the Open Library dataset, and you will learn how to apply common Spark patterns to extract insights and aggregate data. In the second part, you will see how to use Spark on Amazon EMR to scale your data processing queries over a cluster of machines and interactively analyse a large data set (100GB) with a Zeppelin Notebook. Along the way, you will learn about common gotchas as well as useful performance and monitoring tips.
Interactively Analyse 100GB of Data using Spark, Amazon EMR and Zeppelin.
Raoul-Gabriel Urma is CEO and co-founder of Cambridge Spark, a leading learning community for data scientists and developers in the UK, as well as chairman and co-founder of Cambridge Coding Academy, a growing community of young coders and pre-university students. He is the author of 'Java 8 in Action: Lambdas, Streams, and functional-style programming'.