Apache Spark is now the de facto framework for building end-to-end Machine Learning pipelines. It supports pipelines over real-time streaming data and integrates every data processing step, from ingestion and cleaning through training, testing and tuning to deploying your models, in Scala, Python or R.
This presentation is an experience report on implementing these pipelines in a real-world big data project.
At Millesime.ai, they're using Spark to predict fine wine market prices and trading strategies for the next 6 months and beyond. Through this use case, you'll explore the core concepts and principles behind machine learning pipelines, which algorithms Spark MLlib covers and how to implement them using Transformers, Estimators and Model Selection through Hyperparameter Tuning.
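The Transformer and Estimator contract and the idea of model selection through hyperparameter tuning can be sketched without Spark. What follows is a minimal, library-free Python illustration under stated assumptions: the class names (`Center`, `Ridge`, `Pipeline`), the `tune` helper and the toy data are all hypothetical, and none of it is Spark MLlib's actual API or the speaker's code. It only mirrors two ideas: an Estimator's `fit` returns a Transformer, and tuning means fitting the whole pipeline once per candidate parameter and keeping the one with the best validation score.

```python
from math import sqrt
from statistics import mean


class CenterModel:
    """Transformer: subtracts a previously learned mean from one column."""
    def __init__(self, col, mu):
        self.col, self.mu = col, mu

    def transform(self, rows):
        return [{**r, self.col: r[self.col] - self.mu} for r in rows]


class Center:
    """Estimator: fit() learns the column mean and returns a Transformer."""
    def __init__(self, col):
        self.col = col

    def fit(self, rows):
        return CenterModel(self.col, mean(r[self.col] for r in rows))


class RidgeModel:
    """Transformer: appends a 'prediction' column computed from column 'x'."""
    def __init__(self, slope, intercept):
        self.slope, self.intercept = slope, intercept

    def transform(self, rows):
        return [{**r, "prediction": self.intercept + self.slope * r["x"]}
                for r in rows]


class Ridge:
    """Estimator: one-feature ridge regression of 'y' on 'x' (closed form)."""
    def __init__(self, reg_param):
        self.reg_param = reg_param  # the hyperparameter being tuned

    def fit(self, rows):
        x_mean = mean(r["x"] for r in rows)
        y_mean = mean(r["y"] for r in rows)
        sxy = sum((r["x"] - x_mean) * (r["y"] - y_mean) for r in rows)
        sxx = sum((r["x"] - x_mean) ** 2 for r in rows)
        slope = sxy / (sxx + self.reg_param)
        return RidgeModel(slope, y_mean - slope * x_mean)


class PipelineModel:
    """Transformer: applies each fitted stage in order."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Estimator: fits stages in order, feeding each the previous output."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted first; plain Transformers pass through.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


def rmse(rows):
    """Root mean squared error between 'prediction' and 'y'."""
    return sqrt(mean((r["prediction"] - r["y"]) ** 2 for r in rows))


def tune(train, valid, grid):
    """Grid search: one pipeline fit per candidate, lowest validation RMSE wins."""
    scored = []
    for reg in grid:
        model = Pipeline([Center("x"), Ridge(reg)]).fit(train)
        scored.append((rmse(model.transform(valid)), reg, model))
    _, best_reg, best_model = min(scored, key=lambda t: t[0])
    return best_reg, best_model


# Made-up toy data following y = 2x + 1 exactly (not real market data).
train = [{"x": x, "y": 2 * x + 1} for x in range(5)]
valid = [{"x": x, "y": 2 * x + 1} for x in (5, 6)]

best_reg, best_model = tune(train, valid, [0.0, 1.0, 10.0])
predictions = [r["prediction"] for r in best_model.transform(valid)]
```

Because the toy data is perfectly linear, any regularisation only hurts, so the search selects the unregularised candidate. On noisy data a non-zero penalty can win, which is the reason to search a grid rather than fix the parameter by hand.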
We'll close with an objective summary of the lessons learnt while developing this project: the good, the bad and the ugly, including some common pitfalls we ran into and how to avoid them.
Spark Streaming Machine Learning Pipelines - The good, the bad and the ugly - Intermediate
Vincent Van Steenbergen is a Senior Data Engineer who’s been working on Big Data projects using Machine Learning (recommender systems, fraud detection) and more recently Deep Learning (voice analysis, natural language processing). He regularly speaks at international conferences and meetups about Big Data tech stacks such as Scala, Akka and Spark, as well as Machine Learning and Deep Learning.