SkillsCast

Spark Streaming Machine Learning Pipelines - The good, the bad and the ugly - Intermediate

6th July 2017 in London at CodeNode

There are 42 other SkillsCasts available from Infiniteconf 2017 - the conference on Big Data and Fast Data

Apache Spark is now the de facto framework for building end-to-end Machine Learning pipelines. It lets you build pipelines over real-time streaming data and integrates every data processing step, from ingestion and cleaning through training, testing and tuning to deploying your models, in Scala, Python or R.

This presentation shares hands-on experience from implementing these pipelines on a real-world big data project.

At Millesime.ai, they're using Spark to predict fine wine market prices and trading strategies for the next 6 months and beyond. Through this use case, you'll explore the core concepts and principles behind machine learning pipelines, which algorithms Spark MLlib covers, and how to implement them using Transformers, Estimators and Model Selection through Hyperparameter Tuning.
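To make the Transformer, Estimator and model selection vocabulary concrete, here is a minimal plain-Python sketch of the contract those terms describe. It is not Spark's API and not the speaker's code: the class names (`Center`, `Ridge`, `Pipeline`, `select_model`), the toy data and the hold-out split are all assumptions made up for illustration. The idea it mirrors is that an Estimator's `fit` returns a Transformer, a pipeline chains such stages, and tuning fits one pipeline per hyperparameter value and keeps the one with the best validation score.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean


@dataclass
class CenterModel:
    """Transformer: subtracts a previously learned mean from column x."""
    center: float

    def transform(self, rows):
        return [{**r, "x": r["x"] - self.center} for r in rows]


class Center:
    """Estimator: fit() learns the mean of x and returns a Transformer."""

    def fit(self, rows):
        return CenterModel(mean(r["x"] for r in rows))


@dataclass
class RidgeModel:
    """Transformer: appends a prediction column."""
    slope: float
    intercept: float

    def transform(self, rows):
        return [{**r, "prediction": self.slope * r["x"] + self.intercept}
                for r in rows]


@dataclass
class Ridge:
    """Estimator with one hyperparameter: an L2 penalty on the slope."""
    reg: float

    def fit(self, rows):
        mx = mean(r["x"] for r in rows)
        my = mean(r["y"] for r in rows)
        sxy = sum((r["x"] - mx) * (r["y"] - my) for r in rows)
        sxx = sum((r["x"] - mx) ** 2 for r in rows)
        slope = sxy / (sxx + self.reg)
        return RidgeModel(slope, my - slope * mx)


@dataclass
class PipelineModel:
    """Fitted pipeline: applies each fitted stage in order."""
    stages: list

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


@dataclass
class Pipeline:
    """Estimator over a list of stages (Estimators or Transformers)."""
    stages: list

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted first; plain Transformers pass through.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


def rmse(rows):
    return sqrt(mean((r["prediction"] - r["y"]) ** 2 for r in rows))


def select_model(make_pipeline, grid, train, valid):
    """Fit one pipeline per candidate value; keep the lowest validation RMSE."""
    scored = []
    for reg in grid:
        model = make_pipeline(reg).fit(train)
        scored.append((rmse(model.transform(valid)), reg, model))
    best_rmse, best_reg, best_model = min(scored, key=lambda t: t[0])
    return best_reg, best_rmse, best_model


# Made-up data (y = 2x + 1), not taken from the talk.
rows = [{"x": float(i), "y": 2.0 * i + 1.0} for i in range(10)]
train, valid = rows[0::2], rows[1::2]
best_reg, best_rmse, best_model = select_model(
    lambda reg: Pipeline([Center(), Ridge(reg)]), [0.0, 1.0, 10.0], train, valid
)
print(best_reg, best_rmse)
```

In Spark itself the same roles are played by the `Pipeline`, `ParamGridBuilder` and `CrossValidator` or `TrainValidationSplit` classes operating on DataFrames; the sketch only shows the shape of the idea on a list of dictionaries.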

We'll close with an objective summary of the lessons learnt developing this project: the good, the bad and the ugly, along with some common pitfalls we've run into and how to avoid them.


Vincent Van Steenbergen

Vincent Van Steenbergen is a Senior Data Engineer who has worked on Big Data projects using Machine Learning (recommender systems, fraud detection) and, more recently, Deep Learning (voice analysis, natural language processing). He regularly speaks at international conferences and meetups about Big Data tech stacks such as Scala, Akka and Spark, as well as Machine Learning and Deep Learning.
