Apache Spark now comes with all the tools, libraries and connectors required to build a complete end-to-end data platform. There is plenty of documentation, along with blogs, examples, books and websites covering Spark core and its component libraries, but few resources explain how to integrate these components into a complete end-to-end solution.
Because it is tolerant of both hardware failures and human errors, the Lambda Architecture (LA) enables developers to build large-scale, distributed data processing systems that are flexible and extensible. In this talk, Think Reactive co-founder Deenar Toraskar will walk you through building a stream analytics engine using Spark and the Lambda Architecture.
The talk will cover building all three layers using Spark, each of which comes with its own set of requirements: i) the BATCH layer, which manages the master dataset (an immutable, append-only set of raw data) and pre-computes batch views; ii) the SERVING layer, which indexes batch views so that they can be queried ad hoc with low latency; and iii) the SPEED layer, which deals only with recent data and compensates for the high latency of the batch layer. The talk will be accompanied by a real-world example with code and a live demo.
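To make the division of labour between the three layers concrete, here is a minimal conceptual sketch in plain Python. It is not Spark code and not taken from the talk. It assumes a hypothetical page-view event schema `(timestamp, page)` and uses hypothetical names (`MasterDataset`, `batch_view`, `SpeedLayer`, `serve`). In a real Spark deployment the batch view would typically be an aggregation over the full master dataset and the speed layer a streaming job, but those details are deliberately left out.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event schema for illustration: (timestamp, page)
Event = Tuple[int, str]


class MasterDataset:
    """Batch-layer storage: an immutable, append-only log of raw events."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events(self) -> Tuple[Event, ...]:
        # Return an immutable snapshot so callers cannot rewrite history.
        return tuple(self._events)


def batch_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute page-view counts from scratch for ts <= cutoff."""
    return Counter(page for ts, page in events if ts <= cutoff)


class SpeedLayer:
    """Speed layer: incrementally counts only events newer than the batch cutoff."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        ts, page = event
        if ts > self.cutoff:  # older events are already in the batch view
            self.view[page] += 1


def serve(page: str, batch: Counter, realtime: Counter) -> int:
    """Serving-layer query: merge the precomputed batch view with the real-time view."""
    return batch[page] + realtime[page]


if __name__ == "__main__":
    master = MasterDataset()
    for e in [(1, "home"), (2, "about"), (3, "home")]:
        master.append(e)

    # A batch run covers everything up to timestamp 3.
    cutoff = 3
    batch = batch_view(master.events(), cutoff)
    speed = SpeedLayer(cutoff)

    # New events are written to the master dataset and fed to the speed layer.
    for e in [(4, "home"), (5, "contact")]:
        master.append(e)
        speed.on_event(e)

    print(serve("home", batch, speed.view))     # 3
    print(serve("contact", batch, speed.view))  # 1

    # The next batch run absorbs the recent events; the speed layer is reset.
    cutoff = 5
    batch = batch_view(master.events(), cutoff)
    speed = SpeedLayer(cutoff)
    print(serve("home", batch, speed.view))     # still 3
```

The key design choice in this sketch is the timestamp cutoff: the batch view covers everything up to the last batch run and the speed layer covers only what arrived since, so a query can simply add the two without double counting. Once a later batch run absorbs the recent events, the speed layer's state can be discarded.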
This tutorial-style talk will be valuable for developers, architects and project leads who already know Spark and are now looking for more insight into how it can be used to implement real-world applications.
Maximize your Spark: Applying the Lambda Architecture with Spark/Spark Streaming
Deenar Toraskar is the co-founder of Think Reactive, which provides Spark-based, responsive, resilient, elastic and ready-to-go data analytics solutions. The solution is built end to end using state-of-the-art technology, from ETL and data pipelines (both batch and streaming) and persistence adapters to analysis and algorithms. All components are packaged using Docker and can run on bare metal or any cloud.