Streaming data architectures aren't just "faster" Big Data architectures. They must be reliable and scalable as never before, more like microservice architectures.
This talk has three goals:
- Justify the transition from batch-oriented big data to stream-oriented fast data.
- Explain the requirements that streaming architectures must meet and the tools and techniques used to meet them.
- Discuss the ways that fast data and microservice architectures are converging.
Big data started with an emphasis on batch-oriented architectures, where data is captured in large, scalable stores, then processed using batch jobs. To reduce the gap between data arrival and information extraction, these architectures are now evolving to be stream oriented, where data is processed as it arrives. Fast data is the new buzzword.
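To make the batch versus stream distinction concrete, here is a minimal Python sketch. It is not tied to any of the tools named in this abstract; the function names and the sample readings are assumptions chosen for illustration. The batch version needs the whole data set before it can answer, while the streaming version produces an updated answer as each record arrives.

```python
from typing import Iterable, Iterator


def batch_mean(stored: list[float]) -> float:
    """Batch style: all data is captured first, then processed in one job."""
    return sum(stored) / len(stored)


def streaming_mean(events: Iterable[float]) -> Iterator[float]:
    """Stream style: emit an updated result as each record arrives."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]

batch_result = batch_mean(readings)
stream_results = list(streaming_mean(iter(readings)))
```

In the streaming version the information is available after every record, which is the reduced gap between data arrival and information extraction described above. Both approaches agree once all the data has been seen.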
These architectures introduce new challenges for developers. Whereas a batch job might run for hours, a stream processing system typically runs for weeks or months, which raises the bar for making these systems reliable and scalable to handle any contingency.
The microservice world has faced this challenge for a while. Microservices are inherently message driven: they respond to requests for service and, in turn, send messages to other microservices. Hence, they are also stream oriented, in the sense that they must respond reliably to never-ending input. So, they offer guidance for how to build reliable streaming data systems. I'll discuss how these architectures are merging in other ways, too.
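One common way a message-driven service stays up in the face of never-ending input is to isolate failures per message, so that a single bad record does not stop the whole process. The sketch below illustrates that idea only; the names `consume`, `parse_amount`, and the "dead letter" list are assumptions for this example, not something prescribed by the talk or by any particular framework.

```python
from typing import Callable, Iterable


def consume(
    messages: Iterable[str],
    handler: Callable[[str], int],
) -> tuple[list[int], list[tuple[str, str]]]:
    """Process messages one at a time, setting failures aside instead of crashing."""
    processed: list[int] = []
    dead_letters: list[tuple[str, str]] = []
    for msg in messages:
        try:
            processed.append(handler(msg))
        except Exception as err:
            # Record the bad message and the error for later inspection,
            # then keep consuming the rest of the stream.
            dead_letters.append((msg, repr(err)))
    return processed, dead_letters


def parse_amount(msg: str) -> int:
    """Hypothetical handler: fails on malformed input."""
    return int(msg)


# A finite list stands in for an unbounded stream in this sketch.
ok, failed = consume(["1", "2", "oops", "4"], parse_amount)
```

Here the malformed message is set aside and the three valid messages are still processed. A real system would add more on top of this, such as retries, supervision, and back pressure.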
We'll also discuss how to pick streaming technologies based on four axes of concern:
- Low latency: What's my time budget for handling this data?
- High volume: How much data per unit time must I handle?
- Data processing: Do I need machine learning, SQL queries, conventional ETL processing, etc.?
- Integration with other tools: Which ones and how is data exchanged between them?
We'll consider specific examples of streaming tools and how they fit on these axes, including Spark, Flink, Akka Streams, and Kafka.
Stream All the Things!!
Dean Wampler
Product Engineering Director for Accelerated Discovery
IBM Research