Why SMACK for Fast Data
The SMACK stack (Spark, Mesos, Akka, Cassandra, Kafka) is well positioned to be the ideal platform for building “Fast Data” applications. The term Fast Data emphasizes how Big Data architectures and applications are evolving to be stream oriented, so that information is extracted from incoming data as quickly as possible, while traditional data scenarios such as data warehousing, batch processing, and interactive exploration are still supported.
We’ll explore in depth how the SMACK components (and variations) support the requirements of Fast Data systems:
Spark (and similar streaming engines): Used to implement ETL, queries, aggregations, machine learning applications, and more.
Mesos: The flexible cluster infrastructure that addresses the limitations of Hadoop YARN. It can host and manage the cluster resources for all your applications.
Akka: Microservice development with high scalability, durability, and low-latency processing. Perhaps it should really be Lightbend’s entire Reactive Platform, in which case we have SMRCK (“smirk”)?
Cassandra: Scalable, resilient, distributed database for persistent, durable storage. Most environments will also use a distributed file system like HDFS or S3.
Kafka: The backplane and integration tool for all stream flows. Provides highly scalable and durable short-term storage, organized into topics using message queue semantics.
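To make Kafka's "backplane" role a little more concrete, here is a toy Python sketch of the idea that a topic is an ordered, retained log that several consumers (say, a Spark job and an Akka service) read independently, each at its own offset. It is not Kafka's API: `ToyTopic`, `produce`, `poll`, and `retention` are hypothetical names, and the sketch assumes a single partition with no replication or persistence.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for one Kafka topic partition.

    NOT the Kafka client API: it only mimics an append-only log with bounded
    retention, where each consumer group tracks its own read offset.
    """

    def __init__(self, name, retention=1000):
        self.name = name
        self.retention = retention  # max records kept ("short-term storage")
        self.log = []               # retained records, oldest first
        self.base_offset = 0        # offset of self.log[0]
        self.group_offsets = defaultdict(int)  # group -> next offset to read

    def produce(self, record):
        """Append a record, dropping the oldest ones beyond retention."""
        self.log.append(record)
        overflow = len(self.log) - self.retention
        if overflow > 0:
            del self.log[:overflow]
            self.base_offset += overflow

    def poll(self, group, max_records=10):
        """Return the next batch for `group` and advance its offset."""
        # A group that fell behind resumes at the oldest retained record.
        start = max(self.group_offsets[group], self.base_offset)
        i = start - self.base_offset
        batch = self.log[i:i + max_records]
        self.group_offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    topic = ToyTopic("clicks", retention=3)
    for event in ["a", "b", "c", "d"]:
        topic.produce(event)
    print(topic.poll("spark-etl"))                    # ['b', 'c', 'd'] ('a' aged out)
    print(topic.poll("akka-service", max_records=2))  # ['b', 'c']
    print(topic.poll("spark-etl"))                    # [] (already caught up)
```

Because each group keeps its own offset, a slow consumer does not hold back a fast one, and new consumers can be attached to an existing stream without touching producers; the trade-off is that records older than the retention limit are gone, which is why durable long-term storage is left to Cassandra or a file system such as HDFS or S3.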
Fast Data
Dean Wampler
Product Engineering Director for Accelerated Discovery
IBM Research