Real-time Systems with Spark Streaming and Kafka

Topics covered at CLOUD-DATA-01-02



Building real-time production systems is easier than ever: new big data frameworks enable new use cases, and the cloud cuts both complexity and cost. But big data processing comes with its own problems. This intensive two-day course teaches you to build systems that can cope with the volume of data real-time processing demands, and that add real business value to your insights.

Join expert Jesse Anderson and gain the skills you need to choose the right cloud provider for your company and to create systems that meet the demands of real-time big data.

Explore the latest real-time frameworks, then learn how to build real-time data pipelines in the cloud: ingest big data, process it with Apache Spark Streaming, and analyze and visualise the results using Kafka REST.

Upon completing this course, you will have gained both the understanding and the skills you need to create large-scale real-time systems using Apache Kafka and Apache Spark Streaming.

Learn how to:

  • Create large-scale real-time data pipelines using both Apache Kafka and Apache Spark Streaming
  • Ingest and process data from its sources in real time and at scale, and build data products from it
  • Understand how real-time distributed systems are different from batch systems
  • Create Kafka producers and consumers
  • Process data in Kafka with Spark Streaming and place the results back into Kafka
  • Visualise data and show data in real-time on a web page
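To make the producer/consumer and "process data in Kafka and place the results back into Kafka" ideas above more concrete, here is a minimal conceptual sketch in Java. It is a toy, in-memory model, not the Kafka client API: `ToyTopic` and `ToyPipeline` are hypothetical names for illustration only. It assumes the basic Kafka model of a topic as a set of append-only partitions, records with the same key going to the same partition, and a consumer tracking its position with per-partition offsets. `String.hashCode()` stands in for the murmur2 hash used by Kafka's default partitioner. Real applications would use `KafkaProducer` and `KafkaConsumer` from the `kafka-clients` library against a running broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy, in-memory stand-in for a Kafka topic (hypothetical; NOT the Kafka client API).
// Each partition is an append-only list, and a record's index in that list is its offset.
class ToyTopic {
    record Message(String key, String value) {}

    private final List<List<Message>> partitions = new ArrayList<>();

    ToyTopic(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    int numPartitions() {
        return partitions.size();
    }

    // Producer side: append a record and return the partition it landed in.
    // Records with the same key always map to the same partition. Kafka's default
    // partitioner hashes keys with murmur2; String.hashCode() is a simplified stand-in.
    int send(String key, String value) {
        int partition = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(partition).add(new Message(key, value));
        return partition;
    }

    // Consumer side: return a copy of the records in one partition from fromOffset onward.
    List<Message> poll(int partition, int fromOffset) {
        List<Message> log = partitions.get(partition);
        int start = Math.min(fromOffset, log.size());
        return new ArrayList<>(log.subList(start, log.size()));
    }
}

// Hypothetical consume-transform-produce loop: read new records from `input`,
// transform each value (here: upper-casing), and write the result to `output` under the same key.
class ToyPipeline {
    static int[] run(ToyTopic input, ToyTopic output, int[] committedOffsets) {
        for (int p = 0; p < input.numPartitions(); p++) {
            for (ToyTopic.Message m : input.poll(p, committedOffsets[p])) {
                output.send(m.key(), m.value().toUpperCase(Locale.ROOT));
                committedOffsets[p]++; // advance ("commit") the offset only after the record is processed
            }
        }
        return committedOffsets;
    }
}
```

Because the consumer resumes from its committed offsets, running the pipeline a second time processes only records that arrived since the last run. Keying records means all events for one key stay in order within a single partition.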

    What the community says

    "This class provided a lot of insight into Real-time/Near Real-time Data Engineering and integration into the cloud. It introduced technologies and technical opinions of those technologies that I was unaware of or had little awareness of. The instructor Jesse Anderson was very knowledgeable and presented the material very clearly and was able to address all questions. Best class I've taken in years!!"


    "Very knowledgeable, the extensive experience of the instructor is palpable."


    About the Author

    Jesse Anderson

    Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. Jesse trains employees on big data—including cutting-edge technology like Apache Kafka, Apache Hadoop, and Apache Spark. He has taught thousands of students at companies ranging from startups to Fortune 100 companies the skills to become data engineers. He is widely regarded as an expert in the field and recognized for his novel teaching practices. Jesse is published by O’Reilly and Pragmatic Programmers and has been covered in such prestigious media outlets as the Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired. You can learn more about Jesse at Jesse-Anderson.com.


    Real-time Data Pipelines

    • Real-time Technologies
    • Real-time Pipelines
    • Pros and Cons of Real-time

    Using the Cloud

    • Cloud Providers
    • Real-time Technologies
    • Choosing a Provider

    Ingesting Data

    • Real-time Ingestion
    • Real-time ETL

    Kafka

    • About Kafka
    • Kafka Internals
    • Kafka API

    Processing Data

    • Real-time Data Processing
    • Real-time Processing Technologies

    Spark Streaming

    • Spark Streaming
    • Streaming API
    • Advanced Streaming

    Data Products

    • Analysis of Data
    • Dashboarding
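The Processing Data and Spark Streaming topics above rest on Spark Streaming's micro-batch model: the live stream is divided into small batches by a fixed batch interval, and each batch is then processed like a small batch job. The sketch below illustrates only that idea in plain Java; it is not the Spark API, and `MicroBatchWordCount` and `Event` are hypothetical names. In Spark Streaming itself, the batch interval is set when the streaming context is created, for example `new JavaStreamingContext(conf, Durations.seconds(1))`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration of the micro-batch model behind Spark Streaming (plain Java, NOT the Spark API):
// incoming events are cut into fixed-length batch intervals, and each batch is processed
// as a small, self-contained batch job (here: a word count).
class MicroBatchWordCount {
    // Hypothetical event type: an arrival timestamp in milliseconds plus a line of text.
    record Event(long timestampMs, String line) {}

    // Returns: batch start time (ms) -> (word -> count within that batch), both sorted.
    static Map<Long, Map<String, Integer>> countPerBatch(Iterable<Event> events, long batchIntervalMs) {
        if (batchIntervalMs <= 0) {
            throw new IllegalArgumentException("batchIntervalMs must be positive");
        }
        Map<Long, Map<String, Integer>> batches = new TreeMap<>();
        for (Event e : events) {
            // Assign each event to the interval [batchStart, batchStart + batchIntervalMs).
            long batchStart = Math.floorDiv(e.timestampMs(), batchIntervalMs) * batchIntervalMs;
            Map<String, Integer> counts = batches.computeIfAbsent(batchStart, k -> new TreeMap<>());
            for (String word : e.line().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return batches;
    }
}
```

Each batch is counted independently, so a word that appears in two different intervals shows up in two separate counts; keeping running totals across batches requires state carried from one batch to the next.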


    This course is intended for software engineers, QA engineers, and analysts who want to learn more about big data systems.


    To participate in this course, you will need intermediate-level Java knowledge.

    Bring your own hardware

    You are required to bring your own laptop for this course, so that you can develop with an environment you are familiar with.