In this intensive and practical two-day Apache Spark course, you will be given an in-depth insight into the Spark platform, one of the most widely used big data analytics and distributed processing frameworks. Showing you how to implement data analytics using Apache Spark for Reactive applications, this Spark training course will set you up to take advantage of Spark and its native Scala API to write data-centric, high-performance applications.
Join Andy Petrella, certified Scala/Spark speaker and author of Learning Play! Framework 2, as he helps you implement data processing pipelines and analytics using Apache Spark. Explore the Spark Core, SQL/DataFrame, Streaming, and MLlib (machine learning) APIs, as well as best practices for improving application performance.
- Use Apache Spark to apply your understanding of Scala to big data -
Who you will be learning with
This course is aimed at experienced developers who have a solid understanding of Scala and are looking to write data-centric applications using Apache Spark.
How to apply these skills
This Apache Spark training course will leave you with a comprehensive insight into the use of Spark and its utility for developers. Create opportunities for faster application development through large-scale data processing, allowing for future growth in your organization.
Book early to receive a discount on the course price and in doing so you will not only commit to growing your own skill set, but help us grow our community of over 140,000 passionate techies.
ScalaX Fringe Package
Interested in taking Spark for Scala on 11th-12th December? Make it a week of learning by joining us for Scala eXchange 2018, and get a special discount on your conference ticket!
Call or email our team for more information.
Learn how to:
- Use the Spark Scala APIs to implement various data analytics algorithms for offline (batch-mode) and event-streaming applications
- Understand Spark internals
- Consider Spark performance
- Test and deploy Spark applications
- Integrate Spark with Mesos, Hadoop, and Akka
Introduction - Why Spark
- How Spark improves on Hadoop MapReduce
- The core abstractions in Spark
- What happens during a Spark job?
- The Spark ecosystem
- Deployment options
- References for more information
Spark's Core API
- Resilient Distributed Datasets (RDD) and how they implement your job
- Using the Spark Shell (interpreter) vs submitting Spark batch jobs
- Using the Spark web console
- Reading and writing data files
- Working with structured and unstructured data
- Building data transformation pipelines
- Spark under the hood: caching, checkpointing, partitioning, shuffling, etc.
- Mastering the RDD API
- Broadcast variables, accumulators
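As a taste of the Core API material above, here is a minimal word-count sketch (file paths and the object name are illustrative, and it assumes a local Spark installation):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally, using as many worker threads as there are cores
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Transformations build up the RDD lineage lazily;
    // nothing executes until an action is called
    val counts = sc.textFile("input.txt")       // read a text file as an RDD of lines
      .flatMap(_.split("\\s+"))                 // split each line into words
      .map(word => (word, 1))                   // pair each word with a count of 1
      .reduceByKey(_ + _)                       // sum counts per word (causes a shuffle)
      .cache()                                  // keep the result in memory for reuse

    counts.saveAsTextFile("counts")             // action: triggers the job
    sc.stop()
  }
}
```

Concepts from the list above, such as lazy lineage, shuffling, and caching, all appear in this short pipeline and are explored in depth during the course.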
Spark SQL and DataFrames
- Working with the DataFrame API for structured data
- Working with SQL
- Performance optimizations
- Support for JSON and Parquet formats
- Integration with Hadoop Hive
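The DataFrame and SQL APIs covered above can express the same query two ways; the following sketch (with an illustrative input file and column names) shows both side by side:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SalesReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalesReport")
      .master("local[*]")
      .getOrCreate()

    // Spark infers the schema of structured formats such as JSON and Parquet
    val sales = spark.read.json("sales.json")

    // The query expressed through the DataFrame API...
    sales.groupBy("region").agg(sum("amount").as("total")).show()

    // ...and the same query as plain SQL against a temporary view
    sales.createOrReplaceTempView("sales")
    spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()

    spark.stop()
  }
}
```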
Processing events with Spark Streaming
- Working with time slices ("mini-batches") of events
- Working with moving windows of mini-batches
- Reuse of code in batch-mode and streaming: the Lambda Architecture
- Working with different streaming sources: sockets, file systems, Kafka, etc.
- Resiliency and fault tolerance considerations
- Stateful transformations (e.g., running statistics)
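To illustrate mini-batches and moving windows, here is a sketch of a windowed word count over a socket source (host, port, durations, and paths are illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketWindowCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SocketWindowCount").setMaster("local[2]")
    // Each mini-batch covers a 5-second slice of the incoming events
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("checkpoint") // required for windowed/stateful operations

    val lines = ssc.socketTextStream("localhost", 9999)

    // Count words over a sliding 30-second window, recomputed every 10 seconds;
    // the inverse function subtracts batches leaving the window incrementally
    val counts = lines.flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKeyAndWindow(_ + _, _ - _, Seconds(30), Seconds(10))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```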
Other Spark-based Libraries
- MLlib for machine learning
- Discussion of GraphX for graph algorithms, Tachyon for distributed caching, and BlinkDB for approximate queries
Deploying to clusters
- Spark’s clustering abstractions: cluster vs. client deployments, coarse-grained and fine-grained process management
- Standalone mode
- Hadoop YARN
- Cassandra rings
Using Spark with the Lightbend Reactive Platform
- Akka Streams and Spark Streaming
If you are an experienced developer and would like to learn how to write data-centric applications using Spark, this Apache Spark course is for you!
To benefit from this Apache Spark course, you should have prior experience using Scala on a project, or have attended our Lightbend Scala Language - Professional course. Some prior experience with SQL, machine learning and other Big Data tools will be helpful, but is not essential.
Bring your own hardware
Please bring your own laptop to this course, as it will help you put your newly learned skills into practice after the course using the same environment. Your laptop should have:
- JDK 7 or above
- Lightbend Activator
- Scala IDE, IntelliJ IDEA with the Scala plugin, or a programmer's text editor of your choice
Setup instructions for your laptop will arrive a week or so before the training.