Hadoop and its whole ecosystem have settled down for good in our hearts and/or minds. It's quite old and has proven to be quite reliable for certain kinds of tasks. Yet one problem still remains: writing MapReduce jobs in plain Java is really a pain.
The API is clunky and does its best to hide the actual algorithm beneath tons of boilerplate. Over the years many tools and approaches have shown up, such as Hadoop's own Streaming API or the great Cascading library.
In this talk, though, we'll focus on Scalding, a library developed at Twitter but used by many others, including eBay. It aims to simplify Big Data work and bring back the joy by using a thin layer of Scala on top of Cascading, so you can build up data processing "as if it was a simple map { transformation }" in plain Scala!
We'll dive into code examples and look at how Scalding actually works, so you can try it out on your cluster when you get back to work on Monday (and smile a bit more the next time you're asked to write a job!).
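To give a flavour of the "simple map { transformation }" style, here is a minimal word-count sketch using Scalding's typed API. It is not taken from the talk itself. The job name `WordCountJob`, the `tokenize` helper and the `--input` / `--output` argument names are illustrative assumptions, and the code assumes the `scalding-core` library is on the classpath (source names such as `TypedTsv` may differ between Scalding versions).

```scala
import com.twitter.scalding._

// Hypothetical job: counts how often each word occurs in a text file.
// Expects --input <path> and --output <path> on the command line (assumed names).
class WordCountJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))               // one String per input line
    .flatMap { line => WordCountJob.tokenize(line) }    // split each line into words
    .groupBy { word => word }                           // use the word itself as the key
    .size                                               // number of occurrences per key
    .write(TypedTsv[(String, Long)](args("output")))    // write (word, count) pairs as TSV
}

object WordCountJob {
  // Illustrative helper: lowercase the line, strip punctuation, split on whitespace.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase
      .replaceAll("[^a-z0-9\\s]", "")
      .split("\\s+")
      .filter(_.nonEmpty)
      .toSeq
}
```

The pipeline reads much like a transformation over an ordinary Scala collection, while Scalding and Cascading take care of planning it into Hadoop jobs. Compare that with the mapper, reducer and driver classes the same task would need in plain Java MapReduce.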
Scalding A.K.A: Writing Hadoop jobs, but without the pain
Konrad Malawski
Konrad is an Akka hakker at Typesafe, where he also participated in the Reactive Streams initiative, and implemented its Technology Compatibility Kit.