Scalding A.K.A: Writing Hadoop jobs, but without the pain

2nd December 2013 in London at Kings Place

There are 47 other SkillsCasts available from Scala eXchange 2013

In this talk we'll focus on Scalding - a library developed at Twitter, but used by many others, including eBay - which simplifies Big Data work and brings back the joy of it by using a thin layer of Scala on top of Cascading to build up data processing "as if it was a simple map { transformation }" in plain Scala!

Hadoop and its whole ecosystem have settled down for good in our hearts and/or minds. Hadoop is quite old by now and has proven to be quite reliable for certain kinds of tasks. Yet one problem still remains - writing MapReduce jobs in plain Java is really a pain.

The API is clunky and does its best to hide the actual algorithm beneath tons of boilerplate. Over the years many tools and approaches have shown up to ease this, such as Hadoop's own Streaming API or the great Cascading library.

We'll dive into code examples as well as look into how Scalding actually works, so you can try it out on your cluster when you come back to work on Monday (and smile a bit more when asked to write a Job next time!)
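
To give a flavour of the "simple map { transformation }" style, here is a minimal word-count sketch written against Scalding's fields-based API. It is an illustration only, not necessarily the code shown in the talk: it assumes Scalding is on the classpath, and the `input` and `output` argument names, the `WordCountJob` class and the `WordCount.tokenize` helper are names chosen for this example.

```scala
import com.twitter.scalding._

// Helper kept outside the job so it can be tried out on its own.
object WordCount {
  // Lowercase the text, strip punctuation, and split on whitespace.
  def tokenize(text: String): Array[String] =
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+")
}

// A Scalding job: reads lines of text, counts how often each word occurs,
// and writes (word, count) pairs as tab-separated values.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))                       // one 'line field per input line
    .flatMap('line -> 'word) { line: String => WordCount.tokenize(line) }
    .groupBy('word) { _.size }                  // count the rows in each group
    .write(Tsv(args("output")))
}
```

The whole job reads like a chain of collection operations, while Scalding and Cascading take care of planning the underlying MapReduce steps.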


Konrad Malawski

Konrad is an Akka hakker at Typesafe, where he also participated in the Reactive Streams initiative, and implemented its Technology Compatibility Kit.