Dumbo: Hadoop streaming made elegant and easy

19th August 2008 in London at Sekforde Street

There are 13 other SkillsCasts available from Hadoop User Group Meeting

At Last.fm, the number of "write once, run never again" Hadoop programs has been growing steadily, especially in the research team. Because Java is verbose and must be compiled, it is poorly suited to writing such programs. Hadoop Streaming offers a quicker way to write MapReduce programs, but it is still less convenient than it could be. Dumbo is a simple enhancement to Hadoop Streaming that addresses this issue: a Python module that makes Hadoop Streaming elegant and easy.
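
To give a feel for how compact such a throwaway MapReduce program can be in Python, here is a minimal word-count sketch. It is an illustration only, not code from the talk: the generator-style `mapper` and `reducer` functions are an assumed shape for this kind of program, and `run_local` is a hypothetical helper that mimics the map, sort, and reduce phases in memory so the example runs without a Hadoop cluster.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # key: position of the input line (unused here); value: one line of text.
    # Emit a (word, 1) pair for every whitespace-separated word.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # values: an iterable of all counts emitted for this word.
    yield key, sum(values)


def run_local(lines, mapper, reducer):
    """Hypothetical local stand-in for a streaming job: map, sort by key, reduce."""
    # Map phase: feed each input line to the mapper and collect its pairs.
    mapped = [pair for offset, line in enumerate(lines)
              for pair in mapper(offset, line)]
    # Shuffle/sort phase: bring pairs with equal keys next to each other.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: call the reducer once per distinct key.
    results = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        results.extend(reducer(key, (count for _, count in group)))
    return results


if __name__ == "__main__":
    for word, count in run_local(["a b a", "b c"], mapper, reducer):
        print(f"{word}\t{count}")
```

The whole job is two small functions, with no class hierarchy or build step. On a real cluster the sorting and grouping would be handled by Hadoop rather than by a helper like `run_local`.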


Klaas Bosteels

Klaas Bosteels is a Hadoop expert at the Department of Applied Mathematics and Computer Science, Ghent University, where he is working towards a Ph.D. degree as a member of the Fuzziness and Uncertainty Modelling Research Group, in close