Hadoop User Group Meeting

Tuesday, 19th August in London

This meetup was organised by HUGUK: Hadoop User Group UK in August 2008


Hadoop User Group meeting, featuring Doug Cutting, founder of Hadoop, and talks on Hadoop on Amazon S3/EC2, SmartFrog and Hadoop, Hadoop usage at, Hadoop and Nutch, PostgreSQL, HBase and Lucene!

Hadoop overview

Doug Cutting, Hadoop project founder at Yahoo!, provides a Hadoop overview.

Doug Cutting

Doug Cutting has worked on search technology for 20 years. This includes five years at Xerox PARC, three years at Apple, and over four years at Excite. In 1998 he wrote Lucene, an open-source search library which became an Apache project in 2001. In

Hadoop Web Services on Amazon S3/EC2

Tom White from Lexemetech will talk on Hadoop on Amazon S3/EC2

Tom White

Tom White is one of the foremost experts on Hadoop. He has been an Apache Hadoop committer since February 2007, and is a Member of the Apache Software Foundation. Tom is a software engineer at Cloudera, where he has worked, since its foundation, on t

Deploying Apache Hadoop with SmartFrog

Steve Loughran and Julio Guijarro (HP) will talk on SmartFrog and Hadoop

Julio Guijarro

Julio Guijarro is one of the main architects of SmartFrog, leads its open source project and manages a team of engineers in Bangalore. Julio is an expert in configuration, deployment and management of distributed systems, areas and Telecom systems.

Steve Loughran

Steve Loughran is an expert in building, deploying and testing distributed computing systems. He works at HP Laboratories, Bristol, on deploying applications on dynamically allocated datacentre infrastructure; one of the core applications used, Sm

Hadoop usage at

Martin Dittus of, spoke on Hadoop usage at

Martin Dittus

Martin is a software developer at, and has been using Hadoop since he joined the company in 2006.

Hadoop at Radio Log Analysis for A/B Tests

This talk briefly summarizes how uses A/B tests to improve the quality of radio stations.

Elias Pampalk

Elias joined's data and recommendation team in 2007 and loves Hadoop.

Welcome to the Hadoop UK UG Meeting 2008

Welcome to the Hadoop UK UG Meeting 2008. Intro session, with crowd survey.

Martin Dittus

Martin is a software developer at, and has been using Hadoop since he joined the company in 2006.

Hadoop: Lessons learned at

A short presentation showing some of the common issues that might pop up when using Hadoop. It tries to suggest some solutions to these problems so that others can avoid them.

Johan Oskarsson

Johan is a Java developer at and an Apache Hadoop Core committer.

Distributed Lucene for Hadoop

This talk describes a parallel, distributed free text index written at HP Labs Bristol called Distributed Lucene. Distributed Lucene is based on two Apache open source projects, Hadoop and Lucene, and follows a design originally proposed by Doug Cutting. It was written to gain a better understanding of the Apache Hadoop architecture, and to investigate approaches to creating large, scalable free text indexes. For more information see the accompanying HP Labs technical report.

Mark Butler

Mark Butler has a varied background in computer science research, having worked on distributed systems, computational biology, software for formulating consumer products, the mobile web and the semantic web. He has a PhD in Computer Science and is

Dumbo: Hadoop streaming made elegant and easy

At, the number of "write once, run never again" Hadoop programs has been growing steadily, especially in the research team. Because Java is a verbose, compiled language, it is poorly suited to writing such throwaway programs. Hadoop Streaming offers a quicker way to write MapReduce programs, but it is still less convenient than it could be. Dumbo is a simple enhancement to Hadoop Streaming that addresses this issue: it is a Python module that makes Hadoop Streaming elegant and easy.
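To illustrate the style of program the abstract has in mind, here is a minimal sketch of a word count written as a Python mapper and reducer pair. It is not taken from the talk and does not use Dumbo's actual API: the names `mapper`, `reducer` and `run_local` are assumptions made for illustration, and `run_local` only simulates the map, sort and reduce phases in a single process so the example runs without a Hadoop cluster.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # key: line offset (unused here); value: one line of input text.
    for word in value.split():
        yield word.lower(), 1


def reducer(key, values):
    # values: an iterable of all counts emitted for this word.
    yield key, sum(values)


def run_local(lines):
    """Simulate map -> sort/shuffle -> reduce in-process (hypothetical helper)."""
    mapped = [
        pair
        for offset, line in enumerate(lines)
        for pair in mapper(offset, line)
    ]
    # Sorting by key stands in for the shuffle phase between map and reduce.
    mapped.sort(key=itemgetter(0))
    results = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        results.extend(reducer(key, (count for _, count in group)))
    return results


if __name__ == "__main__":
    sample = ["Hadoop streaming made easy", "hadoop made elegant"]
    for word, count in run_local(sample):
        print(f"{word}\t{count}")
```

The point of the sketch is the shape of the program: two small generator functions and no job-configuration boilerplate, which is the kind of brevity the abstract contrasts with Java.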

Klaas Bosteels

Klaas Bosteels is a Hadoop expert and works at the Department of Applied Mathematics and Computer Science, Ghent University, where he is working towards a Ph.D. degree as a member of the Fuzziness and Uncertainty Modelling Research Group, in close

Using Nutch and Hadoop for Natural Language Processing

Miles Osborne on using Nutch and Hadoop for Natural Language Processing

Miles Osborne

Miles Osborne is a Senior Lecturer at Edinburgh University and co-leads the Edinburgh Machine Translation Group. His main research interests are machine learning, machine translation and, more recently, dealing with blog posts.

PostgreSQL to HBase Replication

PostgreSQL to HBase replication: At we are interested in ways of mixing our data in PostgreSQL with our data in Hadoop. We would like to replicate our PostgreSQL data to HBase, to protect our database from load and give us Map/Reduce bindings on the data.

Tim Sell

Tim Sell is a software developer at, and a curious observer of the HBase subproject of Hadoop.

History of Hadoop at

A brief introduction to the history of Hadoop at, their cluster setup with some usage statistics, and examples of what Hadoop is used for.

Martin Dittus

Martin is a software developer at, and has been using Hadoop since he joined the company in 2006.

Hadoop Panel Discussion

Hadoop UG Panel discussion on migrating to Hadoop, MapReduce algorithms, DNS problems, testing and monitoring distributed applications.
