Hadoop User Group meeting, featuring Doug Cutting, founder of Hadoop, and talks on Hadoop on Amazon S3/EC2, SmartFrog and Hadoop, Hadoop usage at Last.fm, Hadoop and Nutch, PostgreSQL, HBase and Lucene!
Doug Cutting of Yahoo!, founder of the Hadoop project, provides a Hadoop overview
Doug Cutting has worked on search technology for 20 years. This includes five years at Xerox PARC, three years at Apple, and over four years at Excite. In 1998 he wrote Lucene, an open-source search library which became an Apache project in 2001. In
Tom White from Lexemetech will talk on Hadoop on Amazon S3/EC2
Tom White is one of the foremost experts on Hadoop. He has been an Apache Hadoop committer since February 2007, and is a Member of the Apache Software Foundation. Tom is a software engineer at Cloudera, where he has worked, since its foundation, on t
Steve Loughran and Julio Guijarro (HP) will talk on SmartFrog and Hadoop
Julio Guijarro is one of the main architects of SmartFrog, leads its open source project and manages a team of engineers in Bangalore. Julio is an expert in the configuration, deployment and management of distributed systems and telecom systems.
Steve Loughran is an expert in building, deploying and testing distributed computing systems. He works at HP Laboratories, Bristol, on deploying applications on dynamically allocated datacentre infrastructure; one of the core applications used, Sm
This talk briefly summarizes how Last.fm uses A/B tests to improve the quality of radio stations.
Elias joined Last.fm's data and recommendation team in 2007 and loves Hadoop.
This talk describes a parallel, distributed free text index written at HP Labs Bristol called Distributed Lucene. Distributed Lucene is based on two Apache open source projects, Hadoop and Lucene, and follows a design originally proposed by Doug Cutting. It was written to gain a better understanding of the Apache Hadoop architecture, and to investigate approaches to creating large, scalable free text indexes. For more information see the accompanying HP Labs technical report.
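The general idea of a partitioned free text index can be sketched in a few lines. The following is a minimal sketch of a generic document-partitioned inverted index, assuming integer document ids and whitespace tokenization; it is not the Distributed Lucene design from the HP Labs report, and the names `Shard` and `ShardedIndex` are hypothetical. Each document is routed to exactly one shard, and a query is fanned out to every shard with the partial results merged.

```python
from collections import defaultdict


class Shard:
    """One partition of the index: a tiny in-memory inverted index (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        # .get() avoids inserting empty entries into the defaultdict.
        return self.postings.get(term.lower(), set())


class ShardedIndex:
    """Routes each document to one shard; fans each query out to all shards."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        # Deterministic routing by integer document id (assumption).
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, term):
        hits = set()
        for shard in self.shards:  # in a real system these calls run in parallel
            hits |= shard.search(term)
        return sorted(hits)


if __name__ == "__main__":
    index = ShardedIndex(num_shards=3)
    index.add(1, "Hadoop and Lucene")
    index.add(2, "Lucene search library")
    index.add(3, "Hadoop on Amazon EC2")
    print(index.search("lucene"))  # [1, 2]
```

Partitioning by document keeps each shard a complete, independent index, so shards can be built and queried separately; the cost is that every query touches every shard.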
Mark Butler has a varied background in computer science research, having worked on distributed systems, computational biology, software for formulating consumer products, the mobile web and the semantic web. He has a PhD in Computer Science and is
At Last.fm, the number of "write once, run never again" Hadoop programs has been growing steadily, especially in the research team. Because Java is a verbose, compiled language, it is not well suited to writing such throwaway programs. Hadoop Streaming offers a quicker way to write MapReduce programs, but it is still less convenient than it could be. Dumbo is a simple enhancement to Hadoop Streaming that addresses this issue: a Python module that makes writing Hadoop Streaming programs elegant and easy.
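To show what Dumbo builds on, here is a minimal sketch of a plain Hadoop Streaming word count in Python. It does not use Dumbo's API; it only illustrates the Streaming contract: the mapper and reducer read lines on stdin and write tab-separated `key\tvalue` lines on stdout, and Hadoop sorts the mapper output by key before the reducer sees it. The file name `wordcount.py` and the `map`/`reduce`/`demo` command-line modes are hypothetical choices for this example.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Streaming-style mapper: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Streaming-style reducer: input is sorted by key, so equal words are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(text_lines):
    """Emulate map -> sort -> reduce in one process (Hadoop performs the sort)."""
    return list(reducer(sorted(mapper(text_lines))))


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else "demo"
    if mode == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif mode == "reduce":
        for out in reducer(sys.stdin):
            print(out)
    else:
        for out in run_locally(["a rose is a rose"]):
            print(out)
```

On a cluster, the same file would be passed to the streaming jar through its `-mapper "wordcount.py map"` and `-reducer "wordcount.py reduce"` options (jar path and file shipping depend on the installation). The manual line splitting and re-joining is exactly the kind of boilerplate a wrapper like Dumbo aims to hide.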
Klaas Bosteels is a Hadoop expert and works at the Department of Applied Mathematics and Computer Science, Ghent University, where he is working towards a Ph.D. degree as a member of the Fuzziness and Uncertainty Modelling Research Group, in close
Miles Osborne on using Nutch and Hadoop for Natural Language Processing
Miles Osborne is a Senior Lecturer at Edinburgh University and co-leads the Edinburgh Machine Translation Group. His main research interests are machine learning, machine translation and, more recently, dealing with blog posts.
PostgreSQL to HBase replication: At Last.fm we are interested in ways of mixing our data in PostgreSQL with our data in Hadoop. We would like to replicate our PostgreSQL data to HBase, to shield our database from load and give us MapReduce bindings on the data.
Tim Sell is a software developer at Last.fm, and a curious observer of the HBase subproject of Hadoop.
Hadoop UG Panel discussion on migrating to Hadoop, MapReduce algorithms, DNS problems, testing and monitoring distributed applications.