As your log files and other dynamically generated data grow, they become more and more difficult to manage. In this talk we'll discuss how to use Flume, an open-source framework from Cloudera, to collect your log files as they're generated and aggregate them wherever you want to process them.
Cloudera has assembled a comprehensive, fully open-source distribution of the Apache Hadoop software and related projects. This package, Cloudera's Distribution for Hadoop (CDH), version 3, is easy to acquire, install, configure, run and administer, and dramatically simplifies the use and operation of Hadoop.
Mike Olson is the CEO of Cloudera. He was formerly CEO of Sleepycat Software, maker of Berkeley DB, the open-source embedded database engine.
After a stint as a technology journalist, Ian Wrigley started one of the UK's first Web consultancies. He has been managing large amounts of data ever since, starting with flat files and Perl scripts
This talk will tell the story of our adoption of Hadoop from our initial in-house virtualised cluster and EC2 experiment to our current dedicated cluster, the migration from our more traditional RDBMS data warehouse to Hive, and how we've developed tools and infrastructure to integrate Hadoop