Sean is Director of Data Science at Cloudera in London. Before Cloudera, he founded Myrrix Ltd (now the Oryx project) to commercialize large-scale real-time recommender systems on Apache Hadoop. He is an Apache Spark committer and co-authored Advanced Analytics with Spark. He was a committer and VP for Apache Mahout, and co-author of Mahout in Action. Previously, Sean was a senior engineer at Google.
Talks I've Given
-
What “50 Years of Data Science” leaves out
Featuring Sean Owen
We’re told data science is the key to unlocking the value in big data, but nobody seems to agree just what it is. Is it engineering, statistics... both? David Donoho’s “50 Years of Data Science”, which is itself a survey of Tukey’s “Future of Data Analysis”, will present you with one of the best...
big-data critique statistics computer-engineering apache-hadoop spark data-science-fest -
Keynote: Spark+Hadoop and how it relates to Scala
Featuring Sean Owen
Scala seems to be, suddenly, an important language within the Apache Hadoop ecosystem, with the arrival of Scala-based projects like Apache Kafka and Apache Spark -- which is in essence "distributed Scala". In fact, it's not a surprising marriage: Hadoop has been building on...
spark hadoop scala -
Collaborative Filtering at Scale
Featuring Sean Owen
April's meetup will feature Collaborative Filtering at scale, and Algorithms on Hadoop at Last.fm. In this talk, Sean Owen goes through an introduction to the Mahout project and how it achieves collaborative filtering at scale.
apache hadoop scalable machine-learning