SkillsCast

Workshop: Mind blown: Crafting a Distributed Data Science Pipeline using Spark, Cassandra, Akka and the Spark Notebook

10th December 2015 in London at Business Design Centre



Get your hands dirty with distributed tools: during these two hours we’ll take a quick tour of how a dataset can be processed in a distributed way and then exposed as a web service. The tools we’ll use are Spark, Cassandra, Akka HTTP and the Spark Notebook. Basic (conceptual) knowledge of these tools is not required, but welcome. Your take-home for this workshop will be a Docker image that lets you replay the whole thing at home or at work (don’t forget the sunglasses to add even more to the cool effect). Oh! And you’ll also come away with a better idea of why and how these tools can be chained together for general-purpose, yet data-oriented, work.
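To give a flavour of how the pieces chain together, here is a minimal sketch, assuming Spark with the spark-cassandra-connector and Akka HTTP on the classpath and a Cassandra node on localhost. It is not the workshop's code: the keyspace, table, input path and port are hypothetical.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object PipelineSketch extends App {
  val conf = new SparkConf()
    .setAppName("pipeline-sketch")
    .set("spark.cassandra.connection.host", "127.0.0.1")
  val sc = new SparkContext(conf)

  // 1. Process a dataset in a distributed way: a classic word count.
  val counts = sc.textFile("data/sample.txt") // hypothetical input file
    .flatMap(_.split("\\s+"))
    .map(word => (word, 1L))
    .reduceByKey(_ + _)

  // 2. Persist the result to Cassandra via the connector
  //    (keyspace and table names are made up for this sketch).
  counts.saveToCassandra("pipeline", "word_counts", SomeColumns("word", "count"))

  // 3. Expose it as a web service with Akka HTTP.
  implicit val system = ActorSystem("pipeline")
  implicit val materializer = ActorMaterializer()

  val route =
    path("count" / Segment) { word =>
      get {
        // Read the requested key back from Cassandra through Spark.
        val hits = sc.cassandraTable[(String, Long)]("pipeline", "word_counts")
          .where("word = ?", word)
          .collect()
        complete(hits.headOption.map(_._2.toString).getOrElse("0"))
      }
    }

  Http().bindAndHandle(route, "localhost", 8080)
}
```

Launching a Spark job per HTTP request is only workable for a demo; the point here is simply to show where each tool sits in the chain.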

Setup instructions can be found in this PDF.

Please download this PDF now as the link will expire after the conference.

Note that this workshop requires a lengthy setup process.

About the Speakers

Andy Petrella

Andy is a mathematician-turned-distributed computing entrepreneur. Besides running Skills Matter's Spark (and other) courses, Andy has also participated in many projects using Spark, Cassandra, and other distributed technologies, in a range of fields including geospatial, IoT, automotive and smart-city projects. Andy is the creator of the Spark Notebook, the only reactive, fully Scala-based notebook for Apache Spark.

Xavier Tordoir

Xavier started his career as a researcher in experimental physics, with a focus on data processing. Further down the road, he took part in projects in finance, genomics, and software development for academic research. During that time, he worked on time series, on the prediction of biological molecular structures and interactions, and applied machine learning methodologies. He developed solutions to manage and process data distributed across data centres. He founded, and now works at, Data Fellas, a company dedicated to distributed computing and advanced analytics, leveraging Scala, Spark, and other distributed technologies.
