Skyscraper: Restructuring the Web

1st December 2016 in London at CodeNode

There are 30 other SkillsCasts available from Clojure eXchange 2016

This talk will describe Skyscraper, a library that allows for easy scraping of entire websites. You will discover how Skyscraper grew organically as a generalization of individual scrapers tailored to different websites, how the abstractions common to these scrapers were found, and how these abstractions gave rise to Skyscraper's design.

You will learn about the problems with real-world sites and the solutions Daniel has implemented in Skyscraper to overcome them, including lessons learned from his biggest scraping project to date, a scraper of 500K+ pages of the Polish parliament.

You will also explore the realm of data structures: how the output of Skyscraper fits the definition of a data table and how to represent such tables in Clojure.
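
As a rough illustration of the data-table idea, one common way to model a table in plain Clojure is a vector of maps, with one map per row and one key per column. The sketch below is an assumption made for illustration only: the namespace, the keys (:section, :page, :title) and the values are invented placeholders, and nothing here is taken from Skyscraper's actual API or output format.

```clojure
(ns example.table
  (:require [clojure.set :as set]))

;; A data table as a vector of maps: one map per row, one key per column.
;; All keys and values are placeholders invented for this sketch.
(def rows
  [{:section "news" :page 1 :title "First item"}
   {:section "news" :page 2 :title "Second item"}
   {:section "blog" :page 1 :title "Third item"}])

;; The column names are the keys that appear in the rows.
(def columns (set (mapcat keys rows)))

;; Because rows are plain maps, core sequence functions double as
;; table operations: here, select the "news" rows and keep one column.
(def news-titles
  (->> rows
       (filter #(= "news" (:section %)))
       (mapv :title)))

;; clojure.set treats a set of maps as a relation; project keeps
;; only the named columns of each row.
(def sections-and-pages
  (set/project (set rows) [:section :page]))
```

With this representation, filtering, projecting and joining rows need no special table type, which may be one reason a sequence of maps is a natural fit for scraped records.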


Daniel Janus

Daniel has been in love with functional programming languages ever since being exposed to OCaml in his freshman year at Warsaw University in 2000. He has since worked with Standard ML, Haskell, Scheme, Common Lisp and Clojure, which is now his preferred way of expressing thoughts as code. He writes Ruby for a living, and hacks on Clojure in his spare time. When not coding, he can be found playing Scrabble, cycling or petting cats.