Alex Dean will talk about building Snowplow, an open source event analytics platform, on top of Scala and key libraries and frameworks including Scalding, Scalaz and Spray. He will highlight some of the data processing tricks and techniques picked up in the process, in particular schema-first development, monadic ETL, datatable-based testing and data transformation maps. He will also introduce some of the Scala libraries the Snowplow team have open sourced along the way (such as scala-forex, referer-parser and scala-maxmind-geoip).
Building data processing applications in Scala: the Snowplow experience
I'm the co-founder and tech lead at Snowplow Analytics, the open source web and event analytics platform (https://github.com/snowplow/snowplow). Snowplow is written almost exclusively in Scala, using a range of technologies including Scalaz, Scalding and Spray. I spend a lot of time working with distributed systems (historically Hadoop, increasingly Kinesis, Kafka and others) to deliver highly scalable event stream processing. I'm also the author of Unified Log Processing from Manning Publications (http://manning.com/dean/).