With the rapid onset of the global Covid-19 Pandemic in 2020 the USA Centers for Disease Control and Prevention (CDC) quickly implemented a new Covid-19 pipeline to collect testing data from all of the USA’s states and territories, and produce multiple consumable results for federal and public agencies. They did this in under 30 days, using Apache Kafka.
Inspired by this story, we built two demonstration streaming pipelines for ingesting, storing, and visualizing public IoT data (Tidal data from NOAA, the National Oceanic and Atmospheric Administration) using multiple open source technologies. The common ingestion technologies were Apache Kafka, Apache Kafka Connect, and Apache Camel Kafka Connector, supplemented with Prometheus and Grafana for monitoring. The initial experiment used Open Distro for Elasticsearch and Kibana as the target storage and visualisation technologies, while the second experiment used PostgreSQL and Apache Superset.
In this talk we introduce each technology and the pipeline architecture, and walk through the steps followed, challenges encountered, and solutions used to build reliable and scalable pipelines, and visualize the results (including Tidal periods, ranges and locations). We compare and contrast the two approaches, focussing on exception handling, scalability, performance and monitoring, and the pros and cons of the two visualization technologies (Kibana and Superset).
Since learning to program on a VAX 11/780, Paul Brebner has extensive R&D and consulting experience in distributed systems, technology innovation, software architecture and engineering, software performance and scalability, grid and cloud computing, and data analytics and machine learning.
Paul is the Technology Evangelist at Instaclustr. He’s been learning new scalable technologies, solving realistic problems, building applications, and blogging about an ever-increasing list of Open Source technologies. Paul has worked at UNSW, CSIRO, UCL (UK), NICTA/ANU, and several tech start-ups (including Founder/CTO of a NICTA spin-out). Paul has an MSc in Machine Learning and a BSc (Computer Science and Philosophy).