The aim was to connect the data science community and foster the sharing of knowledge, inspiration and ideas. CodeNode was the venue for the Data Science Festival on Saturday, 29 April 2017.
Who is the Data Science Festival for?
Data engineers, analysts, scientists, and other practitioners
R, Python and other software engineers who work with data or want to learn how to
Data visualisation developers and designers
Non-technical team leads, executives and other decision makers from data-centric startups and large companies looking to utilise open source tools
Day 1: Saturday 29th April
Check out these awesome sessions below!
Tracks: CTRL, ALT/TAB and CMD
09:00
The huge flows of data enabled by social media, smart phones and the internet have created a revolution in the way we interact with events around the world, especially in the arena of conflict. This environment has given us access to information which can be used for investigative journalism, conflict analysis and the identification of war crimes. This talk explores the use of open source information in these arenas, specifically by using case studies from Syria and Ukraine.
data-machine-learning open-source open-source-data conflict analysis mobile data-science-fest
About the speaker: Nick Waters. He has been writing for Bellingcat since May 2016, and in that time he has used open source information to demonstrate the importance of verification, investigate alleged war crimes and examine the use of drones by ISIS. His work has been featured by The Washington Post, Der Spiegel TV and War is Boring, amongst many others.
data-science data-science-fest
About the speaker: Jay Lui. When he is not teaching, he runs Digital-Dandelion, a data science consultancy that helps organisations innovate by successfully integrating the latest in data science, machine learning and big data into their systems.
09:45
Where does weather data come from? The answer is a mix of measurements, models and statistics. In this session, you will explore observations, predictions and forecast models, and how weather data can be used as a variable in machine learning models. Through several examples, you will learn how this is done and how you can use weather and climate data yourself.
This talk will share a range of examples, covering everything from commuter flows to baboons and cyclists to songbirds, to demonstrate how maps and data visualisations offer a window into big data. Many of the selected examples started life in R, so you will see that R is great not just for data wrangling but for visualisation as well.
complex-datasets data-visualisation big-data data-science-fest
About the speaker: James Cheshire. He is a Fellow of the Royal Geographical Society, Deputy Director of the Consumer Data Research Centre and co-author of the books London: The Information Capital and Where the Animals Go.
10:30
Coffee Break
11:00
Day 1, 29 Apr starts 11:00 (CTRL)
Learning at hyper-scale: creating the self-learning business
During this talk, you will discover how basic data and AI have the potential to bring about the next revolution in operational visibility. You will explore case studies from the financial and professional services sectors, where StatusToday’s technology was deployed to understand employee productivity, identify cyber and human risks (insider threats) and categorise communication patterns for key clients.
ai human-behaviour operational-visability data-science-fest
About the speaker: Ankur Modi. Prior to StatusToday, Ankur was Project Manager and Software Engineer at Microsoft, leading international efforts behind MS Office, Office 365 and Dynamics for projects now used by over 100 million people worldwide. At StatusToday, Ankur and his team have created a unique Intelligence Platform that helps companies understand human behaviour to ensure Security, Engagement and Productivity.
This talk focuses on the lessons learned from a recent implementation of smart search in an instant grocery delivery app. In Part One, you will explore the theory of the Support Vector Machine algorithm that drives the smart search. In Part Two, you will learn the agile practices needed to put smart search into production.
smart-search support-vector-machine agile-practices app data-science-fest
About the speaker: Shahzia Holtom. Her experience lies in building smart applications using machine learning and AI algorithms. Shahzia previously worked as one of the first data scientists in the UK Government, where she championed the use of data science for evidence-based policy making. Shahzia holds a PhD in Statistics from the University of Oxford and has also trained as a Software Engineer.
11:45
This talk will share the intuitions behind word embeddings, a family of NLP algorithms. You will explore some of the Python tools that allow you to implement modern NLP applications, followed by some practical considerations.
natural-language-processing python data-science-fest
About the speaker: Marco Bonzanini. He is the author of Mastering Social Media Mining with Python (PacktPub, July 2016).
kafka one-by-one-processing data-science-fest
About the speaker: Dean Morin. He uses much of his spare time to rock climb, and is planning to sneak away to the Peak District while in the UK, so if you have any good info on the area, track him down!
“Bob’s on vacation – how do I run his model?” “Is my neural network useless or should I continue tweaking its parameters?” Have you heard questions like these before? Pawel Subko's team faced the same problems when running research and multiple commercial machine and deep learning projects. In this talk you will discover a number of best practices, based on Pawel's experiences, that can significantly improve your team’s performance. You will explore the process of building a robust data science pipeline using a range of technologies (e.g. Git, Docker or Neptune, Pawel's in-house tool for managing machine learning experiments).
commercial-machine deep-learning data-science git docker neptune data-science-fest
About the speaker: Pawel Subko. At deepsense.io he works on object detection, classification and image processing. He specialises in implementing modern, advanced artificial neural network architectures and applying them to real-life problems.
12:30
Lunch
13:15
Drawing on his blog post, Sean Owen responds with counterpoints from an engineer's perspective, in search of a better understanding of how to teach and practise data science in 2017. You will explore some key points from the past 50 years of data science history to build a more complete view of how data science sprang out of statistics and merged with computer engineering. Finally, you will compare Donoho’s view of what it means to build data science capability with one drawn from the experience of organisations doing so with Apache Hadoop, Spark and other big data tools.
big-data critique statistics computer-engineering apache-hadoop spark data-science-fest
In this talk you will go through each step of the algorithm, from data preparation and time series decomposition through to finding potentially multiple anomalies. The technique can be used to spot deviations from behavioural patterns, with the benefit that it is easy to see why an anomaly is unusual.
data-learning seasonal-hybrid-esd twitter data-science-fest
About the speaker: Peter Tillotson. He combines the skill sets of a Data Engineer and an Analyst, and is as happy building fast real-time Kafka/Spark data pipelines as he is doing time series decomposition and building customer profiles. As for fast: Peter has worked at 1 million events per second (80B events per day) with a total data warehouse size of 15PB. In his spare time Peter is a keen but talentless mountain biker; he tends to fall off a lot.
You will see how these detectors can be applied in the context of web media search, advertising and social media, and analyse the valuable contribution computer vision makes to understanding how people and cultures perceive visual properties, underlining the importance of feature interpretability for this task.
machine-learning web-media-search advertising social-media data-science-fest
About the speaker: Miriam Redi. Miriam got her PhD in the Multimedia group at EURECOM, Sophia Antipolis. After obtaining her PhD, she was a postdoc in the Social Media group at Yahoo Labs Barcelona and a Research Scientist at Yahoo London.
14:15
Based on this use case, you will learn how to go from a new project to a chatbot that handles technical support. You will explore the building blocks of the system, including receiving and sending messages, natural language processing and integration with existing messaging platforms such as Telegram, Messenger or Skype. The session will cover the following topics:
During this talk, you will learn about the components needed to build a chatbot-based system, including natural language processing and message handling. The aim is for attendees to go from never having written a chatbot to building one that is capable of holding a conversation.
ai machine-learning chatbots natural-language-processing data-science-fest
About the speaker: Barbara Fusinska. She tweets at @BasiaFusinska and you can follow her blog.
In this highly interactive session, you will discover how to leverage Spark to rapidly mine a large real-world data set. We will conduct the analysis live, entirely in an IPython Notebook, to show you how easy it can be to get to grips with these technologies. In the first part of the session, we will use a sample of data from the Open Library dataset, and you will learn how to apply common Spark patterns to extract insights and aggregate data. In the second part of the session, you will see how to leverage Spark on Amazon EMR to scale your data processing queries over a cluster of machines and interactively analyse a large data set (100GB) with a Zeppelin Notebook. Along the way you will learn about common gotchas as well as useful performance and monitoring tips.
big data python data-processing data-science-fest
About the speaker: Raoul-Gabriel Urma. Raoul has written over 10 peer-reviewed articles and given over 20 technical talks at international conferences. He has worked for large companies such as Google, eBay, Oracle and Goldman Sachs, as well as for several startup projects. You can find out more about Raoul via his website and follow him on Twitter via @raoulUK. You can find out more about his book and get a copy at manning.com/urma.
You will then learn about various use cases Soraya and Roberto have been working on, such as product recommendations, related products, out-of-stock recommendations, category recommendations and visual browsing. This is followed by an illustration of how various other important functions and elements contribute to the success of a recommender system, and the specific challenges they faced in putting their algorithms into a production environment. The talk will conclude with an outline of their data science roadmap, which includes context-aware recommendations, session-based recommendations and tensor decomposition techniques.
recommender-systems fashion-domain visual-browsing data-science data-science-fest
15:00
Coffee Break
15:30
In this talk, you will hear about the chain of events that led to Peter Moore's capture, his experiences as a hostage and his eventual release, with many fascinating stories about programming, IT, travel and survival along the way.
Recommender systems are paramount for e-business companies. There is an increasing need to take into account all user information to provide the best, most tailored products. One important element is the content that the user actually sees: the visual of the product. In this talk, you will discover how Dataiku improved an e-business vacation retailer recommender system using the content of images. You'll explore how to leverage open datasets and pre-trained deep learning models to derive user preference information. This transfer learning approach enables companies to use state-of-the-art machine learning methods without having deep learning expertise.
This talk focuses on the challenges Shop Direct faced in moving from a communication strategy built around the timing of catalogue production and distribution towards a more individually personalised strategy fit for a fast-paced online retailer, and on the techniques and models used to achieve this goal.
online-retailer communication-strategy data-learning data-science-fest
About the speaker: Simon Hill. He has been involved in projects related to customer segmentation, personalised customer communication strategy and customer value, among other topics.
16:15
Whilst the techniques covered are completely general, the talk concludes with some applications in financial planning and portfolio management. No previous knowledge of mathematical programming is required, but please note that this talk contains formulae (and Python code).
machine-learning quadratic-programmes linear-regression robust-regression python financial-planning data-science-fest
About the speaker: Gianluca Campanella. Gianluca is also the founder and director of Estimand, through which he provides consulting and training services in Statistics, Machine Learning and Data Science.
With Oliver Frost, a Data Engineer at Consolidata, you will explore what tools are available to you as a data scientist inside SQL Server 2016 and on Azure. Discover how R adds value where traditional relational databases struggle, how to use ScaleR functions to build predictive models and how Azure ML can be used to build efficient machine learning pipelines.
War in HD: Conflict and Open Source Information
Featuring Nick Waters
The huge flows of data enabled by social media, smart phones and the internet have created a revolution in the way we interact with events around the world, especially in the arena of conflict. This environment has given us access to information which can be used for investigative journalism,...
data-machine-learning open-source open-source-data conflict analysis mobile data-science-fest
How can we predict a successful data science project?
Featuring Jay Lui
In this talk you will explore a series of Jay's personal anecdotes that illustrate the key factors that explain data science success!
data-science data-science-fest
A Beginners Guide to Weather and Climate Data
Featuring Margriet Groenendijk
Weather is part of our everyday lives. Who doesn’t check the rain radar before heading out, or the weather forecast when planning a weekend away? But where does this data come from, what is it made of?
climate bigdata machine-learning models data-science-fest
Less is More: Data Visualisations for Big Data
Featuring James Cheshire
In this talk you will explore how large and complex data-sets can be visualised in compelling and informative ways.
complex-datasets data-visualisation big-data data-science-fest
Using AI to understand Human Behaviour in the workplace
Featuring Ankur Modi
Can Artificial Intelligence be used to understand the intricacies of human behaviour?
ai human-behaviour operational-visability data-science-fest
A practical look at putting data science in production
Featuring Shahzia Holtom
Search is a common feature of apps. The basic implementation of search as an information retrieval exercise does not allow for personalisation. A smart search, which boosts information retrieval by sorting results according to their relevance to the individual user, improves the user experience.
smart-search support-vector-machine agile-practices app data-science-fest
One-by-one Is No Fun: Lessons learned writing Kafka ETL jobs
Featuring Dean Morin
Dean Morin has been writing ETL jobs using Kafka for a couple of years. In this talk, you will explore Dean's experiences doing just about everything wrong, before figuring out what does work. You will also discover:
kafka one-by-one-processing data-science-fest
Behind the scenes of training, managing and deploying machine learning models
Featuring Pawel Subko
“The model was working just fine two weeks ago, but now I can’t reproduce it!”
commercial-machine deep-learning data-science git docker neptune data-science-fest
Word Embeddings for Natural Language Processing in Python
Featuring Marco Bonzanini
Word embeddings are a family of Natural Language Processing (NLP) algorithms where words are mapped to vectors in low-dimensional space. The interest around word embeddings has been on the rise in the past few years, because these techniques have been driving important improvements in many NLP...
natural-language-processing python data-science-fest
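The idea that words are "mapped to vectors" can be made concrete with a minimal Python sketch. This is not taken from the talk. Once words are vectors, similarity of meaning is commonly measured as the cosine similarity between them. The three-dimensional vectors below are invented toy values, not the output of a trained model such as word2vec or GloVe, and the helper names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math

# Hypothetical toy embeddings: invented 3-d vectors for illustration only.
# Real embeddings are learned from a corpus and usually have far more dimensions.
EMBEDDINGS = {
    "king": [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "apple": [0.10, 0.05, 0.90],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings):
    """Return the other vocabulary word whose vector is closest to `word`'s."""
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine_similarity(target, embeddings[w]))


if __name__ == "__main__":
    print(most_similar("king", EMBEDDINGS))  # prints queen
```

Cosine similarity ignores vector length and compares only direction. This is a common choice for embeddings, although other distance measures are also used.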
What “50 Years of Data Science” leaves out
Featuring Sean Owen
We’re told data science is the key to unlocking the value in big data, but nobody seems to agree just what it is. Is it engineering, statistics... both? David Donoho’s “50 Years of Data Science”, which is itself a survey of Tukey’s “Future of Data Analysis”, will present you with one of the best...
big-data critique statistics computer-engineering apache-hadoop spark data-science-fest
Anomaly Detection: A breakdown of Twitter’s Seasonal Hybrid ESD
Featuring Peter Tillotson
In a world of deep learning, statistical techniques are out of fashion, but they can still be very effective tools. Twitter’s open source anomaly detection project uses a statistical technique called Seasonal Hybrid ESD.
data-learning seasonal-hybrid-esd twitter data-science-fest
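To give a feel for the steps named in this abstract (decomposition, then finding possibly multiple anomalies), here is a simplified Python sketch of the Seasonal Hybrid ESD idea as we understand it. It is not a port of Twitter's package, which is written in R. The seasonal component is taken as the median of each phase of the cycle, standing in for an STL decomposition, and the overall median stands in for the trend. The loop is a generalized ESD test that uses the median and MAD instead of the mean and standard deviation. To stay within the standard library, the Student-t quantile in the critical value is replaced by a normal approximation, which is an assumption. The function names are hypothetical.

```python
from statistics import NormalDist, median

MAD_TO_SIGMA = 1.4826  # scales MAD to match a standard deviation for normal data


def seasonal_residuals(series, period):
    """Remove a crude seasonal component (the median of each phase of the
    cycle), then the overall median, which stands in for the trend."""
    phase_medians = [median(series[p::period]) for p in range(period)]
    deseasoned = [x - phase_medians[i % period] for i, x in enumerate(series)]
    level = median(deseasoned)
    return [x - level for x in deseasoned]


def hybrid_esd(series, period, max_anomalies, alpha=0.05):
    """Generalized ESD on the residuals, using median/MAD instead of
    mean/standard deviation. Returns the sorted indices flagged as anomalies."""
    residuals = seasonal_residuals(series, period)
    n = len(residuals)
    remaining = list(range(n))
    removed = []
    num_anomalies = 0
    for i in range(1, max_anomalies + 1):
        values = [residuals[j] for j in remaining]
        centre = median(values)
        spread = MAD_TO_SIGMA * median(abs(v - centre) for v in values)
        if spread == 0:
            break  # no variation left to measure deviations against
        # Test statistic R_i: the largest robust deviation among remaining points.
        worst = max(remaining, key=lambda j: abs(residuals[j] - centre))
        r_i = abs(residuals[worst] - centre) / spread
        # Critical value lambda_i (Rosner's formula). ASSUMPTION: the Student-t
        # quantile is replaced by a normal quantile to stay in the stdlib.
        p = 1 - alpha / (2 * (n - i + 1))
        t = NormalDist().inv_cdf(p)
        lam = (n - i) * t / ((n - i - 1 + t * t) * (n - i + 1)) ** 0.5
        removed.append(worst)
        remaining.remove(worst)
        if r_i > lam:
            num_anomalies = i  # ESD keeps the largest i with R_i > lambda_i
    return sorted(removed[:num_anomalies])


if __name__ == "__main__":
    base = [10, 20, 30, 20]  # one seasonal cycle of length 4
    series = [base[i % 4] + (i % 3) - 1 for i in range(40)]  # small deterministic noise
    series[13] += 50  # inject a spike
    print(hybrid_esd(series, period=4, max_anomalies=3))  # prints [13]
```

Using the median and MAD means a single large spike cannot inflate the spread estimate and so mask itself. This is the motivation usually given for the "hybrid" variant.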
The Science of Visual Interactions
Featuring Miriam Redi
In this talk you will explore the invisible side of visual data, investigating how machine learning can detect subjective properties of images and videos, such as beauty, creativity, sentiment, style, and more curious characteristics.
machine-learning web-media-search advertising social-media data-science-fest
Fashion Recommendations at ASOS: Challenges, Approaches and Learnings
Featuring Soraya Hausl and Roberto Pagliari
In this talk, you will learn about Soraya and Roberto's journey in the design and development of recommender systems for their platform, beginning with a discussion of the technical and business challenges they faced when starting to build their recommendation engine. Next, you will explore the main...
recommender-systems fashion-domain visual-browsing data-science data-science-fest
Handling 1st line technical support with a chatbot
Featuring Barbara Fusinska
First-line technical support frequently consists of providing answers to FAQs and pre-assembled conversations for operators to follow. With the advances in A.I. and Machine Learning, how much of this task could be supported with the aid of software? While this raises many questions and challenges,...
ai machine-learning chatbots natural-language-processing data-science-fest
Interactively Analyse 100GB of Data using Spark, Amazon EMR and Zeppelin
Featuring Raoul-Gabriel Urma
You may have been hearing a lot of buzz around Big Data, Apache Spark, Amazon Elastic MapReduce (EMR) and Apache Zeppelin. What’s the fuss about, and how can you benefit from these state-of-the-art technologies?
big data python data-processing data-science-fest
Reinventing Shop Direct’s customer contact strategy
Featuring Simon Hill
Shop Direct has been on a journey moving away from 80 years of catalogue retail heritage to become a pure-play internet retailer. This transition has created a variety of challenges and opportunities around the way that Shop Direct communicates with its customers to avoid churn.
online-retailer communication-strategy data-learning data-science-fest
How to Improve your Recommender System with Deep Learning: A Use Case
Featuring Alexandre Hubert
Deep learning is without a doubt among the hottest topics in data science today. Computers are now more powerful than ever, and as a result, deep learning has been applied successfully by academics during the past few years. However, it is still unclear how difficult it is for businesses to apply...
deep-learning recommender-systems dataiku data-science-fest
I went to work as an SQL programmer, and left as a hostage.
Featuring Peter Moore
Peter Moore was an SQL Server developer of several years’ experience prior to accepting a three-month assignment working on a financial system for the Iraqi government. He returned home two and a half years later, having been held hostage by an Iraqi militia for 946 days.
case-study sql programming data-science-fest
One of the most notable additions to the Microsoft BI stack is Microsoft R Server inside SQL Server 2016.
Featuring Oliver Frost
The in-built ScaleR packages deliver scalability and multi-threading capabilities that open-source R can’t easily provide, opening up a world of possibilities for data scientists interested in machine learning and predictive analytics. But what does this look like on a Microsoft platform?
microsoft-bi sql-server data-science machine-learning scaler data-science-fest
From PhD to life: using science to get a job in data
Featuring Gianluca Campanella
The concepts and methods of mathematical programming underlie many machine learning algorithms, and yet remain relatively unknown outside the operational research community. In this talk you will explore the standard-form linear and quadratic programmes, after a brief overview of optimisation...
machine-learning quadratic-programmes linear-regression robust-regression python financial-planning data-science-fest
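For readers who want the notation at hand, one common textbook convention for the standard-form linear and quadratic programmes mentioned in this abstract is written out below. Other texts use inequality constraints instead, and this is not necessarily the talk's own notation. As one illustration of how a fitting problem becomes a mathematical programme, the third line writes least absolute deviation (LAD) regression as a linear programme. LAD regression is a simple form of robust regression. At the optimum, u_i + v_i equals the absolute residual |y_i - x_i^T β|, so the objective is the sum of absolute residuals.

```latex
\begin{aligned}
\text{LP:}\quad & \min_{x \in \mathbb{R}^n} \; c^\top x
  && \text{s.t. } A x = b,\; x \ge 0 \\
\text{QP:}\quad & \min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\, x^\top Q x + c^\top x
  && \text{s.t. } A x = b,\; x \ge 0 \\
\text{LAD regression:}\quad & \min_{\beta,\, u,\, v} \; \mathbf{1}^\top (u + v)
  && \text{s.t. } X\beta + u - v = y,\; u \ge 0,\; v \ge 0
\end{aligned}
```

Here Q is assumed positive semidefinite so that the QP is convex. In the LAD problem β is unrestricted in sign; writing β = β⁺ - β⁻ with β⁺, β⁻ ≥ 0 brings it into the standard form of the first line.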