Join us at Infiniteconf in London on July 6th and 7th 2017!

Want to learn how apps can support decision-making more efficiently in a new era of smart city design and planning? Eager to understand how to make Neo4j your core database, and how to make the most of it? Want to hear what's new in Google's Dataflow, or curious about using Python for word embeddings in Natural Language Processing?
Join us at Infiniteconf for two days packed with talks and discussions, and learn how to use the amazing technologies, practical tools and methods available to data scientists and engineering teams.
Follow us at #infiniteconf to hear all the latest news.
Programme Announcements!
We're now ready to unveil the line-up of speakers and experts who will make InfiniteConf 2017 the go-to Big Data and Data Science conference! Find out more on the Programme Page. Please note: the programme is subject to change.
#infiniteconf 2017 Highlights
Talks on Dataflow, TensorFlow, Gaussian Processes for Big Data problems, BigQuery and Cloud Machine Learning, R and small data, Word Embeddings for Natural Language Processing with Python, and more!
Keynotes from Dean Wampler (Lightbend) and Alison Lowndes (NVIDIA), and talks from Jim Webber (Neo4j), Robert Kubis (Google), Phil Wills and Lindsey Dew (The Guardian), Alex McLintock (Hadoop), Dr Larissa Romualdo-Suzuki (Greater London Authority), Samantha Ahern (UCL), and more!
Infiniteconf Diversity Scholarship Plan

Skills Matter is proud and happy to share our Infiniteconf Diversity Scholarship Plan. This plan is based on our commitment to helping develop the skills of women and their participation in our community. It is aimed at helping women who want to enter or re-enter the tech industry.
Find more information here!
Join us for the Infiniteconf Bytes evening events at CodeNode

Want to stay in the loop with the latest developments within the data community?
Join us at the brand new Infiniteconf Bytes series we'll be hosting at CodeNode leading up to Infiniteconf 2017!
Find more information here!
Code of Conduct
Please find our Code of Conduct here.
Day 1 - Thursday 6th July
Join us for these awesome sessions!
Tracks: CTRL | ALT | TAB | SHIFT
08:30
Registration & Breakfast
09:15
Welcome to #infiniteconf 2017
09:30
KEYNOTE
10:30
Coffee Break
10:45
deeplearning
infiniteconf
bigdata
gpu
artificial-intelligence
neural-networks
About the speaker... Alison B. Lowndes
Alison B. Lowndes is responsible for NVIDIA's Artificial Intelligence Developer Relations in the EMEA region. She is a mature graduate in Artificial Intelligence, combining technical and theoretical computer science with a physics background. After researching image and feature recognition using GPUs and deep learning at the University of Leeds, Alison joined NVIDIA as a Deep Learning Solutions Architect. She consults on a wide range of AI applications, including planetary defence with NASA, and continues to manage the community of AI and Machine Learning researchers. As AI DevRel she keeps up with the state of the art across all areas of research, and advises, teaches and evangelizes NVIDIA’s platform around the globe. Alison tweets at @AlisonBLowndes.
1) The desire to reuse scale-out and on-demand-scaling technologies developed for big-data analysis. 2) The very high volume of data output by the simulations, which is costly to reproduce, so the outputs are saved and subsequently mined for multiple scenarios and analyses. A similar convergence pattern can be seen in a number of other fields in science and engineering, e.g. climate modelling simulation, which will make this talk of interest to a wide audience.
In this talk, Sebastien assumes you know about overfitting and regularization, and will dissect insidious ways to overfit, as well as the no free lunch theorem. He will explore points of contact between big data and machine learning from an engineering perspective. Finally, Sebastien will present more advanced ML topics worth knowing about for data scientists: Bayesian optimization and Auto-ML.
fastdata
apache
flink
scala
infiniteconf
spark
actionable-insights
streaming-analytics
11:30
Coffee Break
11:45
Recommender system technology is at the core of Netflix's and Amazon's business models and has led to a tremendous increase in sales and customer satisfaction. Other retailers have seen sales increases of 5-15%, and recommender systems are now making their way into other industries, helping customers find products faster, helping salespeople find collateral and configure solutions, and helping companies accelerate product development by finding the right components to build products that meet market needs. Real-time recommender systems are one of the sweet-spot use cases for native graph databases. Key goals for a good recommender system include relevance, novelty, serendipity and recommendation differentiation. In this talk, Pieter will demonstrate how Neo4j gives you full and accurate control of the recommender system, interactive response at scale, and "on the fly" tuning for a fast time to market.
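The co-purchase idea behind such graph recommendations can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Pieter's Neo4j implementation: it uses a hypothetical in-memory purchase graph (`PURCHASES`) and scores products bought by customers who share purchases with the target customer, roughly the traversal a Cypher pattern such as `(c)-[:BOUGHT]->(p)<-[:BOUGHT]-(other)-[:BOUGHT]->(rec)` would express. The `BOUGHT` relationship and the overlap-based weighting are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical purchase graph: customer -> set of products bought.
PURCHASES = {
    "alice": {"tent", "stove", "lantern"},
    "bob": {"tent", "sleeping_bag"},
    "carol": {"stove", "lantern", "kettle"},
    "dave": {"tent", "lantern", "sleeping_bag"},
}


def recommend(customer, purchases, top_n=3):
    """Rank products bought by customers with overlapping purchase histories."""
    mine = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        # Number of shared products = number of customer->product<-other paths.
        overlap = len(mine & items)
        if overlap == 0:
            continue
        # Weight each product the customer lacks by how similar the other customer is.
        for item in items - mine:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("bob", PURCHASES))  # ['lantern', 'stove']
```

Weighting by overlap is only one of many possible scoring schemes; it favours "people like you" over raw popularity, which is one simple way to trade relevance against novelty.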
Join Brian and discover how a variety of open source tools for extracting content, identifying elements and structure, and analyzing text can be used in distributed, microservice-friendly ways.
machinelearning
bigdata
infiniteconf
nlp
About the speaker... Brian Sletten
Brian has recently spoken at Devoxx, ÜberConf and NFJS about WebAssembly, an exciting new binary format for fast cross-platform implementation. He focuses on web architecture, resource-oriented computing, social networking, the Semantic Web, data science, 3D graphics, visualization, scalable systems, security consulting, and more, and has trained close to 3000 people worldwide on machine learning. He is also a rabid reader, devoted foodie and has excellent taste in music. If pressed, he might tell you about his International Pop Recording Career. Brian is a liberal arts-educated software engineer with a focus on forward-leaning technologies. He is the President of Bosatsu Consulting, Inc., a professional services company focused on web architecture, resource-oriented computing, the Semantic Web, scalable systems, security consulting and other technologies of the late 20th and early 21st centuries. Brian tweets regularly from @bsletten.
infiniteconf
datascience
white-box
black-box
neural-networks
About the speaker... Karl Surmacz
Karl is a Principal Data Scientist at McLaren Applied Technologies. After completing a PhD in Theoretical Physics, he joined McLaren Racing in 2008, where he worked as a Race Strategist and a Control Systems Engineer. He now works at McLaren Applied Technologies, where he has grown, developed and now leads the data science capability, and is heavily involved in McLaren’s industry-facing activities in Health, Transport, Motorsport, Automotive and Strategic Partnerships. His technical interests lie in mathematical modelling and statistical machine learning.
12:30
Lunch
14:00
KEYNOTE
During this talk, you will discover the new requirements that streaming imposes on systems and how they are met by combinations of popular tools like Kafka, Spark, Flink, and Akka. Dean will argue that streaming and microservice architectures are actually converging.
bigdata
spark
infiniteconf
stream-processing
akka
About the speaker... Dean Wampler
Dean leads the engineering team for the Accelerated Discovery Platform in IBM Research. He is an expert in data systems, Scala, and software engineering practices. Previously he worked for Domino Data Lab, Anyscale, and Lightbend. Follow Dean on Twitter @deanwampler and LinkedIn at /deanwampler.
14:45
Coffee Break
15:00
The company uses reinforcement learning and Bayesian optimisation to analyse customer feedback data, collected via a Facebook Messenger bot. This data is then used to tell IntelligentX’s master brewer what to brew next, so that each new version is more finely tuned to customers' tastes. Join IntelligentX co-founder Rob McInerney as he shares how they came up with the idea, how it works and what the future holds for ML-based consumer products. Find out more on IntelligentX here.
bigdata
reinforcement-learning
bayesian-optimization
artificial-intelligence
ai
infiniteconf
About the speaker... Rob McInerney
In 2015, Rob founded Intelligent Layer to re-envisage the relationship between human beings and intelligent technology, underpinned by his belief that AI can help us navigate a world that is changing ever faster. Last year, he created IntelligentX Brewing Company, a self-evolving beer brand that uses AI to optimise beer recipes based on customer feedback, which Popular Science voted “the third greatest software innovation of 2016”. Intelligent Layer recently graduated from Techstars in New York, one of the world’s top accelerator programmes. Rob regularly contributes to discussions around AI and machine learning and has been featured in major publications including Time, Wired, Forbes, The Guardian and The Huffington Post.
Neural networks were first created in an attempt to mimic the biological neural networks in the human brain [3]. Such machine learning models are now used extensively in many prediction problems thanks to their multi-layer structures and their ability to generalize based on a plethora of different parameters, compensating for many of their initial weaknesses (including their tendency to overfit or underfit). Advances in computing power, and specifically the use of GPUs, have allowed such models to run at greater speeds [4], shaping today’s deep learning. Join Marios and learn about the StackNet model: a computational, scalable and analytical framework implemented in Java that resembles a feedforward neural network and uses Wolpert’s stacked generalization [5] in multiple levels to improve accuracy in classification problems. Unlike a feedforward neural network, StackNet is not trained through backpropagation; instead, the network is built iteratively, one layer at a time, using stacked generalization. StackNet’s ability to improve accuracy is demonstrated by creating StackNet models with different numbers of levels and architectures, which are then used to rank the likelihood of a song having been created before or after 2002, using 90 numerical attributes for 515,345 songs from a subset of the Million Song Dataset [1].
[1] Bertin-Mahieux, T., Ellis, D. P., Whitman, B., & Lamere, P. (2011, October). The Million Song Dataset. In ISMIR (Vol. 2, No. 9, p. 10).
[2] Koren, Y. (2009). The BellKor solution to the Netflix Grand Prize. Netflix Prize documentation, 81, 1-10.
[3] Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386.
[4] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.
[5] Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241-259.
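To make the stacked generalization idea concrete, here is a minimal two-level sketch in pure Python. It is not StackNet's code, and it uses regression rather than classification for brevity: the two base learners (a 1-D least-squares line and a 1-nearest-neighbour model), the 3-fold split and the grid-searched blending weight used as the meta-learner are all simplifying assumptions. What it does show is the key mechanism: each level is trained on out-of-fold predictions from the previous level, not through backpropagation.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Base learner 1: ordinary least-squares line in one dimension."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_1nn(xs, ys):
    """Base learner 2: predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def out_of_fold(fit, xs, ys, k=3):
    """Predict each point with a model that never saw it during training."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        held = [i for i in range(len(xs)) if i % k == fold]
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held:
            preds[i] = model(xs[i])
    return preds


def sse(preds, ys):
    """Sum of squared errors."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys))


def stack(xs, ys, k=3, steps=20):
    """Level 1: out-of-fold base predictions. Level 2: a blend weight fitted on them."""
    f1 = out_of_fold(fit_linear, xs, ys, k)
    f2 = out_of_fold(fit_1nn, xs, ys, k)
    # Meta-learner: pick the convex weight that best combines the level-1 outputs.
    w = min((i / steps for i in range(steps + 1)),
            key=lambda wt: sse([wt * a + (1 - wt) * b for a, b in zip(f1, f2)], ys))
    # Refit the base learners on all data for use at prediction time.
    m1, m2 = fit_linear(xs, ys), fit_1nn(xs, ys)
    return (lambda x: w * m1(x) + (1 - w) * m2(x)), w, f1, f2


xs = list(range(12))
ys = [2 * x + (1 if x % 2 else -1) for x in xs]
model, w, f1, f2 = stack(xs, ys)
print(f"blend weight on the linear model: {w:.2f}, prediction at 5.5: {model(5.5):.2f}")
```

Adding more levels repeats the same pattern: each new level treats the previous level's out-of-fold predictions as its input features.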
bigdata
infiniteconf
stacked-generalisation
stacking
ensemble-modelling
About the speaker... Marios Michailidis
Marios Michailidis is a research data scientist at H2O.ai and a part-time PhD student in machine learning at University College London (UCL), with a focus on ensemble modelling. He has worked in both the marketing and credit sectors in the UK market and has led many analytics projects on themes including acquisition, retention, uplift, fraud detection, portfolio optimization and more. He created KazAnova, a GUI for credit scoring made 100% in Java, and is the creator of the StackNet Meta-Modelling Framework. In his spare time he loves competing in data science challenges and was ranked 1st out of 500,000 members on the popular Kaggle.com data competition platform. StackNet's official repo is here. Find here a blog post on the StackNet methodology winning a popular data science (Kaggle) competition. Here you can find a blog post about Marios being ranked top on Kaggle out of 470,000 data scientists, sharing knowledge, tricks and ideas. Finally, Marios' website can be found here, with more info on other related (free) software he has developed in the past for predictive analytics.
artificial-intelligence
artificial neural networks
scala
akka
real-time data transformations
About the speaker... Maciej Gorywoda
I’m working on the Scala plugin team at JetBrains, in Berlin. My hobby projects involve making Android apps and educational YouTube videos. I'm here to convince you that you don’t need Category Theory.
15:45
Coffee Break
16:00
He will also briefly cover social networking applications, recommendation engines, graph visualisations and Java Unmanaged Extensions in Neo4j. He will point out the potential pitfalls that lie in wait and how to avoid them. What’s a good architecture and modelling pattern to work towards? Matt and his team have been using Neo4j full time for 18 months and will share what has worked for them, as well as some of their glorious and spectacular failures. Warning: contains hairballs, Java, Python, React.js and Kiwis.
neo4j
infiniteconf
fullstack
bigdata
visualisation
java
python
react.js
About the speaker... Matt Wright
Matt tweets at @mrmattwright.
Join Daniele and Eleanor in this talk and learn how to build an AI for Pac-Man using Monte Carlo Tree Search. With the help of D3.js you can verify the correctness of the algorithm and get inside the mind of the machine.
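For readers new to Monte Carlo Tree Search, the sketch below shows its four phases (selection with the UCB1 rule, expansion, random rollout, backpropagation) on a deliberately tiny, hypothetical single-agent problem: walk along a corridor to reach a pellet at position `GOAL` within `HORIZON` moves, with the reward discounted by the number of moves used. It is not the speakers' Pac-Man AI; a real Pac-Man agent would also have to model the ghosts' moves, and the exploration constant `c` and the iteration count are arbitrary choices.

```python
import math
import random

GOAL, HORIZON = 3, 6  # hypothetical corridor length and move budget


def legal_actions(state):
    pos, steps_left = state
    return [] if pos == GOAL or steps_left == 0 else [-1, +1]


def step(state, action):
    pos, steps_left = state
    return (min(max(pos + action, 0), GOAL), steps_left - 1)


def reward(state):
    """Discounted reward: reaching the pellet sooner is worth more."""
    pos, steps_left = state
    return 0.9 ** (HORIZON - steps_left) if pos == GOAL else 0.0


class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.untried = legal_actions(state)
        self.visits, self.value = 0, 0.0

    def ucb_child(self, c):
        # UCB1: average reward plus an exploration bonus for rarely tried moves.
        return max(self.children,
                   key=lambda ch: ch.value / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))


def mcts(root_state, iterations=1000, c=0.7, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes.
        while not node.untried and node.children:
            node = node.ucb_child(c)
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            action = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(step(node.state, action), node, action)
            node.children.append(child)
            node = child
        # 3. Simulation: play random moves until the episode ends.
        state = node.state
        while legal_actions(state):
            state = step(state, rng.choice(legal_actions(state)))
        result = reward(state)
        # 4. Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value += result
            node = node.parent
    # The most visited move is the most robust choice.
    return max(root.children, key=lambda ch: ch.visits).action


print(mcts((0, HORIZON)))
```

Returning the most visited move rather than the highest average is a common, conservative choice; the visit counts stored in the tree are also exactly the kind of data a D3.js view could render to show what the search is "thinking".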
bigdata
infiniteconf
visualisation
artificial-intelligence
ai
d3.js
javascript
games
About the speakers... Daniele Polencic
Daniele is a technical consultant at learnk8s.io and a certified Kubernetes administrator and authorised trainer partner for Kubernetes and the Linux Foundation. He’s passionate about solving problems and programming, particularly in JavaScript. In the last decade, Daniele has trained developers for companies in e-commerce, finance and the public sector. When he isn’t writing code, he advises startups in the London tech scene. Daniele tweets at @danielepolencic.
Predicting congestion is an important part of any traffic management system. With accurate forecasting, traffic can be regulated effectively, ensuring safe and fast journeys on the roads. Join Oliver as he shares a deep learning model which accurately forecasts congestion based on road sensor data from Transport for London (TfL). The IoT (Internet of Things) nature and scale of the raw sensor data (5 TB, 120 billion rows) demand extensive preprocessing as a first step towards a predictive traffic model. Oliver and his team used Apache Beam for this task, as it let them create efficient data pipelines that can be executed on distributed frameworks such as Apache Spark or Apache Flink. Beam natively handles streaming workloads, which makes it an ideal candidate for large-scale preprocessing of real-time data such as streams from road sensors. The preprocessed data was then used to train a neural network to predict congestion ahead of time. A recurrent neural network (RNN) was chosen to model the traffic time series for each of the sensors. This deep learning architecture was implemented in TensorFlow, and it allowed the team to accurately model the time series, including the correlations between sensors on the road network. With this end-to-end example, Oliver will demonstrate how Beam and TensorFlow can be used to build predictive models for time series data. More information on the project (including some results and images) can be found here and here.
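Before a recurrent model can learn from a sensor's readings, each time series is typically cut into fixed-length input windows paired with a future target. The abstract does not detail the team's Beam preprocessing steps, so the helper below is only a hypothetical pure-Python illustration of that windowing idea for a single sensor; the `lookback` and `horizon` parameters and the sample readings are assumptions.

```python
def make_windows(series, lookback, horizon=1):
    """Pair each run of `lookback` readings with the reading `horizon` steps later."""
    samples = []
    for end in range(lookback, len(series) - horizon + 1):
        inputs = series[end - lookback:end]  # the most recent `lookback` readings
        target = series[end + horizon - 1]   # the value to forecast
        samples.append((inputs, target))
    return samples


# Hypothetical congestion readings from one road sensor.
readings = [0.2, 0.3, 0.5, 0.7, 0.6, 0.4]
for inputs, target in make_windows(readings, lookback=3):
    print(inputs, "->", target)
```

In a distributed pipeline the same per-sensor logic would run after grouping readings by sensor, which is what makes it a natural fit for a framework such as Beam.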
beam
tensorflow
infiniteconf
bigdata
big-data
deep-learning
About the speaker... Oliver Gindele
Oliver studied Materials Science at ETH Zurich and moved to London to obtain his PhD in computational physics from UCL. Passionate about using computer models to solve real-world problems, he joined Datatonic to work on bespoke machine learning solutions. Working with clients in retail, finance and telecommunications, Oliver applies deep learning to tackle some of the most challenging use cases in these industries. Oliver tweets at @tinyoli, and the website of Datatonic can be found here.
16:45
Coffee Break
17:00
datascience
infiniteconf
bigdata
About the speaker... Samantha Ahern
Samantha tweets at @2standandstare.
Given the flexibility and the advantages of GP models, it is not surprising that recent research has focused on extending them to Big Data problems. Join Roberta as she gives an overview of Gaussian Processes (GPs) in a machine learning context and shows how they can be used for Big Data problems. Explore the basic definitions of a Gaussian distribution and of GPs, with particular attention to covariance functions. Roberta will then explain how GPs are used for inference problems and present relevant application areas, their advantages and their limitations.
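As a worked illustration of GP inference with a covariance function, the following pure-Python sketch computes the posterior of a zero-mean GP with a squared-exponential (RBF) kernel: mean = k*ᵀ(K + σ²I)⁻¹y and variance = k(x*, x*) - k*ᵀ(K + σ²I)⁻¹k*, using a Cholesky factorisation of the noisy covariance matrix. The kernel, its hyperparameters and the noise level σ² are assumptions, not taken from the talk. The O(n³) factorisation is the kind of cost that Big Data extensions of GPs aim to reduce.

```python
import math


def rbf(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance: nearby inputs are strongly correlated."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper_t(low, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(xs, ys, x_star, noise=1e-2, **kernel_args):
    """Posterior mean and variance of f(x_star) for a zero-mean GP."""
    n = len(xs)
    # K + noise * I: covariance of the noisy observations.
    k_mat = [[rbf(xs[i], xs[j], **kernel_args) + (noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    low = cholesky(k_mat)
    alpha = solve_upper_t(low, solve_lower(low, ys))  # (K + noise*I)^-1 y
    k_star = [rbf(x_star, x, **kernel_args) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = rbf(x_star, x_star, **kernel_args) - sum(vi * vi for vi in v)
    return mean, var


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [math.sin(x) for x in xs]
for x in (0.5, 1.0, 10.0):
    mean, var = gp_predict(xs, ys, x)
    print(f"x={x}: mean={mean:.3f}, variance={var:.3f}")
```

Near the data the posterior variance collapses towards the noise level; far from it the prediction reverts to the prior (mean 0, variance 1), which is the behaviour the covariance function controls.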
gaussian-process
big-data
bigdata
infiniteconf
About the speaker... Roberta Cretella
Roberta studied Bayesian inference and mathematical biology at the University of Glasgow but then decided to leave academia and become a data scientist. She built her analytical and modelling skills at Ocado Technology for three years and then moved to ICLP Loyalty in September 2016. In 2017 she joined Camelot UK, embracing the challenge of building the Big Data capabilities and the Data Science team from the ground up. She specialises in mathematical modelling, machine learning and Bayesian statistics.
Join Mark for a practical session with examples of R code. If the veracity, variety and value of data matter more to you than volume and velocity, then this talk is for you.
r
small-data
datascience
infiniteconf
About the speaker... Mark Wilcock
Mark tweets at @LonBIzAnalytics.
17:15
This software application combines state-of-the-art cross-domain data collection, integration and visualisation capabilities with advanced features such as notifications about changes in the projects a user has subscribed to, dashboards, and filters by geographical opportunity area, funding and project cost, among others. Over 12,000 pieces of city data (open, private and commercial) relevant to infrastructure planning and delivery are currently included in the tool, covering the years 2015 to 2050. The datasets include projected growth (population and jobs), planning frameworks, development proposals, infrastructure projects, infrastructure capacity, environmental threats (such as flooding information) and skills requirements. Ultimately, this solution allows infrastructure providers to identify opportunities to deliver infrastructure jointly, reducing the potential for disruption and driving down the cost of construction.
bigdata
infiniteconf
infrastructure
About the speaker... Dr Larissa Romualdo-Suzuki
Dr Larissa tweets at @LariRomualdo.
At Onfido, they apply computer vision and machine learning techniques to detect physical and digital fraud in identity documents. When it comes to fraud detection, they’re often dealing with hugely unbalanced datasets. Moreover, they only understand what they've seen before: if they notice something different, how do they know whether it’s really fraudulent, and how confident can they be in that decision? Join Jacques for this lightning talk as he explores the problem of fraud detection, highlights some of the biggest challenges they face in this area and covers some of Onfido's efforts to circumvent these challenges, touching on anomaly detection and data generation.
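One common, generic remedy for hugely unbalanced training data is to weight each class inversely to its frequency, so that the rare fraudulent examples count as much in the training loss as the abundant genuine ones. The sketch below uses the heuristic `n_samples / (n_classes * class_count)`; it is an illustrative assumption, not a description of Onfido's method, and the labels are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Hypothetical labels: 98 genuine documents and 2 fraudulent ones.
labels = ["genuine"] * 98 + ["fraud"] * 2
weights = balanced_class_weights(labels)
print(weights)  # fraud is weighted 25.0, genuine about 0.51
```

With these weights each class contributes the same total (98 × 100/196 = 2 × 25 = 50). Reweighting does not, however, address the second challenge raised in the talk, recognising kinds of fraud never seen before, which is where anomaly detection comes in.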
17:30
#infiniteconf Party!
21:00
End of #infiniteconf 2017 Day 1!
Day 2 - Friday 7th July
Join us for these awesome sessions!
Tracks: CTRL | ALT | TAB | SHIFT
08:30
Registration & Breakfast
09:15
Welcome to #infiniteconf 2017
09:30
KEYNOTE
infiniteconf
datascience
data
data-engineering
fast-data
About the speaker... Dave Thomas
Dave Thomas has a wide spectrum of experience in the software industry as an executive, investor, board member, consultant, architect, and engineer. He is Chairman of Bedarra Corp, which provides consulting on technology and business strategy for emerging technology, products and services. Bedarra provides virtual CTOs, as well as directors, advisors and mentors to support new initiatives. Dave served as Chief Scientist/CSO of Kx Systems, First Derivatives Plc. He co-founded Bedarra Research Labs, creators of the Ivy visual analytics workbench. Dave is the founder of YOW! Conferences and Workshops. He was Managing Director of Object Mentor, a company specializing in the training and deployment of Agile and Object-Oriented Software Development methodologies. Dave has repeatedly demonstrated how to deliver quality software on time and on budget. He is best known as the founder and past CEO of Object Technology International Inc. (formerly OTI, now IBM OTI Labs), where he led the commercial introduction of object and component technology. The company is often cited as the ideal model of a software technology company and was a pioneer in Agile Product Development with a process called Just-In-Time Software. Throughout his career, he has worked with major global corporations as well as startups on business and technical strategy, development organization and process, as well as competitive analysis and strategy. Dave has a unique ability to translate research breakthroughs into competitive products. He has been a pioneer in the development of embedded systems, object-oriented technology and functional programming. He was the principal visionary and architect for IBM VisualAge Smalltalk and Java tools and virtual machines, including the popular open-source, multi-language Eclipse.org IDE. OTI pioneered the use of virtual machines in embedded systems, with Tektronix shipping the first commercial products in 1988.
He was instrumental in establishing IBM’s Pervasive Computing efforts. He is a popular, humorous, albeit opinionated keynote speaker with a unique breadth of business experience and technical depth. He is widely published in software engineering literature. Dave remains active in various roles within the technical community including YOW, ECOOP, AOSD, JAOO, Agile Development Conference, OOPSLA Onward, ENASE and Dynamic Language Symposium. He is an adjunct research professor at Carleton University and the Queensland University of Technology as well as a founding director of the Agile Alliance, an ACM Distinguished Engineer, past President of AITO and IEEE Software Advisory board.
10:30
Coffee Break
10:45
The three main lessons you will learn are: 1) the go-to data-layer infrastructure built around Kafka, Onyx and aggressive use of materialized views, with emphasis on how to build a system that requires relatively little effort upfront but can grow with one's needs; 2) the problem of managing data, and a case for declarative and queryable data descriptions. These can be used as the basis for automatic materialized view inference, where specialized views and data crossings are inferred from raw incoming data or other views based on a combination of heuristics, statistical analysis (seasonality, outlier removal, ...) and predefined ontologies. This is a practical way to maintain a large number of views, increasing their availability and abstracting the complexity into declarative rules, rather than maintaining an ETL pipeline with dozens or even hundreds of hand-crafted tasks; 3) how and why Clojure is a natural choice for tasks that involve a lot of data manipulation, touching on both functional programming and Lisp specifics such as code-is-data.
infiniteconf
stream-processing
data-engineering
introspection
bigdata
About the speaker... Simon Belak
Currently, Simon is working hard to become obsolete at Metabase, where he is trying to build an artificial data scientist and imbue visualisations with understanding and context. Simon tweets at @sbelak.
Join Robert as he demonstrates Dataflow’s capabilities through a real-time demo with practical insights on how to manage and visualize streams of data.
bigdata
datascience
infiniteconf
apache-beam
dataflow
data
bigdata-streaming
About the speaker... Robert Kubis
Before joining Google, Robert gained over ten years of experience in software development and architecture. He has driven multiple full-stack application developments at SAP, with a passion for distributed systems, containers and databases. In his spare time he enjoys following tech trends and good restaurants, traveling and improving his photography skills. Follow Robert at @hostirosti.
TensorFlow is an increasingly popular open source Machine Intelligence library that is especially well-suited for deep learning. The Google Cloud Machine Learning Engine (Cloud ML Engine) lets you do distributed training and serving of your TensorFlow models at scale. Join Yufeng as he kicks off with an introduction to TensorFlow concepts and then walks through how to use Cloud ML Engine for distributed training and scalable serving of your trained models. You will explore the design decisions available to you when scaling up your machine learning, along with their trade-offs.
infiniteconf
bigdata
data
machinelearning
cloud
googlecloud
deeplearning
About the speaker... Yufeng Guo
He enjoys hearing about new and interesting applications of machine learning, so share your use case with him on Twitter at @YufengG.
architecture
bigdata
datascience
hadoop
infiniteconf
About the speaker... Alex McLintock
Alex tweets at @alexmc6.
11:30
Coffee Break
11:45
infiniteconf
bigdata
cuda
opencl
gpgpu
About the speaker... Brian Sletten (see his bio above)
Streaming real-time datasets from sensors in connected cars, wearables and IoT devices into Power BI dashboards is an affordable cloud solution: using Cortana Intelligence Suite features such as IoT Hub, Event Hubs, HDInsight, Stream Analytics and Machine Learning, you can ingest, clean, store, analyse and enhance your datasets for visualisation in Power BI, and end users can spin the stack up, consume it and then pause it as and when required, without large IT infrastructure overhead costs. David Moss, Microsoft MVP Data Platform and an Azure Cortana Intelligence and Power BI end-to-end Solutions Architect, will teach you how to fire up this stack in 45 minutes. This is an essential skill that all IoT practitioners, data scientists and AI learners need in their toolset.
iot
azure
cortana
cloud
powerbi
About the speaker... David Moss
David is a Microsoft MVP (that stands for Most Valuable Professional; it's like an Oscar for MS BI). He delivers end-to-end Power BI solutions on the Microsoft BI stack, both on Azure and on premises, across e-commerce, manufacturing, IoT, pharma and gambling verticals. David is also a Power BI trainer. He helps to run the London Power BI User Group and the Manchester Power BI Meetup, and sits on the Board of Advisors of the official global pbiusergroup.com. David is also available for public presentations about Power BI across many verticals, so do connect here and via LinkedIn. David tweets at @wottabyte; check out his website too.
Lindsey and Phil will share with you how the Guardian has used a range of technologies including Apache Spark and PrestoDB on AWS to support simple ingestion and fast querying of a wide range of datasets. Learn why it’s important to decouple storage from compute and raw data sources from optimised query formats and why there’s still no single perfect solution.
agile
bigdata
datascience
infiniteconf
About the speakers... Philip Wills
He has helped build and scale theguardian.com, the tools used to produce it, and Ophan, the analytics tool used to ensure our journalism reaches the widest possible audience. Within the team he’s driven the adoption of Scala and Continuous Delivery, which he’s written about as part of Build Quality In. Phil tweets at @philwills.
Join Alan as he presents a new type of ensemble method that uses information about how the deep learning algorithm is structured to improve training times, diversity generation and, ultimately, accuracy. Alan will also give examples of existing methods from his research at Birkbeck and present experimental results.
12:30
Lunch
14:00
KEYNOTE
IoT and event-based systems can process huge volumes of data, which typically need to be stored and read in near real time for event processing, in addition to being read in bulk to feed data-hungry learning systems. Apache Cassandra provides a high-performance, scalable and fault-tolerant database platform with excellent support for the time series data models typically seen in IoT systems. Its millisecond (or better) latency can support systems that react to events in real time, while scalable bulk reads via batch processing systems such as Apache Hadoop and Apache Spark can support learning applications. These features, and more, make Cassandra an ideal persistence platform for modern data-intensive, event-driven systems. In this talk Aaron Morton, CEO at The Last Pickle, and Jon Haddad, Principal Consultant at The Last Pickle, will discuss lessons learned using Cassandra for IoT systems. They will explain how Cassandra fits into the modern technology landscape and dive into data modelling for common IoT use cases, capacity planning for huge data loads, tuning for high performance, and integration with other data-driven systems. Whether starting a new project or deep in the weeds of an existing system, attendees will leave with an understanding of how Apache Cassandra can help build robust infrastructure for IoT systems.
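A widely used way to model IoT time series in Cassandra is to bucket each sensor's readings by time, so that no single partition grows without bound. The sketch below assumes a hypothetical `readings` table keyed by `(sensor_id, day)` and clustered by timestamp (the CQL appears in a comment); the Python helpers only compute which partition a reading would land in. Whether the speakers recommend daily buckets or some other granularity is not stated in the abstract.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical CQL schema for the bucketing pattern (not from the talk):
#   CREATE TABLE readings (
#       sensor_id text, day text, ts timestamp, value double,
#       PRIMARY KEY ((sensor_id, day), ts)
#   ) WITH CLUSTERING ORDER BY (ts DESC);


def partition_key(sensor_id, ts):
    """One partition per sensor per UTC day keeps partition size bounded."""
    return (sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


def group_by_partition(readings):
    """Group (sensor_id, ts, value) tuples by the partition they would occupy."""
    partitions = defaultdict(list)
    for sensor_id, ts, value in readings:
        partitions[partition_key(sensor_id, ts)].append((ts, value))
    return dict(partitions)


readings = [
    ("s1", datetime(2017, 7, 7, 23, 59, tzinfo=timezone.utc), 21.5),
    ("s1", datetime(2017, 7, 8, 0, 1, tzinfo=timezone.utc), 21.4),
    ("s2", datetime(2017, 7, 7, 12, 0, tzinfo=timezone.utc), 19.0),
]
for key, rows in group_by_partition(readings).items():
    print(key, len(rows))
```

The bucket size is a trade-off: smaller buckets keep partitions small for fast real-time reads, while larger ones mean fewer partitions to touch in bulk reads for learning jobs.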
infiniteconf
iot
data
datascience
apache
apache-spark
cassandra
About the speakers... Jon Haddad
Check out Jon's GitHub, follow him on Twitter at @rustyrazorblade and see what he is up to on his blog.
Aaron Morton
Follow Aaron on Twitter and find out more about his company, The Last Pickle.
14:45
Coffee Break
15:00
bigdata
infiniteconf
serverless
functional-programming
About the speaker... David Pollak: David has written a popular framework in Scala (Lift) and a book about Scala (Beginning Scala). He's been using Scala since 2006, and more recently has picked up Clojure. David has also started work on his Plugh framework and founded Visi.Pro, Cloud Computing for the Rest of Us, along with the Visi Language open source project.
Join Kaz and discover how you can combine Cloud Machine Learning and BigQuery to realize this vision. Through a demo, you'll see how BigQuery's power of "democratizing enterprise data" can be enhanced with a deep neural network model trained with Cloud Machine Learning.
infiniteconf
data
deeplearning
machinelearning
dataanalytics
datawarehouse
About the speaker... Kaz Sato: Kaz tweets at @kazunori_279.
This presentation shares the experience of implementing these pipelines on a real-world big data project. At Millesime.ai, the team uses Spark to predict fine wine market prices and trading strategy for the next 6 months and beyond. Through this use case, you'll explore the core concepts and principles behind machine learning pipelines, which algorithms Spark MLlib covers, and how to implement them using Transformers, Estimators, and Model Selection through Hyperparameter Tuning. Finally, we'll give an objective summary of the lessons learnt developing this project: the good, the bad and the ugly, with some common pitfalls we've run into and how to avoid them.
Duncan will start by defining linked data, and the value it holds for a data scientist. You'll explore use cases that show how analysing connections using specific data mining and visualisation techniques can help you gain deeper understanding of a topic. Most of this talk will describe Duncan's own experience building an interactive tool for visualising the connections between different music genres. Join Duncan and see how he used SPARQL to mine and query data from DBPedia, and then JavaScript code libraries, including d3, to visualise the connections. He’ll include code snippets, and publish some source code on GitHub for you to try. He’ll share what he learned together with best practices you should follow when designing your own visualisation.
infiniteconf
data-visualisation
javascript
d3
sparql
linked-data
dbpedia
About the speaker... Duncan Grant: Duncan works at Cambridge Intelligence, where he helps companies build attractive and insightful visualisations of their data using the KeyLines JavaScript library. Find Duncan on Twitter at @TheWebNomad.
15:45
Coffee Break
16:00
infiniteconf
machine-learning
deep-learning
neural-networks
javascript
About the speaker... Tom Martin: Tom blogs at tpgmartin.com.
Asking Data Scientists to be responsible for both Analytics and Engineering is to combine roles as diverse as those of a benign hacker and a DBA. At best, one person will be an amateur in one of those fields, and charging one person with the conflicting responsibilities of discovery and operations underrates one of them, to the disadvantage of the organisation. Join Kenneth for an overview of Big Data issues and solutions in respect of:
o Sharing research on the separation of responsibilities for Analytics and Engineering
o Data Acquisition and Managing the “Data Lake”
o Operational Algorithms becoming MISSION CRITICAL
o Ontology
o Translation of Data Models from Conceptual to schemas for Object Relational, Document, RDF, Graph, etc.
o Python v Scala v R, etc.
o Movement of data
o Architecture for timely, robust and sustainable operations
bigdata
infiniteconf
datascientist
About the speaker... Kenneth Hansen: Ken’s most recent public training was on “Data Modelling for SQL & NoSQL” at the IRM ED&BI Conference Europe, November 2016. Ken tweets at @Ken_Hansen.
In this talk Marco will describe the intuitions behind this family of algorithms, you'll explore some of the Python tools that allow us to implement modern NLP applications and we'll conclude with some practical considerations.
python
bigdata
infiniteconf
machine-translation
nlp
natural-language-processing
About the speaker... Marco Bonzanini: Marco is the author of Mastering Social Media Mining with Python (PacktPub, July 2016).
infiniteconf
bigdata
graphics
machine-learning
deep-learning
ai
About the speaker... Jim Webber: Jim has written two books on integration and distributed systems: “Developing Enterprise Web Services” on XML Web Services and “REST in Practice” on using the Web for building large-scale systems. His latest book is “Graph Databases”, which focuses on the Neo4j database. His blog is located at http://jimwebber.org and he tweets often as @jimwebber. Find out more about Neo Technology and Neo4j on their website.
16:45
Coffee Break
17:00
KEYNOTE
17:45
End of #infiniteconf 2017!
The Agile Data Warehouse - Beginners
Featuring Philip Wills and Lindsey Dew
How can a small team with a limited budget enable the analysis of large volumes of data in a world of constantly changing requirements?
agile bigdata datascience infiniteconf -
AI for problem-solving - Beginners/Intermediate
Featuring Alison B. Lowndes
Building upon the foundational understanding of deep learning, Alison's talk will cover a wide variety of applications of artificial intelligence for problem-solving and how you can both get started and become proficient with NVIDIA’s hardware, open-source software & classes.
deeplearning infiniteconf bigdata gpu artificial-intelligence neural-networks -
Ethical conundrums of an educational data scientist - Beginners
Featuring Samantha Ahern
Join Samantha for a lightning talk exploring the joy and pain of working on an exploratory learning analytics project in higher education (HE): the joy of having lots of data to play with and access to some cool toys, e.g. HPC, and the pain of data cleansing, ethics and IG compliance.
datascience infiniteconf bigdata -
Is Julia the Future for Big Data Analytics?
Featuring Malcolm Sherrington
Julia was originally seen as a replacement for programming languages such as Matlab when tackling problems in the scientific domain. In the world of Big Data, Python and R are currently seen to stand supreme. However the growth of data over the next five years, both structured and...
julia bigdata datascience infiniteconf data -
Google Dataflow: The new open model for batch and stream processing - Beginners
Featuring Robert Kubis
In 2004 MapReduce was introduced, a model that kick-started big data. 10 years later, Google published Dataflow, a new paradigm integrating batch and stream processing in one common abstraction. This time it was more than a paper: it was also an open source Java SDK and a cloud managed service to...
bigdata datascience infiniteconf apache-beam dataflow data bigdata-streaming -
Word Embeddings for Natural Language Processing with Python - Beginners
Featuring Marco Bonzanini
Word embeddings are a family of Natural Language Processing (NLP) algorithms where words are mapped to vectors in low-dimensional space. The interest around word embeddings has been on the rise in the past few years, because these techniques have been driving important improvements in many NLP...
python bigdata infiniteconf machine-translation nlp natural-language-processing -
Machine Learning at scale: TensorFlow in the Cloud - Intermediate
Featuring Yufeng Guo
Moving the heavy lifting of machine learning to the cloud is a great way to scale both training and prediction. This session will step through this process in detail, so that you'll be ready to scale your machine learning tasks.
infiniteconf bigdata data machinelearning cloud googlecloud deeplearning -
Solving speech recognition - Intermediate
Featuring Sébastien Bratières
“Solving speech recognition” is one of the achievements which brought deep learning to the front stage in the last few years. However, the mechanics of how deep learning is applied in a speech engine are described less often in popular articles than, say, those of CNN image classification. In this...
datascience infiniteconf -
Neural Networks from Scratch - Beginners
Featuring Tom Martin
Neural networks are an extremely powerful and generalisable tool for analysing data, being a fundamental component of the exciting field of deep learning. But how do they work, and how can we implement them? Join Tom as he covers a neural network implementation in JavaScript from first...
infiniteconf machine-learning deep-learning neural-networks javascript -
TDD-ing a Bayesian classifier - Beginners
Featuring Robert Hardy
In this live coding session Robert Hardy will share how to code up a simple Bayesian classifier, using Python with the Pandas package. That in itself will be interesting to people unfamiliar with the principles behind a Bayesian classifier. However, Robert will also be showing how a TDD approach...
infiniteconf python pandas bayesian-classifier tdd good-code -
Keynote: Intelligent Empowerment: Transforming Businesses with Machine Intelligence
Featuring Danilo Sato
Technology advancements have always improved your productivity. From business process automation to the internet, mobile, and the digital revolution, companies had to adapt and reassess the role technology plays in their business. During this keynote, you will learn what ThoughtWorks believes to...
keynote infiniteconf machine-intelligence machine-learning ai aritificial-intelligence intelligent-empowerment technology-strategy -
Simulation in a Big Data World: Convergence of financial modelling in QuantLib and big-data technologies - Intermediate
Featuring Bojan Nikolic
QuantLib is a long-lived open-source library for pricing financial derivative contracts. It is in production use in a number of commercial organisations for both pricing and risk analysis and management purposes. This type of computing is best classified as <...> but as I will show in this...
finance data highly-scalable-computing computing infiniteconf bigdata -
Keynote: Fast Big Data - Enabling Financial Oversight
Featuring Dave Thomas
For the last decade, there has been increased concern about the integrity of capital markets. The crash of 2009, the coverage of flash trading, and weekly announcements of cyber attacks have justifiably shaken the public's confidence. However, fast big data also enables the good guys! In this...
infiniteconf datascience data data-engineering fast-data -
Keynote: Stream All the Things!
Featuring Dean Wampler
Stream processing is now driving the design of big data systems. Streaming architectures must be highly reliable and scalable as never before, more like microservice architectures.
bigdata spark infiniteconf stream-processing akka -
Playing Pacman with Monte Carlo Tree Search - Beginners
Featuring Eleanor Keane and Daniele Polencic
Writing algorithms for Artificial Intelligence is a lot of fun. You end up teaching your software how to move and learn in the environment and how to take actions to survive obstacles and threats. But this is still software. How do you know your AI is not as dumb as a doorknob? How do you test...
bigdata infiniteconf visualisation artificial-intelligence ai d3.js javascript games -
Architectures for Data Systems and Hadoop - Beginners
Featuring Alex McLintock
An Apache Hadoop cluster typically works with a wider software ecosystem to create a "data lake": a store of data with raw data coming in, and useful information being extracted. This talk describes just some of the many ways that Hadoop works with that ecosystem including Spark for...
architecture bigdata datascience hadoop infiniteconf -
Neo4j and Machine Learning - Full Stack Applications - Intermediate
Featuring Matt Wright
If you are considering using Neo4j as your core database, this is the talk for you. Matt will share his team's failures and successes in building search algorithms, text analytics, classification and machine learning to improve the ability to search documents using Neo4j.
neo4j infiniteconf fullstack bigdata visualisation java python react.js -
White-box Deep Learning Ensembles - Advanced
Featuring Alan Mosca
Ensemble methods are a great way to improve the accuracy of deep learning models. However, current ensemble methods all treat the underlying base learner as a "black box".
infiniteconf data datascience engineering -
Introducing StackNet Meta-Modelling Framework - Intermediate
Featuring Marios Michailidis
In 1992 Wolpert et al. introduced the concept of a meta model being trained on the outputs of various generalisers, with the aim of minimizing the generalization error of a target variable. This methodology, named stacked generalization, was used successfully to improve performances...
bigdata infiniteconf stacked-generalisation stacking ensemble-modelling -
Predicting congestion on London’s roads with Beam and Tensorflow - Intermediate
Featuring Oliver Gindele
The talk will be about a project Oliver and the Datatonic team did with TfL, where they used Apache Beam and Tensorflow to predict congestion. The talk will focus in detail on how and why these technologies were used in the use case at hand.
beam tensorflow infiniteconf bigdata big-data deep-learning -
CUDA, OpenCL and the GPGPU Revolution
Featuring Brian Sletten
The most basic introduction to computers spells it out: software runs on the CPU. At some point, this became not entirely true. Graphics cards now support general-purpose computing via APIs such as CUDA and OpenCL. This talk will introduce you to how and why you can take advantage of these...
infiniteconf bigdata cuda opencl gpgpu -
Using linked data to visualise the evolution of music - Intermediate
Featuring Duncan Grant
This is a talk about using freely available linked data: how to collect it, manage it, and visualise it.
infiniteconf data-visualisation javascript d3 sparql linked-data dbpedia -
Machine Learning and NLP
Featuring Brian Sletten
Machine Learning techniques are useful for analyzing numeric data, but they can also be useful for classifying text, extracting content and more.
machinelearning bigdata infiniteconf nlp -
Points of contact between big data engineering and machine learning - Intermediate
Featuring Sébastien Bratières
Machine learning is central to data science, and there deserves to be a dialogue between the corresponding communities.
bigdata dataengineering machinelearning -
Streaming Real-Time IoT Power BI Dashboards - Beginners
Featuring David Moss
The Internet of Things (IoT) has already connected billions of devices and machines and is here to stay. Learn how to ingest IoT streaming Big Data and integrate device-to-cloud solutions using the Azure IoT Hub and Cortana Intelligence Suite tools, with cold Big Data storage and hot downstream...
iot azure cortana cloud powerbi -
Building open visualisations of UK Government Expenditure data - Beginners
Featuring May Yong
What if you could look at the source code behind any online visualization, understand how it works, run it to check the results using the latest data and modify the parameters to explore different aspects that you are interested in?
bigdata open-data visualisation programming tools infiniteconf -
A Little Graph Theory for the Busy Data Scientist
Featuring Jim Webber
In this talk you’ll explore powerful analytic techniques for graph data. Firstly you’ll discover some of the innate properties of (social) graphs from fields like anthropology and sociology. By understanding the forces and tensions within the graph structure and applying some graph theory, you’ll...
infiniteconf bigdata graphics machine-learning deep-learning ai -
Spark Streaming Machine Learning Pipelines - The good, the bad and the ugly - Intermediate
Featuring Vincent Van Steenbergen
Apache Spark is now the de-facto framework for building end-to-end Machine Learning pipelines. It allows building pipelines over real-time streaming data and integrates all the data processing steps from ingestion, cleaning, training, testing, tuning and deploying your models in Scala, Python or...
infiniteconf bigdata machinelearning spark streaming -
Using R with small data - Beginners
Featuring Mark Wilcock
Big data yes, dark data perhaps, but small data – really? Small data is high-value data, often the key numbers or performance indicators that senior management use to guide the business. It has been collected at great effort and expense but is usually supplied in a format that is not...
r small-data datascience infiniteconf -
Functions 101 - Beginner
Featuring David Pollak
We hear the word "function" used in a lot of contexts. Serverless uses functions. Functional programming. Writing functions in a big data pipeline. Functional composition. So... what are these function things? Are they the same? Are they different? David Pollak, a geek who's been...
bigdata infiniteconf serverless functional-programming -
IntelligentX beer
Featuring Rob McInerney
IntelligentX creates the world’s first beer that’s been brewed by artificial intelligence.
bigdata reinforcement-learning bayesian-optimization artificial-intelligence ai infiniteconf -
Artificial Neural Networks in Akka - Beginner
Featuring Maciej Gorywoda
Artificial neural networks are not made only to play Go or distinguish tanks from cats in photos. Nowadays, in the world of BigData and TensorFlow, it's easy to forget what inspired them: neurons. In this talk you will take a closer look at them, both those made of jelly and their digital...
artificial-intelligence artificial neural networks scala akka real-time data transformations -
Data Science in Engineering Applications - Intermediate
Featuring Karl Surmacz
Machine learning algorithms are a powerful tool for exploiting large data sets in order to model and predict complex system and human behaviour. In this talk Karl will share practical examples where data science techniques can be used to enhance engineering design and control engineering processes...
infiniteconf datascience white-box black-box neural-networks -
Quantifying the Influence of Beautiful Environments on Human Wellbeing
Featuring Chanuki Illushka Seresinhe
Does spending time in beautiful settings boost people’s happiness? The answer to this question has long remained elusive due to a paucity of large-scale data on environmental aesthetics and individual happiness. Chanuki will explore two novel datasets: first, individual happiness data from the...
datascience machine-learning infiniteconf scenic-places deep-learning wellbeing urban-analytics -
Gaussian Processes for Big Data problems - Intermediate
Featuring Roberta Cretella
Gaussian Processes (GPs) underpin a range of algorithms for regression, classification and unsupervised learning. GPs are mathematically equivalent to many well known models, including Bayesian linear models, spline models, large neural networks (under suitable conditions), and are closely...
gaussian-process big-data bigdata infiniteconf -
A data layer in Clojure - Intermediate
Featuring Simon Belak
Clojure has always been good at manipulating data. With the release of spec and Onyx (“a masterless, cloud scale, fault tolerant, high performance distributed computation system”) good became best. In this talk Simon will walk you through a streaming data layer architecture built around Kafka and...
infiniteconf stream-processing data-engineering introspection bigdata -
Streaming Analytics with Apache Flink: Making data analytics more actionable - Beginners
Featuring Giuseppe d'Alessio
ING is using Apache Flink to create streaming analytics ('fast data') solutions. A cross-border team created a platform with Flink, Kafka and Cassandra that offers high throughput and low latency, ideally suited for complex and highly demanding use cases in the international bank, such...
fastdata apache flink scala infiniteconf spark actionable-insights streaming-analytics -
Mapping London's Infrastructure
Featuring Dr Larissa Romualdo-Suzuki
To enter the new era of smart city design and planning, the Greater London Authority's team has developed the Infrastructure 2050 Application Mapping application to support decision making and improved coordination of infrastructure investments between all the stakeholders...
bigdata infiniteconf infrastructure -
BigQuery and Cloud Machine Learning: advancing large-scale neural network predictions - Intermediate
Featuring Kaz Sato
The real value of BigQuery is not its speed. It's the power of "democratizing enterprise data." Because of BigQuery's scalability, you can isolate any workload on BigQuery from others. That means you can let non-engineers, such as sales, marketing, support and others, execute...
infiniteconf data deeplearning machinelearning dataanalytics datawarehouse -
Big Data needs more than just Data Scientists
Featuring Kenneth Hansen
Data scientists may be the cleverest and most expensive members of the analytics teams but the organisations that charge them with operational solutions overlook the distinct differences and tensions in the three fields of:
bigdata infiniteconf datascientist
-
Infiniteconf 2019 - A one-day community celebration of Big Data, Machine Learning and AI
One day in London
There is a symbiotic relationship between Big Data, Machine Learning and AI, and Infiniteconf is the go-to event to explore this connection alongside the communities pushing these topics forward.
ai graphml software-roadmap deep-learning ml real-world-applications ethics architecture adtech edtech fast-data big-data neo4j python spark kafka iot machine-learning data -
Infiniteconf 2018 - The conference on Big Data and AI
Two days in London
Artificial Intelligence is having a dramatic impact on all industries and improving productivity at an exponential rate. Big data is transforming almost every aspect of science and the humanities, driven by the emergence of a data society. Together, Big Data and AI are the driving forces in...
ai fast-data big-data data-science neo4j d3js cassandra python flink spark kafka smack data-visualisation cluster-computing connected-data iot machine-learning data