I will show a simple Bayesian statistical model for which the posterior probability distribution is equivalent to the output of the Fourier transform.
I will show how this model can be implemented and run in probabilistic programming languages.
Although this probabilistic approach is much slower than the Fast Fourier Transform, I will discuss some use cases in which it may be advantageous.
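To make the claimed equivalence concrete, here is a minimal sketch under stated assumptions. It is not the speaker's model or code. It assumes evenly sampled data modelled as a sum of sinusoids at the Fourier frequencies with Gaussian noise and independent Gaussian priors on the amplitudes. The names `bayesian_fourier_posterior` and `dft` and the parameters `noise_var` and `prior_var` are hypothetical. Rather than using a probabilistic programming language such as Stan or PyMC, which could be used to sample this posterior, the sketch exploits the orthogonality of the sinusoids on a uniform grid to write the conjugate Gaussian posterior in closed form.

```python
import cmath
import math


def bayesian_fourier_posterior(y, noise_var=1.0, prior_var=1e6):
    """Gaussian posterior over sinusoid amplitudes for evenly sampled data y.

    Illustrative model (an assumption, not necessarily the talk's model):
        y[t] = sum_k a_k cos(2 pi k t / N) + b_k sin(2 pi k t / N) + noise
        noise ~ Normal(0, noise_var),  a_k, b_k ~ Normal(0, prior_var)
    for k = 1 .. N/2 - 1 with N even (DC and Nyquist terms omitted for brevity).
    On a uniform grid these basis vectors are orthogonal, each with squared
    norm N/2, so the posterior factorises into independent 1-D Gaussians.
    Returns {k: ((mean_a, sd_a), (mean_b, sd_b))}.
    """
    n = len(y)
    if n % 2:
        raise ValueError("this sketch assumes an even number of samples")
    posterior = {}
    for k in range(1, n // 2):
        # Project the data onto the cosine and sine basis vectors at frequency k.
        cos_proj = sum(y[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        sin_proj = sum(y[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        # Posterior precision = likelihood precision + prior precision.
        precision = (n / 2) / noise_var + 1.0 / prior_var
        sd = math.sqrt(1.0 / precision)
        posterior[k] = (
            ((cos_proj / noise_var) / precision, sd),
            ((sin_proj / noise_var) / precision, sd),
        )
    return posterior


def dft(y):
    """Direct O(N^2) discrete Fourier transform, used only for comparison."""
    n = len(y)
    return [
        sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical test signal: a cosine of amplitude 3 at k = 2
    # plus a sine of amplitude 1.5 at k = 5, with no noise.
    N = 32
    signal = [
        3 * math.cos(2 * math.pi * 2 * t / N) + 1.5 * math.sin(2 * math.pi * 5 * t / N)
        for t in range(N)
    ]
    post = bayesian_fourier_posterior(signal)
    Y = dft(signal)
    for k in (2, 5):
        (a, sd), (b, _) = post[k]
        # With a vague prior, a_k is approx (2/N) Re Y_k and b_k approx -(2/N) Im Y_k.
        print(k, a, b, 2 * Y[k].real / N, -2 * Y[k].imag / N, sd)
```

With a vague prior (large `prior_var`), the posterior means approach the DFT coefficients scaled by 2/N. Shrinking `prior_var` pulls the amplitudes towards zero, and the returned standard deviations give a per-frequency uncertainty estimate that a plain FFT does not provide.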
Tom is a data science team leader building predictive analytics-based products, specialising in preference learning, visual analytics and marketing, using Bayesian and deep learning methods and probabilistic and functional programming. He has a background as an experimental and computational neuroscientist: Tom obtained his PhD in Neuroscience from University College London by pouring slimy stuff on brain cells, which, combined with a reaction-diffusion model, allowed him to measure biophysical properties of synapses. As a Research Associate at Harvard Medical School, and then at the Universities of Leicester and Nottingham, he built microscopes and domain-specific languages in Haskell to control them. Working at the intersection of experimental, theoretical and methodological neuroscience has given him a uniquely creative perspective on data science. As Chief Data Science Officer at a creative social agency, he led a team building deep learning models for predicting and enhancing the impact an image will have in specific marketing contexts, attribution models from social media data, and a platform for delivering visual consumer advertising on social media. He is now working on an open source data science stack in Haskell.
The introduction of smartphone assistants like Siri and Google Now has popularised speech recognition. A less well-known use of speech technology is indexing audio and video archives, such as lectures, meetings or TV broadcasts. Automatic transcription lets people search these archives far more easily than they could by relying on metadata alone.
However, before speech recognition can be applied to such long audio streams, pre-processing is needed to break up the audio into smaller segments and identify which speakers are present in the recording.
This talk will focus on algorithms for audio segmentation and clustering, which allow us to answer the question "Who spoke when?"
Catherine is a research engineer at Amazon working on speech technology, dialogue, language and machine learning. She holds a PhD in Engineering from Cambridge University, and has since spent time working on speech technology in both industry and academia.