Audio Segmentation and Clustering

The introduction of smartphone assistants like Siri and Google Now has popularised speech recognition. A less well-known use of speech technology is indexing audio and video archives, such as lectures, meetings or TV broadcasts. Automatic transcription allows people to search these archives far more easily than relying on metadata alone.
However, before speech recognition can be applied to such long audio streams, pre-processing is needed to break up the audio into smaller segments and identify which speakers are present in the recording.
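As an illustration of the segmentation step, below is a minimal sketch of energy-based voice activity detection in Python. It is not necessarily the approach covered in the talk; the frame sizes and threshold are illustrative, and it assumes 16 kHz mono audio in a NumPy array.

```python
# Minimal sketch: energy-based voice activity detection.
# Assumes 16 kHz mono audio; frame/hop sizes and threshold are illustrative.
import numpy as np

def segment_by_energy(audio, sr=16000, frame_ms=25, hop_ms=10, threshold_db=-35):
    """Return (start, end) times in seconds of regions above an energy threshold."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(audio) - frame) // hop)
    # Log energy of each frame, compared against the loudest frame.
    energies = np.array([
        10 * np.log10(np.mean(audio[i * hop : i * hop + frame] ** 2) + 1e-10)
        for i in range(n_frames)
    ])
    active = energies > energies.max() + threshold_db
    # Merge consecutive active frames into contiguous segments.
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            segments.append((start * hop / sr, (i * hop + frame) / sr))
            start = None
    if start is not None:
        segments.append((start * hop / sr, len(audio) / sr))
    return segments

# Example: one second of noise framed by silence yields a single segment.
sr = 16000
audio = np.concatenate([np.zeros(sr), np.random.randn(sr) * 0.1, np.zeros(sr)])
print(segment_by_energy(audio, sr))
```

Real systems typically replace the raw energy threshold with model-based detectors, but the pipeline shape (frame, score, threshold, merge) is the same.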
This talk will focus on algorithms for audio segmentation and clustering, which allow us to answer the question "Who spoke when?"
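To make "Who spoke when?" concrete, here is a hedged sketch of the clustering step: each segment is summarised by a fixed-length embedding (the mean of its MFCC frames), and segments are then grouped by agglomerative clustering. This is one common recipe, not necessarily the one presented in the talk; the two synthetic "speakers" (pure tones) merely stand in for real speech, and the example assumes librosa and scikit-learn are available.

```python
# Sketch of speaker clustering: summarise each segment by its mean MFCC
# vector, then group segments with agglomerative clustering. The two tones
# are stand-ins for real speakers.
import numpy as np
import librosa
from sklearn.cluster import AgglomerativeClustering

sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
speaker_a = np.sin(2 * np.pi * 220 * t)   # stand-in for speaker A
speaker_b = np.sin(2 * np.pi * 440 * t)   # stand-in for speaker B
segments = [speaker_a, speaker_b, speaker_a, speaker_b]

# One embedding per segment: the mean MFCC vector over the segment's frames.
embeddings = np.array([
    librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=13).mean(axis=1)
    for seg in segments
])

labels = AgglomerativeClustering(n_clusters=2).fit_predict(embeddings)
print(labels)  # e.g. [0 1 0 1]: matching segments share a speaker label
```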
Catherine is a research engineer at Amazon working on speech technology, dialogue, language and machine learning. She holds a PhD in Engineering from Cambridge University and has since worked on speech technology in both industry and academia.