The introduction of smartphone assistants like Siri and Google Now has popularised speech recognition. A less well-known use of speech technology is indexing audio and video archives, such as lectures, meetings or TV broadcasts. Automatic transcription allows people to search these archives far more easily than relying on metadata alone.
However, before speech recognition can be applied to such long audio streams, pre-processing is needed to break up the audio into smaller segments and identify which speakers are present in the recording.
This talk will focus on algorithms for audio segmentation and clustering, which allow us to answer the question "Who spoke when?"
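The talk covers the algorithms in detail, but as a rough illustration of the two steps, here is a minimal Python sketch (not the speaker's own pipeline): it splits a recording at low-energy regions, represents each resulting segment by its mean MFCC vector, and clusters the segments so that each cluster can be read as one speaker. The file name "meeting.wav", the energy threshold, and the assumption of exactly two speakers are all illustrative; librosa and scikit-learn are assumed to be available.

    # Minimal sketch of segmentation + speaker clustering (illustrative only).
    import numpy as np
    import librosa
    from sklearn.cluster import AgglomerativeClustering

    y, sr = librosa.load("meeting.wav", sr=16000)    # hypothetical input file

    # 1. Segmentation: mark frames with low short-time energy as non-speech
    #    (a very crude voice activity detector) and split the audio there.
    frame_len, hop = 400, 160                        # 25 ms frames, 10 ms hop
    energy = np.array([
        np.sum(y[i:i + frame_len] ** 2)
        for i in range(0, len(y) - frame_len, hop)
    ])
    speech = energy > 0.1 * energy.mean()            # illustrative threshold

    # Collect contiguous speech runs as (start_sample, end_sample) segments.
    segments, start = [], None
    for i, is_speech in enumerate(speech):
        if is_speech and start is None:
            start = i * hop
        elif not is_speech and start is not None:
            segments.append((start, i * hop))
            start = None
    if start is not None:
        segments.append((start, len(y)))
    segments = [(a, b) for a, b in segments if b - a > sr // 2]  # drop very short runs

    # 2. Clustering: describe each segment by its mean MFCC vector and group
    #    segments that sound alike -- each cluster is treated as one speaker.
    feats = np.vstack([
        librosa.feature.mfcc(y=y[a:b], sr=sr, n_mfcc=13).mean(axis=1)
        for a, b in segments
    ])
    labels = AgglomerativeClustering(n_clusters=2).fit_predict(feats)  # assumes 2 speakers
    print(list(zip(segments, labels)))               # "who spoke when", roughly

Real diarization systems replace the energy threshold with a trained speech/non-speech model and the mean MFCC vectors with speaker embeddings, but the segment-then-cluster structure is the same.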
Audio Segmentation and Clustering
Catherine is a research engineer at Amazon working on speech technology, dialogue, language and machine learning. She holds a PhD in Engineering from Cambridge University, and has since spent time working on speech technology in both industry and academia.