Meeting Room Speech Technology

Dr Steve Cassidy
Department of Computing
Macquarie University

Tuesday 30th March 2004 at 11am

Abstract

In the last few years there has been a lot of interest in applying speech technology in the context of meeting rooms; this seems to have become the next Grand Challenge for speech technology now that the Broadcast News domain has been more or less mastered. Meeting rooms provide an interesting range of problems for speech recognition and related technologies. Headset microphones are inconvenient, so distant tabletop microphones and microphone arrays are being targeted, and room acoustics become a significant problem. The language used in meetings also differs significantly from scripted broadcast news recordings: we see less formal language, interruptions, and backchannelling from meeting participants.

In this talk I'll try to give an overview of the state of the art in various Meeting Room projects around the world and relate my own experiences of taking part in the speaker tracking part of the current NIST Meeting Room evaluation.

Short resume

Steve Cassidy is a Computer Scientist who has worked in various areas relating to language and cognition over the last 20 years, from modelling the acquisition of reading in his PhD to Acoustic Phonetics and, more recently, speech technology. He is the principal author of the Emu Speech Database System, which is widely used for the creation and analysis of spoken language corpora. His work on Emu has led to an involvement with groups in the US and Europe who are aiming to define standards for Linguistic annotation tools. Another interest is XML and the World Wide Web, in particular new developments in the Semantic Web space. Steve finds interesting parallels between his work on querying the complex structures in Linguistic Annotations and querying those being generated from Metadata in the Semantic Web.