
SPASR workshop brings together speech production and its use in speech technologies

Karen Livescu

SLTC Newsletter, November 2013

The Workshop on Speech Production in Automatic Speech Recognition (SPASR) was recently held as a satellite workshop of Interspeech 2013 in Lyon on August 30.

The use of speech production knowledge and data to enhance speech recognition and related technologies is being actively pursued by a number of widely dispersed research groups using different approaches. For example, some groups are using measured or inferred articulation to improve speech recognition in noisy or otherwise degraded conditions; others are using models inspired by articulatory phonology to account for pronunciation variation; still others are using articulatory dimensions in speech synthesis; and finally, many are finding clinical applications for production data and articulatory inversion. The goal of this workshop was to bring together these research groups, as well as other researchers interested in learning about or contributing to this area, to share ideas, results, and perspectives in an intimate and productive setting. The range of techniques currently being explored is rapidly growing, and is increasingly benefiting from new ideas in machine learning and greater availability of data.

The SPASR technical program included five invited speakers, as well as spotlight talks and a poster session for the 11 contributed papers. The invited talks set the scene with inspiring and thought-provoking presentations on topics such as new opportunities in the collection and use of diverse types of articulatory data (Shri Narayanan), invariance in articulatory gestures (Carol Espy-Wilson), models of dysarthric speech (Frank Rudzicz), new articulatory data collections and acoustic-to-articulatory estimation (Korin Richmond), and silent speech interfaces (Bruce Denby). The contributed papers and abstracts presented new work and reviews of work on related topics, including computational models of infant language learning, human-machine comparisons of recognition errors, estimation of articulatory parameters from acoustic and articulatory data, induction of articulatory primitives from data, the use of articulatory classification and inversion in speech recognition, silent speech interfaces, and the use of multi-view learning and discriminative training with limited articulatory data.

The format and content of the workshop, as well as the intimate and comfortable setting at l'Institut des Sciences de l'Homme in Lyon, lent themselves to lively and fruitful discussion. The talk slides and the submitted papers and abstracts can be found on the workshop's web site, http://ttic.edu/livescu/SPASR2013.

The workshop was co-organized by Karen Livescu (TTI-Chicago), Jeff Bilmes (U. Washington), Eric Fosler-Lussier (Ohio State U.), and Mark Hasegawa-Johnson (U. Illinois at Urbana-Champaign), with local organization by Emmanuel Ferragne (U. Lyon 2). It was supported by ISCA and Carstens.

Karen Livescu is Assistant Professor at TTI-Chicago in Chicago, IL, USA. Her main interests are in speech and language processing, with a slant toward combining machine learning with knowledge from linguistics and speech science. Email: klivescu@ttic.edu.