
The INTERSPEECH 2013 Computational Paralinguistics Challenge - A Brief Review

Björn Schuller, Stefan Steidl, Anton Batliner, Alessandro Vinciarelli, Klaus Scherer, Fabien Ringeval, Mohamed Chetouani

SLTC Newsletter, November 2013

The INTERSPEECH 2013 Computational Paralinguistics Challenge was held in conjunction with INTERSPEECH 2013 in Lyon, France, 25-29 August 2013. This Challenge was the fifth in a series held at INTERSPEECH since 2009 as an open evaluation of speech-based speaker state and trait recognition systems. Four tasks were addressed, namely social signals (such as laughter), conflict, emotion, and autism. 65 teams participated, the baselines given by the organisers could be exceeded, and a new reference feature set computed with the openSMILE feature extractor, as well as the four corpora used, are publicly available in the repository of the series.

The INTERSPEECH 2013 Computational Paralinguistics Challenge was organised by Björn Schuller (Université de Genève, Switzerland / TUM, Germany / Imperial College London, England), Stefan Steidl (FAU, Germany), Anton Batliner (TUM/FAU, Germany), Alessandro Vinciarelli (University of Glasgow, Scotland / IDIAP Research Institute, Switzerland), Klaus Scherer (Université de Genève, Switzerland), Fabien Ringeval (Université de Fribourg, Switzerland), and Mohamed Chetouani (UPMC, France). The Challenge dealt with short-term states (emotion in twelve categories) and long-term traits (autism in four categories). Moreover, group discussions were targeted to find conflict (in two classes, but also given as a continuous 'level of conflict' from -10 to +10), and a frame-level task was presented with the detection of the 'social signals' laughter and filler. The area under the receiver operating characteristic curve (AUC) was used for the first time as one of the competition measures, besides unweighted average recall (UAR) for the classification tasks. AUC is particularly suited for detection tasks. UAR is the sum of the per-class accuracies (i.e., the recall values) divided by the number of classes, as illustrated in the sketch below. With four tasks, this was the largest Computational Paralinguistics Challenge so far. Also, enacted data alongside naturalistic data had not been considered before this year's edition. A further novelty of this year's Challenge was the provision of a script for reproducing the baseline results on the development set in an automated fashion, including pre-processing, model training, model evaluation, and scoring by the competition and further measures, as outlined below.
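As a concrete illustration, the following minimal Python sketch computes UAR from gold-standard and predicted labels; the toy data are purely illustrative and not taken from the Challenge.

```python
# A minimal sketch of unweighted average recall (UAR): the per-class recalls
# are summed and divided by the number of classes, so every class counts
# equally regardless of its number of instances.
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    correct = defaultdict(int)   # per-class correctly classified instances
    total = defaultdict(int)     # per-class instance counts
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Toy example with an unbalanced two-class problem: accuracy would be 80%,
# but UAR is only 50% because the minority class is never recognised.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```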

The following corpora, with clearly defined training, development, and test partitions that ensure the speaker independence needed in most real-life settings, served as Challenge data: the Scottish-English SSPNet Vocalisation Corpus (SVC) from mobile phone conversations and the Swiss-French SSPNet Conflict Corpus (SC²) featuring broadcast political debates, both provided by the SSPNet; the Geneva Multimodal Emotion Portrayals (GEMEP) featuring professionally enacted emotions in 16 categories, provided by the Swiss Center for Affective Sciences; and the French Child Pathological Speech Database (CPSD), provided by UPMC, including speech of children with Autism Spectrum Condition.

Four Sub-Challenges were addressed: in the Social Signals Sub-Challenge, the non-linguistic events laughter and filler of a speaker had to be detected and localised based on acoustic information. In the Conflict Sub-Challenge, group discussions had to be evaluated automatically with the aim of recognising conflict as opposed to non-conflict. For the training and development data, the continuous level of conflict was also given; this information could be used for model construction, or for reporting more precise results on the development partition in the papers. In the Emotion Sub-Challenge, the emotion of a speaker had to be determined from a closed set of twelve categories by a learning algorithm and acoustic features. In the Autism Sub-Challenge, three types of pathology of a speaker, as opposed to no pathology, had to be determined by a classification algorithm and acoustic features.

A new set of 6,373 acoustic features per speech chunk, again computed with TUM's openSMILE toolkit, was provided by the organisers. The set was based on low-level descriptors that can be extracted at the frame level with a script provided by the organisers. For the Social Signals Sub-Challenge, which requires localisation, a frame-wise feature set was derived from the above. These features could be used directly, or sub-sampled, altered, etc., and combined with other features.
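For illustration, the following hypothetical Python sketch shows how such a per-chunk feature file could be produced by calling openSMILE's SMILExtract command-line tool; the config file path is an assumption modelled on the configuration names shipped with later openSMILE releases and has to be adapted to the local installation.

```python
# A hypothetical sketch of calling the openSMILE command-line extractor
# (SMILExtract) from Python to obtain the 6,373-dimensional feature vector
# for one speech chunk. The config path below is an assumption; the -C/-I/-O
# options select the config, input audio, and output feature file.
import subprocess

def extract_compare_features(wav_path, out_path):
    subprocess.run(
        ["SMILExtract",
         "-C", "config/IS13_ComParE.conf",  # assumed path to the Challenge config
         "-I", wav_path,                    # input audio chunk
         "-O", out_path],                   # output feature file
        check=True)

extract_compare_features("chunk_0001.wav", "chunk_0001_features.arff")
```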

As in the 2009 - 2012 Challenges, the labels of the test set were unknown, and all learning and optimisation had to be based only on the training material. Each participant could upload instance predictions up to five times and receive the corresponding results. The upload format was instance and prediction, optionally with additional probabilities per class; this allowed a final fusion of all participants' results to demonstrate the potential maximum achievable by combined efforts. As is typical in the field of Computational Paralinguistics, the classes were unbalanced. Accordingly, the primary measures to optimise were UAR and unweighted average AUC (UAAUC); taking the unweighted average ensures that, despite the imbalance of instances among the classes, every class contributes equally to the score. In addition, but not as a competition measure, the correlation coefficient (CC) was given for the continuous level of conflict.
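The following minimal sketch shows how UAAUC can be computed from submitted per-class probabilities, under the assumption that each class is scored one-vs-rest; the labels and scores are toy values, not Challenge data.

```python
# A minimal sketch of unweighted average AUC (UAAUC), assuming one-vs-rest
# scoring: the area under the ROC curve is computed per class from the
# submitted class probabilities and then averaged without class weighting.
import numpy as np
from sklearn.metrics import roc_auc_score

def unweighted_average_auc(y_true, y_proba, classes):
    # y_true: array of class labels; y_proba: (n_instances, n_classes) scores
    aucs = [roc_auc_score(y_true == c, y_proba[:, i])
            for i, c in enumerate(classes)]
    return float(np.mean(aucs))

# Toy scores for a two-class detection task (e.g. laughter vs. rest).
y_true = np.array(["laughter", "rest", "laughter", "rest"])
y_proba = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.4, 0.6]])
print(unweighted_average_auc(y_true, y_proba, ["laughter", "rest"]))  # 1.0
```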

The organisers did not take part in the Sub-Challenges, but provided baselines using the WEKA toolkit as a standard tool, so that the results were reproducible. As in previous editions, Support Vector Machines (and Support Vector Regression for the additional continuous information) were chosen for classification and optimised on the development partition. The baselines on the test sets were 83.3% UAAUC (social signals), 80.8% UAR (2-way conflict class), 40.9% UAR (12-way emotion category), and 67.1% UAR (4-way autism diagnosis), with chance levels of 50%, 50%, 8%, and 25% UAR, respectively.
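As an illustrative stand-in for this baseline recipe (not the organisers' actual script, which used WEKA), the following Python sketch tunes the complexity parameter of a linear SVM on the development partition with scikit-learn; the file names and the parameter grid are assumptions.

```python
# An illustrative stand-in for the official baseline: a linear Support
# Vector Machine trained on the supra-segmental feature vectors and tuned
# for UAR on the development partition. scikit-learn replaces WEKA here
# purely for brevity; file names and the complexity grid are assumptions.
import numpy as np
from sklearn.svm import LinearSVC

X_train, y_train = np.load("train_X.npy"), np.load("train_y.npy")  # hypothetical
X_dev, y_dev = np.load("dev_X.npy"), np.load("dev_y.npy")          # hypothetical

best_c, best_uar = None, -1.0
for c in [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]:  # assumed complexity grid
    clf = LinearSVC(C=c).fit(X_train, y_train)
    pred = clf.predict(X_dev)
    # unweighted average recall on the development set
    classes = np.unique(y_dev)
    uar = np.mean([np.mean(pred[y_dev == k] == k) for k in classes])
    if uar > best_uar:
        best_c, best_uar = c, uar
print(f"best C = {best_c}, development UAR = {best_uar:.3f}")
```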

All participants were encouraged to compete in all Sub-Challenges, and each participant had to submit a paper to the INTERSPEECH 2013 Computational Paralinguistics Challenge Special Event. The results of the Challenge were presented in a Special Event of INTERSPEECH 2013 (double session), and the winners were awarded in the closing ceremony by the organisers. Four prizes (of 125 EUR each, sponsored by the Association for the Advancement of Affective Computing (AAAC), the former HUMAINE Association) could be awarded, subject to the pre-conditions that the accompanying paper was accepted for the Special Event in the INTERSPEECH 2013 general peer review, that the provided baseline was exceeded, and that the best result in a Sub-Challenge was reached in the respective competition measure. Overall, 65 sites registered for the Challenge, 33 groups actively took part and uploaded results, and finally 15 participant papers were accepted for presentation.

The Social Signals Sub-Challenge Prize was awarded to Rahul Gupta, Kartik Audhkhasi, Sungbok Lee, and Shrikanth Narayanan, all from the Signal Analysis and Interpretation Lab (SAIL), Department of Electrical Engineering, University of Southern California, Los Angeles, CA, U.S.A., who reached 91.5% UAAUC in their contribution "Paralinguistic Event Detection from Speech Using Probabilistic Time-Series Smoothing and Masking".

The Conflict Sub-Challenge Prize was awarded to Okko Räsänen and Jouni Pohjalainen both from the Department of Signal Processing and Acoustics, Aalto University, Finland, who obtained 83.9% UAR in their paper "Random Subset Feature Selection in Automatic Recognition of Developmental Disorders, Affective States, and Level of Conflict from Speech".

The Emotion Sub-Challenge Prize was awarded to Gábor Gosztolya, Róbert Busa-Fekete, and László Tóth, all from the Research Group on Artificial Intelligence, Hungarian Academy of Sciences and University of Szeged, Hungary, for their contribution "Detecting Autism, Emotions and Social Signals Using AdaBoost". Róbert Busa-Fekete is also with the Department of Mathematics and Computer Science, University of Marburg, Germany. They won this Sub-Challenge with 42.3% UAR.

The Autism Sub-Challenge Prize was awarded to Meysam Asgari, Alireza Bayestehtashk, and Izhak Shafran from the Center for Spoken Language Understanding, Oregon Health & Science University, Portland, OR, U.S.A. for their publication "Robust and Accurate Features for Detecting and Diagnosing Autism Spectrum Disorders". They reached 69.4% UAR.

Overall, the results of the 33 uploading groups were mostly very close to each other, and significant differences between the results accepted for publication were as rare as one might expect in such a close competition. However, by late fusion (equally weighted voting over the N best participants' results, as sketched below), new baseline scores in terms of UAAUC and UAR exceeding all individual participants' results could be established in all Sub-Challenges except the Autism Sub-Challenge. The general lesson learned thus again is "together we are best": obviously, the different feature representations and learning architectures contribute added value when combined. In addition, the Challenge clearly demonstrated the difficulty of dealing with real-life data; this challenge remains.
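A minimal sketch of such an equally weighted majority vote, with placeholder predictions standing in for the participants' uploaded label files:

```python
# Equally weighted late fusion by majority vote over the participants'
# predicted labels; ties resolve to the label encountered first.
from collections import Counter

def majority_vote(predictions):
    # predictions: list of N label sequences, one per participant
    fused = []
    for labels in zip(*predictions):
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused

# Placeholder predictions from three hypothetical participants.
p1 = ["conflict", "conflict", "no_conflict"]
p2 = ["conflict", "no_conflict", "no_conflict"]
p3 = ["no_conflict", "conflict", "no_conflict"]
print(majority_vote([p1, p2, p3]))  # ['conflict', 'conflict', 'no_conflict']
```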

In the last time slot of the second session of the Special Event, the past and (possible) future of these challenges were discussed, and the audience filled in a questionnaire. The answers from the 35 questionnaires received can be summarised as follows, and corroborate the pre-conditions and the setting chosen so far: the time provided for the experiments (ca. three months) and the number of possible uploads (five) were considered sufficient. The performance measures used (UAR and AUC) are preferred over other possible measures. Participation should only be possible if the paper is accepted in the review process; "additional two-page material" papers should not be established for rejected papers, but possibly offered as a voluntary option. There is a strong preference for a "Special Event" at INTERSPEECH, rather than a satellite workshop of INTERSPEECH or an independent workshop. The benefits of these challenges for the community, and thereby the adequate criteria for acceptance of a paper, are foremost considered to be interesting/new computational approaches and/or phonetic/linguistic features; boosting performance (above the baseline) was the second most important criterion.

For more information on the 2013 Computational Paralinguistics Challenge (ComParE 2013), see the webpage on emotion-research.net.

The organisers would like to thank the sponsors of INTERSPEECH 2013 ComParE: The Association for the Advancement of Affective Computing (AAAC), SSPNet, and the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 289021 (ASC-Inclusion).

Björn Schuller is an Associate of the University of Geneva’s Swiss Center for Affective Sciences and Senior Lecturer at Imperial College London and Technische Universität München. His main interests are Computational Paralinguistics, Computer Audition, and Machine Learning. Email: schuller@IEEE.org

Stefan Steidl is a Senior Researcher at Friedrich-Alexander University Erlangen-Nuremberg. His interests are multifaceted ranging from Computational Paralinguistics to Medical Image Segmentation. Email: steidl@cs.fau.de

Anton Batliner is a Senior Researcher at Technische Universität München. His main interests are Computational Paralinguistics and Phonetics/Linguistics. Email: Anton.Batliner@lrz.uni-muenchen.de

Alessandro Vinciarelli is a Senior Lecturer at the University of Glasgow (UK) and a Senior Researcher at the Idiap Research Institute (Switzerland). His main interest is Social Signal Processing. Email: vincia@dcs.gla.ac.uk

Klaus Scherer is a Professor emeritus at the University of Geneva. His main interest is the Affective Sciences. Email: Klaus.Scherer@unige.ch

Mohamed Chetouani is a Professor at University Pierre and Marie Curie-Paris 6. His main interests are Social Signal Processing and Social Robotics. Email: mohamed.chetouani@upmc.fr

Fabien Ringeval is a postdoctoral researcher at the Université de Fribourg. His main interest is Multimodal Affective Computing. Email: fabien.ringeval@unifr.ch