Interspeech 2011 student lunch event - Final details

Dear ISCA students:

Thank you for your registration for the student event at Interspeech.

Below you can find your name, the table you have been assigned to, the experts at your table, and the other students that will be joining you.

This data has been shared with the invited experts so that they have an idea of the interests of the students who will be joining the event.

 

Please search in the page for your name to check the tables.

The event will take place on Monday 29th of August, between 12:00 and 13:30, on the first floor of the "Pala Affari" Building (the conference venue).

 

If you have offered your help with writing a blog post about the discussion (thank you), please check if you have been assigned as a reporter for that table.

We will be expecting you at the event, and hope that you will find the lunch enjoyable and fruitful. 

 


Table No. 1
Table topic Multimodal Interaction
Italian Senior Francesco Cutugno
International Senior Björn Granström
Students    
  Christian Herff Karlsruhe Institute of Technology, Germany.
    Work: Instead of using the acoustic signal for speech recognition, my research focuses on using the movement of the articulatory muscles, which we pick up with EMG, to recognize speech. I'm particularly interested in comparing the effects different speaking modes have on the muscle movements and recognition accuracies.
  Daniel Reich Karlsruhe Institute of Technology, Germany.
    Work: I am interested in speech applications for real life and the combination of speech with other modalities for creation of seamless interfaces and environments.
In my diploma thesis, I developed and implemented a speech command detection system for a smart control room. My system is capable of distinguishing between commands directed to the system and all other audible events. The system runs in real time and works for other domains, too. I gave a talk about my diploma thesis in which I used the system to distinguish between commands to operate the slide projector and other speech events.
  Giuseppe Riccardo Leone ISTC-CNR, Italy.
    Work: I am working on a WebGL avatar that conveys paralinguistic information such as emotions in conversation. I am interested in 3D models, in particular of the tongue, and in emotionally tagged text-to-speech.
  Herman Kamper Stellenbosch University, South Africa.
    Work: Currently I am working on multi-accent speech recognition, in which we are aiming to develop a speech recognition system able to recognise multiple accents of South African English. Speech applications for under-resourced languages are therefore a main focus of my work. In a broader sense I am also very interested in the way that ASR systems can be used in education (second-language acquisition, tutor systems), as well as the combination of ASR systems with other interface modes (such as vision) in multimedia applications (e.g. systems for the disabled).
  Niall McLaughlin Queen's University Belfast, United Kingdom.
    Work: I work on multimodal person identification using speaker and facial recognition. My work involves person identification with limited training data in both modalities, and realistic environmental corruption of test data in both modalities.
  Samantha Ainsley Columbia University, United States.
    Work: My primary research interest is computer graphics, specifically physics-based computer simulation. I have conducted research in collision detection for complex deformable geometry at Columbia and physical hair modeling at Weta Digital. My background in speech comes from my previous work on the Google Speech team. I am interested in visualization tools for language models and multimodal user interfaces. As an undergraduate, I developed a speech- and gesture-driven multi-user interface for managing household activities in a shared living space.
  Taras Butko Universitat Politecnica de Catalunya, Spain.
    Work:
- Multimodal Acoustic Event Detection, Audio Segmentation;
- Machine Learning for Signal Processing (Gaussian Mixture Models (GMMs), Hidden Markov Models (HMMs), Support Vector Machines (SVMs));
- Feature Selection and Analysis;
- Information Fusion
  Tim Paris MARCS, University of Western Sydney, Australia.
    Work: I am interested in the perception of audiovisual speech. Specifically, I study how processes of integration between auditory and visual speech can change and adapt based on context and prior experience.

Table No. 2
Table topic Multimedia & Multimodal Interaction
Italian Senior Andrea Paoloni
International Senior Helen Meng
Reporter Shi ZiQiang
Students    
  Arindam Ghosh University of Trento, Italy.
    Work: I am interested in extracting information from dialogue systems and other psychophysiological signals.
  Janto Skowronek Technical University Berlin - Deutsche Telekom Laboratories, Germany.
    Work: My research is about quality assessment of multi-party conferencing.
Based on the vast knowledge of quality assessment for one-to-one telephone conversations, we are investigating how existing quality assessment methods and quality prediction models can be modified, extended or even replaced for the case where more than two interlocutors are communicating.
  Senaka Amarakeerthi University of Aizu, Japan.
    Work: I am working on emotion classification from voice. So far I have done some work on classifying emotion offline, and I am willing to try real-time classification, which can be applied to collaborative virtual environments.
  Sourish Chaudhuri Carnegie Mellon University, United States.
    Work: I am interested in modeling of acoustic phenomena, in general. Specifically, I'd like to be able to identify patterns in data that indicate certain events or event types. The data may potentially be multimodal, and integration of features from multiple modalities for various tasks is a specific interest area of mine. Unsupervised, semi-supervised approaches as well as non-parametric methods are areas I am exploring. My past and present work has involved context-sensitive modeling of conversations, unsupervised discovery of acoustic units for sound classification, and source separation for speech recognition.
  Weiwu Jiang Chinese University of Hong Kong, Hong Kong.
    Work: Speaker Recognition.
  Zhanyu Ma KTH - Royal Institute of Technology, Sweden.
    Work: I am currently working on statistical model estimation and its applications in sound and image processing. Areas of interest: machine learning, pattern recognition, speech processing.
  ZiQiang Shi Harbin Institute of Technology, China.
    Work: My research interests include audio information processing, speech signal processing, as well as pattern recognition and machine learning for audio signal processing. Recently I have been working on sparse and low-rank features and a regularization framework for audio event detection.

Table No. 3
Table topic Applications: Robotics, speech, industry
Italian Senior Mauro Falcone
International Senior Antoine Raux
Reporter Thuy Tran
Students    
  Afsaneh Asaei IDIAP/EPFL, Switzerland.
    Work: My research takes place in the general context of improving the performance of Distant Speech Recognition (DSR) systems, tackling reverberation and the recognition of overlapping speech. Perceptual modelling indicates that sparse representation exists in the auditory cortex. The present project thus builds upon the hypothesis that incorporating this information in speech representation improves speech recognition performance. Inspired by this, the goal of my PhD work is mainly to address the problem of overlapping speech by exploiting two key ideas: (1) blind (source) separation of the speech components, and (2) doing so in a sparse (e.g., Gabor) space. Source separation based on sparse representation of signals is called Sparse Component Analysis (SCA). More specifically, my research brings a framework for integration of auditory sparse representation and sparse component analysis for separation of overlapping speech to improve the performance of DSR systems.
  James Gibson University of Southern California, United States.
    Work: I am interested in applying signal processing and machine learning techniques to speech to contribute to our understanding of human communication and interaction. My work includes automatic classification and saliency detection of human affective behavior.
  John Labiak The University of Chicago, United States.
    Work: My research interests are in pattern recognition, machine learning and statistics. I am very interested in high dimensional data analysis, and the application of sparse representations to pattern recognition problems. My work has focused on new paradigms for ASR. In particular, my research has focused on distance metric learning for k-NN classification of phonetically labeled frames of speech.
  Jorge Marin Georgia Institute of Technology, United States.
    Work: I am working on noise reduction methods for binaural hearing aids. In particular, I study how to improve noise reduction, speech quality and speech intelligibility in hostile environments, and the implementation of these methods in ultra-low-power systems such as a hearing aid. My research interests are speech enhancement, perceptually inspired speech processing, and speech applications for portable devices.
  Matthew Black Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, United States.
    Work: My research is in behavioral signal processing, specifically on the automatic quantification and emulation of human observational processes to describe human behavior. Highlights of this work include: predicting children's reading ability as accurately as human evaluators by extracting human-inspired cues and modeling multiple evaluators' perceptions, and automatically classifying relevant high-level behaviors during real interactions of married couples by fusing automatically derived speech and language information. I am currently leading an interdisciplinary effort with collaborators at the USC Keck School of Medicine to collect and analyze a large corpus of social interactions between expert psychologists and children diagnosed with autism spectrum disorders.
  Petko Petkov KTH-Royal Institute of Technology, Sweden.
    Work: I work on: i) developing statistical regression models for application to quality assessment of speech and ii) the use of objective quality and intelligibility models for determining optimal speech modification strategies for speech presented in noise. The second topic relates to the use of automatic speech recognition as the basis for a high-level intelligibility model.
  Tania Habib SPSC Lab, Graz University of Technology, Austria.
    Work: I am working in the area of microphone array signal processing. My research interests are speaker localization, beamforming and speech signal-based array processing.
I have explored the use of speech-related features such as fundamental frequency in the speech localization problem by applying computational auditory scene analysis techniques and particle filters for tracking multiple concurrent speakers.
  Thuy Tran ITR, University of South Australia, Australia.
    Work: I'm working on speech separation using adaptive beamforming. My target is to deal with reverberation and moving speakers. Currently, I am interested in training and adapting beamformers.

Table No. 4
Table topic Applications: Dialogue for education
Italian Senior Giuseppe Riccardi
International Senior Maxine Eskenazi
Reporter Nicholas Cummins
Students    
  Devangi Parikh Georgia Institute of Technology, United States.
    Work: My research interests are in the general area of speech and audio processing. My current research focuses on speech enhancement and noise suppression. I am interested in understanding the human perceptual auditory model and applying this model to noise suppression algorithms to obtain perceptually pleasing speech quality. I am also interested in blind source separation as a technique for noise reduction.
  Jinqiu Sang University of Southampton, United Kingdom.
    Work: My PhD project is about noise reduction in hearing aids and cochlear implants. I am interested in hearing loss simulation and sparse coding strategies applied to speech.
  Jose Luis Blanco Universidad Politecnica de Madrid, Spain.
    Work: The main topic of my research is the analysis of pathological voices to design and test an automatic system to identify people who are suffering from a specific condition. However, due to specific characteristics of the disorder on which I focus, and the kind of information we can model using speech processing techniques, there is no previous state of the art in this particular field.
  Morten Højfeldt Rasmussen Aalborg University, Denmark.
    Work: My PhD study deals with the challenges involved in automatically spotting reading miscues in sentences read by a dyslexic person. The system uses Automatic Speech Recognition (ASR) for spotting the miscues. The goals are to develop methods and algorithms that accurately detect reading miscues and to accurately track the reading position in the text.
  Nicholas Cummins University of New South Wales, Australia.
    Work: For my undergraduate thesis I have been researching the effects of depression on speech, focusing on how to account for speaker variability in a depression recognition system. I have a strong interest in this topic and am trying to decide whether I want to do a speech-processing-based PhD.
  Srikanth Raj Indian Institute of Science, India.
    Work: My work mainly focusses on compressive sensing and its application to speech and audio signals. Compressive sensing is a sensing mechanism that guarantees almost perfect reconstruction from sub-Nyquist measurements for sparse signals. This leads to the study of sparse models for speech signals. In our work, we explore sparse models based on orthogonal transforms, dictionary representations and the sparsely excited linear system model (LPC). Other research interests include automatic speech recognition, pattern recognition and adaptive processing.
  Stefan Kombrink Brno University of Technology, Czech Republic.
    Work: I was working on subword based ASR and am currently interested in neural net based language models.
  Mehdi Soufifar Norwegian University of Science and Technology, Norway.
    Work: My research interests are language recognition and speaker identification, and recently I have been working on language recognition under noisy conditions.
  Luca Rognoni University of Padova, Italy.
    Work: I have previously worked as a qualified English and Spanish teacher at all levels of the Italian schooling system, basing my work on a communicative approach. My current PhD project revolves around the phonetics of the acquisition of English as a second language, at both segmental and suprasegmental levels, the definition of foreign accent and the concept of intelligibility between speakers of English as a second language. I am also interested in CALL (Computer Aided Language Learning) and in the application of new technologies to the teaching/learning process of foreign languages.

Table No. 5
Table topic Machine Processing of Spontaneous Speech
International Senior Elizabeth Shriberg
Reporter Luis Pinto-Coelho
Students    
  Cristina Maritza Guerrero Flores Università degli Studi di Trento, Ecuador.
    Work: I am a PhD Candidate at the University of Trento. My research is being developed at FBK (Fondazione Bruno Kessler), where I am currently working as a project researcher. My advisor, Prof. Maurizio Omologo, and his team have conducted challenging research in the fields of audio signal processing and natural spontaneous speech interaction. My work focuses on the improvement of the robustness/intelligence of acoustic solutions within real-life environments. My research interests include human language, natural interaction, audio processing, and machine learning.
  Diego Castán University of Zaragoza, Spain.
    Work: Segmentation and classification of broadcast news and acoustic event detection in spontaneous speech (meeting rooms).
  Elias Iosif ECE Dept., Technical University of Crete, Greece.
    Work: I work in the field of Natural Language Processing, with a particular interest in the unsupervised computation of semantic similarity. I investigate the contribution of my research to some speech-related fields, such as (i) automatic induction of semantic classes for class-based LM, (ii) data-driven (bottom-up) grammar generation, (iii) affective text (spoken corpora included).
Currently, I am investigating the incorporation of models taken from the field of cognitive sciences (psycholinguistics included) into my research dealing with semantics.
  Erich Zwyssig The University of Edinburgh, United Kingdom.
    Work: My research is in speaker diarisation (who spoke when) in meetings.
  Laurens van der Werff University of Twente, Netherlands.
    Work: I work on Spoken Document Retrieval. The main research topics of my thesis are 1. Automatic segmentation of Speech transcripts for use in SDR, 2. Evaluation of ASR transcripts for SDR, and 3. Improved lattice scoring through query-derived information.
  Luis Pinto-Coelho University of Vigo, Spain.
    Work: I am finishing my PhD in the area of prosody. Besides this topic, I believe that the area of dialogue systems will be the next grail in the speech scene. I'm interested in systems for conversational understanding.
  Rivka Levitan Brno University of Technology, Czech Republic.
    Work: I am currently working on quantifying acoustic and prosodic entrainment in dialogue and exploring the relationship between entrainment and characteristics of a dialogue and its participants.
  Sree Harsha Yella IDIAP Research Institute, Switzerland.
    Work: I am currently working on speaker diarization (who spoke when) in spontaneous meeting data. Specifically, I am looking at the issue of overlaps (simultaneous speech) in spontaneous conversations.

Table No. 6
Table topic Articulatory Modelling
Italian Senior Claudio Zmarich
International Senior Gerard Bailly
Students    
  Atef Ben Youssef GIPSA-Lab, France.
    Work: My research focuses on talking heads that we develop for a “visual articulatory feedback” system. In particular, my thesis work concerns acoustic-to-articulatory speech inversion using statistical methods (e.g. HMM, GMM). For the HMM-based approach, the inversion mapping is devised in two stages: acoustic recognition and articulatory synthesis. Toward a multi-speaker visual articulatory feedback system, speaker adaptation can be used to adapt the trained HMMs to the new speaker's voice.
  Brian Bush Oregon Health and Sciences University, United States.
    Work: Computational approaches to speech recognition have largely ignored explicit modeling of coarticulation in human speech. My research is concentrated on explicitly modeling coarticulation using a formant trajectory model. Discovering relationships between model parameters, formant targets and the underlying speech signal might allow us to more accurately predict vowels in style-independent CVC tokens.
  Christina Hagedorn University of Southern California, United States.
    Work: My primary interests lie in the area of phonetics. Particularly, I am interested in the study of speech production in Romance (especially standard Italian and the minor Italian varieties) within the framework of Articulatory Phonology. My current research focuses on studying articulator kinematics involved in the production of phonological length contrast in Italian using real-time MRI. The broad aim of this work is to determine how length contrast might result from critical differences in articulatory control parameter specifications.
  Gopal Ananthakrishnan Centre for Speech Technology, KTH (Royal Institute of Technology), Sweden.
    Work: My main research interests are in Acoustic-to-Articulatory Inversion, Speaker normalization, Infant speech acquisition, Prosody, Multi-modal speech recognition, Emotion recognition etc.
  Juan Rafael Orozco Arroyave Universidad de Antioquia, Colombia.
    Work: I have been working on automatic detection of hypernasality by means of acoustic analysis and non-linear dynamics.
Recently, I began to apply the parametrization techniques to the detection of other voice pathologies in continuous speech.
In the future I would like to work on the analysis of the voices of patients with Parkinson's disease.
  Jun Wang University of Nebraska - Lincoln, United States.
    Work: My research interests are (1) recognition of speech (text) from articulatory movements (without acoustic input) and (2) early detection of the onset of speech deterioration due to diseases such as ALS. I'm also interested in the analysis of brain signals associated with speech, or brain-computer interfaces. In Florence, I will present my work on a quantified articulatory vowel space, which may have significant scientific and clinical implications.
  Zeynab Raeesy University of Oxford, United Kingdom.
    Work: My research is on modelling human speech production. Articulation is a hidden procedure, and direct observation of speech production often involves physiological interventions that may affect the naturalness of speech. The goal of my research is to develop a system for modelling human articulation without having to go through the challenges of direct observation. Instead, I intend to use more feasible features of speech, such as the resulting acoustic signal, for finding the parameters of articulation (inversion mapping). More specifically, I am working on developing an automated system that can learn the relation between acoustics and articulation, and can predict the articulatory features based on the acoustic features. For this research, I use a dynamic MRI database that contains both visual articulatory data and corresponding acoustic data recorded simultaneously as the MRI images are acquired.
My main research interests include acoustic-to-articulatory inversion, modelling human articulation and analysis and feature extraction from visual articulatory data.

Table No. 7
Table topic Articulatory Models/Phonetics - feedback
International Senior Roger Moore
Reporter Maarten Versteegh
Students    
  Arild Naess NTNU, Norway.
    Work: I am currently working on Articulatory Feature recognition from audio using k nearest neighbors and artificial neural networks. I am aiming to incorporate this into a full speech recognition system by way of a conditional random field or a dynamic Bayesian network. My research interest is generally geared towards statistical machine learning, particularly how linguistic knowledge can be incorporated in a statistical framework.
  Christina Bergmann Radboud University Nijmegen, Netherlands.
    Work: In my PhD project I am aiming to use recent findings on early infant language learning within the first year (both first episodic "words" and the emergence of abstract representations of the native phonological system) to build a computational model that unites findings on e.g. robustness, asymmetrical recognition and effects of variation. In addition, predictions of the model are tested with infants in the lab.
  Katariina Mahkonen Tampere University of Technology, Finland.
    Work: Automatic speech recognition with speech representation as a sparse sum of dictionary exemplars.
  Maarten Versteegh Radboud University / International Max Planck Research School for Language Sciences, Netherlands.
    Work: Computational models of language acquisition, focussing on the unsupervised extraction of phones and words from speech. I apply machine learning methods to the simulation of infants' learning of the sounds and words of their first language. Cognitive plausibility of the models and congruency with results from experimental research in developmental linguistics and cognitive neuroscience guide the research. Ultimately, the research in my project has the goal of contributing to new models for automatic speech recognition by basing our models on human capacities for learning the structure of speech.
  Miranti Indar Mandasari Radboud University Nijmegen, Netherlands.
    Work: I have a great interest in speaker recognition systems generally, but especially in the use of speaker recognition systems (both voice comparison based on phonetic analysis and automatic ones) for forensic applications. Right now I am doing research on i-vector-based automatic speaker recognition systems and their applications in forensic fields.
  Saameh Golzadeh Ebrahimi FBK, Italy.
    Work: My PhD thesis is related to far-field speaker identification issues.
  Stephen Shum Massachusetts Institute of Technology, United States.
    Work: My research interests revolve around the use of unsupervised methods for processing and uncovering structure in speech. For example, some of my recent work has been in speaker diarization, where I have spent some time working on the problem of determining the number of speakers in a given conversation. A goal of mine is to broaden the scope of this problem and find ways to apply similar methods to uncover different types of structure, such as how many different languages are spoken in a given collection, and maybe even some form of topic segmentation/clustering on the audio.

Table No. 8
Table topic Articulatory Models/Phonetics - phonetics and phonology
International Senior John Ohala
Reporter Leona Polyanskaya
Students    
  André Nogueira Xavier State University of Campinas, Brazil.
    Work: My research topic concerns one aspect of the phonetics of Brazilian sign language (henceforth Libras). I have been investigating the alternation of sign production in terms of the number of manual articulators. Although in Libras there are signs which are typically articulated with one hand and others which are usually produced with both hands, sometimes one-handed signs become two-handed and vice versa. I have been trying to figure out why that happens. Up to now I have observed that factors of different natures govern such alternation. One-handed signs can become two-handed for emphasis, grammatical reasons or co-articulation (influence from a preceding or following two-handed sign). Two-handed signs, in turn, can become one-handed for co-articulation or because one of the hands is not available for signing. In spite of the diversity of factors governing the alternation of signs in terms of the number of hands with which they are realized, I will focus on co-articulation and unavailability of one of the hands.
  Barbara Samlowski University of Bonn, Germany.
    Work: At the moment, I am writing my doctoral thesis about the possible effect of syllable frequency on coarticulation and its implications for the mental syllabary hypothesis. As the syllable frequencies I am investigating are based on automatically transcribed written texts, I have gained insight into the problems and challenges of text preprocessing. I am interested in speech technology and corpus linguistics as well as phonetic theories and their practical applications.
  Caicai Zhang Language Engineering Laboratory, The Chinese University of Hong Kong, Hong Kong.
    Work: Speech production and perception, tone perception, talker normalization.
  Christos Koniaris KTH - Royal Institute of Technology, Sweden.
    Work: Currently I am working in the area of automatic pronunciation error detection, evaluation and analysis for second language learning. For several years my research concerned the area of robust speech recognition.
My research interests include second language acquisition and perceptual assessment, automatic speech recognition, speech processing, perceptual phonetics, psychoacoustics and auditory modeling, pattern classification and dynamic modeling.
  Hywel Stoakes The University of Melbourne, Australia.
    Work: My PhD research regards the acoustic and articulatory phonetics of an Australian Aboriginal language called Kunwinjku (Bininj Gun-wok), with a particular interest in speech timing and coarticulatory processes involving nasals. My other research interests involve speech production and speech perception in Australian language communities. I have a further interest in the phonetics of endangered languages in general and how this affects cultural interactions within a dominant language community.
  Leona Polyanskaya Bielefeld University, Germany.
    Work: Acquisition of prosody, speech rhythm, acquisition of speech rhythm in first and second language.
  Pavel Sturm Institute of Phonetics, Charles University in Prague, Czech Republic.
    Work: I've just finished an undergraduate course in phonetics and therefore my research output is not abundant. I took part in investigating the phonological voicing contrast in whispered Czech (acoustics, perceptual testing) and currently I'm working on measuring RTs under different segmentation conditions (but the main goal is to evaluate 2 programs). My BA thesis was concerned with the acoustic analysis of the open front vowel /ae/ in Czech English & the acquisition of L2. I hope to continue mainly with psychophonetics for the future.

Table No. 9
Table topic TTS - Speech and Open Source
Italian Senior Mirko Grimaldi
International Senior Alan W Black
Reporter Tamás Gábor Csapó
Students    
  Andrew Fandrianto Carnegie Mellon University, United States.
    Work: I'm working and have worked on dialog systems like Let's Go.
  Firoj Alam University of Trento, Italy.
    Work: I finished my undergrad in 2006. After that I worked on Bangla text-to-speech and successfully developed it using the Festival framework. Apart from that, I have a bit of experience working on speech recognition. Currently I am an HLTI (Human language technologies and interfaces) master’s student at the University of Trento, and I have been admitted into the PhD program in the same track under Prof. Riccardi.
  Haiyang Li Harbin Institute of Technology, China.
    Work: My main research interests are keyword spotting and spoken term detection. Most of my past research work was on the confidence measure for keyword spotting.
  Jingting Zhou University of Maryland, United States.
    Work: For my master's thesis, I am conducting research for an Italian company offering Natural Language Processing ("NLP") services. I will be attending Interspeech to determine what the future holds for NLP, and how the company may be able to profit from those market needs.
  Toshiko Vu United States.
    Work: For my master's thesis, I am conducting research for an Italian company offering Natural Language Processing ("NLP") services. I will be attending Interspeech to determine what the future holds for NLP, and how the company may be able to profit from those market needs.
  Sajad Shirali-Shahreza University of Toronto, Canada.
    Work: I am interested in speech processing applications in the HCI domain. In the work that I will present at Interspeech, we design a system to verify that the user interacting with the system is a human, which can be referred to as a CAPTCHA. The application of our system is in identifying and blocking spammers. We exploit the limitations of speech synthesis in addition to those of speech recognition to provide a more usable and accessible alternative to current audio CAPTCHA systems, which rely solely on speech recognition limitations.
  Tamás Gábor Csapó Budapest University of Technology and Economics, Hungary.
    Work:
1) Prosodic variability in Text-To-Speech synthesis: how to make speech synthesis more human by choosing from several prosodic alternatives when the same or a similar sentence is repeated (people also say the same content in a different way every time).
2) The role of subglottal resonances in speech: how the resonances of the subglottal airways influence speech production and perception, and how they can be applied in speech technology.

Table No. 10
Table topic TTS - Expressive synthesis
Italian Senior Enrico Zovato
International Senior Nick Campbell
Reporter Miaomiao Wen
Students    
  Marcin Wlodarczak Bielefeld University, Germany.
  Miaomiao Wen The University of Tokyo, Japan.
    Work: Text to speech.
  Miran Pobar University of Rijeka, Croatia.
    Work: My main research interest is text-to-speech, especially for the Croatian language. We have built several prototype voices (unit selection and statistical) and are trying to improve the speech quality, especially prosody.
  Nic De Vries CSIR Meraka Institute, South Africa.
    Work: Currently busy with my Master's, looking at the impact of speaking styles on acoustic models, and more specifically at how to use or adapt acoustic models for recognition of different speaking styles given language resource constraints.
  Nicolas Obin IRCAM, France.
    Work: My main interest is the statistical modeling of speech prosody, and its integration into speech synthesis systems (unit-selection, HMM-based).
  Sylvain Le Beux LIMSI-CNRS, France.
    Work: I am interested in voice synthesis in general, either speech or singing, and especially about acoustical and prosodic modifications of the voice.
I am also very interested in the links between voice production and perception and the gestures involved for producing vocal sounds.
A new interest of mine is the characterization of individual voices for speech synthesis.
  Wen Zhengqi Chinese Academy of Sciences, China.
    Work: Excitation modeling and emotion processing.
  Shannon Hennig Italian Institute of Technology, Italy.
    Work: Of the topics above, I am most interested in prosody, TTS, multimodal communication and applications. My background is in augmentative and alternative communication and my primary interest is improving the prosodic and emotional expressivity of speech synthesis for people with disabilities who use speech generating communication devices. Currently I am recording audio and physiological signals (e.g., skin conductance, heart rate) while a person is speaking. I could really use conversation with more expert researchers regarding analysis techniques for finding patterns within and between these data streams. I am currently focusing on whether there are involuntary components of prosody that vary with activation of the autonomic nervous system and that can be picked up from wearable sensors. My clinical experience as a speech-language therapist leads me to suspect that such an involuntary component exists, and my goal in my PhD is to begin to describe and characterize it, with the hope that this information could inform implicit control techniques for a subset of voice parameters for speech synthesis in the future.

Table No. 11
Table topic HMM TTS
Italian Senior Fabio Tesser
International Senior Keiichi Tokuda
Students    
  Cassia Valentini Botinhao University of Edinburgh, United Kingdom.
    Work: I am currently doing a PhD on Perceptual HMM-based Speech Synthesis in the Centre for Speech Technology Research (CSTR) at the University of Edinburgh, under the SCALE project framework. My main interests in the area of speech technology are speech synthesis, intelligibility measures for speech, speech enhancement and psychoacoustic models for the human auditory system.
  Hanna Silen Tampere University of Technology, Finland.
    Work: I am working as a researcher at Tampere University of Technology, Finland, and pursuing a PhD degree in text-to-speech synthesis and voice conversion. In speech synthesis I have studied both unit selection and HMM-based methods, and most recently also hybrid-form synthesis, which aims to combine the best of the two.
  Hui Liang Idiap Research Institute, Switzerland.
    Work: I am working on cross-lingual speaker adaptation in the context of speech-to-speech translation. To be specific, the goal is to build a TTS system which can reproduce the voice of a user in a language that the user doesn't speak. The work is entirely based on the HMM-based speech synthesis framework. My research interests include text-to-speech synthesis, speech signal processing and phonetics.
  Kheang Seng Institute of Technology of Cambodia, Cambodia.
    Work: I'm working on Letter-to-Phoneme conversion for English TTS application and English learning software for Japanese people. I'm also interested in Speech Synthesis application (beginner).
  Matt Shannon University of Cambridge, United Kingdom.
    Work: I'm interested in truly probabilistic models of speech, for both speech synthesis and speech recognition. In particular I'm interested in the extent to which performance can be improved by concentrating on the low-level aspects of acoustic models. My research has focused on using autoregressive models for speech synthesis, which provide a simple but elegant framework for building probabilistic models of speech.
  Mumtaz Mustafa University of Malaya, Malaysia.
    Work: My research interest is in speech synthesis technology as well as emotional speech synthesis. Previously I developed a Malay speech synthesizer based on diphone concatenation units to synthesize emotional speech. Recently, I have also developed an HMM-based speech synthesis system for Malay to synthesize emotional speech. My research focus is now on a bilingual speech synthesis system covering English and Malay.
  Thomas Ewender ETH Zürich, Switzerland.
    Work: My research interests are speech synthesis with a focus on signal processing and speech waveform analysis. More precisely, my recent work focused on the evaluation of speech segments in terms of quality and suitability to serve as a basis for synthesis and the automatic generation of speech corpora for synthesis. On one hand this work includes in-depth analysis of signal properties such as fundamental frequency, pitch marks and voicing properties. On the other hand extensive tools are required to build voices in a fully automatic way with minimal manual intervention.
  Gustav Eje Henter KTH - Royal Institute of Technology, Sweden.
    Work: I am generally interested in statistical machine learning. Within the speech field, my main focus is on acoustic models (HMMs and their extensions), particularly for speech synthesis.

Table No. 12
Table topic TTS - articulatory synthesis
Italian Senior Cinzia Avesani
International Senior Peter Birkholz
Reporter Michael Wand
Students    
  Alan O Cinneide Dublin Institute of Technology, Ireland.
    Work: I work with speech modelling for glottal analysis and especially voice modification. In particular, I do a lot of work with glottal waveform parameterization.
  Elizabeth Godoy Orange Labs, France.
    Work: I am currently finishing my PhD thesis on Voice Conversion (VC). The majority of my work focuses on spectral envelope transformation for VC. My interests lie primarily in signal processing for speech applications.
  Luca Iacoponi Pisa University, Italy.
    Work: I'm mainly working in Theoretical Phonology, so I'm particularly interested in models that try to integrate and simulate speakers' competence.
  Michael Wand Karlsruhe Institute of Technology, Germany.
    Work: I'm a PhD student with Prof. Tanja Schultz, in Germany. My work is speech recognition based on electromyographic signals, which are electrical signals captured from the human articulatory muscles. This technology can be used to build Silent Speech Interfaces, to augment acoustic ASR, and to better understand human speech, among other purposes. I have described it in about 10 publications, and it has been successfully demonstrated, e.g., at the CeBIT IT fair 2010.
  Thomas Drugman University of Mons, Belgium.
    Work: My work focuses on the use of the glottal flow in speech processing. How to estimate the glottal flow, the excitation signal, and how to integrate it in several speech processing applications such as speech synthesis, speaker recognition, voice pathology detection, or expressive speech analysis.
  Ya Li Institute of Automation, Chinese Academy of Sciences, China.
    Work: My work is mainly on prosodic structure prediction from textual features in text-to-speech. I am now moving on to the generation of prosodic structure in the framework of HTS.

Table No. 13
Table topic ASR + ML for DS
Italian Senior Pietro Laface
International Senior Steve Young
Reporter Christian Gillot
Students    
  Azam Rabiee Brain Science Research Center, KAIST, Korea.
    Work: I am working on single-channel (monaural) speech separation. There are two groups of approaches: engineering approaches and human-based approaches. I'm working on a human-based approach, generally called Computational Auditory Scene Analysis (CASA).
  Balázs Tarján Budapest University of Technology and Economics, Hungary.
    Work: My research interests are in language and pronunciation modeling for Hungarian large-vocabulary continuous speech recognition (LVCSR) and spoken term detection (STD) systems. Word-based language models work fairly well in languages where the number of word forms is relatively small (e.g. English, Spanish). However, in morphologically rich languages (e.g. Finnish, Turkish, Hungarian, Arabic) we can face the problem of data sparseness (high perplexity and OOV rate) even when using a large training corpus. To handle this problem, words can be split into smaller, morpheme-like lexical units called "morphs". It has been shown in other agglutinative languages (Finnish, Turkish, Estonian) that morph-based recognizers can significantly outperform those using word-based language models. In Hungarian we also managed to achieve a significant improvement by using morph-based models; however, as the training corpus size was increased, this improvement shrank or even disappeared in some cases. What I am especially interested in is the effect of LM parameters (size, n-gram order, OOV rate, genre of the task) on the morph-based improvement in LVCSR and STD tasks. My aim is to construct a simple, language-independent model that can predict the change in WER if a word-based model is replaced with a morph-based one.
  Ben Hixon Hunter College of The City University of New York, United States.
    Work: I'm an undergraduate entering my senior year and intending to pursue graduate study in machine learning approaches to natural language processing and speech recognition. I am particularly interested in interdisciplinary approaches to and applications of machine learning. My paper at Interspeech adapts a bioinformatics algorithm to prepare a matrix of phoneme substitution weights (similar to the BLOSUM matrix for amino acid substitutions) from a corpus of pronunciation variants derived from the CMU Pronouncing Dictionary; we use this matrix to compare the performance of two machine learning methods and one rule-based method for grapheme-to-phoneme conversion. In addition, I'm currently working on creating a user simulator to automatically generate training dialogue corpora for our lab's machine learning SDS, to be used for book ordering by telephone for the Andrew Heiskell Library for the Blind.
  Chandramohan Senthilkumar École Supélec, France.
    Work: I am a second-year PhD student working on statistical machine learning methods for spoken dialogue management. As part of my ongoing research, I am analyzing the sample efficiency of different RL algorithms for dialogue management and user simulation.
  Charl van Heerden North West University, South Africa.
    Work: I have worked on quite a few speech recognition systems involving limited resources (we've built speech recognition systems for all 11 of South Africa's official languages). I was also privileged to learn from the team working on Voice Search for South African English, Afrikaans and Zulu while doing an internship at Google in New York.
  Christian Gillot University of Nancy 1, France.
    Work: I'm working on language modeling for large vocabulary automatic speech recognition. The state of the art is still the n-gram language model which is very simple yet efficient but can't capture many phenomena of the language. The prohibitive cost of linguistic annotation of large corpora calls for learning language models automatically that are able to better capture the various phenomena of human language which is the focus of my work.
  Hadrien Gelas Université Lyon 2, France.
    Work: I am more specifically working on speech recognition for resource-scarce languages.
  Hung-yi Lee National Taiwan University, Taiwan.
    Work: I am doing research in speech information retrieval. My work includes adjusting acoustic model parameters to improve spoken term detection performance via relevance feedback, and several pseudo-relevance feedback approaches for spoken term detection.

Table No. 14
Table topic ASR - Signal Processing
Italian Senior Maurizio Omologo
International Senior Abeer Alwan
Reporter Ingrid Jafari
Students    
  Abdul Waheed Mohammed University of Trento, Italy.
    Work: My interest is in robust speech recognition in adverse environments. More specifically, I am investigating robust feature extraction techniques and acoustic model adaptation for reverberant environments.
  Belinda Schwerin Griffith School of Engineering, Griffith University, Australia.
    Work: Research during my candidature has focused on exploring the properties of the modulation domain, and utilizing this domain for speech enhancement. More recent work has been investigating the use of this domain for objective quality and/or intelligibility metrics that are better suited to the comparison of processed speech corrupted with different types of distortions.
  Hilman Ferdinandus Pardede Tokyo Institute of Technology, Japan.
    Work: My research is about robust speech recognition. We are currently focusing on the correlation between speech and noise, which is often neglected by traditional methods. We use q-log properties from Tsallis entropy to perform normalization in the q-log domain.
  Hon-Bill Yu The Hong Kong Polytechnic University, Hong Kong.
    Work: My research interests include speaker verification, speech enhancement and voice activity detection. I am currently focusing on implementing a robust voice activity detector for the interview speech in NIST Speaker Recognition Evaluations. Different VAD methods have been investigated, including energy-based, statistical-model-based and Gaussian-mixture-model-based approaches.
  Ingrid Jafari The University of Western Australia, Australia.
    Work: My research interests lie in the field of underdetermined blind source separation (BSS) using microphone arrays. In particular, the time-frequency masking approach to the BSS problem when reverberation is present. My work involves an extension of an existing multiple sensors unmixing estimation technique (MENUET) via a novel amalgamation with the fuzzy c-means clustering algorithm for the mask estimation stage. It is expected that the fuzzy decisions in this clustering will capture the ambiguity surrounding the membership of each time-frequency cell to a cluster - which is representative of the reverberation present.
  Vikas Joshi IIT Madras, India.
    Work: I am interested in robust speech recognition. I have mainly focused my work on compensating for noise at the sub-band level. As speech and noise behave differently in different frequency bands, noise compensation at the sub-band level seems more appropriate. I am working on modifying standard noise compensation algorithms like HEQ and VTS to perform band-specific compensation. I am also interested in compensating for speaker variability (using the VTLN technique) in noisy conditions. My recent work has been on separating the speaker and environmental transforms for quick speaker and noise adaptation.
  Wei Rao The Hong Kong Polytechnic University, China.
    Work: My research interests are speaker verification and pattern recognition. I now mainly work on solving the data imbalance problem in GMM-SVM based speaker verification, that is, the imbalance between the numbers of speaker-class utterances and impostor-class utterances available for training a speaker-dependent SVM.
  Zhanlei Yang National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China.
    Work: Since 2008, Dr. Zhanlei Yang has been working on the recognition of speech which is varied by some intrinsic factors such as accent/dialect. His current research interests include speech recognition, decoding/search algorithm and acoustic modeling.

Table No. 15
Table topic Hybrid ASR
Italian Senior Fabio Brugnara
International Senior Hynek Hermansky
Students    
  Antti Hurmalainen D. of Signal Processing, Tampere University of Technology, Finland.
    Work: We are attempting to improve the noise robustness of ASR by using an alternative, exemplar-based approach instead of short term GMMs. Current subtopics include factorisation algorithms, dictionary generation, feature spaces, efficient computing, decoding methods etc.
  Fethi Bougares LIUM-Le mans, France.
    Work: My research interest is speech recognition system combination.
  Guangsen Wang National University of Singapore, Singapore.
    Work: My research mainly focuses on the discriminative training of acoustic models, especially the hybrid Neural Network / Hidden Markov Model (NN/HMM) system, including context-dependent modelling of NN/HMM, the scaling problem, etc.
  Korbinian Riedhammer Univ. Erlangen-Nuremberg, Germany.
    Work: In my current research, I focus on robust automatic speech and speaker recognition. As a side track, I work on automatic speech summarization with a focus on finding alternatives to the classic extractive-abstractive modalities.
  Mahaboob Ali Basha Shaik RWTH Aachen University, Germany.
    Work: Pursuing research in the SCALE project in the field of automatic speech recognition. Currently working on hybrid language models to reduce the out-of-vocabulary problem in LVCSR systems, under the supervision of Prof. H. Ney, RWTH Aachen University, Germany.
  Mark Sinclair CSTR, University of Edinburgh, United Kingdom.
    Work: I am currently investigating many aspects of the speaker diarization task. I am particularly interested in tackling the problems of clustering short speech segments, identifying overlapping speech and automating or eliminating the large number of parameters present in current systems. I am mostly working with meetings data but would be interested to discuss other applications.
  Roland Maas University of Erlangen-Nuremberg, Germany.
    Work: My research interests include robust speech recognition. I'm especially investigating HMM-based techniques for reverberation compensation. The main concept I'm pursuing incorporates a previously estimated reverberation model directly into the Viterbi decoder.
  Sravana Reddy The University of Chicago, United States.
    Work: I am interested in lexical models, as well as the interface between speech processing and theoretical linguistics and natural language processing. I am also interested in unsupervised learning and adaptation, particularly of pronunciations, but also acoustic and language models.

Table No. 16
Table topic ASR and Machine Translation
International Senior Hermann Ney
Reporter Ching-Feng Yeh
Students    
  Amr Mousa Lehrstuhl fuer Informatik 6, RWTH Aachen University, Germany.
    Work: I am doing research in language modeling for morphologically rich languages, including sub-lexical language modeling, class-based language modeling, factored language modeling and continuous space language modeling. I am also interested in NLP research and lexical modeling. My interest is mainly focused on the Arabic language.
  Ching-Feng Yeh National Taiwan University, Taiwan.
    Work: I am doing research on automatic speech recognition for code-mixed bilingual speech. In my work, the acoustic model was improved by a merging and recovery algorithm. For now I am interested in research topics such as discriminative training and language identification. In addition, how machine learning techniques should be integrated into the speech recognition framework is one of the problems I would like to explore.
  Chun-an Chan National Taiwan University, Taiwan.
    Work: I am working on spoken term detection tasks based on HMMs trained without supervision. I am looking for a feasible way to adapt to a speaker's spoken query (very short, only a word or a phrase) that is robust to channel mismatch and background noise. I am also improving the HMM with a more complex method that requires less training data.
  Keith Kintzley Johns Hopkins University, United States.
    Work: I'm interested in spoken term detection / keyword spotting models based on phone posteriorgram representations of speech.
  LE Hai Son LIMSI CNRS, France.
    Work: My current research subject is language modeling in automatic speech recognition and machine translation systems.
  Marijn Schraagen Utrecht University, Netherlands.
    Work: Past research has focussed on non-native speech and multilingual lexica, using phoneme-to-phoneme conversion to improve phonetic transcription. Current research is aimed at entity matching in written historical text for which speech may be able to match different orthographies.
  Tomas Mikolov Brno University of Technology, Czech Republic.
    Work: I am interested in statistical language modeling using neural networks, and its application to speech recognition and machine translation. Also, I am interested in combination of different approaches, and in advanced machine learning techniques (like recurrent neural networks and deep neural networks).
  Zejun Ma Institute of Automation, Chinese Academy of Sciences, China.
    Work: My research interests include multilingual acoustic modeling, automatic speech recognition and spoken term detection.

Table No. 17
Table topic ASR in Industry
Italian Senior Paolo Baggia
International Senior Michael Picheny
Reporter Madhavi Ratnagiri
Students    
  David Rybach RWTH Aachen University, Germany.
    Work: My research interests are in the general area of speech recognition with focus on efficient search algorithms. For my thesis I am developing and analyzing WFST-based decoders. Furthermore, I am the current maintainer of the RWTH Aachen University Open Source Speech Recognition System.
  Florian Mueller University of Lübeck, Germany.
    Work: My work concentrates on robust feature extraction for automatic speech recognition. A main field of my research is about the use of invariant transformations for vocal-tract length independent speech recognition. In terms of keywords, auditory filterbanks and auditory models, invariant transforms, feature selection methods, and also speech synthesis are part of my work.
  Giulio Paci ISTC-CNR, Italy.
    Work: During the last three years I worked at CINECA in the Knowledge and Data Management group, where I developed audio and text analysis applications for automatic metadata generation. Next year I am going to work on Automatic Speech Recognition at CNR. My research interests include speaker diarisation, audio and text classification, audio and text summarisation, topic detection, information retrieval, machine understanding and automatic speech recognition.
  Madhavi Ratnagiri Rutgers University, United States.
    Work: I have worked with combining feature transformation with model training for MCE training, as well as developing new loss functions for MCE training.
  Senaka Buthpitiya Carnegie Mellon University, United States.
    Work: I'm working on parallel computing approaches (specifically GPU based) for improving the accuracy and responsiveness of real-time large vocabulary continuous speech recognition systems. I also work on using GPU-based approaches for training LVCSR systems.
  Udhyakumar Nallasamy Carnegie Mellon University, United States.
    Work: I work on accent and dialect issues in automatic speech recognition.
  Xueliang Zhang Computer Science Department, Inner Mongolia University, China.
    Work: Speech Separation, Speech Recognition.
  Yuxiang Shan Department of Electronic Engineering, Tsinghua University, China.
    Work: My research interests include: automatic speech recognition, especially decoding algorithms and decoder design and implementation; confidence measures; speaker recognition.
  Scott Novotney JHU CLSP, United States.
    Work: We know how to deploy an effective ASR system with reasonable WER: get audio from the domain and transcribe it. I want to break this assumption and lower the cost of ASR for low-resource domains like conversational Arabic dialects. My thesis is focused on semi-supervised methods for language modeling. I look for ways to use a small amount of transcribed audio and large amounts of untranscribed audio. Previously, I've researched acoustic model self-training, Arabic colloquial dialect adaptation and transcription of CTS data with Mechanical Turk.

Table No. 18
Table topic ASR + prosody
Italian Senior Roberto Gretter
International Senior Julia Hirschberg
Reporter Catharine Oertel
Students    
  Bogdan Vlasenko Cognitive Systems, Department of Electrical Engineering and Information Technology, Otto von Guericke University Magdeburg, Germany.
    Work: Emotion Recognition from Speech, Emotional Speech Adapted Speech Recognition, User Behavior Adaptive Spoken Dialog Systems.
  Catharine Oertel Trinity College Dublin, Ireland.
    Work: I am interested in the statistical modeling of degrees of "involvement" in spontaneous conversation using multimodal cues. My background lies in phonetics; however, in the last year I have also analysed visual cues such as gaze, blinking and amount of movement (OpenCV) to identify cues which are suitable for the automatic prediction of degrees of involvement.
  Daniel Neiberg TMH / KTH, Sweden.
    Work: I develop acoustic event detectors for the epiphenomena in conversational interaction that are not about words. This includes turn-taking, attitudes conveyed via back-channels, filled pauses and affect. Paralinguistics, in other words.
  David Doukhan LIMSI-CNRS, France.
    Work: The aim of my research is to perform expressive speech synthesis, applied to short tales. My work consists of: extracting relevant information from text (speech turns, tale structure, characters, ...); building and annotating an audio corpus of 12 tales; performing a prosodic analysis of the corpus; and inferring rules that map tale text to prosodic instructions.
  Kristina Lundholm Fors University of Gothenburg, Sweden.
    Work: In my research I focus on the production and perception of pauses in spontaneous dialogues and in human-computer interaction. I am interested in the application of prosody and pauses in particular in spoken dialogue systems.
  Kun Li The Chinese University of Hong Kong, Hong Kong.
    Work: I am currently working on the detection of lexical stress, pitch accent, and intonation pattern. I am also interested in automatic speech recognition.
  Martina Urbani Università degli Studi di Padova, Italy.
    Work: I focus on the study of L2 intonation, especially the differences between intonation patterns produced by American English native speakers and Italian learners of English. Although it is well known that intonation plays a role in perceptions of foreign accent and in intelligibility ratings, there is virtually no information as to which dimensions or parameters of intonation are responsible for these perceptions. Non-native speaker (NNS) productions may differ from native speaker (NS) productions in certain dimensions. These are the inventory of boundary tones and pitch accents (systemic dimension); the phonetic implementation of these structural elements (realisational dimension); the distribution of boundary tones and pitch accents (phonotactic dimension); and functionality (semantic/pragmatic dimension) (Mennen, 2007). I would like to know more about those aspects which are difficult for L2 learners to acquire and whether specific instruction influences success.
  Ondrej Glembek Brno University of Technology, Czech Republic.
    Work: Speaker recognition, discriminative training, subspace modeling, optimization.

Table No. 19
Table topic Prosody
International Senior Hiroya Fujisaki
Reporter Charlotte Wollermann
Students    
  Andrea DeMarco University of East Anglia, United Kingdom.
    Work: The broad area of my research is automatic speaker identification. The current focus of work is however an attempt at characterising unsupervised prosodic and accent cues, something that has so far been mostly attempted in a supervised manner due to the strong relation between spectral differences for a particular word encoding across different accents. It would be interesting to see if, after all, it is possible to characterise long-term accent cues from an acoustic-only perspective, which for all intents and purposes, seems to be something the primate auditory cortex is capable of doing to a certain extent.
  Charlotte Wollermann Bonn University, Germany.
    Work: PhD thesis: the role of audiovisual prosody for pragmatic focus interpretation; research interests: audiovisual prosody, multimodal speech perception, experimental pragmatics, emotion and human-machine interaction.
  David Martinez Gonzalez University of Zaragoza, Spain.
    Work: I am mainly working on Language Identification. I have worked with phonotactic systems (PPRLM) and acoustic systems (JFA, i-vectors), and am now also investigating prosody. I am also interested in noise robustness techniques.
  John Taylor University of East Anglia, Norwich, United Kingdom.
    Work: I am researching pitch estimation, but avoiding the conventional methods, and focusing on data driven, probabilistic methods instead. The goal is to produce a pitch estimator that outperforms current methods, and does not require threshold parameters to be set.
  Keng-hao Chang UC Berkeley, United States.
    Work: My thesis project is to build a mental health monitor based on the human voice. The human voice carries a wealth of affective information, as well as indicators of early stage mental illness (a.k.a. prodrome, psychomotor dysfunction). Coupled with the pervasiveness and the built-in microphones of mobile phones, the human voice becomes the most penetrating and unobtrusive means to monitor mental health. We started this work by learning psychiatrists' diagnostic knowledge and planned to apply the knowledge to build an effective mental health monitor. I have been working on projects using voice to detect emotion and abnormal speaking styles, and to differentiate depressed patients from healthy subjects. We are in the process of building a machine learning model that predicts the mental health level (via regression) from the voice signals. At the same time, I'm implementing a mental health monitor library that can run efficiently on mobile phones, turning mobile phones into "cognitive phones". As a side project, I'm also working on biosignals to detect psychological stress.
  S. Thomas Christie University of Minnesota, United States.
    Work: I am interested in automatic detection of speech dysfluencies based on prosody-only analysis for use in the tracking of cognitive deficits associated with drug trials and age-related dementia.
  Yan Tang Language and Speech Laboratory, Universidad del Pais Vasco (UPV/EHU), Spain.
    Work: My research is mainly focused on speech enhancement in noisy conditions, speech production and perception in noise, and the study of human behaviour while listening and speaking in noise.

Table No. 20
Table topic Prosody
Italian Senior Antonio Romano
International Senior Mark Hasegawa-Johnson
Reporter Mikhail Ordin
Students    
  Chao-yu Su NTHU, Taiwan.
    Work: My research interest is prosody analysis and applications of speech prosody. My previous work focused on discourse prosody, and now I pay more attention to key term extraction in spontaneous speech.
  Chierh Cheng Speech, Hearing and Phonetic Sciences, United Kingdom.
    Work: Phonetic reduction in natural speech.
  David Imseng Idiap Research Institute, Switzerland.
    Work: The goal of my PhD is to investigate new approaches towards the development of multilingual speech recognition systems and explore language adaptive methods that provide means to build systems for languages lacking resources while focusing on problems related to multilingual acoustic-phonetic modeling. In this context, I am looking for principled approaches towards the definition and training of shared multilingual phoneme sets and multilingual features, fast adaptation of systems, or composition of multiple monolingual systems. By building, evaluating and further exploring multilingual speech recognition systems, I also expect to improve the performance of current state-of-the-art monolingual systems, ideally on accented speech and dialects.
  Erin Cvejic MARCS Auditory Laboratories, University of Western Sydney, Australia.
    Work: Although prosody is typically examined in terms of acoustic signal modifications, my research focuses on the way the visual speech signal can convey suprasegmental information, primarily in terms of prosodic focus (i.e., broad vs. narrow focus) and phrasing contrasts (i.e., declarative statements vs. echoic questions). My research to date has included perceptual studies using natural and manipulated video stimuli, spatiotemporal quantification of visual prosody using optical tracking and guided PCA, and perception of prosody from abstract visual displays (point light displays).
  Lehlohonolo Mohasi University of Stellenbosch, South Africa.
    Work: My research is in text-to-speech synthesis, with the main focus being on prosody/tone in Sesotho, which is one of the under-resourced languages in Southern Africa. My interest is in prosodic modeling of the Sesotho language for a close to ideal text-to-speech system in Sesotho.
  Maria Eskevich Dublin City University, Ireland.
    Work: The main topic of my PhD is the improvement of the speech retrieval process; therefore I am interested in both automatic speech recognition and information retrieval. Prosody appears to be an interesting feature that has so far not been much integrated in the field of retrieval, which is why I would like to discuss possible ideas with people experienced in the field.
  Mikhail Ordin Bielefeld University, Germany.
    Work: Acquisition of speech rhythm in production and perception by first and second language learners. Integration of speech rhythm and brain rhythms.