IS2011 Student Event Report : Multimedia and Multimodal Interaction

Submitted by naxingyu on Fri, 10/28/2011 - 01:34

 

It was really an amazing experience to attend Interspeech 2011 in Florence, and the student lunch event was especially productive and interesting: we met well-known researchers working in fields similar or identical to our own.

The experts at our table were Prof. Helen Meng and Prof. Andrea Paoloni. The students at the table included Arindam Ghosh from the University of Trento, Janto Skowronek from Technical University Berlin - Deutsche Telekom Laboratories, Senaka Amarakeerthi from the University of Aizu, Sourish Chaudhuri from CMU, whom I had met in the Uffizi Gallery the other day, Weiwu Jiang from CUHK, Zhanyu Ma from the Royal Institute of Technology, whom I will meet in Beijing at the MLSP 2011 workshop next week, and me, Ziqiang Shi from HIT.

The experts first asked all the students at our table to introduce their research fields and interests. Then, taking everyone's interests and fields into account, they suggested the theme "What is multimodal?" for discussion. We had a lively discussion on this question. Prof. Helen Meng and Prof. Andrea Paoloni observed that "multimodal" carries meaning on two levels: input and output. Prof. Helen Meng said that multimodal systems generally present users with multimedia displays and multimodal output, primarily in the form of visual and auditory cues. She added that interface designers have also started to make use of other modalities, such as touch and olfaction. Prof. Helen Meng's team is working on map search; she gave an example in which the destination is input to the map engine through both text and gesture. You point directly at the destination, and the engine tells you how to get there. That helps a lot.

Janto is currently working on communication. He thought that multimodal means multiple one-to-one communication methods, so we agreed that multimodality relates to three aspects: input, output, and communication. Although multimodal interaction is very useful in real life, Zhanyu Ma pointed out that it still raises many open questions. For example, in multimodal communication the recipient needs to decode which modalities are present in the input.

Good times are always short, and we benefited a lot from this one-hour discussion. Thanks to ISCA for holding such an event; it was interesting and productive and provided an opportunity to communicate with peer researchers. Almost all the students at our table became friends and have kept in touch since the conference. It also gave us a platform for discussion with famous professors, which was really a wonderful experience.

 

Report written by ZiQiang Shi

 


Students meet seniors event - group "Applications: Robotics, speech, industry"

Submitted by thuytn on Thu, 10/06/2011 - 07:59

 

We had students from different institutes, and a senior whose work strongly reflects the demands of industry. In general, we discussed our concerns and interests related to speech processing and robotics. This provided a valuable opportunity for us to exchange ideas, get to know each other's work, and find potential opportunities for collaboration. At a large-scale conference like Interspeech, such a conversation is indeed necessary for students to benefit directly from the research community. The following were some of the main areas of discussion:

Robotics, and human-robot interaction in particular, involves various challenges related to complex systems, which explains the wide variety of disciplines in our group. Common issues include training, the intelligibility and quality of speech in noisy environments, speech recognition, reverberation, and source separation. Solving one problem often depends strongly on the results of others.

Considering the large amount of active research producing new ideas, Dr. Mauro Falcone raised two interesting and practical points confronting industry. The first was: how backward compatible are our solutions? Researchers often focus on optimizing and improving the performance of their systems, whereas industry is more concerned with how a new solution can be implemented on top of existing systems. Given the cost of rebuilding infrastructure, services, and related components, researchers should keep in mind the need for easy integration and backward compatibility. The second point was the current gap between what is possible and what has actually been achieved in practice. Many laboratory results show significant potential, but in most countries researchers and end users do not have a great deal of contact, so we are missing opportunities to ensure our systems meet their requirements.

In the last part of the discussion, we covered two related topics. The first was our hope for an accessible and widely recognized database of speech and video; we found the AMI corpus currently adequate, though it has limitations. The second was a similarly perceived need for a 'standardized library' of implementations of existing methods, which would be useful for comparing and evaluating new methods. If no code or software is provided, contacting the authors is currently the only option.

 

Report written by Thuy Tran


Interspeech 2011 Student Lunch Report from the table "Applications: Dialogue for education"

Submitted by nickcummins41 on Tue, 10/04/2011 - 20:59

 

Hi Folks,

I recently had the pleasure of attending Interspeech 2011, held in Florence, Italy. This was the first time I had attended a conference, so you can imagine my surprise when, sitting jet-lagged in the opening ceremony, a magic show started; what was I in for over the next five days? I have just finished my undergraduate Electrical Engineering degree at UNSW, Sydney, Australia. My honours thesis was on mental state recognition using speech, and I was able to produce an Interspeech paper from this work.

During the conference an ISCA student lunch was held. This event gives students the opportunity to discuss their research with senior members of the speech communication community and with other students. As I am trying to decide whether or not to do a PhD, I thought this lunch would be a good opportunity to talk to fellow students about what is involved in a speech-processing PhD before committing to one.

The senior researchers at my table were Giuseppe Riccardi from the University of Trento, Italy, and Maxine Eskenazi from Carnegie Mellon University, USA. There were nine students at the table: six Europeans, one from the USA, one from India, and myself from Australia. Between us we covered a wide spectrum of speech processing topics, including ASR, language recognition, speech enhancement, spoken dialogue systems, and health applications such as recognising dyslexia or depression.

The topic of our table was Applications: Dialogue for education, but the discussion was considerably broader than that. We started with why speech is such an interesting topic to study: it covers a wide range of potential applications, as reflected in the range of research areas represented at the table. We then discussed why conferences are good for students: they give us the opportunity to talk to people doing similar work, help us gain new ideas and points of view, and hopefully help us gain confidence in our work. The discussion then moved on to how Interspeech could be made better for students, for example by allowing them to present and discuss potential ideas with fellow academics instead of only presenting research findings.

My fellow students also used the lunch as an opportunity to learn from our seniors what they thought made a good conference paper from a reviewer's point of view. The key points I took from this discussion were to make sure you submit your work to the correct journal and that reviewers like to learn something from a paper.

All up, I really enjoyed Interspeech; I met loads of interesting people and learnt plenty. The student drinks night held at the picturesque Flo Lounge Bar overlooking Florence was a fun night! Based partly on the great experience of the conference, I have decided to do a PhD in speech processing. I'm looking forward to Interspeech 2012 in Oregon!

 

Report written by Nicolas Cummins


Students meet Seniors@Interspeech2011

Submitted by luiscoelho on Tue, 10/04/2011 - 17:09

 

The International Speech Communication Association had the excellent idea of organizing the “Students meet Seniors” event at this year’s Interspeech. The participating students had the chance to talk, in a very informal environment, with experienced researchers on a given area of expertise within the speech community.

I was at table 5, under the topic “Machine Processing of Spontaneous Speech”, where I was honored to meet Catia Cucchiarini, our Italian host, Cristina Flores, Diego Castán, Elias Iosif, Erich Zwyssig, Laurens van der Werff, Rivka Levitan and Sree Yella, all brilliant scientists who shared similar research interests. Most importantly, we had at our table Elizabeth (Liz) Shriberg, a researcher in the speech area with an outstanding career. She is currently a Principal Scientist at Microsoft Speech Labs in Mountain View, CA, and is affiliated with the International Computer Science Institute in Berkeley. Previously she was a Senior Researcher in the Speech Technology and Research Laboratory at SRI International. Over the last decade she has led several projects and published 200 journal and conference papers.

While enjoying a delicious Italian meal, we started by sharing our backgrounds, describing our ongoing research work and talking about our interests. We were a very diverse group but shared many common perspectives, and the problems described were extremely interesting. It was great to hear about the range of interesting projects people were working on.

As you can imagine, everybody was eager to talk with Liz, and there were many questions. We discussed several subjects in considerable technical detail, and Liz really impressed us with her deep knowledge of every aspect.

It was also very interesting to hear Liz's vision of the near future of the speech field. She believes that we will soon see major developments in language understanding, and that many technologies are becoming more robust and will hit the consumer market, making speech interaction a part of our daily lives.

Liz also shared her experience as a research professional and talked about the challenges of such a career. In a single sentence: Work on what you believe in, regardless of what other people think.

Everyone was having a great time and, of course, we also had time for some jokes and funny stories. We even learned about singing birds and humming cows :)

It was really a great lunch that nobody will forget, for sure! I really must thank ISCA for this initiative, which helps to create links among researchers and reinforces the speech research community. I also have to thank Liz for her willingness to take part in this lunch and for sharing her experience with us.

I'm looking forward to seeing everyone again next year in Portland!



 

Report written by Luis Pinto-Coelho


Interspeech 2011 student lunch event, TTS - Speech and Open Source table

Submitted by csapot on Tue, 10/04/2011 - 20:22

 

The discussion at Table 9 went in rather interesting directions; we touched on several different topics, mainly about interaction between humans and machines.

Professor Alan Black briefly introduced some of the topics he is working on now or has worked on in the past. His voice can be heard in several parts of the world even when he isn't personally present, because the "AWB" voice of the Festival speech synthesis system has spread widely and is much used, despite its Scottish accent.

One of his recent projects is JIBBIGO, a program for speech-to-speech translation on a mobile phone. This is a nice application of speech technology, combining speech recognition, machine translation and speech synthesis. In simple cases the app may be useful in place of a human interpreter.

Regarding speech synthesis, we concluded during the discussion that it is very easy to build a simple text-to-speech system that can produce some kind of speech, but the more human and natural we want that speech to be, the closer the problem comes to being impossible. It was said that one of the current goals in TTS is to produce conversational speech rather than read-style speech. One of the problems is the lack of artificial intelligence in speech generation systems; with some kind of AI, such systems could serve users better.

The talk continued with unusual human-computer interfaces, such as control by the brain. This topic sounds like science fiction at the moment, but results from around the world show that such interfaces may become available in the future. This is the kind of basic research that can push technology development further. A brain-controlled interface might be useful for driving a car, or for controlling speech synthesizers.

In our field, natural language processing is becoming very important: analyzing Twitter data can, for example, predict incidents, because news spreads much faster on Twitter than on any other medium. People like to talk (in text) about recent movies there, and analysis of such text can show whether a movie is popular. Reversing this, is it possible to create tweets that will make people go and watch a movie?

Google has about 300 years of speech data, the largest collection there is. But is it certain that more data always helps to build better systems? There is huge competition among large companies (e.g. IBM, Microsoft, Google, Facebook, Twitter); maybe in 10-15 years this situation will change, as other startups grow large.

The discussion moved on to speech-driven interfaces, e.g. voice-navigated TV. The question is whether people will like these. When cell phones were a new technology, no one could have predicted that they would one day be an essential part of our lives. Maybe this will be the case with speech-driven interfaces; but maybe not - we can't look into the future.

We also had some discussion about the history of technology. Fifty years ago the state of technology was completely different from today; technology has brought dramatic changes to our everyday lives. What we will observe in the next 50 years is completely unpredictable, as no one can see into the future.

At the end we also talked about the academic world: being a PhD student is a great challenge, but it is very important that there be at least a few other people who think our topic is “cool” (it is not enough to do research alone).

 

Report written by Tamás Gábor Csapó


IS2011 Student Event Report : TTS - articulatory synthesis

Submitted by naxingyu on Fri, 10/28/2011 - 01:36

 

On this day, a small group of students met in the lunch room of the Interspeech conference for an event which had not existed before in this form: we had the opportunity to have lunch together with several experienced researchers from different fields.

At our table, Peter Birkholz was the main host, and I believe he was an inspiration to all of us, showing what one can achieve. Our group consisted of students from many different fields, which led to a very broad discussion on all kinds of topics related to speech and signal processing.

My impression is that we all learnt something - not only from Peter, but also from each other. It is so important to have people with different backgrounds joined in the common goal of advancing science! My personal opinion, in the end, is that this goal was achieved - it certainly was for me, since my own work on silent speech interfaces and speech-related biosignal processing covers a wide range of topics by itself.

 

Report written by Michael Wand


IS2011 Student Event Report : ASR - Signal Processing

Submitted by naxingyu on Fri, 10/28/2011 - 01:52

 

The Students Meet Seniors lunch gave students the opportunity to meet and talk with industry and research experts over a casual lunch. Specifically, my table had Prof. Abeer Alwan from the Speech Processing and Auditory Perception Laboratory at UCLA in the USA, as well as Dr. Maurizio Omologo from the Speech-acoustic Scene Analysis and Interpretation Group at Fondazione Bruno Kessler in Italy. This proved to be a great chance for us students to talk informally about our research and future career paths.

The contrast between the two experts' fields of work was an interesting aspect of the conversation; it gave all of us students the opportunity to talk to an academic as well as a researcher. This always proves fruitful when considering future endeavours beyond university studies. Prof. Alwan took her time to ask us all about our research topics, offered her advice, and referred us to some of her students' past work where relevant. Similarly, Dr. Omologo made a concerted effort to talk to us all; where possible, he recommended that we talk to his colleagues at the conference, and he took a special interest in topics where he could offer his knowledge.

The benefits of attending the Students Meet Seniors lunch extend beyond the time spent at the event; it has given me (and other students, I am sure) a greater sense of direction for my research, as well as the opportunity to network with important people in my discipline. Explaining my work also enhanced my own understanding of it, as I'm sure other students discovered too. It has also opened a path towards meeting more people in my area, which will ultimately contribute to my decision about my future career and research path.





 

Report written by Ingrid Jafari


IS2011 Student Event Report : ASR and Machine Translation

Submitted by naxingyu on Fri, 10/28/2011 - 01:37

 

At the beginning of the discussion, Dr. Hermann Ney first introduced himself and then asked the others to introduce themselves in turn. The introductions covered each person's home country, the research they have been doing, and what they hoped to learn from the discussion. Dr. Ney was very kind and patient in trying to understand everyone's work, and at the same time he gave many comments and much helpful guidance.

After the introductions, we enjoyed the delicious food together and discussed topics such as trends in speech applications and how to do research scientifically. One important thing Dr. Ney mentioned was the difference between doing research in industry and in academia: because of differences in resources and goals, the relative importance of performance and efficiency can vary a lot. As a student, I had not realized how large this difference can be. Thanks to the discussion, I learned a lot from both Dr. Ney and the other students. Although many differing opinions were expressed and not all of them were shared by everyone, the discussion was still well worth paying attention to; I think that is exactly why such discussions should exist.

At the end of the discussion, Dr. Ney talked about how to do research scientifically rather than blindly. I think this part was especially useful for every student at the table, since it is the thing young researchers most easily overlook. The time went very fast and soon the discussion came to an end. The food was great, the discussion was inspiring, and I think everyone learned something from it. Thanks to Dr. Ney and the others at the table, I had a very good experience, and I hope there will be more such events in the future.

 

Report written by Yeh Ching-Feng


IS2011 Student Event Report : ASR - prosody

Submitted by naxingyu on Fri, 10/28/2011 - 01:42

 

At 12 o'clock a great number of students and seniors streamed towards a huge meeting room. The atmosphere was buzzing with chat and excitement. A huge buffet of delicious Italian food arranged at the side of the room served to further elevate moods at the event.

After everyone had eventually found the right table, the session kicked off with an introduction of everybody at the table and their research interests. The group was very international, with students from Sweden, Russia, Ireland and Italy present. All students shared an interest in social signal processing and were looking forward to the opportunity to talk to Julia Hirschberg.

Julia Hirschberg gave an overview of her projects, which cover a wide range of topics. The conversation extended beyond the realm of academia; we also discussed travelling in Sweden, and Julia Hirschberg recounted how she had spent a year there. It was really a great opportunity to participate in the student lunch, not just for the chance to meet famous and influential researchers such as Julia Hirschberg, but also because it brought together students with very similar research interests who would likely not have met otherwise at such a huge conference as Interspeech.

I would encourage any students attending future Interspeech conferences to attend the student lunch if given the opportunity!



 

Report written by Catharine Oertel


Notes from the Interspeech Roundtable - International Senior: Hiroya Fujisaki, Local Host: Giuliano Bocci

Submitted by fira on Wed, 10/05/2011 - 15:25

 

First of all, each participant introduced himself/herself and gave a brief overview of his/her work. Andrea deMarco's main area of interest is automatic speaker identification. David Martinez is also interested in identification, but his focus is on language identification; he is also interested in noise-robust techniques. John Taylor is doing research on pitch estimation. Keng-hao Chang's interests are in the area of human-machine interaction; he is building a mental health monitor based on the human voice. Yan Tang is focusing on speech enhancement in noisy conditions. Charlotte Wollermann is interested in audiovisual prosody and focus. Last, but not least, the interests of our local host Giuliano Bocci comprise the interplay between syntax and prosody in Italian. Hiroya Fujisaki's main research areas cover speech communication and language processing, natural language processing, human and artificial intelligence, etc. He also developed a model of the process of fundamental frequency control in speech.

After the round of introductions we realized that we all come from different countries, and this brought us to the topic of entrainment, which Julia Hirschberg addressed in her keynote talk. The phenomenon of entrainment can often be observed when we travel: we try to speak like the other person, who comes from a different culture. The aim is to be understood by the communication partner, so successful communication can be seen as the overall goal.

The next topic was focus. Defining focus is not trivial, since there are many different types of focus in natural language and therefore focus may serve different functions. The marking of focus is language-dependent, so there are various strategies, and dialect can also play a role.

Another topic of discussion was the difference between pitch and fundamental frequency. Pitch is temporal, relative and perceptually relevant; in contrast, the fundamental frequency is precise, but not necessarily perceptually important. We also talked about algorithms for pitch extraction.
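To give readers a feel for what such an algorithm can look like, here is a minimal autocorrelation-based F0 estimator in Python. This is only an illustrative sketch added to this report, not one of the specific methods discussed at the table; the sample rate, frame length and search range are assumptions chosen for the example.

```python
import numpy as np

def estimate_f0_autocorr(frame, sr, fmin=60.0, fmax=400.0):
    """Toy F0 estimator: pick the strongest autocorrelation peak in a plausible lag range."""
    frame = frame - np.mean(frame)                # remove the DC offset
    ac = np.correlate(frame, frame, mode="full")  # full autocorrelation of the frame
    ac = ac[len(ac) // 2:]                        # keep non-negative lags only
    lag_min = int(sr / fmax)                      # shortest lag = highest allowed F0
    lag_max = int(sr / fmin)                      # longest lag = lowest allowed F0
    if lag_max >= len(ac):
        return None                               # frame too short for this search range
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return sr / lag                               # lag of the strongest peak -> F0 in Hz

# Usage on a synthetic 40 ms "voiced" frame with a 150 Hz fundamental (hypothetical example)
sr = 16000
t = np.arange(0, 0.04, 1.0 / sr)
frame = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
print(estimate_f0_autocorr(frame, sr))            # prints a value close to 150 Hz
```

Such a raw estimate is, of course, far simpler than production pitch trackers, which add windowing, voicing decisions and smoothing across frames.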

Discussing the fundamental frequency we came to the phenomenon of accentuation. We talked about the ABI (the accent of the british isles) corpus. This corpus contains recordings from different accent regions of the british isles. We also discussed that there are different accent types, e.g. word accent, sentence accent. Accentuation can be used by humans for highlighting particular information, but what about animals? Animals do not have linguistic knowledge. However it is known that the variation of fundamental frequency can be also observed in animals, e.g. the fundamental frequency of sound plays a role for male songbirds in order to attract female songbirds.

 

Report written by Charlotte Wollermann