Course module: 201600082
Advanced Research Projects in Speech Processing
Course info
Course module: 201600082
Credits (ECTS): 5
Course type: Course
Language of instruction: English
Contact person: dr. K.P. Truong
dr. R.J.F. Ordelman
dr. K.P. Truong
Contact person for the course
dr. K.P. Truong
Academic year: 2021
Starting block
Application procedure: You apply via OSIRIS Student
Registration using OSIRIS: Yes
After completing the course, the student is able to:
  • Design, implement, and evaluate a system that uses speech as input and processes and analyses speech with a specific task and user goal in mind
  • Understand major research challenges in developing real-world speech applications
  • Present the research, the design, and results in a clear and convincing manner, in both oral and written form
In this course, students work in groups on a specific speech-related task that is relevant to current spoken language processing research. The size and scope of the project depend on the number of students and their skills. The research themes can differ per year and partly depend on the availability of the supervisors. One theme that students can always choose, however, is "social and affective signals".
In many current speech applications, sensitivity to humans' social and affective behavior is still lacking. Hence, we need to investigate and develop systems that can analyze and interpret 'paralinguistic' information in speech - information that refers to how a person speaks. Examples of projects that students can work on:
  • Develop and evaluate a system that can automatically detect and extract specific paralinguistic information in large amounts of speech data, e.g., laughter, age, gender, emotion, personality, etc.
  • Develop and evaluate a tool for eliciting and collecting emotional speech data (for children or the elderly)
  • Interactive wearable speech devices: investigate the feasibility of embedding speech technology in wearables, e.g., a 'sociometric badge'
  • Segmentation: investigate means to structure large audio recordings into meaningful segments based on information in the audio/speech signal, e.g., based on emotional states
  • Machines addressing humans: how can we improve the addressing and feedback behavior of machines based on automatically extracted multimodal information about the (behavior of the) human user?
Assumed previous knowledge
Mandatory: Speech Processing or equivalent

Desired: Machine Learning or equivalent, experience with programming
Participating study
Master Interaction Technology
Required materials
Recommended materials
Instructional modes
Presence duty: Yes

Project supervised
Presence duty: Yes

Project unsupervised
Presence duty: Yes

Self study with assistance

