Course Catalog 2013-2014
SGN-24006 Analysis of Audio, Speech and Music Signals, 5 cr
Person responsible
Tuomas Virtanen, Serkan Kiranyaz, Anssi Klapuri
Lessons
Study type | P1 | P2 | P3 | P4 | Summer | Implementations | Lecture times and places |
Requirements
Final exam and project work.
Learning Outcomes
After completing this course, the student will
- be able to implement common mid-level data representations used in the analysis of audio signals. He or she will understand how structural regularities of audio signals can be modeled to facilitate their analysis.
- be able to implement some widely used audio feature extraction techniques and signal analysis algorithms, such as spectrogram factorization and multi-pitch analysis.
- understand the basic techniques used in speech recognition. He or she will be able to implement the front-end used for extracting relevant information from the speech signal, and will understand the mathematical principles and application of the hidden Markov models that are used to model the feature sequences.
Content
Content | Core content | Complementary knowledge | Specialist knowledge |
1. | Mid-level representations of acoustic signals for their content analysis. Modelling of structural regularities of audio signals for analysis purposes. | | |
2. | Acoustic feature extraction and audio classification. Spectrogram factorization and other unsupervised learning techniques. Pitch analysis and music transcription. | | |
3. | Speech recognition: acoustic feature extraction and hidden Markov models. | | |
Instructions for students on how to achieve the learning outcomes
The course is graded on the basis of the exam. The highest mark is given for correct answers that cover the depth and breadth of the material delivered in the lectures and exercises. Passing the course requires about half of the maximum number of points. Bonus points worth at most one mark are awarded for active participation in the weekly exercises. An acceptable project work must be submitted by the deadline.
Assessment scale:
A numerical evaluation scale (1-5) is used in the course.
Study material
Type | Name | Author | ISBN | URL | Edition, availability, ... | Examination material | Language |
Book | Speech and Audio Signal Processing: Processing and Perception of Speech and Music | B. Gold, N. Morgan, D. Ellis | | | | No | English |
Book | Spoken Language Processing | X. Huang, A. Acero, H.-W. Hon | | | | No | English |
Lecture slides | | | | | | Yes | English |
Online book | Lecture Notes for Audio Engineering | University of Illinois Urbana-Champaign | | | | No | English |
Prerequisites
Course | Mandatory/Advisable | Description |
SGN-13000 Introduction to Pattern Recognition and Machine Learning | Advisable | 1 |
SGN-13006 Introduction to Pattern Recognition and Machine Learning | Advisable | 1 |
SGN-14006 Audio and Speech Processing | Mandatory | |
1. Either SGN-13000 or SGN-13006 is advisable.
Correspondence of content
Course | Corresponds course | Description |
More precise information per implementation
Implementation | Description | Methods of instruction | Implementation |