The modern world has seen a rapid increase in the amount of data generated every day. While this growth fuels the development of novel AI applications, the complex structures and high variability of data representations present new challenges for machine learning algorithms.
Real-world data often deviates from simple patterns and distributions, and it can come from multiple sources, such as audio, video, and text. In her dissertation, M.Sc. Kateryna Chumachenko presented novel solutions for managing these different forms of data complexity in machine learning applications.
Dimensionality reduction for heterogeneous data
The dissertation focused on dimensionality reduction techniques, which are methods used to transform data before it is processed by classical machine learning algorithms. These methods aim to simplify the data, make algorithms more efficient, and improve their performance by transforming the data into a representation in which a machine learning algorithm can learn the decision boundary more easily.
The dissertation specifically targeted dimensionality reduction methods for data that exhibit non-uniform class distributions, i.e., data whose structure is complex and whose samples can vary significantly within the same class.
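The idea of transforming data so that a decision boundary becomes easier to learn can be illustrated with a small sketch. The example below uses Fisher's linear discriminant, a classical textbook technique chosen here only for illustration; it is not the method proposed in the dissertation. The function names and the toy data points are hypothetical.

```python
# Minimal sketch: reduce 2-D points to 1-D with Fisher's linear discriminant.
# Illustration only; the toy data and helper names are assumptions.

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, mu):
    # 2x2 scatter matrix of one class around its own mean.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    # Direction w proportional to Sw^-1 (mu_b - mu_a), normalised to unit length.
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_b[0] - mu_a[0], mu_b[1] - mu_a[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]

def project(points, w):
    # Each 2-D point becomes a single number: its coordinate along w.
    return [p[0] * w[0] + p[1] * w[1] for p in points]

# Toy classes that overlap on each original axis taken alone.
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (4.0, 5.0)]
class_b = [(1.0, 0.0), (2.0, 1.0), (3.0, 1.0), (4.0, 3.0)]

w = fisher_direction(class_a, class_b)
proj_a, proj_b = project(class_a, w), project(class_b, w)
# After projection, a single threshold separates the two classes.
print(max(proj_a) < min(proj_b))
```

In this toy case neither original coordinate separates the two classes on its own, while the one-dimensional projection does, so a simple threshold is enough as a decision boundary.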
– The dissertation proposed solutions to some of the limitations of existing methods. These solutions improve the methods' efficiency, speed, and robustness, and include extensions to accommodate multiple data modalities, Chumachenko says.
Multimodal deep learning
The second part of the dissertation tackled the challenge of processing multiple data formats simultaneously, such as audio, video, and text, to build stronger predictive models with deep neural networks.
The dissertation’s contributions to deep learning, particularly in affective computing applications, offered solutions to common challenges, such as fusing multiple modalities effectively within the same model, staying robust when data is missing, and making multimodal models practical to use for unimodal inference.
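As a rough illustration of what combining modalities and coping with a missing one can mean, the sketch below averages per-modality class probabilities and skips any modality that is absent. This is a simple decision-level (late) fusion baseline, not the fusion approach developed in the dissertation, which concerns fusion inside a single deep model. The labels, probability values, and function names are made up for the example.

```python
# Minimal sketch of late fusion that tolerates a missing modality.
# Illustration only; labels, numbers, and names are assumptions.

def late_fusion(modality_probs):
    """Average class probabilities over the modalities that are present.

    modality_probs maps a modality name to a list of class probabilities,
    or to None when that modality is missing for the current sample.
    """
    available = [p for p in modality_probs.values() if p is not None]
    if not available:
        raise ValueError("at least one modality is required")
    n_classes = len(available[0])
    return [sum(p[c] for p in available) / len(available)
            for c in range(n_classes)]

def predict(modality_probs, labels):
    # Pick the label with the highest fused probability.
    fused = late_fusion(modality_probs)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]

LABELS = ["happy", "neutral", "sad"]

# Hypothetical outputs of two separate unimodal classifiers.
both = {"audio": [0.2, 0.5, 0.3], "video": [0.7, 0.2, 0.1]}
audio_only = {"audio": [0.2, 0.5, 0.3], "video": None}

print(predict(both, LABELS))        # uses audio and video
print(predict(audio_only, LABELS))  # falls back to audio alone
```

The same function serves both multimodal and unimodal inputs, which is the practical property the paragraph above refers to, although here it is obtained in the simplest possible way.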
– The methodologies developed for audiovisual emotion recognition and for dynamic facial expression recognition based on speech and video have achieved state-of-the-art results, Chumachenko comments.
The dissertation has paved the way for developing more robust and efficient machine learning models capable of handling diverse and complex data.
– This advancement is crucial for improving the performance and applicability of AI systems across various fields, from healthcare to multimedia, enabling them to address real-world challenges more effectively, Chumachenko concludes.
Public defence on Friday 4 October
The doctoral dissertation of M.Sc. (Tech) Kateryna Chumachenko, titled Machine Learning Methods for Heterogeneous Data: Multimodal deep learning and dimensionality reduction, will be publicly examined at the Faculty of Information Technology and Communication Sciences of Tampere University at 12:00 on Friday 4 October 2024 in auditorium TB109 of the Tietotalo building on the Hervanta Campus. The Opponent will be Professor Benoit Macq from Université Catholique de Louvain. The Custos will be Professor Moncef Gabbouj from Tampere University. The dissertation was co-supervised by Professor Alexandros Iosifidis from Aarhus University.