Soumya Tripathy: Generative AI models enhance advanced image manipulation

Soumya Tripathy wearing a dark shirt and carrying a backpack, with a lake and forest in the background.

Photo: Shanshan Wang

As social media continues to dominate our digital lives, the demand for intuitive tools to create and manipulate visual content has skyrocketed. In his doctoral dissertation, researcher Soumya Tripathy emphasises the critical need for seamless visual interactions, empowering even those without a background in content creation to effortlessly produce and share stunning visuals.

In his study, M.Sc. (Tech.) Soumya Tripathy examines how data-driven deep learning models, particularly generative adversarial networks, can meet this demand.

Building on Generative Adversarial Networks (GANs), Tripathy’s dissertation presents new techniques that simplify and enhance image- and video-based content editing.

“The potential applications are vast, promising to revolutionise industries such as social media, film post-production, animation, and virtual reality,” Tripathy states.

The research underscores the success of Convolutional Neural Networks (CNNs) in discriminative modelling and their extension to generative modelling. GANs have achieved remarkable results in generating high-quality images of human faces, cars, animals, and buildings.

“Despite their success, GANs have struggled with controlled image manipulation tasks such as face reenactment and object animation. I have studied new approaches to improve the quality, understanding, and usability of GANs for these tasks,” says Tripathy.

Implications for social media and movie production

The dissertation focuses on controlled image manipulation. It proposes strategies that use weakly supervised data with only minimal strong annotations, which reduces the need for extensive training data while still allowing GANs to generate high-quality images.

The dissertation introduces two models that allow users to interact with human-interpretable features to create high-quality videos from a single input image. These models extend to non-face objects, including body parts and animals, demonstrating robust feature learning and high-quality animation generation.

“The findings have significant implications for various fields. In social media, users can create and share visually appealing content effortlessly,” Tripathy says.

“The movie post-production and animation industries can benefit from faster and more efficient image manipulation tools. Additionally, virtual reality applications can achieve more realistic and immersive experiences,” he adds.

Soumya Tripathy conducted his doctoral research in Tampere University’s Computer Vision group and is currently working at Huawei Tampere R&D Centre.

Public defence on Friday 23 August

M.Sc. (Tech.) Soumya Tripathy’s doctoral dissertation in the field of Computing and Electrical Engineering titled Image Manipulation and Animation Using Deep Generative Networks will be publicly examined at the Faculty of Information Technology and Communication Sciences at Tampere University at noon on Friday 23 August 2024. The venue is TB109 hall in the Tietotalo building on the Hervanta campus (address: Korkeakoulunkatu 1, Tampere). The Opponent will be Associate Professor Maciej Zięba from Wroclaw University of Science and Technology, Poland. The Custos will be Professor Esa Rahtu from the Faculty of Information Technology and Communication Sciences at Tampere University.

The doctoral dissertation is available online

The public defence can be followed via a remote connection (Zoom)