Learning to Synthesize Images from Multi-modal and Hierarchical Inputs

DSA Thrust Seminar

Abstract

In recent years, image synthesis and manipulation have advanced remarkably, driven by the success of deep learning methods and the availability of Web-scale datasets. Despite this progress, most current approaches generate images from simple inputs such as text and label maps. While these methods can produce impressively realistic images, a notable gap remains between the intricate nature of human ideas and the simple input structures that existing models accept. Inspired by the coarse-to-fine workflow of human artists and the inherently multi-modal nature of human thought, we investigate image synthesis from multi-modal and hierarchical inputs. The first part of this talk presents several learning-based methods for synthesizing and manipulating images that handle a variety of user inputs and visual characteristics, including text, layout maps, hand-drawn sketches, object contours, and textures. Next, I will discuss the societal implications of image synthesis techniques and strategies to mitigate their risks, including synthetic image detection and generative model watermarking. The final part will introduce example applications of image synthesis techniques in other CV/AI tasks and discuss potential future research directions.

Speaker Biography

Yu ZENG

Johns Hopkins University

Yu Zeng is a PhD student at Johns Hopkins University, advised by Prof. Vishal M. Patel. Her research interests lie in computer vision and deep learning. She has focused on two main areas: (1) deep generative models for image synthesis and editing, and (2) label-efficient deep learning. By combining these areas, she aims to bridge human creativity and machine intelligence through user-friendly and socially responsible models while minimizing the need for intensive human supervision.

Date

05 February 2024

Time

14:30 - 15:30

Venue

Room 202, 2/F, Building E3, The Hong Kong University of Science and Technology (Guangzhou)

Join Link

Zoom Meeting ID:
821 9474 0711

Passcode: dsat

Organizer

Data Science and Analytics Thrust

Contact Email

dsat@hkust-gz.edu.cn