Learning to Synthesize Images from Multi-modal and Hierarchical Inputs
Abstract
In recent years, image synthesis and manipulation have advanced remarkably, driven by the success of deep learning methods and the availability of Web-scale datasets. Despite this progress, most current approaches generate images from simple inputs such as text and label maps. While these methods have demonstrated an impressive capability in generating realistic images, a notable gap persists between the intricate nature of human ideas and the simple input structures that existing models accept. Inspired by the coarse-to-fine workflow of human artists and the inherently multimodal nature of human thought, we investigate the image synthesis problem based on multi-modal and hierarchical inputs. The first part of this talk presents several learning-based methods for synthesizing and manipulating images that handle a variety of user inputs and visual characteristics in images, including text, layout maps, hand-drawn sketches, object contours, and textures. Next, I will discuss the societal implications of image synthesis techniques and strategies to mitigate their risks, including synthetic image detection and generative model watermarking. The final part will introduce example applications of image synthesis techniques in other CV/AI tasks and discuss potential future research directions.
Speaker Bio
Yu ZENG
Johns Hopkins University
Yu Zeng is a PhD student at Johns Hopkins University, advised by Prof. Vishal M. Patel. Her research interests lie in computer vision and deep learning. She has focused on two main areas: (1) deep generative models for image synthesis and editing, and (2) label-efficient deep learning. By combining these research areas, she aims to bridge human creativity and machine intelligence through user-friendly and socially responsible models while minimizing the need for intensive human supervision.
Date
05 February 2024
Time
14:30 - 15:30
Venue
The Hong Kong University of Science and Technology (Guangzhou), E3-2F-202
Join Link
Zoom Meeting ID: 821 9474 0711
Passcode: dsat
Organizer
Data Science and Analytics Thrust
Contact Email
dsat@hkust-gz.edu.cn