DSA Thrust Seminar

Spatial Audio, Spatial Audio-Visual and Visual Learning

Humans rely extensively on both audio and visual information to perceive the physical world. Although audio and visual signals almost always co-occur, existing research focuses predominantly on the visual signal, and acoustic research has lagged far behind. One important reason for this trend is that acoustic signals can easily be converted into 2D images with transforms such as the short-time Fourier transform. In this talk, I will first address the research question: is treating spatial acoustic signals as 2D images optimal? I will explore how to design novel neural networks that learn directly from raw audio waveforms (rather than 2D images) or continuously model spatial acoustic effects (ICML-21, AISTATS-23, ICML-24). Furthermore, I will show an audio-visual multimodal learning framework in which audio and vision are only weakly correlated, reflecting real-world scenarios such as gas leaks (WACV-24). I will also present a visual topological learning framework for embodied AI (RSS-23). Finally, I will conclude by discussing several potential research directions.

Yuhang HE

University of Oxford

Yuhang He is a final-year Ph.D. student in Computer Science at the University of Oxford. Prior to his Ph.D., he gained several years of industrial research experience at companies such as Baidu. During his Ph.D. studies, he completed two internships, one at Mitsubishi Electric Research Lab (MERL) and the other at Microsoft in Munich, Germany. He has published at top-tier conferences such as ICML, AISTATS, RSS, and WACV. His current research interest lies in audio-vision-X multimodal spatial intelligence learning, with the ultimate goal of achieving (or even surpassing) human-level spatial intelligence. His research combines practical applications with theoretical analysis. In his spare time, he enjoys running marathons and practicing street photography.

Date

11 October 2024

Time

10:00:00 - 11:00:00

Venue

The Hong Kong University of Science and Technology (Guangzhou), E3-2F-202

Join Link

Zoom Meeting ID: 962 2017 7186


Passcode: dsat

Organizer

Data Science and Analytics Thrust

Contact Email

dsat@hkust-gz.edu.cn
