Research Project

Symphony: Retrieval-augmented language models using multi-modal data lakes

Abstract

Multi-modal data lakes, which hold datasets in different formats such as text, tables, and knowledge graphs, have become increasingly common in organizations. Large language models, being generative, cannot guarantee the correctness of the content they generate. Given a natural language query, Symphony first retrieves (possibly multiple) datasets from the data lake and then reasons over them to answer the query.
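The retrieve-then-reason flow described in the abstract can be illustrated with a toy sketch. This is not Symphony's implementation: the `Dataset` class, the `serialize`, `retrieve`, and `build_prompt` helpers, the token-overlap scoring, and the sample data lake are all hypothetical and chosen only to keep the example self-contained. A real system would likely use learned representations for retrieval and pass the assembled context to a language model for the reasoning step.

```python
import re
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical container for one item in a multi-modal data lake."""
    name: str
    modality: str  # "text", "table", or "knowledge_graph"
    content: object


def serialize(ds: Dataset) -> str:
    """Flatten a dataset of any modality into plain text (an assumption of this sketch)."""
    if ds.modality == "text":
        body = str(ds.content)
    elif ds.modality == "table":
        # content is a list of dict rows
        body = " ".join(f"{k} {v}" for row in ds.content for k, v in row.items())
    elif ds.modality == "knowledge_graph":
        # content is a list of (subject, predicate, object) triples
        body = " ".join(" ".join(triple) for triple in ds.content)
    else:
        raise ValueError(f"unknown modality: {ds.modality}")
    return f"{ds.name} {body}"


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, lake: list, k: int = 2) -> list:
    """Step 1: rank datasets by token overlap with the query and keep the top k."""
    q = tokens(query)
    scored = [(len(q & tokens(serialize(ds))), ds) for ds in lake]
    scored = [pair for pair in scored if pair[0] > 0]  # drop datasets with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ds for _, ds in scored[:k]]


def build_prompt(query: str, retrieved: list) -> str:
    """Step 2 (stub): assemble the retrieved datasets into context for a language model."""
    context = "\n".join(
        f"[{ds.modality}:{ds.name}] {serialize(ds)}" for ds in retrieved
    )
    return f"Answer using only the context.\nContext:\n{context}\nQuestion: {query}"


# Made-up data lake with one or more datasets per modality.
lake = [
    Dataset("press_release", "text", "Acme opened a new office in Doha in 2022."),
    Dataset("employees", "table", [{"name": "Lee", "city": "Doha"},
                                   {"name": "Kim", "city": "Boston"}]),
    Dataset("org_graph", "knowledge_graph", [("Acme", "headquartered_in", "Boston")]),
    Dataset("cafeteria_menu", "text", "Lunch menu: soup and salad."),
]

query = "Which employees work in Doha?"
retrieved = retrieve(query, lake, k=2)
prompt = build_prompt(query, retrieved)
print(prompt)
```

The sketch keeps retrieval and reasoning as separate steps so that either one can be replaced independently, for example by swapping the overlap score for an embedding-based ranker.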

Project members

Nan TANG

Associate Professor

Publications

Symphony: Towards natural language query answering over multi-modal data lakes. Zui Chen, Zihui Gu, Lei Cao, Ju Fan, Samuel Madden, and Nan Tang.

Project Period

2023-Present

Research Area

Data-driven AI

Keywords

LLM, multi-modal data lake, RAG