张恺睿

Audio-Visual-Language Learning · Interactive AI · Mechanistic Interpretability

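I am currently a PhD student in Engineering Mathematics at the Intelligent Systems Laboratory (ISL), University of Bristol, UK, co-supervised by Zahraa S. Abdallah and Martha Lewis. My research studies how multimodal large language models use audio, visual, and language information when interacting with their environment, and how the circuits inside these models work.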

Selected Projects

VASAE: Vocabulary-Aligned Sparse Autoencoders

2026 · ICML workshop poster

This method aligns dictionary directions to vocabulary anchors during sparse autoencoder training, then checks the named features against token examples and reconstruction behavior.

sparse autoencoders · feature naming · vocabulary anchors

Recent Advances in Audio-Visual-Language Modeling

2025 · preprint

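This survey and resource map organizes audio-visual-language (AVL) work by task setting, modality alignment method, benchmark coverage, evaluation metrics, and gaps in existing datasets.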

task taxonomy · benchmark map · evaluation metrics

Evaluating CLIP's Understanding on Relationships in a Blocks World

2023 · IEEE BigData

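This controlled Blocks World benchmark separates object-presence signals from relational prompts and uses synthetic scenes to test whether CLIP can distinguish spatial relations.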

synthetic scenes · spatial relations · controlled prompts

News

May 2026: VASAE released on OpenReview as an ICML workshop poster.
2025: Released the project page for Recent Advances in Audio-Visual-Language Modeling.
December 2023: Published the work evaluating CLIP's understanding of relationships at IEEE BigData 2023.

Contact

Feel free to contact me by email, or use the links below to see my work.

Email · Google Scholar · GitHub · ORCID · Bristol profile