Kairui Zhang

Audio-Visual-Language Learning · Interactive AI · Mechanistic Interpretability

I am a PhD student in Engineering Mathematics at the Intelligent Systems Laboratory (ISL), University of Bristol, co-supervised by Zahraa S. Abdallah and Martha Lewis. My research focuses on how multimodal large language models use audio, visual, and linguistic information when interacting with their environment, and on how the internal circuits of these models function.

Selected projects

VASAE: Vocabulary-Aligned Sparse Autoencoders

2026 · ICML workshop poster

A sparse-autoencoder training setup that aligns dictionary directions with vocabulary anchors, then checks named features through token examples and reconstruction behavior.

sparse autoencoders · feature naming · vocabulary anchors

Recent Advances in Audio-Visual-Language Modeling

2025 · Preprint

A survey and resource map that organizes AVL work by task setup, modality alignment, benchmark coverage, evaluation metrics, and gaps in current datasets.

task taxonomy · benchmark map · evaluation metrics

Contacts

Feel free to reach out by email or find my work through the links below.