Kairui Zhang

Audio-Visual-Language Learning · Interactive AI · Mechanistic Interpretability

I am a PhD student in Engineering Mathematics at the Intelligent Systems Laboratory (ISL), University of Bristol, co-supervised by Zahraa S. Abdallah and Martha Lewis. My research studies how multimodal large language models use audio, visual, and linguistic information when interacting with their environment, and how the internal circuits of these models function.

Selected projects

2026 · ICML workshop poster

VASAE: Vocabulary-Aligned Sparse Autoencoders

A sparse-autoencoder training setup that aligns dictionary directions with vocabulary anchors, then checks named features through token examples and reconstruction behavior.

sparse autoencoders · feature naming · vocabulary anchors

2025 · Preprint

Recent Advances in Audio-Visual-Language Modeling

A survey and resource map that organizes AVL work by task setup, modality alignment, benchmark coverage, evaluation metrics, and gaps in current datasets.

task taxonomy · benchmark map · evaluation metrics

2023 · IEEE BigData

Evaluating CLIP's Understanding on Relationships in a Blocks World

A controlled Blocks World benchmark that separates object-presence cues from relational prompts, testing whether CLIP distinguishes spatial relations in synthetic scenes.

Vision-Language Model · Compositionality · Spatial Relations

News

Contacts

Feel free to reach out by email or find my work through the links below.

Email · Google Scholar · GitHub · ORCID · Bristol profile