VASAE: Vocabulary-Aligned Sparse Autoencoders
A sparse-autoencoder training setup that aligns dictionary directions with vocabulary anchors, then checks named features through token examples and reconstruction behavior.
sparse autoencoders · feature naming · vocabulary anchors
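The description names three ingredients: a sparse autoencoder, an alignment between dictionary directions and vocabulary anchors, and checks on named features through token examples and reconstruction behavior. The sketch below shows one way these pieces could fit together. It is not the project's implementation, because the source gives no loss, anchor source, or check procedure. It assumes a ReLU sparse autoencoder, and it assumes that anchors are vectors such as embedding or unembedding rows tied one-to-one to chosen features. Alignment is modeled as a cosine penalty on those features' decoder rows. A feature is then checked in two ways: by its top-activating tokens, and by how much reconstruction error rises when it is zeroed. Every function name, parameter, and coefficient here is hypothetical, and only loss terms are computed (no optimizer or training loop).

```python
import numpy as np

# Hypothetical sketch: none of these names come from the VASAE project itself.

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations x (batch, d_model) into ReLU features, then decode."""
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # (batch, n_feat)
    x_hat = f @ W_dec + b_dec                           # (batch, d_model)
    return f, x_hat

def anchor_alignment_penalty(W_dec, anchor_vecs, anchored_ids):
    """Mean (1 - cosine similarity) between anchored decoder rows and anchors.

    anchor_vecs: (k, d_model) assumed vocabulary anchors, e.g. unembedding rows.
    anchored_ids: k feature indices, one tied to each anchor (an assumption).
    """
    d = W_dec[anchored_ids]
    cos = np.sum(d * anchor_vecs, axis=1) / (
        np.linalg.norm(d, axis=1) * np.linalg.norm(anchor_vecs, axis=1) + 1e-8
    )
    return float(np.mean(1.0 - cos))

def vasae_loss(x, params, anchor_vecs, anchored_ids, l1=1e-3, align=1e-1):
    """Reconstruction MSE + L1 sparsity + assumed anchor-alignment term."""
    W_dec = params[2]
    f, x_hat = sae_forward(x, *params)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return float(recon + l1 * sparsity
                 + align * anchor_alignment_penalty(W_dec, anchor_vecs, anchored_ids))

def top_tokens(f, tokens, feat, k=3):
    """Tokens at the positions that most strongly activate feature `feat`."""
    order = np.argsort(-f[:, feat])[:k]
    return [tokens[i] for i in order]

def ablation_delta(x, params, feat):
    """Rise in reconstruction MSE when feature `feat` is zeroed out."""
    W_dec, b_dec = params[2], params[3]
    f, x_hat = sae_forward(x, *params)
    f_abl = f.copy()
    f_abl[:, feat] = 0.0
    x_abl = f_abl @ W_dec + b_dec

    def mse(pred):
        return float(np.mean(np.sum((x - pred) ** 2, axis=1)))

    return mse(x_abl) - mse(x_hat)
```

A toy usage: with identity encoder and decoder weights, `top_tokens(f, ["a", "b", "c"], 0, k=2)` on activations `[[1, 0], [0, 2], [3, 0]]` returns `["c", "a"]`, and zeroing feature 0 raises the reconstruction error by 10/3. A cosine penalty is used here only because it constrains direction rather than norm, which leaves the sparsity term free to set feature magnitudes; other alignment terms would fit the description equally well.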