To represent the word boat, large language models (LLMs) have to encode many kinds of information. Semantically, a boat is a type of vehicle that can usually hold a person and floats on water. But it may also be helpful for an LLM to encode that "boat" is an English word, that it starts with "b," and that it rhymes with "oat," "goat," and "float" (McLaughlin et al., 2025). In this short paper, we show how certain model components specialized for copying semantic and token information can be used to tease out these distinctions in the geometry of Llama-2-7b representations.
Background: Word Vector Arithmetic
In their famous word2vec paper, Mikolov et al. (2013) showed that they could train word embeddings with an intuitive parallelogram-like structure. The classic example is a consistent "gender vector" in embedding space: the difference between the vector for man and the vector for woman is approximately equal to the difference between king and queen. In other words, if you added (man - woman) to queen, you landed near the vector for king.
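This parallelogram structure can be illustrated with a minimal sketch. The vectors below are hand-made 3-d toys (not real word2vec embeddings), chosen so that the gender offset is exact; the `nearest` helper is our own illustrative name:

```python
import numpy as np

# Toy embeddings: the second coordinate acts as the "gender" direction.
emb = {
    "man":   np.array([1.0, 0.0, 1.0]),
    "woman": np.array([1.0, 1.0, 1.0]),
    "king":  np.array([5.0, 0.0, 2.0]),
    "queen": np.array([5.0, 1.0, 2.0]),
    "boat":  np.array([0.0, 0.0, 9.0]),  # distractor word
}

def nearest(v, vocab, exclude=()):
    """Nearest neighbor by cosine similarity, skipping excluded words."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vocab if w not in exclude), key=lambda w: cos(v, vocab[w]))

# queen + (man - woman) should land on king.
query = emb["queen"] + (emb["man"] - emb["woman"])
print(nearest(query, emb, exclude={"queen", "man", "woman"}))  # -> king
```

As in the standard word2vec evaluation, the query words themselves are excluded from the candidate set before taking the nearest neighbor.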
You could also think of this in terms of analogies, like "man is to woman as king is to queen," or "Tim Hortons is to Canada as Dunkin' is to New England." But today's LLMs are decoder models trained to predict the next token, not to produce embeddings for downstream applications. Can we do this kind of analysis on a model like Llama-2-7b?
Concept and Token Induction
In our previous paper on the "Dual-Route Model of Induction" (Feucht et al., 2025), we isolated two types of induction heads, attention heads in LLMs that are responsible for copying text. We found that token induction heads, originally described by Elhage et al. (2021), are responsible for verbatim copying, whereas concept induction heads are responsible for copying whole-word representations.
In that paper, we used the weights of those two types of heads to create the concept lens and token lens, matrices that reveal either semantic or literal token information stored in a given hidden state. (See our original project page for a quick overview of this approach, or the paper for specific details.)
Our Approach
In this work, we investigate whether Llama-2-7b's internal hidden states can serve as word embeddings that behave like the vectors of Mikolov et al. (2013). Say we want to test whether Llama-2-7b's hidden states encode country-capital relationships. For every country and capital in their dataset, we pass that word through Llama-2-7b (with the prefix "She travelled to" to ensure the model understands each word as a location). We then take the activation for that word at the last token position as an embedding for that word, and perform word2vec arithmetic with all of these separate hidden states.
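The extraction step can be sketched as follows. This is a hedged illustration, not the paper's released code: the helper name `last_token_hidden` is ours, and it assumes a Hugging Face-style causal LM whose forward pass returns `hidden_states` when `output_hidden_states=True`:

```python
import torch

@torch.no_grad()
def last_token_hidden(model, tokenizer, text, layer=-1):
    """Embed `text` as the hidden state at its final token position.

    `layer` indexes into the tuple of per-layer hidden states
    (each of shape (1, seq_len, d_model)); -1 takes the last layer.
    """
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model(ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1]

# Usage (illustrative; requires Llama-2-7b weights and access approval):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# v_paris = last_token_hidden(model, tok, "She travelled to Paris")
```

In the actual experiments one would pick a specific intermediate layer rather than always the last; which layer works best is an empirical question.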
The best nearest-neighbor accuracy you can get this way with raw hidden states is around 50%. However, we find that if you first transform these hidden states with the concept lens, accuracy becomes comparable to the model's performance when prompted to do the same task in-context! (See Figure 1 of the paper for full results.)
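The lens-then-arithmetic pipeline looks roughly like this. Everything here is a stand-in: the hidden states and the lens matrix are random toys (the real lenses are built from concept/token induction head weights), so this only shows the shape of the computation, not the result:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden dimension
words = ["Greece", "Athens", "Poland", "Warsaw", "France", "Paris"]
H = {w: rng.normal(size=d) for w in words}  # stand-in hidden states
W_lens = rng.normal(size=(d, d))            # stand-in "concept lens" matrix

def nearest(v, vocab, exclude=()):
    """Cosine-similarity nearest neighbor over a dict of vectors."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vocab if w not in exclude), key=lambda w: cos(v, vocab[w]))

# Project every hidden state through the lens, then do word2vec arithmetic.
proj = {w: W_lens @ v for w, v in H.items()}
query = proj["Athens"] - proj["Greece"] + proj["Poland"]
pred = nearest(query, proj, exclude={"Athens", "Greece", "Poland"})
```

With a real lens and real hidden states, `pred` being "Warsaw" counts as a correct analogy; accuracy is the fraction of country-capital pairs answered correctly this way.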
Finally, if we do the same thing with the token lens, analogies like code - coding = dance - dancing become much clearer: around 80% accuracy, compared to around 30% with raw hidden states. Again, see Figure 1 of the paper for full results.
These results suggest that the concept and token heads found in previous work don't just blindly copy representations of tokens or words. Concept heads appear to transport rich representations of a word's semantics, whereas token heads transport some kind of information about how words are literally written.
Related Work
Language Models Implement Simple Word2Vec-style Vector Arithmetic. Jack Merullo, Carsten Eickhoff, and Ellie Pavlick. 2023. Notes: Merullo et al. (2023) find a very similar phenomenon in GPT-2 Medium: they show that feed-forward sublayers in later layers of the model output get_capital(x) function vectors (similar to Todd et al. (2024)) that, when added to Poland, cause the model to output Warsaw. These may be the very same vectors we obtain when calculating Athens - Greece in concept space, because they occur at mid-to-late layers, after concept induction has taken place. In other words, we speculate that the feed-forward components of later model layers have learned to work within the output subspace of concept induction heads, taking advantage of this semantically rich structure to perform word2vec-style operations.
This work was accepted at the NeurIPS Mech Interp Workshop (2025). It can be cited as follows:
Bibliography
Sheridan Feucht, Byron Wallace, and David Bau. "Vector Arithmetic in Concept and Token Subspaces." Second Mechanistic Interpretability Workshop at NeurIPS (2025).
BibTeX
@inproceedings{feucht2025arithmetic,
    title={Vector Arithmetic in Concept and Token Subspaces},
    author={Sheridan Feucht and Byron Wallace and David Bau},
    booktitle={Second Mechanistic Interpretability Workshop at NeurIPS},
    year={2025},
    url={https://arithmetic.baulab.info}
}