Friday, March 27, 2026
Lecture room B6
Institute for Exact Sciences
Sidlerstr. 5
CH-3012 Bern
16:50-17:40 h
Clustering dynamics in mean-field models of transformers
We consider a family of models describing the layer-wise evolution of information (represented as a set of "tokens") in transformers, a common architecture used in deep learning. These models are formulated as mean-field interacting particle systems on the d-dimensional unit sphere, evolving the empirical measure of such tokens through the depth of the network. Numerical experiments reveal the tendency of these particle systems to organize into clustered/synchronized states, offering a potential explanation for how meaning emerges in these architectures. In this talk, I will introduce both deterministic and stochastic variants of these models and provide a rigorous characterization of this phenomenon.
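
As a rough illustration of the kind of numerical experiment mentioned in the abstract, the sketch below simulates one deterministic particle system of this type: each token moves on the unit sphere toward a softmax-weighted average of all tokens. The specific dynamics, the inverse-temperature parameter beta, the explicit Euler time stepping, and all function names are assumptions made for illustration; they are not taken from the talk and may differ from the models presented there.

```python
import numpy as np


def simulate_tokens(n=32, d=3, beta=1.0, dt=0.05, steps=4000, seed=0):
    """Evolve n tokens on the unit sphere in R^d (illustrative model only).

    Assumed dynamics: each token follows the tangential component of a
    softmax-weighted average of all tokens, integrated with explicit Euler
    steps followed by renormalization onto the sphere.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # uniform random start
    x0 = x.copy()
    for _ in range(steps):
        # Attention-like weights exp(beta * <x_i, x_j>), normalized per row.
        w = np.exp(beta * (x @ x.T))
        w /= w.sum(axis=1, keepdims=True)
        v = w @ x  # weighted average of tokens seen by each token
        # Project the velocity onto the tangent space at each x_i.
        v -= np.sum(v * x, axis=1, keepdims=True) * x
        x = x + dt * v
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # stay on the sphere
    return x0, x


def min_pairwise_inner_product(x):
    """Smallest inner product between any two tokens (1.0 means one cluster)."""
    return float((x @ x.T).min())


if __name__ == "__main__":
    x0, xT = simulate_tokens()
    print("initial spread:", min_pairwise_inner_product(x0))
    print("final spread:  ", min_pairwise_inner_product(xT))
```

Comparing the smallest pairwise inner product before and after the run gives a crude measure of clustering: a value close to 1 means all tokens have nearly collapsed to a single point on the sphere.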