Fast-Weight Attention for Continual Learning

Yifan Zhang

Abstract

Recurrent fast-weight memories and selective state-space models compress a growing context into a bounded state. Because every new token updates that state, their writes can be viewed as online continual-learning rules: at inference time the model is, in effect, continually learning from its own context.

Falcon develops this fast-weight attention perspective, connecting attention, recurrent memory, and state-space updates through the lens of continual learning over a bounded memory.


Overview

**Figure:** Falcon: viewing fast-weight and state-space writes as online continual-learning updates to a bounded memory.
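To make the "write as a learning rule" view concrete, here is a minimal sketch using the classic delta-rule fast-weight update, S ← S + β (v − S k) kᵀ, which is one gradient step on the squared error 0.5 · ||S k − v||² with respect to S. This is a generic textbook rule chosen for illustration, not necessarily the update Falcon uses. The function names (`normalize`, `read`, `delta_rule_write`), the dimensions, and the step size `beta` are all assumptions made for this example.

```python
import math
import random


def normalize(k):
    """Scale a key to unit length (assumed here so that beta=1 gives exact recall)."""
    norm = math.sqrt(sum(x * x for x in k))
    return [x / norm for x in k]


def read(S, q):
    """Read from the memory: o = S q."""
    return [sum(s * x for s, x in zip(row, q)) for row in S]


def delta_rule_write(S, k, v, beta=1.0):
    """One online write: a gradient step on 0.5 * ||S k - v||^2 with step size beta."""
    prediction = read(S, k)
    error = [vi - pi for vi, pi in zip(v, prediction)]
    # Rank-one update: S + beta * outer(error, k)
    return [[s + beta * e * kj for s, kj in zip(row, k)]
            for row, e in zip(S, error)]


random.seed(0)
d_k, d_v, n_tokens = 8, 4, 5  # hypothetical sizes

# Bounded state: a d_v x d_k matrix, regardless of how many tokens are written.
S = [[0.0] * d_k for _ in range(d_v)]

keys = [normalize([random.gauss(0, 1) for _ in range(d_k)]) for _ in range(n_tokens)]
values = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(n_tokens)]

# Each incoming token triggers one learning step on the memory.
for k, v in zip(keys, values):
    S = delta_rule_write(S, k, v)

# With a unit-norm key and beta = 1, the most recent association is recalled exactly;
# earlier ones may be partially overwritten, which is the continual-learning tension.
recalled = read(S, keys[-1])
```

The state keeps its `d_v × d_k` shape however many tokens are written, so the memory stays bounded while the context grows. Older associations can be degraded by later writes, which is the forgetting trade-off that a continual-learning framing brings into focus.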

Citation

If you find this work useful, please cite:

@article{zhang2026fast,
  title   = {Fast Weight Attention for Continual Learning},
  author  = {Zhang, Yifan and others},
  journal = {yifanzhang-pro.github.io},
  year    = {2026},
  month   = {March},
  url     = {https://github.com/yifanzhang-pro/fast-weight-attention}
}