https://cybercat.institute/2025/02/12/transformers-applicative-functors/
- a generalization of Transformer models that can operate on (almost) arbitrary structures such as functions, graphs, and probability distributions, not just matrices and vectors
- exploring machine learning through abstract diagrammatical means
- other resources
  - On the anatomy of attention, [arXiv:2407.02423](https://arxiv.org/abs/2407.02423) (the ‘tube’ notation in Part 4 is equivalent to the ‘SIMD’ notation in that paper)
  - A pattern language for machine learning tasks, [arXiv:2407.02424](https://arxiv.org/abs/2407.02424)