Article Source
Attention Approximates Sparse Distributed Memory
- Trenton Bricken, PhD student at Harvard
- Will Dorrell, PhD student at University College London’s Gatsby Unit
Abstract
In this speaker series, we examine in detail how transformers work and explore the different kinds of transformers and how they are applied in different fields. To do this, we invite researchers at the forefront of transformer research across different domains to give guest lectures.