Long-Sequence Attention with ⚡FlashAttention⚡
The new year opens with lots of discussion of interesting new papers, following the high tide of ChatGPT. The introduction of FlashAttention is among the most notable. The main problem it addresses is an important one for Transformer architectures: speeding up self-attention and reducing its memory consumption.
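To see where the memory cost comes from, here is a minimal sketch of standard (naive) self-attention in NumPy. The full N×N score matrix is materialized, so memory grows quadratically with sequence length; avoiding that materialization is exactly what FlashAttention is about. The names and shapes below are illustrative, not the paper's implementation.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard self-attention: the (N, N) score matrix S is built in full,
    # so memory grows quadratically with the sequence length N.
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)                  # (N, N) scaled dot-product scores
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)        # row-wise softmax
    return P @ V                              # (N, d) output

N, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

At N = 1024 the score matrix already holds about a million floats per head; double the sequence length and it quadruples, which is why long sequences blow up memory in the naive formulation.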
Why is it interesting?
One way to recognize a good paper or a new method is to see how quickly it is adopted and adapted by the open-source world and industry.