Griffin: A New LLM Architecture That Conquers Long Contexts

Angelina Yang
5 min read · May 10, 2024

Recurrent neural networks (RNNs) offer fast inference and scale efficiently to long sequences, but they are difficult to train and hard to scale up.

When it comes to efficient and powerful language models, modeling long contexts effectively while keeping costs down remains a significant challenge. Google DeepMind’s Hawk and Griffin models take strides in this direction, showing a remarkable ability to leverage extended context windows and presenting a compelling alternative to traditional Transformer-based approaches.

Introducing Griffin and Hawk

DeepMind proposes Hawk, an RNN built around a gated recurrent layer called the Real-Gated Linear Recurrent Unit (RG-LRU). Hawk exceeds the reported performance of Mamba, a recent state-space model, on downstream tasks.
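The RG-LRU is a diagonal linear recurrence whose decay is modulated by input-dependent gates. Below is a minimal NumPy sketch of the update described in the paper; the weight names, the single-sequence loop, and the toy shapes are my own simplifications rather than DeepMind’s implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rg_lru(x, Wa, ba, Wx, bx, log_lambda, c=8.0):
    """Simplified RG-LRU recurrence over a single sequence.

    x: (T, D) inputs; Wa, Wx: (D, D) gate weights; log_lambda: (D,) learned
    decay parameter. Follows the update described in the Griffin paper:
        r_t = sigmoid(Wa @ x_t + ba)              # recurrence gate
        i_t = sigmoid(Wx @ x_t + bx)              # input gate
        a_t = a ** (c * r_t),  a = sigmoid(log_lambda)
        h_t = a_t * h_{t-1} + sqrt(1 - a_t**2) * (i_t * x_t)
    """
    T, D = x.shape
    a = sigmoid(log_lambda)          # per-channel decay in (0, 1)
    h = np.zeros(D)
    out = np.zeros_like(x)
    for t in range(T):
        r = sigmoid(Wa @ x[t] + ba)              # how strongly to decay the state
        i = sigmoid(Wx @ x[t] + bx)              # how much of the input to admit
        a_t = a ** (c * r)                       # input-dependent, element-wise decay
        h = a_t * h + np.sqrt(1.0 - a_t**2) * (i * x[t])
        out[t] = h
    return out

# Toy usage with random weights
rng = np.random.default_rng(0)
T, D = 16, 4
y = rg_lru(rng.normal(size=(T, D)),
           rng.normal(size=(D, D)) * 0.1, np.zeros(D),
           rng.normal(size=(D, D)) * 0.1, np.zeros(D),
           log_lambda=rng.normal(size=D))
print(y.shape)  # (16, 4)
```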

Meanwhile, Griffin is a hybrid model that mixes the same gated linear recurrences with local attention. It matches the performance of Llama-2 despite being trained on significantly fewer tokens. Griffin addresses the usual difficulties of training and scaling recurrent neural networks (RNNs) and delivers competitive results at reduced computational cost.
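Local attention is the other half of the hybrid: each token attends only to a fixed window of recent tokens, so the attention cost grows linearly with sequence length rather than quadratically. Here is a rough single-head NumPy sketch of causal sliding-window attention; the window size and function names are illustrative assumptions, not Griffin’s actual multi-query implementation.

```python
import numpy as np

def local_causal_attention(q, k, v, window=4):
    """Single-head attention where position t may only attend to
    positions max(0, t - window + 1) .. t (causal sliding window)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (T, T) similarities
    idx = np.arange(T)
    allowed = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)
    scores = np.where(allowed, scores, -np.inf)        # mask everything outside the window
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 8, 16
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
out = local_causal_attention(q, k, v, window=4)
print(out.shape)  # (8, 16)
```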

Gated Linear Recurrences

Gated linear recurrences are a variation of RNNs that incorporate gating mechanisms to control the flow of information through…
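At its simplest, such an update is linear in the hidden state, with element-wise gates deciding how much of the past to keep and how much of the current input to write. A toy NumPy sketch of that idea, with hand-picked gate values rather than learned ones:

```python
import numpy as np

def gated_linear_recurrence(x, forget_gate, input_gate):
    """h_t = forget_gate_t * h_{t-1} + input_gate_t * x_t, element-wise.
    All arrays have shape (T, D); gates lie in (0, 1)."""
    h = np.zeros(x.shape[1])
    out = []
    for x_t, f_t, i_t in zip(x, forget_gate, input_gate):
        h = f_t * h + i_t * x_t
        out.append(h)
    return np.stack(out)

T, D = 5, 1
x = np.zeros((T, D)); x[0] = 1.0          # a single impulse at t = 0
ones = np.ones((T, D))
remember = gated_linear_recurrence(x, forget_gate=0.9 * ones, input_gate=ones)
forget   = gated_linear_recurrence(x, forget_gate=0.1 * ones, input_gate=ones)
print(remember.ravel())  # [1.  0.9  0.81  0.729  0.6561] -- the impulse persists
print(forget.ravel())    # [1.  0.1  0.01  0.001  0.0001] -- it is quickly forgotten
```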
