Infini-attention: artificial intelligence can work on texts of infinite length

In 2017, a group of Google software engineers presented the landmark paper “Attention Is All You Need”. Although at the time the paper did not receive the attention it deserved, especially from the top management of the Mountain View company, that study would later revolutionize the field of deep learning, particularly in the context of Natural Language Processing (NLP).

The approach described by the Google engineers introduced an innovative neural network architecture called the Transformer: based on the concept of “attention”, it eliminated the need for the recurrent or convolutional layers that had been widely used until then.

Transformers have demonstrated their effectiveness in learning long-range relationships in data, outperforming previous models in many natural language processing tasks.

Infini-attention: what changes when generative models can work on infinite texts

With a newly published study, Google is back in the news with another innovative idea. The company founded by Larry Page and Sergey Brin has publicly unveiled a new technique, Infini-attention, which allows Large Language Models (LLMs) to work with texts of infinite length.

The newly proposed approach extends the so-called “context window” of language models, allowing them to process a larger number of tokens at once without increasing memory and computation requirements.

The context window is the number of tokens that a model can work on at any given time. Consider the ChatGPT chatbot and the underlying OpenAI GPT model: if the information supplied goes beyond the context window, the model’s performance drops sharply and the tokens in the initial part of the chat are automatically discarded.
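The discarding of early tokens described above can be pictured with a minimal sketch. It assumes the simplest possible strategy, keeping only the most recent tokens; the function name `fit_to_context_window` and the sample tokens are hypothetical, and real chat systems may handle overflow differently.

```python
def fit_to_context_window(tokens, window_size):
    """Keep only the most recent `window_size` tokens; older ones are dropped."""
    if window_size <= 0:
        return []
    return tokens[-window_size:]


# Hypothetical chat history of six tokens and a window of four.
chat = ["t1", "t2", "t3", "t4", "t5", "t6"]
print(fit_to_context_window(chat, 4))  # ['t3', 't4', 't5', 't6']
```

With this strategy, whatever was said at the start of a long conversation is simply no longer visible to the model.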

Increasing the context length has therefore become a primary objective for improving the performance and output quality of generative models, and thus for gaining a competitive advantage.

Experiments conducted by Google’s research team indicate that models based on Infini-attention can maintain their quality across more than a million tokens without requiring additional memory. Furthermore, this level of performance can also be extended to even longer texts.

How Infini-attention improves model performance without side effects

Transformers have “quadratic complexity” in terms of memory and computation time. This means that the amount of memory required and the time needed to process data grow with the square of the input size.

For example, if you extend the size of the input from 1,000 to 2,000 tokens, the memory and computation time needed to process the input does not double, but rather quadruples.
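The arithmetic behind this example can be checked with a few lines of Python. This is a rough sketch that counts only token-to-token comparisons and ignores every other cost; the function name `attention_cost` is hypothetical.

```python
def attention_cost(num_tokens):
    """Number of token-to-token comparisons for an input of `num_tokens` tokens."""
    return num_tokens * num_tokens


baseline = attention_cost(1_000)
for n in (1_000, 2_000, 4_000):
    # Ratio of the cost at n tokens to the cost at 1,000 tokens.
    print(n, attention_cost(n), attention_cost(n) // baseline)
```

Each doubling of the input multiplies the count by four, so 4,000 tokens already cost sixteen times as much as 1,000.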

This quadratic relationship arises from the self-attention mechanism used in Transformers, which compares every element in the input sequence with every other element. In other words, each token in the input must be related to all other tokens, which leads to a significant increase in computational complexity as the size of the prompt supplied by the user increases.
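To make the all-pairs comparison concrete, here is a deliberately simplified self-attention in plain Python. It assumes that queries, keys, and values are the token vectors themselves, whereas a real Transformer applies learned projections first; the function names and sample vectors are illustrative only.

```python
import math


def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    highest = max(row)
    exps = [math.exp(x - highest) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Simplified self-attention: queries, keys and values are all `x` itself."""
    n, d = len(x), len(x[0])
    scale = math.sqrt(d)
    # Every token i is scored against every token j: an n x n table.
    scores = [
        [sum(a * b for a, b in zip(x[i], x[j])) / scale for j in range(n)]
        for i in range(n)
    ]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all token vectors.
    outputs = [
        [sum(weights[i][j] * x[j][c] for j in range(n)) for c in range(d)]
        for i in range(n)
    ]
    return weights, outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
weights, outputs = self_attention(tokens)
print(len(weights), len(weights[0]))  # 3 3
```

The `scores` table has one entry per pair of tokens, which is where the quadratic growth in memory and time comes from.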

Infini-attention keeps the classic attention mechanism and adds a module called “compressive memory” to handle extended inputs. Once the input exceeds a certain context length, the model saves the old attention states in the compressive memory, which maintains a constant number of parameters to maximize computational efficiency. To produce the final output, Infini-attention then aggregates the content of the compressive memory with the local attention contexts.
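The idea of a memory whose size stays constant however much text has been processed can be sketched as follows. This is a simplified illustration in the style of linear-attention associative memories, not Google’s implementation: the class name `CompressiveMemory`, the `elu_plus_one` feature map, and the sigmoid `gate` used to mix memory and local attention are all assumptions made for the example.

```python
import math


def elu_plus_one(x):
    """Positive feature map (ELU + 1), so memory lookups never divide by zero."""
    return x + 1.0 if x > 0 else math.exp(x)


class CompressiveMemory:
    """Fixed-size store: a d_k x d_v matrix plus a d_k normalisation vector."""

    def __init__(self, d_k, d_v):
        self.matrix = [[0.0] * d_v for _ in range(d_k)]
        self.norm = [0.0] * d_k

    def update(self, keys, values):
        """Fold one segment's keys and values into the memory; its size is unchanged."""
        for key, value in zip(keys, values):
            for i, fk in enumerate(elu_plus_one(k) for k in key):
                self.norm[i] += fk
                for j, v in enumerate(value):
                    self.matrix[i][j] += fk * v

    def retrieve(self, query):
        """Read back a value vector for `query` from everything stored so far."""
        fq = [elu_plus_one(q) for q in query]
        d_v = len(self.matrix[0])
        denom = sum(a * b for a, b in zip(fq, self.norm))
        if denom == 0.0:  # nothing stored yet
            return [0.0] * d_v
        return [
            sum(fq[i] * self.matrix[i][j] for i in range(len(fq))) / denom
            for j in range(d_v)
        ]


def combine(memory_out, local_out, gate):
    """Mix memory and local attention outputs with a sigmoid gate (an assumption)."""
    g = 1.0 / (1.0 + math.exp(-gate))
    return [g * m + (1.0 - g) * l for m, l in zip(memory_out, local_out)]


memory = CompressiveMemory(d_k=2, d_v=3)
for _ in range(1000):  # a thousand segments later, the memory is still 2 x 3
    memory.update([[0.5, -0.2]], [[1.0, 2.0, 3.0]])
print(len(memory.matrix), len(memory.matrix[0]))  # 2 3
```

However many segments are folded in, the memory holds the same number of values, which is why the cost no longer grows with the length of the text.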

Opening LLMs up to an infinite context could allow the creation of customized applications, eliminating the need to resort to complex techniques such as fine-tuning or Retrieval-Augmented Generation (RAG).

Google’s new study, however, is not intended to wipe out all other techniques: rather, it will make it easier to create advanced artificial intelligence applications without the need for extensive engineering efforts.

Opening image credit: BlackJack3D
