
Hugging Face Transformers Agents: what they are and how AI becomes multimodal

Hugging Face is also improving the way developers interact with its models: it is now possible to combine several different mechanisms for analyzing and generating content. This, in short, is what agents do: they put the multimodal approach to artificial intelligence into the hands of users.

Hugging Face is a New York-based company specializing in Natural Language Processing (NLP). Founded in 2016, Hugging Face has earned a first-rate reputation for creating the open source library Transformers, used in many NLP applications.

The Hugging Face Transformers library allows developers to use state-of-the-art deep learning models to handle NLP tasks such as natural language understanding, machine translation, speech synthesis, and many other advanced applications. The company also runs the Hugging Face Hub, an online machine learning platform that lets developers share and access pre-trained and pre-configured AI models, simplifying the NLP application development process.

In recent hours, the new Hugging Face Transformers Agents have been presented: AI models designed to enable businesses to build and customize chatbots and virtual assistants with advanced NLP capabilities.

Hugging Face Transformers Agents can be trained on large amounts of data to improve their accuracy and their ability to understand and respond to user requests. They can also be used in different contexts, such as customer service, business information management, virtual assistant apps, and chatbot building. In general, they are useful in all situations that call for natural, language-based interaction with the user.

In practice, Transformers Agents offer a natural-language API that can be used to interact at a high level with a wide range of models made available by Hugging Face. As explained in the Transformers Agents documentation, the developer first creates an instance of an agent, which enables interaction with a Large Language Model (LLM), and then uses the agent.run method, which asks the agent to perform a specific task.

The run() API allows a one-shot interaction with the agent: the method automatically selects the tool or tools needed for the current task and executes them accordingly. Each run() operation is independent, so the user can invoke it multiple times in a row with different tasks.

For example, the following two instructions request the generation of a caption for an image and the creation of an audio stream (text-to-speech) that turns written text into speech:

agent.run("Add a caption to the following image", image=image)

agent.run("Read the following text aloud", text=text)
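To make the dispatch mechanism concrete, here is a toy sketch of how a run()-style method might map a natural-language request to a tool and execute it in a single, stateless call. This is an illustration of the principle only, not the real Hugging Face implementation: the tool functions, trigger words, and keyword matching below are all assumptions made for the example.

```python
# Toy tools standing in for real Hugging Face models (illustrative only).
def image_captioner(image):
    return f"caption for {image}"

def text_to_speech(text):
    return f"audio for '{text}'"

# Map a trigger word found in the prompt to the tool that handles it.
TOOLS = {
    "caption": image_captioner,
    "aloud": text_to_speech,
}

class ToyAgent:
    def run(self, prompt, **kwargs):
        # Select the first tool whose trigger word appears in the prompt,
        # then call it with the provided input. Each call is independent:
        # no state survives between run() invocations.
        for trigger, tool in TOOLS.items():
            if trigger in prompt.lower():
                return tool(*kwargs.values())
        raise ValueError("no suitable tool found")

agent = ToyAgent()
print(agent.run("Add a caption to the following image", image="lake.png"))
print(agent.run("Read the following text aloud", text="hello"))
```

The real agent delegates tool selection to an LLM rather than keyword matching, but the shape of the interaction is the same: one prompt in, one tool chain executed, one result out.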

Then there is the chat() method, which is especially useful when you need to maintain state across several successive instructions. The method can also take arguments, allowing non-text inputs or task-specific prompts to be passed, depending on your needs. An example could be the following:

agent.chat("Generate a photo showing a lake with many trees on the shore")

agent.chat("Edit the photo so that there is a rock in the middle of the lake")
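The difference from run() can be sketched with another toy agent that keeps a conversation history, so a follow-up request can refer back to the previous result. Again, this is not the real library internals; the class, its keyword checks, and the placeholder results are assumptions made to illustrate statefulness.

```python
class ToyChatAgent:
    def __init__(self):
        self.history = []  # state kept across chat() calls

    def chat(self, prompt):
        # Record the exchange so later prompts like "edit the photo"
        # can be resolved against the previous result.
        self.history.append(prompt)
        if "generate" in prompt.lower():
            return "photo: lake with trees"
        elif len(self.history) > 1:
            # A follow-up: apply the new instruction to the prior request.
            return f"edited result of {self.history[-2]!r} with: {prompt}"
        else:
            return "nothing to modify yet"

agent = ToyChatAgent()
agent.chat("Generate a photo showing a lake with many trees on the shore")
print(agent.chat("Edit the photo so that there is a rock in the middle of the lake"))
```

The key point is that chat() carries context forward, whereas with run() the second instruction would have no way of knowing which photo to edit.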

The generated code is then executed with a small Python interpreter along with the inputs provided. Despite concerns about the potential execution of arbitrary code, it should be noted that the only functions that can be called are those provided by Hugging Face, which significantly limits the code that can be executed. In addition, attribute lookups and imports are not allowed, further reducing the inherent risks.
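The idea behind such a restricted interpreter can be sketched as a static check on the generated code: parse it and reject anything that is not a call to a whitelisted tool, any attribute lookup, and any import. This is a minimal illustration of the principle, not Hugging Face's actual interpreter; the whitelist below is an assumption for the example.

```python
import ast

# Hypothetical whitelist of tool functions the generated code may call.
ALLOWED_CALLS = {"image_captioner", "text_to_speech"}

def check_safe(source):
    """Raise ValueError if the code uses anything outside the whitelist."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed")
        if isinstance(node, ast.Attribute):
            raise ValueError("attribute lookups are not allowed")
        if isinstance(node, ast.Call):
            # Only plain calls to whitelisted names are accepted.
            if not (isinstance(node.func, ast.Name)
                    and node.func.id in ALLOWED_CALLS):
                raise ValueError("only whitelisted tools may be called")
    return True

print(check_safe("caption = image_captioner(image)"))  # accepted
try:
    check_safe("import os")
except ValueError as exc:
    print(exc)  # rejected: imports are not allowed
```

Blocking attribute lookups is what prevents escapes such as `os.system(...)` even if a dangerous object were somehow reachable, which is why the article singles it out alongside the call whitelist.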
