Run LLM in a serverless web browser with Transformers.js


The ever-increasing availability of open-source Large Language Models (LLMs) makes it practical to run generative models locally, exploiting the computing power of your own systems. Transformers.js is a JavaScript library that allows Transformer-based models to be executed directly in the browser, without relying on a server.

Transformers.js is designed to be functionally equivalent to the Python transformers library, supporting a wide range of tasks in natural language processing, computer vision, audio processing and multimodal applications.

The library supports various pre-trained models for tasks such as text summarization, classification (including image classification), text generation and speech recognition. It also provides an API modeled on the behavior of the Python library, making it easy for developers already familiar with the Python version to switch to the JavaScript one.

A tool like Transformers.js is particularly useful for running machine learning models in the browser and lets developers choose the models best suited to their specific projects.

How to get started using Transformers.js in your browser

To get started with Transformers.js, developers can create a new Node.js project and install the library via npm. They can then create a new file called app.js, which will be the entry point of their application.

Depending on whether you use ESM (ECMAScript Modules) or CommonJS, the Transformers.js library is imported differently. ESM is a standard for importing and exporting modules in JavaScript. With ESM, you can organize your code into separate modules, making complex projects easier to manage and maintain. It is supported by modern browsers and by Node.js, allowing developers to use a clearer, more declarative syntax to manage dependencies between modules.

In CommonJS, modules are defined using the require syntax for imports and module.exports for exports. This approach was widely used in the past and is still common in many legacy Node.js projects. ESM is more modern and is now considered the reference standard for JavaScript development.
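As a quick sketch of the two import styles, using the @xenova/transformers package installed later in this article (since the package is distributed as an ES module, the CommonJS form shown is the dynamic import() workaround rather than a plain require):

```javascript
// ESM — works in modern browsers and in Node.js projects
// that set "type": "module" in package.json:
import { pipeline } from '@xenova/transformers';

// CommonJS — legacy projects can load an ES-module-only package
// with a dynamic import() inside an async function:
// const { pipeline } = await import('@xenova/transformers');
```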

Just like the Python library, Transformers.js supports the pipeline API. Pipelines bundle a pre-trained model with input preprocessing and output postprocessing, making them the easiest way to run models with the library. In the following JavaScript example, we analyze the sentiment expressed in the user's input:

import { pipeline } from '@xenova/transformers';

// Allocate a pipeline for sentiment analysis
let pipe = await pipeline('sentiment-analysis');

let out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]

The pipeline function at the beginning of the code is used to instantiate a pipeline containing a pre-trained model for sentiment analysis. The term sentiment refers to the feeling or emotion associated with a specific text, sentence, document or other linguistic input. In text analysis, sentiment analysis is the process of identifying and classifying the sentiment expressed in a text according to predefined categories (e.g. positive, neutral or negative).

Installing and using the Transformers.js library

To start using Transformers.js and explore its potential right away, simply install it via npm. Alternatively, you can load the library remotely from a CDN or a hosting service.

In the first case, just run the following command to proceed with the installation via npm:

npm i @xenova/transformers

Alternatively, you can import the library remotely into your JavaScript code from a CDN.
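For instance, with the jsDelivr CDN inside a module script (the exact version pinned here is an illustrative choice):

```html
<script type="module">
  import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.6.0';
</script>
```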

The sample applications available here allow you to experience the benefits of Transformers.js without even typing a line of code.

By default, Transformers.js uses pre-trained models hosted on “ad hoc” servers and precompiled WASM binaries, which should work out of the box. However, you can customize these behaviors as indicated in this section.
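A sketch of that customization, based on the env settings object exposed by @xenova/transformers (the paths shown are illustrative assumptions about your own hosting layout):

```javascript
import { env } from '@xenova/transformers';

// Serve models from your own server instead of the Hugging Face Hub
env.allowRemoteModels = false;
env.localModelPath = '/models/';

// Point to self-hosted ONNX Runtime WASM binaries
env.backends.onnx.wasm.wasmPaths = '/wasm/';
```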

Model conversion to ONNX and quantization

The developers of Transformers.js also offer practical guidance for converting models to ONNX and quantizing them.
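For reference, the conversion script shipped in the Transformers.js repository is typically invoked as follows (this assumes you have cloned the repository and installed its Python requirements; the model id is only an example):

```shell
python -m scripts.convert --quantize --model_id bert-base-uncased
```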

ONNX conversion adapts a model to the Open Neural Network Exchange (ONNX) format. ONNX is a model exchange format for artificial intelligence supported by several machine learning libraries, including PyTorch, TensorFlow and MXNet.

The conversion brings several advantages in terms of portability, efficiency, interoperability and optimization. Once you have an ONNX file, the model can be used on different platforms and frameworks without having to retrain it. This makes it easier to deploy and use the model in different environments, such as mobile devices and web browsers.

ONNX maximizes efficiency when running models, allowing good performance to be achieved even on devices with limited resources. To understand the advantages in terms of interoperability, just think that a model converted from PyTorch to ONNX can be run without problems in TensorFlow.

Model quantization makes a model more compact, reducing its size and improving its computational efficiency without significantly compromising performance. Model parameters, usually represented as single- or double-precision floating-point numbers, are converted to lower-precision values such as 16-bit floating-point numbers or 8-bit integers.

In the Models section you can find the complete list of models that can be used via Transformers.js; Meta's brand-new Llama 3 will certainly be added soon.
