
VRAM Estimator: how much memory is needed to run modern LLMs

Large Language Models (LLMs) are advanced language models that use deep neural networks to learn and understand natural language. Running an LLM locally means hosting and executing the model directly on your own machine, or on a cloud server under your exclusive control. In this way, all processing relies on resources you manage, without depending on third parties.

Artificial intelligence and generative models can support business decisions: they can be used to run inference over the enormous amounts of data that every company possesses.

How to determine how much VRAM is needed to run and manage each generative model

High-performance GPUs are essential for accelerating the parallel computation required during the training and inference phases of LLMs. Estimating how much video memory (VRAM) is needed to adequately run one of the most popular and appreciated models is exactly what the tool presented below helps with.

VRAM Estimator is an open source tool whose source code is published in the corresponding GitHub repository. It calculates the video memory (VRAM) footprint required when training and running inference with transformer-based language models.
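As a first approximation, the memory needed just to store the model weights is the number of parameters multiplied by the bytes each parameter occupies in the chosen precision. The following Python sketch illustrates this common rule of thumb; it is not VRAM Estimator's exact formula, and the figures it produces ignore activations, optimizer state and CUDA overhead:

```python
# Rough VRAM estimate for the model weights alone (a common rule of thumb,
# not the exact formula used by VRAM Estimator).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Memory needed just to hold the model weights, in gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

# Example: a 7-billion-parameter model in float16 needs ~13 GB for weights alone.
print(f"{weight_memory_gb(7e9, 'float16'):.1f} GB")
```

For example, a 7-billion-parameter model stored in float16 needs about 13 GB for the weights alone, before any of the other costs discussed below.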

CUDA kernels are GPU-specific software components that perform operations in parallel. The first time the GPU is used, loading the CUDA kernels takes between 300 and 2000 MB of VRAM, an amount that varies with the GPU model, the drivers and the PyTorch version in use. PyTorch is an open source machine learning and deep learning framework originally developed at Facebook (now Meta). It provides a set of tools and libraries that make it easy to create and train machine learning models, especially deep neural networks, and it is widely used in both academia and industry for its flexibility and ease of use.
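If you want to observe this overhead on your own hardware, the following small PyTorch sketch forces the CUDA context to load and reports what the framework tracks; the exact figures will differ across GPUs, drivers and PyTorch versions:

```python
import torch

# A small experiment to observe the one-time CUDA context/kernel overhead
# on your own machine (numbers will differ across GPUs and driver versions).
assert torch.cuda.is_available()

torch.cuda.init()                      # force CUDA context creation
x = torch.ones(1, device="cuda")       # first allocation loads the kernels

allocated = torch.cuda.memory_allocated() / 1024**2  # bytes held by tensors
reserved = torch.cuda.memory_reserved() / 1024**2    # held by PyTorch's caching allocator
print(f"allocated by tensors: {allocated:.1f} MB, reserved by PyTorch: {reserved:.1f} MB")
# The CUDA context itself is not tracked by these counters: compare against
# `nvidia-smi` to see the extra few hundred MB the context and kernels occupy.
```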

In the VRAM Estimator interface you will find tools to produce estimates for both training and inference.

The terms used by VRAM Estimator

On the VRAM Estimator page you can find the various factors that have the greatest impact on VRAM occupation, and therefore on the hardware requirements for running specific LLMs.

Mixed Precision Training, for example, is a training technique that uses both 16-bit (float16) and 32-bit (float32) representations to optimize computation time and reduce the size of activations. Activations are the tensors holding the intermediate outputs of a layer, or set of layers, within the neural network as data flows through it.
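In PyTorch, mixed precision training is typically done through the Automatic Mixed Precision (AMP) API. The following is a minimal sketch of a single training step; the model, data and hyperparameters are placeholders chosen purely for illustration:

```python
import torch
from torch import nn

# Minimal mixed-precision training step with PyTorch AMP.
# Model, data and hyperparameters are illustrative placeholders.
model = nn.Linear(512, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()   # rescales gradients to avoid float16 underflow

inputs = torch.randn(32, 512, device="cuda")
targets = torch.randint(0, 10, (32,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():        # forward pass runs in float16 where safe
    loss = nn.functional.cross_entropy(model(inputs), targets)
scaler.scale(loss).backward()          # backward pass on the scaled loss
scaler.step(optimizer)                 # unscales gradients, then optimizer step
scaler.update()
```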

(Image: calculating the VRAM memory a GPU needs to run an LLM)

In addition to the numerical precision used during calculations, a key concept is the optimizer. It is a critical component during training because it determines how the model weights are updated. The goal of the optimizer is to minimize the model's loss function by adjusting the weights so that the model improves its performance.
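The optimizer affects VRAM because stateful optimizers keep extra per-parameter buffers: Adam and AdamW, for example, store two float32 moment tensors for every weight. The sketch below gives a back-of-the-envelope figure for this cost, using a common approximation rather than VRAM Estimator's exact formula:

```python
# Rough optimizer-state memory for Adam/AdamW: two float32 moment buffers,
# plus a float32 master copy of the weights in mixed precision training.
# A common approximation, not VRAM Estimator's exact formula.
def adam_state_gb(num_params: float, mixed_precision: bool = True) -> float:
    bytes_per_param = 4 + 4            # first and second moments, float32 each
    if mixed_precision:
        bytes_per_param += 4           # float32 master weights
    return num_params * bytes_per_param / 1024**3

print(f"{adam_state_gb(7e9):.1f} GB")  # ~78 GB of optimizer state for a 7B model
```

This is why optimizer state often dwarfs the weights themselves during training, while being entirely absent at inference time.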

Sequence Length refers to the number of time steps in a sequence of data. In contexts such as natural language, it represents the length, in tokens, of the word sequences being processed.

Batch Size represents the number of training examples processed in a single iteration and lets you parallelize calculations on the GPU more effectively. A large batch size can speed up training, but it may also require more memory.

Finally, Number of GPUs expresses how many graphics processing units are used when training the model; some models can be trained in parallel on multiple GPUs. A rough sketch of how these three parameters combine to drive activation memory follows below.
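To see how these three parameters interact, the sketch below approximates the memory taken by a transformer's attention score matrices, assuming standard (non-flash) attention, where that cost grows with the square of the sequence length, and a batch split evenly across GPUs under data parallelism. It is an illustrative simplification, not VRAM Estimator's internal formula:

```python
# Simplified activation-memory estimate for the attention score matrices of
# a transformer, assuming standard (non-flash) attention in float16.
# An illustrative approximation, not VRAM Estimator's exact formula.
def attention_activations_gb(batch_size: int, seq_len: int,
                             num_layers: int, num_heads: int,
                             num_gpus: int = 1, bytes_per_value: int = 2) -> float:
    # One (seq_len x seq_len) score matrix per head, per layer, per example;
    # with pure data parallelism the batch is split across the GPUs.
    per_gpu_batch = batch_size / num_gpus
    values = per_gpu_batch * num_layers * num_heads * seq_len**2
    return values * bytes_per_value / 1024**3

# Doubling seq_len quadruples this term, doubling batch size doubles it,
# and adding GPUs (data parallelism) divides it per device.
print(f"{attention_activations_gb(8, 2048, 32, 32, num_gpus=2):.1f} GB")
```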

By adjusting both the parameters tied to how the model is run and those specific to the individual LLM, chosen via the Parameters Preset drop-down menu, you can obtain an estimate of the VRAM required.

Opening image credit: iStock.com – Digital43
