Ollama: artificial intelligence locally on your systems

LLMs (Large Language Models) are language models trained on huge amounts of text in order to learn linguistic structures and semantic relationships. These models can then “understand” and generate useful, relevant and contextualized text. They are therefore increasingly integrated into applications to generate text, answer user questions, translate, offer suggestions and much more.

The main LLMs known to date have their roots in the concepts of attention and the Transformer, which Google presented in 2017, giving a strong boost to solutions based on artificial intelligence. There are proprietary LLMs, models that come with a license that does not allow commercial use, and products that pose no limitations at all. So much so that parts of the development community claim that open source models for artificial intelligence will surpass those of OpenAI and Google.

What is Ollama and how does it bring AI models to users’ systems

Bringing together the various open source LLMs is Ollama, a project that carries artificial intelligence and inference work onto the systems of end users, be they researchers, professionals, companies or simply curious people. Lightweight and extensible, Ollama is a framework that provides a simple API for creating, running and managing language models, along with a library of pre-built models that can easily be used in a wide range of applications.

How to install Ollama

The “natural environment” for installing Ollama is a Linux machine equipped with at least 8 GB of RAM to run models based on 3 billion parameters (3B), 16 GB for 7B models and 32 GB for 13B models (“B” stands for billion). There is also an installer for macOS systems; Windows support will come later. In the meantime, you can use the procedure described below to set up Ollama on Windows with the help of WSL (Windows Subsystem for Linux), which allows you to run Linux on the Microsoft operating system.
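As a minimal sketch of that route (assuming a recent Windows 10 or 11 build where the wsl command is available), an Ubuntu distribution can be installed from an administrator PowerShell prompt; the Linux installation command shown further below can then be run inside it:

wsl --install -d Ubuntu-22.04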

As of early October 2023, Ollama is also available in the form of a Docker container: the corresponding image is officially supported and continuously updated.
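A quick sketch of the container route, based on the image name and port documented at the time of writing (the volume name ollama is an arbitrary choice used to persist the downloaded models across restarts):

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama2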

We installed Ollama on an Ubuntu 22.04 machine with 16 GB of RAM by simply typing the following command:

curl https://ollama.ai/install.sh | sh

Ollama automatically recognizes the presence of NVidia GPU-based cards. If the system is not equipped with one, the generative models will rely exclusively on the processor cores.
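To verify the outcome of the installation, you can check that the background service is up (a sketch assuming a systemd-based distribution; the official install script registers a service named ollama) and, where relevant, that the NVidia driver sees the GPU:

systemctl status ollama
nvidia-smi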

How to use Ollama locally

Once the installation is complete, you can start your favorite language models and begin interacting with them using simple commands. To start using the Llama 2 model (LLaMA stands for Large Language Model Meta AI, hence the often-recurring llama imagery…) just type the following in the Linux terminal window:

ollama run llama2

As mentioned in the introduction, Ollama supports a wide variety of open source LLMs that can be downloaded from the model library. No additional steps are necessary to use them: just type one of the commands shown in the Download column:

Model n° parameters Size Download
Mistral 7B 4.1 GB ollama run mistral
Llama 2 7B 3.8 GB ollama run llama2
Code Llama 7B 3.8 GB ollama run codellama
Llama 2 Uncensored 7B 3.8 GB ollama run llama2-uncensored
Llama 2 13B 13B 7.3 GB ollama run llama2:13b
Llama 2 70B 70B 39 GB ollama run llama2:70b
Orca Mini 3B 1.9 GB ollama run orca-mini
Vicuna 7B 3.8 GB ollama run vicuna

Among the various proposals there is also Mistral 7B, a powerful open source model for artificial intelligence that uses the Apache 2.0 license and is supported by the Italian-European consortium CINECA/EuroHPC.
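If you prefer to download a model without immediately opening an interactive session, the pull subcommand of the same CLI does just that:

ollama pull mistral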

Using the Ollama prompt you can then start sending your questions, even in Italian.

(Image: the prompt of a generative AI model running in Ollama)

To exit the application, just press CTRL+D or type /bye and press the Enter key.

The command ollama list returns the list of generative models downloaded and available locally.
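Alongside list, a couple of other subcommands of the standard CLI help with housekeeping; the model name below is simply the one used earlier in the article:

ollama pull llama2   # update an already-downloaded model to its latest version
ollama rm llama2     # remove a local model and free up disk space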

Import custom models into Ollama

Ollama lets you import models in formats such as GGUF and GGML. If you have a model that is not in the Ollama library, you can add it yourself whenever you deem it mature enough. Just create a file called Modelfile and add a FROM instruction with the local path of the model file you want to use.
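A minimal sketch of the whole sequence (the file name mymodel.Q4_0.gguf and the name custom-model are hypothetical placeholders):

echo "FROM ./mymodel.Q4_0.gguf" > Modelfile   # point the Modelfile at the local GGUF file
ollama create custom-model -f Modelfile       # register the model under a name of your choice
ollama run custom-model                       # start chatting with it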

The Modelfile can also be used to customize the behavior of LLMs already known to Ollama. For example, by creating a Modelfile with the following content, you can change the default behavior of Llama 2:

FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Always and only answer as Mario.
"""

What have we done? First, we set the “temperature” of the model: the value 1 makes the Llama 2 model more creative but less precise. Lower values, on the other hand, yield plainer but, at the same time, more coherent answers.

Second, a sort of role-playing game is set up: the model tailors its answers to those of a hypothetical digital assistant built around the “personality” of Mario, from the famous video game.
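To put the file to use, the create and run subcommands seen above apply here as well (the model name mario is just an illustrative choice):

ollama create mario -f ./Modelfile
ollama run mario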

Use the REST API to talk to Ollama

REST APIs are widely used for building Web services and are a common choice when designing client-server interaction in a scalable and flexible way. What if we told you that Ollama already has such an interface, which lets you connect any application, whether still under development or already completed, to the artificial intelligence of the best LLMs?

Try pasting the following into the Linux terminal window:

curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Perché il cielo è blu? Spiegalo in italiano"
}'

This curl statement does nothing more than send an HTTP request to the Ollama REST API, specifying the generative model to use and the prompt (here asking, in Italian, why the sky is blue). In response you receive a detailed explanation that answers the question posed.
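Note that, by default, the /api/generate endpoint streams the answer back as a sequence of JSON fragments. If your application prefers to receive a single, complete JSON response, the API's stream parameter can be set to false:

curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "Perché il cielo è blu?",
"stream": false
}'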

Now imagine replacing curl with an application that sends the request locally, or through the LAN, to the system where Ollama is listening: congratulations, you have created a chatbot similar to ChatGPT that you can use freely in multiple contexts.
