Let’s reveal how to immediately try Google Gemma locally, on your systems

Let's reveal how to immediately try Google Gemma locally, on your systems

Google Gemma is a new family of open generative models, designed to assist developers and researchers in responsibly building AI-based applications. The new models, already publicly available, that is Gemma 2B e 7B, are the result of the same research efforts and technological advances that led to the recent presentation of Gemini. Developed by Google DeepMind with the collaboration of other teams of the Mountain View company, the Gemma models are open source and can be used for free by developers, even for commercial uses.

The basic features of Google Gemma models

Gemma models are lightweight and cutting edge, and they deliver high performance despite its small size. The big plus is that they can be executed directly on notebooksworkstations or on the Google Cloud platform, relying on Vertex AI and Google Kubernetes Engine (GKE).

To silence the rumors that in May 2023 described third-party open source models on the road to overtaking Google and OpenAI, the company led by Sundar Pichai decides to take that path. Google Gemma templates are not only open and freely usable but can be integrated with diverse platforms and popular instruments such as Kaggle, Colab, Hugging Face, MaxText e NVidia NeMo. This integration allows developers to use Gemma models in familiar environments and with widely adopted tools, choosing the ones they prefer.

Google also provides a Responsible Generative AI Toolkit to support the safe use of AI, promoting collaboration and driving responsible use. Gemma models use automated techniques to filter personal information and other sensitive data from the datasets. Additionally, the company announced the release of a complete set of benchmark to evaluate Gemma compared to other models.

Both models, Gemma 2B and Gemma 7B, however are text-orientedin the sense that they receive input from natural language text (prompt), process it and return some text in turn output.

The novelty introduced with Gemma are so many: far too many even for an in-depth article. We therefore invite interested parties to read up on the post in Europen “a new family of open models“. Let’s focus on instead how to install Gemma and use it locally or in the cloud.

How to install Google Gem locally

As we anticipated in the introduction, Gemma is designed to run on any hardware, in locale or on the cloud. Think servers, workstations, laptops, mobile devices, whatever custom application by users. The good news, moreover, is that i generative models of Google can be optimized and subjected, in total autonomy, to activities fine tuning aimed at broadening her abilities or “specializing” Gemma’s behavior to better carry out “ad hoc” tasks.

To get started and have the chance to run locally Gemmaeven with a rather modest set of hardware resources, you can start with the lightest model with two billion parameters (2B) and then eventually migrate to 7B.

Configuring Gemma on an Ubuntu system with Ollama

Gemma can be used without limitations on a wide range of platforms and operating systems. Let’s try, first of all, to set up its operation on one Linux machinein our case Ubuntu 22.04.

Per use Gemma locally you can certainly follow the Google instructions posted on Hugging Face. An essential requirement is the presence of language Python e di pip (package management tool for Python; its name is an acronym for “Pip Installs Packages“).

For our part, however, we would like to recommend the use of To be. Open source project, Ollama brings artificial intelligence to its local systems: lightweight and extensible framework, providesAPI (Application Programming Interface) for creating, running, and managing language models, allowing users to run generative models locally on macOS, Linux and Windows.

The platform automatically recognizes the presence of cards based on GPU NVidia (otherwise, it relies on the CPU cores for processing, obviously with reduced performance) and offers a wide variety of integrations. Installing Ollama can be done via a simple terminal command, and they are available too Docker images officials.

Installing Ollama and adding Google Gemma generative models

Per install Ollamasimply issue the following command from the Linux terminal window:

curl | sh

At this point, you can choose whether to download and install the LLM (Large Language Model) Gem 2B or Gem 7B, using one of the following two instructions (both can be given to use both Google templates):

ollama run gemma:2b

ollama run gemma:7b

Run Google Gem locally

Interestingly, typing /show info at the Ollama prompt, the framework indicates 3B as the number of parameters supported by Gemma 2B and 9B in the case of the Gemma 7B model. But that’s it.

Pass a prompt to Gemma and get a reasoned response

At the Ollama prompt, you can type your question in natural language to submit to Google’s generative model. After pressing the Enter key, depending on the template chosen with the command ollama runyou will get the desired response.

Ollama runs Google Gemma 2B and 7B

Using Ollama Web UI you can even create your own chatbot by interacting directly with it via a graphical interface. In another in-depth study we saw how to set up a local chatbot with GUI using the main LLMs, including Google Gemma.

As soon as it is installed, Ollama displays a message similar to the following: “The Ollama API is now available at“. This means that it is not possible to interact with the various generative models only via a textual interface but also by relying on the dedicated Ollama API.

The following command uses theAPI REST of Ollama to communicate with Gemma and obtain a response in JSON format, then appropriately manageable:

curl http://localhost:11434/api/generate -d '{ "model": "gemma:2b", "prompt":"Perché il cielo è celeste?" }'

There are also versions of Ollama for macOS and Windows, which can be downloaded for free from the download page. Plus, the possibilities are virtually endless thanks to libraries that allow you to interact with Ollama via Python and JavaScript code. You can install these libraries using pip by Python e npm per JavaScript.

Leave a Reply

Your email address will not be published. Required fields are marked *