
Run a chatbot like ChatGPT locally on QNAP NAS
Retrieval Augmented Generation (RAG) is an artificial intelligence technique that combines two main approaches: information retrieval and text generation. It is mainly used to extract valuable information from business data. "Generalist" Large Language Models (LLMs), in fact, have no visibility into the data of individual organizations: companies of any size, but also small professional practices. Their knowledge is limited to publicly accessible data sources. In this article we will see how to run a ChatGPT-like chatbot locally and customize its behavior thanks to a QNAP NAS.
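To make the two stages concrete, here is a deliberately simplified sketch of the RAG flow in Python: candidate documents are ranked against the user's question (retrieval) and the best match is passed as context to a language model (generation). The embed() and generate() functions are toy stand-ins for a real embedding model and a real LLM; only the overall flow is meant to be illustrative.

```python
# Toy illustration of the two RAG stages: retrieval, then generation.
from collections import Counter
import math

documents = [
    "Q3 invoices for the Milan office are stored in the finance archive.",
    "The NAS backup job runs every night at 02:00.",
    "Employee onboarding checklist: badge, e-mail account, VPN access.",
]

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a neural encoder.
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Retrieval stage: rank the documents against the question.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: similarity(q, embed(d)), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    # Stand-in for the local LLM call (generation stage).
    return f"[answer produced by the LLM from: {prompt!r}]"

question = "When does the backup job run?"
context = "\n".join(retrieve(question))
print(generate(f"Context:\n{context}\n\nQuestion: {question}"))
```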

QNAP-branded NAS servers are among the most popular devices in the business sector for storing, managing and restoring data. As we have explained in our other articles, QNAP storage devices have become real computers, capable of guaranteeing high performance. We have also given concrete examples of advanced devices that can be used, for example, in the manufacturing sector.

QNAP NAS devices offer a rich catalog of applications to install directly on the devices and are widely customizable, with the ability to unlock the features best suited to your business.

ChatGPT locally on QNAP NAS: what a great idea!

QNAP supports the use of GPUs in many of its NAS systems. Likewise, the Taiwanese company offers the possibility of installing and using a large array of apps which in turn can benefit from the use of GPUs. Virtualization Station, in particular, is a hypervisor for QNAP NAS that allows users to create and manage virtual machines. This solution provides a large set of features, including so-called GPU passthrough, i.e. the possibility of interacting directly with the host's video hardware.

QNAP TS-h1290FX NAS

Our colleagues at StorageReview used a single-slot NVIDIA RTX A4000 dedicated card, installing it in a QNAP TS-h1290FX all-flash NAS. You can also use other QNAP NAS models, making sure to check support for Virtualization Station as well as for the PCIe graphics card you want to install.

The dedicated video card can be used to run a ChatGPT-like chatbot locally, maximizing the performance of artificial intelligence inference workloads.

The first step, after installing the graphics card inside the chassis of the QNAP NAS, is to create a virtual machine with Virtualization Station. The procedure is the standard one: just set up a Windows machine with 64 GB of memory and 8 CPUs (activating CPU passthrough). The TS-h1290FX NAS, for example, is perfectly capable of supporting this configuration, as it is based on an AMD EPYC 7302P processor (16 cores, 32 threads) and 256 GB of RAM. In the virtual machine boot options, you can set UEFI as the BIOS type.

After the first start of the virtual machine (the operating system can be installed from an official Windows ISO image), you can enable the Remote Desktop feature. This step significantly simplifies system administration.

Enabling GPU passthrough

GPU passthrough is a technique used in virtual machines to allow them to directly access the host system's GPU rather than using an emulated virtual GPU. The virtual machine can thus take full advantage of the physical GPU power of the host system, enabling high-performance workloads. With GPU passthrough, the video card is assigned to the virtual machine, allowing for near-native performance within the virtual environment.

After temporarily shutting down the virtual machine created on the QNAP NAS, you can select it in Virtualization Station and then access the configuration editing screen.

In the PCIe section, you can choose to use the video card previously installed inside the chassis. After restarting the virtual machine, you must proceed with the driver installation for the graphics card.

By opening the Windows Task Manager (CTRL+SHIFT+ESC), clicking More details and then Performance, you should see the GPU correctly recognized and working.
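As an optional extra check (not part of the original walkthrough), if a Python environment with PyTorch happens to be installed inside the virtual machine, a short script can also confirm that the passed-through GPU is visible to CUDA:

```python
# Optional sanity check inside the VM; assumes PyTorch is installed there.
import torch

if torch.cuda.is_available():
    print("GPU visible to the VM:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device detected: check the passthrough and driver installation.")
```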

How to run NVIDIA ChatRTX on your QNAP NAS

ChatRTX is an application developed by NVIDIA that allows users to customize a GPT-type (Generative Pre-trained Transformer) Large Language Model (LLM), like the one OpenAI uses to govern the operation of ChatGPT. The main difference compared to standard ChatGPT is that ChatRTX can be connected to your own content and access the knowledge base of each individual company.

The RAG abilities that characterize a tool like ChatRTX open the way to processing business documents, notes, multimedia, images and much more. The advantage is having a local LLM that operates exclusively within the confines of the QNAP NAS, never sharing any data with the cloud.

Plus, ChatRTX integrates seamlessly with your NAS operation: you don't have to move the data. Leveraging the model and the process is as simple and cost-effective as putting a mid-range GPU into your QNAP-branded storage device.

A local ChatGPT with ChatRTX and QNAP NAS

Set up ChatRTX in a few simple steps

After downloading and installing NVIDIA ChatRTX in the virtual machine created on the QNAP NAS (the download can be started by clicking Download now and involves fetching approximately 36 GB of data), press the key combination Windows+R, then type the following: %localappdata%\NVIDIA\ChatWithRTX\RAG\trt-llm-rag-windows-main\ui.

Open the Python script user_interface.py and add the directive share=True, (don't forget the comma) immediately below the line show_api=False, then save the file. This way the ChatRTX web interface will also be reachable from other devices, not just from the virtual machine itself.
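For reference, the effect of that directive can be reproduced with a minimal, self-contained Gradio script. This is not the actual ChatRTX source code; it only shows what adding share=True to a launch() call does, assuming (as the user_interface.py file suggests) that the UI is built on Gradio:

```python
# Minimal Gradio example (not ChatRTX code) showing the effect of share=True.
import gradio as gr

def echo(message: str) -> str:
    return message

demo = gr.Interface(fn=echo, inputs="text", outputs="text")
demo.launch(
    show_api=False,
    share=True,  # the added directive: also publishes a URL reachable outside localhost
)
```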

When launching Chat with RTX from the Windows Start menu, the application displays a black console window showing a public URL and a local address (of the localhost type). By typing one of them into the address bar of your web browser, you can start talking to the generative model.

ChatRTX supports various file formats, including plain text, PDF, DOC/DOCX and XML. Simply open the folder containing the company files from the application to upload them to the catalog in a few seconds. In other words, by pointing to the folders containing your files (see the Folder path field), the generative model will begin to take them into consideration to expand its knowledge (locally) and personalize its responses.
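As a purely illustrative helper (it is not part of ChatRTX and simply mirrors the format list above), a short script can preview which files in a folder fall within the supported formats before you point the Folder path field at it; the example folder path is hypothetical:

```python
# Illustrative helper, not part of ChatRTX: list the files in a folder that
# match the formats mentioned above (plain text, PDF, DOC/DOCX, XML).
from pathlib import Path

SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".doc", ".docx", ".xml"}

def supported_files(folder: str) -> list[Path]:
    # Walk the folder recursively and keep only files with a supported extension.
    return [
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    ]

for path in supported_files(r"D:\company-documents"):  # hypothetical example folder
    print(path)
```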

The opening image is taken from the NVIDIA ChatRTX ("your personalized chatbot") page.
