Computer

Vanna, or how to chat with a SQL database using AI

Vanna, or how to chat with a SQL database using AI

The SQL JOIN are operators that allow you to combine rows from two or more tables based on a specific condition. The tables of database SQL they are connected based on the values ​​of the columns defined in the JOIN condition. The operation allows you to obtain more detailed and complete results than a single one query on the content of a specific table contained in the database.

And INNER JOIN, for example, returns only rows that have matches in both tables. The LEFT (OUTER) JOIN provides all the rows from the first specified table (the one on the left) and the corresponding rows from the table on the right (the second specified table). In reverse, RIGHT (OUTER) JOIN gives all the rows from the table on the right and the corresponding rows from the table on the left. Finally, the FULL (OUTER) JOIN returns all rows when there is a match in one of the tables.

Mastering the use of JOINs is not easy, especially when the tables are distinguished by one complex structure and not easily superimposable.

What the Vanna library that uses artificial intelligence is and how it works

Vanna.ai is an open source Python library that allows you to use Large Language Models (LLM) come OpenAI GPT o Mistral per query database using natural language. Instead of using the SQL syntaxyou can ask questions like “which technological products have generated the highest number of sales per country over the last two years?” All without having to deal with traditional SQL queries anymore.

The task of generating the correct SQL query and executing it on the database falls to Vanna. The model places each correct query into a vector database so as to constantly improve the behavior of the LLM.

Run SQL queries on databases with Vanna

The official documentation explains how to install Vanna in your environment and use the appropriate function to generate an API key. The key can be stored so you don’t have to log in every time.

The creators of Vanna also explain how to specify the use of your preferred LLM, but suggest starting with a public model generated from a sample database (Chinook).

The Chinook sample database

Il sample database Chinook is a relational database widely used for educational, demonstration and practical purposes in database management and SQL. It is designed to simulate a digital music library management system and includes several related tables, representing concepts such as artists, albums, tracks, customers, orders, and more. The tables are designed to reflect the typical relationships that exist within a typical business management application music library.

The section Train contains useful information for train the model starting from its own database. Vanna supports data sources in various formats and is able to connect to databases Snowflake, BigQuery, PostgreSQL as well as other popular sources. Beyond PostgreSQLthe other two are database management systems and services data warehousing cloud-based. Confirming how Vanna is a solution designed to interact with data volumes challenging.

An example that makes us understand Vanna’s potential

Take the following as an example Python code, taken verbatim from the project documentation. First we import several libraries, such as pandas for data manipulation, numpy for mathematical operations, Vanna which activates the interaction with the system question-answer based on natural language, e plotly for data visualization.

Initially, the API keys for using Vanna are set and used, therefore the use of the model built starting from the contents of the database is required Chinookmentioned previously:

import pandas as pd
import numpy as np
import vanna as vn
import plotly as px

api_key = vn.get_api_key('[email protected]')
vn.set_api_key(api_key)
vn.set_model('chinook')

vn.connect_to_sqlite('https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite')
vn.ask("What are the top 5 artists by sales?")

The last two lines establish a connection with the SQLite database Chinook hosted on GitHub and ultimately pushes itself to Vanna’s AI high level question: “Who are the top 5 artists who have sold the most?”

The output provided by Vanna consists of the complex and effective SQL query structured as follows:

SELECT a.name,
       sum(il.quantity) as totalsales
FROM   artist a
    INNER JOIN album al
        ON a.artistid = al.artistid
    INNER JOIN track t
        ON al.albumid = t.albumid
    INNER JOIN invoiceline il
        ON t.trackid = il.trackid
GROUP BY a.name
ORDER BY totalsales desc limit 5;

Esempio query SQL database AI

When the training phase is completed correctly, Vanna and the underlying LLM become capable of generating relevant SQL queries. For large datasetsHowever, the analyst is called upon to provide Vanna with useful clues. However, the potential is enormous and we are just scratching the surface: for example, we suggest trying very complex queries, which on paper require the use of joins between different tables.

Saving a lot of time thanks to artificial intelligence

In order for the most complex and detailed SQL queries to work and return reliable results, the advice is to try – as often happens with generative models – more strategies than prompt. Vanna, like other tools capable of generating texts, images or music, also ensures better results if the user describes in an accurate and timely manner what he wishes to obtain through a query.

Solutions based on artificial intelligence are increasingly “trendy”, in all sectors and in particular to improve business. Day after day, new tools are created that offer top-level functionality by exploiting LLM. We can easily include Vanna among the most useful tools, capable of ensuring a huge time saving.

Vanna significantly reduces the time between phases brainstorming on databases and the completion of analysis activities.

Opening image credit: iStock.com – islander11

Leave a Reply

Your email address will not be published. Required fields are marked *