
Chatbot

Chatbot is a Python library designed to make it easy for Large Language Models (LLMs) to interact with your data. It is built on top of LangChain and LangGraph and provides agents and high-level assistants for natural language querying and data visualization.

Note

This library is still under active development. Expect breaking changes, incomplete features, and limited documentation.

Installation

Clone the repository and install it (you can also use poetry or uv instead of pip).

git clone https://github.com/basedosdados/chatbot.git
cd chatbot
pip install .

Assistants

SQLAssistant

The SQLAssistant allows LLMs to interact with your database so you can ask questions about it in natural language. All it needs is a LangChain chat model, a context provider, and a prompt formatter. The context provider supplies context about your data to the SQL agent, and the prompt formatter builds the system prompt for SQL query generation.

We provide a default BigQueryContextProvider for retrieving metadata directly from Google BigQuery and a default SQLPromptFormatter. You can supply your own implementation of a context provider and a prompt formatter for custom behavior.

from langchain.chat_models import init_chat_model

from chatbot.assistants import SQLAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

prompt_formatter = SQLPromptFormatter()

assistant = SQLAssistant(model, context_provider, prompt_formatter)

response = assistant.invoke("Hello! What can you tell me about our database?")
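To illustrate the custom context provider mentioned above, here is a minimal sketch of the idea: serving a fixed schema description instead of fetching metadata from BigQuery. Note that `StaticContextProvider`, its constructor, and its `get_context` method are hypothetical names for illustration only; the actual base-class interface in chatbot.contexts may differ.

```python
# Hypothetical sketch: a provider that serves a fixed schema
# description instead of fetching metadata from BigQuery.
# The real ContextProvider interface may differ.
class StaticContextProvider:
    def __init__(self, tables: dict[str, list[str]]):
        self.tables = tables

    def get_context(self) -> str:
        # Render each table as "name(col1, col2, ...)"
        return "\n".join(
            f"{name}({', '.join(cols)})" for name, cols in self.tables.items()
        )

provider = StaticContextProvider({"sales": ["id", "amount", "date"]})
print(provider.get_context())  # sales(id, amount, date)
```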

You can optionally use a PostgresSaver checkpointer to add short-term memory to your assistant and a LangChain vector store for few-shot prompting during SQL query generation:

from langchain.chat_models import init_chat_model

from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
from langgraph.checkpoint.postgres import PostgresSaver

from chatbot.assistants import SQLAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

# any combination of a LangChain vector store
# and an embedding model will work here
vector_store = PGVector(
    connection="your connection string",
    collection_name="your collection name",
    embedding=OpenAIEmbeddings(
        model="text-embedding-3-small",
    ),
)

prompt_formatter = SQLPromptFormatter(vector_store)

DB_URI = "postgresql://postgres:postgres@localhost:5442/postgres"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()

    assistant = SQLAssistant(
        model=model,
        context_provider=context_provider,
        prompt_formatter=prompt_formatter,
        checkpointer=checkpointer,
    )

    response = assistant.invoke(
        message="Hello! What can you tell me about our database?",
        thread_id="some uuid"
    )

An async version is also available: AsyncSQLAssistant.

SQLVizAssistant

SQLVizAssistant extends SQLAssistant by retrieving data and preparing it for visualization. Using pandas and plotly, it generates a Python script that creates visualizations from the retrieved data. All it needs is a LangChain chat model, a context provider, and a prompt formatter for the SQL query generation step.

from langchain.chat_models import init_chat_model

from chatbot.assistants import SQLVizAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

prompt_formatter = SQLPromptFormatter()

assistant = SQLVizAssistant(model, context_provider, prompt_formatter)

response = assistant.invoke("Hello! What can you tell me about our database?")

As with SQLAssistant, you can optionally use a PostgresSaver checkpointer to add short-term memory and a LangChain vector store for few-shot prompting during SQL query generation:

from langchain.chat_models import init_chat_model

from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
from langgraph.checkpoint.postgres import PostgresSaver

from chatbot.assistants import SQLVizAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

# any combination of a LangChain vector store
# and an embedding model will work here
vector_store = PGVector(
    connection="your connection string",
    collection_name="your sql collection name",
    embedding=OpenAIEmbeddings(
        model="text-embedding-3-small",
    ),
)

prompt_formatter = SQLPromptFormatter(vector_store)

DB_URI = "postgresql://postgres:postgres@localhost:5442/postgres"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()

    assistant = SQLVizAssistant(
        model=model,
        context_provider=context_provider,
        sql_prompt_formatter=prompt_formatter,
        checkpointer=checkpointer,
    )

    response = assistant.invoke(
        message="Hello! What can you tell me about our database?",
        thread_id="some uuid"
    )

An async version is also available: AsyncSQLVizAssistant.

Tip

To improve semantic search when using vector stores, you can enable query rewriting by setting rewrite_query=True when invoking the assistants.
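A toy illustration of why rewriting helps, using lexical overlap as a crude stand-in for embedding similarity (the real assistants compare embeddings, not tokens, and the rewriting is done by the model, not hard-coded):

```python
# A stored few-shot example in the vector store
stored = "How many rows are in the sales table?"

# A conversational follow-up is a poor search key on its own...
follow_up = "And how many rows does it have?"

# ...but rewritten as a standalone question it matches much better
rewritten = "How many rows does the sales table have?"

def overlap(a: str, b: str) -> int:
    # Crude lexical proxy for semantic similarity
    return len(set(a.lower().split()) & set(b.lower().split()))

print(overlap(follow_up, stored), overlap(rewritten, stored))  # 3 5
```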

Extensibility

Under the hood, both assistants rely on composable agents:

  • SQLAgent – Handles database metadata retrieval, query generation and execution.
  • VizAgent – Handles visualization reasoning.
  • RouterAgent – Orchestrates SQL querying and data visualization via a multi-agent workflow.

There is also a simple ReActAgent implementation with support for custom system prompts and short-term memory, to which you can add an arbitrary set of tools.

You can use these agents directly or compose them into your own workflows.
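As a sketch of what "an arbitrary set of tools" means in practice, here is a plain-Python stand-in: each tool is a named callable with a description the model can read when deciding what to call. The `make_tool` helper is purely illustrative; the real ReActAgent takes LangChain tools.

```python
# Hypothetical stand-in for a tool registry; the real agent
# consumes LangChain tool objects instead of plain dicts.
def make_tool(name: str, description: str, fn):
    return {"name": name, "description": description, "fn": fn}

def word_count(text: str) -> int:
    return len(text.split())

tools = [make_tool("word_count", "Count the words in a string.", word_count)]

# A ReAct-style agent picks a tool by name and calls it with
# model-generated arguments:
chosen = next(t for t in tools if t["name"] == "word_count")
print(chosen["fn"]("how many words is this"))  # 5
```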
