
Chatbot

Chatbot is a Python library designed to make it easy for Large Language Models (LLMs) to interact with your data. It is built on top of LangChain and LangGraph and provides agents and high-level assistants for natural language querying and data visualization.

Note

This library is still under active development. Expect breaking changes, incomplete features, and limited documentation.

Installation

Clone the repository and install it (you can also use poetry or uv instead of pip).

git clone https://github.com/basedosdados/chatbot.git
cd chatbot
pip install .

Assistants

SQLAssistant

The SQLAssistant allows LLMs to interact with your database so you can ask questions about it in natural language. All it needs is a LangChain chat model, a context provider, and a prompt formatter. The context provider supplies context about your data to the SQL agent, and the prompt formatter builds the system prompt for SQL query generation.

We provide a default BigQueryContextProvider for retrieving metadata directly from Google BigQuery and a default SQLPromptFormatter. You can supply your own implementation of a context provider and a prompt formatter for custom behavior.

from langchain.chat_models import init_chat_model

from chatbot.assistants import SQLAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

prompt_formatter = SQLPromptFormatter()

assistant = SQLAssistant(model, context_provider, prompt_formatter)

response = assistant.invoke("Hello! What can you tell me about our database?")
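To illustrate the custom context provider mentioned above, here is a minimal sketch of the idea: serving a fixed schema description instead of fetching metadata from BigQuery. Note that `StaticContextProvider`, its constructor, and its `get_context` method are hypothetical names for illustration only; the actual base-class interface in chatbot.contexts may differ.

```python
# Hypothetical sketch: a provider that serves a fixed schema
# description instead of fetching metadata from BigQuery.
# The real ContextProvider interface may differ.
class StaticContextProvider:
    def __init__(self, tables: dict[str, list[str]]):
        self.tables = tables

    def get_context(self) -> str:
        # Render each table as "name(col1, col2, ...)"
        return "\n".join(
            f"{name}({', '.join(cols)})" for name, cols in self.tables.items()
        )

provider = StaticContextProvider({"sales": ["id", "amount", "date"]})
print(provider.get_context())  # sales(id, amount, date)
```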

You can optionally use a PostgresSaver checkpointer to add short-term memory to your assistant and a LangChain vector store for few-shot prompting during SQL query generation:

from langchain.chat_models import init_chat_model

from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
from langgraph.checkpoint.postgres import PostgresSaver

from chatbot.assistants import SQLAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

# any combination of a LangChain vector store
# and an embedding model will work here
vector_store = PGVector(
    connection="your connection string",
    collection_name="your collection name",
    embedding=OpenAIEmbeddings(
        model="text-embedding-3-small",
    ),
)

prompt_formatter = SQLPromptFormatter(vector_store)

DB_URI = "postgresql://postgres:postgres@localhost:5442/postgres"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()

    assistant = SQLAssistant(
        model=model,
        context_provider=context_provider,
        prompt_formatter=prompt_formatter,
        checkpointer=checkpointer,
    )

    response = assistant.invoke(
        message="Hello! What can you tell me about our database?",
        thread_id="some uuid"
    )

An async version is also available: AsyncSQLAssistant.

SQLVizAssistant

SQLVizAssistant extends SQLAssistant by retrieving data and preparing it for visualization. Using pandas and plotly, it generates a Python script that creates visualizations from the retrieved data. All it needs is a LangChain chat model, a context provider, and a prompt formatter for the SQL query generation step.

from langchain.chat_models import init_chat_model

from chatbot.assistants import SQLVizAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

prompt_formatter = SQLPromptFormatter()

assistant = SQLVizAssistant(model, context_provider, prompt_formatter)

response = assistant.invoke("Hello! What can you tell me about our database?")

As with SQLAssistant, you can optionally use a PostgresSaver checkpointer to add short-term memory and a LangChain vector store for few-shot prompting during SQL query generation:

from langchain.chat_models import init_chat_model

from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
from langgraph.checkpoint.postgres import PostgresSaver

from chatbot.assistants import SQLVizAssistant
from chatbot.contexts import BigQueryContextProvider
from chatbot.formatters import SQLPromptFormatter

model = init_chat_model("gpt-4o", temperature=0)

# you must point the GOOGLE_APPLICATION_CREDENTIALS
# env variable to your service account JSON file.
context_provider = BigQueryContextProvider(
    billing_project="your billing project",
    query_project="your query project",
)

# any combination of a LangChain vector store
# and an embedding model will work here
vector_store = PGVector(
    connection="your connection string",
    collection_name="your sql collection name",
    embedding=OpenAIEmbeddings(
        model="text-embedding-3-small",
    ),
)

prompt_formatter = SQLPromptFormatter(vector_store)

DB_URI = "postgresql://postgres:postgres@localhost:5442/postgres"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()

    assistant = SQLVizAssistant(
        model=model,
        context_provider=context_provider,
        sql_prompt_formatter=prompt_formatter,
        checkpointer=checkpointer,
    )

    response = assistant.invoke(
        message="Hello! What can you tell me about our database?",
        thread_id="some uuid"
    )

An async version is also available: AsyncSQLVizAssistant.

Tip

To improve semantic search when using vector stores, you can enable query rewriting by setting rewrite_query=True when invoking the assistants.
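A toy illustration of why rewriting helps, using lexical overlap as a crude stand-in for embedding similarity (the real assistants compare embeddings, not tokens, and the rewriting is done by the model, not hard-coded):

```python
# A stored few-shot example in the vector store
stored = "How many rows are in the sales table?"

# A conversational follow-up is a poor search key on its own...
follow_up = "And how many rows does it have?"

# ...but rewritten as a standalone question it matches much better
rewritten = "How many rows does the sales table have?"

def overlap(a: str, b: str) -> int:
    # Crude lexical proxy for semantic similarity
    return len(set(a.lower().split()) & set(b.lower().split()))

print(overlap(follow_up, stored), overlap(rewritten, stored))  # 3 5
```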

Extensibility

Under the hood, both assistants rely on composable agents:

  • SQLAgent – Handles database metadata retrieval, query generation and execution.
  • VizAgent – Handles visualization reasoning.
  • RouterAgent – Orchestrates SQL querying and data visualization via a multi-agent workflow.

There is also a simple ReActAgent implementation with support for custom system prompts and short-term memory, to which you can add an arbitrary set of tools.

You can use these agents directly or compose them into your own workflows.
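As a sketch of what "an arbitrary set of tools" means in practice, here is a plain-Python stand-in: each tool is a named callable with a description the model can read when deciding what to call. The `make_tool` helper is purely illustrative; the real ReActAgent takes LangChain tools.

```python
# Hypothetical stand-in for a tool registry; the real agent
# consumes LangChain tool objects instead of plain dicts.
def make_tool(name: str, description: str, fn):
    return {"name": name, "description": description, "fn": fn}

def word_count(text: str) -> int:
    return len(text.split())

tools = [make_tool("word_count", "Count the words in a string.", word_count)]

# A ReAct-style agent picks a tool by name and calls it with
# model-generated arguments:
chosen = next(t for t in tools if t["name"] == "word_count")
print(chosen["fn"]("how many words is this"))  # 5
```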
