PandasAI Agent Overview
While the pai.chat() method is meant to be used in a single session and for exploratory data analysis, an agent can be used for multi-turn conversations.
To instantiate an agent, you can use the following code:
import pandas as pd
from pandasai import Agent
# Sample DataFrames
sales_by_country = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
"deals_opened": [142, 80, 70, 90, 60, 50, 40, 30, 110, 120],
"deals_closed": [120, 70, 60, 80, 50, 40, 30, 20, 100, 110]
})
agent = Agent(sales_by_country)
agent.chat('Which are the top 5 countries by sales?')
# Output: China, United States, Japan, Germany, United Kingdom
Unlike the pai.chat() method, an agent keeps track of the state of the conversation, so it can answer follow-up questions that depend on earlier turns. For example:
agent.chat('And which one has the most deals?')
# Output: United States has the most deals
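The expected answers for the sample data can be cross-checked directly in pandas, without involving an LLM. The following is a minimal sketch that assumes "most deals" means the highest deals_opened value among the top 5 countries; the agent itself may interpret the follow-up question differently.

```python
import pandas as pd

sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
    "deals_opened": [142, 80, 70, 90, 60, 50, 40, 30, 110, 120],
    "deals_closed": [120, 70, 60, 80, 50, 40, 30, 20, 100, 110],
})

# Rows for the five countries with the highest sales, highest first
top5_rows = sales_by_country.nlargest(5, "sales")
top5 = top5_rows["country"].tolist()
print(top5)
# ['China', 'United States', 'Japan', 'Germany', 'United Kingdom']

# Assumption: "most deals" is read as the largest deals_opened value
most_deals = top5_rows.loc[top5_rows["deals_opened"].idxmax(), "country"]
print(most_deals)
# United States
```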
Clarification questions
An agent will also be able to ask clarification questions if it does not have enough information to answer the query. For example:
agent.clarification_questions('What is the GDP of the United States?')
This will return up to 3 clarification questions that the agent can ask the user to get more information to answer the query.
Explanation
An agent will also be able to explain the answer given to the user. For example:
response = agent.chat('What is the GDP of the United States?')
explanation = agent.explain()
print("The answer is", response)
print("The explanation is", explanation)
Rephrase Question
An agent can also rephrase a question to get a more accurate and comprehensive response from the model. For example:
rephrased_query = agent.rephrase_query('What is the GDP of the United States?')
print("The rephrased query is", rephrased_query)
Using the Agent in a Sandbox Environment
To enhance security and protect against malicious code introduced through prompt injection, PandasAI provides a sandbox environment for code execution. The sandbox runs the code in an isolated Docker container, so potentially harmful operations are contained.
The sandbox works offline and is particularly useful when working with untrusted data or when code execution must be isolated from your main system.
Installation
Before using the sandbox, install Docker on your machine and make sure it is running.
Then install the sandbox package:
pip install pandasai-docker
Basic Usage
Here’s how to use the sandbox with your PandasAI agent:
import pandasai as pai
from pandasai import Agent
from pandasai_docker import DockerSandbox
# Initialize the sandbox
sandbox = DockerSandbox()
sandbox.start()
# Create an agent with the sandbox
df = pai.read_csv("data.csv")
agent = Agent([df], sandbox=sandbox)
# Chat with the agent - code will run in the sandbox
response = agent.chat("Calculate the average sales")
# Don't forget to stop the sandbox when done
sandbox.stop()
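If agent.chat() raises an exception, the sandbox.stop() call above is never reached. One way to guarantee cleanup is a small context manager. The following is a sketch, not part of PandasAI: the running() helper and the FakeSandbox stand-in are hypothetical names, and the helper relies only on the start() and stop() methods shown above.

```python
from contextlib import contextmanager


@contextmanager
def running(sandbox):
    """Start the sandbox, hand it to the caller, and always stop it afterwards."""
    sandbox.start()
    try:
        yield sandbox
    finally:
        # Runs even if the body of the `with` block raises
        sandbox.stop()


class FakeSandbox:
    """Hypothetical stand-in so the helper can be demonstrated without Docker."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")


fake = FakeSandbox()
try:
    with running(fake):
        raise RuntimeError("simulated failure while chatting")
except RuntimeError:
    pass

print(fake.events)  # ['start', 'stop']
```

With the real sandbox, the same helper would wrap DockerSandbox() in place of FakeSandbox(), with the Agent created and queried inside the with block.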
Customizing the Sandbox
You can customize the sandbox environment by specifying a custom name and Dockerfile:
sandbox = DockerSandbox(
"custom-sandbox-name",
"/path/to/custom/Dockerfile"
)
Training the Agent with Local Vector Stores
Training agents with local vector stores requires a PandasAI Enterprise license. See Enterprise Features for more details or contact us for production use.
PandasAI can also be used as a few-shot learning agent through the “train with local vector store” enterprise feature.
To train the agent with a local vector store, you can use the ChromaDB, Qdrant, LanceDB or Pinecone vector stores. Here’s how to do it:
from pandasai import Agent
from pandasai.ee.vectorstores import ChromaDB
from pandasai.ee.vectorstores import Qdrant
from pandasai.ee.vectorstores import Pinecone
from pandasai.ee.vectorstores import LanceDB
# Instantiate the vector store
vector_store = ChromaDB()
# or with Qdrant
# vector_store = Qdrant()
# or with LanceDB
# vector_store = LanceDB()
# or with Pinecone
# vector_store = Pinecone(
# api_key="*****",
# embedding_function=embedding_function,
# dimensions=384, # dimension of your embedding model
# )
# Instantiate the agent with the custom vector store
agent = Agent("data.csv", vectorstore=vector_store)
# Train the model
query = "What is the total sales for the current fiscal year?"
# The following code is passed as a string to the response variable
response = '\n'.join([
'import pandas as pd',
'',
'df = dfs[0]',
'',
'# Calculate the total sales for the current fiscal year',
'total_sales = df[df[\'date\'] >= pd.to_datetime(\'today\').replace(month=4, day=1)][\'sales\'].sum()',
'result = { "type": "number", "value": total_sales }'
])
agent.train(queries=[query], codes=[response])
response = agent.chat("What is the total sales for the last fiscal year?")
print(response)
# The model will use the information provided in the training to generate a response
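The training code can also be written as a single triple-quoted string, which some may find easier to read than joining a list of lines. This is a minimal sketch using only the standard library's textwrap.dedent; the variable name training_code is illustrative, and the compile() call only checks that the string is valid Python syntax without executing it.

```python
from textwrap import dedent

# Same training code as above, written as one indented block.
# dedent() removes the common leading indentation; strip() drops the final newline.
training_code = dedent("""\
    import pandas as pd

    df = dfs[0]

    # Calculate the total sales for the current fiscal year
    total_sales = df[df['date'] >= pd.to_datetime('today').replace(month=4, day=1)]['sales'].sum()
    result = { "type": "number", "value": total_sales }
""").strip()

# Syntax check only: nothing is imported or run here
compile(training_code, "<training_code>", "exec")

print(training_code)
```

The resulting string can be passed to agent.train(queries=[query], codes=[training_code]) in the same way as the response variable above.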