PandasAI 3.0 is currently in beta. This documentation reflects the latest features and functionality, which may evolve before the final release.

Installation

PandasAI requires Python 3.8+ <3.12. We recommend using Poetry for dependency management:

# Using poetry (recommended)
poetry add "pandasai>=3.0.0b2"

# Alternative: using pip
pip install "pandasai>=3.0.0b2"

Quick setup

In order to use PandasAI, you need a large language model (LLM). You can use any LLM, but for this guide we’ll use OpenAI through the LiteLLM extension.

First, install the required extension:

pip install pandasai-litellm

Then, import PandasAI and configure the LLM:

import pandasai as pai
from pandasai_litellm.litellm import LiteLLM

# Initialize LiteLLM with your OpenAI model
llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")

# Configure PandasAI to use this LLM
pai.config.set({
    "llm": llm
})

Chat with your data

import pandasai as pai
from pandasai_litellm.litellm import LiteLLM

# Initialize LiteLLM with your OpenAI model
llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")

# Configure PandasAI to use this LLM
pai.config.set({
    "llm": llm
})

# Load your data
df = pai.read_csv("data/companies.csv")

response = df.chat("What is the average revenue by region?")
print(response)

When you ask a question, PandasAI will use the LLM to generate the answer and output a response. Depending on your question, it can return different kind of responses:

  • string
  • dataframe
  • chart
  • number

Find it more about output data formats here.

Creating your first data layer

1. Define a data source

Start by creating a data schema that describes your dataset:

import pandasai as pai

# Load your data
df = pai.read_csv("data/companies.csv")

# Create the data layer
companies = pai.create(
  path="my-org/companies",
  df=df,
  description="Customer companies dataset"
)

This dataset will be saved in the datasets/my-org/companies folder of your project.

2. Define the structure of your dataset

By default, the column will be inferred from the data. For more control, though, you can define explicit column schemas:

# Define a companies dataset with explicit schema
companies = pai.create(
  path="my-org/companies",
  df=df,
  description="Customer companies dataset",
  columns=[
    {
      "name": "company_name",
      "type": "string",
      "description": "The name of the company"
    },
    {
      "name": "revenue",
      "type": "float",
      "description": "The revenue of the company"
    },
    {
      "name": "region",
      "type": "string",
      "description": "The region of the company"
    }
  ]
)

3. Load and query data

Once defined, you can easily load and query your datasets:

# Load existing datasets
stocks = pai.load("organization/coca_cola_stock")
companies = pai.load("organization/companies")

# Query using natural language
response = stocks.chat("What is the volatility of the Coca Cola stock?")
response = companies.chat("What is the average revenue by region?")

# Query using multiple datasets
result = pai.chat("Compare the revenue between Coca Cola and Apple", stocks, companies)

Next Steps