Quickstart

PandasAI 3.0 is currently in beta. This documentation reflects the latest features and functionality, which may evolve before the final release.

Installation

PandasAI requires Python 3.8 or later, but earlier than 3.12 (>=3.8, <3.12). We recommend using Poetry for dependency management:

# Using poetry (recommended)
poetry add "pandasai>=3.0.0b2"

# Alternative: using pip
pip install "pandasai>=3.0.0b2"
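Because the supported range excludes Python 3.12 and later, it can help to check your interpreter before installing. The following is a minimal sketch using only the standard library; the helper name `is_supported` is hypothetical and not part of PandasAI.

```python
import sys

# Supported range as stated above: 3.8 <= version < 3.12.
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 12)


def is_supported(version=None):
    """Return True if the (major, minor) version falls in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "supported" if is_supported() else "outside the supported range"
    print(f"Python {current} is {status}")
```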

Quick setup

To use PandasAI, you need a large language model (LLM). You can use any LLM, but this guide uses OpenAI through the LiteLLM extension. First, install the required extension:

pip install pandasai-litellm

Then, import PandasAI and configure the LLM:

import pandasai as pai
from pandasai_litellm.litellm import LiteLLM

# Initialize LiteLLM with your OpenAI model
llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")

# Configure PandasAI to use this LLM
pai.config.set({
    "llm": llm
})
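Hardcoding the API key in source files is easy to leak. One common alternative is to read it from an environment variable. The sketch below assumes the key is stored in `OPENAI_API_KEY`; the helper `read_api_key` is hypothetical and not part of PandasAI or LiteLLM.

```python
import os


def read_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in env_var, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Intended use with the setup above (not executed here):
#   llm = LiteLLM(model="gpt-4.1-mini", api_key=read_api_key())
```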

Chat with your data

import pandasai as pai
from pandasai_litellm.litellm import LiteLLM

# Initialize LiteLLM with your OpenAI model
llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")

# Configure PandasAI to use this LLM
pai.config.set({
    "llm": llm
})

# Load your data
df = pai.read_csv("data/companies.csv")

response = df.chat("What is the average revenue by region?")
print(response)

When you ask a question, PandasAI uses the LLM to generate an answer and returns a response. Depending on your question, the response can be one of several kinds:

string
dataframe
chart
number

Find out more in the output data formats documentation.
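Since a response can be one of several kinds, calling code often branches on the kind before using the value. The sketch below assumes, hypothetically, that a response object exposes `type` and `value` attributes; check the output data formats documentation for the actual interface. A stand-in object is used here so the example runs without an LLM.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Return a one-line summary for a response-like object.

    Assumes (hypothetically) that the object has `type` and `value`
    attributes; anything else is treated as a plain value.
    """
    kind = getattr(response, "type", type(response).__name__)
    value = getattr(response, "value", response)
    if kind == "dataframe":
        # Report the row count instead of printing the whole table.
        return f"dataframe with {len(value)} rows"
    # string, number, chart, and unknown kinds are shown as-is.
    return f"{kind}: {value}"


# Stand-in object for illustration; a real call would pass the result of df.chat(...).
fake = SimpleNamespace(type="number", value=42.5)
print(summarize_response(fake))
```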

Creating your first data layer

1. Define a data source

Start by creating a data schema that describes your dataset:

import pandasai as pai

# Load your data
df = pai.read_csv("data/companies.csv")

# Create the data layer
companies = pai.create(
  path="my-org/companies",
  df=df,
  description="Customer companies dataset"
)

This dataset will be saved in the datasets/my-org/companies folder of your project.
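To make the mapping from dataset path to folder concrete, here is a small sketch using only the standard library. It assumes paths always have exactly two segments, as in the examples in this guide; the helper `dataset_dir` is hypothetical and not part of PandasAI.

```python
from pathlib import Path


def dataset_dir(path, project_root="."):
    """Map an "organization/dataset" path to its folder under datasets/."""
    org, sep, name = path.partition("/")
    if not sep or not org or not name or "/" in name:
        raise ValueError(f"expected 'organization/dataset', got {path!r}")
    return Path(project_root) / "datasets" / org / name


print(dataset_dir("my-org/companies").as_posix())
```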

2. Define the structure of your dataset

By default, the columns are inferred from the data. For more control, you can define explicit column schemas:

# Define a companies dataset with explicit schema
companies = pai.create(
  path="my-org/companies",
  df=df,
  description="Customer companies dataset",
  columns=[
    {
      "name": "company_name",
      "type": "string",
      "description": "The name of the company"
    },
    {
      "name": "revenue",
      "type": "float",
      "description": "The revenue of the company"
    },
    {
      "name": "region",
      "type": "string",
      "description": "The region of the company"
    }
  ]
)
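An explicit schema is only useful if its column names match the data. The sketch below shows one way to check this before creating the dataset. The helper `missing_columns` is hypothetical, and the list of available columns stands in for something like `list(df.columns)` on a pandas DataFrame.

```python
def missing_columns(columns, available):
    """Return schema column names that do not appear in `available`."""
    known = set(available)
    return [col["name"] for col in columns if col["name"] not in known]


schema = [
    {"name": "company_name", "type": "string"},
    {"name": "revenue", "type": "float"},
    {"name": "region", "type": "string"},
]

# Illustrative column names only; "region" is deliberately absent.
print(missing_columns(schema, ["company_name", "revenue", "country"]))
```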

3. Load and query data

Once your datasets are defined, you can load and query them:

# Load existing datasets (use the same path you passed to pai.create)
stocks = pai.load("organization/coca_cola_stock")
companies = pai.load("my-org/companies")

# Query using natural language
response = stocks.chat("What is the volatility of the Coca Cola stock?")
response = companies.chat("What is the average revenue by region?")

# Query using multiple datasets
result = pai.chat("Compare the revenue between Coca Cola and Apple", stocks, companies)

Next Steps

Learn more about Semantic Layer
Join our Discord Community for support
