To use PandasAI, you need a large language model (LLM). You can use any LLM, but this guide uses OpenAI through the LiteLLM extension.
First, install the required extension:
pip install pandasai-litellm
Then, import PandasAI and configure the LLM:
import pandasai as pai
from pandasai_litellm.litellm import LiteLLM

# Initialize LiteLLM with your OpenAI model
llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")

# Configure PandasAI to use this LLM
pai.config.set({
    "llm": llm
})
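Hardcoding the key as in the snippet above is fine for a quick test, but it is easy to commit by accident. A minimal sketch of reading it from the environment instead, assuming the key is stored in a variable named `OPENAI_API_KEY` (the variable name and the `get_api_key` helper are illustrative choices, not part of PandasAI or LiteLLM):

```python
import os


def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Both this helper and the default variable name are assumptions made
    for this sketch; they are not part of PandasAI or LiteLLM.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Usage with the configuration shown above (requires pandasai-litellm):
# llm = LiteLLM(model="gpt-4.1-mini", api_key=get_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later, on the first question.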
With the LLM configured, load your data and ask a question in natural language:
import pandasai as pai
from pandasai_litellm.litellm import LiteLLM

# Initialize LiteLLM with your OpenAI model
llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")

# Configure PandasAI to use this LLM
pai.config.set({
    "llm": llm
})

# Load your data
df = pai.read_csv("data/companies.csv")

response = df.chat("What is the average revenue by region?")
print(response)
When you ask a question, PandasAI uses the LLM to generate the answer and returns a response.
Depending on your question, it can return different kinds of responses.
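Because the answer is produced with the help of an LLM, it can be worth checking a result against a direct computation now and then. The sketch below computes the average revenue by region in plain Python. The sample rows and the `average_by` helper are made up for illustration, and the `region` and `revenue` column names are assumed to match your CSV:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for data/companies.csv
SAMPLE_CSV = """company_name,revenue,region
Acme,100.0,EMEA
Globex,300.0,EMEA
Initech,250.0,APAC
"""


def average_by(rows, key, value):
    """Group rows by `key` and return the mean of `value` per group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += float(row[value])
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
averages = average_by(rows, key="region", value="revenue")
print(averages)  # {'EMEA': 200.0, 'APAC': 250.0}
```

If the numbers returned by a chat query differ from a check like this, rephrasing the question with explicit column names is a reasonable first step.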
Start by creating a data schema that describes your dataset:
import pandasai as pai

# Load your data
df = pai.read_csv("data/companies.csv")

# Create the data layer
companies = pai.create(
    path="my-org/companies",
    df=df,
    description="Customer companies dataset"
)
This dataset will be saved in the datasets/my-org/companies folder of your project.
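The mapping from the `path` argument to a folder on disk can be sketched as follows. This is an illustration only: the `dataset_dir` helper is hypothetical, the two-part `organization/dataset` form is assumed from the examples in this guide, and the files PandasAI writes inside the folder are not covered here.

```python
from pathlib import Path


def dataset_dir(dataset_path: str, project_root: str = ".") -> Path:
    """Map an "organization/dataset" path to its folder under datasets/.

    Hypothetical helper for illustration; not a PandasAI function.
    """
    parts = dataset_path.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"Expected 'organization/dataset', got {dataset_path!r}")
    organization, dataset = parts
    return Path(project_root) / "datasets" / organization / dataset


print(dataset_dir("my-org/companies").as_posix())  # datasets/my-org/companies
```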
By default, the columns are inferred from the data. For more control, you can define explicit column schemas:
# Define a companies dataset with explicit schema
companies = pai.create(
    path="my-org/companies",
    df=df,
    description="Customer companies dataset",
    columns=[
        {
            "name": "company_name",
            "type": "string",
            "description": "The name of the company"
        },
        {
            "name": "revenue",
            "type": "float",
            "description": "The revenue of the company"
        },
        {
            "name": "region",
            "type": "string",
            "description": "The region of the company"
        }
    ]
)
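A typo in a column name is easy to miss when writing a schema by hand. As a rough sketch, the column list can be checked against the CSV header before it is passed along. The `check_columns` helper below is hypothetical and not a PandasAI function, and the header row is a stand-in for the real file; whether PandasAI performs similar validation itself is not covered here.

```python
import csv
import io


def check_columns(columns, header):
    """Return a list of problems found in a column schema.

    `columns` is a list of dicts shaped like the `columns=` argument above;
    `header` is the list of column names actually present in the data.
    """
    problems = []
    seen = set()
    for spec in columns:
        name = spec.get("name")
        if not name or not spec.get("type"):
            problems.append(f"missing name or type: {spec}")
            continue
        if name in seen:
            problems.append(f"duplicate column: {name}")
        seen.add(name)
        if name not in header:
            problems.append(f"not in data: {name}")
    return problems


# Hypothetical header row standing in for data/companies.csv
header = next(csv.reader(io.StringIO("company_name,revenue,region\n")))
columns = [
    {"name": "company_name", "type": "string"},
    {"name": "revenue", "type": "float"},
    {"name": "regoin", "type": "string"},  # deliberate typo
]
print(check_columns(columns, header))  # ['not in data: regoin']
```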
Once defined, you can easily load and query your datasets:
# Load existing datasets (each path must match the one used in pai.create)
stocks = pai.load("organization/coca_cola_stock")
companies = pai.load("my-org/companies")

# Query using natural language
response = stocks.chat("What is the volatility of the Coca Cola stock?")
response = companies.chat("What is the average revenue by region?")

# Query using multiple datasets
result = pai.chat("Compare the revenue between Coca Cola and Apple", stocks, companies)