Here are some examples of how to use PandasAI. More examples, along with sample data, are included in the repository.

Working with pandas dataframes

Using PandasAI with a Pandas DataFrame

import os

from pandasai import SmartDataframe

import pandas as pd



# pandas dataframe

sales_by_country = pd.DataFrame({

    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],

    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]

})





# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



# convert to SmartDataframe

sdf = SmartDataframe(sales_by_country)



response = sdf.chat('Which are the top 5 countries by sales?')

print(response)

# Output: China, United States, Japan, Germany, United Kingdom
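
Because the answer is produced by an LLM, it can be worth cross-checking it against a direct computation. The following is a minimal sketch with plain pandas, assuming the same sales_by_country data as above; the variable name top_5 is only illustrative and is not part of PandasAI.

```python
import pandas as pd

sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
})

# Take the five rows with the largest "sales" values, highest first
top_5 = sales_by_country.nlargest(5, "sales")["country"].tolist()
print(top_5)
# ['China', 'United States', 'Japan', 'Germany', 'United Kingdom']
```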

Working with CSVs

Example of using PandasAI with a CSV file

import os

from pandasai import SmartDataframe



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



# You can instantiate a SmartDataframe with a path to a CSV file

sdf = SmartDataframe("data/Loan payments data.csv")



response = sdf.chat("How many loans are from men and have been paid off?")

print(response)

# Output: 247 loans have been paid off by men.
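
The comments in these examples mention that the API key can also be configured in a .env file. In practice a package such as python-dotenv is the usual choice; the sketch below uses only the standard library to show the idea. The helper name load_env_file and the simple KEY=VALUE format it parses are assumptions for illustration, not part of PandasAI.

```python
import os
import tempfile


def load_env_file(path):
    """Read KEY=VALUE lines from a file into os.environ (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and anything without an "="
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Values already set in the real environment take precedence
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


# Demo with a throwaway file; in a project this would typically be "./.env"
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as fh:
        fh.write('# PandasAI settings\nPANDASAI_API_KEY="YOUR_API_KEY"\n')
    load_env_file(env_path)

print("PANDASAI_API_KEY" in os.environ)
```

Calling the loader before creating a SmartDataframe would replace the explicit os.environ assignment used in the examples, and keeps the key out of source control.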

Working with Excel files

Example of using PandasAI with an Excel file. In order to use Excel files as a data source, you need to install the pandasai[excel] extra dependency.

pip install pandasai[excel]

Then, you can use PandasAI with an Excel file as follows:

import os

from pandasai import SmartDataframe



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



# You can instantiate a SmartDataframe with a path to an Excel file

sdf = SmartDataframe("data/Loan payments data.xlsx")



response = sdf.chat("How many loans are from men and have been paid off?")

print(response)

# Output: 247 loans have been paid off by men.

Working with Parquet files

Example of using PandasAI with a Parquet file

import os

from pandasai import SmartDataframe



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



# You can instantiate a SmartDataframe with a path to a Parquet file

sdf = SmartDataframe("data/Loan payments data.parquet")



response = sdf.chat("How many loans are from men and have been paid off?")

print(response)

# Output: 247 loans have been paid off by men.

Working with Google Sheets

Example of using PandasAI with a Google Sheet. In order to use Google Sheets as a data source, you need to install the pandasai[google-sheet] extra dependency.

pip install pandasai[google-sheet]

Then, you can use PandasAI with a Google Sheet as follows:

import os

from pandasai import SmartDataframe



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



# You can instantiate a SmartDataframe with the URL of a Google Sheet

sdf = SmartDataframe("https://docs.google.com/spreadsheets/d/fake/edit#gid=0")

response = sdf.chat("How many loans are from men and have been paid off?")

print(response)

# Output: 247 loans have been paid off by men.

Note that, at the moment, the Google Sheet must be public.
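
One way to check that a sheet is publicly readable is to request its CSV export without credentials. The sketch below only builds that URL. It assumes the commonly used /export?format=csv endpoint of Google Sheets, and csv_export_url is a hypothetical helper, not a PandasAI function.

```python
import re


def csv_export_url(sheet_url):
    """Build the CSV export URL for a Google Sheets link (hypothetical helper)."""
    match = re.search(r"/spreadsheets/d/([^/]+)", sheet_url)
    if match is None:
        raise ValueError(f"not a Google Sheets URL: {sheet_url}")
    sheet_id = match.group(1)
    # The tab id, if present, follows "gid=" in the query string or fragment
    gid_match = re.search(r"gid=(\d+)", sheet_url)
    gid = gid_match.group(1) if gid_match else "0"
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={gid}"


url = csv_export_url("https://docs.google.com/spreadsheets/d/fake/edit#gid=0")
print(url)
# https://docs.google.com/spreadsheets/d/fake/export?format=csv&gid=0
```

Opening the resulting URL in a private browser window should download CSV text for a public sheet; a login page or an HTTP error suggests the sheet is not shared publicly.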

Working with Modin dataframes

Example of using PandasAI with a Modin DataFrame. In order to use Modin dataframes as a data source, you need to install the pandasai[modin] extra dependency.

pip install pandasai[modin]

Then, you can use PandasAI with a Modin DataFrame as follows:

import os

import pandasai

from pandasai import SmartDataframe

import modin.pandas as pd



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



sales_by_country = pd.DataFrame({

    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],

    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]

})



pandasai.set_pd_engine("modin")

sdf = SmartDataframe(sales_by_country)

response = sdf.chat('Which are the top 5 countries by sales?')

print(response)

# Output: China, United States, Japan, Germany, United Kingdom



# you can switch back to pandas using

# pandasai.set_pd_engine("pandas")

Working with Polars dataframes

Example of using PandasAI with a Polars DataFrame (still in beta). In order to use Polars dataframes as a data source, you need to install the pandasai[polars] extra dependency.

pip install pandasai[polars]

Then, you can use PandasAI with a Polars DataFrame as follows:

import os

from pandasai import SmartDataframe

import polars as pl



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



# You can instantiate a SmartDataframe with a Polars DataFrame

sales_by_country = pl.DataFrame({

    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],

    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]

})



sdf = SmartDataframe(sales_by_country)

response = sdf.chat("Which are the top 5 countries by sales?")

print(response)

# Output: China, United States, Japan, Germany, United Kingdom

Plotting

Example of using PandasAI to plot a chart from a Pandas DataFrame

import os

from pandasai import SmartDataframe



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



sdf = SmartDataframe("data/Countries.csv")

response = sdf.chat(

    "Plot the histogram of countries showing for each the GDP, using different colors for each bar",

)

print(response)

# Output: check out assets/histogram-chart.png

Saving Plots with User Defined Path

You can pass a custom path where the charts will be saved. The path must be a valid absolute path. The example below saves charts to a user-defined location.

import os

from pandasai import SmartDataframe



user_defined_path = os.getcwd()



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



sdf = SmartDataframe("data/Countries.csv", config={

    "save_charts": True,

    "save_charts_path": user_defined_path,

})

response = sdf.chat(

    "Plot the histogram of countries showing for each the GDP,"

    " using different colors for each bar",

)

print(response)

# Output: check out $pwd/exports/charts/{hashid}/chart.png
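
The text above asks for a global path, which the sketch below reads as an absolute path (an assumption). A relative location can be resolved to an absolute one before it is passed as save_charts_path. The directory name my_charts is only an example, and creating the directory up front is a precaution rather than something the source states is required.

```python
from pathlib import Path

# Resolve a relative directory name against the current working directory
charts_dir = Path("my_charts").resolve()

# Create the directory (and any missing parents) if it does not exist yet
charts_dir.mkdir(parents=True, exist_ok=True)

# Plain string form, suitable for use as the "save_charts_path" config value
user_defined_path = str(charts_dir)
print(user_defined_path)
```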

Working with multiple dataframes (using the SmartDatalake)

Example of using PandasAI with multiple dataframes. In order to use multiple dataframes as a data source, you need to use a SmartDatalake instead of a SmartDataframe. You can instantiate a SmartDatalake as follows:

import os

from pandasai import SmartDatalake

import pandas as pd



employees_data = {

    'EmployeeID': [1, 2, 3, 4, 5],

    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],

    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']

}



salaries_data = {

    'EmployeeID': [1, 2, 3, 4, 5],

    'Salary': [5000, 6000, 4500, 7000, 5500]

}



employees_df = pd.DataFrame(employees_data)

salaries_df = pd.DataFrame(salaries_data)



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



lake = SmartDatalake([employees_df, salaries_df])

response = lake.chat("Who gets paid the most?")

print(response)

# Output: Olivia gets paid the most.
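
To answer a question like this, the two dataframes have to be combined on their shared key. As a cross-check of the LLM's answer, the same result can be computed with plain pandas. This is a sketch of one equivalent computation, not necessarily the code PandasAI generates; merged_df and top_earner are illustrative names.

```python
import pandas as pd

employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})

salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join the two tables on their shared key, then look up the highest salary
merged_df = employees_df.merge(salaries_df, on="EmployeeID")
top_earner = merged_df.loc[merged_df["Salary"].idxmax(), "Name"]
print(top_earner)  # Olivia
```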

Working with Agent

With the chat agent, you can engage in dynamic conversations where the agent retains context throughout the discussion. This enables you to have more interactive and meaningful exchanges.

Key Features

  • Context Retention: The agent remembers the conversation history, allowing for seamless, context-aware interactions.

  • Clarification Questions: You can use the clarification_questions method to request clarification on any aspect of the conversation. This helps ensure you fully understand the information provided.

  • Explanation: The explain method is available to obtain detailed explanations of how the agent arrived at a particular solution or response. It offers transparency and insights into the agent’s decision-making process.

Feel free to initiate conversations, seek clarifications, and explore explanations to enhance your interactions with the chat agent!

import os

import pandas as pd

from pandasai import Agent



employees_data = {

    "EmployeeID": [1, 2, 3, 4, 5],

    "Name": ["John", "Emma", "Liam", "Olivia", "William"],

    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],

}



salaries_data = {

    "EmployeeID": [1, 2, 3, 4, 5],

    "Salary": [5000, 6000, 4500, 7000, 5500],

}



employees_df = pd.DataFrame(employees_data)

salaries_df = pd.DataFrame(salaries_data)





# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



agent = Agent([employees_df, salaries_df], memory_size=10)



query = "Who gets paid the most?"



# Chat with the agent

response = agent.chat(query)

print(response)



# Get Clarification Questions

questions = agent.clarification_questions(query)



for question in questions:

    print(question)



# Explain how the chat response is generated

response = agent.explain()

print(response)

Description for an Agent

When you instantiate an agent, you can provide a description of the agent. This description is used to describe the agent in the chat and to give the LLM more context about how to respond to queries.

Some examples of descriptions can be:

  • You are a data analysis agent. Your main goal is to help non-technical users to analyze data
  • Act as a data analyst. Every time I ask you a question, you should provide the code to visualize the answer using plotly

import os

from pandasai import Agent



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



agent = Agent(

    "data.csv",

    description="You are a data analysis agent. Your main goal is to help non-technical users to analyze data",

)

Add Skills to the Agent

You can add custom functions for the agent to use, allowing it to expand its capabilities. These custom functions can be seamlessly integrated with the agent’s skills, enabling a wide range of user-defined operations.

import os

import pandas as pd

from pandasai import Agent

from pandasai.skills import skill





employees_data = {

    "EmployeeID": [1, 2, 3, 4, 5],

    "Name": ["John", "Emma", "Liam", "Olivia", "William"],

    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],

}



salaries_data = {

    "EmployeeID": [1, 2, 3, 4, 5],

    "Salary": [5000, 6000, 4500, 7000, 5500],

}



employees_df = pd.DataFrame(employees_data)

salaries_df = pd.DataFrame(salaries_data)





@skill

def plot_salaries(merged_df: pd.DataFrame):

    """

    Plots a bar chart with names on the x-axis and salaries on the y-axis using matplotlib and saves it to temp_chart.png

    """

    import matplotlib.pyplot as plt



    plt.bar(merged_df["Name"], merged_df["Salary"])

    plt.xlabel("Employee Name")

    plt.ylabel("Salary")

    plt.title("Employee Salaries")

    plt.xticks(rotation=45)

    plt.savefig("temp_chart.png")

    plt.close()



# By default, unless you choose a different LLM, it will use BambooLLM.

# You can get your free API key signing up at https://pandabi.ai (you can also configure it in your .env file)

os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"



agent = Agent([employees_df, salaries_df], memory_size=10)

agent.add_skills(plot_salaries)



# Chat with the agent

response = agent.chat("Plot the employee salaries against names")

print(response)