Enhance the PandasAI library with the Semantic Agent for more accurate and interpretable results.
The SemanticAgent (currently in beta) extends the capabilities of the PandasAI library by adding a semantic layer to its results. Unlike the standard Agent, the SemanticAgent first generates a JSON query, which is then used to produce Python or SQL code. This intermediate step is designed to make the outputs more accurate and easier to interpret.
Note: Usage of the Semantic Agent in production is subject to a license. For more details, refer to the license documentation. If you plan to use it in production, contact us.
Creating an instance of the SemanticAgent is similar to creating an instance of an Agent.
The Semantic Agent operates in two main steps:
The first step is schema generation, which structures the data into a schema that the Semantic Agent can use to generate JSON queries. By default, this schema is automatically created, but you can also provide a custom schema if necessary.
By default, the SemanticAgent considers all dataframes passed to it and generates an appropriate schema.
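How PandasAI builds its default schema is not described here, so the heuristic below is not its algorithm. As a rough mental model only, a minimal sketch might treat numeric columns as candidate measures and every other column as a dimension. The function name infer_schema, the list-of-dicts stand-in for a dataframe, and the schema keys (name, table, measures, dimensions, joins, type, sql) are all hypothetical.

```python
from numbers import Number


def infer_schema(table_name, rows):
    """Hypothetical heuristic: numeric columns -> measures, others -> dimensions.

    `rows` is a list of dicts (one per record), standing in for a dataframe.
    """
    columns = list(rows[0].keys()) if rows else []
    measures, dimensions = [], []
    for column in columns:
        values = [row[column] for row in rows if row.get(column) is not None]
        # bool is a subclass of int, so exclude it explicitly.
        is_numeric = bool(values) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            measures.append({"name": f"total_{column}", "type": "sum", "sql": column})
        else:
            dimensions.append({"name": column, "type": "string", "sql": column})
    return {"name": table_name, "table": table_name,
            "measures": measures, "dimensions": dimensions, "joins": []}


sales = [
    {"product": "A", "region": "North", "revenue": 100.0, "quantity": 2},
    {"product": "B", "region": "South", "revenue": 250.0, "quantity": 5},
]
schema = infer_schema("Sales", sales)
print([m["name"] for m in schema["measures"]])    # ['total_revenue', 'total_quantity']
print([d["name"] for d in schema["dimensions"]])  # ['product', 'region']
```

A heuristic like this cannot tell an ID column from a quantity, which is one reason to supply a custom schema when the generated one does not fit.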
To provide a custom schema, pass it through the schema parameter when instantiating the SemanticAgent.
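The exact format the schema parameter expects should be checked against the PandasAI reference. As an assumption, the sketch below writes a two-table schema as a list of Python dicts whose keys mirror the components described in this section (measures, dimensions, joins), then runs two hypothetical sanity checks: listing the fully qualified "Table.member" names a JSON query could reference, and flagging joins that point at tables missing from the schema.

```python
# Hypothetical custom schema. The keys (name, table, measures, dimensions,
# joins, type, sql, join_type) mirror the components described in this
# section; they are assumptions, not a confirmed PandasAI format.
custom_schema = [
    {
        "name": "Sales",
        "table": "Sales",
        "measures": [
            {"name": "total_revenue", "type": "sum", "sql": "revenue"},
            {"name": "order_count", "type": "count", "sql": "id"},
        ],
        "dimensions": [
            {"name": "product", "type": "string", "sql": "product"},
            {"name": "order_date", "type": "time", "sql": "order_date"},
        ],
        "joins": [
            {"name": "Products", "join_type": "left",
             "sql": "Sales.product = Products.name"},
        ],
    },
    {
        "name": "Products",
        "table": "Products",
        "measures": [],
        "dimensions": [
            {"name": "name", "type": "string", "sql": "name"},
            {"name": "category", "type": "string", "sql": "category"},
        ],
        "joins": [],
    },
]


def qualified_members(schema):
    """List the 'Table.member' names that a JSON query could reference."""
    return [
        f"{table['name']}.{item['name']}"
        for table in schema
        for item in table["measures"] + table["dimensions"]
    ]


def unknown_join_targets(schema):
    """Return join targets that do not name a table in the schema."""
    tables = {table["name"] for table in schema}
    return [
        join["name"]
        for table in schema
        for join in table["joins"]
        if join["name"] not in tables
    ]


print(qualified_members(custom_schema))
print(unknown_join_targets(custom_schema))  # [] when every join target exists
```

Checks like these are cheap to run on a hand-written schema before it is handed to the agent.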
The second step is JSON query generation: based on the schema, the SemanticAgent generates a JSON query, which it then interprets and converts into the Python or SQL code required for execution.
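To make the idea concrete, the hypothetical query below follows the field layout described later in this section (type, dimensions, measures, timeDimensions, filters, order). The run_query function is a minimal sketch of what converting such a query into Python can amount to, assuming in-memory rows, sum-only measures, and a hand-written member-to-column mapping; it ignores filters and time dimensions and is not PandasAI's code generator.

```python
import json
from collections import defaultdict

# Hypothetical JSON query; the field names follow the structure described in
# this section, but the SemanticAgent's actual output may differ.
QUERY_JSON = """
{
  "type": "table",
  "dimensions": ["Sales.product"],
  "measures": ["Sales.total_revenue"],
  "timeDimensions": [],
  "filters": [],
  "order": [{"id": "Sales.total_revenue", "direction": "desc"}]
}
"""

# Assumed mapping from schema members to source columns (sum-only for brevity).
DIMENSION_COLUMNS = {"Sales.product": "product"}
MEASURE_COLUMNS = {"Sales.total_revenue": "revenue"}


def run_query(query, rows):
    """Group by dimensions, sum measures, then sort; filters are not handled."""
    groups = defaultdict(lambda: defaultdict(float))
    for row in rows:
        key = tuple(row[DIMENSION_COLUMNS[d]] for d in query["dimensions"])
        for member in query["measures"]:
            groups[key][member] += row[MEASURE_COLUMNS[member]]
    result = [
        {**dict(zip(query["dimensions"], key)), **totals}
        for key, totals in groups.items()
    ]
    # Stable sorts, least significant key first, give a multi-key ordering.
    for entry in reversed(query["order"]):
        result.sort(key=lambda r, k=entry["id"]: r[k],
                    reverse=entry["direction"] == "desc")
    return result


sales = [
    {"product": "A", "revenue": 100.0},
    {"product": "B", "revenue": 250.0},
    {"product": "A", "revenue": 50.0},
]
print(run_query(json.loads(QUERY_JSON), sales))
```

Keeping the query as data rather than code is what makes this step inspectable: the same dict could just as well be rendered as SQL.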
A schema in the SemanticAgent is a comprehensive representation of the data, including tables, columns, measures, dimensions, and relationships between tables. Here’s a breakdown of its components:
Measures are the quantitative metrics used in the analysis, such as sums, averages, and counts. Each measure specifies its aggregation type (count, avg, sum, max, or min).
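The five aggregation types map naturally onto Python built-ins. A minimal sketch, assuming each measure is a dict with hypothetical name, type, and sql (source column) keys:

```python
from statistics import mean

# Aggregation types named in the schema, mapped to Python equivalents.
# Note: len counts every value, whereas SQL COUNT(column) skips NULLs.
AGGREGATIONS = {
    "count": len,
    "avg": mean,
    "sum": sum,
    "max": max,
    "min": min,
}

# Hypothetical measure definitions; the keys are assumptions.
measures = [
    {"name": "order_count", "type": "count", "sql": "revenue"},
    {"name": "total_revenue", "type": "sum", "sql": "revenue"},
    {"name": "average_revenue", "type": "avg", "sql": "revenue"},
    {"name": "largest_order", "type": "max", "sql": "revenue"},
]


def compute_measure(measure, rows):
    """Apply the measure's aggregation to its source column."""
    values = [row[measure["sql"]] for row in rows]
    return AGGREGATIONS[measure["type"]](values)


rows = [{"revenue": 100.0}, {"revenue": 250.0}, {"revenue": 50.0}]
results = {m["name"]: compute_measure(m, rows) for m in measures}
print(results)
```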
Dimensions are the categorical variables used to slice and dice the data.
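Slicing by a dimension partitions the rows by that dimension's values; dicing combines several dimensions. A minimal sketch with hypothetical dimension definitions (the name, type, and sql keys are assumptions):

```python
from collections import defaultdict

# Hypothetical dimension definitions; "sql" names the source column (assumed key).
region = {"name": "region", "type": "string", "sql": "region"}
product = {"name": "product", "type": "string", "sql": "product"}


def slice_and_dice(rows, dimensions):
    """Partition rows by the combined values of one or more dimensions."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[dim["sql"]] for dim in dimensions)
        groups[key].append(row)
    return dict(groups)


rows = [
    {"region": "North", "product": "A", "revenue": 100.0},
    {"region": "South", "product": "A", "revenue": 250.0},
    {"region": "North", "product": "B", "revenue": 50.0},
]
# Slice: one dimension.
print({k: len(v) for k, v in slice_and_dice(rows, [region]).items()})
# Dice: two dimensions combined.
print({k: len(v) for k, v in slice_and_dice(rows, [region, product]).items()})
```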
Joins define the relationships between tables, specifying how they should be connected in queries. Each join specifies its join type (left, right, or inner).
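The three join types behave as they do in SQL. A minimal sketch over lists of dicts, using a hypothetical join definition (the keys name, join_type, left_key, and right_key are assumptions). It is a nested-loop join chosen for clarity, not efficiency, and it implements a right join as a left join with the inputs swapped.

```python
# Hypothetical join definition; the keys are assumptions for illustration.
sales_to_products = {"name": "Products", "join_type": "left",
                     "left_key": "product", "right_key": "name"}


def join(left_rows, right_rows, left_key, right_key, join_type="inner"):
    """Nested-loop join of two lists of dicts (chosen for clarity, not speed)."""
    if join_type == "right":
        # A right join is a left join with the inputs swapped.
        return join(right_rows, left_rows, right_key, left_key, "left")
    if join_type not in ("left", "inner"):
        raise ValueError(f"unsupported join type: {join_type}")
    right_columns = {column for row in right_rows for column in row}
    result = []
    for left_row in left_rows:
        matches = [r for r in right_rows if r[right_key] == left_row[left_key]]
        for right_row in matches:
            result.append({**left_row, **right_row})
        if not matches and join_type == "left":
            # Keep the unmatched left row, filling right-side columns with None.
            result.append({**dict.fromkeys(right_columns), **left_row})
    return result


sales = [{"product": "A", "revenue": 100.0}, {"product": "C", "revenue": 30.0}]
products = [{"name": "A", "category": "Tools"}, {"name": "B", "category": "Toys"}]

# Compare all three types; the definition above would select "left".
for join_type in ("inner", "left", "right"):
    print(join_type, join(sales, products, sales_to_products["left_key"],
                          sales_to_products["right_key"], join_type))
```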
The JSON query is a structured representation of the request, specifying what data to retrieve and how to process it. Here’s a detailed look at its fields:
The type field sets the query type, which determines the format of the result, such as a single number, a table, or a chart.
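How each type is rendered is up to the agent, and the full set of supported type values is not listed here. As a hypothetical sketch, assuming the type strings "number", "table", and chart types such as "bar", a dispatcher could shape the same aggregated rows differently:

```python
# Chart type names are assumptions; the supported set is not listed here.
CHART_TYPES = {"bar", "line", "pie"}


def shape_result(query_type, rows, measure, dimension=None):
    """Shape aggregated rows according to a (hypothetical) query type."""
    if query_type == "number":
        # A single number: the query must aggregate down to exactly one row.
        if len(rows) != 1:
            raise ValueError("a 'number' query must return exactly one row")
        return rows[0][measure]
    if query_type == "table":
        return rows
    if query_type in CHART_TYPES:
        # A minimal, plotting-library-agnostic chart description.
        return {
            "chart": query_type,
            "labels": [row[dimension] for row in rows],
            "values": [row[measure] for row in rows],
        }
    raise ValueError(f"unknown query type: {query_type}")


totals = [{"product": "A", "revenue": 150.0}, {"product": "B", "revenue": 250.0}]
print(shape_result("number", [{"revenue": 400.0}], "revenue"))
print(shape_result("bar", totals, "revenue", dimension="product"))
```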
The dimensions field lists the columns used to group the data; in SQL, these are the columns listed in the GROUP BY clause.
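Because query dimensions correspond to GROUP BY columns, translating them is mostly string assembly. A minimal sketch, assuming members are written as "Table.column" and that a dimension's name equals its column name (both assumptions); the helper names are hypothetical.

```python
def quote_identifier(member):
    """Quote each part of a 'Table.column' member as an SQL identifier."""
    return ".".join('"' + part.replace('"', '""') + '"' for part in member.split("."))


def group_by_clause(dimensions):
    """Build the GROUP BY clause for a query's dimensions (empty if none)."""
    if not dimensions:
        return ""  # No dimensions: the measures aggregate over all rows.
    return "GROUP BY " + ", ".join(quote_identifier(d) for d in dimensions)


print(group_by_clause(["Sales.product", "Sales.region"]))
# GROUP BY "Sales"."product", "Sales"."region"
```

The same columns also have to appear in the SELECT list, otherwise the grouped rows would carry no labels.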
The measures field lists the columns used in calculations, typically through aggregate functions such as sum, average, or count.
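Query measures reference schema measures by name, so translating them means looking up each measure's aggregation type and source column. A minimal sketch, assuming a hypothetical lookup table built from the schema:

```python
# Hypothetical schema lookup: "Table.measure" -> (aggregation type, source column).
MEASURES = {
    "Sales.total_revenue": ("sum", "revenue"),
    "Sales.order_count": ("count", "id"),
    "Sales.average_revenue": ("avg", "revenue"),
}

# Schema aggregation types and their SQL functions.
SQL_FUNCTIONS = {"count": "COUNT", "avg": "AVG", "sum": "SUM", "max": "MAX", "min": "MIN"}


def measure_expressions(measures):
    """Translate query measures into SQL aggregate expressions."""
    expressions = []
    for member in measures:
        table, name = member.split(".", 1)
        agg_type, column = MEASURES[member]
        expressions.append(f"{SQL_FUNCTIONS[agg_type]}({table}.{column}) AS {name}")
    return expressions


print(measure_expressions(["Sales.total_revenue", "Sales.order_count"]))
# ['SUM(Sales.revenue) AS total_revenue', 'COUNT(Sales.id) AS order_count']
```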
The timeDimensions field lists the columns used to group the data by time, often involving date functions. Each timeDimensions entry specifies a time period and its granularity. The dateRange field accepts specific dates, such as ["2022-01-01", "2023-03-31"], as well as relative periods such as “last week”, “last month”, “this month”, “this week”, “today”, “this year”, and “last year”.
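Relative periods have to be resolved to concrete dates before a query can run. A minimal sketch of one possible resolution; the conventions used here (weeks start on Monday, "last week" is the previous calendar week, "this year" is the full calendar year, ranges are inclusive) are assumptions, not documented PandasAI behavior, and granularity is not handled.

```python
from datetime import date, timedelta


def resolve_date_range(date_range, today=None):
    """Resolve a dateRange value to an inclusive (start, end) pair of dates."""
    if today is None:
        today = date.today()
    if isinstance(date_range, list):
        # Explicit form: ["YYYY-MM-DD", "YYYY-MM-DD"].
        start, end = date_range
        return date.fromisoformat(start), date.fromisoformat(end)
    week_start = today - timedelta(days=today.weekday())  # Monday of this week
    month_start = today.replace(day=1)
    year_start = today.replace(month=1, day=1)
    if date_range == "today":
        return today, today
    if date_range == "this week":
        return week_start, week_start + timedelta(days=6)
    if date_range == "last week":
        return week_start - timedelta(days=7), week_start - timedelta(days=1)
    if date_range == "this month":
        # Jumping 32 days past the 1st always lands in the next month.
        next_month_start = (month_start + timedelta(days=32)).replace(day=1)
        return month_start, next_month_start - timedelta(days=1)
    if date_range == "last month":
        last_month_end = month_start - timedelta(days=1)
        return last_month_end.replace(day=1), last_month_end
    if date_range == "this year":
        return year_start, year_start.replace(month=12, day=31)
    if date_range == "last year":
        return year_start.replace(year=today.year - 1), year_start - timedelta(days=1)
    raise ValueError(f"unsupported dateRange: {date_range!r}")


reference_day = date(2024, 3, 15)  # a Friday
for period in ["today", "this week", "last week", "this month", "last month", "last year"]:
    print(period, resolve_date_range(period, reference_day))
print(resolve_date_range(["2022-01-01", "2023-03-31"]))
```

Passing today explicitly keeps the function deterministic, which makes relative ranges testable.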
The filters field holds conditions that filter the data, equivalent to SQL WHERE clauses. Each filter specifies a member, an operator, and a set of values. The allowed operators are “equals”, “notEquals”, “contains”, “notContains”, “startsWith”, “endsWith”, “gt” (greater than), “gte” (greater than or equal to), “lt” (less than), “lte” (less than or equal to), “set”, “notSet”, “inDateRange”, “notInDateRange”, “beforeDate”, and “afterDate”.
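Applied in Python rather than SQL, each operator becomes a predicate over a row. A minimal sketch covering the non-date operators; the date operators (inDateRange, notInDateRange, beforeDate, afterDate) are left out. The semantics chosen here are assumptions: rows are keyed by member name, a multi-value filter matches if any value matches, "set"/"notSet" test for None, and separate filters are combined with AND.

```python
import operator

# Comparison operators compare against the first value only.
COMPARISONS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def matches(row, flt):
    """Return True if the row satisfies one filter."""
    value = row.get(flt["member"])
    op = flt["operator"]
    values = flt.get("values", [])
    if op == "set":
        return value is not None
    if op == "notSet":
        return value is None
    if value is None:
        return False  # The remaining operators never match a missing value.
    if op == "equals":
        return value in values
    if op == "notEquals":
        return value not in values
    if op == "contains":
        return any(v in value for v in values)
    if op == "notContains":
        return not any(v in value for v in values)
    if op == "startsWith":
        return any(value.startswith(v) for v in values)
    if op == "endsWith":
        return any(value.endswith(v) for v in values)
    if op in COMPARISONS:
        return COMPARISONS[op](value, values[0])
    raise ValueError(f"operator not covered by this sketch: {op}")


def apply_filters(rows, filters):
    """Keep rows that satisfy every filter (filters are combined with AND)."""
    return [row for row in rows if all(matches(row, f) for f in filters)]


rows = [
    {"Sales.product": "Laptop Pro", "Sales.revenue": 1200.0},
    {"Sales.product": "Laptop Air", "Sales.revenue": 900.0},
    {"Sales.product": "Mouse", "Sales.revenue": 25.0},
]
filters = [
    {"member": "Sales.product", "operator": "startsWith", "values": ["Laptop"]},
    {"member": "Sales.revenue", "operator": "gte", "values": [1000]},
]
print(apply_filters(rows, filters))
```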
The order field lists the columns used to sort the data, equivalent to an SQL ORDER BY clause. Each entry in the order array specifies an identifier and a sort direction: “asc” for ascending or “desc” for descending.
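Sorting by several keys with mixed directions can be done in Python by relying on the stability of list.sort: sort by the least significant key first and by the most significant key last. A minimal sketch, assuming each order entry uses hypothetical id and direction keys:

```python
def apply_order(rows, order):
    """Sort rows by several keys, each ascending or descending.

    Python's sort is stable (also with reverse=True), so sorting by the least
    significant key first and the most significant key last yields the
    combined ordering.
    """
    result = list(rows)
    for entry in reversed(order):
        if entry["direction"] not in ("asc", "desc"):
            raise ValueError(f"invalid direction: {entry['direction']}")
        result.sort(key=lambda row, k=entry["id"]: row[k],
                    reverse=entry["direction"] == "desc")
    return result


rows = [
    {"Sales.region": "North", "Sales.total_revenue": 100.0},
    {"Sales.region": "South", "Sales.total_revenue": 300.0},
    {"Sales.region": "North", "Sales.total_revenue": 250.0},
    {"Sales.region": "South", "Sales.total_revenue": 50.0},
]
# Equivalent to: ORDER BY region ASC, total_revenue DESC
order = [
    {"id": "Sales.region", "direction": "asc"},
    {"id": "Sales.total_revenue", "direction": "desc"},
]
for row in apply_order(rows, order):
    print(row)
```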
Together, these fields form a complete query that the Semantic Agent can interpret and execute, and that can be translated into a single SQL statement.
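For illustration, a hypothetical query combining all of the fields could translate as follows. The translator is a minimal sketch under stated assumptions, not the SQL PandasAI emits: a single table, a hand-written measure lookup, dimension names equal to column names, only the comparison and equals operators, the timeDimensions keys dimension/granularity/dateRange, PostgreSQL's DATE_TRUNC for granularity, and values inlined with naive quoting (real code should use bound parameters). The type field is ignored because it affects rendering, not SQL.

```python
# Hypothetical schema lookup: measure member -> (aggregation type, source column).
MEASURES = {"Sales.total_revenue": ("sum", "revenue")}
SQL_FUNCTIONS = {"count": "COUNT", "avg": "AVG", "sum": "SUM", "max": "MAX", "min": "MIN"}
SQL_OPERATORS = {"equals": "=", "notEquals": "<>", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}

query = {
    "type": "table",
    "dimensions": ["Sales.product"],
    "measures": ["Sales.total_revenue"],
    "timeDimensions": [
        {"dimension": "Sales.order_date", "granularity": "month",
         "dateRange": ["2022-01-01", "2023-03-31"]},
    ],
    "filters": [{"member": "Sales.region", "operator": "equals", "values": ["North"]}],
    "order": [{"id": "Sales.total_revenue", "direction": "desc"}],
}


def literal(value):
    """Render a value as an SQL literal (naive; prefer bound parameters)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def to_sql(query):
    """Translate a single-table JSON query into one SQL statement (sketch)."""
    select, group_by, where = [], [], []
    for member in query.get("dimensions", []):
        select.append(member)
        group_by.append(member)
    for td in query.get("timeDimensions", []):
        member = td["dimension"]
        if td.get("granularity"):
            # DATE_TRUNC is PostgreSQL syntax; other databases differ.
            expr = f"DATE_TRUNC('{td['granularity']}', {member})"
            select.append(f"{expr} AS {member.split('.', 1)[1]}_{td['granularity']}")
            group_by.append(expr)
        if isinstance(td.get("dateRange"), list):
            start, end = td["dateRange"]
            where.append(f"{member} BETWEEN {literal(start)} AND {literal(end)}")
    for member in query["measures"]:
        agg_type, column = MEASURES[member]
        table, name = member.split(".", 1)
        select.append(f"{SQL_FUNCTIONS[agg_type]}({table}.{column}) AS {name}")
    for flt in query.get("filters", []):
        member, op, values = flt["member"], flt["operator"], flt["values"]
        if op == "equals" and len(values) > 1:
            where.append(f"{member} IN ({', '.join(literal(v) for v in values)})")
        else:
            where.append(f"{member} {SQL_OPERATORS[op]} {literal(values[0])}")
    sql = f"SELECT {', '.join(select)} FROM {query['measures'][0].split('.', 1)[0]}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    order = []
    for entry in query.get("order", []):
        # Measures are ordered by their output alias, dimensions by the column.
        key = entry["id"].split(".", 1)[1] if entry["id"] in MEASURES else entry["id"]
        order.append(f"{key} {entry['direction'].upper()}")
    if order:
        sql += " ORDER BY " + ", ".join(order)
    return sql


print(to_sql(query))
```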