Create a new semantic layer schema using the `create` method

`pai.create()` method with CSV and parquet files

For CSV and parquet files, use the `create` method:
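A minimal sketch of such a call, assuming a local file named `data.csv` and a hypothetical dataset path `company/sales-data`. The `df` keyword, the description text, and the helper name `create_sales_dataset` are assumptions for illustration, so treat this as one plausible way to write the call rather than a reference.

```python
# Arguments for pai.create(); both values are illustrative placeholders.
create_kwargs = {
    "path": "company/sales-data",  # hypothetical "organization/dataset" path
    "description": "Sales data from our retail stores",
}


def create_sales_dataset(csv_path="data.csv"):
    """Load a CSV and register it in the semantic layer (needs pandasai installed)."""
    import pandasai as pai  # imported lazily so the sketch loads without pandasai

    file = pai.read_csv(csv_path)
    # `columns` is omitted here, so all columns of the input dataframe are included.
    return pai.create(df=file, **create_kwargs)
```

Leaving out `columns` keeps the sketch short; declaring columns explicitly would restrict the semantic layer to the ones you list.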
Parameter types:
- `path`: `str`
- input dataframe: `DataFrame`, for example one loaded with `pai.read_csv()`
- `description`: `str`
`columns` (`dict[str, dict]`): If the `columns` parameter is not provided, all columns from the input dataframe will be included in the semantic layer. When specified, only the declared columns will be included, allowing you to select specific columns for your semantic layer. Each column entry contains:
- `type` (str): Data type of the column
- `description` (str): Clear explanation of what the column represents

`pai.create()` method for SQL databases

The `pandasai-sql` extra dependency is required for this feature. See SQL installation instructions.

Use the `create` method to define your data source and schema. Here’s an example using a MySQL database:
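The following is a hedged sketch of what such a call might look like. The keys inside `source` (`type`, `connection`, `table`), the environment variable names, the dataset path, and the helper name `create_mysql_dataset` are all assumptions for illustration; a real call may also need column definitions, so check the `pandasai-sql` documentation before relying on it.

```python
import os

# Hypothetical MySQL source definition; key names and values are assumptions.
mysql_source = {
    "type": "mysql",
    "connection": {
        "host": os.environ.get("DB_HOST", "localhost"),
        "port": 3306,
        "user": os.environ.get("DB_USER", "user"),
        "password": os.environ.get("DB_PASSWORD", ""),
        "database": "medical_data",
    },
    "table": "heart_data",
}


def create_mysql_dataset():
    """Register the MySQL table (needs pandasai and the pandasai-sql extra)."""
    import pandasai as pai  # imported lazily so the sketch loads without pandasai

    return pai.create(
        path="example/mysql-dataset",  # hypothetical "organization/dataset" path
        description="Heart disease dataset from MySQL database",
        source=mysql_source,
    )
```

Reading credentials from environment variables keeps secrets out of the source file and out of any generated configuration.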
In the `create` call:
- `path` defines where the dataset will be stored in your project
- `description` provides context about the dataset
- the `source` object describes the data source

When you use the `create` method, a YAML configuration file is automatically generated for you in the `datasets/` directory of your project.

As an alternative, you can use a YAML `schema.yaml` file directly in the `datasets/organization_name/dataset_name` directory.
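To make the directory layout concrete, here is a small sketch that builds the expected location of `schema.yaml` from an organization name and a dataset name. The helper `schema_path` is hypothetical and only mirrors the layout described above.

```python
from pathlib import PurePosixPath


def schema_path(organization_name, dataset_name):
    """Return the expected location of schema.yaml for a dataset."""
    return PurePosixPath("datasets") / organization_name / dataset_name / "schema.yaml"


print(schema_path("organization_name", "dataset_name"))
# prints: datasets/organization_name/dataset_name/schema.yaml
```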
The following sections detail all available configuration options for your schema.yaml file:
description
Type: `str`

source
The available data sources depend on the installed data extensions (SQL databases, data lakehouses, yahoo_finance).
Type: `dict`
- `type` (str): Type of data source
- `connection_string` (str): Connection string for the data source
- `query` (str): Query to retrieve data from the data source

columns
Type: `list[dict]`
- `name` (str): Name of the column (e.g., `transaction_id`).
- `type` (str): Data type of the column.
  - `"string"`: IDs, names, categories.
  - `"integer"`: Counts, whole numbers.
  - `"float"`: Prices, percentages.
  - `"datetime"`: Timestamps, dates.
  - `"boolean"`: Flags, true/false values.
- `description` (str): Clear explanation of what the column represents.

Qualified column names use the format `[table].[column]`.

transformations
Type: `list[dict]`
- `type` (str): Type of transformation
- `params` (dict): Parameters for the transformation

If you want to learn more about transformations, check out the transformations documentation.
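Since the column `type` must be one of the listed values, a quick check before writing `schema.yaml` can catch typos early. The sketch below is a hypothetical helper, not part of PandasAI; it assumes `name` is required and `type` is optional.

```python
ALLOWED_TYPES = {"string", "integer", "float", "datetime", "boolean"}


def check_columns(columns):
    """Return a list of human-readable problems found in column definitions."""
    problems = []
    for index, column in enumerate(columns):
        name = column.get("name")
        if not name:
            problems.append(f"column {index}: missing name")
        col_type = column.get("type")
        # A missing type is tolerated; an unrecognized one is reported.
        if col_type is not None and col_type not in ALLOWED_TYPES:
            problems.append(f"column {name or index}: unknown type {col_type!r}")
    return problems


columns = [
    {"name": "transaction_id", "type": "string", "description": "Unique identifier for each sale"},
    {"name": "amount", "type": "decimal"},  # not one of the listed types
]
print(check_columns(columns))
# prints: ["column amount: unknown type 'decimal'"]
```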
The `group_by` field allows you to specify which columns can be used for grouping operations. This is particularly useful for aggregation queries and data analysis.
- `group_by` (list[str]): Columns available for grouping; names can be written as `table.column`.
The `expression` field allows you to specify a SQL expression for a column. This expression will be used in the query instead of the column name.
- `alias` (str): Alias for the column.
- `expression` (str): SQL expression for the column.
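To illustrate how `expression`, `alias`, and `group_by` could combine, the sketch below assembles a simple SQL statement from column definitions. It is an illustration only, not PandasAI's internal query builder; the function names, the `orders` table, and the sample columns are hypothetical.

```python
def select_item(column):
    """Render one column as a SELECT item, preferring its expression."""
    item = column.get("expression") or column["name"]
    alias = column.get("alias")
    return f"{item} AS {alias}" if alias else item


def build_query(table, columns, group_by=None):
    """Assemble a simple SELECT statement from column definitions."""
    query = f"SELECT {', '.join(select_item(c) for c in columns)} FROM {table}"
    if group_by:
        query += f" GROUP BY {', '.join(group_by)}"
    return query


columns = [
    {"name": "order_id"},
    {"name": "price", "expression": "SUM(price)", "alias": "total_revenue"},
]
print(build_query("orders", columns, group_by=["order_id"]))
# prints: SELECT order_id, SUM(price) AS total_revenue FROM orders GROUP BY order_id
```

The expression replaces the column name in the SELECT list, and the alias names the resulting column, which is the behavior the `expression` field describes.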