Configuring Data Sources
Data sources are configured via a JSON config file that you pass to PlyDB with the --config flag. You can configure multiple data sources in a single file and query across all of them.
Config file structure
The config file has three top-level objects:
- credentials: Shared authentication profiles for cloud providers (S3, Google Sheets).
- databases: A map of data source configurations. Each key is a unique identifier that becomes the catalog name in SQL queries.
- semantic_context: (Optional) Semantic context overlay configuration.
Database keys must consist of lowercase letters, digits, and underscores only.
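The key rule above is easy to check before handing a config to PlyDB. The following is a minimal sketch, not part of PlyDB itself; the function name `invalid_database_keys` is hypothetical and the regular expression is simply a direct reading of "lowercase letters, digits, and underscores only".

```python
import re

# Direct reading of the rule: lowercase letters, digits, and underscores only.
DATABASE_KEY_RE = re.compile(r"[a-z0-9_]+")


def invalid_database_keys(config: dict) -> list[str]:
    """Return the database keys that break the naming rule, in config order.

    Hypothetical helper for illustration; PlyDB's own validation may differ.
    """
    return [
        key
        for key in config.get("databases", {})
        if not DATABASE_KEY_RE.fullmatch(key)
    ]


if __name__ == "__main__":
    sample = {"databases": {"analytics": {}, "Sales-2026": {}, "shop_1": {}}}
    print(invalid_database_keys(sample))
```

Running a check like this in CI can catch a key such as `Sales-2026` before it surfaces as a confusing catalog name in a query.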
PostgreSQL
{
"databases": {
"analytics": {
"metadata": {
"name": "Production Analytics",
"description": "Primary read-replica for data warehousing."
},
"type": "postgresql",
"host": "db-prod-01.example.com",
"port": 5432,
"database_name": "analytics_main",
"username": "bi_user",
"password_env_var": "DB_PROD_PASSWORD"
}
}
}
| Field | Type | Description |
|---|---|---|
| host | String | Server address |
| port | Integer | Network port |
| database_name | String | Target database name |
| username | String | Login identity |
| password_env_var | String | Name of the environment variable holding the password |
Tables are referenced as catalog.schema.table, e.g. analytics.public.orders.
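Because `password_env_var` names an environment variable rather than holding the password itself, a missing variable is a likely source of connection failures. The sketch below lists the variables a config expects but the environment lacks. It is an illustrative pre-flight check, assuming nothing about how PlyDB reports the problem; `missing_password_env_vars` is a hypothetical name.

```python
import os


def missing_password_env_vars(config: dict, environ=None) -> dict[str, str]:
    """Map database key -> env var name for every password variable that is unset.

    Hypothetical helper; `environ` defaults to the real process environment
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    missing = {}
    for key, db in config.get("databases", {}).items():
        var = db.get("password_env_var")
        if var and var not in env:
            missing[key] = var
    return missing


if __name__ == "__main__":
    sample = {
        "databases": {
            "analytics": {
                "type": "postgresql",
                "password_env_var": "DB_PROD_PASSWORD",
            }
        }
    }
    print(missing_password_env_vars(sample))
```

The same check applies to any source type that carries a `password_env_var` field.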
MySQL
MySQL uses the same fields as PostgreSQL:
{
"databases": {
"app": {
"metadata": {
"name": "Application Database",
"description": "Main application MySQL instance."
},
"type": "mysql",
"host": "mysql.example.com",
"port": 3306,
"database_name": "myapp",
"username": "readonly",
"password_env_var": "MYSQL_PASSWORD"
}
}
}
SQLite
{
"databases": {
"shop": {
"metadata": {
"name": "Shop Database",
"description": "SQLite database backing the app."
},
"type": "sqlite",
"path": "/data/shop.sqlite"
}
}
}
| Field | Type | Description |
|---|---|---|
| path | String | Path to the SQLite database file |
SQLite uses main as its default schema, so tables are referenced as shop.main.customers.
DuckDB
{
"databases": {
"analytics": {
"metadata": {
"name": "Analytics Database",
"description": "DuckDB with pre-aggregated metrics."
},
"type": "duckdb",
"path": "/data/analytics.duckdb"
}
}
}
| Field | Type | Description |
|---|---|---|
| path | String | Path to the DuckDB database file |
Like SQLite, DuckDB uses main as its default schema.
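Table references follow the catalog.schema.table pattern, where the catalog is the database key and the schema is main for SQLite and DuckDB. A small helper can assemble these references when queries are generated programmatically. This is a sketch only: `quote_ident` and `table_ref` are hypothetical names, and quoting the table name with double quotes follows standard SQL identifier quoting rather than anything PlyDB-specific.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier with double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def table_ref(catalog: str, schema: str, table: str) -> str:
    """Build a catalog.schema.table reference, quoting only the table name.

    The catalog is a database key (lowercase letters, digits, underscores),
    so it is left unquoted here.
    """
    return f"{catalog}.{schema}.{quote_ident(table)}"


if __name__ == "__main__":
    # SQLite and DuckDB sources use "main" as the schema.
    print(table_ref("shop", "main", "customers"))
```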
Local files (CSV, JSON, Parquet, Excel)
{
"databases": {
"budget": {
"metadata": {
"name": "FY2026 Budget",
"description": "Department budget allocations."
},
"type": "file",
"path": "/Documents/Finance/budget_2026.xlsx",
"sheet_name": "Final_Approval",
"header_row": true
}
}
}
| Field | Type | Description |
|---|---|---|
| path | String | Path to the file |
| format | String | (Optional) csv, xlsx, parquet, or json. Inferred from extension if omitted. |
| delimiter | String | (CSV only) Separator character |
| header_row | Boolean | (CSV/XLSX) Whether row 1 is the header |
| sheet_name | String | (XLSX only) Tab name to read |
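The format field is optional for local files because it can be inferred from the file extension. The sketch below shows one way that inference could work. The extension-to-format mapping is an assumption based on the four format names in the table; PlyDB's actual rules (for example, whether it accepts other extensions) are not documented here, and `resolve_format` is a hypothetical name.

```python
from pathlib import PurePosixPath

# Assumed mapping: one extension per format name listed in the table above.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".xlsx": "xlsx",
    ".parquet": "parquet",
    ".json": "json",
}


def resolve_format(source: dict) -> str:
    """Return the explicit format if set, otherwise infer it from the path."""
    explicit = source.get("format")
    if explicit:
        return explicit
    suffix = PurePosixPath(source["path"]).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(
            f"cannot infer format from {source['path']!r}; set 'format' explicitly"
        )
    return EXTENSION_FORMATS[suffix]


if __name__ == "__main__":
    print(resolve_format({"path": "/Documents/Finance/budget_2026.xlsx"}))
```

Setting format explicitly is the safer choice whenever a file has an unusual or missing extension.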
S3
{
"credentials": {
"aws_user": {
"access_key_env": "AWS_ACCESS_KEY_ID",
"secret_key_env": "AWS_SECRET_ACCESS_KEY"
}
},
"databases": {
"sensor_data": {
"metadata": {
"name": "IoT Sensor Data",
"description": "Partitioned sensor data in S3."
},
"type": "s3",
"credential_profile": "aws_user",
"uri": "s3://iot-bucket/2026/*/sensor_*.parquet",
"format": "parquet",
"region": "us-west-2"
}
}
}
| Field | Type | Description |
|---|---|---|
| uri | String | S3 URI. Supports glob patterns (*, ?, []). |
| credential_profile | String | Key matching an entry in the credentials map |
| region | String | AWS region (e.g. us-east-1) |
| format | String | Required. File format (csv, parquet, etc.) |
| delimiter | String | (CSV only) Separator character |
| header_row | Boolean | (CSV/XLSX) Whether row 1 is the header |
Credentials are defined in the top-level credentials object and referenced by name via credential_profile.
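Since credential_profile is a by-name reference, a typo leaves a source pointing at a profile that does not exist. The following sketch finds such dangling references. It is an illustrative check, not PlyDB behavior, and `unresolved_credential_profiles` is a hypothetical name.

```python
def unresolved_credential_profiles(config: dict) -> dict[str, str]:
    """Map database key -> profile name for each credential_profile that
    has no matching entry in the top-level credentials object."""
    credentials = config.get("credentials", {})
    unresolved = {}
    for key, db in config.get("databases", {}).items():
        profile = db.get("credential_profile")
        if profile is not None and profile not in credentials:
            unresolved[key] = profile
    return unresolved


if __name__ == "__main__":
    sample = {
        "credentials": {"aws_user": {}},
        "databases": {
            "sensor_data": {"type": "s3", "credential_profile": "aws_usr"},
        },
    }
    print(unresolved_credential_profiles(sample))
```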
Google Sheets
PlyDB supports two authentication methods for Google Sheets:
- Service account — for server-side, non-interactive use.
- Browser OAuth — for interactive, ad-hoc use. PlyDB opens a browser for Google login with no credentials needed in the config.
Service account
{
"credentials": {
"gsheet_sa": {
"key_file": "/etc/secrets/gsheet-sa-key.json"
}
},
"databases": {
"sales": {
"metadata": {
"name": "Sales Forecast",
"description": "Q3 sales forecast."
},
"type": "gsheet",
"spreadsheet_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms",
"credential_profile": "gsheet_sa",
"sheet_name": "Forecast",
"header_row": true
}
}
}
Browser OAuth
{
"databases": {
"sales": {
"metadata": {
"name": "Personal Tracker",
"description": "Ad-hoc sheet accessed via browser OAuth."
},
"type": "gsheet",
"spreadsheet_id": "1AbCdEfGhIjKlMnOpQrStUvWxYz0123456789abcdefg"
}
}
}
On first query, PlyDB opens your browser for Google login. The OAuth token is cached at ~/.duckdb/stored_secrets/ and reused automatically. To force re-authentication, run plydb auth --config /path/to/config.json.
| Field | Type | Description |
|---|---|---|
| spreadsheet_id | String | The spreadsheet ID from the Google Sheets URL |
| credential_profile | String | (Optional) Key referencing a credential with key_file. Omit for browser OAuth. |
| sheet_name | String | (Optional) Tab name. If omitted, PlyDB uses the table name from the SQL query. |
| header_row | Boolean | (Optional) Whether row 1 is the header. Defaults to true. |
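Which authentication method applies to a Google Sheets source follows from whether credential_profile is present. The sketch below makes that rule explicit for a given database key. The return labels "service_account" and "browser_oauth" and the function name `gsheet_auth_method` are hypothetical, chosen only for this illustration.

```python
def gsheet_auth_method(config: dict, db_key: str) -> str:
    """Return which Google Sheets auth method a source would use.

    Rule taken from the field table: a credential_profile that references
    a credential with key_file means service account; omitting it means
    browser OAuth.
    """
    db = config["databases"][db_key]
    if db.get("type") != "gsheet":
        raise ValueError(f"{db_key!r} is not a gsheet source")
    profile = db.get("credential_profile")
    if profile is None:
        return "browser_oauth"
    credential = config.get("credentials", {}).get(profile, {})
    if "key_file" not in credential:
        raise ValueError(f"credential {profile!r} has no key_file")
    return "service_account"


if __name__ == "__main__":
    sample = {"databases": {"sales": {"type": "gsheet", "spreadsheet_id": "..."}}}
    print(gsheet_auth_method(sample, "sales"))
```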
Dynamic sheet names
When sheet_name is omitted, the table name in your SQL query determines which tab is read:
# Read from the "January" tab
plydb query --config config.json \
"SELECT * FROM sales.default.\"January\""
# Read from the "February" tab
plydb query --config config.json \
"SELECT * FROM sales.default.\"February\""
Quote the tab name as a SQL identifier (e.g. "Revenue") or set sheet_name in the config to bypass SQL identifier parsing.
Cross-source queries
You can join across any combination of data sources in a single query. Just add all sources to the same config file:
plydb query --config config.json \
"SELECT p.name, o.product, o.amount
FROM store.public.orders o
JOIN products.default.\"table\" p ON o.product_id = p.id"
Semantic context
PlyDB can automatically scan your data sources and provide AI agents with structured semantic context — schema, tables, columns, and comment metadata. This helps agents understand what your data means, not just its structure.
plydb semantic-context --config config.json
This outputs an Open Semantic Interchange (OSI) YAML document describing your data model.
Overlays
You can enrich the auto-scanned context with additional descriptions, relationships, and metrics using overlay files. Overlays cannot add new tables or columns — they only enrich what the scanner already discovered.
Pass overlays via the CLI:
plydb semantic-context \
--config config.json \
--semantic-context-overlay business_glossary.yaml \
--semantic-context-overlay team_annotations.yaml
Or embed them in the config file:
{
"semantic_context": {
"overlays": [
"/path/to/business_glossary.yaml",
"/path/to/column_descriptions.yaml"
]
}
}
Config-file overlays are applied first, then CLI flag overlays, in order.
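The ordering rule can be stated as a short function: overlays listed in the config come first, followed by those passed on the command line, each group in its given order. This is a sketch of the documented ordering only; `overlay_order` is a hypothetical name, and how PlyDB merges overlapping overlay content is not covered here.

```python
def overlay_order(config: dict, cli_overlays: list[str]) -> list[str]:
    """Return overlay paths in application order.

    Config-file overlays first, then CLI flag overlays, each in listed order.
    """
    config_overlays = config.get("semantic_context", {}).get("overlays", [])
    return list(config_overlays) + list(cli_overlays)


if __name__ == "__main__":
    sample = {
        "semantic_context": {
            "overlays": [
                "/path/to/business_glossary.yaml",
                "/path/to/column_descriptions.yaml",
            ]
        }
    }
    for path in overlay_order(sample, ["team_annotations.yaml"]):
        print(path)
```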
Full reference example
Here is a complete config file demonstrating multiple source types:
{
"credentials": {
"aws_user": {
"access_key_env": "AWS_ACCESS_KEY_ID",
"secret_key_env": "AWS_SECRET_ACCESS_KEY"
},
"gsheet_sa": {
"key_file": "/etc/secrets/gsheet-sa-key.json"
}
},
"databases": {
"analytics": {
"metadata": {
"name": "Production Analytics",
"description": "Primary read-replica."
},
"type": "postgresql",
"host": "db-prod-01.example.com",
"port": 5432,
"database_name": "analytics_main",
"username": "bi_user",
"password_env_var": "DB_PROD_PASSWORD"
},
"budget": {
"metadata": {
"name": "FY2026 Budget",
"description": "Department budget allocations."
},
"type": "file",
"path": "/Documents/Finance/budget_2026.xlsx",
"sheet_name": "Final_Approval",
"header_row": true
},
"sensor_data": {
"metadata": {
"name": "IoT Sensor Data",
"description": "Partitioned sensor data."
},
"type": "s3",
"credential_profile": "aws_user",
"uri": "s3://iot-bucket/2026/*/sensor_*.parquet",
"format": "parquet",
"region": "us-west-2"
},
"sales": {
"metadata": {
"name": "Sales Forecast",
"description": "Q3 sales forecast."
},
"type": "gsheet",
"spreadsheet_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms",
"credential_profile": "gsheet_sa",
"sheet_name": "Forecast",
"header_row": true
}
},
"semantic_context": {
"overlays": [
"/path/to/business_glossary.yaml"
]
}
}