Postgres Interface#
Postgres is the primary supported backend for PostBOUND. Historically, it was the first and only supported backend and has influenced the overall design of the database abstraction. The Postgres backend was eventually generalized into the database abstraction. Therefore, all features of the database abstraction are supported by the Postgres backend. Additionally, the Postgres backend provides a number of features that make working with a Postgres server easier. Since these are specific to Postgres, they are not part of the general database abstraction. This document outlines the general usage as well as the special features of the Postgres backend.
Setup#
To use the Postgres backend, you need a running Postgres server. To make use of the hinting functionality, this server must at least have the pg_hint_plan extension installed and enabled. For more advanced hinting features, we recommend using pg_lab. In addition to the pg_hint_plan hints, pg_lab provides support for base table cardinalities and reliable support for memoize operators, materialize operators, and parallel execution of subplans.
To connect to the Postgres server, you need to provide the connection parameters either via a connection string or
(recommended) via a config file. See the connect() function for details. The Postgres interface
will detect the available hinting mechanism automatically.
The PostBOUND repository contains utility scripts to setup a local Postgres server with pg_hint_plan from scratch. These
are located in the db-support/postgres directory. The postgres-setup.sh script will install Postgres, create a
new database cluster, and setup a Postgres server with the pg_hint_plan extension enabled. The
postgres-config-generator.py utility will generate an optimized Postgres configuration based on your hardware based on
PGTune. Finally, the workloadXYZ-setup.sh scripts will create commonly-used
database instances such as IMDB/JOB, Stats, or Stack on your Postgres server.
Usage#
The Postgres backend is available from the postgres module. You can create a connection to a Postgres
server using the connect() function. The returned database instance functions like any regular
postbound.Database interface:
import postbound as pb
pg_instance = pb.postgres.connect(config_file="pg-connect.toml")
job = pb.workloads.job()
pg_instance.execute_query(job["1a"])
Supported Backend Features#
Feature |
Status |
|---|---|
Query Execution |
fully supported |
Query Execution with timeouts |
fully supported |
Schema interface |
fully implemented |
Statistics interface |
fully implemented |
EXPLAIN parsing and query plan extraction |
fully supported |
EXPLAIN ANALYZE plans |
fully supported |
Extraction of cardinality estimates |
fully supported |
Extraction of cost estimates |
fully supported |
Plan hinting |
fully supported |
Advanced Backend Features#
In addition to the standardized database abstraction features, the Postgres backend provides the following additional features:
cache warmup via
prewarm_tables()server configuration via
apply_configuration()statistics maintenance via
update_statistics()server management with
start(),stop(), andis_running()parallel query execution based on the
ParallelQueryExecutorsimple database manipulation with the
WorkloadShifter
Query Plans#
To obtain DuckDB query plans, you can either use the query_plan() method or
parse the EXPLAIN output manually using PostgresExplainPlan. Both options yield the same
results:
# obtain a query plan directly:
plan = pg_instance.optimizer().query_plan(job["1a"])
# this is equivalent to:
explain_query = pb.transform.as_explain(job["1a"])
raw_plan = pg_instance.execute_query(explain_query)
equivalent_plan = pb.postgres.PostgresExplainPlan(raw_plan)
Optimizing the Server Configuration#
The PostBOUND repository contains a utility script
db-support/postgres/postgres-config-generator.py that generates an optimized Postgres configuration based on your
hardware. The script uses rules from PGTune. The script outputs an SQL file that
contains the necessary ALTER SYSTEM commands to modify the postgresql.conf. Please note that these are just
heuristics that might not be optimal for your workload. Furthermore, the settings are intended for a dedicated database
server and a single user, single query at-a-time scenario.
One fragile aspect of the script is figuring out whether the database is stored on an SSD or HDD. If the script misdetects
the storage type or raises an error, you can manually specify it via --disk-type.
Hinting Backends#
The PostgresInterface supports two different hinting backends: the widely-used
pg_hint_plan and the research-focused
pg_lab. pg_lab is a fork of vanilla Postgres that adds additional extension points
to the server. These extension points allow to control optimizer internals in a fine-grained manner. The hinting extension
shipped with pg_lab uses these extension points to provide more reliable and more detailed hinting features compared to
pg_hint_plan.
Upon establishing a server connection, the Postgres interface automatically detects which hinting backend is available on
the server and adjusts its hinting dialect used in generate_hints()
accordingly. If for some reason you want to change the current hinting dialect, you can do so via the
backend attribute available via
hinting().