Postgres Interface#

Postgres is the primary supported backend for PostBOUND. Historically, it was the first and only supported backend and has influenced the overall design of the database abstraction. The Postgres backend was eventually generalized into the database abstraction. Therefore, all features of the database abstraction are supported by the Postgres backend. Additionally, the Postgres backend provides a number of features that make working with a Postgres server easier. Since these are specific to Postgres, they are not part of the general database abstraction. This document outlines the general usage as well as the special features of the Postgres backend.

Setup#

To use the Postgres backend, you need a running Postgres server. To make use of the hinting functionality, this server must at least have the pg_hint_plan extension installed and enabled. For more advanced hinting features, we recommend using pg_lab. In addition to the pg_hint_plan hints, pg_lab provides support for base table cardinalities and reliable support for memoize operators, materialize operators, and parallel execution of subplans.

To connect to the Postgres server, you need to provide the connection parameters either via a connection string or (recommended) via a config file. See the connect() function for details. The Postgres interface will detect the available hinting mechanism automatically.

The PostBOUND repository contains utility scripts to setup a local Postgres server with pg_hint_plan from scratch. These are located in the db-support/postgres directory. The postgres-setup.sh script will install Postgres, create a new database cluster, and setup a Postgres server with the pg_hint_plan extension enabled. The postgres-config-generator.py utility will generate an optimized Postgres configuration based on your hardware based on PGTune. Finally, the workloadXYZ-setup.sh scripts will create commonly-used database instances such as IMDB/JOB, Stats, or Stack on your Postgres server.

Usage#

The Postgres backend is available from the postgres module. You can create a connection to a Postgres server using the connect() function. The returned database instance functions like any regular postbound.Database interface:

import postbound as pb

pg_instance = pb.postgres.connect(config_file="pg-connect.toml")
job = pb.workloads.job()
pg_instance.execute_query(job["1a"])

Supported Backend Features#

Feature

Status

Query Execution

fully supported

Query Execution with timeouts

fully supported

Schema interface

fully implemented

Statistics interface

fully implemented

EXPLAIN parsing and query plan extraction

fully supported

EXPLAIN ANALYZE plans

fully supported

Extraction of cardinality estimates

fully supported

Extraction of cost estimates

fully supported

Plan hinting

fully supported

Advanced Backend Features#

In addition to the standardized database abstraction features, the Postgres backend provides the following additional features:

Query Plans#

To obtain DuckDB query plans, you can either use the query_plan() method or parse the EXPLAIN output manually using PostgresExplainPlan. Both options yield the same results:

# obtain a query plan directly:
plan = pg_instance.optimizer().query_plan(job["1a"])

# this is equivalent to:
explain_query = pb.transform.as_explain(job["1a"])
raw_plan = pg_instance.execute_query(explain_query)
equivalent_plan = pb.postgres.PostgresExplainPlan(raw_plan)

Optimizing the Server Configuration#

The PostBOUND repository contains a utility script db-support/postgres/postgres-config-generator.py that generates an optimized Postgres configuration based on your hardware. The script uses rules from PGTune. The script outputs an SQL file that contains the necessary ALTER SYSTEM commands to modify the postgresql.conf. Please note that these are just heuristics that might not be optimal for your workload. Furthermore, the settings are intended for a dedicated database server and a single user, single query at-a-time scenario.

One fragile aspect of the script is figuring out whether the database is stored on an SSD or HDD. If the script misdetects the storage type or raises an error, you can manually specify it via --disk-type.

Hinting Backends#

The PostgresInterface supports two different hinting backends: the widely-used pg_hint_plan and the research-focused pg_lab. pg_lab is a fork of vanilla Postgres that adds additional extension points to the server. These extension points allow to control optimizer internals in a fine-grained manner. The hinting extension shipped with pg_lab uses these extension points to provide more reliable and more detailed hinting features compared to pg_hint_plan.

Upon establishing a server connection, the Postgres interface automatically detects which hinting backend is available on the server and adjusts its hinting dialect used in generate_hints() accordingly. If for some reason you want to change the current hinting dialect, you can do so via the backend attribute available via hinting().