PostBOUND documentation#

PostBOUND is a Python framework for research in query optimization. It provides a high-level interface to implement novel optimization algorithms and to analyze their performance.

Tip

New users should start by reading the 10 Minutes to PostBOUND tutorial and the Setup guide. The remainder of the documentation describes the different parts of the framework in more detail.

In general, using the PostBOUND framework for optimizer research uses a workflow similar to the following:

_images/postbound-workflow-detailed.svg — Typical workflow for using PostBOUND for optimizer research.#

For an input query, PostBOUND provides a large set of infrastructure tools to prepare and analyze the query to limit the boilerplate parts of the optimizer. This includes query parsing, join graph analysis, and others. Rearchers specify their optimizer prototypes in terms of optimization pipelines. Essentially, these are mental models for different optimizer architectures that provide different interfaces to be implemented. Each pipeline produces an abstract representation of optimization decisions, such as the join order or custom cardinality estimates. Since PostBOUND is implemented as a Python framework that runs on top of an actual database system, it needs to ensure that these optimization decisions are retained when the query is executed on the database system. This is achieved by using query hints, which restrict the search space of the native optimizer. Hints are generated automatically depending on the target database system. Finally, the benchmarking tools allow to easily compare the performance of different optimization strategies.

Contents#

Important

This documentation is a work in progress and currently under construction. Currently, the best place to familiarize yourself with PostBOUND is to take a look at the examples. If you have any questions, please feel free to open an issue on GitHub or send us an email. User experience is very important to us, and we are happy to help you get started.

Example#

The following example shows how PostBOUND can be used to implement a simple join order “optimization” algorithm. The algorithm computes a random linear join order without considering any statistics or cost estimates. The implementation is compared to the native PostgreSQL optimizer on the Stats benchmark.

End-to-end experiment with PostBOUND#

import random

import postbound as pb


# Step 1: define our optimization strategy.
# In this example we develop a simple join order optimizer that
# selects a linear join order at random.
# We delegate most of the actual work to the pre-defined join grap
# that keeps track of free tables.
class RandomJoinOrderOptimizer(pb.JoinOrderOptimization):
    def optimize_join_order(self, query: pb.SqlQuery) -> pb.LogicalJoinTree:
        join_tree = pb.LogicalJoinTree()
        join_graph = pb.opt.JoinGraph(query)

        while join_graph.contains_free_tables():
            candidate_tables = [
                path.target_table for path in join_graph.available_join_paths()
            ]
            next_table = random.choice(candidate_tables)

            join_tree = join_tree.join_with(next_table)
            join_graph.mark_joined(next_table)

        return join_tree

    def describe(self) -> pb.util.jsondict:
        return {"name": "random-join-order"}


# Step 2: connect to the target database, load the workload and
# setup the optimization pipeline.
# In our case, we evaluate on the Join Order Benchmark on Postgres
pg_imdb = pb.postgres.connect(config_file=".psycopg_connection_job")
job = pb.workloads.job()

optimization_pipeline = (
    pb.MultiStageOptimizationPipeline(pg_imdb)
    .use(RandomJoinOrderOptimizer())
    .build()
)

# (Step 3): in this example we just compare against the native Postgres optimizer
# Therefore, we do not need to setup any additional optimizers.

# Step 4: execute the workload.
# We use the QueryPreparationService to prewarm the database buffer and run all
# queries as EXPLAIN ANALYZE.
query_prep = pb.bench.QueryPreparation(
    prewarm=True, analyze=True, preparatory_statements=["SET geqo TO off;"]
)
native_results = pb.bench.execute_workload(
    job,
    on=pg_imdb,
    query_preparation=query_prep,
    workload_repetitions=3,
    progressive_output="job-results-native.csv",
    logger="tqdm",
)
optimized_results = pb.bench.execute_workload(
    job,
    on=optimization_pipeline,
    query_preparation=query_prep,
    workload_repetitions=3,
    progressive_output="job-results-optimized.csv",
    logger="tqdm",
)

Need more? There are a lot of basic examples in the PostBOUND repository!

History#

Note

If you are looking for new features or bug fixes, please check out the Changelog.

PostBOUND was initially created as framework to study pessimistic (or upper bound-driven) query optimization techniques in PostgreSQL (hence the name). An early version of the framework was presented at the BTW 2023 conference [Bergmann23]. Community feedback has been positive and especially the idea of offering a general infrastructure to quickly test new ideas in query optimization was well received. However, the focus on pessimistic query optimization called the applicability for other optimization techniques into question. As a result, we decided to extend PostBOUND beyond the original scope and to provide truly general-purpose framework for research in query optimization. This revamped version of PostBOUND was introduced as part of a SIGMOD 2025 research paper [Bergmann25].

Citation#

If you found PostBOUND to be useful in your research, please cite the following paper. Thank you!

@article{DBLP:journals/pacmmod/BergmannHHL25,
   author       = {Rico Bergmann and
                   Claudio Hartmann and
                   Dirk Habich and
                   Wolfgang Lehner},
   title        = {An Elephant Under the Microscope: Analyzing the Interaction of Optimizer
                   Components in PostgreSQL},
   journal      = {Proc. {ACM} Manag. Data},
   volume       = {3},
   number       = {1},
   pages        = {9:1--9:28},
   year         = {2025},
   url          = {https://doi.org/10.1145/3709659},
   doi          = {10.1145/3709659},
   timestamp    = {Tue, 01 Apr 2025 19:03:19 +0200}
 }

Indices and tables#

References#

[Bergmann23]

Rico Bergmann, Axel Hertzschuch, Claudio Hartmann, Dirk Habich, and Wolfgang Lehner: “PostBOUND: PostgreSQL with Upper Bound SPJ Query Optimization.” BTW 2023 (DOI: https://doi.org/10.18420/BTW2023-14)

[Bergmann25]

Rico Bergmann, Claudio Hartmann, Dirk Habich, and Wolfgang Lehner: “An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQL.” SIGMOD 2025 (DOI: https://doi.org/10.1145/3709659)