Hinting Abstractions

Contents

Hinting Abstractions#

class postbound.HintType(*values)#

Contains all hint types that are supported by PostBOUND.

Notice that not all of these hints need to be represented in the PlanParameterization, since some of them concern other aspects such as the join order. Furthermore, not all database systems will support all operators. The availability of certain hints can be checked on the database system interface and should be handled as part of the optimization pre-checks.

Join Orders#

class postbound.JoinTree(*, base_table: TableReference | None = None, outer_child: JoinTree[JoinTreeAnnotation] | None = None, inner_child: JoinTree[JoinTreeAnnotation] | None = None, annotation: JoinTreeAnnotation | None = None)#

A join tree models the sequence in which joins should be performed in a query plan.

A join tree is a composite structure that contains base tables at its leaves and joins as inner nodes. Each node can optionally be annotated with arbitrary metadata (annotation property). While a join tree does usually not contain any information regarding physical operators to execute its joins or scans, we do distinguish between inner and outer relations at the join level.

Each join tree instance is immutable. To expand the join tree, either use the join_with member method or create a new join tree, for example using the join factory method. The metadata can be updated using the update_annotation method.

Regular join trees#

Depending on the specific node, different attributes are available. For leaf nodes, this is just the base_table property. For joins, the outer_child and inner_child properties are available. The specific node type can be checked using the is_scan and is_join methods respectively. Notice that these methods are “binary”: is_join() = False implies is_scan() = True and vice versa. No matter the specific node type, the children property always provides iteration support for the input nodes of the current node (which in case of base tables is just an empty iterable). Likewise, the annotation property is always available, but its value is entirely up to the user.

Empty join trees#

An empty join tree is a special case that can be created using the empty factory method or by calling the constructor without any arguments. Empty join trees should only be used when starting the construction of a join tree and never be returned as a result of the optimization process. Clients are not required to check for emptiness and empty join trees also violate some of the invariants of proper join trees. Consider them syntactic sugar to simplify the construction, but only use them sparingly. If you decide to work with empty join trees, use the is_empty method to check for emptiness.

param base_table:

The base table being scanned. Accessing this property on join nodes raises an error.

type base_table:

TableReference, optional

param outer_child:

The left child of the join. Accessing this property on base tables raises an error.

type outer_child:

JoinTree[AnnotationType] | None, optional

param inner_child:

The right child of the join. Accessing this property on base tables raises

type inner_child:

JoinTree[AnnotationType] | None, optional

param annotation:

The annotation for the node. This can be used to store arbitrary data.

type annotation:

AnnotationType | None, optional

static scan(table: TableReference, *, annotation: JoinTreeAnnotation | None = None) JoinTree[JoinTreeAnnotation]#

Creates a new join tree with a single base table.

Parameters:
  • table (TableReference) – The base table to scan

  • annotation (AnnotationType) – The annotation to attach to the base table node

Returns:

The new join tree

Return type:

JoinTree[AnnotationType]

static join(outer: JoinTree[JoinTreeAnnotation], inner: JoinTree[JoinTreeAnnotation], *, annotation: JoinTreeAnnotation | None = None) JoinTree[JoinTreeAnnotation]#

Creates a new join tree by combining two existing join trees.

Parameters:
  • outer (JoinTree[AnnotationType]) – The outer join tree

  • inner (JoinTree[AnnotationType]) – The inner join tree

  • annotation (AnnotationType) – The annotation to attach to the intermediate join node

Returns:

The new join tree

Return type:

JoinTree[AnnotationType]

static empty() JoinTree[JoinTreeAnnotation]#

Creates an empty join tree.

Returns:

The empty join tree

Return type:

JoinTree[AnnotationType]

property base_table: TableReference#

Get the base table for join tree leaves.

Accessing this property on a join node raises an error.

property outer_child: JoinTree[JoinTreeAnnotation]#

Get the left child of the join node.

Accessing this property on a base table raises an error.

property inner_child: JoinTree[JoinTreeAnnotation]#

Get the right child of the join node.

Accessing this property on a base table raises an error.

property children: tuple[JoinTree[JoinTreeAnnotation], JoinTree[JoinTreeAnnotation]]#

Get the children of the current node.

For base tables, this is an empty tuple. For join nodes, this is a tuple of the outer and inner child.

property annotation: JoinTreeAnnotation#

Get the annotation of the current node.

is_empty() bool#

Check, whether the current join tree is an empty one.

Return type:

bool

is_join() bool#

Check, whether the current join tree node is an intermediate.

Return type:

bool

is_scan() bool#

Check, whether the current join tree node is a leaf node.

Return type:

bool

is_linear() bool#

Checks, whether the join tree encodes a linear join sequence.

In a linear join tree each join node is always a join between a base table and another join node or another base table. As a special case, this implies that join trees that only constist of a single node are also considered to be linear.

The opposite of linear join trees are bushy join trees. There also exists a is_base_join method to check whether a join node joins two base tables directly.

See also

is_bushy

Return type:

bool

is_bushy() bool#

Checks, whether the join tree encodes a bushy join sequence.

In a bushy join tree, at least one join node is a join between two other join nodes. This implies that the join tree is not linear.

See also

is_linear

Return type:

bool

is_base_join() bool#

Checks, whether the current join node joins two base tables directly.

Return type:

bool

tables() set[TableReference]#

Provides all tables that are scanned in the join tree.

Notice that this does not consider tables that might be stored in the annotation of the join tree nodes.

Return type:

set[TableReference]

plan_depth() int#

Calculates the depth of the join tree.

The depth of a join tree is the length of the longest path from the root to a leaf node. The depth of an empty join is defined to be 0, while the depth of a join tree with a single node is 1.

Return type:

int

lookup(table: TableReference | Iterable[TableReference]) JoinTree[JoinTreeAnnotation] | None#

Traverses the join tree to find a specific (intermediate) node.

Parameters:

table (TableReference | Iterable[TableReference]) – The tables that should be contained in the intermediate. If a single table is provided (either as-is or as a singleton iterable), the correponding leaf node will be returned. If multiple tables are provided, the join node that calculates the intermediate exactly is returned.

Returns:

The join tree node that contains the specified tables. If no such node exists, None is returned.

Return type:

Optional[JoinTree[AnnotationType]]

update_annotation(new_annotation: JoinTreeAnnotation) JoinTree[JoinTreeAnnotation]#

Creates a new join tree with the same structure, but a different annotation.

The original join tree is not modified.

Parameters:

new_annotation (JoinTreeAnnotation)

Return type:

JoinTree[JoinTreeAnnotation]

join_with(partner: JoinTree[JoinTreeAnnotation] | TableReference, *, annotation: JoinTreeAnnotation | None = None, partner_annotation: JoinTreeAnnotation | None = None, partner_direction: Literal['inner', 'outer'] = 'inner') JoinTree[JoinTreeAnnotation]#

Creates a new join tree by combining the current join tree with another one.

Both input join trees are not modified. If one of the join trees is empty, the other one is returned as-is. As a special case, joining two empty join trees results once again in an empty join tree.

Parameters:
  • partner (JoinTree[AnnotationType] | TableReference) – The join tree to join with the current tree. This can also be a base table, in which case it is treated as a scan node of the table. The scan can be further described with the partner_annotation parameter.

  • annotation (Optional[AnnotationType], optional) – The annotation of the new join node.

  • partner_annotation (AnnotationType | None, optional) – If the join partner is given as a plain table, this annotation is used to describe the corresponding scan node. Otherwise it is ignored.

  • partner_direction (JoinDirection, optional) – Which role the partner node should play in the new join. Defaults to “inner”, which means that the current node becomes the outer node of the new join and the partner becomes the inner child. If set to “outer”, the roles are reversed.

Returns:

The resulting join tree

Return type:

JoinTree[AnnotationType]

inspect() str#

Provides a pretty-printed an human-readable representation of the join tree.

Return type:

str

iternodes() Iterable[JoinTree[JoinTreeAnnotation]]#

Provides all nodes in the join tree, with outer nodes coming first.

Return type:

Iterable[JoinTree[JoinTreeAnnotation]]

itertables() Iterable[TableReference]#

Provides all tables that are scanned in the join tree. Outer tables appear first.

Return type:

Iterable[TableReference]

iterjoins() Iterable[JoinTree[JoinTreeAnnotation]]#

Provides all join nodes in the join tree, with outer nodes coming first.

Return type:

Iterable[JoinTree[JoinTreeAnnotation]]

Parameters:
  • base_table (TableReference | None)

  • outer_child (JoinTree[JoinTreeAnnotation] | None)

  • inner_child (JoinTree[JoinTreeAnnotation] | None)

  • annotation (JoinTreeAnnotation | None)

postbound.JoinTreeAnnotation = ~JoinTreeAnnotation#

Type variable.

The preferred way to construct a type variable is via the dedicated syntax for generic functions, classes, and type aliases:

class Sequence[T]:  # T is a TypeVar
    ...

This syntax can also be used to create bound and constrained type variables:

# S is a TypeVar bound to str
class StrSequence[S: str]:
    ...

# A is a TypeVar constrained to str or bytes
class StrOrBytesSequence[A: (str, bytes)]:
    ...

Type variables can also have defaults:

class IntDefault[T = int]:

However, if desired, reusable type variables can also be constructed manually, like so:

T = TypeVar('T')  # Can be anything
S = TypeVar('S', bound=str)  # Can be any subtype of str
A = TypeVar('A', str, bytes)  # Must be exactly str or bytes
D = TypeVar('D', default=int)  # Defaults to int

Type variables exist primarily for the benefit of static type checkers. They serve as the parameters for generic types as well as for generic function and type alias definitions.

The variance of type variables is inferred by type checkers when they are created through the type parameter syntax and when infer_variance=True is passed. Manually created type variables may be explicitly marked covariant or contravariant by passing covariant=True or contravariant=True. By default, manually created type variables are invariant. See PEP 484 and PEP 695 for more details.

class postbound.LogicalJoinTree(*, table: TableReference | None = None, outer: LogicalJoinTree | None = None, inner: LogicalJoinTree | None = None, annotation: Cardinality | None = None)#

A logical join tree is a special kind of join tree that has cardinality estimates attached to each node.

Other than the annotation type, it behaves exactly like a regular JoinTree. The cardinality estimates can be directly accessed using the cardinality property.

Parameters:
static scan(table: TableReference, *, annotation: Cardinality | None = None) LogicalJoinTree#

Creates a new join tree with a single base table.

Parameters:
  • table (TableReference) – The base table to scan

  • annotation (AnnotationType) – The annotation to attach to the base table node

Returns:

The new join tree

Return type:

JoinTree[AnnotationType]

static join(outer: LogicalJoinTree, inner: LogicalJoinTree, *, annotation: Cardinality | None = None) LogicalJoinTree#

Creates a new join tree by combining two existing join trees.

Parameters:
  • outer (JoinTree[AnnotationType]) – The outer join tree

  • inner (JoinTree[AnnotationType]) – The inner join tree

  • annotation (AnnotationType) – The annotation to attach to the intermediate join node

Returns:

The new join tree

Return type:

JoinTree[AnnotationType]

static empty() LogicalJoinTree#

Creates an empty join tree.

Returns:

The empty join tree

Return type:

JoinTree[AnnotationType]

property outer_child: LogicalJoinTree#

Get the left child of the join node.

Accessing this property on a base table raises an error.

property inner_child: LogicalJoinTree#

Get the right child of the join node.

Accessing this property on a base table raises an error.

property children: tuple[LogicalJoinTree, LogicalJoinTree]#

Get the children of the current node.

For base tables, this is an empty tuple. For join nodes, this is a tuple of the outer and inner child.

lookup(table: TableReference | Iterable[TableReference]) LogicalJoinTree | None#

Traverses the join tree to find a specific (intermediate) node.

Parameters:

table (TableReference | Iterable[TableReference]) – The tables that should be contained in the intermediate. If a single table is provided (either as-is or as a singleton iterable), the correponding leaf node will be returned. If multiple tables are provided, the join node that calculates the intermediate exactly is returned.

Returns:

The join tree node that contains the specified tables. If no such node exists, None is returned.

Return type:

Optional[JoinTree[AnnotationType]]

update_annotation(new_annotation: Cardinality) LogicalJoinTree#

Creates a new join tree with the same structure, but a different annotation.

The original join tree is not modified.

Parameters:

new_annotation (Cardinality)

Return type:

LogicalJoinTree

join_with(partner: LogicalJoinTree | TableReference, *, annotation: Cardinality | None = None, partner_annotation: Cardinality | None = None, partner_direction: Literal['inner', 'outer'] = 'inner') LogicalJoinTree#

Creates a new join tree by combining the current join tree with another one.

Both input join trees are not modified. If one of the join trees is empty, the other one is returned as-is. As a special case, joining two empty join trees results once again in an empty join tree.

Parameters:
  • partner (JoinTree[AnnotationType] | TableReference) – The join tree to join with the current tree. This can also be a base table, in which case it is treated as a scan node of the table. The scan can be further described with the partner_annotation parameter.

  • annotation (Optional[AnnotationType], optional) – The annotation of the new join node.

  • partner_annotation (AnnotationType | None, optional) – If the join partner is given as a plain table, this annotation is used to describe the corresponding scan node. Otherwise it is ignored.

  • partner_direction (JoinDirection, optional) – Which role the partner node should play in the new join. Defaults to “inner”, which means that the current node becomes the outer node of the new join and the partner becomes the inner child. If set to “outer”, the roles are reversed.

Returns:

The resulting join tree

Return type:

JoinTree[AnnotationType]

iternodes() Iterable[LogicalJoinTree]#

Provides all nodes in the join tree, with outer nodes coming first.

Return type:

Iterable[LogicalJoinTree]

iterjoins() Iterable[LogicalJoinTree]#

Provides all join nodes in the join tree, with outer nodes coming first.

Return type:

Iterable[LogicalJoinTree]

postbound.jointree_from_plan(plan: QueryPlan, *, card_source: Literal['estimates', 'actual'] = 'estimates') LogicalJoinTree#

Extracts the join tree encoded in a query plan.

The cardinality estimates of the join tree can be inferred from either the estimated cardinalities or from the measured actual cardinalities of the query plan.

Parameters:
  • plan (QueryPlan)

  • card_source (Literal['estimates', 'actual'])

Return type:

LogicalJoinTree

Physical Operators#

class postbound.ScanOperatorAssignment(operator: ScanOperator, table: TableReference, parallel_workers: float | int = nan)#

Models the selection of a scan operator for a specific base table.

Parameters:
operator#

The selected operator

Type:

ScanOperators

table#

The table that is scanned using the operator

Type:

TableReference

parallel_workers#

The number of parallel processes that should be used to execute the scan. Can be set to 1 to indicate sequential operation. Defaults to NaN to indicate that no choice has been made.

Type:

float | int

property operator: ScanOperator#

Get the assigned operator.

Returns:

The operator

Return type:

ScanOperators

property table: TableReference#

Get the table being scanned.

Returns:

The table

Return type:

TableReference

property parallel_workers: int | float#

Get the number of parallel workers used for the scan.

This number designates the total number of parallel processes. It can be 1 to indicate sequential operation, or even NaN if it is unknown.

Returns:

The number of workers

Return type:

int | float

inspect() str#

Provides the scan as a natural string.

Returns:

A string representation of the assignment

Return type:

str

class postbound.JoinOperatorAssignment(operator: JoinOperator, join: Collection[TableReference], *, parallel_workers: float | int = nan)#

Models the selection of a join operator for a specific join of tables.

Each join is identified by all base tables that are involved in the join. The assignment to intermediate results does not matter here. For example, a join between R ⨝ S and T is expressed as R, S, T even though the actual join combined an intermediate result with as base table.

A more verbose model is provided by the DirectionalJoinOperatorAssignment. In addition to the joined tables, that model also distinguishes between inner and outer relation of the join.

Parameters:
  • operator (JoinOperators) – The selected operator

  • join (Collection[TableReference]) – The base tables that are joined using the operator

  • parallel_workers (float | int, optional) – The number of parallel processes that should be used to execute the join. Can be set to 1 to indicate sequential operation. Defaults to NaN to indicate that no choice has been made.

Raises:

ValueError – If join contains less than 2 tables

property operator: JoinOperator#

Get the operator that was selected for the join

Returns:

The operator

Return type:

JoinOperators

property join: frozenset[TableReference]#

Get the tables that are joined together.

For joins of more than 2 base tables this usually means that the join combines an intermediate result with a base table or another intermediate result. These two cases are not distinguished by the assignment and have to be detected through other information, e.g. the join tree.

The more verbose model of a DirectionalJoinOperatorAssignment also distinguishes between inner and outer relations.

Returns:

The tables that are joined together

Return type:

frozenset[TableReference]

property intermediate: frozenset[TableReference]#

Alias for join

property parallel_workers: float | int#

Get the number of parallel processes that should be used in the join.

“Processes” does not necessarily mean “system processes”. The database system can also choose to use threads or other means of parallelization. This is not restricted by the join assignment.

Returns:

The number processes to use. Can be 1 to indicate sequential processing or NaN to indicate that no choice has been made.

Return type:

float | int

inspect() str#

Provides this assignment as a natural string.

Returns:

A string representation of the assignment.

Return type:

str

is_directional() bool#

Checks, whether this assignment contains directional information, i.e. regarding inner and outer relation.

Returns:

Whether the assignment explicitly denotes which relation should be the inner relationship and which relation should be the outer relationship

Return type:

bool

class postbound.DirectionalJoinOperatorAssignment(operator: JoinOperator, inner: Collection[TableReference], outer: Collection[TableReference], *, parallel_workers: float | int = nan)#

A more verbose model of join operators.

The directional assignment does not only represent the relations that should be joined together, but also denotes which role they should play for the join. More specifically, the directional assignment provides the inner and outer relation of the join. The precise semantics of this distinction depends on the specific join operator and is also used inconsistently between different database systems. In PostBOUND we use the following definitions:

  • for nested-loop joins the outer relation corresponds to the outer loop and the inner relation is the inner loop. As a special case for index nested-loop joins ths inner relation is the one that is probed via an index

  • for hash joins the outer relation is the one that is aggregated in a hash table and the inner relation is the one that is probed against that table

  • for sort-merge joins the assignment does not matter

Parameters:
  • operator (JoinOperators) – The selected operator

  • inner (Collection[TableReference]) – The tables that form the inner relation of the join

  • outer (Collection[TableReference]) – The tables that form the outer relation of the join

  • parallel_workers (float | int, optional) – The number of parallel processes that should be used to execute the join. Can be set to 1 to indicate sequential operation. Defaults to NaN to indicate that no choice has been made.

Raises:

ValueError – If either inner or outer is empty.

property inner: frozenset[TableReference]#

Get the inner relation of the join.

Returns:

The tables of the inner relation

Return type:

frozenset[TableReference]

property outer: frozenset[TableReference]#

Get the outer relation of the join.

Returns:

The tables of the outer relation

Return type:

frozenset[TableReference]

is_directional() bool#

Checks, whether this assignment contains directional information, i.e. regarding inner and outer relation.

Returns:

Whether the assignment explicitly denotes which relation should be the inner relationship and which relation should be the outer relationship

Return type:

bool

class postbound.PhysicalOperatorAssignment#

The physical operator assignment stores the operators that should be used for specific joins or scans.

The assignment can happen at different levels:

  • global_settings enable or disable specific operators for the entire query

  • join_operators and scan_operators are concerned with specific (joins of) base tables. These assignments overwrite the global settings, i.e. it is possible to assign a nested loop join to a specific set of tables, but disable NLJ globally. In this case, only the specified join will be executed as an NLJ and other algorithms are used for all other joins

  • intermediate_operators are used to pre-process the input for joins, e.g. by caching input tuples in a memo.

The basic assumption here is that for all joins and scans that have no assignment, the database system should determine the best operators by itself. Likewise, the database system is free to insert intermediate operators wherever it sees fit.

Although it is allowed to modify the different dictionaries directly, the high-level methods (e.g. add or set_join_operator) should be used instead. This ensures that all potential (future) invariants are maintained.

The assignment enables __getitem__ access and tries to determine the requested setting in an intelligent way, i.e. supplying a single base table will provide the associated scan operator, supplying an iterable of base tables the join operator and supplying an operator will return the global setting. If no item is found, None will be returned. __iter__ and __contains__ wrap scan and join operators and __bool__ checks for any assignment (global or specific). Notice that intermediate operators are not considered in the container-like methods.

global_settings#

Contains the global settings. Each operator is mapped to whether it is enable for the entire query or not. If an operator is not present in the dictionary, the default setting of the database system is used.

Type:

dict[ScanOperators | JoinOperators | IntermediateOperator, bool]

join_operators#

Contains the join operators that should be used for individual joins. All joins are identified by the base tables that they combine. If a join does not appear in this dictionary, the database system has to choose an appropriate operator (perhaps while considering the global_settings).

Type:

dict[frozenset[TableReference], JoinOperatorAssignment]

scan_operators#

Contains the scan operators that should be used for individual base table scans. Each scan is identified by the table that should be scanned. If a table does not appear in this dictionary, the database system has to choose an appropriate operator (perhaps while considering the global_settings).

Type:

dict[TableReference, ScanOperatorAssignment]

intermediate_operators#

Contains the intermediate operators that are used to pre-process the input for joins. Keys are the intermediate tables that are processed by the operator, i.e. an entry intermediate_operators[{R, S}] = Materialize means that the result of the join between R and S should be materialized and not that the input to the join between R and S should be materialized. Notice that intermediate operators are not enforced in conjunction with the join operators. For example, a merge join assignment between R and S does not require the presence of sort operators for R and S. Such interactions must be handled by the database hinting backend.

Type:

dict[frozenset[TableReference], IntermediateOperator]

get_globally_enabled_operators(include_by_default: bool = True) frozenset[ScanOperator | JoinOperator | IntermediateOperator]#

Provides all operators that are enabled globally.

This differs from just calling assignment.global_settings directly, since all operators are checked, not just the operators that appear in the global settings dictionary.

Parameters:

include_by_default (bool, optional) – The behaviour for operators that do not have a global setting set. If enabled, such operators are assumed to be enabled and are hence included in the set.

Returns:

The enabled scan and join operators. If no global setting is available for an operator include_by_default determines the appropriate action.

Return type:

frozenset[PhysicalOperator]

set_operator_enabled_globally(operator: ScanOperator | JoinOperator | IntermediateOperator, enabled: bool, *, overwrite_fine_grained_selection: bool = False) Self#

Enables or disables an operator for all parts of a query.

Parameters:
  • operator (PhysicalOperator) – The operator to configure

  • enabled (bool) – Whether the database system is allowed to choose the operator

  • overwrite_fine_grained_selection (bool, optional) – How to deal with assignments of the same operator to individual nodes. If True all assignments that contradict the setting are removed. For example, consider a situation where nested-loop joins should be disabled globally, but a specific join has already been assigned to be executed with an NLJ. In this case, setting overwrite_fine_grained_selection removes the assignment for the specific join. This is off by default, to enable the per-node selection to overwrite global settings.

Returns:

The assignment is updated in place and returned for chaining.

Return type:

PhysicalOperatorAssignment

set_join_operator(operator: JoinOperatorAssignment | JoinOperator, tables: Iterable[TableReference] | None = None) Self#

Enforces a specific join operator for the join that consists of the contained tables.

This overwrites all previous assignments for the same join. Global settings are left unmodified since per-join settings overwrite them anyway.

Parameters:
  • join_operator (JoinOperatorAssignment | JoinOperator) – The join operator. Can be an entire assignment, or just a plain operator. If a plain operator is supplied, the actual tables to join must be provided in the tables parameter.

  • tables (Iterable[TableReference], optional) – The tables to join. This parameter is only used if only a join operator without a proper assignment is supplied in the join_operator parameter. Otherwise it is ignored.

  • operator (JoinOperatorAssignment | JoinOperator)

Returns:

The assignment is updated in place and returned for chaining.

Return type:

PhysicalOperatorAssignment

Notes

You can also pass a DirectionalJoinOperatorAssignment to this method. In contrast to the normal assignment, this one also distinguishes between inner and outer relations of the join.

set_scan_operator(operator: ScanOperatorAssignment | ScanOperator, table: TableReference | Iterable[TableReference] | None = None) Self#

Enforces a specific scan operator for the contained base table.

This overwrites all previous assignments for the same table. Global settings are left unmodified since per-table settings overwrite them anyway.

Parameters:
  • scan_operator (ScanOperatorAssignment | ScanOperator) – The scan operator. Can be an entire assignment, or just a plain operator. If a plain operator is supplied, the actual table to scan must be provided in the table parameter.

  • table (TableReference | Iterable[TableReference], optional) – The table to scan. This parameter is only used if only a scan operator without a proper assignment is supplied in the scan_operator parameter. Otherwise it is ignored.

  • operator (ScanOperatorAssignment | ScanOperator)

Returns:

The assignment is updated in place and returned for chaining.

Return type:

PhysicalOperatorAssignment

set_intermediate_operator(operator: IntermediateOperator, tables: Iterable[TableReference]) Self#

Enforces an intermediate operator to process specific tables.

This overwrites all previous assignments for the same intermediate. Global settings are left unmodified since per-intermediate settings overwrite them anyway.

Parameters:
  • intermediate_operator (IntermediateOperator) – The intermediate operator

  • tables (Iterable[TableReference]) – The tables to process. Notice that these tables are not the tables that are joined, but the input to the join. For example, consider a neste-loop join between R and S where the tuples from S should be materialized (perhaps because they stem from an expensive index access). In this case, the assignment should contain a nested-loop assignment for the intermediate {R, S} and an assignment for the materialize operator for S.

  • operator (IntermediateOperator)

Returns:

The assignment is updated in place and returned for chaining.

Return type:

PhysicalOperatorAssignment

add(operator: ScanOperatorAssignment | JoinOperatorAssignment | ScanOperator | JoinOperator | IntermediateOperator, tables: TableReference | Iterable[TableReference] | None = None) Self#

Adds an arbitrary operator assignment to the current settings.

In contrast to the set_scan_operator and set_join_operator methods, this method figures out the correct assignment type based on the input.

Parameters:
  • operator (ScanOperatorAssignment | JoinOperatorAssignment | PhysicalOperator) – The operator to use. If this is a complete assignment, it is used as such. Otherwise, the tables parameter must contain the tables that are affected by the operator.

  • tables (TableReference | Iterable[TableReference] | None, optional) – The tables to join. This parameter is only used if a plain operator is supplied in the operator parameter. Otherwise it is ignored.

Returns:

The assignment is updated in place and returned for chaining.

Return type:

PhysicalOperatorAssignment

set(operator: ScanOperator | JoinOperator | IntermediateOperator, enabled: bool) Self#

Shortcut method for set_operator_enabled_globally.

Fine-grained selections are not overwritten.

Returns:

The assignment is updated in place and returned for chaining.

Return type:

PhysicalOperatorAssignment

Parameters:
merge_with(other_assignment: PhysicalOperatorAssignment) PhysicalOperatorAssignment#

Combines the current assignment with additional operators.

In case of assignments to the same operators, the settings from the other assignment take precedence. None of the input assignments are modified.

Parameters:

other_assignment (PhysicalOperatorAssignment) – The assignment to combine with the current assignment

Returns:

The combined assignment

Return type:

PhysicalOperatorAssignment

integrate_workers_from(params: PlanParameterization, *, fail_on_missing: bool = False) PhysicalOperatorAssignment#

Adds parallel workers from plan parameters to all matching operators.

Parameters:
  • params (PlanParameterization) – Parameters that provide the number of workers for specific intermediates

  • fail_on_missing (bool, optional) – Whether to raise an error if the plan parameters contain worker hints for an intermediate that does not have an operator assigned. The default is to just ignore such hints.

Returns:

The updated assignment. The original assignment is not modified.

Return type:

PhysicalOperatorAssignment

global_settings_only() PhysicalOperatorAssignment#

Provides an assignment that only contains the global settings.

Changes to the global settings of the derived assignment are not reflected in this assignment and vice-versa.

Returns:

An assignment of the global settings

Return type:

PhysicalOperatorAssignment

clone() PhysicalOperatorAssignment#

Provides a copy of the current settings.

Changes to the copy are not reflected back on this assignment and vice-versa.

Returns:

The copy

Return type:

PhysicalOperatorAssignment

get(intermediate: TableReference | Iterable[TableReference], default: T | None = None) ScanOperatorAssignment | JoinOperatorAssignment | T | None#

Retrieves the operator assignment for a specific scan or join.

This is similar to the dict.get method. An important distinction is that we never raise an error if there is no intermediate assigned to the operator. Instead, we return the default value, which is None by default.

Notice that this method never provides intermediate operators!

Parameters:
  • intermediate (TableReference | Iterable[TableReference]) – The intermediate to retrieve the operator assignment for. For scans, either the scanned table can be given directly, or the table can be wrapped in a singleton iterable.

  • default (Optional[T], optional) – The default value to return if no assignment is found. Defaults to None.

Returns:

The assignment if it was found or the default value otherwise.

Return type:

Optional[ScanOperatorAssignment | JoinOperatorAssignment | T]

postbound.operators_from_plan(query_plan: QueryPlan, *, include_workers: bool = False) PhysicalOperatorAssignment#

Extracts the operator assignment from a whole query plan.

Notice that this method only adds parallel workers to the assignment if explicitly told to, since this is generally better handled by the parameterization.

Parameters:
  • query_plan (QueryPlan)

  • include_workers (bool)

Return type:

PhysicalOperatorAssignment

Plan Parameters#

class postbound.PlanParameterization#

The plan parameterization stores metadata that is assigned to different parts of the plan.

Currently, three types of parameters are supported:

  • cardinalities provide specific cardinality estimates for individual joins or tables. These can be used to overwrite the estimation of the native database system

  • parallel_workers indicate how many worker processes should be used to execute individual joins or table scans (assuming that the selected operator can be parallelized). Notice that this can also be indicated as part of the PhysicalOperatorAssignment which will take precedence over this setting.

  • system_settings can be used to enable or disable specific optimization or execution features of the target database. For example, they can be used to disable parallel execution or switch to another cardinality estimation method. Such settings should be used sparingly since they defeat the purpose of optimization algorithms that are independent of specific database systems. Using these settings can also modify properties of the connection and therefore affect later queries. It is the users’s responsibility to reset such settings if necessary.

In addition, the execution_mode can be used to control whether the optimizer should only consider sequential plans or parallel plans. Note that the parallel_workers take precedence over this setting. If the optimizer should decide whether a parallel execution is beneficial, this should be set to None.

Although it is allowed to modify the different dictionaries directly, the more high-level methods should be used instead. This ensures that all potential (future) invariants are maintained.

cardinalities#

Contains the cardinalities for individual joins and scans. This is always the cardinality that is emitted by a specific operator. All joins are identified by the base tables that they combine. Keys of single tables correpond to scans. Each join should assume that all filter predicates that can be evaluated at this point have already been applied.

Type:

dict[frozenset[TableReference], Cardinality]

parallel_workers#

Contains the number of parallel processes that should be used to execute a join or scan. All joins are identified by the base tables that they combine. Keys of single tables correpond to scans. “Processes” does not necessarily mean “system processes”. The database system can also choose to use threads or other means of parallelization. This is not restricted by the join assignment.

Type:

dict[frozenset[TableReference], int]

system_settings#

Contains the settings for the target database system. The keys and values, as well as their usage depend entirely on the system. For example, in Postgres a setting like enable_geqo = ‘off’ can be used to disable the genetic optimizer.

Type:

dict[str, Any]

execution_mode#

Indicates whether the optimizer should only consider sequential plans, parallel plans, or leave the decision to the optimizer (None). The default is None.

Type:

ExecutionMode | None

cardinalities: dict[frozenset[TableReference], Cardinality]#

Contains the cardinalities for individual joins and scans. This is always the cardinality that is emitted by a specific operator. All joins are identified by the base tables that they combine. Keys of single tables correpond to scans. Each join should assume that all filter predicates that can be evaluated at this point have already been applied.

parallel_workers: dict[frozenset[TableReference], int]#

Contains the number of parallel processes that should be used to execute a join or scan. All joins are identified by the base tables that they combine. Keys of single tables correpond to scans. “Processes” does not necessarily mean “system processes”. The database system can also choose to use threads or other means of parallelization. This is not restricted by the join assignment.

system_settings: dict[str, Any]#

Contains the settings for the target database system. The keys and values, as well as their usage depend entirely on the system. For example, in Postgres a setting like enable_geqo = ‘off’ can be used to disable the genetic optimizer.

execution_mode: Literal['sequential', 'parallel'] | None#

Indicates whether the optimizer should only consider sequential plans, parallel plans, or leave the decision to the optimizer (None). The default is None.

add_cardinality(tables: Iterable[TableReference], cardinality: Cardinality) Self#

Assigns a specific cardinality hint to a (join of) tables.

Parameters:
  • tables (Iterable[TableReference]) – The tables for which the hint is generated. This can be an iterable of a single table, which denotes a scan hint.

  • cardinality (Cardinality) – The estimated or known cardinality.

Returns:

The parameters are updated in place and returned for chaining.

Return type:

PlanParameterization

set_workers(tables: Iterable[TableReference], num_workers: int) Self#

Assigns a specific number of parallel workers to a (join of) tables.

How these workers are implemented depends on the database system. They could become actual system processes, threads, etc.

Parameters:
  • tables (Iterable[TableReference]) – The tables for which the hint is generated. This can be an iterable of a single table, which denotes a scan hint.

  • num_workers (int) – The desired number of worker processes. This denotes the total number of processes, not an additional amount. For some database systems this is an important distinction since one operator node will always be created. This node is then responsible for spawning the workers, but can also take part in the actual calculation. To prevent one-off errors, we standardize this number to denote the total number of workers that take part in the calculation.

Returns:

The parameters are updated in place and returned for chaining.

Return type:

PlanParameterization

set_system_settings(setting_name: str = '', setting_value: Any = None, **kwargs) Self#

Stores a specific system setting.

This may happen in one of two ways: giving the setting name and value as two different parameters, or combining their assignment in the keyword parameters. While the first is limited to a single parameter, the second can be used to assign an arbitrary number of settings. However, this is limited to setting names that form valid keyword names.

Parameters:
  • setting_name (str, optional) – The name of the setting when using the separate key/value assignment mode. Defaults to an empty string to enable the integrated keyword parameter mode.

  • setting_value (Any, optional) – The setting’s value when using the separate key/value assignment mode. Defaults to None to enable the integrated keyword parameter mode.

  • kwargs – The key/value pairs in the integrated keyword parameter mode.

Returns:

The parameters are updated in place and returned for chaining.

Return type:

PlanParameterization

Raises:
  • ValueError – If both the setting_name as well as keyword arguments are given

  • ValueError – If neither the setting_name nor keyword arguments are given

Examples

Using the separate setting name and value syntax: set_system_settings("join_collapse_limit", 1) Using the kwargs syntax: set_system_settings(join_collapse_limit=1, jit=False) Both examples are specific to Postgres (see https://www.postgresql.org/docs/current/runtime-config-query.html).

merge_with(other_parameters: PlanParameterization) PlanParameterization#

Combines the current parameters with additional hints.

In case of assignments to the same hints, the values from the other parameters take precedence. None of the input parameterizations are modified.

Parameters:

other_parameters (PlanParameterization) – The parameterization to combine with the current parameterization

Returns:

The merged parameters. The original parameterizations are not modified.

Return type:

PlanParameterization

drop_workers() PlanParameterization#

Provides a copy of the current parameters without any parallel worker hints.

Changes to the copy are not reflected back on this parameterization and vice-versa.

Returns:

The copy without any parallel worker hints

Return type:

PlanParameterization

postbound.ExecutionMode#

alias of Literal[‘sequential’, ‘parallel’]

postbound.parameters_from_plan(query_plan: QueryPlan | LogicalJoinTree, *, target_cardinality: Literal['estimated', 'actual'] = 'estimated', fallback_estimated: bool = False) PlanParameterization#

Extracts the cardinality estimates from a join tree.

The join tree can be either a logical representation, in which case the cardinalities are extracted directly. Or, it can be a full query plan, in which case the cardinalities are extracted from the estimates or actual measurements. The cardinality source depends on the target_cardinality setting. If actual cardinalities should be used, but some nodes do only have estimates, these can be used as a fallback if fallback_estimated is set.

Parameters:
  • query_plan (QueryPlan | LogicalJoinTree)

  • target_cardinality (Literal['estimated', 'actual'])

  • fallback_estimated (bool)

Return type:

PlanParameterization