Query Plans#
- class postbound.QueryPlan(node_type: str | ScanOperator | JoinOperator | IntermediateOperator, *, operator: ScanOperator | JoinOperator | IntermediateOperator | None = None, children: QueryPlan | Iterable[QueryPlan] | None = None, plan_params: PlanParams | None = None, subplan: Subplan | None = None, estimates: PlanEstimates | None = None, measures: PlanMeasures | None = None, base_table: TableReference | None = None, filter_predicate: AbstractPredicate | None = None, parallel_workers: int | None = None, index: str | None = None, sort_keys: Sequence[SortKey] | None = None, lookup_key: SqlExpression | None = None, estimated_cardinality: Cardinality = NaN, estimated_cost: float = nan, actual_cardinality: Cardinality = NaN, execution_time: float = nan, cache_hits: int | None = None, cache_misses: int | None = None, subplan_root: QueryPlan | None = None, subplan_target_name: str = '', **kwargs)#
Models the structure of a query execution plan (QEP).
Query plans are constructed as a tree of operators. Each operator represents an entire query plan by itself. Hence, we use the QueryPlan to refer to the actual nodes in a hierarchical structure. Each node has a potentially large amount of metadata attached to it, e.g. regarding the table being scanned for scan nodes, the estimated cost of the operator or the actual cardinality of the result set. The different types of metadata are structured into three separate classes:
PlanParams contain all structural metadata about the operator, e.g. the table being scanned or the filter predicate.
PlanEstimates contain the optimizer’s view on the operator, e.g. the estimated cardinality and cost.
PlanMeasures contain the actual execution statistics of the operator, e.g. the actual cardinality and execution time.
Users are free to attach additional metadata to each of the containers to support there specific use-cases. However, these additional fields are typically not considered by the standard methods available on query plans. For example, if users store additional tables in the node, these are not considered in the tables method.
Each query plan can contain an arbitrary number of child nodes. This is true even for scans, to accomodate bitmap scans that combine an arbitrary amount of index lookups with a final scan. If just a single child is present, it can be set more expressively using the input_node property.
PostBOUND uses QEPs in two different ways: first, they can be used as the output of the optimization process (i.e. the optimization pipelines), being constructed by the different optimization stages. Second, they can also be extracted from an actual database system to encode the QEP that this system used to execute a specific query. This dichotomy leads to different granularities of query plans: actual database systems often have much more detailed QEPs. For example, Postgres represents a hash join as a hash join operator, whose inner child is a hash operator that constructs the hash table. The optimizer stages will typically not worry about such fine-grained details and simply demand a join to be executed as a hash join. To mitigate these issues, the query plans can be normalized by using the canonical method. This method removes all unnecessary details and only retains the join and scan operators.
When constructing a query plan, the metadata can be provided in two ways: either as instances of the corresponding metadata objects, or explicitly as keyword arguments to enable a more convenient usage. Notice however, that these two ways cannot be mixed: either all metadata of a specific type is provided as wrapper instance, or all metadata is provided as keyword arguments. Mixing is only allowed across different metadata types, e.g. providing the estimates as a PlanEstimates object and the measurements as keyword arguments.
In addition to the pre-defined metadata types, you can also add additional metadata as part of the kwargs. These will be added to the plan parameters (using the same mixing rules as the pre-defined types). Each query plan provides dict-like access to the plan parameters, estimates and measures, e.g.
plan["custom"] = 42,plan.get("custom", default), or"custom" in plan.Query plans provide rather extensive support methods to check their shape (e.g. is_linear() or is_bushy()), to aid with traversal (e.g. find_first_node() or find_all_nodes()) or to extract specific information (e.g. tables() or qerror()).
To convert between different optimization artifacts, a number of methods are available. For example, to_query_plan can be used to construct a query plan from a join order and a set of operators. Likewise, explode_query_plan converts the query plan back into join order, operators and parameters.
Query plans support len() (providing the plan depth without subplans) and iter() (providing all contained nodes including subplans).
- Parameters:
node_type (str | PhysicalOperator) – The name of the operator. If this is supplied as a physical operator, the name is inferred from it.
operator (Optional[PhysicalOperator], optional) – The actual operator that is used to compute the result set. This can be empty if there is no specific operator corresponding to the current node (e.g. for transient hash tables).
children (Optional[QueryPlan | Iterable[QueryPlan]], optional) – The input nodes of the current operator. For nodes without an input (e.g. most scans), this can simply be None or an empty list. Nodes with exactly one input node (e.g. most aggregations) can supply their input either directly as a plan object, or as a singleton list. Nodes with two input nodes (e.g. joins) should supply them as an ordered iterable with the outer child first.
plan_params (Optional[PlanParams], optional) – Structural metadata (e.g. parallel workers or accessed indexes) of the operator. If this is provided, no other plan parameters can be supplied as keyword arguments, including kwargs.
subplan (Optional[Subplan], optional) – A subquery that has to be executed as part of this node. If this is provided, no other subplan components can be supplied as keyword arguments.
estimates (Optional[PlanEstimates], optional) – The optimizer’s view on the operator (e.g. estimated cardinality and cost). If this is provided, no other estimates can be supplied as keyword arguments.
measures (Optional[PlanMeasures], optional) – The actual execution statistics of the operator (e.g. actual cardinality and execution time). If this is provided, no other measures can be supplied as keyword arguments.
base_table (Optional[TableReference], optional) – The table that is being scanned. This is only relevant for scan nodes and should be None for all other nodes. If this argument is used, no other plan parameters can be supplied in the plan_params argument.
filter_predicate (Optional[AbstractPredicate], optional) – An arbitrary predicate to restrict the allowed tuples in the output of a relation. This should be mostly used for join nodes and scans. If this argument is used, no other plan parameters can be supplied in the plan_params argument.
parallel_workers (Optional[int], optional) – The number of parallel workers that should be used to execute the operator. If this argument is used, no other plan parameters can be supplied in the plan_params argument.
index (Optional[str], optional) – The name of the index that should be used to scan the table. This is mostly relevant for scan nodes and should be None for all other nodes. If this argument is used, no other plan parameters can be supplied in the plan_params argument.
lookup_key (Optional[SqlExpression], optional) – The expression that is used to lookup tuples in some indexing structure. For scans, this could actually be the physical index. For intermediate operators such as hash tables or memoize nodes, this could be the expression that is used to build the table or to structure the memo. If this argument is used, no other plan parameters can be supplied in the plan_params argument.
sort_keys (Optional[Sequence[SortKey]], optional) – How the tuples in a the output of a relation are sorted. Absence of a specific sort order can be indicated either through an empty list or by setting this parameter to None. In this case, tuples are assumed to be in some random order. If this argument is used, no other plan parameters can be supplied in the plan_params argument.
estimated_cardinality (Cardinality, optional) – The estimated number of tuples that are produced by the operator. If no estimate is available, NaN can be used. If this argument is used, no other estimates can be supplied in the estimates argument.
estimated_cost (Cost, optional) – The approximate amount of abstract “work” that needs to be done to compute the result set of the operator. If no estimate is available, NaN can be used. If this argument is used, no other estimates can be supplied in the estimates argument.
actual_cardinality (Cardinality, optional) – The actual number of tuples that are produced by the operator. If no measurement is available, NaN can be used. If this argument is used, no other measures can be supplied in the measures argument.
execution_time (float, optional) – The total time (in seconds) that was spent to compute the result set of the operator. If no measurement is available, NaN can be used. If this argument is used, no other measures can be supplied in the measures argument.
cache_hits (Optional[int], optional) – The number of page reads that were satisfied by the shared buffer. If no measurement is available, None can be used. If this argument is used, no other measures can be supplied in the measures argument.
cache_misses (Optional[int], optional) – The number of page reads that had to be delegated to the disk and could not be satisfied by the shared buffer. If no measurement is available, None can be used. If this argument is used, no other measures can be supplied in the measures argument.
subplan_root (Optional[QueryPlan], optional) – The root operator of the subplan. If this argument is used, no other subplan components can be supplied in the subplan argument.
subplan_target_name (str, optional) – The name of the target table that the subplan should produce. If this argument is used, no other subplan components can be supplied in the subplan argument.
**kwargs – Additional metadata that should be attached to the plan parameters. If this is used, no other plan parameters can be supplied in the plan_params argument.
See also
to_query_plan,explode_query_plan,OptimizerInterface.query_plan,OptimizationPipeline.query_execution_plan- property node_type: str#
Get the name of the operator.
- property operator: ScanOperator | JoinOperator | IntermediateOperator | None#
Get the actual operator that is used to compute the result set.
For transient operators (e.g. hash tables), this can be None.
- property input_node: QueryPlan | None#
Get the input node of the current operator.
For nodes without an input (e.g. most scans), or nodes with multiple inputs (e.g. joins), this is None.
- property children: Sequence[QueryPlan]#
Get the input nodes of the current operator.
For nodes without an input (e.g. most scans), this is an empty list. For nodes with exactly one input (e.g. most aggregations), this is a singleton list. For nodes with two input nodes (e.g. joins), this is an ordered iterable with the outer child first.
- property outer_child: QueryPlan | None#
Get the outer input of the current operator.
For nodes that do not have exactly two inputs, this is None.
- property inner_child: QueryPlan | None#
Get the inner input of the current operator.
For nodes that do not have exactly two inputs, this is None.
- property params: PlanParams#
Get the structural metadata of the operator.
- property base_table: TableReference | None#
Get the table that is being scanned. For non-scan nodes, this will probably is None.
This is just a shorthand for accessing the plan parameters manually.
See also
- property filter_predicate: AbstractPredicate | None#
Get the filter predicate that is used to restrict the tuples in the output of a relation.
This is just a shorthand for accessing the plan parameters manually.
See also
- property sort_keys: Sequence[SortKey]#
Get the sort keys describing the ordering of tuples in the output relation.
Absence of a specific sort order is indicated by an empty sequence.
This is just a shorthand for accessing the plan parameters manually.
See also
- property lookup_key: SqlExpression | None#
Get the expression that is used to lookup tuples in some indexing structure.
This is just a shorthand for accessing the plan parameters manually.
See also
- property parallel_workers: int#
Get the number of parallel workers that should be used to execute the operator.
Absence of parallel execution is indicated by 0.
This is just a shorthand for accessing the plan parameters manually.
See also
- property estimates: PlanEstimates#
Get the optimizer’s view on the operator.
- property estimated_cardinality: Cardinality#
Get the cardinality estimate of the optimizer.
This is just a shorthand for accessing the estimates manually.
See also
- property estimated_cost: float#
Get the cost estimate of the optimizer.
This is just a shorthand for accessing the estimates manually.
See also
- property measures: PlanMeasures#
Get the actual execution statistics of the operator.
- property actual_cardinality: Cardinality#
Get the actual cardinality of the operator.
This is just a shorthand for accessing the measures manually.
See also
- property execution_time: float#
Get the actual execution time (in seconds) of the operator.
The execution time is cumulative, i.e. it includes the time spent in all child operators as well.
This is just a shorthand for accessing the measures manually.
See also
- property operator_time: float#
Get the actual execution time (in seconds) spent in this operator only.
The operator time excludes the time spent in child operators.
- Returns:
The operator execution time in seconds. If no measurement is available, NaN is returned.
- Return type:
float
- get(key: str, default: Any = None) Any#
Retrieves a specific parameter from the plan.
The lookup is performed in the following order:
Plan parameters
Plan estimates
Plan measures
If none of these containers contains the requested key, the default value is returned.
- Parameters:
key (str)
default (Any)
- Return type:
Any
- is_join() bool#
Checks, whether the current node is a join operator.
- Return type:
bool
- is_scan() bool#
Checks, whether the current node is a scan operator.
- Return type:
bool
- is_auxiliary() bool#
Checks, whether the current node is an arbitrary intermediate operator (i.e. not a join nor a scan).
- Return type:
bool
- is_analyze() bool#
Checks, whether the plan was executed in ANALYZE mode, i.e. whether runtime measurements are available.
- Return type:
bool
- is_ordered() bool#
Checks, whether the plan guarantees a specific order of the result tuples.
- Return type:
bool
- is_linear() bool#
Checks, whether the plan performs all joins in a linear order.
This is the case if all join nodes compute their result by joining at least one base table (no matter whether it is the inner or outer child) with another relation (base relation or intermediate).
As a special case, scan nodes are considered to be linear as well.
- Return type:
bool
- is_bushy() bool#
Checks, whether the plan performs joins in a bushy order.
This is the case if at least one join node joins two intermediates that are themselves the result of a join.
- Return type:
bool
- is_left_deep() bool#
Checks, whether the plan performs all joins in a left-deep order.
Left deep order means that the plan is linear and all joins are performed with the base table as the inner relation. As a special case, scan nodes are considered to be right-deep as well.
- Return type:
bool
- is_right_deep() bool#
Checks, whether the plan performs all joins in a right-deep order.
Right deep order means that the plan is linear and all joins are performed with the base table as the outer relation. As a special case, scan nodes are considered to be right-deep as well.
- Return type:
bool
- is_zigzag() bool#
Checks, whether the plan performs all joins in a zigzag order.
Zig-zag order means that the plan is linear, but neither left-deep nor right-deep. Therefore, at least one join has to be performed with the base table as the outer relation and another join with the base table as the inner relation. As a special case, scan nodes are considered to be zig-zag as well.
- Return type:
bool
- is_scan_branch() bool#
Checks, whether the current node directly leads to a scan node.
For example, the plan Hash(SeqScan(R)) is a scan branch, because the input of the hash node is a scan node. Likewise, the plan Aggregate(Sort(R)) is a scan branch, because the input of the aggregate node is just a sort node which in turn contains a scan node. On the other hand, the plan NestLoop(SeqScan(R), IdxScan(S)) is not a scan branch, because the nested-loop join contains two input nodes that are both scans.
If a plan is a scan branch, fetch_base_table() can be used to directly retrieve the base table that is being scanned.
- Return type:
bool
- is_base_join() bool#
Checks, whether the current node is a join node that joins two base tables.
The base tables do not need to be direct children of the join, but both at least have to be scan branches, as in the case of MergeJoin(Sort(SeqScan(R)), IdxScan(S)).
See also
- Return type:
bool
- plan_depth() int#
Calculates the depth of the query plan.
The depth of a query plan is the length of the longest path from the root to a leaf node. The leaf node is included in the calculation, i.e. the depth of the plan SeqScan(R) is 1.
- Return type:
int
- fetch_base_table() TableReference | None#
Retrieves the base table that is being scanned by the plan.
The base table is only specified for plans that directly lead to a scan node, as defined by is_scan_branch().
- Return type:
TableReference | None
- outermost_scan() QueryPlan | None#
Retrieves the scan node that is furthest to the “left”, i.e. on the outer-most position in the plan.
- Return type:
QueryPlan | None
- tables() set[TableReference]#
Provides all tables that are accessed at some point in the plan.
Notice that tables that are only accessed as part of user-specific metadata are not considered.
- Return type:
set[TableReference]
- columns() set[ColumnReference]#
Provides all columns that are accessed at some point in the plan.
Notice that columns that are only accessed as part of user-specific metadata are not considered.
- Return type:
set[ColumnReference]
- iternodes() Iterable[QueryPlan]#
Provides all nodes that are contained in the plan in depth-first order, prioritizing outer child nodes.
- Return type:
Iterable[QueryPlan]
- lookup(tables: TableReference | Iterable[TableReference]) QueryPlan | None#
Traverse the plan to find a specific intermediate node.
If two nodes compute the same intermediate (i.e. provide the same tables), the node that is higher up in the plan is returned. If both appear on the same level, the outer child is preferred.
- Parameters:
tables (TableReference | Iterable[TableReference]) – The tables that should be contained in the intermediate. If a single table is provided (either as-is or as a singleton iterable), the corresponding scan node will be returned. If multiple tables are provided, the highest node that provides all of them exactly is returned.
- Returns:
The join tree node that contains the specified tables. If no such node exists, None is returned.
- Return type:
Optional[QueryPlan]
- find_first_node(predicate: Callable[[QueryPlan], bool], *args, direction: Literal['inner', 'outer'] = 'outer', **kwargs) QueryPlan | None#
Recursively searches for the first node that matches a specific predicate.
- Parameters:
predicate (Callable[[QueryPlan], bool]) – The predicate to check. The predicate is called on each node in the tree and should return a True-ish value if the node matches the desired search criteria.
direction (JoinDirection, optional) – The traversal strategy to use. Outer (the default) indicates that the outer child should be traversed first if the check on the parent node fails. Inner indicates the opposite.
args – Additional positional arguments that are passed to the predicate after the current node.
kwargs – Additional keyword arguments that are passed to the predicate.
- Returns:
The first node that matches the predicate. If no such node exists, None is returned.
- Return type:
Optional[QueryPlan]
- find_all_nodes(predicate: Callable[[QueryPlan], bool], *args, **kwargs) Iterable[QueryPlan]#
Recursively searches for all nodes that match a specific predicate.
The order in which the matching nodes appear is an implementation detail and should not be relied upon.
- Parameters:
predicate (Callable[[QueryPlan], bool]) – The predicate to check. The predicate is called on each node in the tree and should return a True-ish value if the node matches the desired search criteria.
args – Additional positional arguments that are passed to the predicate after the current node.
kwargs – Additional keyword arguments that are passed to the predicate.
- Returns:
All nodes that match the predicate. If no such nodes exist, an empty iterable is returned.
- Return type:
Iterable[QueryPlan]
- cout(*, include_auxiliaries: bool = True) float#
Computes the C-out value of the operator.
The C-out value is the sum of the cardinalities of the current operator and all its children.
If the plan does not contain a measurement of the actual cardinality, the C-out value is undefined (indicated as NaN).
- Parameters:
include_auxiliaries (bool, optional) – Whether to include auxiliary nodes in the computation (which is the default). If disabled, only the actual cardinality of join and scan nodes is considered.
- Return type:
float
- qerror() float#
Computes the Q-error of the operator.
If the plan does not contain an estimate of the cardinality, the Q-error value is undefined (indicated as NaN).
Notes
We use the a slight deviation from the standard definition:
\[qerror(e, a) = \frac{max(e, a) + 1}{min(e, a) + 1}\]where e is the estimated cardinality of the node and a is the actual cardinality. Notice that we add 1 to both the numerator as well as the denominator to prevent infinity errors for nodes that do not process any rows (e.g. due to pruning).
- Return type:
float
- parallelize(workers: int) QueryPlan#
Creates a parallel version of the query plan with the specified number of workers.
The original plan is not changed.
- Parameters:
workers (int)
- Return type:
- with_estimates(*, cardinality: Cardinality | None = None, cost: float | None = None, keep_measures: bool = False) QueryPlan#
Replaces the current estimates of the operator with new ones.
- Parameters:
cardinality (Optional[Cardinality], optional) – The new estimated cardinality of the operator. If the estimate should be dropped NaN can be used. If the current cardinality should be kept, None can be passed (which is the default).
cost (Optional[Cost], optional) – The new estimated cost of the operator. If the estimate should be dropped, NaN can be used. If the current cost should be kept, None can be passed (which is the default).
keep_measures (bool, optional) – Whether to keep the actual measurements of the operator. If this is set to False, the actual cardinality and execution time are dropped. Measures are dropped by default because they usually depend on the estimates (which are now changed).
- Return type:
- with_actual_card(*, cost_estimator: Callable[[QueryPlan, Cardinality], float] | None = None, ignore_nan: bool = True) QueryPlan#
Replaces the current estimates of the operator with the actual measurements.
The updated plan will not contain any measurements anymore and the costs will be set to Nan unless an explicit cost estimator is provided.
- Parameters:
cost_estimator (Optional[Callable[[QueryPlan, Cardinality], Cost]], optional) – An optional cost function to compute new estimates based on the new estimates. If no cost estimator is provided, the cost is set to NaN. The estimator receives the old plan now along with the new cardinality estimate as input and should return the new cost estimate.
ignore_nan (bool, optional) – Whether NaN cardinalities should also be swapped. By default, this is set to True, which only replaces the estimated cardinality if the actual cardinality is a meaningful value.
- Returns:
A new query plan with the actual cardinality as the estimated cardinality and the actual execution time as the estimated cost. The current plan is not changed.
- Return type:
- scale_cardinality(factor: float, *, kind: Literal['estimated', 'actual', 'both'] = 'estimated', recursive: bool = False) QueryPlan#
Scales the cardinality estimate of the operator by a specific factor.
- Parameters:
factor (float) – The factor by which the cardinality estimate should be scaled. For example, a factor of 0.5 reduces the estimate by half, while a factor of 2.0 doubles the estimate.
kind (Literal["estimated", "actual", "both"], optional) – Specifies which cardinality estimates should be scaled. If set to estimated (the default), only the estimated cardinality is scaled. If set to actual, only the actual cardinality is scaled. If set to both, both estimates are scaled.
recursive (bool, optional) – Whether to apply the scaling recursively to all child nodes. By default, this is set to False, which only scales the estimate of the current node.
- Returns:
A new query plan with the scaled cardinality estimate. The current plan is not changed.
- Return type:
- canonical() QueryPlan#
Creates a normalized version of the query plan.
This normalized version will only contain scan and join nodes, without any auxiliary nodes. Estimates and measurements of these nodes are kept as they are.
This method is mostly intended to remove system-specific elements of the QEP and provide a more stable representation. For example, Postgres uses a combination of Hash join node and Hash node to represent an actual hash join. Likewise, bitmap scans are represented as a bitmap heap scan with a number of bitmap index scans (and optional bitmap ANDs and ORs) as child nodes. With canonical all of these “implementation details” are removed and only the core of the query plan is kept.
Notice that aggregations and groupings are also auxiliary nodes and will not be available after canonicalization. Therefore, the cost of the canonical query plan might be less than the cost of the original plan.
- Return type:
- inspect(*, fields: Iterable[str] | None = None) str#
Provides a human-readable representation of the query plan, inspired by Postgre’s EXPLAIN output.
By default, the output will contain fields akin to the EXPLAIN output of Postgres. For example, this includes the estimated cardinality and the operator cost, or for ANALYZE plans also the actual measurements.
This can be customized by providing a list of fields that should be included in the output. The fields can either reference properties of the plan itself (e.g.
estimated_cardinality) or of a redirection to the metadata properties (e.g.params.index). However, the current implementation only supports a single level of indirection, i.e. noparams.custom_property.one_more_level.- Parameters:
fields (Iterable[str] | None)
- Return type:
str
- plan_summary() dict[str, object]#
Provides a quick summary of important properties of the query plan, inspired by Panda’s describe method.
- Return type:
dict[str, object]
- ast() str#
Provides the tree-structure of the plan in a human-readable format.
- Return type:
str
- class postbound.PlanParams(*, base_table: TableReference | None = None, filter_predicate: AbstractPredicate | None = None, sort_keys: Sequence[SortKey] | None = None, parallel_workers: int | None = None, index: str | None = None, lookup_key: SqlExpression | None = None, **kwargs)#
Plan parameters contain additional “structural” metadata about the operators in a query plan.
This information is mostly concerned with how the operator should function, e.g. which table it should scan, or which index to use or how the tuples will be sorted.
In addition to the pre-defined attributes, users can attach arbitrary metadata using a dict-like access into the parameters, e.g.
params["custom"] = 42.- Parameters:
base_table (Optional[TableReference], optional) – For scan nodes, this is the table being scanned. For all other nodes this is should be None.
filter_predicate (Optional[AbstractPredicate], optional) – An arbitrary predicate to restrict the allowed tuples in the output of a relation. This should be mostly used for join nodes and scans.
sort_keys (Optional[Sequence[SortKey]], optional) – How the tuples in a the output of a relation are sorted. Absence of a specific sort order can be indicated either through an empty list or by setting this parameter to None. In this case, tuples are assumed to be in some random order.
parallel_workers (Optional[int], optional) – The number of parallel workers that should be used to execute the operator. The underlying processing model assumes that there exists some sort of main operator process which spawns additional worker processes. The worker processes will compute the output relation together with the main process. Hence, if some relation should be processed by two processes in parallel, the proper value for this parameter would be 1 (the main process and one additional worker). It is up to the actual execution engine to decide whether a lower number of workers has to be used.
index (Optional[str], optional) – The name of the index that should be used to scan the table. This is only relevant for scan nodes and should be None for all other nodes.
lookup_key (Optional[SqlExpression], optional) – The expression that is used to lookup tuples in some indexing structure. For scans, this could actually be the physical index. For intermediate operators such as hash tables or memoize nodes, this could be the expression that is used to build the table or to structure the memo.
**kwargs – Additional metadata that should be attached to the plan parameters.
- property base_table: TableReference | None#
Get the base table that is being scanned. For non-scan nodes, this is None.
- property filter_predicate: AbstractPredicate | None#
Get the filter predicate that is used to restrict the tuples in the output of a relation.
For join nodes this would be the join condition and for scan nodes this would be the filter conditions from the WHERE clause. However, if the optimizer decides to delay the evaluation of some filter, or some filters need to be evaluated multiple times (e.g. recheck conditions in Postgres), this predicate can be more complex.
- property sort_keys: Sequence[SortKey]#
Get the sort keys describing the ordering of tuples in the output relation.
Absence of a specific sort order is indicated by an empty sequence.
- property parallel_workers: int#
Get the number of parallel workers that should be used to execute the operator.
The underlying processing model assumes that there exists some sort of main operator process which spawns additional worker processes. The worker processes will compute the output relation together with the main process. Hence, if some relation should be processed by two processes in parallel, the proper value for this parameter would be 1 (the main process and one additional worker).
It is up to the actual execution engine to decide whether a lower number of workers has to be used.
Absence of parallelism is indicated by 0.
- property index: str#
Get the name of the index that should be used to scan the table.
Absence of an index is indicated by an empty string.
- property lookup_key: SqlExpression | None#
Get the expression that is used to lookup tuples in some indexing structure.
For scans, this could actually be the physical index. In this case, the lookup expression should be the one that is used to build the index, e.g., the primary key column. For intermediate operators such as hash tables or memoize nodes, this could be the expression that is used to build the table or to structure the memo.
- tables() set[TableReference]#
Provide all tables that are referenced at some point in the plan parameters.
This includes only the well-defined properties available to all parameterizations, i.e. base_table and filter_predicate. If users decide to store additional metadata with table information in the parameters, these are not retained here.
- Returns:
The tables
- Return type:
set[TableReference]
- columns() set[ColumnReference]#
Provides all columns that are referenced at some point in the plan parameters.
This includes only the well-defined properties available to all parameterizations, i.e. just the filter_predicate. If users decide to store additional metadata with column information in the parameters, these are not retained here.
- Returns:
The columns
- Return type:
set[ColumnReference]
- get(key: str, default: Any = None) Any#
Retrieves the value of a specific key from the parameters.
This is similar to the dict.get method. An important distinction is that we never raise an error if there is no parameter with the given key. Instead, we return the default value, which is None by default.
- Parameters:
key (str) – The parameter name
default (Any, optional) – The default value to return if the parameter is not found. Defaults to None.
- Returns:
The parameter value if it exists, otherwise the default value.
- Return type:
Any
- items() Iterable[tuple[str, Any]]#
Provides all metadata that is currently stored in the parameters as key-value pairs, similar to dict.items
- Return type:
Iterable[tuple[str, Any]]
- clone(*, deep: bool = False) PlanParams#
Creates a copy of the current plan parameters.
- Parameters:
deep (bool, optional) – Whether to create a deep copy of all parameters. Defaults to False.
- Returns:
The copied parameters.
- Return type:
- class postbound.PlanEstimates(*, cardinality: Cardinality = NaN, cost: float = nan, **kwargs)#
Plan estimates provide the optimizer’s view on a specific (sub-)plan.
This includes the estimated cardinality and cost of the plan. The cardinality is the number of tuples that are expected to be produced by the operator, while the cost is a measure of the resources that are consumed by the operator. Costs do not have a specific unit and it is the user’s obligation to ensure that they are used in a sound way. Most importantly, this means that only costs from the same source should be compared since most database systems interpret costs in a different way.
In addition to the pre-defined attributes, users can attach arbitrary metadata using a dict-like access into the parameters, e.g.
estimates["custom"] = 42.- Parameters:
cardinality (Cardinality, optional) – The estimated number of tuples that are produced by the operator. If no estimate is available, NaN can be used.
cost (Cost, optional) – The approximate amount of abstract “work” that needs to be done to compute the result set of the operator. If no estimate is available, NaN can be used.
**kwargs – Additional metadata that should be attached to the plan estimates.
Notes
In case of parallel execution, all measures should be thought of “meaningful totals”, i.e. the cardinality numbers are the total number of tuples produced by all workers. The execution time should denote the wall time, it took to execute the entire operator (which just happened to include parallel processing), not an average of the worker execution time or some other measure.
- property cardinality: Cardinality#
Get the estimated cardinality of the operator. Can be NaN if no estimate is available.
- property cost: float#
Get the estimated cost of the operator. Can be NaN if no estimate is available.
- property additional_estimates: dict[str, Any]#
Get all additional estimates that are not part of the well-defined attributes.
- get(key: str, default: Any = None) Any#
Retrieves the value of a specific key from the estimates.
This is similar to the dict.get method. An important distinction is that we never raise an error if there is no parameter with the given key. Instead, we return the default value, which is None by default.
- Parameters:
key (str) – The parameter name
default (Any, optional) – The default value to return if the parameter is not found. Defaults to None.
- Returns:
The parameter value if it exists, otherwise the default value.
- Return type:
Any
- items() Iterable[tuple[str, Any]]#
Provides all estimates as key-value pairs, similar to the dict.items method.
- Return type:
Iterable[tuple[str, Any]]
- clone(*, deep: bool = False) PlanEstimates#
Creates a copy of the current plan estimates.
- Parameters:
deep (bool, optional) – Whether to create a deep copy of all estimates. Defaults to False.
- Returns:
The copied estimates.
- Return type:
- class postbound.PlanMeasures(*, cardinality: Cardinality = NaN, execution_time: float = nan, cache_hits: int | None = None, cache_misses: int | None = None, **kwargs)#
Plan measures provide actual execution statistics of a specific (sub-)plan.
Typically, this includes the actual cardinality of the result set as well as the execution time of the operator. Additionally, information about cache hits and misses for the shared buffer can be provided.
Other than the pre-defined attributes, users can attach arbitrary metadata using a dict-like access into the parameters, e.g.
measures["custom"] = 42.- Parameters:
cardinality (Cardinality, optional) – The actual number of tuples that are produced by the operator. If no measurement is available, NaN can be used.
execution_time (float, optional) – The total time (in seconds) that was spent to compute the result set of the operator. If no measurement is available, NaN can be used.
cache_hits (Optional[int], optional) – The number of page reads that were satisfied by the shared buffer. If no measurement is available, None can be used.
cache_misses (Optional[int], optional) – The number of page reads that had to be delegated to the disk and could not be satisfied by the shared buffer. If no measurement is available, None can be used.
**kwargs – Additional metadata that should be attached to the plan measures.
Notes
In case of parallel execution, all measures should be thought of “meaningful totals”, i.e. the cardinality numbers are the total number of tuples produced by all workers. The execution time should denote the wall time, it took to execute the entire operator (which just happened to include parallel processing), not an average of the worker execution time or some other measure.
- property cardinality: Cardinality#
Get the actual cardinality of the operator. Can be NaN if no measurement is available.
- property execution_time: float#
Get the actual execution time of the operator. Can be NaN if no measurement is available.
- property cache_hits: int | None#
Get the number of page reads that were satisfied by the shared buffer.
If no measurement is available, None is returned.
- property cache_misses: int | None#
Get the number of page reads that had to be delegated to the disk and could not be satisfied by the shared buffer.
If no measurement is available, None is returned.
- property additional_measures: dict[str, Any]#
Get all additional estimates that are not part of the well-defined attributes.
- get(key: str, default: Any = None) Any#
Retrieves the value of a specific key from the measures.
This is similar to the dict.get method. An important distinction is that we never raise an error if there is no parameter with the given key. Instead, we return the default value, which is None by default.
- Parameters:
key (str) – The parameter name
default (Any, optional) – The default value to return if the parameter is not found. Defaults to None.
- Returns:
The parameter value if it exists, otherwise the default value.
- Return type:
Any
- items() Iterable[tuple[str, Any]]#
Provides all measures as key-value pairs, similar to the dict.items method.
- Return type:
Iterable[tuple[str, Any]]
- clone(*, deep: bool = False) PlanMeasures#
Creates a copy of the current plan measures.
- Parameters:
deep (bool, optional) – Whether to create a deep copy of all measures. Defaults to False.
- Returns:
The copied measures.
- Return type:
- class postbound.Subplan(root: QueryPlan, target_name: str = '')#
Subplans are used to model subqueries whose results are used while processing another operator in the main query.
A typical example are correlated/dependent subqueries that are used in some predicate and need to be evaluated for each tuple of the outer relation (unless some algebraic optimization has been applied beforehand).
- Parameters:
root (QueryPlan)
target_name (str)
- target_name#
The name of the target table that the subplan should produce
- Type:
str
- tables() set[TableReference]#
Provide all tables that are referenced at some point in the subplan.
- Returns:
The tables. This set includes the target table that the subplan produces as a virtual table.
- Return type:
set[TableReference]
- class postbound.SortKey(columns: Iterable[SqlExpression], *, ascending: bool = True)#
Sort keys describe how the tuples in a relation are sorted.
Each sort key contains a set of columns that describe the equivalence class of the sort key, i.e. the column values in each row are all equal to one another. Therefore, the relation can be treated as being sorted by any of them.
Most commonly, relations will only be sorted by a single column (which can be checked by calling len() on the sort key, or by checking the equivalence_class directly). In this case, the column property can be used to retrieve the corresponding expression that forms the column.
To check, whether two sort keys are equivalent, the is_compatible_with method can be used. For more idiomatic access,
column in sort_keyis also supported.To create a new equivalence class, the for_equivalence_class(columns) method is available to create a new sort key from scratch. To combine two existing sort keys, merge_with can be used.
- Parameters:
columns (Iterable[SqlExpression]) – The column(s) that is used to sort the tuples. This will usually contain plain column references (ColumnExpression), but can also use more complex expressions.
ascending (bool) – Whether the sorting is ascending or descending. Defaults to ascending.
- static of(column: SqlExpression | ColumnReference, *, ascending: bool = True) SortKey#
Creates a new sort key for a single column.
- Parameters:
column (SqlExpression | ColumnReference) – The column that is used to sort the tuples. Can be a plain column reference, which will be wrapped by a ColumnExpression automatically.
ascending (bool, optional) – Whether the sorting is ascending or descending. Defaults to ascending.
- Returns:
The sort key with an equivalence class for the single column.
- Return type:
- static for_equivalence_class(members: Iterable[SqlExpression | ColumnReference], *, ascending: bool = True) SortKey#
Creates a new sort key for an equivalence class of columns.
This is just a more expressive alias for calling the constructor directly. This method assumes that the values for all columns in the equivalence class are equal to one another. The client is responsible for ensuring and checking that this is actually the case.
- Parameters:
members (Iterable[SqlExpression | ColumnReference]) – The columns that describe the sorting of the relation. This can contain just a single item, in which case the method is pretty much the same as of. Any passed ColumnReference will be wrapped in a ColumnExpression.
ascending (bool, optional) – Whether the sorting is ascending or descending. Defaults to ascending.
- Returns:
The sort key with an equivalence class for the columns.
- Return type:
- property column: SqlExpression#
For single-column sort keys, get this column.
- property equivalence_class: frozenset[SqlExpression]#
Get all columns that are part of the equivalence class. This will be 1 or more columns.
- property ascending: bool#
Get the sort direction of this key.
- is_compatible_with(other: SortKey | ColumnReference) bool#
Checks, whether two keys are sorted the same way.
For single column references, this essentially checks whether the column is part of the key’s equivalence class.
- Parameters:
other (SortKey | ColumnReference)
- Return type:
bool