Random Optimizers#
Provides “optimization” strategies that generate random query plans.
- class postbound.opt.randomized.RandomJoinOrderGenerator(eliminate_duplicates: bool = False, *, tree_structure: Literal['bushy', 'right-deep', 'left-deep'] = 'bushy')#
Utility service to produce randomized join orders for an input query.
The service produces a generator that in turn provides the join orders. This is done in the random_join_orders_for method. The provided join orders can include linear, as well as bushy, join orders. The structure can be customized during service creation.
- Parameters:
eliminate_duplicates (bool, optional) – Whether repeated calls to the generator should be guaranteed to provide different join orders. Defaults to
False, which permits duplicates.tree_structure (Literal[bushy, left-deep, right-deep], optional) – The kind of join orders that are generated by the service. “bushy” allows join orders with arbitrary branches to be generated (including linear join orders). “right-deep” and “left-deep” restrict the join orders to the respective linear trees. Defaults to “bushy”.
Warning
For now, the underlying algorithm is limited to queries without cross-products.
- random_join_orders_for(query: SqlQuery, *, base_table: TableReference | None = None) Generator[JoinTree, None, None]#
Provides a generator that successively provides join orders at random.
- Parameters:
query (SqlQuery) – The query for which the join orders should be generated
base_table (Optional[TableReference], optional) – An optional table that should always be joined first. If unspecified, base tables are selected at random.
- Yields:
Generator[JoinTree, None, None] – A generator that produces random join orders for the input query. The structure of these join orders depends on the service configuration. Consult the class-level documentation for more details. Depeding on the eliminate_duplicates attribute, the join orders are guaranteed to be unique.
- Raises:
ValueError – If the query contains cross products.
- Return type:
Generator[JoinTree, None, None]
Warning
For now, the underlying algorithm is limited to queries without cross-products.
- class postbound.opt.randomized.RandomJoinOrderOptimizer(*args, **kwargs)#
Optimization stage that produces a randomized join order.
This class acts as a wrapper around a RandomJoinOrderGenerator for the join optimization interface. The setup of the generator can be customized during creation of the optimizer. Consult the documentation of the generator for details.
- Parameters:
generator_args (Optional[dict], optional) – Arguments to customize the generator operation. All parameters are forwarded to its
__init__method.
See also
- optimize_join_order(query: SqlQuery) JoinTree | None#
Performs the actual join ordering process.
The join tree can be further annotated with an initial operator assignment, if that is an inherent part of the specific optimization strategy. However, this is generally discouraged and the multi-stage pipeline will discard such operators to prepare for the subsequent physical operator selection.
Other than the join order and operator assignment, the algorithm should add as much information to the join tree as possible, e.g. including join conditions and cardinality estimates that were calculated for the selected joins. This enables other parts of the optimization process to re-use that information.
- Parameters:
query (SqlQuery) – The query to optimize
- Returns:
The join order. If for some reason there is no valid join order for the given query (e.g. queries with just a single selected table), None can be returned. Otherwise, the selected join order has to be described using a JoinTree.
- Return type:
Optional[LogicalJoinTree]
- describe() dict#
Provides a JSON-serializable representation of the specific strategy, as well as important parameters.
- Returns:
The description
- Return type:
jsondict
See also
OptimizationPipeline.describe
- pre_check() OptimizationPreCheck#
Provides requirements that input query or database system have to satisfy for the optimizer to work properly.
- Returns:
The check instance. Can be an empty check if no specific requirements exist.
- Return type:
- class postbound.opt.randomized.RandomOperatorGenerator(scan_operators: Iterable[ScanOperator] | None = None, join_operators: Iterable[JoinOperator] | None = None, *, include_scans: bool = True, include_joins: bool = True, eliminate_duplicates: bool = False, database: Database | None = None)#
Utility service to generate random assignments of physical operators for a join order.
The service produces a generator that in turn provides the operator assignments. This is done in the random_operator_assignments_for method. The precise properties of the generated assignments depends on the configuration of this service. It can be set up to only use a subset of the available operators or to exclude operators for scans or joins completely. By default, the service uses all operators that are supported by the target database system.
- Parameters:
scan_operators (Optional[Iterable[ScanOperators]], optional) – The scan operators that can be used in the query plans. If this is
Noneor empty, all scans supported by the database are used. Likewise, if the iterable contains an operator that is not supported by the database, it is exlcuded from generation.join_operators (Optional[Iterable[JoinOperators]], optional) – The join operators that can be used in the query plans. If this is
Noneor empty, all joins supported by the database are used. Likewise, if the iterable contains an operator that is not supported by the database, it is exlcuded from generation.include_scans (bool, optional) – Whether the assignment should contain scan operators at all. By default, this is enabled. However, if scans are disabled, this overwrites any supplied operators in the scan_operators parameter.
include_joins (bool, optional) – Whether the assignment should contain join operators at all. By default, this is enabled. However, if joins are disabled, this overwrites any supplied operators in the join_operators parameter.
eliminate_duplicates (bool, optional) – Whether repeated calls to the generator should be guaranteed to provide different operator assignments. Defaults to
False, which permits duplicates.database (Optional[db.Database], optional) – The database that should execute the queries in the end. The database connection is necessary to determine the operators that are actually supported by the system. If this parameter is omitted, it is inferred from the DatabasePool.
- Raises:
ValueError – If both scans and joins are disabled
- random_operator_assignments_for(query: SqlQuery, join_order: JoinTree) Generator[PhysicalOperatorAssignment, None, None]#
Produces a generator for random operator assignments of the allowed operators.
The precise structure of the operator assignments depends on the service configuration. Take a look at the class documentation for details.
- Parameters:
query (SqlQuery) – The query to “optimize”
join_order (jointree.JoinTree) – The join sequence to use. This contains all required tables to be scanned and joins to be performed.
- Yields:
Generator[PhysicalOperatorAssignment, None, None] – A generator producing random operator assignments. The assignments will not contain any cost estimates, nor will they specify join directions or parallization data.
- Return type:
Generator[PhysicalOperatorAssignment, None, None]
- necessary_hints() frozenset[ScanOperator | JoinOperator | IntermediateOperator]#
Provides all hints that a database system must support in order for the generator to work properly.
- Returns:
The required operator hints
- Return type:
frozenset[PhysicalOperator]
- class postbound.opt.randomized.RandomOperatorOptimizer(*args, **kwargs)#
Optimization stage that produces a randomized operator assignment.
This class acts as a wrapper around a RandomOperatorGenerator for the operator optimization interface. The setup of the generator can be customized during creation of the optimizer. Consult the documentation of the generator for details.
- Parameters:
generator_args (Optional[dict], optional) – Arguments to customize the generator operation. All parameters are forwarded to its
__init__method.
See also
- select_physical_operators(query: SqlQuery, join_order: JoinTree | None) PhysicalOperatorAssignment#
Performs the operator assignment.
- Parameters:
- Returns:
The operator assignment. If for some reason no operators can be assigned, an empty assignment can be returned
- Return type:
Notes
The operator selection should handle a None join order gracefully. This can happen if the query does not require any joins (e.g. processing of a single table.
Depending on the specific optimization settings, it is also possible to raise an error if such a situation occurs and there is no reasonable way to deal with it.
- describe() dict#
Provides a JSON-serializable representation of the specific strategy, as well as important parameters.
- Returns:
The description
- Return type:
jsondict
See also
OptimizationPipeline.describe
- pre_check() OptimizationPreCheck#
Provides requirements that input query or database system have to satisfy for the optimizer to work properly.
- Returns:
The check instance. Can be an empty check if no specific requirements exist.
- Return type:
- class postbound.opt.randomized.RandomPlanGenerator(*, eliminate_duplicates: bool = False, join_order_args: dict | None = None, operator_args: dict | None = None, database: Database | None = None)#
Utility service to provide random exection plans for a query.
This service combines the RandomJoinOrderGenerator and RandomOperatorGenerator into a single high-level service. Therefore, it underlies the same restrictions as these two services. The produced generator can be accessed via the random_plans_for method.
- Parameters:
eliminate_duplicates (bool, optional) – Whether repeated calls to the generator should be guaranteed to provide different plans. Defaults to
False, which permits duplicates. This setting can be overwritten on a per-generator basis by specifying it in the dedicated generator arguments.join_order_args (Optional[dict], optional) – Configuration for the RandomJoinOrderGenerator. This is forwarded to the service’s
__init__method.operator_args (Optional[dict], optional) – Configuration for the RandomOperatorGenerator. This is forwarded to the service’s
__init__method.database (Optional[db.Database], optional) – The database for the operator selection. This parameter can also be specified in the operator generator arguments, or even left completely unspecified.
- class postbound.opt.randomized.RandomPlanOptimizer(*args, **kwargs)#
Optimization stage that produces a random query plan.
This class acts as a wrapper around a RandomPlanGenerator and passes all its arguments to that service.
- Parameters:
join_order_args (Optional[dict], optional) – Configuration for the RandomJoinOrderGenerator. This is forwarded to the service’s
__init__method.operator_args (Optional[dict], optional) – Configuration for the RandomOperatorGenerator. This is forwarded to the service’s
__init__method.database (Optional[db.Database], optional) – The database for the operator selection. This parameter can also be specified in the operator generator arguments, or even left completely unspecified.
See also
Notes
It is not necessary to request duplicate elimination for any of the generators, since the underlying Python generator objects cannot be re-used between multiple optimization passes for the same input query. Therefore, it is not possible to enforce duplicate elimination for a join order or operator assignment.
Because multiple calls to optimizer with the same input query should not influence each other, the optimizer also does not provide its own duplicate elimination.
- optimize_query(query: SqlQuery) QueryPlan#
Constructs the optimized execution plan for an input query.
- describe() dict#
Provides a JSON-serializable representation of the specific strategy, as well as important parameters.
- Returns:
The description
- Return type:
jsondict
See also
OptimizationPipeline.describe
- pre_check() OptimizationPreCheck#
Provides requirements that input query or database system have to satisfy for the optimizer to work properly.
- Returns:
The check instance. Can be an empty check if no specific requirements exist.
- Return type: