Random Optimizers#

Provides “optimization” strategies that generate random query plans.

class postbound.opt.randomized.RandomJoinOrderGenerator(eliminate_duplicates: bool = False, *, tree_structure: Literal['bushy', 'right-deep', 'left-deep'] = 'bushy')#

Utility service to produce randomized join orders for an input query.

The service produces a generator that in turn provides the join orders. This is done in the random_join_orders_for method. The provided join orders can include linear, as well as bushy, join orders. The structure can be customized during service creation.

Parameters:
  • eliminate_duplicates (bool, optional) – Whether repeated calls to the generator should be guaranteed to provide different join orders. Defaults to False, which permits duplicates.

  • tree_structure (Literal[bushy, left-deep, right-deep], optional) – The kind of join orders that are generated by the service. “bushy” allows join orders with arbitrary branches to be generated (including linear join orders). “right-deep” and “left-deep” restrict the join orders to the respective linear trees. Defaults to “bushy”.

Warning

For now, the underlying algorithm is limited to queries without cross-products.

random_join_orders_for(query: SqlQuery, *, base_table: TableReference | None = None) Generator[JoinTree, None, None]#

Provides a generator that successively provides join orders at random.

Parameters:
  • query (SqlQuery) – The query for which the join orders should be generated

  • base_table (Optional[TableReference], optional) – An optional table that should always be joined first. If unspecified, base tables are selected at random.

Yields:

Generator[JoinTree, None, None] – A generator that produces random join orders for the input query. The structure of these join orders depends on the service configuration. Consult the class-level documentation for more details. Depeding on the eliminate_duplicates attribute, the join orders are guaranteed to be unique.

Raises:

ValueError – If the query contains cross products.

Return type:

Generator[JoinTree, None, None]

Warning

For now, the underlying algorithm is limited to queries without cross-products.

class postbound.opt.randomized.RandomJoinOrderOptimizer(*args, **kwargs)#

Optimization stage that produces a randomized join order.

This class acts as a wrapper around a RandomJoinOrderGenerator for the join optimization interface. The setup of the generator can be customized during creation of the optimizer. Consult the documentation of the generator for details.

Parameters:

generator_args (Optional[dict], optional) – Arguments to customize the generator operation. All parameters are forwarded to its __init__ method.

optimize_join_order(query: SqlQuery) JoinTree | None#

Performs the actual join ordering process.

The join tree can be further annotated with an initial operator assignment, if that is an inherent part of the specific optimization strategy. However, this is generally discouraged and the multi-stage pipeline will discard such operators to prepare for the subsequent physical operator selection.

Other than the join order and operator assignment, the algorithm should add as much information to the join tree as possible, e.g. including join conditions and cardinality estimates that were calculated for the selected joins. This enables other parts of the optimization process to re-use that information.

Parameters:

query (SqlQuery) – The query to optimize

Returns:

The join order. If for some reason there is no valid join order for the given query (e.g. queries with just a single selected table), None can be returned. Otherwise, the selected join order has to be described using a JoinTree.

Return type:

Optional[LogicalJoinTree]

describe() dict#

Provides a JSON-serializable representation of the specific strategy, as well as important parameters.

Returns:

The description

Return type:

jsondict

See also

OptimizationPipeline.describe

pre_check() OptimizationPreCheck#

Provides requirements that input query or database system have to satisfy for the optimizer to work properly.

Returns:

The check instance. Can be an empty check if no specific requirements exist.

Return type:

OptimizationPreCheck

class postbound.opt.randomized.RandomOperatorGenerator(scan_operators: Iterable[ScanOperator] | None = None, join_operators: Iterable[JoinOperator] | None = None, *, include_scans: bool = True, include_joins: bool = True, eliminate_duplicates: bool = False, database: Database | None = None)#

Utility service to generate random assignments of physical operators for a join order.

The service produces a generator that in turn provides the operator assignments. This is done in the random_operator_assignments_for method. The precise properties of the generated assignments depends on the configuration of this service. It can be set up to only use a subset of the available operators or to exclude operators for scans or joins completely. By default, the service uses all operators that are supported by the target database system.

Parameters:
  • scan_operators (Optional[Iterable[ScanOperators]], optional) – The scan operators that can be used in the query plans. If this is None or empty, all scans supported by the database are used. Likewise, if the iterable contains an operator that is not supported by the database, it is exlcuded from generation.

  • join_operators (Optional[Iterable[JoinOperators]], optional) – The join operators that can be used in the query plans. If this is None or empty, all joins supported by the database are used. Likewise, if the iterable contains an operator that is not supported by the database, it is exlcuded from generation.

  • include_scans (bool, optional) – Whether the assignment should contain scan operators at all. By default, this is enabled. However, if scans are disabled, this overwrites any supplied operators in the scan_operators parameter.

  • include_joins (bool, optional) – Whether the assignment should contain join operators at all. By default, this is enabled. However, if joins are disabled, this overwrites any supplied operators in the join_operators parameter.

  • eliminate_duplicates (bool, optional) – Whether repeated calls to the generator should be guaranteed to provide different operator assignments. Defaults to False, which permits duplicates.

  • database (Optional[db.Database], optional) – The database that should execute the queries in the end. The database connection is necessary to determine the operators that are actually supported by the system. If this parameter is omitted, it is inferred from the DatabasePool.

Raises:

ValueError – If both scans and joins are disabled

random_operator_assignments_for(query: SqlQuery, join_order: JoinTree) Generator[PhysicalOperatorAssignment, None, None]#

Produces a generator for random operator assignments of the allowed operators.

The precise structure of the operator assignments depends on the service configuration. Take a look at the class documentation for details.

Parameters:
  • query (SqlQuery) – The query to “optimize”

  • join_order (jointree.JoinTree) – The join sequence to use. This contains all required tables to be scanned and joins to be performed.

Yields:

Generator[PhysicalOperatorAssignment, None, None] – A generator producing random operator assignments. The assignments will not contain any cost estimates, nor will they specify join directions or parallization data.

Return type:

Generator[PhysicalOperatorAssignment, None, None]

necessary_hints() frozenset[ScanOperator | JoinOperator | IntermediateOperator]#

Provides all hints that a database system must support in order for the generator to work properly.

Returns:

The required operator hints

Return type:

frozenset[PhysicalOperator]

class postbound.opt.randomized.RandomOperatorOptimizer(*args, **kwargs)#

Optimization stage that produces a randomized operator assignment.

This class acts as a wrapper around a RandomOperatorGenerator for the operator optimization interface. The setup of the generator can be customized during creation of the optimizer. Consult the documentation of the generator for details.

Parameters:

generator_args (Optional[dict], optional) – Arguments to customize the generator operation. All parameters are forwarded to its __init__ method.

select_physical_operators(query: SqlQuery, join_order: JoinTree | None) PhysicalOperatorAssignment#

Performs the operator assignment.

Parameters:
  • query (SqlQuery) – The query to optimize

  • join_order (Optional[JoinTree]) – The selected join order of the query

Returns:

The operator assignment. If for some reason no operators can be assigned, an empty assignment can be returned

Return type:

PhysicalOperatorAssignment

Notes

The operator selection should handle a None join order gracefully. This can happen if the query does not require any joins (e.g. processing of a single table.

Depending on the specific optimization settings, it is also possible to raise an error if such a situation occurs and there is no reasonable way to deal with it.

describe() dict#

Provides a JSON-serializable representation of the specific strategy, as well as important parameters.

Returns:

The description

Return type:

jsondict

See also

OptimizationPipeline.describe

pre_check() OptimizationPreCheck#

Provides requirements that input query or database system have to satisfy for the optimizer to work properly.

Returns:

The check instance. Can be an empty check if no specific requirements exist.

Return type:

OptimizationPreCheck

class postbound.opt.randomized.RandomPlanGenerator(*, eliminate_duplicates: bool = False, join_order_args: dict | None = None, operator_args: dict | None = None, database: Database | None = None)#

Utility service to provide random exection plans for a query.

This service combines the RandomJoinOrderGenerator and RandomOperatorGenerator into a single high-level service. Therefore, it underlies the same restrictions as these two services. The produced generator can be accessed via the random_plans_for method.

Parameters:
  • eliminate_duplicates (bool, optional) – Whether repeated calls to the generator should be guaranteed to provide different plans. Defaults to False, which permits duplicates. This setting can be overwritten on a per-generator basis by specifying it in the dedicated generator arguments.

  • join_order_args (Optional[dict], optional) – Configuration for the RandomJoinOrderGenerator. This is forwarded to the service’s __init__ method.

  • operator_args (Optional[dict], optional) – Configuration for the RandomOperatorGenerator. This is forwarded to the service’s __init__ method.

  • database (Optional[db.Database], optional) – The database for the operator selection. This parameter can also be specified in the operator generator arguments, or even left completely unspecified.

random_plans_for(query: SqlQuery) Generator[QueryPlan, None, None]#

Produces a generator for random query plans of an input query.

The structure of the provided plans can be restricted by configuring the underlying services. Consult the class-level documentation for details.

Parameters:

query (SqlQuery) – The query to “optimize”

Yields:

Generator[QueryPlan, None, None] – A generator producing random query plans

Return type:

Generator[QueryPlan, None, None]

class postbound.opt.randomized.RandomPlanOptimizer(*args, **kwargs)#

Optimization stage that produces a random query plan.

This class acts as a wrapper around a RandomPlanGenerator and passes all its arguments to that service.

Parameters:
  • join_order_args (Optional[dict], optional) – Configuration for the RandomJoinOrderGenerator. This is forwarded to the service’s __init__ method.

  • operator_args (Optional[dict], optional) – Configuration for the RandomOperatorGenerator. This is forwarded to the service’s __init__ method.

  • database (Optional[db.Database], optional) – The database for the operator selection. This parameter can also be specified in the operator generator arguments, or even left completely unspecified.

Notes

It is not necessary to request duplicate elimination for any of the generators, since the underlying Python generator objects cannot be re-used between multiple optimization passes for the same input query. Therefore, it is not possible to enforce duplicate elimination for a join order or operator assignment.

Because multiple calls to optimizer with the same input query should not influence each other, the optimizer also does not provide its own duplicate elimination.

optimize_query(query: SqlQuery) QueryPlan#

Constructs the optimized execution plan for an input query.

Parameters:

query (SqlQuery) – The query to optimize

Returns:

The optimized query plan

Return type:

QueryPlan

describe() dict#

Provides a JSON-serializable representation of the specific strategy, as well as important parameters.

Returns:

The description

Return type:

jsondict

See also

OptimizationPipeline.describe

pre_check() OptimizationPreCheck#

Provides requirements that input query or database system have to satisfy for the optimizer to work properly.

Returns:

The check instance. Can be an empty check if no specific requirements exist.

Return type:

OptimizationPreCheck