Query Manipulation#

Module name: postbound.transform

This module provides tools to modify the contents of existing SqlQuery instances.

Since queries are designed as immutable data objects, these transformations operate by implementing new query instances.

The tools differ in their granularity, ranging from utilities that swap out individual expressions and predicates, to tools that change the entire structure of the query.

Some important transformations include:

  • flatten_and_predicate: Simplifies the predicate structure by moving all nested AND predicates to their parent AND

  • extract_query_fragment: Extracts parts of an original query based on a subset of its tables (i.e. induced join graph and filter predicates)

  • add_ec_predicates: Expands a querie’s WHERE clause to include all join predicates that are implied by other (equi-) joins

  • as_count_star_query and as_explain_analyze change the query to be executed as COUNT(*) or EXPLAIN ANALYZE respectively

In addition to these frequently-used transformations, there are also lots of utilities that add, remove, or modify specific parts of queries, such as individual clauses or expressions.

class postbound.transform.QueryType#

The concrete class of a query.

This generic type is used for transformations that do not change the type of a query and operate on all the different query types.

alias of TypeVar(‘QueryType’, bound=SqlQuery)

class postbound.transform.SelectQueryType#

The concrete class of a select query.

This generic type is used for transformations that do not change the type of a query and operate on all the different select query types.

alias of TypeVar(‘SelectQueryType’, bound=SqlQuery | SetQuery)

class postbound.transform.ClauseType#

The concrete class of a clause.

This generic type is used for transformations that do not change the type of a clause and operate on all the different clause types.

alias of TypeVar(‘ClauseType’, bound=BaseClause)

class postbound.transform.PredicateType#

The concrete type of a predicate.

This generic type is used for transformations that do not change the type of a predicate and operate on all the different predicate types.

alias of TypeVar(‘PredicateType’, bound=AbstractPredicate)

postbound.transform.flatten_and_predicate(predicate: AbstractPredicate) AbstractPredicate#

Simplifies the predicate structure by moving all nested AND predicates to their parent AND predicate.

For example, consider the predicate (R.a = S.b AND R.a = 42) AND S.b = 24. This is transformed into the flattened equivalent conjunction R.a = S.b AND R.a = 42 AND S.b = 24.

This procedure continues in a recursive manner, until the first disjunction or negation is encountered. All predicates below that point are left as-is for the current branch of the predicate hierarchy.

Parameters:

predicate (AbstractPredicate) – The predicate to simplified

Returns:

An equivalent version of the given predicate, with all conjunctions unnested

Return type:

AbstractPredicate

postbound.transform.explicit_to_implicit(source_query: ExplicitSqlQuery) ImplicitSqlQuery#

Transforms a query with an explicit FROM clause to a query with an implicit FROM clause.

Currently, this process is only supported for explicit queries that do not contain subqueries in their FROM clause.

Parameters:

source_query (ExplicitSqlQuery) – The query that should be transformed

Returns:

An equivalent version of the given query, using an implicit FROM clause

Return type:

ImplicitSqlQuery

Raises:

ValueError – If the source_query contains subquery table sources

postbound.transform.extract_query_fragment(source_query: ImplicitSqlQuery, referenced_tables: TableReference | Iterable[TableReference], *, projection: Literal['keep', 'star', '*', 'count_star'] = 'keep') ImplicitSqlQuery | None#
postbound.transform.extract_query_fragment(source_query: SetQuery, referenced_tables: TableReference | Iterable[TableReference], *, projection: Literal['keep', 'star', '*', 'count_star'] = 'keep') SetQuery | None
postbound.transform.extract_query_fragment(source_query: SqlQuery, referenced_tables: TableReference | Iterable[TableReference], *, projection: Literal['keep', 'star', '*', 'count_star'] = 'keep') SqlQuery | None

Filters a query to only include parts that reference specific tables.

This builds a new query from the given query that contains exactly those parts of the original query’s clauses that reference only the given tables or a subset of them.

For example, consider the query SELECT * FROM R, S, T WHERE R.a = S.b AND S.c = T.d AND R.a = 42 ORDER BY S.b the query fragment for tables R and S would look like this: SELECT * FROM R, S WHERE R.a = S.b AND R.a = 42 ORDER BY S.b, whereas the query fragment for table S would look like SELECT * FROM S ORDER BY S.b.

Notice that this can break disjunctions: the fragment for table R of query SELECT * FROM R, S, WHERE R.a < 100 AND (R.a = 42 OR S.b = 42) is SELECT * FROM R WHERE R.a < 100 AND R.a = 42. This also indicates that the fragment extraction does not perform any logical pruning of superflous predicates.

Parameters:
  • source_query (ImplicitSqlQuery | SetQuery) – The query that should be transformed. Note that the current implementation only supports implicit queries and set queries.

  • referenced_tables (TableReference | Iterable[TableReference]) – The tables that should be extracted

  • projection (Literal["keep", "star", "*", "count_star"], optional) – How the projection of the resulting query should look like. Defaults to keep, which retains the original projection/SELECT clause. Alternatively, the projection can be set to star or (literal) * to use a SELECT * projection, or to count_star to use a SELECT COUNT(*) projection.

Returns:

A query that only consists of those parts of the source_query, that reference (a subset of) the referenced_tables. If there is no such subset, None is returned.

Return type:

Optional[ImplicitSqlQuery | SetQuery]

Warning

The current implementation only works for SetQuery and ImplicitSqlQuery instances. If the source_query is not of one of these types (or contains subqueries that are not of these types), a ValueError is raised.

See also

extract_subquery

A wrapper with sane defaults. Use this whenever possible.

postbound.transform.extract_subquery(query: SqlQuery, intermediate: TableReference | Iterable[TableReference]) SqlQuery#

Extracts a subquery from a given query based on a subset of its tables.

The subquery consists of exactly those predicates (join and filter) of the original query that reference the given tables. This is mostly a convenience wrapper around extract_query_fragment with the following rules:

  • if the subquery extraction fails, we raise an error

  • the subquery will always be a SELECT * query

Parameters:
Return type:

SqlQuery

postbound.transform.expand_to_query(predicate: AbstractPredicate) ImplicitSqlQuery#

Provides a SELECT * query that computes the result set of a specific predicate.

Parameters:

predicate (AbstractPredicate) – The predicate to expand

Returns:

An SQL query of the form SELECT * FROM <predicate tables> WHERE <predicate>.

Return type:

ImplicitSqlQuery

postbound.transform.move_into_subquery(query: SqlQuery, tables: Iterable[TableReference], subquery_name: str = '') SqlQuery#

Transforms a specific query by moving some of its tables into a subquery.

This transformation renames all usages of columns that are now produced by the subquery to references to the virtual subquery table instead.

Notice that this transformation currently only really works for “good natured” queries, i.e. mostly implicit SPJ+ queries. Notably, the transformation is only supported for queries that do not already contain subqueries, since moving tables between subqueries is a quite tricky process. Likewise, the renaming is only applied at a table-level. If the tables export columns of the same name, these are not renamed and the transformation fails as well. If in doubt, you should definitely check the output of this method for your complicated queries to prevent bad surprises!

Parameters:
  • query (SqlQuery) – The query to transform

  • tables (Iterable[TableReference]) – The tables that should be placed into a subquery

  • subquery_name (str, optional) – The target name of the virtual subquery table. If empty, a default name (consisting of all the subquery tables) is generated

Returns:

The transformed query

Return type:

SqlQuery

Raises:
  • ValueError – If the query does not contain a FROM clause.

  • ValueError – If the query contains virtual tables

  • ValueError – If tables contains less than 2 entries. In this case, using a subquery is completely pointless.

  • ValueError – If the tables that should become part of the subquery both provide columns of the same name, and these columns are used in the rest of the query. This level of renaming is currently not accounted for.

postbound.transform.add_ec_predicates(query: ImplicitSqlQuery) ImplicitSqlQuery#

Expands the join predicates of a query to include all predicates that are implied by the join equivalence classes.

Parameters:

query (ImplicitSqlQuery) – The query to analyze

Returns:

An equivalent query that explicitly contains all predicates from join equivalence classes.

Return type:

ImplicitSqlQuery

See also

determine_join_equivalence_classes, generate_predicates_for_equivalence_classes

postbound.transform.as_star_query(source_query: QueryType) QueryType#

Transforms a specific query to use a SELECT * projection instead.

Notice that this can break certain queries where a renamed column from the SELECT clause is used in other parts of the query, such as ORDER BY clauses (e.g. SELECT SUM(foo) AS f FROM bar ORDER BY f). We currently do not undo such a renaming.

Parameters:

source_query (QueryType) – The query to transform

Returns:

A variant of the input query that uses a SELECT * projection.

Return type:

QueryType

postbound.transform.as_count_star_query(source_query: QueryType) QueryType#

Transforms a specific query to use a SELECT COUNT(*) projection instead.

Notice that this can break certain queries where a renamed column from the SELECT clause is used in other parts of the query, such as ORDER BY clauses (e.g. SELECT SUM(foo) AS f FROM bar ORDER BY f). We currently do not undo such a renaming.

Parameters:

source_query (QueryType) – The query to transform

Returns:

A variant of the input query that uses a SELECT COUNT(*) projection.

Return type:

QueryType

postbound.transform.drop_hints(query: SelectQueryType, preparatory_statements_only: bool = False) SelectQueryType#

Removes the hint clause from a specific query.

Parameters:
  • query (SelectQueryType) – The query to transform

  • preparatory_statements_only (bool, optional) – Whether only the preparatory statements from the hint block should be removed. This would retain the actual hints. Defaults to False, which removes the entire block, no matter its contents.

Returns:

The query without the hint block

Return type:

SelectQueryType

postbound.transform.as_explain(query: ~postbound.transform.SelectQueryType, explain: ~postbound.qal._qal.Explain = EXPLAIN (FORMAT JSON)) SelectQueryType#

Transforms a specific query into an EXPLAIN query.

Parameters:
  • query (SelectQueryType) – The query to transform

  • explain (Explain, optional) – The EXPLAIN block to use. Defaults to a standard Explain.plan() block.

Returns:

The transformed query

Return type:

SelectQueryType

postbound.transform.as_explain_analyze(query: SelectQueryType) SelectQueryType#

Transforms a specific query into an EXPLAIN ANALYZE query.

Parameters:

query (SelectQueryType) – The query to transform

Returns:

The transformed query. It uses an EXPLAIN ANALYZE block with the default output format. If this is not desired, the as_explain transformation has to be used and the target EXPLAIN block has to be given explicitly.

Return type:

SelectQueryType

postbound.transform.remove_predicate(predicate: AbstractPredicate | None, predicate_to_remove: AbstractPredicate) AbstractPredicate | None#

Drops a specific predicate from the predicate hierarchy.

If necessary, the hierarchy will be simplified. For example, if the predicate_to_remove is one of two childs of a conjunction, the removal would leave a conjunction of just a single predicate. In this case, the conjunction can be dropped altogether, leaving just the other child predicate. The same also applies to disjunctions and negations.

Parameters:
  • predicate (Optional[AbstractPredicate]) – The predicate hierarchy from which should removed. If this is None, no removal is attempted.

  • predicate_to_remove (AbstractPredicate) – The predicate that should be removed.

Returns:

The resulting (simplified) predicate hierarchy. Will be None if there are no meaningful predicates left after removal, or if the predicate equals the predicate_to_remove.

Return type:

Optional[AbstractPredicate]

postbound.transform.add_clause(query: SelectQueryType, clauses_to_add: BaseClause | Iterable[BaseClause]) SelectQueryType#

Creates a new SQL query, potentailly with additional clauses.

No validation is performed. Conflicts are resolved according to the rules of build_query. This means that the query can potentially be switched from an implicit query to an explicit one and vice-versa.

Parameters:
  • query (SqlQuery) – The query to which the clause(s) should be added

  • clauses_to_add (BaseClause | Iterable[BaseClause]) – The new clauses

Returns:

A new clauses consisting of the old query’s clauses and the clauses_to_add. Duplicate clauses are overwritten by the clauses_to_add.

Return type:

SqlQuery

postbound.transform.ClauseDescription#

Denotes different ways clauses to remove can be denoted.

See also

drop_clause

alias of Type | BaseClause | Iterable[Type | BaseClause]

postbound.transform.drop_clause(query: T, clauses_to_drop: Type | BaseClause | Iterable[Type | BaseClause]) T#

Removes specific clauses from a query.

The clauses can be denoted in two different ways: either as the raw type of the clause, or as an instance of the same clause type as the one that should be removed. Notice that the instance of the clause does not need to be equal to the clause of the query. It just needs to be the same type of clause.

This method does not perform any validation, other than the rules described in build_query.

Parameters:
  • query (SelectQueryType) – The query to remove clauses from

  • clauses_to_drop (ClauseDescription) – The clause(s) to remove. This can be a single clause type or clause instance, or an iterable of clauses types, intermixed with clause instances. In either way clauses of the desired types are dropped from the query.

Returns:

A query without the specified clauses

Return type:

SelectQueryType

Examples

The following two calls achieve exactly the same thing: getting rid of the LIMIT clause.

drop_clause(query, Limit)
drop_clause(query, query.limit_clause)
postbound.transform.replace_clause(query: SelectQueryType, replacements: BaseClause | Iterable[BaseClause]) SelectQueryType#

Creates a new SQL query with the replacements being used instead of the original clauses.

Clauses are matched on a per-type basis (including subclasses, i.e. a replacement can be a subclass of an existing clause). Therefore, this function does not switch a query from implicit to explicit or vice-versa. Use a combination of drop_clause and add_clause for that. If a replacement is not present in the original query, it is simply ignored.

No validation other than the rules of build_query is performed.

Parameters:
Returns:

An updated query where the matching replacements clauses are used in place of the clause instances that were originally present in the query

Return type:

SelectQueryType

postbound.transform.replace_expressions(query: SelectQueryType, replacement: Callable[[SqlExpression], SqlExpression]) SelectQueryType#

Updates all expressions in a query.

The replacement handler can either produce entirely new expressions, or simply return the current expression instance if no update should be performed. Be very careful with this method since no sanity checks are performed, other than the rules of build_query.

Parameters:
  • query (SelectQueryType) – The query to update

  • replacement (Callable[[SqlExpression], SqlExpression]) – A function mapping each of the current expressions in the query to potentially updated expressions.

Returns:

The updated query

Return type:

SelectQueryType

postbound.transform.replace_predicate(query: ImplicitSqlQuery, predicate_to_replace: AbstractPredicate, new_predicate: AbstractPredicate) ImplicitSqlQuery#

Rewrites a specific query to use a new predicate in place of an old one.

In the current implementation this does only work for top-level predicates, i.e. subqueries and CTEs are not considered. Furthermore, only the WHERE clause and the HAVING clause are modified, since these should be the only ones that contain predicates.

If the predicate to replace is not found, nothing happens. In the same vein, no sanity checks are performed on the updated query.

Parameters:
  • query (ImplicitSqlQuery) – The query update

  • predicate_to_replace (AbstractPredicate) – The old predicate that should be dropped

  • new_predicate (AbstractPredicate) – The predicate that should be used in place of predicate_to_replace. This can be an entirely different type of predicate, e.g. a conjunction of join conditions that replace a single join predicate.

Returns:

The updated query

Return type:

ImplicitSqlQuery

postbound.transform.rename_columns_in_query(query: T, available_renamings: dict[ColumnReference, ColumnReference]) T#

Replaces specific column references by new references for an entire query.

Parameters:
Returns:

The updated query

Return type:

SelectQueryType

Raises:

TypeError – If the query is of no known type. This indicates that this method is missing a handler for a specific query type that was added later on.

postbound.transform.rename_columns_in_expression(expression: SqlExpression | None, available_renamings: dict[ColumnReference, ColumnReference]) SqlExpression | None#

Replaces references to specific columns in an expression.

Parameters:
  • expression (Optional[SqlExpression]) – The expression to update. If None, no renaming is performed.

  • available_renamings (dict[ColumnReference, ColumnReference]) – A dictionary mapping each of the old column values to the values that should be used instead.

Returns:

The updated expression. Can be None, if expression already was.

Return type:

Optional[SqlExpression]

Raises:

ValueError – If the expression is of no known type. This indicates that this method is missing a handler for a specific expressoin type that was added later on.

postbound.transform.rename_columns_in_predicate(predicate: AbstractPredicate | None, available_renamings: dict[ColumnReference, ColumnReference]) AbstractPredicate | None#

Replaces all references to specific columns in a predicate by new references.

Parameters:
  • predicate (Optional[AbstractPredicate]) – The predicate to update. Can be None, in which case no update is performed.

  • available_renamings (dict[ColumnReference, ColumnReference]) – A dictionary mapping each of the old column values to the values that should be used instead.

Returns:

The updated predicate. Can be None, if predicate already was.

Return type:

Optional[AbstractPredicate]

Raises:

ValueError – If the query is of no known type. This indicates that this method is missing a handler for a specific query type that was added later on.

postbound.transform.rename_columns_in_clause(clause: ClauseType | None, available_renamings: dict[ColumnReference, ColumnReference]) ClauseType | None#

Replaces all references to specific columns in a clause by new columns.

Parameters:
  • clause (Optional[ClauseType]) – The clause to update. Can be None, in which case no update is performed.

  • available_renamings (dict[ColumnReference, ColumnReference]) – A dictionary mapping each of the old column values to the values that should be used instead.

Returns:

The updated clause. Can be None, if clause already was.

Return type:

Optional[ClauseType]

Raises:

ValueError – If the clause is of no known type. This indicates that this method is missing a handler for a specific clause type that was added later on.

postbound.transform.rename_table(source_query: T, renamings: dict[TableReference, TableReference], *, prefix_column_names: bool = False) T#
postbound.transform.rename_table(source_query: T, from_table: TableReference, target_table: TableReference, *, prefix_column_names: bool = False) T

Changes all references to a specific table to refer to another table instead.

Parameters:
  • source_query (SelectQueryType) – The query that should be updated

  • from_table (TableReference) – The table that should be replaced

  • target_table (TableReference) – The table that should be used instead

  • prefix_column_names (bool, optional) –

    Whether a prefix should be added to column names. If this is True, column references will be changed in two ways:

    1. if they belonged to the from_table, they will now belong to the target_table after the renaming

    2. The column names will be changed to include the identifier of the from_table as a prefix.

Returns:

The updated query

Return type:

SelectQueryType

postbound.transform.merge_tables(query: T, tables: Iterable[TableReference], *, target: TableReference) T#

Rewrites a query to replace all references to any of the given tables by references to a single target table.

This is useful, for example, for mat view scenarios where queries should re-written after a materialized view has been created over a specific join.

Examples

>>> stats = pb.workloads.stats()
>>> posts = pb.TableReference("posts", "p")
>>> comments = pb.TableReference("comments", "c")
>>> mat_view = pb.TableReference("post_comments", "pc")
>>> pb.transform.merge_tables(stats["q-10"], [posts, comments], target=mat_view)
Parameters:
Return type:

T