Query Manipulation#
Module name: postbound.transform
This module provides tools to modify the contents of existing SqlQuery instances.
Since queries are designed as immutable data objects, these transformations operate by implementing new query instances.
The tools differ in their granularity, ranging from utilities that swap out individual expressions and predicates, to tools that change the entire structure of the query.
Some important transformations include:
flatten_and_predicate: Simplifies the predicate structure by moving all nested
ANDpredicates to their parentANDextract_query_fragment: Extracts parts of an original query based on a subset of its tables (i.e. induced join graph and filter predicates)
add_ec_predicates: Expands a querie’s WHERE clause to include all join predicates that are implied by other (equi-) joins
as_count_star_query and as_explain_analyze change the query to be executed as COUNT(*) or EXPLAIN ANALYZE respectively
In addition to these frequently-used transformations, there are also lots of utilities that add, remove, or modify specific parts of queries, such as individual clauses or expressions.
- class postbound.transform.QueryType#
The concrete class of a query.
This generic type is used for transformations that do not change the type of a query and operate on all the different query types.
alias of TypeVar(‘QueryType’, bound=
SqlQuery)
- class postbound.transform.SelectQueryType#
The concrete class of a select query.
This generic type is used for transformations that do not change the type of a query and operate on all the different select query types.
alias of TypeVar(‘SelectQueryType’, bound=
SqlQuery|SetQuery)
- class postbound.transform.ClauseType#
The concrete class of a clause.
This generic type is used for transformations that do not change the type of a clause and operate on all the different clause types.
alias of TypeVar(‘ClauseType’, bound=
BaseClause)
- class postbound.transform.PredicateType#
The concrete type of a predicate.
This generic type is used for transformations that do not change the type of a predicate and operate on all the different predicate types.
alias of TypeVar(‘PredicateType’, bound=
AbstractPredicate)
- postbound.transform.flatten_and_predicate(predicate: AbstractPredicate) AbstractPredicate#
Simplifies the predicate structure by moving all nested
ANDpredicates to their parentANDpredicate.For example, consider the predicate
(R.a = S.b AND R.a = 42) AND S.b = 24. This is transformed into the flattened equivalent conjunctionR.a = S.b AND R.a = 42 AND S.b = 24.This procedure continues in a recursive manner, until the first disjunction or negation is encountered. All predicates below that point are left as-is for the current branch of the predicate hierarchy.
- Parameters:
predicate (AbstractPredicate) – The predicate to simplified
- Returns:
An equivalent version of the given predicate, with all conjunctions unnested
- Return type:
- postbound.transform.explicit_to_implicit(source_query: ExplicitSqlQuery) ImplicitSqlQuery#
Transforms a query with an explicit
FROMclause to a query with an implicitFROMclause.Currently, this process is only supported for explicit queries that do not contain subqueries in their
FROMclause.- Parameters:
source_query (ExplicitSqlQuery) – The query that should be transformed
- Returns:
An equivalent version of the given query, using an implicit
FROMclause- Return type:
- Raises:
ValueError – If the source_query contains subquery table sources
- postbound.transform.extract_query_fragment(source_query: ImplicitSqlQuery, referenced_tables: TableReference | Iterable[TableReference], *, projection: Literal['keep', 'star', '*', 'count_star'] = 'keep') ImplicitSqlQuery | None#
- postbound.transform.extract_query_fragment(source_query: SetQuery, referenced_tables: TableReference | Iterable[TableReference], *, projection: Literal['keep', 'star', '*', 'count_star'] = 'keep') SetQuery | None
- postbound.transform.extract_query_fragment(source_query: SqlQuery, referenced_tables: TableReference | Iterable[TableReference], *, projection: Literal['keep', 'star', '*', 'count_star'] = 'keep') SqlQuery | None
Filters a query to only include parts that reference specific tables.
This builds a new query from the given query that contains exactly those parts of the original query’s clauses that reference only the given tables or a subset of them.
For example, consider the query
SELECT * FROM R, S, T WHERE R.a = S.b AND S.c = T.d AND R.a = 42 ORDER BY S.bthe query fragment for tablesRandSwould look like this:SELECT * FROM R, S WHERE R.a = S.b AND R.a = 42 ORDER BY S.b, whereas the query fragment for tableSwould look likeSELECT * FROM S ORDER BY S.b.Notice that this can break disjunctions: the fragment for table
Rof querySELECT * FROM R, S, WHERE R.a < 100 AND (R.a = 42 OR S.b = 42)isSELECT * FROM R WHERE R.a < 100 AND R.a = 42. This also indicates that the fragment extraction does not perform any logical pruning of superflous predicates.- Parameters:
source_query (ImplicitSqlQuery | SetQuery) – The query that should be transformed. Note that the current implementation only supports implicit queries and set queries.
referenced_tables (TableReference | Iterable[TableReference]) – The tables that should be extracted
projection (Literal["keep", "star", "*", "count_star"], optional) – How the projection of the resulting query should look like. Defaults to keep, which retains the original projection/SELECT clause. Alternatively, the projection can be set to star or (literal) * to use a SELECT * projection, or to count_star to use a SELECT COUNT(*) projection.
- Returns:
A query that only consists of those parts of the source_query, that reference (a subset of) the referenced_tables. If there is no such subset,
Noneis returned.- Return type:
Optional[ImplicitSqlQuery | SetQuery]
Warning
The current implementation only works for SetQuery and ImplicitSqlQuery instances. If the source_query is not of one of these types (or contains subqueries that are not of these types), a ValueError is raised.
See also
extract_subqueryA wrapper with sane defaults. Use this whenever possible.
- postbound.transform.extract_subquery(query: SqlQuery, intermediate: TableReference | Iterable[TableReference]) SqlQuery#
Extracts a subquery from a given query based on a subset of its tables.
The subquery consists of exactly those predicates (join and filter) of the original query that reference the given tables. This is mostly a convenience wrapper around extract_query_fragment with the following rules:
if the subquery extraction fails, we raise an error
the subquery will always be a SELECT * query
- Parameters:
query (SqlQuery)
intermediate (TableReference | Iterable[TableReference])
- Return type:
- postbound.transform.expand_to_query(predicate: AbstractPredicate) ImplicitSqlQuery#
Provides a
SELECT *query that computes the result set of a specific predicate.- Parameters:
predicate (AbstractPredicate) – The predicate to expand
- Returns:
An SQL query of the form
SELECT * FROM <predicate tables> WHERE <predicate>.- Return type:
- postbound.transform.move_into_subquery(query: SqlQuery, tables: Iterable[TableReference], subquery_name: str = '') SqlQuery#
Transforms a specific query by moving some of its tables into a subquery.
This transformation renames all usages of columns that are now produced by the subquery to references to the virtual subquery table instead.
Notice that this transformation currently only really works for “good natured” queries, i.e. mostly implicit SPJ+ queries. Notably, the transformation is only supported for queries that do not already contain subqueries, since moving tables between subqueries is a quite tricky process. Likewise, the renaming is only applied at a table-level. If the tables export columns of the same name, these are not renamed and the transformation fails as well. If in doubt, you should definitely check the output of this method for your complicated queries to prevent bad surprises!
- Parameters:
query (SqlQuery) – The query to transform
tables (Iterable[TableReference]) – The tables that should be placed into a subquery
subquery_name (str, optional) – The target name of the virtual subquery table. If empty, a default name (consisting of all the subquery tables) is generated
- Returns:
The transformed query
- Return type:
- Raises:
ValueError – If the query does not contain a
FROMclause.ValueError – If the query contains virtual tables
ValueError – If tables contains less than 2 entries. In this case, using a subquery is completely pointless.
ValueError – If the tables that should become part of the subquery both provide columns of the same name, and these columns are used in the rest of the query. This level of renaming is currently not accounted for.
- postbound.transform.add_ec_predicates(query: ImplicitSqlQuery) ImplicitSqlQuery#
Expands the join predicates of a query to include all predicates that are implied by the join equivalence classes.
- Parameters:
query (ImplicitSqlQuery) – The query to analyze
- Returns:
An equivalent query that explicitly contains all predicates from join equivalence classes.
- Return type:
See also
determine_join_equivalence_classes,generate_predicates_for_equivalence_classes
- postbound.transform.as_star_query(source_query: QueryType) QueryType#
Transforms a specific query to use a
SELECT *projection instead.Notice that this can break certain queries where a renamed column from the
SELECTclause is used in other parts of the query, such asORDER BYclauses (e.g.SELECT SUM(foo) AS f FROM bar ORDER BY f). We currently do not undo such a renaming.
- postbound.transform.as_count_star_query(source_query: QueryType) QueryType#
Transforms a specific query to use a
SELECT COUNT(*)projection instead.Notice that this can break certain queries where a renamed column from the
SELECTclause is used in other parts of the query, such asORDER BYclauses (e.g.SELECT SUM(foo) AS f FROM bar ORDER BY f). We currently do not undo such a renaming.
- postbound.transform.drop_hints(query: SelectQueryType, preparatory_statements_only: bool = False) SelectQueryType#
Removes the hint clause from a specific query.
- Parameters:
query (SelectQueryType) – The query to transform
preparatory_statements_only (bool, optional) – Whether only the preparatory statements from the hint block should be removed. This would retain the actual hints. Defaults to
False, which removes the entire block, no matter its contents.
- Returns:
The query without the hint block
- Return type:
- postbound.transform.as_explain(query: ~postbound.transform.SelectQueryType, explain: ~postbound.qal._qal.Explain = EXPLAIN (FORMAT JSON)) SelectQueryType#
Transforms a specific query into an
EXPLAINquery.- Parameters:
query (SelectQueryType) – The query to transform
explain (Explain, optional) – The
EXPLAINblock to use. Defaults to a standardExplain.plan()block.
- Returns:
The transformed query
- Return type:
- postbound.transform.as_explain_analyze(query: SelectQueryType) SelectQueryType#
Transforms a specific query into an
EXPLAIN ANALYZEquery.- Parameters:
query (SelectQueryType) – The query to transform
- Returns:
The transformed query. It uses an
EXPLAIN ANALYZEblock with the default output format. If this is not desired, the as_explain transformation has to be used and the targetEXPLAINblock has to be given explicitly.- Return type:
- postbound.transform.remove_predicate(predicate: AbstractPredicate | None, predicate_to_remove: AbstractPredicate) AbstractPredicate | None#
Drops a specific predicate from the predicate hierarchy.
If necessary, the hierarchy will be simplified. For example, if the predicate_to_remove is one of two childs of a conjunction, the removal would leave a conjunction of just a single predicate. In this case, the conjunction can be dropped altogether, leaving just the other child predicate. The same also applies to disjunctions and negations.
- Parameters:
predicate (Optional[AbstractPredicate]) – The predicate hierarchy from which should removed. If this is
None, no removal is attempted.predicate_to_remove (AbstractPredicate) – The predicate that should be removed.
- Returns:
The resulting (simplified) predicate hierarchy. Will be
Noneif there are no meaningful predicates left after removal, or if the predicate equals the predicate_to_remove.- Return type:
Optional[AbstractPredicate]
- postbound.transform.add_clause(query: SelectQueryType, clauses_to_add: BaseClause | Iterable[BaseClause]) SelectQueryType#
Creates a new SQL query, potentailly with additional clauses.
No validation is performed. Conflicts are resolved according to the rules of build_query. This means that the query can potentially be switched from an implicit query to an explicit one and vice-versa.
- Parameters:
query (SqlQuery) – The query to which the clause(s) should be added
clauses_to_add (BaseClause | Iterable[BaseClause]) – The new clauses
- Returns:
A new clauses consisting of the old query’s clauses and the clauses_to_add. Duplicate clauses are overwritten by the clauses_to_add.
- Return type:
- postbound.transform.ClauseDescription#
Denotes different ways clauses to remove can be denoted.
See also
alias of
Type|BaseClause|Iterable[Type|BaseClause]
- postbound.transform.drop_clause(query: T, clauses_to_drop: Type | BaseClause | Iterable[Type | BaseClause]) T#
Removes specific clauses from a query.
The clauses can be denoted in two different ways: either as the raw type of the clause, or as an instance of the same clause type as the one that should be removed. Notice that the instance of the clause does not need to be equal to the clause of the query. It just needs to be the same type of clause.
This method does not perform any validation, other than the rules described in build_query.
- Parameters:
query (SelectQueryType) – The query to remove clauses from
clauses_to_drop (ClauseDescription) – The clause(s) to remove. This can be a single clause type or clause instance, or an iterable of clauses types, intermixed with clause instances. In either way clauses of the desired types are dropped from the query.
- Returns:
A query without the specified clauses
- Return type:
Examples
The following two calls achieve exactly the same thing: getting rid of the
LIMITclause.drop_clause(query, Limit) drop_clause(query, query.limit_clause)
- postbound.transform.replace_clause(query: SelectQueryType, replacements: BaseClause | Iterable[BaseClause]) SelectQueryType#
Creates a new SQL query with the replacements being used instead of the original clauses.
Clauses are matched on a per-type basis (including subclasses, i.e. a replacement can be a subclass of an existing clause). Therefore, this function does not switch a query from implicit to explicit or vice-versa. Use a combination of drop_clause and add_clause for that. If a replacement is not present in the original query, it is simply ignored.
No validation other than the rules of build_query is performed.
- Parameters:
query (SelectQueryType) – The query to update
replacements (BaseClause | Iterable[BaseClause]) – The new clause instances that should be used instead of the old ones.
- Returns:
An updated query where the matching replacements clauses are used in place of the clause instances that were originally present in the query
- Return type:
- postbound.transform.replace_expressions(query: SelectQueryType, replacement: Callable[[SqlExpression], SqlExpression]) SelectQueryType#
Updates all expressions in a query.
The replacement handler can either produce entirely new expressions, or simply return the current expression instance if no update should be performed. Be very careful with this method since no sanity checks are performed, other than the rules of build_query.
- Parameters:
query (SelectQueryType) – The query to update
replacement (Callable[[SqlExpression], SqlExpression]) – A function mapping each of the current expressions in the query to potentially updated expressions.
- Returns:
The updated query
- Return type:
- postbound.transform.replace_predicate(query: ImplicitSqlQuery, predicate_to_replace: AbstractPredicate, new_predicate: AbstractPredicate) ImplicitSqlQuery#
Rewrites a specific query to use a new predicate in place of an old one.
In the current implementation this does only work for top-level predicates, i.e. subqueries and CTEs are not considered. Furthermore, only the
WHEREclause and theHAVINGclause are modified, since these should be the only ones that contain predicates.If the predicate to replace is not found, nothing happens. In the same vein, no sanity checks are performed on the updated query.
- Parameters:
query (ImplicitSqlQuery) – The query update
predicate_to_replace (AbstractPredicate) – The old predicate that should be dropped
new_predicate (AbstractPredicate) – The predicate that should be used in place of predicate_to_replace. This can be an entirely different type of predicate, e.g. a conjunction of join conditions that replace a single join predicate.
- Returns:
The updated query
- Return type:
- postbound.transform.rename_columns_in_query(query: T, available_renamings: dict[ColumnReference, ColumnReference]) T#
Replaces specific column references by new references for an entire query.
- Parameters:
query (SelectQueryType) – The query to update
available_renamings (dict[ColumnReference, ColumnReference]) – A dictionary mapping each of the old column values to the values that should be used instead.
- Returns:
The updated query
- Return type:
- Raises:
TypeError – If the query is of no known type. This indicates that this method is missing a handler for a specific query type that was added later on.
- postbound.transform.rename_columns_in_expression(expression: SqlExpression | None, available_renamings: dict[ColumnReference, ColumnReference]) SqlExpression | None#
Replaces references to specific columns in an expression.
- Parameters:
expression (Optional[SqlExpression]) – The expression to update. If
None, no renaming is performed.available_renamings (dict[ColumnReference, ColumnReference]) – A dictionary mapping each of the old column values to the values that should be used instead.
- Returns:
The updated expression. Can be
None, if expression already was.- Return type:
Optional[SqlExpression]
- Raises:
ValueError – If the expression is of no known type. This indicates that this method is missing a handler for a specific expressoin type that was added later on.
- postbound.transform.rename_columns_in_predicate(predicate: AbstractPredicate | None, available_renamings: dict[ColumnReference, ColumnReference]) AbstractPredicate | None#
Replaces all references to specific columns in a predicate by new references.
- Parameters:
predicate (Optional[AbstractPredicate]) – The predicate to update. Can be
None, in which case no update is performed.available_renamings (dict[ColumnReference, ColumnReference]) – A dictionary mapping each of the old column values to the values that should be used instead.
- Returns:
The updated predicate. Can be
None, if predicate already was.- Return type:
Optional[AbstractPredicate]
- Raises:
ValueError – If the query is of no known type. This indicates that this method is missing a handler for a specific query type that was added later on.
- postbound.transform.rename_columns_in_clause(clause: ClauseType | None, available_renamings: dict[ColumnReference, ColumnReference]) ClauseType | None#
Replaces all references to specific columns in a clause by new columns.
- Parameters:
clause (Optional[ClauseType]) – The clause to update. Can be
None, in which case no update is performed.available_renamings (dict[ColumnReference, ColumnReference]) – A dictionary mapping each of the old column values to the values that should be used instead.
- Returns:
The updated clause. Can be
None, if clause already was.- Return type:
Optional[ClauseType]
- Raises:
ValueError – If the clause is of no known type. This indicates that this method is missing a handler for a specific clause type that was added later on.
- postbound.transform.rename_table(source_query: T, renamings: dict[TableReference, TableReference], *, prefix_column_names: bool = False) T#
- postbound.transform.rename_table(source_query: T, from_table: TableReference, target_table: TableReference, *, prefix_column_names: bool = False) T
Changes all references to a specific table to refer to another table instead.
- Parameters:
source_query (SelectQueryType) – The query that should be updated
from_table (TableReference) – The table that should be replaced
target_table (TableReference) – The table that should be used instead
prefix_column_names (bool, optional) –
Whether a prefix should be added to column names. If this is
True, column references will be changed in two ways:if they belonged to the from_table, they will now belong to the target_table after the renaming
The column names will be changed to include the identifier of the from_table as a prefix.
- Returns:
The updated query
- Return type:
- postbound.transform.merge_tables(query: T, tables: Iterable[TableReference], *, target: TableReference) T#
Rewrites a query to replace all references to any of the given tables by references to a single target table.
This is useful, for example, for mat view scenarios where queries should re-written after a materialized view has been created over a specific join.
Examples
>>> stats = pb.workloads.stats() >>> posts = pb.TableReference("posts", "p") >>> comments = pb.TableReference("comments", "c") >>> mat_view = pb.TableReference("post_comments", "pc") >>> pb.transform.merge_tables(stats["q-10"], [posts, comments], target=mat_view)
- Parameters:
query (T)
tables (Iterable[TableReference])
target (TableReference)
- Return type:
T