Query Abstraction Layer

Contents

Query Abstraction Layer#

Module name: postbound.qal

Contains the basic query abstraction layer to conveniently model SQL queries.

The most important features of the qal are: 1. parsing query strings into qal objects 2. providing access to underlying query features such as referenced tables, aliases or predicates 3. converting queries to representations in relational algebra 4. formatting qal objects back to strings

Generally, the qal is structured around 3 fundamental concepts: At the core of the qal are SQL expressions. Such expressions form the basic building blocks that are re-used by more high-level components. For example, there are expressions that model a reference to a column, as well as expressions for function calls and expressions for modelling math. The SqlExpression acts as the common base class for all different expression types.

Expressions are used to construct predicates or clauses. Predicates are normally part of clauses (such as a WHERE clause or a HAVING clause). Finally, clauses are combined to form the actual SQL queries.

Using these basic building blocks, the relalg module provides a simple model of relational algebra, as well as means to translate a parsed SQL query to an algebraic expression.

A common pattern when working with elements of the qal are the tables and columns methods (along with some other, more rarely used ones). These are defined on pretty much all of the qal types and provide access to the tables, respectively the columns that are referenced within the current element.

Notice that some references in the qal are inherently cyclic: for example, predicates can contain subqueries and the subqueries in turn contain predicates. This might lead to cyclic import errors in certain corner cases. Such issues can usually be solved by varying the import sequence slightly.

All concepts in the qal are modelled as data objects that are immutable. In order to modify parts of an SQL query, a new query has to be constructed. The transform module provides some functions to help with that. Traversal of the different parts of a query can be done using specific visitor implementations.

In order to generate query instances, the parser module can be used to read them from strings. Finally, the formatter module can be used to create pretty representations of queries. The transform, and formatter modules are available directly from the qal and do not need to be imported explicitly. The same holds for a simple relational algebra representation in the relalg module. The parser provides means for reading an entire query from text, or reading parts of it from JSON. A parse_query helper function is directly available from the qal module.

SQL queries#

The most important type of our query abstraction is the SqlQuery class. It focuses on modelling an entire SQL query with all important concepts. Notice that the focus here really in on modelling - nearly no interactive functionality, no input/output capabilities and no modification tools are provided. These are handled by dedicated modules (e.g. the parser module for reading queries from text, or the transform module for changing existing query objects).

In addition to the pure SqlQuery, a number of subclasses exist. These model queries with specific FROM clauses. For example, the ImplicitSqlQuery provides an ImplicitFromClause that restricts how tables can be referenced in this clause. For some use-cases, these might be easier to work with than the more general SqlQuery class, where much more diverse FROM clauses are permitted.

Predicates#

Predicates are the central building block to represent filter conditions for SQL queries.

A predicate is a boolean expression that can be applied to a tuple to determine whether it should be kept in the intermediate result or thrown away. PostBOUND distinguishes between two kinds of predicates, even though they are both represented by the same class: there are filter predicates, which - as a rule of thumb - can be applied directly to base table relations. Furthermore, there are join predicates that access tuples from different relations and determine whether the join of both tuples should become part of the intermediate result.

PostBOUND’s implementation of predicates is structured using a composite-style layout: The AbstractPredicate interface describes all behaviour that is common to the concrete predicate types. There are BasePredicate`s, which typically contain different expressions. The `CompoundPredicate is used to nest different predicates, thereby creating tree-shaped hierarchies.

In addition to the predicate representation, this module also provides a utility for streamlined access to simple predicates via SimpleFilter and SimpleJoin. Likwise, the QueryPredicates provide high-level access to all predicates (join and filter) that are specified in a query. From a user perspective, this is probably the best entry point to work with predicates. Alternatively, the predicate tree can also be traversed using custom functions.

Lastly, there exists some basic support for equivalence class computation via the determine_join_equivalence_classes and generate_predicates_for_equivalence_classes functions.

Clauses#

In addition to widely accepted clauses such as the default SPJ-building blocks or grouping clauses (GROUP BY and HAVING), some additional clauses are also defined. These include Explain clauses that model widely used EXPLAIN queries which provide the query plan instead of optimizing the query. Furthermore, the Hint clause is used to model hint blocks that can be used to pass additional non-standardized information to the database system and its query optimizer. In real-world contexts this is mostly used to correct mistakes by the optimizer, but PostBOUND uses this feature to enforce entire query plans. The specific contents of a hint block are not standardized by PostBOUND and thus remains completely system-specific.

All clauses inherit from BaseClause, which specifies the basic common behaviour shared by all concrete clauses. Furthermore, all clauses are designed as immutable data objects whose content cannot be changed. Any forced modifications will break the entire query abstraction layer and lead to unpredictable behaviour.

Notes

The immutability enables a very fast hashing of values as well as the caching of complicated computations. Most objects employ a pattern of determining their hash value during initialization of the object and simply provide that precomputed value during hashing. This helps to speed up several hot loops at optimization time significantly.

class postbound.qal.AbstractPredicate(hash_val: int)#

Base class for all predicates.

Predicates constitute the central building block for WHERE and HAVING clauses and model the join conditions in explicit joins using the JOIN ON syntax.

The different kinds of predicates are represented as subclasses of the AbstractPredicate interface. This really is an abstract interface, not a usable predicate and it only specifies the behaviour that is shared among all specific predicate implementations. All inheriting classes have to implement their own __eq__ method and inherit the __hash__ method specified by the abstract predicate. Remember to explicitly set this up! The concrete hash value is constant since the clause itself is immutable. It is up to the implementing class to make sure that the equality/hash consistency is enforced.

Possible implementations of the abstract predicate can model basic binary predicates such as R.a = S.b or R.a = 42, as well as compound predicates that are build form base predicates, e.g. conjunctions, disjunctions or negations.

Parameters:

hash_val (int) – The hash of the concrete predicate object

abstractmethod accept_visitor(visitor: PredicateVisitor[VisitorResult] | SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current predicate by a predicate visitor.

Parameters:
  • visitor (PredicateVisitor[VisitorResult]) – The visitor

  • args – Additional arguments to pass to the visitor

  • kwargs – Additional keyword arguments to pass to the visitor

Return type:

VisitorResult

abstractmethod base_predicates() Iterable[AbstractPredicate]#

Provides all base predicates that form this predicate.

This allows to iterate over all leaves of a compound predicate, for base predicates it simply returns the predicate itself.

Returns:

The base predicates, in an arbitrary order. If the predicate is a base predicate already, it will be the only item in the iterable.

Return type:

Iterable[AbstractPredicate]

abstractmethod columns() set[ColumnReference]#

Provides all columns that are referenced by this predicate.

Returns:

The columns. If the predicate contains a subquery, all columns of that query are included.

Return type:

set[ColumnReference]

columns_of(table: TableReference) set[ColumnReference]#

Retrieves all columns of a specific table that are referenced by this predicate.

Parameters:

table (TableReference) – The table to check

Returns:

All columns in this predicate that belong to the given table

Return type:

set[ColumnReference]

contains_table(table: TableReference) bool#

Checks, whether this predicate filters or joins a column of a specific table.

Parameters:

table (TableReference) – The table to check

Returns:

Whether the given table is referenced by any of the columns in the predicate.

Return type:

bool

is_base() bool#

Checks, whether this predicate forms a leaf in the predicate tree, i.e. does not contain any more child predicates.

This is the case for basic binary predicates, IN predicates, etc. This method serves as a high-level check, preventing the usage of dedicated isinstance calls in some use-cases.

Returns:

Whether this predicate is a base predicate

Return type:

bool

abstractmethod is_compound() bool#

Checks, whether this predicate combines the evaluation of other predicates to compute the overall evaluation.

Operators to combine such predicates can be standard logical operators like conjunction, disjunction and negation. This method serves as a high-level check, preventing the usage of dedicated isinstance calls in some use-cases.

Returns:

Whethter this predicate is a composite of other predicates.

Return type:

bool

is_filter() bool#

Checks, whether this predicate encodes a filter on a base table rather than a join of base tables.

This is the inverse method to is_join. Consult it for more details on the join/filter recognition procedure.

Returns:

Whether the predicate is a filter of some relation

Return type:

bool

abstractmethod is_join() bool#

Checks, whether this predicate encodes a join between two tables.

PostBOUND uses the following criteria to determine, whether a predicate is join or not:

  1. all predicates of the form <col 1> <operator> <col 2> where <col 1> and <col 2> come from different tables are joins. The columns can optionally be modified by value casts or static expressions, e.g. R.a::integer + 7

  2. all functions that access columns from multiple tables are joins, e.g. my_udf(R.a, S.b)

  3. all subqueries are treated as filters, no matter whether they are dependent subqueries or not. This means that both R.a = (SELECT MAX(S.b) FROM S) and R.a = (SELECT MAX(S.b) FROM S WHERE R.c = S.d) are treated as filters and not as joins, even though the second subquery will require some sort of the join in the query plan.

  4. BETWEEN and IN predicates are treated according to rule 1 since they can be emulated via base predicates (subqueries in IN predicates are evaluated according to rule 3.)

Although these rules might seem a bit arbitrary at first, there is actually no clear consensus of what constitutes a join and the query optimizers of different industrial database systems treat different predicates as joins. For example, some systems might not recognize function calls that access columns form two or more tables as joins or do not recognize predicates that use non-equi joins as opertors as actual joins.

If the specific join and filter recognition procedure breaks a specific use-case, subclasses of the predicate classes can be implemented. These subclasses can then apply the required rules. Using the tools in the transformation module, the queries can be updated. For some use-cases it can also be sufficient to change the join/filter recognition rules of the QueryPredicates objects. Consult its documentation for more details.

Lastly, notice that the distinction between join and filter is not entirely binary. There may also be a third class of predicates, potentially called “post-join filters”. These are filters that are applied after a join but cannot be included in the join predicate itself. This is usually the case due to limitations in the operator implementation of the actual database system. For example, invocations of user defined functions (case 2 above) usually fall in this category. Since the query abstraction layer is agnostic to specific details of database systems, we apply the binary categorization outlined above.

Returns:

Whether the predicate is a join of different relations.

Return type:

bool

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

abstractmethod itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this predicate.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

abstractmethod iterexpressions() Iterable[SqlExpression]#

Provides access to all expressions that are directly contained in this predicate.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

abstractmethod join_partners() set[tuple[ColumnReference, ColumnReference]]#

Provides all pairs of columns that are joined within this predicate.

If multiple columns are joined or it is unclear which columns are involved in a join exactly, this method falls back to returning the cross-product of all potential join partners. For example, consider the following query: SELECT * FROM R, S WHERE my_udf(R.a, R.b, S.c). In this case, it cannot be determined which columns of R take part in the join. Therefore, join_partners will return the set {(R.a, S.c), (R.b, S.c)}.

Returns:

The pairs of joined columns. These pairs are normalized, such that two predicates which join the same columns provide the join partners in the same order.

Return type:

set[tuple[ColumnReference, ColumnReference]]

Raises:

NoJoinPredicateError

join_partners_of(table: TableReference) set[ColumnReference]#

Retrieves all columns that are joined with a specific table.

Parameters:

table (TableReference) – The table for which the join partners should be searched

Returns:

The columns that are joined with the given table

Return type:

set[ColumnReference]

Raises:

NoJoinPredicateError

joins_columns(a: ColumnReference, b: ColumnReference) bool#

Checks, whether this predicate describes a join between the given columns.

The order of the columns does not matter.

Parameters:
Return type:

bool

joins_table(table: TableReference) bool#

Checks, whether this predicate describes a join and one of the join partners is a specific table.

Parameters:

table (TableReference) – The table to check

Returns:

Whether the given table is one of the join partners in the predicate.

Return type:

bool

joins_tables(a: TableReference, b: TableReference) bool#

Checks, whether this predicate describes a join between the given tables.

The order of the tables does not matter.

Parameters:
Return type:

bool

required_tables() set[TableReference]#

Provides all tables that have to be “available” in order for this predicate to be executed.

Availability in this context means that the table has to be scanned already. Therefore it can be accessed either as-is, or as part of an intermediate relation.

The output of this method differs from the tables method in one central aspect: tables provides all tables that are accessed, which includes all tables from subqueries. In contrast, the required_tables remove all tables that are scanned by the subquery and only include those that must be “provided” by the query execution engine.

Consider the following example predicate: R.a = (SELECT MIN(S.b) FROM S). Calling tables on this predicate would return the set {R, S}. However, table S is already provided by the subquery. Therefore, required_tables only returns {R}, since this is the only table that has to be provided by the context of this method.

Returns:

The tables that need to be provided by the query execution engine in order to run this predicate

Return type:

set[TableReference]

tables() set[TableReference]#

Provides all tables that are accessed by this predicate.

Notice that even for filters, the provided set might contain multiple entries, e.g. if the predicate contains subqueries.

Returns:

All tables. This can include virtual tables if such tables are referenced in the predicate.

Return type:

set[TableReference]

class postbound.qal.ArrayAccessExpression(array_expr: SqlExpression, *, idx: SqlExpression | None = None, lower_idx: SqlExpression | None = None, upper_idx: SqlExpression | None = None)#

Models index-based access to an array column.

Due to its oftentimes special syntax, this is modeled as a special case of a function expression (using ARRAY_GET as the function name). The text representation is based on Postgres and should be adapted for other systems during query formatting if necessary.

Depending on the specific kind of access, different parameters can be set. For simple element access, the index attribute is used. For slices, the lower_index and upper_index attributes are available. It is also possible to only set one boundary to create a half-open slice. Whether the index is 0-based or 1-based is not enforced by PostBOUND and depends on the actual database system.

Notice that all indexes are represented as expressions rather than simple integers. This allows for “variable” indexes, as in SELECT R.a[R.b] FROM R.

Parameters:
  • array_expr (SqlExpression) – The array being accessed

  • idx (Optional[SqlExpression], optional) – For point-based access, the index of the element to access. Defaults to None.

  • lower_idx (Optional[SqlExpression], optional) – For slice-based access, the lower boundary of the slice. Defaults to None.

  • upper_idx (Optional[SqlExpression], optional) – For slice-based access, the upper boundary of the slice. Defaults to None.

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

property array: SqlExpression#

Get the array that is being accessed.

Returns:

The array

Return type:

SqlExpression

property index: SqlExpression | None#

Get the index of the element to access.

Returns:

The index or None if sliced access is used

Return type:

Optional[SqlExpression]

property index_slice: tuple[SqlExpression | None, SqlExpression | None] | None#

Get the boundaries of the slice.

Returns:

The slice interval. Any boundaries can be none if the interval is open at that end. If the array access is not sliced, the entire tuple is None.

Return type:

Optional[tuple[Optional[SqlExpression], Optional[SqlExpression]]]

property lower_index: SqlExpression | None#

Get the lower boundary of the slice.

Returns:

The lower boundary or None if either point-based access is used, or the slice is open at the lower end

Return type:

Optional[SqlExpression]

property upper_index: SqlExpression | None#

Get the upper boundary of the slice.

Returns:

The upper boundary or None if either point-based access is used, or the slice is open at the upper end

Return type:

Optional[SqlExpression]

class postbound.qal.ArrayExpression(elements: Sequence[SqlExpression])#

Models an array literal expression, such as ARRAY[1, 2, 3].

Our array abstraction also permits the array to contain arbitrary expressions, as long as they are all of the same type (which we assume but cannot check), e.g. ARRAY[41, (SELECT 42), 43].

Parameters:

elements (Sequence[SqlExpression]) – The elements of the array. Notice that all elements have to be valid SqlExpression instances, raw values are not permitted.

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

property elements: Sequence[SqlExpression]#

Get the elements of the array.

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

class postbound.qal.BaseClause(hash_val: int)#

Basic interface shared by all supported clauses.

This really is an abstract interface, not a usable clause. All inheriting clauses have to provide their own __eq__ method and re-use the __hash__ method provided by the base clause. Remember to explicitly set this up! The concrete hash value is constant since the clause itself is immutable. It is up to the implementing class to make sure that the equality/hash consistency is enforced.

Parameters:

hash_val (int) – The hash of the concrete clause object

abstractmethod accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

abstractmethod columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

abstractmethod itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

abstractmethod iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

tables() set[TableReference]#

Provides all tables that are referenced in the clause.

Returns:

All tables. This includes virtual tables if such tables are present in the clause

Return type:

set[TableReference]

class postbound.qal.BasePredicate(operation: MathOperator | LogicalOperator | CompoundOperator | None, *, hash_val: int)#

A base predicate is a predicate that is not composed of any additional child predicates, such as a binary predicate.

It represents the smallest kind of condition that evaluates to TRUE or FALSE.

Parameters:
  • operation (Optional[SqlOperator]) – The operation that compares the column value(s). For unary base predicates, this may be None if a predicate function is used to determine matching tuples.

  • hash_val (int) – The hash of the entire predicate

base_predicates() Iterable[AbstractPredicate]#

Provides all base predicates that form this predicate.

This allows to iterate over all leaves of a compound predicate, for base predicates it simply returns the predicate itself.

Returns:

The base predicates, in an arbitrary order. If the predicate is a base predicate already, it will be the only item in the iterable.

Return type:

Iterable[AbstractPredicate]

is_compound() bool#

Checks, whether this predicate combines the evaluation of other predicates to compute the overall evaluation.

Operators to combine such predicates can be standard logical operators like conjunction, disjunction and negation. This method serves as a high-level check, preventing the usage of dedicated isinstance calls in some use-cases.

Returns:

Whethter this predicate is a composite of other predicates.

Return type:

bool

property operation: MathOperator | LogicalOperator | CompoundOperator | None#

Get the operation that is used to obtain matching (pairs of) tuples.

Most of the time, this operation will be set to one of the SQL operators. However, for unary predicates that filter based on a predicate function this might be None (e.g. a user-defined function such as in SELECT * FROM R WHERE my_udf_predicate(R.a, R.b)).

Returns:

The operation if it exists

Return type:

Optional[SqlOperator]

class postbound.qal.BaseProjection(expression: SqlExpression, target_name: str = '')#

The BaseProjection forms the fundamental building block of a SELECT clause.

Each SELECT clause is composed of at least one base projection. Each projection can be an arbitrary SqlExpression (rules and restrictions of the SQL standard are not enforced here). In addition, each projection can receive a target name as in SELECT foo AS f FROM bar.

Parameters:
  • expression (SqlExpression) – The expression that is used to calculate the column value. In the simplest case, this can just be a ColumnExpression, which provides the column values directly.

  • target_name (str, optional) – An optional name under which the column should be accessible. Defaults to an empty string, which indicates that the original column value or a system-specific modification of that value should be used. The latter case mostly applies to columns which are modified in some way, e.g. by a mathematical expression or a function call. Depending on the specific database system, the default column name could just be the function name, or the function name along with all its parameters.

static column(col: ColumnReference, target_name: str = '') BaseProjection#

Shortcut method to create a projection for a specific column.

Parameters:
  • col (ColumnReference) – The column that should be projected

  • target_name (str, optional) – An optional name under which the column should be available.

Returns:

The projection

Return type:

BaseProjection

static count_star(target_name: str = '') BaseProjection#

Shortcut method to create a COUNT(*) projection.

Parameters:

target_name (str, optional) – An optional name under which the column should be accessible.

Returns:

The projection

Return type:

BaseProjection

property expression: SqlExpression#

Get the expression that forms the column.

Returns:

The expression

Return type:

SqlExpression

is_star() bool#

Checks, whether this projection is a star projection.

Return type:

bool

static star() BaseProjection#

Shortcut method to create a * (as in SELECT * FROM R) projection.

Returns:

The projection

Return type:

BaseProjection

property target_name: str#

Get the alias under which the column should be accessible.

Can be empty to indicate the absence of a target name.

Returns:

The name

Return type:

str

class postbound.qal.BetweenPredicate(column: SqlExpression, interval: tuple[SqlExpression, SqlExpression])#

A BETWEEN predicate is a special case of a conjunction of two binary predicates.

Each BETWEEN predicate has a structure of <col> BETWEEN <a> AND <b>, where <col> describes the (column) expression to which the condition should apply and <a> and <b> are the expressions that denote the valid bounds.

Each BETWEEN predicate can be represented by a conjunction of binary predicates: <col> BETWEEN <a> AND <b> is equivalent to <col> >= <a> AND <col> <= <b>.

Parameters:
  • column (SqlExpression) – The value that is checked by the predicate

  • interval (tuple[SqlExpression, SqlExpression]) – The allowed range in which the column values must lie. The range is inclusive at both endpoints. This has to be a pair (2-tuple) of expressions.

Raises:

ValueError – If the interval is not a pair of values.

Notes

A BETWEEN predicate can be a join predicate as in R.a BETWEEN 42 AND S.b. Furthermore, some systems even allow the <col> part to be an arbitrary expression. For example, in Postgres this is a valid query:

SELECT *
FROM R JOIN S ON R.a = S.b
WHERE 42 BETWEEN R.c AND S.d
accept_visitor(visitor: PredicateVisitor[VisitorResult] | SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current predicate by a predicate visitor.

Parameters:
  • visitor (PredicateVisitor[VisitorResult]) – The visitor

  • args – Additional arguments to pass to the visitor

  • kwargs – Additional keyword arguments to pass to the visitor

Return type:

VisitorResult

property column: SqlExpression#

Get the column that is tested (R.a in SELECT * FROM R WHERE R.a BETWEEN 1 AND 42).

Returns:

The expression

Return type:

SqlExpression

columns() set[ColumnReference]#

Provides all columns that are referenced by this predicate.

Returns:

The columns. If the predicate contains a subquery, all columns of that query are included.

Return type:

set[ColumnReference]

property interval: tuple[SqlExpression, SqlExpression]#

Get the interval (as (lower, upper)) that is tested against.

Returns:

The allowed range of values. This interval is inclusive at both endpoints.

Return type:

tuple[SqlExpression, SqlExpression]

property interval_end: SqlExpression#

Get the upper bound of the interval that is tested against.

Returns:

The upper value. This value is inclusive, i.e. the comparison values must be less or equal.

Return type:

SqlExpression

property interval_start: SqlExpression#

Get the lower bound of the interval that is tested against.

Returns:

The lower value. This value is inclusive, i.e. the comparison values must be greater or equal.

Return type:

SqlExpression

is_join() bool#

Checks, whether this predicate encodes a join between two tables.

PostBOUND uses the following criteria to determine, whether a predicate is join or not:

  1. all predicates of the form <col 1> <operator> <col 2> where <col 1> and <col 2> come from different tables are joins. The columns can optionally be modified by value casts or static expressions, e.g. R.a::integer + 7

  2. all functions that access columns from multiple tables are joins, e.g. my_udf(R.a, S.b)

  3. all subqueries are treated as filters, no matter whether they are dependent subqueries or not. This means that both R.a = (SELECT MAX(S.b) FROM S) and R.a = (SELECT MAX(S.b) FROM S WHERE R.c = S.d) are treated as filters and not as joins, even though the second subquery will require some sort of the join in the query plan.

  4. BETWEEN and IN predicates are treated according to rule 1 since they can be emulated via base predicates (subqueries in IN predicates are evaluated according to rule 3.)

Although these rules might seem a bit arbitrary at first, there is actually no clear consensus of what constitutes a join and the query optimizers of different industrial database systems treat different predicates as joins. For example, some systems might not recognize function calls that access columns form two or more tables as joins or do not recognize predicates that use non-equi joins as opertors as actual joins.

If the specific join and filter recognition procedure breaks a specific use-case, subclasses of the predicate classes can be implemented. These subclasses can then apply the required rules. Using the tools in the transformation module, the queries can be updated. For some use-cases it can also be sufficient to change the join/filter recognition rules of the QueryPredicates objects. Consult its documentation for more details.

Lastly, notice that the distinction between join and filter is not entirely binary. There may also be a third class of predicates, potentially called “post-join filters”. These are filters that are applied after a join but cannot be included in the join predicate itself. This is usually the case due to limitations in the operator implementation of the actual database system. For example, invocations of user defined functions (case 2 above) usually fall in this category. Since the query abstraction layer is agnostic to specific details of database systems, we apply the binary categorization outlined above.

Returns:

Whether the predicate is a join of different relations.

Return type:

bool

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this predicate.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all expressions that are directly contained in this predicate.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

join_partners() set[tuple[ColumnReference, ColumnReference]]#

Provides all pairs of columns that are joined within this predicate.

If multiple columns are joined or it is unclear which columns are involved in a join exactly, this method falls back to returning the cross-product of all potential join partners. For example, consider the following query: SELECT * FROM R, S WHERE my_udf(R.a, R.b, S.c). In this case, it cannot be determined which columns of R take part in the join. Therefore, join_partners will return the set {(R.a, S.c), (R.b, S.c)}.

Returns:

The pairs of joined columns. These pairs are normalized, such that two predicates which join the same columns provide the join partners in the same order.

Return type:

set[tuple[ColumnReference, ColumnReference]]

Raises:

NoJoinPredicateError

class postbound.qal.BinaryPredicate(operation: MathOperator | LogicalOperator | CompoundOperator, first_argument: SqlExpression, second_argument: SqlExpression)#

A binary predicate combines exactly two base expressions with a comparison operation.

This is the most typical kind of predicate and appears in most joins (e.g. R.a = S.b) and many filters (e.g. R.a = 42).

Parameters:
  • operation (SqlOperator) – The operation that combines the input arguments

  • first_argument (SqlExpression) – The first comparison value

  • second_argument (SqlExpression) – The second comparison value

accept_visitor(visitor: PredicateVisitor[VisitorResult] | SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current predicate by a predicate visitor.

Parameters:
  • visitor (PredicateVisitor[VisitorResult]) – The visitor

  • args – Additional arguments to pass to the visitor

  • kwargs – Additional keyword arguments to pass to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced by this predicate.

Returns:

The columns. If the predicate contains a subquery, all columns of that query are included.

Return type:

set[ColumnReference]

static equal(first_argument: SqlExpression, second_argument: SqlExpression) BinaryPredicate#

Generates an equality predicate between two arguments.

Parameters:
Return type:

BinaryPredicate

property first_argument: SqlExpression#

Get the first argument of the predicate.

Returns:

The argument

Return type:

SqlExpression

is_join() bool#

Checks, whether this predicate encodes a join between two tables.

PostBOUND uses the following criteria to determine, whether a predicate is join or not:

  1. all predicates of the form <col 1> <operator> <col 2> where <col 1> and <col 2> come from different tables are joins. The columns can optionally be modified by value casts or static expressions, e.g. R.a::integer + 7

  2. all functions that access columns from multiple tables are joins, e.g. my_udf(R.a, S.b)

  3. all subqueries are treated as filters, no matter whether they are dependent subqueries or not. This means that both R.a = (SELECT MAX(S.b) FROM S) and R.a = (SELECT MAX(S.b) FROM S WHERE R.c = S.d) are treated as filters and not as joins, even though the second subquery will require some sort of the join in the query plan.

  4. BETWEEN and IN predicates are treated according to rule 1 since they can be emulated via base predicates (subqueries in IN predicates are evaluated according to rule 3.)

Although these rules might seem a bit arbitrary at first, there is actually no clear consensus of what constitutes a join and the query optimizers of different industrial database systems treat different predicates as joins. For example, some systems might not recognize function calls that access columns form two or more tables as joins or do not recognize predicates that use non-equi joins as opertors as actual joins.

If the specific join and filter recognition procedure breaks a specific use-case, subclasses of the predicate classes can be implemented. These subclasses can then apply the required rules. Using the tools in the transformation module, the queries can be updated. For some use-cases it can also be sufficient to change the join/filter recognition rules of the QueryPredicates objects. Consult its documentation for more details.

Lastly, notice that the distinction between join and filter is not entirely binary. There may also be a third class of predicates, potentially called “post-join filters”. These are filters that are applied after a join but cannot be included in the join predicate itself. This is usually the case due to limitations in the operator implementation of the actual database system. For example, invocations of user defined functions (case 2 above) usually fall in this category. Since the query abstraction layer is agnostic to specific details of database systems, we apply the binary categorization outlined above.

Returns:

Whether the predicate is a join of different relations.

Return type:

bool

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this predicate.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all expressions that are directly contained in this predicate.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

join_partners() set[tuple[ColumnReference, ColumnReference]]#

Provides all pairs of columns that are joined within this predicate.

If multiple columns are joined or it is unclear which columns are involved in a join exactly, this method falls back to returning the cross-product of all potential join partners. For example, consider the following query: SELECT * FROM R, S WHERE my_udf(R.a, R.b, S.c). In this case, it cannot be determined which columns of R take part in the join. Therefore, join_partners will return the set {(R.a, S.c), (R.b, S.c)}.

Returns:

The pairs of joined columns. These pairs are normalized, such that two predicates which join the same columns provide the join partners in the same order.

Return type:

set[tuple[ColumnReference, ColumnReference]]

Raises:

NoJoinPredicateError

property operation: MathOperator | LogicalOperator | CompoundOperator#

Get the operation that is used to obtain matching (pairs of) tuples.

Most of the time, this operation will be set to one of the SQL operators. However, for unary predicates that filter based on a predicate function this might be None (e.g. a user-defined function such as in SELECT * FROM R WHERE my_udf_predicate(R.a, R.b)).

Returns:

The operation if it exists

Return type:

Optional[SqlOperator]

property second_argument: SqlExpression#

Get the second argument of the predicate.

Returns:

The argument

Return type:

SqlExpression

class postbound.qal.CaseExpression(cases: Sequence[tuple[SqlExpression, SqlExpression]], *, simple_expr: SqlExpression | None = None, else_expr: SqlExpression | None = None)#

Represents a case expression in SQL.

Parameters:#

casesSequence[tuple[SqlExpression, SqlExpression]]

A sequence of tuples representing the cases in the case expression. The cases are passed as a sequence rather than a dictionary, because the evaluation order of the cases is important. The first case that evaluates to true determines the result of the entire case statement.

simple_expr: Optional[SqlExpression], optional

The expression to evaluate against the cases. This “simple form” compares the expression directly against each of the values in cases, similar to a switch statement.

else_exprOptional[SqlExpression], optional

The expression to be evaluated if none of the cases match. If no case matches and no else expression is provided, the entire case expression should evaluate to NULL.

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

property cases: Sequence[tuple[SqlExpression, SqlExpression]]#

Get the different cases.

Returns:

The cases. At least one case will be present.

Return type:

Sequence[tuple[SqlExpression]]

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

property else_expression: SqlExpression | None#

Get the expression to use if none of the cases match.

Returns:

The expression. Can be None, in which case the case expression evaluates to NULL.

Return type:

Optional[SqlExpression]

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

property simple_expression: SqlExpression | None#

Get the expression to evaluate against the cases.

This is only set for the “simple form” of the case expression, where the expression is compared directly against the values in the cases. In this form, each case has to be a plain value instead of a full predicate, similar to a switch statement:

SELECT  CASE R.a
            WHEN 1 THEN 'one'
            WHEN 2 THEN 'two'
            ELSE 'other'
        END AS foo
FROM R;
tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

Parameters:
class postbound.qal.CastExpression(expression: SqlExpression, target_type: str, *, type_params: Sequence[SqlExpression] | None = None, array_type: bool = False)#

An expression that casts the type of another nested expression.

Note that PostBOUND itself does not know about the semantics of the actual types or casts. Eventual errors due to illegal casts are only caught at runtime by the actual database system.

Parameters:
  • expression (SqlExpression) – The expression that is casted to a different type.

  • target_type (str) – The type to which the expression should be converted to. This cannot be empty.

  • type_params (Optional[Sequence[SqlExpression]], optional) – Additional arguments to parameterize the type, such as in NUMERIC(4, 2) or VARCHAR(255). For example, when casting to VARCHAR(255), the 255 would be an additional parameter, represented as a single static value expression. When casting to NUMERIC(4, 2), the 4 and 2 would be the additional parameters (in that order).

  • array_type (bool, optional) – Whether the target type is an array type.

Raises:

ValueError – If the target_type is empty.

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

property array_type: bool#

Get whether the target type is an array type.

property casted_expression: SqlExpression#

Get the expression that is being casted.

Returns:

The expression

Return type:

SqlExpression

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

property target_type: str#

Get the type to which to cast to.

Returns:

The desired type. This is never empty.

Return type:

str

property type_params: Sequence[SqlExpression]#

Get additional arguments that parameterize the type.

For example, when casting to VARCHAR(255), the 255 would be an additional parameter, represented as a single static value expression. When casting to NUMERIC(4, 2), the 4 and 2 would be the additional parameters (in that order).

Returns:

The type parameters or an empty sequence if there are none.

Return type:

Sequence[SqlExpression]

class postbound.qal.ClauseVisitor#

Basic visitor to operate on arbitrary clause lists.

See also

BaseClause

References

class postbound.qal.ColumnExpression(column: ColumnReference)#

A column expression wraps the reference to a column.

This is a leaf expression, i.e. a column expression cannot have any more child expressions. It corresponds directly to an access to the values of the wrapped column with no modifications.

Parameters:

column (ColumnReference) – The column being wrapped

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

property column: ColumnReference#

Get the column that is wrapped by this expression.

Returns:

The column

Return type:

ColumnReference

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

static of(column_name: str) ColumnExpression#

Shortcut method to create a new column reference + expression.

Parameters:

column_name (str)

Return type:

ColumnExpression

tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

class postbound.qal.CommonTableExpression(with_queries: Iterable[WithQuery], *, recursive: bool = False)#

The WITH clause of a query, consisting of at least one CTE query.

Parameters:
  • with_queries (Iterable[WithQuery]) – The common table expressions that form the WITH clause.

  • recursive (bool, optional) – Whether the WITH clause is recursive or not. Defaults to False.

Raises:

ValueError – If with_queries does not contain any CTE

Warning

The tables() method provides all tables that are referenced as part of the CTEs as well as their aliases. The referenced_tables() method does not include the CTE aliases. Likewise, the aliases() method provides all the aliases, but not the tables that are referenced within the CTEs.

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

aliases() set[TableReference]#

Provides all aliases that are used in the CTEs.

Returns:

The aliases.

Return type:

set[TableReference]

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

static create_for(query: SqlQuery, target_name: str) CommonTableExpression#

Shortcut method to create a CTE clause with a single query.

Parameters:
Return type:

CommonTableExpression

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

property queries: Sequence[WithQuery]#

Get CTEs that form the WITH clause

Returns:

The CTEs in the order in which they were originally specified.

Return type:

Sequence[WithQuery]

property recursive: bool#

Check whether the WITH clause is recursive or not.

Returns:

Whether the WITH clause is recursive

Return type:

bool

referenced_tables() set[TableReference]#

Provides all tables that are referenced in the CTEs. This does not include the CTE aliases.

Returns:

The tables.

Return type:

set[TableReference]

tables() set[TableReference]#

Provides all tables that are referenced in the clause.

Returns:

All tables. This includes virtual tables if such tables are present in the clause

Return type:

set[TableReference]

class postbound.qal.CompoundOperator(*values)#

The supported compound operators.

class postbound.qal.CompoundPredicate(operation: CompoundOperator, children: AbstractPredicate | Sequence[AbstractPredicate])#

A compound predicate creates a composite hierarchical structure of other predicates.

Currently, PostBOUND supports 3 kinds of compound predicates: negations, conjunctions and disjunctions. Depending on the specific compound operator, a diferent number of child predicates is allowed.

Parameters:
  • operation (LogicalSqlCompoundOperators) – The operation that glues together the individual child predicates.

  • children (AbstractPredicate | Sequence[AbstractPredicate]) – The predicates that are combined by this composite. For conjunctions and disjunctions, at least two children are required. For negations, exactly one child is permitted (either directly or in the sequence).

Raises:
  • ValueError – If operation is a negation and a number of children unequal to 1 is passed

  • ValueError – If operation is a conjunction or a disjunction and less than 2 children are passed

  • .. deprecated: – 0.20.2: CompoundPredicate will only handle AND/OR predicates in the future. NOT predicates will be represented by a proper NotPredicate class.

accept_visitor(visitor: PredicateVisitor[VisitorResult] | SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current predicate by a predicate visitor.

Parameters:
  • visitor (PredicateVisitor[VisitorResult]) – The visitor

  • args – Additional arguments to pass to the visitor

  • kwargs – Additional keyword arguments to pass to the visitor

Return type:

VisitorResult

base_predicates() Iterable[AbstractPredicate]#

Provides all base predicates that form this predicate.

This allows to iterate over all leaves of a compound predicate, for base predicates it simply returns the predicate itself.

Returns:

The base predicates, in an arbitrary order. If the predicate is a base predicate already, it will be the only item in the iterable.

Return type:

Iterable[AbstractPredicate]

property children: Sequence[AbstractPredicate] | AbstractPredicate#

Get the child predicates that are combined in this compound predicate.

For conjunctions and disjunctions this will be a sequence of children with at least two children. For negations the child predicate will be returned directly (i.e. without being wrapped in a sequence).

Returns:

The sequence of child predicates for AND and OR predicates, or the negated predicate for NOT predicates.

Return type:

Sequence[AbstractPredicate] | AbstractPredicate

columns() set[ColumnReference]#

Provides all columns that are referenced by this predicate.

Returns:

The columns. If the predicate contains a subquery, all columns of that query are included.

Return type:

set[ColumnReference]

static create(operation: CompoundOperator, parts: Sequence[AbstractPredicate]) AbstractPredicate#

Creates an arbitrary compound predicate for a number of child predicates.

If just a single child predicate is provided, but the operation requires multiple children, that child is returned directly instead of the compound predicate.

Parameters:
  • operation (LogicalSqlCompoundOperators) – The logical operator to combine the child predicates.

  • parts (Sequence[AbstractPredicate]) – The child predicates

Returns:

A composite predicate of the given parts, if parts contains the appropriate number of items. Otherwise the supplied child predicate.

Return type:

AbstractPredicate

Raises:

ValueError – If a negation predicate should be created but a number child predicates unequal to one are supplied. Likewise, if a conjunction or disjunction is requested, but no child predicates are supplied.

static create_and(parts: Collection[AbstractPredicate]) AbstractPredicate#

Creates an AND predicate, combining a number of child predicates.

If just a single child predicate is provided, that child is returned directly instead of wrapping it in an AND predicate.

Parameters:

parts (Collection[AbstractPredicate]) – The children that should be combined

Returns:

A conjunctive predicate of the given parts, if parts contains at least two items. Otherwise the only passed predicate is returned.

Return type:

AbstractPredicate

Raises:

ValueError – If parts is empty

static create_not(predicate: AbstractPredicate) CompoundPredicate#

Builds a NOT predicate, wrapping a specific child predicate.

Parameters:

predicate (AbstractPredicate) – The predicate that should be negated

Returns:

The negated predicate. No logic checks or simplifications are performed. For example, it is possible to negate a negation and this will still be represented in the predicate hierarchy.

Return type:

CompoundPredicate

static create_or(parts: Collection[AbstractPredicate]) AbstractPredicate#

Creates an OR predicate, combining a number of child predicates.

If just a single child predicate is provided, that child is returned directly instead of wrapping it in an OR predicate.

Parameters:

parts (Collection[AbstractPredicate]) – The children that should be combined

Returns:

A disjunctive predicate of the given parts, if parts contains at least two items. Otherwise the only passed predicate is returned.

Return type:

AbstractPredicate

Raises:

ValueError – If parts is empty

is_compound() bool#

Checks, whether this predicate combines the evaluation of other predicates to compute the overall evaluation.

Operators to combine such predicates can be standard logical operators like conjunction, disjunction and negation. This method serves as a high-level check, preventing the usage of dedicated isinstance calls in some use-cases.

Returns:

Whethter this predicate is a composite of other predicates.

Return type:

bool

is_join() bool#

Checks, whether this predicate encodes a join between two tables.

PostBOUND uses the following criteria to determine, whether a predicate is join or not:

  1. all predicates of the form <col 1> <operator> <col 2> where <col 1> and <col 2> come from different tables are joins. The columns can optionally be modified by value casts or static expressions, e.g. R.a::integer + 7

  2. all functions that access columns from multiple tables are joins, e.g. my_udf(R.a, S.b)

  3. all subqueries are treated as filters, no matter whether they are dependent subqueries or not. This means that both R.a = (SELECT MAX(S.b) FROM S) and R.a = (SELECT MAX(S.b) FROM S WHERE R.c = S.d) are treated as filters and not as joins, even though the second subquery will require some sort of the join in the query plan.

  4. BETWEEN and IN predicates are treated according to rule 1 since they can be emulated via base predicates (subqueries in IN predicates are evaluated according to rule 3.)

Although these rules might seem a bit arbitrary at first, there is actually no clear consensus of what constitutes a join and the query optimizers of different industrial database systems treat different predicates as joins. For example, some systems might not recognize function calls that access columns form two or more tables as joins or do not recognize predicates that use non-equi joins as opertors as actual joins.

If the specific join and filter recognition procedure breaks a specific use-case, subclasses of the predicate classes can be implemented. These subclasses can then apply the required rules. Using the tools in the transformation module, the queries can be updated. For some use-cases it can also be sufficient to change the join/filter recognition rules of the QueryPredicates objects. Consult its documentation for more details.

Lastly, notice that the distinction between join and filter is not entirely binary. There may also be a third class of predicates, potentially called “post-join filters”. These are filters that are applied after a join but cannot be included in the join predicate itself. This is usually the case due to limitations in the operator implementation of the actual database system. For example, invocations of user defined functions (case 2 above) usually fall in this category. Since the query abstraction layer is agnostic to specific details of database systems, we apply the binary categorization outlined above.

Returns:

Whether the predicate is a join of different relations.

Return type:

bool

is_negation() bool#

Checks whether this is a NOT predicate.

Return type:

bool

iterchildren() Sequence[AbstractPredicate]#

Provides all children contained in this predicate.

In contrast to the children property, this method always returns an iterable, even for NOT predicates. In the latter case the iterable contains just a single item.

Returns:

The children

Return type:

Sequence[AbstractPredicate]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this predicate.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all expressions that are directly contained in this predicate.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

join_partners() set[tuple[ColumnReference, ColumnReference]]#

Provides all pairs of columns that are joined within this predicate.

If multiple columns are joined or it is unclear which columns are involved in a join exactly, this method falls back to returning the cross-product of all potential join partners. For example, consider the following query: SELECT * FROM R, S WHERE my_udf(R.a, R.b, S.c). In this case, it cannot be determined which columns of R take part in the join. Therefore, join_partners will return the set {(R.a, S.c), (R.b, S.c)}.

Returns:

The pairs of joined columns. These pairs are normalized, such that two predicates which join the same columns provide the join partners in the same order.

Return type:

set[tuple[ColumnReference, ColumnReference]]

Raises:

NoJoinPredicateError

property operation: CompoundOperator#

Get the operation used to combine the individual evaluations of the child predicates.

Returns:

The operation

Return type:

LogicalSqlCompoundOperators

class postbound.qal.DirectTableSource(table: TableReference)#

Models a plain table that is directly referenced in a FROM clause, e.g. R in SELECT * FROM R, S.

Parameters:

table (TableReference) – The table that is sourced

columns() set[ColumnReference]#

Provides all column sthat are referenced in the source.

For plain table sources this will be empty, but for more complicate structures such as subquery source, this will include all columns that are referenced in the subquery.

Returns:

The columns

Return type:

set[ColumnReference]

identifier() str#

Get the table’s identifier.

Return type:

str

itercolumns() Iterable[ColumnReference]#

Provides access to all column in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this source, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

predicates() QueryPredicates | None#

Provides all predicates that are contained in the source.

For plain table sources this will be None, but for subquery sources, etc. all predicates are returned.

Returns:

The predicates or None if the source does not allow predicates or simply does not contain any.

Return type:

QueryPredicates | None

property table: TableReference#

Get the table that is sourced.

This can be a virtual table (e.g. for CTEs), but will most commonly be an actual table.

Returns:

The table.

Return type:

TableReference

tables() set[TableReference]#

Provides all tables that are referenced in the source.

For plain table sources this will just be the actual table. For more complicated structures, such as subquery sources, this will include all tables of the subquery as well.

Returns:

The tables.

Return type:

set[TableReference]

class postbound.qal.ExceptClause(left_query: SqlQuery | SetQuery, right_query: SqlQuery | SetQuery)#

The EXCEPT clause of a query.

Parameters:
  • left_query (SelectStatement) – The left query that is part of the EXCEPT operation. This is the result set from which tuples are removed.

  • right_query (SelectStatement) – The right query that is part of the EXCEPT operation. This is the result set of the tuples that should be removed.

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

property left_query: SqlQuery | SetQuery#

Get the left query that is part of the EXCEPT operation.

The left query provides the result set from which tuples are removed.

Returns:

The left query.

Return type:

SelectStatement

property right_query: SqlQuery | SetQuery#

Get the right query that is part of the EXCEPT operation.

The right query provides the result set of the tuples that should be removed.

Returns:

The right query.

Return type:

SelectStatement

tables() set[TableReference]#

Provides all tables that are referenced in the clause.

Returns:

All tables. This includes virtual tables if such tables are present in the clause

Return type:

set[TableReference]

class postbound.qal.Explain(analyze: bool = False, target_format: str | None = None)#

EXPLAIN block of a query.

EXPLAIN queries change the execution mode of a query. Instead of focusing on the actual query result, an EXPLAIN query produces information about the internal processes of the database system. Typically, this includes which execution plan the DBS would choose for the query. Additionally, EXPLAIN ANALYZE (as for example supported by Postgres) provides the query plan and executes the actual query. The returned plan is then annotated by how the optimizer predictions match reality. Furthermore, such ANALYZE plans typically also contain some runtime statistics such as runtime of certain operators.

Notice that there is no EXPLAIN keyword in the SQL standard, but all major database systems provide this functionality. Nevertheless, the precise syntax and semantic of an EXPLAIN statement depends on the actual DBS. The Explain clause object is modeled after Postgres and needs to adapted accordingly for different systems (see db.HintService). Especially the EXPLAIN ANALYZE variant is not supported by all systems.

Parameters:
  • analyze (bool, optional) – Whether the query should not only be executed as an EXPLAIN query, but rather as an EXPLAIN ANALYZE query. Defaults to False which runs the query as a pure EXPLAIN query.

  • target_format (Optional[str], optional) – The desired output format of the query plan, if this is supported by the database system. Defaults to None which normally forces the default output format.

See also

postbound.db.db.HintService.format_query

References

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

property analyze: bool#

Check, whether the query should be executed as EXPLAIN ANALYZE rather than just plain EXPLAIN.

Usually, EXPLAIN ANALYZE executes the query and gathers extensive runtime statistics (e.g. comparing estimated vs. true cardinalities for intermediate nodes).

Returns:

Whether ANALYZE mode is enabled

Return type:

bool

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

static explain_analyze(format_type: str = 'JSON') Explain#

Constructs an EXPLAIN ANALYZE clause with the specified output format.

Parameters:

format_type (str, optional) – The output format, by default JSON

Returns:

The explain clause

Return type:

Explain

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

static plan(format_type: str = 'JSON') Explain#

Constructs a pure EXPLAIN clause (i.e. without ANALYZE) with the specified output format.

Parameters:

format_type (str, optional) – The output format, by default JSON

Returns:

The explain clause

Return type:

Explain

property target_format: str | None#

Get the target format in which the EXPLAIN plan should be provided.

Returns:

The output format, or None if this is not specified. This is never an empty string.

Return type:

Optional[str]

class postbound.qal.ExplicitFromClause(joins: JoinTableSource | Iterable[JoinTableSource])#

Represents a special kind of FROM clause that requires all tables to be joined using the JOIN ON syntax.

Parameters:

joins (JoinTableSource | Iterable[JoinTableSource]) – The joins that should be performed

base_table() TableReference#

Get the table that is farthest to the left in the join chain.

For subqueries or VALUES clauses, this will return the alias of the expression, i.e. the name of the virtual table that is created for the subquery or VALUES clause.

Returns:

The table

Return type:

TableReference

iterpredicates() Iterable[AbstractPredicate]#

Provides all join conditions that are contained in the FROM clause.

Returns:

The join conditions.

Return type:

Iterable[AbstractPredicate]

property root: JoinTableSource#

Get the root join of the FROM clause.

Returns:

The root join

Return type:

JoinTableSource

class postbound.qal.ExplicitSqlQuery(*, select_clause: Select, from_clause: ExplicitFromClause | None = None, where_clause: Where | None = None, groupby_clause: GroupBy | None = None, having_clause: Having | None = None, orderby_clause: OrderBy | None = None, limit_clause: Limit | None = None, cte_clause: CommonTableExpression | None = None, explain_clause: Explain | None = None, hints: Hint | None = None)#

An explicit query restricts the constructs that may appear in the FROM clause.

For explicit queries, the FROM clause must utilize the JOIN ON syntax for all tables. The join conditions should be put into the ON blocks. Notice however, that PostBOUND does not perform any sanity checks here. Therefore, it is possible to put mix joins and filters in the ON blocks, move all joins to the WHERE clause or scatter the join conditions between the two clauses. Whether this is good style is up for debate, but at least PostBOUND does allow it. In contrast to the implicit query, subqueries are also allowed as table sources.

Notice that each explicit query must join at least two tables in its FROM clause.

The attributes and parameters for this query type are the same as for SqlQuery, only the type of the From clause is restricted.

Examples

The following queries are considered as explicit queries:

SELECT *
FROM R
    JOIN S ON R.a = S.b
    JOIN T ON S.b = T.c
WHERE R.a < 42
SELECT *
FROM R
    JOIN S ON R.a = S.b AND R.a = (SELECT MIN(R.a) FROM R)
    JOIN T ON S.b = T.c
Parameters:
property from_clause: ExplicitFromClause | None#

Get the FROM clause of the query.

Returns:

The FROM clause if it was specified, or None otherwise.

Return type:

Optional[From]

is_explicit() bool#

Checks, whether this query has an explicit FROM clause.

The explicit FROM clause exclusively makes use of the JOIN ON syntax to denote both the tables that should be scanned, and the predicates that should be used to join the tables together. Therefore, the WHERE clause should only consist of filter predicates on the base tables. However, this is not enforced and the contents of the ON conditions as well as the WHERE clause can be arbitrary predicates.

Returns:

Whether the query is explicit

Return type:

bool

See also

ExplicitSqlQuery

is_implicit() bool#

Checks, whether this query has an implicit FROM clause.

The implicit FROM clause only consists of the source tables that should be scanned for the query. No subqueries or joins are contained in the clause. All join predicates must be part of the WHERE clause.

Returns:

Whether the query is implicit

Return type:

bool

See also

ImplicitSqlQuery

class postbound.qal.ExpressionCollector(matcher: Callable[[SqlExpression], bool], *, continue_after_match: bool = False)#

Utility to traverse an arbitrarily deep expression hierarchy in order to collect specific expressions.

Parameters:
  • matcher (Callable[[SqlExpression], bool]) – Function to determine whether a specific expression matches the collection predicate. Should return True for matches and False otherwise.

  • continue_after_match (bool, optional) – Whether the traversal of the current expression element should be continued if the current element matches the collection predicate. By default, traversal is stopped for the current element (but other branches in the expression tree could still produce more matches).

class postbound.qal.From(items: TableSource | Iterable[TableSource])#

The FROM clause models which tables should be selected and potentially how they are combined.

A FROM clause permits arbitrary source items and does not enforce a specific structure or semantic on them. This puts the user in charge to generate a valid and meaningful structure. For example, the model allows for the first item to be a JoinTableSource, even though this is not valid SQL. Likewise, no duplicate checks are performed.

To represent FROM clauses with a bit more structure, the ImplicitFromClause and ExplicitFromClause subclasses exist and should generally be preffered over direct usage of the raw From clause class.

Parameters:

items (TableSource | Iterable[TableSource]) – The tables that should be sourced in the FROM clause

Raises:

ValueError – If no items are specified

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

static create_for(items: TableReference | Iterable[TableReference]) ImplicitFromClause#

Shorthand method to create a FROM clause for a set of table references.

Parameters:

items (TableReference | Iterable[TableReference])

Return type:

ImplicitFromClause

property items: Sequence[TableType]#

Get the tables that are sourced in the FROM clause

Returns:

The sources in exactly the sequence in which they were specified

Return type:

Sequence[TableSource]

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

tables() set[TableReference]#

Provides all tables that are referenced in the clause.

Returns:

All tables. This includes virtual tables if such tables are present in the clause

Return type:

set[TableReference]

class postbound.qal.FunctionExpression(function: str, arguments: Sequence[SqlExpression] | None = None, *, distinct: bool = False, filter_where: AbstractPredicate | None = None)#

The function expression indicates a call to an arbitrary function.

The actual function might be one of the standard SQL functions, an aggregation function or a user-defined one. PostBOUND treats them all the same and it is up to the user to differentiate e.g. between UDFs and aggregations if this distinction is important. This can be easily achieved by introducing additional subclasses of the function expression and updating the queries to use the new function expressions where appropriate. The transform module provides utilities to make such updates easy.

Parameters:
  • function (str) – The name of the function that should be called. Cannot be empty.

  • arguments (Optional[Sequence[SqlExpression]], optional) – The parameters that should be passed to the function. Can be None if the function does not take or does not need any arguments (e.g. CURRENT_TIME())

  • distinct (bool, optional) – Whether the (aggregation) function should only operate on distinct column values and hence a duplicate elimination needs to be performed before passing the argument values (e.g. COUNT(DISTINCT *)). Defaults to False

  • filter_where (Optional[AbstractPredicate], optional) – An optional filter expression that restricts the values included in an aggregation function.

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

property arguments: Sequence[SqlExpression]#

Get all arguments that are supplied to the function.

Returns:

The arguments. Can be empty if no arguments are passed (but will never be None).

Return type:

Sequence[SqlExpression]

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

static create_count(column: ~postbound._core.ColumnReference | ~postbound.qal._qal.SqlExpression | ~collections.abc.Iterable[~postbound.qal._qal.SqlExpression] = *, *, distinct: bool = False) FunctionExpression#

Shortcut method to create a new COUNT() expression over (one or multiple) columns.

If no column is given, a COUNT(*) is created.

Parameters:
Return type:

FunctionExpression

static create_max(column: ColumnReference | SqlExpression | Iterable[SqlExpression]) FunctionExpression#

Shortcut method to create a new MAX() expression over (one or multiple) columns.

Parameters:

column (ColumnReference | SqlExpression | Iterable[SqlExpression])

Return type:

FunctionExpression

static create_min(column: ColumnReference | SqlExpression | Iterable[SqlExpression]) FunctionExpression#

Shortcut method to create a new MIN() expression over (one or multiple) columns.

Parameters:

column (ColumnReference | SqlExpression | Iterable[SqlExpression])

Return type:

FunctionExpression

static create_sum(column: ColumnReference | SqlExpression | Iterable[SqlExpression]) FunctionExpression#

Shortcut method to create a new SUM() expression over (one or multiple) columns.

Parameters:

column (ColumnReference | SqlExpression | Iterable[SqlExpression])

Return type:

FunctionExpression

property distinct: bool#

Get whether the function should only operate on distinct values.

Whether this makes any sense for the function at hand is entirely dependend on the specific function and not enfored by PostBOUND. The runtime DBS has to check this.

Generally speaking, this argument is intended for aggregation functions.

Returns:

Whether a duplicate elimination has to be performed on the function arguments

Return type:

bool

property filter_where: AbstractPredicate | None#

Get the filter expression for an aggregate function.

Filters restrict the values that are actually included in the aggregate.

Returns:

The filter expression or None if no filter is applied (or the function is not an aggregate).

Return type:

Optional[AbstractPredicate]

property function: str#

Get the function name.

Returns:

The function name. Will never be empty

Return type:

str

is_aggregate() bool#

Checks, whether the function is a well-known SQL aggregation function.

Only standard functions are considered (e.g. no CORR for computing correlations).

Returns:

Whether the function is a known aggregate function.

Return type:

bool

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

class postbound.qal.FunctionTableSource(function: FunctionExpression, *, alias: str | TableReference = '')#

Models a table that is constructed from a function call, e.g. as in SELECT * FROM my_udf(42).

Parameters:
  • function (FunctionExpression) – The function that computes the temporary relation.

  • alias (str | TableReference, optional) – The name under which the virtual table can be accessed in the actual query. If this is empty, an anonymous table is created.

alias() str#

Get the name under which the virtual table can be accessed in the actual query.

Return type:

str

columns() set[ColumnReference]#

Provides all column sthat are referenced in the source.

For plain table sources this will be empty, but for more complicate structures such as subquery source, this will include all columns that are referenced in the subquery.

Returns:

The columns

Return type:

set[ColumnReference]

property function: FunctionExpression#

Get the function that computes the temporary relation.

itercolumns() Iterable[ColumnReference]#

Provides access to all column in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this source, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

predicates() QueryPredicates | None#

Provides all predicates that are contained in the source.

For plain table sources this will be None, but for subquery sources, etc. all predicates are returned.

Returns:

The predicates or None if the source does not allow predicates or simply does not contain any.

Return type:

QueryPredicates | None

tables() set[TableReference]#

Provides all tables that are referenced in the source.

For plain table sources this will just be the actual table. For more complicated structures, such as subquery sources, this will include all tables of the subquery as well.

Returns:

The tables.

Return type:

set[TableReference]

property target_name: str#

Get the name under which the virtual table can be accessed in the actual query.

For anonymous tables, this is an empty string.

property target_table: TableReference | None#

Get the virtual table that contains the tuples.

This will always be a virtual table, or None for anonymous tables.

This property differs from target_name only by its return type.

class postbound.qal.GroupBy(group_columns: Sequence[SqlExpression], distinct: bool = False)#

The GROUP BY clause combines rows that match a grouping criterion to enable aggregation on these groups.

Despite their names, all grouped columns can be arbitrary SqlExpression instances, rules and restrictions of the SQL standard are not enforced by PostBOUND.

Parameters:
  • group_columns (Sequence[SqlExpression]) – The expressions that should be used to perform the grouping

  • distinct (bool, optional) – Whether the grouping should perform duplicate elimination, by default False

Raises:

ValueError – If group_columns is empty.

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

static create_for(column: ColumnReference | Iterable[ColumnReference], *more_cols) GroupBy#

Shortcut method to create a GROUP BY clause from column references.

Parameters:

column (ColumnReference | Iterable[ColumnReference])

Return type:

GroupBy

property distinct: bool#

Get whether the grouping should eliminate duplicates.

Returns:

Whether duplicate removal is performed.

Return type:

bool

property group_columns: Sequence[SqlExpression]#

Get all expressions that should be used to determine the grouping.

Returns:

The grouping expressions in exactly the sequence in which they were specified.

Return type:

Sequence[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

class postbound.qal.Having(condition: AbstractPredicate)#

The HAVING clause enables filtering of the groups constructed by a GROUP BY clause.

All conditions are collected in a (potentially conjunctive or disjunctive) predicate object. See AbstractPredicate for details.

The structure of this clause is similar to the Where clause, but its scope is different (even though PostBOUND does no semantic validation to enforce this): predicates of the HAVING clause are only checked on entire groups of values and have to be valid their, instead of on individual tuples.

Parameters:

condition (AbstractPredicate) – The root predicate that contains all actual conditions

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

property condition: AbstractPredicate#

Get the root predicate that is used to form the HAVING clause.

Returns:

The condition

Return type:

AbstractPredicate

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

class postbound.qal.Hint(preparatory_statements: str = '', query_hints: str = '')#

Hint block of a clause.

Depending on the SQL dialect, these hints will be placed at different points in the query. Furthermore, the precise contents (i.e. syntax and semantic) vary from database system to system.

Hints are differentiated in two parts:

  • preparatory statements can be executed as valid commands on the database system, e.g. optimizer settings, etc.

  • query hints are the actual hints. Typically, these will be inserted as comments at some place in the query.

These two parts are set as parameters in the __init__ method and are available as read-only properties afterwards.

Parameters:
  • preparatory_statements (str, optional) – Statements that configure the optimizer and have to be run before the actual query is executed. Such settings often configure the optimizer for the entire session and can thus influence other queries as well. Defaults to an empty string, which indicates that there are no such settings.

  • query_hints (str, optional) – Hints that configure the optimizer, often for an individual join. These hints are executed as part of the actual query.

Examples

A hint clause for MySQL could look like this:

This enforces the join between tables R and S to be executed as a hash join (due to the query hint) and disables usage of the block nested-loop join for the entire query (which in this case only affects the join between tables S and T) due to the preparatory SET optimizer_switch statement.

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

property preparatory_statements: str#

Get the string of preparatory statements. Can be empty.

Returns:

The preparatory statements. If these are multiple statements, they are concatenated into a single string with appropriate separator characters between them.

Return type:

str

property query_hints: str#

Get the query hint text. Can be empty.

Returns:

The hints. The string has to be understood as-is by the target database system. If multiple hints are used, they have to be concatenated into a single string with appropriate separator characters between them. Correspondingly, if the hint blocks requires a specific prefix/suffix (e.g. comment syntax), this has to be part of the string as well.

Return type:

str

class postbound.qal.ImplicitFromClause(tables: DirectTableSource | Iterable[DirectTableSource])#

Represents a special case of FROM clause that only allows for pure tables to be selected.

Specifically, this means that subqueries or explicit joins using the JOIN ON syntax are not allowed. Just plain old SELECT ... FROM R, S, T WHERE ... queries.

As a special case, all FROM clauses that consist of a single (non-subquery) table can be represented as implicit clauses.

Parameters:

tables (DirectTableSource | Iterable[DirectTableSource]) – The tables that should be selected

static create_for(tables: TableReference | Iterable[TableReference]) ImplicitFromClause#

Shorthand method to create a FROM clause for a set of table references.

This saves the user from creating the DirectTableSource instances before instantiating a implicit FROM clause.

Parameters:

tables (TableReference | Iterable[TableReference]) – The tables that should be sourced

Returns:

The FROM clause

Return type:

ImplicitFromClause

itertables() Sequence[TableReference]#

Provides all tables in the FROM clause exactly in the sequence in which they were specified.

This utility saves the user from unwrapping all the DirectTableSource objects by herself.

Returns:

The tables.

Return type:

Sequence[TableReference]

class postbound.qal.ImplicitSqlQuery(*, select_clause: Select, from_clause: ImplicitFromClause | None = None, where_clause: Where | None = None, groupby_clause: GroupBy | None = None, having_clause: Having | None = None, orderby_clause: OrderBy | None = None, limit_clause: Limit | None = None, cte_clause: CommonTableExpression | None = None, explain_clause: Explain | None = None, hints: Hint | None = None)#

An implicit query restricts the constructs that may appear in the FROM clause.

For implicit queries, the FROM clause may only consist of simple table sources. All join conditions have to be put in the WHERE clause. Notice that this does not restrict the structure of other clauses. For example, the WHERE clause can still contain subqueries. As a special case, queries without a FROM clause are also considered implicit.

The attributes and parameters for this query type are the same as for SqlQuery, only the type of the From clause is restricted.

Examples

The following queries are considered as implicit queries:

SELECT *
FROM R, S, T
WHERE R.a = S.b
    AND S.b = T.c
    AND R.a < 42
SELECT *
FROM R, S, T
WHERE R.a = S.b
    AND S.b = T.c
    AND R.a = (SELECT MIN(R.a) FROM R)
Parameters:
property from_clause: ImplicitFromClause | None#

Get the FROM clause of the query.

Returns:

The FROM clause if it was specified, or None otherwise.

Return type:

Optional[From]

is_explicit() bool#

Checks, whether this query has an explicit FROM clause.

The explicit FROM clause exclusively makes use of the JOIN ON syntax to denote both the tables that should be scanned, and the predicates that should be used to join the tables together. Therefore, the WHERE clause should only consist of filter predicates on the base tables. However, this is not enforced and the contents of the ON conditions as well as the WHERE clause can be arbitrary predicates.

Returns:

Whether the query is explicit

Return type:

bool

See also

ExplicitSqlQuery

is_implicit() bool#

Checks, whether this query has an implicit FROM clause.

The implicit FROM clause only consists of the source tables that should be scanned for the query. No subqueries or joins are contained in the clause. All join predicates must be part of the WHERE clause.

Returns:

Whether the query is implicit

Return type:

bool

See also

ImplicitSqlQuery

class postbound.qal.InPredicate(column: SqlExpression, values: Sequence[SqlExpression])#

An IN predicate lists the allowed values for a column.

In most cases, such a predicate with n allowed values can be transformed into a disjunction of n equality predicates, i.e. R.a IN (1, 2, 3) is equivalent to R.a = 1 OR R.a = 2 OR R.a = 3. Depending on the allowed values, an IN predicate can denote a join, e.g. R.a IN (S.b, S.c). An important special case arises if the allowed values are produced by a subquery, e.g. R.a IN (SELECT S.b FROM S). This does no longer allow for a transformation into binary predicates since it is unclear how many rows the subquery will produce.

Parameters:
  • column (SqlExpression) – The value that is checked by the predicate

  • values (Sequence[SqlExpression]) – The allowed column values. The individual expressions are not limited to StaticValueExpression instances, but can also include subqueries, columns or complicated mathematical expressions.

Raises:

ValueError – If values is empty.

Notes

Some systems even allow the column part to be an arbitrary expression. For example, in Postgres this is a valid query:

SELECT *
FROM R JOIN S ON R.a = S.b
WHERE 42 IN (R.c, 40 * S.d)
accept_visitor(visitor: PredicateVisitor[VisitorResult] | SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current predicate by a predicate visitor.

Parameters:
  • visitor (PredicateVisitor[VisitorResult]) – The visitor

  • args – Additional arguments to pass to the visitor

  • kwargs – Additional keyword arguments to pass to the visitor

Return type:

VisitorResult

property column: SqlExpression#

Get the expression that is tested (R.a in SELECT * FROM R WHERE R.a IN (1, 2, 3)).

Returns:

The expression

Return type:

SqlExpression

columns() set[ColumnReference]#

Provides all columns that are referenced by this predicate.

Returns:

The columns. If the predicate contains a subquery, all columns of that query are included.

Return type:

set[ColumnReference]

is_join() bool#

Checks, whether this predicate encodes a join between two tables.

PostBOUND uses the following criteria to determine, whether a predicate is join or not:

  1. all predicates of the form <col 1> <operator> <col 2> where <col 1> and <col 2> come from different tables are joins. The columns can optionally be modified by value casts or static expressions, e.g. R.a::integer + 7

  2. all functions that access columns from multiple tables are joins, e.g. my_udf(R.a, S.b)

  3. all subqueries are treated as filters, no matter whether they are dependent subqueries or not. This means that both R.a = (SELECT MAX(S.b) FROM S) and R.a = (SELECT MAX(S.b) FROM S WHERE R.c = S.d) are treated as filters and not as joins, even though the second subquery will require some sort of the join in the query plan.

  4. BETWEEN and IN predicates are treated according to rule 1 since they can be emulated via base predicates (subqueries in IN predicates are evaluated according to rule 3.)

Although these rules might seem a bit arbitrary at first, there is actually no clear consensus of what constitutes a join and the query optimizers of different industrial database systems treat different predicates as joins. For example, some systems might not recognize function calls that access columns form two or more tables as joins or do not recognize predicates that use non-equi joins as opertors as actual joins.

If the specific join and filter recognition procedure breaks a specific use-case, subclasses of the predicate classes can be implemented. These subclasses can then apply the required rules. Using the tools in the transformation module, the queries can be updated. For some use-cases it can also be sufficient to change the join/filter recognition rules of the QueryPredicates objects. Consult its documentation for more details.

Lastly, notice that the distinction between join and filter is not entirely binary. There may also be a third class of predicates, potentially called “post-join filters”. These are filters that are applied after a join but cannot be included in the join predicate itself. This is usually the case due to limitations in the operator implementation of the actual database system. For example, invocations of user defined functions (case 2 above) usually fall in this category. Since the query abstraction layer is agnostic to specific details of database systems, we apply the binary categorization outlined above.

Returns:

Whether the predicate is a join of different relations.

Return type:

bool

is_subquery_predicate() bool#

Checks, if this is a subquery-based IN predicate, i.e. a predicate of the form R.a IN (SELECT S.b FROM S).

Returns:

Whether this predicate is based on a subquery

Return type:

bool

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this predicate.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all expressions that are directly contained in this predicate.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

join_partners() set[tuple[ColumnReference, ColumnReference]]#

Provides all pairs of columns that are joined within this predicate.

If multiple columns are joined or it is unclear which columns are involved in a join exactly, this method falls back to returning the cross-product of all potential join partners. For example, consider the following query: SELECT * FROM R, S WHERE my_udf(R.a, R.b, S.c). In this case, it cannot be determined which columns of R take part in the join. Therefore, join_partners will return the set {(R.a, S.c), (R.b, S.c)}.

Returns:

The pairs of joined columns. These pairs are normalized, such that two predicates which join the same columns provide the join partners in the same order.

Return type:

set[tuple[ColumnReference, ColumnReference]]

Raises:

NoJoinPredicateError

static subquery(column: SqlExpression, subquery: SubqueryExpression | SqlQuery) InPredicate#

Generates an IN predicate that is based on a subquery.

Such a predicate is of the form R.a IN (SELECT S.b FROM S).

Parameters:
  • column (SqlExpression) – The column that should be checked for being contained by the subquerie’s result set.

  • subquery (SubqueryExpression | SqlQuery) – The subquery to produce the allowed values.

Returns:

The predicate

Return type:

InPredicate

property values: Sequence[SqlExpression]#

Get the allowed values of the tested expression.

Returns:

The allowed values. This sequence always contains at least one entry.

Return type:

Sequence[SqlExpression]

class postbound.qal.IntersectClause(left_query: SqlQuery | SetQuery, right_query: SqlQuery | SetQuery)#

The INTERSECT clause of a query.

Parameters:
  • left_query (SelectStatement) – The left query that is part of the INTERSECT operation. Since set intersection is commutative, the assignment of left and right does not really matter.

  • right_query (SelectStatement) – The right query that is part of the INTERSECT. Since set intersection is commutative, the assignment of left and right does not really matter.

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

input_queries() set[SqlQuery | SetQuery]#

Get the two input queries that are part of the INTERSECT operation.

Returns:

The left and right queries. Since set intersection is commutative, the assignment of left and right does not really matter.

Return type:

set[SelectStatement]

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

property left_query: SqlQuery | SetQuery#

Get the left query that is part of the INTERSECT operation.

Returns:

The left query. Since set intersection is commutative, the assignment of left and right does not really matter.

Return type:

SelectStatement

See also

input_queries

property right_query: SqlQuery | SetQuery#

Get the right query that is part of the INTERSECT operation.

Returns:

The right query. Since set intersection is commutative, the assignment of left and right does not really matter.

Return type:

SelectStatement

See also

input_queries

tables() set[TableReference]#

Provides all tables that are referenced in the clause.

Returns:

All tables. This includes virtual tables if such tables are present in the clause

Return type:

set[TableReference]

class postbound.qal.JoinTableSource(left: TableSource, right: TableSource, *, join_condition: AbstractPredicate | None = None, join_type: JoinType = JOIN)#

Models a table that is referenced in a FROM clause using the explicit JOIN syntax.

Such a table source consists of three parts:

  1. the left-hand side table source

  2. the right-hand side table source being joined to the left-hand side

  3. an (optional) join condition that specifies how the two tables are joined, if the specific join requires such a predicate

Parameters:
  • left (TableSource) – The left-hand side of the join

  • right (TableSource) – The right-hand side of the join

  • join_condition (Optional[AbstractPredicate], optional) – The predicate that is used to join the specified table with the other tables of the FROM clause. For most joins this is a required argument in order to create a valid SQL query (e.g. LEFT JOIN or INNER JOIN), but there are some joins without a condition (e.g. CROSS JOIN and NATURAL JOIN).

  • join_type (JoinType, optional) – The specific join that should be performed. Defaults to JoinType.InnerJoin.

base_table() TableReference#

Provide the table that is farthest to the left in the join chain.

For subqueries or VALUES clauses, this will return the alias of the expression, i.e. the name of the virtual table that is created for the subquery or VALUES clause.

Returns:

The table

Return type:

TableReference

columns() set[ColumnReference]#

Provides all column sthat are referenced in the source.

For plain table sources this will be empty, but for more complicate structures such as subquery source, this will include all columns that are referenced in the subquery.

Returns:

The columns

Return type:

set[ColumnReference]

itercolumns() Iterable[ColumnReference]#

Provides access to all column in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this source, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

property join_condition: AbstractPredicate | None#

Get the predicate that is used to determine matching tuples from the table.

This can be None if the specific join_type does not require or allow a join condition (e.g. NATURAL JOIN).

Returns:

The condition if it is specified, None otherwise.

Return type:

Optional[AbstractPredicate]

property join_type: JoinType#

Get the kind of join that should be performed.

Returns:

The join type

Return type:

JoinType

property joined_table: JoinTableSource#

Get the nested join statements contained in this join.

Deprecated since version 0.10.0: Use right instead. This is an artifact of the old mosql-based query representation

A nested join is a JOIN statement within a JOIN statement, as in SELECT * FROM R JOIN (S JOIN T ON a = b) ON a = c.

Returns:

The nested joins, can be empty if there are no such joins.

Return type:

JoinTableSource

property left: TableSource#

Get the left-hand side of the join.

Returns:

The join partner. Can be anything from a plain base table, to a subquery, to another join.

Return type:

TableSource

predicates() QueryPredicates | None#

Provides all predicates that are contained in the source.

For plain table sources this will be None, but for subquery sources, etc. all predicates are returned.

Returns:

The predicates or None if the source does not allow predicates or simply does not contain any.

Return type:

QueryPredicates | None

property right: TableSource#

Get the right-hand side of the join.

Returns:

The join partner. Can be anything from a plain base table, to a subquery, to another join.

Return type:

TableSource

property source: TableSource#

Get the actual table being joined. This can be a proper table or a subquery.

Deprecated since version 0.10.0: Use left instead. This is an artifact of the old mosql-based query representation

Returns:

The table

Return type:

TableSource

tables() set[TableReference]#

Provides all tables that are referenced in the source.

For plain table sources this will just be the actual table. For more complicated structures, such as subquery sources, this will include all tables of the subquery as well.

Returns:

The tables.

Return type:

set[TableReference]

class postbound.qal.JoinType(*values)#

Indicates the type of a join using the explicit JOIN syntax, e.g. OUTER JOIN or NATURAL JOIN.

The names of the individual values should be pretty self-explanatory and correspond entirely to the names in the SQL standard.

class postbound.qal.Limit(*, limit: int | None = None, offset: int | None = None, fetch_direction: Literal['first', 'next', 'prior', 'last'] = 'first')#

The FETCH FIRST or LIMIT clause restricts the number of output rows returned by the database system.

Each clause can specify an offset (which is probably only meaningful if there is also an ORDER BY clause) and the actual limit. Notice that although many database systems use a non-standard syntax for this clause, our implementation is modelled after the actual SQL standard version (i.e. it produces a FETCH … string output).

Parameters:
  • limit (Optional[int], optional) – The maximum number of tuples to put in the result set. Defaults to None which indicates that all tuples should be returned.

  • offset (Optional[int], optional) – The number of tuples that should be skipped from the beginning of the result set. If no OrderBy clause is defined, this makes the result set’s contents non-deterministic (at least in theory). Defaults to None which indicates that no tuples should be skipped.

  • fetch_direction (FetchDirection)

Raises:

ValueError – If neither a limit, nor an offset are specified

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

property fetch_direction: Literal['first', 'next', 'prior', 'last']#

Get the direction of the limit (e.g. first or prior).

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

property limit: int | None#

Get the maximum number of rows in the result set.

Returns:

The limit or None, if all rows should be returned.

Return type:

Optional[int]

property offset: int | None#

Get the offset within the result set (i.e. number of first rows to skip).

Returns:

The offset or None if no rows should be skipped.

Return type:

Optional[int]

class postbound.qal.LogicalOperator(*values)#

The supported unary and binary operators.

class postbound.qal.MathExpression(operator: MathOperator, first_argument: SqlExpression, second_argument: SqlExpression | Sequence[SqlExpression] | None = None)#

A mathematical expression computes a result value based on some formula.

The formula is based on an arbitrary expression, an operator and potentially a number of additional expressions/arguments.

If it is necessary to represent boolean expressions outside of the WHERE and HAVING clauses, a AbstractPredicate should be used instead of a mathematical expression.

Parameters:
  • operator (MathOperator) – The operator that is used to combine the arguments.

  • first_argument (SqlExpression) – The first argument. For unary expressions, this can also be the only argument

  • second_argument (SqlExpression | Sequence[SqlExpression] | None, optional) –

    Additional arguments. For the most common case of a binary expression, this will be exactly one argument. Defaults to None to accomodate for unary expressions.

    Deprecated since version 0.21.1: Passing a sequence of expressions is deprecated and will be removed in v0.22.0. To encode operations involving more than two operands, a hierarchy of binary expressions should be used instead.

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

static create_binary(lhs: Any, operator: MathOperator | str, rhs: Any) MathExpression#

Shortcut method to create a new math expression.

Lhs and rhs are processed depending on their actual types:

  • SqlExpression instances are left as-is

  • ColumnReference instances are mapped to `ColumnExpression`s

  • Anything else is wrapped in a StaticValueExpression

Parameters:
Return type:

MathExpression

property first_arg: SqlExpression#

Get the first argument to the operator. This is always specified.

Returns:

The argument

Return type:

SqlExpression

is_unary() bool#

Checks, whether the expression is a unary one (e.g. a negation as in -42).

Return type:

bool

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

property operator: MathOperator#

Get the operation to combine the input value(s).

Returns:

The operator

Return type:

MathOperator

property second_arg: SqlExpression | Sequence[SqlExpression] | None#

Get the second argument to the operator.

Depending on the operator, this can be a single expression (the most common case), but also a sequence of expressions (e.g. sum of multiple values) or no value at all (e.g. negation).

Returns:

The argument(s)

Return type:

SqlExpression | Sequence[SqlExpression] | None

tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

class postbound.qal.MathOperator(*values)#

The supported mathematical operators.

class postbound.qal.MixedSqlQuery(*, select_clause: Select, from_clause: From | None = None, where_clause: Where | None = None, groupby_clause: GroupBy | None = None, having_clause: Having | None = None, orderby_clause: OrderBy | None = None, limit_clause: Limit | None = None, cte_clause: CommonTableExpression | None = None, explain_clause: Explain | None = None, hints: Hint | None = None)#

A mixed query allows for both the explicit as well as the implicit syntax to be used within the same FROM clause.

The mixed query complements ImplicitSqlQuery and ExplicitSqlQuery by removing the “purity” restriction: the tables that appear in the FROM clause can be described using either plain references or subqueries and they are free to use the JOIN ON syntax. The only thing that is not allowed as a FROM clause is an instance of ImplicitFromClause or an instance of ExplicitFromClause, since those cases are already covered by their respective query classes.

Notice however, that we currently do not enforce the From clause to not be a valid explicit or implicit clause. All checks happen on a type level. If the contents of a general From clause just happen to also be a valid ImplicitFromClause, this is fine.

The attributes and parameters for this query type are the same as for SqlQuery, only the type of the From clause is restricted.

Raises:

ValueError – If the given from_clause is either an implicit FROM clause or an explicit one.

Parameters:
is_explicit() bool#

Checks, whether this query has an explicit FROM clause.

The explicit FROM clause exclusively makes use of the JOIN ON syntax to denote both the tables that should be scanned, and the predicates that should be used to join the tables together. Therefore, the WHERE clause should only consist of filter predicates on the base tables. However, this is not enforced and the contents of the ON conditions as well as the WHERE clause can be arbitrary predicates.

Returns:

Whether the query is explicit

Return type:

bool

See also

ExplicitSqlQuery

is_implicit() bool#

Checks, whether this query has an implicit FROM clause.

The implicit FROM clause only consists of the source tables that should be scanned for the query. No subqueries or joins are contained in the clause. All join predicates must be part of the WHERE clause.

Returns:

Whether the query is implicit

Return type:

bool

See also

ImplicitSqlQuery

exception postbound.qal.NoFilterPredicateError(predicate: AbstractPredicate | None = None)#

Error to indicate that a join predicate was supplied at a place where a filter predicate was expected.

Parameters:

predicate (AbstractPredicate | None, optional) – The predicate that caused the error, defaults to None.

Return type:

None

exception postbound.qal.NoJoinPredicateError(predicate: AbstractPredicate | None = None)#

Error to indicate that a filter predicate was supplied at a place where a join predicate was expected.

Parameters:

predicate (AbstractPredicate | None, optional) – The predicate that caused the error, defaults to None

Return type:

None

class postbound.qal.OrderBy(expressions: Iterable[OrderByExpression] | OrderByExpression)#

The ORDER BY clause specifies how result rows should be sorted.

This clause has a similar structure like a Select clause and simply consists of an arbitrary number of OrderByExpression objects.

Parameters:

expressions (Iterable[OrderByExpression] | OrderByExpression) – The terms that should be used to determine the ordering. At least one expression is required

Raises:

ValueError – If no expressions are provided

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

static create_for(*columns: ColumnReference | Iterable[ColumnReference], ascending: bool | None = None, nulls_first: bool | None = None) OrderBy#

Shorthand method to create an ORDER BY clause for a set of column references.

Additional ordering parameters are assigned to all columns.

Parameters:
Return type:

OrderBy

property expressions: Sequence[OrderByExpression]#

Get the expressions that form this ORDER BY clause.

Returns:

The individual terms that make up the ordering in exactly the sequence in which they were specified (which is the only valid sequence since all other orders could change the ordering of the result set).

Return type:

Sequence[OrderByExpression]

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

class postbound.qal.OrderByExpression(column: SqlExpression, ascending: bool | None = None, nulls_first: bool | None = None)#

The OrderByExpression is the fundamental building block for an ORDER BY clause.

Each expression consists of the actual column (which might be an arbitrary SqlExpression, rules and restrictions by the SQL standard are not enforced here) as well as information regarding the ordering of the column. Setting this information to None falls back to the default interpretation by the target database system.

Parameters:
  • column (SqlExpression) – The column that should be used for ordering

  • ascending (Optional[bool], optional) – Whether the column values should be sorted in ascending order. Defaults to None, which indicates that the system-default ordering should be used.

  • nulls_first (Optional[bool], optional) – Whether NULL values should be placed at beginning or at the end of the sorted list. Defaults to None, which indicates that the system-default behaviour should be used.

property ascending: bool | None#

Get the desired ordering of the output rows.

Returns:

Whether to sort in ascending order. None indicates that the default behaviour of the system should be used.

Return type:

Optional[bool]

property column: SqlExpression#

Get the expression used to specify the current grouping.

In the simplest case this can just be a ColumnExpression which sorts directly by the column values. More complicated constructs like mathematical expressions over the column values are also possible.

Returns:

The expression

Return type:

SqlExpression

static create_for(column: ColumnReference, *, ascending: bool | None = None, nulls_first: bool | None = None) OrderByExpression#

Shorthand method to create an OrderByExpression for a specific column reference.

Parameters:
  • column (ColumnReference)

  • ascending (bool | None)

  • nulls_first (bool | None)

Return type:

OrderByExpression

property nulls_first: bool | None#

Get where to place NULL values in the result set.

Returns:

Whether to put NULL values at the beginning of the result set (or at the end). None indicates that the default behaviour of the system should be used.

Return type:

Optional[bool]

class postbound.qal.PredicateVisitor#

Basic visitor to operator on arbitrary predicate trees.

As a modification to a strict vanilla interpretation of the design pattern, we provide dedicated matching methods for the different composite operators (i.e. for AND, OR and NOT predicates), rather than just matching on CompoundPredicate.

References

class postbound.qal.QuantifierExpression(expression: SqlExpression, *, quantifier: QuantifierOperator)#

An ANY/ALL expression.

For a predicate such as R.a > ALL (SELECT b FROM S) this expression is used to represent the right-hand side of the predicate. It typically appears as a child expression of a BinaryPredicate.

Parameters:
  • expression (SqlExpression) – The expression being quantified. Typically this is a SubqueryExpression, but some database systems also allow comparing against arrays or other collection types.

  • quantifier (QuantifierOperator) – The quantifier operator (_ANY_ or _ALL_).

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

static all(expression: SqlExpression | SqlQuery) QuantifierExpression#

Create an ALL expression.

Parameters:

expression (SqlExpression | SqlQuery)

Return type:

QuantifierExpression

static any(expression: SqlExpression | SqlQuery) QuantifierExpression#

Create an ANY expression.

Parameters:

expression (SqlExpression | SqlQuery)

Return type:

QuantifierExpression

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

property expression: SqlExpression#

Get the expression that is being compared.

Typically, this will be a SubqueryExpression that produces a relation with a single column. Some database systems also allow comparing against arrays or other collection types.

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

property quantifier: QuantifierOperator#

Get the actual quantifier.

tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

class postbound.qal.QuantifierOperator(*values)#

The supported quantifier operators.

class postbound.qal.QueryPredicates(root: AbstractPredicate | None)#

The query predicates provide high-level access to all the different predicates in a query.

Generally speaking, this class provides the most user-friendly access into the predicate hierarchy and should be sufficient for most use-cases. The provided methods revolve around identifying filter and join predicates easily, as well finding the predicates that are specified on specific tables.

Parameters:

root (Optional[AbstractPredicate]) – The root predicate of the predicate hierarchy that should be represented by the QueryPredicates. Typically, this is a conjunction of the actual predicates.

all_simple() bool#

Checks, whether all predicates in the hierarchy can be represented as simplified views.

See also

SimpleFilter

The simplified representation of predicates

SimpleJoin

The simplified representation of join predicates

Return type:

bool

and_(other_predicate: QueryPredicates | AbstractPredicate) QueryPredicates#

Combines the current predicates with additional predicates, creating a conjunction of the two predicates.

The input predicates, as well as the current predicates object are not modified. All changes are applied to the resulting predicates object.

Parameters:

other_predicate (QueryPredicates | AbstractPredicate) – The predicates to combine. Can also be an AbstractPredicate, in which case this predicate is used as the root for the other predicates instance.

Returns:

The merged predicates wrapper. Its root is roughly equivalent to self.root AND other_predicate.root.

Return type:

QueryPredicates

static empty_predicate() QueryPredicates#

Constructs a new predicates instance without any actual content.

Returns:

The predicates wrapper

Return type:

QueryPredicates

filters() Collection[AbstractPredicate]#

Provides all filter predicates that are contained in the predicate hierarchy.

By default, the distinction between filters and joins that is defined in AbstractPredicate.is_join is used. However, this behaviour can be changed by subclasses.

This method handles compound predicates as follows:

  • conjunctions are un-nested, i.e. all predicates that form an AND predicate are collected individually

  • OR predicates are included with exactly those predicates from their children that are filters. If this is only true for a single predicate, that predicate will be returned directly.

  • NOT predicates are included if their child predicate is a filter.

Returns:

The filter predicates.

Return type:

Collection[AbstractPredicate]

filters_for(table: TableReference) AbstractPredicate | None#

Provides all filter predicates that reference a specific table.

If multiple individual filter predicates are specified in the query, they will be combined in one large conjunction.

The determination of matching filter predicates is the same as for the filters() method.

Parameters:

table (TableReference) – The table to retrieve the filters for

Returns:

A (conjunction of) the filter predicates of the table, or None if the table is unfiltered.

Return type:

Optional[AbstractPredicate]

is_empty() bool#

Checks, whether this predicate wrapper contains any actual predicates.

Returns:

Whether at least one predicate was specified.

Return type:

bool

join_graph(*, merge_aliases: bool = False) Graph#

Provides the join graph for the predicates.

A join graph is an undirected graph, where each node corresponds to a base table and each edge corresponds to a join predicate between two base tables. In addition, each node is annotated by a predicate key which is a conjunction of all filter predicates on that table (or None if the table is unfiltered). Likewise, each edge is annotated by a predicate key that corresponds to the join predicate (which can never be None).

Parameters:

merge_aliases (bool, optional) – Whether to merge aliases of the same base table into one node. Setting this to True transforms all predicates to dictionaries that map the aliased tables to their filter predicates and frozensets of tables to their join predicates. Otherwise, the predicates are annotated directly.

Returns:

The join graph.

Return type:

nx.Graph

Notes

Since our implementation of a join graph is not based on a multigraph, only binary joins can be represented. The determination of edges is based on AbstractPredicate.join_partners.

joins() Collection[AbstractPredicate]#

Provides all join predicates that are contained in the predicate hierarchy.

By default, the distinction between filters and joins that is defined in AbstractPredicate.is_join is used. However, this behaviour can be changed by subclasses.

This method handles compound predicates as follows:

  • conjunctions are un-nested, i.e. all predicates that form an AND predicate are collected individually

  • OR predicates are included with exactly those predicates from their children that are joins. If this is only true for a single predicate, that predicate will be returned directly.

  • NOT predicates are included if their child predicate is a join.

Returns:

The join predicates

Return type:

Collection[AbstractPredicate]

joins_between(first_table: TableReference | Iterable[TableReference], second_table: TableReference | Iterable[TableReference], *, _computation: Literal['legacy', 'graph', 'map'] = 'map') AbstractPredicate | None#

Provides the (conjunctive) join predicate that joins specific tables.

The precise behaviour of this method depends on the provided parameters: If first_table or second_table contain multiple tables, all join predicates between tables from the different sets are returned (but joins from tables within first_table or from tables within second_table are not).

Notice that the returned predicate might also include other tables, if they are part of a join predicate that also joins the given two tables.

The performance of this method can be crucial for some applications, since the join check is often part of a very hot loop. Therefore, a number of different calculation strategies are implemented. If profiling shows that the application is slowed down heavily by the current strategy, it might be a good idea to switch to another algorithm. This can be achieved using the _computation parameter.

Parameters:
  • first_table (TableReference | Iterable[TableReference]) – The (set of) tables to join

  • second_table (TableReference | Iterable[TableReference]) – The (set of) join partners for first_table.

  • _computation (Literal[&quot;legacy&quot;, &quot;graph&quot;, &quot;map&quot;], optional) –

    The specific algorithm to use for determining the join partners. The algorithms have very different performance and memory charactersistics. The default map setting is usually the fastest. It should only really be changed if there are very good reasons for it. The following settings exist:

    • legacy: uses a recursive strategy in case first_table or second_table contain multiple references. This is pretty slow, but easy to debug

    • graph: builds an internal join graph and merges the involved nodes from first_table and second_table to extract the overall join predicate directly.

    • map: stores a mapping between tables and their join predicates and merges these mappings to determine the overall join predicate

Returns:

A conjunction of all the individual join predicates between the two sets of candidate tables. If there is no join predicate between any of the tables, None is returned.

Return type:

Optional[AbstractPredicate]

Raises:

ValueError – If the _computation strategy is none of the allowed values

joins_for(table: TableReference) Collection[AbstractPredicate]#

Provides all join predicates that reference a specific table.

Each entry in the resulting collection is a join predicate between the given table and a (set of) partner tables, such that the partner tables in different entries in the collection are also different. If multiple join predicates are specified between the given table and a specific (set of) partner tables, these predicates are aggregated into one large conjunction.

The determination of matching join predicates is the same as for the joins() method.

Parameters:

table (TableReference) – The table to retrieve the joins for

Returns:

The join predicates with table. If there are no such predicates, the collection is empty.

Return type:

Collection[AbstractPredicate]

joins_tables(tables: TableReference | Iterable[TableReference], *more_tables: TableReference) bool#

Checks, whether specific tables are all joined with each other.

This does not mean that there has to be a join predicate between each pair of tables, but rather that all pairs of tables must at least be connected through a sequence of other tables. From a graph-theory centric point of view this means that the join (sub-) graph induced by the given tables is connected.

Parameters:
  • tables (TableReference | Iterable[TableReference]) – The tables to check.

  • *more_tables – Additional tables that also should be included in the check. This parameter is mainly for convenience usage in interactive scenarios.

Returns:

Whether the given tables can be joined without any cross product

Return type:

bool

Raises:

ValueError – If tables and more_tables is both empty

Examples

The following calls are exactly equivalent:

>>> predicates.joins_tables([table1, table2, table3])
>>> predicates.joins_tables(table1, table2, table3)
>>> predicates.joins_table([table1, table2], table3)
property root: AbstractPredicate#

Get the root predicate that represents the entire predicate hierarchy.

Typically, this is a conjunction of the actual predicates. This conjunction can be used to start a custom traversal of the predicate hierarchy.

Returns:

The root predicate

Return type:

AbstractPredicate

Raises:

StateError – If the predicates warpper is empty and there is no root predicate.

simplify() Sequence[SimpleFilter | SimpleJoin]#

Converts all predicates in the hierarchy into their simplified counterparts.

See also

SimpleFilter

The simplified representation of predicates

SimpleJoin

The simplified representation of join predicates

Return type:

Sequence[SimpleFilter | SimpleJoin]

class postbound.qal.Select(targets: BaseProjection | Sequence[BaseProjection], *, distinct: Iterable[SqlExpression] | bool = False)#

The SELECT clause of a query.

This is the only required part of a query. Everything else is optional and can be left out. (Notice that PostBOUND is focused on SPJ-queries, hence there are no INSERT, UPDATE, or DELETE queries)

A SELECT clause simply consists of a number of individual projections (see BaseProjection), its targets.

Parameters:
  • targets (BaseProjection | Sequence[BaseProjection]) – The individual projection(s) that form the SELECT clause

  • distinct (Iterable[SqlExpression] | bool, optional) – Whether a duplicate elimination should be performed. By default, this is False indicating no duplicate elimination. If True, rows are eliminated based on all columns. Alternatively, a DISTINCT ON clause can be created by specifying the columns that should be used for duplicate elimination.

Raises:

ValueError – If the targets are empty.

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

static count_star(*, distinct: Iterable[SqlExpression] | bool = False) Select#

Shortcut method to create a SELECT COUNT(*) clause.

Parameters:

distinct (Iterable[SqlExpression] | bool) – Whether a duplicate elimination should be performed. By default, this is False indicating no duplicate elimination. If True, rows are eliminated based on all columns. Alternatively, a DISTINCT ON clause can be created by specifying the columns that should be used for duplicate elimination.

Returns:

The clause

Return type:

Select

static create_for(columns: ColumnReference | SqlExpression | Iterable[ColumnReference | SqlExpression], *, distinct: Iterable[SqlExpression] | bool = False) Select#

Full factory method to accompany star and count_star factory methods.

This is basically the same as calling the __init__ method directly.

Parameters:
  • columns (ColumnReference | Iterable[ColumnReference]) – The columns that should form the projection

  • distinct (Iterable[SqlExpression] | bool, optional) – Whether a duplicate elimination should be performed. By default, this is False indicating no duplicate elimination. If True, rows are eliminated based on all columns. Alternatively, a DISTINCT ON clause can be created by specifying the columns that should be used for duplicate elimination.

Returns:

The clause

Return type:

Select

distinct_specifier() Iterable[SqlExpression] | bool#

Provides a precise description of the distinct qualifier.

Returns:

Output should be interpreted as follows:

  • True: plain DISTINCT over all columns

  • False: no DISTINCT qualifier

  • Iterable[SqlExpression]: DISTINCT ON clause over the given expressions

Return type:

Iterable[SqlExpression] | bool

is_distinct() bool#

Checks, whether this is a SELECT DISTINCT clause (including a DISTINCT ON).

Return type:

bool

is_star() bool#

Checks, whether the clause is simply SELECT *.

Return type:

bool

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

local_columns() set[str]#

Get all column names that are defined by the SELECT clause and not provided by the underlying tables.

For example, consider a query SELECT R.a, R.b AS foo FROM R. This query defines the local column foo, while R.a is provided by the underlying table R.

Return type:

set[str]

output_names() dict[str, ColumnReference]#

Output names map the alias of each column to the actual column.

For example, consider a query SELECT R.a AS foo, R.b AS bar FROM R. Calling output_names on this query provides the dictionary {'foo': R.a, 'bar': R.b}.

Currently, this method only works for 1:1 mappings and other aliases are ignored. For example, consider a query SELECT my_udf(R.a, R.b) AS c FROM R. Here, a user-defined function is used to combine the values of R.a and R.b to form an output column c. Such a projection is ignored by output_names.

Returns:

A mapping from the column target name to the original column.

Return type:

dict[str, ColumnReference]

static star(*, distinct: Iterable[SqlExpression] | bool = False) Select#

Shortcut to create a SELECT * clause.

Parameters:

distinct (Iterable[SqlExpression] | bool) – Whether a duplicate elimination should be performed. By default, this is False indicating no duplicate elimination. If True, rows are eliminated based on all columns. Alternatively, a DISTINCT ON clause can be created by specifying the columns that should be used for duplicate elimination.

Returns:

The clause

Return type:

Select

star_expressions() Iterable[BaseProjection]#

Provides all * and R.* expressions.

Returns:

The star expressions. Can be empty if no star expressions are used.

Return type:

Iterable[BaseProjection]

tables() set[TableReference]#

Provides all tables that are referenced in the clause.

Returns:

All tables. This includes virtual tables if such tables are present in the clause

Return type:

set[TableReference]

property targets: Sequence[BaseProjection]#

Get all projections.

Returns:

The projections in the order in which they were originally specified

Return type:

Sequence[BaseProjection]

class postbound.qal.SetOperator(*values)#

The supported set operators.

class postbound.qal.SetQuery(left_query: SqlQuery | SetQuery, right_query: SqlQuery | SetQuery, *, set_operation: SetOperator, cte_clause: CommonTableExpression | None = None, orderby_clause: OrderBy | None = None, limit_clause: Limit | None = None, hints: Hint | None = None, explain_clause: Explain | None = None)#

A set query combines the result sets of two queries using one of the set operations.

Set operations include UNION, UNION ALL, INTERSECT, and EXCEPT. We represent set queries as a different type than “plain” SELECT queries because these allow for a different interface (e.g. providing access to predicates or the SELECT block). See the documentation of SqlQuery for more details on the distinction and the reasoning behind it.

Still, the SetQuery provides exactly the same high-level interface. In case a specific method or property is not applicable for set queries (e.g. calling query.predicates()), a QueryTypeError will be raised. This is motivated by entirely pragmatic reasons: oftentimes a client will not care whether it receive a SqlQuery or a SetQuery because it is only interested in the common denominator between the two (e.g. calling str or retrieving its tables). Therefore, we want to be set queries applicable in the same places. At the same time, set queries are a much more recent addition to PostBOUND, and we do not want to force the client to update its code base if this is not really necessary.

Notice that set queries provide some clauses are supported by both plain SQL queries as well as set queries.

Parameters:
  • left_query (SelectStatement) – The left-hand side of the set operation

  • right_query (SelectStatement) – The right-hand side of the set operation

  • set_operation (SetOperator) – The actual operation to combine the two result sets.

  • cte_clause (Optional[CommonTableExpression], optional) – The WITH part of the query, by default None

  • orderby_clause (Optional[OrderBy], optional) – The ORDER BY part of the query, by default None

  • limit_clause (Optional[Limit], optional) – The LIMIT and OFFSET part of the query. In standard SQL, this is designated using the FETCH FIRST syntax. Defaults to None.

  • hints (Optional[Hint], optional) – The hint block of the query. Hints are not part of standard SQL and follow a completely system-specific syntax. Even their placement in within the query varies from system to system and from extension to extension. Defaults to None.

  • explain_clause (Optional[Explain], optional) – The EXPLAIN part of the query. Like hints, this is not part of standard SQL. However, most systems provide EXPLAIN functionality. The specific features and syntax are quite similar, but still system specific. Defaults to None.

See also

SqlQuery

accept_visitor(clause_visitor: ClauseVisitor, *args, **kwargs) dict[BaseClause, VisitorResult]#

Applies a visitor over all clauses in the current query.

Notice that since the visitor is applied to all clauses, it returns the results for each of them.

Parameters:

clause_visitor (ClauseVisitor) – The visitor algorithm to use.

Return type:

dict[BaseClause, VisitorResult]

ast() str#

Provides a human-readable representation of the abstract syntax tree for this query.

The AST is a textual representation of the query that shows the structure of the query in a tree-like manner.

Returns:

The abstract syntax tree of this query

Return type:

str

bound_tables() set[TableReference]#

Provides all tables that can be assigned to a physical or virtual table reference in this query.

Bound tables are those tables, that are selected in the FROM clause of the query, or a subquery. Conversely, unbound tables are those that have to be “injected” by an outer query, as is the case for dependent subqueries.

For example, the query SELECT * FROM R, S WHERE R.a = S.b has two bound tables: R and S. On the other hand, the query SELECT * FROM R WHERE R.a = S.b has only bound R, whereas S has to be bound in a surrounding query.

Returns:

All tables that are bound (i.e. listed in any FROM clause or a CTE) of the query.

Return type:

set[TableReference]

clauses(*, skip: ~typing.Type | ~collections.abc.Iterable[~typing.Type] | None = <class 'NoneType'>) Sequence[BaseClause]#

Provides all the clauses that are defined (i.e. not None) in this query.

Parameters:

skip (Optional[Type | Iterable[Type]], optional) – The clause types that should be skipped in the output. This can be a single type or an iterable of types.

Returns:

The clauses. The current order of the clauses is as follows: hints, explain, cte, set operation, orderby, limit. Notice however, that this order is not strictly standardized and may change in the future. All clauses that are not specified on the query will be skipped.

Return type:

Sequence[BaseClause]

columns() set[ColumnReference]#

Provides all columns that are referenced at any point in the query.

This includes columns from all clauses and does not account for renamed columns from subqueries. For example, consider the query SELECT R.a, my_sq.b FROM R JOIN (SELECT b FROM S) my_sq ON R.a < my_sq.b. columns would return the following set: {R.a, S.b, my_sq.b}, even though my_sq.b can be considered as just an alias for S.b.

Returns:

All columns that are referenced in the query.

Return type:

set[ColumnReference]

contains_cross_product() bool#

Checks, whether this query has at least one cross product.

Returns:

Whether this query has cross products.

Return type:

bool

property cte_clause: CommonTableExpression | None#

Get the WITH clause of the query.

Returns:

The WITH clause if it was specified, or None otherwise.

Return type:

Optional[CommonTableExpression]

property explain: Explain | None#

Get the EXPLAIN block of the query.

Returns:

The EXPLAIN settings if specified, or None otherwise.

Return type:

Optional[Explain]

property from_clause: From | None#

Placeholder method to ensure compatibility with the SqlQuery interface. Raises a QueryTypeError.

property groupby_clause: GroupBy | None#

Placeholder method to ensure compatibility with the SqlQuery interface. Raises a QueryTypeError.

property having_clause: Having | None#

Placeholder method to ensure compatibility with the SqlQuery interface. Raises a QueryTypeError.

property hints: Hint | None#

Get the hint block of the query.

The hints can specify preparatory statements that have to be executed before the actual query is run in addition to the hints themselves.

Returns:

The hint block if it was specified, or None otherwise.

Return type:

Optional[Hint]

is_dependent() bool#

Checks, whether all columns that are referenced in this query are provided by the tables from this query.

In order for this check to work, all columns have to be bound to actual tables, i.e. the tables attribute of all column references have to be set to a valid object.

Returns:

Whether all columns belong to tables that are bound by this query

Return type:

bool

is_explain() bool#

Checks, whether this query is an EXPLAIN query rather than a normal SQL query.

An EXPLAIN query is not executed like a normal SELECT query. Instead of actually calculating a result set, the database system only provides a query plan. This plan is the execution plan that would be used, had the query been entered as a normal SQL query.

Returns:

Whether this query should be explained, rather than executed.

Return type:

bool

is_explicit() bool#

Placeholder method to ensure compatibility with the SqlQuery interface. Raises a QueryTypeError.

Return type:

bool

is_implicit() bool#

Placeholder method to ensure compatibility with the SqlQuery interface. Raises a QueryTypeError.

Return type:

bool

is_ordered() bool#

Checks, whether this query produces its result tuples in order.

Returns:

Whether a valid ORDER BY clause was specified on the query.

Return type:

bool

is_scalar() bool#

Checks, whether the query is guaranteed to provide a single scalar value as a result.

Scalar results can only be calculated by queries with a single projection in the SELECT clause and if that projection is an aggregate function, e.g. SELECT min(R.a) FROM R. However, there are other queries which could also be scalar “by chance”, e.g. SELECT R.b FROM R WHERE R.a = 1 if R.a is the primary key of R. Notice that such cases are not recognized by this method.

Returns:

Whether the query will always return a single scalar value

Return type:

bool

is_set_query() bool#

Checks, whether this query is a set query.

A set query is a query that combines the results of two or more queries into a single result set. This can be done by combining the tuples from both sets using a UNION clause (which removes duplicates), or a UNION ALL clause (which retains duplicates). Alternatively, only tuples that are present in both sets can be retained using an INTERSECT clause. Finally, all tuples from the first result set that are not part of the second result set can be computed using an EXCEPT clause.

Notice that only one of the set operators can be used at a time, but the input query of one set operation can itself use another set operation.

Returns:

Whether this query is a set query

Return type:

bool

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this query.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this query, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this query.

Returns:

The columns

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all expressions that are directly contained in this query.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

property left_query: SqlQuery | SetQuery#

Get the left-hand side of the set operation.

Returns:

The left-hand side of the set operation

Return type:

SelectStatement

property limit_clause: Limit | None#

Get the combined LIMIT and OFFSET clauses of the query.

According to the SQL standard, these clauses should use the FETCH FIRST syntax. However, many systems use OFFSET and LIMIT instead.

Returns:

The FETCH FIRST clause if it was specified, or None otherwise.

Return type:

Optional[Limit]

property orderby_clause: OrderBy | None#

Get the ORDER BY clause of the query.

Returns:

The ORDER BY clause if it was specified, or None otherwise.

Return type:

Optional[OrderBy]

output_columns() Sequence[ColumnReference]#

Provides the columns that form the result relation of this query.

Columns are ordered according to their appearance in the SELECT clause and will not have a bound table associated with them. This is because the query result is “anonymous” and does not have a relation name associated with it. The columns are named according to the following rules:

  • If the expression has an alias, this name is used

  • If the expression is a simple column reference, the column name is used

  • Otherwise, a generic name is used, e.g. “column_1”, “column_2”, etc.

Additionally, to resolve naming conflicts between the left-hand side and the right-hand side of the set operation, names from the left-hand side overwrite names from the right-hand side. Names from the right-hand side are only used if the left-hand side does not provide a name. If both sides do not specify a name, a new generic name is used.

Returns:

The columns. Their order matches the order in which they appear in the SELECT clause of the query.

Return type:

Sequence[ColumnReference]

predicates() QueryPredicates#

Placeholder method to ensure compatibility with the SqlQuery interface. Raises a QueryTypeError.

Return type:

QueryPredicates

property right_query: SqlQuery | SetQuery#

Get the right-hand side of the set operation.

Returns:

The right-hand side of the set operation

Return type:

SelectStatement

property select_clause: Select#

Placeholder method to ensure compatibility with the SqlQuery interface. Raises a QueryTypeError.

property set_clause: UnionClause | ExceptClause | IntersectClause#

Get the set clause of the query.

property set_operation: SetOperator#

Get the set operation that is used to combine the two queries.

Returns:

The set operation

Return type:

SetOperator

stringify(*, trailing_delimiter: bool = True) str#

Provides a string representation of this query.

The only difference to calling str directly, is that the stringify method provides control over whether a trailing delimiter should be appended to the query.

Parameters:

trailing_delimiter (bool, optional) – Whether a delimiter should be appended to the query. Defaults to True.

Returns:

A string representation of this query

Return type:

str

subqueries() Collection[SqlQuery]#

Provides all subqueries that are referenced in this query.

Notice that CTEs are ignored by this method, since they can be accessed directly via the cte_clause property.

Returns:

All subqueries that appear in any of the “inner” clauses of the query

Return type:

Collection[SqlQuery]

tables() set[TableReference]#

Provides all tables that are referenced at any point in the query.

This includes tables from all clauses. Virtual tables will be included and tables that are only scanned within subqueries are included as well. Notice however, that some database systems might not support subqueries to be put at arbitrary positions in the query (e.g. GROUP BY clause).

Returns:

All tables that are referenced in the query.

Return type:

set[TableReference]

unbound_tables() set[TableReference]#

Provides all tables that are referenced in this query but not bound.

While tables() provides all tables that are referenced in this query in any way, bound_tables restricts these tables. This method provides the complementary set to bound_tables i.e. tables = bound_tables unbound_tables.

Returns:

The unbound tables that have to be supplied as part of an outer query

Return type:

set[TableReference]

property where_clause: Where | None#

Placeholder method to ensure compatibility with the SqlQuery interface. Raises a QueryTypeError.

class postbound.qal.SimpleFilter(predicate: AbstractPredicate)#

The intent behind this view is to provide more streamlined and direct access to filter predicates.

A simple filter is a read-only predicate, i.e. it cannot be created on its own and has to be derived from a base predicate (either a binary predicate, a BETWEEN predicate or an IN predicate). Afterward, it provides read-only access to the predicate being filtered, the filter operation, as well as the values used to restrict the allowed column instances.

Note that not all base predicates can be represented as a simplified view. In order for the view to work, both the column as well as the filter values cannot be modified by other expressions such as function calls or mathematical expressions. However, cast expressions are tolerated and will simply be dropped. As a rule of thumb, if an expression modifies a value (such as a function call), this cannot be unwrapped. Therefore, a filter approximately has to be of the form <column reference> <operator> <static values> in order for the representation to work.

The static methods wrap, can_wrap and wrap_all can serve as high-level access points into the view. The components of the view are accessible via properties.

Parameters:

predicate (AbstractPredicate) – The predicate that should be simplified

Raises:

ValueError – If the predicate cannot be represented by a simplified view.

See also

SimpleJoin

A similar view dedicated to join predicates

Examples

The best way to construct simplified views is to start with the QueryPredicates and extract the filter predicates, e.g., by using views = SimpleFilter.wrap_all(query.predicates()) or filters = SimpleFilter.wrap_all(query.predicates().joins()). Notice that especially the first conversion can be “lossy”: all join predicates are dropped. Likewise, all filters that are more complex such as disjunctions are ignored. Alternatively, the QueryPredicates also provides a simplify() method that can be used to convert all predicates (filters and joins) into their simplified counterparts.

The following predicates can be represented as a simplified view: R.a = 42, R.b BETWEEN 1 AND 2 or R.c IN (11, 22, 33). On the other hand, the following predicates cannot be represented b/c they involve advanced operations: R.a + 10 = 42 (contains a mathematical expression) and some_udf(R.a) < 11 % 2 (contains a function call and a mathematical expression).

Notes

Simple filters can be used in match statements and provide (column, operation, value) as arguments.

accept_visitor(visitor: PredicateVisitor[VisitorResult] | SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current predicate by a predicate visitor.

Parameters:
  • visitor (PredicateVisitor[VisitorResult]) – The visitor

  • args – Additional arguments to pass to the visitor

  • kwargs – Additional keyword arguments to pass to the visitor

Return type:

VisitorResult

base_predicates() Iterable[AbstractPredicate]#

Provides all base predicates that form this predicate.

This allows to iterate over all leaves of a compound predicate, for base predicates it simply returns the predicate itself.

Returns:

The base predicates, in an arbitrary order. If the predicate is a base predicate already, it will be the only item in the iterable.

Return type:

Iterable[AbstractPredicate]

static can_wrap(predicate: AbstractPredicate) bool#

Checks, whether a specific predicate can be represented as a simplified view.

Parameters:

predicate (AbstractPredicate) – The predicate to check

Returns:

Whether a representation as a simplified view is possible.

Return type:

bool

property column: ColumnReference#

Get the filtered column.

Returns:

The column

Return type:

ColumnReference

columns() set[ColumnReference]#

Provides all columns that are referenced by this predicate.

Returns:

The columns. If the predicate contains a subquery, all columns of that query are included.

Return type:

set[ColumnReference]

is_compound() bool#

Checks, whether this predicate combines the evaluation of other predicates to compute the overall evaluation.

Operators to combine such predicates can be standard logical operators like conjunction, disjunction and negation. This method serves as a high-level check, preventing the usage of dedicated isinstance calls in some use-cases.

Returns:

Whethter this predicate is a composite of other predicates.

Return type:

bool

is_join() bool#

Checks, whether this predicate encodes a join between two tables.

PostBOUND uses the following criteria to determine, whether a predicate is join or not:

  1. all predicates of the form <col 1> <operator> <col 2> where <col 1> and <col 2> come from different tables are joins. The columns can optionally be modified by value casts or static expressions, e.g. R.a::integer + 7

  2. all functions that access columns from multiple tables are joins, e.g. my_udf(R.a, S.b)

  3. all subqueries are treated as filters, no matter whether they are dependent subqueries or not. This means that both R.a = (SELECT MAX(S.b) FROM S) and R.a = (SELECT MAX(S.b) FROM S WHERE R.c = S.d) are treated as filters and not as joins, even though the second subquery will require some sort of the join in the query plan.

  4. BETWEEN and IN predicates are treated according to rule 1 since they can be emulated via base predicates (subqueries in IN predicates are evaluated according to rule 3.)

Although these rules might seem a bit arbitrary at first, there is actually no clear consensus of what constitutes a join and the query optimizers of different industrial database systems treat different predicates as joins. For example, some systems might not recognize function calls that access columns form two or more tables as joins or do not recognize predicates that use non-equi joins as opertors as actual joins.

If the specific join and filter recognition procedure breaks a specific use-case, subclasses of the predicate classes can be implemented. These subclasses can then apply the required rules. Using the tools in the transformation module, the queries can be updated. For some use-cases it can also be sufficient to change the join/filter recognition rules of the QueryPredicates objects. Consult its documentation for more details.

Lastly, notice that the distinction between join and filter is not entirely binary. There may also be a third class of predicates, potentially called “post-join filters”. These are filters that are applied after a join but cannot be included in the join predicate itself. This is usually the case due to limitations in the operator implementation of the actual database system. For example, invocations of user defined functions (case 2 above) usually fall in this category. Since the query abstraction layer is agnostic to specific details of database systems, we apply the binary categorization outlined above.

Returns:

Whether the predicate is a join of different relations.

Return type:

bool

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this predicate.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all expressions that are directly contained in this predicate.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

join_partners() set[tuple[ColumnReference, ColumnReference]]#

Provides all pairs of columns that are joined within this predicate.

If multiple columns are joined or it is unclear which columns are involved in a join exactly, this method falls back to returning the cross-product of all potential join partners. For example, consider the following query: SELECT * FROM R, S WHERE my_udf(R.a, R.b, S.c). In this case, it cannot be determined which columns of R take part in the join. Therefore, join_partners will return the set {(R.a, S.c), (R.b, S.c)}.

Returns:

The pairs of joined columns. These pairs are normalized, such that two predicates which join the same columns provide the join partners in the same order.

Return type:

set[tuple[ColumnReference, ColumnReference]]

Raises:

NoJoinPredicateError

property operation: LogicalOperator#

Get the SQL operation that is used for the filter (e.g. IN or <>).

Returns:

the operator. This cannot be EXISTS or MISSING, since subqueries cannot be represented in simplified views.

Return type:

LogicalSqlOperators

unwrap() AbstractPredicate#

Get the original predicate that is represented by this view.

Returns:

The original predicate

Return type:

AbstractPredicate

property value: object | tuple[object] | Sequence[object]#

Get the filter value.

Returns:

The value. For a binary predicate, this is just the value itself. For a BETWEEN predicate, this is tuple of the form (lower, upper) and for an IN predicate, this is a sequence of the allowed values.

Return type:

object | tuple[object] | Sequence[object]

static wrap(predicate: AbstractPredicate) SimpleFilter#

Transforms a specific predicate into a simplified view. Raises an error if that is not possible.

Parameters:

predicate (AbstractPredicate) – The predicate to represent as a simplified view

Returns:

A simplified view wrapping the given predicate

Return type:

SimpleFilter

Raises:

ValueError – If the predicate cannot be represented as a simplified view.

static wrap_all(predicates: Iterable[AbstractPredicate]) Sequence[SimpleFilter]#

Transforms specific predicates into simplified views.

If individual predicates cannot be represented as views, they are ignored.

Parameters:

predicates (Iterable[AbstractPredicate]) – The predicates to represent as views. These can be arbitrary predicates, i.e. including joins and complex filters.

Returns:

The simplified views for all predicates that can be represented this way. The sequence of the views matches the sequence in the predicates. If the representation fails for individual predicates, they simply do not appear in the result. Therefore, this sequence may be empty if none of the predicates are valid simplified views.

Return type:

Sequence[SimpleFilter]

class postbound.qal.SimpleJoin(predicate: AbstractPredicate)#

The intent behind this view is to provide a more streamlined and direct access to join predicates.

A simple join is a read-only predicate, i.e. it cannot be created on its own and has to be derived from a binary equi join predicate. Afterward, it provides read-only access to the partner columns that are joined.

Note that not all binary joins can be represented in a simplified view. In order for the view to work, the join must be an equi-join, i.e. using LogicalOperator.Equal. Furthermore, both sides of the join have to cannot be modified by other expressions such as function calls or mathematical expressions. However, cast expressions are tolerated and will simply be dropped. As a rule of thumb, if an expression modifies a value (such as a function call), this cannot be unwrapped. Therefore, a join approximately has to be of the form <first col> = <second col> in order for the representation to work.

The static methods wrap, can_wrap and wrap_all can serve as high-level access points into the view. The components of the view are accessible via properties.

Parameters:

predicate (AbstractPredicate) – The predicate that should be simplified

Raises:

ValueError – If the predicate cannot be represented by a simplified view.

See also

SimpleFilter

A similar view dedicated to filter predicates

Examples

The best way to construct simplified views is to start with the QueryPredicates and extract the joins, e.g. by using views = SimpleJoin.wrap_all(query.predicates()) or joins = SimpleJoin.wrap_all(query.predicates().joins()). Notice that especially the first conversion can be “lossy”: all filter predicates are dropped. Likewise, all joins that are not equi-joins are ignored. Alternatively, the QueryPredicates also provides a simplify() method that can be used to convert all predicates (filters and joins) into their simplified counterparts.

Notes

Simple joins can be used in match statements and provide the lhs and rhs properties.

accept_visitor(visitor: PredicateVisitor[VisitorResult] | SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current predicate by a predicate visitor.

Parameters:
  • visitor (PredicateVisitor[VisitorResult]) – The visitor

  • args – Additional arguments to pass to the visitor

  • kwargs – Additional keyword arguments to pass to the visitor

Return type:

VisitorResult

base_predicates() Iterable[AbstractPredicate]#

Provides all base predicates that form this predicate.

This allows to iterate over all leaves of a compound predicate, for base predicates it simply returns the predicate itself.

Returns:

The base predicates, in an arbitrary order. If the predicate is a base predicate already, it will be the only item in the iterable.

Return type:

Iterable[AbstractPredicate]

static can_wrap(predicate: AbstractPredicate) bool#

Checks, whether a specific predicate can be represented as a simplified view.

Parameters:

predicate (AbstractPredicate) – The predicate to check

Returns:

Whether a representation as a simplified view is possible.

Return type:

bool

columns() set[ColumnReference]#

Provides all columns that are referenced by this predicate.

Returns:

The columns. If the predicate contains a subquery, all columns of that query are included.

Return type:

set[ColumnReference]

is_compound() bool#

Checks, whether this predicate combines the evaluation of other predicates to compute the overall evaluation.

Operators to combine such predicates can be standard logical operators like conjunction, disjunction and negation. This method serves as a high-level check, preventing the usage of dedicated isinstance calls in some use-cases.

Returns:

Whethter this predicate is a composite of other predicates.

Return type:

bool

is_join() bool#

Checks, whether this predicate encodes a join between two tables.

PostBOUND uses the following criteria to determine, whether a predicate is join or not:

  1. all predicates of the form <col 1> <operator> <col 2> where <col 1> and <col 2> come from different tables are joins. The columns can optionally be modified by value casts or static expressions, e.g. R.a::integer + 7

  2. all functions that access columns from multiple tables are joins, e.g. my_udf(R.a, S.b)

  3. all subqueries are treated as filters, no matter whether they are dependent subqueries or not. This means that both R.a = (SELECT MAX(S.b) FROM S) and R.a = (SELECT MAX(S.b) FROM S WHERE R.c = S.d) are treated as filters and not as joins, even though the second subquery will require some sort of the join in the query plan.

  4. BETWEEN and IN predicates are treated according to rule 1 since they can be emulated via base predicates (subqueries in IN predicates are evaluated according to rule 3.)

Although these rules might seem a bit arbitrary at first, there is actually no clear consensus of what constitutes a join and the query optimizers of different industrial database systems treat different predicates as joins. For example, some systems might not recognize function calls that access columns form two or more tables as joins or do not recognize predicates that use non-equi joins as opertors as actual joins.

If the specific join and filter recognition procedure breaks a specific use-case, subclasses of the predicate classes can be implemented. These subclasses can then apply the required rules. Using the tools in the transformation module, the queries can be updated. For some use-cases it can also be sufficient to change the join/filter recognition rules of the QueryPredicates objects. Consult its documentation for more details.

Lastly, notice that the distinction between join and filter is not entirely binary. There may also be a third class of predicates, potentially called “post-join filters”. These are filters that are applied after a join but cannot be included in the join predicate itself. This is usually the case due to limitations in the operator implementation of the actual database system. For example, invocations of user defined functions (case 2 above) usually fall in this category. Since the query abstraction layer is agnostic to specific details of database systems, we apply the binary categorization outlined above.

Returns:

Whether the predicate is a join of different relations.

Return type:

bool

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this predicate.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all expressions that are directly contained in this predicate.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

join_partners() set[tuple[ColumnReference, ColumnReference]]#

Provides all pairs of columns that are joined within this predicate.

If multiple columns are joined or it is unclear which columns are involved in a join exactly, this method falls back to returning the cross-product of all potential join partners. For example, consider the following query: SELECT * FROM R, S WHERE my_udf(R.a, R.b, S.c). In this case, it cannot be determined which columns of R take part in the join. Therefore, join_partners will return the set {(R.a, S.c), (R.b, S.c)}.

Returns:

The pairs of joined columns. These pairs are normalized, such that two predicates which join the same columns provide the join partners in the same order.

Return type:

set[tuple[ColumnReference, ColumnReference]]

Raises:

NoJoinPredicateError

joins(partner: TableReference | ColumnReference) bool#

Checks, whether this predicate joins the given column or table.

Parameters:

partner (TableReference | ColumnReference)

Return type:

bool

property lhs: ColumnReference#

Get the left-hand side of the join.

partner_of(other: TableReference | ColumnReference) TableReference | ColumnReference | None#

Provides the join partner of the given column or table. If the column or table is not joined, None is returned.

Parameters:

other (TableReference | ColumnReference)

Return type:

TableReference | ColumnReference | None

property rhs: ColumnReference#

Get the right-hand side of the join.

unwrap() AbstractPredicate#

Get the original predicate that is represented by this view.

Return type:

AbstractPredicate

static wrap(predicate: AbstractPredicate) SimpleJoin#

Transforms a specific predicate into a simplified view. Raises an error if that is not possible.

Parameters:

predicate (AbstractPredicate) – The predicate to represent as a simplified view

Returns:

A simplified view wrapping the given predicate

Return type:

SimpleJoin

Raises:

ValueError – If the predicate cannot be represented as a simplified view.

static wrap_all(predicates: Iterable[AbstractPredicate]) Sequence[SimpleJoin]#

Transforms specific predicates into simplified views.

If individual predicates cannot be represented as views, they are ignored.

Parameters:

predicates (Iterable[AbstractPredicate]) – The predicates to represent as views. These can be arbitrary predicates, i.e. including filters and complex joins.

Returns:

The simplified views for all predicates that can be represented this way. The sequence of the views matches the sequence in the predicates. If the representation fails for individual predicates, they simply do not appear in the result. Therefore, this sequence may be empty if none of the predicates are valid simplified views.

Return type:

Sequence[SimpleJoin]

class postbound.qal.SqlExpression(hash_val: int)#

Base class for all expressions.

Expressions form one of the central building blocks of representing a SQL query in the QAL. They specify how values from different columns are modified and combined, thereby forming larger (hierarchical) structures.

Expressions can be inserted in many different places in a SQL query. For example, a SELECT clause produces columns such as in SELECT R.a FROM R, but it can also modify the column values slightly, such as in SELECT R.a + 42 FROM R. To account for all these different situations, the SqlExpression is intended to form hierarchical trees and chains of expressions. In the first case, a ColumnExpression is used, whereas a MathExpression can model the second case. Whereas column expressions represent leaves in the expression tree, mathematical expressions are intermediate nodes.

As a more advanced example, a complicated expressions such as my_udf(R.a::interval + 42) which consists of a user-defined function, a value cast and a mathematical operation is represented the following way: FunctionExpression(MathExpression(CastExpression(ColumnExpression), StaticValueExpression)). The methods provided by all expression instances enable a more convenient use and access to the expression hierarchies.

The different kinds of expressions are represented using different subclasses of the SqlExpression interface. This really is an abstract interface, not a usable expression. All inheriting expression have to provide their own __eq__ method and re-use the __hash__ method provided by the base expression. Remember to explicitly set this up! The concrete hash value is constant since the clause itself is immutable. It is up to the implementing class to make sure that the equality/hash consistency is enforced.

Parameters:

hash_val (int) – The hash of the concrete expression object

abstractmethod accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

abstractmethod columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

abstractmethod iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

abstractmethod itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

abstractmethod tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

class postbound.qal.SqlExpressionVisitor#

Basic visitor to operator on arbitrary expression trees.

See also

SqlExpression

References

class postbound.qal.StarExpression(*, from_table: TableReference | None = None)#

A special expression that is only used in SELECT clauses to select all columns.

Parameters:

from_table (Optional[TableReference], optional) – The table from which to select all columns. Defaults to None, in which case all columns of all tables are being selected.

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

property from_table: TableReference | None#

Get the table from which to select all columns.

If no such table was selected, all columns of all tables are being selected.

Returns:

The table, or None if all columns are selected

Return type:

Optional[TableReference]

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

class postbound.qal.StaticValueExpression(value: T)#

An expression that wraps a literal/static value.

This is one of the leaf expressions that does not contain any further child expressions.

NULL values are represented by None, you can use StaticValueExpression.null() to create such an expression and is_null() to check whether the value is NULL.

Parameters:

value (T) – The value that is wrapped by the expression

Examples

Consider the following SQL query: SELECT * FROM R WHERE R.a = 42. In this case the comparison value of 42 will be represented as a static value expression. The reference to the column R.a cannot be a static value since its values depend on the actual column values. Hence, a ColumnExpression is used for it.

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

is_null() bool#

Checks, whether the value is NULL.

Return type:

bool

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

static null() StaticValueExpression[None]#

Create a static value expression that represents a NULL value.

Return type:

StaticValueExpression[None]

tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

property value: T#

Get the value.

Returns:

The value, duh!

Return type:

T

class postbound.qal.SubqueryExpression(subquery: SqlQuery)#

A subquery expression wraps an arbitrary subquery.

This expression can be used in two different contexts: as table source to produce a virtual temporary table for reference in the query (see the clauses module), or as a part of a predicate. In the latter scenario the subqueries’ results are transient for the rest of the query. Therefore, this expression only represents the subquery part but no name under which the query result can be accessed. This is added by the different parts of the clauses module (e.g. WithQuery or SubqueryTableSource).

This is a leaf expression, i.e. a subquery expression cannot have any more child expressions. However, the subquery itself likely consists of additional expressions.

Parameters:

subquery (SqlQuery) – The subquery that forms this expression

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

property query: SqlQuery#

The (sub)query that is wrapped by this expression.

Returns:

The query

Return type:

SqlQuery

tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

class postbound.qal.SubqueryTableSource(query: SqlQuery | SubqueryExpression, target_name: str | TableReference = '', *, lateral: bool = False)#

Models subquery that is referenced as a virtual table in the FROM clause.

Consider the example query SELECT * FROM R, (SELECT * FROM S, T WHERE S.a = T.b) AS s_t WHERE R.c = s_t.a. In this query, the subquery s_t would be represented as a subquery table source.

Parameters:
  • query (SqlQuery | SubqueryExpression) – The query that is sourced as a subquery

  • target_name (str | TableReference, optional) – The name under which the subquery should be made available. Can empty for an anonymous subquery.

  • lateral (bool, optional) – Whether the subquery should be executed as a lateral join. Defaults to False.

Raises:

ValueError – If the target_name is empty

columns() set[ColumnReference]#

Provides all column sthat are referenced in the source.

For plain table sources this will be empty, but for more complicate structures such as subquery source, this will include all columns that are referenced in the subquery.

Returns:

The columns

Return type:

set[ColumnReference]

property expression: SubqueryExpression#

Get the query that is used to construct the virtual table, as a subquery expression.

Returns:

The subquery.

Return type:

SubqueryExpression

itercolumns() Iterable[ColumnReference]#

Provides access to all column in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this source, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

property lateral: bool#

Get, whether this is a lateral join.

Returns:

Whether this is a lateral join

Return type:

bool

output_columns() Sequence[ColumnReference]#

Provides the columns that form the result relation of this subquery.

The columns are named according to the following rules:

  • If the column has an alias, this name is used

  • If the column is a simple column reference, the column name is used

  • Otherwise, a generic name is used, e.g. “column_1”, “column_2”, etc.

Returns:

The columns. Their order matches the order in which they appear in the SELECT clause of the query.

Return type:

Sequence[ColumnReference]

predicates() QueryPredicates | None#

Provides all predicates that are contained in the source.

For plain table sources this will be None, but for subquery sources, etc. all predicates are returned.

Returns:

The predicates or None if the source does not allow predicates or simply does not contain any.

Return type:

QueryPredicates | None

property query: SqlQuery#

Get the query that is sourced as a virtual table.

Returns:

The query

Return type:

SqlQuery

tables() set[TableReference]#

Provides all tables that are referenced in the source.

For plain table sources this will just be the actual table. For more complicated structures, such as subquery sources, this will include all tables of the subquery as well.

Returns:

The tables.

Return type:

set[TableReference]

property target_name: str#

Get the name under which the virtual table can be accessed in the actual query.

Returns:

The name. Can be empty for an anonymous subquery.

Return type:

str

property target_table: TableReference | None#

Get the name under which the virtual table can be accessed in the actual query.

The only difference to target_name this return type: this property provides the name as a proper table reference, rather than a string.

Returns:

The table. This will always be a virtual table. Can be None for an anonymous subquery.

Return type:

Optional[TableReference]

class postbound.qal.TableSource#

A table source models a relation that can be scanned by the database system, filtered, joined, …

This is what is commonly reffered to as a table or a relation and forms the basic item of a FROM clause. In an SQL query the items of the FROM clause can originate from a number of different concepts. In the simplest case, this is just a physical table (e.g. SELECT * FROM R, S, T WHERE ...), but other queries might reference subqueries or common table expressions in the FROM clause (e.g. SELECT * FROM R, (SELECT * FROM S, T WHERE ...) WHERE ...). This class models the similarities between these concepts. Specific sub-classes implement them for the concrete kind of source (e.g. DirectTableSource or SubqueryTableSource).

abstractmethod columns() set[ColumnReference]#

Provides all column sthat are referenced in the source.

For plain table sources this will be empty, but for more complicate structures such as subquery source, this will include all columns that are referenced in the subquery.

Returns:

The columns

Return type:

set[ColumnReference]

abstractmethod itercolumns() Iterable[ColumnReference]#

Provides access to all column in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this source, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

abstractmethod iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

abstractmethod predicates() QueryPredicates | None#

Provides all predicates that are contained in the source.

For plain table sources this will be None, but for subquery sources, etc. all predicates are returned.

Returns:

The predicates or None if the source does not allow predicates or simply does not contain any.

Return type:

QueryPredicates | None

abstractmethod tables() set[TableReference]#

Provides all tables that are referenced in the source.

For plain table sources this will just be the actual table. For more complicated structures, such as subquery sources, this will include all tables of the subquery as well.

Returns:

The tables.

Return type:

set[TableReference]

class postbound.qal.UnaryPredicate(column: SqlExpression, operation: MathOperator | LogicalOperator | CompoundOperator | None = None)#

A unary predicate is applied directly to an expression, evaluating to TRUE or FALSE.

Examples of such predicates include R.a IS NOT NULL, EXISTS (SELECT S.b FROM S WHERE R.a = S.b), or my_udf(R.a). In the last case, my_udf has to produce a boolean return value.

Parameters:
  • column (SqlExpression) – The expression that is tested. This can also be a user-defined function that produces a boolen return value.

  • operation (Optional[SqlOperator], optional) – The operation that is used to generate the unary predicate. Only a small subset of operators can actually be used in this context (e.g. EXISTS or MISSING). If the predicate does not require an operator (e.g. in the case of filtering UDFs), the operation can be None. Notice however, that PostBOUND has no knowledge of the semantics of UDFs and can therefore not enforce, whether UDFs is actually valid in this context. This has to be done at runtime by the actual database system.

Raises:

ValueError – If the given operation is not a valid unary operator

accept_visitor(visitor: PredicateVisitor[VisitorResult] | SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current predicate by a predicate visitor.

Parameters:
  • visitor (PredicateVisitor[VisitorResult]) – The visitor

  • args – Additional arguments to pass to the visitor

  • kwargs – Additional keyword arguments to pass to the visitor

Return type:

VisitorResult

property column: SqlExpression#

The column that is checked by this predicate

Returns:

The expression

Return type:

SqlExpression

columns() set[ColumnReference]#

Provides all columns that are referenced by this predicate.

Returns:

The columns. If the predicate contains a subquery, all columns of that query are included.

Return type:

set[ColumnReference]

static exists(subquery: SqlQuery | SubqueryExpression) UnaryPredicate#

Creates an EXISTS predicate for a subquery.

Parameters:

subquery (SqlQuery | SubqueryExpression) – The subquery. Will be wrapped in a SubqueryExpression if it is not already one.

Returns:

The EXISTS predicate

Return type:

UnaryPredicate

is_exists() bool#

Checks, whether this predicate is an EXISTS predicate.

Returns:

Whether this predicate is an EXISTS predicate

Return type:

bool

is_join() bool#

Checks, whether this predicate encodes a join between two tables.

PostBOUND uses the following criteria to determine, whether a predicate is join or not:

  1. all predicates of the form <col 1> <operator> <col 2> where <col 1> and <col 2> come from different tables are joins. The columns can optionally be modified by value casts or static expressions, e.g. R.a::integer + 7

  2. all functions that access columns from multiple tables are joins, e.g. my_udf(R.a, S.b)

  3. all subqueries are treated as filters, no matter whether they are dependent subqueries or not. This means that both R.a = (SELECT MAX(S.b) FROM S) and R.a = (SELECT MAX(S.b) FROM S WHERE R.c = S.d) are treated as filters and not as joins, even though the second subquery will require some sort of the join in the query plan.

  4. BETWEEN and IN predicates are treated according to rule 1 since they can be emulated via base predicates (subqueries in IN predicates are evaluated according to rule 3.)

Although these rules might seem a bit arbitrary at first, there is actually no clear consensus of what constitutes a join and the query optimizers of different industrial database systems treat different predicates as joins. For example, some systems might not recognize function calls that access columns form two or more tables as joins or do not recognize predicates that use non-equi joins as opertors as actual joins.

If the specific join and filter recognition procedure breaks a specific use-case, subclasses of the predicate classes can be implemented. These subclasses can then apply the required rules. Using the tools in the transformation module, the queries can be updated. For some use-cases it can also be sufficient to change the join/filter recognition rules of the QueryPredicates objects. Consult its documentation for more details.

Lastly, notice that the distinction between join and filter is not entirely binary. There may also be a third class of predicates, potentially called “post-join filters”. These are filters that are applied after a join but cannot be included in the join predicate itself. This is usually the case due to limitations in the operator implementation of the actual database system. For example, invocations of user defined functions (case 2 above) usually fall in this category. Since the query abstraction layer is agnostic to specific details of database systems, we apply the binary categorization outlined above.

Returns:

Whether the predicate is a join of different relations.

Return type:

bool

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this predicate.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all expressions that are directly contained in this predicate.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

join_partners() set[tuple[ColumnReference, ColumnReference]]#

Provides all pairs of columns that are joined within this predicate.

If multiple columns are joined or it is unclear which columns are involved in a join exactly, this method falls back to returning the cross-product of all potential join partners. For example, consider the following query: SELECT * FROM R, S WHERE my_udf(R.a, R.b, S.c). In this case, it cannot be determined which columns of R take part in the join. Therefore, join_partners will return the set {(R.a, S.c), (R.b, S.c)}.

Returns:

The pairs of joined columns. These pairs are normalized, such that two predicates which join the same columns provide the join partners in the same order.

Return type:

set[tuple[ColumnReference, ColumnReference]]

Raises:

NoJoinPredicateError

class postbound.qal.UnionClause(left_query: SqlQuery | SetQuery, right_query: SqlQuery | SetQuery, *, union_all: bool = False)#

The UNION or UNION ALL clause of a query.

Parameters:
  • left_query (SelectStatement) – The left input to the UNION operation. Since UNIONs are commutative, the assignment of left and right does not really matter.

  • right_query (SelectStatement) – The right input to the UNION operation. Since UNIONs are commutative, the assignment of left and right does not really matter.

  • union_all (bool, optional) – Whether the UNION operation should eliminate duplicates or not. Defaults to False which indicates that duplicates should be eliminated.

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

input_queries() set[SqlQuery | SetQuery]#

Get the two input queries that are part of the UNION operation.

Returns:

The left and right queries. Since UNIONs are commutative, the assignment of left and right does not really matter.

Return type:

set[SelectStatement]

is_union_all() bool#

Whether this is a UNION or UNION ALL clause.

Returns:

True if duplicates are eliminated, False otherwise.

Return type:

bool

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

property left_query: SqlQuery | SetQuery#

Get the left query that is part of the UNION operation.

Returns:

The left query. Since UNIONs are commutative, the assignment of left and right does not really matter.

Return type:

SelectStatement

See also

input_queries

property right_query: SqlQuery | SetQuery#

Get the right query that is part of the UNION operation.

Returns:

The right query. Since UNIONs are commutative, the assignment of left and right does not really matter.

Return type:

SelectStatement

See also

input_queries

tables() set[TableReference]#

Provides all tables that are referenced in the clause.

Returns:

All tables. This includes virtual tables if such tables are present in the clause

Return type:

set[TableReference]

property union_all: bool#

Get whether this is a UNION or UNION ALL clause.

Returns:

True if duplicates are eliminated, False otherwise.

Return type:

bool

class postbound.qal.ValuesTableSource(values: Iterable[tuple[StaticValueExpression, ...]], *, alias: str | TableReference = '', columns: Iterable[str | ColumnReference] | None = None)#

Represents a virtual table that is constructed from a literal list of values.

Parameters:
  • values (ValuesList) – The available table rows.

  • alias (str | TableReference, optional) – The name under which the virtual table can be accessed in the actual query. If this is empty, an anonymous table is created.

  • columns (Optional[Iterable[str | ColumnReference]], optional) – The names of the columns that are available in the virtual table. The length of this list must match the length of the tuples in the values list. Alternatively, an empty list can be provided, in which case the columns will be named automatically.

property alias: str#

Get the name under which the virtual table can be accessed in the actual query.

Returns:

The name. Can be empty for an anonymous table.

Return type:

str

property cols: Sequence[ColumnReference]#

Get the columns that are available in the virtual table.

Returns:

The columns. Can be empty if the columns are not named explicitly.

Return type:

Sequence[ColumnReference]

columns() set[ColumnReference]#

Provides all column sthat are referenced in the source.

For plain table sources this will be empty, but for more complicate structures such as subquery source, this will include all columns that are referenced in the subquery.

Returns:

The columns

Return type:

set[ColumnReference]

itercolumns() Iterable[ColumnReference]#

Provides access to all column in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this source, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in the source.

For plain table sources this will be empty, but for subquery sources, etc. all expressions are returned. Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

output_columns() Sequence[ColumnReference]#

Provides the columns that form the result relation.

Returns:

The columns. If column names have been explicitly provided, these are used. Otherwise, generic names are used, e.g. “column_1”, “column_2”, etc.

Return type:

Sequence[ColumnReference]

predicates() QueryPredicates | None#

Provides all predicates that are contained in the source.

For plain table sources this will be None, but for subquery sources, etc. all predicates are returned.

Returns:

The predicates or None if the source does not allow predicates or simply does not contain any.

Return type:

QueryPredicates | None

property rows: Iterable[tuple[StaticValueExpression, ...]]#

Get the rows that are available in the virtual table.

Returns:

The rows

Return type:

ValuesList

property table: TableReference | None#

Get the name under which the virtual table can be accessed in the actual query.

Returns:

The table. Can be None for an anonymous table.

Return type:

Optional[TableReference]

tables() set[TableReference]#

Provides all tables that are referenced in the source.

For plain table sources this will just be the actual table. For more complicated structures, such as subquery sources, this will include all tables of the subquery as well.

Returns:

The tables.

Return type:

set[TableReference]

class postbound.qal.ValuesWithQuery(values: Iterable[tuple[StaticValueExpression, ...]], *, target_name: str | TableReference = '', columns: Iterable[str | ColumnReference] | None = None, materialized: bool | None = None)#

Models a common table expression that is based on a VALUES clause, e.g. WITH t(a, b) AS (VALUES (1, 2), (3, 4)).

Parameters:
  • values (ValuesList) – The values that should be used to construct the CTE.

  • target_name (str | TableReference, optional) – The name under which the table should be made available. If a table reference is provided, its identifier will be used.

  • columns (Optional[Iterable[str | ColumnReference]], optional) – The columns that should be used to construct the CTE. If no columns are provided, all columns are anonymous. If columns are provided, they have to match the number of columns in the values list.

  • materialized (Optional[bool], optional) – Whether the query should be materialized or not. If this is not supported or not known, this can be set to None (the default). Since materialization is not part of the SQL standard, we do not include it in the WITH querie’s identity.

property cols: Sequence[ColumnReference]#

Get the columns that are used to construct the common table expression.

Returns:

The columns

Return type:

Sequence[ColumnReference]

output_columns() Sequence[ColumnReference]#

Provides the columns that form the result relation of this CTE.

Returns:

The columns. If no columns were provided, generic column names are used.

Return type:

Sequence[ColumnExpression]

property rows: Iterable[tuple[StaticValueExpression, ...]]#

Get the values that are used to construct the CTE.

Returns:

The values

Return type:

ValuesList

class postbound.qal.Where(predicate: AbstractPredicate)#

The WHERE clause specifies conditions that result rows must satisfy.

All conditions are collected in a (potentially conjunctive or disjunctive) predicate object. See AbstractPredicate for details.

Parameters:

predicate (AbstractPredicate) – The root predicate that specifies all conditions

accept_visitor(visitor: ClauseVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current clause by a visitor.

Parameters:
  • visitor (ClauseVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced in the clause.

Returns:

The columns

Return type:

set[ColumnReference]

itercolumns() Iterable[ColumnReference]#

Provides access to all column in this clause.

In contrast to the columns method, duplicates are returned multiple times, i.e. if a column is referenced n times in this clause, it will also be returned n times by this method. Furthermore, the order in which columns are provided by the iterable matches the order in which they appear in this clause.

Returns:

All columns exactly in the order in which they are used

Return type:

Iterable[ColumnReference]

iterexpressions() Iterable[SqlExpression]#

Provides access to all directly contained expressions in this clause.

Nested expressions can be accessed from these expressions in a recursive manner (see the SqlExpression interface for details).

Returns:

The expressions

Return type:

Iterable[SqlExpression]

property predicate: AbstractPredicate#

Get the root predicate that contains all filters and joins in the WHERE clause.

Returns:

The condition

Return type:

AbstractPredicate

tables() set[TableReference]#

Provides all tables that are referenced in the clause.

Returns:

All tables. This includes virtual tables if such tables are present in the clause

Return type:

set[TableReference]

class postbound.qal.WindowExpression(window_function: FunctionExpression | str, *, partitioning: Sequence[SqlExpression] | None = None, ordering: OrderBy | None = None, filter_condition: AbstractPredicate | None = None)#

Represents a window expression in SQL.

Parameters:
  • window_function (FunctionExpression | str) – The window function to be applied. If this is a string, it will be wrapped in a plain function expression.

  • partitioning (Optional[Sequence[SqlExpression]], optional) – The expressions used for partitioning the window. Defaults to None.

  • ordering (Optional[OrderBy], optional) – The ordering of the window. Defaults to None.

  • filter_condition (Optional[AbstractPredicate], optional) – The filter condition for the window. Defaults to None.

accept_visitor(visitor: SqlExpressionVisitor[VisitorResult], *args, **kwargs) VisitorResult#

Enables processing of the current expression by an expression visitor.

Parameters:
  • visitor (SqlExpressionVisitor[VisitorResult]) – The visitor

  • args – Additional arguments that are passed to the visitor

  • kwargs – Additional keyword arguments that are passed to the visitor

Return type:

VisitorResult

columns() set[ColumnReference]#

Provides all columns that are referenced by this expression.

Returns:

The columns

Return type:

set[ColumnReference]

property filter_condition: AbstractPredicate | None#

Get the filter condition for tuples in the current window.

Returns:

The filter condition for the expression, or None if all tuples are aggegrated.

Return type:

Optional[AbstractPredicate]

iterchildren() Iterable[SqlExpression]#

Provides unified access to all child expressions of the concrete expression type.

For leaf expressions such as static values, the iterable will not contain any elements. Otherwise, all direct children will be returned. For example, a mathematical expression could return both the left, as well as the right operand. This allows for access to nested expressions in a recursive manner.

Returns:

The expressions

Return type:

Iterable[SqlExpression]

itercolumns() Iterable[ColumnReference]#

Provides all columns that are referenced by this expression.

If a column is referenced multiple times, it is also returned multiple times.

Returns:

All columns in exactly the order in which they are used.

Return type:

Iterable[ColumnReference]

property ordering: OrderBy | None#

Get the ordering of tuples in the current window.

Returns:

The ordering of the tuples, or None if no ordering is specified.

Return type:

Optional[OrderBy]

property partitioning: Sequence[SqlExpression]#

Get the expressions used for partitioning the window.

Returns:

The expressions used for partitioning the window. Can be empty if no partitioning is used.

Return type:

Sequence[SqlExpression]

tables() set[TableReference]#

Provides all tables that are accessed by this expression.

Returns:

All tables. This includes virtual tables if such tables are present in the expression.

Return type:

set[TableReference]

property window_function: FunctionExpression#

Get the window function of the window expression.

Returns:

The window function.

Return type:

FunctionExpression

class postbound.qal.WithQuery(query: SqlQuery, target_name: str | TableReference, *, materialized: bool | None = None)#

A single common table expression that can be referenced in the actual query.

Each WITH clause can consist of multiple auxiliary common table expressions. This class models exactly one such query. It consists of the query as well as the name under which the temporary table can be referenced in the actual SQL query.

Parameters:
  • query (SqlQuery) – The query that should be used to construct the temporary common table.

  • target_name (str | TableReference) – The name under which the table should be made available. If a table reference is provided, its identifier will be used.

  • materialized (Optional[bool], optional) – Whether the query should be materialized or not. If this is not supported or not known, this can be set to None (the default). Since materialization is not part of the SQL standard, we do not include it in the WITH querie’s identity.

Raises:

ValueError – If the target_name is empty

property materialized: bool | None#

Get whether this is materialized WITH query or not.

If materialization is unknown or not supported, None can be used. Therefore, this property should always be checked against None before checking the actual truth value. Since materialization is not part of the SQL standard, we do not include it in the WITH querie’s identity.

Returns:

The materialization status

Return type:

Optional[bool]

output_columns() Sequence[ColumnReference]#

Provides the columns that form the result relation of this CTE.

The columns are named according to the following rules:

  • If the column has an alias, this name is used

  • If the column is a simple column reference, the column name is used

  • Otherwise, a generic name is used, e.g. “column_1”, “column_2”, etc.

Returns:

The columns. Their order matches the order in which they appear in the SELECT clause of the query.

Return type:

Sequence[ColumnReference]

property query: SqlQuery#

The query that is used to construct the temporary table

Returns:

The query

Return type:

SqlQuery

property subquery: SubqueryExpression#

Provides the query that constructsd the temporary table as a subquery object.

Returns:

The subquery

Return type:

SubqueryExpression

property target_name: str#

The table name under which the temporary table can be referenced in the actual SQL query

Returns:

The name. Will never be empty.

Return type:

str

property target_table: TableReference#

The table under which the temporary CTE table can be referenced in the actual SQL query

The only difference to target_name is the type of this property: it provides a proper (virtual) table reference object

Returns:

The table. Will always be a virtual table.

Return type:

TableReference

postbound.qal.as_expression(value: object) SqlExpression#

Transforms the given value into the most appropriate SqlExpression instance.

This is a heuristic utility method that applies the following rules:

  • ColumnReference becomes ColumnExpression

  • SqlQuery becomes SubqueryExpression

  • the star-string * becomes a StarExpression

All other values become a StaticValueExpression.

Parameters:

value (object) – The object to be transformed into an expression

Returns:

The most appropriate expression object according to the transformation rules

Return type:

SqlExpression

postbound.qal.as_predicate(column: ColumnReference, operation: LogicalOperator | str, *arguments) BasePredicate#

Utility method to quickly construct instances of base predicates.

The given arguments are transformed into appropriate expression objects as necessary.

The specific type of generated predicate is determined by the given operation. The following rules are applied:

  • for BETWEEN predicates, the arguments can be either two values, or a tuple of values (additional arguments are ignored)

  • for IN predicates, the arguments can be either a number of arguments, or a (nested) iterable of arguments

  • for all other binary predicates exactly one additional argument must be given (and an error is raised if that is not the case)

Parameters:
  • column (ColumnReference) – The column that should become the first operand of the predicate

  • operation (LogicalSqlOperators | str) – The operation that should be used to build the predicate. The actual return type depends on this value. As an alternative to LogicalOperator, the operation can also be provided as a string (e.g. “=” for LogicalOperator.Equal).

  • *arguments – Further operands for the predicate. The allowed values and their structure depend on the precise predicate (see rules above).

Returns:

A predicate representing the given operation on the given operands

Return type:

BasePredicate

Raises:

ValueError – If a binary predicate is requested, but *arguments does not contain a single value

postbound.qal.build_query(query_clauses: Iterable[BaseClause]) SqlQuery#

Constructs an SQL query based on specific clauses.

No validation is performed. If clauses appear multiple times, later clauses overwrite former ones. The specific type of query (i.e. implicit, explicit or mixed) is inferred from the clauses (i.e. occurrence of an implicit FROM clause enforces an ImplicitSqlQuery and vice-versa). The overwriting rules apply here as well: a later From clause overwrites a former one and can change the type of the produced query.

This method can also be used to contruct SetQuery objects by passing one of the set clauses (UnionClause, IntersectClause or ExceptClause). In this case, the user must ensure that no clauses that are illegal in the context of a set operation are supplied (e.g. Select or From). Otherwise, an error is raised.

Parameters:

query_clauses (Iterable[BaseClause]) – The clauses that should be used to construct the query. If any of the clauses are None, they will simply be skipped.

Returns:

A query consisting of the specified clauses

Return type:

SqlQuery

Raises:
  • ValueError – If query_clauses does not contain a Select clause

  • ValueError – If any of the clause types is unknown. This indicates that this method is missing a handler for a specific clause type that was added later on.

postbound.qal.collect_subqueries_in_expression(expression: SqlExpression) set[SqlQuery]#

Handler method to provide all the subqueries that are contained in a specific expression.

Parameters:

expression (SqlExpression) – The expression to analyze

Returns:

The subqueries from the expression

Return type:

set[SqlQuery]

See also

SqlQuery.subqueries

postbound.qal.determine_join_equivalence_classes(predicates: Iterable[BinaryPredicate]) set[frozenset[ColumnReference]]#

Calculates all equivalence classes of equijoin predicates.

Columns are in an equivalence class if they can all be compared with matching equality predicates. For example, consider two predicates a = b and a = c. From these predicates it follows that b = c and hence the set of columns {a, b, c} is an equivalence class. Likewise, the predicates a = b and c = d form two equivalence classes, namely {a, b} and {c, d}.

Parameters:

predicates (Iterable[BinaryPredicate]) – The predicates to check. Non-equijoin predicates are discarded automatically.

Returns:

The equivalence classes. Each element of the set describes a complete equivalence class.

Return type:

set[frozenset[ColumnReference]]

postbound.qal.format_quick(query: SqlQuery | SetQuery, *, flavor: Literal['vanilla', 'postgres'] = 'vanilla', inline_hint_block: bool = False, trailing_semicolon: bool = True, custom_formatter: Callable[[SqlQuery], SqlQuery] | None = None) str#

Applies a quick formatting heuristic to structure the given query.

The query will be structured as follows:

  • all clauses start at a new line

  • long clauses with multiple parts (e.g. SELECT clause, FROM clause) are split along multiple intended lines

  • the predicate in the WHERE clause is split on multiple lines along the different parts of a conjunctive predicate

All other clauses are written on a single line (e.g. GROUP BY clause).

Parameters:
  • query (SelectStatement) – The query to format

  • inline_hint_block (bool, optional) – Whether to insert a potential hint block in the SELECT clause (i.e. inline it), or leave it as a block preceding the actual query. Defaults to False which indicates that the clause should be printed before the actual query.

  • custom_formatter (Callable[[SqlQuery], SqlQuery], optional) – A post-processing formatting service to apply to the SQL query after all preparatory steps have been performed, but before the actual formatting is started. This can be used to inject custom clause or expression formatting rules that are necessary to adhere to specific SQL syntax deviations for a database system. Defaults to None which skips this step.

  • flavor (Literal['vanilla', 'postgres'])

  • trailing_semicolon (bool)

Returns:

A pretty string representation of the query.

Return type:

str

postbound.qal.generate_predicates_for_equivalence_classes(equivalence_classes: set[frozenset[ColumnReference]]) set[BinaryPredicate]#

Provides all possible equijoin predicates for a set of equivalence classes.

This function can be used in combination with determine_join_equivalence_classes to expand join predicates to also include additional joins that can be derived from the predicates.

For example, consider two joins a = b and b = c. These joins form one equivalence class {a, b, c}. Based on the equivalence class, the predicates a = b, b = c and a = c can be generated.

Parameters:

equivalence_classes (set[frozenset[ColumnReference]]) – The equivalence classes. Each class is described by the columns it contains.

Returns:

The predicates

Return type:

set[BinaryPredicate]