Relational Algebra

Contents

Relational Algebra#

Module name: postbound.relalg

relalg provides fundamental building blocks of relational algebra and a converter from SQL to algebra.

The central component of our algebra implementation is the RelNode class. All relational operators inherit from this abstract class. Following the design of the expression and predicate models, all algebraic trees are immutable data structures. Once a tree has been generated, it can no longer be modified.

One important aspect of our relational algebra design is how to model arbitrary expressions and projections, mappings, etc. on these expressions. Some systems introduce temporary variables for mapping targets, e.g. arg0 <- R.a + 42 and then base all further accesses on these temporary variables. In this school of thought, an algebra tree for the query SELECT R.a + 42 FROM R would like this:

\[\pi_{arg_0}(\chi_{arg_0 \leftarrow R.a + 42}(R))\]

This representation is especially usefull for a code-generation or physical optimization scenario because it enables a straightforward creation of additional (temporary) columns. At the same time, it makes the translation of SQL queries to relational algebra more challenging, since re-writes have to be applied during parsing. Since we are not concerned with code-generation in our algebra representation and focus more on structural properties, we take a different approach: all expressions (as defined in the expressions module) are contained as-is in the operators. However, we make sure that necessary pre-processing actions are included as required. For example, if a complex expression is included in a predicate or a projection, we generate the appropriate mapping operation beforehand and use it as an input for the consuming operator.

In addition to the conventional operators of relational algebra, we introduce a couple of additional operators that either mimic features from SQL, or that make working with the algebra much easier from a technical point-of-view. The first category includes operators such as Sort or DuplicateElimination and Limit, whereas the second category includes the SubqueryScan.

Notice that while most algebraic expressions correspond to tree structures, there might be cases where a directed, acyclic graph is generated. This is especially the case when a base relation is used as part of subqueries. Nevertheless, there will always be only one root (sink) node.

class postbound.relalg.RelNode(parent_node: RelNode | None)#

Models a fundamental operator in relation algebra. All specific operators like selection or theta join inherit from it.

Parameters:

parent_node (Optional[RelNode]) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

See also

parse_relalg

property node_type: str#

Get the current operator as a string.

Returns:

The operator name

Return type:

str

property parent_node: RelNode | None#

Get the parent node of the current operator, if it exists.

Returns:

The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None is returned.

Return type:

Optional[RelNode]

property sideways_pass: frozenset[RelNode]#

Get all nodes that receive the output of the current operator in addition to the parent node.

Returns:

The sideways pass nodes

Return type:

frozenset[RelNode]

root() RelNode#

Traverses the algebra tree upwards until the root node is found.

Returns:

The root node of the algebra expression. Can be the current node if it does not have a parent.

Return type:

RelNode

leaf() RelNode | None#

Traverses the algebra tree downwards until a unique leaf node is found.

Returns:

The leaf node. If multiple leaf nodes exist (e.g. for join nodes), None is returned.

Return type:

Optional[RelNode]

abstractmethod children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

tables(*, ignore_subqueries: bool = False) frozenset[TableReference]#

Provides all relations that are contained in the current node.

Consider the following algebraic expression: π(⋈(σ(R), S)). This expression contains two relations: R and S.

Parameters:

ignore_subqueries (bool, optional) – Whether relations that are only referenced in subquery subtrees should be excluded. Off by default.

Returns:

The tables

Return type:

frozenset[TableReference]

provided_expressions() frozenset[SqlExpression]#

Collects all expressions that are available to parent nodes.

These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.

Returns:

The expressions

Return type:

frozenset[expressions.SqlExpression]

abstractmethod accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

dfs_walk() Generator[RelNode, None, None]#

Performs a depth-first search on the algebraic expression.

This produces the subtree induced by the current node. The current node is also included in the output.

Yields:

Generator[RelNode, None, None] – All nodes of the subtree induced by the current node.

Return type:

Generator[RelNode, None, None]

mutate(*, as_root: bool = False, **kwargs) RelNode#

Creates a new instance of the current operator with modified attributes.

The specific parameters depend on the concrete operator type. Calling mutate() on any node generates a copy of the entire tree, so make sure to use this operation sparingly.

See Notes for important remarks regarding the mutate implementation for custom node types.

Parameters:
  • as_root (bool, optional) – Whether the current node should become the new root node of the tree. Defaults to False, which leaves the node at its current position in the tree.

  • **kwargs – The attributes to modify. The specific attributes depend on the concrete operator type.

Returns:

The modified node

Return type:

RelNode

Notes

Concrete node types should implement their own mutate() function which accepts parameters specific to the node type (in addition to the required as_root parameter). Other than the function prototype, the methods can share the same default implementation:

params = {param: val for param, val in locals().items() if param != "self" and not param.startswith("__")}
return super().mutate(**params)

In order for the method to work properly, all field-specific parameters must use the common property/attribute conventions. For example, to update a property input_node of the operator, its value must be stored in a “private” attribute _input_node. At the same time, the mutate() method must accept a parameter input_node (without the leading underscore). The method will then automatically update the internal attribute with the new value.

See the implementation of Selection for an example.

clone() RelNode#

Obtains a 1:1 copy of the current node.

Returns:

The cloned node

Return type:

RelNode

inspect(*, _indentation: int = 0) str#

Provides a nice hierarchical string representation of the algebraic expression.

The representation typically spans multiple lines and uses indentation to separate parent nodes from their children.

Parameters:
  • indentation (int, optional) – Internal parameter to the inspect function. Should not be modified by the user. Denotes how deeply recursed we are in the plan tree. This enables the correct calculation of the current indentation level. Defaults to 0 for the root node.

  • _indentation (int)

Returns:

A string representation of the algebraic expression

Return type:

str

class postbound.relalg.Selection(input_node: RelNode, predicate: AbstractPredicate, *, parent_node: RelNode | None = None)#

A selection filters the input relation based on an arbitrary predicate.

Parameters:
  • input_node (RelNode) – The tuples to filter

  • predicate (AbstractPredicate) – The predicate that must be satisfied by all output tuples

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Notes

A selection is defined as

\[\sigma_\theta(R) := \{ r \in R | \theta(r) \}\]
property input_node: RelNode#

Get the input relation that should be filtered.

Returns:

A relation

Return type:

RelNode

property predicate: AbstractPredicate#

Get the predicate that must be satisfied by the output tuples.

Returns:

The filter condition

Return type:

AbstractPredicate

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, input_node: RelNode | None = None, predicate: AbstractPredicate | None = None, as_root: bool = False) Selection#

Creates a new selection with modified attributes.

Parameters:
  • input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.

  • predicate (Optional[AbstractPredicate], optional) – The new predicate to use. If None, the current predicate is re-used.

  • as_root (bool, optional) – Whether the selection should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified selection node

Return type:

Selection

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.CrossProduct(left_input: RelNode, right_input: RelNode, *, parent_node: RelNode | None = None)#

A cross product calculates the cartesian product between tuples from two relations.

Parameters:
  • left_input (RelNode) – Relation containing the first set of tuples

  • right_input (RelNode) – Relation containing the second set of tuples

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Notes

A cross product is defined as

\[R \times S := \{ r \circ s | r \in R, s \in S \}\]
property left_input: RelNode#

Get the operator providing the first set of tuples.

Returns:

A relation

Return type:

RelNode

property right_input: RelNode#

Get the operator providing the second set of tuples.

Returns:

A relation

Return type:

RelNode

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, left_input: RelNode | None = None, right_input: RelNode | None = None, as_root: bool = False) CrossProduct#

Creates a new cross product with modified attributes.

Parameters:
  • left_input (Optional[RelNode], optional) – The new left child node to use. If None, the current left input node is re-used.

  • right_input (Optional[RelNode], optional) – The new right child node to use. If None, the current right input node is re-used.

  • as_root (bool, optional) – Whether the cross product should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified cross product node

Return type:

CrossProduct

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.Union(left_input: RelNode, right_input: RelNode, *, parent_node: RelNode | None = None)#

A union combines the tuple sets of two relations into a single output relation.

In order for a union to work, both relations must have the same structure.

Parameters:
  • left_input (RelNode) – The first relation

  • right_input (RelNode) – The second relation

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Notes

The union is defined as

\[R \cup S := \{ t | t \in R \lor t \in S \}\]
property left_input: RelNode#

Get the operator providing the first relation’s tuples.

Returns:

A relation

Return type:

RelNode

property right_input: RelNode#

Get the operator providing the second relation’s tuples.

Returns:

A relation

Return type:

RelNode

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, left_input: RelNode | None = None, right_input: RelNode | None = None, as_root: bool = False) Union#

Creates a new union with modified attributes.

Parameters:
  • left_input (Optional[RelNode], optional) – The new left child node to use. If None, the current left input node is re-used.

  • right_input (Optional[RelNode], optional) – The new right child node to use. If None, the current right input node is re-used.

  • as_root (bool, optional) – Whether the union should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified union node

Return type:

Union

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.Intersection(left_input: RelNode, right_input: RelNode, *, parent_node: RelNode | None = None)#

An intersection provides all tuples that are contained in both of its input operators.

In order for an intersection to work, both relations must have the same structure.

Parameters:
  • left_input (RelNode) – The first relation.

  • right_input (RelNode) – The second relation.

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Notes

The difference is defined as

\[R \cap S := \{ t | t \in R \land t \in S \}\]
property left_input: RelNode#

Get the operator providing the first relation’s tuples.

Returns:

A relation

Return type:

RelNode

property right_input: RelNode#

Get the operator providing the second relation’s tuples.

Returns:

A relation

Return type:

RelNode

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, left_input: RelNode | None = None, right_input: RelNode | None = None, as_root: bool = False) Intersection#

Creates a new intersection with modified attributes.

Parameters:
  • left_input (Optional[RelNode], optional) – The new left child node to use. If None, the current left input node is re-used.

  • right_input (Optional[RelNode], optional) – The new right child node to use. If None, the current right input node is re-used.

  • as_root (bool, optional) – Whether the intersection should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified intersection node

Return type:

Intersection

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.Difference(left_input: RelNode, right_input: RelNode, *, parent_node: RelNode | None = None)#

An intersection returns all tuples from one relation, that are not present in another relation.

In order for the difference to work, both input relations must share the same structure.

Parameters:
  • left_input (RelNode) – The first relation. This is the relation to remove tuples from.

  • right_input (RelNode) – The second relation. This is the relation containing the tuples that should be removed from the left_input.

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Notes

The difference is defined as

\[R \setminus S := \{ r \in R | r \notin S \}\]
property left_input: RelNode#

Get the operator providing the relation to remove tuples from.

Returns:

A relation

Return type:

RelNode

property right_input: RelNode#

Get the operator providing the tuples to remove.

Returns:

A relation

Return type:

RelNode

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, left_input: RelNode | None = None, right_input: RelNode | None = None, as_root: bool = False) Difference#

Creates a new difference with modified attributes.

Parameters:
  • left_input (Optional[RelNode], optional) – The new left child node to use. If None, the current left input node is re-used.

  • right_input (Optional[RelNode], optional) – The new right child node to use. If None, the current right input node is re-used.

  • as_root (bool, optional) – Whether the difference should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified difference node

Return type:

Difference

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.Relation(table: TableReference, provided_columns: Iterable[ColumnReference | ColumnExpression], *, subquery_input: RelNode | None = None, parent_node: RelNode | None = None)#

A relation provides the tuples (“rows”) contained in a table.

Each relation can correspond to a physical table contained in some relational schema, or it can represent the result of a subquery operation.

Parameters:
  • table (TableReference) – The table that is represented by this relation.

  • provided_columns (Iterable[ColumnReference | ColumnExpression]) – The columns that are contained in the table.

  • subquery_input (Optional[RelNode], optional) – For subquery relations, this is the algebraic expression that computes the results of the subquery. Relations that correspond to base tables do not have this attribute set.

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

property table: TableReference#

Get the table that is represented by this relation.

Returns:

A table. Usually this will correpond to an actual physical database table, but for subqueries this might also be a virtual table.

Return type:

TableReference

property subquery_input: RelNode | None#

Get the root node of the subquery that produces the input tuples for this relation.

Returns:

The root node if it exists, or None for actual base table relations.

Return type:

Optional[RelNode]

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

tables(*, ignore_subqueries: bool = False) frozenset[TableReference]#

Provides all relations that are contained in the current node.

Consider the following algebraic expression: π(⋈(σ(R), S)). This expression contains two relations: R and S.

Parameters:

ignore_subqueries (bool, optional) – Whether relations that are only referenced in subquery subtrees should be excluded. Off by default.

Returns:

The tables

Return type:

frozenset[TableReference]

provided_expressions() frozenset[SqlExpression]#

Collects all expressions that are available to parent nodes.

These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.

Returns:

The expressions

Return type:

frozenset[expressions.SqlExpression]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, table: TableReference | None = None, provided_columns: Iterable[ColumnReference | ColumnExpression] | None = None, subquery_input: RelNode | None = None, as_root: bool = False) Relation#

Creates a new relation with modified attributes.

Parameters:
  • table (Optional[TableReference], optional) – The new table to use. If None, the current table is re-used.

  • provided_columns (Optional[Iterable[ColumnReference | ColumnExpression]], optional) – The new columns to use. If None, the current columns are re-used.

  • subquery_input (Optional[RelNode], optional) – The new subquery input to use. If None, the current subquery input is re-used.

  • as_root (bool, optional) – Whether the relation should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified relation node

Return type:

Relation

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.ThetaJoin(left_input: RelNode, right_input: RelNode, predicate: AbstractPredicate, *, parent_node: RelNode | None = None)#

A theta joins combines individual tuples from two input relations if they match a specific predicate.

Parameters:
  • left_input (RelNode) – Relation containing the first set of tuples.

  • right_input (RelNode) – Relation containing the second set of tuples.

  • predicate (AbstractPredicate) – A predicate that must be satisfied by all joined tuples.

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Notes

A theta join is defined as

\[\bowtie_\theta(R, S) := \{ r \circ s | r \in R \land s \in S \land \theta(r, s) \}\]
property left_input: RelNode#

Get the operator providing the first set of tuples.

Returns:

A relation

Return type:

RelNode

property right_input: RelNode#

Get the operator providing the second set of tuples.

Returns:

A relation

Return type:

RelNode

property predicate: AbstractPredicate#

Get the condition that must be satisfied by the input tuples.

Returns:

A predicate

Return type:

AbstractPredicate

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, left_input: RelNode | None = None, right_input: RelNode | None = None, predicate: AbstractPredicate | None = None, as_root: bool = False) ThetaJoin#

Creates a new theta join with modified attributes.

Parameters:
  • left_input (Optional[RelNode], optional) – The new left child node to use. If None, the current left input node is re-used.

  • right_input (Optional[RelNode], optional) – The new right child node to use. If None, the current right input node is re-used.

  • predicate (Optional[AbstractPredicate], optional) – The new predicate to use. If None, the current predicate is re-used.

  • as_root (bool, optional) – Whether the theta join should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified theta join node

Return type:

ThetaJoin

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.Projection(input_node: RelNode, targets: Sequence[SqlExpression], *, parent_node: RelNode | None = None)#

A projection selects individual attributes from the tuples of an input relation.

The output relation will contain exactly the same tuples as the input, but each tuple will potentially contain less attributes.

Parameters:
  • input_node (RelNode) – The tuples to process

  • targets (Sequence[SqlExpression]) – The attributes that should still be contained in the output relation

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

property input_node: RelNode#

Get the operator providing the tuples to project.

Returns:

A relation

Return type:

RelNode

property columns: Sequence[SqlExpression]#

Provides the attributes that should be included in the output relation’s tuples.

Returns:

The projected attributes.

Return type:

Sequence[SqlExpression]

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, input_node: RelNode | None = None, targets: Sequence[SqlExpression] | None = None, as_root: bool = False) Projection#

Creates a new projection with modified attributes.

Parameters:
  • input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.

  • targets (Optional[Sequence[SqlExpression]], optional) – The new targets to use. If None, the current targets are re-used.

  • as_root (bool, optional) – Whether the projection should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified projection node

Return type:

Projection

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.Grouping(input_node: RelNode, group_columns: Sequence[SqlExpression], *, aggregates: dict[frozenset[SqlExpression], frozenset[FunctionExpression]] | None = None, parent_node: RelNode | None = None)#

Grouping partitions input tuples according to specific attributes and calculates aggregated values.

Parameters:
  • input_node (RelNode) – The tuples to process

  • group_columns (Sequence[SqlExpression]) – The expressions that should be used to partition the input tuples. Can be empty if only aggregations over all input tuples should be computed.

  • aggregates (Optional[dict[frozenset[SqlExpression], frozenset[FunctionExpression]]], optional) – The aggregates that should be computed. This is a mapping from the input expressions to the desired aggregate. Can be empty if only a grouping should be performed. In this case, the grouping operates as a duplicate-elimination mechanism.

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

property input_node: RelNode#

Get the operator that provides the tuples to group.

Returns:

A relation

Return type:

RelNode

property group_columns: Sequence[SqlExpression]#

Get the expressions that should be used to partition the input tuples.

Returns:

The group columns. Can be empty if only aggregations over all input tuples should be computed.

Return type:

Sequence[SqlExpression]

property aggregates: frozendict[SqlExpression, FunctionExpression]#

Get the aggregates that should be computed.

Aggregates map from the input expressions to the desired aggregation function.

Returns:

The aggregations. Can be empty if only a grouping should be performed.

Return type:

util.frozendict[SqlExpression, FunctionExpression]

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

provided_expressions() frozenset[SqlExpression]#

Collects all expressions that are available to parent nodes.

These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.

Returns:

The expressions

Return type:

frozenset[expressions.SqlExpression]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, input_node: RelNode | None = None, group_columns: Sequence[SqlExpression] | None = None, aggregates: dict[frozenset[SqlExpression], frozenset[FunctionExpression]] | None = None, parent: RelNode | None = None, as_root: bool = False) Grouping#

Creates a new group by with modified attributes.

Parameters:
  • input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.

  • group_columns (Optional[Sequence[SqlExpression]], optional) – The new group columns to use. If None, the current group columns are re-used.

  • aggregates (Optional[dict[frozenset[SqlExpression], frozenset[FunctionExpression]]], optional) – The new aggregates to use. If None, the current aggregates are re-used.

  • as_root (bool, optional) – Whether the group by should become the new root node of the tree. This overwrites any value passed to parent.

  • parent (RelNode | None)

Returns:

The modified group by node

Return type:

Grouping

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.Rename(input_node: RelNode, mapping: dict[ColumnReference, ColumnReference], *, parent_node: RelNode | None = None)#

Rename remaps column names to different names.

Parameters:
  • input_node (RelNode) – The tuples to modify

  • mapping (dict[ColumnReference, ColumnReference]) – A map from current column name to new column name.

  • parent_node (Optional[RelNode]) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Warning

This node is currently not used since we do not support natural joins.

property input_node: RelNode#

Get the operator that provides the tuples to modify

Returns:

A relation

Return type:

RelNode

property mapping: frozendict[ColumnReference, ColumnReference]#

Get the performed renamings.

Returns:

A map from current column name to new column name.

Return type:

util.frozendict[ColumnReference, ColumnReference]

provided_expressions() frozenset[SqlExpression]#

Collects all expressions that are available to parent nodes.

These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.

Returns:

The expressions

Return type:

frozenset[expressions.SqlExpression]

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, input_node: RelNode | None = None, mapping: dict[ColumnReference, ColumnReference] | None = None, as_root: bool = False) Rename#

Creates a new rename with modified attributes.

Parameters:
  • input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.

  • mapping (Optional[dict[ColumnReference, ColumnReference]], optional) – The new mapping to use. If None, the current mapping is re-used.

  • as_root (bool, optional) – Whether the rename should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified rename node

Return type:

Rename

See also

RelNode.mutate

for safety considerations and calling conventions

postbound.relalg.SortDirection#

Describes whether tuples should be sorted in ascending or descending order.

alias of Literal[‘asc’, ‘desc’]

class postbound.relalg.Sort(input_node: RelNode, sorting: Sequence[tuple[SqlExpression, Literal['asc', 'desc']] | SqlExpression], *, parent_node: RelNode | None = None)#

Sort modifies the order in which tuples are provided.

Parameters:
  • input_node (RelNode) – The tuples to order

  • sorting (Sequence[tuple[SqlExpression, SortDirection] | SqlExpression]) – The expressions that should be used to determine the sorting. For expressions that do not specify any particular direction, ascending order is assumed. Later expressions are used to solve ties among tuples with the same expression values in the first couple of expressions.

  • parent_node (Optional[RelNode], optional) – _description_, by default None

Notes

Strictly speaking, this operator is not part of traditional relational algebra. This is because the algebra uses set-semantics which do not supply any ordering. However, due to SQL’s ORDER BY clause, most relational algebra dialects support ordering nevertheless.

However, we do not support special placement of NULL column values, i.e. no ORDER BY R.a NULLS LAST, etc.

property input_node: RelNode#

Get the operator providing the tuples to sort.

Returns:

A relation

Return type:

RelNode

property sorting: Sequence[tuple[SqlExpression, Literal['asc', 'desc']]]#

Get the desired ordering.

Later expressions are used to solve ties among tuples with the same expression values in the first couple of expressions.

Returns:

The expressions to order, most signifcant orders coming first.

Return type:

Sequence[tuple[SqlExpression, SortDirection]]

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, input_node: RelNode | None = None, sorting: Sequence[tuple[SqlExpression, Literal['asc', 'desc']] | SqlExpression] | None = None, as_root: bool = False) Sort#

Creates a new sort with modified attributes.

Parameters:
  • input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.

  • sorting (Optional[Sequence[tuple[SqlExpression, SortDirection] | SqlExpression]], optional) – The new sorting to use. If None, the current sorting is re-used.

  • as_root (bool, optional) – Whether the sort should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified sort node

Return type:

Sort

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.Map(input_node: RelNode, mapping: dict[frozenset[SqlExpression | ColumnReference], frozenset[SqlExpression]], *, parent_node: RelNode | None = None)#

Mapping computes new expressions from the currently existing ones.

For example, the expression R.a + 42 can be computed during a mapping operation based on the input from a relation node R.

Parameters:
  • input_node (RelNode) – The tuples to process

  • mapping (dict[frozenset[SqlExpression | ColumnReference], frozenset[SqlExpression]]) – The expressions to compute. Maps from the arguments to the target expressions. The arguments themselves can be computed during the very same mapping operation. Alternatively, they can be supplied by the input_node.

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

property input_node: RelNode#

Get the operator that provides the tuples to map.

Returns:

A relation

Return type:

RelNode

property mapping: frozendict[frozenset[SqlExpression], frozenset[SqlExpression]]#

Get the expressions to compute. Maps from the arguments to the target expressions.

The arguments themselves can be computed during the very same mapping operation. Alternatively, they can be supplied by the input node.

Returns:

The expressions

Return type:

util.frozendict[frozenset[SqlExpression], frozenset[SqlExpression]]

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

provided_expressions() frozenset[SqlExpression]#

Collects all expressions that are available to parent nodes.

These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.

Returns:

The expressions

Return type:

frozenset[expressions.SqlExpression]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, input_node: RelNode | None = None, mapping: dict[frozenset[SqlExpression | ColumnReference], frozenset[SqlExpression]] | None = None, as_root: bool = False) Map#

Creates a new map with modified attributes.

Parameters:
  • input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.

  • mapping (Optional[dict[frozenset[SqlExpression | ColumnReference], frozenset[SqlExpression]]], optional) – The new mapping to use. If None, the current mapping is re-used.

  • as_root (bool, optional) – Whether the map should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified map node

Return type:

Map

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.DuplicateElimination(input_node: RelNode, *, parent_node: RelNode | None = None)#

Duplicate elimination ensures that all attribute combinations of all tuples are unique.

Parameters:
  • input_node (RelNode) – The tuples that should be unique

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Notes

Strictly speaking, this operator is not part of traditional relational algebra. This is because the algebra uses set-semantics which do not supply any ordering. However, due to SQL’s usage of multi-sets which allow duplicates, most relational algebra dialects support ordering nevertheless.

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, input_node: RelNode | None = None, as_root: bool = False) DuplicateElimination#

Creates a new duplicate elimination with modified attributes.

Parameters:
  • input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.

  • as_root (bool, optional) – Whether the duplicate elimination should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified duplicate elimination node

Return type:

DuplicateElimination

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.SemiJoin(input_node: RelNode, subquery_node: SubqueryScan, predicate: AbstractPredicate | None = None, *, parent_node: RelNode | None = None)#

A semi join provides all tuples from one relation with a matching partner tuple from another relation.

Parameters:
  • input_node (RelNode) – The tuples to “filter”

  • subquery_node (SubqueryScan) – The relation that provides all tuples that have to match tuples in the input_node.

  • predicate (Optional[AbstractPredicate], optional) – An optional predicate that is used to determine a match.

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Notes

A semi join is defined as

\[⋉_\theta(R, S) := \{ r | r \in R \land s \in S \land \theta(r, s) \}\]
property input_node: RelNode#

Get the operator providing the tuples to filter.

Returns:

A relation

Return type:

RelNode

property subquery_node: SubqueryScan#

Get the operator providing the filtering tuples.

Returns:

A relation

Return type:

SubqueryScan

property predicate: AbstractPredicate | None#

Get the match condition to determine the join partners.

If there is no dedicated predicate, tuples from the input_node match, if any tuple is emitted by the subquery_node.

Returns:

The condition

Return type:

Optional[AbstractPredicate]

is_dependent() bool#

Checks, whether the subquery relation is depdent (sometimes also called correlated) with the input relation.

Returns:

Whether the subquery is correlated with the input query

Return type:

bool

See also

SqlQuery.is_depedent

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, input_node: RelNode | None = None, subquery_node: SubqueryScan | None = None, predicate: AbstractPredicate | None = None, as_root: bool = False) SemiJoin#

Creates a new semi join with modified attributes.

Parameters:
  • input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.

  • subquery_node (Optional[SubqueryScan], optional) – The new subquery node to use. If None, the current subquery node is re-used.

  • predicate (Optional[AbstractPredicate], optional) – The new predicate to use. If None, the current predicate is re-used.

  • as_root (bool, optional) – Whether the semi join should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified semi join node

Return type:

SemiJoin

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.AntiJoin(input_node: RelNode, subquery_node: SubqueryScan, predicate: AbstractPredicate | None = None, *, parent_node: RelNode | None = None)#

An anti join provides all tuples from one relation with no matching partner tuple from another relation.

Parameters:
  • input_node (RelNode) – The tuples to “filter”

  • subquery_node (SubqueryScan) – The relation that provides all tuples that have to match tuples in the input_node.

  • predicate (Optional[AbstractPredicate], optional) – An optional predicate that is used to determine a match.

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Notes

An anti join is defined as

\[▷_\theta(R, S) := \{ r | r \in R \land s \in S \land \theta(r, s) \}\]
property input_node: RelNode#

Get the operator providing the tuples to filter.

Returns:

A relation

Return type:

RelNode

property subquery_node: SubqueryScan#

Get the operator providing the filtering tuples.

Returns:

A relation

Return type:

SubqueryScan

property predicate: AbstractPredicate | None#

Get the match condition to determine the join partners.

If there is no dedicated predicate, tuples from the input_node match, if any tuple is emitted by the subquery_node.

Returns:

The condition

Return type:

Optional[AbstractPredicate]

is_dependent() bool#

Checks, whether the subquery relation is depdent (sometimes also called correlated) with the input relation.

Returns:

Whether the subquery is correlated with the input query

Return type:

bool

See also

SqlQuery.is_depedent

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, input_node: RelNode | None = None, subquery_node: SubqueryScan | None = None, predicate: AbstractPredicate | None = None, as_root: bool = False) AntiJoin#

Creates a new anti join with modified attributes.

Parameters:
  • input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.

  • subquery_node (Optional[SubqueryScan], optional) – The new subquery node to use. If None, the current subquery node is re-used.

  • predicate (Optional[AbstractPredicate], optional) – The new predicate to use. If None, the current predicate is re-used.

  • as_root (bool, optional) – Whether the anti join should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified anti join node

Return type:

AntiJoin

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.SubqueryScan(input_node: RelNode, subquery: SqlQuery, *, parent_node: RelNode | None = None)#

A meta node to designate a subtree that originated from a subquery.

Parameters:
  • input_node (RelNode) – The relation that identifies the subquery result

  • subquery (SqlQuery) – The query that actually calculates the subquery

  • parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.

Notes

This node is not part of traditional relational algebra, nor do many other systems make use of it. For our purposes it serves as a marker node to quickly designate subqueries and to operate on the original queries or their algebraic representation in a convenient manner.

property input_node: RelNode#

Get the result node of the subquery

Returns:

A relation

Return type:

RelNode

property subquery: SqlQuery#

Get the actual subquery.

Returns:

A query

Return type:

SqlQuery

tables(*, ignore_subqueries: bool = False) frozenset[TableReference]#

Provides all relations that are contained in the current node.

Consider the following algebraic expression: π(⋈(σ(R), S)). This expression contains two relations: R and S.

Parameters:

ignore_subqueries (bool, optional) – Whether relations that are only referenced in subquery subtrees should be excluded. Off by default.

Returns:

The tables

Return type:

frozenset[TableReference]

children() Sequence[RelNode]#

Provides all input nodes of the current operator.

Returns:

The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.

Return type:

Sequence[RelNode]

provided_expressions() frozenset[SqlExpression]#

Collects all expressions that are available to parent nodes.

These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.

Returns:

The expressions

Return type:

frozenset[expressions.SqlExpression]

accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#

Enables processing of the current algebraic expression by an expression visitor.

Parameters:

visitor (RelNodeVisitor[VisitorResult]) – The visitor

Return type:

VisitorResult

mutate(*, input_node: RelNode | None = None, subquery: SqlQuery | None = None, as_root: bool = False) SubqueryScan#

Creates a new subquery scan with modified attributes.

Parameters:
  • input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.

  • subquery (Optional[SqlQuery], optional) – The new subquery to use. If None, the current subquery is re-used.

  • as_root (bool, optional) – Whether the subquery scan should become the new root node of the tree. This overwrites any value passed to parent.

Returns:

The modified subquery scan node

Return type:

SubqueryScan

See also

RelNode.mutate

for safety considerations and calling conventions

class postbound.relalg.VisitorResult#

Result type of visitor processes.

alias of TypeVar(‘VisitorResult’)

class postbound.relalg.RelNodeVisitor#

Basic visitor to operator on arbitrary relational algebra trees.

See also

RelNode

References

class postbound.relalg.EvaluationPhase(*values)#

Indicates when a specific expression or predicate can be evaluated at the earliest.

BaseTable = 1#

Evaluation is possible using only the tuples from the base table, e.g. base table filters.

Join = 2#

Evaluation is possible while joining the required base tables, e.g. join predicates.

PostJoin = 3#

Evaluation is possible once all base tables have been joined, e.g. mappings over joined columns.

PostAggregation = 4#

Evaluation is possible once all aggregations have been performed, e.g. filters over aggregated columns.

postbound.relalg.parse_relalg(query: ImplicitSqlQuery) RelNode#

Converts an SQL query to a representation in relational algebra.

Parameters:

query (util.ImplicitSqlQuery) – The query to convert

Returns:

The root node of the relational algebra tree. Notice that in some cases the algebraic expression might not be a tree but a directed, acyclic graph instead. However, in this case there still is a single root node.

Return type:

RelNode