Relational Algebra#
Module name: postbound.relalg
relalg provides fundamental building blocks of relational algebra and a converter from SQL to algebra.
The central component of our algebra implementation is the RelNode class. All relational operators inherit from this abstract class. Following the design of the expression and predicate models, all algebraic trees are immutable data structures. Once a tree has been generated, it can no longer be modified.
One important aspect of our relational algebra design is how to model arbitrary expressions and projections, mappings, etc. on
these expressions. Some systems introduce temporary variables for mapping targets, e.g. arg0 <- R.a + 42 and then base all
further accesses on these temporary variables. In this school of thought, an algebra tree for the query
SELECT R.a + 42 FROM R would like this:
This representation is especially usefull for a code-generation or physical optimization scenario because it enables a straightforward creation of additional (temporary) columns. At the same time, it makes the translation of SQL queries to relational algebra more challenging, since re-writes have to be applied during parsing. Since we are not concerned with code-generation in our algebra representation and focus more on structural properties, we take a different approach: all expressions (as defined in the expressions module) are contained as-is in the operators. However, we make sure that necessary pre-processing actions are included as required. For example, if a complex expression is included in a predicate or a projection, we generate the appropriate mapping operation beforehand and use it as an input for the consuming operator.
In addition to the conventional operators of relational algebra, we introduce a couple of additional operators that either mimic features from SQL, or that make working with the algebra much easier from a technical point-of-view. The first category includes operators such as Sort or DuplicateElimination and Limit, whereas the second category includes the SubqueryScan.
Notice that while most algebraic expressions correspond to tree structures, there might be cases where a directed, acyclic graph is generated. This is especially the case when a base relation is used as part of subqueries. Nevertheless, there will always be only one root (sink) node.
- class postbound.relalg.RelNode(parent_node: RelNode | None)#
Models a fundamental operator in relation algebra. All specific operators like selection or theta join inherit from it.
- Parameters:
parent_node (Optional[RelNode]) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
See also
- property node_type: str#
Get the current operator as a string.
- Returns:
The operator name
- Return type:
str
- property parent_node: RelNode | None#
Get the parent node of the current operator, if it exists.
- Returns:
The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None is returned.
- Return type:
Optional[RelNode]
- property sideways_pass: frozenset[RelNode]#
Get all nodes that receive the output of the current operator in addition to the parent node.
- Returns:
The sideways pass nodes
- Return type:
frozenset[RelNode]
- root() RelNode#
Traverses the algebra tree upwards until the root node is found.
- Returns:
The root node of the algebra expression. Can be the current node if it does not have a parent.
- Return type:
- leaf() RelNode | None#
Traverses the algebra tree downwards until a unique leaf node is found.
- Returns:
The leaf node. If multiple leaf nodes exist (e.g. for join nodes), None is returned.
- Return type:
Optional[RelNode]
- abstractmethod children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- tables(*, ignore_subqueries: bool = False) frozenset[TableReference]#
Provides all relations that are contained in the current node.
Consider the following algebraic expression: π(⋈(σ(R), S)). This expression contains two relations: R and S.
- Parameters:
ignore_subqueries (bool, optional) – Whether relations that are only referenced in subquery subtrees should be excluded. Off by default.
- Returns:
The tables
- Return type:
frozenset[TableReference]
- provided_expressions() frozenset[SqlExpression]#
Collects all expressions that are available to parent nodes.
These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.
- Returns:
The expressions
- Return type:
frozenset[expressions.SqlExpression]
- abstractmethod accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- dfs_walk() Generator[RelNode, None, None]#
Performs a depth-first search on the algebraic expression.
This produces the subtree induced by the current node. The current node is also included in the output.
- Yields:
Generator[RelNode, None, None] – All nodes of the subtree induced by the current node.
- Return type:
Generator[RelNode, None, None]
- mutate(*, as_root: bool = False, **kwargs) RelNode#
Creates a new instance of the current operator with modified attributes.
The specific parameters depend on the concrete operator type. Calling mutate() on any node generates a copy of the entire tree, so make sure to use this operation sparingly.
See Notes for important remarks regarding the mutate implementation for custom node types.
- Parameters:
as_root (bool, optional) – Whether the current node should become the new root node of the tree. Defaults to False, which leaves the node at its current position in the tree.
**kwargs – The attributes to modify. The specific attributes depend on the concrete operator type.
- Returns:
The modified node
- Return type:
Notes
Concrete node types should implement their own mutate() function which accepts parameters specific to the node type (in addition to the required as_root parameter). Other than the function prototype, the methods can share the same default implementation:
params = {param: val for param, val in locals().items() if param != "self" and not param.startswith("__")} return super().mutate(**params)
In order for the method to work properly, all field-specific parameters must use the common property/attribute conventions. For example, to update a property input_node of the operator, its value must be stored in a “private” attribute _input_node. At the same time, the mutate() method must accept a parameter input_node (without the leading underscore). The method will then automatically update the internal attribute with the new value.
See the implementation of Selection for an example.
- inspect(*, _indentation: int = 0) str#
Provides a nice hierarchical string representation of the algebraic expression.
The representation typically spans multiple lines and uses indentation to separate parent nodes from their children.
- Parameters:
indentation (int, optional) – Internal parameter to the inspect function. Should not be modified by the user. Denotes how deeply recursed we are in the plan tree. This enables the correct calculation of the current indentation level. Defaults to 0 for the root node.
_indentation (int)
- Returns:
A string representation of the algebraic expression
- Return type:
str
- class postbound.relalg.Selection(input_node: RelNode, predicate: AbstractPredicate, *, parent_node: RelNode | None = None)#
A selection filters the input relation based on an arbitrary predicate.
- Parameters:
input_node (RelNode) – The tuples to filter
predicate (AbstractPredicate) – The predicate that must be satisfied by all output tuples
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Notes
A selection is defined as
\[\sigma_\theta(R) := \{ r \in R | \theta(r) \}\]- property input_node: RelNode#
Get the input relation that should be filtered.
- Returns:
A relation
- Return type:
- property predicate: AbstractPredicate#
Get the predicate that must be satisfied by the output tuples.
- Returns:
The filter condition
- Return type:
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, input_node: RelNode | None = None, predicate: AbstractPredicate | None = None, as_root: bool = False) Selection#
Creates a new selection with modified attributes.
- Parameters:
input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.
predicate (Optional[AbstractPredicate], optional) – The new predicate to use. If None, the current predicate is re-used.
as_root (bool, optional) – Whether the selection should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified selection node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.CrossProduct(left_input: RelNode, right_input: RelNode, *, parent_node: RelNode | None = None)#
A cross product calculates the cartesian product between tuples from two relations.
- Parameters:
left_input (RelNode) – Relation containing the first set of tuples
right_input (RelNode) – Relation containing the second set of tuples
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Notes
A cross product is defined as
\[R \times S := \{ r \circ s | r \in R, s \in S \}\]- property left_input: RelNode#
Get the operator providing the first set of tuples.
- Returns:
A relation
- Return type:
- property right_input: RelNode#
Get the operator providing the second set of tuples.
- Returns:
A relation
- Return type:
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, left_input: RelNode | None = None, right_input: RelNode | None = None, as_root: bool = False) CrossProduct#
Creates a new cross product with modified attributes.
- Parameters:
left_input (Optional[RelNode], optional) – The new left child node to use. If None, the current left input node is re-used.
right_input (Optional[RelNode], optional) – The new right child node to use. If None, the current right input node is re-used.
as_root (bool, optional) – Whether the cross product should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified cross product node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.Union(left_input: RelNode, right_input: RelNode, *, parent_node: RelNode | None = None)#
A union combines the tuple sets of two relations into a single output relation.
In order for a union to work, both relations must have the same structure.
- Parameters:
left_input (RelNode) – The first relation
right_input (RelNode) – The second relation
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Notes
The union is defined as
\[R \cup S := \{ t | t \in R \lor t \in S \}\]- property left_input: RelNode#
Get the operator providing the first relation’s tuples.
- Returns:
A relation
- Return type:
- property right_input: RelNode#
Get the operator providing the second relation’s tuples.
- Returns:
A relation
- Return type:
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, left_input: RelNode | None = None, right_input: RelNode | None = None, as_root: bool = False) Union#
Creates a new union with modified attributes.
- Parameters:
left_input (Optional[RelNode], optional) – The new left child node to use. If None, the current left input node is re-used.
right_input (Optional[RelNode], optional) – The new right child node to use. If None, the current right input node is re-used.
as_root (bool, optional) – Whether the union should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified union node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.Intersection(left_input: RelNode, right_input: RelNode, *, parent_node: RelNode | None = None)#
An intersection provides all tuples that are contained in both of its input operators.
In order for an intersection to work, both relations must have the same structure.
- Parameters:
left_input (RelNode) – The first relation.
right_input (RelNode) – The second relation.
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Notes
The difference is defined as
\[R \cap S := \{ t | t \in R \land t \in S \}\]- property left_input: RelNode#
Get the operator providing the first relation’s tuples.
- Returns:
A relation
- Return type:
- property right_input: RelNode#
Get the operator providing the second relation’s tuples.
- Returns:
A relation
- Return type:
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, left_input: RelNode | None = None, right_input: RelNode | None = None, as_root: bool = False) Intersection#
Creates a new intersection with modified attributes.
- Parameters:
left_input (Optional[RelNode], optional) – The new left child node to use. If None, the current left input node is re-used.
right_input (Optional[RelNode], optional) – The new right child node to use. If None, the current right input node is re-used.
as_root (bool, optional) – Whether the intersection should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified intersection node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.Difference(left_input: RelNode, right_input: RelNode, *, parent_node: RelNode | None = None)#
An intersection returns all tuples from one relation, that are not present in another relation.
In order for the difference to work, both input relations must share the same structure.
- Parameters:
left_input (RelNode) – The first relation. This is the relation to remove tuples from.
right_input (RelNode) – The second relation. This is the relation containing the tuples that should be removed from the left_input.
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Notes
The difference is defined as
\[R \setminus S := \{ r \in R | r \notin S \}\]- property left_input: RelNode#
Get the operator providing the relation to remove tuples from.
- Returns:
A relation
- Return type:
- property right_input: RelNode#
Get the operator providing the tuples to remove.
- Returns:
A relation
- Return type:
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, left_input: RelNode | None = None, right_input: RelNode | None = None, as_root: bool = False) Difference#
Creates a new difference with modified attributes.
- Parameters:
left_input (Optional[RelNode], optional) – The new left child node to use. If None, the current left input node is re-used.
right_input (Optional[RelNode], optional) – The new right child node to use. If None, the current right input node is re-used.
as_root (bool, optional) – Whether the difference should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified difference node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.Relation(table: TableReference, provided_columns: Iterable[ColumnReference | ColumnExpression], *, subquery_input: RelNode | None = None, parent_node: RelNode | None = None)#
A relation provides the tuples (“rows”) contained in a table.
Each relation can correspond to a physical table contained in some relational schema, or it can represent the result of a subquery operation.
- Parameters:
table (TableReference) – The table that is represented by this relation.
provided_columns (Iterable[ColumnReference | ColumnExpression]) – The columns that are contained in the table.
subquery_input (Optional[RelNode], optional) – For subquery relations, this is the algebraic expression that computes the results of the subquery. Relations that correspond to base tables do not have this attribute set.
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
- property table: TableReference#
Get the table that is represented by this relation.
- Returns:
A table. Usually this will correpond to an actual physical database table, but for subqueries this might also be a virtual table.
- Return type:
- property subquery_input: RelNode | None#
Get the root node of the subquery that produces the input tuples for this relation.
- Returns:
The root node if it exists, or None for actual base table relations.
- Return type:
Optional[RelNode]
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- tables(*, ignore_subqueries: bool = False) frozenset[TableReference]#
Provides all relations that are contained in the current node.
Consider the following algebraic expression: π(⋈(σ(R), S)). This expression contains two relations: R and S.
- Parameters:
ignore_subqueries (bool, optional) – Whether relations that are only referenced in subquery subtrees should be excluded. Off by default.
- Returns:
The tables
- Return type:
frozenset[TableReference]
- provided_expressions() frozenset[SqlExpression]#
Collects all expressions that are available to parent nodes.
These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.
- Returns:
The expressions
- Return type:
frozenset[expressions.SqlExpression]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, table: TableReference | None = None, provided_columns: Iterable[ColumnReference | ColumnExpression] | None = None, subquery_input: RelNode | None = None, as_root: bool = False) Relation#
Creates a new relation with modified attributes.
- Parameters:
table (Optional[TableReference], optional) – The new table to use. If None, the current table is re-used.
provided_columns (Optional[Iterable[ColumnReference | ColumnExpression]], optional) – The new columns to use. If None, the current columns are re-used.
subquery_input (Optional[RelNode], optional) – The new subquery input to use. If None, the current subquery input is re-used.
as_root (bool, optional) – Whether the relation should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified relation node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.ThetaJoin(left_input: RelNode, right_input: RelNode, predicate: AbstractPredicate, *, parent_node: RelNode | None = None)#
A theta joins combines individual tuples from two input relations if they match a specific predicate.
- Parameters:
left_input (RelNode) – Relation containing the first set of tuples.
right_input (RelNode) – Relation containing the second set of tuples.
predicate (AbstractPredicate) – A predicate that must be satisfied by all joined tuples.
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Notes
A theta join is defined as
\[\bowtie_\theta(R, S) := \{ r \circ s | r \in R \land s \in S \land \theta(r, s) \}\]- property left_input: RelNode#
Get the operator providing the first set of tuples.
- Returns:
A relation
- Return type:
- property right_input: RelNode#
Get the operator providing the second set of tuples.
- Returns:
A relation
- Return type:
- property predicate: AbstractPredicate#
Get the condition that must be satisfied by the input tuples.
- Returns:
A predicate
- Return type:
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, left_input: RelNode | None = None, right_input: RelNode | None = None, predicate: AbstractPredicate | None = None, as_root: bool = False) ThetaJoin#
Creates a new theta join with modified attributes.
- Parameters:
left_input (Optional[RelNode], optional) – The new left child node to use. If None, the current left input node is re-used.
right_input (Optional[RelNode], optional) – The new right child node to use. If None, the current right input node is re-used.
predicate (Optional[AbstractPredicate], optional) – The new predicate to use. If None, the current predicate is re-used.
as_root (bool, optional) – Whether the theta join should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified theta join node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.Projection(input_node: RelNode, targets: Sequence[SqlExpression], *, parent_node: RelNode | None = None)#
A projection selects individual attributes from the tuples of an input relation.
The output relation will contain exactly the same tuples as the input, but each tuple will potentially contain less attributes.
- Parameters:
input_node (RelNode) – The tuples to process
targets (Sequence[SqlExpression]) – The attributes that should still be contained in the output relation
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
- property input_node: RelNode#
Get the operator providing the tuples to project.
- Returns:
A relation
- Return type:
- property columns: Sequence[SqlExpression]#
Provides the attributes that should be included in the output relation’s tuples.
- Returns:
The projected attributes.
- Return type:
Sequence[SqlExpression]
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, input_node: RelNode | None = None, targets: Sequence[SqlExpression] | None = None, as_root: bool = False) Projection#
Creates a new projection with modified attributes.
- Parameters:
input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.
targets (Optional[Sequence[SqlExpression]], optional) – The new targets to use. If None, the current targets are re-used.
as_root (bool, optional) – Whether the projection should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified projection node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.Grouping(input_node: RelNode, group_columns: Sequence[SqlExpression], *, aggregates: dict[frozenset[SqlExpression], frozenset[FunctionExpression]] | None = None, parent_node: RelNode | None = None)#
Grouping partitions input tuples according to specific attributes and calculates aggregated values.
- Parameters:
input_node (RelNode) – The tuples to process
group_columns (Sequence[SqlExpression]) – The expressions that should be used to partition the input tuples. Can be empty if only aggregations over all input tuples should be computed.
aggregates (Optional[dict[frozenset[SqlExpression], frozenset[FunctionExpression]]], optional) – The aggregates that should be computed. This is a mapping from the input expressions to the desired aggregate. Can be empty if only a grouping should be performed. In this case, the grouping operates as a duplicate-elimination mechanism.
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
- property input_node: RelNode#
Get the operator that provides the tuples to group.
- Returns:
A relation
- Return type:
- property group_columns: Sequence[SqlExpression]#
Get the expressions that should be used to partition the input tuples.
- Returns:
The group columns. Can be empty if only aggregations over all input tuples should be computed.
- Return type:
Sequence[SqlExpression]
- property aggregates: frozendict[SqlExpression, FunctionExpression]#
Get the aggregates that should be computed.
Aggregates map from the input expressions to the desired aggregation function.
- Returns:
The aggregations. Can be empty if only a grouping should be performed.
- Return type:
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- provided_expressions() frozenset[SqlExpression]#
Collects all expressions that are available to parent nodes.
These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.
- Returns:
The expressions
- Return type:
frozenset[expressions.SqlExpression]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, input_node: RelNode | None = None, group_columns: Sequence[SqlExpression] | None = None, aggregates: dict[frozenset[SqlExpression], frozenset[FunctionExpression]] | None = None, parent: RelNode | None = None, as_root: bool = False) Grouping#
Creates a new group by with modified attributes.
- Parameters:
input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.
group_columns (Optional[Sequence[SqlExpression]], optional) – The new group columns to use. If None, the current group columns are re-used.
aggregates (Optional[dict[frozenset[SqlExpression], frozenset[FunctionExpression]]], optional) – The new aggregates to use. If None, the current aggregates are re-used.
as_root (bool, optional) – Whether the group by should become the new root node of the tree. This overwrites any value passed to parent.
parent (RelNode | None)
- Returns:
The modified group by node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.Rename(input_node: RelNode, mapping: dict[ColumnReference, ColumnReference], *, parent_node: RelNode | None = None)#
Rename remaps column names to different names.
- Parameters:
input_node (RelNode) – The tuples to modify
mapping (dict[ColumnReference, ColumnReference]) – A map from current column name to new column name.
parent_node (Optional[RelNode]) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Warning
This node is currently not used since we do not support natural joins.
- property input_node: RelNode#
Get the operator that provides the tuples to modify
- Returns:
A relation
- Return type:
- property mapping: frozendict[ColumnReference, ColumnReference]#
Get the performed renamings.
- Returns:
A map from current column name to new column name.
- Return type:
- provided_expressions() frozenset[SqlExpression]#
Collects all expressions that are available to parent nodes.
These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.
- Returns:
The expressions
- Return type:
frozenset[expressions.SqlExpression]
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, input_node: RelNode | None = None, mapping: dict[ColumnReference, ColumnReference] | None = None, as_root: bool = False) Rename#
Creates a new rename with modified attributes.
- Parameters:
input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.
mapping (Optional[dict[ColumnReference, ColumnReference]], optional) – The new mapping to use. If None, the current mapping is re-used.
as_root (bool, optional) – Whether the rename should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified rename node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- postbound.relalg.SortDirection#
Describes whether tuples should be sorted in ascending or descending order.
alias of
Literal[‘asc’, ‘desc’]
- class postbound.relalg.Sort(input_node: RelNode, sorting: Sequence[tuple[SqlExpression, Literal['asc', 'desc']] | SqlExpression], *, parent_node: RelNode | None = None)#
Sort modifies the order in which tuples are provided.
- Parameters:
input_node (RelNode) – The tuples to order
sorting (Sequence[tuple[SqlExpression, SortDirection] | SqlExpression]) – The expressions that should be used to determine the sorting. For expressions that do not specify any particular direction, ascending order is assumed. Later expressions are used to solve ties among tuples with the same expression values in the first couple of expressions.
parent_node (Optional[RelNode], optional) – _description_, by default None
Notes
Strictly speaking, this operator is not part of traditional relational algebra. This is because the algebra uses set-semantics which do not supply any ordering. However, due to SQL’s ORDER BY clause, most relational algebra dialects support ordering nevertheless.
However, we do not support special placement of NULL column values, i.e. no ORDER BY R.a NULLS LAST, etc.
- property input_node: RelNode#
Get the operator providing the tuples to sort.
- Returns:
A relation
- Return type:
- property sorting: Sequence[tuple[SqlExpression, Literal['asc', 'desc']]]#
Get the desired ordering.
Later expressions are used to solve ties among tuples with the same expression values in the first couple of expressions.
- Returns:
The expressions to order, most signifcant orders coming first.
- Return type:
Sequence[tuple[SqlExpression, SortDirection]]
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, input_node: RelNode | None = None, sorting: Sequence[tuple[SqlExpression, Literal['asc', 'desc']] | SqlExpression] | None = None, as_root: bool = False) Sort#
Creates a new sort with modified attributes.
- Parameters:
input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.
sorting (Optional[Sequence[tuple[SqlExpression, SortDirection] | SqlExpression]], optional) – The new sorting to use. If None, the current sorting is re-used.
as_root (bool, optional) – Whether the sort should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified sort node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.Map(input_node: RelNode, mapping: dict[frozenset[SqlExpression | ColumnReference], frozenset[SqlExpression]], *, parent_node: RelNode | None = None)#
Mapping computes new expressions from the currently existing ones.
For example, the expression R.a + 42 can be computed during a mapping operation based on the input from a relation node R.
- Parameters:
input_node (RelNode) – The tuples to process
mapping (dict[frozenset[SqlExpression | ColumnReference], frozenset[SqlExpression]]) – The expressions to compute. Maps from the arguments to the target expressions. The arguments themselves can be computed during the very same mapping operation. Alternatively, they can be supplied by the input_node.
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
- property input_node: RelNode#
Get the operator that provides the tuples to map.
- Returns:
A relation
- Return type:
- property mapping: frozendict[frozenset[SqlExpression], frozenset[SqlExpression]]#
Get the expressions to compute. Maps from the arguments to the target expressions.
The arguments themselves can be computed during the very same mapping operation. Alternatively, they can be supplied by the input node.
- Returns:
The expressions
- Return type:
util.frozendict[frozenset[SqlExpression], frozenset[SqlExpression]]
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- provided_expressions() frozenset[SqlExpression]#
Collects all expressions that are available to parent nodes.
These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.
- Returns:
The expressions
- Return type:
frozenset[expressions.SqlExpression]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, input_node: RelNode | None = None, mapping: dict[frozenset[SqlExpression | ColumnReference], frozenset[SqlExpression]] | None = None, as_root: bool = False) Map#
Creates a new map with modified attributes.
- Parameters:
input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.
mapping (Optional[dict[frozenset[SqlExpression | ColumnReference], frozenset[SqlExpression]]], optional) – The new mapping to use. If None, the current mapping is re-used.
as_root (bool, optional) – Whether the map should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified map node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.DuplicateElimination(input_node: RelNode, *, parent_node: RelNode | None = None)#
Duplicate elimination ensures that all attribute combinations of all tuples are unique.
- Parameters:
input_node (RelNode) – The tuples that should be unique
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Notes
Strictly speaking, this operator is not part of traditional relational algebra. This is because the algebra uses set-semantics which do not supply any ordering. However, due to SQL’s usage of multi-sets which allow duplicates, most relational algebra dialects support ordering nevertheless.
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, input_node: RelNode | None = None, as_root: bool = False) DuplicateElimination#
Creates a new duplicate elimination with modified attributes.
- Parameters:
input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.
as_root (bool, optional) – Whether the duplicate elimination should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified duplicate elimination node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.SemiJoin(input_node: RelNode, subquery_node: SubqueryScan, predicate: AbstractPredicate | None = None, *, parent_node: RelNode | None = None)#
A semi join provides all tuples from one relation with a matching partner tuple from another relation.
- Parameters:
input_node (RelNode) – The tuples to “filter”
subquery_node (SubqueryScan) – The relation that provides all tuples that have to match tuples in the input_node.
predicate (Optional[AbstractPredicate], optional) – An optional predicate that is used to determine a match.
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Notes
A semi join is defined as
\[⋉_\theta(R, S) := \{ r | r \in R \land s \in S \land \theta(r, s) \}\]- property input_node: RelNode#
Get the operator providing the tuples to filter.
- Returns:
A relation
- Return type:
- property subquery_node: SubqueryScan#
Get the operator providing the filtering tuples.
- Returns:
A relation
- Return type:
- property predicate: AbstractPredicate | None#
Get the match condition to determine the join partners.
If there is no dedicated predicate, tuples from the input_node match, if any tuple is emitted by the subquery_node.
- Returns:
The condition
- Return type:
Optional[AbstractPredicate]
- is_dependent() bool#
Checks, whether the subquery relation is depdent (sometimes also called correlated) with the input relation.
- Returns:
Whether the subquery is correlated with the input query
- Return type:
bool
See also
SqlQuery.is_depedent
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, input_node: RelNode | None = None, subquery_node: SubqueryScan | None = None, predicate: AbstractPredicate | None = None, as_root: bool = False) SemiJoin#
Creates a new semi join with modified attributes.
- Parameters:
input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.
subquery_node (Optional[SubqueryScan], optional) – The new subquery node to use. If None, the current subquery node is re-used.
predicate (Optional[AbstractPredicate], optional) – The new predicate to use. If None, the current predicate is re-used.
as_root (bool, optional) – Whether the semi join should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified semi join node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.AntiJoin(input_node: RelNode, subquery_node: SubqueryScan, predicate: AbstractPredicate | None = None, *, parent_node: RelNode | None = None)#
An anti join provides all tuples from one relation with no matching partner tuple from another relation.
- Parameters:
input_node (RelNode) – The tuples to “filter”
subquery_node (SubqueryScan) – The relation that provides all tuples that have to match tuples in the input_node.
predicate (Optional[AbstractPredicate], optional) – An optional predicate that is used to determine a match.
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Notes
An anti join is defined as
\[▷_\theta(R, S) := \{ r | r \in R \land s \in S \land \theta(r, s) \}\]- property input_node: RelNode#
Get the operator providing the tuples to filter.
- Returns:
A relation
- Return type:
- property subquery_node: SubqueryScan#
Get the operator providing the filtering tuples.
- Returns:
A relation
- Return type:
- property predicate: AbstractPredicate | None#
Get the match condition to determine the join partners.
If there is no dedicated predicate, tuples from the input_node match, if any tuple is emitted by the subquery_node.
- Returns:
The condition
- Return type:
Optional[AbstractPredicate]
- is_dependent() bool#
Checks, whether the subquery relation is depdent (sometimes also called correlated) with the input relation.
- Returns:
Whether the subquery is correlated with the input query
- Return type:
bool
See also
SqlQuery.is_depedent
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, input_node: RelNode | None = None, subquery_node: SubqueryScan | None = None, predicate: AbstractPredicate | None = None, as_root: bool = False) AntiJoin#
Creates a new anti join with modified attributes.
- Parameters:
input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.
subquery_node (Optional[SubqueryScan], optional) – The new subquery node to use. If None, the current subquery node is re-used.
predicate (Optional[AbstractPredicate], optional) – The new predicate to use. If None, the current predicate is re-used.
as_root (bool, optional) – Whether the anti join should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified anti join node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.SubqueryScan(input_node: RelNode, subquery: SqlQuery, *, parent_node: RelNode | None = None)#
A meta node to designate a subtree that originated from a subquery.
- Parameters:
input_node (RelNode) – The relation that identifies the subquery result
subquery (SqlQuery) – The query that actually calculates the subquery
parent_node (Optional[RelNode], optional) – The parent node of the operator, if one exists. The parent is the operator that receives the output relation of the current operator. If the current operator is the root and (currently) does not have a parent, None can be used.
Notes
This node is not part of traditional relational algebra, nor do many other systems make use of it. For our purposes it serves as a marker node to quickly designate subqueries and to operate on the original queries or their algebraic representation in a convenient manner.
- tables(*, ignore_subqueries: bool = False) frozenset[TableReference]#
Provides all relations that are contained in the current node.
Consider the following algebraic expression: π(⋈(σ(R), S)). This expression contains two relations: R and S.
- Parameters:
ignore_subqueries (bool, optional) – Whether relations that are only referenced in subquery subtrees should be excluded. Off by default.
- Returns:
The tables
- Return type:
frozenset[TableReference]
- children() Sequence[RelNode]#
Provides all input nodes of the current operator.
- Returns:
The input nodes. For leave nodes such as table scans, the sequence will be usually empty (except for subquery aliases), otherwise the children are provided from left to right.
- Return type:
Sequence[RelNode]
- provided_expressions() frozenset[SqlExpression]#
Collects all expressions that are available to parent nodes.
These expressions will contain all expressions that are provided by child nodes as well as all expressions that are calculated by the current node itself.
- Returns:
The expressions
- Return type:
frozenset[expressions.SqlExpression]
- accept_visitor(visitor: RelNodeVisitor[VisitorResult]) VisitorResult#
Enables processing of the current algebraic expression by an expression visitor.
- Parameters:
visitor (RelNodeVisitor[VisitorResult]) – The visitor
- Return type:
- mutate(*, input_node: RelNode | None = None, subquery: SqlQuery | None = None, as_root: bool = False) SubqueryScan#
Creates a new subquery scan with modified attributes.
- Parameters:
input_node (Optional[RelNode], optional) – The new input node to use. If None, the current input node is re-used.
subquery (Optional[SqlQuery], optional) – The new subquery to use. If None, the current subquery is re-used.
as_root (bool, optional) – Whether the subquery scan should become the new root node of the tree. This overwrites any value passed to parent.
- Returns:
The modified subquery scan node
- Return type:
See also
RelNode.mutatefor safety considerations and calling conventions
- class postbound.relalg.VisitorResult#
Result type of visitor processes.
alias of TypeVar(‘VisitorResult’)
- class postbound.relalg.RelNodeVisitor#
Basic visitor to operator on arbitrary relational algebra trees.
See also
References
- class postbound.relalg.EvaluationPhase(*values)#
Indicates when a specific expression or predicate can be evaluated at the earliest.
- BaseTable = 1#
Evaluation is possible using only the tuples from the base table, e.g. base table filters.
- Join = 2#
Evaluation is possible while joining the required base tables, e.g. join predicates.
- PostJoin = 3#
Evaluation is possible once all base tables have been joined, e.g. mappings over joined columns.
- PostAggregation = 4#
Evaluation is possible once all aggregations have been performed, e.g. filters over aggregated columns.
- postbound.relalg.parse_relalg(query: ImplicitSqlQuery) RelNode#
Converts an SQL query to a representation in relational algebra.
- Parameters:
query (util.ImplicitSqlQuery) – The query to convert
- Returns:
The root node of the relational algebra tree. Notice that in some cases the algebraic expression might not be a tree but a directed, acyclic graph instead. However, in this case there still is a single root node.
- Return type: