Workloads Package#

Provides abstractions to represent entire query workloads and utilities to read some pre-defined instances.

The main abstraction provided by this class is the Workload. A number of utility functions to read collections of queries from different sources and input formats into workload objects exist as well. The pre-defined workloads include the Join Order Benchmark [1] (including JOB-light [2] and JOB-Complex [3]), Star Schema Benchmark [4], Stack Benchmark [5] and Stats Benchmark [6].

PostBOUND stores the workload queries in a dedicated directory, located relative to the user’s home directory at $HOME/.postbound/. If a workload is requested for the first time, it will be downloaded automatically. Therefore, the initial usage of a novel workload may take slightly longer than usual.

References

class postbound.workloads.LabelType#

The labels that are used to identify individual queries in a workload.

alias of TypeVar(‘LabelType’, bound=Hashable)

class postbound.workloads.NewLabelType#

In case of mutations of the workload labels, this denotes the new type of the labels after the mutation.

alias of TypeVar(‘NewLabelType’, bound=Hashable)

postbound.workloads.wrap_workload(queries: Iterable[SqlQuery], name: str = '', root: Path | str | None = None) → Workload[int]#

Wraps a number of queries in a workload with numerical labels.

Parameters:

queries (Iterable[SqlQuery])
name (str)
root (Path | str | None)

Return type:

Workload[int]

postbound.workloads.read_workload(path: str | Path, name: str = '', *, query_file_pattern: str = '*.sql', recurse_subdirectories: bool = False, query_label_prefix: str = '', file_encoding: str = 'utf-8', bind_columns: bool = True, include_hints: bool = True, on_error: Literal['raise', 'warn', 'ignore'] = 'raise', verbose: bool = False) → Workload[str]#

Loads a workload consisting of multiple different files, potentially scattered in multiple directories

The main advantage of this method over using Workload.read directly is the support for recursive directory layouts: it can traverse subdirectories relative to the specified root and collect all workload files in a recursive manner. If subdirectories are used, their names will be used as prefixes to the query label, which is still inferred from the query file name.

Parameters:

path (str) – The root directory containing the workload files. Each query is expected to be stored in its own file.
name (str, optional) – The name of the workload, by default “”
query_file_pattern (str, optional) – A glob pattern that all query files have to match. All files that match the pattern are assumed to be valid query files. Defaults to "*.sql"
recurse_subdirectories (bool, optional) – Whether query files in subdirectories should be read as well. Defaults to False, which emulates the behaviour of Workload.read
query_label_prefix (str, optional) – A global prefix that should be added to all labels, no matter their placement in subdirectories. Defaults to an empty string.
file_encoding (str, optional) – The encoding of the query files. All files must share a common encoding. Defaults to UTF-8
include_hints (bool, optional) – Whether the parser should try to infer and extract hint blocks from the queries
on_error (Literal["raise", "warn", "ignore"], optional) – How to react to parser errors. By default, the exception is propagated to the parent process.
verbose (bool, optional) – Whether progress information should be printed.
bind_columns (bool)

Returns:

The workload

Return type:

Workload[str]

postbound.workloads.read_batch_workload(filename: str, name: str = '', *, file_encoding: str = 'utf-8') → Workload[int]#

Loads a workload consisting of multiple queries from a single file.

The input file has to contain one valid SQL query per line. While empty lines are skipped, any non-SQL line will raise an Error.

The workload will have numeric labels: the query in the first line will have label 1, the second one label 2 and so on.

Parameters:

filename (str) – The file to load. The extension does not matter, as long as it contains plain text and each query is placed on a single and separate line.
name (str, optional) – The name of the workload. If omitted, this defaults to the file name.
file_encoding (str, optional) – The encoding of the workload file. Defaults to UTF-8

Returns:

The workload

Return type:

Workload[int]

postbound.workloads.read_csv_workload(filename: str | Path, name: str = '', *, query_column: str = 'query', label_column: str | None = None, file_encoding: str = 'utf-8', pd_args: dict | None = None) → Workload[str] | Workload[int]#

Loads a workload consisting of queries from a CSV column.

All queries are expected to be contained in the same column and each query is expected to be put onto its own row.

The column containing the actual queries can be configured via the query_column parameter. Likewise, the CSV file can already provide query labels in the label_column column. If this parameter is omitted, labels will be inferred based on the row number.

Parameters:

filename (str | Path) – The name of the CSV file to read. The extension does not matter, as long as the file can be read by the pandas CSV parser. The parser can receive additional arguments via the pd_args parameter.
name (str, optional) – The name of the workload. If omitted, this defaults to the file name.
query_column (str, optional) – The CSV column that contains the workload queries. All rows of that column will be read, by default “query”
label_column (Optional[str], optional) – The column containing the query labels. Each will receive a label from the label_column of the same row. If omitted, labels will be inferred based on the row number.
file_encoding (str, optional) – The encoding of the CSV file. Defaults to UTF-8.
pd_args (Optional[dict]) – Additional arguments to customize the behaviour of the pandas.read_csv method. They will be forwarded as-is. Consult the documentation of this method for more details on the allowed parameters and their functionality.

Returns:

The workload. It has string labels if label_column was provided, or numerical labels otherwise.

Return type:

Workload[str] | Workload[int]

Workloads Package

Contents

Workloads Package#