Utilities

Utilities#

Global Utilities#

Contains utilities that are not specific to PostBOUND’s domain of databases and query optimization.

class postbound.util.DependencyGraph#

A simple dependency graph abstraction. Entries are added via add_task and iteration yields the source nodes first.

add_task(node: T, *, depends_on: Iterable[T] | None = None) → None#

Queues a new task/entry/whatever.

Parameters:

node (T) – The new task
depends_on (Optional[Iterable[T]], optional) – Optional other tasks that have to be completed before this one. If those task have not been added yet, this will be done automatically.

Return type:

None

exception postbound.util.InvariantViolationError(*args, **kwargs)#

Indicates that some contract of a method was violated. The arguments should provide further details.

Return type:: None

exception postbound.util.LogicError(*args, **kwargs)#

Generic error to indicate that any kind of algorithmic problem occurred.

This error is generally used when some assumption within PostBOUND is violated, but it’s (probably) not the user’s fault. As a rule of thumb, if the user supplies faulty input, a ValueError should be raised instead. Therefore, encoutering a LogicError indicates a bug in PostBOUND itself.

Return type:: None

exception postbound.util.StateError(*args, **kwargs)#

Indicates that an object is not in the right state to perform an operation.

Return type:: None

class postbound.util.Version(ver: str | int | list[str] | list[int])#

Version instances represent versioning information and ensure that comparison operations work as expected.

For example, Version instances can be created for strings such as “14.6” or “1.3.1” and ensure that 14.6 > 1.3.1

Parameters:: ver (str | int | list[str] | list[int])

property components: Iterable[int]#: Get all version components as a list of integers.

static is_valid(ver: str | int | list[str] | list[int]) → bool#

Checks whether the provided version representation is valid.

Parameters:: ver (str | int | list[str] | list[int])
Return type:: bool

property major: int#: Get the major version component.

property minor: int | None#: Get the minor version component, if available.

property patch: int | None#: Get the patch version component, if available.

Tries to wrap the provided version representation into a Version instance. Returns None if not possible.

Parameters:: ver (Version | str | int | list[str] | list[int])
Return type:: Version | None

postbound.util.argmax(mapping: dict[K, Number]) → K#

For a dict mapping keys to numeric types, returns the key k with maximum value v, s.t. for all keys k’ with values v’ it holds that v >= v’.

Parameters:: mapping (dict[K, Number])
Return type:: K

postbound.util.argmin(mapping: dict[K, Number]) → K#

For a dict mapping keys to numeric types, returns the key k with minimum value v, s.t. for all keys k’ with values v’ it holds that v <= v’.

Parameters:: mapping (dict[K, Number])
Return type:: K

postbound.util.enlist(obj, *, enlist_tuples: bool = False)#

Transforms any object into a singular list of that object, if it is not a container already.

Specifically, the following types are treated as container-like and will not be transformed: lists, tuples, sets and frozensets. The treatment of tuples can be configured via parameters. All other arguments will be wrapped in a list.

For example, "abc" is turned into ["abc"], whereas ["abc"] is returned unmodified.

Parameters:

obj (T | Iterable[T]) – The object or list to wrap
enlist_tuples (bool, optional) – Whether a tuple obj should be enlisted. This is False by default

Returns:

The object, wrapped into a list if necessary

Return type:

Iterable[T]

postbound.util.flatten(xs)#

Transforms a nested list into a flat list: [[1, 2], [3]] is turned into [1, 2, 3]

Scalar elements (including strings) are preserved as-is. Elements of containers are extracted and added to the resulting list. Nested containers are treated as scalar elements and are not flattened recursively.

class postbound.util.frozendict(items=None)#

Read-only variant of a normal Python dictionary.

Once the dictionary has been created, its key/value pairs can no longer be modified. At the same time, this allows the dictionary to be hashable by default.

Parameters:: items (any, optional) – Supports the same argument types as the normal dictionary. If no items are supplied, an empty frozen dictionary is returned.

postbound.util.hash_dict(dictionary: dict[K, V]) → int#

Calculates a hash value based on the current dict contents (keys and values).

Parameters:: dictionary (dict[K, V])
Return type:: int

postbound.util.jaccard(a: set | frozenset, b: set | frozenset) → float#

Jaccard coefficient between a and b. Defined as |a ∩ b| / |a ∪ b|

Parameters:

a (set | frozenset)
b (set | frozenset)

Return type:

float

postbound.util.make_logger(enabled: bool = True, *, file: ~typing.IO[str] = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>, pretty: bool = False, prefix: str | ~collections.abc.Callable[[], str] = '') → Callable[[...], None]#

Creates a new logging utility.

The generated method can be used like a regular print, but with defaults that are better suited for logging purposes.

If enabled is False, calling the logging function will not actually print anything and simply return. This is especially useful to implement logging-hooks in longer functions without permanently re-checking whether logging is enabled or not.

By default, all logging output will be written to stderr, but this can be customized by supplying a different file.

If pretty is enabled, structured objects such as dictionaries will be pretty-printed instead of being written on a single line. Note that pprint is used for all of the logging data everytime in that case.

Parameters:

enabled (bool, optional) – Whether logging is enabled, by default True
file (IO[str], optional) – Destination to write the log entries to, by default sys.stderr
pretty (bool, optional) – Whether complex objects should be pretty-printed using the pprint module, by default False
prefix (str | Callable[[], str], optional) – A common prefix that should be added before each log entry. Can be either a hard-coded string, or a callable that dynamically produces a string for each logging action separately (e.g. timestamp).

Returns:

_description_

Return type:

Callable

postbound.util.open_files(pid: int | None = None) → list[str]#

Provides all files (e.g. text files and shared objects) opened by the given process/PID.

Parameters:: pid (Optional[int], optional) – The PID of the process to query. Defaults to the current process.
Returns:: All opened files
Return type:: list[str]

postbound.util.powerset(lst: Collection[T]) → Iterable[tuple[T, ...]]#

Calculates the powerset of the provided iterable.

The powerset of a set S is defined as the set that contains all subsets of S. This is includes the empty set, as well as the entire set S.

Parameters:: lst (Collection[T]) – The “set” S
Returns:: The powerset of S. Each tuple correponds to a specific subset. The order of the elements within the tuple is not significant.
Return type:: Iterable[tuple[T, …]]

postbound.util.read_df(path: Path | str, **kwargs) → DataFrame#

Reads a Pandas DataFrame from a file, inferring the file format from the file extension.

All additional arguments are passed directly to the respective Pandas read function.

For SQLite databases, you must specify the table name using the ‘table’ or ‘relation’ argument, e.g. read_df("data.db", table="my_table").

Parameters:: path (Path | str)
Return type:: DataFrame

postbound.util.run_cmd(cmd: str | Iterable[Any], *args, work_dir: str | Path | None = None, **kwargs) → ProcResult#

Executes an arbitrary external command.

The command can be executed in an different working directory. After execution the working directory is restored.

This function delegates to subprocess.run. Therefore, most arguments accepted by this function follow the same rules as the run function.

Parameters:

cmd (str | Iterable[Any]) – The program to execute. Can be either a single invocation, or a list of the program name and its arguments.
work_dir (Optional[str | pathlib.Path], optional) – The working directory where the process should be executed. If None, the current working directory is used. Otherwise, the current working directory is changed to the desired directory for the duration of the process execution and restored afterwards.
*args – Additional arguments to be passed to the command.
**kwargs – Additional arguments to customize the subprocess invocation.

Returns:

The result of the process execution. If the command can be executed but fails, the exit_code will be non-zero. On the other hand, if the command cannot be executed at all (e.g. because it is not found or the user does not have the required permissions), an error is raised.

Return type:

ProcResult

postbound.util.set_union(sets: Iterable[Iterable[T]]) → set[T]#

Computes the union of many sets.

Parameters:: sets (Iterable[set[T] | frozenset[T]]) – The sets to combine.
Returns:: Large union of all provided sets.
Return type:: set[T]

postbound.util.simplify(obj)#

Unwraps containers containing just a single element.

This can be thought of as the inverse operation to enlist. If the object contains multiple elements, nothing happens.

Parameters:: obj (Iterable[T] | Mapping[K, V]) – The object to simplify
Returns:: For a singular list, the object that was contained in that list. Otherwise obj is returned unmodified. Since this method is mainly intended for lists which are known to contain exactly one element, we use T as a return type to assist the type checker. We use the same logic for dictionaries, however here we return the single key/value pair.
Return type:: T

Examples

The singular list [1] is simplified to 1. On the other hand, [1,2] is returned unmodified.

postbound.util.sliding_window(lst: Sequence[T], size: int, step: int = 1) → Generator[Sequence[T], None, None]#

Iterates over the given sequence using a sliding window.

The window will contain exactly size many entries, starting at the beginning of the sequence. After yielding a window, the next window will be shifted step many elements.

Parameters:

lst (Sequence[T]) – The sequence to iterate over
size (int) – The number of elements in the sliding window
step (int, optional) – The number of elements to shift after each window, defaults to 1.

Yields:

Generator[Sequence[T], None, None] – The current window.

Return type:

Generator[Sequence[T], None, None]

postbound.util.standard_logger(enabled: bool = True) → Callable[[...], None]#

A standardized logger for consistent output throughout the codebase.

Parameters:: enabled (bool)
Return type:: Callable[[…], None]

postbound.util.timestamp() → str#

Provides the current time as a nice and normalized string.

Return type:: str

postbound.util.to_json(obj: Any, *args, **kwargs) → str | None#

Utility to transform any object to a JSON object, while making use of the JsonizeEncoder.

All arguments other than the object itself are passed to the default Python json.dumps function.

Parameters:: obj (Any)
Return type:: str | None

postbound.util.to_json_dump(obj: Any, file: IO, *args, **kwargs) → None#

Utility to transform any object to a JSON object and write it to a file, while making use of the JsonizeEncoder.

All arguments other than the object itself are passed to the default Python json.dump function.

Parameters:

obj (Any)
file (IO)

Return type:

None

postbound.util.write_df(df: DataFrame | Sequence[dict] | dict, path: Path | str, *, index: bool = False, append: bool = False, **kwargs) → None#

Writes a Pandas DataFrame to a file, inferring the file format from the file extension.

In addition to the automatic dispatch based on the file type, this function performs the following preprocessing steps:

It ensures that the parent directory of the target file exists, creating it if necessary.
It transforms all complex objects in the data frame into their JSON representation

All additional arguments are passed directly to the respective Pandas write function.

Please note that append-export is currently only supported for CSV files. We are working on adding support for other file formats in the future.

Parameters:

df (DataFrame | Sequence[dict] | dict)
path (Path | str)
index (bool)
append (bool)

Return type:

None

Collections#

Provides utilities to work with arbitrary collections like lists, sets and tuples.

postbound.util.collections.ContainerType#

Specifies which types are considered containers.

For some methods this is necessary to determine whether any work still has to be done.

alias of list[T] | tuple[T, …] | set[T] | frozenset[T]

postbound.util.collections.flatten(xs: Iterable[Iterable[T]]) → list[T]#

postbound.util.collections.flatten(xs: Iterable[Iterable[T] | T]) → list[T]

Transforms a nested list into a flat list: [[1, 2], [3]] is turned into [1, 2, 3]

postbound.util.collections.enlist(obj: list[T]) → list[T]#

postbound.util.collections.enlist(obj: tuple[T, ...], *, enlist_tuples: Literal[True]) → list[tuple[T, ...]]

postbound.util.collections.enlist(obj: tuple[T, ...], *, enlist_tuples: Literal[False]) → tuple[T, ...]

postbound.util.collections.enlist(obj: tuple[T, ...]) → tuple[T, ...]

postbound.util.collections.enlist(obj: set[T]) → set[T]

postbound.util.collections.enlist(obj: frozenset[T]) → frozenset[T]

postbound.util.collections.enlist(obj: KeysView[T] | ValuesView[T]) → list[T]

postbound.util.collections.enlist(obj: ItemsView[K, T]) → list[tuple[K, T]]

postbound.util.collections.enlist(obj: Sequence[T]) → Sequence[T]

postbound.util.collections.enlist(obj: Sequence[T] | T) → Sequence[T]

postbound.util.collections.enlist(obj: str) → list[str]

postbound.util.collections.enlist(obj: Iterable[T]) → Iterable[T]

postbound.util.collections.enlist(obj: Iterable[T] | T) → Iterable[T]

postbound.util.collections.enlist(obj: T) → list[T]

Transforms any object into a singular list of that object, if it is not a container already.

For example, "abc" is turned into ["abc"], whereas ["abc"] is returned unmodified.

Parameters:

obj (T | Iterable[T]) – The object or list to wrap
enlist_tuples (bool, optional) – Whether a tuple obj should be enlisted. This is False by default

Returns:

The object, wrapped into a list if necessary

Return type:

Iterable[T]

postbound.util.collections.get_any(elems: Iterable[T]) → T#

Provides any element from an iterable. There is no guarantee which one will be returned.

This method can potentially iterate over the entire iterable. The behaviour for empty iterables is undefined.

Parameters:: elems (Iterable[T]) – The items from which to choose.
Returns:: Any of the elements from the iterable. If the iterable is empty, the behaviour is undefined.
Return type:: T

postbound.util.collections.simplify(obj: Mapping[K, V]) → tuple[K, V]#

postbound.util.collections.simplify(obj: Iterable[T]) → T

postbound.util.collections.simplify(obj: T) → T

Unwraps containers containing just a single element.

This can be thought of as the inverse operation to enlist. If the object contains multiple elements, nothing happens.

Parameters:: obj (Iterable[T] | Mapping[K, V]) – The object to simplify
Returns:: For a singular list, the object that was contained in that list. Otherwise obj is returned unmodified. Since this method is mainly intended for lists which are known to contain exactly one element, we use T as a return type to assist the type checker. We use the same logic for dictionaries, however here we return the single key/value pair.
Return type:: T

Examples

The singular list [1] is simplified to 1. On the other hand, [1,2] is returned unmodified.

postbound.util.collections.foreach(lst: Iterable[T], action: Callable[[T], None]) → None#

Shortcut to apply a specific action to each element in an iterable.

Parameters:

lst (Iterable[T]) – The elements.
action (Callable[[T], None]) – The side-effect that should be applied to all elements.

Return type:

None

postbound.util.collections.powerset(lst: Collection[T]) → Iterable[tuple[T, ...]]#

Calculates the powerset of the provided iterable.

The powerset of a set S is defined as the set that contains all subsets of S. This is includes the empty set, as well as the entire set S.

Parameters:: lst (Collection[T]) – The “set” S
Returns:: The powerset of S. Each tuple correponds to a specific subset. The order of the elements within the tuple is not significant.
Return type:: Iterable[tuple[T, …]]

postbound.util.collections.sliding_window(lst: Sequence[T], size: int, step: int = 1) → Generator[Sequence[T], None, None]#

Iterates over the given sequence using a sliding window.

The window will contain exactly size many entries, starting at the beginning of the sequence. After yielding a window, the next window will be shifted step many elements.

Parameters:

lst (Sequence[T]) – The sequence to iterate over
size (int) – The number of elements in the sliding window
step (int, optional) – The number of elements to shift after each window, defaults to 1.

Yields:

Generator[Sequence[T], None, None] – The current window.

Return type:

Generator[Sequence[T], None, None]

postbound.util.collections.pairs(lst: Iterable[T]) → Generator[tuple[T, T], None, None]#

Provides all pairs of elements of the given iterable, disregarding order and identical pairs.

This means that the resulting iterable will not contain entries (a, a) unless a itself is present multiple times in the input. Likewise, tuples (a, b) and (b, a) are treated as equal and only one of them will be returned (Again, unless a or b are present multiple times in the input. In that case, their order is unspecified.)

Parameters:: lst (Iterable[T]) – The iterable that contains the pairs. It must be possible to iterate over it multiple times (twice, to be exact).
Yields:: Generator[tuple[T, T], None, None] – The element pairs.
Return type:: Generator[tuple[T, T], None, None]

postbound.util.collections.set_union(sets: Iterable[Iterable[T]]) → set[T]#

Computes the union of many sets.

Parameters:: sets (Iterable[set[T] | frozenset[T]]) – The sets to combine.
Returns:: Large union of all provided sets.
Return type:: set[T]

postbound.util.collections.make_hashable(obj: Any) → Any#

Attempts to generate an equivalent, hashable representation for a container.

This function operates on the standard container types list, tuple, set, dictionary and frozenset and performs the following conversion:

list becomes tuple, all elements of the list are recursively made hashable
tuples are left as-is, but all elements of the tuple are recursively made hashable
sets become frozensets. The elements are left as they are, because they must already be hashable
dictionaries become instances of dict_utils.HashableDict. The values are recursively made hashable, keys are left the way they are because they must already be hashable
frozensets are left as-is

All other types, including user-defined types are returned as-is.

Parameters:: obj (Any) – The object to hash
Returns:: The hashable counterpart of the object
Return type:: Any

class postbound.util.collections.Queue(data: Iterable[T] | None = None)#

A queue is a wrapper around an underlying list of elements which provides FIFO semantics for access.

Parameters:: data (Iterable[T] | None, optional) – Initial contents of the queue. By default the queue is empty at the beginning.

enqueue(value: T) → None#

Adds a new item to the end of the queue.

Parameters:: value (T) – The item to add
Return type:: None

push(value: T) → None#

Adds a new item to the end of the queue.

This is an alias for enqueue.

Parameters:: value (T) – The item to add
Return type:: None

append(value: T) → None#

Adds a new item to end of the queue.

This method is an alias for enqueue to enable easier interchangeability with normal lists.

Parameters:: value (T) – The item to add
Return type:: None

extend(values: Iterable[T]) → None#

Adds a number of values to the end of the queue.

Parameters:: values (Iterable[T]) – The elements to add. The order in the queue matches the order in the iterable.
Return type:: None

head() → T | None#

Provides the current first element of the queue without removing.

Returns:: The first element if it exists, or None if the queue is empty.
Return type:: Optional[T]

peak() → T | None#

Provides the current first element of the queue without removing.

This is an alias for head.

Returns:: The first element if it exists, or None if the queue is empty.
Return type:: Optional[T]

pop() → T | None#

Provides the current first element of the queue and removes it.

Returns:: The first element if it exists, or None if the queue is empty.
Return type:: Optional[T]

class postbound.util.collections.SizedQueue(capacity: int, data: Iterable[T] | None = None)#

A sized queue extends on the behaviour of a normal queue by restricting the number of items in the queue.

A sized queue has weak FIFO semantics: items can only be appended at the end, but the contents of the entire queue can be accessed at any time.

If upon enqueuing a new item the queue is already at maximum capacity, the current head of the queue will be dropped.

Parameters:

capacity (int) – The maximum number of items the queue can contain at the same time.
data (Optional[Iterable[T]], optional) – Initial contents of the queue. By default the queue is empty at the beginning.

Notes

Although Queue and SizedQueue provide similar FIFO semantics, there is no subclass relationship between the two. This is by design, since the contract of a queue is very different from the contract of a sized queue.

append(value: T) → None#

Adds a new item to the end of the queue, popping any excess items.

Parameters:: value (T) – The value to add
Return type:: None

extend(values: Iterable[T]) → None#

Adds all the items to the end of the queue, popping any excess items.

Parameters:: values (Iterable[T]) – The values to add
Return type:: None

head() → T | None#

Provides the current first item of the queue without removing it.

Returns:: The first item in the queue, or None if the queue is empty
Return type:: Optional[T]

pop() → T | None#

Provides the current first item of the queue and removes it.

Returns:: The first item in the queue, or None if the queue is empty
Return type:: Optional[T]

Dicts#

Contains utilities to access and modify dictionaries more conveniently.

postbound.util.dicts.stringify(d: dict[K, V]) → str#

Generates a string-representation of a dictionary.

In contrast to calling str() directly, this method generates proper string representations of both keys and values and does not use repr() for them. Nested objects are stilled formatted according to str() however.

Parameters:: d (dict[K, V]) – The dictionary to stringify
Returns:: The string representation
Return type:: str

postbound.util.dicts.key(dictionary: dict[K, V]) → K#

Provides the key of a dictionary with just 1 item.

key({‘a’: 1}) = ‘a’

Parameters:: dictionary (dict[K, V])
Return type:: K

postbound.util.dicts.value(dictionary: dict[K, V]) → V#

Provides the value of a dictionary with just 1 item.

value({‘a’: 1}) = 1

Parameters:: dictionary (dict[K, V])
Return type:: V

postbound.util.dicts.difference(a: dict[K, V], b: dict[K, V]) → dict[K, V]#

Computes the set difference between two dictionaries based on their keys.

Parameters:

a (dict[K, V]) – The dict to remove entries from
b (dict[K, V]) – The entries to remove

Returns:

A dictionary that contains all key, value pairs from a where the key is not in b.

Return type:

dict[K, V]

postbound.util.dicts.intersection(a: dict[K, V], b: dict[K, V]) → dict[K, V]#

Computes the set intersection between two dictionaries based on their keys.

Parameters:

a (dict[K, V]) – The first dictionary
b (dict[K, V]) – The second dictionary

Returns:

A dictionary that contains all key, value pairs from a where the key is also in b.

Return type:

dict[K, V]

postbound.util.dicts.merge(a: dict[K, V], b: dict[K, V], *, updater: Callable[[K, V, V], V] | None = None) → dict[K, V]#

Creates a new dict containing all key/values pairs from both argument dictionaries.

If keys overlap, entries from dictionary b will take priority, unless an update method is given. If update is given, and a[k] = v and b[k] = v’ (i.e. both a and b share a key k) the merged dictionary will contain the result of update(k, v, v’) as entry for k.

Note that as of Python 3.9, such a method was added to dictionaries as well (via the |= syntax). Our current implementation is not optimized for larger dictionaries and will probably have a pretty bad performance on such input data.

Parameters:

a (dict[K, V])
b (dict[K, V])
updater (Callable[[K, V, V], V] | None)

Return type:

dict[K, V]

postbound.util.dicts.update(dictionary: dict[K, V], updater: Callable[[K, V], T]) → dict[K, T]#

Creates a new dict by calling update on each key/value pair on the old dict, retaining its keys.

Parameters:

dictionary (dict[K, V])
updater (Callable[[K, V], T])

Return type:

dict[K, T]

postbound.util.dicts.explode(dictionary: dict[K, list[V]]) → list[tuple[K, V]]#

Transforms dicts mapping keys to lists of values to a list of key/value pairs.

Parameters:: dictionary (dict[K, list[V]])
Return type:: list[tuple[K, V]]

postbound.util.dicts.hash_dict(dictionary: dict[K, V]) → int#

Calculates a hash value based on the current dict contents (keys and values).

Parameters:: dictionary (dict[K, V])
Return type:: int

postbound.util.dicts.generate_multi(entries: Iterable[tuple[K, V]]) → dict[K, list[V]]#

Generates a multi-dict based on its entries.

Each key can occur multiple times and values will be aggregated in a list.

Parameters:: entries (Iterable[tuple[K, V]])
Return type:: dict[K, list[V]]

postbound.util.dicts.reduce_multi(multi_dict: dict[K, list[V]], reduction: Callable[[K, list[V]], V]) → dict[K, V]#

Ungroups a multi-dict by aggregating the values based on key and values.

Parameters:

multi_dict (dict[K, list[V]])
reduction (Callable[[K, list[V]], V])

Return type:

dict[K, V]

postbound.util.dicts.invert_multi(mapping: dict[K, list[V]]) → dict[V, list[K]]#

Inverts the key -> values mapping of a dict to become value -> keys instead.

Supppose a multi-dict has the following contents: {‘a’: [1, 2], ‘b’: [2, 3]}. Calling invert transforms this mapping to {1: [‘a’], 2: [‘a’, ‘b’], 3: [‘b’]}.

Parameters:: mapping (dict[K, list[V]])
Return type:: dict[V, list[K]]

postbound.util.dicts.invert(mapping: dict[K, V]) → dict[V, K]#

Inverts the key -> value mapping of a dict to become value -> key instead.

In contrast to invert_multi this does not handle duplicate values (which leads to duplicate keys), nor does it process the original values of mapping in any way.

Basically, this function is just a better-readable shortcut for {v: k for k, v in d.items()}.

Parameters:: mapping (dict[K, V])
Return type:: dict[V, K]

postbound.util.dicts.argmin(mapping: dict[K, Number]) → K#

For a dict mapping keys to numeric types, returns the key k with minimum value v, s.t. for all keys k’ with values v’ it holds that v <= v’.

Parameters:: mapping (dict[K, Number])
Return type:: K

postbound.util.dicts.argmax(mapping: dict[K, Number]) → K#

For a dict mapping keys to numeric types, returns the key k with maximum value v, s.t. for all keys k’ with values v’ it holds that v >= v’.

Parameters:: mapping (dict[K, Number])
Return type:: K

class postbound.util.dicts.HashableDict(dict=None, /, **kwargs)#: A dictionary implementation that can be hashed.

Warning

This type should be used with extreme caution in order to not violate any invariants due to unintended data modification.

class postbound.util.dicts.CustomHashDict(hash_func: Callable[[K], int], **kwargs)#

Wrapper of a normal Python dictionary that uses a custom hash function instead of the default hash() method.

All non-hashing related behavior is directly inherited from the default Python dictionary. Only the item access is changed to enforce the usage of the new hashing function.

Notice that since the custom hash function always provides an integer value, collision detection is weaker than originally. This is because the actual dictionary never sees the original keys to run an equality comparison. Instead, the comparison is based on the integer values.

Parameters:

hash_func (Callable[[K], int]) – The hashing function to use. It receives the key as input and must produce a valid hash value as output.
**kwargs (dict, optional) – Additional keyword arguments that should be passed to the hashing function upon each invocation.

class postbound.util.dicts.DynamicDefaultDict(factory: Callable[[K], V])#

Wrapper of a normal Python defaultdict that permits dynamic default values.

When using a standard Python defaultdict, the default value must be decided up-front. This value is used for all missing keys. In contrast, this dictionary implementation allows for the default value to be calculated based on the requested key.

Parameters:: factory (Callable[[K], V]) – A function that generates the value based on a missing key. It receives the key as input and must return the value.

class postbound.util.dicts.frozendict(items=None)#

Read-only variant of a normal Python dictionary.

Once the dictionary has been created, its key/value pairs can no longer be modified. At the same time, this allows the dictionary to be hashable by default.

Parameters:: items (any, optional) – Supports the same argument types as the normal dictionary. If no items are supplied, an empty frozen dictionary is returned.

JSON#

Contains utilities to store and load objects more conveniently to/from JSON.

More specifically, this module introduces the JsonizeEncoder, which can be accessed via the to_json utility method. This encoder allows to transform instances of any class to JSON by providing a __json__ method in the class implementation. This method does not take any (required) parameters and returns a JSON-izeable representation of the current instance, e.g. a dict or a list.

Sadly (or luckily?), the inverse conversion does not work because JSON does not store any type information.

class postbound.util.jsonize.Jsonizable(*args, **kwargs)#: Protocol to indicate that a certain class provides the __json__ method.

class postbound.util.jsonize.JsonizeEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)#

The JsonizeEncoder allows to transform instances of any class to JSON.

This can be achieved by providing a __json__ method in the class implementation. This method does not take any (required) parameters and returns a JSON-izeable representation of the current instance, e.g. a dict or a list.

default(o: Any) → Any#

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)

Parameters:: o (Any)
Return type:: Any

postbound.util.jsonize.to_json(obj: Any, *args, **kwargs) → str | None#

Utility to transform any object to a JSON object, while making use of the JsonizeEncoder.

All arguments other than the object itself are passed to the default Python json.dumps function.

Parameters:: obj (Any)
Return type:: str | None

postbound.util.jsonize.to_json_dump(obj: Any, file: IO, *args, **kwargs) → None#

Utility to transform any object to a JSON object and write it to a file, while making use of the JsonizeEncoder.

All arguments other than the object itself are passed to the default Python json.dump function.

Parameters:

obj (Any)
file (IO)

Return type:

None

Logging#

Contains utilities to conveniently log different information.

postbound.util.logging.Logger#

Type alias for our loggers.

Each logger accepts an arbitrary amount of arguments and is inteded to function as a replacement for the print function.

alias of Callable[[…], None]

postbound.util.logging.timestamp() → str#

Provides the current time as a nice and normalized string.

Return type:: str

postbound.util.logging.make_logger(enabled: bool = True, *, file: ~typing.IO[str] = <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>, pretty: bool = False, prefix: str | ~collections.abc.Callable[[], str] = '') → Callable[[...], None]#

Creates a new logging utility.

The generated method can be used like a regular print, but with defaults that are better suited for logging purposes.

By default, all logging output will be written to stderr, but this can be customized by supplying a different file.

Parameters:

enabled (bool, optional) – Whether logging is enabled, by default True
file (IO[str], optional) – Destination to write the log entries to, by default sys.stderr
pretty (bool, optional) – Whether complex objects should be pretty-printed using the pprint module, by default False
prefix (str | Callable[[], str], optional) – A common prefix that should be added before each log entry. Can be either a hard-coded string, or a callable that dynamically produces a string for each logging action separately (e.g. timestamp).

Returns:

_description_

Return type:

Callable

postbound.util.logging.standard_logger(enabled: bool = True) → Callable[[...], None]#

A standardized logger for consistent output throughout the codebase.

Parameters:: enabled (bool)
Return type:: Callable[[…], None]

postbound.util.logging.print_stderr(*args, **kwargs) → None#

A normal print that writes to stderr instead of stdout.

Return type:: None

postbound.util.logging.print_if(should_print: bool, *args, use_stderr: bool = False, **kwargs) → None#

A normal print that only prints something if should_print evaluates true-ish. Can optionally print to stderr.

Parameters:

should_print (bool)
use_stderr (bool)

Return type:

None

NetworkX#

Provides graph-centric algorithms based on NetworkX [nx].

References

[nx]

Aric A. Hagberg, Daniel A. Schult and Pieter J. Swart, “Exploring network structure, dynamics, and function using NetworkX”, in Proceedings of the 7th Python in Science Conference (SciPy2008), Gäel Varoquaux, Travis Vaught, and Jarrod Millman (Eds), (Pasadena, CA USA), pp. 11-15, Aug 2008

class postbound.util.nx.NodeType#

Generic type to model the specific nodes contained in a NetworkX graph.

alias of TypeVar(‘NodeType’)

postbound.util.nx.nx_sinks(graph: DiGraph) → Collection[NodeType]#

Determines all sink nodes in a directed graph.

A sink is a node with no outgoing edges.

Parameters:: graph (nx.DiGraph) – The graph to check
Returns:: All sink nodes. Can be an empty collection.
Return type:: Collection[NodeType]

postbound.util.nx.nx_sources(graph: DiGraph) → Collection[NodeType]#

Determines all source nodes in a directed graph.

A source is a node with no incoming edges.

Parameters:: graph (nx.DiGraph) – The graph to check
Returns:: All source nodes. Can be an empty collection.
Return type:: Collection[NodeType]

postbound.util.nx.nx_random_walk(graph: Graph, *, starting_node: NodeType | None = None) → Generator[NodeType, None, None]#

A modified random walk implementation for NetworkX graphs.

A random walk starts at any of the nodes of the graph. At each iteration, a neighboring node is selected and moved to. Afterwards, the iteration continues with that node.

Our implementation uses the following modifications: after each stop, the walk may jump to a node that is connected to one of the visited nodes. This node does not necessarily have to be connected to the current node. Secondly, if the graph contains multiple connected components, the walk will first explore one component before jumping to the next one.

The walk finishes when all nodes have been explored.

Parameters:

graph (nx.Graph) – The graph to walk over
starting_node (Optional[NodeType], optional) – The node where the walk starts. If unspecified, a random node is selected.

Yields:

Generator[NodeType, None, None] – The nodes in the order in which they have been moved to.

Return type:

Generator[NodeType, None, None]

postbound.util.nx.nx_bfs_tree(graph: Graph, start_node: NodeType, condition: Callable[[NodeType, dict], bool], *, node_order: Callable[[NodeType, dict], int] | None = None) → Generator[tuple[NodeType, dict], None, None]#

Traverses a specific graph in breadth-first manner, yielding its nodes along the way.

The traversal starts at a specific start node. During the traversal all nodes that match a condition are provided. If no more nodes are found or the condition cannot be satisfied for any more nodes, traversal terminates. Notice that there is no “early stopping”: if a parent node fails the condition check, its children are still explored.

Parameters:

graph (nx.Graph) – The graph to explore
start_node (NodeType) – The node where the exploration starts. This node will never be yielded.
condition (Callable[[NodeType, dict], bool]) – A condition that is satisfied by all nodes that should be yielded
node_order (Callable[[NodeType, dict], int] | None, optional) – The sequence in which child nodes should be explored. This function receives the child node as well as the edge from its parent as arguments and produces a numerical position value as output (lower values indicate earlier yielding). If unspecified, this produces the nodes in an arbitrary order.

Yields:

Generator[tuple[NodeType, dict], None, None] – The node along with their edge data from the parent.

Return type:

Generator[tuple[NodeType, dict], None, None]

Statistics#

Different mathematical and statistical formulas and utilities.

postbound.util.stats.catalan_number(n: int) → int#

Computes the n-th catalan number. See https://en.wikipedia.org/wiki/Catalan_number.

Parameters:: n (int)
Return type:: int

postbound.util.stats.jaccard(a: set | frozenset, b: set | frozenset) → float#

Jaccard coefficient between a and b. Defined as |a ∩ b| / |a ∪ b|

Parameters:

a (set | frozenset)
b (set | frozenset)

Return type:

float

postbound.util.stats.trigrams(text: str) → list[str]#

Generates the trigrams of a given text.

Parameters:: text (str)
Return type:: list[str]

Numerical#

Utilities centered around numbers.

postbound.util.num.represents_number(val: str) → bool#

Checks, whether val can be cast into an integer/float value.

Parameters:: val (str)
Return type:: bool

class postbound.util.num.AtomicInt(value: int = 0)#

An atomic int allows for multi-threaded access to the integer value.

Parameters:: value (int)

class postbound.util.num.BoundedInt(value: int, *, allowed_min: int | None = None, allowed_max: int | None = None)#

A bounded int cannot become larger and/or smaller than a specified interval.

If the bounded integer does leave the allowed interval, it will be snapped back to the minimum/maximum allowed number, respectively.

Parameters:

value (int)
allowed_min (Union[int, None])
allowed_max (Union[int, None])

Processes#

Provides utilities to interact with outside processes.

class postbound.util.proc.ProcResult(out_data: str, err_data: str, exit_code: int)#

Wrapper for the result of an external process.

In contrast to CompletedProcess provided by the subprocess module, this class is designed for more convenient usage. More specifically, it can be used directly as a substitute for the stdout of the process (hence the subclassing of str). Furthermore, bool checks ensure that the process exited with a zero exit code.

All output is provided in dedicated attributes.

Parameters:

out_data (str) – The stdout of the process.
err_data (str) – The stderr of the process.
exit_code (int) – The exit code of the process.

echo() → None#

Provides the contents of stdout and stderr in a format for debugging by humans.

Return type:: None

raise_if_error() → None#

Raises an exception if the process exited with a non-zero exit code.

Return type:: None

postbound.util.proc.run_cmd(cmd: str | Iterable[Any], *args, work_dir: str | Path | None = None, **kwargs) → ProcResult#

Executes an arbitrary external command.

The command can be executed in an different working directory. After execution the working directory is restored.

This function delegates to subprocess.run. Therefore, most arguments accepted by this function follow the same rules as the run function.

Parameters:

cmd (str | Iterable[Any]) – The program to execute. Can be either a single invocation, or a list of the program name and its arguments.
work_dir (Optional[str | pathlib.Path], optional) – The working directory where the process should be executed. If None, the current working directory is used. Otherwise, the current working directory is changed to the desired directory for the duration of the process execution and restored afterwards.
*args – Additional arguments to be passed to the command.
**kwargs – Additional arguments to customize the subprocess invocation.

Returns:

Return type:

ProcResult

System#

Provides utilities to access (operating system) related information.

postbound.util.system.open_files(pid: int | None = None) → list[str]#

Provides all files (e.g. text files and shared objects) opened by the given process/PID.

Parameters:: pid (Optional[int], optional) – The PID of the process to query. Defaults to the current process.
Returns:: All opened files
Return type:: list[str]

Miscellaneous#

Contains various utilities that did not fit any other category.

postbound.util.misc.current_timestamp() → str#

Provides the current time (year-month-day hour:minute)

Return type:: str

class postbound.util.misc.Version(ver: str | int | list[str] | list[int])#

Version instances represent versioning information and ensure that comparison operations work as expected.

For example, Version instances can be created for strings such as “14.6” or “1.3.1” and ensure that 14.6 > 1.3.1

Parameters:: ver (str | int | list[str] | list[int])

static is_valid(ver: str | int | list[str] | list[int]) → bool#

Checks whether the provided version representation is valid.

Parameters:: ver (str | int | list[str] | list[int])
Return type:: bool

Tries to wrap the provided version representation into a Version instance. Returns None if not possible.

Parameters:: ver (Version | str | int | list[str] | list[int])
Return type:: Version | None

property major: int#: Get the major version component.

property minor: int | None#: Get the minor version component, if available.

property patch: int | None#: Get the patch version component, if available.

property components: Iterable[int]#: Get all version components as a list of integers.

class postbound.util.misc.DependencyGraph#

A simple dependency graph abstraction. Entries are added via add_task and iteration yields the source nodes first.

add_task(node: T, *, depends_on: Iterable[T] | None = None) → None#

Queues a new task/entry/whatever.

Parameters:

node (T) – The new task
depends_on (Optional[Iterable[T]], optional) – Optional other tasks that have to be completed before this one. If those task have not been added yet, this will be done automatically.

Return type:

None

Types#

Provides additional type hints, type decorators, …

postbound.util.typing.deprecated(func: Callable) → Callable#

Indicates that the given function or class should no longer be used.

Parameters:: func (Callable)
Return type:: Callable

postbound.util.typing.module_local(func: Callable) → Callable#

Marker decorator to show that a seemingly private method of a class is intended to be used by other objects from the same module.

Parameters:: func (Callable)
Return type:: Callable

postbound.util.typing.Lazy = None#: A placeholder to indicate that a value is not yet computed, but will be computed lazily.

postbound.util.typing.LazyVal#

Type hint for a value that is computed lazily.

alias of T | None

Utilities

Contents

Utilities#

Global Utilities#

Collections#

Dicts#

JSON#

Logging#

NetworkX#

Statistics#

Numerical#

Processes#

System#

Miscellaneous#

Types#