parallel

Parallel execution utilities for computationally intensive operations.

This module provides a unified interface for parallelizing operations across the PhyloZoo package. It supports three backends (sequential, threading, and multiprocessing), selected by passing a ParallelConfig as a function parameter.

Note: This module is the standard interface for parallel execution in PhyloZoo. No PhyloZoo function currently exposes a parallel parameter; the module is the intended way to introduce parallelization in future implementations and in dependent packages. Any function that supports parallel execution should accept a ParallelConfig (or None for the default sequential behavior) and call ParallelConfig.get_executor() to obtain a backend.

Examples

Basic usage with a function that accepts a parallel parameter (some_parallel_function is a placeholder; no PhyloZoo function exposes this parameter yet):

>>> from phylozoo.utils.parallel import ParallelConfig, ParallelBackend
>>>
>>> # Use multiprocessing with 4 cores
>>> result = some_parallel_function(
...     data,
...     parallel=ParallelConfig(
...         backend=ParallelBackend.MULTIPROCESSING,
...         n_jobs=4
...     )
... )
>>>
>>> # Use all available cores (auto-detect)
>>> result = some_parallel_function(
...     data,
...     parallel=ParallelConfig(
...         backend=ParallelBackend.MULTIPROCESSING,
...         n_jobs=None  # or -1 for all cores
...     )
... )
>>>
>>> # Sequential execution (no parallelization)
>>> result = some_parallel_function(
...     data,
...     parallel=ParallelConfig(backend=ParallelBackend.SEQUENTIAL)
... )

Using with combinations/iterables:

>>> from phylozoo.utils.parallel import ParallelConfig, ParallelBackend
>>> import itertools
>>>
>>> def process_quartet(indices):
...     i, j, k, l = indices
...     # Placeholder: replace with the real quartet computation.
...     result = (i, j, k, l)
...     return result
>>>
>>> combinations = list(itertools.combinations(range(20), 4))
>>> config = ParallelConfig(
...     backend=ParallelBackend.MULTIPROCESSING,
...     n_jobs=4
... )
>>> executor = config.get_executor()
>>> results = list(executor.map(process_quartet, combinations))

class phylozoo.utils.parallel.MultiprocessingExecutor(n_jobs: int | None = None)

Bases: object

Process-based parallel executor.

Uses Python’s multiprocessing module. Best for CPU-bound tasks as it bypasses Python’s Global Interpreter Lock (GIL). Each worker is a separate process with its own memory space.

__del__() → None

Clean up process pool.

map(func: Callable[[T], R], iterable: Iterator[T] | list[T], chunksize: int | None = None) → Iterator[R]

Apply function to items using process pool.

Parameters:

func (Callable[[T], R]) – Function to apply. Must be picklable for multiprocessing.
iterable (Iterator[T] | list[T]) – Items to process. Items must be picklable.
chunksize (int | None, optional) – Number of items to send to each worker at once. If None, multiprocessing chooses an appropriate value. By default None.

Yields:

R – Results from applying func to each item.
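The picklability requirement can be checked before work is dispatched. The helper below is a hypothetical sketch that uses only the standard `pickle` module; it is not part of PhyloZoo. Lambdas cannot be pickled by the standard library, whereas importable functions such as `math.sqrt` can.

```python
import math
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if ``obj`` survives ``pickle.dumps`` (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


ok_builtin = is_picklable(math.sqrt)      # importable function
ok_lambda = is_picklable(lambda x: x)     # anonymous function
ok_items = is_picklable([(0, 1, 2, 3)])   # plain tuples of ints
```

A check like this is one way to fail early with a clear message instead of an error raised inside a worker process.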

starmap(func: Callable[..., R], iterable: Iterator[tuple[Any, ...]] | list[tuple[Any, ...]], chunksize: int | None = None) → Iterator[R]

Apply function to unpacked arguments using process pool.

Parameters:

func (Callable[..., R]) – Function to apply. Must be picklable.
iterable (Iterator[tuple[Any, ...]] | list[tuple[Any, ...]]) – Tuples of arguments to unpack. Must be picklable.
chunksize (int | None, optional) – Number of items to send to each worker at once. By default None.

Yields:

R – Results from applying func to unpacked arguments.
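The difference between map and starmap lies only in how each item reaches func. The sketch below uses the standard library's `itertools.starmap` to show the unpacking semantics in a sequential setting; it does not use PhyloZoo's executors, and `weighted_sum` is a hypothetical function.

```python
import itertools


def weighted_sum(a: int, b: int, weight: int) -> int:
    """Hypothetical three-argument function."""
    return weight * (a + b)


argument_tuples = [(1, 2, 10), (3, 4, 100)]

# starmap unpacks each tuple: weighted_sum(1, 2, 10), weighted_sum(3, 4, 100)
via_starmap = list(itertools.starmap(weighted_sum, argument_tuples))

# The equivalent with plain map needs an explicit unpacking wrapper.
via_map = list(map(lambda args: weighted_sum(*args), argument_tuples))
```

Both lists hold the same values; starmap simply removes the need for the wrapper.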

class phylozoo.utils.parallel.ParallelBackend(value)

Bases: Enum

Available parallelization backends.

The three members are:

SEQUENTIAL – no parallelization; executes sequentially (default). Use for debugging or when overhead outweighs benefits.
THREADING – thread-based parallelization. Good for I/O-bound operations or when sharing memory is important. Limited by Python’s GIL for CPU-bound tasks.
MULTIPROCESSING – process-based parallelization. Best for CPU-bound tasks that don’t require shared memory. Bypasses Python’s GIL.

MULTIPROCESSING = 'multiprocessing'

SEQUENTIAL = 'sequential'

THREADING = 'threading'
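Because each member's value is a plain string, an Enum of this shape can be looked up by value, which is one way a `ParallelBackend | str` argument could be normalized. The enum below is redefined locally so the snippet runs without PhyloZoo; `BackendStandIn` and `normalize_backend` are hypothetical names, and how PhyloZoo itself converts strings is not documented here.

```python
from enum import Enum
from typing import Union


class BackendStandIn(Enum):
    """Local stand-in mirroring the documented member values."""
    SEQUENTIAL = "sequential"
    THREADING = "threading"
    MULTIPROCESSING = "multiprocessing"


def normalize_backend(backend: Union[BackendStandIn, str]) -> BackendStandIn:
    """Accept either a member or its string value (hypothetical helper)."""
    if isinstance(backend, BackendStandIn):
        return backend
    # Enum lookup by value raises ValueError for unknown strings.
    return BackendStandIn(backend)
```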

class phylozoo.utils.parallel.ParallelConfig(backend: ParallelBackend | str = ParallelBackend.SEQUENTIAL, n_jobs: int | None = None, chunksize: int | None = None)

Bases: object

Configuration for parallel execution.

This class encapsulates the settings needed for parallel execution: the backend, the worker count, and the chunk size. Pass it as a function parameter (Pattern A) to enable parallelization in functions that accept it.

Parameters:

backend (ParallelBackend | str, optional) – Parallelization backend to use. By default ParallelBackend.SEQUENTIAL.
n_jobs (int | None, optional) – Number of workers. The interpretation depends on the backend:
- SEQUENTIAL: ignored.
- THREADING: number of threads (None or -1 means 1).
- MULTIPROCESSING: number of processes (None or -1 means all CPU cores).
By default None.
chunksize (int | None, optional) – Number of items to process per worker batch. Backend-dependent. By default None (backend chooses).
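The n_jobs rules listed above can be written out as a small function. This is a sketch of the documented interpretation only, with backends given as their string values; `resolve_n_jobs` is a hypothetical name, and returning 1 for the sequential backend is an assumption made here to give the ignored value a concrete form.

```python
import os
from typing import Optional


def resolve_n_jobs(backend: str, n_jobs: Optional[int]) -> int:
    """Map (backend, n_jobs) to a worker count per the documented rules."""
    use_default = n_jobs is None or n_jobs == -1
    if backend == "sequential":
        return 1  # n_jobs is ignored; 1 is an assumption for illustration
    if backend == "threading":
        return 1 if use_default else n_jobs
    if backend == "multiprocessing":
        # os.cpu_count() may return None, so fall back to 1.
        return (os.cpu_count() or 1) if use_default else n_jobs
    raise ValueError(f"Unrecognized backend: {backend!r}")
```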

Examples

>>> from phylozoo.utils.parallel import ParallelConfig, ParallelBackend
>>>
>>> # Use 4 CPU cores
>>> config = ParallelConfig(
...     backend=ParallelBackend.MULTIPROCESSING,
...     n_jobs=4
... )
>>>
>>> # Use all available cores
>>> config = ParallelConfig(
...     backend=ParallelBackend.MULTIPROCESSING,
...     n_jobs=None  # or -1
... )
>>>
>>> # Sequential execution
>>> config = ParallelConfig(backend=ParallelBackend.SEQUENTIAL)

__repr__() → str

String representation of configuration.

get_executor() → ParallelExecutor

Get executor instance based on backend configuration.

Returns:

Executor instance ready to use.

Return type:

ParallelExecutor

Raises:

PhyloZooValueError – If backend is not recognized.

class phylozoo.utils.parallel.ParallelExecutor(*args, **kwargs)

Bases: Protocol

Protocol for parallel execution backends.

All executors must implement map and starmap methods that apply a function to items in an iterable, potentially in parallel.

map(func: Callable[[T], R], iterable: Iterator[T] | list[T], chunksize: int | None = None) → Iterator[R]

Apply function to each item in iterable, potentially in parallel.

Parameters:

func (Callable[[T], R]) – Function to apply to each item.
iterable (Iterator[T] | list[T]) – Items to process.
chunksize (int | None, optional) – Number of items to process per worker (backend-dependent). By default None (backend chooses).

Yields:

R – Results from applying func to each item.

starmap(func: Callable[..., R], iterable: Iterator[tuple[Any, ...]] | list[tuple[Any, ...]], chunksize: int | None = None) → Iterator[R]

Apply function to unpacked arguments from iterable, potentially in parallel.

Parameters:

func (Callable[..., R]) – Function to apply. Will receive unpacked arguments from tuples.
iterable (Iterator[tuple[Any, ...]] | list[tuple[Any, ...]]) – Tuples of arguments to unpack and pass to func.
chunksize (int | None, optional) – Number of items to process per worker (backend-dependent). By default None (backend chooses).

Yields:

R – Results from applying func to unpacked arguments.
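Since ParallelExecutor is a Protocol, any class with matching map and starmap methods satisfies it without inheriting from it. The sketch below defines a local protocol of the same shape (`ExecutorProtocol`, a stand-in marked `runtime_checkable` purely so the check can be shown with `isinstance`) and a hypothetical `CountingExecutor` that conforms to it.

```python
from typing import (Any, Callable, Iterable, Iterator, Optional, Protocol,
                    Tuple, runtime_checkable)


@runtime_checkable
class ExecutorProtocol(Protocol):
    """Local stand-in with the same method names as the documented protocol."""

    def map(self, func: Callable[[Any], Any], iterable: Iterable[Any],
            chunksize: Optional[int] = None) -> Iterator[Any]: ...

    def starmap(self, func: Callable[..., Any],
                iterable: Iterable[Tuple[Any, ...]],
                chunksize: Optional[int] = None) -> Iterator[Any]: ...


class CountingExecutor:
    """Hypothetical executor: runs sequentially and counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def map(self, func, iterable, chunksize=None):
        for item in iterable:
            self.calls += 1
            yield func(item)

    def starmap(self, func, iterable, chunksize=None):
        for args in iterable:
            self.calls += 1
            yield func(*args)


executor = CountingExecutor()
doubled = list(executor.map(lambda x: 2 * x, [1, 2, 3]))
powers = list(executor.starmap(pow, [(2, 3), (3, 2)]))
```

Note that a runtime `isinstance` check against a protocol only verifies that the methods exist, not that their signatures match.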

class phylozoo.utils.parallel.SequentialExecutor

Bases: object

Sequential (no parallelization) executor.

Executes operations in order, one at a time. Useful for debugging or when parallelization overhead is not worth it.

map(func: Callable[[T], R], iterable: Iterator[T] | list[T], chunksize: int | None = None) → Iterator[R]

Apply function sequentially to each item.

Parameters:

func (Callable[[T], R]) – Function to apply.
iterable (Iterator[T] | list[T]) – Items to process.
chunksize (int | None, optional) – Ignored for sequential execution. By default None.

Yields:

R – Results from applying func to each item.

starmap(func: Callable[..., R], iterable: Iterator[tuple[Any, ...]] | list[tuple[Any, ...]], chunksize: int | None = None) → Iterator[R]

Apply function sequentially to unpacked arguments.

Parameters:

func (Callable[..., R]) – Function to apply.
iterable (Iterator[tuple[Any, ...]] | list[tuple[Any, ...]]) – Tuples of arguments to unpack.
chunksize (int | None, optional) – Ignored for sequential execution. By default None.

Yields:

R – Results from applying func to unpacked arguments.

class phylozoo.utils.parallel.ThreadingExecutor(n_jobs: int | None = None)

Bases: object

Thread-based parallel executor.

Uses Python’s threading module. Good for I/O-bound operations or when sharing memory is important. Limited by Python’s Global Interpreter Lock (GIL) for CPU-bound tasks.

__del__() → None

Clean up thread pool executor.

map(func: Callable[[T], R], iterable: Iterator[T] | list[T], chunksize: int | None = None) → Iterator[R]

Apply function to items using thread pool.

Parameters:

func (Callable[[T], R]) – Function to apply.
iterable (Iterator[T] | list[T]) – Items to process.
chunksize (int | None, optional) – Ignored for threading executor. By default None.

Yields:

R – Results from applying func to each item.

starmap(func: Callable[..., R], iterable: Iterator[tuple[Any, ...]] | list[tuple[Any, ...]], chunksize: int | None = None) → Iterator[R]

Apply function to unpacked arguments using thread pool.

Parameters:

func (Callable[..., R]) – Function to apply.
iterable (Iterator[tuple[Any, ...]] | list[tuple[Any, ...]]) – Tuples of arguments to unpack.
chunksize (int | None, optional) – Ignored for threading executor. By default None.

Yields:

R – Results from applying func to unpacked arguments.
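For I/O-bound work, a thread pool lets waiting tasks overlap. The sketch below uses the standard library's `concurrent.futures.ThreadPoolExecutor` directly rather than PhyloZoo's ThreadingExecutor, whose internals are not shown in this reference; `fake_io_task` is a hypothetical stand-in for a blocking call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(delay: float) -> float:
    """Hypothetical blocking call: sleep releases the GIL like real I/O."""
    time.sleep(delay)
    return delay


delays = [0.05, 0.01, 0.03]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Executor.map returns results in input order, not completion order.
    results = list(pool.map(fake_io_task, delays))
```

Whether ThreadingExecutor preserves input order in the same way is not stated in this reference and should be checked against the source.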