parallel
Parallel execution utilities for computationally intensive operations.
This module provides a unified interface for parallelizing operations across the PhyloZoo package. It supports multiple backends (sequential, threading, multiprocessing) and can be used via function parameters.
Note: This module provides the standard interface for parallel execution
across PhyloZoo. No PhyloZoo functions currently expose a parallel parameter,
but this module is the intended way to introduce parallelization in future
implementations and in dependent packages. Any function that wants to
support parallel execution should accept a ParallelConfig (or None for the
default sequential behavior) and call ParallelConfig.get_executor() to obtain a backend.
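The pattern described in the note can be sketched as follows. This is a minimal illustration, not PhyloZoo code: leaf_name_lengths is a hypothetical function, and the only assumption about the parallel argument is that it offers get_executor() returning an object with a map method, as documented for ParallelConfig below.

```python
from typing import Any, Iterable, Optional


def leaf_name_lengths(names: Iterable[str], parallel: Optional[Any] = None) -> list:
    """Hypothetical function following the pattern from the note.

    `parallel` is expected to be a ParallelConfig-like object (anything with
    a `get_executor()` method) or None for plain sequential execution.
    """
    if parallel is None:
        # Default: no executor involved, just the built-in map.
        return list(map(len, names))
    # Assumption: the executor exposes map(func, iterable), as documented.
    executor = parallel.get_executor()
    return list(executor.map(len, names))


print(leaf_name_lengths(["Homo", "Pan", "Gorilla"]))
```

Keeping None as the default means existing callers are unaffected when a function later gains a parallel parameter.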
Examples
Basic usage with a function that accepts a parallel parameter:
>>> from phylozoo.utils.parallel import ParallelConfig, ParallelBackend
>>>
>>> # some_parallel_function and data are placeholders for any function
>>> # that accepts a parallel parameter, and for its input.
>>> # Use multiprocessing with 4 cores
>>> result = some_parallel_function(
...     data,
...     parallel=ParallelConfig(
...         backend=ParallelBackend.MULTIPROCESSING,
...         n_jobs=4
...     )
... )
>>>
>>> # Use all available cores (auto-detect)
>>> result = some_parallel_function(
...     data,
...     parallel=ParallelConfig(
...         backend=ParallelBackend.MULTIPROCESSING,
...         n_jobs=None  # or -1 for all cores
...     )
... )
>>>
>>> # Sequential execution (no parallelization)
>>> result = some_parallel_function(
...     data,
...     parallel=ParallelConfig(backend=ParallelBackend.SEQUENTIAL)
... )
Using with combinations/iterables:
>>> from phylozoo.utils.parallel import ParallelConfig, ParallelBackend
>>> import itertools
>>>
>>> def process_quartet(indices):
...     i, j, k, l = indices
...     # Process quartet...
...     result = (i, j, k, l)  # placeholder for the real computation
...     return result
>>>
>>> combinations = list(itertools.combinations(range(20), 4))
>>> config = ParallelConfig(
...     backend=ParallelBackend.MULTIPROCESSING,
...     n_jobs=4
... )
>>> executor = config.get_executor()
>>> results = list(executor.map(process_quartet, combinations))
- class phylozoo.utils.parallel.MultiprocessingExecutor(n_jobs: int | None = None)
Bases: object
Process-based parallel executor.
Uses Python’s multiprocessing module. Best for CPU-bound tasks as it bypasses Python’s Global Interpreter Lock (GIL). Each worker is a separate process with its own memory space.
- map(func: Callable[[T], R], iterable: Iterator[T] | list[T], chunksize: int | None = None) -> Iterator[R]
Apply function to items using process pool.
- Parameters:
func (Callable[[T], R]) – Function to apply. Must be picklable for multiprocessing.
iterable (Iterator[T] | list[T]) – Items to process. Items must be picklable.
chunksize (int | None, optional) – Number of items to send to each worker at once. If None, multiprocessing chooses an appropriate value. By default None.
- Yields:
R – Results from applying func to each item.
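The picklability requirement can be checked ahead of time with the standard pickle module. The helper below is a hypothetical utility, not part of PhyloZoo; it only tests whether an object survives pickle.dumps, which is what process-based workers rely on to receive functions and items.

```python
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


print(is_picklable(len))              # built-in function: pickled by reference
print(is_picklable(lambda x: x * 2))  # lambda: cannot be looked up by name
print(is_picklable((1, "a", 2.5)))    # plain data
```

In practice this means defining worker functions at module level rather than as lambdas or nested functions.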
- class phylozoo.utils.parallel.ParallelBackend(value)
Bases: Enum
Available parallelization backends.
The three members are:
SEQUENTIAL – no parallelization; executes sequentially (default). Use for debugging or when overhead outweighs benefits.
THREADING – thread-based parallelization. Good for I/O-bound operations or when sharing memory is important. Limited by Python’s GIL for CPU-bound tasks.
MULTIPROCESSING – process-based parallelization. Best for CPU-bound tasks that don’t require shared memory. Bypasses Python’s GIL.
- MULTIPROCESSING = 'multiprocessing'
- SEQUENTIAL = 'sequential'
- THREADING = 'threading'
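Because each member carries a string value, a string such as 'threading' can be mapped to its member by value lookup. The sketch below uses a stand-in enum with the same three values; BackendSketch and as_backend are hypothetical names, and whether ParallelConfig converts strings this way internally is an assumption.

```python
from enum import Enum


class BackendSketch(Enum):
    """Stand-in mirroring the documented member values (not the PhyloZoo class)."""

    SEQUENTIAL = "sequential"
    THREADING = "threading"
    MULTIPROCESSING = "multiprocessing"


def as_backend(value):
    """Accept either a member or its string value and return the member."""
    if isinstance(value, BackendSketch):
        return value
    # Enum lookup by value; raises ValueError for unknown strings.
    return BackendSketch(value)


print(as_backend("threading"))
print(as_backend(BackendSketch.SEQUENTIAL).value)
```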
- class phylozoo.utils.parallel.ParallelConfig(backend: ParallelBackend | str = ParallelBackend.SEQUENTIAL, n_jobs: int | None = None, chunksize: int | None = None)
Bases: object
Configuration for parallel execution.
This class encapsulates all settings needed for parallel execution, including backend selection and worker count. Use this as a function parameter (Pattern A) to enable parallelization in PhyloZoo functions.
- Parameters:
backend (ParallelBackend | str, optional) – Parallelization backend to use. By default ParallelBackend.SEQUENTIAL.
n_jobs (int | None, optional) –
Number of workers. Interpretation depends on backend:
SEQUENTIAL: Ignored
THREADING: Number of threads (None/-1 = 1)
MULTIPROCESSING: Number of processes (None/-1 = all CPU cores)
By default None.
chunksize (int | None, optional) – Number of items to process per worker batch. Backend-dependent. By default None (backend chooses).
Examples
>>> from phylozoo.utils.parallel import ParallelConfig, ParallelBackend
>>>
>>> # Use 4 CPU cores
>>> config = ParallelConfig(
...     backend=ParallelBackend.MULTIPROCESSING,
...     n_jobs=4
... )
>>>
>>> # Use all available cores
>>> config = ParallelConfig(
...     backend=ParallelBackend.MULTIPROCESSING,
...     n_jobs=None  # or -1
... )
>>>
>>> # Sequential execution
>>> config = ParallelConfig(backend=ParallelBackend.SEQUENTIAL)
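The n_jobs rules listed under Parameters can be written out as a small function. This is a sketch of the documented behavior, not the PhyloZoo implementation: resolve_n_jobs is a hypothetical helper, backends are passed as their string values, and returning 1 for the sequential backend is an assumption (the documentation only says n_jobs is ignored there).

```python
import os
from typing import Optional


def resolve_n_jobs(backend: str, n_jobs: Optional[int]) -> int:
    """Hypothetical helper mirroring the documented n_jobs rules."""
    if backend == "sequential":
        return 1  # n_jobs is ignored; one worker is assumed here
    if n_jobs is None or n_jobs == -1:
        if backend == "threading":
            return 1  # documented: None/-1 means 1 thread
        # multiprocessing: all CPU cores; os.cpu_count() may return None
        return os.cpu_count() or 1
    return n_jobs


print(resolve_n_jobs("threading", None))
print(resolve_n_jobs("multiprocessing", 4))
```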
- get_executor() -> ParallelExecutor
Get executor instance based on backend configuration.
- Returns:
Executor instance ready to use.
- Return type:
ParallelExecutor
- Raises:
PhyloZooValueError – If backend is not recognized.
- class phylozoo.utils.parallel.ParallelExecutor(*args, **kwargs)
Bases: Protocol
Protocol for parallel execution backends.
All executors must implement map and starmap methods that apply a function to items in an iterable, potentially in parallel.
- map(func: Callable[[T], R], iterable: Iterator[T] | list[T], chunksize: int | None = None) -> Iterator[R]
Apply function to each item in iterable, potentially in parallel.
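- Parameters:
func (Callable[[T], R]) – Function to apply to each item.
iterable (Iterator[T] | list[T]) – Items to process.
chunksize (int | None, optional) – Number of items to process per worker (backend-dependent). By default None (backend chooses).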
- Yields:
R – Results from applying func to each item.
- starmap(func: Callable[[...], R], iterable: Iterator[tuple[Any, ...]] | list[tuple[Any, ...]], chunksize: int | None = None) -> Iterator[R]
Apply function to unpacked arguments from iterable, potentially in parallel.
- Parameters:
func (Callable[..., R]) – Function to apply. Will receive unpacked arguments from tuples.
iterable (Iterator[tuple[Any, ...]] | list[tuple[Any, ...]]) – Tuples of arguments to unpack and pass to func.
chunksize (int | None, optional) – Number of items to process per worker (backend-dependent). By default None (backend chooses).
- Yields:
R – Results from applying func to unpacked arguments.
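Since ParallelExecutor is a Protocol, a backend only needs matching map and starmap methods; no inheritance is required. The sketch below illustrates this with a stand-in protocol and a trivial in-order executor. ExecutorSketch and InOrderExecutor are hypothetical names, and the method signatures are simplified from the ones documented above.

```python
from itertools import starmap as _starmap
from typing import (Callable, Iterable, Iterator, Optional, Protocol,
                    TypeVar, runtime_checkable)

T = TypeVar("T")
R = TypeVar("R")


@runtime_checkable
class ExecutorSketch(Protocol):
    """Stand-in protocol with the two documented methods."""

    def map(self, func: Callable[[T], R], iterable: Iterable[T],
            chunksize: Optional[int] = None) -> Iterator[R]: ...

    def starmap(self, func: Callable[..., R], iterable: Iterable[tuple],
                chunksize: Optional[int] = None) -> Iterator[R]: ...


class InOrderExecutor:
    """Satisfies the protocol structurally, without subclassing it."""

    def map(self, func, iterable, chunksize=None):
        # chunksize is accepted for interface compatibility and ignored here.
        return map(func, iterable)

    def starmap(self, func, iterable, chunksize=None):
        # Each tuple is unpacked into positional arguments for func.
        return _starmap(func, iterable)


executor = InOrderExecutor()
print(isinstance(executor, ExecutorSketch))
print(list(executor.map(abs, [-2, 3, -5])))
print(list(executor.starmap(pow, [(2, 3), (3, 2)])))
```

The difference between the two methods is only in how arguments are passed: map calls func(item), while starmap calls func(*item).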
- class phylozoo.utils.parallel.SequentialExecutor
Bases: object
Sequential (no parallelization) executor.
Executes operations in order, one at a time. Useful for debugging or when parallelization overhead is not worth it.
- map(func: Callable[[T], R], iterable: Iterator[T] | list[T], chunksize: int | None = None) -> Iterator[R]
Apply function sequentially to each item.
- class phylozoo.utils.parallel.ThreadingExecutor(n_jobs: int | None = None)
Bases: object
Thread-based parallel executor.
Uses Python’s threading module. Good for I/O-bound operations or when sharing memory is important. Limited by Python’s Global Interpreter Lock (GIL) for CPU-bound tasks.
- map(func: Callable[[T], R], iterable: Iterator[T] | list[T], chunksize: int | None = None) -> Iterator[R]
Apply function to items using thread pool.
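To illustrate why threads suit I/O-bound work, the sketch below maps a sleeping function over a few items with the standard library's ThreadPoolExecutor. It is a stand-in only: how ThreadingExecutor is implemented is not stated here, and fake_download is a hypothetical task in which time.sleep simulates waiting on I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name: str) -> str:
    """Hypothetical I/O-bound task: sleeping stands in for waiting on I/O."""
    time.sleep(0.05)
    return name.upper()


names = ["tree_a", "tree_b", "tree_c", "tree_d"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map yields results in input order, whichever thread finishes first.
    results = list(pool.map(fake_download, names))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

While a thread sleeps or waits on I/O it releases the GIL, so the waits overlap; a CPU-bound function would not see the same benefit.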