Parallel Execution
==================

The :mod:`phylozoo.utils.parallel` module provides the standard interface for
parallel execution across PhyloZoo. It defines a small set of backends
(sequential, threading, multiprocessing) behind a single
:class:`~phylozoo.utils.parallel.ParallelConfig` configuration object and a
common executor protocol, so that every function that supports parallelization
exposes the same API to its users.

All classes on this page can be imported from the parallel module:

.. code-block:: python

   from phylozoo.utils.parallel import ParallelConfig, ParallelBackend

.. note::

   No PhyloZoo function currently exposes a ``parallel`` parameter. This module
   is the intended way to introduce parallelization in future implementations,
   and dependent packages can use it as well, so that all parallelized
   functions share a uniform interface.

How parallelization is organized
--------------------------------

Parallel execution in PhyloZoo follows a single pattern: a function that
supports parallel execution accepts a
:class:`~phylozoo.utils.parallel.ParallelConfig` (or ``None`` for the default
sequential behavior), obtains an executor from it, and uses the executor's
:meth:`map` / :meth:`starmap` to dispatch work. The backend, the number of
workers, and the chunk size are therefore chosen in one place (the config), and
the function itself does not need to know about threads, processes, or pools.

Backends
^^^^^^^^

Three backends are available through
:class:`~phylozoo.utils.parallel.ParallelBackend`:

- ``SEQUENTIAL``: no parallelization; executes one item at a time. This is the
  default, and the right choice for debugging or when parallel overhead would
  outweigh the benefits.
- ``THREADING``: uses a thread pool
  (:class:`concurrent.futures.ThreadPoolExecutor`). Good for I/O-bound work or
  when workers must share memory; limited by Python's GIL for CPU-bound tasks.
- ``MULTIPROCESSING``: uses a process pool (:class:`multiprocessing.Pool`).
  Best for CPU-bound tasks; bypasses the GIL but requires that the function and
  its arguments be picklable.

Each backend is implemented as a small executor class
(:class:`~phylozoo.utils.parallel.SequentialExecutor`,
:class:`~phylozoo.utils.parallel.ThreadingExecutor`,
:class:`~phylozoo.utils.parallel.MultiprocessingExecutor`) that conforms to the
:class:`~phylozoo.utils.parallel.ParallelExecutor` protocol with ``map`` and
``starmap`` methods.

Configuring parallel execution
------------------------------

Construct a :class:`~phylozoo.utils.parallel.ParallelConfig` to describe how
work should be dispatched.

**Sequential (default)**

.. code-block:: python

   from phylozoo.utils.parallel import ParallelConfig, ParallelBackend

   config = ParallelConfig(backend=ParallelBackend.SEQUENTIAL)

**Multiprocessing with a fixed number of workers**

.. code-block:: python

   config = ParallelConfig(
       backend=ParallelBackend.MULTIPROCESSING,
       n_jobs=4,
   )

**Multiprocessing using all available cores**

Pass ``n_jobs=None`` (or ``-1``) to use every available CPU core:

.. code-block:: python

   config = ParallelConfig(
       backend=ParallelBackend.MULTIPROCESSING,
       n_jobs=None,
   )

**Threading**

.. code-block:: python

   config = ParallelConfig(
       backend=ParallelBackend.THREADING,
       n_jobs=4,
   )

The backend can also be supplied as a string (``"sequential"``,
``"threading"``, ``"multiprocessing"``) for convenience.

Using an executor directly
--------------------------

Once a config is built, call
:meth:`~phylozoo.utils.parallel.ParallelConfig.get_executor` to obtain an
executor and use its :meth:`map` or :meth:`starmap` methods:

.. code-block:: python

   import itertools

   from phylozoo.utils.parallel import ParallelConfig, ParallelBackend


   def process_quartet(indices: tuple[int, int, int, int]) -> float:
       i, j, k, l = indices
       # ... do work ...
       return 0.0


   # The guard keeps worker processes from re-running the dispatch code on
   # platforms that start workers by re-importing this module.
   if __name__ == "__main__":
       combinations = list(itertools.combinations(range(20), 4))
       config = ParallelConfig(
           backend=ParallelBackend.MULTIPROCESSING,
           n_jobs=4,
       )
       executor = config.get_executor()
       results = list(executor.map(process_quartet, combinations))

For functions that take multiple positional arguments, use
:meth:`~phylozoo.utils.parallel.ParallelExecutor.starmap` with an iterable of
argument tuples:

.. code-block:: python

   def pair_distance(a: int, b: int) -> float:
       # ... do work ...
       return 0.0


   # Each tuple is unpacked into the positional arguments of pair_distance.
   pairs = [(0, 1), (0, 2), (1, 2)]
   results = list(executor.starmap(pair_distance, pairs))

Pattern for PhyloZoo functions
------------------------------

When a PhyloZoo function wants to support parallel execution, it should accept
a ``parallel`` keyword argument typed as
:class:`~phylozoo.utils.parallel.ParallelConfig` ``| None``, default to a
sequential config when ``None`` is given, and dispatch its inner loop through
the executor:

.. code-block:: python

   from phylozoo.utils.parallel import ParallelConfig, ParallelBackend


   def _process_item(item: int) -> float:
       # Placeholder per-item work. Defined at module level so that the
       # multiprocessing backend can pickle it.
       return float(item)


   def my_parallel_function(
       data: list[int],
       parallel: ParallelConfig | None = None,
   ) -> list[float]:
       # Fall back to sequential execution when no config is supplied.
       if parallel is None:
           parallel = ParallelConfig(backend=ParallelBackend.SEQUENTIAL)
       executor = parallel.get_executor()
       return list(executor.map(_process_item, data))

This keeps the choice of backend and worker count in the caller's hands while
the function body stays agnostic to the parallelization strategy.

.. warning::

   For the multiprocessing backend, both ``func`` and every item in
   ``iterable`` must be picklable. Module-level functions are picklable;
   closures and lambdas are not.

See Also
--------

- :doc:`Parallel API <../../api/utils/parallel>`: Full class and method
  reference.
- Exceptions: :class:`~phylozoo.utils.exceptions.general.PhyloZooValueError` is
  raised for invalid backend names or non-positive worker counts.
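
Checking picklability before dispatch
-------------------------------------

The picklability requirement stated in the warning on this page can be checked
up front, before any work is handed to the multiprocessing backend. The sketch
below uses only the standard library; ``is_picklable`` is a hypothetical helper
written for illustration and is not part of PhyloZoo.

.. code-block:: python

   import math
   import pickle


   def is_picklable(obj: object) -> bool:
       """Return True if ``obj`` survives ``pickle.dumps``, else False."""
       try:
           pickle.dumps(obj)
       except Exception:
           # pickle reports failures with different exception types
           # (PicklingError, AttributeError, TypeError) depending on the object.
           return False
       return True


   # A function that can be imported by name is pickled by reference.
   importable_ok = is_picklable(math.sqrt)

   # A lambda has no importable name, so pickling it fails.
   lambda_ok = is_picklable(lambda x: x * 2)

A check like this fails fast in the calling process, which tends to be easier to
diagnose than an error raised while the pool is dispatching tasks.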