Introduction
What is PhyloZoo?
PhyloZoo is a Python package for working with phylogenetic networks. Phylogenetic networks extend phylogenetic trees by allowing both divergence (splitting) and merging events, which makes them suitable for modeling processes such as hybridization, horizontal gene transfer, and admixture. Whereas a tree can only represent lineages splitting apart, a network can explicitly represent reticulate events in which lineages merge, giving a more realistic model of evolutionary history for many groups of organisms.
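To make the tree/network distinction concrete, here is a minimal sketch in plain Python that does not use PhyloZoo's classes. It assumes a rooted network stored as a list of (parent, child) edge tuples; the function names are hypothetical. A node with two or more parents is a hybrid node, where lineages merge, and a rooted network without hybrid nodes is a tree.

```python
from collections import Counter

# A tiny rooted network as (parent, child) edges.
# Node "h" has two parents (u and v), so it is a hybrid node.
edges = [
    ("r", "u"), ("r", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]


def hybrid_nodes(edges):
    """Return the nodes with in-degree >= 2 (merging events)."""
    indegree = Counter(child for _, child in edges)
    return sorted(node for node, deg in indegree.items() if deg >= 2)


def is_tree(edges):
    """A rooted network with no hybrid nodes is a rooted tree."""
    return not hybrid_nodes(edges)


print(hybrid_nodes(edges))  # ['h']
print(is_tree(edges))       # False
```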
PhyloZoo provides a toolkit for phylogenetic network analysis, covering network representation, manipulation, and visualization. The package is designed with a focus on correctness, performance, and ease of use. It supports both directed and semi-directed network representations, so you can work with rooted networks as well as networks whose root position is unknown. PhyloZoo builds on the Python scientific computing ecosystem, using NumPy for efficient numerical operations, and exposes a consistent API for phylogenetic analysis workflows.
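The difference between the two representations can be sketched as follows. This is a simplified illustration in plain Python, not PhyloZoo's conversion routine: the function `to_semi_directed` is hypothetical, and it assumes the root has exactly two children, neither of them a hybrid node. Hybrid edges (edges into a hybrid node) keep their direction, all other edges become undirected, and the root, which would then be an unlabeled node of degree two, is suppressed.

```python
from collections import Counter


def to_semi_directed(edges, root):
    """Hypothetical sketch: rooted network -> semi-directed network.

    Directed edges are returned as (parent, child) tuples, undirected
    edges as frozensets. Assumes the root has two non-hybrid children.
    """
    indegree = Counter(child for _, child in edges)
    hybrids = {node for node, deg in indegree.items() if deg >= 2}
    root_children = [c for p, c in edges if p == root]
    if len(root_children) != 2 or any(c in hybrids for c in root_children):
        raise ValueError("sketch assumes two non-hybrid root children")

    directed, undirected = set(), set()
    for parent, child in edges:
        if parent == root:
            continue  # root edges are replaced when the root is suppressed
        if child in hybrids:
            directed.add((parent, child))  # hybrid edge keeps its direction
        else:
            undirected.add(frozenset((parent, child)))  # tree edge
    # Suppress the root: join its two children with one undirected edge.
    undirected.add(frozenset(root_children))
    return directed, undirected


edges = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
         ("v", "h"), ("v", "c"), ("h", "b")]
directed, undirected = to_semi_directed(edges, "r")
print(sorted(directed))                             # [('u', 'h'), ('v', 'h')]
print(sorted(tuple(sorted(e)) for e in undirected))
# [('a', 'u'), ('b', 'h'), ('c', 'v'), ('u', 'v')]
```

Standard definitions of semi-directed networks may involve further steps (for example, handling the part of the network above the lowest stable ancestor), which this sketch omits.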
Beyond network representation, PhyloZoo offers native support for quartets, split systems, multiple sequence alignments, and distance matrices within a consistent interface. Conversions between these representations are supported, allowing analyses to move flexibly between data types as required. All core data structures are validated upon construction to ensure well-defined phylogenetic objects, improving reliability and reproducibility. The package includes support for standard file formats such as eNewick, DOT, FASTA, and NEXUS, and flexible visualization functionality with customizable layouts for figures.
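As an example of moving between representations, every split A | B of the taxa implies the quartets ab|cd with a, b in A and c, d in B. The sketch below computes these in plain Python; the function names and the (side_a, side_b) tuple encoding are hypothetical and are not PhyloZoo's Split or Quartet API.

```python
from itertools import combinations


def quartets_from_split(side_a, side_b):
    """Yield every quartet {pair from A} | {pair from B} implied by A | B."""
    for pair_a in combinations(sorted(side_a), 2):
        for pair_b in combinations(sorted(side_b), 2):
            yield frozenset((frozenset(pair_a), frozenset(pair_b)))


def quartets_from_split_system(splits):
    """Union of the quartets implied by each split in a split system."""
    quartets = set()
    for side_a, side_b in splits:
        quartets.update(quartets_from_split(side_a, side_b))
    return quartets


# Nontrivial splits of the 5-taxon tree ((a,b),c,(d,e)).
splits = [(set("ab"), set("cde")), (set("abc"), set("de"))]
quartets = quartets_from_split_system(splits)
print(len(quartets))  # 5
```

For this binary tree the result is the five quartets it displays, one per four-taxon subset. Trivial splits contribute nothing, because a side containing a single taxon has no pairs.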
Package Structure
PhyloZoo is organized into several main modules. For detailed documentation on each module, see the corresponding sections in this manual:
- Core Module (phylozoo.core)
The core module contains fundamental data structures and classes. See Core Module for detailed documentation.
Networks: DirectedPhyNetwork and SemiDirectedPhyNetwork classes for representing phylogenetic networks. Directed networks are directed acyclic graphs (DAGs) with an explicit root and hybrid nodes, while semi-directed networks leave tree edges undirected to model uncertainty about the root position. See Networks for details.
Quartets: Quartet, QuartetProfile, and QuartetProfileSet classes for working with four-taxon relationships, which are fundamental building blocks for network inference. See Quartets for details.
Splits: Split and SplitSystem classes for representing bipartitions of the taxon set, a common way to encode phylogenetic relationships. See Splits for details.
Sequences: MSA (multiple sequence alignment) class with efficient NumPy-based storage and bootstrapping capabilities. See Sequences for details.
Distance: DistanceMatrix class for pairwise distance data, with support for various distance matrix properties and classifications. See Distance for details.
Primitives: Fundamental structures such as Partition, CircularOrdering, and CircularSetOrdering, used throughout the package. See Primitives for details.
- Visualization Module (phylozoo.viz)
Flexible plotting system for networks. See Visualization Module for detailed documentation.
Network Plotting: Functions for visualizing directed and semi-directed networks with customizable layouts and styling using Matplotlib. See Plotting for details.
Layout Algorithms: Custom PhyloZoo layouts (pz-dag, pz-radial) and access to standard NetworkX and Graphviz layouts for various visualization needs. See Plotting for layout options.
- Utilities Module (phylozoo.utils)
Supporting functionality. See Utils Module for detailed documentation.
Exceptions: Comprehensive custom exception hierarchy for clear error reporting. See Exceptions for details.
Validation: Class and object validation utilities and decorators. See Validation for details.
I/O: File format support including eNewick, DOT, and many other formats. See I/O for details.
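The Sequences entry above mentions bootstrapping. The standard nonparametric bootstrap builds each replicate by resampling alignment columns with replacement. A minimal NumPy sketch, assuming the alignment is a 2-D character array with one row per sequence (the function name is hypothetical and not part of PhyloZoo's MSA class):

```python
import numpy as np


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    n_sites = alignment.shape[1]
    # Draw n_sites column indices uniformly from [0, n_sites).
    cols = rng.integers(0, n_sites, size=n_sites)
    return alignment[:, cols]


# Toy alignment: 3 sequences x 6 sites, one character per cell.
aln = np.array([list("ACGTAC"), list("ACGTTC"), list("AGGTAC")])
rng = np.random.default_rng(42)
replicate = bootstrap_columns(aln, rng)
print(replicate.shape)  # (3, 6)
```

Fancy indexing with an index array keeps the whole operation vectorized, which is the kind of NumPy-based efficiency the package description refers to.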
Design Philosophy
PhyloZoo follows several key design principles that benefit end users:
- Object-Oriented and Immutable
PhyloZoo uses an object-oriented design where core data structures (networks, quartets, splits, etc.) are implemented as immutable classes. Once created, these objects cannot be modified in-place, ensuring data integrity and making code more predictable and easier to reason about. To modify a network, you create a new instance with the desired changes.
- Comprehensive Documentation
All public functions, classes, and methods include detailed docstrings with parameter descriptions, return values, exceptions, and examples. This ensures that the code is self-documenting and accessible to users.
- Validation and Error Handling
PhyloZoo includes a custom exception hierarchy for clear, specific error messages. Network validation ensures that objects always represent valid phylogenetic structures, catching errors early and providing helpful diagnostic information.
- Performance
Where appropriate, PhyloZoo leverages NumPy for efficient numerical operations and supports optional Numba JIT compilation for computationally intensive algorithms. The package is designed to handle both small-scale exploratory analyses and larger production workflows.
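The immutability and validation principles above can be illustrated with a frozen dataclass. This is a hypothetical sketch of the general pattern, not PhyloZoo's Split class or its actual exception hierarchy: the object validates itself in `__post_init__`, and "modifying" it means building a new instance with `dataclasses.replace`, which runs validation again.

```python
from dataclasses import dataclass, replace


class PhyloError(Exception):
    """Hypothetical base class standing in for a custom exception hierarchy."""


class InvalidSplitError(PhyloError):
    """Raised when a split is not a valid bipartition."""


@dataclass(frozen=True)
class Split:
    """Hypothetical immutable split A | B, validated on construction."""
    side_a: frozenset
    side_b: frozenset

    def __post_init__(self):
        if not self.side_a or not self.side_b:
            raise InvalidSplitError("both sides must be non-empty")
        if self.side_a & self.side_b:
            raise InvalidSplitError("sides must be disjoint")

    def with_taxon(self, taxon, on_side_a=True):
        """Return a NEW split with an extra taxon; self is left unchanged."""
        if on_side_a:
            return replace(self, side_a=self.side_a | {taxon})
        return replace(self, side_b=self.side_b | {taxon})


s = Split(frozenset("ab"), frozenset("cd"))
t = s.with_taxon("e", on_side_a=False)
print(sorted(s.side_b), sorted(t.side_b))  # ['c', 'd'] ['c', 'd', 'e']
```

Assigning `s.side_a = frozenset()` raises `dataclasses.FrozenInstanceError`, and `s.with_taxon("a", on_side_a=False)` raises `InvalidSplitError` because the two sides would overlap.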
Getting Started
To get started with PhyloZoo, see the Installation Guide for installation instructions, then consult the documentation for specific modules:
Core Module: Data structures and network operations
Visualization Module: Network plotting and visualization
Utilities Module: Utility functions and classes
Alternatively, follow the Quickstart guide for a short introductory tutorial.
For the complete API reference, see the API Reference section.