Pairwise Distances#
The phylozoo.core.distance module provides immutable containers for pairwise distances
between taxa, along with comprehensive tools for classification and optimization.
Distance matrices are fundamental to many phylogenetic inference algorithms.
All classes and functions on this page can be imported from the core distance module:
from phylozoo.core.distance import *
# or directly
from phylozoo.core.distance import DistanceMatrix
Working with Distance Matrices#
The DistanceMatrix class is the canonical container
for pairwise distances in PhyloZoo. It provides an immutable, labeled, and read-only
representation that ensures data integrity throughout your analysis pipeline.
Creating a Distance Matrix#
Distance matrices can be created from NumPy arrays with optional labels:
import numpy as np
from phylozoo.core.distance import DistanceMatrix
# Create a symmetric distance matrix
matrix = np.array([
[0.0, 1.0, 2.0],
[1.0, 0.0, 1.0],
[2.0, 1.0, 0.0]
])
# Create with labels
dm = DistanceMatrix(matrix, labels=["A", "B", "C"])
# Or without labels (defaults to 0, 1, 2, ...)
dm2 = DistanceMatrix(matrix)
The matrix must be square and symmetric. The constructor validates these properties
and raises PhyloZooValueError if the input is invalid.
Accessing Distances#
Distance matrices provide several methods for accessing distances and metadata:
# Get distance between two labeled items
distance = dm.get_distance("A", "B") # Returns: 1.0
# Get the index of a label
index = dm.get_index("A") # Returns: 0
# Access labels and underlying array
labels = dm.labels # Returns: ("A", "B", "C")
array = dm.np_array # Returns: read-only numpy array
These methods provide safe access to distances without exposing the mutable underlying array, maintaining immutability guarantees.
File Input/Output#
Distance matrices support reading and writing in multiple phylogenetic formats:
NEXUS (default): Standard phylogenetic data format — see NEXUS format
PHYLIP: Classic phylogenetic format — see PHYLIP format
CSV: Comma-separated values for interoperability — see CSV format
# Load from file (auto-detects format by extension)
dm = DistanceMatrix.load("distances.nexus")
# Load with explicit format
dm = DistanceMatrix.load("distances.phy", format="phylip")
# Save to file
dm.save("output.csv", format="csv")
See also
The DistanceMatrix class uses the IOMixin interface, providing
consistent file handling across PhyloZoo classes. For details on the I/O system,
see the I/O manual.
Classification Functions#
The distance module provides functions to classify distance matrices based on mathematical properties. These classifications are essential for validating that distance matrices meet the requirements of specific algorithms.
Basic Properties#
Non-negativity
The is_nonnegative() function checks
that all distances are non-negative: \(d(i,j) \geq 0\) for all \(i, j\).
This is a fundamental requirement for distance functions.
Zero Diagonal
The has_zero_diagonal() function
verifies that all diagonal entries are exactly zero, ensuring that the distance
from any item to itself is zero. This is a strict equality check using NumPy
equality, intended to verify metric conventions.
Triangle Inequality
The satisfies_triangle_inequality()
function checks whether the distance matrix satisfies the triangle inequality:
This property is fundamental to metric spaces and is required by many distance-based algorithms. The implementation uses Numba-accelerated triple loops for efficient computation on moderately sized matrices.
Example:
from phylozoo.core.distance.classifications import satisfies_triangle_inequality
if satisfies_triangle_inequality(dm):
print("Matrix satisfies triangle inequality")
else:
print("Matrix violates triangle inequality")
Metric Properties#
Metrics
A metric distance matrix satisfies four properties:
Non-negativity: \(d(x, y) \geq 0\) for all \(x, y\)
Triangle inequality: \(d(x, z) \leq d(x, y) + d(y, z)\) for all \(x, y, z\)
Zero diagonal: \(d(x, x) = 0\) for all \(x\)
Symmetry: \(d(x, y) = d(y, x)\) for all \(x, y\) (enforced by constructor)
The is_metric() function provides
a convenient wrapper that checks all of these properties. Use this function when
downstream algorithms require a proper metric and you want a single boolean check.
Pseudo-metrics
A pseudo-metric is like a metric but allows distinct items to have zero distance.
The is_pseudo_metric() function
checks non-negativity and triangle inequality, but does not require a zero diagonal.
Kalmanson Conditions
The Kalmanson condition [Kalmanson, 1975] is a pair of inequalities that must hold for every ordered quadruple \((i < j < k < l)\) in a circular ordering:
The is_kalmanson() function checks
these conditions with respect to a given circular ordering. This function requires:
The matrix to be a pseudo-metric
The provided
CircularOrderingto contain exactly the same labels as the distance matrix
The check is implemented using Numba-accelerated loops.
Traveling Salesman Problem (TSP)#
The distance module provides functions for solving the Traveling Salesman Problem, which finds the optimal tour visiting all taxa exactly once and returning to the starting point. TSP solutions are useful for generating circular orderings.
Exact Solution#
The optimal_tsp_tour() function provides
an exact solution using the Held–Karp dynamic programming algorithm [Held and Karp, 1962].
The function returns a CircularOrdering in canonical
form, representing the optimal tour.
Example:
from phylozoo.core.distance.operations import optimal_tsp_tour
tour = optimal_tsp_tour(dm)
print(f"Optimal tour: {tour.order}")
Approximate Solutions#
For larger instances where exact solutions are infeasible, the
approximate_tsp_tour() function provides
heuristic solvers. Supported methods include:
simulated_annealing (default): Simulated annealing with greedy initialization. Often produces good-quality tours at moderate runtime.
greedy: Nearest-neighbor heuristic. Very fast but can produce poor results in adversarial cases.
christofides: Christofides algorithm [Christofides, 1976] providing a \(3/2\)-approximation when the distance matrix is metric.
The function uses NetworkX’s traveling salesman utilities and returns a
CircularOrdering. It is suitable for larger
matrices where exact solutions are computationally prohibitive.
See Also#
API Reference — Complete function signatures and detailed examples
Circular Orderings — Working with circular orderings