distance
Distance module.
This module provides classes and functions for working with distance matrices. A distance matrix represents pairwise distances between a set of labeled items. Symmetry is enforced when a DistanceMatrix is constructed; other properties, such as non-negativity and the triangle inequality, can be checked with the classification functions. The public API (DistanceMatrix and the classifications, operations, and io submodules) is re-exported here; the implementation is split across the base, classifications, operations, and io submodules.
Main Classes
Distance matrix base module.
This module provides the core DistanceMatrix class for working with distance matrices.
- class phylozoo.core.distance.base.DistanceMatrix(distance_matrix: ndarray, labels: list[T] | None = None)
Bases: IOMixin
An immutable distance matrix.
A DistanceMatrix represents pairwise distances between a set of labeled items. The matrix is stored as a symmetric numpy array and is immutable after initialization.
- Parameters:
distance_matrix (numpy.ndarray) – A square, symmetric 2D numpy array of pairwise distances.
labels (list[T] | None, optional) – List of labels corresponding to the rows/columns of the distance matrix. If None, defaults to [0, 1, 2, …, n-1] where n is the matrix size. By default None.
Notes
The class is immutable after initialization. To create a modified version, create a new DistanceMatrix instance with the modified data.
Supported I/O formats:
nexus (default): .nexus, .nex, .nxs
phylip: .phy, .phylip
csv: .csv
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>>
>>> # Create from numpy array
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0
>>>
>>> # Default labels (0, 1, 2, ...)
>>> dm2 = DistanceMatrix(matrix)
>>> dm2.labels
(0, 1, 2)
- __contains__(label: T) → bool
Check if a label is in the distance matrix.
- Parameters:
label (T) – Label to check.
- Returns:
True if label is in the matrix, False otherwise.
- Return type:
bool
- __len__() → int
Return the size of the distance matrix.
- Returns:
Number of rows/columns.
- Return type:
int
- __repr__() → str
Return string representation of the distance matrix.
- Returns:
String representation.
- Return type:
str
- __str__() → str
Return human-readable string representation.
For small matrices (up to 10 elements), prints the full upper triangle. For larger matrices, truncates the display. Always includes element names.
- Returns:
Human-readable string with matrix contents (upper triangle only).
- Return type:
str
- copy() → DistanceMatrix
Create a copy of the distance matrix.
- Returns:
A new DistanceMatrix instance with copied data (also immutable).
- Return type:
DistanceMatrix
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> dm1 = DistanceMatrix(np.array([[0, 1], [1, 0]]), labels=['A', 'B'])
>>> dm2 = dm1.copy()
>>> dm1 is dm2
False
>>> dm1.get_distance('A', 'B') == dm2.get_distance('A', 'B')
True
- get_distance(label1: T, label2: T) → float
Get the distance between two labels.
- Parameters:
label1 (T) – First label.
label2 (T) – Second label.
- Returns:
Distance between the two labels.
- Return type:
float
- Raises:
ValueError – If either label is not found in the distance matrix.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> dm = DistanceMatrix(np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]]),
...                     labels=['A', 'B', 'C'])
>>> dm.get_distance('A', 'B')
1.0
>>> dm.get_distance('B', 'A')  # Symmetric
1.0
>>> dm.get_distance('A', 'C')
2.0
- get_index(label: T) → int
Get the index of a label in the distance matrix.
- Parameters:
label (T) – Label to look up.
- Returns:
Index of the label in the matrix.
- Return type:
int
- Raises:
PhyloZooValueError – If label is not found in the distance matrix.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> dm = DistanceMatrix(np.array([[0, 1], [1, 0]]), labels=['A', 'B'])
>>> dm.get_index('A')
0
>>> dm.get_index('B')
1
- property labels: tuple[T, ...]
Get the labels corresponding to rows/columns.
- Returns:
Tuple of labels (immutable).
- Return type:
tuple[T, …]
- property np_array: ndarray
Get the underlying numpy array (read-only).
- Returns:
The distance matrix as a read-only numpy array.
- Return type:
numpy.ndarray
Notes
The returned array is read-only. To modify, create a new DistanceMatrix.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> dm = DistanceMatrix(np.array([[0, 1], [1, 0]]), labels=['A', 'B'])
>>> arr = dm.np_array
>>> arr[0, 1]
1.0
Classification Functions
Distance matrix classification module.
This module provides functions for classifying distance matrices based on mathematical properties: triangle inequality, metric properties (triangle inequality, symmetry, non-negativity), and Kalmanson conditions (circular ordering constraints).
- phylozoo.core.distance.classifications.has_zero_diagonal(distance_matrix: DistanceMatrix) → bool
Check if the diagonal of the distance matrix is zero.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
- Returns:
True if diagonal is zero, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import has_zero_diagonal
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> has_zero_diagonal(dm)
True
- phylozoo.core.distance.classifications.is_kalmanson(distance_matrix: DistanceMatrix, circular_order: CircularOrdering[T]) → bool
Check if the distance matrix is Kalmanson with respect to a circular order.
A distance matrix is Kalmanson with respect to a circular order if it satisfies the Kalmanson inequalities for all quadruples of labels in that order.
The Kalmanson conditions are classical inequalities for circular metrics [Kalmanson, 1975].
For a circular order (e1, e2, …, en), the Kalmanson conditions are:
d(ei, ej) + d(ek, el) <= d(ei, ek) + d(ej, el) for all i < j < k < l
d(ei, el) + d(ej, ek) <= d(ei, ek) + d(ej, el) for all i < j < k < l
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
circular_order (CircularOrdering[T]) – A circular ordering of all labels in the distance matrix. Must contain the same elements as the distance matrix labels.
- Returns:
True if the matrix is Kalmanson with respect to the given order, False otherwise.
- Return type:
bool
- Raises:
PhyloZooValueError – If circular_order is empty, does not contain all labels, or if the matrix is not a pseudo-metric.
TypeError – If circular_order is not a CircularOrdering.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_kalmanson
>>> from phylozoo.core.primitives.circular_ordering import CircularOrdering
>>>
>>> # Kalmanson matrix (e.g., from a circular network)
>>> matrix = np.array([
...     [0, 1, 2, 2, 1],
...     [1, 0, 1, 2, 2],
...     [2, 1, 0, 1, 2],
...     [2, 2, 1, 0, 1],
...     [1, 2, 2, 1, 0]
... ])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C', 'D', 'E'])
>>> co = CircularOrdering(['A', 'B', 'C', 'D', 'E'])
>>> is_kalmanson(dm, co)
True
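The two Kalmanson inequalities can be checked directly by enumerating quadruples in the circular order. The following is a minimal pure-Python sketch, not PhyloZoo's implementation: the helper name satisfies_kalmanson and the nested-list input are assumptions made for illustration, and the label and pseudo-metric validation performed by is_kalmanson is omitted.

```python
from itertools import combinations


def satisfies_kalmanson(dist, order):
    """Check both Kalmanson inequalities for every quadruple in `order`.

    `dist` is a symmetric nested list and `order` lists row indices in
    circular order. Hypothetical helper, for illustration only.
    """
    for a, b, c, d in combinations(order, 4):
        # combinations() keeps the relative order of `order`, so a, b, c, d
        # play the roles of e_i, e_j, e_k, e_l with i < j < k < l.
        crossing = dist[a][c] + dist[b][d]
        if dist[a][b] + dist[c][d] > crossing:
            return False
        if dist[a][d] + dist[b][c] > crossing:
            return False
    return True


# Distances on a 5-cycle, as in the example above.
cycle = [
    [0, 1, 2, 2, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 2],
    [2, 2, 1, 0, 1],
    [1, 2, 2, 1, 0],
]
print(satisfies_kalmanson(cycle, [0, 1, 2, 3, 4]))  # True
print(satisfies_kalmanson(cycle, [0, 2, 1, 3, 4]))  # False
```

The same matrix fails the check once two neighbours are swapped in the order, which shows that the property depends on the order as well as on the distances.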
- phylozoo.core.distance.classifications.is_metric(distance_matrix: DistanceMatrix) → bool
Check if the distance matrix is a metric.
A metric distance matrix satisfies:
Non-negativity: d(x, y) >= 0 for all x, y
Triangle inequality: d(x, z) <= d(x, y) + d(y, z) for all x, y, z
Zero diagonal: d(x, x) = 0 for all x
Symmetry: d(x, y) = d(y, x) for all x, y (already enforced in constructor)
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
- Returns:
True if the matrix is a metric, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_metric
>>>
>>> # Euclidean distance matrix (metric)
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> is_metric(dm)
True
>>>
>>> # Non-metric (violates triangle inequality)
>>> bad_matrix = np.array([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
>>> bad_dm = DistanceMatrix(bad_matrix)
>>> is_metric(bad_dm)
False
- phylozoo.core.distance.classifications.is_nonnegative(distance_matrix: DistanceMatrix) → bool
Check if all distances are non-negative.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
- Returns:
True if all distances are non-negative, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_nonnegative
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> is_nonnegative(dm)
True
- phylozoo.core.distance.classifications.is_pseudo_metric(distance_matrix: DistanceMatrix) → bool
Check if the distance matrix is a pseudo-metric.
A pseudo-metric distance matrix satisfies:
Non-negativity: d(x, y) >= 0 for all x, y
Triangle inequality: d(x, z) <= d(x, y) + d(y, z) for all x, y, z
Note: Unlike a metric, a pseudo-metric as defined in this module does not require d(x, x) = 0 (though it may still hold).
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
- Returns:
True if the matrix is a pseudo-metric, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_pseudo_metric
>>>
>>> # Pseudo-metric (satisfies non-negativity and triangle inequality)
>>> matrix = np.array([[0.1, 1, 2], [1, 0.1, 1], [2, 1, 0.1]])
>>> dm = DistanceMatrix(matrix)
>>> is_pseudo_metric(dm)
True
>>>
>>> # Not a pseudo-metric (violates triangle inequality)
>>> bad_matrix = np.array([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
>>> bad_dm = DistanceMatrix(bad_matrix)
>>> is_pseudo_metric(bad_dm)
False
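The difference between the metric and pseudo-metric checks, as documented here, comes down to the zero-diagonal condition. The sketch below composes the listed conditions on plain nested lists; the function names with the sketch_ prefix are hypothetical and are not part of PhyloZoo.

```python
def sketch_nonnegative(m):
    """True if every entry is >= 0."""
    return all(x >= 0 for row in m for x in row)


def sketch_zero_diagonal(m):
    """True if d(x, x) == 0 for every x."""
    return all(m[i][i] == 0 for i in range(len(m)))


def sketch_triangle(m):
    """True if d(i, k) <= d(i, j) + d(j, k) for all i, j, k."""
    n = range(len(m))
    return all(m[i][k] <= m[i][j] + m[j][k] for i in n for j in n for k in n)


def sketch_pseudo_metric(m):
    # Conditions listed for is_pseudo_metric: non-negativity + triangle.
    return sketch_nonnegative(m) and sketch_triangle(m)


def sketch_metric(m):
    # Conditions listed for is_metric: the above plus a zero diagonal.
    return sketch_pseudo_metric(m) and sketch_zero_diagonal(m)


nonzero_diag = [[0.1, 1, 2], [1, 0.1, 1], [2, 1, 0.1]]
print(sketch_pseudo_metric(nonzero_diag))  # True
print(sketch_metric(nonzero_diag))         # False
```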
- phylozoo.core.distance.classifications.satisfies_triangle_inequality(distance_matrix: DistanceMatrix) → bool
Check if the distance matrix satisfies the triangle inequality.
A distance matrix satisfies the triangle inequality if: d(i,k) <= d(i,j) + d(j,k) for all i, j, k.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
- Returns:
True if triangle inequality holds, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import satisfies_triangle_inequality
>>>
>>> # Matrix satisfying triangle inequality
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> satisfies_triangle_inequality(dm)
True
>>>
>>> # Matrix violating triangle inequality
>>> bad_matrix = np.array([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
>>> bad_dm = DistanceMatrix(bad_matrix)
>>> satisfies_triangle_inequality(bad_dm)
False
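When the check fails it can be useful to see which triple is responsible. The function below is a small illustrative sketch on nested lists (find_triangle_violation is a hypothetical name, not a PhyloZoo function) that returns the first violating triple instead of a boolean.

```python
def find_triangle_violation(m):
    """Return the first (i, j, k) with d(i, k) > d(i, j) + d(j, k), or None."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if m[i][k] > m[i][j] + m[j][k]:
                    return (i, j, k)
    return None


print(find_triangle_violation([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))  # None
# d(0, 2) = 5 exceeds d(0, 1) + d(1, 2) = 2:
print(find_triangle_violation([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))  # (0, 1, 2)
```

The triple loop costs O(n^3) comparisons, which is acceptable for the matrix sizes on which the exact TSP solver in this module is practical.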
Operations
Distance matrix operations module.
This module provides algorithms for working with distance matrices, including heuristic solvers for the Traveling Salesman Problem (TSP) and an exact solver based on the Held-Karp dynamic programming algorithm [Held and Karp, 1962].
- phylozoo.core.distance.operations.approximate_tsp_tour(distance_matrix: DistanceMatrix, method: str = 'simulated_annealing') → CircularOrdering
Find an approximate TSP tour using heuristic methods.
This function uses approximation algorithms to find a good (but not necessarily optimal) traveling salesman tour. These methods are much faster than the optimal algorithm and can handle larger instances.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to solve TSP for.
method (str, optional) –
Heuristic method to use. Must be one of:
'simulated_annealing': Simulated annealing heuristic (default)
'greedy': Greedy nearest-neighbor heuristic
'christofides': Christofides algorithm (for metric distances)
By default 'simulated_annealing'.
- Returns:
A circular ordering (tour) of all labels. The ordering is in canonical form.
- Return type:
CircularOrdering
- Raises:
PhyloZooValueError – If method is not one of the supported methods.
Notes
simulated_annealing: Uses simulated annealing with a greedy initialization. Generally produces good solutions but is slower than greedy.
greedy: Simple nearest-neighbor heuristic. Fast but may produce poor solutions.
christofides: Provides a 3/2-approximation for metric distances [Christofides, 1976]. Slower than greedy but guarantees better worst-case performance.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.operations import approximate_tsp_tour
>>>
>>> matrix = np.array([
...     [0, 1, 2, 3],
...     [1, 0, 1, 2],
...     [2, 1, 0, 1],
...     [3, 2, 1, 0]
... ])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C', 'D'])
>>>
>>> # Use simulated annealing
>>> tour1 = approximate_tsp_tour(dm, method='simulated_annealing')
>>> len(tour1) == len(dm.labels)
True
>>>
>>> # Use greedy heuristic
>>> tour2 = approximate_tsp_tour(dm, method='greedy')
>>> len(tour2) == len(dm.labels)
True
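To give a feel for the 'greedy' option, here is a minimal nearest-neighbor sketch on a nested-list matrix. It illustrates the general heuristic only; the function names, the fixed start node, and the index-based tie-breaking are assumptions and may differ from what approximate_tsp_tour does internally.

```python
def nearest_neighbor_tour(dist, start=0):
    """Build a tour by repeatedly moving to the closest unvisited node."""
    n = len(dist)
    if n == 0:
        return []
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        current = tour[-1]
        # Ties are broken by the smaller index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (dist[current][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(dist, tour):
    """Total length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


line = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
tour = nearest_neighbor_tour(line)
print(tour, tour_length(line, tour))  # [0, 1, 2, 3] 6
```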
- phylozoo.core.distance.operations.optimal_tsp_tour(distance_matrix: DistanceMatrix) → CircularOrdering
Solve TSP to optimality using dynamic programming (Held-Karp algorithm).
This function finds the optimal traveling salesman tour that visits all labels exactly once and returns to the starting point, minimizing the total distance.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to solve TSP for.
- Returns:
A circular ordering (tour) of all labels. The ordering is in canonical form.
- Return type:
CircularOrdering
Notes
This implementation uses the Held-Karp algorithm [Held and Karp, 1962] with dynamic programming, optimized with Numba JIT compilation and bitmask-based set operations. The time complexity is O(n^2 * 2^n), so it’s only practical for small instances (typically n <= 20).
The algorithm uses bitmasks to represent sets of nodes, which is more efficient than Python sets and compatible with Numba acceleration.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.operations import optimal_tsp_tour
>>>
>>> # Small example
>>> matrix = np.array([
...     [0, 1, 2, 3],
...     [1, 0, 1, 2],
...     [2, 1, 0, 1],
...     [3, 2, 1, 0]
... ])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C', 'D'])
>>> tour = optimal_tsp_tour(dm)
>>> len(tour) == len(dm.labels)
True
>>> set(tour.order) == set(dm.labels)
True
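The bitmask formulation mentioned in the notes can be sketched in plain Python as follows. This is an unoptimized illustration of Held-Karp under stated assumptions (nested-list input, node 0 as the fixed start, a hypothetical function name held_karp); it does not use Numba and is not the library's code.

```python
def held_karp(dist):
    """Return (tour, length) of an optimal closed tour starting at node 0."""
    n = len(dist)
    if n <= 1:
        return list(range(n)), 0
    full = 1 << n
    inf = float("inf")
    # best[mask][j]: cheapest path that starts at node 0, visits exactly the
    # nodes whose bits are set in mask, and ends at node j.
    best = [[inf] * n for _ in range(full)]
    parent = [[-1] * n for _ in range(full)]
    best[1][0] = 0
    for mask in range(1, full):
        if not mask & 1:  # every partial path must contain the start node
            continue
        for j in range(n):
            if best[mask][j] == inf:
                continue
            for k in range(n):
                if (mask >> k) & 1:  # k already visited
                    continue
                new_mask = mask | (1 << k)
                cost = best[mask][j] + dist[j][k]
                if cost < best[new_mask][k]:
                    best[new_mask][k] = cost
                    parent[new_mask][k] = j
    final = full - 1
    # Close the tour by returning to node 0 from the best end node.
    end = min(range(1, n), key=lambda j: best[final][j] + dist[j][0])
    length = best[final][end] + dist[end][0]
    # Walk the parent pointers back to the start to recover the order.
    tour, mask, j = [], final, end
    while j != -1:
        tour.append(j)
        prev = parent[mask][j]
        mask ^= 1 << j
        j = prev
    tour.reverse()
    return tour, length


line = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
tour, length = held_karp(line)
print(sorted(tour), length)  # [0, 1, 2, 3] 6
```

The table has 2^n rows of n entries and each entry is extended to up to n successors, which is where the O(n^2 * 2^n) running time quoted above comes from.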
I/O Support
Distance matrix I/O module.
Distance matrices support reading and writing in multiple formats: NEXUS, PHYLIP, and CSV. This module provides format handlers registered with FormatRegistry for use with the IOMixin system.
The following format handlers are defined and registered:
nexus: NEXUS format for distance matrices (extensions: .nexus, .nex, .nxs)
Writer: to_nexus() - Converts DistanceMatrix to NEXUS string
Reader: from_nexus() - Parses NEXUS string to DistanceMatrix
phylip: PHYLIP format for distance matrices (extensions: .phy, .phylip)
Writer: to_phylip() - Converts DistanceMatrix to PHYLIP string
Reader: from_phylip() - Parses PHYLIP string to DistanceMatrix
csv: CSV format for distance matrices (extensions: .csv)
Writer: to_csv() - Converts DistanceMatrix to CSV string
Reader: from_csv() - Parses CSV string to DistanceMatrix
These handlers are automatically registered when this module is imported. DistanceMatrix inherits from IOMixin, so you can use:
dm.save('file.nexus') - Save to file (auto-detects format)
dm.load('file.nexus') - Load from file (auto-detects format)
dm.to_string(format='phylip') - Convert to string
dm.from_string(string, format='csv') - Parse from string
DistanceMatrix.convert('in.nexus', 'out.phy') - Convert between formats
DistanceMatrix.convert_string(str1, 'nexus', 'phylip') - Convert strings
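Format auto-detection of this kind is usually a lookup from file extension to format name. The sketch below shows the idea using the extensions listed above; the detect_format helper, the lookup table, and the fallback to 'nexus' for unknown extensions are assumptions for illustration and do not describe how FormatRegistry is implemented.

```python
from pathlib import Path

# Extensions as listed for the registered distance matrix formats.
EXTENSION_TO_FORMAT = {
    ".nexus": "nexus",
    ".nex": "nexus",
    ".nxs": "nexus",
    ".phy": "phylip",
    ".phylip": "phylip",
    ".csv": "csv",
}


def detect_format(path, default="nexus"):
    """Map a file name to a format name based on its extension.

    Hypothetical helper; the fallback to `default` is an assumption.
    """
    suffix = Path(path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, default)


print(detect_format("out.phy"))    # phylip
print(detect_format("data.NEX"))   # nexus
print(detect_format("table.csv"))  # csv
```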
- phylozoo.core.distance.io.from_csv(csv_string: str, **kwargs: Any) → DistanceMatrix
Parse a CSV format string and create a DistanceMatrix.
- Parameters:
csv_string (str) – CSV format string containing distance matrix data.
**kwargs –
Additional arguments:
delimiter (str): Field delimiter (default: ','). Can be ',', '\t', or whitespace
has_header (bool): Whether first row is a header (default: True)
- Returns:
Parsed distance matrix.
- Return type:
DistanceMatrix
- Raises:
PhyloZooParseError – If the CSV string is malformed or cannot be parsed (e.g., empty string, no data rows, invalid distance values, mismatched dimensions, non-symmetric matrix).
Examples
>>> from phylozoo.core.distance.io import from_csv
>>>
>>> csv_str = ''',A,B,C
... A,0.0,1.0,2.0
... B,1.0,0.0,1.0
... C,2.0,1.0,0.0
... '''
>>>
>>> dm = from_csv(csv_str)
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0
Notes
This parser expects:
First row (if has_header=True): empty first cell, then taxon labels
Subsequent rows: taxon label in first column, then distances
Delimiter can be comma, tab, or whitespace
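The layout described in these notes can be read with the standard csv module. The following sketch is illustrative only: parse_distance_csv is a hypothetical name, it returns plain lists instead of a DistanceMatrix, and it raises ValueError where from_csv would raise PhyloZooParseError.

```python
import csv
import io


def parse_distance_csv(text, delimiter=","):
    """Parse a labeled square matrix with a header row into (labels, matrix)."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one data row")
    header = [cell.strip() for cell in rows[0][1:]]  # first cell is empty
    labels, matrix = [], []
    for row in rows[1:]:
        labels.append(row[0].strip())
        matrix.append([float(cell) for cell in row[1:]])
    if labels != header or any(len(r) != len(labels) for r in matrix):
        raise ValueError("row labels or dimensions do not match the header")
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] != matrix[j][i]:
                raise ValueError(f"matrix is not symmetric at ({i}, {j})")
    return labels, matrix


text = ",A,B,C\nA,0.0,1.0,2.0\nB,1.0,0.0,1.0\nC,2.0,1.0,0.0\n"
labels, matrix = parse_distance_csv(text)
print(labels)        # ['A', 'B', 'C']
print(matrix[0][1])  # 1.0
```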
- phylozoo.core.distance.io.from_nexus(nexus_string: str, **kwargs: Any) → DistanceMatrix
Parse a NEXUS format string and create a DistanceMatrix.
- Parameters:
nexus_string (str) – NEXUS format string containing distance matrix data.
**kwargs – Additional arguments (currently unused, for compatibility).
- Returns:
Parsed distance matrix.
- Return type:
DistanceMatrix
- Raises:
PhyloZooParseError – If the NEXUS string is malformed or cannot be parsed (e.g., missing Taxa or DISTANCES blocks, mismatched number of taxa and matrix rows, invalid matrix format, invalid distance values).
Examples
>>> from phylozoo.core.distance.io import from_nexus
>>>
>>> # Lower triangular format
>>> nexus_str = '''#NEXUS
...
... BEGIN Taxa;
... DIMENSIONS ntax=3;
... TAXLABELS
... A
... B
... C
... ;
... END;
...
... BEGIN DISTANCES;
... DIMENSIONS ntax=3;
... FORMAT triangle=LOWER diagonal LABELS;
... MATRIX
... A 0.000000
... B 1.000000 0.000000
... C 2.000000 1.000000 0.000000
... ;
... END;'''
>>>
>>> dm = from_nexus(nexus_str)
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0
Notes
This parser supports:
A Taxa block with TAXLABELS
A DISTANCES block with FORMAT triangle=LOWER/UPPER/BOTH diagonal LABELS
Lower triangular, upper triangular, or full matrix formats
- phylozoo.core.distance.io.from_phylip(phylip_string: str, **kwargs: Any) → DistanceMatrix
Parse a PHYLIP format string and create a DistanceMatrix.
- Parameters:
phylip_string (str) – PHYLIP format string containing distance matrix data.
**kwargs – Additional arguments (currently unused, for compatibility).
- Returns:
Parsed distance matrix.
- Return type:
DistanceMatrix
- Raises:
PhyloZooParseError – If the PHYLIP string is malformed or cannot be parsed (e.g., empty string, invalid number of taxa, insufficient values, invalid distance values, non-symmetric matrix).
PhyloZooValueError – If the number of taxa is not positive.
Examples
>>> from phylozoo.core.distance.io import from_phylip
>>>
>>> phylip_str = '''3
... A 0.00000 1.00000 2.00000
... B 1.00000 0.00000 1.00000
... C 2.00000 1.00000 0.00000
... '''
>>>
>>> dm = from_phylip(phylip_str)
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0
Notes
This parser expects:
First line: number of taxa
Subsequent lines: taxon name (first 10 chars or until whitespace) followed by distances
Full matrix format (not just lower triangle)
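A full-matrix PHYLIP string of this shape can be read in a few lines. The sketch below is a simplified illustration (parse_phylip is a hypothetical name): it splits each row on whitespace only, so it does not handle the fixed 10-character name field, and it raises ValueError in place of the PhyloZoo exception types.

```python
def parse_phylip(text):
    """Parse a full-matrix PHYLIP distance string into (labels, matrix)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    n = int(lines[0].split()[0])  # first line: number of taxa
    if n <= 0:
        raise ValueError("number of taxa must be positive")
    if len(lines) - 1 < n:
        raise ValueError("fewer matrix rows than taxa")
    labels, matrix = [], []
    for line in lines[1 : n + 1]:
        fields = line.split()
        row = [float(value) for value in fields[1:]]
        if len(row) != n:
            raise ValueError(f"row {fields[0]!r} does not have {n} distances")
        labels.append(fields[0])
        matrix.append(row)
    return labels, matrix


text = "3\nA 0.00000 1.00000 2.00000\nB 1.00000 0.00000 1.00000\nC 2.00000 1.00000 0.00000\n"
labels, matrix = parse_phylip(text)
print(labels)        # ['A', 'B', 'C']
print(matrix[0][2])  # 2.0
```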
- phylozoo.core.distance.io.to_csv(distance_matrix: DistanceMatrix, **kwargs: Any) → str
Convert a distance matrix to CSV format string.
CSV format consists of:
First row: header with empty first cell, then taxon labels
Subsequent rows: taxon label in first column, then distances
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to convert.
**kwargs –
Additional arguments:
delimiter (str): Field delimiter (default: ',')
include_header (bool): Include header row (default: True)
- Returns:
The CSV format string representation of the distance matrix.
- Return type:
str
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.io import to_csv
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> csv_str = to_csv(dm)
>>> print(csv_str[:30])
,A,B,C
A,0.000000,1.000000,2.0
Notes
Default delimiter is comma. Use delimiter='\t' for tab-separated values.
- phylozoo.core.distance.io.to_nexus(distance_matrix: DistanceMatrix, **kwargs: Any) → str
Convert a distance matrix to a NEXUS format string.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to convert.
**kwargs –
Additional arguments:
triangle (str): Triangle format, one of 'LOWER', 'UPPER', or 'BOTH' (default: 'LOWER')
- Returns:
The NEXUS format string representation of the distance matrix.
- Return type:
str
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.io import to_nexus
>>>
>>> # Basic usage (lower triangular format)
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> nexus_str = to_nexus(dm)
>>> '#NEXUS' in nexus_str
True
>>> 'triangle=LOWER' in nexus_str
True
>>>
>>> # Upper triangular format
>>> nexus_str_upper = to_nexus(dm, triangle='UPPER')
>>> 'triangle=UPPER' in nexus_str_upper
True
Notes
The NEXUS format includes:
Taxa block with label names
DISTANCES block with matrix in specified triangle format
Format options: triangle=LOWER, triangle=UPPER, or triangle=BOTH
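As a rough picture of the lower-triangle layout, the sketch below assembles a NEXUS string with the same blocks as the from_nexus example. The helper name nexus_lower_triangle, the lack of indentation, and the six-decimal formatting are assumptions modeled on that example; the exact text produced by to_nexus may differ.

```python
def nexus_lower_triangle(labels, dist):
    """Build a NEXUS string with a Taxa block and a lower-triangle matrix."""
    n = len(labels)
    lines = ["#NEXUS", "", "BEGIN Taxa;", f"DIMENSIONS ntax={n};", "TAXLABELS"]
    lines.extend(str(label) for label in labels)
    lines += [";", "END;", ""]
    lines += [
        "BEGIN DISTANCES;",
        f"DIMENSIONS ntax={n};",
        "FORMAT triangle=LOWER diagonal LABELS;",
        "MATRIX",
    ]
    for i, label in enumerate(labels):
        # Row i holds the entries up to and including the diagonal.
        values = " ".join(f"{dist[i][j]:.6f}" for j in range(i + 1))
        lines.append(f"{label} {values}")
    lines += [";", "END;"]
    return "\n".join(lines)


text = nexus_lower_triangle(["A", "B", "C"], [[0, 1, 2], [1, 0, 1], [2, 1, 0]])
print(text.splitlines()[-3])  # C 2.000000 1.000000 0.000000
```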
- phylozoo.core.distance.io.to_phylip(distance_matrix: DistanceMatrix, **kwargs: Any) → str
Convert a distance matrix to PHYLIP format string.
PHYLIP format consists of:
First line: number of taxa
Subsequent lines: taxon name (padded to 10 chars) followed by all distances
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to convert.
**kwargs – Additional arguments (currently unused, for compatibility).
- Returns:
The PHYLIP format string representation of the distance matrix.
- Return type:
str
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.io import to_phylip
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> phylip_str = to_phylip(dm)
>>> print(phylip_str[:31])
3
A         0.00000 1.00000 2.0
Notes
Taxon names are padded to 10 characters (standard PHYLIP format). Distances are formatted with 5 decimal places.
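The padding and precision rules can be reproduced with ordinary format specifiers. This is a sketch under assumptions (phylip_text is a hypothetical helper, and the absence of a separator between the padded name and the first distance is inferred from the example output above), not the to_phylip source.

```python
def phylip_text(labels, dist):
    """Format a full distance matrix as a PHYLIP string."""
    lines = [str(len(labels))]
    for label, row in zip(labels, dist):
        name = f"{str(label):<10}"  # left-justify and pad to 10 characters
        lines.append(name + " ".join(f"{d:.5f}" for d in row))
    return "\n".join(lines) + "\n"


text = phylip_text(["A", "B", "C"], [[0, 1, 2], [1, 0, 1], [2, 1, 0]])
print(text[:31])
```

With this layout a name longer than 10 characters is not truncated and would run directly into the first distance, so a stricter writer would need to truncate or reject such names.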