distance#
Distance module.
This module provides classes and functions for working with distance matrices. A distance matrix represents pairwise distances between a set of labeled items, where distances satisfy properties such as symmetry and non-negativity. The public API (DistanceMatrix and the classifications, io submodules) is re-exported here; the implementation is split across the base, classifications, and io submodules.
Main Classes#
Distance matrix base module.
This module provides the core DistanceMatrix class for working with distance matrices.
- class phylozoo.core.distance.base.DistanceMatrix(distance_matrix: ndarray, labels: list[T] | None = None)[source]#
Bases: IOMixin
An immutable distance matrix.
A DistanceMatrix represents pairwise distances between a set of labeled items. The matrix is stored as a symmetric numpy array and is immutable after initialization.
- Parameters:
distance_matrix (numpy.ndarray) – A symmetric square 2D numpy array representing pairwise distances. Must be square and symmetric.
labels (list[T] | None, optional) – List of labels corresponding to the rows/columns of the distance matrix. If None, defaults to [0, 1, 2, …, n-1] where n is the matrix size. By default None.
Notes
The class is immutable after initialization. To create a modified version, create a new DistanceMatrix instance with the modified data.
Supported I/O formats:
nexus (default): .nexus, .nex, .nxs
phylip: .phy, .phylip
csv: .csv
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>>
>>> # Create from numpy array
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0
>>>
>>> # Default labels (0, 1, 2, ...)
>>> dm2 = DistanceMatrix(matrix)
>>> dm2.labels
(0, 1, 2)
- __contains__(label: T) bool[source]#
Check if a label is in the distance matrix.
- Parameters:
label (T) – Label to check.
- Returns:
True if label is in the matrix, False otherwise.
- Return type:
bool
- __len__() int[source]#
Return the size of the distance matrix.
- Returns:
Number of rows/columns.
- Return type:
int
- __repr__() str[source]#
Return string representation of the distance matrix.
- Returns:
String representation.
- Return type:
str
- __str__() str[source]#
Return human-readable string representation.
For small matrices (up to 10 elements), prints the full upper triangle. For larger matrices, truncates the display. Always includes element names.
- Returns:
Human-readable string with matrix contents (upper triangle only).
- Return type:
str
- copy() DistanceMatrix[source]#
Create a copy of the distance matrix.
- Returns:
A new DistanceMatrix instance with copied data (also immutable).
- Return type:
DistanceMatrix
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> dm1 = DistanceMatrix(np.array([[0, 1], [1, 0]]), labels=['A', 'B'])
>>> dm2 = dm1.copy()
>>> dm1 is dm2
False
>>> dm1.get_distance('A', 'B') == dm2.get_distance('A', 'B')
True
- get_distance(label1: T, label2: T) float[source]#
Get the distance between two labels.
- Parameters:
label1 (T) – First label.
label2 (T) – Second label.
- Returns:
Distance between the two labels.
- Return type:
float
- Raises:
ValueError – If either label is not found in the distance matrix.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> dm = DistanceMatrix(np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]]),
...                     labels=['A', 'B', 'C'])
>>> dm.get_distance('A', 'B')
1.0
>>> dm.get_distance('B', 'A')  # Symmetric
1.0
>>> dm.get_distance('A', 'C')
2.0
- get_index(label: T) int[source]#
Get the index of a label in the distance matrix.
- Parameters:
label (T) – Label to look up.
- Returns:
Index of the label in the matrix.
- Return type:
int
- Raises:
PhyloZooValueError – If label is not found in the distance matrix.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> dm = DistanceMatrix(np.array([[0, 1], [1, 0]]), labels=['A', 'B'])
>>> dm.get_index('A')
0
>>> dm.get_index('B')
1
- property labels: tuple[Any, ...]#
Get the labels corresponding to rows/columns.
- Returns:
Tuple of labels (immutable).
- Return type:
tuple[Any, …]
- property np_array: ndarray#
Get the underlying numpy array (read-only).
- Returns:
The distance matrix as a read-only numpy array.
- Return type:
numpy.ndarray
Notes
The returned array is read-only. To modify, create a new DistanceMatrix.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> dm = DistanceMatrix(np.array([[0, 1], [1, 0]]), labels=['A', 'B'])
>>> arr = dm.np_array
>>> arr[0, 1]
1.0
Classification Functions#
Distance matrix classification module.
This module provides functions for classifying distance matrices based on mathematical properties: triangle inequality, metric properties (triangle inequality, symmetry, non-negativity), Kalmanson conditions (circular ordering constraints), and split-decomposition properties (tree metrics, total decomposability).
- phylozoo.core.distance.classifications.has_zero_diagonal(distance_matrix: DistanceMatrix) bool[source]#
Check if the diagonal of the distance matrix is zero.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
- Returns:
True if diagonal is zero, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import has_zero_diagonal
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> has_zero_diagonal(dm)
True
- phylozoo.core.distance.classifications.is_kalmanson(distance_matrix: DistanceMatrix, circular_order: CircularOrdering[T]) bool[source]#
Check if the distance matrix is Kalmanson with respect to a circular order.
A distance matrix is Kalmanson with respect to a circular order if it satisfies the Kalmanson inequalities for all quadruples of labels in that order.
The Kalmanson conditions are classical inequalities for circular metrics [Kalmanson, 1975].
For a circular order (e1, e2, …, en), the Kalmanson conditions are:
d(ei, ej) + d(ek, el) <= d(ei, ek) + d(ej, el) for all i < j < k < l
d(ei, el) + d(ej, ek) <= d(ei, ek) + d(ej, el) for all i < j < k < l
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
circular_order (CircularOrdering[T]) – A circular ordering of all labels in the distance matrix. Must contain the same elements as the distance matrix labels.
- Returns:
True if the matrix is Kalmanson with respect to the given order, False otherwise.
- Return type:
bool
- Raises:
PhyloZooValueError – If circular_order is empty, does not contain all labels, or if the matrix is not a pseudo-metric.
TypeError – If circular_order is not a CircularOrdering.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_kalmanson
>>> from phylozoo.core.primitives.circular_ordering import CircularOrdering
>>>
>>> # Kalmanson matrix (e.g., from a circular network)
>>> matrix = np.array([
...     [0, 1, 2, 2, 1],
...     [1, 0, 1, 2, 2],
...     [2, 1, 0, 1, 2],
...     [2, 2, 1, 0, 1],
...     [1, 2, 2, 1, 0]
... ])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C', 'D', 'E'])
>>> co = CircularOrdering(['A', 'B', 'C', 'D', 'E'])
>>> is_kalmanson(dm, co)
True
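The two Kalmanson inequalities can also be checked directly on a plain nested list. The sketch below is a standalone illustration and not the library's implementation; the function name is_kalmanson_sketch and the use of row indices in place of a CircularOrdering are assumptions made for this example.

```python
from itertools import combinations


def is_kalmanson_sketch(d, order):
    """Check both Kalmanson inequalities for every i < j < k < l in `order`.

    `d` is a square list of lists; `order` lists row indices in circular order.
    """
    # combinations() keeps the positions of `order`, so i, j, k, l appear
    # in the same relative order as in the circular ordering.
    for i, j, k, l in combinations(order, 4):
        crossing = d[i][k] + d[j][l]
        if d[i][j] + d[k][l] > crossing:
            return False
        if d[i][l] + d[j][k] > crossing:
            return False
    return True


# Shortest-path metric of a 5-cycle (same matrix as in the example above).
cycle = [
    [0, 1, 2, 2, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 2],
    [2, 2, 1, 0, 1],
    [1, 2, 2, 1, 0],
]

in_order = is_kalmanson_sketch(cycle, [0, 1, 2, 3, 4])
swapped = is_kalmanson_sketch(cycle, [0, 2, 1, 3, 4])
print(in_order, swapped)
```

The second call swaps two neighbours in the ordering, which shows that the property depends on the circular order and not only on the matrix.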
- phylozoo.core.distance.classifications.is_metric(distance_matrix: DistanceMatrix) bool[source]#
Check if the distance matrix is a metric.
A metric distance matrix satisfies:
Non-negativity: d(x, y) >= 0 for all x, y
Triangle inequality: d(x, z) <= d(x, y) + d(y, z) for all x, y, z
Zero diagonal: d(x, x) = 0 for all x
Symmetry: d(x, y) = d(y, x) for all x, y (already enforced in constructor)
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
- Returns:
True if the matrix is a metric, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_metric
>>>
>>> # Euclidean distance matrix (metric)
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> is_metric(dm)
True
>>>
>>> # Non-metric (violates triangle inequality)
>>> bad_matrix = np.array([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
>>> bad_dm = DistanceMatrix(bad_matrix)
>>> is_metric(bad_dm)
False
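To see which of the four listed properties fails for a given matrix, each can be evaluated separately. This is a minimal standalone sketch on nested lists; the helper name check_metric_axioms and its dictionary keys are hypothetical and not part of phylozoo.

```python
def check_metric_axioms(d):
    """Return a dict naming each metric axiom and whether `d` satisfies it."""
    idx = range(len(d))
    return {
        "non_negative": all(d[x][y] >= 0 for x in idx for y in idx),
        "zero_diagonal": all(d[x][x] == 0 for x in idx),
        "symmetric": all(d[x][y] == d[y][x] for x in idx for y in idx),
        "triangle": all(
            d[x][z] <= d[x][y] + d[y][z] for x in idx for y in idx for z in idx
        ),
    }


good = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]

good_report = check_metric_axioms(good)
bad_report = check_metric_axioms(bad)
print(all(good_report.values()))
print(bad_report)
```

For the second matrix only the triangle inequality fails, because d(0, 2) = 5 exceeds d(0, 1) + d(1, 2) = 2.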
- phylozoo.core.distance.classifications.is_nonnegative(distance_matrix: DistanceMatrix) bool[source]#
Check if all distances are non-negative.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
- Returns:
True if all distances are non-negative, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_nonnegative
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> is_nonnegative(dm)
True
- phylozoo.core.distance.classifications.is_pseudo_metric(distance_matrix: DistanceMatrix) bool[source]#
Check if the distance matrix is a pseudo-metric.
A pseudo-metric distance matrix satisfies:
Non-negativity: d(x, y) >= 0 for all x, y
Triangle inequality: d(x, z) <= d(x, y) + d(y, z) for all x, y, z
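Note: As defined here, a pseudo-metric does not require d(x, x) = 0 (though it may still hold). This differs from the standard textbook definition, which keeps d(x, x) = 0 and only drops the requirement that d(x, y) = 0 implies x = y.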
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
- Returns:
True if the matrix is a pseudo-metric, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_pseudo_metric
>>>
>>> # Pseudo-metric (satisfies non-negativity and triangle inequality)
>>> matrix = np.array([[0.1, 1, 2], [1, 0.1, 1], [2, 1, 0.1]])
>>> dm = DistanceMatrix(matrix)
>>> is_pseudo_metric(dm)
True
>>>
>>> # Not a pseudo-metric (violates triangle inequality)
>>> bad_matrix = np.array([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
>>> bad_dm = DistanceMatrix(bad_matrix)
>>> is_pseudo_metric(bad_dm)
False
- phylozoo.core.distance.classifications.is_totally_decomposable(distance_matrix: DistanceMatrix, atol: float = 1e-10) bool[source]#
Check if the distance matrix is totally decomposable.
A distance matrix is totally decomposable if its split-prime residual d^0 is zero, i.e., d can be expressed exactly as a weighted sum of split metrics:
d = sum_S alpha_S * delta_S
This is equivalent to saying that all pairwise distances are fully explained by the d-splits (no indecomposable noise remains).
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
atol (float, optional) – Absolute tolerance used to compare the residual to zero. By default 1e-10.
- Returns:
True if the residual is zero within atol, False otherwise.
- Return type:
bool
See also
split_decomposition: Returns the residual directly.
is_tree_metric: Stronger condition (totally decomposable + compatible splits).
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_totally_decomposable
>>>
>>> # Table 1 from [Bandelt and Dress, 1992] has zero residual
>>> d = np.array([
...     [ 0,  4,  5,  7, 13,  8,  6],
...     [ 4,  0,  1,  3,  9, 12, 10],
...     [ 5,  1,  0,  2,  8, 13, 11],
...     [ 7,  3,  2,  0,  6, 11, 13],
...     [13,  9,  8,  6,  0,  5,  7],
...     [ 8, 12, 13, 11,  5,  0,  2],
...     [ 6, 10, 11, 13,  7,  2,  0],
... ], dtype=float)
>>> dm = DistanceMatrix(d, labels=list("ABCDEFG"))
>>> is_totally_decomposable(dm)
True
- phylozoo.core.distance.classifications.is_tree_metric(distance_matrix: DistanceMatrix, atol: float = 1e-10) bool[source]#
Check if the distance matrix is a tree metric.
A distance matrix is a tree metric if and only if it satisfies the four-point condition: for every choice of four elements i, j, k, l the maximum of the three sums
{d_ij + d_kl, d_ik + d_jl, d_il + d_jk}
is attained by at least two of them [Bandelt and Dress, 1992]. Equivalently, the split decomposition has zero residual and all d-splits are pairwise compatible.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
atol (float, optional) – Absolute tolerance for floating-point comparisons. By default 1e-10.
- Returns:
True if the four-point condition holds for every quartet, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_tree_metric
>>>
>>> # Path-tree metric on four taxa (1-2-3-4)
>>> matrix = np.array([
...     [0, 1, 2, 3],
...     [1, 0, 1, 2],
...     [2, 1, 0, 1],
...     [3, 2, 1, 0],
... ], dtype=float)
>>> dm = DistanceMatrix(matrix)
>>> is_tree_metric(dm)
True
>>>
>>> # Not a tree metric: shortest-path metric of a 4-cycle, where
>>> # d_02 + d_13 = 4 strictly exceeds both other quartet sums (2 each)
>>> bad = np.array([[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]], dtype=float)
>>> dm2 = DistanceMatrix(bad)
>>> is_tree_metric(dm2)
False
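The four-point condition itself is short enough to spell out. The following standalone sketch works on nested lists and is not the library's implementation; the name four_point_sketch is an assumption. The first matrix is the path metric from the example above, and the second is the shortest-path metric of a 4-cycle, chosen here for illustration.

```python
from itertools import combinations


def four_point_sketch(d, atol=1e-10):
    """Return True if, for every quartet, the two largest sums agree within atol."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted(
            [d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]]
        )
        # The maximum must be attained at least twice.
        if sums[2] - sums[1] > atol:
            return False
    return True


path = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
square = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

path_ok = four_point_sketch(path)
square_ok = four_point_sketch(square)
print(path_ok, square_ok)
```

For the path metric the quartet sums are 2, 4 and 4, so the maximum is attained twice. For the 4-cycle they are 2, 4 and 2, so the maximum is attained only once and the condition fails.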
- phylozoo.core.distance.classifications.satisfies_triangle_inequality(distance_matrix: DistanceMatrix) bool[source]#
Check if the distance matrix satisfies the triangle inequality.
A distance matrix satisfies the triangle inequality if: d(i,k) <= d(i,j) + d(j,k) for all i, j, k.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to check.
- Returns:
True if triangle inequality holds, False otherwise.
- Return type:
bool
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import satisfies_triangle_inequality
>>>
>>> # Matrix satisfying triangle inequality
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> satisfies_triangle_inequality(dm)
True
>>>
>>> # Matrix violating triangle inequality
>>> bad_matrix = np.array([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
>>> bad_dm = DistanceMatrix(bad_matrix)
>>> satisfies_triangle_inequality(bad_dm)
False
Split Decomposition#
Split decomposition module for distance matrices.
Implements the canonical split decomposition of [Bandelt and Dress, 1992]:
d = d^0 + sum_S alpha_S * delta_S
where S ranges over all d-splits, alpha_S is the isolation index, delta_S is the split metric, and d^0 is the split-prime residual.
- phylozoo.core.distance.decomposition.isolation_index(distance_matrix: DistanceMatrix, split: Split) float[source]#
Compute the isolation index of a split with respect to a distance matrix.
The isolation index of a split (A, B) is defined as:
alpha_{A,B} = (1/2) * min_{i,j in A, k,l in B} (max{d_ij+d_kl, d_ik+d_jl, d_il+d_jk} - d_ij - d_kl)
The index is always non-negative. It is strictly positive if and only if (A, B) is a d-split of distance_matrix, i.e., for every choice of i, j in A and k, l in B the sum d_ij + d_kl is strictly smaller than the largest of the three quartet sums ([Bandelt and Dress, 1992], Eq. 1-2).
For trivial splits (|A| = 1 or |B| = 1) the formula reduces to:
alpha = (1/2) * min_{k != l in B} (d_ak + d_al - d_kl)
where a is the single element of A (and symmetrically for |B| = 1).
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to evaluate against.
split (Split) – The split whose isolation index is to be computed. All elements of the split must be present as labels in distance_matrix.
- Returns:
The isolation index (>= 0).
- Return type:
float
- Raises:
PhyloZooValueError – If any element of split is not a label in distance_matrix.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.decomposition import isolation_index
>>> from phylozoo.core.split import Split
>>>
>>> # Table 1 from [Bandelt and Dress, 1992]: split EFG|ABCD has index 6
>>> d = np.array([
...     [ 0,  4,  5,  7, 13,  8,  6],
...     [ 4,  0,  1,  3,  9, 12, 10],
...     [ 5,  1,  0,  2,  8, 13, 11],
...     [ 7,  3,  2,  0,  6, 11, 13],
...     [13,  9,  8,  6,  0,  5,  7],
...     [ 8, 12, 13, 11,  5,  0,  2],
...     [ 6, 10, 11, 13,  7,  2,  0],
... ], dtype=float)
>>> dm = DistanceMatrix(d, labels=list("ABCDEFG"))
>>> s = Split({"E", "F", "G"}, {"A", "B", "C", "D"})
>>> isolation_index(dm, s)
6.0
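The defining formula can be evaluated by brute force over all i, j in A and k, l in B (repeats allowed). The sketch below is a standalone illustration on nested lists, not the library's code; isolation_index_sketch and the index-based sides are assumptions made for this example.

```python
from itertools import product


def isolation_index_sketch(d, side_a, side_b):
    """Brute-force isolation index of the split side_a | side_b (row indices)."""
    best = float("inf")
    for i, j in product(side_a, repeat=2):
        for k, l in product(side_b, repeat=2):
            within = d[i][j] + d[k][l]
            largest = max(within, d[i][k] + d[j][l], d[i][l] + d[j][k])
            # `within` is one of the three sums, so this term is never negative.
            best = min(best, largest - within)
    return 0.5 * best


# Same 7 x 7 matrix as in the example above, rows ordered A..G.
d = [
    [0, 4, 5, 7, 13, 8, 6],
    [4, 0, 1, 3, 9, 12, 10],
    [5, 1, 0, 2, 8, 13, 11],
    [7, 3, 2, 0, 6, 11, 13],
    [13, 9, 8, 6, 0, 5, 7],
    [8, 12, 13, 11, 5, 0, 2],
    [6, 10, 11, 13, 7, 2, 0],
]
pos = {name: n for n, name in enumerate("ABCDEFG")}

alpha_efg = isolation_index_sketch(
    d, [pos[x] for x in "EFG"], [pos[x] for x in "ABCD"]
)
alpha_ac = isolation_index_sketch(
    d, [pos[x] for x in "AC"], [pos[x] for x in "BDEFG"]
)
print(alpha_efg, alpha_ac)
```

The split EFG|ABCD gives 6.0, matching the example above, while AC|BDEFG gives 0.0 and is therefore not a d-split of this matrix. The loop is quartic in the side sizes, which is fine for an illustration but not for large inputs.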
- phylozoo.core.distance.decomposition.split_decomposition(distance_matrix: DistanceMatrix) tuple[WeightedSplitSystem, DistanceMatrix][source]#
Compute the canonical split decomposition of a distance matrix.
Decomposes the distance matrix d as:
d = d^0 + sum_S alpha_S * delta_S
where S ranges over all d-splits (partitions with positive isolation index), alpha_S is the isolation index of S, delta_S is the corresponding split metric (delta_S(i,j) = 1 iff i and j are on opposite sides of S, 0 otherwise), and d^0 is the split-prime residual — a metric that admits no further splits with positive isolation index.
The algorithm is the recursive procedure of [Bandelt and Dress, 1992] (Section “Finding the d-Splits”): taxa are added one at a time; at each step the existing d-splits of the current subset are extended by placing the new taxon on either side, and the new trivial split (all prior taxa | new taxon) is also tested.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to decompose.
- Returns:
weighted_system (WeightedSplitSystem) – All d-splits with their isolation indices as weights. Includes trivial splits (|A| = 1 or |B| = 1) if their isolation index is positive. Empty if no d-splits exist (only possible for n < 2).
residual (DistanceMatrix) – The split-prime residual d^0 = d - d^1, where d^1 = sum_S alpha_S * delta_S. Has the same labels as the input.
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.decomposition import split_decomposition
>>>
>>> # Table 1 from [Bandelt and Dress, 1992]: four d-splits, zero residual
>>> d = np.array([
...     [ 0,  4,  5,  7, 13,  8,  6],
...     [ 4,  0,  1,  3,  9, 12, 10],
...     [ 5,  1,  0,  2,  8, 13, 11],
...     [ 7,  3,  2,  0,  6, 11, 13],
...     [13,  9,  8,  6,  0,  5,  7],
...     [ 8, 12, 13, 11,  5,  0,  2],
...     [ 6, 10, 11, 13,  7,  2,  0],
... ], dtype=float)
>>> dm = DistanceMatrix(d, labels=list("ABCDEFG"))
>>> system, residual = split_decomposition(dm)
>>> len(system.splits)
4
>>> np.allclose(residual.np_array, 0)
True
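The identity d = d^0 + sum_S alpha_S * delta_S can be checked by hand for this matrix. In the sketch below the four weighted splits were worked out by hand for this illustration and are not taken from the library's output; the helper name split_metric_sum is likewise an assumption. Each split is given by one side, the other side being its complement.

```python
labels = "ABCDEFG"
d = [
    [0, 4, 5, 7, 13, 8, 6],
    [4, 0, 1, 3, 9, 12, 10],
    [5, 1, 0, 2, 8, 13, 11],
    [7, 3, 2, 0, 6, 11, 13],
    [13, 9, 8, 6, 0, 5, 7],
    [8, 12, 13, 11, 5, 0, 2],
    [6, 10, 11, 13, 7, 2, 0],
]

# (one side of the split, weight alpha_S); hand-derived for this sketch.
weighted_splits = [("BCDE", 4.0), ("CDE", 1.0), ("DEF", 2.0), ("EFG", 6.0)]


def split_metric_sum(labels, weighted_splits):
    """Return d^1 = sum_S alpha_S * delta_S as a nested list."""
    n = len(labels)
    total = [[0.0] * n for _ in range(n)]
    for side, alpha in weighted_splits:
        for x in range(n):
            for y in range(n):
                # delta_S(x, y) = 1 iff x and y lie on opposite sides of S.
                if (labels[x] in side) != (labels[y] in side):
                    total[x][y] += alpha
    return total


d1 = split_metric_sum(labels, weighted_splits)
residual = [[d[x][y] - d1[x][y] for y in range(7)] for x in range(7)]
max_residual = max(abs(v) for row in residual for v in row)
print(max_residual)
```

With these four splits the residual is zero everywhere, which is consistent with the "four d-splits, zero residual" statement in the example above.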
I/O Support#
Distance matrix I/O module.
Distance matrices support reading and writing in multiple formats: NEXUS, PHYLIP, and CSV. This module provides format handlers registered with FormatRegistry for use with the IOMixin system.
The following format handlers are defined and registered:
nexus: NEXUS format for distance matrices (extensions: .nexus, .nex, .nxs)
Writer: to_nexus() - Converts DistanceMatrix to NEXUS string
Reader: from_nexus() - Parses NEXUS string to DistanceMatrix
phylip: PHYLIP format for distance matrices (extensions: .phy, .phylip)
Writer: to_phylip() - Converts DistanceMatrix to PHYLIP string
Reader: from_phylip() - Parses PHYLIP string to DistanceMatrix
csv: CSV format for distance matrices (extensions: .csv)
Writer: to_csv() - Converts DistanceMatrix to CSV string
Reader: from_csv() - Parses CSV string to DistanceMatrix
These handlers are automatically registered when this module is imported. DistanceMatrix inherits from IOMixin, so you can use:
dm.save('file.nexus') - Save to file (auto-detects format)
dm.load('file.nexus') - Load from file (auto-detects format)
dm.to_string(format='phylip') - Convert to string
dm.from_string(string, format='csv') - Parse from string
DistanceMatrix.convert('in.nexus', 'out.phy') - Convert between formats
DistanceMatrix.convert_string(str1, 'nexus', 'phylip') - Convert strings
- phylozoo.core.distance.io.from_csv(csv_string: str, **kwargs: Any) DistanceMatrix[source]#
Parse a CSV format string and create a DistanceMatrix.
- Parameters:
csv_string (str) – CSV format string containing distance matrix data.
**kwargs –
Additional arguments:
delimiter (str): Field delimiter (default: ','). Can be ',', '\t', or whitespace
has_header (bool): Whether first row is a header (default: True)
- Returns:
Parsed distance matrix.
- Return type:
DistanceMatrix
- Raises:
PhyloZooParseError – If the CSV string is malformed or cannot be parsed (e.g., empty string, no data rows, invalid distance values, mismatched dimensions, non-symmetric matrix).
Examples
>>> from phylozoo.core.distance.io import from_csv
>>>
>>> csv_str = ''',A,B,C
... A,0.0,1.0,2.0
... B,1.0,0.0,1.0
... C,2.0,1.0,0.0
... '''
>>>
>>> dm = from_csv(csv_str)
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0
Notes
This parser expects:
First row (if has_header=True): empty first cell, then taxon labels
Subsequent rows: taxon label in first column, then distances
Delimiter can be comma, tab, or whitespace
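The expected layout can be illustrated with the standard csv module. This is a standalone sketch, not the code behind from_csv: the function parse_csv_matrix is hypothetical, it assumes a header row is present, and it raises a plain ValueError where the library would raise PhyloZooParseError.

```python
import csv
import io


def parse_csv_matrix(text, delimiter=","):
    """Parse a labelled square matrix; returns (labels, rows of floats)."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one data row")
    header = rows[0][1:]  # first cell of the header is empty
    labels, matrix = [], []
    for row in rows[1:]:
        labels.append(row[0])
        matrix.append([float(v) for v in row[1:]])
    if labels != header:
        raise ValueError("row labels do not match header labels")
    n = len(labels)
    if any(len(r) != n for r in matrix):
        raise ValueError("matrix is not square")
    if any(matrix[i][j] != matrix[j][i] for i in range(n) for j in range(i)):
        raise ValueError("matrix is not symmetric")
    return labels, matrix


csv_text = ",A,B,C\nA,0.0,1.0,2.0\nB,1.0,0.0,1.0\nC,2.0,1.0,0.0\n"
csv_labels, csv_matrix = parse_csv_matrix(csv_text)
print(csv_labels, csv_matrix[0])
```

Passing delimiter="\t" handles tab-separated input in the same way.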
- phylozoo.core.distance.io.from_nexus(nexus_string: str, **kwargs: Any) DistanceMatrix[source]#
Parse a NEXUS format string and create a DistanceMatrix.
- Parameters:
nexus_string (str) – NEXUS format string containing distance matrix data.
**kwargs – Additional arguments (currently unused, for compatibility).
- Returns:
Parsed distance matrix.
- Return type:
DistanceMatrix
- Raises:
PhyloZooParseError – If the NEXUS string is malformed or cannot be parsed (e.g., missing Taxa or DISTANCES blocks, mismatched number of taxa and matrix rows, invalid matrix format, invalid distance values).
Examples
>>> from phylozoo.core.distance.io import from_nexus
>>>
>>> # Lower triangular format
>>> nexus_str = '''#NEXUS
...
... BEGIN Taxa;
... DIMENSIONS ntax=3;
... TAXLABELS
... A
... B
... C
... ;
... END;
...
... BEGIN DISTANCES;
... DIMENSIONS ntax=3;
... FORMAT triangle=LOWER diagonal LABELS;
... MATRIX
... A 0.000000
... B 1.000000 0.000000
... C 2.000000 1.000000 0.000000
... ;
... END;'''
>>>
>>> dm = from_nexus(nexus_str)
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0
Notes
This parser supports:
A Taxa block with TAXLABELS
A DISTANCES block with FORMAT triangle=LOWER/UPPER/BOTH diagonal LABELS
Lower triangular, upper triangular, or full matrix formats
- phylozoo.core.distance.io.from_phylip(phylip_string: str, **kwargs: Any) DistanceMatrix[source]#
Parse a PHYLIP format string and create a DistanceMatrix.
- Parameters:
phylip_string (str) – PHYLIP format string containing distance matrix data.
**kwargs – Additional arguments (currently unused, for compatibility).
- Returns:
Parsed distance matrix.
- Return type:
DistanceMatrix
- Raises:
PhyloZooParseError – If the PHYLIP string is malformed or cannot be parsed (e.g., empty string, invalid number of taxa, insufficient values, invalid distance values, non-symmetric matrix).
PhyloZooValueError – If the number of taxa is not positive.
Examples
>>> from phylozoo.core.distance.io import from_phylip
>>>
>>> phylip_str = '''3
... A 0.00000 1.00000 2.00000
... B 1.00000 0.00000 1.00000
... C 2.00000 1.00000 0.00000
... '''
>>>
>>> dm = from_phylip(phylip_str)
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0
Notes
This parser expects:
First line: number of taxa
Subsequent lines: taxon name (first 10 chars or until whitespace) followed by distances
Full matrix format (not just lower triangle)
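The three expectations above translate into a few lines of parsing. The sketch below is a standalone illustration and not the code behind from_phylip: parse_phylip_matrix is a hypothetical name, it only handles names terminated by whitespace (not the fixed 10-character case), and it raises ValueError instead of the library's exception types.

```python
def parse_phylip_matrix(text):
    """Parse a full-matrix PHYLIP string; returns (labels, rows of floats)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    n = int(lines[0].strip())  # first line: number of taxa
    if n <= 0:
        raise ValueError("number of taxa must be positive")
    if len(lines) - 1 < n:
        raise ValueError("fewer matrix rows than taxa")
    labels, matrix = [], []
    for ln in lines[1 : n + 1]:
        fields = ln.split()
        if len(fields) != n + 1:
            raise ValueError(f"expected a name and {n} distances: {ln!r}")
        labels.append(fields[0])
        matrix.append([float(v) for v in fields[1:]])
    return labels, matrix


phylip_text = (
    "3\n"
    "A 0.00000 1.00000 2.00000\n"
    "B 1.00000 0.00000 1.00000\n"
    "C 2.00000 1.00000 0.00000\n"
)
phy_labels, phy_matrix = parse_phylip_matrix(phylip_text)
print(phy_labels, phy_matrix[0][1])
```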
- phylozoo.core.distance.io.to_csv(distance_matrix: DistanceMatrix, **kwargs: Any) str[source]#
Convert a distance matrix to CSV format string.
CSV format consists of:
First row: header with empty first cell, then taxon labels
Subsequent rows: taxon label in first column, then distances
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to convert.
**kwargs –
Additional arguments:
delimiter (str): Field delimiter (default: ',')
include_header (bool): Include header row (default: True)
- Returns:
The CSV format string representation of the distance matrix.
- Return type:
str
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.io import to_csv
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> csv_str = to_csv(dm)
>>> print(csv_str[:30])
,A,B,C
A,0.000000,1.000000,2.0
Notes
Default delimiter is comma. Use delimiter='\t' for tab-separated values.
- phylozoo.core.distance.io.to_nexus(distance_matrix: DistanceMatrix, **kwargs: Any) str[source]#
Convert a distance matrix to a NEXUS format string.
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to convert.
**kwargs –
Additional arguments:
triangle (str): Triangle format, one of 'LOWER', 'UPPER', or 'BOTH' (default: 'LOWER')
- Returns:
The NEXUS format string representation of the distance matrix.
- Return type:
str
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.io import to_nexus
>>>
>>> # Basic usage (lower triangular format)
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> nexus_str = to_nexus(dm)
>>> '#NEXUS' in nexus_str
True
>>> 'triangle=LOWER' in nexus_str
True
>>>
>>> # Upper triangular format
>>> nexus_str_upper = to_nexus(dm, triangle='UPPER')
>>> 'triangle=UPPER' in nexus_str_upper
True
Notes
The NEXUS format includes:
Taxa block with label names
DISTANCES block with matrix in specified triangle format
Format options: triangle=LOWER, triangle=UPPER, or triangle=BOTH
- phylozoo.core.distance.io.to_phylip(distance_matrix: DistanceMatrix, **kwargs: Any) str[source]#
Convert a distance matrix to PHYLIP format string.
PHYLIP format consists of:
First line: number of taxa
Subsequent lines: taxon name (padded to 10 chars) followed by all distances
- Parameters:
distance_matrix (DistanceMatrix) – The distance matrix to convert.
**kwargs – Additional arguments (currently unused, for compatibility).
- Returns:
The PHYLIP format string representation of the distance matrix.
- Return type:
str
Examples
>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.io import to_phylip
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> phylip_str = to_phylip(dm)
>>> print(phylip_str[:31])
3
A         0.00000 1.00000 2.0
Notes
Taxon names are padded to 10 characters (standard PHYLIP format). Distances are formatted with 5 decimal places.
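The padding and precision rules can be reproduced with plain string formatting. This is a standalone sketch under stated assumptions, not the code behind to_phylip: the name format_phylip_matrix is hypothetical, the single space between distances is inferred from the 31-character prefix in the example above, and rejecting names longer than 10 characters is a choice made here for simplicity.

```python
def format_phylip_matrix(labels, matrix):
    """Format a full distance matrix as a PHYLIP-style string."""
    lines = [str(len(labels))]  # first line: number of taxa
    for label, row in zip(labels, matrix):
        name = str(label)
        if len(name) > 10:
            # Assumption of this sketch: refuse instead of truncating.
            raise ValueError(f"name longer than 10 characters: {name!r}")
        # Name padded to 10 characters, distances with 5 decimal places.
        lines.append(name.ljust(10) + " ".join(f"{v:.5f}" for v in row))
    return "\n".join(lines) + "\n"


phylip_out = format_phylip_matrix(
    ["A", "B", "C"],
    [[0, 1, 2], [1, 0, 1], [2, 1, 0]],
)
print(phylip_out)
```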