distance

Distance module.

This module provides classes and functions for working with distance matrices. A distance matrix stores the pairwise distances between a set of labeled items, where distances satisfy properties such as symmetry and non-negativity. The public API (DistanceMatrix and the classifications and io submodules) is re-exported here; the implementation is split across the base, classifications, and io submodules.

Main Classes

Distance matrix base module.

This module provides the core DistanceMatrix class for working with distance matrices.

class phylozoo.core.distance.base.DistanceMatrix(distance_matrix: ndarray, labels: list[T] | None = None)

Bases: IOMixin

An immutable distance matrix.

A DistanceMatrix represents pairwise distances between a set of labeled items. The matrix is stored as a symmetric numpy array and is immutable after initialization.

Parameters:
  • distance_matrix (numpy.ndarray) – A 2D numpy array of pairwise distances. Must be square and symmetric.

  • labels (list[T] | None, optional) – List of labels corresponding to the rows/columns of the distance matrix. If None, defaults to [0, 1, 2, …, n-1] where n is the matrix size. By default None.
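
The constructor requirements above (square, symmetric) and the default-label rule can be illustrated without the library. This is a minimal pure-Python sketch on nested lists; validate_square_symmetric is a hypothetical helper, not phylozoo's implementation, and the exact-equality symmetry test is an assumption (the real class may compare with a tolerance).

```python
def validate_square_symmetric(rows, labels=None):
    """Check that `rows` is square and symmetric; fill in default labels.

    Hypothetical helper for illustration only (not part of phylozoo).
    """
    n = len(rows)
    # Square: every row must have exactly n entries.
    if any(len(row) != n for row in rows):
        raise ValueError("matrix must be square")
    # Symmetric: d[i][j] must equal d[j][i] (exact comparison is an assumption).
    for i in range(n):
        for j in range(i + 1, n):
            if rows[i][j] != rows[j][i]:
                raise ValueError(f"matrix is not symmetric at ({i}, {j})")
    # Default labels are 0, 1, ..., n-1 when none are supplied.
    if labels is None:
        labels = range(n)
    return rows, tuple(labels)


rows, labels = validate_square_symmetric([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
print(labels)  # (0, 1, 2)
```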

Notes

The class is immutable after initialization. To create a modified version, create a new DistanceMatrix instance with the modified data.

Supported I/O formats:

  • nexus (default): .nexus, .nex, .nxs

  • phylip: .phy, .phylip

  • csv: .csv

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>>
>>> # Create from numpy array
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0
>>> # Default labels (0, 1, 2, ...)
>>> dm2 = DistanceMatrix(matrix)
>>> dm2.labels
(0, 1, 2)

__contains__(label: T) → bool

Check if a label is in the distance matrix.

Parameters:

label (T) – Label to check.

Returns:

True if label is in the matrix, False otherwise.

Return type:

bool

__len__() → int

Return the size of the distance matrix.

Returns:

Number of rows/columns.

Return type:

int

__repr__() → str

Return string representation of the distance matrix.

Returns:

String representation.

Return type:

str

__str__() → str

Return human-readable string representation.

For small matrices (up to 10 elements), prints the full upper triangle. For larger matrices, truncates the display. Always includes element names.

Returns:

Human-readable string with matrix contents (upper triangle only).

Return type:

str

copy() → DistanceMatrix

Create a copy of the distance matrix.

Returns:

A new DistanceMatrix instance with copied data (also immutable).

Return type:

DistanceMatrix

Examples

>>> import numpy as np
>>> dm1 = DistanceMatrix(np.array([[0, 1], [1, 0]]), labels=['A', 'B'])
>>> dm2 = dm1.copy()
>>> dm1 is dm2
False
>>> dm1.get_distance('A', 'B') == dm2.get_distance('A', 'B')
True

get_distance(label1: T, label2: T) → float

Get the distance between two labels.

Parameters:
  • label1 (T) – First label.

  • label2 (T) – Second label.

Returns:

Distance between the two labels.

Return type:

float

Raises:

ValueError – If either label is not found in the distance matrix.

Examples

>>> import numpy as np
>>> dm = DistanceMatrix(np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]]),
...                     labels=['A', 'B', 'C'])
>>> dm.get_distance('A', 'B')
1.0
>>> dm.get_distance('B', 'A')  # Symmetric
1.0
>>> dm.get_distance('A', 'C')
2.0

get_index(label: T) → int

Get the index of a label in the distance matrix.

Parameters:

label (T) – Label to look up.

Returns:

Index of the label in the matrix.

Return type:

int

Raises:

PhyloZooValueError – If label is not found in the distance matrix.

Examples

>>> import numpy as np
>>> dm = DistanceMatrix(np.array([[0, 1], [1, 0]]), labels=['A', 'B'])
>>> dm.get_index('A')
0
>>> dm.get_index('B')
1

property indices: tuple[int, ...]

Get the indices (0, 1, 2, …, len(self)-1).

Returns:

Tuple of indices (immutable).

Return type:

tuple[int, …]

property labels: tuple[Any, ...]

Get the labels corresponding to rows/columns.

Returns:

Tuple of labels (immutable).

Return type:

tuple[Any, …]

property np_array: ndarray

Get the underlying numpy array (read-only).

Returns:

The distance matrix as a read-only numpy array.

Return type:

numpy.ndarray

Notes

The returned array is read-only. To modify, create a new DistanceMatrix.

Examples

>>> import numpy as np
>>> dm = DistanceMatrix(np.array([[0, 1], [1, 0]]), labels=['A', 'B'])
>>> arr = dm.np_array
>>> arr[0, 1]
1.0

Classification Functions

Distance matrix classification module.

This module provides functions for classifying distance matrices based on mathematical properties: triangle inequality, metric properties (triangle inequality, symmetry, non-negativity), Kalmanson conditions (circular ordering constraints), and split-decomposition properties (tree metrics, total decomposability).

phylozoo.core.distance.classifications.has_zero_diagonal(distance_matrix: DistanceMatrix) → bool

Check if the diagonal of the distance matrix is zero.

Parameters:

distance_matrix (DistanceMatrix) – The distance matrix to check.

Returns:

True if diagonal is zero, False otherwise.

Return type:

bool

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import has_zero_diagonal
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> has_zero_diagonal(dm)
True

phylozoo.core.distance.classifications.is_kalmanson(distance_matrix: DistanceMatrix, circular_order: CircularOrdering[T]) → bool

Check if the distance matrix is Kalmanson with respect to a circular order.

A distance matrix is Kalmanson with respect to a circular order if it satisfies the Kalmanson inequalities for all quadruples of labels in that order.

The Kalmanson conditions are classical inequalities for circular metrics [Kalmanson, 1975].

For a circular order (x1, x2, …, xn), the Kalmanson conditions are:

  • d(xi, xj) + d(xk, xl) <= d(xi, xk) + d(xj, xl) for all i < j < k < l

  • d(xi, xl) + d(xj, xk) <= d(xi, xk) + d(xj, xl) for all i < j < k < l
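
The two inequalities above can be checked by brute force over all quadruples taken in circular order. The following is a minimal pure-Python sketch on nested lists; is_kalmanson_sketch is a hypothetical helper, not the library's implementation, and it skips the input validation (pseudo-metric check, label coverage) that is_kalmanson performs.

```python
from itertools import combinations


def is_kalmanson_sketch(d, order):
    """Check both Kalmanson inequalities for every quadruple in `order`.

    `d` is a nested list of distances and `order` lists the row indices in
    circular order. Hypothetical helper for illustration only.
    """
    # combinations() keeps the positional order of `order`, so each tuple
    # (i, j, k, l) respects i < j < k < l along the circular ordering.
    for i, j, k, l in combinations(order, 4):
        crossing = d[i][k] + d[j][l]  # right-hand side of both inequalities
        if d[i][j] + d[k][l] > crossing:
            return False
        if d[i][l] + d[j][k] > crossing:
            return False
    return True


# Shortest-path distances on a 5-cycle with unit edges.
cycle = [
    [0, 1, 2, 2, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 2],
    [2, 2, 1, 0, 1],
    [1, 2, 2, 1, 0],
]
print(is_kalmanson_sketch(cycle, [0, 1, 2, 3, 4]))  # True
print(is_kalmanson_sketch(cycle, [0, 2, 1, 3, 4]))  # False
```

The second call shows that the property depends on the ordering: the same matrix fails once two neighbours are swapped.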

Parameters:
  • distance_matrix (DistanceMatrix) – The distance matrix to check.

  • circular_order (CircularOrdering[T]) – A circular ordering of all labels in the distance matrix. Must contain the same elements as the distance matrix labels.

Returns:

True if the matrix is Kalmanson with respect to the given order, False otherwise.

Return type:

bool

Raises:
  • PhyloZooValueError – If circular_order is empty, does not contain all labels, or if the matrix is not a pseudo-metric.

  • TypeError – If circular_order is not a CircularOrdering.

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_kalmanson
>>> from phylozoo.core.primitives.circular_ordering import CircularOrdering
>>>
>>> # Kalmanson matrix (e.g., from a circular network)
>>> matrix = np.array([
...     [0, 1, 2, 2, 1],
...     [1, 0, 1, 2, 2],
...     [2, 1, 0, 1, 2],
...     [2, 2, 1, 0, 1],
...     [1, 2, 2, 1, 0]
... ])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C', 'D', 'E'])
>>> co = CircularOrdering(['A', 'B', 'C', 'D', 'E'])
>>> is_kalmanson(dm, co)
True

phylozoo.core.distance.classifications.is_metric(distance_matrix: DistanceMatrix) → bool

Check if the distance matrix is a metric.

A metric distance matrix satisfies:

  1. Non-negativity: d(x, y) >= 0 for all x, y

  2. Triangle inequality: d(x, z) <= d(x, y) + d(y, z) for all x, y, z

  3. Zero diagonal: d(x, x) = 0 for all x

  4. Symmetry: d(x, y) = d(y, x) for all x, y (already enforced in constructor)
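
The four conditions above translate directly into exhaustive checks. This is a minimal pure-Python sketch on nested lists, assuming exact comparisons with no floating-point tolerance; is_metric_sketch is a hypothetical helper and not how the library necessarily implements the test.

```python
from itertools import product


def is_metric_sketch(d):
    """Return True if nested list `d` passes all four checks listed above.

    Hypothetical helper for illustration only (exact comparisons).
    """
    idx = range(len(d))
    nonnegative = all(d[x][y] >= 0 for x, y in product(idx, repeat=2))
    triangle = all(
        d[x][z] <= d[x][y] + d[y][z] for x, y, z in product(idx, repeat=3)
    )
    zero_diagonal = all(d[x][x] == 0 for x in idx)
    symmetric = all(d[x][y] == d[y][x] for x, y in product(idx, repeat=2))
    return nonnegative and triangle and zero_diagonal and symmetric


print(is_metric_sketch([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))  # True
print(is_metric_sketch([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))  # False: 5 > 1 + 1
```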

Parameters:

distance_matrix (DistanceMatrix) – The distance matrix to check.

Returns:

True if the matrix is a metric, False otherwise.

Return type:

bool

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_metric
>>>
>>> # Euclidean distance matrix (metric)
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> is_metric(dm)
True
>>>
>>> # Non-metric (violates triangle inequality)
>>> bad_matrix = np.array([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
>>> bad_dm = DistanceMatrix(bad_matrix)
>>> is_metric(bad_dm)
False

phylozoo.core.distance.classifications.is_nonnegative(distance_matrix: DistanceMatrix) → bool

Check if all distances are non-negative.

Parameters:

distance_matrix (DistanceMatrix) – The distance matrix to check.

Returns:

True if all distances are non-negative, False otherwise.

Return type:

bool

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_nonnegative
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> is_nonnegative(dm)
True

phylozoo.core.distance.classifications.is_pseudo_metric(distance_matrix: DistanceMatrix) → bool

Check if the distance matrix is a pseudo-metric.

A pseudo-metric distance matrix satisfies:

  1. Non-negativity: d(x, y) >= 0 for all x, y

  2. Triangle inequality: d(x, z) <= d(x, y) + d(y, z) for all x, y, z

Note: As defined here, a pseudo-metric does not require d(x, x) = 0 (though it may still hold); the zero-diagonal condition is what is_metric checks in addition.

Parameters:

distance_matrix (DistanceMatrix) – The distance matrix to check.

Returns:

True if the matrix is a pseudo-metric, False otherwise.

Return type:

bool

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_pseudo_metric
>>>
>>> # Pseudo-metric (satisfies non-negativity and triangle inequality)
>>> matrix = np.array([[0.1, 1, 2], [1, 0.1, 1], [2, 1, 0.1]])
>>> dm = DistanceMatrix(matrix)
>>> is_pseudo_metric(dm)
True
>>>
>>> # Not a pseudo-metric (violates triangle inequality)
>>> bad_matrix = np.array([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
>>> bad_dm = DistanceMatrix(bad_matrix)
>>> is_pseudo_metric(bad_dm)
False

phylozoo.core.distance.classifications.is_totally_decomposable(distance_matrix: DistanceMatrix, atol: float = 1e-10) → bool

Check if the distance matrix is totally decomposable.

A distance matrix is totally decomposable if its split-prime residual d^0 is zero, i.e., d can be expressed exactly as a weighted sum of split metrics:

d = sum_S alpha_S * delta_S

This is equivalent to saying that all pairwise distances are fully explained by the d-splits (no indecomposable noise remains).

Parameters:
  • distance_matrix (DistanceMatrix) – The distance matrix to check.

  • atol (float, optional) – Absolute tolerance used to compare the residual to zero. By default 1e-10.

Returns:

True if the residual is zero within atol, False otherwise.

Return type:

bool

See also

split_decomposition

Returns the residual directly.

is_tree_metric

Stronger condition (totally decomposable + compatible splits).

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_totally_decomposable
>>>
>>> # Table 1 from [Bandelt and Dress, 1992] has zero residual
>>> d = np.array([
...     [ 0,  4,  5,  7, 13,  8,  6],
...     [ 4,  0,  1,  3,  9, 12, 10],
...     [ 5,  1,  0,  2,  8, 13, 11],
...     [ 7,  3,  2,  0,  6, 11, 13],
...     [13,  9,  8,  6,  0,  5,  7],
...     [ 8, 12, 13, 11,  5,  0,  2],
...     [ 6, 10, 11, 13,  7,  2,  0],
... ], dtype=float)
>>> dm = DistanceMatrix(d, labels=list("ABCDEFG"))
>>> is_totally_decomposable(dm)
True

phylozoo.core.distance.classifications.is_tree_metric(distance_matrix: DistanceMatrix, atol: float = 1e-10) → bool

Check if the distance matrix is a tree metric.

A distance matrix is a tree metric if and only if it satisfies the four-point condition: for every choice of four elements i, j, k, l the maximum of the three sums

{d_ij + d_kl, d_ik + d_jl, d_il + d_jk}

is attained by at least two of them [Bandelt and Dress, 1992]. Equivalently, the split decomposition has zero residual and all d-splits are pairwise compatible.
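
The four-point condition stated above can be checked quartet by quartet. The following is a minimal pure-Python sketch on nested lists; satisfies_four_point is a hypothetical helper, it only enumerates quartets of distinct indices, and it is not claimed to be the library's implementation.

```python
from itertools import combinations


def satisfies_four_point(d, atol=1e-10):
    """Return True if the two largest quartet sums agree for every quartet.

    Hypothetical helper for illustration only; `d` is a nested list.
    """
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted(
            [d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]]
        )
        # The maximum must be attained at least twice (up to atol).
        if sums[2] - sums[1] > atol:
            return False
    return True


# Path 0-1-2-3 with unit edges: a tree metric.
path = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
# Cycle 0-1-2-3-0 with unit edges: the sums are 2, 4, 2, so the maximum is unique.
square = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]

print(satisfies_four_point(path))    # True
print(satisfies_four_point(square))  # False
```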

Parameters:
  • distance_matrix (DistanceMatrix) – The distance matrix to check.

  • atol (float, optional) – Absolute tolerance for floating-point comparisons. By default 1e-10.

Returns:

True if the four-point condition holds for every quartet, False otherwise.

Return type:

bool

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import is_tree_metric
>>>
>>> # Path-tree metric on four taxa (0-1-2-3, default labels)
>>> matrix = np.array([
...     [0, 1, 2, 3],
...     [1, 0, 1, 2],
...     [2, 1, 0, 1],
...     [3, 2, 1, 0],
... ], dtype=float)
>>> dm = DistanceMatrix(matrix)
>>> is_tree_metric(dm)
True
>>>
>>> # Not a tree metric (unit 4-cycle; incompatible splits)
>>> bad = np.array([[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]], dtype=float)
>>> dm2 = DistanceMatrix(bad)
>>> is_tree_metric(dm2)
False

phylozoo.core.distance.classifications.satisfies_triangle_inequality(distance_matrix: DistanceMatrix) → bool

Check if the distance matrix satisfies the triangle inequality.

A distance matrix satisfies the triangle inequality if: d(i,k) <= d(i,j) + d(j,k) for all i, j, k.

Parameters:

distance_matrix (DistanceMatrix) – The distance matrix to check.

Returns:

True if triangle inequality holds, False otherwise.

Return type:

bool

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.classifications import satisfies_triangle_inequality
>>>
>>> # Matrix satisfying triangle inequality
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix)
>>> satisfies_triangle_inequality(dm)
True
>>>
>>> # Matrix violating triangle inequality
>>> bad_matrix = np.array([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
>>> bad_dm = DistanceMatrix(bad_matrix)
>>> satisfies_triangle_inequality(bad_dm)
False

Split Decomposition

Split decomposition module for distance matrices.

Implements the canonical split decomposition of [Bandelt and Dress, 1992]:

d = d^0 + sum_S alpha_S * delta_S

where S ranges over all d-splits, alpha_S is the isolation index, delta_S is the split metric, and d^0 is the split-prime residual.

phylozoo.core.distance.decomposition.isolation_index(distance_matrix: DistanceMatrix, split: Split) → float

Compute the isolation index of a split with respect to a distance matrix.

The isolation index of a split (A, B) is defined as:

alpha_{A,B} = (1/2) * min_{i,j in A, k,l in B}
    (max{d_ij+d_kl, d_ik+d_jl, d_il+d_jk} - d_ij - d_kl)

The index is always non-negative. It is strictly positive if and only if (A, B) is a d-split of distance_matrix, i.e., for every choice of i, j in A and k, l in B the sum d_ij + d_kl is strictly smaller than the largest of the three quartet sums ([Bandelt and Dress, 1992], Eq. 1-2).

For trivial splits (|A| = 1 or |B| = 1) the formula reduces to:

alpha = (1/2) * min_{k != l in B} (d_ak + d_al - d_kl)

where a is the single element of A (and symmetrically for |B| = 1).
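
The general formula above can be evaluated by brute force over all i, j in A and k, l in B (repeats allowed). This is a minimal pure-Python sketch on a nested list with integer indices; isolation_index_sketch is a hypothetical helper that takes the two sides as index lists, not the library's Split-based implementation.

```python
from itertools import product


def isolation_index_sketch(d, side_a, side_b):
    """Brute-force isolation index of the split (side_a, side_b).

    Hypothetical helper for illustration only; `d` is a nested list and
    the sides are lists of row indices.
    """
    gaps = []
    for i, j in product(side_a, repeat=2):
        for k, l in product(side_b, repeat=2):
            within = d[i][j] + d[k][l]
            largest = max(within, d[i][k] + d[j][l], d[i][l] + d[j][k])
            gaps.append(largest - within)
    return 0.5 * min(gaps)


# Same matrix as in the examples of this module (rows A..G = indices 0..6).
d = [
    [0, 4, 5, 7, 13, 8, 6],
    [4, 0, 1, 3, 9, 12, 10],
    [5, 1, 0, 2, 8, 13, 11],
    [7, 3, 2, 0, 6, 11, 13],
    [13, 9, 8, 6, 0, 5, 7],
    [8, 12, 13, 11, 5, 0, 2],
    [6, 10, 11, 13, 7, 2, 0],
]
print(isolation_index_sketch(d, [4, 5, 6], [0, 1, 2, 3]))  # 6.0
```

The enumeration is O(|A|^2 |B|^2), which is fine for illustration but is not meant as an efficient algorithm.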

Parameters:
  • distance_matrix (DistanceMatrix) – The distance matrix to evaluate against.

  • split (Split) – The split whose isolation index is to be computed. All elements of the split must be present as labels in distance_matrix.

Returns:

The isolation index (>= 0).

Return type:

float

Raises:

PhyloZooValueError – If any element of split is not a label in distance_matrix.

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.decomposition import isolation_index
>>> from phylozoo.core.split import Split
>>>
>>> # Table 1 from [Bandelt and Dress, 1992]: split EFG|ABCD has index 6
>>> d = np.array([
...     [ 0,  4,  5,  7, 13,  8,  6],
...     [ 4,  0,  1,  3,  9, 12, 10],
...     [ 5,  1,  0,  2,  8, 13, 11],
...     [ 7,  3,  2,  0,  6, 11, 13],
...     [13,  9,  8,  6,  0,  5,  7],
...     [ 8, 12, 13, 11,  5,  0,  2],
...     [ 6, 10, 11, 13,  7,  2,  0],
... ], dtype=float)
>>> dm = DistanceMatrix(d, labels=list("ABCDEFG"))
>>> s = Split({"E", "F", "G"}, {"A", "B", "C", "D"})
>>> isolation_index(dm, s)
6.0

phylozoo.core.distance.decomposition.split_decomposition(distance_matrix: DistanceMatrix) → tuple[WeightedSplitSystem, DistanceMatrix]

Compute the canonical split decomposition of a distance matrix.

Decomposes the distance matrix d as:

d = d^0 + sum_S alpha_S * delta_S

where S ranges over all d-splits (partitions with positive isolation index), alpha_S is the isolation index of S, delta_S is the corresponding split metric (delta_S(i,j) = 1 iff i and j are on opposite sides of S, 0 otherwise), and d^0 is the split-prime residual — a metric that admits no further splits with positive isolation index.

The algorithm is the recursive procedure of [Bandelt and Dress, 1992] (Section “Finding the d-Splits”): taxa are added one at a time; at each step the existing d-splits of the current subset are extended by placing the new taxon on either side, and the new trivial split (all prior taxa | new taxon) is also tested.

Parameters:

distance_matrix (DistanceMatrix) – The distance matrix to decompose.

Returns:

  • weighted_system (WeightedSplitSystem) – All d-splits with their isolation indices as weights. Includes trivial splits (|A| = 1 or |B| = 1) if their isolation index is positive. Empty if no d-splits exist (only possible for n < 2).

  • residual (DistanceMatrix) – The split-prime residual d^0 = d - d^1, where d^1 = sum_S alpha_S * delta_S. Has the same labels as the input.
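
To make the relation d^0 = d - d^1 concrete, the sketch below rebuilds d^1 as a weighted sum of split metrics and subtracts it from d, using plain nested lists. Only the split EFG|ABCD with weight 6 is taken from the examples in this module; the other three splits and weights were worked out by hand for this illustration and should be read as assumptions, and split_metric_sum is a hypothetical helper rather than part of the library.

```python
LABELS = "ABCDEFG"
D = [
    [0, 4, 5, 7, 13, 8, 6],
    [4, 0, 1, 3, 9, 12, 10],
    [5, 1, 0, 2, 8, 13, 11],
    [7, 3, 2, 0, 6, 11, 13],
    [13, 9, 8, 6, 0, 5, 7],
    [8, 12, 13, 11, 5, 0, 2],
    [6, 10, 11, 13, 7, 2, 0],
]
# (one side of the split, weight). Hand-derived for illustration, except
# EFG|ABCD with weight 6, which appears in the documented example.
SPLITS = [("BCDE", 4), ("CDE", 1), ("DEF", 2), ("EFG", 6)]


def split_metric_sum(labels, weighted_splits):
    """Return d^1 = sum_S alpha_S * delta_S as a nested list."""
    n = len(labels)
    total = [[0.0] * n for _ in range(n)]
    for side, weight in weighted_splits:
        for i, x in enumerate(labels):
            for j, y in enumerate(labels):
                # delta_S(x, y) = 1 iff x and y lie on opposite sides of S.
                if (x in side) != (y in side):
                    total[i][j] += weight
    return total


d1 = split_metric_sum(LABELS, SPLITS)
residual = [[D[i][j] - d1[i][j] for j in range(7)] for i in range(7)]
print(all(abs(v) < 1e-10 for row in residual for v in row))  # True
```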

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.decomposition import split_decomposition
>>>
>>> # Table 1 from [Bandelt and Dress, 1992]: four d-splits, zero residual
>>> d = np.array([
...     [ 0,  4,  5,  7, 13,  8,  6],
...     [ 4,  0,  1,  3,  9, 12, 10],
...     [ 5,  1,  0,  2,  8, 13, 11],
...     [ 7,  3,  2,  0,  6, 11, 13],
...     [13,  9,  8,  6,  0,  5,  7],
...     [ 8, 12, 13, 11,  5,  0,  2],
...     [ 6, 10, 11, 13,  7,  2,  0],
... ], dtype=float)
>>> dm = DistanceMatrix(d, labels=list("ABCDEFG"))
>>> system, residual = split_decomposition(dm)
>>> len(system.splits)
4
>>> np.allclose(residual.np_array, 0)
True

I/O Support

Distance matrix I/O module.

Distance matrices support reading and writing in multiple formats: NEXUS, PHYLIP, and CSV. This module provides format handlers registered with FormatRegistry for use with the IOMixin system.

The following format handlers are defined and registered:

  • nexus: NEXUS format for distance matrices (extensions: .nexus, .nex, .nxs)

    • Writer: to_nexus() - Converts DistanceMatrix to NEXUS string

    • Reader: from_nexus() - Parses NEXUS string to DistanceMatrix

  • phylip: PHYLIP format for distance matrices (extensions: .phy, .phylip)

    • Writer: to_phylip() - Converts DistanceMatrix to PHYLIP string

    • Reader: from_phylip() - Parses PHYLIP string to DistanceMatrix

  • csv: CSV format for distance matrices (extensions: .csv)

    • Writer: to_csv() - Converts DistanceMatrix to CSV string

    • Reader: from_csv() - Parses CSV string to DistanceMatrix

These handlers are automatically registered when this module is imported. DistanceMatrix inherits from IOMixin, so you can use:

  • dm.save('file.nexus') - Save to file (auto-detects format)

  • dm.load('file.nexus') - Load from file (auto-detects format)

  • dm.to_string(format='phylip') - Convert to string

  • dm.from_string(string, format='csv') - Parse from string

  • DistanceMatrix.convert('in.nexus', 'out.phy') - Convert between formats

  • DistanceMatrix.convert_string(str1, 'nexus', 'phylip') - Convert strings

phylozoo.core.distance.io.from_csv(csv_string: str, **kwargs: Any) → DistanceMatrix

Parse a CSV format string and create a DistanceMatrix.

Parameters:
  • csv_string (str) – CSV format string containing distance matrix data.

  • **kwargs

    Additional arguments:

    • delimiter (str): Field delimiter (default: ','). Can be ',', '\t' (tab), or whitespace

    • has_header (bool): Whether first row is a header (default: True)

Returns:

Parsed distance matrix.

Return type:

DistanceMatrix

Raises:

PhyloZooParseError – If the CSV string is malformed or cannot be parsed (e.g., empty string, no data rows, invalid distance values, mismatched dimensions, non-symmetric matrix).

Examples

>>> from phylozoo.core.distance.io import from_csv
>>>
>>> csv_str = ''',A,B,C
... A,0.0,1.0,2.0
... B,1.0,0.0,1.0
... C,2.0,1.0,0.0
... '''
>>>
>>> dm = from_csv(csv_str)
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0

Notes

This parser expects:

  • First row (if has_header=True): empty first cell, then taxon labels

  • Subsequent rows: taxon label in first column, then distances

  • Delimiter can be comma, tab, or whitespace
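
The layout described above (empty first header cell, then one labeled row per taxon) can be parsed with the standard csv module. This is a minimal sketch, assuming has_header=True and a single-character delimiter; parse_distance_csv is a hypothetical helper and does not reproduce the error handling of from_csv.

```python
import csv
import io


def parse_distance_csv(text, delimiter=","):
    """Parse header-style CSV text into (labels, matrix).

    Hypothetical helper for illustration only; assumes a header row.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # drop blank lines
    header, body = rows[0], rows[1:]
    labels = header[1:]  # the first header cell is empty
    # Each data row: label in the first column, then the distances.
    matrix = [[float(value) for value in row[1:]] for row in body]
    return labels, matrix


csv_text = ",A,B,C\nA,0.0,1.0,2.0\nB,1.0,0.0,1.0\nC,2.0,1.0,0.0\n"
labels, matrix = parse_distance_csv(csv_text)
print(labels)        # ['A', 'B', 'C']
print(matrix[0][2])  # 2.0
```

Passing delimiter="\t" handles tab-separated input in the same way.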

phylozoo.core.distance.io.from_nexus(nexus_string: str, **kwargs: Any) → DistanceMatrix

Parse a NEXUS format string and create a DistanceMatrix.

Parameters:
  • nexus_string (str) – NEXUS format string containing distance matrix data.

  • **kwargs – Additional arguments (currently unused, for compatibility).

Returns:

Parsed distance matrix.

Return type:

DistanceMatrix

Raises:

PhyloZooParseError – If the NEXUS string is malformed or cannot be parsed (e.g., missing Taxa or DISTANCES blocks, mismatched number of taxa and matrix rows, invalid matrix format, invalid distance values).

Examples

>>> from phylozoo.core.distance.io import from_nexus
>>>
>>> # Lower triangular format
>>> nexus_str = '''#NEXUS
...
... BEGIN Taxa;
...     DIMENSIONS ntax=3;
...     TAXLABELS
...         A
...         B
...         C
...     ;
... END;
...
... BEGIN DISTANCES;
...     DIMENSIONS ntax=3;
...     FORMAT triangle=LOWER diagonal LABELS;
...     MATRIX
...     A 0.000000
...     B 1.000000 0.000000
...     C 2.000000 1.000000 0.000000
...     ;
... END;'''
>>>
>>> dm = from_nexus(nexus_str)
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0

Notes

This parser supports:

  • A Taxa block with TAXLABELS

  • A DISTANCES block with FORMAT triangle=LOWER/UPPER/BOTH diagonal LABELS

  • Lower triangular, upper triangular, or full matrix formats
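
For the triangle=LOWER layout with the diagonal included, each matrix row i carries only the entries d(i, 0), …, d(i, i), and the full symmetric matrix is obtained by mirroring. The sketch below shows that expansion step on already-tokenized numeric rows; expand_lower_triangle is a hypothetical helper and is not the parser used by from_nexus.

```python
def expand_lower_triangle(rows):
    """Mirror lower-triangular rows (diagonal included) into a full matrix.

    Hypothetical helper for illustration only; rows[i] must have i + 1 values.
    """
    n = len(rows)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i} should have {i + 1} values")
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value  # symmetry fills the upper triangle
    return full


print(expand_lower_triangle([[0.0], [1.0, 0.0], [2.0, 1.0, 0.0]]))
# [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
```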

phylozoo.core.distance.io.from_phylip(phylip_string: str, **kwargs: Any) → DistanceMatrix

Parse a PHYLIP format string and create a DistanceMatrix.

Parameters:
  • phylip_string (str) – PHYLIP format string containing distance matrix data.

  • **kwargs – Additional arguments (currently unused, for compatibility).

Returns:

Parsed distance matrix.

Return type:

DistanceMatrix

Raises:
  • PhyloZooParseError – If the PHYLIP string is malformed or cannot be parsed (e.g., empty string, invalid number of taxa, insufficient values, invalid distance values, non-symmetric matrix).

  • PhyloZooValueError – If the number of taxa is not positive.

Examples

>>> from phylozoo.core.distance.io import from_phylip
>>>
>>> phylip_str = '''3
... A          0.00000 1.00000 2.00000
... B          1.00000 0.00000 1.00000
... C          2.00000 1.00000 0.00000
... '''
>>>
>>> dm = from_phylip(phylip_str)
>>> len(dm)
3
>>> dm.get_distance('A', 'B')
1.0

Notes

This parser expects:

  • First line: number of taxa

  • Subsequent lines: taxon name (first 10 chars or until whitespace) followed by distances

  • Full matrix format (not just lower triangle)

phylozoo.core.distance.io.to_csv(distance_matrix: DistanceMatrix, **kwargs: Any) → str

Convert a distance matrix to CSV format string.

CSV format consists of:

  • First row: header with empty first cell, then taxon labels

  • Subsequent rows: taxon label in first column, then distances

Parameters:
  • distance_matrix (DistanceMatrix) – The distance matrix to convert.

  • **kwargs

    Additional arguments:

    • delimiter (str): Field delimiter (default: ',')

    • include_header (bool): Include header row (default: True)

Returns:

The CSV format string representation of the distance matrix.

Return type:

str

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.io import to_csv
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> csv_str = to_csv(dm)
>>> print(csv_str[:30])
,A,B,C
A,0.000000,1.000000,2.0

Notes

Default delimiter is comma. Use delimiter='\t' for tab-separated values.

phylozoo.core.distance.io.to_nexus(distance_matrix: DistanceMatrix, **kwargs: Any) → str

Convert a distance matrix to a NEXUS format string.

Parameters:
  • distance_matrix (DistanceMatrix) – The distance matrix to convert.

  • **kwargs

    Additional arguments:

    • triangle (str): Triangle format, one of 'LOWER', 'UPPER', or 'BOTH' (default: 'LOWER')

Returns:

The NEXUS format string representation of the distance matrix.

Return type:

str

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.io import to_nexus
>>> # Basic usage (lower triangular format)
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> nexus_str = to_nexus(dm)
>>> '#NEXUS' in nexus_str
True
>>> 'triangle=LOWER' in nexus_str
True
>>> # Upper triangular format
>>> nexus_str_upper = to_nexus(dm, triangle='UPPER')
>>> 'triangle=UPPER' in nexus_str_upper
True

Notes

The NEXUS format includes:

  • Taxa block with label names

  • DISTANCES block with matrix in specified triangle format

  • Format options: triangle=LOWER, triangle=UPPER, or triangle=BOTH

phylozoo.core.distance.io.to_phylip(distance_matrix: DistanceMatrix, **kwargs: Any) → str

Convert a distance matrix to PHYLIP format string.

PHYLIP format consists of:

  • First line: number of taxa

  • Subsequent lines: taxon name (padded to 10 chars) followed by all distances

Parameters:
  • distance_matrix (DistanceMatrix) – The distance matrix to convert.

  • **kwargs – Additional arguments (currently unused, for compatibility).

Returns:

The PHYLIP format string representation of the distance matrix.

Return type:

str

Examples

>>> import numpy as np
>>> from phylozoo.core.distance import DistanceMatrix
>>> from phylozoo.core.distance.io import to_phylip
>>>
>>> matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
>>> dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
>>> phylip_str = to_phylip(dm)
>>> print(phylip_str[:31])
3
A         0.00000 1.00000 2.0

Notes

Taxon names are padded to 10 characters (standard PHYLIP format). Distances are formatted with 5 decimal places.
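
The padding and precision rules in the note above can be reproduced with ordinary string formatting. This is a minimal pure-Python sketch; format_phylip is a hypothetical helper working on plain lists, and how the library treats names longer than 10 characters is not covered here (str.ljust leaves them untruncated).

```python
def format_phylip(labels, rows):
    """Format labels and a nested list of distances as PHYLIP-style text.

    Hypothetical helper for illustration only.
    """
    lines = [str(len(labels))]  # first line: number of taxa
    for label, row in zip(labels, rows):
        name = str(label).ljust(10)  # pad the name to 10 characters
        values = " ".join(f"{value:.5f}" for value in row)  # 5 decimals
        lines.append(name + values)
    return "\n".join(lines) + "\n"


text = format_phylip(["A", "B", "C"], [[0, 1, 2], [1, 0, 1], [2, 1, 0]])
print(text[:31])
```

The first 31 characters printed here match the truncated output shown in the example above.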