Quartet Profile Sets#
Working with Quartet Profile Sets#
The phylozoo.core.quartet module provides the QuartetProfileSet class,
which represents a weighted collection of quartet profiles covering multiple four-taxon
sets.
Note that this allows for two-level weights: the weight of a profile and the weight of a quartet within a profile.
Creating Quartet Profile Sets#
Quartet profile sets can be created from existing profiles or directly from quartets:
From Quartet Profiles
from phylozoo.core.quartet import QuartetProfileSet, QuartetProfile, Quartet
from phylozoo.core.split import Split
# Create individual profiles
q1 = Quartet(Split({"A", "B"}, {"C", "D"}))
q2 = Quartet(Split({"A", "C"}, {"B", "D"}))
profile1 = QuartetProfile({q1: 1.0}) # Single quartet: weight must be 1.0
profile2 = QuartetProfile({q2: 1.0})
# Create profile set with profile-weight tuples
profile_set = QuartetProfileSet([
    (profile1, 0.5),
    (profile2, 0.5),
])
From Individual Quartets
# Create from quartets (automatically grouped by taxa)
quartets = [
    Quartet(Split({"A", "B"}, {"C", "D"})),
    Quartet(Split({"A", "C"}, {"B", "D"})),
    Quartet(Split({"A", "B"}, {"C", "E"})),  # Different 4-taxon set
]
profile_set = QuartetProfileSet(profiles=quartets)
When created from quartets, they are automatically grouped by their four-taxon sets
and converted to profiles. For each 4-taxon set, all quartets on that set form a
single QuartetProfile in which every quartet
receives equal weight \(1/k\) (where \(k\) is the number of quartets for that
taxa set); the resulting profile in the set has default profile weight 1.0.
If you need non-uniform quartet weights within a profile, construct a
QuartetProfile explicitly (using a dictionary
or list of (Quartet, weight) pairs that sum to 1.0) and pass that
QuartetProfile to QuartetProfileSet,
optionally together with a separate profile weight.
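The grouping rule described above can be illustrated with plain Python data. This is a minimal sketch, not phylozoo's implementation: quartets are modelled as pairs of two-element sets, and `group_into_profiles` is a hypothetical helper name introduced only for this example.

```python
from collections import defaultdict


def group_into_profiles(quartets):
    """Group 2|2 splits by their 4-taxon set and weight each quartet 1/k.

    Hypothetical helper: each quartet is a pair of two-element sets.
    Returns {four_taxon_set: (quartet_weights, profile_weight)}.
    """
    grouped = defaultdict(list)
    for side_a, side_b in quartets:
        taxa = frozenset(side_a) | frozenset(side_b)
        # Store the quartet as an unordered pair of its two sides.
        grouped[taxa].append(frozenset({frozenset(side_a), frozenset(side_b)}))

    profiles = {}
    for taxa, members in grouped.items():
        k = len(members)
        quartet_weights = {quartet: 1.0 / k for quartet in members}
        profiles[taxa] = (quartet_weights, 1.0)  # default profile weight 1.0
    return profiles


quartets = [
    ({"A", "B"}, {"C", "D"}),
    ({"A", "C"}, {"B", "D"}),
    ({"A", "B"}, {"C", "E"}),  # different 4-taxon set
]
profiles = group_into_profiles(quartets)
abcd_weights, abcd_profile_weight = profiles[frozenset("ABCD")]
abce_weights, abce_profile_weight = profiles[frozenset("ABCE")]
```

Here the two quartets on {A, B, C, D} each receive weight 1/2, while the single quartet on {A, B, C, E} receives weight 1; both resulting profiles carry profile weight 1.0.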
Specifying Total Taxa
You can also specify the total set of taxa, which allows including taxa that don’t appear in any profile:
# Create profile set with explicit taxa set
profile_set = QuartetProfileSet(
    profiles=[profile1, profile2],
    taxa=frozenset({"A", "B", "C", "D", "E", "F"}),
)
Accessing Profile Set Properties#
Basic properties
Quartet profile sets provide comprehensive access to their structure and contents:
# Basic properties
total_taxa = profile_set.taxa # frozenset of all taxa
num_profiles = len(profile_set) # Number of profiles
# Check maximum quartets per profile
max_len = profile_set.max_profile_len
# Access individual profiles
profile = profile_set.get_profile(frozenset({"A", "B", "C", "D"}))
# Returns QuartetProfile or None
profile_weight = profile_set.get_profile_weight(frozenset({"A", "B", "C", "D"}))
# Returns float or None
has_profile = profile_set.has_profile(frozenset({"A", "B", "C", "D"}))
# Access all profiles (read-only mapping)
all_profiles = profile_set.profiles # Dict[frozenset, (QuartetProfile, float)]
Density
The is_dense property checks if the quartet profile set is dense, meaning it has a profile for every possible 4-taxon combination.
is_dense = profile_set.is_dense # True if has all possible 4-taxon combinations
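To make the density condition concrete: a set on \(n\) taxa is dense exactly when it has a profile for each of the \(\binom{n}{4}\) four-taxon subsets. The sketch below checks this on plain frozensets; `missing_four_taxon_sets` and `is_dense_sketch` are hypothetical names used for illustration and are not part of phylozoo.

```python
from itertools import combinations
from math import comb


def missing_four_taxon_sets(taxa, profile_keys):
    """Return the 4-taxon subsets of `taxa` that have no profile."""
    keys = {frozenset(k) for k in profile_keys}
    return [
        frozenset(c)
        for c in combinations(sorted(taxa), 4)
        if frozenset(c) not in keys
    ]


def is_dense_sketch(taxa, profile_keys):
    """Dense means no 4-taxon subset is missing a profile."""
    return not missing_four_taxon_sets(taxa, profile_keys)


# Five taxa with a profile key for every 4-subset: dense.
all_keys = [frozenset(c) for c in combinations("ABCDE", 4)]
dense = is_dense_sketch("ABCDE", all_keys)

# Six taxa but only two profile keys: far from dense.
sparse = is_dense_sketch("ABCDEF", [frozenset("ABCD"), frozenset("ABCE")])
needed_for_six = comb(6, 4)  # number of profiles a dense set on 6 taxa needs
```

Declaring extra taxa that appear in no profile, as in the explicit-taxa example earlier, therefore always yields a non-dense set.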
Resolution status
The is_all_resolved property checks whether every profile in the set is resolved, meaning that every quartet in every profile is resolved.
is_all_resolved = profile_set.is_all_resolved # True if all profiles are resolved
Quartet Distance Computation#
The quartet module provides functions for computing distance matrices from quartet profile sets. This quartet distance metric was first defined for trees and their quartets [Rhodes, 2019], then extended to networks for the NANUQ algorithm [Allman et al., 2019], and later further explored in various forms in [Holtgrefe et al., 2025] (for the Squirrel algorithm), [Allman et al., 2025] (for the NANUQ+ algorithm), and [Holtgrefe et al., 2025] (for level-2 networks).
Quartet Distance#
The quartet_distance() function computes
a distance matrix from a quartet profile set using a rho vector.
The distance between two taxa is computed by aggregating contributions from all
quartet profiles, where the contribution depends on the quartet topology and the
rho vector values. The current implementation allows only dense quartet profile sets
with exactly 1 or 2 resolved quartets per 4-taxon set.
from phylozoo.core.quartet.qdistance import quartet_distance
# Compute distance matrix using rho vector
rho = (0.5, 1.0, 0.5, 1.0) # Squirrel rho vector
distance_matrix = quartet_distance(profile_set, rho)
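The distance formula computes pairwise distances between taxa \(i\) and \(j\) as:
\[
d(i, j) = 2 \sum_{\substack{S \subseteq X,\ |S| = 4 \\ i, j \in S}} \rho_{\text{dist}}(Q_S, i, j, \rho) + 2n - 4
\]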
where \(n\) is the number of taxa, \(X\) is the set of all taxa, \(Q_S\) is the quartet profile for the 4-taxon set \(S\), and \(\rho_{\text{dist}}\) is the rho-distance function.
The rho-distance function \(\rho_{\text{dist}}(Q_S, i, j, \rho)\) depends on the quartet profile type, the leaves \(i\) and \(j\), and the rho vector \(\rho = (\rho_c, \rho_s, \rho_a, \rho_o)\).
Profiles with 1 quartet (split):
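For a profile containing a single quartet with split \(\{a,b\} | \{c,d\}\), the rho-distance is:
\[
\rho_{\text{dist}}(Q_S, i, j, \rho) =
\begin{cases}
\rho_c & \text{if } \{i,j\} = \{a,b\} \text{ or } \{i,j\} = \{c,d\}, \\
\rho_s & \text{otherwise.}
\end{cases}
\]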
Note: The naming convention uses \(\rho_c\) for “cherry” (same side) and \(\rho_s\) for “split” (different sides).
Profiles with 2 quartets (four-cycle):
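For a profile containing two quartets, which together induce a single circular ordering of the four taxa, the rho-distance is:
\[
\rho_{\text{dist}}(Q_S, i, j, \rho) =
\begin{cases}
\rho_a & \text{if } i \text{ and } j \text{ are adjacent in the circular ordering,} \\
\rho_o & \text{if } i \text{ and } j \text{ are opposite in the circular ordering.}
\end{cases}
\]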
Note: The rho vector must satisfy \(\rho_a \leq \rho_o\) and \(\rho_c \leq \rho_s\).
Common rho vector values:
NANUQ: \((0.0, 1.0, 0.5, 1.0)\) [Allman et al., 2019], [Holtgrefe et al., 2025]
Squirrel/MONAD: \((0.5, 1.0, 0.5, 1.0)\) [Holtgrefe et al., 2025], [Allman et al., 2025]
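The following is a minimal sketch of the computation described in this section, written against plain Python data rather than phylozoo objects. It assumes that \(\rho_a\) applies to pairs adjacent in the circular ordering and \(\rho_o\) to opposite pairs, and it takes the factor 2 and the constant \(2n - 4\) from the description on this page. The names `rho_dist` and `quartet_distance_sketch` are hypothetical and do not reflect the library's internals.

```python
from itertools import combinations


def rho_dist(profile, i, j, rho):
    """Rho-distance of one profile for the pair (i, j).

    `profile` is a list of one or two quartets; each quartet is a pair of
    two-element sets (the two sides of its split).
    """
    rho_c, rho_s, rho_a, rho_o = rho
    # True if some quartet in the profile puts i and j on the same side.
    same_side = any(
        i in side and j in side for quartet in profile for side in quartet
    )
    if len(profile) == 1:
        return rho_c if same_side else rho_s
    if len(profile) == 2:
        # Assumption: in a four-cycle, a pair grouped by one of the two
        # quartets is adjacent in the circular ordering, otherwise opposite.
        return rho_a if same_side else rho_o
    raise ValueError("expected 1 or 2 resolved quartets per 4-taxon set")


def quartet_distance_sketch(taxa, profiles, rho):
    """Pairwise distances as a dict keyed by (taxon, taxon)."""
    rho_c, rho_s, rho_a, rho_o = rho
    if rho_c > rho_s or rho_a > rho_o:
        raise ValueError("rho must satisfy rho_c <= rho_s and rho_a <= rho_o")
    taxa = sorted(taxa)
    n = len(taxa)
    dist = {}
    for i, j in combinations(taxa, 2):
        total = sum(
            rho_dist(profiles[frozenset(four)], i, j, rho)
            for four in combinations(taxa, 4)
            if i in four and j in four
        )
        dist[i, j] = dist[j, i] = 2 * total + 2 * n - 4
    return dist


# Dense profile set of the tree ((A,B),C,(D,E)): one quartet per 4-taxon set.
tree_profiles = {
    frozenset("ABCD"): [({"A", "B"}, {"C", "D"})],
    frozenset("ABCE"): [({"A", "B"}, {"C", "E"})],
    frozenset("ABDE"): [({"A", "B"}, {"D", "E"})],
    frozenset("ACDE"): [({"A", "C"}, {"D", "E"})],
    frozenset("BCDE"): [({"B", "C"}, {"D", "E"})],
}
nanuq_rho = (0.0, 1.0, 0.5, 1.0)
dist = quartet_distance_sketch("ABCDE", tree_profiles, nanuq_rho)

# A four-cycle profile with circular ordering (A, B, C, D).
four_cycle = [({"A", "B"}, {"C", "D"}), ({"A", "D"}, {"B", "C"})]
```

With the NANUQ rho vector, this toy input gives distance 6 for the cherry pair (A, B), 10 for (A, C), and 12 for (A, D). For the four-cycle profile, the adjacent pair (A, B) contributes 0.5 and the opposite pair (A, C) contributes 1.0.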
Quartet Distance with Partition#
The quartet_distance_with_partition() function
computes a distance matrix between partition sets (rather than individual taxa) based
on quartet profiles:
from phylozoo.core.quartet.qdistance import quartet_distance_with_partition
from phylozoo.core.primitives import Partition
# Compute distance matrix with partition information
partition = Partition([{"A"}, {"B"}, {"C", "D"}, {"E"}, {"F"}])
distance_matrix = quartet_distance_with_partition(profile_set, partition, rho)
Unlike the standard quartet distance, which computes distances between individual taxa, this function computes distances between sets of taxa by averaging contributions across all possible leaf selections.
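Given a partition \(\mathcal{P} = \{X_1, X_2, \ldots, X_n\}\), the distance formula computes pairwise distances between partition sets \(X_i\) and \(X_j\) as:
\[
d(X_i, X_j) = \sum_{\substack{S \subseteq \mathcal{P},\ |S| = 4 \\ X_i, X_j \in S}} \frac{1}{|\mathcal{R}(S)|} \sum_{R \in \mathcal{R}(S)} 2 \cdot \rho_{\text{dist}}(Q_R, x_R, y_R, \rho) + 2n - 4
\]
where \(\mathcal{R}(S)\) is the set of representative 4-taxon sets of the 4-subpartition \(S\), and \(x_R \in X_i\) and \(y_R \in X_j\) are the leaves of \(R\) taken from \(X_i\) and \(X_j\). Step by step: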
For each 4-subpartition \(S\) containing both \(X_i\) and \(X_j\):
Consider all representative 4-taxon sets \(R\) (one leaf from each of the 4 sets in \(S\))
For each representative set \(R\), compute rho-distance contribution \(2 \cdot \rho_{\text{dist}}(Q_R, x, y, \rho)\) for all pairs \(\{x,y\}\) where \(x \in X_i\) and \(y \in X_j\)
Average all these contributions across all representative sets, giving a single distance for the 4-subpartition
Sum the averaged contributions across all 4-subpartitions containing \(X_i\) and \(X_j\)
Add constant \(2n - 4\) (same as in the standard quartet distance)
This averaging approach ensures that when sets contain multiple taxa, the distance accounts for all possible quartet relationships between taxa in the two sets, making it suitable for aggregating quartet information at the set level rather than the individual taxon level.
The partition elements must match the profile set taxa, and the profile set must be dense.
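The averaging procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not phylozoo's implementation: profiles are lists of quartets keyed by their 4-taxon set, only single-quartet profiles are handled to keep the example short, and `split_rho_dist` and `partition_distance_sketch` are hypothetical names.

```python
from itertools import combinations, product


def split_rho_dist(profile, x, y, rho):
    """Rho-distance for a single-quartet profile (rho_c or rho_s)."""
    rho_c, rho_s, _rho_a, _rho_o = rho
    (quartet,) = profile  # this sketch handles only one quartet per profile
    same_side = any(x in side and y in side for side in quartet)
    return rho_c if same_side else rho_s


def partition_distance_sketch(partition, profiles, rho):
    """Distances between partition sets, keyed by pairs of block indices."""
    blocks = [frozenset(block) for block in partition]
    n = len(blocks)
    dist = {}
    for bi, bj in combinations(range(n), 2):
        others = [b for b in range(n) if b not in (bi, bj)]
        total = 0.0
        # Every 4-subpartition containing both blocks bi and bj.
        for bk, bl in combinations(others, 2):
            # One contribution per representative set (one leaf per block).
            contributions = [
                2 * split_rho_dist(profiles[frozenset((x, y, u, v))], x, y, rho)
                for x, y, u, v in product(
                    blocks[bi], blocks[bj], blocks[bk], blocks[bl]
                )
            ]
            total += sum(contributions) / len(contributions)
        dist[bi, bj] = dist[bj, bi] = total + 2 * n - 4
    return dist


# Dense profile set of the tree ((A,B),C,(D,E)).
tree_profiles = {
    frozenset("ABCD"): [({"A", "B"}, {"C", "D"})],
    frozenset("ABCE"): [({"A", "B"}, {"C", "E"})],
    frozenset("ABDE"): [({"A", "B"}, {"D", "E"})],
    frozenset("ACDE"): [({"A", "C"}, {"D", "E"})],
    frozenset("BCDE"): [({"B", "C"}, {"D", "E"})],
}
partition = [{"A", "D"}, {"B"}, {"C"}, {"E"}]
nanuq_rho = (0.0, 1.0, 0.5, 1.0)
pdist = partition_distance_sketch(partition, tree_profiles, nanuq_rho)
```

For the pair of blocks {A, D} and {B}, the representative set {A, B, C, E} contributes 0 (A and B form a cherry) and {B, C, D, E} contributes 2 (B and D are separated). The average is 1, and adding \(2n - 4 = 4\) for the four blocks gives distance 5.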
See Also#
API Reference - Complete function signatures and detailed examples
Quartets - Individual quartet topologies
Quartet Profiles - Sets of quartets on the same 4-taxon set with weights
Distance Matrices - Distance matrix computations