NEXUS#
The NEXUS format is a flexible, block-structured format used widely in phylogenetics.
A NEXUS file starts with #NEXUS and contains one or more blocks, each of the form
BEGIN blockname; ... END;. PhyloZoo uses different block types for different
data kinds (distance matrices, sequence alignments, split systems). The same file
extensions are used for all NEXUS subtypes; the class you load or save with
determines which block structure is expected.
See also
Nexus file — Wikipedia
Classes and extensions#
File extensions: .nexus, .nex, .nxs
Classes (by subtype): DistanceMatrix (DISTANCES),
MSA (CHARACTERS),
SplitSystem and
WeightedSplitSystem (SPLITS).
Structure#
All NEXUS files begin with the #NEXUS token. Data are organized in blocks.
Each block has BEGIN blockname;, block-specific commands and a MATRIX or
data section, and END;. The TAXA block (with TAXLABELS) is shared across
subtypes to define taxon labels. The data block (DISTANCES, CHARACTERS, or SPLITS)
holds the actual data.
#NEXUS
BEGIN TAXA;
DIMENSIONS ntax=N;
TAXLABELS
taxon1
taxon2
;
END;
BEGIN SomeBlock;
...
END;
DISTANCES#
Blocks: TAXA and DISTANCES. The DISTANCES block contains a lower or upper
triangular matrix with optional FORMAT triangle=....
#NEXUS
BEGIN Taxa;
DIMENSIONS ntax=3;
TAXLABELS
A
B
C
;
END;
BEGIN DISTANCES;
DIMENSIONS ntax=3;
FORMAT triangle=LOWER diagonal LABELS;
MATRIX
A 0.000000
B 1.000000 0.000000
C 2.000000 1.000000 0.000000
;
END;
CHARACTERS#
Blocks: TAXA and CHARACTERS. The Characters block has DIMENSIONS nchar=...,
FORMAT datatype=..., and a MATRIX with aligned sequences.
#NEXUS
BEGIN TAXA;
DIMENSIONS ntax=2;
TAXLABELS
taxon1
taxon2
;
END;
BEGIN CHARACTERS;
DIMENSIONS nchar=8;
FORMAT datatype=DNA missing=N gap=-;
MATRIX
taxon1 ACGTACGT
taxon2 TGCAACGT
;
END;
SPLITS#
Blocks: TAXA and SPLITS. The SPLITS block has FORMAT labels=yes weights=yes
and a MATRIX with splits in A B | C D notation, optionally with weights.
#NEXUS
BEGIN TAXA;
DIMENSIONS ntax=4;
TAXLABELS
A
B
C
D
;
END;
BEGIN SPLITS;
DIMENSIONS ntax=4;
FORMAT labels=yes weights=yes;
MATRIX
1.0 A B | C D
0.8 A C | B D
;
END;
Examples#
Distance matrix:
from phylozoo import DistanceMatrix
import numpy as np
matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])
dm.save("distances.nexus")
dm2 = DistanceMatrix.load("distances.nexus")
nexus_str = dm.to_string(format="nexus", triangle="UPPER")
MSA:
from phylozoo import MSA
sequences = {"taxon1": "ACGTACGT", "taxon2": "TGCAACGT"}
msa = MSA(sequences)
msa.save("alignment.nexus", format="nexus")
msa2 = MSA.load("alignment.nexus")
Split system:
from phylozoo.core.split import SplitSystem, Split
split1 = Split({'A', 'B'}, {'C', 'D'})
split2 = Split({'A', 'C'}, {'B', 'D'})
splits = SplitSystem([split1, split2])
splits.save("splits.nexus")
splits2 = SplitSystem.load("splits.nexus")
See also#
I/O Operations — Save/load and format detection
PHYLIP — Alternative distance matrix format
FASTA — Alternative sequence format
Pairwise Distances — Distance matrices
Overview — Split systems