NEXUS

The NEXUS format is a flexible, block-structured format widely used in phylogenetics. A NEXUS file starts with #NEXUS and contains one or more blocks, each of the form BEGIN blockname; ... END;. PhyloZoo uses a different data block for each kind of data (distance matrices, sequence alignments, split systems). All NEXUS subtypes share the same file extensions, so the class you load or save with determines which block structure is expected.

See also

Nexus file — Wikipedia

Classes and extensions

File extensions: .nexus, .nex, .nxs

Classes (by subtype): DistanceMatrix (DISTANCES), MSA (CHARACTERS), SplitSystem and WeightedSplitSystem (SPLITS).
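One set of extensions covers every subtype, so the only way to tell which kind of data a file holds is to look at its blocks. The sketch below scans a NEXUS string for the data blocks listed above and maps each one to the matching class name. `detect_data_blocks` and `DATA_BLOCK_CLASSES` are hypothetical names used only for illustration. They are not part of PhyloZoo, and the sketch ignores NEXUS comments in square brackets.

```python
import re

# Hypothetical mapping from NEXUS data-block name to the PhyloZoo class names
# listed above; an illustration only, not PhyloZoo's own dispatch logic.
DATA_BLOCK_CLASSES = {
    "DISTANCES": ["DistanceMatrix"],
    "CHARACTERS": ["MSA"],
    "SPLITS": ["SplitSystem", "WeightedSplitSystem"],
}


def detect_data_blocks(text: str) -> list[str]:
    """Return the known data blocks found in a NEXUS string, in file order."""
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a NEXUS file: missing #NEXUS header")
    # NEXUS keywords are case-insensitive, so match "BEGIN <name>;" in any case.
    names = re.findall(r"\bBEGIN\s+(\w+)\s*;", text, flags=re.IGNORECASE)
    # Keep only data blocks; the shared TAXA block is skipped.
    return [n.upper() for n in names if n.upper() in DATA_BLOCK_CLASSES]
```

For the SPLITS example on this page, the function would return ["SPLITS"], and DATA_BLOCK_CLASSES["SPLITS"] names the two split-system classes.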

Structure

All NEXUS files begin with the #NEXUS token, and data are organized in blocks. Each block opens with BEGIN blockname;, contains block-specific commands (such as DIMENSIONS and FORMAT) followed by a MATRIX or other data section, and closes with END;. The TAXA block, whose TAXLABELS command lists the taxon labels, is shared by all subtypes. The data block (DISTANCES, CHARACTERS, or SPLITS) holds the actual data.

#NEXUS

BEGIN TAXA;
    DIMENSIONS ntax=N;
    TAXLABELS
        taxon1
        taxon2
    ;
END;

BEGIN SomeBlock;
    ...
END;
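Because every subtype shares the TAXA block, reading the taxon labels is the same whatever data block follows. The minimal sketch below shows that step. It extracts the TAXLABELS list and checks it against DIMENSIONS ntax when ntax is a number. `read_taxlabels` is a hypothetical helper, not PhyloZoo's parser. It assumes unquoted, whitespace-separated labels and a TAXA block that closes with END;.

```python
import re


def read_taxlabels(text: str) -> list[str]:
    """Return taxon labels from the TAXA block, checking them against ntax."""
    # Body between "BEGIN TAXA;" and the next "END;" (keywords are case-insensitive).
    block = re.search(r"\bBEGIN\s+TAXA\s*;(.*?)\bEND\s*;", text,
                      flags=re.IGNORECASE | re.DOTALL)
    if block is None:
        raise ValueError("no TAXA block found")
    body = block.group(1)
    ntax = re.search(r"\bNTAX\s*=\s*(\d+)", body, flags=re.IGNORECASE)
    # TAXLABELS runs up to the next semicolon.
    labels_cmd = re.search(r"\bTAXLABELS\b(.*?);", body,
                           flags=re.IGNORECASE | re.DOTALL)
    if labels_cmd is None:
        raise ValueError("TAXA block has no TAXLABELS command")
    # Assumption: labels are unquoted tokens separated by whitespace.
    labels = labels_cmd.group(1).split()
    if ntax is not None and int(ntax.group(1)) != len(labels):
        raise ValueError(f"ntax={ntax.group(1)} but {len(labels)} labels found")
    return labels
```

With the placeholder ntax=N from the template, the count check is skipped because N is not a number.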

DISTANCES

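Blocks: TAXA and DISTANCES. The DISTANCES block stores the matrix as a lower or upper triangle. The optional FORMAT command selects which one with triangle=LOWER or triangle=UPPER. In the example below, diagonal means the diagonal entries are written, and LABELS means each row begins with its taxon label.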

#NEXUS

BEGIN Taxa;
    DIMENSIONS ntax=3;
    TAXLABELS
        A
        B
        C
    ;
END;

BEGIN DISTANCES;
    DIMENSIONS ntax=3;
    FORMAT triangle=LOWER diagonal LABELS;
    MATRIX
    A 0.000000
    B 1.000000 0.000000
    C 2.000000 1.000000 0.000000
    ;
END;
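The effect of the triangle option can be shown with a small writer. This is a minimal sketch, not PhyloZoo's implementation. It assumes a square, symmetric matrix given as nested lists (a NumPy array also works) and always writes the diagonal and row labels, as in the example above. `format_distances_block` is a hypothetical name.

```python
def format_distances_block(labels, matrix, triangle="LOWER"):
    """Format a symmetric matrix as a NEXUS DISTANCES block (diagonal + labels)."""
    triangle = triangle.upper()
    if triangle not in ("LOWER", "UPPER"):
        raise ValueError("triangle must be LOWER or UPPER")
    n = len(labels)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square with one row per label")
    # Writing only one triangle is lossless only if the matrix is symmetric.
    for i in range(n):
        for j in range(i):
            if matrix[i][j] != matrix[j][i]:
                raise ValueError(f"matrix is not symmetric at ({i}, {j})")
    lines = [
        "BEGIN DISTANCES;",
        f"    DIMENSIONS ntax={n};",
        f"    FORMAT triangle={triangle} diagonal LABELS;",
        "    MATRIX",
    ]
    for i, label in enumerate(labels):
        # LOWER keeps columns 0..i of row i; UPPER keeps columns i..n-1.
        cols = range(i + 1) if triangle == "LOWER" else range(i, n)
        values = " ".join(f"{float(matrix[i][j]):.6f}" for j in cols)
        lines.append(f"    {label} {values}")
    lines += ["    ;", "END;"]
    return "\n".join(lines)
```

Called with labels A, B, C and the matrix [[0, 1, 2], [1, 0, 1], [2, 1, 0]], it reproduces the DISTANCES block above. With triangle="UPPER", row A instead carries all three values and row C only its diagonal entry.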

CHARACTERS

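Blocks: TAXA and CHARACTERS. The CHARACTERS block has a DIMENSIONS nchar=... command giving the alignment length, a FORMAT datatype=... command (which can also declare the missing-data and gap symbols), and a MATRIX with one aligned sequence per taxon.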

#NEXUS

BEGIN TAXA;
    DIMENSIONS ntax=2;
    TAXLABELS
        taxon1
        taxon2
    ;
END;

BEGIN CHARACTERS;
    DIMENSIONS nchar=8;
    FORMAT datatype=DNA missing=N gap=-;
    MATRIX
    taxon1    ACGTACGT
    taxon2    TGCAACGT
    ;
END;
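The minimal sketch below shows how nchar relates to the MATRIX rows. It builds a CHARACTERS block from a dict of sequences, and since every row of an alignment must have the same length, that shared length becomes nchar. `format_characters_block` is a hypothetical helper, not PhyloZoo's writer. Its defaults simply copy the FORMAT line of the example above.

```python
def format_characters_block(sequences, datatype="DNA", missing="N", gap="-"):
    """Format aligned sequences (label -> sequence) as a NEXUS CHARACTERS block."""
    lengths = {len(seq) for seq in sequences.values()}
    # An alignment needs exactly one shared sequence length, which is nchar.
    if len(lengths) != 1:
        raise ValueError(f"sequences are not aligned: lengths {sorted(lengths)}")
    nchar = lengths.pop()
    # Pad labels so the sequences start in the same column.
    width = max(len(name) for name in sequences) + 4
    lines = [
        "BEGIN CHARACTERS;",
        f"    DIMENSIONS nchar={nchar};",
        f"    FORMAT datatype={datatype} missing={missing} gap={gap};",
        "    MATRIX",
    ]
    lines += [f"    {name:<{width}}{seq}" for name, seq in sequences.items()]
    lines += ["    ;", "END;"]
    return "\n".join(lines)
```

For the two sequences in the example, this returns exactly the CHARACTERS block shown above. Sequences of unequal length raise a ValueError instead of producing an invalid block.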

SPLITS

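Blocks: TAXA and SPLITS. The SPLITS block has a FORMAT labels=yes weights=yes command and a MATRIX with one split per row, written in A B | C D notation and optionally preceded by the split's weight (as in the example below, which uses weights=yes).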

#NEXUS

BEGIN TAXA;
    DIMENSIONS ntax=4;
    TAXLABELS
        A
        B
        C
        D
    ;
END;

BEGIN SPLITS;
    DIMENSIONS ntax=4;
    FORMAT labels=yes weights=yes;
    MATRIX
        1.0  A B | C D
        0.8  A C | B D
    ;
END;
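To make the row notation concrete, here is a minimal parser for a single MATRIX row. It is a sketch under stated assumptions, not PhyloZoo's reader. It assumes that with weights=yes the weight is the first token, that the two sides are separated by a single |, and that labels contain no whitespace. The optional taxa argument checks that the split covers exactly the taxa from the TAXA block. `parse_split_row` is a hypothetical name.

```python
def parse_split_row(row, weights=True, taxa=None):
    """Parse a SPLITS matrix row such as '0.8  A C | B D'.

    Returns (weight, side1, side2); weight is None when weights=False.
    """
    left, sep, right = row.partition("|")
    if not sep:
        raise ValueError(f"no '|' separator in split row: {row!r}")
    tokens = left.split()
    weight = None
    if weights:
        # Assumption: with weights=yes the weight is the first token on the row.
        if not tokens:
            raise ValueError(f"missing weight in split row: {row!r}")
        weight = float(tokens.pop(0))
    side1, side2 = frozenset(tokens), frozenset(right.split())
    # A split divides the taxa into two non-empty, disjoint sides.
    if not side1 or not side2 or side1 & side2:
        raise ValueError(f"invalid split: {row!r}")
    if taxa is not None and side1 | side2 != set(taxa):
        raise ValueError(f"split does not cover exactly the taxa: {row!r}")
    return weight, side1, side2
```

Applied to the two MATRIX rows above, it yields weight 1.0 with sides {A, B} and {C, D}, and weight 0.8 with sides {A, C} and {B, D}.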

Examples

Distance matrix:

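import numpy as np
from phylozoo import DistanceMatrix

# Symmetric 3x3 distance matrix; row/column order follows the labels
matrix = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
dm = DistanceMatrix(matrix, labels=['A', 'B', 'C'])

# Write a NEXUS file (TAXA + DISTANCES blocks) and read it back
dm.save("distances.nexus")
dm2 = DistanceMatrix.load("distances.nexus")

# Get the NEXUS text directly, with the matrix stored as the upper triangle
nexus_str = dm.to_string(format="nexus", triangle="UPPER")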

MSA:

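from phylozoo import MSA

# Aligned sequences keyed by taxon label, so all have the same length
sequences = {"taxon1": "ACGTACGT", "taxon2": "TGCAACGT"}
msa = MSA(sequences)

# Write a NEXUS file (TAXA + CHARACTERS blocks) and read it back
msa.save("alignment.nexus", format="nexus")
msa2 = MSA.load("alignment.nexus")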

Split system:

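from phylozoo.core.split import Split, SplitSystem

# Each split partitions the taxa {A, B, C, D} into two sides
split1 = Split({'A', 'B'}, {'C', 'D'})  # A B | C D
split2 = Split({'A', 'C'}, {'B', 'D'})  # A C | B D
splits = SplitSystem([split1, split2])

# Write a NEXUS file (TAXA + SPLITS blocks) and read it back
splits.save("splits.nexus")
splits2 = SplitSystem.load("splits.nexus")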

See also