I/O Operations#

This page describes the I/O protocol used in PhyloZoo and the basic operations available on all I/O-capable classes.

IOMixin and I/O-capable classes#

All I/O in PhyloZoo goes through the IOMixin protocol. (If you are new to Python: a mixin is a class that provides methods to other classes without being the main parent.) IOMixin adds save, load, to_string, from_string, convert, and convert_string to any class that inherits from it (see below). So instead of each class having its own I/O code, they all share the same interface: you call network.save("file.enewick") or msa.load("file.fasta") in the same way regardless of the data type. PhyloZoo detects the format from the file extension (e.g. .enewick, .fasta) and uses the registry to find the right reader or writer for that class.

The following classes use IOMixin (directly or by inheritance):

Each class registers one or more formats with the central FormatRegistry; the format(s) and extensions supported depend on the class. See File formats for per-format details; see Registering a new format for how to register a new format for a class.

Basic operations#

All I/O-capable classes support the following methods (defined on IOMixin). Methods may accept format-specific keyword arguments (e.g. triangle="UPPER" for NEXUS, root_location=... for semi-directed Newick). See the per-format pages for details.

Saving to a file#

The save() method writes the object to a file. Format is detected from the file extension unless format is given. The overwrite parameter controls whether to overwrite the file if it already exists. Additional keyword arguments are passed to the format writer.

network.save("network.enewick")
msa.save("alignment.fasta")
dm.save("distances.nexus")

Loading from a file#

The load() method reads an object from a file. Format is detected from the extension unless format is given. If the extension is ambiguous or you use a non-standard extension, pass the format explicitly:

network = DirectedPhyNetwork.load("network.enewick")
msa = MSA.load("alignment.fasta")
dm = DistanceMatrix.load("distances.nexus")

# Explicit format when extension is unclear
network = DirectedPhyNetwork.load("network.txt", format="enewick")
msa = MSA.load("data.txt", format="fasta")

Serializing to a string#

The to_string() method serializes the object to a string in the given format (default format for the class if not specified).

enewick_string = network.to_string(format="enewick")

Parsing from a string#

The from_string() method parses a string and returns an instance. Format must be specified or be the class default.

network = DirectedPhyNetwork.from_string(enewick_string, format="enewick")

Converting between file formats#

The convert() method is a convenience wrapper that loads from one file and saves to another in one call. Formats are detected from extensions when not specified. Equivalent to cls.load(...).save(...); the object is created in memory during conversion.

MSA.convert("input.fasta", "output.nexus")

Converting between string formats#

The convert_string() method parses a string in one format and returns the serialized string in another format. Both formats must be specified. Equivalent to cls.from_string(content, format=input_format, **kwargs).to_string(format=output_format, **kwargs).

nexus_str = MSA.convert_string(fasta_content, input_format="fasta", output_format="nexus")

Error handling#

I/O operations raise PhyloZoo exceptions when appropriate:

from phylozoo.utils.exceptions import PhyloZooParseError

try:
    network = DirectedPhyNetwork.load("malformed.enewick")
except PhyloZooParseError as e:
    print(f"Parse error: {e}")

Best practices#

  • Use standard file extensions — PhyloZoo detects format from extensions. Use standard ones (e.g. .enewick, .fasta, .nexus) when possible.

  • Specify format when needed — If the extension is ambiguous or non-standard, pass the format parameter.

  • Handle exceptions — Use try/except for I/O to handle parse errors and missing files gracefully.

  • Preserve metadata — Some formats preserve more metadata (e.g. eNewick for branch lengths and gamma on networks). Choose a format that preserves what you need.

  • Use convert for convenience — Use the convert class method to change format in one call instead of separate load and save.

See Also#