Writing Tests

Phytest is easily extendable and provides a simple interface for writing custom phylogenetic tests. The interface follows the Pytest model of testing i.e. tests are defined as Python functions (or class methods) containing assert statements that are collected and evaluated at run-time. Tests that fail are captured and reported to the user allowing for repeatable and automated testing. Phytest provides many convenient helper functions for testing phylogenetic analyses including methods for testing sequences, alignments, trees and metadata files.

Phytest fixtures

Phytest injects special fixture objects into test functions, allowing for easy evaluation and testing of phylogenetic data structures. These fixtures provide the standard Biopython (sequences and trees) and Pandas (metadata) class methods as well as special assert methods for testing these data structures.

Only functions that require the fixtures will have the Pytest objects passed to them. For example consider the following tests.

from phytest import Sequence

def test_example(sequence: Sequence):
    ...

Test functions must start with the keyword test_ this allows Pytest to identify and collect the tests. Fixtures are required using one of the special arguments i.e. the lower case of the class name.

Here the sequence argument is used to require the sequences passed from the command line (see below for information on how to pass files to Phytest). Phytest will identify which test functions require which fixtures and pass the Phytest objects to them for testing.

Using Phytest classes for type hints is not required, however, makes for a better development experience. For example the following is a valid Phytest test and will be passed a Sequence object.

def test_example(sequence):
    ...

Fixtures can be combined to make more complex tests across multiple data types e.g.

from phytest import Sequence, Tree

def test_example(sequence: Sequence, tree: Tree):
    # test tree and sequence objects together
    ...

Sequence

The Phytest Sequence class is a sub-class of the Biopython SeqRecord class. This class uses the fixture sequence.

from phytest import Sequence

def test_example(sequence: Sequence):
    ...

Any tests requiring the class will be run for every sequence in the file. For example if the fasta file below is passed to Phytest the test_example function above would be run 4 times (Sequence_A-Sequence_D).

>Sequence_A
ATGAGATCCCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_B
ATGAGATCCCCGATAGCGAGCTAGXGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_C
ATGAGA--CCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_D
ATGAGATCCCCGATAGCGAGCTAGCGATNNNNNNNNNNNNNNNNNTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
$ phytest test.py --sequence sequences.fasta

Test session starts (platform: darwin, Python 3.9.12, pytest 7.1.1, pytest-sugar 0.9.4)
rootdir: /Users/wytamma/programming/phytest, configfile: pyproject.toml
plugins: sugar-0.9.4, html-3.1.1, cov-3.0.0
collecting ...
test.py ✓✓✓✓                                                            100% ██████████

Results (0.03s):
    4 passed

Alternative file formats can be specified using the --sequence-format flag.

Alignment

The Phytest Alignment class is a sub-class of the Biopython MultipleSeqAlignment class. This class uses the fixture alignment.

from phytest import Alignment

def test_example(alignment: Alignment):
    ...

Tests using the alignment file will be run once i.e. you will have access to the entire alignment during the test. Alignments are also passed to Phytest using the --sequence flag however they are required to be valid alignments e.g. all sequence must be the same length.

phytest test.py --sequence sequences.fasta

Test session starts (platform: darwin, Python 3.9.12, pytest 7.1.1, pytest-sugar 0.9.4)
rootdir: /Users/wytamma/programming/phytest, configfile: pyproject.toml
plugins: sugar-0.9.4, html-3.1.1, cov-3.0.0
collecting ...
test.py                                                                100% ██████████

Results (0.02s):
    1 passed

Alternative file formats can be specified using the --sequence-format flag.

Tree

The Phytest Tree class is a sub-class of the Biopython Tree class. This class uses the fixture tree.

from phytest import Tree

def test_example(tree: Tree):
    ...

Tests using the tree fixture will be run once per tree in the file. Tree files are passed to Phytest using the --tree flag.

(Sequence_A:1,Sequence_B:0.2,(Sequence_C:0.3,Sequence_D:0.4):0.5);
(Sequence_A:1,Sequence_B:0.3,(Sequence_C:0.3,Sequence_D:0.4):0.5);
phytest test.py --tree tree.newick

Test session starts (platform: darwin, Python 3.9.12, pytest 7.1.1, pytest-sugar 0.9.4)
rootdir: /Users/wytamma/programming/phytest, configfile: pyproject.toml
plugins: sugar-0.9.4, html-3.1.1, cov-3.0.0
collecting ...
test.py ✓✓                                                              100% ██████████

Results (0.02s):
    2 passed

Alternative file formats can be specified using the --tree-format flag.

Data

The Phytest Data class is a sub-class of the Pandas DataFrame class. This class uses the fixture data.

from phytest import Data

def test_example(data: Data):
    ...

Tests using the data file will be run once. Data files are passed to Phytest using the --data flag.

phytest test.py --data metadata.csv

Test session starts (platform: darwin, Python 3.9.12, pytest 7.1.1, pytest-sugar 0.9.4)
rootdir: /Users/wytamma/programming/phytest, configfile: pyproject.toml
plugins: sugar-0.9.4, html-3.1.1, cov-3.0.0
collecting ...
test.py                                                                100% ██████████

Results (0.02s):
    1 passed

Alternative file formats can be specified using the --data-format flag.

Built-in asserts

Phytest provides many convenient helper functions for testing phylogenetic analyses including methods for testing sequences, alignments, trees and metadata files.

from phytest import Sequence

def test_GC_content(sequence: Sequence):
    sequence.assert_percent_GC(38)

For example, the Phytest Sequence class implements the method Sequence.assert_percent_GC. Calling this method with the expected GC-content e.g. sequence.assert_percent_GC(38) will raise an error if the percent of G and C nucleotides in the sequence is not equal to 38%. Many methods also provide maximum and minimum arguments so the upper and lower bounds can be tested e.g. sequence.assert_percent_GC(min=30, max=40).

from phytest import Sequence

def test_GC_content(sequence: Sequence):
    sequence.assert_percent_GC(min=30, max=40)

All Phytest assert methods also provide a warning flag e.g. sequence.assert_percent_GC(38, warn=True) causing the method to raise a warning instead of an error if the test fails. In an automated pipeline, this provides a way to inform the user of potential problems without causing the pipeline to fail. The warning flag can be set automatically by calling the method with the warn_ prefix instead of assert_ e.g. sequence.warn_percent_GC(38).

from phytest import Sequence

def test_GC_content(sequence: Sequence):
    sequence.warn_percent_GC(38)

See the documentation for a full list of built-in assert methods (https://phytest-devs.github.io/phytest/reference.html).

Custom asserts

As Phytest is running Pytest under the hood it is trivial to write your own custom asserts using the Phytest fixtures.

def test_outlier_branches(tree: Tree):
    # Here we create a custom function to detect outliers
    import statistics

    tips = tree.get_terminals()
    branch_lengths = [t.branch_length for t in tips]
    cut_off = statistics.mean(branch_lengths) + statistics.stdev(branch_lengths)
    for tip in tips:
        assert tip.branch_length < cut_off, f"Outlier tip '{tip.name}' (branch length = {tip.branch_length})!"