Molecular Sequence Generation with DendroPy

The DendroPy Phylogenetic Computing Library includes native infrastructure for simulating sequences on DendroPy trees under the HKY model. Because it is pure Python, however, it is a little slow. If Seq-Gen is installed on your system, you can instead use a lightweight Seq-Gen wrapper added in the latest revision under the interop subpackage: dendropy.interop.seqgen. Documentation is lagging, but the example script below should be enough to get started, and the class is simple enough that its options should be largely self-documenting. With either the native method or the Seq-Gen wrapper, you can generate alignments from within DendroPy, which makes it easier to build simulation and analysis pipelines.
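
Since the wrapper depends on an external program, it can be worth confirming that Seq-Gen is reachable before running the script below. The following is a minimal sketch, assuming Python 3 and that the executable is named "seq-gen" and sits on your PATH; find_seqgen is a hypothetical helper, not part of DendroPy.

```python
import shutil


def find_seqgen(executable="seq-gen"):
    """Return the full path to the Seq-Gen binary, or None if it is not on PATH."""
    # "seq-gen" is an assumed executable name; adjust it for your installation.
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_seqgen()
    if path is None:
        print("Seq-Gen not found on PATH; consider the native simulator instead")
    else:
        print("Found Seq-Gen at", path)
```

If the lookup returns None, the native HKY simulation is still available, just slower.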

#! /usr/bin/env python

import dendropy
from dendropy.interop import seqgen

trees = dendropy.TreeList.get_from_path("trees.nex", "nexus")
s = seqgen.SeqGen()

# generate one alignment per tree
# no substitution model is specified, so this defaults to a JC model
# the result is a DataSet object with one DnaCharacterMatrix per input tree
d0 = s.generate(trees)
print(len(d0.char_matrices))

# instruct Seq-Gen to scale branch lengths by factor of 0.1
# note that this does not modify the input trees
s.scale_branch_lens = 0.1

# switch to a more complex model (GTR) with explicit state frequencies and rates
s.char_model = seqgen.SeqGen.GTR
s.state_freqs = [0.4, 0.4, 0.1, 0.1]
s.general_rates = [0.8, 0.4, 0.4, 0.2, 0.2, 0.1]
d1 = s.generate(trees)
print(len(d1.char_matrices))
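
The GTR settings in the script are plain lists, so a typo in them is easy to miss. The sketch below checks the usual GTR conventions for DNA before the values are handed to the wrapper: four state frequencies that sum to 1 and six non-negative relative rates. check_gtr_params is a hypothetical helper, not part of DendroPy, and whether the wrapper performs any validation of its own is not something this example assumes.

```python
import math


def check_gtr_params(state_freqs, general_rates, tol=1e-9):
    """Raise ValueError if the GTR parameters are not well-formed for DNA."""
    if len(state_freqs) != 4:
        raise ValueError("expected 4 state frequencies, got %d" % len(state_freqs))
    if len(general_rates) != 6:
        raise ValueError("expected 6 relative rates, got %d" % len(general_rates))
    if any(f < 0 for f in state_freqs):
        raise ValueError("state frequencies must be non-negative")
    if any(r < 0 for r in general_rates):
        raise ValueError("relative rates must be non-negative")
    # Frequencies are proportions, so they should sum to 1 within tolerance.
    if not math.isclose(sum(state_freqs), 1.0, abs_tol=tol):
        raise ValueError("state frequencies sum to %r, not 1" % sum(state_freqs))


# The values used in the script above pass the check without raising.
check_gtr_params([0.4, 0.4, 0.1, 0.1], [0.8, 0.4, 0.4, 0.2, 0.2, 0.1])
```

Calling the check just before assigning s.state_freqs and s.general_rates keeps a malformed model from reaching Seq-Gen.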