Phylogenetics

Selecting the NRPieceS

Key Points

Rational selection of 35 donor exchange units through phylogenetic analysis.
Increased engineering success rates with rationally selected donors.
Establishment of a software tool for phylogenetic-based, guided NRPS engineering.

Graphical Abstract

The Goal - NRPs Derivatization by Exchanging Units

The unique strength of our platform lies not only in making NRPS accessible for heterologous expression, but also in enabling the use of their molecular machinery for peptide derivatization. By exchanging NRPS units, it becomes possible to control which amino acid is incorporated into a non-ribosomal peptide. Building on this principle, we developed a library consisting of modular exchange units (XUTs) that can replace existing units in an NRPS, thereby redirecting which amino acid is incorporated (Fig. 1).

**Fig. 1:** Substituting XUT^I units in the NRPS results in the formation of peptide derivatives.

This library is organized into two complementary vector sets: acceptors and donors. Acceptors are native NRPS clusters that have been split into three functional parts (initiator, elongator, and terminator) using inteins. Each acceptor has a defined site for the insertion of a donor. Donors are single exchange units that can be introduced into these sites to diversify amino acid incorporation.

Our library exchanges NRPS units at the XUT^I split site, meaning that a standard donor exchange unit begins with a thiolation (T) domain, followed by a condensation (C) domain, and ends with an adenylation (A) domain (Fig. 2). We refer to these donor modules as NRPieceS, as they can be inserted into acceptors like puzzle pieces.

**Fig. 2:** Chaiyaphumine is split at the XUT^I site (A). The simple donor module consists of a T domain, C domain and A domain and is referred to as LCL since the upstream amino acid enters and leaves the condensation complex in L-configuration (B).

The Problem - Incompatibility Limits Peptide Production in Hybrid NRPS

However, while NRPS engineering allows the interchange of units, this does not guarantee that the resulting hybrid NRPS will produce a peptide. Successful peptide synthesis also requires that the interfaces between different NRPS domains are compatible — a condition that is not automatically fulfilled in hybrid assemblies that do not naturally occur.

In theory, a large number of NRPS can be derivatized to generate peptide diversity, yet many combinations fail to yield a product due to incompatibility. We observed exactly this effect with our first set of donor vectors.

We selected 11 target amino acids for the initial donor set and screened clusters for XUT^I units containing A domains specific to these residues (Fig. 3). Once all XUT^I units were identified, the corresponding donor vectors were cloned and combined with the Chaiyaphumine elongator acceptor vector to generate expression constructs.

**Fig. 3:** Selection and cloning process of the first donor set.

We combined the newly created expression cassettes with the native Chaiyaphumine initiator and terminator and tested the expression. The results showed that the rate of successful peptide synthesis was only 27 % (Fig. 4).

**Fig. 4:** Random selection of donor modules yields only a 27% peptide synthesis success rate.

This low success rate highlighted the limitations of selecting donors based only on amino acid coverage. To overcome this, we moved away from randomly picking modules and instead developed a more rational approach.

The Solution - Rational Donor Selection based on TE Domains

Because the underlying architecture of NRPSs is conserved across all domains of life, proteins from clusters of different origins can, in principle, be combined^[1]. However, literature shows that successful peptide synthesis is far more likely when the NRPS clusters involved are closely related ^[2].

This suggested that mapping the relationships between NRPS clusters could be a promising strategy for selecting exchange units. However, we needed a systematic rationale for defining these relationships. Phylogenetics, the study of evolutionary relationships among genes or organisms, provides a framework to infer how NRPS clusters are related over time.

The task is complicated by the modular architecture of NRPS: As NRPS domains are highly similar to each other, these enzymes frequently undergo recombination, making their evolutionary relationships less straightforward ^[2].

We came up with the idea of using the thioesterase (TE) domain as a phylogenetic marker for donor module selection due to its unique position in the NRPS cluster. To validate this strategy, we consulted an expert in evolutionary biochemistry, Prof. Dr. Georg Hochberg, who confirmed that the TE domain is particularly well suited for this approach. Since it occurs only once per cluster, it enables direct comparison between different clusters without the complication that other domains typically appear multiple times within the same cluster. This absence of intra-cluster duplication makes TE domain sequence similarity a particularly powerful criterion for selecting appropriate donor modules.

Therefore, we decided to use a phylogenetic approach to map these relationships of the TE domains to guide the selection of XUT^I units for our donor modules. This strategy was intended to enhance exchange unit compatibility in non-native NRPSs, thereby improving reprogramming efficiency and ensuring reliable amino acid incorporation.

The Workflow - Mapping NRPS Relations for Donor Selection

Genome Annotation

Before mapping the relationships between NRPS clusters based on their TE domains, we first needed to generate a pool of annotated genomes. Since the model clusters of our library all originate from Xenorhabdus and Photorhabdus strains, we focused our genome mining on these genera. By screening related genomes, we aimed to maximize the likelihood of identifying compatible biosynthetic building blocks.

We ran a total of 65 genomes from Xenorhabdus and Photorhabdus strains through antiSMASH to generate the annotations of the TE domains and the A domain specificities, which we needed for the mapping and donor selection.

Amino Acid Selection & Sorting

Since the overarching goal of the NRPieceS library is to serve as a tool for peptide derivatization through NRPS engineering, the strategic selection of amino acids represented by our donor exchange units was a crucial step in its design.

To access the chemical diversity presented by NRPS, we wanted to cover not only the 20 canonical amino acids but also non-proteinogenic amino acids such as β-alanine and 2,4-diaminobutyric acid (DAB), as they are naturally present in Xenorhabdus and Photorhabdus NRPs (Fig. 5).

**Fig. 5:** Collection of amino acids to be represented by the library donors.

Furthermore, we extended the structural diversity of the hybrid NRPS by including donors with an epimerization (E) or condensation-epimerization (CE) domain as these domains alter the stereochemistry of the upstream amino acid (Fig. 6).

**Fig. 6:** The standard LCL unit does not affect the stereochemistry (A). In E-DCL modules, the epimerization domain is separate, converting the upstream amino acid from D to L during condensation (B). In CE-type modules, the condensation–epimerization domain converts the upstream amino acid from L to D in one domain (C).

We screened the gene clusters in the annotated genomes for their module specificity. If a cluster contained a module specific for an amino acid targeted in our library, it was moved to the corresponding folder (Fig. 7).

**Fig. 7:** The annotated clusters were sorted regarding their module specificity.
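The sorting step can be illustrated with a minimal sketch. It assumes that each annotated cluster has already been reduced to a list of predicted A domain specificities; the record format, the cluster names and the shortened target set are hypothetical and do not reflect the actual antiSMASH output or our file layout.

```python
from collections import defaultdict

# Hypothetical, shortened target set; the real library covers more residues.
TARGETS = {"Gln", "Ala", "bAla", "Dab"}


def sort_clusters_by_specificity(clusters, targets):
    """Group cluster IDs under every targeted amino acid they contain.

    `clusters` maps a cluster ID to the list of predicted A domain
    specificities of its modules (an assumed, simplified input format).
    """
    folders = defaultdict(list)
    for cluster_id, specificities in clusters.items():
        # A cluster lands in several "folders" if it carries several targets.
        for amino_acid in sorted(set(specificities) & targets):
            folders[amino_acid].append(cluster_id)
    return dict(folders)


# Made-up example records, for illustration only.
example_clusters = {
    "strainA_region3": ["Phe", "Gln", "Val"],
    "strainB_region1": ["Gln", "Ala"],
    "strainC_region7": ["Orn", "Hpg"],
}
folders = sort_clusters_by_specificity(example_clusters, TARGETS)
```

In this toy input, the third cluster contains no targeted specificity and therefore does not appear in any folder.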

Extraction and Translation of TE domains

After generating a dataset for each amino acid, we extracted the DNA sequence of the TE domain of every NRPS cluster. These sequences were then translated into amino acid sequences, since amino acid-based alignments better capture functional and evolutionary relationships, reduce noise from codon usage, and improve alignment quality, ultimately resulting in more reliable phylogenetic trees ^[3].
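As an illustration of the translation step, the following sketch translates an in-frame DNA sequence with the standard genetic code. It is not the tool we used; the function name and the example fragment are made up, and the sketch assumes that the extracted TE domain sequence starts in frame.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; the elongation codons are the same
# in the bacterial table (NCBI translation table 11).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA sequence; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Made-up fragment, not a real TE domain sequence.
protein = translate("GGTTGGTCTATGGGT")
```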

Tree Generation for Amino Acid Specificity

For each amino acid specificity, a phylogenetic tree was generated to illustrate the relationships between clusters and identify the closest relatives. To this end, the extracted TE domain sequences from each dataset were aligned to a reference cluster, our model NRPS Chaiyaphumine, using Clustal Omega. The trees were then constructed and visualized using the tree-building tool integrated in Geneious (Fig. 8).

**Fig. 8:** Example tree of TE domains from regions containing a glutamine (Gln)-specific module.
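The idea of identifying the closest relatives of the reference can be approximated without a full tree. The sketch below is a simplified stand-in, not the Geneious tree builder: it assumes an existing multiple sequence alignment (rows of equal length, gaps as "-") and ranks clusters by their p-distance to the reference. All sequences and identifiers are made up.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        differing += a != b
    return differing / compared if compared else 1.0


def rank_by_distance(alignment, reference_id):
    """Return (cluster_id, distance) pairs, nearest to the reference first."""
    reference = alignment[reference_id]
    distances = [
        (cluster_id, p_distance(reference, seq))
        for cluster_id, seq in alignment.items()
        if cluster_id != reference_id
    ]
    return sorted(distances, key=lambda pair: pair[1])


# Toy alignment with made-up sequences; real input would be the
# Clustal Omega alignment of the translated TE domains.
toy_alignment = {
    "reference": "GWSMGGLVAY",
    "cluster_1": "GWSMGGLIAY",
    "cluster_2": "GYSFGG-VAF",
    "cluster_3": "GHSLGAMLAF",
}
ranking = rank_by_distance(toy_alignment, "reference")
```

A p-distance ignores substitution models and tree topology, so such a ranking is only a rough proxy for the relationships shown in a phylogenetic tree.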

Donor Exchange Unit Selection

Finally, the phylogenetic trees were used to identify exchange units based on the following criteria:

closest evolutionary relationship to our model library cluster Chaiyaphumine,
not a starter or finisher module,
flanked upstream and downstream by additional NRPS modules,
absence of BsaI and SapI restriction sites,
upstream E or CE domains for structural diversity.

By prioritizing closely related modules, we aimed to maximize compatibility with the acceptor NRPS and increase the likelihood of successful peptide synthesis. Modules that are evolutionarily distant may carry structural or functional differences that reduce their ability to integrate seamlessly into a hybrid NRPS.

Since our derivatization strategy is based on the XUT^I engineering approach, we required each unit to contain at least a T, C and A domain in that specific order. As starter and finisher modules do not meet this criterion, they were excluded.
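The domain-order requirement can be sketched as a simple filter. The example assumes that a cluster is represented as a flat, ordered list of domain labels, which is a hypothetical simplification. It reads "flanked by modules" as "an A domain lies upstream of the unit and a further C domain lies downstream", which is an assumption of this sketch, and it does not handle units with E or CE domains.

```python
def eligible_xut_units(domains):
    """Return index triples (t, c, a) of T-C-A units inside a cluster.

    `domains` is a flat, ordered list of domain labels for one cluster
    (an assumed representation). A unit is kept only if an A domain
    precedes its T domain and a further C domain follows its A domain.
    """
    units = []
    for i in range(len(domains) - 2):
        if domains[i:i + 3] != ["T", "C", "A"]:
            continue
        has_upstream = "A" in domains[:i]
        has_downstream = "C" in domains[i + 3:]
        if has_upstream and has_downstream:
            units.append((i, i + 1, i + 2))
    return units


# Made-up four-module cluster: starter, two elongation modules, finisher.
toy_cluster = ["A", "T", "C", "A", "T", "C", "A", "T", "C", "A", "T", "TE"]
units = eligible_xut_units(toy_cluster)
```

In the toy cluster, the unit ending in the last A domain is rejected because only a T and a TE domain follow it, which mirrors the exclusion of finisher modules.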

To further improve the likelihood of successful hybrid NRPS construction, we focused on modules originating from “pure” NRPS clusters rather than mixed NRPS–PKS or other megasynthase hybrids. Modules from mixed clusters rely on different interactions, which could reduce compatibility with NRPS systems. Additionally, modules with a bulkier upstream amino acid such as phenylalanine were preferred, as this potentially increases the acceptance of non-native substrates in the hybrid NRPS.

Since donor and acceptor vectors were combined using Golden Gate cloning, we relied on the Type IIS restriction enzyme BsaI to generate compatible overhangs. While the presence of a BsaI site in a potential donor was not an absolute exclusion criterion, it would complicate cloning. Therefore, whenever possible, we prioritized donors without internal BsaI sites to simplify the process. If unavoidable, donors containing BsaI were retained, but the restriction sites were removed via site-directed mutagenesis. The same strategy was applied to SapI sites to ensure plasmid compatibility with the RFC100 standard.
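A check for internal restriction sites could look like the following sketch. The recognition sequences of BsaI (GGTCTC) and SapI (GCTCTTC) are searched on both strands; the function names and the example sequence are made up for illustration.

```python
SITES = {"BsaI": "GGTCTC", "SapI": "GCTCTTC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_internal_sites(dna):
    """Map each enzyme to sorted 0-based hit positions on either strand."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in SITES.items():
        positions = set()
        # A site on the reverse strand appears as its reverse complement.
        for motif in {site, reverse_complement(site)}:
            start = dna.find(motif)
            while start != -1:
                positions.add(start)
                start = dna.find(motif, start + 1)
        hits[enzyme] = sorted(positions)
    return hits


# Made-up sequence with a forward BsaI site and a reverse-strand SapI site.
example = "ATGGGTCTCAAACCCGAAGAGCTTT"
sites = find_internal_sites(example)
```

Donors with an empty hit list for both enzymes would be preferred; any reported position marks a site that would have to be removed by site-directed mutagenesis.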

By selecting multiple donors for a single amino acid, we can further expand the chemical space of our library, including modifications to the upstream residue. To this end, whenever possible, we chose two donors per amino acid: one of the LCL type and one containing an epimerization domain (E-DCL or CE).
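The rule of choosing two donors per amino acid can be sketched as follows. The candidate records, their keys and the distance values are hypothetical; the sketch only assumes that each candidate unit carries a donor type and a distance of its source cluster to the reference.

```python
def pick_donors(candidates):
    """Pick, per amino acid, the closest LCL and the closest epimerizing donor.

    `candidates` is a list of dicts with the hypothetical keys
    'amino_acid', 'unit_id', 'donor_type' and 'distance' (distance of the
    source cluster's TE domain to the reference).
    """
    selection = {}
    for cand in sorted(candidates, key=lambda c: c["distance"]):
        # E-DCL and CE donors share one slot; LCL donors get the other.
        slot = "LCL" if cand["donor_type"] == "LCL" else "epimerizing"
        per_aa = selection.setdefault(cand["amino_acid"], {})
        per_aa.setdefault(slot, cand["unit_id"])  # first hit is the closest
    return selection


# Made-up candidates, for illustration only.
candidates = [
    {"amino_acid": "Gln", "unit_id": "u1", "donor_type": "LCL", "distance": 0.30},
    {"amino_acid": "Gln", "unit_id": "u2", "donor_type": "LCL", "distance": 0.12},
    {"amino_acid": "Gln", "unit_id": "u3", "donor_type": "E-DCL", "distance": 0.25},
    {"amino_acid": "Asp", "unit_id": "u4", "donor_type": "LCL", "distance": 0.40},
]
selection = pick_donors(candidates)
```

An amino acid for which no epimerizing donor is available simply ends up with a single LCL entry, as for Asp in this toy input.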

We applied this rationale to select our donor exchange units, resulting in the units shown in Tab. 1. For two amino acid specificities, namely arginine and asparagine, no close relative could be identified, as the clusters containing these specificities are too divergent from our reference cluster, Chaiyaphumine. In these cases, we resorted to random selection.

**Tab. 1:** Selected donor XUT^I units based on the TE domain alignment.

Amino acid	Strain	Donor type	Native upstream AA
Ala	Xenorhabdus PB61.4	E-DCL	Phe
Ala	X. khoisanae DSM 25463	LCL	Pro
Arg	X. cabanillasii JM26	CE	Leu
Arg	X. cabanillasii JM26	LCL	Leu
Asn	X. cabanillasii JM26	E-DCL	Asn
Asn	X. innexi DSM 16336	LCL	Phe
Asp	X. khoisanae DSM 25463	LCL	Gln
β-Ala	X. nematophila ATCC 19061	E-DCL	Trp/Phe
β-Ala	X. innexi DSM 16336	LCL	Thr
Dab	P. kayaii DSM 15194	CE	Tyr
Gln	X. mauleonii DSM 17908	E-DCL	Ile
Gln	X. romanii DSM 17910	LCL	Phe
Glu	P. temperata subsp. thracensis DSM 15199	CE	Lys
Glu	X. beddingii DSM 4764	LCL	Thr
Gly	X. szentirmaii DSM 16338	CE	Asn
Gly	X. szentirmaii DSM 16338	LCL	Pro
His	X. miraniensis DSM 17902	CE	Trp
His	P. temperata K122	LCL	Arg
Ile	Xenorhabdus KK7.4	E-DCL	Val
Ile	Xenorhabdus KK7.4	LCL	Val
Leu	X. cabanillasii JM26	E-DCL	Leu
Leu	X. innexi DSM 16336	LCL	Tyr
Lys	P. luminescens IT4.1	CE	Thr
Lys	X. hominickii DSM 17903	CE	Lys
Lys	X. mauleonii DSM 17908	LCL	Lys
Phe	X. mauleonii DSM 17908	E-DCL	Phe
Phe	Xenorhabdus PB61.4	LCL	Thr
Pro	Xenorhabdus KK7.4	E-DCL	Phe
Pro	Xenorhabdus PB61.4	E-DCL	Ala
Pro	X. nematophila ATCC 19061	LCL	β-Ala
Ser	Xenorhabdus KK7.4	E-DCL	Val
Ser	X. cabanillasii JM26	LCL	Leu
Thr	P. temperata K122	E-DCL	Phe
Thr	Xenorhabdus KK7.4	LCL	Val
Trp	X. miraniensis DSM 17902	CE	Ser
Trp	X. hominickii DSM 17903	LCL	Pro
Trp	Xenorhabdus KJ12.1	LCL	Trp
Tyr	X. szentirmaii DSM 16338	CE	Val
Tyr	X. innexi DSM 16336	E-DCL	Tyr
Tyr	X. stockiae DSM 17904	LCL	Ile
Val	X. szentirmaii DSM 16338	CE	Phe
Val	X. hominickii DSM 17903	E-DCL	Phe
Val	Xenorhabdus KK7.4	LCL	Val

By applying this strategy, we completed the third iteration of the Design-Build-Test-Learn (DBTL) cycle.

The Application - Creating the Donor Vectors

We started to clone all of the identified XUT^I units into the appropriate backbone to generate the donor vectors for our library. However, some constructs failed in the cloning process (Ala E-DCL, Gly LCL, His LCL, Leu LCL, Pro E-DCL, Ser LCL, Trp CE, Tyr CE and Val CE), leaving a final set of 35 donor vectors successfully incorporated into the library (Fig. 9).

**Fig. 9:** Final donor set of the NRPieceS Library.

We inserted each donor XUT^I unit into all three positions of the Chaiyaphumine cluster, generating a total of 35 × 3 = 105 NRPS derivatives. The expression of each derivative combination was subsequently tested (Library Characterization Results).
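The size of the resulting test set follows from combining every donor with every insertion position. A minimal sketch, with placeholder donor names and arbitrarily numbered positions (both are assumptions, not our actual construct names):

```python
from itertools import product

donors = [f"donor_{i:02d}" for i in range(1, 36)]  # 35 placeholder names
positions = (1, 2, 3)  # the three insertion positions, numbered arbitrarily

# One hybrid NRPS per (donor, position) pair.
derivatives = [
    {"donor": donor, "position": position}
    for donor, position in product(donors, positions)
]
```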

The constructs built with these donor vectors performed markedly better, with 60 % producing the correct peptide compared with 27 % for the first donor set. To determine whether phylogenetic similarity is actually a good predictor of NRPS unit compatibility and to make this donor selection process available to future teams, we streamlined the approach described on this page and included it in our software mATChmaker. We did indeed find a very strong correlation, which supports the effectiveness of the phylogenetic analysis. To learn more, please visit our software page.

Outlook

Building the Donor Vectors

Identifying the donor vectors was just one part of the library construction process. Once selected, the donors were cloned into Golden Gate–compatible vectors and combined with acceptors to generate hybrid NRPS. For a detailed description of how the dry lab approach was implemented in the wet lab, see the Results – Library Building Results.

Extended Genome Mining

Our approach focused on Xenorhabdus and Photorhabdus strains, but the strategy is equally applicable to other genera that harbor NRPS.

References

[1] He, R., Zhang, J., Shao, Y., Gu, S., Song, C., Qian, L., Yin, W.-B., & Li, Z. (2023). Knowledge-guided data mining on the standardized architecture of NRPS: Subtypes, novel motifs, and sequence entanglements. PLOS Computational Biology, 19(5), e1011100. https://doi.org/10.1371/journal.pcbi.1011100

[2] Baunach, M., Chowdhury, S., Stallforth, P., & Dittmann, E. (2021). The Landscape of Recombination Events That Create Nonribosomal Peptide Diversity. Molecular Biology and Evolution, 38(5), 2116–2130. https://doi.org/10.1093/molbev/msab015

[3] Wernersson, R., & Pedersen, A. G. (2003). RevTrans: Multiple alignment of coding DNA from aligned amino acid sequences. Nucleic Acids Research, 31(13), 3537–3539. https://doi.org/10.1093/nar/gkg609
