Modelling

The development of a color-changing tattoo system based on melanin aggregation presents a fundamental challenge: melanin biosynthesis is essential for light absorption but is also highly cytotoxic. When tyrosinase enzymes catalyze the conversion of tyrosine to melanin, the reactive intermediates and final melanin products can cause oxidative stress, disrupt cellular membranes, and ultimately lead to cell death (Brenner and Hearing, 2008). This toxicity poses a critical obstacle to our synthetic biology approach, because we need living cells that produce melanin while remaining viable over extended periods. Our solution centers on spatial compartmentalization: we confine melanin production exclusively to proteinaceous nanocages called encapsulin nanocompartments, which can sequester the toxic compounds away from sensitive cellular machinery while still allowing the melanin to serve its optical function. Balancing optical absorbance against cytotoxicity with highly stable nanocages required modeling how tyrosine is converted to melanin precursors over time inside our encapsulin nanocompartments. We then used this knowledge to build a model that calculates the transmission through one hydrogel bead depending on the nanocage concentration in the cell. Given experimental data on the maximal possible number of nanocages per cell, this model could be used to calculate the number of stacked beads necessary to achieve the desired transmission.

Tyrosinase Kinetics Model

Design Goals

Since eumelanin is not enclosed when the nanocage assembles but is instead produced continuously during the lifetime of the nanocage, we created a model to predict the rate of eumelanin production. Because tyrosinase only catalyzes the first two steps of the reaction from tyrosine to eumelanin, and the remaining steps proceed autocatalytically, we modeled the steps from tyrosine via DOPA to dopaquinone (see fig. 1). The goal was a mathematical model that predicts the dopaquinone concentration at a given time after nanocage assembly, as well as the time at which a given dopaquinone concentration is reached.

Fig. 1: Reaction from tyrosine via DOPA to dopaquinone (Paria et al., 2020).

Results

Dopaquinone concentrations over time were calculated for all combinations of the nanocages from Myxococcus xanthus (MX) and Quasibacillus thermotolerans (QT) with the tyrosinases BmTyr, HcTyr1, LsTyr and SkMelC2 (see fig. 2). Additionally, the dopaquinone concentration reached after 48 hours was calculated for all tyrosinases and nanocages, for different percentages of nanocage monomers carrying a tyrosinase (see tab. 1). The maximum number of tyrosinases (100 %) is reached when every monomer carries a tyrosinase.

Fig. 2: Time course of the tyrosine, DOPA and dopaquinone concentrations for the nanocages MX and QT in combination with the tyrosinases BmTyr, HcTyr1, LsTyr and SkMelC2, with the maximal number of tyrosinases per nanocage.

Tab. 1: Dopaquinone concentrations reached after 48 hours with the nanocages MX and QT in combination with the tyrosinases BmTyr, HcTyr1, LsTyr and SkMelC2, for different percentages of nanocage monomers carrying a tyrosinase.

Monomers with tyrosinase	MX BmTyr	MX HcTyr1	MX LsTyr	MX SkMelC2	QT BmTyr	QT HcTyr1	QT LsTyr	QT SkMelC2
100%	0.0450	0.6869	0.4131	3.7754	0.0210	0.4126	0.2365	2.2516
75%	0.0247	0.4587	0.2658	2.5062	0.0112	0.2661	0.1505	1.4994
67%	0.0196	0.3934	0.2243	2.1456	0.0087	0.2159	0.1258	1.2817
50%	0.0102	0.2459	0.1406	1.4127	0.0044	0.1230	0.0707	0.8494
33%	0.0040	0.1132	0.0646	0.8008	0.0016	0.0549	0.0292	0.4832
25%	0.0019	0.0639	0.0346	0.5366	0.0007	0.0302	0.0148	0.3241

Methods

To model the enzyme kinetics, we set up equations based on the Michaelis-Menten model. This implies several assumptions: steady-state kinetics apply to our system; the reactions are irreversible; tyrosinases attached to the nanocage monomers behave like free enzymes; the DOPA formed from tyrosine is released from the enzyme before it is oxidized further; and the flux of molecules through the nanocage pores is negligible. Based on these assumptions and simplifications, we derived the differential equations presented below.

You can check out our code in our official iGEM GitLab.

Equations

Starting conditions

$A(0)=[S_{0_{A}}]=\text{cytosolic tyrosine concentration}$ (Song et al., 2015)
$B(0)=[S_{0_{B}}]=0$
$P(0)=[S_{0_{P}}]=0$

$A(t)$, $B(t)$ and $P(t)$ are the time-dependent concentrations of tyrosine, DOPA and dopaquinone, respectively. $k_{cat_A}$ and $k_{cat_B}$ are the catalytic constants for tyrosine and DOPA (de Almeida Santos et al., 2024; Dolashki et al., 2012; Guo et al., 2015; Shuster and Fishman, 2009). $K_{M_A}$ and $K_{M_B}$ are the Michaelis-Menten constants for tyrosine and DOPA (de Almeida Santos et al., 2024; Dolashki et al., 2012; Guo et al., 2015; Shuster and Fishman, 2009). $[E_0]$ is the enzyme (tyrosinase) concentration, calculated by dividing the absolute number of enzymes per nanocage by the Avogadro constant and then by the nanocage volume (Sigmund et al., 2023). $t$ is the reaction time.
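The $[E_0]$ calculation can be sketched in a few lines. This is a minimal illustration, not our project code: it assumes the nanocage lumen is a sphere of a given inner diameter (instead of the literature volume we used) and expresses the tyrosinase count as the number of monomers times the fraction of monomers carrying a tyrosinase, as in Tab. 1. The function name and the example values (180 monomers, 24 nm inner diameter) are hypothetical placeholders.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def enzyme_concentration(n_monomers, fraction_with_tyrosinase, inner_diameter_nm):
    """[E_0] in mol/L: enzymes per cage / N_A / cage volume (lumen treated as a sphere)."""
    n_enzymes = n_monomers * fraction_with_tyrosinase
    radius_dm = inner_diameter_nm / 2 * 1e-8  # 1 nm = 1e-8 dm, so dm^3 = L
    volume_l = 4 / 3 * math.pi * radius_dm ** 3
    return n_enzymes / AVOGADRO / volume_l


# Hypothetical placeholder cage: 180 monomers, all fused to a tyrosinase, 24 nm lumen
print(f"[E_0] = {enzyme_concentration(180, 1.0, 24.0) * 1e3:.1f} mM")
```

Because $[E_0]$ scales linearly with the fraction of occupied monomers, the rows of Tab. 1 correspond to different $[E_0]$ values for the same cage.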

Explanation

Because we neglected the flux of tyrosine through the pores, the change of $A(t)$ over time depends only on the rate $v_1$ at which tyrosine is converted to DOPA. The change of $B(t)$ over time equals $v_1 - v_2$ (Storer and Cornish-Bowden, 1974), i.e. the DOPA production rate minus the dopaquinone production rate. Similarly, the change of $P(t)$ over time depends only on the dopaquinone production rate $v_2$.

Since the $V_{max}$ values are unknown, we derived them as $V_{max} = k_{cat} \times [E_0]$. Because tyrosinase catalyzes both the reaction from tyrosine to DOPA and the reaction from DOPA to dopaquinone, the two substrates tyrosine and DOPA compete for the same enzyme. We accounted for this by including the factors
$(1 + \frac{B(t)}{K_{M_{B}}})$ and $(1 + \frac{A(t)}{K_{M_{A}}})$ in the denominators of the Michaelis-Menten equations for $v_1$ and $v_2$, respectively (Schäuble et al., 2013).
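Written out, the description in this section corresponds to the following system. This is our reconstruction from the text, assuming the standard competitive-substrate form in which each competition factor multiplies the $K_M$ term; the implementation in the project code may differ in detail.

$\frac{dA(t)}{dt} = -v_1 \qquad \frac{dB(t)}{dt} = v_1 - v_2 \qquad \frac{dP(t)}{dt} = v_2$

$v_1 = \frac{k_{cat_A} [E_0] \, A(t)}{K_{M_A}\left(1 + \frac{B(t)}{K_{M_B}}\right) + A(t)}$

$v_2 = \frac{k_{cat_B} [E_0] \, B(t)}{K_{M_B}\left(1 + \frac{A(t)}{K_{M_A}}\right) + B(t)}$

Adding the three rate equations gives $\frac{d}{dt}\left(A(t) + B(t) + P(t)\right) = 0$, so the total stays equal to $[S_{0_A}]$, as expected when the flux through the pores is neglected.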

Since the resulting equations could not be solved analytically, we solved them numerically with the solve_ivp function from scipy.integrate using method="RK45" (Virtanen et al., 2020).
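A minimal sketch of this numerical approach is shown below, assuming the competitive rate laws described in the Explanation (each competing substrate scales the other reaction's $K_M$). It is not our GitLab code: the parameter values are hypothetical placeholders rather than the literature constants, and the event function, which records when $P(t)$ first reaches a target concentration, is our illustration of the second design goal.

```python
from scipy.integrate import solve_ivp


def tyrosinase_rhs(t, y, kcat_a, kcat_b, km_a, km_b, e0):
    """Rates for tyrosine (A) -> DOPA (B) -> dopaquinone (P).

    Tyrosine and DOPA compete for the same active site, so each substrate
    scales the K_M term of the other reaction (competitive substrates).
    """
    a, b, _ = y
    v1 = kcat_a * e0 * a / (km_a * (1 + b / km_b) + a)  # V_max,A = k_cat,A * [E_0]
    v2 = kcat_b * e0 * b / (km_b * (1 + a / km_a) + b)  # V_max,B = k_cat,B * [E_0]
    return [-v1, v1 - v2, v2]


def simulate(a0, kcat_a, kcat_b, km_a, km_b, e0, p_target, t_end=48 * 3600.0):
    """Integrate from A(0) = a0, B(0) = P(0) = 0 and record when P first reaches p_target."""

    def reached(t, y, *args):
        return y[2] - p_target  # zero when P(t) equals the target concentration

    reached.direction = 1  # only count upward crossings

    return solve_ivp(
        tyrosinase_rhs,
        (0.0, t_end),
        [a0, 0.0, 0.0],
        method="RK45",
        args=(kcat_a, kcat_b, km_a, km_b, e0),
        events=reached,
        rtol=1e-8,
        atol=1e-12,
    )


# Hypothetical placeholder parameters (mM, 1/s, s), not the literature values
sol = simulate(a0=0.1, kcat_a=1.0, kcat_b=10.0, km_a=0.5, km_b=1.0, e0=1e-4, p_target=0.05)
print(f"P(48 h) = {sol.y[2, -1]:.4f} mM")
print(f"P = 0.05 mM reached after {sol.t_events[0][0] / 3600:.2f} h")
```

Because `args` is passed to `solve_ivp`, the event function receives the same extra arguments, which is why it accepts `*args`. The sum $A + B + P$ stays at its initial value up to round-off, which makes a convenient sanity check on any implementation.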

Discussion

With the maximum number of tyrosinases (a tyrosinase on every monomer), SkMelC2 combined with the MX nanocage produces dopaquinone the fastest. After 48 hours, this combination produced roughly 180 times more dopaquinone than the worst combination, BmTyr with QT (3.7754 vs. 0.0210, see tab. 1). The model could be improved by incorporating reaction reversibility. In addition, the flux of molecules through the nanocage pores could be determined experimentally and added to the equations.

Transmission vs. Nanocage Numbers

Design Goals

In the envisioned application of our project, the cells will be enclosed in hydrogel beads and injected subcutaneously. To optimize this process, we wanted to determine how much light is absorbed by the beads (or, conversely, transmitted through them) and how this reporter effect depends on the number of nanocages per cell. To achieve this, we derived an equation for the transmission that depends on the cellular concentration of melanin-containing nanocages.

Results

We calculated the transmission through the hydrogel beads along their diameter for the dopaquinone concentrations that the Tyrosinase Kinetics Model predicts after 48 hours for the different combinations of the nanocages MX and QT with the tyrosinases BmTyr, HcTyr1, LsTyr and SkMelC2 (see fig. 3). We chose 48 hours because we estimated that the nanocages are stable for several hours to days (see project description).

Fig. 3: Transmission as a function of the number of nanocages per cell for the nanocages MX and QT, using the dopaquinone concentrations that the tyrosinases BmTyr, HcTyr1, LsTyr and SkMelC2 produced after 48 hours with the maximal number of tyrosinases per nanocage.

Methods

We modified the Beer-Lambert law slightly: it contains a factor for the number of cells stacked behind each other, and the concentration term was replaced by a term that combines the amount of eumelanin per nanocage with the number of nanocages to give the concentration the eumelanin would have if it were distributed freely in the cell. We calculated the transmission for nanocage numbers ranging from 0 to the maximum number of nanocages, which we obtained by dividing the HEK cell volume minus the nucleus volume by the nanocage volume (Markovic, 2013; Milo et al., 2010). We used the Tyrosinase Kinetics Model to calculate the dopaquinone concentrations in the nanocages after 48 hours.
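The upper bound on the number of nanocages per cell is a plain volume ratio. A minimal sketch, assuming a spherical nanocage of given outer diameter; the function name and the example values are hypothetical placeholders, not the HEK and nanocage volumes from the cited sources.

```python
import math


def max_nanocages(cell_volume_um3, nucleus_volume_um3, cage_diameter_nm):
    """Upper bound on cages per cell: (V_cell - V_nucleus) / V_cage.

    The cage is treated as a sphere; like the plain volume ratio described above,
    this ignores packing efficiency and space taken up by other organelles.
    """
    cage_radius_um = cage_diameter_nm / 2 / 1000  # nm -> µm
    cage_volume_um3 = 4 / 3 * math.pi * cage_radius_um ** 3
    return math.floor((cell_volume_um3 - nucleus_volume_um3) / cage_volume_um3)


# Hypothetical placeholder volumes (µm³) and cage diameter (nm)
print(max_nanocages(1500.0, 250.0, 32.0))
```

Because the cage volume scales with the cube of the diameter, this bound is very sensitive to which nanocage (MX or QT) is assumed.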

The model assumes that the hydrogel itself neither absorbs nor scatters light at 400 nm. Additionally, we assumed that all the dopaquinone predicted to be produced within 48 hours reacted to eumelanin, so the mass of the predicted dopaquinone in the nanocages was taken to be equal to the mass of eumelanin. Since neither the distribution nor the orientation of the cells within the hydrogel bead is known, we divided the bead diameter by the cell diameter to obtain the number of cells stacked behind each other (Tastanova et al., 2018; Milo et al., 2010). Nanocages floating freely in the cytosol were assumed to absorb light in the same way as an equal amount of free eumelanin.

Feel free to use our code from our iGEM GitLab.

Equation

$T=\frac{I_{q}}{I_{0}}=10^{-q \times \varepsilon \times l \times \frac{m \times x}{V_{cell}}}$

$T$ is the transmission of light through the hydrogel bead. $I_0$ is the incident light intensity and $I_q$ the intensity of the light transmitted after passing through $q$ cells, where $q$ is the number of cells the light passes through. $\varepsilon$ is the absorption coefficient of melanin at 400 nm (Riesz and Jean, 2007); since $m$ is a mass in grams, $\varepsilon$ has to be expressed per unit mass concentration for the exponent to be dimensionless. $l$ is the path length through a single cell, which we assumed to equal the radius of a HEK cell. $m$ is the mass of dopaquinone per nanocage in grams, $x$ is the number of nanocages, and $V_\text{cell}$ is the volume of a HEK cell.
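The modified Beer-Lambert law translates directly into code. This is a minimal sketch, not our GitLab code: converting a dopaquinone concentration inside the cage into a mass per cage via the molar mass of dopaquinone (C9H9NO4, about 195.17 g/mol) is our assumption of how $m$ can be obtained, and all function names are hypothetical.

```python
DOPAQUINONE_MOLAR_MASS = 195.17  # g/mol (C9H9NO4)


def mass_per_cage(c_dopaquinone_mol_per_l, cage_volume_l):
    """Dopaquinone mass in one nanocage (g); the model counts it as eumelanin mass."""
    return c_dopaquinone_mol_per_l * cage_volume_l * DOPAQUINONE_MOLAR_MASS


def cells_in_path(bead_diameter_um, cell_diameter_um):
    """q: number of cells stacked along the bead diameter."""
    return bead_diameter_um / cell_diameter_um


def bead_transmission(q, epsilon_l_per_g_cm, path_cm, mass_per_cage_g, n_cages, cell_volume_l):
    """T = I_q / I_0 = 10^(-q * epsilon * l * m * x / V_cell).

    m * x / V_cell is a mass concentration in g/L, so epsilon must be a mass
    absorption coefficient in L g^-1 cm^-1 for the exponent to be dimensionless.
    """
    conc_g_per_l = mass_per_cage_g * n_cages / cell_volume_l
    return 10 ** (-q * epsilon_l_per_g_cm * path_cm * conc_g_per_l)
```

Evaluating `bead_transmission` for `n_cages` from 0 up to the maximum number of nanocages per cell reproduces the kind of curve shown in Fig. 3; with zero nanocages the transmission is exactly 1.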

Discussion

The model predicts that a single bead could already reduce the transmission to 3.35 % when SkMelC2 is combined with the MX nanocage and the maximum number of nanocages per cell is assumed.

To improve the model, experimental data on the maximal and average number of nanocages per cell would be helpful: with these, the transmission of one bead could be calculated more precisely, and one could determine how many beads are necessary to achieve the desired reduction in transmission. In addition, the absorption of the hydrogel and of the skin layers above the tattoo could be included, so that the model better reflects what would be visible of the tattoo. To increase the accuracy of the model, the actual amount of eumelanin after 48 hours could be determined experimentally.
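The number of stacked beads follows from the single-bead transmission. A minimal sketch, assuming the beads attenuate independently so that their transmissions multiply ($T_n = T_1^n$), and ignoring scattering and absorption by hydrogel and skin; the function name and the 0.1 % target are hypothetical, while 3.35 % is the single-bead value reported above.

```python
import math


def beads_needed(t_single_bead, t_target):
    """Smallest n with t_single_bead ** n <= t_target (transmissions as fractions)."""
    if not (0.0 < t_single_bead < 1.0 and 0.0 < t_target < 1.0):
        raise ValueError("transmissions must lie strictly between 0 and 1")
    n = math.log(t_target) / math.log(t_single_bead)
    # the small tolerance keeps exact powers (e.g. 0.1 -> 0.01) from rounding up
    return max(1, math.ceil(n - 1e-9))


# Single-bead value from the model (3.35 %) and a hypothetical target of 0.1 %
print(beads_needed(0.0335, 0.001))
```

With these numbers, three stacked beads would be needed; the answer changes directly with the experimentally determined number of nanocages per cell.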

Structure and Assembly Modeling

To achieve controlled melanin production within nanocages, we pursued two parallel molecular engineering strategies, both requiring extensive computational modeling to guide rational design. The first approach employs split tyrosinase systems, where the enzyme is divided into inactive fragments that only reconstitute into functional enzymes when brought together inside assembled nanocages. The second approach utilizes naturally occurring lid domains found on certain bacterial and fungal tyrosinases, which block the enzyme’s active site until removed or inhibited (de Almeida Santos et al., 2024). Our modeling efforts encompassed two major domains: optical modeling of tattoo darkness based on melanin aggregation states, and comprehensive structural modeling of protein assemblies to optimize our tyrosinase control mechanisms. The structural modeling component integrated multiple state-of-the-art computational tools, including SPELL for split site prediction, AlphaFold2 for structure prediction, Boltz-2 for complex modeling and binding affinity estimation, and Rosetta for physics-based refinement and energy calculations.

The modeling work directly informed our experimental design choices at multiple decision points throughout the project. Computational predictions allowed us to screen dozens of potential tyrosinase candidates and design variants in silico before committing resources to laboratory testing, substantially accelerating our development timeline and reducing cost. Structure predictions revealed which split sites would likely yield reconstitutable fragments, which lid domain positions and modifications could accommodate cleavage sites without disrupting enzyme or lid domain function, and which encapsulin-tyrosinase fusion architectures would maintain both cage assembly and catalytic activity. By establishing quantitative metrics for candidate evaluation, such as predicted binding energies, structural similarity to native enzymes and preservation of active site geometry, we turned what would have been extensive trial-and-error experimentation into a hypothesis-driven optimization process grounded in structural biology principles.

The Melanin Compartmentalization Challenge

Melanin biosynthesis presents an inherent toxicity challenge that must be overcome to create viable engineered cellular systems. The tyrosinase-catalyzed pathway proceeds through highly reactive intermediates, including dopaquinone and various indole derivatives, that can spontaneously react with cellular nucleophiles. These intermediates cause oxidative damage to DNA, proteins, and lipids while also generating reactive oxygen species as byproducts. The final melanin polymer, though chemically stable, can accumulate to physically disrupt cellular processes and interfere with normal protein folding (Brenner and Hearing, 2008b). This is a problem because cytotoxicity greatly reduces cell viability and thus the longevity of the tattoo.

Our solution leverages encapsulin nanocompartments as synthetic melanosomes that provide a physically isolated compartment for melanin production and storage. These self-assembling protein shells, typically 20-40 nanometers in diameter, create a semi-permeable barrier that allows small molecule substrates like tyrosine to enter while retaining toxic intermediates and melanin products inside (Nichols et al., 2017). By genetically fusing tyrosinase to encapsulin subunits and inhibiting enzymatic activity outside of cages, we aim to localize all melanin biosynthesis to this protected environment. However, simply co-localizing tyrosinase with encapsulins is insufficient. We need temporal control to prevent melanin production until the cages are fully assembled and properly positioned within the cell, as premature tyrosinase activity could generate toxic intermediates before adequate sequestration is achieved.

To achieve this temporal control, we developed two parallel approaches that both rely on protein engineering principles. The split tyrosinase strategy divides the enzyme into two inactive fragments that are fused to separate encapsulin subunits; these fragments only come into proper proximity and orientation to reconstitute a functional active site when the encapsulin cage assembles, inherently linking melanin production to cage formation. The lid domain approach exploits naturally occurring C-terminal extensions found on certain tyrosinases that sterically occlude the active site until proteolytically cleaved. By engineering specific cleavage sites into these lids and placing them under the control of our MESA receptor system, we can trigger melanin production on demand through ligand-induced MESA dimerization. Both strategies require detailed structural understanding to succeed. We had to predict whether split fragments could reassemble, where to introduce splits or cleavage sites without disrupting enzyme or lid domain function, and how fusion to encapsulin subunits affects both cage assembly and tyrosinase activity. These questions motivated our comprehensive computational modeling campaign, as structural prediction provides the foundation for rational engineering decisions that experimental screening alone cannot efficiently provide.

Split Tyrosinases

Design Goals

Split protein engineering has emerged as a powerful strategy for controlling enzyme activity with high spatiotemporal precision. The fundamental principle involves dividing a protein into two or more fragments that, when separated, lack the three-dimensional structure required for function. These fragments remain inactive until deliberately brought together through specific molecular triggers such as chemically induced dimerization domains, light-activated protein-protein interactions, or, in our case, the self-assembly of encapsulin nanocages. This approach offers several advantages over conventional regulation mechanisms like transcriptional control or allosteric inhibition: it provides rapid response times limited only by diffusion and binding kinetics, enables spatial restriction by controlling where fragments co-localize, and can be made effectively irreversible by engineering high-affinity complementation interfaces (Bae et al., 2024).

Our central hypothesis for split tyrosinase control is that, if the N-terminal and C-terminal fragments are fused to different encapsulin subunits, the fragments will only achieve the correct orientation and proximity for reconstitution when those subunits are incorporated into the same assembled nanocage. This couples cage formation to enzyme activation and ensures that melanin biosynthesis occurs only within the protective environment of assembled nanocages. The geometric constraints of encapsulin assembly provide a natural scaffold that positions the fragments favorably for interaction, potentially even enhancing reconstitution efficiency compared to freely diffusing fragments due to the increased number of interacting residues. Furthermore, the multivalency of cage structures, which contain 60 to 240 copies of the encapsulin subunit depending on the specific variant, means that even modest reconstitution efficiency per fragment pair could yield significant cumulative tyrosinase activity per cage.

The key questions driving our computational modeling efforts concerned candidate selection and split site optimization. First, we had to determine which naturally occurring tyrosinases from the diverse family of copper-containing oxidases would be most amenable to splitting. For an overview, please refer to the Wetlab Impact section of this page. Different tyrosinases show varying degrees of structural interdependence between their N- and C-terminal regions: some enzymes have relatively autonomous domains that might reconstitute readily, while others have more integrated architectures in which splitting could prevent proper folding of the individual fragments. Second, we had to identify precise split sites that maximize reconstitution potential while minimizing background activity. The split site must separate structurally interdependent regions (preventing spontaneous activity of the fragments) yet be positioned such that cage-mediated proximity enables efficient reassembly. Additionally, the split should ideally divide the enzyme’s active site, because distributing the copper ions and catalytic residues across the two fragments further reduces the probability of premature activity. Answering these questions computationally required integrating split site prediction algorithms, de novo structure prediction of the fragments, and modeling of the reconstituted complexes. This workflow would inform which constructs to prioritize for experimental validation.

Methods

SPELL

Our computational pipeline for split tyrosinase design integrated multiple complementary tools, beginning with SPELL (Split Protein Engineering by Ligand-dependent Localization) for systematic split site identification. SPELL, developed by the Dokholyan laboratory at Penn State University, uses a physics-based scoring approach to identify optimal protein split sites by analyzing the energetic consequences of fragmentation. The algorithm calculates split energy profiles as the difference between the folding energy of the native protein and the combined energies of the separated N-terminal and C-terminal fragments, using the MEDUSA force field, which combines physics-based and knowledge-based energy terms. SPELL applies several filtering criteria to identify surface-exposed, non-conserved loop regions that can tolerate fragmentation: solvent-accessible surface area calculations select residues with more than 30 Å² exposure; evolutionary conservation analysis based on Kullback-Leibler (KL) scores from Pfam multiple sequence alignments excludes highly conserved positions, keeping those with KL scores below 2; secondary structure assignment with STRIDE favors loop regions over helices and sheets; and loop tightness metrics ensure that selected sites lie in flexible regions rather than constrained structural elements (Dagliyan et al., 2018). Naturally split tyrosinases such as SavMel were excluded from this search.

The split energy reported by SPELL is a critical criterion for evaluating candidate split sites. Higher absolute split energies indicate positions where fragmentation occurs between protein folding cores, producing fragments for which spontaneous reassembly is energetically unfavorable in the absence of external scaffolding or binding partners. This energetic barrier reduces background reconstitution, so that fragment complementation occurs primarily when a deliberate trigger (in our case, encapsulin cage assembly) brings the fragments into close proximity. We ran SPELL with default parameters on each candidate tyrosinase, generating split energy profiles across all potential division points. Beyond the numerical scores, we only considered split sites that would fully or at least partially separate the copper-binding active site, as distributing key catalytic residues and metal coordination sites between the fragments adds a layer of control by preventing either fragment from having independent catalytic activity. Combining energetic analysis, evolutionary conservation filtering and active site geometry allowed us to generate a ranked list of promising split sites for each tyrosinase that balanced the competing requirements of low background activity and high reconstitution potential.
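The filtering and ranking logic described above can be illustrated with a short sketch. This is not SPELL's code and not its API: it is a hypothetical re-implementation of only the stated criteria (SASA above 30 Å², KL score below 2, loop residues, active site divided, ranking by absolute split energy), assuming the per-residue values have already been exported into a simple data structure. Loop tightness is left out, and the `Residue` class, the function name and the histidine positions in the usage note are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Residue:
    position: int        # 1-based index; the split is placed after this residue
    sasa: float          # solvent-accessible surface area (Å²)
    kl_score: float      # Kullback-Leibler conservation score from the Pfam alignment
    sec_struct: str      # STRIDE code: "C" coil, "T" turn, "H" helix, "E" strand, ...
    split_energy: float  # split energy for cutting after this residue


LOOP_CODES = {"C", "T"}


def candidate_splits(residues, copper_his_positions, min_sasa=30.0, max_kl=2.0):
    """Filter split sites with SPELL-like criteria and rank them by |split energy|."""
    kept = []
    for r in residues:
        if r.sasa <= min_sasa or r.kl_score >= max_kl or r.sec_struct not in LOOP_CODES:
            continue
        # Count copper-ligand histidines that would end up in the N-terminal fragment.
        n_terminal_his = sum(1 for h in copper_his_positions if h <= r.position)
        if n_terminal_his in (0, len(copper_his_positions)):
            continue  # active site not divided: one fragment would keep every Cu ligand
        kept.append(r)
    return sorted(kept, key=lambda r: abs(r.split_energy), reverse=True)
```

For example, with hypothetical copper-ligand histidines at positions 38, 54, 63, 190, 194 and 216, a surface-exposed loop residue at position 100 passes the filter, whereas one at position 20 is rejected because both fragments would not share the active site.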

AlphaFold

Following split site selection with SPELL, we used AlphaFold2 accessed via AlphaFold Server to predict the three-dimensional structures of both separated fragments and reconstituted complexes. Structure prediction served several purposes in our design workflow. First, we needed to verify that the individual N-terminal and C-terminal fragments would fold into stable structures independently, as misfolded or aggregation-prone fragments would be unsuitable for our application regardless of their theoretical reconstitution potential. Second, and more importantly, we aimed to predict whether and how the split fragments would reconstitute when brought together, assessing whether the predicted complex would maintain the characteristic tyrosinase active site geometry observed in native crystal structures. AlphaFold2, developed by DeepMind, uses a transformer-based neural network architecture trained on the Protein Data Bank to predict protein structures with near-atomic accuracy, achieving a median backbone accuracy of 0.96 Å RMSD in the CASP14 assessment (Jumper et al., 2021).

For our modeling setup, we provided the split tyrosinase fragments as separate chains in multi-chain AlphaFold predictions and explicitly included two copper ions in the input so that the algorithm could predict the metal coordination geometry. This was crucial because tyrosinases are type-3 copper proteins with a binuclear copper active site in which each copper ion is coordinated by three histidine residues, and the correct positioning of these metal centers is essential for catalytic activity (de Almeida Santos et al., 2024). We evaluated the quality of the resulting predictions using several metrics. First, per-residue confidence scores (pLDDT values) assess the reliability of the local structure, with values above 70 indicating high confidence and values above 90 indicating very high confidence comparable to high-resolution crystal structures. Second, Predicted Aligned Error (PAE) matrices show which regions of the structure are confidently positioned relative to each other, with low PAE values (below 5 Å) indicating reliable spatial relationships between domains. Third, and most critically, we performed a detailed geometric analysis of the predicted active sites. For each predicted reconstituted complex, we measured the distances between catalytic residues, the copper-histidine coordination distances and the copper-copper spacing, and compared these metrics with reference crystal structures of native tyrosinases to assess whether the split and reconstituted enzyme would maintain a catalytically competent geometry. Structures that deviated significantly from the native active site architecture, such as copper ions positioned too far apart or histidine side chains oriented away from the copper centers, indicated split sites that would likely yield inactive enzymes even upon reassembly.
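The two confidence metrics can be condensed into a per-prediction summary. A minimal sketch, assuming the per-residue pLDDT values and the PAE matrix of a two-chain prediction have already been loaded as arrays (file parsing is omitted because output formats differ between tools), with the N-terminal fragment occupying the first `len_a` residues; the function names and the interface-PAE summary are our illustration, using the 70 and 5 Å thresholds from the text.

```python
import numpy as np


def interface_pae(pae, len_a):
    """Mean PAE (Å) between chain A (first len_a residues) and chain B (the rest).

    Both off-diagonal blocks are averaged, because a PAE matrix is not symmetric.
    """
    pae = np.asarray(pae, dtype=float)
    a_to_b = pae[:len_a, len_a:].mean()
    b_to_a = pae[len_a:, :len_a].mean()
    return float((a_to_b + b_to_a) / 2)


def summarize_prediction(plddt, pae, len_a, plddt_cut=70.0, pae_cut=5.0):
    """Fraction of confident residues and a pass/fail flag for the fragment interface."""
    plddt = np.asarray(plddt, dtype=float)
    ipae = interface_pae(pae, len_a)
    return {
        "frac_confident": float((plddt > plddt_cut).mean()),
        "interface_pae": ipae,
        "interface_ok": ipae < pae_cut,
    }
```

A low interface PAE only says that the relative placement of the fragments is confident; it does not replace the geometric check of the active site.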

Boltz-2

To complement AlphaFold2’s structure predictions with quantitative binding affinity estimates and alternative conformational predictions, we employed Boltz-2, an open-source biomolecular modeling platform that uniquely integrates structure prediction with binding affinity estimation. Boltz-2 extends beyond pure structure prediction to provide quantitative thermodynamic predictions through its integrated affinity module. This capability proved particularly valuable for our application, as we needed not only to predict whether split fragments could reconstitute structurally but also to estimate the affinity of substrate binding to reconstituted active sites. The model employs a 64-layer PairFormer architecture trained on extensive structural databases including the Protein Data Bank, molecular dynamics trajectories, and millions of AlphaFold distillation predictions, followed by a reverse diffusion process with physics-based steering potentials that enforce proper stereochemistry and resolve steric clashes (Passaro et al., 2025).

Our Boltz-2 modeling workflow focused on three key predictions. First, we modeled the interaction between split tyrosinase fragments, obtaining both structural predictions of the reconstituted complexes (which we compared with the AlphaFold2 predictions to assess consistency) and confidence metrics specific to the fragment-fragment interfaces. Second, we predicted binding affinities between tyrosine and the reconstituted tyrosinase active sites as a quantitative indicator of whether reconstituted enzymes would remain catalytically competent. Boltz-2 reports binding affinities as continuous values that can be interpreted in terms of dissociation constants or IC50 values, which allowed us to rank candidate split tyrosinases by predicted substrate affinity. Third, for each candidate tyrosinase, we inspected the Boltz-2 structures in detail, measuring key distances including the copper-copper separation (typically 3-4 Å in native enzymes), the histidine nitrogen-copper coordination distances (approximately 2.0-2.2 Å for properly coordinated sites) and the position of the substrate relative to the active site. This geometric analysis allowed us to identify split sites that preserved not only the overall structure of the native enzymes but specifically the precise active site architecture required for catalysis.
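The distance checks can be expressed compactly. A minimal sketch, assuming the coordinates of the two copper ions and of the coordinating histidine nitrogens have already been extracted from the predicted structure (for example with a PDB/mmCIF parser); the function names are hypothetical, and the acceptance windows are the values quoted above (Cu-Cu 3-4 Å, His N-Cu 2.0-2.2 Å).

```python
import numpy as np


def active_site_geometry(cu1, cu2, his_n_cu1, his_n_cu2):
    """Return the Cu-Cu distance and all His N-Cu coordination distances (Å)."""
    cu1, cu2 = np.asarray(cu1, dtype=float), np.asarray(cu2, dtype=float)
    cu_cu = float(np.linalg.norm(cu1 - cu2))
    d1 = np.linalg.norm(np.asarray(his_n_cu1, dtype=float) - cu1, axis=1)
    d2 = np.linalg.norm(np.asarray(his_n_cu2, dtype=float) - cu2, axis=1)
    return cu_cu, np.concatenate([d1, d2])


def looks_catalytic(cu_cu, his_cu, cu_range=(3.0, 4.0), his_range=(2.0, 2.2)):
    """True if the Cu-Cu spacing and every His N-Cu distance fall in the native-like windows."""
    ok_cu = cu_range[0] <= cu_cu <= cu_range[1]
    ok_his = bool(np.all((his_cu >= his_range[0]) & (his_cu <= his_range[1])))
    return ok_cu and ok_his
```

In practice one might widen the windows slightly for predicted models, since small coordinate errors can push well-formed sites just outside the strict crystal-structure ranges.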

Rosetta Suite

The final component of our modeling pipeline employed the Rosetta Suite for physics-based structural refinement of AI-generated predictions. Rosetta provides comprehensive molecular modeling capabilities based on Monte Carlo sampling with a sophisticated all-atom energy function (Alford et al., 2017; Leman et al., 2020). While AlphaFold2 and Boltz-2 represent revolutionary advances in structure prediction, they rely on training data patterns from the Protein Data Bank and may occasionally predict structures that, while statistically likely given their training, violate subtle physical constraints or represent local rather than global energy minima. Furthermore, these AI models were predominantly trained on native, naturally occurring protein structures rather than engineered split-and-reconstituted systems, potentially creating a bias toward predicting reconstituted structures even when such reconstitution might be thermodynamically unfavorable.

To address these concerns, we applied Rosetta’s relax protocol to all AlphaFold2 and Boltz-2 predictions. The relax protocol performs constrained energy minimization, allowing local adjustments to backbone and sidechain geometry while preventing large-scale structural rearrangements that would deviate substantially from the input model. This process identifies and resolves atomic clashes, optimizes hydrogen bonding networks, adjusts sidechain rotamers to low-energy conformations from the Dunbrack rotamer library, and calculates final energy scores in Rosetta Energy Units (REU) that provide a physics-based quality metric independent of the AI prediction confidence scores (Nivón et al., 2013; Conway et al., 2014; Khatib et al., 2011; Tyka et al., 2011). We particularly focused on interface energies for reconstituted complexes, calculated as the difference between the energy of the bound complex and the sum of the energies of the separated components. Favorable interface energies indicate that the predicted reconstituted structure represents a thermodynamically stable state, while unfavorable or marginally favorable energies suggest that the AI models may have predicted unrealistic reconstitution. By requiring consistency between AI predictions and physics-based energy calculations, we established a stringent filtering criterion that increased our confidence in computational predictions before committing to experimental validation. This integrated approach of combining learning-based structure prediction with physics-based energy evaluation leverages the complementary strengths of modern AI methods and established biophysical modeling frameworks.
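The interface energy and the consistency filter reduce to simple arithmetic once the scores are available. A minimal sketch, assuming the total Rosetta scores (REU) of the relaxed complex and of the separated fragments, plus an interface PAE from the structure prediction, have already been obtained; no Rosetta API is used here. The function names are hypothetical, and the default cutoffs (interface energy below 0 REU, interface PAE below 5 Å) are illustrative: a stricter negative cutoff would also reject the "marginally favorable" cases mentioned above.

```python
def interface_energy(e_complex, e_fragment_n, e_fragment_c):
    """Interface energy (REU): bound complex minus the sum of the separated fragments."""
    return e_complex - (e_fragment_n + e_fragment_c)


def passes_consensus(e_complex, e_fragment_n, e_fragment_c, interface_pae,
                     max_interface_energy=0.0, max_interface_pae=5.0):
    """Keep a design only if the physics-based and the AI-based criteria agree."""
    d_e = interface_energy(e_complex, e_fragment_n, e_fragment_c)
    return d_e < max_interface_energy and interface_pae < max_interface_pae
```

For example, a complex scoring -1500 REU with fragments at -800 and -650 REU has an interface energy of -50 REU and passes if its interface PAE is 3.2 Å, but fails with an interface PAE of 7 Å.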

Data Sources

Bacterial Origin

Enzyme	Organism	Accession	Conditions	Source
AmTyrA	Aeromonas media	B2Z3P7	pH 7.0, 30 °C	Wan et al., 2009
BaTyr	Bacillus aryabhattai	A0A6H1TJ97	pH 5.0, 60 °C	Wang et al., 2021
BcMel	Bacillus cereus	Q74NN3	(Not specified)	Shuster and Fishman, 2009
BmTyr	Bacillus megaterium	B2ZB02	pH 7.0, 25 °C	Shuster and Fishman, 2009
BtMel	Bacillus thuringiensis	Q5MC16	pH 9.0, 42 °C	Liu et al., 2004
BtTyr	Burkholderia thailandensis	Q2T7K1	pH 5.0, 37 °C	Son et al., 2018
HcTyr1	Hahella sp. CCB-MM4	A0A261GRE4	pH 7.0, 25 °C	de Almeida Santos et al., 2024
HcTyr2	Hahella sp. CCB-MM4	A0A261GVB1	pH 5.5, 25 °C	de Almeida Santos et al., 2024
LsTyr	Laceyella sacchari	UPI001045D844	pH 6.8, 25 °C	Dolashki et al., 2012
MmPPOB	Marinomonas mediterranea	Q5VM57	pH 5.0, 20 °C	López-Serrano et al., 2004
PaPvdP	Pseudomonas aeruginosa	Q9I188	pH 9.0, 30 °C	Nadal-Jimenez et al., 2014
PmMelM	Pseudomonas maltophilia	O69134	pH 7.0, 30 °C	Wang et al., 2000
PpTyr	Pseudomonas putida	A0A1L5PR01	(Not specified)	McMahon et al., 2007
PfPvdP	Pseudomonas fluorescens	K9ZQR0	(Not specified)	Sugue, 2022
RsTyr	Ralstonia solanacearum	Q8Y2J8	pH 7.0, 25 °C	Molloy et al., 2013
ReMelA	Rhizobium etli	Q8KIL0	pH 7.0, 30 °C	Cabrera-Valladares et al., 2006
RmMepA	Rhizobium meliloti	P33180	(Not specified)	Mercado-Blanco et al., 1993
SaTyr	Streptomyces albus	A0A0B5EPG7	pH 7.0, 25 °C	Dolashki et al., 2009
SanMelC2	Streptomyces antibioticus	P07524	pH 6.8, 25 °C	Bubacco et al., 2000
SavMelC2	Streptomyces avermitilis	Q79ZK1	pH 8.0, 25 °C	Lee et al., 2015
ScaMelC2	Streptomyces castaneoglobisporus	Q83WS2	pH 6.2, 30 °C	Kohashi et al., 2004
ScyMelC2	Streptomyces cyaneofuscatus	A0A2H4QH72	pH 7.0, 20 °C	Harir et al., 2018
SgrMelC2	Streptomyces griseus	Q9ZN72	(Not specified)	Endo et al., 2001
SgMelC2	Streptomyces glaucescens	P06845	pH 6.0, 30 °C	Lerch and Ettlinger, 1972
SkMelC2	Streptomyces kathirae	A0A077HD11	pH 6.2, 30 °C	Guo et al., 2015
SmMelC2	Streptomyces michiganensis	UPI0016763B33	pH 6.8, 28 °C	Philipp et al., 1991
SrTyr	Streptomyces sp. REN 21	UPI000EA8A7C1	pH 7.0, 30 °C	Ito and Oda, 2000
SzTyr	Streptomyces sp. ZL-24	A0A2S3Y8X7	pH 9.0, 25 °C	Panis et al., 2021
VsTyr	Verrucomicrobium spinosum	A0AAJ6N653	pH 6.5, 24 °C	Aksambayeva et al., 2018

Eukaryotic Origin

Enzyme	Organism	Accession	Conditions	Source
AbPPO1	Agaricus bisporus	Q00024	pH 7.1, 25 °C	Pretzler and Rompel, 2024
AbPPO2	Agaricus bisporus	O42713	pH 7.1, 25 °C	Pretzler and Rompel, 2024
AbPPO3	Agaricus bisporus	C7FF04	pH 7.1, 25 °C	Pretzler and Rompel, 2024
AbPPO4	Agaricus bisporus	C7FF05	pH 7.1, 25 °C	Pretzler and Rompel, 2024
AbPPO5	Agaricus bisporus	UPI0030DFB00F	pH 7.1, 25 °C	Pretzler and Rompel, 2024
AbPPO6	Agaricus bisporus	UPI0030E4E6AE	pH 7.1, 25 °C	Pretzler and Rompel, 2024
AoMelB	Aspergillus oryzae	Q2UP46	pH 6.0, 25 °C	Pretzler and Rompel, 2024
MmTyr	Mus musculus	P11344	pH 6.8, 37 °C	Olivares et al., 2002
HsTyr	Homo sapiens	P14679	pH 7.5, 37 °C	Kong et al., 2000

Mutagenesis

Enzyme	Organism	Conditions	Source
BmTyr (V218F)	Bacillus megaterium	pH 7.5, 28 °C	Goldfeder et al., 2013
BmTyr (V218G)	Bacillus megaterium	pH 7.5, 28 °C	Goldfeder et al., 2013
BmTyr (G46E F65Y V218Y)	Bacillus megaterium	pH 7.0, 30 °C	Gao et al., 2025
BmTyr (N205T)	Bacillus megaterium	pH 7.0, 37 °C	Zhang et al., 2024
BmTyr (R209D)	Bacillus megaterium	pH 7.0, 37 °C	Zhang et al., 2024
BmTyr (N205T R209D)	Bacillus megaterium	pH 7.0, 37 °C	Zhang et al., 2024
BmTyr (E195S)	Bacillus megaterium	pH 8.0, 37 °C	Kang et al., 2024
BmTyr (A221V)	Bacillus megaterium	pH 8.0, 37 °C	Kang et al., 2024
BmTyr (E195S A221V)	Bacillus megaterium	pH 8.0, 37 °C	Kang et al., 2024
BmTyr (G43R M61H)	Bacillus megaterium	pH 7.0, 60 °C	Liu et al., 2024
BmTyr (G43R M61H A232C)	Bacillus megaterium	pH 7.0, 60 °C	Liu et al., 2024
BmTyr (G43R M61H A232C Q214D)	Bacillus megaterium	pH 7.0, 60 °C	Liu et al., 2024
BmTyr (G43R M61H A232C Q214D V217A)	Bacillus megaterium	pH 7.0, 60 °C	Liu et al., 2024
BmTyr (G43R M61H A232C Q214D V217A F197W)	Bacillus megaterium	pH 7.0, 60 °C	Liu et al., 2024
BmTyr (ΔF65-L80)	Bacillus megaterium	pH 8.0, 30 °C	Cha et al., 2023
SavMelC2 (I41Y)	Streptomyces avermitilis	pH 8.0, 25 °C	Lee et al., 2015
VsTyr (N229H)	Verrucomicrobium spinosum	pH 6.0, 37 °C	Kang et al., 2024
VsTyr (M235A)	Verrucomicrobium spinosum	pH 6.0, 37 °C	Kang et al., 2024
VsTyr (N229H M235A)	Verrucomicrobium spinosum	pH 6.0, 37 °C	Kang et al., 2024
AbPPO4 (4-site mutant, Δ C-term)	Agaricus bisporus	pH 6.8, 25 °C	Pretzler et al., 2017

Our split tyrosinase modeling campaign began with a comprehensive survey of tyrosinase sequences from diverse phylogenetic origins, as different organisms have evolved tyrosinases with varying structural features that might influence split protein engineering outcomes. We compiled sequences from the UniProt database spanning bacterial, fungal, plant, and mammalian sources. The bacterial tyrosinases included enzymes from Streptomyces species, Bacillus megaterium, and Hahella species. We also considered fungal and plant tyrosinases. However, both are significantly larger than bacterial tyrosinases and carry additional steric hindrances, such as N-terminal peptides, that could complicate split site selection. Mammalian tyrosinases were excluded because of their more complex structure and their need for additional post-translational modifications (Claus and Decker, 2006).

For structural validation and comparison of our computational predictions, we relied on crystal structures deposited in the Protein Data Bank. A key reference structure was the bacterial tyrosinase from Bacillus megaterium (PDB 4P6R). It provides atomic-resolution detail of the binuclear copper active site and has been used extensively to understand tyrosinase catalytic mechanisms. These crystal structures gave us general reference values for active site geometry and interatomic distances. When evaluating a model of a specific tyrosinase, however, we compared it against that enzyme's own crystal structure whenever one was available.

The complete set of tyrosinases we subjected to computational modeling is documented in the Wetlab Impact section of this page. This systematic cataloging ensures reproducibility of our computational work and provides a resource for other researchers interested in tyrosinase engineering. We modeled 19 distinct tyrosinase variants through the complete computational pipeline, generating SPELL split site predictions, AlphaFold2 structural models of fragments and reconstituted complexes, Boltz-2 affinity predictions, and Rosetta-refined structures for each candidate. This comprehensive dataset allowed us to perform comparative analysis across the tyrosinase family, identifying sequence and structural features that correlate with successful split protein reconstitution.

Lid Domain Tyrosinase

Design Goals

The lid domain approach to controlling tyrosinase activity represents an alternative regulatory strategy with distinct advantages and limitations compared to split protein engineering. Lid domains are naturally occurring C-terminal extensions found on certain tyrosinases which fold over the enzyme’s active site and sterically block substrate access. These domains typically consist of 30-80 amino acids that form distinct structural elements, often small α-helical bundles or extended loops, that physically occlude the copper-containing active site entrance. In their natural context, these lid domains are proteolytically removed at specific developmental stages or in response to particular cellular conditions, converting an inactive pro-enzyme into an active mature form. This biological precedent suggested that we could engineer lid domains as controllable activity switches by introducing custom cleavage sites that respond to our MESA receptor system: ligand-induced receptor dimerization would activate a protease that specifically cleaves and removes the lid, thereby triggering melanin biosynthesis on demand (de Almeida Santos et al., 2024).

The primary advantage of the lid domain strategy lies in achieving nearly zero background activity when the lid is intact. In split proteins, some spontaneous fragment association might occur through random diffusion. A covalently attached lid, by contrast, keeps the active site occluded until proteolytic cleavage occurs. This feature is particularly attractive for our application, as any unwanted melanin production before encapsulin assembly and proper cellular localization could lead to cytotoxicity and system failure (Brenner and Hearing, 2008b). The lid domain approach also keeps the tyrosinase as a single polypeptide chain, which simplifies expression. It also avoids potential issues with unbalanced fragment expression or fragment degradation that can complicate split protein systems. Additionally, once the lid is cleaved and diffuses away, the resulting enzyme is essentially identical to the native active form. This removes concerns about whether reconstituted structures maintain full catalytic efficiency (Bae et al., 2024).

However, the lid domain strategy introduces its own challenges that must be addressed through careful engineering and modeling. The first concern relates to physical size. Adding a 30-80 residue lid domain significantly increases the overall protein length, potentially creating steric issues when fusing the enzyme to encapsulin subunits or packaging multiple copies within assembled cages. We needed to model how lid-containing tyrosinases would fit within encapsulin architecture to ensure that cage assembly would not be disrupted. More critically, the lid domain approach decouples enzyme activation from cage assembly. While split tyrosinases are inherently activated by the cage formation process that brings fragments together, lid-containing tyrosinases could potentially be cleaved by protease activity before encapsulin cages have fully assembled and properly localized within cells. This temporal mismatch could result in melanin production outside of the protective cage environment, defeating our compartmentalization strategy. Managing this risk requires precise engineering of protease-lid interactions and potentially developing feedback mechanisms that coordinate protease activation with cage assembly status.

Our computational modeling objectives for lid domain engineering focused on two central questions. First, we needed to determine whether we could engineer custom protease cleavage sites into the lid domain without disrupting either the lid’s ability to block the active site or the underlying tyrosinase structure and function. This required detailed structural analysis to identify positions within the lid domain that met three conditions. They had to be surface-exposed, and therefore accessible to proteases. They had to be non-critical for maintaining lid-active site interactions. And they had to be positioned so that cleavage would fully release the lid, rather than leaving residual fragments that might partially block access to the active site. Second, and more ambitiously, we needed to determine where the native proteolytic cleavage site is located in naturally occurring lid-containing tyrosinases. These enzymes are known to undergo lid removal in their native bacterial hosts, yet the precise cleavage positions have not been definitively established in the literature. We aimed to identify these natural sites through structural modeling: finding surface-exposed loops, analyzing sequence conservation patterns, and predicting regions of conformational flexibility. This would tell us both where to place custom cleavage sites and which native sequence to alter in order to disable natural lid removal.
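
One inexpensive way to shortlist candidate flexible regions is to scan the per-residue confidence scores of a structure prediction for contiguous low-confidence stretches. AlphaFold-style pLDDT values below about 70 are commonly read as low confidence and often coincide with flexible or disordered segments. This sketch assumes a plain list of per-residue scores. The threshold and minimum window length are illustrative choices, and surface exposure and sequence conservation would still have to be checked separately.

```python
def low_confidence_windows(plddt, threshold=70.0, min_len=5):
    """Return half-open (start, end) index ranges (0-based) where per-residue confidence
    stays below `threshold` for at least `min_len` consecutive residues."""
    windows = []
    start = None
    for i, score in enumerate(plddt):
        if score < threshold:
            if start is None:
                start = i  # entering a low-confidence stretch
        else:
            if start is not None and i - start >= min_len:
                windows.append((start, i))
            start = None  # stretch ended, or never started
    # Close a stretch that runs all the way to the C-terminus.
    if start is not None and len(plddt) - start >= min_len:
        windows.append((start, len(plddt)))
    return windows


# Toy profile (not real data): confident core, a 6-residue floppy loop, then a short dip.
profile = [92.0] * 10 + [55.0] * 6 + [88.0] * 4 + [60.0] * 3
print(low_confidence_windows(profile))  # [(10, 16)]
```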

Methods

Boltz-2

We employed Boltz-2 structure prediction as the cornerstone of lid domain modeling, addressing the fundamental question of how lid domains influence tyrosinase structure and active site accessibility. Structure prediction was necessary because high-resolution crystal structures of lid-containing tyrosinases are limited, and the existing structures may not capture the conformational flexibility or dynamic behavior of these regulatory domains. We modeled both original lid-containing tyrosinase sequences and modified variants incorporating TEV protease recognition sites at various positions between the lid domain and the core enzyme. By comparing these predictions, we assessed whether introducing cleavage sequences with appropriate linkers would significantly alter lid domain structure or its interaction with the tyrosinase catalytic domain.

For our Boltz-2 modeling setup, we submitted complete sequences including both the catalytic tyrosinase domain and the C-terminal lid domain as single-chain predictions through a local installation of Boltz-2. This approach allowed Boltz-2 to predict the full structural context of how the lid interacts with the enzyme active site. We altered the lid-core linker sequence with the goal of removing the inherent auto-cleavage of HcTyr1. At the same time, we had to make sure that the linker’s native interactions with the catalytic core remained intact so that the lid domain stays in place. We explicitly included copper ions in the predictions to ensure proper modeling of the active site geometry, as these metal centers are essential for maintaining the correct fold of the catalytic domain even when the lid is blocking substrate access.
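
A single-chain input with two copper ions can also be written programmatically, which helps when many linker variants are generated. The following is a minimal sketch. The layout follows the YAML schema described in the Boltz documentation: protein entries with `id` and `sequence`, and ligand entries with a `ccd` code. `CU` is the PDB Chemical Component Dictionary code for a Cu(II) ion. The function name and the placeholder sequence are hypothetical, and the schema should be checked against the installed Boltz version.

```python
def boltz_input_yaml(sequence: str, n_copper: int = 2, chain_id: str = "A") -> str:
    """Return Boltz-2 YAML text: one protein chain plus `n_copper` Cu(II) ions (CCD code CU)."""
    if not sequence.isalpha():
        raise ValueError("sequence must contain one-letter amino-acid codes only")
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {chain_id}",
        f"      sequence: {sequence}",
    ]
    if n_copper > 0:
        # Each copper ion gets its own chain ID, starting right after the protein chain.
        copper_ids = [chr(ord(chain_id) + 1 + i) for i in range(n_copper)]
        lines += [
            "  - ligand:",
            f"      id: [{', '.join(copper_ids)}]",
            "      ccd: CU",
        ]
    return "\n".join(lines) + "\n"


# Placeholder sequence for illustration only (not HcTyr1).
print(boltz_input_yaml("MSEQVENCEPLACEHOLDER"))
```

The resulting text could be saved to a file and passed to the Boltz command-line tool (`boltz predict <file>.yaml`). Options such as MSA generation depend on the installed version.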

The critical comparison for our engineering efforts involved analyzing predictions of modified tyrosinase sequences containing custom cleavage sites versus original sequences. We introduced TEV protease sites (ENLYFQ↓M) at apparently flexible regions between the tyrosinase core and the lid domain, generating separate Boltz-2 predictions for each variant. By comparing the predicted structures of these engineered variants to the original lid-containing enzyme, we could assess whether the inserted sequences caused significant structural perturbations and adjust linker length accordingly. We also examined how modified sequences affected the lid domain’s position relative to the active site. Successful engineering should maintain the same degree of active site occlusion as the original sequence, ensuring that the inactive pro-enzyme form remains truly inactive until deliberate proteolytic cleavage.
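
Generating one sequence per linker length makes this comparison systematic. The sketch below replaces a chosen lid-core linker segment with the TEV site flanked by (GGGGS)n spacers. The native linker boundaries, the spacer choice, and the function names are assumptions for illustration, not the exact constructs we tested.

```python
TEV_SITE = "ENLYFQM"  # recognition sequence used above; TEV cleaves between Q and M


def replace_linker(sequence: str, start: int, end: int, spacer: str = "") -> str:
    """Replace sequence[start:end] (0-based, half-open) with spacer + TEV site + spacer.
    Use start == end for a pure insertion."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("invalid linker boundaries")
    return sequence[:start] + spacer + TEV_SITE + spacer + sequence[end:]


def linker_variants(sequence: str, start: int, end: int, max_repeats: int = 3) -> dict[int, str]:
    """One variant per number of GGGGS repeats (0..max_repeats) flanking each side of the site."""
    return {n: replace_linker(sequence, start, end, "GGGGS" * n) for n in range(max_repeats + 1)}


# Toy example (not real sequences): core "AAAA", native linker "KKK" at indices 4-6, lid "WWWW".
for n, seq in linker_variants("AAAAKKKWWWW", 4, 7, max_repeats=1).items():
    print(n, seq)
# 0 AAAAENLYFQMWWWW
# 1 AAAAGGGGSENLYFQMGGGGSWWWW
```

Each resulting sequence would then be submitted as its own single-chain prediction, so that structural changes can be attributed to one linker length at a time.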

Rosetta Suite

Following Boltz-2 predictions, we applied Rosetta Suite analysis for physics-based structural refinement and energy evaluation. The rationale for this additional modeling step mirrors our approach with split tyrosinases. While Boltz-2 provides remarkably accurate structure predictions for native proteins, engineered constructs with inserted protease recognition sequences might represent sequences outside the training distribution, potentially leading to predictions that appear structurally reasonable but violate subtle energetic constraints. The Rosetta relax protocol allowed us to test whether Boltz-2’s predicted structures represent true local energy minima or whether they might rearrange to alternative conformations when subjected to physics-based energy minimization.

We applied Rosetta’s relax protocol to all lid domain predictions. The protocol performed iterative cycles of sidechain rotamer repacking and energy minimization. We paid particular attention to the local geometry around inserted protease recognition sequences. Physics-based relaxation sometimes revealed subtle clashes or strained conformations at insertion sites that were not apparent in the initial Boltz-2 predictions. This allowed us to exclude problematic designs before laboratory testing, or to adjust molecular tension and positioning by varying the linker length between the lid domain and the tyrosinase core. This combined approach, leveraging Boltz-2’s broad predictive capabilities and Rosetta’s detailed energetic analysis, maximized our confidence in engineered lid domain designs and prioritized variants most likely to exhibit the desired regulatory behavior.

Tyrosinase Selection

The comparative analysis of our computational predictions across multiple tyrosinase candidates revealed clear distinctions between enzymes likely to function effectively as split proteins and those predicted to fail at reconstitution or at maintaining catalytic activity. For lid domain tyrosinases, our iterative design process allowed us to create constructs with high confidence. We established a multi-criterion evaluation framework integrating quantitative metrics from each modeling tool to rank candidates systematically. The primary metrics included:
- SPELL split energy scores for split tyrosinases (higher values indicating better prevention of spontaneous reassembly)
- AlphaFold2 pLDDT confidence scores for reconstituted structures (particularly for active site residues, where values above 80 were required)
- Predicted Aligned Error values between split fragments (lower values indicating more confident relative positioning)
- Boltz-2 predicted binding affinities for the tyrosine substrate (more negative binding energies indicating stronger substrate interactions)
- Rosetta total energy scores, which Rosetta relax runs report by default
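
This framework can be sketched as a hard filter followed by a rank-sum over the remaining metrics. Only the active-site pLDDT cutoff of 80 comes from the text. The rank-sum aggregation and the field names are illustrative assumptions, not the exact weighting we used. Because Rosetta total scores scale with protein size, that metric is only meaningful when comparing variants of the same enzyme.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    split_energy: float       # SPELL score, higher = less spontaneous reassembly
    active_site_plddt: float  # mean pLDDT over active-site residues
    fragment_pae: float       # mean inter-fragment PAE in Å, lower = better
    substrate_dg: float       # Boltz-2 derived binding energy, more negative = better
    rosetta_total: float      # Rosetta total score in REU, lower = better


def rank_candidates(cands: list[Candidate], min_plddt: float = 80.0) -> list[Candidate]:
    """Drop candidates failing the active-site pLDDT cutoff, then order the rest by rank-sum."""
    kept = [c for c in cands if c.active_site_plddt > min_plddt]
    # Each key maps a candidate to a value where smaller is better.
    keys = [
        lambda c: -c.split_energy,
        lambda c: -c.active_site_plddt,
        lambda c: c.fragment_pae,
        lambda c: c.substrate_dg,
        lambda c: c.rosetta_total,
    ]
    score = {c.name: 0 for c in kept}
    for key in keys:
        for rank, c in enumerate(sorted(kept, key=key)):
            score[c.name] += rank  # rank 0 = best on this metric
    return sorted(kept, key=lambda c: score[c.name])
```

A rank-sum avoids having to put metrics with different units (REU, Å, kcal/mol) on a common scale. The trade-off is that it ignores how large the differences between candidates are.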

Structural alignment analysis provided crucial insights beyond numerical scores. We superimposed each predicted reconstituted split tyrosinase structure onto the corresponding native crystal structure when available and calculated root-mean-square deviations (RMSD) for overall backbone structure and specifically for active site residues. Candidates showing RMSD values below 1.0 Å for the active site region were considered to have maintained native-like geometry, while values exceeding 3.0 Å indicated significant structural distortion that might compromise catalytic activity. We paid particular attention to the preservation of copper coordination geometry: proper tyrosinase function requires each copper ion to be coordinated by three histidine residues in a distorted trigonal pyramidal arrangement, with the two copper ions separated by approximately 3.5 Å. Predictions showing copper-copper distances outside the range of 3.2-4.0 Å, histidine side chains rotated away from copper ions, or loss of the three-histidine coordination motif were classified as non-functional regardless of other favorable metrics.
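
These structural checks can be sketched with NumPy. The sketch uses a Kabsch superposition to compute RMSD on already-matched atom lists (for example backbone or active-site atoms), applies the RMSD thresholds stated in the text, and tests the copper site using the 3.2-4.0 Å Cu-Cu window. The 2.5 Å Cu-N(His) cutoff used to count coordinating histidines is an assumed value, not one from our pipeline.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between matched coordinate sets P and Q (N x 3) after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)   # remove translation
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)                # SVD of the covariance matrix
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T            # optimal rotation mapping P onto Q
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsd_class(rmsd: float) -> str:
    """Thresholds from the text: below 1.0 Å native-like, above 3.0 Å distorted."""
    if rmsd < 1.0:
        return "native-like"
    if rmsd > 3.0:
        return "distorted"
    return "intermediate"


def copper_site_ok(cu_a, cu_b, his_a, his_b, cu_cu=(3.2, 4.0), bond_cutoff=2.5) -> bool:
    """True if the Cu-Cu distance is in range and each Cu has >= 3 His nitrogens within
    bond_cutoff (Å). his_a / his_b are (k x 3) nitrogen coordinates; bond_cutoff is assumed."""
    cu_a, cu_b = np.asarray(cu_a, dtype=float), np.asarray(cu_b, dtype=float)
    if not cu_cu[0] <= np.linalg.norm(cu_a - cu_b) <= cu_cu[1]:
        return False
    for cu, his in ((cu_a, his_a), (cu_b, his_b)):
        dists = np.linalg.norm(np.asarray(his, dtype=float) - cu, axis=1)
        if int((dists <= bond_cutoff).sum()) < 3:
            return False
    return True
```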

Through this comprehensive evaluation, specific tyrosinase candidates emerged as optimal for split protein engineering. The selected candidates showed favorable split energy profiles indicating low background activity of separated fragments, high-confidence AlphaFold2 predictions with well-preserved active site geometry in reconstituted models, strong predicted substrate binding affinities from Boltz-2 analysis, and thermodynamically favorable Rosetta interface energies. Critically, these top candidates maintained the precise copper coordination geometry required for catalysis, properly oriented histidine coordination, and substrate binding pocket architecture, thus closely matching native crystal structures. The structural basis for successful reconstitution in these candidates typically involved split sites positioned in flexible loop regions connecting relatively autonomous N-terminal and C-terminal domains, allowing each fragment to fold independently while retaining the capacity for complementation when brought together by encapsulin nanocompartment assembly.

The tyrosinases we selected for each control strategy are documented in the tables below. Our initial screening eliminated eukaryotic tyrosinases due to their larger size, more complex structure, and requirement for post-translational modifications, focusing our efforts on bacterial enzymes.

Split Tyrosinase Selection

Enzyme	Organism	Accession	Reasoning	Source
BmTyr	Bacillus megaterium	B2ZB02	BmTyr has already been used in other encapsulated systems (Sigmund et al., 2018), has a verified crystal structure, and shows optimal activity at pH 7.0 and 25 °C, close to mammalian cell conditions	Shuster and Fishman, 2009
LsTyr	Laceyella sacchari	UPI001045D844	Its thermophilic origin provides enhanced structural stability, in turn improving the longevity of the tyrosinase-encapsulin fusion proteins. Additionally, as a phylogenetically distinct candidate, it lets us test how well our computational modeling pipeline generalizes	Dolashki et al., 2012
SavMelC2	Streptomyces avermitilis	Q79ZK1	This tyrosinase uses a secondary caddie protein to help transport copper to its active site. This makes it particularly interesting, as it is a tyrosinase that naturally forms a two-part complex	Kohashi et al., 2004
SkMelC2	Streptomyces kathirae	A0A077HD11	This tyrosinase uses a secondary caddie protein to help transport copper to its active site. This makes it particularly interesting, as it is a tyrosinase that naturally forms a two-part complex	Guo et al., 2015

Native BmTyr

This structure display shows the crystal structure of the complete Bacillus megaterium tyrosinase (BmTyr). We used it to identify valid split sites and to produce split structures designed to reconstitute. See this construct in the iGEM Registry: BBa_25JTFYB9.

Split BmTyr

This structure was created using the methods described above. It is the final result of our multi-step process for predicting ideal split sites in a split tyrosinase, and it is derived from Bacillus megaterium tyrosinase. You can find the corresponding parts in the iGEM Registry: BBa_2502R9NP and BBa_25LKZLQD.

Crystal structure of BmTyr (left) and predicted structure of Split BmTyr (right), with an RMSD of 0.652 Å for the whole protein and 0.577 Å between the active sites’ nitrogen atoms involved in copper coordination.

Lid Domain Tyrosinase Selection

Enzyme	Organism	Accession	Reasoning	Source
VsTyr	Verrucomicrobium spinosum	A0AAJ6N653	This tyrosinase contains a C-terminal lid domain that inhibits its activity while attached. We can use this property to prevent melanin production until the lid domain is cleaved. Read more about these models in the Lid Domain Tyrosinase section.	Aksambayeva et al., 2018
HcTyr1	Hahella sp. CCB-MM4	A0A261GRE4	This tyrosinase has a lid domain that inhibits its activity while attached. We can use this property to prevent melanin production until the lid domain is cleaved. Read more about these models in the Lid Domain Tyrosinase section. HcTyr1 normally auto-cleaves its lid domain, which is undesirable in our use case. We therefore altered the linker region most likely to contain the native cleavage site and inserted our own TEV protease recognition site.	de Almeida Santos et al., 2024

Native HcTyr1

This is the native structure of HcTyr1 as described by de Almeida Santos et al. (2024). The native cleavage site, whose exact position is still unknown, and the associated auto-cleavage mechanism are still present.

Modified HcTyr1

This is our modified structure of HcTyr1. We identified the region most likely to contain the native cleavage site and altered its sequence. We also included a TEV protease cleavage site to enable controlled cleavage. Check out the corresponding part in the iGEM Registry: BBa_25OLOGCU.

Crystal structure of HcTyr1 (left) and predicted structure of modified HcTyr1 (right), with an RMSD of 0.814 Å for the whole protein and 0.319 Å between the active sites’ nitrogen atoms involved in copper coordination.

Encapsulin-Tyrosinase Assembly Simulations

Design Goals

The ultimate implementation of both split tyrosinase and lid domain control strategies requires functional integration with encapsulin nanocompartments. This raises complex questions about multi-protein assembly that extend beyond modeling individual enzyme properties. For split tyrosinase systems in particular, the central design question is whether fragments fused to separate encapsulin subunits will actually reconstitute into functional enzymes when those subunits co-assemble into complete nanocage structures. This question encompasses four interrelated challenges:
- the geometric compatibility of positioning tyrosinase fragments on the interior surface of spherical or icosahedral cage architectures
- the effects of cage curvature and subunit packing on fragment orientation and proximity
- the stoichiometric balance between encapsulin subunits carrying tyrosinase fragments and regular, unfused encapsulins
- the influence of the length and composition of the linkers connecting tyrosinase fragments to encapsulin domains

Spatial constraints within encapsulin nanocages impose significant limitations on fusion protein design. Encapsulins typically assemble into structures with T=1 (60 subunits), T=3 (180 subunits), or T=4 (240 subunits) icosahedral symmetry, creating shells 20-40 nanometers in diameter. The protein shell itself is approximately 3-5 nanometers thick, leaving an interior cavity of 15-30 nanometers for cargo loading (Bae et al., 2024). If tyrosinase fragments are fused to the interior-facing surface of encapsulin subunits, they must fit within this cavity without sterically interfering with fragments from neighboring subunits. This is a particular concern because up to 240 tyrosinase fragments could be concentrated within the same confined space. We therefore needed to model the ideal ratio of encapsulins fused to tyrosinase fragments versus native encapsulins without any tyrosinase. These geometric considerations cannot be easily intuited from examining tyrosinase or encapsulin structures independently; they require explicit modeling of complete assemblies with multiple copies of fusion proteins.
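
A back-of-the-envelope estimate makes the crowding concern concrete. The sketch makes three assumptions: an average protein density of about 1.35 g/cm³ (a commonly used approximation), a spherical cavity, and a hypothetical 35 kDa per full-length tyrosinase. It ignores shape, linkers, and packing efficiency, so it only gives the order of magnitude of the occupied cavity volume.

```python
import math

DALTON_IN_GRAMS = 1.66054e-24  # mass of 1 Da in grams
NM3_PER_CM3 = 1e21             # 1 cm³ = 1e21 nm³


def protein_volume_nm3(mass_da: float, density_g_per_cm3: float = 1.35) -> float:
    """Approximate volume of a folded protein from its mass, assuming an average density."""
    return mass_da * DALTON_IN_GRAMS / density_g_per_cm3 * NM3_PER_CM3


def cavity_volume_nm3(diameter_nm: float) -> float:
    """Volume of a spherical interior cavity."""
    return 4.0 / 3.0 * math.pi * (diameter_nm / 2.0) ** 3


def fill_fraction(n_copies: int, mass_da: float, cavity_diameter_nm: float) -> float:
    """Fraction of the cavity volume occupied by n_copies of a cargo protein."""
    return n_copies * protein_volume_nm3(mass_da) / cavity_volume_nm3(cavity_diameter_nm)


# Hypothetical 35 kDa per tyrosinase; 20 nm cavity (lower end of the range above).
for n in (4, 30, 60):
    print(n, round(fill_fraction(n, 35_000, 20.0), 3))  # roughly 0.04, 0.31 and 0.62
```

Even this crude estimate shows that loading dozens of tyrosinase equivalents into the smallest cavities would fill a large fraction of the available volume.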

Our central hypothesis regarding encapsulin-mediated fragment reconstitution posits that cage assembly will bring split tyrosinase fragments into sufficient proximity and favorable orientation to enable efficient complementation. This hypothesis rests on three key assumptions. First, flexible linkers connecting tyrosinase fragments to encapsulin domains will allow fragments to sample multiple orientations and positions, increasing the probability of finding catalytically competent configurations. Second, multiple fragments distributed across the cage surface create numerous opportunities for N-terminal and C-terminal fragments to encounter each other and reconstitute. Third, reconstituted tyrosinase dimers or higher-order assemblies might form at fragment-fragment interfaces even if individual cage subunits each carry only one fragment. Testing these assumptions computationally required modeling complete encapsulin assemblies with fusion proteins. This is significantly more challenging than modeling individual proteins because of the large system size (60 to 240 chains and well over a hundred thousand atoms for a complete cage). It also requires capturing the conformational flexibility of linker regions, which determines whether fragments can achieve proper relative positioning.

Methods

AlphaFold

The computational modeling of encapsulin-tyrosinase assemblies pushed the boundaries of current structure prediction capabilities, requiring us to model massive protein complexes with multiple identical chains and attached fusion partners. We employed AlphaFold2 as implemented on AlphaFold Server with AlphaFold-Multimer functionality, which extends the algorithm’s architecture to explicitly handle protein complexes and oligomeric assemblies. AlphaFold-Multimer modifies the original AlphaFold2 approach by processing multiple sequence alignments for each chain separately while using specialized inter-chain attention mechanisms to predict interfaces and quaternary structure arrangements. However, even with these capabilities, directly modeling complete T=1 encapsulin cages (60 subunits) or larger T=3/T=4 cages exceeds current computational feasibility. Memory and compute requirements grow steeply with the total number of residues; the pair representation alone scales quadratically (Evans et al., 2022).
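
The following estimate illustrates why full-cage predictions become infeasible. AlphaFold-style models keep an N × N pair representation (128 channels in AlphaFold2), so this single tensor already grows with the square of the total residue count N. The sketch assumes 32-bit floats and a hypothetical 400 residues per fusion subunit. It counts only that one tensor, so real memory use is considerably higher.

```python
def pair_tensor_gib(n_residues: int, channels: int = 128, bytes_per_value: int = 4) -> float:
    """Memory (GiB) of one N x N x channels pair representation; ignores every other tensor."""
    return n_residues ** 2 * channels * bytes_per_value / 2 ** 30


residues_per_subunit = 400  # hypothetical: encapsulin plus fused fragment and linker
for n_chains in (5, 60, 240):
    n = n_chains * residues_per_subunit
    print(f"{n_chains:>3} chains, {n:>6} residues: {pair_tensor_gib(n):8.1f} GiB")
```

Going from a pentamer to a full T=1 cage multiplies this term by (60/5)² = 144, which is why reducing the system to five chains makes such a large difference.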

To make the problem tractable while still capturing essential geometric features, we employed a multi-scale modeling strategy. For questions about complete cage assembly, we modeled small sub-assemblies of encapsulin subunits. Their positions and orientations, combined with icosahedral symmetry operations, define the entire cage. For T=1 cages, this meant modeling pentameric encapsulin building blocks (five subunits arranged around one of the twelve five-fold axes of the icosahedron). We then reconstructed the complete cage architecture by applying the appropriate rotational symmetry operations. This approach dramatically reduces the modeling complexity from 60 chains to 5 chains. It still captures the key geometric constraints of how adjacent subunits pack together and how fusion proteins on these subunits would be positioned relative to each other. For more detailed analysis of tyrosinase fragment reconstitution, we modeled pairs or small clusters of fusion proteins in isolation. These models explicitly included flexible linker sequences, and we tested various linker lengths to determine the optimal spacing between encapsulin and tyrosinase domains.
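
The symmetry expansion step can be sketched as follows. The code generates the 60 rotations of the icosahedral group from a five-fold and a two-fold generator and applies them to one subunit's coordinates. It assumes the modeled coordinates are already in a frame whose axes coincide with the cage's symmetry axes. In practice, they would first be superimposed onto a reference cage or symmetry definition. The function names are ours. Expanding a whole pentamer centered on a five-fold axis this way would produce each pentamer five times, so the sketch expands a single subunit.

```python
import numpy as np


def axis_angle(axis, angle: float) -> np.ndarray:
    """Rotation matrix for `angle` radians about `axis` (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * K @ K


def icosahedral_rotations(tol: float = 1e-8) -> list[np.ndarray]:
    """All 60 icosahedral rotations, built by closure from two generators. Assumes the standard
    frame in which icosahedron vertices are the cyclic permutations of (0, ±1, ±phi)."""
    phi = (1.0 + 5.0 ** 0.5) / 2.0
    generators = [axis_angle([0.0, 1.0, phi], 2.0 * np.pi / 5.0),  # five-fold axis through a vertex
                  axis_angle([0.0, 0.0, 1.0], np.pi)]              # two-fold axis through an edge midpoint
    group, frontier = [np.eye(3)], [np.eye(3)]
    while frontier:  # keep multiplying until no new matrices appear
        new = []
        for g in frontier:
            for h in generators:
                m = h @ g
                if not any(np.allclose(m, x, atol=tol) for x in group):
                    group.append(m)
                    new.append(m)
        frontier = new
    return group


def expand_cage(subunit_xyz) -> np.ndarray:
    """Copy one subunit (N x 3, already in the symmetry frame) to all 60 positions: (60, N, 3)."""
    xyz = np.asarray(subunit_xyz, dtype=float)
    return np.stack([xyz @ R.T for R in icosahedral_rotations()])
```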

Our AlphaFold2 modeling setup involved constructing input sequences for encapsulin-tyrosinase fusion proteins. We connected encapsulin subunit sequences to split tyrosinase fragments via flexible linker sequences, typically composed of glycine and serine residues in motifs like (GGGGS)n that are known to provide conformational flexibility without forming secondary structure. We systematically varied linker length from short (5-10 residues) to long (20-50 residues) to assess how linker flexibility affects fragment positioning. Critically, we included appropriate numbers of copper ions in the modeling inputs (two copper ions per reconstituted tyrosinase active site). This ensured that AlphaFold2 would predict proper metal coordination geometry when fragments come together. For assemblies with multiple encapsulin-tyrosinase chains, we explored different ratios of fragment-carrying fusion subunits to native encapsulins. These ranged from 4:1 (four fragment-carrying subunits and one native subunit per pentamer, i.e. two reconstituted tyrosinases per pentamer) to sparser ratios such as 2:13 (one tyrosinase per three pentamers). Sparser ratios might favor particular assembly states and reduce steric issues inside the nanocage.
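
The ratios above translate directly into expected loadings per cage. Here is a minimal sketch with three assumptions: subunits are incorporated at random in proportion to the fused:native ratio, there are equal numbers of N- and C-fragment carriers, and pairing is perfect. The tyrosinase count is therefore an upper bound.

```python
def average_loading(fused: int, native: int, subunits: int = 60) -> tuple[float, float]:
    """Expected fragment-carrying subunits and reconstituted tyrosinases per cage.

    Assumes random incorporation in proportion to fused:native and that every tyrosinase
    needs one N-fragment subunit and one C-fragment subunit."""
    n_fused = subunits * fused / (fused + native)
    return n_fused, n_fused / 2


for ratio in ((4, 1), (2, 13)):
    carriers, enzymes = average_loading(*ratio)
    print(f"{ratio[0]}:{ratio[1]} -> {carriers:.0f} fragment carriers, {enzymes:.0f} tyrosinases per T=1 cage")
# 4:1 -> 48 fragment carriers, 24 tyrosinases per T=1 cage
# 2:13 -> 8 fragment carriers, 4 tyrosinases per T=1 cage
```

Passing `subunits=180` or `subunits=240` gives the corresponding averages for T=3 and T=4 cages.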

Quality metrics for these complex predictions focused on several distinct assessment criteria. Overall confidence was evaluated through mean pLDDT scores across all residues, with particular attention to encapsulin subunit interfaces (which should show high confidence if cage assembly is properly predicted) and tyrosinase fragment regions (where lower confidence might indicate conformational heterogeneity or dynamics). Predicted Aligned Error matrices for multi-chain assemblies reveal not just overall structure quality but specifically the confidence in relative positioning between chains: low PAE values between encapsulin chains indicate confident prediction of cage quaternary structure, while PAE values between tyrosinase fragment chains provide critical information about whether fragments are predicted to stably interact or remain separated. We performed the same detailed geometric analysis of reconstituted tyrosinase active sites as in our simpler split tyrosinase models: measuring copper-copper distances, evaluating histidine-copper coordination geometry and assessing whether substrate binding pockets maintain proper shape and electrostatic properties compared to native crystal structures.
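
Summarizing a multi-chain PAE matrix into one value per chain pair makes the encapsulin-encapsulin and fragment-fragment comparisons described above easy to tabulate. This minimal sketch assumes the PAE matrix has already been loaded as a square NumPy array with chains concatenated in input order. The helper name is ours.

```python
import numpy as np


def chain_pae_matrix(pae, chain_lengths: list[int]) -> np.ndarray:
    """Mean PAE (Å) for every ordered chain pair: entry [i, j] averages the block whose rows
    belong to chain i and whose columns belong to chain j. PAE is not symmetric, so
    [i, j] and [j, i] can differ."""
    pae = np.asarray(pae, dtype=float)
    bounds = np.concatenate([[0], np.cumsum(chain_lengths)])
    if pae.shape != (bounds[-1], bounds[-1]):
        raise ValueError("PAE matrix size does not match the chain lengths")
    n = len(chain_lengths)
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = pae[bounds[i]:bounds[i + 1], bounds[j]:bounds[j + 1]].mean()
    return out
```

Low off-diagonal values between two fragment-carrying chains would then indicate a confidently predicted relative placement of the fragments. High values suggest the fragments are not predicted to interact stably.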

Beyond static structure analysis, we examined spatial relationships within predicted assemblies to understand how encapsulin cage geometry influences fragment interactions. We measured distances between fragment termini that would need to come together for reconstitution, mapped out which pairs or groups of subunits position fragments closest to each other, and identified potential steric clashes between multiple fragments confined within the cage interior. These spatial analyses revealed whether our fusion protein designs created viable architectures where fragments could physically access each other or whether geometric constraints prevented reconstitution regardless of fragment binding affinity. We also analyzed how different linker lengths affected these spatial relationships: short linkers might restrict fragment motion too severely, preventing proper orientation for active site assembly, while excessively long linkers might allow fragments to interact non-specifically with wrong partners or tangle in ways that interfere with cage assembly.
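
A simple geometric filter captures the first of these questions. The sketch assumes that anchor coordinates (in Å) have already been extracted from a predicted assembly, marking where linkers leave the encapsulin subunits. It uses roughly 3.5 Å of contour length per residue as an upper bound on linker reach. Flexible (GGGGS)n linkers behave more like random coils with much shorter typical end-to-end distances, so this test only rules out pairs that cannot possibly reach each other.

```python
import numpy as np

CONTOUR_PER_RESIDUE = 3.5  # Å; approximate fully extended length per residue (upper bound)


def reachable_pairs(n_anchors, c_anchors, linker_n: int, linker_c: int, termini_gap: float):
    """Return (pairs, dist): index pairs (i, j) of N-fragment anchor i and C-fragment anchor j
    that could jointly host one reconstituted tyrosinase, plus the anchor distance matrix.

    Necessary, not sufficient: anchor distance <= both linker contour lengths plus the
    distance between the two linker attachment points in the reconstituted enzyme
    (`termini_gap`, Å, e.g. measured in a reconstituted model)."""
    n_anchors = np.asarray(n_anchors, dtype=float)
    c_anchors = np.asarray(c_anchors, dtype=float)
    dist = np.linalg.norm(n_anchors[:, None, :] - c_anchors[None, :, :], axis=-1)
    reach = (linker_n + linker_c) * CONTOUR_PER_RESIDUE + termini_gap
    pairs = [tuple(int(k) for k in p) for p in np.argwhere(dist <= reach)]
    return pairs, dist


# Toy coordinates in Å (not from a real model): one N anchor, two candidate C anchors.
pairs, _ = reachable_pairs([[0, 0, 0]], [[40, 0, 0], [120, 0, 0]], linker_n=5, linker_c=5, termini_gap=10.0)
print(pairs)  # [(0, 0)]
```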

Boltz-2

Complementing our AlphaFold2 structure predictions, we employed Boltz-2 for quantitative affinity predictions between reconstituted tyrosinase active sites and substrates. While AlphaFold2 excels at structure prediction, it does not provide direct estimates of binding thermodynamics. This information is crucial for assessing whether reconstituted enzymes will actually bind and process tyrosine substrates with sufficient affinity to support melanin biosynthesis. Boltz-2’s integrated affinity prediction module outputs quantitative binding energy estimates that we could compare between native tyrosinases, simple split-and-reconstituted variants, and full encapsulin-tyrosinase assemblies.

We prepared the Boltz-2 inputs from the same sequences as the AlphaFold2 predictions. The model reports predicted binding affinities as continuous values on a logarithmic scale (log10 of an IC50 in μM), which can be converted into approximate free energies of binding (ΔG in kcal/mol) or, treating the IC50 as a proxy, into dissociation constants. We compared these predicted affinities across design variants. Native tyrosinases, simple split-and-reconstituted tyrosinases, and encapsulin-fused constructs should ideally show comparable affinities, which would indicate that the fusion architecture does not sterically interfere with substrate access. Variants with substantially weaker predicted substrate binding flagged potential problems with active site accessibility or geometry that would likely translate into poor catalytic activity in experimental tests.
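
Converting between affinity scales is standard thermodynamics, ΔG = RT ln K_d (K_d in mol/L relative to a 1 M standard state). The sketch below additionally assumes, as our reading of the Boltz-2 documentation rather than a verified specification, that the predicted affinity is reported as log10(IC50 in μM), and it treats IC50 as a stand-in for K_d. The 1.36 kcal/mol cutoff for flagging weaker binders (roughly a tenfold loss in K_d at 298 K) and all function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dg_from_kd(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    return R_KCAL * temp_k * math.log(kd_molar)


def kd_from_dg(dg_kcal, temp_k=298.15):
    """Dissociation constant (M) from a binding free energy in kcal/mol."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


def dg_from_log10_ic50_um(y, temp_k=298.15):
    """Approximate ΔG from a predicted log10(IC50 / μM).

    ASSUMPTION: the affinity output uses this log10(μM) scale, and IC50 ~ Kd.
    """
    return dg_from_kd(10.0 ** y * 1e-6, temp_k)


def flag_weaker_binders(dg_by_variant, reference="native", max_ddg=1.36):
    """Variants whose ΔG is more than `max_ddg` kcal/mol less favourable than the reference."""
    ref = dg_by_variant[reference]
    return sorted(name for name, dg in dg_by_variant.items() if dg - ref > max_ddg)


print(round(dg_from_kd(1e-6), 2))  # -8.19
print(flag_weaker_binders({"native": -6.0, "split": -5.5, "fusion": -4.0}))  # ['fusion']
```

Comparing each variant against the native enzyme (ΔΔG) rather than using absolute values reduces sensitivity to systematic offsets in the predictor.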

Rosetta Suite

The final component of our encapsulin-tyrosinase modeling pipeline applied Rosetta Suite protocols for physics-based relaxation of the AI-generated predictions. This step served the same purpose as in our simpler modeling workflows: validating that AI predictions represent physically reasonable structures when evaluated by independent energy functions. For these large multi-chain assemblies, we focused the Rosetta analysis on specific regions rather than attempting to relax entire cage structures, which was computationally infeasible with our resources. We extracted reconstituted tyrosinase domains from the complete assembly predictions and performed local relaxation focusing on fragment-fragment interfaces and active site geometry. We also analyzed the encapsulin subunit interfaces where fusion proteins attach, checking that tyrosinase fragments do not create unfavorable steric or energetic perturbations that might destabilize cage assembly. Interface energy calculations quantified how strongly fragments interact upon reconstitution and whether these interactions would be stable under physiological conditions, providing thermodynamic metrics to complement Boltz-2's affinity predictions and AlphaFold2's structural confidence scores. Combining learning-based structure and affinity prediction with physics-based energy validation in this way increased the reliability of our computational predictions for these complex engineered assemblies.
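
An interface energy of the kind described above is commonly computed by comparing the bound complex with its separated partners, ΔG_bind = E_complex − (E_A + E_B), similar to the separated-state quantity reported by Rosetta's InterfaceAnalyzer. The sketch assumes the Rosetta scores (in Rosetta Energy Units, REU) of each relaxed decoy and of its separated partners have already been computed; averaging over the lowest-scoring decoys, the `top_k` parameter, and the function names are our illustrative choices, not a documented Rosetta procedure.

```python
from statistics import mean, stdev


def interface_dg(e_complex, e_partner_a, e_partner_b):
    """Separated-state interface energy (REU); negative means binding is favourable."""
    return e_complex - (e_partner_a + e_partner_b)


def summarize_decoys(decoys, top_k=5):
    """Mean and spread of interface ΔG over the `top_k` lowest-scoring decoys.

    Each decoy is a dict with the total score of the relaxed complex ("complex")
    and of each separated partner ("a", "b"), all in REU.
    """
    best = sorted(decoys, key=lambda d: d["complex"])[:top_k]
    dgs = [interface_dg(d["complex"], d["a"], d["b"]) for d in best]
    spread = stdev(dgs) if len(dgs) > 1 else 0.0
    return mean(dgs), spread


# Made-up scores for three relaxed decoys of one reconstituted tyrosinase.
decoys = [
    {"complex": -1000.0, "a": -480.0, "b": -490.0},
    {"complex": -990.0, "a": -480.0, "b": -495.0},
    {"complex": -1010.0, "a": -485.0, "b": -500.0},
]
print(summarize_decoys(decoys, top_k=2))
```

Reporting the spread alongside the mean makes it visible when a favorable interface energy rests on a single outlier decoy.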

Data Sources

The foundation for encapsulin-tyrosinase assembly modeling came from combining crystallographic structures of native encapsulins with our previously developed split tyrosinase models. We used the crystal structure of encapsulin from Thermotoga maritima (PDB: 3DKT, resolution 2.8 Å) as our primary structural template, which represents a T=1 icosahedral assembly of 60 identical subunits forming a roughly 24-nanometer diameter cage. This particular encapsulin has been extensively characterized both structurally and functionally, making it an ideal starting point for engineering efforts. We also considered alternative encapsulin structures including those from Myxococcus xanthus (PDB: 2E0Z) and Pyrococcus furiosus (PDB: 2C54) to explore whether different cage architectures or surface properties might better accommodate tyrosinase fusions. These structures provided detailed atomic coordinates for encapsulin subunit folds, the protein-protein interfaces that mediate pentamer and hexamer formation, and the interior surface features that might interact with fused cargo proteins.
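
The relation between triangulation number and cage composition follows the Caspar-Klug rules: a shell with triangulation number T contains 60T subunits arranged as 12 pentamers and 10(T − 1) hexamers, with allowed values T = h² + hk + k². A T=1 cage such as the T. maritima encapsulin therefore consists of 12 pentamers only, whereas larger shells also contain hexamers. The small sketch below encodes this arithmetic; the function names are hypothetical.

```python
def is_valid_t(t):
    """True if t = h^2 + hk + k^2 for non-negative integers h, k (Caspar-Klug rule)."""
    return t >= 1 and any(h * h + h * k + k * k == t
                          for h in range(t + 1) for k in range(h + 1))


def icosahedral_counts(t):
    """Return (subunits, pentamers, hexamers) for a T-number icosahedral shell."""
    if not is_valid_t(t):
        raise ValueError(f"T = {t} is not an allowed triangulation number")
    subunits = 60 * t
    pentamers = 12  # always twelve five-fold vertices
    hexamers = (subunits - 5 * pentamers) // 6  # equals 10 * (T - 1)
    return subunits, pentamers, hexamers


for t in (1, 3, 4):
    print(t, icosahedral_counts(t))
# 1 (60, 12, 0)
# 3 (180, 12, 20)
# 4 (240, 12, 30)
```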

Our modeling integrated these encapsulin structures with the split tyrosinase structural predictions generated in earlier phases of our computational work. For each promising tyrosinase candidate identified through our split protein screening, we had already generated AlphaFold2 predictions of both separated fragments and reconstituted complexes, along with Boltz-2 affinity estimates and Rosetta-refined structures. These existing models formed the “parts list” that we computationally fused to encapsulin structures through flexible linkers. We created sequence constructs representing various fusion architectures by appending the tyrosinase fragments to the N-termini of the encapsulin subunits, which face the interior of the nanocage. By systematically combining our validated encapsulin and tyrosinase structural data through these different architectures, we constructed a comprehensive library of fusion protein designs spanning the plausible engineering parameter space.
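
The construct enumeration described above can be sketched as a Cartesian product of fragments and linkers, each fused N-terminally to the encapsulin and written out as FASTA for the structure predictors. All sequences here are short hypothetical placeholders, and the (GGGGS)n flexible linkers are an assumed example, not a statement of the exact linkers we used.

```python
from itertools import product

# HYPOTHETICAL placeholder sequences, not our actual constructs.
FRAGMENTS = {"TyrN": "MSNKYRV", "TyrC": "GDWHLQE"}
ENCAPSULIN = "MEFLKRSL"
# Assumed flexible (GGGGS)n linkers of increasing length.
LINKERS = {n: "GGGGS" * n for n in (1, 2, 3)}


def build_fusion_library(fragments, linkers, scaffold):
    """N-terminal fusions (fragment - linker - encapsulin), one per combination."""
    library = {}
    for (frag, frag_seq), (n, linker_seq) in product(fragments.items(), linkers.items()):
        library[f"{frag}-(G4S){n}-Enc"] = frag_seq + linker_seq + scaffold
    return library


def to_fasta(library):
    """Serialize {name: sequence} as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in library.items())


lib = build_fusion_library(FRAGMENTS, LINKERS, ENCAPSULIN)
print(len(lib))  # 6
print(to_fasta(lib).splitlines()[0])  # >TyrN-(G4S)1-Enc
```

Encoding the fragment and linker length in each name keeps every downstream prediction traceable to its design parameters.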

Results

The evaluation of encapsulin-tyrosinase assemblies proceeded through a systematic comparison of multiple design variants against quantitative criteria for both successful cage formation and tyrosinase reconstitution. For each tested configuration, defined by the specific tyrosinase candidate, split site location, linker length, fusion geometry, and subunit stoichiometry, we compiled the predictions from all modeling tools and evaluated them to identify the most promising fusions. The primary evaluation criteria were encapsulin assembly maintenance (assessed through the predicted structure quality of inter-subunit interfaces and comparison to the native cage architecture), tyrosinase reconstitution quality (measured by active site RMSD relative to native crystal structures and preservation of copper coordination geometry), predicted substrate affinity from Boltz-2, interface stability between reconstituted fragments from Rosetta calculations, and the absence of problematic features such as severe steric clashes, predicted aggregation-prone surfaces, or flexibility in regions that should be rigid.
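
A multi-criteria filter of this kind can be sketched as a set of pass/fail checks followed by a ranking step. Only the 1 Å encapsulin RMSD threshold follows our stated criteria; the other cutoffs, the field names, and the choice to rank surviving designs by predicted substrate ΔG are hypothetical placeholders, and the example numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class DesignMetrics:
    name: str
    encapsulin_rmsd: float   # Å, predicted vs. crystal encapsulin subunit
    active_site_rmsd: float  # Å, reconstituted vs. native active site
    substrate_dg: float      # kcal/mol, predicted substrate affinity
    interface_dg: float      # REU, fragment-fragment interface energy
    n_clashes: int           # severe inter-chain steric clashes


# Cut-offs: the 1 Å encapsulin RMSD follows our criteria; the rest are HYPOTHETICAL.
LIMITS = {"encapsulin_rmsd": 1.0, "active_site_rmsd": 2.0,
          "substrate_dg": -4.0, "interface_dg": -10.0, "n_clashes": 0}


def failed_criteria(m, limits=LIMITS):
    """Names of the criteria a design fails (an empty list means it passes all)."""
    checks = {
        "encapsulin assembly": m.encapsulin_rmsd < limits["encapsulin_rmsd"],
        "active-site reconstitution": m.active_site_rmsd <= limits["active_site_rmsd"],
        "substrate affinity": m.substrate_dg <= limits["substrate_dg"],
        "fragment interface": m.interface_dg <= limits["interface_dg"],
        "steric clashes": m.n_clashes <= limits["n_clashes"],
    }
    return [name for name, passed in checks.items() if not passed]


def rank_designs(designs, limits=LIMITS):
    """Designs passing every filter, strongest predicted substrate binding first."""
    passing = [d for d in designs if not failed_criteria(d, limits)]
    return sorted(passing, key=lambda d: d.substrate_dg)


designs = [
    DesignMetrics("A", 0.6, 1.2, -6.1, -25.0, 0),
    DesignMetrics("B", 0.8, 1.5, -5.0, -18.0, 0),
    DesignMetrics("C", 3.4, 1.0, -6.5, -30.0, 2),
]
print([d.name for d in rank_designs(designs)])  # ['A', 'B']
print(failed_criteria(designs[2]))  # ['encapsulin assembly', 'steric clashes']
```

Returning the names of failed criteria, rather than a single boolean, shows which design parameter to revisit for a rejected variant.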

Structural similarity analysis provided a direct visual and quantitative comparison between predicted encapsulin-tyrosinase assemblies and reference structures. We aligned predicted encapsulin subunits onto crystal structure coordinates and calculated RMSD values that quantify structural deviation: values below 1 Å indicate that fusion to tyrosinase fragments did not substantially perturb the encapsulin fold or assembly interfaces, while values exceeding 3 Å suggest potentially problematic distortions. More critically, we analyzed reconstituted tyrosinase active sites using the same geometric criteria applied to the simpler split protein models. Successful designs showed predicted copper-copper distances of 3.4-3.8 Å (compared to 3.5 Å in native enzymes), proper three-histidine coordination of each copper ion with nitrogen-copper distances of around 2.0-2.2 Å, and a preserved substrate binding pocket architecture with conserved positions of key catalytic residues. Variants that deviated significantly from these native geometries, for example by showing copper-copper separations exceeding 5 Å or loss of histidine coordination, were predicted to be catalytically compromised regardless of other favorable properties.
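
The RMSD and active-site geometry checks can be sketched with NumPy: `kabsch_rmsd` superimposes matched coordinates with the Kabsch algorithm before computing the RMSD, and `check_dicopper_site` applies the Cu-Cu (3.4-3.8 Å) and His N-Cu (2.0-2.2 Å) windows quoted above. Because AlphaFold2 does not model metal ions, the sketch assumes copper positions are supplied separately, for example transferred from a superimposed native crystal structure; atom selection and the function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD (Å) between matched N x 3 coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # d = -1 flags a reflection; correct it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def check_dicopper_site(cu_a, cu_b, his_n_a, his_n_b,
                        cu_cu_range=(3.4, 3.8), n_cu_range=(2.0, 2.2)):
    """Check the Cu-Cu separation and three-His coordination of each copper.

    Returns (passes, cu_cu_distance).
    """
    cu_a, cu_b = np.asarray(cu_a, float), np.asarray(cu_b, float)
    d_cucu = float(np.linalg.norm(cu_a - cu_b))
    ok = cu_cu_range[0] <= d_cucu <= cu_cu_range[1]
    for cu, ligands in ((cu_a, his_n_a), (cu_b, his_n_b)):
        d = np.linalg.norm(np.asarray(ligands, float) - cu, axis=1)
        in_window = np.all((d >= n_cu_range[0]) & (d <= n_cu_range[1]))
        ok = ok and len(d) == 3 and bool(in_window)
    return bool(ok), d_cucu


# Toy site: two coppers 3.6 Å apart, each with three His N at 2.1 Å.
his_a = [[0, 2.1, 0], [0, -2.1, 0], [0, 0, 2.1]]
his_b = [[3.6, 2.1, 0], [3.6, -2.1, 0], [3.6, 0, 2.1]]
print(check_dicopper_site([0, 0, 0], [3.6, 0, 0], his_a, his_b))
```

In practice the RMSD would be computed over the same matched atoms (for example Cα atoms of the encapsulin fold, or the active-site residues) in prediction and crystal structure.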

Strong binding affinities predicted by Boltz-2 indicated that substrate molecules could readily access reconstituted active sites despite the constraints imposed by fusion to encapsulin cages and potential steric crowding from multiple enzymes in close proximity. Weaker predicted affinities might indicate partially occluded active sites, distorted substrate binding pockets, or suboptimal orientation of catalytic residues. Such problems could potentially be addressed through linker optimization, alternative fusion geometries, or selection of different tyrosinase candidates with more favorable structural properties.

BmTyr(complete)-Mx Encapsulin

The predicted structure shows steric clashes caused by the number of tyrosinases, especially when further pentamers and their respective tyrosinases are considered. This simulation suggests that fewer than 100% of the encapsulin subunits should carry a tyrosinase.

BmTyr(split)-Mx Encapsulin + Mx Encapsulin

Compared to the previous simulation, this design appears much more sterically favorable. It leaves ample space for the tyrosinases to coexist while preserving room for other encapsulin-tyrosinase pentamers to bind and form a complete nanocage without compromising stability.

BmTyr(complete)-Qt Encapsulin (Pentamer)

This simulation also indicates steric clashes, both in tyrosinase packing and in encapsulin interaction and nanocage assembly, further supporting the hypothesis that a balance between encapsulin-fused tyrosinases and native encapsulins is required.

BmTyr(complete)-Qt Encapsulin (Hexamer)

This simulation of Qt hexamers provides additional support for the steric clash hypothesis. In principle, hexamers offer far more space than pentamers, but even they show steric clashes.
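
The suggestion that only a fraction of subunits should carry a tyrosinase can be made quantitative with a simple binomial model. The sketch assumes that fused and native encapsulin subunits are incorporated independently and at random into each pentamer (n = 5) or hexamer (n = 6); the tolerated number of fused subunits per oligomer and the 95% target are hypothetical inputs that would have to come from structural modeling or experiment.

```python
from math import comb


def fused_per_oligomer(p_fused, n=5):
    """P(k of n subunits carry a tyrosinase) for k = 0..n, assuming independent random incorporation."""
    return [comb(n, k) * p_fused ** k * (1 - p_fused) ** (n - k) for k in range(n + 1)]


def max_fused_fraction(max_tolerated, n=5, min_prob=0.95):
    """Largest fused fraction (1 % grid) with P(k <= max_tolerated) >= min_prob."""
    best = 0.0
    for i in range(101):
        p = i / 100
        if sum(fused_per_oligomer(p, n)[: max_tolerated + 1]) >= min_prob:
            best = p
    return best


# Hypothetical: a pentamer tolerates at most two fused subunits.
print(max_fused_fraction(2))  # 0.18
```

For hexamers, pass `n=6`. Under these assumptions, if a pentamer could accommodate at most two tyrosinases, about 18% fused subunits would keep 95% of pentamers within that limit.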

Conclusion

Drylab Impact

Our integrated computational pipeline transformed what would have been exhaustive experimental screening into targeted hypothesis testing, dramatically accelerating our development timeline while conserving resources. By requiring candidates to pass stringent criteria across SPELL split site prediction, AlphaFold2 structure prediction, Boltz-2 affinity estimation, and Rosetta energy calculations, we focused experimental efforts exclusively on designs with high success probability. This filtering eliminated poorly performing configurations in silico before committing laboratory resources, ensuring that every construct synthesized represented a rational design decision grounded in quantitative structural predictions rather than arbitrary choices.

The modeling directly shaped our experimental strategy at every decision point. BmTyr emerged as our primary candidate through the combination of excellent computational metrics across all platforms and prior validation in encapsulated systems, illustrating how modeling and literature evidence complemented each other. For split tyrosinases, SPELL identified optimal division points balancing energetic barriers to spontaneous reassembly against reconstitution potential, and AlphaFold2 predictions then indicated that these splits would restore native active site geometry upon reconstitution. The lid domain engineering for HcTyr1 relied entirely on computational identification of the likely auto-cleavage site and subsequent design of modified linkers that would disrupt native proteolysis while introducing controllable TEV protease recognition sequences. Even our initial conservative predictions about encapsulin-tyrosinase stoichiometry, suggesting that fully loaded cages might face steric constraints, represented testable hypotheses that computational modeling made explicit. See our detailed lid domain engineering protocols and encapsulin assembly protocols for experimental implementations of these computationally designed constructs.

Wetlab Impact

The lid domain tyrosinase experiments provided strong validation of our structure-based engineering approach. Modified HcTyr1, designed with computationally engineered linkers that disrupt auto-cleavage while incorporating TEV sites, performed as predicted in our tyrosinase activity assays. The construct showed no melanin production without TEV protease, consistent with complete active site occlusion, and produced melanin at levels matching our lidless positive control upon protease addition. This three-way comparison strongly indicates that Boltz-2 predictions of lid positioning, AlphaFold2 confidence metrics for the lid-core interface, and Rosetta energy validations correctly identified a functional design. Critically, constructs that reached functional testing performed as predicted, while failures occurred at earlier stages due to expression challenges, solubility issues, or copper loading problems rather than incorrect computational predictions. When biological systems produced properly folded enzymes, those enzymes behaved as the models forecasted.

The encapsulin-tyrosinase assembly experiments revealed an intriguing observation requiring model refinement. Our Native PAGE analysis showed that 100% BmTyr-encapsulin loading produced high-molecular-weight species consistent with large assemblies, suggesting that encapsulins may tolerate cargo loading more permissively than our conservative predictions indicated. This finding demands careful interpretation, as we lack activity data for these assemblies, used complete rather than split tyrosinases, and cannot definitively distinguish proper icosahedral cages from irregular aggregates without additional characterization. Split tyrosinases might still face stringent geometric constraints for active site reconstitution even if complete enzymes assemble readily. However, if this tolerance extends to functional split systems, it would open considerable optimization potential for increasing melanin production capacity beyond our initial design assumptions. Rather than invalidating our modeling approach, this result exemplifies productive iterative refinement: conservative computational predictions successfully guided initial designs, while experiments revealed a larger accessible design space. The models prevented wasteful pursuit of obviously non-functional configurations while experiments pushed boundaries, positioning us to refine our predictions through explicit treatment of protein flexibility and experimental calibration of steric constraint calculations.

References

Alford, R.F., Leaver-Fay, A., Jeliazkov, J.R., O’Meara, M.J., DiMaio, F.P., Park, H., Shapovalov, M.V., Renfrew, P.D., Mulligan, V.K., Kappel, K., others, 2017. The Rosetta all-atom energy function for macromolecular modeling and design. Journal of Chemical Theory and Computation 13, 3031–3048.

Bae, J., Kim, J., Choi, J., Lee, H., Koh, M., 2024. Split proteins and reassembly modules for biological applications. ChemBioChem 25, e202400123.

Brenner, M., Hearing, V.J., 2008. The protective role of melanin against UV damage in human skin. Photochemistry and Photobiology 84, 539–549. https://doi.org/10.1111/j.1751-1097.2007.00226.x

Claus, H., Decker, H., 2006. Bacterial tyrosinases. Systematic and Applied Microbiology 29, 3–14.

Conway, P., Tyka, M.D., DiMaio, F., Konerding, D.E., Baker, D., 2014. Relaxation of backbone bond geometry improves protein energy landscape modeling. Protein Science 23, 47–55.

Dagliyan, O., Krokhotin, A., Ozkan-Dagliyan, I., Deiters, A., Der, C.J., Hahn, K.M., Dokholyan, N.V., 2018. Computational design of chemogenetic and optogenetic split proteins. Nature Communications 9, 4042. https://doi.org/10.1038/s41467-018-06531-4

de Almeida Santos, G., Englund, A.N., Dalleywater, E.L., Røhr, Å.K., 2024. Characterization of two bacterial tyrosinases from the halophilic bacterium Hahella sp. CCB MM 4 relevant for phenolic compounds oxidation in wetlands. FEBS Open Bio 14, 2038–2058.

Dolashki, A., Voelter, W., Gushterova, A., Van Beeumen, J., Devreese, B., Tchorbanov, B., 2012. Isolation and characterization of novel tyrosinase from Laceyella sacchari. Protein and Peptide Letters 19, 538–543.

Evans, R., O’Neill, M., Pritzel, A., Antropova, N., Senior, A., Green, T., Žídek, A., Bates, R., Blackwell, S., Yim, J., others, 2022. Protein complex prediction with AlphaFold-multimer. bioRxiv preprint.

Guo, J., Rao, Z., Yang, T., Man, Z., Xu, M., Zhang, X., Yang, S.-T., 2015. Cloning and identification of a novel tyrosinase and its overexpression in Streptomyces kathirae SC-1 for enhancing melanin production. FEMS Microbiology Letters 362, fnv041.

Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., others, 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589.

Khatib, F., Cooper, S., Tyka, M.D., Xu, K., Makedon, I., Popović, Z., Baker, D., Foldit Players, 2011. Algorithm discovery by protein folding game players. Proceedings of the National Academy of Sciences 108, 18949–18953.

Leman, J.K., Weitzner, B.D., Lewis, S.M., Adolf-Bryfogle, J., Alam, N., Alford, R.F., Aprahamian, M., Baker, D., Barlow, K.A., Barth, P., others, 2020. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nature Methods 17, 665–680.

Markovic, D., 2013. What is the volume of a HEK293 cell?

Milo, R., Jorgensen, P., Moran, U., Weber, G., Springer, M., 2010. BioNumbers—the database of key numbers in molecular and cell biology. Nucleic Acids Research 38, D750–D753.

Nichols, R.J., Cassidy-Amstutz, C., Chaijarasphong, T., Savage, D.F., 2017. Encapsulins: molecular biology of the shell. Critical Reviews in Biochemistry and Molecular Biology 52, 583–594.

Nivón, L.G., Moretti, R., Baker, D., 2013. A Pareto-optimal refinement method for protein design scaffolds. PLoS ONE 8, e59004.

Paria, K., Paul, D., Chowdhury, T., Pyne, S., Chakraborty, R., Mandal, S.M., 2020. Synergy of melanin and vitamin-D may play a fundamental role in preventing SARS-CoV-2 infections and halt COVID-19 by inactivating furin protease. Translational Medicine Communications 5, 21.

Passaro, S., Corso, G., Wohlwend, J., Reveiz, M., Thaler, S., Somnath, V.R., Getz, N., Portnoi, T., Roy, J., Stark, H., others, 2025. Boltz-2: Towards accurate and efficient binding affinity prediction. bioRxiv preprint.

Riesz, J.J., Jean, J., 2007. The spectroscopic properties of melanin. University of Queensland, Queensland, Australia.

Schäuble, S., Stavrum, A.K., Puntervoll, P., Schuster, S., Heiland, I., 2013. Effect of substrate competition in kinetic models of metabolic networks. FEBS Letters 587, 2818–2824.

Shuster, V., Fishman, A., 2009. Isolation, cloning and characterization of a tyrosinase with improved activity in organic solvents from Bacillus megaterium. Journal of Molecular Microbiology and Biotechnology 17, 188–200.

Sigmund, F., Berezin, O., Beliakova, S., Magerl, B., Drawitsch, M., Piovesan, A., Gonçalves, F., Bodea, S.-V., Winkler, S., Bousraou, Z., others, 2023. Genetically encoded barcodes for correlative volume electron microscopy. Nature Biotechnology 41, 1734–1745.

Sigmund, F., Massner, C., Erdmann, P., Stelzl, A., Rolbieski, H., Desai, M., Bricault, S., Wörner, T.P., Snijder, J., Geerlof, A., others, 2018. Bacterial encapsulins as orthogonal compartments for mammalian cell engineering. Nature Communications 9, 1990.

Song, Y., Liao, J., Zha, C., Wang, B., Liu, C.C., 2015. A novel approach to determine the tyrosine concentration in human plasma by DART-MS/MS. Analytical Methods 7, 1600–1605.

Storer, A.C., Cornish-Bowden, A., 1974. The kinetics of coupled enzyme reactions. Applications to the assay of glucokinase, with glucose 6-phosphate dehydrogenase as coupling enzyme. Biochemical Journal 141, 205–209.

Tastanova, A., Folcher, M., Müller, M., Camenisch, G., Ponti, A., Horn, T., Tikhomirova, M.S., Fussenegger, M., 2018. Synthetic Biology-Based Cellular Biomedical Tattoo for Detection of Hypercalcemia Associated with Cancer. Science Translational Medicine 10, eaap8562. https://doi.org/10.1126/scitranslmed.aap8562

Tyka, M.D., Keedy, D.A., André, I., DiMaio, F., Song, Y., Richardson, D.C., Richardson, J.S., Baker, D., 2011. Alternate states of proteins revealed by detailed energy landscape mapping. Journal of Molecular Biology 405, 607–618.

Virtanen, P., Gommers, R., Oliphant, T.E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S.J., Brett, M., Wilson, J., Millman, K.J., Mayorov, N., Nelson, A.R.J., Jones, E., Kern, R., Larson, E., Carey, C.J., Polat, İ., Feng, Y., Moore, E.W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E.A., Harris, C.R., Archibald, A.M., Ribeiro, A.H., Pedregosa, F., van Mulbregt, P., SciPy 1.0 Contributors, 2020. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17, 261–272. https://doi.org/10.1038/s41592-019-0686-2