Modeling-Guided Compartmentalization Engineering
Background and Motivation
To enhance β-carotene production in Saccharomyces cerevisiae, we developed a lipid droplet (LD)-targeted compartmentalization strategy. However, fusing LD-targeting signal peptides (HD2) to key enzymes — CarB (phytoene desaturase) and CarRP (bifunctional phytoene synthase/lycopene cyclase) — risks disrupting their catalytic activity by altering protein folding or blocking substrate-binding sites. To address this, we employed computational protein modeling to:
- Predict enzyme structures and substrate-binding pockets;
- Identify optimal fusion sites for signal peptides;
- Design flexible linkers to minimize steric interference.
Structure Prediction of Key Enzymes CarB and CarRP
Our search of the Protein Data Bank (PDB) revealed low sequence identity between the target enzymes and known structures: CarB showed a maximum sequence identity of only 30.06%, and CarRP an even lower 24.68%. Because conventional homology modeling is generally considered reliable only above roughly 30% sequence identity, and both enzymes sit at or below that threshold, we selected AlphaFold for de novo structure prediction. However, subsequent docking attempts with AutoDock Vina yielded inaccurate binding poses. Consequently, we employed Chai Discovery, an AI-driven platform for biomolecular optimization, to generate the substrate-enzyme binding models.
Using Chai Discovery, we predicted the binding model of CarB with its substrate phytoene (Figure 1). The model shows phytoene docked within the binding pocket, where α-helix and loop regions form a hydrophobic cavity. This structural configuration facilitates substrate insertion and catalysis.

Figure 1. The structures of CarB bound to phytoene
The carRP gene product comprises two domains: the R domain (N-terminal), responsible for lycopene cyclase activity, and the P domain (C-terminal), exhibiting phytoene synthase activity. We further modeled the structures of CarRP bound to GGPP and lycopene, respectively. As depicted in Figure 2, the P domain binds GGPP (represented by the green carbon skeleton). Rotating the model reveals that the R domain binds lycopene (indicated by the blue carbon skeleton).

Figure 2. The structures of CarRP bound to GGPP and lycopene
Selection of Lipid Droplet-Targeting Signal Peptides
Our team consulted Professor Chen and learned that, while engineering yeast strains, they had fused the ERG7 gene with eGFP and observed overlapping red and green fluorescence upon lipid droplet staining, confirming the endogenous lipid droplet (LD)-targeting capability of ERG7. Following Professor Chen’s guidance, we performed Kyte-Doolittle hydropathy analysis on ERG7 and identified four distinct hydrophobic domains (residues 324-346, 376-401, 584-606, and 643-667). From these regions, we designed four hypothetical LD-targeting signal peptides: HD1 (324-346), HD2 (376-401), HD3 (584-606), and HD4 (643-667). Based on structural analysis, HD2 was selected as the optimal signal peptide for subsequent fusion constructs.
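The hydropathy scan described above can be sketched as a sliding-window average over the Kyte-Doolittle scale. This is a minimal illustration, not the team's actual analysis: the window length of 19 and the cutoff of 1.6 are the commonly cited defaults for membrane-associated segments and are assumptions here, and the function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean KD score of every full window; list index = 0-based window start."""
    scores = [KD[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window by one residue: add the new score, drop the oldest.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """1-based inclusive (start, end) ranges covered by windows above threshold.

    window and threshold are assumed defaults, not values from the project.
    """
    segments = []
    for start, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        lo, hi = start + 1, start + window
        if segments and lo <= segments[-1][1] + 1:
            # Overlapping or adjacent window: extend the current segment.
            segments[-1] = (segments[-1][0], max(segments[-1][1], hi))
        else:
            segments.append((lo, hi))
    return segments


# Toy sequence (hypothetical, not ERG7): a leucine stretch between charged flanks.
toy = "D" * 30 + "L" * 20 + "K" * 30
print(hydrophobic_segments(toy))
```

Applied to the ERG7 sequence, a scan of this kind would report candidate hydrophobic stretches. The exact boundaries depend on the chosen window and cutoff, so they need not match the residue ranges listed above.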

Figure 3. Structures of the GFP fusion protein with LD-targeting signal peptide and linker
Screening of GFP Fused with LD-Targeting Signal Peptides
Literature indicates that most lipid droplet (LD)-targeting signal peptides are located at the C-terminus of proteins. To validate HD2’s targeting capability, we fused HD2 to the C-terminus of GFP and used AlphaFold 3 to generate structural models with and without a flexible linker (Figure 3B-C). In both models the GFP structure was essentially unchanged, so the presence or absence of the linker had no significant structural impact. Although LD-targeting signals are typically C-terminal, we also generated an N-terminal HD2-GFP fusion to test this paradigm (Figure 3D). Surprisingly, the N-terminal fusion also showed minimal structural perturbation to GFP. We therefore constructed three fusion configurations: PTDH3-yeGFP-HD2-TCYC1, PTDH3-yeGFP-linker-HD2-TCYC1 (BBa_253ZON2D), and PTDH3-HD2-linker-yeGFP-TCYC1 (BBa_25BE29BZ). Wet-lab results showed that HD2 fused to the N-terminus (HD2N) failed to localize to lipid droplets, whereas C-terminal fusion enabled localization to lipid droplets (Figure 4).
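The design space explored here (tag at the N- or C-terminus, with or without a linker) can be enumerated with a small helper before submitting sequences for structure prediction. The sketch below is illustrative only: the linker sequence, the placeholder protein and tag fragments, and the function names are hypothetical and are not the sequences used in the project. It lists all four combinations, of which three were built.

```python
from itertools import product

# Hypothetical flexible linker; the project's actual linker is not given here.
LINKER = "GGGGSGGGGS"


def build_fusion(protein, tag, linker="", tag_at="C"):
    """Concatenate protein, optional linker, and targeting tag in N-to-C order."""
    if tag_at not in ("N", "C"):
        raise ValueError("tag_at must be 'N' or 'C'")
    parts = [protein, linker, tag] if tag_at == "C" else [tag, linker, protein]
    return "".join(parts)


def enumerate_designs(protein_name, protein, tag_name, tag, linker=LINKER):
    """Return {design name: fusion sequence} for every terminus/linker combination."""
    designs = {}
    for tag_at, use_linker in product(("C", "N"), (False, True)):
        names = [protein_name] + (["linker"] if use_linker else []) + [tag_name]
        if tag_at == "N":
            names.reverse()
        designs["-".join(names)] = build_fusion(
            protein, tag, linker if use_linker else "", tag_at
        )
    return designs


# Placeholder fragments (hypothetical), used only to show the naming scheme.
for name, seq in enumerate_designs("yeGFP", "MSKGEELF", "HD2", "LLVVAIL").items():
    print(name, seq)
```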

Figure 4. Fluorescence microscopy of GFP-targeted lipid droplets with Nile red staining
Structural Impact of HD2 Fusion on Key Enzymes CarB and CarRP
The GFP localization experiments showed that HD2 should be fused to the C-terminus of the protein. To reduce experimental workload, we used DLKcat, a deep learning-based tool for high-throughput kcat prediction, to predict the kcat values of the different fusion proteins for their substrates. For CarB with its substrate phytoene, the predictions indicate that incorporating a flexible linker between CarB and HD2 maintains a kcat value comparable to that of the wild type (Figure 5). In contrast, omitting the flexible linker reduces the kcat value to 50% of the wild-type level. Therefore, we selected the CarB-linker-HD2 fusion protein (Part5) for construction.
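The selection logic above amounts to normalizing each predicted kcat to the wild-type prediction and keeping the design that retains the most activity. The sketch below shows that comparison under stated assumptions: it takes already-predicted values as input and does not call DLKcat, the numbers are placeholders chosen to mirror the "comparable" and "50%" outcomes described in the text (they are not the project's data), and the 0.8 retention cutoff is hypothetical.

```python
def relative_kcat(predicted, wild_type_key="WT"):
    """Express each design's predicted kcat as a fraction of the wild-type value."""
    wt = predicted[wild_type_key]
    if wt <= 0:
        raise ValueError("wild-type kcat must be positive")
    return {
        name: value / wt
        for name, value in predicted.items()
        if name != wild_type_key
    }


def pick_design(predicted, wild_type_key="WT", min_fraction=0.8):
    """Return the design retaining the most activity, or None if all fall below the cutoff.

    min_fraction is an assumed, illustrative cutoff.
    """
    ratios = relative_kcat(predicted, wild_type_key)
    best = max(ratios, key=ratios.get)
    return best if ratios[best] >= min_fraction else None


# Placeholder predictions (hypothetical units and values, not measured data).
carb_predictions = {"WT": 10.0, "CarB-linker-HD2": 9.8, "CarB-HD2": 5.0}
print(relative_kcat(carb_predictions))
print(pick_design(carb_predictions))
```

The same comparison applies to any enzyme-substrate pair, provided each fusion is normalized to the wild-type prediction for the same substrate.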

Figure 5. kcat values predicted by DLKcat
Chai Discovery-generated docking models revealed that the CarB fusion protein deviated significantly from the wild type at its C-terminus yet maintained its substrate-binding pocket architecture.
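One way to put a number on "maintained pocket architecture" is a distance RMSD (dRMSD) over the pocket residues, which compares the internal pairwise distances of two structures and therefore needs no superposition. The sketch below is an assumption about how such a check could be done, not the analysis performed in this project. It expects Cα coordinates of the same pocket residues, in the same order, already extracted from the wild-type and fusion models. The coordinates shown are toy values.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """Superposition-free dRMSD between two equally ordered lists of 3D points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of residues")
    if len(coords_a) < 2:
        raise ValueError("at least two residues are required")
    squared_diffs = []
    for i, j in combinations(range(len(coords_a)), 2):
        dist_a = math.dist(coords_a[i], coords_a[j])
        dist_b = math.dist(coords_b[i], coords_b[j])
        squared_diffs.append((dist_a - dist_b) ** 2)
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))


# Toy coordinates in angstroms (hypothetical, not taken from the models).
wild_type_pocket = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]
shifted_pocket = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in wild_type_pocket]
print(distance_rmsd(wild_type_pocket, shifted_pocket))
```

A value near zero means the pocket residues keep their relative geometry even if the rest of the protein, such as the C-terminus, has moved. Note that dRMSD cannot distinguish a structure from its mirror image, so it complements visual inspection rather than replacing it.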

Figure 6. Structures of the CarB fusion protein with LD-targeting signal peptide and linker
In parallel, we used DLKcat to predict the kcat values of CarRP for its substrates GGPP and lycopene. For GGPP, fusing HD2 directly to CarRP without a flexible linker maintains a kcat value comparable to that of the wild type, whereas incorporating a flexible linker significantly reduces it (Figure 7). The same trend was observed for lycopene. Omitting the linker is therefore more favorable for preserving CarRP enzymatic activity, so we constructed the CarRP-HD2 fusion protein (Part5).

Figure 7. kcat values predicted by DLKcat
Chai Discovery-generated docking models revealed differential substrate effects for CarRP: GGPP maintained its native binding position, whereas lycopene underwent substantial conformational changes indicative of fusion-induced pocket remodeling (Figures 8 and 9).
Critically, wet-lab validation confirmed that these engineered fusions enhanced β-carotene production, demonstrating functional efficacy despite the structural perturbations.

Figure 8. Structures of the CarRP fusion protein with LD-targeting signal peptide

Figure 9. Structures of the CarRP fusion protein with LD-targeting signal peptide and linker
Conclusion
Our structure-guided approach demonstrated:
- Minimal predicted disruption to enzyme activity when HD2 is fused to the C-terminus, with a flexible linker for CarB and without one for CarRP.
- Efficient targeting to lipid droplets, enhancing β-carotene storage and reducing cytotoxicity.
- Significant production improvement (2.4-fold in shaking flasks), validating the model predictions.

References
- Velayos, A., Eslava, A. P., & Iturriaga, E. A. (2000). A bifunctional enzyme with lycopene cyclase and phytoene synthase activities is encoded by the carRP gene of Mucor circinelloides. European Journal of Biochemistry, 267(17), 5509-5519.
- Zehmer, J. K., Bartz, R., Liu, P., & Anderson, R. G. (2008). Identification of a novel N-terminal hydrophobic sequence that targets proteins to lipid droplets. Journal of Cell Science, 121(11), 1852-1860.
- Li, F., Yuan, L., Lu, H., Li, G., Chen, Y., Engqvist, M. K., … & Nielsen, J. (2022). Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction. Nature Catalysis, 5(8), 662-672.