HERO Design

Aware of the challenges in bioremediation, we decided to create a bacterium that carries in its genome the gene encoding a recombinase, along with a related plasmid designed for the insertion of degradation genes through Cre-loxP recombination. To address the difficulty of predicting metabolic pathways, we also designed and developed software that helps researchers find the best way to degrade a pollutant with our bacterium.

Bacteria Used


Rhodococcus opacus PD630

Our system is based on Rhodococcus opacus PD630, an oleaginous, Gram-positive bacterium belonging to the phylum Actinomycetota. We selected this strain because it is metabolically robust, genetically accessible, and capable of accumulating storage lipids, mainly triacylglycerols (TAGs), up to 80% of its cell dry weight under appropriate conditions. This combination of high lipid accumulation, broad catabolic versatility, and a steadily expanding genetic toolbox makes R. opacus PD630 a promising chassis for synthetic biology and biotechnological applications, enabling processes in which pollutants are degraded while valuable bioproducts are generated.

Escherichia coli DH5α

To propagate all our plasmids, we used E. coli DH5α, a high-efficiency cloning strain widely used in molecular biology and commonly prepared as chemically competent cells. It has been engineered to enable blue/white screening and to improve plasmid stability and transformation efficiency.

Loxport Construct


Integrating pollutant-degrading genes into the genome removes the need for antibiotic resistance markers to maintain plasmids, and it standardizes our engineering strategy.

Loxport construct map
Figure 1. Loxport construct map, generated with SnapGene. It contains: two homology regions for double recombination within the R. opacus genome (Homo UP and DOWN), located at the borders; the Cre recombinase gene (cre); a lox site inside the RBS sequence (LoxLE); the thiostrepton-responsive activator of the inducible promoter regulating Cre expression (T.U. TipAL); the thiostrepton-inducible promoter and terminator (PtipA and ThcA terminator); and the thiostrepton resistance gene (ThioR).

Homo UP and DOWN

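The genomic integration site was selected based on the work of Anthony et al. (2019), who identified a non-essential locus suitable for safe and stable insertions (ROCI-1). The Homo UP and DOWN regions are homologous to the sequences flanking this locus, so that double recombination inserts the construct between them.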

Cre Recombinase

Cre recombinase is an enzyme that catalyzes site-specific recombination between two 34 bp recognition sites (loxP sites), each made of two 13 bp inverted repeats flanking an 8 bp asymmetric spacer. As illustrated in Fig. 2, the outcome of recombination depends on the relative orientation and position of the lox sites: two sites in the same orientation on one molecule lead to excision of the DNA between them, two sites in opposite orientation lead to its inversion, and sites on two different molecules lead to integration or translocation.
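To make the orientation rules concrete, the following minimal Python sketch models a single Cre event on a DNA string carrying two identical loxP sites. It is an illustration under simplifying assumptions (perfect site matches, one crossover at the centre of the spacer, no reverse reaction), and the function name `cre_event` is hypothetical, not part of our toolkit.

```python
# Minimal string-level model of one Cre event between two identical loxP sites.
# Assumptions: perfect matches, crossover at the spacer centre, no reverse reaction.

LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 13 bp arm + 8 bp spacer + 13 bp arm

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cre_event(seq: str, site: str = LOXP):
    """Return (event, product, excised) for a molecule carrying exactly two lox sites.

    Direct repeats -> excision (the excised circle keeps one site);
    inverted repeats -> inversion of the DNA between the two sites.
    """
    if site == revcomp(site):
        raise ValueError("orientation is undefined for a palindromic site")
    hits = []
    for motif, strand in ((site, "+"), (revcomp(site), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    hits.sort()
    if len(hits) != 2:
        raise ValueError(f"expected exactly 2 lox sites, found {len(hits)}")
    (p1, strand1), (p2, strand2) = hits
    n = len(site)
    if strand1 == strand2:
        # Same orientation: one site stays in place, the rest leaves as a circle.
        return "excision", seq[:p1 + n] + seq[p2 + n:], seq[p1 + n:p2 + n]
    # Opposite orientation: flip the segment between the two sites.
    return "inversion", seq[:p1 + n] + revcomp(seq[p1 + n:p2]) + seq[p2:], ""
```

Because the two loxP arms are inverted repeats, inverting the intervening DNA reconstitutes two intact loxP sites, which is why the sketch only needs to flip the segment between them. Integration (Fig. 2a) is the reverse of the excision case.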

In our system, Cre enables the targeted integration of degradation genes into the genome of R. opacus by recombining a lox site present on the plasmid with its corresponding site in the genome. The Cre recombinase sequence was sourced from Kitagawa et al. (2023) and codon-optimized for expression in R. opacus.

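Two lox sites are involved: lox71, carried by this construct and therefore inserted into the genome, and lox66, carried by the vector with the degradation gene. Both are mutant lox sites, with mutations in opposite arms (the left arm in lox71, the right arm in lox66). Their recombination produces one wild-type loxP site and one lox72 site carrying both mutations, which Cre binds poorly, so the integrated fragment is much less likely to be excised again.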
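The arm-swapping logic behind lox71/lox66 integration can be sketched as follows. This is a hypothetical string-level model, not our actual construct: flanking sequences and cargo are placeholders, and it assumes the crossover happens inside the identical 8 bp spacer, so each product site combines the left arm of one site with the right arm of the other. The lox sequences are the standard lox71/lox66 variants; the helper names are illustrative.

```python
# Toy model of Cre-mediated integration between lox71 (genome) and lox66 (plasmid).
# Assumption: the crossover happens inside the identical spacer, swapping right arms.

ARM, SPACER = 13, 8

LOX_SITES = {
    "loxP":  "ATAACTTCGTATAGCATACATTATACGAAGTTAT",
    "lox71": "TACCGTTCGTATAGCATACATTATACGAAGTTAT",  # mutated left arm
    "lox66": "ATAACTTCGTATAGCATACATTATACGAACGGTA",  # mutated right arm
    "lox72": "TACCGTTCGTATAGCATACATTATACGAACGGTA",  # both arms mutated
}


def split_site(site: str):
    """Split a 34 bp lox site into (left arm, spacer, right arm)."""
    return site[:ARM], site[ARM:ARM + SPACER], site[ARM + SPACER:]


def name_of(site: str) -> str:
    """Name of a known lox variant, or 'unknown'."""
    for name, seq in LOX_SITES.items():
        if seq == site:
            return name
    return "unknown"


def integrate(genome_left, genome_site, genome_right, plasmid_site, cargo):
    """Insert a circular plasmid (plasmid_site followed by cargo) at a genomic lox site.

    Returns the integrated sequence and the names of the two resulting lox sites.
    """
    g_left, g_spacer, g_right = split_site(genome_site)
    p_left, p_spacer, p_right = split_site(plasmid_site)
    if g_spacer != p_spacer:
        raise ValueError("Cre needs identical spacers to recombine")
    site_a = g_left + g_spacer + p_right   # genomic left arm + plasmid right arm
    site_b = p_left + p_spacer + g_right   # plasmid left arm + genomic right arm
    merged = genome_left + site_a + cargo + site_b + genome_right
    return merged, (name_of(site_a), name_of(site_b))
```

Running it with lox71 on the genome side and lox66 on the plasmid side yields a lox72 site on one side of the cargo and a wild-type loxP site on the other.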

Cre recombinase mechanism
Figure 2. Examples of Cre recombinase activity: schematic integration (a) and inversion (b). Triangles represent loxP sites.

RBS with lox71

The ribosome binding site (RBS) is a sequence of nucleotides upstream of the start codon of an mRNA transcript that facilitates the recruitment of the ribosome during translation initiation. In this construct, the RBS is located within the pTip promoter and has been modified by the insertion of the lox71 site.

RBS structure comparison
Figure 3. RBS with (b) and without (a) the lox71 site.

As shown in Fig. 3, the RBS containing lox71 folds differently from the unmodified version. Ideally, the Shine–Dalgarno sequence should remain free of stable secondary structures; however, in the native RBS it tends to form a hairpin. Translation initiation likely occurs when this region transiently unfolds, which is sufficient for ribosome recruitment. The lox71 insertion also introduces a hairpin, stronger than the native one, but the sequence can also adopt alternative conformations in which the Shine–Dalgarno region is unpaired and linear, so translation initiation remains possible.

This design serves a double purpose after gene insertion. First, it can disrupt the expression of the downstream gene, because the insertion places a long DNA fragment between its promoter and its CDS. Second, the originally active RBS can be restored simply by adding a few bases to the fragment to be inserted, which allows the promoter to be reused to regulate the expression of the inserted genes.

ThioR, TipAL, pTip Promoter and ThcA Terminator

Thiostrepton resistance is widely used as a selection marker in R. opacus. In our construct, the Cre recombinase gene is placed under the control of the inducible pTip promoter. Thiostrepton binds the TipAL activator, and the resulting complex activates the PtipA promoter. In this way, Cre is expressed only when required, preventing continuous enzymatic activity that could otherwise compromise cell viability. We also retained the ThcA terminator sequence because of its proven efficiency.

Vectors Used


Plasmid pKSAC45

Loxport will be transferred into R. opacus using pKSAC45. The plasmid pKSAC45 is a versatile vector designed by Holátko et al. (2009) for genetic manipulation of Gram-positive bacteria such as Corynebacterium glutamicum, and it serves as a model for recombination-based engineering in other actinobacteria. The plasmid carries the sacB gene, encoding levansucrase, under its native promoter; sacB expression is lethal in the presence of sucrose, which allows selection of double recombinants after allelic exchange.

pKSAC45 replicates in E. coli but not in R. opacus, making it a suicide vector in the latter. It contains a Multiple Cloning Site with 11 unique restriction sites (HindIII, SphI, PstI, SalI, HincII, XbaI, BamHI, SmaI, KpnI, SacI, and EcoRI), a kanamycin resistance marker (Kmr), and allows the detection of recombinant clones via lacZ α-complementation in E. coli.

pKSAC45 plasmid map
Figure 4. Map of the suicide plasmid pKSAC45, generated with SnapGene. NeoR/KanR: kanamycin resistance; ori: origin of replication in E. coli DH5α; sacB: counterselection marker; lacZα: fragment for blue/white screening.

pLoxship

The plasmid pLoxship was designed as a vector for site-specific integration of sequences into the pLoxport construct using the Cre/lox system.

It comprises multiple elements: sequences derived from the pKSAC45 plasmid (origin of replication and antibiotic resistance), the pNit/pTip QT1 plasmids (multiple cloning site, MCS), and custom-designed components (Lox66 + RBS).

  • The ColE1 origin of replication functions in E. coli but is inactive in R. opacus
  • The NeoR/KanR cassette confers antibiotic resistance in both E. coli and R. opacus
  • The Lox66 + RBS fragment constitutes the key functional element of the plasmid: its sequence is recognized by Cre recombinase and mediates site-specific recombination with the Lox71 sequence present in the pLoxport construct
  • The MCS derived from pTip/pNit-QT1 facilitates gene cloning and provides a versatile set of restriction sites for various enzymatic manipulations

Alternative design variations can be implemented, for example by replacing the MCS with sequences compatible with BioBrick or Golden Gate assembly standards.

pLoxship plasmid map
Figure 5. pLoxship plasmid map, generated with SnapGene. NeoR/KanR: kanamycin resistance; ori: origin of replication in E. coli DH5α; MCS: multiple cloning site; loxRE: lox66 site for site-specific recombination.

Reporter plasmid for promoter characterization

We used the pTip vector to test new promoters in Rhodococcus opacus. To visualize promoter activity, we employed sfGFP as a reporter protein. We selected a domesticated sfGFP because it is compatible with iGEM assembly standards. Both mCherry and sfGFP had previously been tested in R. opacus, but sfGFP was readily available in the iGEM kit and had already been used as a reporter for vector insertion. Specifically, we obtained sfGFP from BBa_J428326. In iGEM, this sequence is typically part of a composite part that includes a promoter and an RBS; for our purposes, we used only the protein-coding sequence to assess promoter activity.

Since we required the sequence to be free of "illegal" restriction sites for iGEM, we introduced silent mutations via site-directed PCR mutagenesis. Primers were designed with single mismatches at the BsrGI and SapI sites to induce silent changes, thereby removing the restriction sites without altering the protein sequence.

We then replaced the pTip promoter with each promoter to be studied by digesting with BsrGI and NcoI. This also removed the original RBS; instead, a standardized RBS was added to every fragment used in the promoter study.

Promoters


Because few promoters are available for our strain, we decided to test four different promoters, divided into two groups: three derived from the same promoter of Rhodococcus jostii RHA1, and pLac, which is often used in molecular biology.

p2, pB2, pB3

The first group of promoters was derived from a promoter studied by Round et al. (2019) in Rhodococcus jostii RHA1, a species closely related to R. opacus. We selected p2, the promoter of a division cluster transcriptional repressor, as the reference sequence, since it was the strongest non-optimized promoter in that work. Using PCR with different reverse primers, we generated two additional variants, pB2 and pB3, which are shorter than the original promoter. These truncations, which likely remove the native RBS and part of the original CDS, were designed to optimize p2 into a stronger and shorter promoter. The same approach was used in the study of the p10 promoter, which yielded the minimal promoter M6.
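How different reverse primers on the same template give products of different lengths can be illustrated with a toy "virtual PCR". This sketch assumes perfectly matching primers without 5' overhangs; the function name `virtual_pcr` is hypothetical, and it is not a substitute for a primer design tool or for the actual p2 sequence and primers.

```python
# Toy "virtual PCR": the amplicon runs from the forward primer to the site where the
# reverse primer anneals. Assumes perfect matches and primers without 5' tails.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def virtual_pcr(template: str, fwd_primer: str, rev_primer: str) -> str:
    """Return the product amplified from `template` (top strand, 5'->3').

    Both primers are written 5'->3'. The reverse primer binds where its reverse
    complement occurs on the top strand, downstream of the forward primer.
    """
    start = template.find(fwd_primer)
    if start == -1:
        raise ValueError("forward primer has no perfect match")
    rev_site = revcomp(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        raise ValueError("reverse primer has no perfect match downstream")
    return template[start:end + len(rev_site)]
```

Moving the reverse primer binding site upstream, closer to the forward primer, shortens the product; this is the principle behind obtaining pB2 and pB3 as truncated versions of p2.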

Promoter optimization map
Figure 6. SnapGene map of the three promoters, highlighting the different primers used.

pLac

We also decided to test pLac, as it is one of the most widely used and accessible promoters in bacterial systems. The pLac promoter is inducible by IPTG and well characterized in E. coli, making it a convenient reference for evaluating promoter performance. Its inclusion allowed us to compare the activity of our p2 variants with a well-known, standard promoter. Although pLac is inducible in E. coli, it is considered constitutive in R. opacus, which makes it another useful promoter.

References

  • Firrincieli, A., Grigoriev, B., Dostálová, H., & Cappelletti, M. (2022). The complete genome sequence and structure of the oleaginous Rhodococcus opacus strain PD630 through nanopore technology. Frontiers in Bioengineering and Biotechnology, 9, 810571.
  • Voss, I., & Steinbüchel, A. (2001). High cell density cultivation of Rhodococcus opacus for lipid production at a pilot-plant scale. Applied Microbiology and Biotechnology, 55(5), 547-555.
  • Anthony, W. E., Geng, W., Diao, J., Carr, R. R., Wang, B., Ning, J., ... & Zhang, F. (2024). Increased triacylglycerol production in Rhodococcus opacus by overexpressing transcriptional regulators. Biotechnology for Biofuels and Bioproducts, 17(1), 83.
  • Anthony, W. E., Carr, R. R., DeLorenzo, D. M., Campbell, T. P., Shang, Z., Foston, M., & Dantas, G. (2019). Development of Rhodococcus opacus as a chassis for lignin valorization and bioproduction of high-value compounds. Biotechnology for Biofuels, 12(1), 192.
  • Nagy, A. (2000). Cre recombinase: the universal reagent for genome tailoring. Genesis, 26(2), 99-109.
  • Kitagawa, W., & Hata, M. (2023). Development of efficient genome-reduction tool based on Cre/loxP system in Rhodococcus erythropolis. Microorganisms, 11(2), 268.
  • Nakashima, N., & Tamura, T. (2004). Isolation and characterization of a rolling-circle-type plasmid from Rhodococcus erythropolis and application of the plasmid to multiple-recombinant-protein expression. Applied and Environmental Microbiology, 70(9), 5557-5568.
  • Holátko, J., Elišáková, V., Prouza, M., Sobotka, M., Nešvera, J., & Pátek, M. (2009). Metabolic engineering of the L-valine biosynthesis pathway in Corynebacterium glutamicum using promoter activity modulation. Journal of Biotechnology, 139(3), 203-210.
  • Round, J. W., Roccor, R., & Eltis, L. D. (2019). A biocatalyst for sustainable wax ester production: re-wiring lipid accumulation in Rhodococcus to yield high-value oleochemicals. Green Chemistry, 21(23), 6468-6482.

Metabolic Pathway Prediction Software


Anyone attempting to engineer a bacterium is likely to face two major challenges: the lack of curated material (information on pathways, or on the feasibility of certain reactions inside the host), and the effort of collecting the information needed, which is often scattered across different databases and not standardized.

With CAPE, short for Computational Assistant for Pathway Engineering, we wanted to design a tool that could serve as a single solution to these problems, bridging the gap between bioinformatics power and ease of use for non-specialized users. From the start, the principles on which we had to build our software were clear: ensure a user-friendly experience, lay the groundwork for future scalability, and balance technical complexity with intuitiveness.

Guided by this philosophy, we created a platform that can be used effortlessly by anyone with a basic biological background, while remaining flexible and comprehensive enough to serve the needs of experts seeking a modular and efficient pipeline. Above all, CAPE is tailored to the iGEM spirit: empowering teams to turn ambitious ideas into practical designs, and accelerating the journey from concept to construct.

Pathway Retrieval

The first challenge we addressed was the retrieval and modeling of biochemical pathways into a unified data structure. Standardization was essential to ensure that the diverse information carried by reactions from external databases could be handled consistently. For the source of this information, we primarily chose KEGG, since both the Rhodococcus opacus PD630 genome and its pathway maps are already available there.

We downloaded the metabolic maps from KEGG and parsed them into a standardized graph-like network representation. This process quickly highlighted major shortcomings of our approach, most importantly the lack of curated data for degradation pathways of uncommon molecules or, in some cases, the complete absence of a compound from the database altogether (as in the case of Benzophenone-3). This limitation required a different strategy, capable of predicting new biochemical transformations beyond those directly annotated in KEGG.
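As an illustration of the parsing step, the sketch below turns the `<reaction>` elements of a KEGG KGML file (the XML format in which KEGG distributes pathway maps, available for example through the KEGG REST `get` operation with the `kgml` option) into a compound-level adjacency list. It is a simplified sketch, not CAPE's actual parser: it ignores entries, relations and graphics, and the function name `kgml_to_graph` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Edge = Tuple[str, List[str]]  # (product compound, KEGG reaction IDs)


def kgml_to_graph(kgml_text: str) -> Dict[str, List[Edge]]:
    """Convert KGML <reaction> elements into substrate -> product edges.

    Reversible reactions get an edge in both directions; the reaction "name"
    attribute may list several space-separated KEGG reaction IDs.
    """
    root = ET.fromstring(kgml_text)
    graph: Dict[str, List[Edge]] = {}
    for reaction in root.iter("reaction"):
        reaction_ids = reaction.get("name", "").split()
        substrates = [s.get("name") for s in reaction.findall("substrate")]
        products = [p.get("name") for p in reaction.findall("product")]
        reversible = reaction.get("type") == "reversible"
        for s in substrates:
            for p in products:
                graph.setdefault(s, []).append((p, reaction_ids))
                if reversible:
                    graph.setdefault(p, []).append((s, reaction_ids))
    return graph
```

Keeping the reaction IDs on each edge lets later stages trace every hop back to its KEGG reaction, and from there to EC numbers.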

Pathway Prediction

This stage of the project marked a turning point. After extensive research, we identified RetroPath2.0 (RP2.0), an open-source automated workflow for retrosynthesis, as the missing piece we needed. The tool takes a set of source compounds and a set of sink compounds and predicts the possible biochemical routes connecting them, based on generalized reaction rules. Once we learned to tailor its parameters to our needs, it became an ideal complement to our system, allowing us to reconstruct plausible metabolic pathways that are not directly annotated in existing databases.

To integrate these predictions with our KEGG-based results, we adopted a hybrid approach: when KEGG alone cannot provide a complete route from the pollutant to the target, RetroPath2.0 steps in to propose the missing transformations. However, because compounds predicted by RP2.0 often appeared in different protonation or tautomeric forms, we introduced a cheminformatics preprocessing step to standardize their InChI representations before integration. We also realized that not all predicted paths are biologically meaningful: simply choosing the shortest route would often favor unrealistic shortcuts involving cofactors or ubiquitous intermediates. To prevent this, we implemented a weighted scoring system applied to the edges of our network, where heavier edges represent reactions that are biologically less relevant (e.g., cofactor-based shortcuts) or less reliable (e.g., predicted rather than experimentally verified). If you want to know more about it, go to the Software page!
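The effect of edge weighting can be shown with a small Dijkstra search. This sketch is not CAPE's scoring scheme: the penalty values, the cofactor list and the function names are hypothetical placeholders, chosen only to show how a weighted shortest path can prefer a longer but more plausible route over a cofactor shortcut.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical penalties, for illustration only.
BASE_COST = 1.0
PREDICTED_PENALTY = 2.0   # reaction proposed by RetroPath2.0, not annotated in KEGG
COFACTOR_PENALTY = 10.0   # edge entering or leaving a ubiquitous cofactor
COFACTORS = {"ATP", "ADP", "NAD+", "NADH", "NADP+", "NADPH", "H2O", "CO2"}

Graph = Dict[str, List[Tuple[str, float]]]


def edge_cost(substrate: str, product: str, predicted: bool) -> float:
    """Heavier edges = less relevant or less reliable reactions."""
    cost = BASE_COST
    if predicted:
        cost += PREDICTED_PENALTY
    if substrate in COFACTORS or product in COFACTORS:
        cost += COFACTOR_PENALTY
    return cost


def build_graph(reactions: List[Tuple[str, str, bool]]) -> Graph:
    """reactions: (substrate, product, is_predicted) triples."""
    graph: Graph = {}
    for substrate, product, predicted in reactions:
        graph.setdefault(substrate, []).append((product, edge_cost(substrate, product, predicted)))
    return graph


def cheapest_path(graph: Graph, source: str, target: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm: lowest total weight path from source to target."""
    queue = [(0.0, source, [source])]
    best = {source: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None
```

On a toy network where a pollutant reaches acetyl-CoA either through NADH (two steps, both penalized as cofactor shortcuts, total 22.0) or through catechol and cis,cis-muconate (three steps, one of them predicted, total 5.0), the weighted search returns the three-step route even though the shortcut has fewer hops.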

EC Prediction through Reaction Similarity Search

The RetroPath2.0 workflow comes with an associated database of reaction rules, RetroRules, which links chemical transformations to Enzyme Commission (EC) numbers. However, when we applied RP2.0 to unannotated molecules such as Benzophenone-3, we discovered that some predicted reactions lacked an associated EC. This posed a critical problem: without EC numbers, no link could be made to potential enzymes, and thus no candidate genes could be proposed for integration into the chassis genome.

To overcome this limitation, we incorporated SelenzymeRF, an enzyme suggestion platform. By taking the SMARTS representation of a chemical reaction as input, SelenzymeRF can propose candidate enzymes and their associated ECs capable of carrying out that transformation. To minimize computational overhead, our software queries the SelenzymeRF API service, which returns the predictions without requiring local execution of the full tool.

In both cases – whether ECs are retrieved directly from KEGG/RetroRules or predicted through SelenzymeRF – the user retains flexibility: they may accept the suggested ECs or manually input their own, depending on their project's requirements.

Once we had maximized the chances of assigning an EC number to every step in the pathway, the next challenge was to retrieve the corresponding protein sequences for the required enzymes, and to automate this process as much as possible. Guided by our advisors, we selected NCBI Protein as our reference database, thanks to its extensive coverage and the ability to query it programmatically through the Entrez API.

At first, we attempted to fully automate the process, starting directly from the EC number and outputting a FASTA file. However, we soon realized it was difficult to define robust criteria for selecting certain protein entries over others. To avoid arbitrarily restricting the results, we decided instead to return the complete table of protein sequences associated with each EC, leaving the final choice to the user. We thought that this approach would provide accountability and flexibility: users can apply their own expertise to select the most suitable candidate, while still benefiting from our tool's filtering and sorting features that make the search process much faster.

Our query system is designed to prioritize Swiss-Prot reviewed entries from organisms closely related to Rhodococcus opacus PD630, whenever available. If no matches are found, the search expands to non-reviewed proteins, and, if necessary, climbs the taxonomic hierarchy step by step until it reaches the Bacteria level. Once suitable sequences are identified, users can download them as a .faa file or continue seamlessly to the next stages of the pipeline. Importantly, users may also upload their own .faa file, enabling the use of experimentally characterized enzymes not yet deposited in public databases.
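The fallback logic can be sketched as a loop over reviewed and unreviewed entries at each taxonomic level. Several parts are assumptions to check against the NCBI documentation: the Entrez field tags (`[EC/RN Number]`, `[Organism]`, `srcdb_swiss-prot[PROP]`), the illustrative taxonomic ladder, and the order (reviewed before unreviewed at each level). The search backend is passed in as a function so the ladder can be tested offline; `entrez_search` shows one possible backend using Biopython's `Entrez.esearch`. All function names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Illustrative ladder from the chassis up to the Bacteria level (assumed, not CAPE's list).
TAXON_LADDER = ["Rhodococcus opacus", "Rhodococcus", "Nocardiaceae", "Actinomycetota", "Bacteria"]


def build_query(ec: str, taxon: str, reviewed: bool) -> str:
    """Entrez query string; the field tags are assumptions to verify."""
    query = f'{ec}[EC/RN Number] AND "{taxon}"[Organism]'
    if reviewed:
        query += " AND srcdb_swiss-prot[PROP]"  # restrict to Swiss-Prot (assumed tag)
    return query


def search_with_fallback(ec: str, search: Callable[[str], List[str]]
                         ) -> Optional[Tuple[str, bool, List[str]]]:
    """Try reviewed, then unreviewed entries at each level; stop at the first hit."""
    for taxon in TAXON_LADDER:
        for reviewed in (True, False):
            ids = search(build_query(ec, taxon, reviewed))
            if ids:
                return taxon, reviewed, ids
    return None


def entrez_search(query: str, email: str, retmax: int = 100) -> List[str]:
    """Possible real backend using Biopython (requires network access)."""
    from Bio import Entrez  # imported lazily so the ladder can be tested offline
    Entrez.email = email
    handle = Entrez.esearch(db="protein", term=query, retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])
```

A call such as `search_with_fallback("1.13.11.1", lambda q: entrez_search(q, "you@example.org"))` would return the first taxonomic level and review status that produced hits, together with the matching protein IDs.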

Codon Optimization and Illegal Sites

For codon optimization, we developed a solution based on the Kazusa codon usage table for Rhodococcus opacus. Users can choose between two modes:

  • Max: a deterministic approach that always selects the most frequently used codon for each amino acid.
  • Weighted: a probabilistic approach that samples codons according to their usage frequency in R. opacus. This mode also accepts a seed parameter, ensuring reproducibility when desired.
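The two modes can be sketched as follows. The frequency table is a placeholder covering only a few amino acids and does not reproduce the Kazusa values for R. opacus; the function name `optimize` is hypothetical.

```python
import random
from typing import Dict, Optional

# Placeholder frequencies for a few amino acids (NOT the Kazusa values for R. opacus).
# A real table would cover all 20 amino acids plus stop codons.
CODON_USAGE: Dict[str, Dict[str, float]] = {
    "M": {"ATG": 1.0},
    "K": {"AAG": 0.8, "AAA": 0.2},
    "L": {"CTG": 0.6, "CTC": 0.3, "TTG": 0.1},
    "*": {"TGA": 0.7, "TAG": 0.2, "TAA": 0.1},
}


def optimize(protein: str, mode: str = "max", seed: Optional[int] = None) -> str:
    """Back-translate a protein sequence using a codon usage table.

    mode="max": always pick the most frequent codon (deterministic).
    mode="weighted": sample codons in proportion to their frequency;
    the same seed always gives the same sequence.
    """
    rng = random.Random(seed)
    codons = []
    for aa in protein.upper():
        usage = CODON_USAGE[aa]
        if mode == "max":
            codons.append(max(usage, key=usage.get))
        elif mode == "weighted":
            codons.append(rng.choices(list(usage), weights=list(usage.values()), k=1)[0])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return "".join(codons)
```

Using a private `random.Random(seed)` instance, rather than the global generator, keeps the weighted mode reproducible without affecting other code that uses `random`.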

In addition, our software includes an algorithm for restriction site screening, which identifies and removes illegal sites from the gene sequence if the user chooses to activate it. These sites are often incompatible with downstream cloning methods, such as iGEM Assembly Standards. Users can select from preset restriction site lists or manually define their own. The presets correspond to widely used standards: Type IIS RFC1000 (removal of BsaI and SapI) and BioBrick RFC10 (removal of EcoRI, XbaI, SpeI, PstI, and NotI).
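A greedy version of restriction site removal could look like this: each site found on either strand is removed by trying synonymous codons in the codons it overlaps, and a swap is accepted only if it lowers the total number of sites, so no new site is created in exchange. This is a sketch of one possible strategy, not necessarily CAPE's algorithm; the preset lists follow the enzymes named above, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Standard genetic code, generated in TCAG order.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AAS)}
SYNONYMS: Dict[str, List[str]] = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)

PRESETS = {
    "RFC1000": ["GGTCTC", "GCTCTTC"],                                # BsaI, SapI
    "RFC10": ["GAATTC", "TCTAGA", "ACTAGT", "CTGCAG", "GCGGCCGC"],  # EcoRI, XbaI, SpeI, PstI, NotI
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def find_sites(seq: str, sites: List[str]) -> List[Tuple[int, int]]:
    """(start, length) of every site occurrence on either strand."""
    hits = []
    for site in sites:
        for motif in {site, revcomp(site)}:
            start = seq.find(motif)
            while start != -1:
                hits.append((start, len(motif)))
                start = seq.find(motif, start + 1)
    return sorted(hits)


def remove_sites(cds: str, sites: List[str]) -> str:
    """Greedily remove sites with synonymous codon swaps that never add new sites."""
    seq = cds.upper()
    if len(seq) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    while True:
        hits = find_sites(seq, sites)
        if not hits:
            return seq
        start, length = hits[0]
        # Try every codon overlapping the first site found.
        for i in range(start // 3, (start + length - 1) // 3 + 1):
            codon = seq[3 * i:3 * i + 3]
            for alt in (c for c in SYNONYMS[CODON_TABLE[codon]] if c != codon):
                candidate = seq[:3 * i] + alt + seq[3 * i + 3:]
                if len(find_sites(candidate, sites)) < len(hits):
                    seq = candidate
                    break
            else:
                continue  # no synonymous swap worked for this codon; try the next one
            break  # a swap was accepted; rescan the sequence
        else:
            raise ValueError(f"no synonymous fix found for site at position {start}")
```

Because every accepted swap strictly lowers the site count, the loop always terminates, and the encoded protein is unchanged by construction.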

Final Output

The codon-optimized sequences are consolidated in the final output page, where they can be reviewed and downloaded as a .fna file, alongside other Parts of the HERO toolkit. Users can also obtain the sequence of the pLoxship backbone, which allows rapid integration into the Rhodococcus opacus genome.

CAPE pipeline
Figure 1. The CAPE pipeline.

References

  • Delépine, B., Duigou, T., Carbonell, P., & Faulon, J. L. (2018). RetroPath2.0: a retrosynthesis workflow for metabolic engineers. Metabolic Engineering, 45, 158-170.
  • Stoney, R. A., Hanko, E. K., Carbonell, P., & Breitling, R. (2023). SelenzymeRF: updated enzyme suggestion software for unbalanced biochemical reactions. Computational and Structural Biotechnology Journal, 21, 5868-5876.
  • Duigou, T., Du Lac, M., Carbonell, P., & Faulon, J. L. (2019). RetroRules: a database of reaction rules for engineering biology. Nucleic Acids Research, 47(D1), D1229-D1235.
  • Nakamura, Y., Gojobori, T., & Ikemura, T. (2000). Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Research, 28(1), 292-292.