CAPE

Computational Assistant for Pathway Engineering

A user-friendly bioinformatics tool designed to discover and engineer metabolic pathways for biodegradation in Rhodococcus opacus PD630

What does CAPE do?

CAPE (Computational Assistant for Pathway Engineering) is a bioinformatics tool built to integrate with HERO (High-performance Engineered Rhodococcus Opacus). It operates on metabolic graphs created from the KEGG database and integrates predicted metabolic pathways from RetroPath2.0.

CAPE is designed to find biologically plausible enzymatic pathways between a source compound (e.g., a pollutant) and a desired product, allowing users to obtain host-optimized gene sequences needed to perform the reactions.

Automated Pathway Discovery

CAPE automates the traditionally manual process of searching metabolic databases and integrating reaction data, making pathway engineering accessible to teams without extensive bioinformatics resources.

Integration of Known & Predicted Reactions

Combines curated KEGG data with RetroPath2.0 predictions to identify complete degradation pathways, even when gaps exist in current databases.

Codon Optimization

Automatically optimizes retrieved enzyme sequences for expression in Rhodococcus opacus PD630, ensuring efficient protein production.

Assembly-Ready Output

Provides FASTA files, characterized iGEM parts, and restriction site screening compatible with standard cloning methods.

How to install CAPE

1. Open a Unix shell
   Ensure you have terminal access on Linux, macOS, or WSL on Windows.

2. Install conda and add it to PATH
   Verify that conda is installed and accessible from your terminal.

3. Clone the GitLab repository
       git clone https://gitlab.igem.org/2025/software-tools/bologna

4. Create the conda environment
       cd bologna
       conda env create -f environment.yml

5. Unzip the necessary files
       gunzip -k cape_app/algorithms/pathways/data/retrorules_rr02_rp2_flat_forward.csv.gz
       tar -xzf cape_app/algorithms/pathways/data/Pathways.tar.gz -C cape_app/algorithms/pathways/data/

6. Install RetroPath2.0 in the environment
   See the RetroPath repository for more details.
       conda install -c conda-forge -n cape retropath2_wrapper

7. Activate the environment
       conda activate cape

8. Run the server
       python manage.py runserver

   If the server prints the message "You have 18 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions. Run 'python manage.py migrate' to apply them.", quit the server with Ctrl+C, run python manage.py migrate as suggested, and then start the server again.

9. Access CAPE in a browser at: http://127.0.0.1:8000/

How does CAPE work?

Workflow of the CAPE tool

Step 1: KEGG-based Metabolic Graph

CAPE constructs a metabolic graph of all reactions in the metabolism of Rhodococcus opacus PD630 using data parsed from the KEGG database:

Nodes represent compounds
Edges represent reactions catalyzed by enzymes
Edge weights encode biological plausibility—heavier edges correspond to reactions less likely to occur naturally

When a user inputs a source compound and desired product, CAPE searches for the shortest path (lowest total weight) through this graph, automating manual KEGG searches.
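
The search described above can be illustrated as a lowest-total-weight path query on a weighted directed graph. The following is a minimal sketch using Dijkstra's algorithm with the Python standard library. The function name `shortest_path`, the compound names, and the weights are hypothetical and chosen for illustration; CAPE's actual implementation may differ.

```python
import heapq

def shortest_path(graph, source, target):
    """Return (total_weight, [compounds]) for the lightest path, or None.

    `graph` maps a compound to a list of (neighbor, weight, reaction_id)
    tuples. Weights must be non-negative for Dijkstra's algorithm to be valid.
    """
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight, _reaction in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None  # target is not reachable from source

# Toy graph with hypothetical compounds and weights (not real KEGG data).
toy_graph = {
    "pollutant": [("intermediate_A", 1.0, "R1"), ("intermediate_B", 5.0, "R2")],
    "intermediate_A": [("product", 1.0, "R3")],
    "intermediate_B": [("product", 1.0, "R4")],
}

result = shortest_path(toy_graph, "pollutant", "product")
print(result)  # (2.0, ['pollutant', 'intermediate_A', 'product'])
```

The route through `intermediate_B` has the same number of steps but a higher total weight, so it is not selected. A `None` result corresponds to the case in which the graph cannot connect the source to the product.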

Step 2: Integration with RetroPath 2.0

If KEGG alone cannot connect the source to the product, CAPE integrates predictions from RetroPath2.0:

RetroPath uses generalized chemical rules to infer possible reaction steps from source compounds to core metabolites of R. opacus PD630
Predicted reactions are added as edges in the metabolic graph
CAPE searches again for the shortest path

This enables CAPE to propose novel, plausible pathways beyond those annotated in KEGG.

Step 3: Path Selection

CAPE returns candidate pathways, each showing:

Sequential compound reactions (steps)
Multiple enzyme options for each step, color-coded by reliability:

Green: Annotated in R. opacus PD630
Yellow: Annotated in another organism
Red: RetroPath predicted, with EC
Purple: RetroPath predicted, no EC
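
The color scheme above amounts to a small decision rule. The sketch below shows one way to express it; the function name `reliability_color`, its boolean parameters, and the enzyme option labels are assumptions made for illustration and are not taken from the CAPE code base.

```python
def reliability_color(annotated_in_host, annotated_elsewhere, has_ec):
    """Map the evidence for an enzyme option to its display color."""
    if annotated_in_host:
        return "green"   # annotated in R. opacus PD630
    if annotated_elsewhere:
        return "yellow"  # annotated in another organism
    # Neither annotation applies: the reaction comes from a RetroPath prediction.
    return "red" if has_ec else "purple"

# Order used to list the most reliable options first.
COLOR_RANK = {"green": 0, "yellow": 1, "red": 2, "purple": 3}

# Hypothetical enzyme options for a single pathway step.
options = [
    ("predicted_without_ec", reliability_color(False, False, False)),
    ("annotated_in_host", reliability_color(True, False, True)),
    ("annotated_elsewhere", reliability_color(False, True, True)),
]
ranked = sorted(options, key=lambda option: COLOR_RANK[option[1]])
print(ranked)
```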

Step 4: EC Selection and NOEC Problem Resolution

Users can select an existing EC number for each reaction or define a custom one. When RetroPath-predicted reactions lack an EC annotation, CAPE uses SelenzymeRF to infer the missing EC:

Input: SMARTS representation of the chemical reaction
Output: Predictions of candidate enzymes and their EC numbers

Step 5: Ortholog Retrieval

CAPE queries the NCBI Protein database via the Entrez API to retrieve ortholog sequences:

Prioritizes Swiss-Prot reviewed entries from organisms closely related to R. opacus PD630
Expands searches to non-reviewed proteins if needed
Broadens taxonomic scope step-by-step up to the Bacteria level
Displays filterable tables of protein sequences for each EC
Supports custom .faa file uploads for experimentally characterized enzymes
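
The progressive broadening described above can be sketched as a ladder of search queries tried from narrowest to broadest. In the example below, the taxonomic ladder, the order in which reviewed and non-reviewed searches are interleaved, the function names, and the query field tags (`[EC/RN Number]`, `[Organism]`, `srcdb_swiss-prot[PROP]`) are all assumptions to be checked against the NCBI documentation; they are not taken from CAPE's code. The search itself is passed in as a callable, so the sketch runs without network access.

```python
def build_queries(ec_number, taxa):
    """Yield (taxon, reviewed_only, query) from narrowest to broadest search."""
    for taxon in taxa:
        # Within each taxon, try reviewed (Swiss-Prot) entries before all entries.
        for reviewed_only in (True, False):
            query = f'{ec_number}[EC/RN Number] AND "{taxon}"[Organism]'
            if reviewed_only:
                query += " AND srcdb_swiss-prot[PROP]"
            yield taxon, reviewed_only, query

def first_hit(ec_number, taxa, search):
    """Return (taxon, reviewed_only, ids) for the first query with results."""
    for taxon, reviewed_only, query in build_queries(ec_number, taxa):
        ids = search(query)
        if ids:
            return taxon, reviewed_only, ids
    return None

# Hypothetical ladder from the host species up to the Bacteria level.
TAXA = ["Rhodococcus opacus", "Rhodococcus", "Nocardiaceae", "Actinomycetota", "Bacteria"]

def stub_search(query):
    """Stand-in for a real database call: only non-reviewed genus-level hits exist."""
    if '"Rhodococcus"[Organism]' in query and "swiss-prot" not in query:
        return ["hypothetical_id_1"]
    return []

hit = first_hit("1.13.11.1", TAXA, stub_search)  # example EC number
print(hit)  # ('Rhodococcus', False, ['hypothetical_id_1'])
```

With Biopython installed, `stub_search` could be replaced by a function that calls `Entrez.esearch(db="protein", term=query)` and returns the `IdList` field of the parsed result.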

Step 6: Codon Optimization

Sequences undergo codon optimization based on the Kazusa codon usage table for Rhodococcus opacus:

Max mode: Deterministic—always selects the most frequently used codon
Weighted mode: Probabilistic—samples codons according to usage frequency (accepts seed parameter for reproducibility)
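
The difference between the two modes can be sketched as follows. The codon frequencies below are hypothetical placeholders covering only a few amino acids, not values from the Kazusa table, and the function name `optimize` and its parameters are assumptions made for this example.

```python
import random

# Hypothetical usage frequencies for illustration only; real values would
# come from the Kazusa codon usage table for Rhodococcus opacus.
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAG": 0.9, "AAA": 0.1},
    "L": {"CTG": 0.6, "CTC": 0.3, "TTG": 0.1},
    "*": {"TGA": 0.7, "TAG": 0.2, "TAA": 0.1},
}

def optimize(protein, mode="max", seed=None):
    """Back-translate a protein sequence into DNA using the usage table."""
    rng = random.Random(seed)  # a fixed seed makes weighted mode reproducible
    codons = []
    for amino_acid in protein:
        usage = CODON_USAGE[amino_acid]
        if mode == "max":
            # Deterministic: always take the most frequently used codon.
            codons.append(max(usage, key=usage.get))
        elif mode == "weighted":
            # Probabilistic: sample a codon in proportion to its frequency.
            choice = rng.choices(list(usage), weights=list(usage.values()), k=1)[0]
            codons.append(choice)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return "".join(codons)

print(optimize("MKL*"))                             # ATGAAGCTGTGA
print(optimize("MKL*", mode="weighted", seed=42))   # same output on every run
```

Max mode yields a single fixed sequence per protein, while weighted mode yields a different sequence for each seed, which can be useful when a first design fails restriction site screening.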

Restriction site screening detects and removes illegal sites incompatible with cloning methods:

Type IIS RFC1000 (removes BsaI and SapI)
BioBrick RFC10 (removes EcoRI, XbaI, SpeI, PstI, and NotI)
Custom restriction site lists
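
The detection half of this screening can be sketched as a scan of both DNA strands for each recognition sequence. The recognition sequences below are the standard ones for the listed enzymes; the dictionary layout and the function names are assumptions made for this example. Removing a detected site, which would require swapping in a synonymous codon, is not shown.

```python
# Recognition sequences (5'->3') grouped by assembly standard.
SITES = {
    "RFC1000": {"BsaI": "GGTCTC", "SapI": "GCTCTTC"},
    "RFC10": {
        "EcoRI": "GAATTC",
        "XbaI": "TCTAGA",
        "SpeI": "ACTAGT",
        "PstI": "CTGCAG",
        "NotI": "GCGGCCGC",
    },
}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq, standard):
    """Return sorted (position, enzyme, matched_pattern) hits on either strand."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES[standard].items():
        # Non-palindromic sites (BsaI, SapI) must also be checked on the
        # opposite strand; palindromic sites collapse to one pattern.
        for pattern in {site, reverse_complement(site)}:
            start = seq.find(pattern)
            while start != -1:
                hits.append((start, enzyme, pattern))
                start = seq.find(pattern, start + 1)
    return sorted(hits)

# Toy sequence containing an EcoRI site and a reverse-strand BsaI site.
toy_seq = "ATGGAATTCAAAGAGACCTGA"
print(find_sites(toy_seq, "RFC10"))    # [(3, 'EcoRI', 'GAATTC')]
print(find_sites(toy_seq, "RFC1000"))  # [(12, 'BsaI', 'GAGACC')]
```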

Step 7: Output

CAPE returns:

.fna file of all enzyme sequences in the selected pathway
Characterized HERO Parts such as promoters and RBSs
pLoxship backbone sequence compatible with HERO for insertion into R. opacus PD630
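
For readers unfamiliar with the format, a .fna file is a nucleotide FASTA file: each record is a header line starting with ">" followed by the sequence, usually wrapped to a fixed width. The sketch below shows how such a file could be assembled; the function name `to_fasta`, the header text, and the sequences are hypothetical and do not reflect CAPE's actual naming scheme.

```python
def to_fasta(records, width=70):
    """Format (header, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for header, sequence in records:
        lines.append(f">{header}")
        # Split the sequence into chunks of at most `width` characters.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"

# Hypothetical pathway with two codon-optimized coding sequences.
pathway = [
    ("step1_enzyme_example", "ATGAAGCTGTGA"),
    ("step2_enzyme_example", "ATGCTGAAGAAGTGA"),
]
fasta_text = to_fasta(pathway, width=10)
print(fasta_text)
```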

Using the Web Interface

Input Compounds

Enter the InChI of the source compound and of the desired product. Optionally, add their common names and the maximum number of pathways to compute.

Select a Pathway

Review the returned pathways and select the most promising one based on the number of steps and reaction reliability.

Choose EC Numbers

For each enzymatic step, select an EC number from the color-coded options indicating reliability.

Select Enzyme Sequences

Choose from retrieved sequences or add custom sequences from your own experimental data.

Codon Optimization

Optimize the selected sequences for Rhodococcus opacus codon usage, optionally removing illegal restriction sites.

Download Results

Download FASTA files and iGEM parts characterized by the HERO wet-lab team, including the pLoxship backbone, promoters, and RBSs.

How were the edge weights chosen?

In the metabolic graph, each directed edge connects two compounds participating as reactant and product within the same reaction. The algorithm aims to identify biologically plausible degradation pathways following the transformation of a main compound through successive reactions.

If all edges had equal weights, the shortest paths would frequently include biologically implausible "shortcuts," such as traversing ubiquitous metabolites like water, yielding nonsensical routes (e.g., source → water → product). To address this, a biologically aware edge-weighting approach was implemented:

Annotation-Based Weighting

Reactions annotated in R. opacus PD630 receive the lowest weights (highest confidence). Reactions from other organisms get slightly higher weights, while RetroPath-predicted reactions receive progressively heavier weights.

Mass-Based Adjustment

Sharp drops in molecular weight are penalized, as such steps lead toward small fragments rather than the main degradation route. Abrupt increases are also discouraged.

Cofactor Blacklist

Ubiquitous cofactors (ATP, NADH, CoA) are assigned high weights to prevent overrepresentation in pathways.
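
The three rules above can be combined into a single weight function, sketched below. All numeric values (base weights, penalty sizes, mass-ratio thresholds), the compound masses, and the function name `edge_weight` are hypothetical; the source does not give the constants CAPE uses.

```python
# Hypothetical base weights: lower means more confidence in the reaction.
BASE_WEIGHT = {
    "host": 1.0,             # annotated in R. opacus PD630
    "other_organism": 2.0,   # annotated in another organism
    "predicted_ec": 4.0,     # RetroPath predicted, with EC
    "predicted_no_ec": 6.0,  # RetroPath predicted, no EC
}
COFACTORS = {"ATP", "NADH", "CoA"}
COFACTOR_PENALTY = 100.0  # assumed value, large enough to dominate any path

def edge_weight(substrate, product, annotation, masses):
    """Combine annotation, cofactor, and mass-change rules into one weight."""
    weight = BASE_WEIGHT[annotation]
    if substrate in COFACTORS or product in COFACTORS:
        return weight + COFACTOR_PENALTY
    ratio = masses[product] / masses[substrate]
    if ratio < 0.5:
        # Sharp mass drop: the edge leads toward a small fragment.
        weight += 3.0 * (1.0 - ratio)
    elif ratio > 1.5:
        # Abrupt mass increase: discouraged, but less strongly.
        weight += 1.0 * (ratio - 1.0)
    return weight

# Hypothetical molecular masses in g/mol.
masses = {"compound_A": 150.0, "compound_B": 140.0, "water": 18.0, "ATP": 507.0}

main_route = edge_weight("compound_A", "compound_B", "host", masses)
via_water = (edge_weight("compound_A", "water", "host", masses)
             + edge_weight("water", "compound_B", "host", masses))
print(main_route, via_water)
```

In this toy setting the direct step keeps its base weight, while the two-step detour through water picks up a mass-drop penalty on the way in and a mass-increase penalty on the way out, so the shortest-path search no longer prefers the shortcut.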

Improved Results

With this weighting strategy, the average length of the returned pathways rose from ~3 steps (dominated by shortcuts) to ~8 steps, which correspond to coherent, enzyme-mediated degradation sequences.

How can CAPE be improved in the future?

🧬

Strain Generalization

CAPE is currently tailored to R. opacus PD630, but future versions could support additional bacterial hosts, expanding its applicability across synthetic biology projects.

📊

Database Generalization

Integration with metabolic databases beyond KEGG could provide more comprehensive pathway coverage and alternative reaction routes.

🤖

Machine Learning Integration

Incorporating ML-driven pathway prediction could improve accuracy of predicted reactions, suggest novel biodegradation routes, and more accurately prioritize pathways based on metabolic feasibility.

📈

Enhanced Visualization

Interactive pathway diagrams with clickable nodes and edges could help users explore enzyme details and reaction steps more intuitively.

⚠️

Toxicity Awareness

Future versions could warn users about potential harmful compounds generated along predicted pathways, helping to design safer biodegradation strategies.

🔄

RetroRules Update

Using updated RetroRules might yield improved results, facilitating the prediction of lesser-known degradation pathways.

Ready to engineer your biodegradation pathway? Visit our GitLab repository to get started with CAPE!

Innovation never stops! This page documents the original code base. For further developments, visit the CAPE GitHub repository.

References


Delépine, B., Duigou, T., Carbonell, P., & Faulon, J. L. (2018). RetroPath2.0: a retrosynthesis workflow for metabolic engineers. Metabolic engineering, 45, 158-170.
Kanehisa, M., & Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research, 28(1), 27-30.
Koch, M., Duigou, T., & Faulon, J. L. (2020). Reinforcement learning for bioretrosynthesis. ACS synthetic biology, 9(1), 157-168.
Nakamura, Y., Gojobori, T., & Ikemura, T. (2000). Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic acids research, 28(1), 292-292.
Roell, G. W., Schenk, C., Anthony, W. E., Carr, R. R., Ponukumati, A., Kim, J., ... & García Martín, H. (2023). A high-quality genome-scale model for Rhodococcus opacus metabolism. ACS synthetic biology, 12(6), 1632-1644.
