1 Overview
Our software aims to provide a low-barrier, flexible, and graphically interactive platform for metabolic modeling and analysis, aimed at synthetic biology research teams. The platform uses COBRApy [10] for core FBA (Flux Balance Analysis) [1][5] computation and provides visualization and a dedicated FBA knowledge-base Agent [6][7][8] through a Web UI that supports natural language operations.
Its main functions include:
- Model Selection: Supports switching between metabolic models, either pre-loaded in the software or imported by the user.
- Target Reaction and Weight Setting: Allows flexible specification of target reactions and their optimization weights in the selected model, used to define the optimization direction of the simulation. It also supports importing new reactions.
- Reaction Flux Bound Adjustment: Supports modifying the flux boundary conditions of metabolic reactions to facilitate exploring metabolic behaviors under different environments or engineering modifications.
- Gene Knockout Simulation: Analyzes the impact of gene knockout experiments on the metabolic network and product synthesis by setting gene knockouts.
- FBA Computation and Visualization: After submission, the system calls COBRApy to perform FBA and automatically generates clear metabolic flux maps using Escher [3][9] for intuitive result visualization.
- Intelligent Agent for Knowledge Q&A and Operation Assistance: The platform integrates an Agent (DeepSeek-R1:7B) that automatically converts users' natural language instructions into the operations above. For example, a user can directly input "Please set Yeast9-GEM [2] as the target model," and the Agent will parse and execute the corresponding steps, significantly lowering the software usage barrier. The Agent is also loaded with a relevant knowledge base and can answer user questions about FBA concepts and applications.
By integrating metabolic modeling computation, visualization, and natural language interaction, our software is suitable not only for experienced researchers to conduct in-depth analysis but also enables beginners to more easily explore metabolic network design in synthetic biology.
Tips:
If you are unfamiliar with FBA, you can jump to the Model section to learn its detailed principles and functions.
Briefly, FBA is a constraint-based computational method, built on linear programming, that predicts the distribution of metabolic fluxes in a metabolic network. It combines genome-scale metabolic models (GEMs) with physicochemical constraints (such as mass conservation, energy balance, and reaction rate bounds) and optimizes an objective mathematically, thereby simulating the metabolic state of an organism under specific conditions. FBA does not rely on detailed kinetic parameters and is widely used in predicting gene knockout effects, discovering biomarkers, optimizing biosynthetic pathways, and guiding metabolic engineering.
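For readers who prefer notation, the optimization problem described above is conventionally written as the linear program below. The symbols are standard textbook notation rather than names taken from the software: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c the objective weights, and l and u the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0 \\
& l \le v \le u
\end{aligned}
```

The equality constraint expresses steady-state mass balance, and the inequalities are the reaction rate bounds that the platform lets users adjust.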
2 Highlights
2.1 Generality and Practicality
The core of synthetic biology research lies in the iterative "Design-Build-Test-Learn" (DBTL) cycle. High-quality models can provide accurate, actionable suggestions and accelerate DBTL iterations. Our platform contributes to this cycle by providing metabolic network models for chassis strain design.
Our software can provide functionalities such as: predicting theoretical yield of target products, identifying key gene targets, and metabolic network analysis [4].
- Predicting Theoretical Yield of Target Products: Teams need to know the maximum theoretical yield of their target product (e.g., lycopene, paclitaxel) under ideal conditions—what is the ceiling? FBA can quickly calculate this maximum theoretical yield, providing key decision-making basis for project feasibility.
- Identifying Key Gene Targets: Teams need to determine which genes to knock out or overexpress to maximize metabolic flux toward the target product while suppressing byproducts. FBA-based gene essentiality analysis and knockout-prediction methods such as ROOM (Regulatory On/Off Minimization) can systematically screen candidate gene-editing targets.
- Understanding Metabolic Network Behavior: Teams need to understand how metabolic flux redistributes under specific conditions (e.g., anaerobic). Which pathways are activated, and which are suppressed? FBA's Flux Variability Analysis (FVA) can provide a global perspective.
- Strong Extensibility of Platform Chassis Models: Supports multiple chassis cells, such as Yeast9-GEM, iJO1366, and e_coli_core. Also supports customized models and importing simplified models generated by other software (e.g., CarveMe). Supports freely adding new reactions, such as introducing a new heterologous metabolic pathway from literature or hypothesizing a novel enzymatic reaction. Supports creating new metabolites to include hypothetical intermediates not yet in standard databases.
2.2 Compatibility with Common Synthetic Biology Standards
- SBML Support: The platform is built on COBRApy and natively supports SBML format metabolic model files. This allows users to directly import standard models from databases like BiGG Models, ensuring reproducibility and shareability of results [Click here to learn about import operations]
- COBRApy Integration: COBRApy is an open-source Python toolkit primarily used for genome-scale metabolic network reconstruction and constraint-based modeling analysis. In short, it helps researchers simulate and predict intracellular metabolic processes computationally. Our software's core computation directly calls the COBRApy API, ensuring compatibility with mainstream tools in the metabolic modeling community.
- Escher Visualization: Escher is an interactive, open-source web application for constructing, visualizing, and analyzing genome-scale metabolic models (GEMs). Its core function is to present complex metabolic networks and data (e.g., FBA results) in an intuitive, aesthetically pleasing, and interactive graphical format. Our software generates metabolic flux maps via Escher's interface, and results are compatible with the existing Escher JSON format, enabling sharing and reuse in the community.
- DeepSeek LLM Agent: DeepSeek-R1-Distill-Qwen-7B is a large language model trained on massive text data that can follow complex language patterns and generate fluent, contextually appropriate text. We designed a natural language interaction interface: user instructions are parsed into structured operations, and the corresponding task functions are then called to complete them. We also provide a unified API layer for future expansion, allowing integration of more tasks.
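The step from "structured operation" to "task function" described in the Agent item above can be sketched as a small dispatch table. This is only an illustration under stated assumptions: the JSON shape (`action` plus `args`), the handler names, and the session dictionary are all hypothetical and are not taken from the project's code.

```python
import json


def new_session():
    # Hypothetical in-memory state; the real backend's layout is not documented here.
    return {"model": None, "objective": {}, "bounds": {}, "knockouts": []}


def set_model(session, model_id):
    session["model"] = model_id


def set_objective(session, reaction_id, weight=1.0):
    session["objective"][reaction_id] = float(weight)


def set_bounds(session, reaction_id, lower, upper):
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    session["bounds"][reaction_id] = (float(lower), float(upper))


def knock_out_gene(session, gene_id):
    if gene_id not in session["knockouts"]:
        session["knockouts"].append(gene_id)


# Only actions listed here can be triggered, whatever the model replies.
HANDLERS = {
    "set_model": set_model,
    "set_objective": set_objective,
    "set_bounds": set_bounds,
    "knock_out_gene": knock_out_gene,
}


def dispatch(session, llm_output):
    """Parse the LLM's JSON reply and call the matching task function."""
    try:
        operation = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent reply is not valid JSON: {exc}") from exc
    if not isinstance(operation, dict):
        raise ValueError("agent reply must be a JSON object")
    handler = HANDLERS.get(operation.get("action"))
    if handler is None:
        raise ValueError(f"unsupported action: {operation.get('action')!r}")
    try:
        handler(session, **operation.get("args", {}))
    except TypeError as exc:
        raise ValueError(f"bad arguments for action: {exc}") from exc
    return session
```

Restricting the Agent to a fixed table of handlers means a malformed or unexpected reply is rejected instead of being executed, which is one plausible way to keep natural language control consistent with the manual Web UI operations.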
2.3 Maintainability and Extensibility
We emphasized code maintainability and extensibility during development:
- Code Comments and Documentation: Core modules include detailed comments and function descriptions, accompanied by installation and user guides (README, Wiki tutorials) for quick onboarding by subsequent teams. [Click here to learn about the installation tutorial]
- Clear Architecture: The software adopts a front-end and back-end separation architecture (Frontend Web UI - Backend Flask - Underlying COBRApy/Escher/LLM API), with clear module responsibilities, facilitating expansion.
2.4 User-Friendly Design
Our software is designed with user-friendliness as a core principle:
- Intuitive Interface: Few existing FBA tools offer both visualization and graphical, interactive computation. Most rely on programming for the computation itself and offer visualization only for the results. In the traditional research workflow, metabolic network analysis typically requires proficiency in MATLAB (with COBRA Toolbox) or Python (with COBRApy): researchers must understand the biological logic and also master a programming language's syntax, library installation, and debugging. Our software makes the full process, from setup through computation to results, visual and interactive, greatly lowering the usage barrier. After deployment, all modeling operations are completed through the Web interface, with no command-line work required. Interactive metabolic flux maps are generated using Escher, making results immediately clear.
- Natural Language Interaction: We integrate a large language model Agent, allowing users to complete complex operations through natural language, significantly lowering the usage barrier.
Specific assistance is reflected in two major aspects:
- Knowledge Q&A and Explanation:
Researchers can directly ask questions in natural language. The Agent can provide immediate and accurate answers based on the built-in metabolic modeling knowledge base and current model context.
- Operation Assistance and Automation:
This is the most revolutionary feature. Users do not need to write any code or search for functions in menus. They simply tell the Agent their analysis intent in natural language, and the Agent automatically converts it into backend operations.
2.5 Validation through Wet Experiments
To assess the feasibility of using red algae as a carbon source, we performed FBA calculations using the software. Wet-lab results ultimately confirmed that red algae can indeed be used as a carbon source substrate and achieved good yields. [Click here to learn more details]
3 Flowchart
The interactive FBA analysis platform offers users a comprehensive computational biology workflow accessible via a web-based interface. The system provides three main functions: FBA Flux Map calculation, Operation Assistance, and Question Query.
For FBA computation, users begin by selecting an appropriate genome-scale metabolic model from the model library as the analysis foundation. After choosing a model, users define the biological objective function for the simulation; common objectives include biomass maximization or optimization of specific metabolite production. Users then adjust network constraints, such as substrate uptake rates and oxygen availability, to reflect experimental conditions or physiological states. These constraints delineate the physicochemically feasible space for metabolic network operation. Gene knockout simulation enables systematic identification of essential genes or discovery of potential metabolic engineering targets. After calculation, the simulated flux distribution can be visualized with the integrated Escher visualization module, facilitating interpretation of network states.
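The flow just described (objective, constraints, knockouts, then optimization) maps onto a handful of COBRApy calls. The following is a minimal sketch, not the platform's actual backend code: the function name `run_fba` and its argument layout are assumptions made for illustration, and it expects a COBRApy `Model` object to be passed in.

```python
def run_fba(model, objective_weights, bounds=None, knockouts=None):
    """Run one FBA simulation on a COBRApy model and return a plain dict.

    objective_weights: {reaction_id: coefficient}
    bounds:            {reaction_id: (lower, upper)}
    knockouts:         iterable of gene IDs
    """
    bounds = bounds or {}
    knockouts = knockouts or []
    # Inside the `with` block COBRApy records changes and reverts them on
    # exit, so the loaded model can be reused for the next simulation.
    with model:
        # Objective: a weighted linear combination of reactions.
        model.objective = {
            model.reactions.get_by_id(reaction_id): weight
            for reaction_id, weight in objective_weights.items()
        }
        # Environmental or engineering constraints as flux bounds.
        for reaction_id, (lower, upper) in bounds.items():
            model.reactions.get_by_id(reaction_id).bounds = (lower, upper)
        # Gene knockouts; COBRApy disables the reactions that depend on them.
        for gene_id in knockouts:
            model.genes.get_by_id(gene_id).knock_out()
        solution = model.optimize()
        return {
            "status": solution.status,
            "objective_value": solution.objective_value,
            "fluxes": solution.fluxes.to_dict(),
        }
```

With COBRApy installed, a call might look like `run_fba(cobra.io.read_sbml_model("e_coli_core.xml"), {"BIOMASS_Ecoli_core_w_GAM": 1.0}, bounds={"EX_o2_e": (0, 1000)})`, where the file name is a placeholder and setting the lower bound of the oxygen exchange reaction to 0 blocks oxygen uptake to mimic anaerobic growth.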
Throughout the analysis process, an LLM Agent and a structured knowledge base support users by providing instant queries about model components, reaction mechanisms, or gene functions. This offers essential contextual knowledge for biological interpretation of simulation results, along with real-time operational guidance to ensure a smooth workflow. Together, these features create an integrated analysis environment spanning model construction, simulation, visualization, and knowledge retrieval.
4 Architecture Diagram
Above is our project architecture diagram. The frontend consists of WebUI and LLM ChatBox, while the backend includes Flask, COBRApy, and Escher.
5 Software Demonstration
5.1 Web Interface and Operation Demonstration
In this tutorial, we will demonstrate how to use our software for FBA analysis. [Click here to learn about installation]
Step 0: Access the FBA Homepage
Visit our deployed FBA platform, and you will see the following page:
Step 1: Select a Model
We provide some commonly used models in advance, such as e_coli_core and Yeast9-GEM. You can also download models from BiGG or use local models and import them into our platform via the "Import" button.
On the page, you can input the model ID to search and execute the query.
After selecting a model, click "Next." The selected model will be displayed at the top.
Step 2: Select Target Reaction
This step involves two tasks: searching for the target reaction and setting its weight. The target reaction may be a linear combination of multiple reactions.
If no reaction is selected, maximizing biomass will be the default target reaction.
If there are additional needs, you can import reaction types not currently considered in the model via "Add New...".
On the weight page, the sum of reaction weights must equal 100%.
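As a rough illustration of the 100% rule, percentage weights can be validated and converted into objective coefficients as follows. This is a sketch under assumptions: the function name and the tolerance are hypothetical, and the reaction IDs in the usage note are BiGG identifiers from e_coli_core chosen only as an example.

```python
import math


def weights_to_coefficients(percent_weights, tolerance=1e-6):
    """Check that percentage weights sum to 100 and return fractional coefficients."""
    if not percent_weights:
        raise ValueError("select at least one target reaction")
    total = sum(percent_weights.values())
    if not math.isclose(total, 100.0, abs_tol=tolerance):
        raise ValueError(f"weights sum to {total}%, expected 100%")
    # A 70% / 30% split becomes the linear objective 0.7 * v_a + 0.3 * v_b.
    return {reaction_id: pct / 100.0 for reaction_id, pct in percent_weights.items()}
```

For example, `weights_to_coefficients({"BIOMASS_Ecoli_core_w_GAM": 70, "EX_etoh_e": 30})` yields coefficients 0.7 and 0.3, which define an objective that trades off growth against ethanol export.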
Step 3: Constrain the Environment
You can set the upper and lower bounds of reactions on the "Reaction Constraints" page.
Similarly, you can knock out genes on the "Gene Knockout" page.
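To make the effect of a knockout concrete: each reaction carries a gene-protein-reaction (GPR) rule, and a reaction is disabled only when its rule can no longer be satisfied. COBRApy performs this evaluation itself; the sketch below only illustrates the logic. The function names are hypothetical, and the parser assumes gene IDs are valid Python identifiers joined by `and`/`or`, which does not hold for every model.

```python
import ast


def gpr_is_active(rule, knocked_out):
    """Return True if a reaction with this GPR rule can still carry flux."""
    if not rule.strip():
        return True  # no gene association, so knockouts do not affect it

    def evaluate(node):
        if isinstance(node, ast.BoolOp):
            results = [evaluate(value) for value in node.values]
            # "and" = enzyme complex (all subunits needed);
            # "or"  = isozymes (any one is enough).
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.Name):
            return node.id not in knocked_out
        raise ValueError(f"unsupported GPR syntax: {ast.dump(node)}")

    return evaluate(ast.parse(rule, mode="eval").body)


def disabled_reactions(gpr_rules, knocked_out):
    """List the reactions whose GPR rule fails once the given genes are removed."""
    knocked_out = set(knocked_out)
    return sorted(
        reaction_id
        for reaction_id, rule in gpr_rules.items()
        if not gpr_is_active(rule, knocked_out)
    )
```

This is why knocking out a single gene sometimes has no effect on fluxes: if an isozyme encoded by another gene remains, the rule still evaluates to true and the reaction stays available.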
Then, on the confirmation page, verify the operations performed.
You can always go back to previous pages and click "Clear" to reset settings. Every page that allows multiple selections has an "Organize" button that reorders the list so that selected items can be managed from the first page.
After verification, click "Submit," and the software backend will perform the computation.
When the results are satisfactory, the computed fluxes are mapped onto a metabolic map to visualize the pathways.
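Escher overlays data on a map from a mapping of reaction ID to value (in Python, the `reaction_data` argument of `escher.Builder`; in the web app, a JSON or CSV file). A minimal sketch of preparing such a mapping from FBA output follows; the function names, the noise threshold, and the rounding precision are assumptions for illustration, not the platform's settings.

```python
import json


def fluxes_to_reaction_data(fluxes, noise_threshold=1e-9, digits=6):
    """Round fluxes and zero out solver noise so the map is easier to read."""
    reaction_data = {}
    for reaction_id, value in fluxes.items():
        if abs(value) < noise_threshold:
            reaction_data[reaction_id] = 0.0  # also avoids printing -0.0
        else:
            reaction_data[reaction_id] = round(float(value), digits)
    return reaction_data


def reaction_data_json(fluxes):
    """Serialize the cleaned fluxes as a JSON object keyed by reaction ID."""
    return json.dumps(fluxes_to_reaction_data(fluxes), indent=2, sort_keys=True)
```

Reactions are kept even when their flux is zero, so inactive pathways remain distinguishable on the map from reactions that simply have no data.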
Additionally, we provide a dark mode for more comfortable nighttime use.
5.2 Agent System Introduction
Our Agent is based on DeepSeek-R1:7B and integrates a specialized knowledge base for FBA. The knowledge base was constructed from the practical experience of five synthetic biology researchers, covering common questions, frequent operations, and typical application scenarios. Based on their feedback, we systematically compiled common user queries and operational needs so that the Agent can understand and respond to user requests more accurately.
After deploying the Agent, you can ask questions or perform operations via natural language in the chat box.
Notably, the conversation history in this session is saved on the current page.
Below are real-time video demonstrations of Agent-user interaction:
Agent Intelligent Q&A:
- What is the principle of FBA?
As shown in the video, after inputting the question, the Agent briefly outlines FBA-related concepts and provides a detailed point-by-point answer, ensuring effective responses within limited space.
- What files or data are needed to start FBA calculations?
As shown in the video demonstration, after completing the first question, we followed up with the second question. The system retains previous history within this session, and new topics do not affect normal responses.
- Where can I find and download reliable, validated metabolic models?
The Agent will list available websites, corresponding file types, and their comparative features.
- What exactly do the "constraints" refer to? How should I understand and set these constraints (e.g., reaction rate upper and lower limits)?
As shown in the video, the Agent briefly explains the different types of constraints and describes the significance of setting limits.
Agent Intelligent Operation Assistance:
- Set the model to iJN746
The video demonstrates that, starting from a blank setting, inputting the command causes the Agent to set the model to iJN746. The operation result is visible in the global selection prompt and the final summary, consistent with manual mouse operations.
- Set a specific objective function
Similar to the previous Q&A, Agent-assisted operations retain all interaction content in the current session, allowing users to clearly see which operations have been successfully executed.
- Modify the upper and lower flux bounds of a reaction
Operation is similar to setting the objective function.
- Knock out certain genes
The results are also visible in the final summary.
- Generate a metabolic flux distribution map
After completing the necessary steps above, the Agent can directly call relevant functions to compute results. Users can view them on the Results page.
6 Expert Feedback
For the Agent-related work, we consulted Professor Zhang Tong. [Click here to learn more details]
For example, regarding "model selection," he suggested we try using open-source models like Qwen (e.g., 32B version) for local deployment to alleviate response delays caused by calling public APIs (e.g., DeepSeek) and improve system stability.
Regarding "mitigating hallucination issues," he pointed out that hallucinations are currently difficult to completely solve, but can be partially mitigated through methods like building fact bases, counterfactual validation, and feature alignment. He recommended establishing a dedicated ambiguity lexicon and attempting to use attention mechanisms to identify and handle semantic ambiguities.
Regarding "whether it's necessary to introduce knowledge or tools for constraints to avoid incorrect interpretations," Professor Zhang gave a positive answer: "This is necessary. Because there are many specialized or domain-specific knowledge points that large models have not learned. Especially for innovative tasks like yours, it's essential to perform secondary training or use an external knowledge base to supplement knowledge."
Regarding "long-text and memory mechanism design," he recommended "introducing long-text memory and adaptive forgetting mechanisms like LSTM (Long Short-Term Memory) to improve information coherence in multi-turn dialogues and avoid error accumulation and semantic interference. Reinforcement learning ideas can also be combined to optimize memory weight allocation."
Finally, regarding the project's innovation and feasibility, the professor provided high praise, encouraging us to leverage interdisciplinary advantages, deeply integrate AI with domain knowledge, and demonstrate practical value such as shortening R&D cycles and reducing costs.
7 Installation
If Python is not available, install Python first and choose the appropriate installation package based on the current OS.
- Install Python
Visit the download page of the official Python website, then download and run the installer.
The most important step: check the "Add Python 3.x to PATH" checkbox.
- Verify if the installation was successful
Whichever installation method you use, verify the installation afterwards.
Open the command-line tool (Windows: CMD/PowerShell; Mac/Linux: Terminal) and enter the following command:
# Windows users
python --version
# Mac/Linux users
python3 --version
If the Python version number is displayed (such as "Python 3.9.7"), it indicates a successful installation.
- Configure and run
- Recommended: Four-Step Setup for basic FBA functionality
Follow these steps to set up the environment and run the software:
- Create a virtual environment
python -m venv venv
- Activate the virtual environment
source venv/bin/activate # Mac/Linux
.\venv\Scripts\activate # Windows
- Install dependencies
pip install -r requirements.txt
- Run the software
python src/app.py
Then visit http://127.0.0.1:8079/
- Optional: Agent functionality
Prerequisites:
Install Ollama: visit the website and download the appropriate version for your operating system.
Verify the installation by running:
ollama --version
Recommended Hardware:
For optimal performance, we recommend using an NVIDIA RTX 3060 12GB or higher GPU to deploy the DeepSeek-R1:7B model.
Steps to Deploy:
- Generate the agent database:
python src/build_fba_knowledge.py
- Pull the DeepSeek-R1 model. Run the following command to download the model:
ollama pull deepseek-r1:7b
- Run the model. Start the model with the command:
ollama run deepseek-r1:7b
Once the model is running, you can activate the agent feature in your web application!~~(^v^)~~
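If the agent feature does not respond, one quick check is whether the local Ollama server is reachable and has the model pulled. The sketch below queries Ollama's local HTTP API on its default port; the helper names are hypothetical, and the address should be adjusted if Ollama was configured differently.

```python
import json
import urllib.request

# Ollama's default local address; an assumption if your setup overrides it.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def has_model(tags_payload, name):
    """Check a parsed /api/tags response for a model with the given name."""
    return any(entry.get("name") == name for entry in tags_payload.get("models", []))


def check_ollama(name="deepseek-r1:7b", url=OLLAMA_TAGS_URL, timeout=3):
    """Return True if Ollama answers and lists the model, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # Server not running, unreachable, or returned something unexpected.
        return False
    return has_model(payload, name)
```

Calling `check_ollama()` before enabling the chat box gives a clear yes/no answer without opening an interactive `ollama run` session.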