
CFS_BHDS_AS Software WIKI Documentation

CFS_BHDS_AS Software Basic Info

Software Name: CFS_BHDS_AS (Curve Fitting Software Based on High-Dimensional Search and Azimuth Statistics)

Version: v1.0

Developer: Shengxuan BIAN

Contact Email: shixiu_yakuchi0324@qq.com

Document Date: 3 October 2025


1. Summary and overview

Software Overview

CFS_BHDS_AS is an intelligent curve-fitting tool based on high-dimensional space search and azimuth statistics, designed specifically to optimize parameters in dynamic models of biological systems.

It supports two core application scenarios:

  • Import simple, easy-to-obtain wet-lab data to predict quantities that are difficult to measure.
  • Take ideal wet-lab data as the target and the data to be improved as the starting point; by fitting the behavior of the observed parameters, identify the key factors that move the experiment toward the ideal state.

Core functions

  • Automatic parameter optimization: fits the ODE model's predictions to the experimental data.
  • Real-time visualization: dynamically displays the fitting process and the behavior of parameter changes.
  • Robust search algorithm: combines reinforcement learning with an orthogonal search strategy to escape local optima and avoid wasted computation.

Target Users

Computational biology researchers, synthetic biology researchers, and anyone who needs to fit the parameters of ODE models.


2. Quick Start

2.1 System requirements

  1. Software dependencies: R (>=4.5.0), RStudio (>=2025.05.0)
  2. Hardware dependencies: recommended configuration, e.g., an Intel i7 or higher-performance CPU

2.2 Installation and startup (step-by-step guide)

  1. Obtain the software package.
  2. Open the CFS_BHDS_AS.Rproj project file in RStudio.
  3. Run the load command in the console: load("CFS_BHDS_AS.Rdata").
  4. Enter CFS_BHDS_AS() and press CTRL + Enter to start running (the two console commands are repeated below Figure 1).
  5. View the unpacked file structure (Figure 1):
Figure 1-File structure after unpacking the software package
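
The two console commands from steps 3 and 4 above, as they are entered in the RStudio console with the CFS_BHDS_AS.Rproj project open:

    load("CFS_BHDS_AS.Rdata")   # load the packaged functions and data into the session
    CFS_BHDS_AS()               # start the interactive fitting workflow (run with CTRL + Enter)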

2.3 The first analysis: Five minutes to master

  1. Prepare your experimental data file (Exp files/Exp.xlsx).
  2. Prepare or select your ODE model file (ODE files/).
  3. Enter information as prompted by the software.
  4. View and understand the result directory (PBPP_RESULT/).

3. Detailed workflow

3.1 Pre-analysis: preparing the input files

3.1.1 Experimental data file (Exp files folder)

  • Format requirements: .xlsx
  • Data volume suggestion: more than 48 time points, and preferably more than 240 (a quick pre-flight check is sketched after Figure 2)
Figure 2-Exp.xlsx Content example

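
As a quick pre-flight check before the first run, the data file can be inspected from the R console. This is only an illustrative sketch: it assumes the readxl package is installed, and the actual required column names are those shown in Figure 2 and described by the software prompts.

    library(readxl)                                  # assumed helper package, not part of CFS_BHDS_AS
    exp_data <- read_excel("Exp files/Exp.xlsx")     # the experimental data file prepared above
    nrow(exp_data)       # number of time points: more than 48 suggested, more than 240 recommended
    colnames(exp_data)   # confirm the column names match the ODE file specification (see Q2 in the FAQ)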

3.1.2 Model file (ODE files folder)

  • ODE equation format requirements: refer to the comments in the model file for modification guidelines.
  • Each model file contains a modification guide in the form of comments (Figure 3); a purely illustrative ODE definition follows Figure 3.
Figure 3-Each model file contains a modification guide in the form of comments
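
Purely as an illustration of what an ODE definition looks like in R, the sketch below uses the common deSolve convention (a function of time, state, and parameters that returns a list of derivatives). The exact format expected by CFS_BHDS_AS is the one documented in the comments of the bundled ODE files, which takes precedence over this example.

    library(deSolve)                          # assumed for this illustration only
    simple_model <- function(t, state, parameters) {
      with(as.list(c(state, parameters)), {
        dX <- k_syn - k_deg * X               # constant synthesis minus first-order degradation
        list(c(dX))
      })
    }
    out <- ode(y = c(X = 0), times = seq(0, 48, by = 1),
               func = simple_model, parms = c(k_syn = 1, k_deg = 0.1))
    head(out)                                 # simulated time course of X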

3.2 Running the analysis

Software configuration and operation: Refer to software prompts and README.txt for specific steps.

3.3 Analysis result management

  • Default output directory: PBPP_RESULT (the directory from which the software reads and to which it writes the latest data).
  • Directory renaming rule: rename the PBPP_RESULT directory to any other name to archive it; the software will then automatically create a new PBPP_RESULT folder for the next analysis (see the sketch below).
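
A minimal way to apply the renaming rule from the R console (a sketch using base R; renaming the folder in the file manager works just as well):

    if (dir.exists("PBPP_RESULT")) {
      file.rename("PBPP_RESULT",
                  paste0("PBPP_RESULT_", format(Sys.Date(), "%Y%m%d")))   # e.g., PBPP_RESULT_20251003
    }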

4. Architecture and design principles

4.1 Methodology

The core methodology of CFS_BHDS_AS is based on high-dimensional space coordinate transformation, and the specific logic is as follows:

  1. For any Ordinary Differential Equation (ODE) model, the set of parameters that produces its current prediction curve α is defined as the prediction coordinate A (each dimension corresponds to one parameter value).
  2. From the ODE model's perspective, the experimental data curve β likewise corresponds to at least one parameter combination, whose values define the target coordinate B (its dimensional information).
  3. The Euclidean distance AB between A and B represents the degree of divergence between the prediction curve α and the experimental curve β.
  4. Coordinate adjustment strategy:
    • Continuously adjust the prediction coordinate A along spatial vectors so that it approaches the target coordinate B.
    • If the rate of approach drops, find a new spatial vector perpendicular to the current trajectory (still aiming to minimize the Euclidean distance AB) to guide the next coordinate update.
    • Distance calculation: use the R function base::outer(x, y, "-") to build the difference matrix between α and β (after applying identical weighting to each coordinate), then compute the Mean Squared Error (MSE) of this matrix to measure their difference.
  5. Dynamic vector adjustment (reinforcement learning + orthogonal constraint):
    • Use reinforcement learning to record the historical performance of each parameter direction through a weighted environment, and probabilistically generate new movement vectors based on this data.
    • When optimization efficiency declines, orthogonal constraints generate new vectors perpendicular to the current direction to maintain search momentum.
    • Use a whitelist/blacklist mechanism to prevent redundant and ineffective searches, and perform local fine-tuning on high-performing directions.

Essentially, this transforms the problem of "how to adjust parameters to make the prediction curve more similar to the experimental curve" into "how to guide the distance between two points to decrease in high-dimensional space".
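
A minimal sketch of the two building blocks described above, written in base R with placeholder values; the actual CFS_BHDS_AS search adds the reinforcement-learning weighting and whitelist/blacklist bookkeeping on top of this.

    # Difference matrix between the prediction curve (alpha) and the experimental
    # curve (beta), followed by the MSE used as the distance measure
    pred_curve <- c(0.10, 0.35, 0.60, 0.80, 0.92)          # placeholder predicted values
    exp_curve  <- c(0.12, 0.30, 0.65, 0.78, 0.95)          # placeholder experimental values
    diff_mat   <- base::outer(pred_curve, exp_curve, "-")  # distance information matrix
    mse        <- mean(diff_mat^2)                         # Mean Squared Error of the matrix

    # Orthogonal constraint: when progress stalls, draw a random direction in
    # parameter space and remove its component along the current movement vector,
    # leaving a search direction perpendicular to the current trajectory
    current_dir <- c(1, 0, 0, 0)                           # placeholder movement vector
    rand_dir    <- rnorm(length(current_dir))
    ortho_dir   <- rand_dir - sum(rand_dir * current_dir) / sum(current_dir^2) * current_dir
    ortho_dir   <- ortho_dir / sqrt(sum(ortho_dir^2))      # unit-length orthogonal direction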

4.2 Software design

4.2.1 Modular design

Separate ODE files, Exp files, and result output modules for easy management and reuse.

4.2.2 User interaction design

A graphical interface based on the R environment lowers the barrier of command-line operation.

4.2.3 Design trade-offs

  • Universality vs. specificity: the software works with a variety of ODE models, but its performance depends heavily on the soundness of the model structure and initial parameters supplied by the user.
  • Accuracy vs. speed: the BestTRD threshold lets users trade off fitting accuracy against computation time.

5. Key choices in software development

The software adopts two core design choices to ensure stability and effectiveness:

  1. Missing value detection mechanism: extensive missing-value checks prevent crashes caused by incomplete data (a minimal illustration follows this list).
  2. Unreasonable parameter correction (reinforcement learning): the reinforcement learning system converts unreasonable parameter adjustments made by the user into reasonable values so that the calculation can proceed effectively.
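
As a simple illustration of the kind of check described in point 1 (not the actual CFS_BHDS_AS code), incomplete input can be caught before fitting starts:

    exp_data <- data.frame(time = c(0, 1, 2, 3), value = c(0.1, 0.4, NA, 0.8))   # placeholder table
    if (anyNA(exp_data)) {
      bad_rows <- which(rowSums(is.na(exp_data)) > 0)
      stop("Input contains missing values in row(s): ", paste(bad_rows, collapse = ", "))
    }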

6. Maintenance and support

For software maintenance, technical support, or problem feedback, please contact the developer via email:

Contact Email: shixiu_yakuchi0324@qq.com

When emailing, please describe the problem in detail (e.g., error messages, operation steps, data format) to speed up troubleshooting.


7. Frequently Asked Questions (FAQ)

Q1: What should I do when temporary files are displayed for a long time?

A1: This issue occurs when the input ODE file produces computational errors (e.g., division by zero, or values that are excessively large or small), causing the software to stall. It is recommended to:

  • Check the ODE equation logic for mathematical errors (e.g., denominator variables that may become zero).
  • Verify the parameter range settings in the ODE file to avoid overflow/underflow (a quick check is sketched below).
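
A quick way to reproduce and diagnose this symptom outside the software (an illustrative sketch, not part of CFS_BHDS_AS): evaluate the right-hand side of the ODE at the initial state and parameter values you intend to use, and confirm every derivative is finite.

    rhs    <- function(X, K, V) V * X / (K + X)   # hypothetical Michaelis-Menten style term
    derivs <- rhs(X = 0, K = 0, V = 1)            # K = 0 here produces 0/0, i.e., NaN
    is.finite(derivs)                             # FALSE flags a division-by-zero or overflow problem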

Q2: Why does the fit fail or report an error?

A2: Please first check the following two core factors:

  • Data-model matching: verify the formats of the wet-lab data and the ODE file. The input data must match the ODE file specification; for example, if you supply a wet-lab plasmid concentration curve but do not correctly specify the second column name in Exp.xlsx, the software cannot map the data to model parameters.
  • Curve rationality: although the software tolerates some noise, using incorrect curve data (e.g., randomly guessed curve shapes) will certainly cause the fit to fail. The software can learn from slightly "casual" curves (Figure 4), but not from completely irrational data.

Figure 4-An example of a slightly "casual" curve that the software can still learn from

Q3: How do I replace the model without affecting previous analysis results?

A3: Follow the steps below:

  1. Terminate the running software.
  2. Rename the existing PBPP_RESULT folder to any other name (it is recommended to add a date identifier, e.g., PBPP_RESULT_20251003) to preserve previous results.
  3. Modify the ODE file code (e.g., replace the ODE equation or adjust initial parameters).
  4. Restart the software; it will automatically create a new PBPP_RESULT folder to store the new analysis results.


8. Software limitations

  • Model type limitation: The software can only accept ODE models at present, and does not support other types of dynamic models (e.g., partial differential equations, stochastic differential equations).
  • Curve length limitation: the software can only observe and fit curves whose length falls within the supported range; within that range, the more data provided, the more accurate the fitting result.
  • Language dependency: The software is not yet independent of the R language; it must run in the R/RStudio environment and cannot be deployed as a standalone executable program.

9. Deployment and integration

9.1 Standard deployment process

Refer to the "Quick Start" chapter for deployment steps, and focus on verifying the following two points to ensure normal operation:

  • The R version is ≥4.5.0 and the RStudio version is ≥2025.05.0.
  • The software package is completely unpacked, and the CFS_BHDS_AS.Rproj file can be normally opened in RStudio.

9.2 Visualization and reporting integration

The software's output results (including fitting curves, parameter change charts, etc.) are in standard image formats (e.g., PNG/JPG), which can be directly embedded into academic papers, project reports, or presentation slides without additional format conversion.

Key integration suggestions:

  • When inserting results into papers, it is recommended to retain the original image resolution (≥300 DPI) to ensure clarity.
  • When citing results in reports, mark the corresponding PBPP_RESULT folder name to facilitate result tracing.

EasyDock: An Automated Molecular Docking Platform Empowering Synthetic Biology Research


In the fields of synthetic biology and drug design, molecular docking is a crucial tool for understanding the interactions between biological macromolecules and small molecules. However, traditional molecular docking workflows often require complex software configuration, tedious parameter adjustment, and specialized computational biology expertise, creating a technical barrier for many iGEM teams. We previously attempted to perform rapid molecular docking using the existing Free_Cloud_Docking_2D_3D project on Google Colab, but this still required manual clicks, parameter adjustments, and dependency on known ligand positions to set the docking box—making it less suitable for proteins without pre-bound ligands. Additionally, we encountered issues such as network latency, poor performance, and high network dependency on cloud computing platforms. To address these challenges, we migrated the Free_Cloud_Docking_2D_3D project to local computers and developed EasyDock—a user-friendly, automated molecular docking platform—enabling every iGEM team to conduct professional molecular docking analyses on personal computers with ease. The following is a schematic diagram of the software usage and result output:
(Schematic diagrams of EasyDock usage and result output)

Project Background and Significance

Molecular docking finds widespread application in iGEM projects: from substrate-binding analysis in enzyme engineering, to ligand recognition studies in biosensors, to exploration of protein-small molecule interactions in synthetic pathways. However, commercial software is often prohibitively expensive, while open-source tools typically require specialized technical expertise. AutoDock is cumbersome to use, and EasyDock—built on the Smina interface—aims to fill this gap by providing the iGEM community with a solution that is both professional and easy to use. Based on the extensively validated Smina docking engine, EasyDock streamlines the complex molecular docking process into a few simple commands through automated workflows and intelligent parameter settings. Whether a team's research focuses on protein design, metabolic engineering, or biosensing, EasyDock can provide reliable molecular-level insights.

Core Features

Fully Automated Workflow

From inputting a PDB ID and ligand SMILES string, the system automatically completes protein structure downloading, ligand preparation, docking calculation, and result analysis. This integrated design significantly reduces the learning curve and technical burden for users.

Whole-Protein Coverage Docking

Unlike traditional methods that require pre-defining an active site, EasyDock automatically calculates the optimal docking regions across the entire protein surface, ensuring no potential binding sites are overlooked—particularly valuable for studying novel proteins or exploring non-canonical binding sites.

Multi-Conformation Search Capability

The system simultaneously considers multiple 3D conformations of the ligand and performs parallel docking calculations, ultimately providing multiple possible binding modes ranked by binding energy. This comprehensive search strategy greatly increases the probability of identifying the true binding mode.

Rich Visualization Output

EasyDock generates 2D interaction diagrams, 3D interactive views, and professional PyMOL session files, helping team members understand molecular interaction details from multiple perspectives.

Usage Overview

Using EasyDock is straightforward—only the target PDB ID and ligand SMILES string are required to start. For typical iGEM projects involving multiple ligands interacting with the same protein, EasyDock's batch processing capability efficiently handles such tasks. Installation is equally streamlined. We provide an automatic installation script that configures all necessary dependencies with one click. For teams with specific requirements, manual installation and custom configuration are also supported. The entire system is built on open-source tools and is completely free to use, aligning with the open-source spirit of iGEM.

Application Scenarios in iGEM Projects

EasyDock has broad application potential in iGEM projects:

  • Enzyme Engineering: study interactions between enzymes and substrates, inhibitors, or effectors to guide rational design.
  • Biosensor Development: understand binding characteristics between receptor proteins and signaling molecules to optimize sensor performance.
  • Metabolic Engineering: analyze interactions between metabolic enzymes and intermediates to guide pathway optimization.

For example, a team developing an environmental pollutant detection sensor could use EasyDock to study binding modes between pollutant molecules and receptor proteins, designing a more sensitive detection system. Another team focused on drug synthesis could optimize key enzyme-substrate interactions to improve synthetic efficiency.

Technical Advantages and Innovations

EasyDock's technical innovations are reflected at multiple levels. The intelligent docking box algorithm automatically determines the optimal search space based on the protein structure (by performing large-scale, multiple searches around the protein's geometric center), avoiding the subjectivity of empirically defined parameters in traditional methods. The multi-conformation parallel processing technique significantly improves computational efficiency, enabling reliable results even with limited computing resources. The system also integrates structure repair and optimization capabilities, automatically addressing common issues such as missing atoms and structural anomalies in PDB files to ensure the quality of input structures. The format conversion module seamlessly connects data transfer between different software tools, providing users with a unified working interface.
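
To make the docking-box idea concrete, the sketch below computes a protein's geometric center and a whole-protein bounding box from atom coordinates. It is a conceptual illustration only, written in R with placeholder coordinates and an arbitrary padding; EasyDock's actual implementation is the one in the GitHub repository.

    set.seed(1)
    atoms <- data.frame(x = rnorm(500, sd = 15),   # placeholder atom coordinates; in practice these
                        y = rnorm(500, sd = 12),   # come from the automatically downloaded PDB file
                        z = rnorm(500, sd = 10))
    center   <- colMeans(atoms)                                # geometric center of the protein
    box_size <- sapply(atoms, function(v) diff(range(v))) + 8  # per-axis extent plus padding (angstroms)
    center
    box_size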

Future Development Directions

We plan to continue enhancing EasyDock's functionality, including adding more docking scoring functions, supporting molecular dynamics simulation preprocessing, and integrating machine learning prediction models. We also welcome feedback and suggestions from the iGEM community to jointly advance the development of this tool. Of particular note, we are developing a web-based simplified version that will allow teams without local computing resources to use EasyDock's core functions through a browser. This will further lower the technical barrier for molecular docking and enable more teams to benefit from this technology.

Availability and Support

EasyDock is fully open-source, with code hosted on GitHub and iGEM-GitLab. We provide detailed documentation, tutorial examples, and troubleshooting guides. Technical support is available via email or GitHub Issues. We believe EasyDock will become an invaluable tool for iGEM teams in molecular design, helping researchers better understand and optimize biological systems. We look forward to seeing the innovative applications developed by teams using this platform.

Project Information

Developer: Teng
Contact: tenwonyun@gmail.com
GitHub Repository: https://github.com/twy2020/EasyDock
iGEM-GitLab Repository: EasyDock · main · 2025 Competition / Software Tools / YAU-China · GitLab
License: MIT Open Source License