Software
Motivation for Prototyping and Development
Over the course of the project, we identified a series of challenges, both at the transition between dry and wet experiments and within the dry experiments themselves. These challenges are outlined below:
Integrating dry and wet experiments: wet lab members are often unaware of the scope, advantages, and limitations of dry lab capabilities.
Collaborating within the dry lab: dry lab members may struggle to integrate state-of-the-art analytical techniques, because no one person has a comprehensive grasp of every analytical methodology.
Releasing and using custom models: publishing the models developed by our team directly as source code can discourage subsequent researchers or teams from reusing them.
Understanding model principles: even researchers qualified to use a model must invest considerable time and effort, including reading the relevant papers, before they thoroughly understand how it works.
Comparing computational results across models: invoking a single model is straightforward, but improving robustness by comparing the outputs of multiple models incurs a considerable increase in computational overhead.
Driven by these motivations, we developed RNA-Factory, a user-friendly, one-stop RNA analysis platform primarily focused on deep learning and integrated with advanced large language model technology.
To address the pain points encountered, we have implemented the following design within the RNA-Factory platform:
Design for rich functionality and ease of use: the platform integrates multiple advanced models for RNA analysis (structure prediction, interaction prediction, design, etc.) sourced from open-source projects, collaborating teams, and our own research, all driven through a highly user-friendly graphical interface.
Design to lower the barrier to understanding model principles: by incorporating LLM technology and building a background knowledge base of model-related resources (papers and self-curated materials) to reduce hallucination, we enable researchers to grasp the principles behind the integrated models directly through question-and-answer interaction.
Design to reduce manual comparison among similar models and enhance output analysis: the agent technology integrated into RNA-Factory's AI assistant goes beyond conventional question answering. It can directly invoke the relevant computational models based on user requirements, analyse the results, and compare them across models.
In summary, RNA-Factory is a comprehensive RNA analysis and design platform that combines advanced capabilities with a low learning curve.
One-stop tool integration
RNA-Factory is a comprehensive, multifunctional RNA analysis platform integrating analytical tools primarily focused on structure prediction, interaction prediction, and sequence and structure design.
Structure Prediction
BPfold
BPfold is a deep learning approach designed to address the generalizability issue in RNA secondary structure prediction, particularly for unseen RNA families (out-of-distribution data). Its core innovation is the integration of a novel thermodynamic prior: a pre-computed base pair motif (BPM) energy library. This library enumerates the complete conformational space of locally adjacent three-neighbor base pairs and records their thermodynamic energies derived from de novo tertiary structure modeling using the BRIQ force field. BPfold's neural network is specifically designed to learn the relationship between an RNA sequence and these BPM energy maps, leading to significantly improved accuracy and generalization compared to state-of-the-art methods.1
More About BPfold
The BPfold architecture is a deep neural network based on a modified transformer block. The key component is a custom-designed base pair attention block. This block integrates two primary inputs: (1) the feature embeddings of the input RNA sequence and (2) the outer and inner base pair motif energy maps queried from the BPM library for that sequence.1
The base pair attention mechanism works by first processing the energy maps through a hybrid convolutional block containing squeeze-and-excitation (SE) layers to adaptively re-calibrate channel-wise feature responses. The resulting thermodynamic feature map is then added to the attention map derived from the sequence features within the transformer block. This forces the model to jointly attend to sequence information and the thermodynamic relationships of base pairs. The output of the network is a symmetric matrix of pairing scores for each nucleotide pair. Finally, structure refinement procedures are applied to this output to enforce physical constraints (e.g., only canonical pairs, no sharp loops, removal of isolated base pairs), yielding the final predicted secondary structure.1
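To make the mechanism concrete, below is a minimal PyTorch sketch of attention biased by a per-pair energy map, in the spirit of the base pair attention described above. The module name, dimensions, and the bias projection are our illustrative assumptions, not BPfold's actual implementation.

```python
# Sketch: attention logits from sequence features plus a thermodynamic bias
# projected from outer/inner base pair motif energy maps (toy dimensions).
import torch
import torch.nn as nn

class BasePairBiasedAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        # Projects the two per-pair energy features to one bias per head.
        self.energy_bias = nn.Linear(2, heads)

    def forward(self, x, energy_maps):
        # x: (B, L, dim) sequence features; energy_maps: (B, L, L, 2).
        B, L, _ = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q = q.view(B, L, self.heads, -1).transpose(1, 2)  # (B, H, L, d)
        k = k.view(B, L, self.heads, -1).transpose(1, 2)
        v = v.view(B, L, self.heads, -1).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.scale      # (B, H, L, L)
        # Add the thermodynamic bias to the sequence-derived attention map.
        bias = self.energy_bias(energy_maps).permute(0, 3, 1, 2)
        attn = (attn + bias).softmax(dim=-1)
        return (attn @ v).transpose(1, 2).reshape(B, L, -1)

x = torch.randn(1, 40, 64)      # toy 40-nt sequence embedding
e = torch.randn(1, 40, 40, 2)   # toy outer/inner energy maps
print(BasePairBiasedAttention(64)(x, e).shape)  # torch.Size([1, 40, 64])
```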
UFold
UFold predicts RNA secondary structures using a fully convolutional network (FCN) on an image-like representation of RNA sequences. It converts sequences into 17-channel matrices encoding base-pairing rules and pairing probabilities, processed by a U-Net architecture. UFold handles variable sequence lengths, pseudoknots, and non-canonical pairs, achieving high accuracy on within-family and cross-family datasets. It significantly outperforms traditional energy-based methods and other learning-based models in F1 score, with fast inference times (~160 ms per sequence). UFold's web server provides accessible RNA structure prediction and visualization.2
More About UFold
UFold represents RNA sequences as 17-channel images: 16 channels for base-pairing rules (via Kronecker product of one-hot encodings) and one channel for pairing probabilities from three rules. The U-Net encoder-decoder uses convolutional blocks with residual connections, layer normalization, and CELU activations. The output is a contact score matrix, post-processed with constraints (symmetry, no sharp loops, no overlapping pairs) via linear programming. Training minimizes cross-entropy loss with a weight factor to address class imbalance. Inference involves thresholding the score matrix to derive secondary structures. The model is trained on datasets like RNAStralign and ArchiveII, with data augmentation using Contrafold-generated synthetic data.2
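The 16 pairing-rule channels can be reproduced in a few lines. The sketch below is our toy reconstruction of this input encoding; the 17th pairing-probability channel is only stubbed with zeros, since its exact computation follows UFold's own rules.

```python
# Sketch: UFold-style 17-channel input from the Kronecker (outer) product
# of per-position one-hot encodings.
import numpy as np

BASES = "AUCG"

def one_hot(seq: str) -> np.ndarray:
    oh = np.zeros((len(seq), 4))
    for i, b in enumerate(seq):
        oh[i, BASES.index(b)] = 1.0
    return oh

def ufold_channels(seq: str) -> np.ndarray:
    oh = one_hot(seq)                        # (L, 4)
    # Outer product over the base dimension: after reshaping, channel
    # 4*a+b is 1 where position i holds base a and position j holds base b.
    kron = np.einsum("ia,jb->abij", oh, oh)  # (4, 4, L, L)
    pair16 = kron.reshape(16, len(seq), len(seq))
    prob = np.zeros((1, len(seq), len(seq)))  # placeholder 17th channel
    return np.concatenate([pair16, prob], axis=0)

print(ufold_channels("GGGAAACCC").shape)  # (17, 9, 9)
```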
MXfold2
MXfold2 integrates deep learning with thermodynamic principles for RNA secondary structure prediction. It uses a deep neural network to compute folding scores (helix stacking, opening, closing, unpaired regions) and combines them with Turner's free energy parameters. Trained with a max-margin framework and thermodynamic regularization, MXfold2 achieves robust performance on within-family and cross-family datasets, outperforming methods such as CONTRAfold and SPOT-RNA. It also correlates highly with experimental free energies, making it suitable for stability assessment.3
More About MXfold2
MXfold2's DNN processes RNA sequences with 1D convolutions, BiLSTM layers, and 2D convolutions to generate folding scores. The scores are integrated with thermodynamic parameters to compute loop energies, and the optimal structure is predicted via Zuker-style dynamic programming. The model is trained with a structured hinge loss and thermodynamic regularization to align folding scores with free energies. The input is the RNA sequence alone, and the output is the predicted secondary structure together with its folding score. Evaluation metrics include F1 score, PPV, sensitivity, and correlation with experimental energies.3
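To illustrate how learned folding scores plug into dynamic programming, here is a deliberately simplified Nussinov-style decoder over a per-pair score matrix. MXfold2 itself uses the far richer Zuker recursion over Turner loop types; this toy only conveys the scores-plus-DP pattern.

```python
# Sketch: decode the best-scoring nesting of pairs from a learned score
# matrix with a Nussinov-style recursion (not MXfold2's Zuker recursion).
import numpy as np

def nussinov_decode(score: np.ndarray, min_loop: int = 3) -> float:
    L = score.shape[0]
    dp = np.zeros((L, L))
    for span in range(min_loop + 1, L):
        for i in range(L - span):
            j = i + span
            best = dp[i + 1, j]                    # case: i stays unpaired
            # Case: i pairs with some k, adding the learned score for (i, k).
            for k in range(i + min_loop + 1, j + 1):
                right = dp[k + 1, j] if k + 1 <= j else 0.0
                best = max(best, score[i, k] + dp[i + 1, k - 1] + right)
            dp[i, j] = best
    return dp[0, L - 1]

rng = np.random.default_rng(0)
s = rng.normal(size=(20, 20))
print(nussinov_decode((s + s.T) / 2))  # symmetric toy score matrix
```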
RNAformer
RNAformer is a scalable deep learning model for RNA secondary structure prediction. Rather than reasoning over the sequence alone, it maintains a two-dimensional latent representation of all pairwise nucleotide interactions and refines it with axial attention, attending alternately along the rows and columns of the pairing matrix. Combined with recycling of the latent representation and a carefully curated, homology-aware training pipeline, this deliberately simple design achieves strong within-family prediction accuracy with a comparatively modest parameter count.4
More About RNAformer
RNAformer's latent grid holds one embedding per nucleotide pair (i, j). Axial attention factorizes full two-dimensional self-attention into row-wise and column-wise passes, keeping computation and memory tractable as sequence length grows. Recycling feeds the refined latent representation back through the network for iterative improvement, in the spirit of AlphaFold. Finally, the pair representation is projected to a contact matrix from which the predicted secondary structure is read off directly.4
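A minimal sketch of axial attention over a pair representation is shown below; the dimensions and module layout are illustrative assumptions, not the published RNAformer architecture.

```python
# Sketch: axial attention alternates row-wise and column-wise attention
# over a 2D latent grid, one embedding per nucleotide pair.
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, z):
        # z: (B, L, L, dim) latent over all nucleotide pairs.
        B, L, _, D = z.shape
        rows = z.reshape(B * L, L, D)                  # attend within rows
        rows, _ = self.row_attn(rows, rows, rows)
        z = rows.reshape(B, L, L, D)
        cols = z.transpose(1, 2).reshape(B * L, L, D)  # attend within columns
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(B, L, L, D).transpose(1, 2)

z = torch.randn(1, 16, 16, 32)      # toy 16-nt pair representation
print(AxialAttention(32)(z).shape)  # torch.Size([1, 16, 16, 32])
```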
Interaction Prediction
RNAmigos2
RNAmigos2 accelerates RNA virtual screening by combining deep graph learning with docking-based data augmentation. It represents RNA binding sites as 2.5D graphs (Leontis-Westhof classification) and ligands as molecular graphs, encoded by pre-trained networks. The model predicts binding affinity using a compatibility score decoder trained on docking data and experimental complexes. RNAmigos2 achieves a 10,000x speedup over docking, with high enrichment factors in large-scale in vitro screens, and synergizes with docking tools to improve efficiency and diversity in lead compound identification.5
More About RNAmigos2
RNAmigos2 uses relational GCNs to encode RNA graphs and molecular GCNs for ligands. The RNA encoder is pre-trained with metric learning on graph similarity, and the ligand encoder uses a variational autoencoder. Decoders predict affinity (Aff) or compatibility (Compat) scores, trained with BCE or L2 loss. The mixed model combines both decoders for enhanced performance. Virtual screening involves presorting compounds with RNAmigos2 and refining with docking. Evaluation metrics include AuROC, enrichment factors, and efficiency under time constraints. The model is validated on the ROBIN dataset, showing robust performance across diverse RNA targets.5
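The scoring pattern can be sketched as follows, with both encoders stubbed by random features; in RNAmigos2 they are the pre-trained relational GCN and the ligand VAE described above, and the decoder layout here is our assumption.

```python
# Sketch: a compatibility decoder ranks candidate ligands against a binding
# site from their (stubbed) graph embeddings, as in virtual screening.
import torch
import torch.nn as nn

class CompatDecoder(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim * 2, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, pocket, ligand):
        # Concatenate site and ligand embeddings, decode a score in (0, 1).
        return torch.sigmoid(self.mlp(torch.cat([pocket, ligand], dim=-1)))

pocket = torch.randn(8, 64)    # stub binding-site embeddings
ligands = torch.randn(8, 64)   # stub candidate ligand embeddings
scores = CompatDecoder()(pocket, ligands).squeeze(-1)
print(scores.argsort(descending=True))  # presorted ranking for screening
```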
Reformer
Reformer is a deep learning model that predicts protein-RNA binding affinity at single-base resolution using cDNA sequences alone. Trained on 225 eCLIP-seq datasets encompassing 155 RNA-binding proteins (RBPs) across three cell lines, it leverages a transformer architecture to capture high-resolution interactions between binding peaks and their contextual regions. Reformer outperforms existing methods in binary classification of binding sites and accurately quantifies binding affinities, achieving a Spearman correlation of 0.63 with experimental data. Its attention mechanism identifies enriched motifs beyond traditional eCLIP-seq peak regions, including those critical for RNA processing functions. The model also predicts mutation effects on RBP binding, validated experimentally via electrophoretic mobility shift assays (EMSAs), demonstrating its utility in prioritizing pathogenic variants and elucidating RNA regulatory mechanisms.6
More About Reformer
Reformer employs a bidirectional transformer encoder comprising 12 layers with 12 attention heads and 768 hidden units. Input sequences (511 bp cDNA) are tokenized using 3-mer representations and prepended with RBP-specific tokens (e.g., "SRSF1&K562"). Token and positional embeddings are combined and processed through multi-head self-attention, which computes attention weights αᵢⱼ between all pairs of positions to capture long-range dependencies. The output is passed through a linear layer with ReLU activation to predict log₂(1 + normalized coverage) for each base. For binary classification (Reformer-BC), a sigmoid-activated linear layer replaces the regression head. The model is pre-trained on RNA sequences and fine-tuned with mean squared error (MSE) loss. Attention weights are post-processed via average product correction (APC) to reduce noise, and high-attention regions are analyzed for motif enrichment using tools like STREME and TOMTOM. Mutation effects are quantified by comparing binding affinity changes in 100 bp windows around variants. The architecture's reliance on sequence alone, without structural inputs, simplifies application across diverse RBP contexts and enables precise base-resolution predictions.6
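For illustration, a toy version of the 3-mer tokenization with a prepended RBP/cell-line token might look like the following; the special-token conventions are our assumptions rather than the released tokenizer.

```python
# Sketch: slide a 3-nt window over the sequence and prepend a task token
# naming the RBP and cell line, as described for Reformer's input.
def kmer_tokens(seq: str, rbp: str, cell: str, k: int = 3) -> list:
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return ["[CLS]", f"{rbp}&{cell}"] + kmers + ["[SEP]"]

print(kmer_tokens("AUGGCUA", "SRSF1", "K562"))
# ['[CLS]', 'SRSF1&K562', 'AUG', 'UGG', 'GGC', 'GCU', 'CUA', '[SEP]']
```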
CoPRA
CoPRA bridges protein and RNA language models (ESM-2 and RiNALMo) with complex structure information to predict protein-RNA binding affinity. It uses a Co-Former to fuse sequence embeddings from PLMs and RLMs with pairwise structural features extracted from interface atoms. CoPRA is pre-trained with contrastive protein-RNA interaction (CPRI) and mask interface distance modeling (MIDM) tasks to enhance interaction understanding. On the curated PRA310 dataset, it achieves state-of-the-art performance in affinity prediction and mutation effect estimation, demonstrating robustness and generalization across diverse RNA families.7
More About CoPRA
CoPRA’s Co-Former is a dual-path transformer with structure-sequence fusion modules. Inputs include protein and RNA sequences encoded by ESM-2 and RiNALMo, with interface nodes selected based on distance thresholds. The model computes invariant pair-wise features (node type, sequential distance, spatial distance, angular information) to form a pair embedding. The Co-Former uses multi-head self-attention guided by structural embeddings and outer product updates for pair features. Pre-training involves CPRI (contrastive loss for interaction pairs) and MIDM (distance prediction with cross-entropy loss). For downstream tasks, the model predicts ΔG or ΔΔG using an MLP on special node embeddings. Training employs Adam optimizer with plateau scheduling, and evaluation metrics include RMSE, MAE, PCC, and SCC.7
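The outer-product update of pair features from node features can be sketched as below; shapes and projections are illustrative assumptions, not CoPRA's exact implementation.

```python
# Sketch: node embeddings are projected and combined via an outer product,
# then folded back into the pair embedding, one mechanism described above.
import torch
import torch.nn as nn

class OuterProductUpdate(nn.Module):
    def __init__(self, node_dim: int, pair_dim: int, inner: int = 16):
        super().__init__()
        self.a = nn.Linear(node_dim, inner)
        self.b = nn.Linear(node_dim, inner)
        self.out = nn.Linear(inner * inner, pair_dim)

    def forward(self, nodes, pair):
        # nodes: (B, N, node_dim); pair: (B, N, N, pair_dim).
        a, b = self.a(nodes), self.b(nodes)
        outer = torch.einsum("bic,bjd->bijcd", a, b)  # per-pair outer product
        B, N = nodes.shape[:2]
        return pair + self.out(outer.reshape(B, N, N, -1))

nodes = torch.randn(1, 10, 32)     # interface residue/nucleotide nodes
pair = torch.randn(1, 10, 10, 24)  # pair embedding
print(OuterProductUpdate(32, 24)(nodes, pair).shape)
```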
DeepRPI
The PekingHSC 2025 team independently developed DeepRPI, a model for predicting RNA-protein interactions. Its key feature is that it does not require high-resolution structural data of RNA and proteins as input. Instead, it uses sequence data alone to predict the likelihood of such interactions. This design overcomes the issue of scarce structural data for RNA and RNA-protein complexes.
More About DeepRPI
The DeepRPI model consists primarily of an embedding module, a cross-attention module and an attention pooling module. To ensure high performance, it employs advanced pre-trained protein and RNA language models as sequence embedding models. The cross-attention mechanism fuses features from protein and RNA sequences to capture potential interaction sites.
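A minimal sketch of the cross-attention fusion is shown below, with the protein and RNA language-model embeddings stubbed by random tensors and mean pooling standing in for the attention pooling module.

```python
# Sketch: protein tokens attend over RNA tokens and vice versa, fused
# features are pooled and classified into an interaction probability.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.p2r = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.r2p = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim * 2, 1)

    def forward(self, prot, rna):
        p, _ = self.p2r(prot, rna, rna)  # protein queries over RNA tokens
        r, _ = self.r2p(rna, prot, prot)  # RNA queries over protein tokens
        # Mean pooling stands in for the attention pooling module here.
        fused = torch.cat([p.mean(dim=1), r.mean(dim=1)], dim=-1)
        return torch.sigmoid(self.head(fused))

prot = torch.randn(2, 120, 64)  # stub protein language-model embeddings
rna = torch.randn(2, 80, 64)    # stub RNA language-model embeddings
print(CrossAttentionFusion(64)(prot, rna).shape)  # torch.Size([2, 1])
```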
De Novo Design
Mol2Aptamer
Mol2Aptamer is a deep learning model developed by HZAU-China 2025, a team collaborating with PekingHSC 2025, for designing aptamer sequences that bind a given small molecule. The model accepts a small molecule in SMILES format as input and outputs the sequence of a candidate nucleic acid aptamer, offering a novel approach to RNA sequence design.
More About Mol2Aptamer
The model follows a sequence-to-sequence framework, with dedicated tokenizers for SMILES strings and RNA sequences. The encoder ingests the tokenized SMILES input and the decoder generates an RNA sequence token by token, likely using transformer-based or recurrent modules. This structure lets the model learn contextual dependencies between molecular substructures and the aptamer sequences that bind them.
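Under the hedged description above, the sequence-to-sequence pattern can be sketched as a toy encoder-decoder with greedy decoding; vocabulary sizes, token ids, and model dimensions here are entirely hypothetical.

```python
# Sketch: encode a tokenized SMILES string, then greedily decode RNA tokens
# one at a time until an end-of-sequence token appears.
import torch
import torch.nn as nn

BOS, EOS = 0, 1
model = nn.Transformer(d_model=64, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True)
smiles_emb = nn.Embedding(40, 64)  # toy SMILES vocabulary
rna_emb = nn.Embedding(8, 64)      # toy RNA vocabulary (A/U/C/G + specials)
to_vocab = nn.Linear(64, 8)

src = torch.randint(2, 40, (1, 30))  # a tokenized SMILES string (random ids)
tgt = torch.tensor([[BOS]])
with torch.no_grad():
    for _ in range(20):              # greedy autoregressive decoding
        out = model(smiles_emb(src), rna_emb(tgt))
        nxt = to_vocab(out[:, -1]).argmax(-1, keepdim=True)
        tgt = torch.cat([tgt, nxt], dim=1)
        if nxt.item() == EOS:
            break
print(tgt)  # decoded toy RNA token ids
```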
RNAFlow
RNAFlow is a flow matching model for protein-conditioned RNA sequence and structure design. It integrates an RNA inverse folding module (Noise-to-Seq) and a pre-trained RoseTTAFold2NA (RF2NA) network to generate RNA sequences and structures iteratively. By incorporating inverse folding into the denoising process, RNAFlow simplifies training and avoids fine-tuning large structure prediction networks. The model conditions on inferred conformational ensembles to handle dynamic RNA conformations, enhancing its ability to design functional RNAs. Evaluated on protein-conditioned tasks, RNAFlow outperforms existing methods in sequence recovery, RMSD, and lDDT, demonstrating its efficacy in generating realistic RNA aptamers for targets like GRK2.8
More About RNAFlow
RNAFlow uses a conditional flow matching framework where the denoising network comprises Noise-to-Seq (a geometric graph neural network) and RF2NA. The input includes protein backbone structures and sequences, with RNA sequences generated autoregressively. Noise-to-Seq encodes the protein-RNA complex as a graph, with nodes representing amino acids and nucleotides, and edges connecting nearest neighbors. Node features include unit vectors to neighboring atoms and scalar features like residue identity. The model employs GVP-GNN layers with vector gating and ReLU activations. During training, noisy RNA backbones are interpolated with ground truth structures, and the inverse folding model predicts denoised sequences. RF2NA folds these sequences into structures, supervised by MSE loss for coordinates and cross-entropy for sequences. For trajectory-based inference (Traj-to-Seq), multiple conformations are processed by a multi-graph neural network to predict sequences, improving recovery rates. The output rescoring model selects high-quality designs based on predicted recovery rates.8
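The interpolation step at the heart of flow-matching training can be sketched in a few lines; the denoising network itself (Noise-to-Seq plus RF2NA) is far beyond this toy, and the function below is only our reconstruction of the generic recipe.

```python
# Sketch: build a noisy backbone by linearly interpolating prior noise with
# the ground-truth coordinates at a random time t, the standard flow-matching
# training step referenced above.
import torch

def interpolate_backbone(x1: torch.Tensor) -> tuple:
    """x1: (N, 3) ground-truth RNA backbone coordinates."""
    x0 = torch.randn_like(x1)     # sample from the noise prior
    t = torch.rand(())            # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1    # point on the straight-line path
    target_velocity = x1 - x0     # what a flow model would regress
    return xt, t, target_velocity

coords = torch.randn(50, 3)       # toy 50-nt backbone
xt, t, v = interpolate_backbone(coords)
print(t.item(), xt.shape, v.shape)
```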
RNA-FrameFlow
RNA-FrameFlow is the first generative model for de novo 3D RNA backbone design, adapting SE(3) flow matching to RNA structures. It represents RNA nucleotides as rigid-body frames (C3′, C4′, O4′) and parameterizes non-frame atoms using torsion angles. The model addresses RNA conformational flexibility and data scarcity through structural clustering and cropping augmentations. RNA-FrameFlow generates locally realistic backbones, with over 40% validity based on self-consistency TM-scores, and captures RNA-specific features like ring puckering. It outperforms diffusion-based baselines in novelty and diversity, providing a foundation for conditional RNA design applications.9
More About RNA-FrameFlow
RNA-FrameFlow uses SE(3) flow matching to generate RNA frames, each represented as a translation (C4′ position) and rotation (C3′-C4′-O4′ orientation). The prior distribution is a unit Gaussian, and frames are aligned via Kabsch alignment. The flow model predicts updates for frames using a neural network based on Invariant Point Attention (IPA) layers and transformer encoders. Auxiliary losses include coordinate MSE, distogram loss, and torsional loss to ensure geometric realism. The model is trained with AdamW optimizer, and sampling involves integrating over an ODE with Euler steps. For conditional generation, initial poses are guessed using RF2NA, and the model refines predictions iteratively. Evaluation metrics include TM-score, RMSD, and Earth Mover’s Distance for local structural measurements.9
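As a sketch, Euler-step ODE sampling for the translation component of the frames looks like the following; rotations evolve on SO(3) and are omitted here, and the velocity network is a placeholder function.

```python
# Sketch: integrate the learned ODE with Euler steps from a unit-Gaussian
# prior over C4' positions, as in the sampling procedure described above.
import torch

def velocity_field(x: torch.Tensor, t: float) -> torch.Tensor:
    # Stand-in for the IPA/transformer network predicting frame updates.
    return -x  # toy field that contracts toward the origin

def euler_sample(n_frames: int = 40, steps: int = 50) -> torch.Tensor:
    x = torch.randn(n_frames, 3)   # unit-Gaussian prior over translations
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_field(x, i * dt)  # one Euler step
    return x

print(euler_sample().shape)  # torch.Size([40, 3])
```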
RNAMPNN
RNAMPNN is an RNA sequence design model developed independently by the PekingHSC 2025 team. Given an RNA structure, it predicts sequences likely to fold into it, transferring the ProteinMPNN idea into the RNA design domain. RNAMPNN can perform inverse folding ("unfolding") for RNA scaffold structures generated by backbone-design models such as RNA-FrameFlow, and when given real RNA structures it can optimise their stability. The model actually integrated into the platform is RNAMPNN-X, an enhancement of the base RNAMPNN framework.
More About RNAMPNN
RNAMPNN uses a graph neural network framework similar to ProteinMPNN's. It first extracts basic geometric features from the input RNA scaffold, then applies a BERT-style self-attention block for feature mixing (the "pre-fusion" stage). Message passing is performed with a classical GCN, followed by a second BERT-style self-attention block (the "post-fusion" stage), and an output head produces the final predictions. RNAMPNN-X improves on the base model by replacing the traditional MLP output layer with XGBoost: during training, an MLP head is used first so that the whole network remains differentiable for pre-training, after which the head is swapped for an XGBoost model that predicts sequence identities from the features extracted by the pre-trained RNAMPNN.
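The two-stage head swap can be sketched as follows, with the pre-trained network stubbed by random per-nucleotide features; the hyperparameters are illustrative, not RNAMPNN-X's actual settings.

```python
# Sketch: features from a (stubbed) pre-trained network are handed to an
# XGBoost classifier that replaces the MLP output head, as described above.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 128))  # stub per-nucleotide embeddings
labels = rng.integers(0, 4, size=500)   # A/U/C/G class indices

head = XGBClassifier(n_estimators=50, max_depth=4,
                     objective="multi:softprob")
head.fit(features, labels)              # train the swapped-in output head
print(head.predict(features[:5]))       # predicted nucleotide classes
```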
RiboDiffusion
RiboDiffusion is a generative diffusion model designed for RNA inverse folding based on tertiary structures, addressing the challenge of identifying functional RNA sequences that satisfy 3D structural constraints. Unlike traditional methods focused on secondary structures, RiboDiffusion learns the conditional distribution of sequences given fixed RNA backbone geometries. It employs a denoising process to iteratively refine random sequences into candidates that match target structures, balancing sequence recovery and diversity through tunable sampling weights. The model demonstrates superior performance in sequence recovery, with an average improvement of 11% for sequence similarity splits and 16% for structure similarity splits, and consistently excels across various RNA lengths and types. Its ability to generate sequences folding into desired 3D conformations makes it a powerful tool for RNA design in synthetic biology and therapeutics.10
More About RiboDiffusion
RiboDiffusion integrates a graph neural network (GNN)-based structure module and a Transformer-based sequence module to parameterize the diffusion process. The structure module extracts SE(3)-invariant geometric features from coarse-grained RNA backbone representations (C4′, C1′, and N1/N9 atoms) using a GVP-GNN architecture. It constructs a geometric graph where nodes represent nucleotides, and edges connect top-k nearest neighbors based on C1′ atom distances. Node features include dihedral angles, orientation vectors, and corrupted one-hot encoded sequences, while edge features incorporate directional vectors, Gaussian radial basis encodings for distances, and sinusoidal positional encodings. The sequence module, built on Transformer layers, processes nucleotide embeddings combined with structural features and diffusion context (e.g., log signal-to-noise ratio). It employs adaptive normalization (adaLN) and multi-head self-attention to capture intra-sequence correlations. The model is trained to predict original sequences from noisy inputs via a weighted MSE loss, with self-conditioning and random structure dropout enhancing robustness. During sampling, ancestral sampling with a reverse-time SDE iteratively denoises random Gaussian noise into sequences under structural constraints, allowing control over diversity and recovery via conditional scaling weights.10
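A heavily simplified, DDPM-style ancestral sampling loop over relaxed one-hot nucleotide vectors is sketched below; the real model conditions its denoiser on the GVP-GNN structure features and operates with a reverse-time SDE, none of which is shown here.

```python
# Sketch: iteratively denoise Gaussian noise into a sequence by predicting
# the clean sample and stepping through the standard DDPM posterior.
import torch

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
abar = torch.cumprod(alphas, dim=0)

def denoiser(xt, t):
    # Stand-in for the structure-conditioned GNN/Transformer denoiser;
    # it should return a prediction of the clean one-hot sequence.
    return torch.softmax(xt, dim=-1)

xt = torch.randn(30, 4)  # 30-nt sequence, pure noise over 4 bases
for t in reversed(range(T)):
    x0_hat = denoiser(xt, t)
    ab = abar[t]
    ab_prev = abar[t - 1] if t > 0 else torch.tensor(1.0)
    # Standard DDPM posterior mean given the predicted clean sample.
    mean = (ab_prev.sqrt() * betas[t] / (1 - ab)) * x0_hat \
         + (alphas[t].sqrt() * (1 - ab_prev) / (1 - ab)) * xt
    noise = torch.randn_like(xt) if t > 0 else torch.zeros_like(xt)
    var = betas[t] * (1 - ab_prev) / (1 - ab)
    xt = mean + var.sqrt() * noise
print(xt.argmax(dim=-1))  # decode to A/U/C/G indices
```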
AI Assistant—More Than Just Q&A
The RNA-Factory platform integrates advanced LLM technology, leveraging the powerful automation capabilities of large language models to further lower the barriers and complexity for users in understanding models, selecting models, and analyzing outputs.
RAG System
Despite their vast knowledge reserves, large language models (LLMs) suffer from significant hallucination when applied directly to the specialised field of RNA analysis. Furthermore, an LLM with no knowledge of the platform's specific context may answer platform-related questions ineffectively.
What is RAG?
Retrieval-Augmented Generation (RAG) enhances large language models by integrating retrieval mechanisms with generative processes. It first retrieves relevant information from external knowledge sources, such as databases or document collections, then conditions the generator on this retrieved context to produce informed and accurate responses. This architecture mitigates hallucination and improves factuality, particularly in knowledge-intensive tasks. RAG represents a hybrid approach, combining the strengths of parametric memory with non-parametric retrieval, thereby advancing the robustness and applicability of generative AI systems in real-world scenarios.
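The core RAG pattern, retrieve then condition, fits in a short sketch; the embedding function and LLM call below are stubs, and RNA-Factory's actual vector store, chunking, and prompt templates are not shown.

```python
# Sketch: embed the query, retrieve the most similar knowledge-base entries
# by cosine similarity, and prepend them to the prompt for the LLM.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real sentence-embedding model (deterministic stub).
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

docs = [
    "BPfold integrates a base pair motif energy library...",
    "UFold encodes sequences as 17-channel contact images...",
]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list:
    sims = doc_vecs @ embed(query)   # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

query = "How does BPfold use thermodynamics?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQ: {query}\nA:"
# response = llm.generate(prompt)   # hand off to the chat model (stub)
print(prompt)
```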
The RAG system effectively mitigates model hallucination. We built a knowledge base covering the RNA analysis models on the platform and connected it to the LLM, enabling the model to answer user queries grounded in the platform's actual context rather than inventing information.
Agent for RNA Analysis
Although a chatbot can help users understand model architectures, users still need to select and invoke the appropriate models themselves, and then analyse and compare the outputs. This poses a real challenge for researchers unfamiliar with the wide range of available RNA analysis methods.
What is an Agent?
Agent technology refers to the development of autonomous software entities capable of perceiving their environment, making decisions, and executing actions to achieve designated goals. These intelligent agents operate without direct human intervention, often leveraging advancements in artificial intelligence, particularly in reasoning, planning, and machine learning. A multi-agent system comprises multiple interacting agents, enabling the solving of complex, distributed problems that are beyond the capability of a single agent. This paradigm provides a powerful framework for modeling and implementing sophisticated, adaptive, and decentralized computational systems.
To make it easier for users to select models and analyse their outputs, we have designed and integrated a highly automated agent workflow into the platform. This allows the LLM to autonomously select and invoke one or more suitable models for analysis based on user requirements and automatically compare and analyse the results provided by these models.
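The workflow reduces to a select-invoke-summarize loop, sketched below with placeholder tools and placeholder LLM steps; none of these names reflect RNA-Factory's real interface.

```python
# Sketch: the LLM picks one or more registered model tools for a request,
# the platform runs them, and the LLM compares/summarizes the outputs.
TOOLS = {
    "secondary_structure": lambda seq: f"dot-bracket for {seq}",
    "protein_rna_interaction": lambda seq: f"interaction score for {seq}",
}

def llm_choose(request: str) -> list:
    # Placeholder for the LLM's tool-selection step (keyword toy logic).
    words = request.lower()
    return [n for n in TOOLS if any(w in words for w in n.split("_"))]

def llm_summarize(results: dict) -> str:
    # Placeholder for the LLM's comparison/analysis step.
    return "; ".join(f"{k}: {v}" for k, v in results.items())

def run_agent(request: str, seq: str) -> str:
    chosen = llm_choose(request)                            # 1. select
    results = {name: TOOLS[name](seq) for name in chosen}   # 2. invoke
    return llm_summarize(results)                           # 3. compare

print(run_agent("Predict the structure of my RNA", "GGGAAACCC"))
```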
Together as One
RNA-Factory is not the result of PekingHSC's dry lab department working in isolation, but rather a product of extensive collaboration and communication both within and between teams.
From and for Wet Lab
The development of RNA-Factory was inspired by the collaboration between the dry lab and wet lab teams at PekingHSC 2025. During this process, we observed that traditional biological researchers often cannot independently select and use advanced open-source models for data analysis, nor interpret the results these models generate. This is a significant challenge for teams without dedicated data analysts.
During the initial development phase, we were directly inspired by the needs of wet lab researchers. After the platform went live, we actively encouraged both internal and external traditional wet lab researchers to independently analyse data using RNA-Factory. This initiative yielded positive feedback, which we used to continuously improve and upgrade the platform. For example, the idea of integrating an AI assistant into the platform came from wet lab researchers and was ultimately implemented.
Powered by Dry Lab
In addition to actively encouraging wet lab researchers to explore RNA-Factory, we also encourage dry lab researchers to integrate the analytical tools within RNA-Factory into their existing data analysis workflows. For instance, we successfully incorporated a deep learning-based RNA secondary structure prediction model into our RNA-small molecule molecular dynamics analysis workflow, with positive results.
Support from HP and Other Teams
The HP team at PekingHSC 2025 also played an active role in the development of RNA-Factory. To ensure comprehensive functionality, we aimed to incorporate as broad a range of models as possible into the platform. The HZAU-China 2025 team, likewise dedicated to RNA research, had developed Mol2Aptamer, a nucleic acid sequence design model for small-molecule targets, which filled a critical gap in the platform's capabilities in this domain. During subsequent platform development, we collaborated further on data organization, model integration, and platform interface design. The HZAU-China team primarily handled the compilation of RNA-related model documentation and the design and illustration of the platform logo.
TODO
Although the current design of RNA-Factory is largely complete, we have the following improvement plans in place and are working on them progressively:
Enhanced RAG System: Create and integrate a database of iGEM RNA-related projects into the RAG model. This will enable the model to provide recommendations for biological components and project organisation workflows based on user requirements.
Simplified Deployment Solution: although the platform manages the runtime environments of all integrated software in the background, local installation still requires basic Python skills, leaving a certain barrier in place. We plan to lower this barrier further by designing and releasing a fully automated local installation and deployment process.
Cloud Computing Services: Due to the computational resource requirements of the software, the current project is hosted on PekingHSC's on-campus computing servers. Consequently, off-campus users experience limitations compared to on-campus users. We plan to deploy an external version on third-party servers to meet the needs of off-campus researchers and competition teams. However, this initiative has not progressed due to a lack of funds for server rental and LLM API purchases.