Design of DNA Origami
1. Choice of basic structure
Originally proposed by Rothemund in 2006, the rectangular structure of DNA origami has been utilized in numerous studies and proven to be stable (1-3). It's highly programmable (4), allowing for modifications such as adding functional elements as extensions of staples, attachments or cargos (Figure 1).
Figure 1. Illustration of the basic structure of DNA origami (DO). The original structure is assembled from the M13mp18 ssDNA scaffold strand and 192 short ssDNA staples. The arrows indicate the 192 different staples.
2. Addition of S-PAM-cap and PAM-rich DNA strands
To knock out the drug-resistance gene of the pathogen, sgRNA/Cas9 must be loaded onto the origami. To recruit sgRNA/Cas9 more effectively, we introduced capturing strands (S-PAM-cap) for the PAM-rich strands and sgRNAL. The Cas9 protospacer adjacent motif (PAM) is a short sequence that recruits Cas9 (5). To increase the chance of loading sgRNA/Cas9, we introduced PAM-rich strands containing multiple copies of the PAM sequence; each consists of a simple staple sequence with PAM sequences appended directly to it.
The S-PAM-cap sequences were designed after a literature search. Once the sgRNA/Cas9 complex has been loaded onto the DNA origami as described above, an intracellular mechanism is needed to separate it from the delivery complex, yielding an "activated" sgRNA/Cas9 that can cleave the downstream target gene. We identified RNase H for this purpose: it is reported to be essential for survival in Staphylococcus aureus and cleaves the RNA strand within DNA-RNA hybrid regions, which is what we need to disconnect sgRNAL from the DNA origami at the linking site (6). We therefore examined the active sites and catalytic mechanism of RNase H. Highly conserved residues of the enzyme interact with the grooves of the DNA-RNA hybrid (7), so both the DNA and the RNA strand are recognized. Unlike Cas9, which requires a specific PAM sequence, RNase H does not require a specific substrate sequence for cleavage, so the base composition and order at the cleavage site are not critical. Any hybrid substrate can in principle be cut; what differs is the catalytic rate and efficiency. The rate depends mainly on the overall shape of the substrate: the better it fits the enzyme's binding groove, the more readily it is cleaved. In our FoCas design, the S-PAM-cap sequences, derived from previously published work, were chosen to present a structure that RNase H can recognize.
Besides the design of the staple sequences themselves, we also considered where to place the S-PAM-cap and PAM-rich DNA strands. Based on preliminary estimates, the diameter of the Cas9 protein is about one fourth of the width of the rectangular DNA origami. To leave space for shielding sgRNA/Cas9 from the complex wound environment, we excluded staples near the edges. To balance available space against loading efficiency, we finally settled on 6 staples arranged in the center of each DNA origami (Figure 2).
Figure 2. Illustration of loading structure of DNA origami (DOPAM). The arrows indicate 192 different staples. The red arrows indicate the S-PAM-cap strands, which are located in the middle of rectangular DNA origami plane.
3. Addition of S-G4
Reactive oxygen species (ROS) can damage bacterial membranes (8) and create "pores", and the G4/hemin complex facilitates the controlled release of ROS, perforating the bacterial membrane without harming other cells and tissues. The detailed rationale for choosing G4/hemin as the membrane permeabilization component can be found in the description.
To integrate G4 into the DNA origami, we selected specific staple strands and appended the G4 array sequence GGGTAGGGCGGGTTGGG to their 3' ends; the modified staples are called S-G4 (Figure 3). For G4/hemin, there are no standardized rules for the density, number, and location of loading sites or for the order of assembly across different forms of DNA origami. General principles for placing functional staple strands nevertheless advise against locations near the edge of the origami, to avoid structural errors (9). To decide whether the G4 sequence should be attached to the 3' end or the 5' end, so that the extension points in the desired direction, we referred to data from DNA-nanotube models (10). Crowding of G4/hemin has been shown to lower its catalytic efficiency (11), so the density of G4/hemin must be balanced against its quantity. Finally, to refine the synthesis protocol, we decided to assemble G4 with the DNA origami first and add hemin afterwards, instead of pre-assembling G4/hemin. One reason is that the G4/hemin DNAzyme is relatively unstable compared with native enzymes when exposed to the solvent (12). In addition, the stability of the G4 structure was shown to be affected by the addition of hemin, whereas G4 alone remains stable until hemin is added (13). After analyzing the characteristics of G4/hemin and its interaction with DNA origami, we designed a new DNA origami carrying 136 copies of G4 (Figure 3).
Figure 3. Illustration of loading and G4 sites on the DNA origami (DOPAMG). The arrows indicate 192 different staples. The red arrows indicate the S-PAM-cap strands (a total of 6 sites). The blue dots are G4 sequences added at 3' ends of staples (a total of 136 sites).
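The S-G4 construction in this section amounts to appending the G4 array sequence to the 3' end of a staple written 5'→3'. The following is a minimal sketch of that step, not the script used for the actual design. The example staple is hypothetical (the real staples are in the supporting tables), and the motif check uses a commonly used consensus pattern for putative G-quadruplexes (four G-tracts separated by loops of 1 to 7 nucleotides), which is an assumption and not a rule stated in this document.

```python
import re

G4_SEQ = "GGGTAGGGCGGGTTGGG"  # G4 array sequence given in the text

# Assumed consensus pattern for a putative G-quadruplex:
# four G-tracts (>= 3 G each) separated by loops of 1 to 7 nucleotides.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def make_s_g4(staple: str) -> str:
    """Return the staple (written 5'->3') with the G4 sequence at its 3' end."""
    return staple.upper() + G4_SEQ


def has_g4_motif(seq: str) -> bool:
    """True if the sequence contains a match to the assumed G4 pattern."""
    return G4_PATTERN.search(seq.upper()) is not None


# Hypothetical staple, for illustration only (not from the supporting tables).
example_staple = "ACGTTAGCATCGATCGGATCCATAGCTAGCTA"
s_g4 = make_s_g4(example_staple)

print(s_g4)
print("G4 motif in plain staple:", has_g4_motif(example_staple))
print("G4 motif in S-G4 staple:", has_g4_motif(s_g4))
```

A check of this kind is also useful in the other direction: scanning unmodified staples for accidental G-rich stretches before ordering them.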
4. Addition of aptamers
To enhance the membrane disruption efficiency of DNA origami while minimizing damage to human cells, we added aptamers to the sides of the DNA origami. The membrane protein PBP2a was selected as the aptamer target.
After screening and optimizing aptamer sequences targeting MRSA (see Aptamer Screening and Optimization), we identified a suitable aptamer (sequence: CCATCCACACTCCGCAAGGGTGCCCCGGGGGGCTGTTCAGCGTGGTGGTGGGATGCCGTGTTGGTCCTTAGTCTCCGTCGTCGGCTGCCTCTACAT). However, the model organism used for experimental validation is Escherichia coli MG1655, chosen as an alternative for biosafety reasons, so we employed an aptamer sequence that has been validated in the literature to specifically target Escherichia coli (sequence: CATATCCGCGTCGCTGCGCTCAGACCCACCACCACGCACC) (14). To connect the aptamer to the DNA origami through base pairing, we added a linker to the 3' end of the aptamer (C-APT) that is complementary to a capture sequence on the origami (linker sequence: TTTTTCGCTTATTATTATTATTATTA). We then modified 12 staple strands on the DNA origami to carry, at their 5' ends, a sequence complementary to the aptamer linker (Apt-cap; sequence: TAATAATAATAATAAGCGTTTTT), allowing them to pair with the aptamer (Figure 4). Because aptamers have a more complex structure than linear DNA, they are more likely to interfere with each other if placed in the middle of the DNA origami rectangle. Aptamers also bind their target more effectively when they are not obstructed by other functional staples. We therefore positioned the aptamers along the short sides of the DNA origami. For the placement pattern, we followed the work of Li et al. and added two additional adjacent aptamers in the middle of the short side to ensure efficient targeting (15).
Figure 4. Illustration of loading and aptamer sites on the DNA origami (DOAPAM). The arrows indicate 204 different staples. The red arrows indicate the S-PAM-cap strands (a total of 6 sites). The purple arrows indicate the S-Apt-cap (a total of 12 sites), which capture C-APT.
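The pairing between the aptamer linker and the Apt-cap sequence can be checked computationally. The sketch below takes the two sequences quoted in this section, assumes both are written 5'→3', and finds the longest stretch of Apt-cap that is exactly complementary to the linker. The helper names are ours, and the check covers Watson-Crick pairing only; it says nothing about melting temperature or secondary structure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous substring shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


LINKER = "TTTTTCGCTTATTATTATTATTATTA"   # linker on the 3' end of the aptamer
APT_CAP = "TAATAATAATAATAAGCGTTTTT"     # capture sequence on the staple 5' end

# A region of APT_CAP pairs with LINKER if it equals part of the linker's
# reverse complement.
paired_on_cap = longest_common_substring(APT_CAP, reverse_complement(LINKER))

print("Paired region on Apt-cap:", paired_on_cap)
print("Duplex length (bp):", len(paired_on_cap))
```

The poly-T stretches at the ends of both strands do not pair with each other (T does not pair with T), which is consistent with them acting as flexible spacers.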
5. Addition of disulfide bonds
To enhance the stability of the sgRNA/Cas9 complex during application, we introduced a new staple, referred to as S-Lock, that rolls the rectangular DNA origami into a cylinder. Previous studies have demonstrated this rectangle-to-cylinder folding, for example in a thrombin-loaded origami cylinder (16) and a tissue plasminogen activator-loaded origami cylinder (17). Our cargo, the Cas9 protein, is similar to thrombin and tissue plasminogen activator in that all three contain regions of high positive charge density. Yin et al. also noted that the negatively charged DNA backbone repels negatively charged molecules (17). We therefore incorporated a DNA strand that is complementary to the frame at both ends of the origami, aiming to fold the rectangular structure into a cylindrical form (Figure 5). Additionally, we introduced thiol groups into each strand so that adjacent S-Lock structures can form disulfide bonds, further stabilizing the overall structure. To make use of the otherwise unused staples on the long sides, we placed disulfide bonds in the middle of 8 staples on each side. This arrangement strengthens the bond between the two sides and facilitates the transformation of the origami rectangle into a tubule. Yin et al. used a similar number of staples for side-to-side binding, which demonstrated an appropriate density of functional staples (18). Furthermore, the length of the staple segments that do not pair with the scaffold in their design is similar to ours. On this basis, we assume that adjacent staples will not interfere with each other and that the likelihood of incorrect pairing is minimal.
Figure 5. Illustration of loading, G4, aptamer and lock sites on the DNA origami (S-DOAPAMG). The arrows indicate 204 different staples. The red arrows indicate the S-PAM-cap strands (a total of 6 sites). The purple arrows indicate the S-Apt-cap (a total of 12 sites), which capture C-APT. The green arrows with orange dots in the middle are staples with disulfide bonds (a total of 16 sites); the orange dots indicate the disulfide bond formation sites. These staples are complementary to the scaffold on the other side so that the two sides will meet and bind.
Supporting tables: Lists of DNA sequences for DNA origami.
Note: Supporting tables will be provided in PDF format.
References
1. Rothemund PWK. Folding DNA to create nanoscale shapes and patterns. Nature. 2006 Mar;440(7082):297–302.
2. Li Z, Wang L, Yan H, Liu Y. Effect of DNA Hairpin Loops on the Twist of Planar DNA Origami Tiles. Langmuir. 2012 Jan 31;28(4):1959–65.
3. Yu L, Xu Y, Al-Amin M, Jiang S, Sample M, Prasad A, et al. CytoDirect: A Nucleic Acid Nanodevice for Specific and Efficient Delivery of Functional Payloads to the Cytoplasm. J Am Chem Soc. 2023 Dec 20;145(50):27336–47.
4. Fu J, Li T. Spatial Organization of Enzyme Cascade on a DNA Origami Nanostructure. Methods Mol Biol. 2017;1500:153–64.
5. Karvelis T, Gasiunas G, Siksnys V. Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: A brief overview. Methods. 2017 May 15;121–122:3–8.
6. Hang T, Zhang X, Wu M, Wang C, Ling S, Xu L, et al. Structural insights into a novel functional dimer of Staphylococcus aureus RNase HII. Biochem Biophys Res Commun. 2018 Sept 10;503(3):1207–13.
7. Hyjek M, Figiel M, Nowotny M. RNases H: Structure and mechanism. DNA Repair (Amst). 2019 Dec;84:102672.
8. Xie W, Zhang S, Pan F, Chen S, Zhong L, Wang J, et al. Nanomaterial-based ROS-mediated strategies for combating bacteria and biofilms. Journal of Materials Research. 2021 Feb 1;36(4):822–45.
9. Zhan P, Peil A, Jiang Q, Wang D, Mousavi S, Xiong Q, et al. Recent Advances in DNA Origami-Engineered Nanomaterials and Applications. Chem Rev. 2023 Apr 12;123(7):3976–4050.
10. Berengut JF, Berengut JC, Doye JPK, Prešern D, Kawamoto A, Ruan J, et al. Design and synthesis of pleated DNA origami nanotubes with adjustable diameters. Nucleic Acids Res [Internet]. 2019 Dec 16 [cited 2025 Sept 10];47(22):11963–75. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145641/
11. Yang B, Wang R, Li W, Wang J, Liu H. On-Origami Molecular Crowding Control of G-Quadruplex DNAzymes. Small Methods [Internet]. 2025 [cited 2025 Sept 10];9(6):2401401. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1002/smtd.202401401
12. Monte Carlo AR III, Fu J. Inactivation Kinetics of G-Quadruplex/Hemin Complex and Optimization for More Reliable Catalysis. ChemPlusChem [Internet]. 2022 [cited 2025 Oct 3]. Available from: https://chemistry-europe.onlinelibrary.wiley.com/doi/full/10.1002/cplu.202200090
13. Ghahremani Nasab M, Hassani L, Mohammadi Nejad S, Norouzi D. Interaction of hemin with quadruplex DNA. J Biol Phys [Internet]. 2017 Mar [cited 2025 Sept 10];43(1):5–14. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5323342/
14. Mela I, Vallejo‐Ramirez PP, Makarchuk S, Christie G, Bailey D, Henderson RM, et al. DNA Nanostructures for Targeted Antimicrobial Delivery. Angew Chem Int Ed. 2020 July 27;59(31):12698–702.
15. Li S, Jiang Q, Liu S, Zhang Y, Tian Y, Song C, et al. A DNA nanorobot functions as a cancer therapeutic in response to a molecular trigger in vivo. Nat Biotechnol. 2018 Mar;36(3):258–64.
16. Li S, Jiang Q, Liu S, Zhang Y, Tian Y, Song C, et al. A DNA nanorobot functions as a cancer therapeutic in response to a molecular trigger in vivo. Nat Biotechnol. 2018 Mar;36(3):258–64.
17. Yin J, Wang S, Wang J, Zhang Y, Fan C, Chao J, et al. An intelligent DNA nanodevice for precision thrombolysis. Nat Mater. 2024 June;23(6):854–62.
18. Yin J, Wang S, Wang J, Zhang Y, Fan C, Chao J, et al. An intelligent DNA nanodevice for precision thrombolysis. Nat Mater [Internet]. 2024 [cited 2025 Oct 3]. Available from: https://www.nature.com/articles/s41563-024-01826-y#ref-CR22
Aptamer Screening and Optimization
Our research team focuses on the aptamer selection for Methicillin-Resistant Staphylococcus aureus (MRSA) receptors, aiming to develop a universal, efficient, and transferable technical workflow. This workflow is not only designed to meet the current aptamer screening needs for MRSA receptors but also to flexibly adapt to receptor targets of other bacterial strains, thereby providing a reusable technical framework for aptamer research related to microorganisms.
What is an aptamer?
An aptamer is a short-chain nucleic acid molecule isolated from a nucleic acid library via in vitro SELEX (Systematic Evolution of Ligands by Exponential Enrichment) technology. It can bind to target molecules (such as proteins, small molecules, etc.) with high specificity and high affinity. Featuring no immunogenicity and easy modifiability, aptamers are widely used in the fields of biomedicine, detection, and drug development, and are known as "chemical antibodies".
1. Core Model Selection: Introduction and Application of Aptatrans
1.1 Core Mechanism of Aptatrans
We use the Aptatrans model [1], which is based on the Transformer architecture. Its core advantage is efficient screening through Apta-MCTS, an aptamer sequence generation algorithm driven by Monte Carlo Tree Search (MCTS):
1.1.1 Introduction to the Aptatrans Architecture
Step 1: Fragmentation of long molecular chains
Both aptamers and proteins exist as long molecular chains, which cannot be directly "processed" by the model. Thus, they first need to be fragmented into short segments (referred to as "tokens", analogous to words). The fragmentation methods for the two differ to align with their respective characteristics:
Aptamers: First, thymine (T) in DNA is replaced with uracil (U), so that all aptamers are uniformly treated as RNA. The "k-mer algorithm" is then applied, which works like a pair of "small scissors" of fixed length that slides along the chain one nucleotide at a time. For instance, with k = 3, the sequence "GGC GGA GAA" is fragmented into overlapping segments such as "GGC", "GCG", and "CGG". This approach preserves the key local information of aptamers, as their binding to proteins often relies on local structures.
Proteins: The "FCS (Frequent Consecutive Subsequence) mining algorithm" is employed. First, the frequency of short segments composed of 1–3 amino acids in all protein sequences is counted; the infrequent "rare segments" are removed, and the high-frequency segments are retained to construct a "frequent subsequence vocabulary". This vocabulary is then used to fragment new protein sequences. For example, the sequence "MSRLDKSKVI" may be fragmented into "MSR", "LDK", "SK", etc., enabling accurate capture of the functional key segments in proteins.
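The two fragmentation schemes above can be sketched in a few lines. This is a simplified illustration, not the Aptatrans implementation: the k-mer tokenizer uses an overlapping sliding window, and the protein tokenizer counts 1 to 3 residue segments, keeps the frequent ones, and then splits a sequence by greedy longest match. The greedy matching rule, the `min_count` cutoff, and the small vocabulary used in the example are all assumptions made for illustration.

```python
from collections import Counter


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Treat the aptamer as RNA (T -> U) and cut it into overlapping k-mers."""
    rna = seq.upper().replace(" ", "").replace("T", "U")
    return [rna[i:i + k] for i in range(len(rna) - k + 1)]


def frequent_subsequences(proteins, max_len=3, min_count=2):
    """Count all segments of 1..max_len residues and keep the frequent ones."""
    counts = Counter()
    for protein in proteins:
        for n in range(1, max_len + 1):
            for i in range(len(protein) - n + 1):
                counts[protein[i:i + n]] += 1
    return {segment for segment, c in counts.items() if c >= min_count}


def greedy_tokenize(seq, vocab, max_len=3):
    """Split seq by greedy longest match; single residues are the fallback."""
    tokens, i = [], 0
    while i < len(seq):
        for n in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + n]
            if n == 1 or piece in vocab:
                tokens.append(piece)
                i += n
                break
    return tokens


# Aptamer example from the text.
print(kmer_tokens("GGC GGA GAA"))

# Protein example from the text, with a hypothetical hand-written vocabulary.
print(greedy_tokenize("MSRLDKSKVI", {"MSR", "LDK", "SK"}))

# The same sequence with a vocabulary mined from a hypothetical toy corpus.
toy_corpus = ["MSRLDKSKVI", "AMSRLDKG", "SKSKMSR"]
print(greedy_tokenize("MSRLDKSKVI", frequent_subsequences(toy_corpus)))
```

With the hand-written vocabulary the protein is split into "MSR", "LDK", "SK" and then single residues for the remainder, matching the example in the text.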
Step 2: Encoding of short segments
The model needs to acquire information about each segment, such as its position in the long chain and its relationships with other segments. This step is accomplished by two "dedicated encoders" (one for processing aptamer segments and the other for protein segments). The core of these encoders is the "Transformer" technology, and they have undergone prior "foundation learning" (pre-training).
Step 3: Calculation of the "binding potential" of segments (interaction computation)
With the encoded segments, the next step is to calculate the "binding probability" between aptamer and protein segments, which is carried out in two steps:
Calculate the similarity scores between the vectors of each aptamer segment and protein segment, and generate an interaction matrix.
Extract features using convolution blocks.
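As a rough illustration of these two sub-steps, the sketch below builds an interaction matrix from dot products between aptamer-segment and protein-segment vectors and then slides one small kernel over it. The vectors and the kernel are toy values chosen by us. In the real model the vectors come from the pre-trained encoders and the kernels are learned, so this shows only the shape of the computation.

```python
def dot(u, v):
    """Dot product, used here as the similarity score between two vectors."""
    return sum(a * b for a, b in zip(u, v))


def interaction_matrix(aptamer_vecs, protein_vecs):
    """Rows: aptamer segments. Columns: protein segments."""
    return [[dot(a, p) for p in protein_vecs] for a in aptamer_vecs]


def conv2d_valid(matrix, kernel):
    """Slide a kernel over the matrix without padding (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    return [
        [
            sum(matrix[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy segment vectors (hypothetical; real ones come from the encoders).
aptamer_vecs = [[1, 0], [0, 1], [1, 1]]   # 3 aptamer segments
protein_vecs = [[1, 2], [3, 4]]           # 2 protein segments

matrix = interaction_matrix(aptamer_vecs, protein_vecs)
features = conv2d_valid(matrix, [[1, 0], [0, 1]])  # toy 2x2 kernel

print(matrix)    # 3 x 2 interaction matrix
print(features)  # 2 x 1 feature map
```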
Step 4: Generation of "binding results" (binding determination)
The "core information vector" obtained in Step 3 is input into a "fully connected layer" (a simple computation module) to generate a "binding score". If the score exceeds a preset threshold (e.g., 0.5), the aptamer and protein are determined to be "capable of binding"; if the score is below the threshold, they are determined to be "incapable of binding".
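The final decision step can be pictured as a weighted sum of the extracted features, squashed to a value between 0 and 1 and compared with the threshold. In the sketch below, the logistic (sigmoid) squashing function, the weights, and the feature values are assumptions for illustration; only the threshold of 0.5 is taken from the text.

```python
import math


def binding_score(features, weights, bias=0.0):
    """One fully connected unit followed by a logistic function (assumed)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def predicts_binding(score, threshold=0.5):
    """The pair is called 'capable of binding' if the score exceeds the threshold."""
    return score > threshold


# Hypothetical feature vector and weights.
score = binding_score(features=[0.8, 0.1, 0.5], weights=[2.0, -1.0, 1.5], bias=-0.5)
print(round(score, 3), predicts_binding(score))
```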
1.1.2 Advantages of the Model
Through a dual-cycle mechanism of "probabilistic search + iterative optimization", this algorithm directionally screens aptamer candidates with high affinity for target proteins (e.g., MRSA receptors) from a vast pool of potential nucleic acid sequences.
- Key Feature: It only requires the input of the amino acid sequence of the target protein to complete the prediction of the aptamer sequence, without relying on complex structural information, which significantly reduces the difficulty of target preprocessing.
- Existing Issue: Meanwhile, we recognize that the saturated docking method has limitations in handling macromolecular proteins, primarily manifested in its high demand for computational resources, making it difficult to efficiently perform the search of complex conformational spaces in macromolecular systems.
- Advantage: In contrast, the Aptatrans model, relying on its deep learning-based architectural design and sequence-driven prediction mechanism, exhibits stronger applicability in the aptamer prediction task for macromolecular proteins. It can reduce computational costs while maintaining favorable prediction performance.
What is a Transformer?
The Transformer is a deep learning model architecture proposed by Vaswani et al. in 2017, which is widely used in natural language processing (NLP) and sequence-to-sequence tasks. Its core innovation lies in the introduction of the Self-Attention Mechanism, which enables it to achieve excellent performance in processing sequence data.
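To make the self-attention idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python. For simplicity the token vectors themselves serve as queries, keys, and values; a real Transformer first applies learned projection matrices and uses several attention heads, which are omitted here. The token vectors are toy values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of numbers."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        all_weights.append(weights)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs, all_weights


# Three toy token vectors (hypothetical embeddings of sequence segments).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attn = self_attention(tokens, tokens, tokens)

for row in attn:
    print([round(w, 3) for w in row])
```

Each row of the printed matrix shows how strongly one segment attends to every segment in the sequence, and each row sums to 1.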
What is MCTS?
Monte Carlo Tree Search (MCTS) is an algorithm applied in decision-making processes, particularly suitable for game-related problems. It estimates the value of each potential action by simulating a large number of random games, thereby selecting the optimal next step. The core idea of the MCTS algorithm is to concentrate computational resources on the most promising branches to improve search efficiency.
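The sketch below shows the four classic MCTS phases (selection, expansion, simulation, backpropagation) applied to building a short RNA sequence one base at a time. It is a generic textbook-style MCTS, not the Apta-MCTS algorithm itself. The scoring function is a hypothetical stand-in (GC fraction) for a real binding predictor, and the UCB1 exploration constant, sequence length, and iteration count are arbitrary choices.

```python
import math
import random

ALPHABET = "ACGU"


class Node:
    def __init__(self, prefix, parent=None):
        self.prefix = prefix      # partial sequence represented by this node
        self.parent = parent
        self.children = {}        # base -> Node
        self.visits = 0
        self.total = 0.0          # sum of rollout rewards


def toy_score(seq):
    """Hypothetical reward: GC fraction, standing in for a binding predictor."""
    return sum(base in "GC" for base in seq) / len(seq)


def ucb1(child, parent_visits, c=1.4):
    """Mean reward plus an exploration bonus for rarely visited children."""
    return (child.total / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))


def mcts_generate(length, iterations, score_fn, rng):
    root = Node("")
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded, non-terminal nodes.
        while len(node.prefix) < length and len(node.children) == len(ALPHABET):
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: add one untried base if the node is not terminal.
        if len(node.prefix) < length:
            untried = [b for b in ALPHABET if b not in node.children]
            base = rng.choice(untried)
            node.children[base] = Node(node.prefix + base, parent=node)
            node = node.children[base]
        # Simulation: complete the sequence at random and score it.
        tail = "".join(rng.choice(ALPHABET)
                       for _ in range(length - len(node.prefix)))
        reward = score_fn(node.prefix + tail)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read-out: follow the most visited child at each level.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    tail = "".join(rng.choice(ALPHABET) for _ in range(length - len(node.prefix)))
    return node.prefix + tail


best = mcts_generate(length=8, iterations=500, score_fn=toy_score,
                     rng=random.Random(0))
print(best, toy_score(best))
```

Because rewards are fed back up the tree, branches that start with well-scoring prefixes are visited more often, which is the "concentrate on promising branches" behavior described above.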
1.2 Supplementary Information on Data Sources
Although Aptatrans is an innovative approach to aptamer prediction, its direct predictions cannot yet fully meet the accuracy requirements of this project, because the model's training dataset is limited in scale. We therefore also draw on validated sequences from professional aptamer databases (e.g., the Aptamer Database (AptaDB) [2], accessible at https://www.aptamer.org/), establishing a dual-source system of "model prediction + database screening" that further improves the reliability of candidate aptamers.
Figure 2. The homepage of the database.
The specific operations are as follows:
- Enter the target name (e.g., pbp2a) in the search box at the top right corner, select the "Aptamer" database on the right side, and click "Search".
- Locate the appropriate aptamer results based on the search outcomes and click on the corresponding entry.
- Scroll down the page to obtain the sequence and related literature.
- The aptamer we selected is shown in the figure below:
Procedure for converting an aptamer sequence into a PDB file:
- Access the website RNAComposer [3] via the link: http://rnacomposer.ibch.poznan.pl/.
- Click "Load example" and select "3", then replace the original sequence in the text box with the target aptamer sequence, and click "Compose".
- Wait until the prompt "Task completed" appears (as shown in the figure below); the PDB file can be downloaded via the button at the bottom left corner.
1.3 Validation of Model Versatility
In their published research, the development team of Aptatrans has verified the feasibility of the model through multiple sets of protein targets that are non-MRSA receptors (e.g., Escherichia coli surface proteins, streptococcal adhesins, etc.). This verification confirms that the model possesses aptamer prediction capability across different targets. This characteristic provides crucial technical support for the design of the "universal workflow" in this project, and validates the application potential of the model in aptamer screening for receptors of other bacterial strains.
2. Candidate Aptamer Validation System: Multi-Tool Cross-Validation
To ensure that the screened aptamers (including sequences predicted by Aptatrans and those screened from databases) possess high affinity and specificity, we have established a triple cross-validation system, which evaluates the binding effect between aptamers and target proteins through the collaboration of multiple tools:
Validation Method | Core Principle | Output Result |
---|---|---|
Aptatrans Built-in Evaluation System | The affinity prediction model trained within Aptatrans quantifies the binding potential between aptamers and target proteins as a probability value. | Aptamer-target protein binding probability (API probability, i.e. aptamer-protein interaction probability) |
ZDOCK Tool Validation [4] | ZDOCK is a computational tool for protein-protein docking. It predicts the most probable 3D conformations of two proteins during binding using a method based on Fast Fourier Transform (FFT). | Multiple optimal binding conformations |
HDOCK Tool Validation [5] | It predicts protein-protein and protein-deoxyribonucleic acid (DNA)/ribonucleic acid (RNA) docking based on a hybrid algorithm combining template-based modeling and ab initio free docking. | Binding probability, visual model of binding region |
Operational Principle
Each candidate aptamer must undergo validation using the three aforementioned methods. Only sequences that perform excellently in all three validations are retained for subsequent experimental procedures, thereby minimizing the false positive rate to the greatest extent.
How to Use ZDOCK
- Access the ZDOCK website via the link https://zdock.wenglab.org/ [4]; you will see the interface as shown below:
- Enter the PDB ID or upload the PDB file, input your email address, and click "Submit". After successful submission, simply wait and check your email.
- Once you receive the email, open the included link, download the files, and use MOE (Molecular Operating Environment) to view the results.
How to Use HDOCK
- Access the HDOCK website via the link http://hdock.phys.hust.edu.cn/ [5].
- Select the relevant PDB files and click "Submit".
- View the images of aptamer-protein binding and the binding probability online.
ZDOCK Visualization Display
3. Innovative Screening Mechanism: Dual-Path Optimization Strategy
Based on the integration of the aforementioned technical modules, we propose two aptamer screening and optimization mechanisms, forming a closed-loop process of "Prediction - Validation - Optimization":
3.1 Mechanism 1: Prediction-Validation Collaborative Screening
Workflow
- Input the target protein sequence (e.g., MRSA receptor) into Aptatrans to generate an initial candidate aptamer library.
- Perform preliminary filtering on the candidate library (e.g., API probability ≥ 0.8) to retain high-potential sequences.
- Conduct cross-validation using multiple tools and screen out sequences that meet the standards of all three validations.
- Integrate aptamer sequences from databases to finally determine experimental-grade candidate aptamers.
Advantage
Through "model prediction guidance + multi-tool validation check", the screening range is quickly narrowed, and screening efficiency is improved.
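The Mechanism 1 workflow can be expressed as a two-stage filter. The sketch below is illustrative only: the candidate names, scores, and pass/fail flags are invented, and reducing the ZDOCK and HDOCK results to a boolean is our simplification, since in practice these results are judged from docking scores and visual inspection. Only the API probability cutoff of 0.8 is taken from the workflow above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    api_probability: float   # score from the built-in evaluation
    zdock_pass: bool         # simplified outcome of ZDOCK validation
    hdock_pass: bool         # simplified outcome of HDOCK validation


def select_candidates(candidates, api_threshold=0.8):
    """Keep candidates that pass the API cutoff and both docking checks."""
    shortlisted = [c for c in candidates if c.api_probability >= api_threshold]
    return [c for c in shortlisted if c.zdock_pass and c.hdock_pass]


# Hypothetical candidate library, for illustration only.
library = [
    Candidate("apt_A", 0.91, True, True),
    Candidate("apt_B", 0.85, True, False),
    Candidate("apt_C", 0.72, True, True),
    Candidate("apt_D", 0.80, True, True),
]

selected = select_candidates(library)
print([c.name for c in selected])
```

Database-derived sequences would be passed through the same docking checks before being merged into the final experimental list.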
3.2 Mechanism 2: Directed Optimization of Low-Affinity Aptamers
Workflow
- For aptamers with low binding probability (e.g., API probability < 0.6) in validation but with potential structural advantages (e.g., presence of conserved binding motifs), perform manual modification of local sequences (e.g., adjusting stem-loop structures, optimizing base-complementary regions).
- Re-input the modified sequences into the Aptatrans built-in evaluation system, ZDOCK, and HDOCK for secondary validation.
- Conduct iterative optimization until the aptamer meets the preset validation standards, or eliminate the sequence if it is confirmed to have no optimization potential.
Advantage
It avoids discarding aptamers with potential value; instead, it improves their binding ability through directed modification, thereby reducing screening costs.
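The iterative loop of Mechanism 2 can be sketched as follows. Here `score_fn` stands in for the combined validation (built-in evaluation, ZDOCK, HDOCK) and `mutate_fn` for the manual local modification. Both toy functions, the target of 0.8, and the round limit are hypothetical placeholders chosen so the example runs on its own.

```python
def optimize_aptamer(seq, score_fn, mutate_fn, target=0.8, max_rounds=10):
    """Modify and re-score until the target is met or the rounds run out.

    Returns (sequence, score) on success, or (None, best_score) if the
    sequence is eliminated.
    """
    best_seq, best_score = seq, score_fn(seq)
    for _ in range(max_rounds):
        if best_score >= target:
            break
        candidate = mutate_fn(best_seq)
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:   # keep only improvements
            best_seq, best_score = candidate, candidate_score
    if best_score >= target:
        return best_seq, best_score
    return None, best_score


def toy_score(seq):
    """Hypothetical stand-in for secondary validation: GC fraction."""
    return sum(base in "GC" for base in seq) / len(seq)


def toy_mutate(seq):
    """Hypothetical stand-in for a manual edit: first A/U becomes G."""
    for i, base in enumerate(seq):
        if base not in "GC":
            return seq[:i] + "G" + seq[i + 1:]
    return seq


print(optimize_aptamer("AUAUGCGC", toy_score, toy_mutate))
print(optimize_aptamer("AUAUGCGC", toy_score, toy_mutate, max_rounds=1))
```

The second call illustrates elimination: with only one round allowed, the sequence does not reach the target and is dropped.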
References
1. Shin I, Kang K, Kim J, Sel S, Choi J, Lee JW, et al. AptaTrans: a deep neural network for predicting aptamer-protein interaction using pretrained encoders. BMC Bioinformatics. 2023 Nov 27;24(1):447.
2. Global Aptamer Database. Global Aptamer Database [Internet]. 2024 [cited 2025 Aug 22]. Available from: https://www.aptamer.org/
3. Institute of Bioorganic Chemistry, Polish Academy of Sciences. RNAComposer: Automated RNA Structure 3D Modeling Server [Internet]. [cited 2025 Sept 30]. Available from: http://rnacomposer.ibch.poznan.pl/
4. Weng Lab. ZDOCK: Protein-Protein Docking Server [Internet]. [cited 2025 Sept 30]. Available from: https://zdock.wenglab.org/
5. Huazhong University of Science and Technology. HDOCK: a web server for protein-protein and protein-DNA/RNA docking based on a hybrid strategy [Internet]. [cited 2025 Sept 30]. Available from: http://hdock.phys.hust.edu.cn/