Overview
In silico methods have become a cornerstone of modern drug discovery, offering a time- and cost-efficient alternative to traditional laboratory-based approaches. Two particularly valuable techniques are molecular dynamics (MD) simulations and molecular docking. MD simulations provide a dynamic, time-dependent view of molecular systems, allowing researchers to observe how proteins and ligands move and interact [1]. Conversely, molecular docking is a computational method used to predict the optimal binding orientation of a small molecule (ligand) to a larger target molecule (receptor) [2].
The binding of vitamin B12 (cobalamin) to intrinsic factor (IF) from human, rat, porcine (pig), bovine, and platypus was compared using MD and molecular docking by our team. Each non-human IF was predicted by AlphaFold and was equilibrated by molecular dynamics (MD) simulation. Representative structures were extracted from stable equilibrated structure, prepared, and docked with B12 using Glide SP.
The analysis demonstrated that bovine IF and human IF exhibited the highest geometric complementarity and stabilizing energies, indicating the most tightly packed, native-like complex formation with the ligand among all species studied. In contrast, the platypus structure, despite having the best Glide Score, showed significantly weaker packing, which suggests a more relaxed binding pose. The porcine (pig) structure was the weakest performer, with the lowest scores across all geometric and energetic stabilization metrics.
Materials & Equipment
1. Software and Databases
- RCSB Protein Data Bank (PDB ID: 2PMV) - Human IF crystal structure
- AlphaFold Protein Structure Database (entry AF-P17267-F1-v4, UniProt P17267) - Rat IF
- NCBI database - Protein sequences for bovine, platypus, and porcine IF
- AlphaFold server - Structure prediction for non-human IF proteins
- GROMACS 2025.3 - Molecular dynamics simulations
- Maestro 2025.3 - Molecular docking studies
2. Force Fields and Models
- AMBER14SB force field
- TIP3P water model
- Glide SP docking protocol
3. Computational Resources
- High-performance computing cluster for MD simulations
- Workstation with Maestro software for docking studies
Protocol
1. Structure Collection
- Obtain human IF from RCSB Protein Data Bank (PDB ID: 2PMV)
- Retrieve rat IF from AlphaFold Protein Structure Database (entry AF-P17267-F1-v4)
- Collect protein sequences for bovine, platypus, and porcine IF from NCBI
- Submit sequences to AlphaFold server for structure prediction
- Document sequences and predicted results in Table 1
2. Molecular Dynamics Simulations
- Prepare protein structures using AMBER14SB force field and TIP3P water model
- Perform steepest descent minimization for energy minimization
- Apply thermalization and equilibration under Nose-Hoover NVT ensemble
- Apply equilibration under Parrinello-Rahman NPT ensemble
- Run initial 20 ns NPT simulation for all four species
- Analyze RMSD to determine equilibrium status
- Extend simulations for bovine and platypus to 35 ns (bovine stabilized)
- Extend platypus simulation to 50 ns due to continued fluctuation
- Calculate RMSF to identify unstable regions
- Exclude N-terminal tail (residues 1-30) from RMSD calculations
- Select representative structures: 15 ns (rat), 40 ns (platypus), 12 ns (porcine), 30 ns (bovine)
3. Ligand Docking Simulation
- Load proteins and human protein complex into Maestro 2025.3
- Define binding pocket based on human IF complex with B12 (6 Å radius)
- Apply Protein Preparation Wizard to correct structural errors
- Optimize hydrogen bonds and assign appropriate force fields
- Structurally align all proteins with human protein complex
- Generate docking grid at active site for each protein
- Dock ligand into binding pocket using Glide SP protocol
- Compare docking results with human protein baseline
Results
Analysis of Ligand Binding Affinity
The docking study aimed to predict the binding affinity of a ligand to the IF protein across various species, with the human IF complex serving as the reference baseline. The results, summarized in Table 2, indicate significant species-specific differences in ligand binding. The docking score, a primary metric where a more negative value signifies a stronger binding affinity, reveals that the ligand is predicted to bind most strongly to the platypus protein (-5.256), followed closely by the bovine protein (-4.983). In contrast, the ligand exhibits a significantly weaker predicted affinity for the rat (-3.939) and porcine (pig) (-3.059) proteins.
However, a deeper analysis of additional energy metrics provides a more nuanced understanding of the binding. The Glide Emodel and Glide Eenergy values, which account for the total stability and packing of the complex, suggest that the human and bovine proteins actually form the most structurally sound complexes with the ligand. This indicates a superior geometric complementarity, a perfect physical fit between the ligand and the protein's binding pocket, and stronger stabilizing energies derived from favorable interactions.
The contrast is particularly evident with the platypus protein. Despite its top docking score, its lower Emodel suggests a less compact or more "relaxed" binding pose. This implies that while the initial interaction is highly favorable, the final complex may not be as stable or tightly packed as those formed with the human and bovine proteins.
Molecular Interactions and Structural Basis for Affinity
The observed differences in binding affinity are directly supported by the analysis of the 2D interaction diagrams and 3D docking poses, which reveal the specific molecular interactions. For example, the visual data for the bovine IF (Figure 6A) provides a clear basis for its strong affinity, showing the formation of four predicted hydrogen bonds with distances ranging from 2.0 to 2.1 Å. The corresponding 2D diagram (Figure 7A) confirms these stabilizing interactions with key residues such as Tyr 417, Ser 267, Ser 225, and Asp 222.
The platypus protein's binding pose (Figure 6B) is characterized by a more relaxed conformation, consistent with its lower Emodel score. While it does form two hydrogen bonds (2.0 Å each), its 2D diagram shows interactions with a different set of residues (Thr 168, Val 166, Lys 162, and Gln 204 (Figure 7B)), highlighting the structural differences in its binding pocket compared to the human and bovine proteins.
The low docking scores for the rat and porcine (pig) proteins are similarly explained by their visual data. The rat IF's pose (Figure 6C) and interaction diagram show only two hydrogen bonds (1.7 Å and 2.2 Å) and interactions with residues like Asp 51 and Ser 50 (Figure 7C). The porcine (pig) IF (Figure 6D) demonstrates the weakest affinity, with its binding pose stabilized by only two predicted hydrogen bonds (2.2 Å and 2.5 Å) and interactions with residues like Met 1 and Ser 173 (Figure 7B). The general lack of extensive, stabilizing interactions and distinct residue profiles in these two species directly accounts for their poor predicted binding.
Table 1. Predicted proteins from AlphaFold
Species | Sequence | NCBI web | Results from AlphaFold server |
---|---|---|---|
Bovine | MARAALQLLTLLWAATRTSTQTRSSCSVPSAEQPWVDGIQVIMENSVINSTSPNPSVLIAMNLAGAYNVE AQKLLTFDLMASDTADLTAGQLALTIMALTSSCRDPGDKVSTLRTQMENWTPSSLGSYASTFYGPSLAIL ALCQNNPETTLPVAARFAKTLLASSSPFDVDAGAVATLALTCMYNRIPVGSEEGYRALFAQLLKKIVEDI STRIRDNGLIGDVYSTGLAMQALSVTPERPNKQWDCQQTMDTVLDEIKEGKFQNSMSIAQILPSLKGKTY LDVPQVSCSPDQEVQPTQPNFPSPVPTSASNITVVYTISNQLRGVELLFNVTISVSVKKGSVLLVVLEEA QRRNPKFKFETRMTSWGLEVSSINNIAGNVNDKTYWQFLSGKTPLIEGVADYIPFNHEHITANFTQY | Link | Link |
Platypus | MSSSWVNPRFSQPVSEPSGNPKLLNNKAQAVPAAKQALVTGLQDIMENSVTPSSFVNPSILIAMNLADTE NTEAQKLLTEQILAKDTTEHTIGQLALDVLALASSCRHSGNRLSSLQKKMENWAPTSQTESKITFYQPSL AILALCLKSPEDILPLAARFAKTLLVNTSPFSVDTGAVATLALTCMYSKIPIGVEDGFGELFGQVLKEMV ENLSSRIQDNGLIGNIYSTGLAMQALSVSPVQPNRKWDCKKTMDTVLLEIEQEKFNNPMAIAQILPSLKG KTYLDVVHVTCCSDHEGKTISSSSSCPLPTSPSNITVHYTINNQLRGVELLFKATLAVSVRKGSVLLTVL EEAQRINSMFSFKYSVTSWGIFISSINNLTENANQRTYWQFLSGQTPLDQGVSFYVPFNNEHIMANFTQY | Link | Link |
Pig | MDSTGESGMARAALQLLTLLWAVAGTSTQTRSSCSVPSAEQPLVNGIQVLMEQSVTSSAFPNPSILIAMN LAGAYNTEAQELLTYKLMASNTSDLTTGQLALTIMALTSSCRDPGNRIAILQGQMENWAPPSLDTHASTF YEPSLGILTLCQNNPEKTLPLAARFAKTLLANSSPFNMDTGAMATLALTCMYNKIPVGSEEGYRALFSQV LRNTVENISMRIQDNGIIGNIYSTGLAMQALSVTPERPNKEWDCQKTMDTVLTEIKEGKFHNPMAIAQIL PSLKGKTYLDVPHVSCSPGHEVPPTLPNHPSPVPTPAPNITVIYTINNQLRGVELLFNETISVSVKRGSV LLIVLEEAQRKNPKFKFETTMTSWGPVVSSINNIAENVNHRTYWQFLSGQTPLNEGVADYIPFNHEHITA NFTQY | Link | Link |
Table 2. Glide Docking Scores and Energy Components of Ligand with Different Species' Proteins
Species | Docking score | Glide emodel | Glide internal | Glide energy |
---|---|---|---|---|
Platypus | -5.256 | -38.711 | 4.459 | -32.532 |
Human | -5.112 | -47.695 | 21.619 | -45.332 |
Bovine | -4.983 | -51.187 | 19.236 | -45.123 |
Rat | -3.939 | -38.418 | 14.009 | -37.513 |
Pig | -3.059 | -12.688 | 19.371 | -21.31 |

Figure 1. RMSD over time for each protein during the molecular dynamics (MD) simulations. RMSD describes how much the protein's shape changed from its starting position. A low, stable RMSD means the protein has settled into a good, final shape. The rat and porcine (pig) IF stabilized relatively quickly, within the first 20 ns of the simulation; the bovine IF took a bit longer, stabilizing around the 35 ns mark; the platypus IF stabilized after 20 ns.

Figure 2. Platypus protein's RMSD over the full 50 ns simulation
The constant fluctuation of the line shows that the protein's overall shape never fully settled into a single, stable conformation.

Figure 3. RMSF of platypus
Root Mean Square Fluctuation (RMSF) describes how much each individual part of the protein is wiggling. A high peak on the graph means that a specific part of the protein is very flexible. The analysis showed that the large peak at the beginning of the graph (residues 1-30) was the primary cause of the platypus protein's overall instability.

Figure 4. Binding pocket
The specific amino acid residues that are part of the pocket where the ligand (B12) will bind are shown in this figure. The different colors indicate the type of amino acid, for example, yellow for hydrophobic residues that repel water, and pink for polar residues that are attracted to water.

Figure 5. Workflow of drylab

Figure 6. 3D-predicted binding poses of the ligand. A. Bovine; B. Platypus; C. Rat; D. Porcine (pig)

Figure 7. 2D Interaction Diagrams of Ligand with Different Species' Proteins. This figure provides a two-dimensional visualization of the molecular interactions between the ligand and the intrinsic factor (IF) binding pocket for (A) Bovine, (B) Platypus, (C) Rat, and (D) Porcine (pig). The diagrams show key interactions such as hydrogen bonds and hydrophobic contacts, along with the specific amino acid residues involved in the binding for each species.
References
- Hollingsworth SA, Dror RO. Molecular Dynamics Simulation for All. Neuron. 2018;99(6):1129-1143.
- Meng EC, Shoichet BK, Kuntz ID. Automated docking with grid-based energy evaluation. Journal of Computational Chemistry. 1992;13(4):505-524.
- Data P. RCSB PDB - 2PMV: Crystal Structure of Human Intrinsic Factor- Cobalamin Complex at 2.6 A Resolution. Rcsb.org. Published 2020. https://www.rcsb.org/structure/2PMV
- UniProt. UniProt. Published 2025. Accessed October 1, 2025. https://www.uniprot.org/uniprotkb/P17267/entry
- AlphaFold Protein Structure Database. AlphaFold Protein Structure Database. Ebi.ac.uk. Published 2025. Accessed October 1, 2025. https://alphafold.ebi.ac.uk/entry/P17267
- M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E. Lindahl. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers SoftwareX (2015).
- S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl Tackling Exascale Software Challenges in Molecular Dynamics Simulations withGROMACS. In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale (2015).
- S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, E. Lindahl GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics (2013)
- B. Hess, C. Kutzner, D. van der Spoel, E. Lindahl GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput. (2008)
- D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, H. J. C. Berendsen GROMACS: Fast, Flexible and Free J. Comp. Chem. (2005)
- E. Lindahl, B. Hess, D. van der Spoel GROMACS 3.0: A package for molecular simulation and trajectory analysis J. Mol. Mod. (2001)
- H. J. C. Berendsen, D. van der Spoel and R. van Drunen GROMACS: A message-passing parallel molecular dynamics implementation Comp. Phys. Comm. (1995)