Molecular dynamics (abbreviated to MD) is a computational method that simulates the physical movements of atoms and molecules over time to understand their behavior and properties. In contrast to experimental procedures, molecular dynamics offer a more flexible, safe, and cost effective way of analyzing structures of molecules and can be applied to the areas of biomedical engineering. In our project we strive to study the structure of fusion protein and assess its stability and propensity to bind to guide RNA. With the help of computational tools such as AMBER, a suite of biomolecular simulation programs, we are able to simulate a nine step relaxation of the cas13a complex with a diphtheria toxin and glycoprotein 120 and observe its structural integrity.
An additional, and rather crucial part of our project here was the analysis of protein structure even prior to our relaxation steps. This includes a comparison of generated alphafold structures to similar, existing files from the Protein Data Bank (PDB). Please refer to our team’s engineering section for changes we made based on the structure of our synthetic fusion protein.
Our project inherently holds concerns regarding physical and structural design/integrity:
When creating an input structure for a simulation an input structure is a necessity. Using the plasmid sequences (from wet lab), AlphaFold Server was able to generate model structures of our fusion protein. AlphaFold Server is a web-service that can generate highly accurate biomolecular structure predictions containing proteins, DNA, RNA, ligands, ions, and also model chemical modifications for proteins and nucleic acids in one platform. As shown below is the fusion protein as generated by alphafold consisting of a three cargo system. The Cas13a, the DTA toxin, and the GP 120.
Figure 1: Fusion protein with labeled markers.
As mentioned above, having a force field to describe interactions between molecules in a system is an absolute must for an MD simulation. The force field chosen was ff14SB. Furthermore, the choice of an explicit water solvent model was chosen. Explicit models treat individual water molecules as distinct entities and are represented with their own atoms and interactions. For these experiments TIP3P water model was used.
In order to ascertain the stability of the fusion protein a relaxation was run. A relaxation can be thought of as a gradual adaptation to its environment and that is what is done here.
Topology files were generated using AMBERS tleap command and due to the choice of explicit water tleap was run twice: once to add water and determine the volume of the simulation system, and a second time after we use the volume to determine how many ions need to be added to give a specific salt concentration.
It was important to analyze the energy minimization results to ensure both the maximum force and potential energy are within reasonable thresholds.
Figure 2. Potential energy minimization.
Simulations are carried out in two stages: NVT (constant temperature) and NPT (constant pressure) equilibration, in order to attain thermal and mechanical equilibrium of the system. These are steps 2 and 3 of the relaxation process. The system temperature is progressively raised to the desired 300K to achieve stable conditions, and the system's stability is confirmed by monitoring the temperature, pressure, volume, and density curves over time.
Figure 3a,b. Figure of the potential energy from the second heating step as well as the temperature.
Figure 4a,b. Volume of the system over time and the density of the system over time for step 3.
Under the conditions of equilibrium, a long-term production simulation is conducted. In order to further analyze the intermolecular interactions and dynamic conformational changes.
Using VMD the key interactions of the protein system can be visualized through selection and rotation commands, identifying important residues and binding pockets that play significant roles during the simulation process.
Figure 5: Fusion protein surrounded by water molecules.
As stated in literature Cas13a relies on two regions for cleavage, they are named HEPN-I and HEPN-II.
Figure 6: Higher eukaryotes and prokaryotes nucleotide-binding site for Cas13a.
Within these domains lie two pairs of amino acids each that have been known to be highly conserved as shown below. Thus the distance over time between the pairs of histidine and arginine were measured. The RMSD of both HEPN regions was also measured compared to the original structure as a reference. Root mean square deviation is a fundamental metric for measuring structural deviation during the simulation process. By calculating the deviation of structures at different time points, it is possible to assess the stability of the system.
Figure 7: The two pairs of amino acids on each HEPN domain. HEPN-I contains the 1255th and 1260th which are Histidine and Arginine respectively. The histidine and arginine pairs for HEPN-II are 1936 and 1941 (Wang et al).
Figure 8a,b. Distance over time for both pairs of amino acids.
Figure 9a,b. Root mean square deviations of each back bone.
It is observed in figure 7 that both amino acid pair distances remain stable throughout the process of the simulation. Furthermore, as shown by the RMSD graphs the values range from 0-2. Generally if the value is small (under 1 or 2 angstroms), the match is pretty good and we call these "very similar" structures. If the value is higher, such as 3 or 4, it's hard to know if that means all of the selected atoms are about 3-4 angstroms from where they are in the reference, or if it means that many of the atoms match very closely (1-2), and that other atoms deviate more strongly (4-5, or maybe more). The goal of this specific RMSD is to compare how the simulated structure changes over time with respect to the experimental structure.
Overall we can conclude that the general stability of the Cas13a is maintained once the fusion protein is created.