Protein Modelling | TJI-Seoul

Protein Modeling

Our project began with an in silico design strategy: before moving into the lab, we used computers to model and predict how our proteins would behave. This computational step was essential because it helped us understand the structure of MHC Class I molecules, identify where peptides bind, and decide which amino acids might be changed to improve binding. As a result, we could make more informed decisions about which protein variants to test in the lab.

Understanding MHC Class I Structure

MHC Class I molecules play a central role in the immune system. They are found on the surface of nearly all nucleated human cells, where they act like tiny "display cases" for short pieces of protein called peptides. These peptides are presented to cytotoxic (CD8+) T cells, which check whether each peptide looks normal or dangerous, for example because it comes from a virus or a cancer cell.

The structure of MHC Class I can be broken down into several important parts. The α chain is the main component of the molecule and is divided into three regions called α1, α2, and α3. The most important feature of this chain is the peptide-binding groove, which is formed by the α1 and α2 regions. This groove is like a slot where peptides fit snugly, allowing them to be shown to the immune system. To keep the entire structure stable, the α chain pairs with another small protein called β2-microglobulin (B2M). Without B2M, the MHC molecule cannot fold correctly or stay on the cell surface.

For our project, we focused on one of the most common and clinically important versions of MHC Class I: HLA-A*02:01. This allele is found in many people around the world and is known to present not only viral peptides but also tumor-related peptides. Because of this, it has been studied extensively, and many of its structures have been solved and deposited in the Protein Data Bank. The availability of high-quality structural data made HLA-A*02:01 an ideal starting point for our modeling work.

Figure 1. Schematic illustration of an MHC Class I protein. The α1 and α2 domains form the peptide-binding cleft in which a tumor antigen peptide is held for presentation.

Choosing Our Targets

To begin our computational work, we first gathered sequence and structural information. We relied on resources such as UniProt, which provides protein sequences and functional information, and the RCSB Protein Data Bank (PDB), which archives experimentally determined 3D structures. For HLA-A*02:01, we found hundreds of available crystal structures, including some that already had both β2-microglobulin and bound peptides.
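
As a rough illustration of this retrieval step, the sketch below builds download URLs for the RCSB PDB file service and the UniProt REST API, and parses FASTA text into a dictionary. It is a minimal sketch, not our actual pipeline: the helper names are hypothetical, `fetch_text` assumes network access, and the accession P61769 (human β2-microglobulin) is used only as an example.

```python
import urllib.request


def rcsb_url(pdb_id: str) -> str:
    """Download URL for a PDB-format coordinate file from the RCSB file service."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def uniprot_fasta_url(accession: str) -> str:
    """Download URL for a UniProtKB entry in FASTA format (UniProt REST API)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_text(url: str) -> str:
    """Fetch a URL and decode the body as UTF-8 text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without the leading '>') to its joined sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may wrap over several lines
    return records
```

For example, `parse_fasta(fetch_text(uniprot_fasta_url("P61769")))` would return a one-entry dictionary keyed by the UniProt header line, and `fetch_text(rcsb_url("6Q3K"))` would return the coordinate file as text.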

We chose to focus on a well-characterized viral peptide from the cytomegalovirus (CMV) protein pp65, with the sequence NLVPMVATV, which is known to bind strongly to HLA-A*02:01. We selected the crystal structure with PDB ID 6Q3K, which contains HLA-A*02:01 paired with B2M and this peptide. This gave us a high-resolution view of how the peptide fits into the binding groove and allowed us to analyze the key interactions holding it in place.

Figure 2. Information on the protein of interest, retrieved from UniProt.

Binding Site Analysis

Using the 6Q3K structure, we examined the peptide-binding groove in detail and found that several amino acids in the groove made direct contact with the peptide. These included tyrosine, tryptophan, lysine, and glutamic acid residues, which formed hydrogen bonds and electrostatic interactions with the peptide or lined hydrophobic pockets that helped hold it in position. In total, we identified 23 key residues that appeared to play a role in stabilizing the peptide.
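
A common way to find groove residues that contact a bound peptide is a simple distance cutoff: any MHC residue with an atom within a few ångströms of a peptide atom counts as a contact. The sketch below assumes a 4.0 Å cutoff and already-parsed atom coordinates (in practice these would be read from the 6Q3K file with a structure parser such as Biopython's `PDBParser`). The cutoff, the `Atom` type, and the toy coordinates are illustrative assumptions, not the exact criteria we used.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    resnum: int   # residue number within its chain
    resname: str  # three-letter residue code, e.g. "TYR"
    xyz: tuple[float, float, float]  # Cartesian coordinates in ångströms


def contact_residues(mhc_atoms, peptide_atoms, cutoff=4.0):
    """Return sorted (resnum, resname) pairs for MHC residues that have at least
    one atom within `cutoff` Å of any peptide atom (brute-force pairwise check)."""
    contacts = set()
    for atom in mhc_atoms:
        if any(math.dist(atom.xyz, p.xyz) <= cutoff for p in peptide_atoms):
            contacts.add((atom.resnum, atom.resname))
    return sorted(contacts)


# Toy coordinates (made up, not taken from 6Q3K): one residue near the peptide, one far away.
mhc = [
    Atom(7, "TYR", (0.0, 0.0, 0.0)),
    Atom(7, "TYR", (1.5, 0.0, 0.0)),
    Atom(58, "GLU", (10.0, 0.0, 0.0)),
]
peptide = [Atom(1, "ASN", (3.0, 0.0, 0.0))]

print(contact_residues(mhc, peptide))  # [(7, 'TYR')]
```

The pairwise loop is fine for a single complex of a few thousand atoms; for many structures, a spatial index (for example Biopython's `NeighborSearch`) would avoid comparing every atom pair.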

Understanding which residues interact with the peptide was an important step because it gave us a list of potential sites to modify. Some residues are essential anchors that should not be changed, while others might be adjusted to improve binding to tumor-derived peptides.

Figure 3. Three-dimensional model of MHC Class I (HLA-A*02:01 allele).

Mutagenesis Strategy

After identifying the residues in the binding groove, we performed in silico alanine scanning. This involves virtually replacing each residue with alanine, one at a time, and estimating how much each substitution affects peptide binding. Because alanine's side chain is a single methyl group, the substitution removes a residue's side-chain interactions while leaving the backbone largely unchanged. If replacing a residue with alanine weakened binding significantly, that residue was considered essential; if the change had little effect, the residue became a candidate for further engineering.
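
The decision rule above can be made concrete with the convention ΔΔG = ΔG(mutant) - ΔG(wild type), where a positive value means the alanine substitution weakens binding. The sketch below generates alanine-scan labels for a list of contact residues and splits them with a 1.0 kcal/mol threshold. The threshold is an assumption (commonly used hot-spot cutoffs range from about 1 to 2 kcal/mol), the helper names are hypothetical, and the ΔΔG numbers are placeholders that would in practice come from whichever energy-scoring tool is used.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def alanine_scan_labels(residues):
    """Turn (resnum, resname) pairs into point-mutation labels such as 'Y7A',
    skipping residues that are already alanine."""
    return [f"{THREE_TO_ONE[name]}{num}A" for num, name in residues if name != "ALA"]


def classify(ddg_by_mutation, hotspot_cutoff=1.0):
    """Split mutations by predicted ddG = dG(mutant) - dG(wild type), in kcal/mol.
    ddG >= cutoff: binding clearly weakened, so treat the residue as essential.
    ddG <  cutoff: little effect, so keep the residue as an engineering candidate."""
    essential = [m for m, ddg in ddg_by_mutation.items() if ddg >= hotspot_cutoff]
    candidates = [m for m, ddg in ddg_by_mutation.items() if ddg < hotspot_cutoff]
    return essential, candidates


labels = alanine_scan_labels([(7, "TYR"), (66, "LYS"), (150, "ALA")])
print(labels)  # ['Y7A', 'K66A']

# Placeholder ddG values for illustration only; these are not our results.
print(classify({"Y7A": 2.1, "K66A": 0.3}))  # (['Y7A'], ['K66A'])
```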

We then used computational tools to estimate how each mutation might change the binding affinity between the peptide and HLA-A*02:01. By scoring and ranking these potential mutations, we were able to highlight specific positions in the groove where modifications might enhance peptide presentation. These predictions then served as the basis for designing new mutant constructs to be tested in the lab.
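
Scoring and ranking can be sketched in a few lines, again assuming the convention that a negative ΔΔG means a mutation is predicted to strengthen peptide binding. The hypothetical `rank_mutations` helper below excludes positions flagged as essential by the alanine scan, keeps only substitutions predicted to stabilize binding, and sorts them from most to least favorable. The mutation labels and values are placeholders, not our predictions.

```python
import re

# Point-mutation label: wild-type residue, position, new residue (e.g. "K66L").
LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def position(label):
    """Extract the residue number from a label such as 'K66L'."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return int(match.group(2))


def rank_mutations(ddg_by_mutation, essential_positions, max_ddg=0.0):
    """Drop mutations at essential positions, keep those predicted to strengthen
    binding (ddG below max_ddg), and sort the rest from most to least favorable."""
    kept = [
        (ddg, label)
        for label, ddg in ddg_by_mutation.items()
        if position(label) not in essential_positions and ddg < max_ddg
    ]
    return [label for ddg, label in sorted(kept)]


# Placeholder ddG values (kcal/mol) for illustration only.
predicted = {"K66L": -0.8, "K66R": 0.4, "Y7F": -1.2, "E63Q": -0.3}
print(rank_mutations(predicted, essential_positions={7}))  # ['K66L', 'E63Q']
```

Filtering essential positions before ranking keeps a strongly scored substitution at an anchor position (here the placeholder `Y7F`) from crowding out safer candidates.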

Docking with DiffDock

To simulate how peptides and MHC molecules interact, we used DiffDock, an AI-based molecular docking program. This tool allowed us to model the pp65 peptide inside the binding groove of HLA-A*02:01 and to rank the predicted binding poses by a confidence score. Unlike traditional docking programs, which search through many candidate poses and score each one, DiffDock uses a diffusion-based generative model to produce realistic binding poses, which can make it faster and often more accurate.

In our simulations, we compared wild-type HLA-A*02:01 with our in silico mutants to see whether the peptide was more or less likely to fit into the groove. This helped us judge which mutations might increase binding affinity and which might destabilize the complex. By combining the docking results with our alanine scanning, we narrowed the candidates down to a shortlist of promising mutants.
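
Because DiffDock ranks poses by a confidence score rather than reporting a binding energy, combining it with the earlier analysis needs an explicit rule. The sketch below assumes one such rule: a candidate from the alanine-scan and ΔΔG ranking stays on the shortlist only if the mean confidence of its top-k docked poses exceeds that of wild type by some margin. The helper names, `k`, the margin, and the scores are all hypothetical, and higher confidence is at best an indirect proxy for better binding.

```python
from statistics import mean


def top_k_mean(confidences, k=5):
    """Mean of the k highest confidence scores (higher = more confident pose).
    Raises StatisticsError if `confidences` is empty."""
    ranked = sorted(confidences, reverse=True)
    return mean(ranked[:k])


def docking_shortlist(wt_confidences, mutant_confidences, candidates, k=5, margin=0.0):
    """Keep candidate mutants whose top-k mean confidence beats wild type by > margin.
    Treating higher confidence as a better fit is an assumption, not an affinity estimate."""
    wt_score = top_k_mean(wt_confidences, k)
    return [
        name
        for name in candidates
        if name in mutant_confidences
        and top_k_mean(mutant_confidences[name], k) - wt_score > margin
    ]


# Placeholder confidence scores for illustration only.
wild_type = [0.5, 0.2, -0.1]
mutants = {"K66L": [0.9, 0.6, 0.1], "E63Q": [0.3, 0.0, -0.5]}
print(docking_shortlist(wild_type, mutants, candidates=["K66L", "E63Q"], k=3))  # ['K66L']
```

Averaging over several top poses, rather than using only the single best pose, makes the comparison less sensitive to one lucky sample from the generative model.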

Design-Build-Test-Learn in Action

Our computational modeling followed the design-build-test-learn cycle that underpins iGEM engineering work. First, we designed our project by selecting the HLA-A*02:01 allele and the pp65 peptide as our model system. Next, we built computational models of mutant proteins in which specific residues in the binding groove were altered. We then tested these models in silico, using alanine scanning and DiffDock to estimate changes in peptide binding. Finally, we learned from the results and prioritized the most promising variants for expression and purification in the wet lab.

By starting with an in silico phase, we were able to reduce guesswork and save time in the laboratory. Instead of testing random mutations, we focused on rational designs supported by computational evidence. This approach reflects the growing importance of AI-assisted protein engineering in synthetic biology and demonstrates how computational and experimental work can complement each other in advancing new ideas.