Glucagon-like peptide-1 (GLP-1) is a gut-derived incretin hormone released after meals that coordinates how the intestine, pancreas, brain, and other organs manage incoming nutrients (Ali Jazayeri et al., 2011). When functioning properly, GLP-1 levels rise promptly with food intake and then fall quickly, creating a brief, well-timed signal (Baggio and Drucker, 2007). The hormone's activity is short-lived, with a half-life of only 1.5-3 minutes (Williams, 2014). It is rapidly broken down by the enzyme DPP-4 or cleared by the kidneys, which ties GLP-1's effects closely to eating (Baggio and Drucker, 2007). Functionally, GLP-1 helps keep blood sugar in range by enhancing insulin release when glucose is high and by reducing glucagon, which otherwise pushes glucose upward (Müller et al., 2019). It also slows stomach emptying and reduces appetite, which blunts post-meal glucose spikes and supports weight control (Müller et al., 2019). These properties make GLP-1 important to study: understanding its biology can better explain gut-brain-pancreas communication and drive the development of better therapies for diabetes, obesity, and related conditions, which are serious public health concerns worldwide (Ali Jazayeri et al., 2011).
GLP-1 Structure
At iGEM at Notre Dame, we are interested in controlling GLP-1 production, concentration, and binding affinity in order to enhance its therapeutic efficacy, particularly in combination with a yogurt base. To better understand its production efficiency and binding affinity, we need to investigate codon usage and protein structural change, two intertwined biological properties. A change in a codon can change the amino acid that is translated; when it does, the structure of the encoded protein changes, which in turn affects the protein's function. Because several codons can encode the same amino acid, a codon change can also leave the protein sequence unchanged while still influencing how efficiently it is produced. While there are twenty standard amino acids in humans, we will focus on leucine, serine, glycine, and alanine, as they are the most abundant (Scherer, 2011). We will determine where to modify the amino acid chain by using homolog search and multiple sequence alignment. Restricting the analysis to four amino acids and using these two methods significantly reduces the time needed to find the locations where edits are most likely to improve expression or receptor binding, so that we can prioritize those sites for testing. This is why computational biology is essential: it narrows the search space, saves lab time, and helps us tune GLP-1 production, concentration, and binding affinity.
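The relationship between codon changes and amino acid changes can be made concrete with a small sketch. It builds the standard genetic code table and labels a codon substitution as synonymous, nonsynonymous, or nonsense. The function name `classify_codon_change` and the example codons are illustrative assumptions, not part of our pipeline.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Return the one-letter amino acid ('*' for stop) for a DNA or RNA codon."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_codon_change(old_codon, new_codon):
    """Label a codon substitution (hypothetical helper for illustration)."""
    old_aa = translate_codon(old_codon)
    new_aa = translate_codon(new_codon)
    if old_aa == new_aa:
        return "synonymous"      # same amino acid, protein sequence unchanged
    if new_aa == "*":
        return "nonsense"        # introduces a premature stop codon
    return "nonsynonymous"       # different amino acid


if __name__ == "__main__":
    # Example codons chosen only for illustration.
    for old, new in [("GCT", "GCC"), ("GCT", "GGT"), ("TGG", "TGA")]:
        print(old, "->", new, classify_codon_change(old, new))
```

Synonymous changes matter for codon usage and expression, while nonsynonymous changes are the ones that can alter structure and receptor binding.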
DNA, RNA, and Protein Formation
To collect the data needed to answer our research question, we obtained the amino acid sequence of GLP-1 and used this reference sequence as the query in a remote homolog search. We ran BLAST (Basic Local Alignment Search Tool) and HMMER against UniProt entries to collect homologs. We then removed duplicate sequences and computed basic summary statistics, including the number of sequences, the mean and median sequence length, and the species associated with each sequence. With this data in hand, we were prepared to perform multiple sequence alignment (MSA).
Data collection gave us a list of amino acid sequences related to GLP-1. The next goal was to align these sequences to find which positions are evolutionarily stable and which are evolutionarily unstable. To do this, we performed MSA using SALIGN, MUSCLE, and Clustal Omega, and downloaded the resulting alignments for further analysis. We then carried out statistical analysis in Python, in particular computing Shannon entropy and plotting the data (including mapping the sequence data onto structures).
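The deduplication and summary steps described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each homolog is stored as a dictionary with hypothetical keys `id`, `species`, and `sequence`; the records shown are toy data, not our actual search results.

```python
from statistics import mean, median


def deduplicate(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for record in records:
        key = record["sequence"].upper()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def summarize(records):
    """Return sequence count, mean/median length, and per-species counts."""
    lengths = [len(r["sequence"]) for r in records]
    species_counts = {}
    for r in records:
        species_counts[r["species"]] = species_counts.get(r["species"], 0) + 1
    return {
        "n_sequences": len(records),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
        "species_counts": species_counts,
    }


if __name__ == "__main__":
    # Toy records for illustration only; IDs and sequences are made up.
    toy_records = [
        {"id": "toy1", "species": "Homo sapiens", "sequence": "HAEGTFTSDV"},
        {"id": "toy2", "species": "Mus musculus", "sequence": "HAEGTFTSDV"},
        {"id": "toy3", "species": "Mus musculus", "sequence": "HSEGTFTSDVSS"},
        {"id": "toy4", "species": "Gallus gallus", "sequence": "HAEGTYTS"},
    ]
    print(summarize(deduplicate(toy_records)))
```

Note that this sketch keeps only the first record for each identical sequence, so species counts reflect the deduplicated set. Whether identical sequences from different species should be kept is a design choice that depends on the analysis.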
MSA Step by Step
Methodology of MSA
To answer our research question, we designed an empirical strategy built on multiple sequence alignment (MSA). In this process, amino acid sequences are compared to find the most evolutionarily conserved residues. The underlying theory is that the most evolutionarily conserved residues are the most integral to protein structure. MSA therefore lets us identify the regions of the sequence that are highly conserved and thus most likely to be necessary for maintaining protein structure and function. Our goal was to modify amino acids in the GLP-1 sequence to increase its specificity for its receptor. However, testing every possible change is computationally infeasible because the number of possible sequence variants is so large. By using MSA, we were able to determine which amino acids can likely be changed without significantly disrupting the protein structure, which made the computational process more efficient. To measure how well conserved each amino acid position in GLP-1 is, we used Shannon entropy. This measure indicates how concentrated or how spread out the distribution of a variable is (Tsai, 2017); here, the variable is the amino acid observed at a given alignment position. Positions with low entropy are conserved (evolutionarily important), and positions with high entropy are variable (better candidates to change when testing fit). In this way, we were able to test GLP-1 mutants for binding affinity with the GLP-1 receptor in a computationally efficient manner, which is essential in designing an effective GLP-1 agonist (Wang et al., n.d.).
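A quick calculation shows why narrowing the search matters. The sketch below compares the number of all possible sequences of a given length with the number of single substitutions and with the number of variants when only a few sites are allowed to vary. The peptide length of 30, the five variable sites, and the four residue choices are illustrative assumptions, not values from our analysis.

```python
def full_sequence_space(length, alphabet_size=20):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


def single_substitutions(length, alphabet_size=20):
    """Number of variants that differ from the reference at exactly one site."""
    return length * (alphabet_size - 1)


def restricted_variants(n_sites, n_choices):
    """Number of combinations when only n_sites vary, each among n_choices residues."""
    return n_choices ** n_sites


if __name__ == "__main__":
    length = 30  # illustrative peptide length (assumption)
    print("all sequences:        ", full_sequence_space(length))
    print("single substitutions: ", single_substitutions(length))
    # Hypothetical restriction: 5 variable sites, 4 residue choices per site.
    print("restricted variants:  ", restricted_variants(5, 4))
```

Restricting edits to a handful of variable positions and a small set of residues reduces the space from an astronomically large number to one that can be screened computationally.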
"Entropy measures conservation at each position. High entropy indicates variability and tolerance to change."
\[ H(X) = - \sum_{i=1}^{n} p(x_i) \log p(x_i) \]
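The formula above can be applied to each column of an alignment, where \(p(x_i)\) is the observed frequency of residue \(x_i\) in that column. The following is a minimal sketch, assuming a base-2 logarithm (entropy in bits) and that gap characters are ignored; both choices are assumptions, since the formula does not fix the base or the treatment of gaps. The toy alignment is illustrative only.

```python
import math
from collections import Counter

GAP_CHARS = "-."


def column_entropy(column, base=2.0):
    """Shannon entropy of one alignment column, ignoring gap characters."""
    residues = [c.upper() for c in column if c not in GAP_CHARS]
    if not residues:
        return 0.0  # a column made only of gaps carries no residue information
    total = len(residues)
    counts = Counter(residues)
    return sum(
        -(n / total) * math.log(n / total, base) for n in counts.values()
    )


def alignment_entropy(aligned_sequences, base=2.0):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col, base) for col in zip(*aligned_sequences)]


if __name__ == "__main__":
    # Toy alignment for illustration; not real homolog data.
    toy_alignment = ["HAEG", "HSEG", "HAE-", "HTDG"]
    for position, h in enumerate(alignment_entropy(toy_alignment), start=1):
        print(position, round(h, 3))
```

A fully conserved column gives an entropy of zero, and a column where every residue is different gives the maximum for that number of sequences.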
We hypothesize that the amino acids in GLP-1 that the MSA shows to be prone to mutation can be selected as candidates for structural modification. Positions with low evolutionary conservation, which appear as high variability, are under weaker purifying selection. Because these residues are evolutionarily more tolerant to change, substitutions there are less likely to disrupt folding, processing, or receptor engagement, which makes them our primary candidate sites for engineering. In contrast, highly conserved positions are likely function-critical and will be avoided or altered only with strong justification.
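The selection rule described above can be sketched as a simple filter on per-position entropy. The sketch assumes entropy values have already been computed for each position of the reference sequence; the function name `candidate_sites`, the threshold of 1.0 bit, and the example values are hypothetical and would need to be chosen from the real entropy distribution.

```python
def candidate_sites(reference, entropies, threshold=1.0):
    """Return (position, residue, entropy) for variable sites, most variable first.

    Positions are 1-based. The threshold is an illustrative assumption.
    """
    if len(reference) != len(entropies):
        raise ValueError("need exactly one entropy value per reference residue")
    sites = [
        (index + 1, residue, entropy)
        for index, (residue, entropy) in enumerate(zip(reference, entropies))
        if entropy >= threshold
    ]
    # Highest entropy first: these positions are the most tolerant to change.
    return sorted(sites, key=lambda site: site[2], reverse=True)


if __name__ == "__main__":
    # Toy reference fragment and entropy values, for illustration only.
    toy_reference = "HAEGT"
    toy_entropies = [0.0, 1.5, 0.8, 0.0, 2.1]
    for position, residue, entropy in candidate_sites(toy_reference, toy_entropies):
        print(f"position {position} ({residue}): entropy {entropy}")
```

Positions that fall below the threshold are treated as conserved and left out of the candidate list, in line with the rule of altering them only with strong justification.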
One major implication of our results is a set of potential mutant sequences that can guide future wet-lab research. Using MSA, we identified key residues in the GLP-1 amino acid sequence that are candidates for mutation with a low risk of affecting protein structure and, hence, protein function. Starting from the candidate sequences identified by MSA, we would like to predict their structures using computational structure prediction methods (such as AlphaFold3) and then predict the binding affinity of these predicted structures. To bolster the selection of mutants, we will also apply motif analysis. Together, these steps will let us compare candidate structures and determine the most effective mutant to use in the final product. These findings will provide wet-lab researchers with possible GLP-1 mutant replacements with improved receptor binding affinity, allowing the product to perform its biological function more effectively.
AlphaFold3 Prediction