Dry Lab

Plasmid constructs

As part of the Dry Lab activities, schematic representations of designed plasmid constructs were created. The pET-24a(+) vector served as the backbone, into which the gene of interest was inserted in silico using SnapGene 8.1.1 for construct visualization and validation.

Figure 1. pET-24a(+) plasmid backbone with the α-S1-casein (Homo sapiens) insert carrying an N-terminal 6xHisTag and TEV site. Part CasAlpha_S1_Hum was amplified with universal primers, digested with fast-digest NdeI and XhoI restriction enzymes and ligated into the digested backbone with T4 DNA ligase.

Figure 2. pET-24a(+) plasmid backbone with the α-S1-casein (Bos taurus) insert carrying an N-terminal 6xHisTag and TEV site. Part CasAlpha_S1_Bov was amplified with universal primers, digested with fast-digest NdeI and XhoI restriction enzymes and ligated into the digested backbone with T4 DNA ligase.

Figure 3. pET-24a(+) plasmid backbone with the β-casein (Homo sapiens) insert carrying an N-terminal 6xHisTag, TEV site and N-terminal 2xCys. Part CasBetaHum-2Cys-N was amplified with universal primers, digested with fast-digest NdeI and XhoI restriction enzymes and ligated into the digested backbone with T4 DNA ligase.

Figure 4. pET-24a(+) plasmid backbone with the β-casein (Homo sapiens) insert carrying an N-terminal 6xHisTag, TEV site and C-terminal 2xCys. Part CasBetaHum-2Cys-C was amplified with universal primers, digested with fast-digest NdeI and XhoI restriction enzymes and ligated into the digested backbone with T4 DNA ligase.

Figure 5. pET-24a(+) plasmid backbone with the β-casein (Bos taurus) insert carrying an N-terminal 6xHisTag, TEV site and N-terminal 2xCys. Part CasBetaBov-2Cys-N was amplified with universal primers, digested with fast-digest NdeI and XhoI restriction enzymes and ligated into the digested backbone with T4 DNA ligase.

Figure 6. pET-24a(+) plasmid backbone with the β-casein (Bos taurus) insert carrying an N-terminal 6xHisTag, TEV site and C-terminal 2xCys. Part CasBetaBov-2Cys-C was amplified with universal primers, digested with fast-digest NdeI and XhoI restriction enzymes and ligated into the digested backbone with T4 DNA ligase.
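Because every part is cut with NdeI (recognition site CA^TATG) and XhoI (C^TCGAG) before ligation, an insert should carry exactly one site for each enzyme, NdeI upstream of XhoI, and no internal sites that digestion would cut. The sketch below shows one way to screen an insert sequence for this, complementing the visual check in SnapGene. The helper names and the example sequence are hypothetical and are not taken from our parts.

```python
# Minimal sketch: screen a DNA insert for the NdeI/XhoI sites used for directional cloning.
# Function names and the example sequence are illustrative assumptions.

# Recognition sequences (both palindromic, so scanning the top strand is enough).
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}


def find_sites(dna: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return the 0-based start position of every recognition site in `dna`."""
    dna = dna.upper()
    hits: dict[str, list[int]] = {}
    for name, motif in sites.items():
        positions = []
        start = dna.find(motif)
        while start != -1:
            positions.append(start)
            start = dna.find(motif, start + 1)  # step by one so overlapping matches are found
        hits[name] = positions
    return hits


def is_clonable(insert: str) -> bool:
    """True if the insert has exactly one NdeI and one XhoI site, with NdeI upstream of XhoI."""
    hits = find_sites(insert)
    nde, xho = hits["NdeI"], hits["XhoI"]
    return len(nde) == 1 and len(xho) == 1 and nde[0] < xho[0]


# Hypothetical insert: NdeI site, six His codons, a short spacer, XhoI site.
example = "GGCATATGCACCACCACCACCACCACAAAGGCTCGAGCC"
print(find_sites(example))   # {'NdeI': [2], 'XhoI': [31]}
print(is_clonable(example))  # True
```

An insert that fails this check would need silent mutations or a different enzyme pair before it could be cloned this way.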

AlphaFold uses deep learning to predict the 3D structure of a protein from its amino acid sequence. It relies on experimentally determined structures in the Protein Data Bank (PDB) and on related sequences from databases such as UniProt [1]. However, AlphaFold is optimised to predict folded, stable 3D structures, whereas intrinsically disordered proteins have no single equilibrium structure and instead sample many conformations. When AlphaFold tries to assign a fixed structure to such flexible, structurally heterogeneous regions, it usually returns low-confidence predictions for them rather than an accurate structure. In addition, the PDB, AlphaFold's main source of protein structures, consists mostly of ordered proteins [2], so the model has comparatively few disordered structures to learn from.

Caseins are intrinsically disordered proteins: they are natively unfolded and perform their biological function without well-defined secondary or tertiary structures [3]. This disorder is crucial to their function, as it allows caseins to form thermodynamically stable complexes with calcium phosphate and thereby transport it, which is the primary function of casein.

α-S1-caseins

The structure of α-S1-casein is amphipathic: it contains both hydrophobic and hydrophilic domains, which is critical for its calcium-binding function and for micelle formation with other casein types. Both human and bovine α-S1-caseins are highly phosphorylated, which influences their stability and function in milk. The disordered structure of α-S1-casein can be explained by its high proline content, since proline disrupts α-helices and β-sheets; around 70% of α-S1-casein is unordered [4].
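The link between proline content and disorder can be made concrete with a simple composition count. The sketch below computes the overall proline fraction of a one-letter protein sequence and a sliding-window profile that highlights proline-rich stretches. The window length is an arbitrary choice, the example is a toy sequence rather than a casein, and proline content is only one indicator of disorder, not a predictor on its own.

```python
# Minimal sketch: proline composition of a protein sequence (one-letter code).
# Window size and the example sequence are illustrative assumptions.


def proline_fraction(sequence: str) -> float:
    """Fraction of residues that are proline (P)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return seq.count("P") / len(seq)


def proline_profile(sequence: str, window: int = 10) -> list[float]:
    """Proline fraction in every run of `window` consecutive residues."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    return [seq[i:i + window].count("P") / window for i in range(len(seq) - window + 1)]


toy = "PPAAPPAAAA"  # toy sequence, not a casein
print(proline_fraction(toy))           # 0.4
print(proline_profile(toy, window=5))  # [0.6, 0.6, 0.4, 0.4, 0.4, 0.2]
```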

We modelled the structures of our modified human and bovine α-S1-caseins, each carrying an added N-terminal 6xHisTag and TEV protease cleavage site, to assess the structural accessibility of the tag and cleavage site and the potential impact of these modifications on protein structure. Modelling was done with the publicly available AlphaFold 3 from DeepMind [5].

The pTM score for human α-S1-casein with the N-terminal 6xHisTag and TEV site was 0.23. This low global-confidence score is expected, because an intrinsically disordered protein does not have a single stable global fold. The predicted Local Distance Difference Test (pLDDT) scores vary along the chain: regions where the protein forms α-helices have very high (pLDDT > 90) or confident (90 > pLDDT > 70) scores, whereas flexible unstructured regions have low (70 > pLDDT > 50) or very low (pLDDT < 50) scores. In total, the structure contains three α-helices with confident prediction scores and multiple flexible regions for which AlphaFold 3 cannot generate reliable predictions. Our N-terminal 6xHisTag and TEV site are part of an N-terminal α-helix with confident to very high pLDDT scores.
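The pLDDT bands quoted above follow AlphaFold's standard thresholds of 90, 70 and 50. The sketch below applies them to a list of per-residue pLDDT values and reports contiguous stretches above 70, one way to locate confidently predicted regions such as the N-terminal helix carrying the tag. It assumes the values have already been extracted from the AlphaFold 3 output (AlphaFold writes pLDDT into the B-factor field of its model files; for AlphaFold 3 these are per-atom values, which we assume have been averaged per residue). Parsing is not shown, the scores are hypothetical and the minimum segment length is an arbitrary choice. A confident segment is not automatically an α-helix; secondary structure has to be assigned from the model geometry.

```python
# Minimal sketch: classify per-residue pLDDT values and find confident segments.
# Input values are hypothetical; extracting them from AlphaFold 3 output is not shown.


def plddt_band(score: float) -> str:
    """Map a pLDDT score to AlphaFold's confidence bands."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def confident_segments(plddt: list[float], threshold: float = 70.0,
                       min_len: int = 5) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges where pLDDT > threshold
    for at least `min_len` consecutive residues."""
    segments: list[tuple[int, int]] = []
    start = None
    for i, score in enumerate(plddt, start=1):
        if score > threshold:
            if start is None:
                start = i  # a confident run begins
        elif start is not None:
            if i - start >= min_len:  # the run covered residues start..i-1
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(plddt) - start + 1 >= min_len:
        segments.append((start, len(plddt)))
    return segments


scores = [95, 92, 91, 88, 85, 40, 30, 75, 72, 45]  # hypothetical values
print([plddt_band(s) for s in (95, 80, 60, 30)])  # ['very high', 'confident', 'low', 'very low']
print(confident_segments(scores, min_len=3))      # [(1, 5)]
```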

The modelling results for bovine α-S1-casein with the N-terminal 6xHisTag and TEV site are similar, with an overall pTM score of 0.2. However, for the bovine protein AlphaFold 3 predicts five shorter α-helices instead of three, although with lower pLDDT scores. The N-terminus with the inserted 6xHisTag and TEV site keeps the same organisation: AlphaFold 3 predicts it to form an α-helix with pLDDT > 70.

Overall, the predicted structures of human and bovine α-S1-casein are similar, with slight differences in the distribution of structured regions that may affect their flexibility and possible interactions.

β-caseins

In bovine milk, β-casein is the second most abundant protein after α-S1-casein, and it plays an important role in casein micelle stability. Its C-terminal end is highly disordered and hydrophobic owing to its many non-polar residues, while the N-terminal end is more structured and forms an α-helix [4, 6]. The rest of the protein exists mainly as random coils and turns, without intramolecular cross-links [7]. β-casein is the most amphiphilic of the four casein types (α-S1, α-S2, β and κ) and has become one of the most studied random-coil proteins.

Our project included β-caseins of human and bovine origin, each in two differently modified variants. In addition to the N-terminal 6xHisTag and TEV protease cleavage site added to simplify purification, the modified β-casein sequences contained two additional cysteine residues at either the N- or the C-terminal end.

Information about cysteines

β-casein contains no native cysteine residues. In proteins, cysteines form covalent disulfide bonds, which are often critical for protein stability and mechanical properties [8], and introducing cysteine residues is a common strategy for protein crosslinking [9]. However, inserting cysteines at poorly chosen sites can disrupt the native protein structure through unwanted interactions, so it is advisable to place them at the terminal ends or in flexible regions of the protein [10, 11]. Considering the structural properties of native β-casein, we introduced the two cysteine residues at either the N-terminal or the C-terminal end, as the N-terminal end is more structured while the C-terminus is more flexible and exposed for interactions.
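The two cysteine placements can be expressed as a simple sequence-assembly rule. The sketch below checks that the protein sequence is cysteine-free, as native β-casein is, then assembles hypothetical N- and C-terminal 2xCys constructs and lists their cysteine positions. The layout is an assumption for illustration: a start methionine, a His6 tag, the common TEV recognition sequence ENLYFQG (cleaved between Q and G), and the cysteines placed directly after the TEV site or at the very C-terminus without linkers. Our actual part designs may differ, and the protein sequence in the example is a placeholder, not β-casein.

```python
# Minimal sketch: assemble hypothetical 2xCys construct sequences (one-letter code).
# Tag/site sequences and their order are illustrative assumptions, not our exact parts.
START = "M"           # start methionine (e.g. from the ATG within the NdeI site)
HIS_TAG = "HHHHHH"    # 6xHisTag
TEV_SITE = "ENLYFQG"  # TEV protease recognition site; cleavage occurs between Q and G
CYS_PAIR = "CC"       # two added cysteines


def build_construct(mature: str, cys_end: str) -> str:
    """Return tag + TEV site + protein, with CC at the N-terminal ('N') or C-terminal ('C') end."""
    if "C" in mature.upper():
        raise ValueError("expected a cysteine-free protein sequence")
    head = START + HIS_TAG + TEV_SITE
    if cys_end == "N":
        return head + CYS_PAIR + mature
    if cys_end == "C":
        return head + mature + CYS_PAIR
    raise ValueError("cys_end must be 'N' or 'C'")


def cysteine_positions(protein: str) -> list[int]:
    """1-based positions of all cysteine residues."""
    return [i for i, aa in enumerate(protein.upper(), start=1) if aa == "C"]


placeholder = "PAVLPQ"  # placeholder, not the β-casein sequence
for end in ("N", "C"):
    construct = build_construct(placeholder, end)
    print(end, construct, cysteine_positions(construct))
# N MHHHHHHENLYFQGCCPAVLPQ [15, 16]
# C MHHHHHHENLYFQGPAVLPQCC [21, 22]
```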

Firstly, structures of human-origin β-casein with 2xCys in either the N- or C-terminus were modelled.

Human β-casein with 2xCys at the N-terminus

Human β-casein with 2xCys at the C-terminus

The pTM of the AlphaFold 3 prediction for β-casein with the 6xHisTag, TEV site and 2xCys at the N-terminal end was 0.23. The low score again shows that predicting the structure of intrinsically disordered proteins is challenging because their structure is variable, so the global fold of the model is unreliable. The model contains a relatively long N-terminal α-helix predicted with high confidence, and inserting the N-terminal 6xHisTag, TEV site and 2xCys does not appear to lower its pLDDT score. The rest of the protein is unordered except for a short coiled region near the C-terminus, which is predicted with low confidence (70 > pLDDT > 50).

For the same protein with the 6xHisTag and TEV site but with 2xCys inserted at the C-terminal end, the pTM was 0.24. The model predicts an N-terminal α-helix with a very high confidence level and a slightly more coiled structure for the rest of the protein than in the N-terminal 2xCys variant; however, these regions have low pLDDT scores, so their coiled arrangement is not definitive.
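Judging whether an insertion lowers the confidence of a helix amounts to comparing mean pLDDT over the same stretch of protein in two models. The sketch below computes the mean pLDDT of a residue range in each model and their difference. Because the N-terminal 2xCys variant has two extra residues before the native sequence, an offset aligns the residue numbering between models. All pLDDT values, ranges and the offset are hypothetical, and a difference of a few pLDDT units should not be over-interpreted for a disordered protein.

```python
# Minimal sketch: compare mean pLDDT over the same protein region in two models.
# All values below are hypothetical.
from statistics import mean


def region_mean(plddt: list[float], start: int, end: int) -> float:
    """Mean pLDDT over residues start..end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(plddt):
        raise ValueError("residue range outside the model")
    return mean(plddt[start - 1:end])


def region_shift(plddt_a: list[float], plddt_b: list[float],
                 start: int, end: int, offset_b: int = 0) -> float:
    """Mean pLDDT in model b minus model a for the same region.
    `offset_b` shifts numbering in model b, e.g. +2 when two extra residues precede the region."""
    return region_mean(plddt_b, start + offset_b, end + offset_b) - region_mean(plddt_a, start, end)


c_variant = [90.0, 92.0, 88.0, 86.0, 45.0, 40.0]              # no extra N-terminal residues
n_variant = [60.0, 62.0, 90.0, 92.0, 88.0, 86.0, 45.0, 40.0]  # two extra N-terminal residues
print(region_shift(c_variant, n_variant, start=1, end=4, offset_b=2))  # 0.0
```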

Secondly, we modelled the structures of bovine-origin β-casein with an N-terminal 6xHisTag, TEV-site and 2xCys in either the N-terminal or C-terminal end.

Bovine β-casein with 2xCys at the N-terminus

Bovine β-casein with 2xCys at the C-terminus

Unlike human β-casein, bovine β-casein is predicted to have two α-helices at its N-terminal end, each shorter than the one predicted for human β-casein. Inserting 2xCys at either the C- or the N-terminal end does not appear to affect the confidence of the prediction, as the pTM score is 0.2 for both variants. The high-confidence part of the predicted bovine β-casein structure (pLDDT > 70) consists of the two N-terminal α-helices linked by a short unstructured loop, and the model also contains a few coiled regions with low confidence scores (70 > pLDDT > 50).

References

[1] V. Agarwal and A. C. McShan, “The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins,” Nature Chemical Biology, vol. 20, no. 8, pp. 950–959, 2024, doi: https://doi.org/10.1038/s41589-024-01638-w.

[2] T. Le Gall, P. R. Romero, M. S. Cortese, V. N. Uversky, and A. K. Dunker, “Intrinsic Disorder in the Protein Data Bank,” Journal of Biomolecular Structure and Dynamics, vol. 24, no. 4, pp. 325–341, 2007, doi: https://doi.org/10.1080/07391102.2007.10507123.

[3] P. Tompa, “Intrinsically unstructured proteins,” Trends in Biochemical Sciences, vol. 27, no. 10, pp. 527–533, 2002, doi: https://doi.org/10.1016/S0968-0004(02)02169-2.

[4] M. Y. Bhat, T. A. Dar, and L. R. Singh, “Casein Proteins: Structural and Functional Aspects,” in Milk Proteins: From Structure to Biological Properties and Health Aspects, I. Gigli, Ed., IntechOpen Books, 2016. doi: https://doi.org/10.5772/64187.

[5] J. Abramson et al., “Accurate structure prediction of biomolecular interactions with AlphaFold 3,” Nature, vol. 630, no. 8016, pp. 493–500, 2024, doi: https://doi.org/10.1038/s41586-024-07487-w.

[6] M. Noelken and M. Reibstein, “Conformation of β-casein B,” Archives of Biochemistry and Biophysics, vol. 123, no. 2, pp. 397–402, 1968, doi: https://doi.org/10.1016/0003-9861(68)90150-1.

[7] Y. Li, X. Liu, H. Liu, and L. Zhu, “Interfacial adsorption behavior and interaction mechanism in saponin–protein composite systems: A review,” Food Hydrocolloids, vol. 136, p. 108295, 2023, doi: https://doi.org/10.1016/j.foodhyd.2022.108295.

[8] K. Weiss et al., “Compartmentalized disulfide bond formation pathways,” in Redox Chemistry and Biology of Thiols, B. Alvarez, M. A. Comini, G. Salinas, and M. Trujillo, Eds., Academic Press, 2022, pp. 321–340. doi: https://doi.org/10.1016/B978-0-323-90219-9.00020-0.

[9] B. Jayachandran, T. N. Parvin, A. M. M., K. Chanda, and B. M. M., “Insights on Chemical Crosslinking Strategies for Proteins,” Molecules, vol. 27, no. 23, 2022, doi: https://doi.org/10.3390/molecules27238124.

[10] K. M. Holtz et al., “Modifications of cysteine residues in the transmembrane and cytoplasmic domains of a recombinant hemagglutinin protein prevent crosslinked multimer formation and potency loss,” BMC Biotechnology, vol. 14, no. 1, p. 111, 2014, doi: https://doi.org/10.1186/s12896-014-0111-y.

[11] L. Wang, M.-Y. Ding, J. Wang, J.-G. Gao, R.-M. Liu, and H. T. Li, “Effects of Site-Directed Mutagenesis of Cysteine on the Structure of Sip Proteins,” Frontiers in Microbiology, vol. 13, Apr. 2022, doi: https://doi.org/10.3389/fmicb.2022.805325.