Software

Dry Lab

One challenge this project faces is determining when the enzyme should be introduced to the blood samples. RBC units are stored at 1–6 °C for up to 42 days; once removed from storage, they must be returned within 30 minutes or transfused within 4 hours [1]. For a standard 450 mL unit, our current estimate is that the enzyme must incubate in whole blood for at least ~1 hour inside the filtration bag. An enzyme that operates well at 4–10 °C could therefore cleave antigen groups before a transfusion without interrupting the workflow or prolonging the overall blood prep.

Figure 1
Figure 1: Bacterial growth temperatures for the various microorganism temperature groups. Enzymes from organisms in each group are able to operate at that organism's growth temperature. Derived from [5].

One solution to these temperature and time restrictions is to use cold-adapted enzymes, also known as psychrophilic enzymes. These enzymes can catalyze biochemical reactions at low temperatures, typically below 20°C [6]. Unlike conventional (mesophilic) enzymes, which lose efficiency in the cold, psychrophilic enzymes have structural and sequence adaptations that give them a looser, more flexible structure, allowing them to remain active where most biochemical reactions would otherwise slow down significantly.

Figure 16
Table 1. Structural and sequence differences between cold-adapted and conventional enzymes [4]; images derived from Jamile Queiroz Pereira et al. [2].

The goal of the sub-team was therefore to identify psychrophilic counterparts to the five enzymes identified and used by the Wet Lab team.

Project 1: Computational Identification of Psychrophilic Enzymes using CAZy and NCBI

Goal: Parsing through large databases to find desired psychrophilic enzymes for cloning is time-intensive and requires referencing multiple databases and cross-referencing literature to ensure that the enzyme is suitable for colder temperature conditions. To automate this process, we developed a program that can identify specific enzymes from a given protein family and predict whether the enzymes belong to a psychrophilic organism.

Iteration 1: Using CAZy And The NCBI GenBank Database To Determine A Particular Enzyme.


Why CAZy and NCBI?

The CAZy database aims to display and analyze genomic and biochemical information on structurally related carbohydrate-active enzymes (CAZymes; enzymes involved in making, breaking, and modifying oligo- and polysaccharides). NCBI GenBank holds information about each enzyme, such as protein/mRNA/DNA sequences, accession IDs, source organism, and references to the original contribution. Both databases are reviewed and updated frequently (every two months) and are maintained alongside other well-known databases such as GenBank, RefSeq, TPA, SwissProt, PIR, PRF, and PDB. Both are also widely used by the scientific community because they are uniform, up to date, and comprehensive, and because they permit unrestricted use and distribution of data. For these reasons, we chose CAZy and NCBI.

The work done in this project is summarised in the flowchart below. The shortlisted genera were obtained from Morita & Moyer (2001) [3].

Project 1 workflow flowchart

The CAZy Database was used to identify all enzymes found in each of the GH families that were recognised in Akkermansia muciniphila. The families that were used are given below.

GH families used in this project

Since GenBank IDs are part of the information in CAZy, we were able to cross-reference each entry with the NCBI database. This step was crucial because each family contains different types of enzymes, which vary in the region of the substrate they cleave or in their carbohydrate specificity. It was therefore important to identify enzymes with a function similar to the mesophilic ones identified in Akkermansia muciniphila. Filtering by function first, rather than by temperature, is a major advantage: it reduces the number of entries significantly and speeds up the rest of the filtering process.

Initially, this process was done manually for the GH36 and GH110 families, which involved looking up each GenBank ID by hand and determining whether it was the required enzyme. To reduce the workload, we first shortlisted the entries by comparing them against an established list of psychrophilic genera [3]. For the GH110 family, the number of entries after this step was significantly lower. For the other families, however, we had around 15,000 to 25,000 entries from CAZy and around 1,000 to 8,000 after shortlisting. Checking these by hand would have taken too much manpower, so we automated the process by accessing the NCBI database remotely through Python. The preliminary code is shown in Figure 2.

Figure 2: Preliminary code used to filter required enzymes from the CAZy database. The code takes in a protein ID, uses it to query the NCBI Database, and compares the ‘Title’ with the given enzyme’s name (keyword). If they match, it adds the protein ID to a list and prints it.
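The filtering logic described in the Figure 2 caption can be sketched roughly as follows. This is not the team's original code: the function names `fetch_title` and `filter_by_keyword` are hypothetical, the comparison is assumed to be a case-insensitive match on the part of the Title before the square bracket, and the lookup is passed in as a parameter so the filter can be tried without network access. The NCBI call uses Biopython's `Entrez.esummary` and assumes the protein summary exposes a `Title` field.

```python
from typing import Callable, Iterable, List


def fetch_title(protein_id: str, email: str) -> str:
    """Return the NCBI protein summary 'Title' for one accession (needs network)."""
    # Biopython is imported lazily so the filter below also works offline.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esummary(db="protein", id=protein_id)
    try:
        summary = Entrez.read(handle)
    finally:
        handle.close()
    return str(summary[0]["Title"])


def filter_by_keyword(
    protein_ids: Iterable[str],
    keyword: str,
    get_title: Callable[[str], str],
) -> List[str]:
    """Keep the IDs whose enzyme name (text before '[') equals the keyword."""
    matches = []
    for protein_id in protein_ids:
        title = get_title(protein_id)
        # Assumed Title layout: "enzyme name [Organism name]"
        enzyme_name = title.split("[", 1)[0].strip()
        if enzyme_name.lower() == keyword.lower():
            matches.append(protein_id)
    return matches
```

With a real connection, `get_title` could be `lambda pid: fetch_title(pid, "you@example.org")` (a placeholder address); for a dry run, any dictionary lookup of made-up titles works equally well.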

To test whether the code was successful, we compared the number of filtered enzymes in its output with the manually curated dataset.

Table 2: Counts of enzymes output from the first iteration of filtering code per GH family.

Iteration 2: Improving Output Readability And Refactoring For Better Usability


At this point, the code had only been shown to handle small datasets, which would not always be the case in practice. We therefore wanted to see whether the filtration process ran into problems on larger datasets, and tested it on the entire GH20, GH35, and GH95 families (results are provided in Table 3). In addition, to improve output readability, the list was replaced with a dictionary that included the enzyme name (the one compared to the given name), the accession ID, and the organism name.
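As an illustration of the dictionary output described above, here is a minimal sketch. The layout (keyed by accession ID, with `enzyme` and `organism` fields) and the helper names `build_records` and `format_records` are assumptions made for this example, not the structure used in the team's script.

```python
from typing import Dict, Iterable, Tuple


def build_records(entries: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Turn (accession, enzyme name, organism) tuples into a dict keyed by accession."""
    records = {}
    for accession, enzyme_name, organism in entries:
        records[accession] = {"enzyme": enzyme_name, "organism": organism}
    return records


def format_records(records: Dict[str, Dict[str, str]]) -> str:
    """Render the records as tab-separated lines, sorted by accession for stable output."""
    lines = ["accession\tenzyme\torganism"]
    for accession in sorted(records):
        record = records[accession]
        lines.append(f"{accession}\t{record['enzyme']}\t{record['organism']}")
    return "\n".join(lines)
```

Keying by accession keeps each entry unique and makes it easy to look up or drop a single enzyme later in the pipeline.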

Table 3: Counts of enzymes output from the second iteration of filtering code per GH family (glycosidase families used within Wet Lab cloning).

As seen, the code was successful in reducing the number of entries. On manual inspection of the output enzyme names, most of the enzymes were filtered and categorized correctly (errors in the process are discussed later).

The next step was to determine whether the enzymes were psychrophilic. This was done by comparing the genus of the source organism with a shortlist of psychrophilic genera. The shortlist contains genera that include some psychrophilic organisms; it does not guarantee that a given organism is found at cold temperatures. Hence, after the complete filtration, we still had to look up manually whether each enzyme was psychrophilic.
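The genus comparison can be sketched as below. The helper names are hypothetical, the genus is assumed to be the first word of the organism name, and the genera passed in are placeholders for illustration rather than the actual shortlist from [3]. As noted above, a hit only flags a candidate for manual review.

```python
from typing import Dict, Iterable


def genus_of(organism: str) -> str:
    """Return the first word of an organism name, assumed here to be the genus."""
    parts = organism.split()
    return parts[0] if parts else ""


def shortlist_by_genus(
    organisms_by_accession: Dict[str, str],
    candidate_genera: Iterable[str],
) -> Dict[str, str]:
    """Keep entries whose genus appears in the candidate list (case-insensitive)."""
    genera = {genus.lower() for genus in candidate_genera}
    return {
        accession: organism
        for accession, organism in organisms_by_accession.items()
        if genus_of(organism).lower() in genera
    }
```

Normalizing case on both sides avoids missing entries whose organism names are capitalized inconsistently across records.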


Since the enzymes for cleaving the Extended A and B antigens from Wet Lab’s Cloning Cycle were not successful, it was decided that the software team would focus on characterizing only the A and B antigens, comparing both function and structural similarity between the psychrophilic and mesophilic enzymes (Project 2).

Additionally, so that others can use or improve the code, we wrote a Python script with each part as a function. The main goal was to make it user-friendly by providing methods to manipulate the data and produce readable outputs. Code and information about it can be accessed here: ASU GitLab.

Here is an easy-to-follow user manual made by our team!

Problems Encountered and Future Directions:

  1. Protein ID not recognized: Some of the protein IDs could not be found through the code, but could be accessed manually. The exact reason for this error is unknown; hence, we decided to include a list of rejected IDs in the output of the function.
  2. Difference in NCBI Title: For this project, the team used the Entrez.esummary function to compare the enzyme names. This function returns a nested list of information about the enzyme, including the Title, which holds the enzyme name and the organism (in square brackets) as one string. For comparison, the string was split at the square brackets, giving two values: the enzyme name and the organism name. However, for the GH20 family, this order was switched. It is unknown whether this is the case for other entries, so it must be taken into consideration in future work with the code.
  3. Runtime: Currently, for ~18,000 entries, the code takes ~5-6 hours to finish (with the skipping of “unidentified” enzymes). This can be reduced by using cazy-webscraper.
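The first two problems above could be handled along the following lines. This is a sketch under stated assumptions: "switched order" is taken to mean that the bracketed organism can appear before the enzyme name, a failed lookup is assumed to raise an exception, and `parse_title` and `summarize_ids` are hypothetical names rather than functions from the team's script.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Matches the first "[...]" group, wherever it sits in the Title.
_BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def parse_title(title: str) -> Tuple[str, Optional[str]]:
    """Split a Title into (enzyme name, organism), whichever comes first."""
    match = _BRACKETED.search(title)
    if match is None:
        return title.strip(), None  # no organism given in brackets
    organism = match.group(1).strip()
    # Everything outside the brackets is treated as the enzyme name.
    remainder = title[: match.start()] + " " + title[match.end():]
    enzyme_name = " ".join(remainder.split())
    return enzyme_name, organism


def summarize_ids(
    protein_ids: Iterable[str],
    get_title: Callable[[str], str],
) -> Tuple[Dict[str, Tuple[str, Optional[str]]], List[str]]:
    """Parse each Title; collect IDs whose lookup fails instead of stopping."""
    parsed: Dict[str, Tuple[str, Optional[str]]] = {}
    rejected: List[str] = []
    for protein_id in protein_ids:
        try:
            title = get_title(protein_id)
        except Exception:  # any lookup failure: record the ID and move on
            rejected.append(protein_id)
            continue
        parsed[protein_id] = parse_title(title)
    return parsed, rejected
```

Returning the rejected IDs alongside the parsed results keeps a long run from aborting on one bad record and leaves a list that can be checked by hand afterwards.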

Proof of Concept: Testing Colwellia Psychrophilic Enzyme in Porcine Type O Blood


As a proof of concept stemming from Project 1, where we filtered for psychrophilic enzymes predicted to target the B antigen chain (α-galactosidases), we selected two top candidates for cloning and bench testing: one from Polaribacter sejongensis (AUC20683.1) and one from Colwellia sp. (ARD46000.1). We successfully cloned the Colwellia sp. candidate and advanced it to functional assays; importantly, the hemagglutination crossmatch assay demonstrated low-temperature activity consistent with our design goal of cold-active antigen cleavage. The results below summarize this initial validation in porcine Type O blood.

psychrophilic hemagglutination image
Figure 3. Microscope images of O-type porcine blood crossmatched with human serum after treatment for 2 hours with 0.5 µM Colwellia cold-adapted (CA) enzyme and 0.4 µM B2 (ExtB cleaving) enzyme. (A) CA + B2 treatment at 22 °C, (B) CA treatment at 4 °C, and (C) CA + B2 treatment at 4 °C.

To test the effectiveness of the Colwellia cold-adapted (CA) enzyme on blood, we treated O-type porcine blood for 2 hours at two different temperatures. Although this α-galactosidase was intended to cleave the human B antigen, it can also cleave the α-galactose antigen on porcine blood, which is detectable by human antibodies. Figure 3A shows that at 22 °C, the CA and B2 enzymes do not cleave the α-galactose antigen on porcine blood, resulting in detection by human antibodies and severe clumping. Figures 3B and 3C, however, indicate that at 4 °C both CA alone and CA + B2 cleave α-galactose, as shown by fewer agglutinated cells and more single red blood cells. These results suggest that the cold-adapted enzyme is functional at colder temperatures, as intended.

Read more about the enzyme results here!

Project 2: Temperature-Based Sequence Alignment and Structure Analysis

Goal: Cold-adapted enzymes have structural and sequence differences from their conventional counterparts that allow them to function at lower temperatures. If we can find regions of similarity among GH families (specifically our chosen GH families) that may allow for low-temperature functionality, we would be able to alter our cloned enzymes to be cold-adapted and compatible with our blood conversion kits.

Iteration 1: Using CLUSTAL to find sequence alignment in identified enzymes.


The overall objective of this iteration was to determine regions of similarity between the three temperature groups of enzymes (thermophilic: 41°C-122°C, mesophilic: 20°C-45°C, psychrophilic: < 20°C) [6].

Sequences conserved between temperature groups can be used to identify potential mutation sites for altering the protein sequences of our original 5 enzymes and converting them into cold-adapted enzymes. They could also give us insight into where active sites are located and which regions are critical for cold adaptation.

We compared enzyme sequences using CLUSTAL, a program for aligning DNA and protein sequences. It can be used to determine areas of similarity and dissimilarity and to create phylogenetic trees. For our study, we used CLUSTAL to look for the regions necessary for cleaving sugar residues and for psychrophilic function.

The plan of action to complete the CLUSTAL sequence alignment was as follows:

  1. Identify a list of 5 enzymes per group (thermophilic, mesophilic, and psychrophilic enzymes) based on a literature search and review.
  2. Obtain protein sequences from the NCBI GenBank Database.
  3. Run the CLUSTALO program to obtain regions of similarity for all groups.
  4. Compare regions of similarity in one group to regions of similarity in other groups.
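Step 4 of the plan above can be illustrated with a small helper that scans an existing alignment for conserved columns. This sketch assumes the sequences have already been fetched and aligned (for example, by exporting the CLUSTALO result), that gaps are written as `-`, and that "conserved" means the most common residue in a column reaches a chosen fraction of the sequences. The function name and threshold are hypothetical.

```python
from typing import List, Sequence


def conserved_columns(aligned: Sequence[str], min_fraction: float = 1.0) -> List[int]:
    """Return 0-based column indices where one residue reaches min_fraction of rows."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(sequence) != length for sequence in aligned):
        raise ValueError("aligned sequences must all have the same length")

    columns = []
    for index in range(length):
        column = [sequence[index] for sequence in aligned]
        residues = [residue for residue in column if residue != "-"]
        if not residues:
            continue  # column contains only gaps
        most_common = max(residues.count(residue) for residue in set(residues))
        # Gaps count against conservation because we divide by all rows.
        if most_common / len(column) >= min_fraction:
            columns.append(index)
    return columns
```

Running this once per temperature group and intersecting the resulting index lists is one simple way to compare regions of similarity across groups, provided all groups are aligned to a common coordinate system.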

Within each temperature group, we selected 5 enzymes from different, well-researched bacterial species to look for regions of similarity. We focused on bacteria and enzymes used within the dairy industry to create lactose-free milk and yogurt, and we also included our pre-determined 5 enzymes as mesophilic enzymes and the enzymes output by Project 1 as cold-adapted enzymes.

Due to major differences between the organisms (no matching genus, family, or order), there were very few regions of similarity between temperature groups. For example, in the sequence alignment of the thermophilic enzyme group (Figure 4), large gaps are found throughout the sequence, and very few regions of similarity exist between the enzymes. From this analysis, we were unable to determine the active site region for each temperature group.

Figure 4: Example of CLUSTALO sequence alignment for thermophilic enzyme group (Accession ID, enzyme name, and organism name: EHE95041.1 - Beta-glucosidase [Streptococcus thermophilus CNCM I-1630], AFN03236.1 - beta-glucosidase [Pyrococcus furiosus COM1], WP_015922700.1 - GH1 family beta-glucosidase [Thermomicrobium roseum]). Colors correspond to the amino acid category (i.e., blue: hydrophobic, red: positively charged, magenta: negatively charged, green: polar).

After running the CLUSTAL sequence alignments, we found no significant regions of similarity for the thermophilic or mesophilic enzymes. We therefore pivoted to aligning protein sequences of enzymes from the same family, order, and/or genus. From Project 1, we obtained 2 psychrophilic enzymes, only one of which we were able to order for cloning: the one from Colwellia sp. PAMC 21821. For this reason, we decided to investigate enzymes from organisms in this organism’s order, Alteromonadales, focusing on the GH110 enzyme family.

Iteration 2: Researching enzymes within the Colwellia genus and the Alteromonadales order to find regions of similarity between selected bacteria.


Using the same general design, we narrowed our method to the Alteromonadales order, which contains the Colwellia genus obtained from the earlier alpha-galactosidase search (Project 1 program). Unlike the Colwellia genus, the order is not limited to psychrophiles, but it does not include thermophilic organisms whose enzymes operate above 45°C. Overall, this allowed us to find more mesophilic enzymes to compare against enzymes from the Colwellia genus and other psychrophilic organisms.

From the Alteromonadales order, we compiled enzymes from a total of 5 families: Colwelliaceae (other than Colwellia sp. PAMC 21821), Alteromonadaceae, Pseudoalteromonadaceae, Psychromonadaceae, and Shewanellaceae. In total, we identified 25 enzymes from the GH110 family, as shown in the following table.

Table 4: List of GH110 enzymes compiled from the Alteromonadales order and used in CLUSTALO sequence comparisons. Key: pink → psychrophilic, orange → mesophilic, grey → cannot be determined.

Sequence alignments were performed in groups of 5-8 sequences so that information could be easily obtained and organized from CLUSTAL. The comparisons were: the GH110 enzyme from Colwellia sp. PAMC 21821 against the enzyme from Akkermansia muciniphila; all GH110 enzymes from the Colwelliaceae family; Colwellia sp. PAMC 21821 against enzymes from other psychrophilic organisms; and, finally, all the enzymes from mesophilic organisms. There are distinct regions of similarity between Colwellia sp. PAMC 21821 and Akkermansia muciniphila, as seen below.

Figure 5. Sequence comparison of the GH110 enzyme from Colwellia sp. PAMC 21821 (Accession ID: ARD46000.1) and Akkermansia muciniphila (Accession ID: WP_435331545.1). Colors correspond to the amino acid category (i.e., blue: hydrophobic, red: positively charged, magenta: negatively charged, green: polar).
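To put a number on "regions of similarity" in a pairwise comparison like the one in Figure 5, one option is to compute percent identity and list the contiguous stretches of identical residues. The sketch below is illustrative only: it assumes two already-aligned sequences of equal length with `-` for gaps, and the function names and the minimum run length are hypothetical choices.

```python
from typing import List, Tuple


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identical_runs(seq_a: str, seq_b: str, min_length: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of consecutive identical residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    runs = []
    start = None
    for index, (a, b) in enumerate(zip(seq_a, seq_b)):
        same = a == b and a != "-"
        if same and start is None:
            start = index  # a new run begins
        elif not same and start is not None:
            if index - start >= min_length:
                runs.append((start, index))
            start = None
    if start is not None and len(seq_a) - start >= min_length:
        runs.append((start, len(seq_a)))  # run reaches the end of the alignment
    return runs
```

The ranges are alignment columns, not residue numbers in either protein, so they would need to be mapped back to each sequence before being compared with a known active site.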

Within the Colwelliaceae family, there are long regions of similarity, which is consistent with a recent common ancestor.

For an extended list of our results and figures obtained from the CLUSTAL sequence analysis, please view this document:

Overall, this analysis of GH110 sequences provides a fundamental understanding of the sequence differences and similarities between mesophilic and psychrophilic enzymes. Due to time constraints, we were unable to perform 3D structural comparisons via AlphaFold, but below we have detailed our future directions for structural and sequence analysis.

Future Direction

 1. Complete another round of CLUSTAL sequence comparisons with related thermophilic organisms.

 2. Conduct a thorough literature search on the alpha-galactosidase enzyme from Colwellia sp. PAMC 21821, Akkermansia muciniphila, or a different well-characterized organism, to determine the active site.

 3. Using sequence comparisons, find active site regions on all other enzymes characterized and conduct structural comparison with AlphaFold.

 4. Characterize regions specific to each temperature group based on AlphaFold results. Attempt to modify protein structures of our selected enzymes.

 5. Clone and compare the efficiency of both psychrophilic and mesophilic enzymes at various temperatures.

REFERENCES

[1] American Red Cross. (2019). Blood Components. Redcrossblood.org. https://www.redcrossblood.org/donate-blood/how-to-donate/types-of-blood-donations/blood-components.html

[2] Jamile Queiroz Pereira, Ambrosini, A., Pereira, M., & Adriano Brandelli. (2017). A new cold-adapted serine peptidase from Antarctic Lysobacter sp. A03: Insights about enzyme activity at low temperatures. International Journal of Biological Macromolecules, 103, 854–862. https://doi.org/10.1016/j.ijbiomac.2017.05.142

[3] Morita, R. Y., & Moyer, C. L. (2001). Psychrophiles, Origin of. Elsevier EBooks, 917–924. https://doi.org/10.1016/b0-12-226865-2/00362-x

[4] Nowak, J. S., & Otzen, D. E. (2023). Helping proteins come in from the cold: 5 burning questions about cold-active enzymes. BBA Advances, 5, 100104. https://doi.org/10.1016/j.bbadva.2023.100104

[5] OpenStax. (2022, July 18). Temperature Effects on Bacterial Growth. Biology LibreTexts. https://bio.libretexts.org/Courses/Prince_Georges_Community_College/PGCC_Microbiology/08%3A_Microbial_Growth/8.02%3A_Factors_that_Affect_Bacterial_Growth/8.2.01%3A_Temperature_Effects_on_Bacterial_Growth

[6] Tankeshwar, A. (2019, November 25). Temperature requirements of Microorganisms. Learn Microbiology Online. https://microbeonline.com/psychrophiles-mesophiles-thermophiles/