Model | Concordia-Montreal

Overview

Our software was created to model the fragments resulting from an enzyme’s cleavage of gluten and to determine which (if any) IgE epitopes the fragments matched. This allowed the team to model the expected functionality of different enzymes in silico, helping the wet lab determine how enzymes of interest were likely to perform alone or in combination with other enzymes.

To do this, the dry lab team created three programs: the FASTA Ripper, the Enzyme Epitope Predictor and the File Splicer. These scripts work together to deliver the desired data in an easy and accessible way.

**Figure 1:** A demonstration of the R workflow of the software. Mega FASTA files are broken down with the FASTA Ripper. The data is then processed with the Enzyme Epitope Predictor and sorted with the File Splicer. Sorted files can then be run back through the Enzyme Epitope Predictor to test new enzymes or used for other procedures.

**Figure 2:** A demonstration of the Shiny App workflow. Mega FASTA files are broken down with the FASTA Ripper. The data is then processed with the Enzyme Epitope Predictor and the results will appear in the console.

Program Summaries and Quick Instructions

FASTA Ripper

Function: Splits a multi-FASTA file into individual FASTA files.
Input: multi-FASTA file in .fasta or .txt format.
Output: Folder of individual FASTA sequences in .fasta format.

Why: The gluten database stored all of its sequences and epitopes in a single file. This made running a program on it time-consuming and made selecting sequences of interest harder than it would be with one file per sequence. This script was created to split the larger FASTA file into individual sequences.
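The splitting step can be illustrated with a short sketch. This is not the team's R script: it is a minimal Python illustration, and the function names and the `sequence_00001.fasta` naming scheme are assumptions made for the example.

```python
from pathlib import Path


def split_multi_fasta(text):
    """Return a list of (header, sequence) pairs parsed from multi-FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_records(records, out_dir):
    """Write each record to its own .fasta file (file names are hypothetical)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (header, seq) in enumerate(records, start=1):
        path = out / f"sequence_{i:05d}.fasta"
        path.write_text(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Numbering the output files rather than naming them after the header avoids problems with characters in FASTA headers that are not valid in file names.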

Enzyme Epitope Predictor

Function: Determines the results of enzyme cleavage and identifies any epitopes that match the resulting fragments.
Input:
- Epitopes: Epitope CSV (file must be in .csv format not .xlsx)
- Folder of Sequences: A folder with fasta sequences in .fasta format.
- Output Folder: A folder for the output to be written to (we suggest that it be empty)
Output:
- Shiny App: Summary CSV for all enzymes and the positive control; sequences are displayed in the Shiny app window.
- R-script: Summary CSV for all enzymes and the positive control, plus the FASTA sequences in .fasta format marked fail, negative or success, all contained in the selected output folder.

Why: This program was the cornerstone of our modeling. It cleaves the protein sequences based on the selected enzymes and matches the resulting fragments to a list of provided epitopes. The output of this program informed our selection of enzymes.
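The cleave-then-match idea can be sketched in a few lines. This is an illustrative Python sketch rather than the team's R implementation: the function names are hypothetical, the cleavage rule is passed in as a regular expression standing in for an enzyme's motif, and the only behaviour taken from this page is that X in an epitope stands for an unknown or variable amino acid.

```python
import re


def cleave(sequence, site_pattern):
    """Cut `sequence` after every match of `site_pattern` (a regex).

    The pattern is a hypothetical stand-in for an enzyme's cleavage motif.
    """
    fragments, start = [], 0
    for match in re.finditer(site_pattern, sequence):
        end = match.end()
        if end > start:
            fragments.append(sequence[start:end])
            start = end
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def epitope_to_regex(epitope):
    """Translate an epitope into a regex, treating X as any single residue."""
    parts = ("." if aa == "X" else re.escape(aa) for aa in epitope.upper())
    return re.compile("".join(parts))


def match_epitopes(fragments, epitopes):
    """Return {fragment number: [IDs of epitopes found in that fragment]}.

    `epitopes` maps an ID code to an epitope sequence.
    """
    hits = {}
    for number, fragment in enumerate(fragments, start=1):
        ids = [eid for eid, ep in epitopes.items()
               if epitope_to_regex(ep).search(fragment)]
        if ids:
            hits[number] = ids
    return hits
```

For example, `cleave("QPQQPFPQ", "P")` cuts after each proline, and `match_epitopes` then reports which of the resulting fragments still contain a listed epitope. A fragment with no entry in the returned dictionary matched nothing.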

File Splicer

Function: Divides out successful sequences from failed ones.
Input: Folder of Enzyme Epitope Predictor Results
Output: Folder of failed sequences (data missing epitopes) and a folder of successful sequences (those that have epitopes)

Why: This program made the sequences that matched epitopes easy to retrieve and reassess, and made sorting the results straightforward.
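A sorting step of this kind might look like the sketch below. It is written in Python for illustration and is not the team's R script. It assumes the success or fail tag appears in each file name; this page does not say where the tag is stored, so that detail and the function name are hypothetical.

```python
import shutil
from pathlib import Path


def sort_by_tag(results_dir, out_dir, tags=("success", "fail")):
    """Copy .fasta files into one subfolder per tag found in the file name.

    Assumption: the tag is part of the file name. Files carrying none of
    the tags are left where they are.
    """
    results, out = Path(results_dir), Path(out_dir)
    counts = {tag: 0 for tag in tags}
    for path in sorted(results.glob("*.fasta")):
        name = path.name.lower()
        for tag in tags:
            if tag in name:
                dest = out / tag
                dest.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, dest / path.name)
                counts[tag] += 1
                break
    return counts
```

Copying rather than moving keeps the original results folder intact, so the sort can be repeated safely.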

Why Multiple Scripts

The dry lab initially tried combining all scripts into a single program. However, because of the large number of files it was designed to process, the combined program was time-consuming to run. As the additional functions only need to be run once rather than on every run, we determined that the programs were more efficient as separate scripts.

Use as Our Model

We used our software to test the function of the following six enzymes against the GluPro 6.1 Database [1]:

KUMA030
AN-PEP
SC-PEP
EP_B2
FVp-P
CARICAIN (enzyme used in the commercial product Gluteguard)

The enzymes were selected for their potential to degrade gluten, as their catalytic motifs target amino acid sequences commonly found within gluten proteins. They are all derived from either fungal or plant genomes.

To evaluate enzyme performance, the Enzyme Epitope Predictor was used to generate a rough model of the expected gluten fragments, including their size, number, and sequence. These fragments were then compared to a previously established database of gluten IgE epitope sequences. Based on this comparison, it was possible to infer whether a single enzyme would be sufficient to break gluten into non-immunogenic fragments or whether a combination of enzymes would be necessary.

The results of the model indicated that our preferred enzyme candidate, AN-PEP, did leave a fragment that matched an epitope, indicating that a combination of enzymes would be required to completely prevent flare-ups.

These results guided our decision to begin exploring enzymes that could be used in combination with AN-PEP.

Limitations

Though this software was valuable in helping the team determine which enzymes to pursue, it has some limitations. The system does not account for enzyme concentration, reaction time or real-world conditions such as temperature or pH. In essence, it models an idealized system.

The Enzyme Epitope Predictor can determine whether an enzyme could theoretically cleave gluten at every matching motif, but not whether similar results could realistically be achieved in vivo.

The program also does not perform multi-enzyme digests: each sequence is cut with each enzyme separately, which prevents it from exploring the effect of enzymes in combination. Given more time to work on the program, this issue would be addressed.
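One way a multi-enzyme digest could be modeled is to feed the fragments produced by one enzyme into the next. The Python sketch below shows that idea under the same idealized assumptions as the rest of the model. It is not part of the team's software, and the function names and the regex-based cleavage rules are hypothetical.

```python
import re


def cut_after(sequence, site_pattern):
    """Cut `sequence` after every match of `site_pattern` (a regex)."""
    pieces, start = [], 0
    for match in re.finditer(site_pattern, sequence):
        if match.end() > start:
            pieces.append(sequence[start:match.end()])
            start = match.end()
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


def sequential_digest(sequence, site_patterns):
    """Apply each enzyme's pattern in turn to the fragments left by the last."""
    fragments = [sequence]
    for pattern in site_patterns:
        fragments = [piece
                     for fragment in fragments
                     for piece in cut_after(fragment, pattern)]
    return fragments
```

The combined fragment list could then be checked against the epitope list in the same way as a single-enzyme digest.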

Download Instructions

Download the software from our GitLab!

Ensure you have R installed on your computer.
- All of our programs were built in R version 4.5.1 and run in RStudio; best results will be obtained using the same version.
  Instructions are provided under the assumption that RStudio is being used.
Visit our GitLab and download the required program(s)
Open the file(s) in R and hit the run button.
Approve the installation of any packages the software requires.
Use as directed in the instructions.

FASTA Ripper

Open the FASTA Ripper in RStudio
Hit Source (the run button) in the top right-hand corner of the window.
A file navigator will open; within it, select the .fasta or .txt file you would like to rip.
A new folder will open with the ripped FASTA sequences inside.

Enzyme Epitope Predictor Instructions

CSV File Prep:

The Enzyme Epitope Predictor requires the epitopes to be provided as a CSV.
The file must have a .csv file extension or the program will not work.
The file should have three columns as demonstrated in Table 1.

**Table 1:** Contents of epitope CSV file required for correct software function.
Col 1	Col 2	Col 3
Protein Name	Epitope	ID Code
String (Does not need to be unique)	Amino Acid Sequence Where X is an unknown/variable amino acid	Number (Must be unique to each epitope)
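Before running the predictor, it can help to check that an epitope file follows the layout in Table 1. The Python sketch below is one hedged way to do that and is not part of the team's software. The function name is hypothetical, and the sketch assumes the first row of the file holds column headers.

```python
import csv


def load_epitopes(handle):
    """Read (protein, epitope, id) rows and check the constraints in Table 1."""
    reader = csv.reader(handle)
    next(reader, None)  # assumption: the first row holds column headers
    epitopes, seen = [], set()
    for row_number, row in enumerate(reader, start=2):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(
                f"row {row_number}: expected 3 columns, got {len(row)}")
        protein, epitope, code = (cell.strip() for cell in row)
        if not epitope.isalpha():
            raise ValueError(
                f"row {row_number}: epitope must contain only letters")
        if code in seen:
            raise ValueError(f"row {row_number}: duplicate ID code {code!r}")
        seen.add(code)
        epitopes.append((protein, epitope.upper(), code))
    return epitopes
```

It can be called on an open file, for example `load_epitopes(open("epitopes.csv", newline=""))`, where the file name is a placeholder. A `ValueError` points to the first row that breaks the three-column or unique-ID rule.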

Instructions

The program runs the same way whether you use the Shiny app or RStudio, and it asks for the same inputs. Only the output changes. We suggest the Shiny app for shorter jobs where you just need the pass/fail information and don’t intend to work further with the FASTA files. We suggest the R script for larger datasets where recovering the successful/failed files would be useful (for example, to go back through them and see whether other enzymes remove any epitope matches).

Gather your documents:
- Epitope CSV (.csv format)
- Folder of .fasta files to be tested (can be made with the FASTA Ripper)
- Output folder
Open the Enzyme Epitope Predictor; it will open a Shiny app.
Upload the epitope CSV
Select the FASTA folder
Select the output folder
Select the enzymes you wish to test (or add the information for your own*)
If using the Shiny app, select output type:
- Overall Summary: tabs show success and failed sequences across all enzymes
- Summary Per Enzyme: tabs show the information separated by enzyme
Hit run analysis

*If you are adding your own enzymes in the R-script, you will need to do so by altering the code. In the Shiny app, this function is built in.

Warning: Depending on the specifications of your device, the program may take a while to run (roughly 5-10 minutes). More complex cleavage patterns take longer.

Interpreting Your Results

Data is only useful if it can be easily interpreted. In this section we walk you through how to do that, focusing on the output of the Shiny app.

**Figure 3:** The Shiny App Software Overall Summary interface showing the success and failed tabs.

If the user chooses the Overall Summary option, they will be presented with this screen. In the top middle, there are two visible tabs: Success and Failed. The Success tab shows sequences that matched a given IgE epitope from the CSV file input by the user. From left to right, the information given is as follows:

**Table 2:** Summary of Enzyme Epitope Predictor output by column.
Col 1	Col 2	Col 3	Col 4	Col 5
The enzyme used	The sequence title	The number of the fragment that is a positive match	The fragment sequence that was cleaved	The epitope code(s) that matched in the sequence; if multiple exist, they are delimited by a semicolon (;)

There is also the option to show more entries at once: 10, 25, 50 or 100. At the bottom of the results, the user can keep track of how many trials they have run, but older results cannot be retrieved once a new run has been started. The user can also download the results to their device as a CSV file, either for a specific enzyme or all at once in a ZIP folder. If they download the ZIP file, the trial number is written in the title of that folder.

**Figure 4:** The Shiny App Software Per-Enzyme Summary interface showing the results divided by enzyme.

The Per-Enzyme Summary mode displays the data in a similar manner to the Overall Summary, but the tabs are organized by enzyme instead. Each tab displays the positive matches first, with the failed matches below them. If no positive or failed matches exist, the table displays a “No data Available in Table” message.

After downloading the CSV file, the user will see 4 columns with data:

Col 1	Col 2	Col 3	Col 4
The Sequence Name	The Fragment Number	The Sequence in the fragment	The Epitope code if an epitope match occurred

The sequence name will appear truncated at first, but expanding the cell shows the full title of the sequence.
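Once downloaded, the CSV can also be summarized programmatically. The Python sketch below tallies how often each epitope code appears in the fourth column and splits multiple codes on the semicolon. It is an illustration only: the function name is hypothetical, and it assumes that the file starts with a header row and that rows without a match have an empty fourth column.

```python
import csv
from collections import Counter


def count_epitope_hits(handle, has_header=True):
    """Count how many fragments matched each epitope code.

    Columns are read by position: name, fragment number, sequence, codes.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # assumption: first row holds column headers
    counts = Counter()
    for row in reader:
        if len(row) < 4:
            continue  # skip incomplete rows
        for code in row[3].split(";"):
            code = code.strip()
            if code:
                counts[code] += 1
    return counts
```

Reading the columns by position avoids depending on the exact header text in the downloaded file.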

File Splicer Instructions

This program is only useful when processing the results of the Enzyme Epitope Predictor R-script. The Shiny app produces only the summary CSV files, not the annotated FASTA sequences.

Hit Run code
Select the folder containing the results of the Enzyme Epitope Predictor
Select output folder
Results will appear in the output folder, separated based on success and fail tags.

References

Bromilow, S., Daly, M., Gethings, L. A., Mills, E. N. C., Nitride, C., & Shewry, P. R. (2020). The GluPro suite of curated cereal seed storage prolamin protein sequence datasets [Data set]. figshare. https://doi.org/12613154
OpenAI. (2025). ChatGPT (GPT-5) [Large language model]. Retrieved July 15, 2025, from https://chat.openai.com/
Posit. (2025). Shiny [Web framework]. Retrieved September 19, 2025, from https://shiny.posit.co/
GeeksforGeeks. (2024, July 25). Biostrings in R. GeeksforGeeks. https://www.geeksforgeeks.org/r-language/biostrings-in-r/
Chua, E. H. (n.d.). Regular Expressions (Regex). Nanyang Technological University. https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html
RDocumentation. (n.d.). RDocumentation. Retrieved July 15, 2025, from https://www.rdocumentation.org/