gRNA DESIGN PROCESS


OVERVIEW

Guide RNAs (gRNA) are short RNA sequences that help direct CRISPR-Cas proteins, such as Cas13a, to specific RNA targets for cleavage. gRNAs are designed to be complementary to the target sequence, ensuring that Cas13a accurately recognizes the correct viral RNA without mistakenly targeting unrelated sequences.
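
As a minimal sketch of what "complementary" means here, the Python snippet below derives a spacer as the reverse complement of an RNA target. The function name `spacer_for_target` and the 10-nt example target are illustrative assumptions, not part of our pipeline, and the direct-repeat handle that a full Cas13a crRNA also carries is not modeled.

```python
# Watson-Crick pairing for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def spacer_for_target(target_rna: str) -> str:
    """Return the reverse complement of an RNA target, written 5'->3'."""
    # Accept DNA-style input by converting T to U first.
    target = target_rna.upper().replace("T", "U")
    return "".join(RNA_COMPLEMENT[base] for base in reversed(target))

# Hypothetical 10-nt target, for illustration only.
example_target = "AUGGCCUUAC"
example_spacer = spacer_for_target(example_target)
print(example_spacer)
```

Applying the function twice returns the original target, which is a quick sanity check that the pairing table is symmetric.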

In the context of HIV, sequence diversity presents a major challenge: the virus’s high mutation rate leads to significant variability between strains. To address this, our design process focused on identifying conserved regions across multiple HIV strains. Conserved regions tend to remain consistent even across mutated strains, so by targeting them we aim to keep our gRNAs effective against different HIV-1 variants.

To account for the diversity of HIV strains, we sourced sequences from different geographic regions, groups, and subtypes. Our focus is primarily on HIV-1 due to its higher global prevalence and broader impact on public health. Most of our sequences come from Group M, which represents the majority of HIV-1 strains. Within Group M, subtypes C, B, and A are the most common, making up around 70% of the global distribution of HIV-1 (Akahome).

Accession Number  Region        Group       Subgroup
AY535660.1        Estonia       Group M     A, CRF03_AB
EU541617.1        USA           Group M     B
AF224507.1        South Korea   Group M     B
GU177863.1        China         Group M     B
MW754307.1        USA           Group M     B
KT284371.1        USA           Group M     B, H
EU031915.1        Malaysia      Group M     B, CRF01_AE
OK662987.1        China         Group M     B, CRF01_AE
KC156129.1        South Africa  Group M     C
MT195527.1        Zambia        Group M     C
MT194478.1        Zambia        Group M     C
MZ766722.1        Botswana      Group M     C
MT194176.1        Zambia        Group M     C
MZ766696.1        Botswana      Group M     C
AY586549.2        Cuba          Group M     G
AY970948.1        Netherlands   Group M, N  H
FJ185260.1        Vietnam       Group M     CRF01_AE
AF407419.1        France        Group O     NA
AY623602.1        Cameroon      Group O     NA
AY618998.1        Cameroon      Group O     NA
Table 1: HIV-1 sequences retrieved from the NCBI database, listing accession numbers, geographic origins, group classifications, and subtypes. Pink shading indicates one conserved region found and shared by 11 sequences. Red shading indicates three conserved regions shared by a subset of 4 sequences.

We evaluated candidate gRNAs against three criteria:
  1. Conservation - selection from highly conserved regions across multiple HIV-1 strains.
  2. Specificity - low probability of binding to human or unrelated sequences, validated through BLAST analysis.
  3. Stability - predicted favorable folding and accessibility using RNA secondary structure modeling (RNAfold).
Our goal is to ensure strong on-target efficiency and minimize potential off-target interactions. Through this process, we generated three gRNAs that are both stable and broadly effective across multiple HIV-1 variants.

PIPELINE

STEP 1: NCBI Database

The National Center for Biotechnology Information (NCBI) is a public resource that provides access to biological information and data, such as literature, genetic data, and DNA/RNA sequences.
From this database, we retrieved 20 HIV-1 genome sequences to serve as the foundation of our gRNA designs.
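
For readers who want to script the retrieval, here is a hedged sketch that builds an NCBI E-utilities `efetch` URL for a few of the accession numbers in Table 1. Whether the sequences were originally downloaded this way or through the web interface is not stated; the helper name `efetch_fasta_url` is an assumption, and the snippet only constructs the URL without making a network request.

```python
from urllib.parse import urlencode

# Base endpoint of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_fasta_url(accessions):
    """Build a URL requesting FASTA records from the nucleotide database."""
    params = {
        "db": "nuccore",               # nucleotide database
        "id": ",".join(accessions),    # comma-separated accession list
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"

# Three accession numbers taken from Table 1.
accessions = ["AY535660.1", "EU541617.1", "KC156129.1"]
url = efetch_fasta_url(accessions)
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; NCBI asks heavy users to throttle requests and identify themselves, which this sketch leaves out.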

STEP 2: SnapGene

SnapGene is a DNA editing and visualization tool that allows us to safely view and annotate sequences.
Our HIV sequences contain harmful replication machinery. To ensure that Wet Lab would be working with safe, non-infectious sequences, we used SnapGene to trim out replication-related genes such as gag, pol, and env. This removes functional parts of the virus while still leaving us sequence to work with.
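
The trimming itself was done interactively in SnapGene. Purely as an illustration of the operation, the sketch below deletes annotated regions from a sequence by coordinate. The function `remove_regions` and the toy coordinates are hypothetical and are not real gag, pol, or env positions.

```python
def remove_regions(sequence, regions):
    """Delete 1-based, inclusive (start, end) regions from a sequence."""
    kept = []
    cursor = 0  # 0-based index of the next base not yet processed
    for start, end in sorted(regions):
        if start < 1 or end > len(sequence) or start > end:
            raise ValueError(f"invalid region: {(start, end)}")
        # Keep everything between the previous region and this one.
        kept.append(sequence[cursor:max(cursor, start - 1)])
        # Skip past this region (max handles overlapping regions).
        cursor = max(cursor, end)
    kept.append(sequence[cursor:])
    return "".join(kept)

# Toy sequence and made-up coordinates, for illustration only.
toy_genome = "AAACCCGGGTTT"
trimmed = remove_regions(toy_genome, [(4, 6), (10, 12)])
print(trimmed)
```

Sorting the regions first and tracking a cursor means overlapping annotations are removed once rather than shifting later coordinates.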

STEP 3: Clustal Omega, NCBI MSA Viewer

Clustal Omega is a multiple sequence alignment program that can align three or more sequences, and the NCBI Multiple Sequence Alignment (MSA) Viewer lets us inspect the resulting alignments.
Our trimmed HIV-1 sequences were aligned using these tools, allowing us to identify conserved regions. However, we found that the genetic diversity among the 20 sequences posed a bigger challenge than we expected. The initial alignment did not reveal any clear conserved regions, so we narrowed down the dataset to focus on more similar sequences.
Our results showed one conserved region shared by 11 sequences, and three conserved regions shared by a subset of 4 sequences.
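
To make "conserved region" concrete, the sketch below scans an already-aligned set of sequences for gap-free runs of columns where every sequence carries the same base. This is a simplified assumption about how such regions can be detected, not the procedure built into Clustal Omega or the MSA Viewer; the function name, the `min_length` threshold, and the toy alignment are all illustrative.

```python
def conserved_runs(aligned, min_length=20):
    """Return (start, end, sequence) for gap-free runs of identical columns.

    Coordinates are 0-based alignment columns with an exclusive end.
    """
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    runs, run_start = [], None
    width = len(aligned[0])
    # Iterate one column past the end so a run touching the edge is closed.
    for col in range(width + 1):
        column = {row[col] for row in aligned} if col < width else set()
        identical = len(column) == 1 and "-" not in column
        if identical and run_start is None:
            run_start = col
        elif not identical and run_start is not None:
            if col - run_start >= min_length:
                runs.append((run_start, col, aligned[0][run_start:col]))
            run_start = None
    return runs

# Toy three-sequence alignment, for illustration only.
toy_alignment = [
    "AUGGCUAGCUAA-CG",
    "AUGGCUAGCUAAGCG",
    "AUGCCUAGCUAAGCG",
]
print(conserved_runs(toy_alignment, min_length=5))
```

Requiring perfect identity is strict; a real analysis might tolerate a mismatch or two per window, which is one reason narrowing the dataset to more similar sequences exposes regions that the full set of 20 does not.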

STEP 4: BLAST

After identifying the conserved regions we wanted to work with, we used BLAST to perform off-target analysis. This helps ensure that our gRNA won’t unintentionally target another RNA sequence present in the system, which could harm the cell if those sequences encode important functions.
BLAST (Basic Local Alignment Search Tool) is a bioinformatics tool that compares nucleotide or protein sequences to a sequence database. BLAST can identify similar regions between these sequences, allowing us to determine the degree of homology between them.
By running BLAST searches of our sequences against the human genome, we found no significant off-target matches, indicating that the gRNAs are unlikely to bind essential human transcripts or functional regions.
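
The idea behind an off-target check can be shown with a much simpler stand-in than BLAST: slide the guide along a reference sequence and report windows that differ by only a few bases. The sketch below is a naive mismatch scan on one strand, not BLAST's seed-and-extend algorithm; `near_matches`, the mismatch threshold, and the toy reference are assumptions made for illustration.

```python
def near_matches(query, reference, max_mismatches=2):
    """Return (position, mismatches) for reference windows close to the query.

    Positions are 0-based. Only the given strand of the reference is scanned.
    """
    # Compare in a single alphabet so RNA guides can be checked against DNA.
    q = query.upper().replace("U", "T")
    ref = reference.upper().replace("U", "T")
    hits = []
    for pos in range(len(ref) - len(q) + 1):
        window = ref[pos:pos + len(q)]
        mismatches = sum(a != b for a, b in zip(q, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

# First 10 nt of gRNA 2 against a made-up reference containing one exact
# copy and one copy with a single substitution.
toy_reference = "TTTGGAGACTCCATTTGGAGTCTCCATT"
print(near_matches("GGAGACUCCA", toy_reference, max_mismatches=1))
```

A scan like this is quadratic in practice and ignores gaps, so it is only useful for small references; for genome-scale searches a dedicated tool such as BLAST remains the appropriate choice.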

STEP 5: RNAstructure

For further analysis, we assessed the structural stability of our candidate gRNA sequences using RNAstructure to predict their secondary structure and calculate their minimum free energy (MFE). This analysis helped us determine how likely each RNA strand is to form a stable structure, which affects how effectively it binds the target sequence.
Based on these results, the top three candidate gRNAs were chosen, optimizing for both structural stability and targeting efficiency.

RESULTS

The purpose of evaluating the secondary structure and folding free energy of our top RNA strands is to make sure they can reliably bind to the Cas13a protein. If the RNA folds differently each time, it might interfere with the formation of the Cas13a–RNA complex (called the RNP complex).

As in other CRISPR/Cas systems (Cas9, Cas12, etc.), the guide RNA (crRNA) needs to form a specific structure: a single stem loop. This loop is the docking site for the Cas protein, so the predicted structures for our designs must consistently form it. Because stem loops are stable and form naturally with a low free energy (ΔG), they are energetically favorable and more reliable.
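
Structure predictors commonly report folds in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. As a hedged sketch, the check below tests whether such a string describes exactly one stem loop. The function name and the example structures are hypothetical; they are not the predicted folds of our gRNAs.

```python
def is_single_stem_loop(dot_bracket):
    """True if the structure is balanced, has at least one base pair,
    and never opens a new stem after the first closing bracket."""
    depth = 0
    seen_close = False
    pairs = 0
    for symbol in dot_bracket:
        if symbol == "(":
            if seen_close:
                return False  # a second stem opens after the first closed
            depth += 1
        elif symbol == ")":
            seen_close = True
            depth -= 1
            pairs += 1
            if depth < 0:
                return False  # closing bracket without a partner
        elif symbol != ".":
            raise ValueError(f"unexpected symbol: {symbol!r}")
    return depth == 0 and pairs > 0

# Hypothetical structures, for illustration only.
print(is_single_stem_loop("..((((....)))).."))   # one hairpin
print(is_single_stem_loop("((...))..((...))"))   # two separate hairpins
```

Bulges and internal loops inside the stem still count as a single stem loop under this definition, while multi-branch folds and fully unpaired strands do not.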

Below are our three most preferred sequences, with secondary structure predictions and corresponding graphs of free energy (ΔG) throughout each structure. You will notice that each free energy graph contains a large negative spike at some position on the RNA strand. This is indicative of our RNA stem loops, as their formation is very energetically favorable.

Guide RNA  Sequence                      MFE (ΔG)
gRNA 1     UUUCUCUUACAGCAGGCCAUCCAACUAU  -41.8 kcal/mol
gRNA 2     GGAGACUCCAUGACCCAAAUGCCA      -34.24 kcal/mol
gRNA 3     CUCUCCUUCUAGCCUCCGCUAGUCAAA   -40.19 kcal/mol
Table 2: Summary of top gRNA candidates with folding stability metrics.
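
For convenience, the short sketch below loads the Table 2 values and orders the candidates from most to least negative MFE. The sequences and MFE values are copied from the table; sequence length and GC fraction are extra descriptive statistics computed here for illustration and were not among our stated selection criteria.

```python
# Sequences and MFE values (kcal/mol) copied from Table 2.
candidates = {
    "gRNA 1": ("UUUCUCUUACAGCAGGCCAUCCAACUAU", -41.8),
    "gRNA 2": ("GGAGACUCCAUGACCCAAAUGCCA", -34.24),
    "gRNA 3": ("CUCUCCUUCUAGCCUCCGCUAGUCAAA", -40.19),
}

def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)

# Sort by MFE so the most negative (most stable predicted fold) comes first.
ranked = sorted(candidates.items(), key=lambda item: item[1][1])
for name, (seq, mfe) in ranked:
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.2f}, MFE {mfe} kcal/mol")
```

Sorting on MFE alone ignores length differences between guides, so the ordering is a convenience for reading the table rather than a ranking of expected performance.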

[Image panels for crRNA 1, crRNA 2, and crRNA 3: predicted secondary structure (left) and free energy plot (right)]

Figure 1. The left column shows the 2D folding prediction for each crRNA from RNAfold; each structure features a single stem loop. The right column shows the free energy of folding (ΔG) for each crRNA. Large spikes indicate major energetically favorable folding conformations, such as stem loops.

These three gRNAs will next be synthesized and tested in our Cas13a cleavage assays using HIV mimic plasmids.

REFERENCES

  1. Akahome, Pascal. “HIV-1 Subtypes.” Aidsmap.com, 9 Apr. 2009, www.aidsmap.com/about-hiv/hiv-1-subtypes.
  2. “HIV-1 Clone PCMO2.3 from Cameroon, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/AY618998.1/.
  3. “HIV-1 Clone PCMO2.5 from Cameroon, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/AY623602.1/.
  4. “HIV-1 Clone PIIIB from USA, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/EU541617.1/.
  5. “HIV-1 Isolate 07MYKLD49 from Malaysia, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/EU031915.1/.
  6. “HIV-1 Isolate 97VNHCM309 from Viet Nam, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/FJ185260.1/.
  7. “HIV-1 Isolate 027A from China, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/OK662987.1/.
  8. “HIV-1 Isolate 074-M-201-2-5_w0_224 from Botswana, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/MZ766696.1/.
  9. “HIV-1 Isolate 074-P-201-1-4_w0_249 from Botswana, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/MZ766722.1/.
  10. “HIV-1 Isolate 2303_17_N_26 from USA, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/MW754307.1/.
  11. “HIV-1 Isolate CH185_TF from South Africa, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/KC156129.1/.
  12. “HIV-1 Isolate Cu87 from Cuba, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/AY586549.2/.
  13. “HIV-1 Isolate EE0369 from Estonia, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/AY535660.1/.
  14. “HIV-1 Isolate Hypermutated VAU Group O from France, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/AF407419.1/.
  15. “HIV-1 Isolate Plwj from China, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/GU177863.1/.
  16. “HIV-1 Isolate RV-1 from USA, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/KT284371.1/.
  17. “HIV-1 Isolate ZM1044M_25Mar2006_SC_8_N from Zambia, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/MT194176.1/.
  18. “HIV-1 Isolate ZM1123M_10Jan2012_1A_2_N.HY from Zambia, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/MT194478.1/.
  19. “HIV-1 Isolate ZMN133M_16Nov2011_2A_21_N from Zambia, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/MT195527.1/.
  20. “HIV-1 Strain HIV-1wk from South Korea, Complete Genome - Nucleotide - NCBI.” Nih.gov, 2025, www.ncbi.nlm.nih.gov/nuccore/AF224507.1/.