How does it work?

How to predict the binding affinity of a probe towards a binding site

The binding strength of RNA‑RNA or RNA‑protein interactions can be quantified with the Boltzmann factor \(p_i \;\propto\; e^{-E_i/(k_{\mathrm B}T)},\) a principle from statistical physics that links the probability of a state (e.g., bound vs. unbound) to its free energy \(E_i\). We apply this relationship to estimate the affinity of a probe to an RNA sequence. In RNA‑RNA duplexes, each A–U base pair contributes two hydrogen bonds and each G–C pair contributes three, with each hydrogen bond corresponding to roughly \(1\,k_{\mathrm B}T\) of energy, so a lost base pair costs about \(2.5\,k_{\mathrm B}T\) on average. Using a perfectly matched sequence as the reference (\(E=0\)), a sequence containing mismatches incurs an energetic penalty that can be approximated as \(\Delta E \approx \kappa \times 2.5\,k_{\mathrm B}T \) where \(\kappa\) is the number of mismatched bases. Consequently, the relative binding probability for a probe with \(\kappa\) mismatches compared to a perfect match is \[ \frac{p_{\kappa}}{p_0}=e^{-\Delta E/(k_{\mathrm B}T)}=e^{-2.5\,\kappa} \] For example, a probe binds roughly \(e^{-2.5}\approx\frac{1}{12}\) as frequently to a site with one mismatch as it does to a perfectly matched site, and only \(e^{-2\times2.5}=e^{-5}\approx\frac{1}{150}\) as frequently to a site with two mismatches. More on interpreting Boltzmann‑factor values can be found here.
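The relation above can be evaluated directly. The following is a minimal sketch, not the SEA-STAR implementation: the function name `relative_binding_probability` and the constant name are made up for illustration, and the only input taken from the text is the average penalty of 2.5 k_B T per mismatch.

```python
import math

# Average energetic penalty per mismatched base, in units of k_B*T (value from the text).
PENALTY_PER_MISMATCH = 2.5


def relative_binding_probability(n_mismatches: int) -> float:
    """Return p_kappa / p_0 = exp(-2.5 * kappa) for kappa mismatches."""
    if n_mismatches < 0:
        raise ValueError("number of mismatches must be non-negative")
    return math.exp(-PENALTY_PER_MISMATCH * n_mismatches)


if __name__ == "__main__":
    # Print the ratio for 0 to 3 mismatches, also as "1 in N".
    for kappa in range(4):
        ratio = relative_binding_probability(kappa)
        print(f"kappa={kappa}: p_kappa/p_0 = {ratio:.5f} (about 1 in {1 / ratio:.0f})")
```

Because the result is a ratio to the perfect match, a value of 1.0 means "binds as often as the perfect match", not "always binds".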

Scale up to predict the affinity towards all transcripts in the cell

We evaluate the affinity of a potential binding site of the target RNA (the query) towards a transcript (the reference) by comparing the query sequence to each subsequence of the reference with the same length and counting the number of mismatches (see plot), which gives the Boltzmann factor at each position. Summing these Boltzmann factors over all positions gives an estimate of the overall binding affinity towards the whole reference. In addition, the number of binding sites with the highest number of sequence matches is stored. This helps us avoid transcripts with two or more semi-strong binding sites, which would act as linkers in our LLPS system and lead to condensation via non-target RNA.
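The sliding-window comparison described above could look roughly like this. It is a sketch under stated assumptions: the names `count_mismatches` and `reference_affinity` are hypothetical, sequences are plain strings compared position by position, and the 2.5 k_B T penalty per mismatch is taken from the previous section.

```python
import math

PENALTY_PER_MISMATCH = 2.5  # k_B*T per mismatch, from the text


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def reference_affinity(query: str, reference: str):
    """Slide the query along the reference.

    Returns (sum of Boltzmann factors over all positions,
             lowest mismatch count found,
             number of positions reaching that lowest count).
    """
    n = len(query)
    if n == 0 or len(reference) < n:
        return 0.0, None, 0
    mismatches = [
        count_mismatches(query, reference[i:i + n])
        for i in range(len(reference) - n + 1)
    ]
    total = sum(math.exp(-PENALTY_PER_MISMATCH * k) for k in mismatches)
    best = min(mismatches)
    return total, best, mismatches.count(best)


if __name__ == "__main__":
    print(reference_affinity("ACGU", "ACGUACGA"))
```

The third return value corresponds to the stored count of best-matching sites: a reference with two or more near-perfect sites would be flagged as a potential linker.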

Running the query-reference affinity estimation described above for all references (transcripts) in a reference list (transcriptome) gives us the affinity metrics towards each reference (see plots). Assuming linear binding dynamics, we can sum the Boltzmann factors again to get the binding affinity towards the transcriptome. To avoid specific off-target effects, the highest off-target probability is stored as well. When the provided transcriptome dataset contains expression-level metrics, they can be used to weight the contribution of each transcript's affinity to the overall transcriptome affinity of the query.
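A minimal sketch of this aggregation step follows. The data layout is an assumption for illustration only: the transcriptome is a dict mapping transcript names to sequences, expression weights are an optional dict defaulting to 1.0, and the strongest off-target is picked from the unweighted per-transcript values. None of these names come from the SEA-STAR code base.

```python
import math

PENALTY_PER_MISMATCH = 2.5  # k_B*T per mismatch, from the text


def reference_affinity(query: str, reference: str) -> float:
    """Sum of Boltzmann factors of the query over all positions of one reference."""
    n = len(query)
    if n == 0:
        raise ValueError("query must not be empty")
    total = 0.0
    for i in range(len(reference) - n + 1):
        k = sum(q != r for q, r in zip(query, reference[i:i + n]))
        total += math.exp(-PENALTY_PER_MISMATCH * k)
    return total


def transcriptome_affinity(query, transcriptome, expression=None):
    """Return (weighted total affinity, name of strongest off-target, per-transcript affinities)."""
    expression = expression or {}
    per_transcript = {
        name: reference_affinity(query, seq) for name, seq in transcriptome.items()
    }
    # Linear binding assumption: contributions simply add up, optionally weighted.
    total = sum(expression.get(name, 1.0) * aff for name, aff in per_transcript.items())
    # Assumption: the maximum is taken over the unweighted affinities.
    strongest = max(per_transcript, key=per_transcript.get) if per_transcript else None
    return total, strongest, per_transcript


if __name__ == "__main__":
    demo = {"t1": "ACGU", "t2": "ACGA"}
    print(transcriptome_affinity("ACGU", demo, expression={"t1": 2.0}))
```

Keeping the per-transcript values alongside the total makes it possible to reject a query whose summed affinity is low but which still has one strong off-target.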

Running this affinity evaluation for all subsequences of the target RNA (in parallel) allows us to find those binding site sequences with the lowest probability of off-target effects.
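As a rough illustration of this scan, the sketch below scores every fixed-length window of a target against a list of off-target sequences and sorts the windows by score. The function names, the fixed `site_length` parameter and the list-of-strings input are assumptions made for the example. It runs sequentially; since each candidate is scored independently, the same loop could be distributed over worker processes.

```python
import math

PENALTY_PER_MISMATCH = 2.5  # k_B*T per mismatch, from the text


def off_target_score(query: str, off_targets) -> float:
    """Sum of Boltzmann factors of the query over all positions of all off-target sequences."""
    n = len(query)
    score = 0.0
    for seq in off_targets:
        for i in range(len(seq) - n + 1):
            k = sum(q != r for q, r in zip(query, seq[i:i + n]))
            score += math.exp(-PENALTY_PER_MISMATCH * k)
    return score


def rank_binding_sites(target: str, off_targets, site_length: int):
    """Return (score, start, site) for every window of the target, lowest score first."""
    if site_length <= 0:
        raise ValueError("site_length must be positive")
    scored = []
    for start in range(len(target) - site_length + 1):
        site = target[start:start + site_length]
        scored.append((off_target_score(site, off_targets), start, site))
    return sorted(scored)


if __name__ == "__main__":
    for score, start, site in rank_binding_sites("AAAACGCG", ["AAAAAAAA"], 4):
        print(f"{site} (start {start}): {score:.6f}")
```

The first entries of the returned list are the candidate binding sites with the lowest estimated off-target affinity.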

Limitations

Although the interaction estimate of a binding site against the whole transcriptome is based on the physical concept of the Boltzmann factor (the binding affinity of a binding sequence to a single transcript), it does not model nonlinear effects such as cooperative binding or phase-separation behaviour. Instead, it heuristically uses the maximum Boltzmann factor among the transcripts to keep off-target effects unspecific, reducing the risk of nonlinear behaviours that are not captured by the model. The model cannot estimate true absolute probabilities normalized to 1; it can only estimate relative probabilities, i.e. ratios of probabilities. The underlying reason is that the Boltzmann factor lacks the normalization by the partition function that would make it a true probability.
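The last point can be made explicit with the standard textbook relation, written here in the notation used above. A true probability requires dividing by the partition function \(Z\), the sum over all states, which the model does not compute: \[ p_i=\frac{e^{-E_i/(k_{\mathrm B}T)}}{Z},\qquad Z=\sum_j e^{-E_j/(k_{\mathrm B}T)} \] In a ratio of two probabilities, however, \(Z\) cancels, which is why relative values remain meaningful: \[ \frac{p_{\kappa}}{p_0}=\frac{e^{-E_{\kappa}/(k_{\mathrm B}T)}/Z}{e^{-E_0/(k_{\mathrm B}T)}/Z}=e^{-(E_{\kappa}-E_0)/(k_{\mathrm B}T)}=e^{-\Delta E/(k_{\mathrm B}T)} \]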

How to install?

Summary

Visit our GitHub page to download / clone our project.
Set up dependencies using uv (pyproject.toml), conda (conda_env.yaml) or pip (requirements.txt).
Have fun! Take a look at our examples or detailed explanations.

Download our project

visit GitHub -> click on CODE (green, top right) -> download as zip
extract zip and copy to a directory of your choice

Install dependencies using uv

install uv
open a terminal in the location of the project

in the file explorer / finder right-click 'open terminal here'
or open terminal -> cd <path/to/project>

install dependencies: uv sync
ready to go... have a look at the example notebooks using:

launch Jupyter: uv run jupyter lab
open folder in VSCodium
open folder in PyCharm Community Edition

How to use: examples

Have a look at our GitLab repository to find the full source code and reference implementations in the form of Jupyter notebooks.

SEA-STAR Software

SEA-STAR

side effect aware target site ranking