To speed up and parallelize the screening of peptides that bind to plastic surfaces, our team developed
Sito. Sito means “sieve” in Ukrainian, and it is the main tool we used to sieve through peptides and find
promising binding candidates. It uses Python and Bash scripting to automatically prepare ligand and
receptor files for docking and to submit AutoDock Vina SLURM jobs for every combination of the prepared
ligands and receptors.
Please note that this script is made for HPC clusters supporting SLURM only. In its current form, it
will not run on a regular machine. Universities may provide access to such clusters for their
researchers. Please reach out to your university’s computing center to learn more about
HPC access.
Since each docking job is independent, the speedup over running Vina serially is proportional to the
number of jobs the HPC server allows you to run at once. In our case this number was 100, which
corresponds to an ideal speedup of up to 100-fold (10,000% of serial throughput) for molecular docking
screens with AutoDock Vina. HPC administrators may be able to raise this limit on request.
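The arithmetic behind that estimate can be sketched as follows. This is a simplified model, not part of Sito: it assumes every docking takes the same time, ignores queue wait, and uses a hypothetical screen of 30 receptors and 10 ligands at 12 minutes per docking.

```python
import math


def ideal_wall_time(n_tasks: int, minutes_per_task: float, concurrent_jobs: int) -> float:
    """Wall time when tasks run in waves of `concurrent_jobs` (queue wait ignored)."""
    waves = math.ceil(n_tasks / concurrent_jobs)
    return waves * minutes_per_task


# Hypothetical screen: 30 receptors x 10 ligands, 12 minutes per docking.
n_tasks = 30 * 10
serial = ideal_wall_time(n_tasks, 12.0, concurrent_jobs=1)
parallel = ideal_wall_time(n_tasks, 12.0, concurrent_jobs=100)
speedup = serial / parallel
print(f"serial: {serial:.0f} min, parallel: {parallel:.0f} min, speedup: {speedup:.0f}x")
```

The full 100-fold figure is only reached when the number of dockings is a multiple of the concurrency limit; with 250 dockings, for example, the last wave is half empty and the ideal speedup drops below 100.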
The script automatically prepares all the ligands (sdf files) and receptors (pdb files) in their
respective directories using Meeko, analogous to the preparation outlined in the official AutoDock Vina
tutorials. Next, it prepares and submits an array of jobs to the HPC and, after the last job is done,
creates a summary of the docking results.
In our case, the ligands were plastic representations converted from SMILES to sdf using Open Babel. We
have provided the Python scripts we used to generate them.
The receptors we used in our project were the predicted structures of our peptides of interest, generated
with AlphaFold 3 and converted to pdb with Open Babel, but this script’s potential is not limited to that.
With a little tweaking, it may be used for any sort of docking screen and gain a large speedup from
job parallelization.
To run this script, first clone the Sito repository and set up the conda environment with the necessary
dependencies.
git clone https://gitlab.igem.org/2025/software-tools/fsu
cd fsu
conda env create -f sito.yml
conda activate sito
Next, make sito.sh executable and add it to your PATH:
cd sito
chmod +x sito.sh
# Ensure ~/.local/bin exists
mkdir -p "$HOME/.local/bin"
# Symlink to PATH
ln -s "$(pwd)/sito.sh" "$HOME/.local/bin/sito"
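# Add $HOME/.local/bin to PATH if it isn’t there already
grep -qxF 'export PATH="$HOME/.local/bin:$PATH"' ~/.bashrc 2>/dev/null || echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
# Reload the shell configuration so the new PATH takes effect
source ~/.bashrc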
Verify the script works by running the help command.
sito -h
Usage: sito [OPTIONS]
A master script to launch a high-throughput AutoDock Vina docking screen on a SLURM cluster.
It generates a task list of all receptor-ligand pairs and submits them as a SLURM job array.
Options:
  -r, --receptor_dir DIR      Path to the directory containing receptor PDB files. (Default: ./receptors)
  -l, --ligand_dir DIR        Path to the directory containing ligand SDF files. (Default: ./ligands)
  -o, --output_dir DIR        Path to the main output directory for all results. (Default: ./docking_results)
  -p, --padding PADDING       Padding in Angstroms for the docking box. (Default: 20)
  -e, --exhaustiveness EXH    Vina exhaustiveness value. (Default: 32)
  -q, --queue QUEUE           SLURM queue/partition to submit to. (Default: genacc_q)
  -c, --cpus_per_task N       Number of CPUs to request for each docking job. (Default: 1)
  -a, --allow_bad_res         Pass the '--allow_bad_res' flag to Meeko for receptor preparation. (Default: off)
  -h, --help                  Display this help message and exit.
Example:
  sito --receptor_dir ./peptides --ligand_dir ./fragments --padding 25 --cpus_per_task 8
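To illustrate what the padding option controls, here is a minimal sketch of a docking box that wraps a whole receptor. It is not Sito's actual code: the function name docking_box and the helper pdb_atom are hypothetical, and it assumes the padding is added to the receptor's total extent along each axis, which may differ from how Sito applies it.

```python
def docking_box(pdb_text: str, padding: float = 20.0):
    """Return (center, size) of a box enclosing all atoms, enlarged by `padding`."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Fixed-width PDB columns 31-38, 39-46 and 47-54 hold x, y and z.
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no ATOM/HETATM records found")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    # Assumption: padding is added once to the full extent of each axis.
    size = tuple((hi - lo) + padding for lo, hi in zip(mins, maxs))
    return center, size


def pdb_atom(serial: int, x: float, y: float, z: float) -> str:
    """Build one fixed-width ATOM record (hypothetical helper for the demo)."""
    return f"ATOM  {serial:5d}  CA  ALA A{serial:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


# Two made-up atoms spanning 10 x 20 x 30 Angstroms.
pdb_text = "\n".join([pdb_atom(1, 0.0, 0.0, 0.0), pdb_atom(2, 10.0, 20.0, 30.0)])
center, size = docking_box(pdb_text, padding=20.0)
print(center, size)
```

A larger padding gives the ligand more room around the peptide at the cost of a bigger search space, so exhaustiveness may need to be raised along with it.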
We recommend running each batch of docking screening in its own directory to stay organized. Inside
this directory, create two directories: one with your receptors in pdb format (the peptides) and one with
your ligands in sdf format.
example_directory/
│
├── receptors/
│ ├── peptide1.pdb
│ ├── peptide2.pdb
│ └── peptide3.pdb
│
├── ligands/
│ ├── polystyrene.sdf
│ └── polypropylene.sdf
│
├── README.md
└── Makefile
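With this layout, the number of docking jobs is the number of receptors times the number of ligands. The following is a minimal sketch of how such a task list could be built; list_pairs and the tab-separated task format are assumptions for illustration, not Sito's internal format.

```python
import itertools
import tempfile
from pathlib import Path


def list_pairs(receptor_dir: Path, ligand_dir: Path):
    """Return every (receptor, ligand) file pair, sorted for reproducibility."""
    receptors = sorted(receptor_dir.glob("*.pdb"))
    ligands = sorted(ligand_dir.glob("*.sdf"))
    if not receptors or not ligands:
        raise FileNotFoundError("need at least one .pdb receptor and one .sdf ligand")
    return list(itertools.product(receptors, ligands))


# Demo on a throwaway directory that mirrors the layout above (empty placeholder files).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "receptors").mkdir()
    (root / "ligands").mkdir()
    for name in ("peptide1.pdb", "peptide2.pdb", "peptide3.pdb"):
        (root / "receptors" / name).touch()
    for name in ("polystyrene.sdf", "polypropylene.sdf"):
        (root / "ligands" / name).touch()
    pairs = list_pairs(root / "receptors", root / "ligands")
    # One line per docking job: receptor file name, tab, ligand file name.
    task_lines = [f"{rec.name}\t{lig.name}" for rec, lig in pairs]

print(len(task_lines), "docking tasks")
```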
You may also generate the small plastic chain ligands we used in our docking by taking the SMILES from the
scripts in smiles_ligand_generators and then converting them to 3D sdf ligands with:
obabel -:"<SMILES STRING>" -O plastic.sdf --gen3d
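As a rough illustration of how a short plastic chain can be written as SMILES, the sketch below repeats a monomer unit and assembles the obabel command shown above. It is not a copy of the scripts in smiles_ligand_generators; the REPEAT_UNITS table and both function names are assumptions, and the chain ends are simply left hydrogen-terminated.

```python
# Assumed repeat units; the project's own generator scripts may use different ones.
REPEAT_UNITS = {
    "polyethylene": "CC",
    "polypropylene": "CC(C)",
    "polystyrene": "CC(c1ccccc1)",
}


def oligomer_smiles(polymer: str, n_units: int) -> str:
    """Concatenate `n_units` repeat units into one linear, hydrogen-terminated chain."""
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return REPEAT_UNITS[polymer] * n_units


def obabel_command(smiles: str, out_path: str) -> list:
    """Argument list for the Open Babel call that builds 3D coordinates."""
    return ["obabel", f"-:{smiles}", "-O", out_path, "--gen3d"]


styrene_dimer = oligomer_smiles("polystyrene", 2)
command = obabel_command(styrene_dimer, "plastic.sdf")
print(styrene_dimer)
print(" ".join(command))
```

The argument list can be passed to subprocess.run on a machine where Open Babel is installed.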
An example run of sito from example_directory would look something like this:
sito -r receptors -l ligands -e 32 -q <NAME OF YOUR HPC QUEUE>
The script will then create a new directory called docking_results, prepare all receptor-ligand pairs with
Meeko, and submit all the combinations of receptors and ligands as docking jobs to the SLURM scheduler.
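For readers unfamiliar with SLURM job arrays, the sketch below renders a batch script in which each array index picks one line of a task file. It only illustrates the mechanism: render_array_script, the tasks.txt file name, and the concurrency cap of 100 are assumptions, and the script Sito actually generates may look different.

```python
def render_array_script(n_tasks: int, partition: str, cpus_per_task: int = 1,
                        max_concurrent: int = 100, task_file: str = "tasks.txt") -> str:
    """Render an sbatch script that runs one task-file line per array index."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        # "%N" caps how many array tasks run at the same time.
        f"#SBATCH --array=1-{n_tasks}%{max_concurrent}",
        "",
        "# Pick the line of the task file that matches this array index.",
        f'TASK=$(sed -n "${{SLURM_ARRAY_TASK_ID}}p" {task_file})',
        'echo "Would dock: $TASK"',
        "",
    ])


# Six receptor-ligand pairs on the default queue from the help text, 8 CPUs each.
script = render_array_script(6, "genacc_q", cpus_per_task=8)
print(script)
```

Submitting one array instead of many individual jobs keeps the queue tidy and lets the scheduler enforce the concurrency limit.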
After all docking jobs finish successfully, the gatherer will generate a summary with the best ∆G from
each docking in docking_results/docking_summary.csv.
If any jobs failed (see docking_results/failed_jobs.log), this file will not be created, but a summary
may still be generated manually with cat docking_results/**/top_score.result > docking_summary.txt
You may also explore and export the individual dockings (.pdbqt) from the folders in
docking_results/<receptor_name>_docked_<ligand_name>. Each one is simply the folder where
AutoDock Vina ran for that pair.
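If you prefer to build your own summary from those folders, the best score can be read from Vina's output, where each pose is preceded by a "REMARK VINA RESULT:" line and the first pose is the best one. The sketch below is an independent example, not Sito's gatherer; the function names, the CSV column names, and the scores in the demo are made up for illustration.

```python
import csv
import io
from typing import Optional


def best_affinity(pdbqt_text: str) -> Optional[float]:
    """Return the affinity (kcal/mol) of the first, best-ranked pose, if any."""
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            return float(line.split()[3])
    return None


def split_run_name(folder_name: str):
    """Split '<receptor_name>_docked_<ligand_name>' into its two parts."""
    receptor, _, ligand = folder_name.partition("_docked_")
    return receptor, ligand


def summarize(runs: dict) -> str:
    """Build CSV text from a mapping of folder name to docked PDBQT text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["receptor", "ligand", "best_affinity_kcal_per_mol"])
    for folder, text in sorted(runs.items()):
        receptor, ligand = split_run_name(folder)
        writer.writerow([receptor, ligand, best_affinity(text)])
    return buf.getvalue()


# Made-up output fragment with two poses; only the first score is reported.
example = (
    "MODEL 1\nREMARK VINA RESULT:    -5.2      0.000      0.000\nENDMDL\n"
    "MODEL 2\nREMARK VINA RESULT:    -4.9      1.313      2.047\nENDMDL\n"
)
summary = summarize({"peptide1_docked_polystyrene": example})
print(summary)
```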
To make Sito generic to any docking screen, tweak the source code by removing the automatic Vina box
generation, then create and distribute Vina config files containing the correct docking region boxes
for your receptors.
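A per-receptor config file of that kind could be written with a small helper like the one below. This is a sketch under assumptions: render_vina_config is a hypothetical name, the file names and box values are placeholders, and only standard AutoDock Vina config keys (receptor, ligand, center_x/y/z, size_x/y/z, exhaustiveness) are used.

```python
def render_vina_config(receptor: str, ligand: str, center, size,
                       exhaustiveness: int = 32) -> str:
    """Render the text of an AutoDock Vina config file with a fixed docking box."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value:.3f}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"


# Placeholder box: replace the center and size with the binding region of your receptor.
config = render_vina_config(
    "peptide1.pdbqt", "polystyrene.pdbqt",
    center=(1.5, -2.0, 10.0), size=(30.0, 30.0, 30.0),
)
print(config)
```

Saved to a file, the result can be passed to Vina with its --config option.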