To speed up and parallelize the screening of peptides that bind to plastic surfaces, our team developed Sito. Sito means “sieve” in Ukrainian, and it is the main way we “sieved” through peptides to find promising binding candidates. It uses Python and Bash scripting to automatically prepare ligand and receptor files for docking and to submit AutoDock Vina SLURM jobs for all combinations of the prepared ligands and receptors.

Please note that this script is made only for HPC clusters that run SLURM. In its current form it will not run on a regular machine. Universities may provide access to such clusters for their researchers. Please reach out to your university’s computing center to learn more about HPC access.

Since each docking job is independent, the speedup over running AutoDock Vina one docking at a time is proportional to the number of jobs that can run concurrently on the HPC cluster. In our case this number was 100, which corresponds to an ideal speedup of up to 100-fold (10,000%) for a molecular docking screen. The limit may also be raised by asking the HPC administrators.
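
As a rough illustration of this scaling, the sketch below estimates the ideal wall-clock time of a screen. The peptide and plastic counts and the per-docking time are made-up inputs, and the model assumes every docking takes equally long and ignores queue waiting time, so it gives an upper bound on the speedup rather than a measurement.

```python
import math


def ideal_wall_time(n_receptors, n_ligands, minutes_per_docking, max_parallel_jobs):
    """Return (serial_minutes, parallel_minutes, speedup) for an idealized screen."""
    n_tasks = n_receptors * n_ligands
    serial = n_tasks * minutes_per_docking
    # Jobs run in "waves" of at most max_parallel_jobs at a time.
    waves = math.ceil(n_tasks / max_parallel_jobs)
    parallel = waves * minutes_per_docking
    return serial, parallel, serial / parallel


# Hypothetical screen: 50 peptides x 4 plastics, 30 min per docking, 100 concurrent jobs.
serial, parallel, speedup = ideal_wall_time(50, 4, 30, 100)
print(f"serial: {serial} min, parallel: {parallel} min, speedup: {speedup:.0f}x")
```

The full 100-fold gain only appears when the number of receptor-ligand pairs is a multiple of the job limit; a screen with 101 pairs would need two waves and gain roughly 50-fold.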

The script automatically prepares all the ligands (SDF files) and receptors (PDB files) in their respective directories using Meeko, analogous to the preparation outlined in the official AutoDock Vina tutorials. Next, it prepares and submits an array of jobs to the HPC cluster and, after the last job is done, creates a summary of the docking results.
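
The "all combinations" step can be pictured with a short Python sketch. This is not Sito's actual code: the function name build_task_list is hypothetical, and the sketch assumes receptors end in .pdb and ligands in .sdf, as described above.

```python
from itertools import product
from pathlib import Path


def build_task_list(receptor_dir, ligand_dir):
    """Return one (receptor, ligand) path pair per docking job."""
    receptors = sorted(Path(receptor_dir).glob("*.pdb"))
    ligands = sorted(Path(ligand_dir).glob("*.sdf"))
    # Cartesian product: every receptor is paired with every ligand.
    return list(product(receptors, ligands))


if __name__ == "__main__":
    # Number the pairs from 1, the way a job array index would.
    for index, (receptor, ligand) in enumerate(build_task_list("receptors", "ligands"), start=1):
        print(index, receptor.name, ligand.name)
```

With 3 receptors and 2 ligands this yields 6 tasks, one per docking job.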

In our case, the ligands were plastic representations generated from SMILES strings and converted to SDF using Open Babel. We have provided the Python scripts we used to generate them.

The receptors we used in our project were structures of our peptides of interest predicted with AlphaFold 3 and converted to PDB with Open Babel, but the script’s potential is not limited to that. With a little tweaking it can be used for any kind of docking screen, with a large speedup thanks to job parallelization.

To run this script, first clone the Sito repository and set up the conda environment with the necessary dependencies.

git clone https://gitlab.igem.org/2025/software-tools/fsu
cd fsu
conda env create -f sito.yml
conda activate sito

Next, make sito.sh executable and add it to your PATH.

cd sito
chmod +x sito.sh
# Ensure ~/.local/bin exists
mkdir -p "$HOME/.local/bin"
# Symlink to PATH
ln -s "$(pwd)/sito.sh" "$HOME/.local/bin/sito"
# Add $HOME/.local/bin to PATH only if it isn't there already
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;
  *) echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc ;;
esac

Verify the script works by running the help command.

sito -h
Usage: sito [OPTIONS]
                                                                                                                                                              
A master script to launch a high-throughput AutoDock Vina docking screen on a SLURM cluster.
It generates a task list of all receptor-ligand pairs and submits them as a SLURM job array.
Options:
  -r, --receptor_dir DIR    Path to the directory containing receptor PDB files. (Default: ./receptors)
  -l, --ligand_dir DIR      Path to the directory containing ligand SDF files. (Default: ./ligands)
  -o, --output_dir DIR      Path to the main output directory for all results. (Default: ./docking_results)
  -p, --padding PADDING     Padding in Angstroms for the docking box. (Default: 20)
  -e, --exhaustiveness EXH  Vina exhaustiveness value. (Default: 32)
  -q, --queue QUEUE         SLURM queue/partition to submit to. (Default: genacc_q)
  -c, --cpus_per_task N     Number of CPUs to request for each docking job. (Default: 1)
  -a, --allow_bad_res       Pass the '--allow_bad_res' flag to Meeko for receptor preparation. (Default: off)
  -h, --help                Display this help message and exit.
Example:
  sito --receptor_dir ./peptides --ligand_dir ./fragments --padding 25 --cpus_per_task 8
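
To make the --padding option more concrete, here is a minimal Python sketch of how a docking box can be derived from a receptor PDB file. It is an illustration, not Sito's implementation: the function name docking_box is hypothetical, and we assume the padding is added once to each dimension of the atoms' bounding box, which may differ from what sito.sh does.

```python
def docking_box(pdb_path, padding=20.0):
    """Return (center, size) of a box enclosing all atoms plus padding, in Angstroms."""
    coords = []
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                # PDB fixed columns (1-based): x = 31-38, y = 39-46, z = 47-54.
                coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"no atoms found in {pdb_path}")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    # Assumption: the padding is added once to each box dimension.
    size = tuple((hi - lo) + padding for lo, hi in zip(mins, maxs))
    return center, size
```

Calling docking_box("receptors/peptide1.pdb", padding=25) would return the box center and edge lengths that a Vina run needs for that peptide.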

We recommend running each docking screen in its own directory to stay organized. Inside this directory, create two directories: one with your receptors (the peptides) in PDB format and one with your ligands in SDF format.

example_directory/
│
├── receptors/
│ ├── peptide1.pdb
│ ├── peptide2.pdb
│ └── peptide3.pdb
│
├── ligands/
│ ├── polystyrene.sdf
│ └── polypropylene.sdf
│
├── README.md
└── Makefile

You may also generate the small plastic chain ligands we used in our docking: get the SMILES strings from the scripts in smiles_ligand_generators, then convert each one to a 3D SDF ligand with Open Babel:

obabel -:"<SMILES STRING>" -O plastic.sdf --gen3d
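
For readers who want to see what such a SMILES string looks like, the sketch below builds short oligomers by repeating a monomer fragment. It is not one of the scripts in smiles_ligand_generators: the repeat units and the function name oligomer_smiles are our illustrative assumptions, and chain ends are simply left hydrogen-capped.

```python
# Hypothetical repeat units written as SMILES fragments; these are illustrative
# and not taken from the repository's generator scripts.
REPEAT_UNITS = {
    "polyethylene": "CC",
    "polypropylene": "CC(C)",
    "polystyrene": "CC(c1ccccc1)",
}


def oligomer_smiles(polymer, n_units):
    """Concatenate n_units repeat units into one linear-chain SMILES string."""
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return REPEAT_UNITS[polymer] * n_units


if __name__ == "__main__":
    for name in REPEAT_UNITS:
        print(name, oligomer_smiles(name, 4))
```

Each printed string can be pasted in place of <SMILES STRING> in the obabel command.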

An example run of Sito from example_directory would look something like this:

sito -r receptors -l ligands -e 32 -q <NAME OF YOUR HPC QUEUE>

The script will then create a new directory called docking_results, prepare all receptors and ligands with Meeko, and submit every receptor-ligand combination as a docking job to the SLURM scheduler.
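
For readers unfamiliar with SLURM job arrays, the following Python sketch renders the kind of batch script such a submission could use. It is a hedged illustration rather than the script Sito generates: the task file tasks.txt, the worker run_docking.sh, the job name, and the function name render_sbatch_script are all hypothetical, while the #SBATCH options and SLURM_ARRAY_TASK_ID are standard SLURM.

```python
def render_sbatch_script(n_tasks, partition, cpus_per_task=1, max_concurrent=100,
                         task_file="tasks.txt", worker="run_docking.sh"):
    """Return the text of a SLURM job-array script with one array task per docking."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=sito_dock",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        # %N caps how many array tasks run at the same time.
        f"#SBATCH --array=1-{n_tasks}%{max_concurrent}",
        "",
        "# Pick the line of the task file that matches this array index.",
        f'TASK=$(sed -n "${{SLURM_ARRAY_TASK_ID}}p" {task_file})',
        "# Left unquoted on purpose so 'receptor ligand' splits into two arguments.",
        f"bash {worker} $TASK",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sbatch_script(n_tasks=12, partition="genacc_q", cpus_per_task=8))
```

The printed text could be saved to a file and passed to sbatch; paths containing spaces would need more careful quoting than this sketch provides.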

After all docking jobs finish successfully, the gatherer generates a summary with the best ∆G from each docking in docking_results/docking_summary.csv. If any jobs failed (see docking_results/failed_jobs.log), this file will not be created, but a summary may still be generated manually:

cat docking_results/**/top_score.result > docking_summary.txt

You may also explore and export individual dockings (.pdbqt) from the folders named docking_results/<receptor_name>_docked_<ligand_name>. Each is simply the folder where AutoDock Vina ran for that pair.
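
If a manual summary is needed, the folder naming scheme makes it easy to rebuild a table in Python. The sketch below is not Sito's gatherer: the function name gather_scores is hypothetical, and because we do not assume anything about the layout of top_score.result, its content is copied into the CSV verbatim.

```python
import csv
from pathlib import Path


def gather_scores(results_dir, out_csv):
    """Collect every top_score.result under results_dir into one CSV; return the row count."""
    rows = []
    for score_file in sorted(Path(results_dir).glob("*/top_score.result")):
        # Folder names follow <receptor_name>_docked_<ligand_name>.
        receptor, sep, ligand = score_file.parent.name.partition("_docked_")
        if not sep:
            continue  # not a docking folder
        # Assumption: the file content is treated as opaque text, not parsed.
        rows.append((receptor, ligand, score_file.read_text().strip()))
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["receptor", "ligand", "top_score_raw"])
        writer.writerows(rows)
    return len(rows)
```

For example, gather_scores("docking_results", "manual_summary.csv") writes one row per finished docking and skips folders where no top_score.result exists.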

To make Sito work for any docking screen, tweak the source code: remove the automatic Vina box generation, then create and distribute Vina config files containing the correct docking box for each of your receptors.
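
As a starting point for such config files, the Python sketch below writes the box and exhaustiveness settings in the key = value form that AutoDock Vina reads with its --config option. The function name vina_config and the example coordinates are hypothetical; the box values must come from your own knowledge of each receptor's binding site.

```python
def vina_config(center, size, exhaustiveness=32):
    """Return the text of a Vina config file for one docking box (values in Angstroms)."""
    cx, cy, cz = center
    sx, sy, sz = size
    return (
        f"center_x = {cx:.3f}\n"
        f"center_y = {cy:.3f}\n"
        f"center_z = {cz:.3f}\n"
        f"size_x = {sx:.3f}\n"
        f"size_y = {sy:.3f}\n"
        f"size_z = {sz:.3f}\n"
        f"exhaustiveness = {exhaustiveness}\n"
    )


if __name__ == "__main__":
    # Hypothetical binding-site box for one receptor.
    with open("receptor1_box.txt", "w") as handle:
        handle.write(vina_config(center=(12.5, -3.0, 7.25), size=(20, 20, 20)))
```

One config file per receptor keeps the box definition next to the structure it belongs to, so each docking job only needs to be pointed at the matching file.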