In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
from dotenv import load_dotenv

load_dotenv()
Out[2]:
True

Docking Workflow¶

This notebook demonstrates how to perform molecular docking using Deep Origin's drug discovery platform. You'll learn how to:

  1. Load and prepare proteins - Load a protein structure and prepare it for docking
  2. Find binding pockets - Identify potential binding sites on the protein
  3. Dock ligands - Perform docking calculations for single or multiple ligands
  4. Monitor jobs - Track the progress of docking calculations
  5. Analyze results - Visualize and filter docking poses

Let's get started!

Setup¶

First, we'll import the necessary Deep Origin drug discovery modules.

In [3]:
from deeporigin.drug_discovery import (
    Complex,
    DATA_DIR,
    Protein,
    LigandSet,
)

Load Protein Structure¶

Here we load a protein structure from a PDB file. The Complex object represents a protein-ligand complex and will be used throughout the docking workflow.

In [4]:
protein = Protein.from_file(DATA_DIR / "brd" / "brd.pdb")
sim = Complex(protein=protein)
sim
Out[4]:
Complex(protein=brd with 0 ligands)

Load Ligands¶

Load a set of ligands from a CSV file containing SMILES strings. The LigandSet object allows you to work with multiple ligands at once. You can visualize them in a grid to see what molecules you're working with.

In [5]:
ligands = LigandSet.from_csv(DATA_DIR / "ligands" / "smiles_to_dock.csv")
ligands
Out[5]:

LigandSet with 16 ligands

16 unique SMILES NOT PROTONATED 2D

Properties: initial_smiles

Use .to_dataframe() to convert to a dataframe, .show_df() to view dataframe with structures, or .show() for 3D visualization, .prepare() to prepare ligands for docking
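
As a rough illustration of the kind of input involved, the sketch below parses a small in-memory CSV with the standard library and counts the unique SMILES strings, mirroring the "unique SMILES" count reported above. The column name `smiles` and the three molecules are assumptions made for this example; the actual layout of smiles_to_dock.csv is not shown in this notebook.

```python
import csv
import io

# Hypothetical CSV contents; the real file's column names may differ.
CSV_TEXT = """smiles
CCO
c1ccccc1
CCO
"""


def read_smiles(text, column="smiles"):
    """Return the non-empty SMILES strings found in the given CSV column."""
    reader = csv.DictReader(io.StringIO(text))
    return [row[column].strip() for row in reader if row[column].strip()]


smiles = read_smiles(CSV_TEXT)
unique_smiles = sorted(set(smiles))
print(f"{len(smiles)} rows, {len(unique_smiles)} unique SMILES")
```

Duplicate rows are kept in `smiles` and collapsed only in `unique_smiles`, so the two counts can differ.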

In [6]:
ligands.show_grid()
Out[6]:
[Image: grid view of the 16 ligands]

Protonate Ligands¶

We recommend protonating ligands before running docking. As the output shows, protonate() uses pH 7.4 here.

In [7]:
ligands.protonate()
Protonating ligands:   0%|          | 0/16 [00:00<?, ?ligand/s]
Protonating ligands: 100%|██████████| 16/16 [00:00<00:00, 548.26ligand/s]

Out[7]:

LigandSet with 16 ligands

16 unique SMILES PROTONATED (pH=7.4) 2D

Properties: initial_smiles

Use .to_dataframe() to convert to a dataframe, .show_df() to view dataframe with structures, or .show() for 3D visualization, .prepare() to prepare ligands for docking

Assign Ligands to Complex¶

Associate the ligands with the protein complex. This prepares the system for docking calculations.

In [8]:
sim.ligands = ligands
sim
Out[8]:
Complex(protein=brd with 16 ligands)

Visualize the Protein¶

Display the protein structure in 3D. This helps you understand the protein's structure before proceeding with docking.

In [9]:
sim.protein.show()

Prepare the Protein¶

Before docking, we need to prepare the protein structure. Water molecules are typically removed from crystal structures as they can interfere with docking calculations.

In [10]:
sim.protein.remove_water()

sim.protein.show()

Find Pockets¶

The find_pockets() method of Protein uses computational methods to detect cavities and potential binding sites on the protein surface.

In [11]:
pockets = sim.protein.find_pockets(pocket_count=1)
sim.protein.show(pockets=pockets)

Inspect Binding Pockets¶

View the detected binding pockets. Each pocket represents a potential binding site. You'll typically want to dock ligands into the most promising pocket (often the largest or most druggable one).

In [12]:
pockets
Out[12]:
[Pocket:
 ╭─────────────────────────┬──────────────╮
 │ Name                    │ pocket_1     │
 ├─────────────────────────┼──────────────┤
 │ Color                   │ red          │
 ├─────────────────────────┼──────────────┤
 │ Volume                  │ 382.0 ų     │
 ├─────────────────────────┼──────────────┤
 │ Total SASA              │ 1383.8657 Ų │
 ├─────────────────────────┼──────────────┤
 │ Polar SASA              │ 372.27518 Ų │
 ├─────────────────────────┼──────────────┤
 │ Polar/Apolar SASA ratio │ 0.36800975   │
 ├─────────────────────────┼──────────────┤
 │ Hydrophobicity          │ 30.518518    │
 ├─────────────────────────┼──────────────┤
 │ Polarity                │ 11.0         │
 ├─────────────────────────┼──────────────┤
 │ Drugability score       │ 0.94493204   │
 ╰─────────────────────────┴──────────────╯]
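
The numbers in the table can be cross-checked by hand. The sketch below recomputes the Polar/Apolar SASA ratio from the two SASA values, under the assumption that apolar SASA is simply total SASA minus polar SASA. The table does not state this definition, but the result agrees with the reported ratio to about six decimal places.

```python
# Values copied from the pocket_1 table above (areas in square angstroms).
total_sasa = 1383.8657
polar_sasa = 372.27518
reported_ratio = 0.36800975

# Assumption: apolar SASA is the part of the total that is not polar.
apolar_sasa = total_sasa - polar_sasa
ratio = polar_sasa / apolar_sasa

print(f"apolar SASA: {apolar_sasa:.4f}")
print(f"recomputed polar/apolar ratio: {ratio:.6f}")
print(f"difference from reported value: {abs(ratio - reported_ratio):.2e}")
```

A ratio well below 1 means most of the pocket's accessible surface is apolar.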

Bulk Docking Workflow¶

For drug discovery, you'll often want to dock many ligands at once. The bulk docking workflow allows you to:

  1. Submit multiple docking jobs - Dock all ligands in your ligand set
  2. Monitor progress - Track job status in real-time
  3. Retrieve results - Download all poses once calculations complete
  4. Analyze at scale - Compare binding across all ligands

Calling run() with quote=True requests a cost estimate instead of starting the jobs; nothing runs until you call confirm(). You can specify:

  • pocket: Which binding pocket to use
  • batch_size: How many ligands to process per batch
In [13]:
jobs = sim.docking.run(
    pocket=pockets[0],
    quote=True,
    batch_size=8,
)
jobs
Starting docking jobs:   0%|          | 0/2 [00:00<?, ?it/s]
Starting docking jobs: 100%|██████████| 2/2 [00:00<00:00, 78.70it/s]

Out[13]:
Job Details

Docking brd.pdb to 16 ligands. (2 jobs)

Jobs Quoted

All 2 jobs have been quoted. For details look at the Billing tab. To approve and start the runs, call the confirm() method.

executionID                           resourceID            Status  Started At  Running Time
81838b8b-40a1-426a-a0ab-04669b3194ee  fhe2ul01oew5w3zbn79h  Quoted  None        None
bb43d0bd-23ff-46f5-be8e-88406de87436  l1f4ii65odul149ichwz  Quoted  None        None
[]
Quoted
⚠️ This widget will not auto-update. Last updated: 2026-02-12 14:40:32
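
The output above reports 2 jobs for 16 ligands with batch_size=8. A minimal sketch of that arithmetic follows, assuming each batch of ligands becomes one job. How the platform actually splits the ligand set is not documented in this notebook, and `batch_ligands` is a hypothetical helper, not part of the deeporigin API.

```python
import math


def batch_ligands(ligands, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [ligands[i:i + batch_size] for i in range(0, len(ligands), batch_size)]


ligand_ids = [f"ligand_{i}" for i in range(16)]  # placeholder names
batches = batch_ligands(ligand_ids, batch_size=8)

# The number of batches is the ceiling of (ligand count / batch size).
expected_jobs = math.ceil(len(ligand_ids) / 8)
print(len(batches), expected_jobs, [len(b) for b in batches])
```

With a ligand count that is not a multiple of the batch size, the last batch is simply smaller: 17 ligands at batch_size=8 would give three batches under this assumption.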

Review Job Details¶

Before confirming, review the job details including:

  • Number of ligands to dock
  • Estimated cost
  • Expected completion time

Use confirm() to submit the jobs for execution.

In [14]:
jobs.confirm()
jobs
Out[14]:
Job Details

Docking brd.pdb to 16 ligands. (2 jobs)

Average speed: 0.00 dockings/minute

Completed: 0 Failed: 0 Remaining: 16
executionID                           resourceID            Status   Started At  Running Time
81838b8b-40a1-426a-a0ab-04669b3194ee  fhe2ul01oew5w3zbn79h  Running  now         None
bb43d0bd-23ff-46f5-be8e-88406de87436  l1f4ii65odul149ichwz  Running  now         None
[]
Running
⚠️ This widget will not auto-update. Last updated: 2026-02-12 14:40:32

Monitor Job Progress¶

The watch() method monitors your docking jobs and updates you on their progress. It will:

  • Check job status at regular intervals
  • Display progress updates
  • Notify you when jobs complete

You can cancel jobs if needed using jobs.cancel().

In [15]:
jobs.watch()
Job Details

Docking brd.pdb to 16 ligands. (2 jobs)

Average speed: 0.00 dockings/minute

Completed: 16 Failed: 0 Remaining: 0
executionID                           resourceID            Status     Started At      Running Time
81838b8b-40a1-426a-a0ab-04669b3194ee  fhe2ul01oew5w3zbn79h  Succeeded  20 seconds ago  0 minutes
bb43d0bd-23ff-46f5-be8e-88406de87436  l1f4ii65odul149ichwz  Succeeded  20 seconds ago  0 minutes
[ "ligand docked\nligand docked\nligand docked\nligand docked\nligand docked\nligand docked\nligand docked\nligand docked", "ligand docked\nligand docked\nligand docked\nligand docked\nligand docked\nligand docked\nligand docked\nligand docked" ]
Succeeded
⚠️ This widget will not auto-update. Last updated: 2026-02-12 14:40:52
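
Checking job status at regular intervals is a plain polling loop. The sketch below shows the general pattern with a stand-in status function; it is not how watch() is implemented. `poll_until_done`, its parameters, and the "Cancelled" state are assumptions for illustration, while "Running", "Succeeded", and "Failed" are taken from the job output above.

```python
import time

# "Cancelled" is an assumed state name; the others appear in the job output.
TERMINAL_STATES = {"Succeeded", "Failed", "Cancelled"}


def poll_until_done(get_status, interval=5.0, max_checks=100, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal state.

    Returns the final status, or raises TimeoutError after max_checks polls.
    """
    for check in range(1, max_checks + 1):
        status = get_status()
        print(f"check {check}: {status}")
        if status in TERMINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError(f"job still not finished after {max_checks} checks")


# Stand-in for a real job: reports "Running" twice, then "Succeeded".
fake_statuses = iter(["Running", "Running", "Succeeded"])
final = poll_until_done(lambda: next(fake_statuses), interval=0.0)
```

Capping the number of checks keeps the loop from running forever if a job never reaches a terminal state.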

Retrieve Docking Results¶

Once jobs complete, retrieve all poses using get_poses(). This downloads all calculated poses for all ligands in your set.

In [16]:
poses = sim.docking.get_poses()
poses
Downloading files:   0%|          | 0/16 [00:00<?, ?file/s]
Downloading files:  38%|███▊      | 6/16 [00:00<00:00, 56.57file/s]
Downloading files: 100%|██████████| 16/16 [00:00<00:00, 116.42file/s]

Out[16]:

LigandSet with 256 ligands

16 unique SMILES NOT PROTONATED 3D

Properties: Binding Energy, POSE SCORE, SCORE, SMILES, initial_smiles

Use .to_dataframe() to convert to a dataframe, .show_df() to view dataframe with structures, or .show() for 3D visualization, .prepare() to prepare ligands for docking

Convert to DataFrame for Analysis¶

Convert poses to a DataFrame for detailed analysis. This enables:

  • Statistical analysis of binding energies
  • Comparison across ligands
  • Filtering and sorting
  • Export to CSV or other formats
In [17]:
df = poses.to_dataframe()
df
Out[17]:
     SCORE       POSE SCORE  Binding Energy  SMILES
0    -12.340761  0.704003    -8.416734       Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...
1    -10.475386  0.622063    -8.215181       Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...
2    -12.177119  0.619683    -8.784129       Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...
3    -12.612789  0.614134    -9.716379       Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...
4    -11.153189  0.598104    -8.464557       Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...
...  ...         ...         ...             ...
251  -10.172739  0.558711    -7.040850       C=CC(=O)N1CCN(c2nnc(-c3ccccc3OC)c3cc(-c4c(O)cc...
252  -7.821764   0.542668    -6.298942       C=CC(=O)N1CCN(c2nnc(-c3ccccc3OC)c3cc(-c4c(O)cc...
253  -6.126631   0.542479    -7.659334       C=CC(=O)N1CCN(c2nnc(-c3ccccc3OC)c3cc(-c4c(O)cc...
254  -11.947805  0.518609    -7.863280       C=CC(=O)N1CCN(c2nnc(-c3ccccc3OC)c3cc(-c4c(O)cc...
255  -5.179104   0.508119    -5.374318       C=CC(=O)N1CCN(c2nnc(-c3ccccc3OC)c3cc(-c4c(O)cc...

256 rows × 4 columns
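
As a small example of the kind of per-ligand statistics this enables, the sketch below summarizes the binding energies of the ten rows visible above using only the standard library. Two assumptions apply: rows 0 to 4 and rows 251 to 255 are each treated as one ligand because their truncated SMILES prefixes match (labeled A and B here), and a more negative binding energy is taken to mean stronger predicted binding.

```python
from statistics import mean

# (label, SCORE, POSE SCORE, Binding Energy) for the ten visible rows.
# Labels A and B stand in for the two truncated SMILES strings.
rows = [
    ("A", -12.340761, 0.704003, -8.416734),
    ("A", -10.475386, 0.622063, -8.215181),
    ("A", -12.177119, 0.619683, -8.784129),
    ("A", -12.612789, 0.614134, -9.716379),
    ("A", -11.153189, 0.598104, -8.464557),
    ("B", -10.172739, 0.558711, -7.040850),
    ("B", -7.821764, 0.542668, -6.298942),
    ("B", -6.126631, 0.542479, -7.659334),
    ("B", -11.947805, 0.518609, -7.863280),
    ("B", -5.179104, 0.508119, -5.374318),
]

# Group binding energies by ligand label.
energies = {}
for label, _score, _pose_score, energy in rows:
    energies.setdefault(label, []).append(energy)

# Summarize each ligand: pose count, mean energy, most negative energy.
summary = {
    label: {"n": len(values), "mean": mean(values), "best": min(values)}
    for label, values in energies.items()
}
for label, stats in summary.items():
    print(label, stats)
```

On a pandas DataFrame the same summary would normally be written as a group-by over the SMILES column; the plain-Python version is used here only to keep the example dependency-free.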

Visualize Statistics of All Poses¶

Create a scatter plot of all poses from all docked ligands, showing binding energy versus pose score. Hover over a point to see details about the ligand and pose.

In [18]:
poses.plot()

Visualize Statistics of Best Poses¶

Keep only the top pose for each ligand with filter_top_poses(), then plot the same statistics for that subset. With one point per ligand, it is easier to compare candidates and identify promising ones for further study.

In [19]:
top_poses = poses.filter_top_poses()
top_poses.plot()
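
The notebook does not say which criterion filter_top_poses() uses to rank poses. The sketch below shows one way to keep a single pose per ligand, and why the criterion matters: for ligand A, the pose with the highest pose score is not the one with the lowest binding energy. `top_pose_per_ligand` and the dictionary keys are hypothetical names, the four rows are taken from the table above, and the labels A and B stand in for the truncated SMILES.

```python
# Four rows from the pose table (rows 0, 3, 251 and 254).
pose_rows = [
    {"ligand": "A", "pose_score": 0.704003, "binding_energy": -8.416734},
    {"ligand": "A", "pose_score": 0.614134, "binding_energy": -9.716379},
    {"ligand": "B", "pose_score": 0.558711, "binding_energy": -7.040850},
    {"ligand": "B", "pose_score": 0.518609, "binding_energy": -7.863280},
]


def top_pose_per_ligand(rows, key="pose_score", highest=True):
    """Keep one pose per ligand: the one with the best value of `key`."""
    best = {}
    for pose in rows:
        current = best.get(pose["ligand"])
        if current is None:
            better = True
        elif highest:
            better = pose[key] > current[key]
        else:
            better = pose[key] < current[key]
        if better:
            best[pose["ligand"]] = pose
    return list(best.values())


# Rank by highest pose score, then by lowest (most negative) binding energy.
top_by_score = top_pose_per_ligand(pose_rows)
top_by_energy = top_pose_per_ligand(pose_rows, key="binding_energy", highest=False)

print([p["pose_score"] for p in top_by_score])
print([p["binding_energy"] for p in top_by_energy])
```

Both calls return one pose per ligand, but they select different poses, so it is worth checking which ranking a filter applies before comparing ligands.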

Show Best Poses¶

Visualize the best pose of each ligand, selected in the previous step, within the protein structure. This helps identify the most promising binding modes across your ligand set.

In [20]:
sim.protein.show(poses=top_poses)