In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
from dotenv import load_dotenv

load_dotenv()
Out[2]:
True

Docking Workflow

This notebook demonstrates how to perform molecular docking using Deep Origin's drug discovery platform. You'll learn how to:

  1. Load and prepare proteins - Load a protein structure and prepare it for docking
  2. Find binding pockets - Identify potential binding sites on the protein
  3. Dock ligands - Perform docking calculations for single or multiple ligands
  4. Monitor jobs - Track the progress of docking calculations
  5. Analyze results - Visualize and filter docking poses

Let's get started!

Setup

First, we'll import the necessary Deep Origin drug discovery modules.

In [3]:
from deeporigin.drug_discovery import (
    Complex,
    DATA_DIR,
    Protein,
    LigandSet,
    Ligand,
)
import deeporigin

deeporigin.__version__
Out[3]:
'0.0.0.dev0'
In [4]:
ligand = Ligand.from_smiles(
    "Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C3)N4)nc(OCC34CCCN3CCC4)nc12"
)

props = ligand.admet_properties(use_cache=False)

Load Protein Structure

Here we load a protein structure from a PDB file and wrap it in a Complex. The Complex object represents a protein-ligand complex and is used throughout the docking workflow.

In [5]:
protein = Protein.from_file(DATA_DIR / "brd" / "brd.pdb")
sim = Complex(protein=protein)
sim
Out[5]:
Complex(protein=brd with 0 ligands)

Load Ligands

Load a set of ligands from a CSV file containing SMILES strings. The LigandSet object allows you to work with multiple ligands at once. You can visualize them in a grid to see what molecules you're working with.

In [6]:
ligands = LigandSet.from_csv(DATA_DIR / "ligands" / "smiles_to_dock.csv")
ligands
Out[6]:

LigandSet with 16 ligands

16 unique SMILES

Properties: initial_smiles

Use .to_dataframe() to convert to a dataframe, .show_df() to view dataframe with structures, or .show() for 3D visualization
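
For orientation, the sketch below shows the kind of input such a loader consumes: a CSV with one SMILES string per row. The file contents and the column name SMILES are hypothetical (the real smiles_to_dock.csv and its header may differ), and only the standard library is used rather than LigandSet itself.

```python
import csv
import io

# Hypothetical file contents; the real smiles_to_dock.csv may use a different header.
csv_text = """SMILES
CCO
c1ccccc1
CCO
CC(=O)O
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
smiles = [row["SMILES"] for row in rows]

# De-duplicate by exact string match while keeping first-seen order.
unique_smiles = list(dict.fromkeys(smiles))

print(f"{len(smiles)} rows, {len(unique_smiles)} unique SMILES")
```

Exact string comparison treats two different spellings of the same molecule as distinct, so a count obtained this way is only an upper bound on the number of unique structures.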

In [7]:
ligands.show_grid()
Out[7]:
[Image: grid view of the 16 ligands]

Assign Ligands to Complex

Associate the ligands with the protein complex. This prepares the system for docking calculations.

In [8]:
sim.ligands = ligands
sim
Out[8]:
Complex(protein=brd with 16 ligands)

Visualize the Protein

Display the protein structure in 3D. This helps you understand the protein's structure before proceeding with docking.

In [9]:
sim.protein.show()

Prepare the Protein

Before docking, we need to prepare the protein structure. Water molecules are typically removed from crystal structures as they can interfere with docking calculations.

In [10]:
sim.protein.remove_water()

sim.protein.show()
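
To make the idea of water removal concrete, here is a minimal sketch that strips waters from raw PDB text. It is not how remove_water() is implemented; it only assumes the standard fixed-column PDB layout, in which the residue name occupies columns 18 to 20, and it treats HOH and WAT as water residue names. The function name strip_waters and the sample records are made up for illustration.

```python
WATER_RESIDUES = {"HOH", "WAT"}  # common water residue names (assumption)


def strip_waters(pdb_text: str) -> str:
    """Return pdb_text without ATOM/HETATM records that belong to water."""
    kept = []
    for line in pdb_text.splitlines():
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        residue_name = line[17:20]  # PDB columns 18-20 (1-indexed)
        if is_atom_record and residue_name in WATER_RESIDUES:
            continue  # drop water atoms
        kept.append(line)
    return "\n".join(kept)


# Made-up records: one protein atom and one crystallographic water.
sample = "\n".join(
    [
        "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
        "HETATM    2  O   HOH A 101      15.000  10.000   5.000  1.00 30.00           O",
        "END",
    ]
)

cleaned = strip_waters(sample)
print(cleaned)
```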

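Find Pockets

The find_pockets() method of Protein detects cavities on the protein surface that could serve as binding sites. Here, pocket_count=1 limits the result to a single pocket, which is then passed to show() for display alongside the protein.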

In [11]:
pockets = sim.protein.find_pockets(pocket_count=1)
sim.protein.show(pockets=pockets)

Inspect Binding Pockets

Inspect the detected pockets. Each entry summarizes one potential binding site: its volume, solvent-accessible surface area (SASA), hydrophobicity, polarity, and a druggability score. You'll typically want to dock ligands into the most promising pocket (often the largest or most druggable one).

In [12]:
pockets
Out[12]:
[Pocket:
 ╭─────────────────────────┬──────────────╮
 │ Name                    │ pocket_1     │
 ├─────────────────────────┼──────────────┤
 │ Color                   │ red          │
 ├─────────────────────────┼──────────────┤
 │ Volume                  │ 382.0 ų     │
 ├─────────────────────────┼──────────────┤
 │ Total SASA              │ 1383.8657 Ų │
 ├─────────────────────────┼──────────────┤
 │ Polar SASA              │ 372.27518 Ų │
 ├─────────────────────────┼──────────────┤
 │ Polar/Apolar SASA ratio │ 0.36800975   │
 ├─────────────────────────┼──────────────┤
 │ Hydrophobicity          │ 30.518518    │
 ├─────────────────────────┼──────────────┤
 │ Polarity                │ 11.0         │
 ├─────────────────────────┼──────────────┤
 │ Drugability score       │ 0.94493204   │
 ╰─────────────────────────┴──────────────╯]
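
When more than one pocket is requested, the "most promising" choice can be made explicit. The sketch below ranks candidates by druggability score and breaks ties by volume. PocketSummary is a hypothetical stand-in, not the library's Pocket class, whose attribute names are not shown here. The pocket_1 values come from the table above; pocket_2 is made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PocketSummary:
    """Hypothetical container for two of the properties printed above."""

    name: str
    volume: float        # cubic angstroms
    druggability: float  # higher is better


candidates = [
    PocketSummary("pocket_1", volume=382.0, druggability=0.94493204),  # from the table above
    PocketSummary("pocket_2", volume=150.0, druggability=0.40),        # made-up values
]

# Prefer the highest druggability score; use volume as a tie-breaker.
best = max(candidates, key=lambda p: (p.druggability, p.volume))
print(best.name)
```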

Single Ligand Docking Example

Let's start with a simple example: docking a single ligand into a pocket. This demonstrates the basic docking workflow:

  1. Dock the ligand - Calculate possible binding poses
  2. View the poses - Visualize the docked conformations
  3. Analyze results - Examine binding energies and scores
  4. Filter top poses - Select the best binding pose

The dock() method of Protein returns a LigandSet object containing all calculated binding poses.

In [13]:
poses = sim.protein.dock(
    pocket=pockets[0],
    ligand=sim.ligands[0],
)

View Docking Poses

Visualize all the calculated poses for the ligand. Each pose represents a different binding conformation with its own binding energy and score.

In [14]:
sim.protein.show(poses=poses)

Analyze Docking Results

Convert the poses to a pandas DataFrame for detailed analysis. This allows you to:

  • Compare binding energies across poses
  • Examine pose scores
  • Filter and sort poses based on various criteria
In [15]:
poses.to_dataframe()
Out[15]:
    SMILES                                             SCORE       Binding Energy  POSE SCORE
0   Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -12.340761  -8.416734       0.7040031
1   Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -10.475386  -8.215181       0.6220628
2   Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -12.177119  -8.784129       0.61968327
3   Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -12.612789  -9.716379       0.6141343
4   Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -11.153189  -8.464557       0.59810394
5   Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -10.680066  -7.884676       0.59509414
6   Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -9.727909   -8.051216       0.58927166
7   Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -11.234045  -7.577743       0.57984835
8   Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -11.777233  -8.14088        0.5768367
9   Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -9.5387745  -6.8403416      0.5655857
10  Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -10.329558  -7.387606       0.56298816
11  Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -8.893742   -6.2199383      0.5438366
12  Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -8.091473   -5.6996293      0.51211494
13  Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -10.145433  -7.51261        0.49029335
14  Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -8.0155945  -6.0591693      0.48096564
15  Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C...  -11.608778  -8.121737       0.42745712
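
The rows above appear in descending POSE SCORE order, so the pose with the lowest binding energy is not necessarily the first row. The sketch below illustrates the comparison with pandas, using the first five rows copied from the table (SMILES column omitted for brevity). In the notebook itself you would start from poses.to_dataframe() instead of building the frame by hand; the variable names are illustrative only.

```python
import pandas as pd

# First five rows copied from the table above.
df = pd.DataFrame(
    {
        "SCORE": [-12.340761, -10.475386, -12.177119, -12.612789, -11.153189],
        "Binding Energy": [-8.416734, -8.215181, -8.784129, -9.716379, -8.464557],
        "POSE SCORE": [0.7040031, 0.6220628, 0.61968327, 0.6141343, 0.59810394],
    }
)

# Row label of the pose with the highest pose score.
best_by_pose_score = df["POSE SCORE"].idxmax()

# Row label of the pose with the lowest (most negative) binding energy.
best_by_energy = df["Binding Energy"].idxmin()

# Re-rank the poses by binding energy, most negative first.
ranked = df.sort_values("Binding Energy")

print(best_by_pose_score, best_by_energy)
print(ranked)
```

Here the two criteria disagree: row 0 has the highest pose score, while row 3 has the lowest binding energy. Which one to trust depends on how each score is defined, which this notebook does not specify.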
In [16]:
poses
Out[16]:

LigandSet with 16 poses

SMILES: Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C3)N4)nc(OCC34CCCN3CCC4)nc12

Properties: Binding Energy, POSE SCORE, SCORE, SMILES, _Name, _SMILES, initial_smiles

Use .to_dataframe() to convert to a dataframe, .show_df() to view dataframe with structures, or .show() for 3D visualization

Filter Best Poses

Select the top pose (best binding conformation) for the ligand. The filter_top_poses() method selects poses based on binding energy and score criteria.

In [17]:
top_pose = poses.filter_top_poses()
sim.protein.show(poses=top_pose)