%load_ext autoreload
%autoreload 2
from dotenv import load_dotenv
load_dotenv()
True
Docking Workflow
This notebook demonstrates how to perform molecular docking using Deep Origin's drug discovery platform. You'll learn how to:
- Load and prepare proteins: load a protein structure and prepare it for docking
- Find binding pockets: identify potential binding sites on the protein
- Dock ligands: perform docking calculations for single or multiple ligands
- Monitor jobs: track the progress of docking calculations
- Analyze results: visualize and filter docking poses
Let's get started!
Setup
First, we'll import the classes and the DATA_DIR path we need from deeporigin.drug_discovery.
from deeporigin.drug_discovery import (
Complex,
DATA_DIR,
Protein,
LigandSet,
Ligand,
)
import deeporigin
deeporigin.__version__
'0.0.0.dev0'
ligand = Ligand.from_smiles(
"Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C3)N4)nc(OCC34CCCN3CCC4)nc12"
)
props = ligand.admet_properties(use_cache=False)
Load Protein Structure
Here we load a protein structure from a PDB file. The Complex object represents a protein-ligand complex and will be used throughout the docking workflow.
protein = Protein.from_file(DATA_DIR / "brd" / "brd.pdb")
sim = Complex(protein=protein)
sim
Complex(protein=brd with 0 ligands)
Load Ligands
Load a set of ligands from a CSV file containing SMILES strings. The LigandSet object allows you to work with multiple ligands at once. You can visualize them in a grid to see what molecules you're working with.
ligands = LigandSet.from_csv(DATA_DIR / "ligands" / "smiles_to_dock.csv")
ligands
LigandSet with 16 ligands
16 unique SMILES
Properties: initial_smiles
Use .to_dataframe() to convert to a dataframe, .show_df() to view dataframewith structures, or .show() for 3D visualization
ligands.show_grid()
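The summary above reports 16 ligands and 16 unique SMILES. As a rough illustration of what such a count involves, the sketch below parses a small CSV with the standard library and counts distinct SMILES strings. The column name `SMILES` and the four example molecules are assumptions made for this example; they are not the contents of smiles_to_dock.csv, and this is not how LigandSet.from_csv is implemented.

```python
import csv
import io

# Hypothetical CSV content; the real file's column names may differ.
csv_text = """SMILES
CCO
c1ccccc1
CC(=O)O
OCC
"""

# Parse the CSV and collect the SMILES column.
rows = list(csv.DictReader(io.StringIO(csv_text)))
smiles = [row["SMILES"] for row in rows]

# Count distinct strings (plain text comparison, no chemistry involved).
unique_smiles = set(smiles)

print(f"{len(smiles)} ligands, {len(unique_smiles)} unique SMILES strings")
```

Note that `CCO` and `OCC` both describe ethanol, yet a plain string comparison counts them as different. A cheminformatics toolkit would normally canonicalize SMILES before checking for duplicates.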
Assign Ligands to Complex
Associate the ligands with the protein complex. This prepares the system for docking calculations.
sim.ligands = ligands
sim
Complex(protein=brd with 16 ligands)
Visualize the Protein
Display the protein structure in 3D. This helps you understand the protein's structure before proceeding with docking.
sim.protein.show()
Prepare the Protein
Before docking, we need to prepare the protein structure. Water molecules are typically removed from crystal structures as they can interfere with docking calculations.
sim.protein.remove_water()
sim.protein.show()
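To make the idea of water removal concrete, here is a minimal sketch that filters water records out of PDB-formatted text using the fixed-column layout of the PDB format (residue name in columns 18-20, with `HOH` denoting water). The three records and their coordinates are made up for illustration, and the helper `is_water` is hypothetical; this is not how remove_water() is implemented.

```python
# Toy PDB records (coordinates are made up for illustration).
pdb_lines = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "HETATM 1000  O   HOH A 201      10.000  10.000  10.000  1.00 20.00           O",
]


def is_water(line: str) -> bool:
    """Return True for ATOM/HETATM records whose residue name is HOH."""
    record = line[0:6].strip()          # PDB columns 1-6: record type
    residue_name = line[17:20].strip()  # PDB columns 18-20: residue name
    return record in {"ATOM", "HETATM"} and residue_name == "HOH"


# Keep every record that is not a water atom.
kept = [line for line in pdb_lines if not is_water(line)]
print(f"kept {len(kept)} of {len(pdb_lines)} records")
```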
Find Pockets
The find_pockets() method of Protein detects cavities and potential binding sites on the protein surface. Here, pocket_count=1 requests a single pocket.
pockets = sim.protein.find_pockets(pocket_count=1)
sim.protein.show(pockets=pockets)
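Inspect Binding Pockets
View the detected binding pockets. Each pocket represents a potential binding site, reported with its volume, solvent-accessible surface area (SASA), hydrophobicity, polarity, and druggability score. When several pockets are detected, you'll typically want to dock ligands into the most promising one (often the largest or most druggable). Here the list holds a single pocket.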
pockets
[Pocket:
╭─────────────────────────┬──────────────╮
│ Name                    │ pocket_1     │
├─────────────────────────┼──────────────┤
│ Color                   │ red          │
├─────────────────────────┼──────────────┤
│ Volume                  │ 382.0 ų     │
├─────────────────────────┼──────────────┤
│ Total SASA              │ 1383.8657 Ų │
├─────────────────────────┼──────────────┤
│ Polar SASA              │ 372.27518 Ų │
├─────────────────────────┼──────────────┤
│ Polar/Apolar SASA ratio │ 0.36800975   │
├─────────────────────────┼──────────────┤
│ Hydrophobicity          │ 30.518518    │
├─────────────────────────┼──────────────┤
│ Polarity                │ 11.0         │
├─────────────────────────┼──────────────┤
│ Drugability score       │ 0.94493204   │
╰─────────────────────────┴──────────────╯]
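The SASA rows in the pocket summary are related by simple arithmetic. The sketch below recomputes the polar/apolar ratio from the reported total and polar SASA, assuming that apolar SASA is the total minus the polar part. That assumption is not stated in the output, but the result comes out at about 0.368, in line with the reported ratio.

```python
total_sasa = 1383.8657  # Å^2, "Total SASA" from the pocket summary
polar_sasa = 372.27518  # Å^2, "Polar SASA" from the pocket summary

# Assumption: whatever surface is not polar is counted as apolar.
apolar_sasa = total_sasa - polar_sasa
ratio = polar_sasa / apolar_sasa

print(f"apolar SASA = {apolar_sasa:.4f} Å^2, polar/apolar ratio = {ratio:.5f}")
```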
Single Ligand Docking Example
Let's start with a simple example: docking a single ligand into a pocket. This demonstrates the basic docking workflow:
- Dock the ligand: calculate possible binding poses
- View the poses: visualize the docked conformations
- Analyze results: examine binding energies and scores
- Filter top poses: select the best binding pose
The dock() method of Protein returns a LigandSet object containing all calculated binding poses.
poses = sim.protein.dock(
pocket=pockets[0],
ligand=sim.ligands[0],
)
View Docking Poses
Visualize all the calculated poses for the ligand. Each pose represents a different binding conformation with its own binding energy and score.
sim.protein.show(poses=poses)
Analyze Docking Results
Convert the poses to a pandas DataFrame for detailed analysis. This allows you to:
- Compare binding energies across poses
- Examine pose scores
- Filter and sort poses based on various criteria
poses.to_dataframe()
| | SMILES | SCORE | Binding Energy | POSE SCORE |
|---|---|---|---|---|
| 0 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -12.340761 | -8.416734 | 0.7040031 |
| 1 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -10.475386 | -8.215181 | 0.6220628 |
| 2 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -12.177119 | -8.784129 | 0.61968327 |
| 3 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -12.612789 | -9.716379 | 0.6141343 |
| 4 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -11.153189 | -8.464557 | 0.59810394 |
| 5 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -10.680066 | -7.884676 | 0.59509414 |
| 6 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -9.727909 | -8.051216 | 0.58927166 |
| 7 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -11.234045 | -7.577743 | 0.57984835 |
| 8 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -11.777233 | -8.14088 | 0.5768367 |
| 9 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -9.5387745 | -6.8403416 | 0.5655857 |
| 10 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -10.329558 | -7.387606 | 0.56298816 |
| 11 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -8.893742 | -6.2199383 | 0.5438366 |
| 12 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -8.091473 | -5.6996293 | 0.51211494 |
| 13 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -10.145433 | -7.51261 | 0.49029335 |
| 14 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -8.0155945 | -6.0591693 | 0.48096564 |
| 15 | Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C... | -11.608778 | -8.121737 | 0.42745712 |
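Different columns can rank the poses differently. The sketch below copies the first five rows of the table above into plain Python tuples and picks the best pose by two criteria. It assumes that a more negative binding energy and a higher pose score are both better; the notebook does not spell out either convention. With the actual DataFrame, pandas' `sort_values` would serve the same purpose.

```python
# (index, SCORE, Binding Energy, POSE SCORE) for the first five table rows.
poses_table = [
    (0, -12.340761, -8.416734, 0.7040031),
    (1, -10.475386, -8.215181, 0.6220628),
    (2, -12.177119, -8.784129, 0.61968327),
    (3, -12.612789, -9.716379, 0.6141343),
    (4, -11.153189, -8.464557, 0.59810394),
]

# Assumption: higher POSE SCORE is better.
best_by_pose_score = max(poses_table, key=lambda row: row[3])

# Assumption: more negative Binding Energy is better.
best_by_energy = min(poses_table, key=lambda row: row[2])

print(f"best by pose score: pose {best_by_pose_score[0]}")
print(f"best by binding energy: pose {best_by_energy[0]}")
```

Among these rows the two criteria disagree: pose 0 has the highest pose score, while pose 3 has the lowest binding energy. It is worth checking which criterion a filter applies before relying on its "top" pose.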
poses
LigandSet with 16 poses
SMILES: Fc1c(-c2cccc3ccccc23)ncc2c(N3C[C@H]4CC[C@@H](C3)N4)nc(OCC34CCCN3CCC4)nc12
Properties: Binding Energy, POSE SCORE, SCORE, SMILES, _Name, _SMILES, initial_smiles
Use .to_dataframe() to convert to a dataframe, .show_df() to view dataframewith structures, or .show() for 3D visualization
Filter Best Poses
Select the top pose (best binding conformation) for the ligand. The filter_top_poses() method selects poses based on binding energy and score criteria.
top_pose = poses.filter_top_poses()
sim.protein.show(poses=top_pose)