deeporigin.drug_discovery.structures

Classes

Ligand dataclass

A class representing a ligand molecule in drug discovery workflows. The Ligand class provides functionality to create, manipulate, and analyze small molecules (ligands) in computational drug discovery. It supports various input formats and provides methods for property prediction, visualization, and file operations.

Attributes:

- identifier (Optional[str]): Ligand identifier (e.g., PubChem ID)
- file_path (Optional[str]): Path to the ligand file
- smiles (Optional[str]): SMILES string representing the ligand
- block_type (Optional[str]): Format of the block content ('mol', 'mol2', 'sdf', 'pdb')
- block_content (Optional[str]): String containing the molecule data
- name (Optional[str]): Optional name of the ligand
- seed (Optional[int]): Random seed for coordinate generation
- xref_protein (Optional[str]): Cross-reference to protein
- xref_ins_code (Optional[str]): Cross-reference insertion code
- xref_residue_id (Optional[str]): Cross-reference residue ID
- xref_protein_chain_id (Optional[str]): Cross-reference protein chain ID
- save_to_file (bool): Whether to save the ligand to file
- properties (dict): Dictionary of ligand properties
- mol (Optional[Molecule]): Direct Molecule object initialization

Examples:

>>> # Create from SMILES
>>> ligand = Ligand.from_smiles("CCO", name="Ethanol")
>>> # Create from SDF file
>>> ligand = Ligand.from_sdf("ligand.sdf")
>>> # Get properties
>>> center = ligand.get_center()
>>> props = ligand.admet_properties()
>>> # Visualize
>>> ligand.show()
>>> # Save to file
>>> ligand.write_to_file("output.pdb")

Attributes

atom_types property
atom_types
available_for_docking class-attribute instance-attribute
available_for_docking: bool = field(
    init=False, default=True
)
block_content class-attribute instance-attribute
block_content: str | None = None
block_type class-attribute instance-attribute
block_type: str | None = None
coordinates property
coordinates
file_path class-attribute instance-attribute
file_path: str | None = None
hac class-attribute instance-attribute
hac: int = field(init=False, default=0)
identifier class-attribute instance-attribute
identifier: str | None = None
mol class-attribute instance-attribute
mol: Molecule | None = None
name class-attribute instance-attribute
name: str | None = None
properties class-attribute instance-attribute
properties: dict = field(default_factory=dict)
protonated_smiles class-attribute instance-attribute
protonated_smiles: str | None = field(
    init=False, default=None
)
save_to_file class-attribute instance-attribute
save_to_file: bool = False
seed class-attribute instance-attribute
seed: int | None = None
smiles class-attribute instance-attribute
smiles: str | None = None
xref_ins_code class-attribute instance-attribute
xref_ins_code: str | None = None
xref_protein class-attribute instance-attribute
xref_protein: str | None = None
xref_protein_chain_id class-attribute instance-attribute
xref_protein_chain_id: str | None = None
xref_residue_id class-attribute instance-attribute
xref_residue_id: str | None = None

Functions

admet_properties
admet_properties() -> str

Predict ADMET properties for the ligand.

Returns:

Name Type Description
str str

A string containing the predicted ADMET properties.

convert_to_sdf classmethod
convert_to_sdf(block_content: str, block_type: str) -> str

Convert a ligand block content to SDF format.

Parameters:

Name Type Description Default
block_content str

The block content of the ligand.

required
block_type str

The type of the block content.

required

Returns:

Name Type Description
str str

The ligand block content in SDF format.

draw
draw()

Draw the ligand molecule.

Example:

ligand.draw()

from_block_content classmethod
from_block_content(
    block_content: str,
    block_type: str,
    name: str = "",
    save_to_file: bool = False,
    **kwargs: Any
) -> Ligand

Create a Ligand instance from block content.

Parameters:

Name Type Description Default
block_content str

String containing the molecule data

required
block_type str

Format of the block content ('mol', 'mol2', 'sdf', 'pdb')

required
name str

Name of the ligand. Defaults to "".

''
save_to_file bool

Whether to save the ligand to file. Defaults to False.

False
**kwargs Any

Additional arguments to pass to the constructor

{}

Returns:

Name Type Description
Ligand Ligand

A new Ligand instance

from_csv classmethod
from_csv(
    file_path: str, smiles_column: str = "smiles"
) -> list[Ligand]

Create Ligand instances from a CSV file containing SMILES strings and additional properties.

Parameters:

Name Type Description Default
file_path str

The path to the CSV file.

required
smiles_column str

The name of the column containing SMILES strings. Defaults to "smiles".

'smiles'

Returns:

Type Description
list[Ligand]

list[Ligand]: A list of Ligand instances created from the CSV file.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

ValueError

If the CSV does not contain the specified smiles column or if SMILES strings are invalid.
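
The column check and error behaviour described above can be illustrated without the library. The following is a minimal sketch using only the standard `csv` module; `read_smiles_rows` is a hypothetical helper, not part of `deeporigin`, and it only mirrors the documented contract (a missing SMILES column raises `ValueError`, and the remaining columns are kept as properties). SMILES validation is deliberately left out.

```python
import csv
import io


def read_smiles_rows(handle, smiles_column="smiles"):
    """Hypothetical helper: split each CSV row into (smiles, other_columns)."""
    reader = csv.DictReader(handle)
    if smiles_column not in (reader.fieldnames or []):
        raise ValueError(f"CSV has no column named {smiles_column!r}")
    rows = []
    for row in reader:
        smiles = row.pop(smiles_column)
        rows.append((smiles, dict(row)))  # remaining columns act as properties
    return rows


# In-memory CSV so the sketch runs without touching the file system.
sample = io.StringIO("smiles,name\nCCO,Ethanol\nC,Methane\n")
rows = read_smiles_rows(sample)
```

In real use, `Ligand.from_csv(file_path, smiles_column=...)` takes a path on disk rather than an open handle.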

from_identifier classmethod
from_identifier(
    identifier: str,
    name: Optional[str] = None,
    save_to_file: bool = False,
    **kwargs: Any
) -> Ligand

Create a Ligand instance from a chemical identifier.

Parameters:

Name Type Description Default
identifier str

Chemical identifier (e.g., common name, PubChem name, drug name)

required
name Optional[str]

Name of the ligand. If not provided, uses the identifier. Defaults to None.

None
save_to_file bool

Whether to save the ligand to file. Defaults to False.

False
**kwargs Any

Additional arguments to pass to the constructor

{}

Returns:

Name Type Description
Ligand Ligand

A new Ligand instance initialized from the chemical identifier

Example:

Create an ATP molecule:

atp = Ligand.from_identifier("ATP", name="ATP")

Create a serotonin molecule:

serotonin = Ligand.from_identifier(
    identifier="serotonin",
    name="Serotonin",
)

Raises:

Type Description
DeepOriginException

If the identifier cannot be resolved to a valid molecule

from_rdkit_mol classmethod
from_rdkit_mol(
    mol: Mol,
    name: str = "",
    save_to_file: bool = False,
    **kwargs: Any
) -> Ligand

Create a Ligand instance from an RDKit Mol object.

Parameters:

Name Type Description Default
mol Mol

RDKit molecule object to convert to a Ligand

required
name str

Name of the ligand. Defaults to "".

''
save_to_file bool

Whether to save the ligand to file. Defaults to False.

False
**kwargs Any

Additional arguments to pass to the constructor

{}

Returns:

Name Type Description
Ligand Ligand

A new Ligand instance initialized from the RDKit molecule

Example:

from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")
ligand = Ligand.from_rdkit_mol(mol, name="Ethanol")

from_sdf classmethod
from_sdf(
    file_path: str,
    *,
    sanitize: bool = True,
    removeHs: bool = False
) -> Union[List[Ligand], Ligand]

Create Ligand instances from an SDF file.

Parameters:

Name Type Description Default
file_path str

The path to the SDF file.

required

Returns:

Type Description
Union[List[Ligand], Ligand]

Union[List[Ligand], Ligand]: A single Ligand instance, or a list of Ligand instances, created from the SDF file.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

ValueError

If the file cannot be parsed correctly.
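
Because the declared return type is either a single `Ligand` or a list of them, calling code may want to normalize the result before iterating. The sketch below shows one way to do that; `as_ligand_list` is a hypothetical helper, not part of the library, and plain strings stand in for `Ligand` objects so the snippet runs on its own.

```python
def as_ligand_list(result):
    """Hypothetical helper: wrap a single object in a list, pass lists through."""
    if isinstance(result, list):
        return result
    return [result]


# Stand-in values; in real use `result` would come from Ligand.from_sdf(...).
single = as_ligand_list("ligand_a")
many = as_ligand_list(["ligand_a", "ligand_b"])
```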

from_smiles classmethod
from_smiles(
    smiles: str,
    name: str = "",
    save_to_file: bool = False,
    **kwargs: Any
) -> Ligand

Create a Ligand instance from a SMILES string.

Parameters:

Name Type Description Default
smiles str

SMILES string representing the ligand

required
name str

Name of the ligand. Defaults to "".

''
save_to_file bool

Whether to save the ligand to file. Defaults to False.

False
**kwargs Any

Additional arguments to pass to the constructor

{}

Returns:

Name Type Description
Ligand Ligand

A new Ligand instance

Example:

>>> ligand = Ligand.from_smiles(
...     smiles="CCO",  # Ethanol
...     name="Ethanol",
...     save_to_file=False,
... )
>>> print(ligand.smiles)
CCO

get_center
get_center() -> Optional[list[float]]

Get the center of the ligand based on its coordinates.

Returns:
- list: The center coordinates of the ligand.
- None: If coordinates are not available.

Example:

center = ligand.get_center()
print(center)  # Output: [1.23, 4.56, 7.89]
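
The reference does not state how the center is computed. A common choice is the arithmetic mean of the atomic coordinates (the centroid), and the sketch below assumes that definition. `centroid` is a hypothetical stand-in written in plain Python, not the library's implementation.

```python
from typing import List, Optional


def centroid(coords: Optional[List[List[float]]]) -> Optional[List[float]]:
    """Return the mean x, y, z of a list of points, or None if there are none."""
    if not coords:
        return None
    n = len(coords)
    # zip(*coords) regroups the points into one tuple per axis.
    return [sum(axis) / n for axis in zip(*coords)]


center = centroid([[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]])
missing = centroid(None)
```

Returning `None` for missing coordinates mirrors the documented `Optional[list[float]]` return type.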

get_property
get_property(prop_name: str)

Get the value of a property for the ligand molecule.

Parameters: - prop_name (str): Name of the property to retrieve.

Returns: - The value of the property if it exists, otherwise None.

Example:

binding_affinity = ligand.get_property("BindingAffinity")

protonate
protonate(pH: float = 7.4, filter_percentage: float = 1)

Predicts the appropriate protonation state of the ligand at a given pH value.

Parameters:
- pH: pH value of the solvent for concentration calculation. Default is 7.4.
- filter_percentage: Percentage threshold for filtering low concentration states. Default is 1.

Returns: - ProtonationReport: A ProtonationReport instance.
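
To make the roles of `pH` and `filter_percentage` concrete, here is a minimal sketch for a single acidic site using the Henderson-Hasselbalch relation. This is an illustration under stated assumptions, not the method the library uses: `state_percentages` and `filter_states` are hypothetical helpers, and the pKa of about 4.76 for acetic acid is supplied by hand.

```python
def state_percentages(pka: float, ph: float = 7.4) -> dict:
    """Percent of protonated (HA) and deprotonated (A-) forms of one acidic site."""
    ratio = 10 ** (ph - pka)  # [A-] / [HA] from Henderson-Hasselbalch
    protonated = 100.0 / (1.0 + ratio)
    return {"protonated": protonated, "deprotonated": 100.0 - protonated}


def filter_states(percentages: dict, filter_percentage: float = 1.0) -> dict:
    """Drop states whose population falls below the percentage threshold."""
    return {k: v for k, v in percentages.items() if v >= filter_percentage}


# Acetic acid (pKa assumed to be 4.76) at physiological pH.
states = filter_states(state_percentages(pka=4.76, ph=7.4))
```

At pH 7.4 the protonated form of this site is well under 1% of the population, so only the deprotonated state survives the default threshold.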

protonate_molecules classmethod
protonate_molecules(ligands)

Predicts the appropriate protonation states of one or more molecules at a given pH value.

Parameters:
- ligands: A single or multiple ligands represented as SMILES or Ligand instances.
- pH: pH value of the solvent for concentration calculation. Default is 7.4.
- filter_percentage: Percentage threshold for filtering low concentration states. Default is 1.

Returns: - ProtonationReport: A ProtonationReport instance.

set_property
set_property(prop_name: str, prop_value)

Set a property for the ligand molecule.

Parameters:
- prop_name (str): Name of the property.
- prop_value: Value of the property.

Example:

ligand.set_property("BindingAffinity", 5.6)

show
show() -> str

Visualize the current state of the ligand molecule.

Returns: - str: HTML representation of the visualization.

Raises: - Exception: If visualization fails.

Example:

ligand.show()

update_coordinates
update_coordinates(coords: ndarray)

Update the coordinates of the ligand structure.

visualize_ligands classmethod
visualize_ligands(ligands: list[Ligand])

Visualize ligands.

Parameters:

Name Type Description Default
ligands list[Ligand]

list[Ligand]: The list of Ligand objects to visualize.

required

Raises:

Type Description
FileNotFoundError

If the file does not exist.

ValueError

If the file cannot be parsed correctly.

visualize_ligands_from_sdf classmethod
visualize_ligands_from_sdf(file_path: str)

Visualize ligands from an SDF file.

Parameters:

Name Type Description Default
file_path str

The path to the SDF file.

required

Raises:

Type Description
FileNotFoundError

If the file does not exist.

ValueError

If the file cannot be parsed correctly.

write_to_file
write_to_file(
    output_path: str = "", output_format: str = ""
)

Writes the ligand molecule to a file, including all properties.

Parameters: - output_path (str): Path where the ligand will be written.

Raises:
- ValueError: If the file extension is unsupported.
- Exception: If writing to the file fails.

Example:

ligand.write_to_file('/path/to/output.pdb')
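
The documented `ValueError` for unsupported extensions suggests that the format is inferred from the output path when `output_format` is empty. The sketch below shows that idea under assumptions: `infer_format` is a hypothetical helper, and the supported set is borrowed from the `block_type` values listed for `Ligand`, so the library's actual set may differ.

```python
from pathlib import Path

# Assumed set, taken from the block_type values listed for Ligand.
SUPPORTED_FORMATS = {"mol", "mol2", "sdf", "pdb"}


def infer_format(output_path: str, output_format: str = "") -> str:
    """Prefer an explicit format; otherwise fall back to the file extension."""
    fmt = (output_format or Path(output_path).suffix.lstrip(".")).lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output format: {fmt!r}")
    return fmt


fmt = infer_format("/path/to/output.pdb")
```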

Pocket dataclass

A class representing a binding pocket in a protein structure.

Attributes

block_content class-attribute instance-attribute
block_content: str = ''
block_type class-attribute instance-attribute
block_type: str = ''
color class-attribute instance-attribute
color: str = 'red'
coordinates class-attribute instance-attribute
coordinates: Optional[ndarray] = None
file_path class-attribute instance-attribute
file_path: Optional[Path] = None
index class-attribute instance-attribute
index: Optional[int] = 0
name class-attribute instance-attribute
name: Optional[str] = None
pdb_id class-attribute instance-attribute
pdb_id: Optional[str] = None
props class-attribute instance-attribute
props: Optional[Dict[str, Any]] = field(
    default_factory=dict
)
structure class-attribute instance-attribute
structure: Optional[ndarray] = None

Functions

from_block classmethod
from_block(
    block_content: str,
    block_type: str = "pdb",
    **kwargs: Any
) -> Pocket

Create a Pocket instance from block content.

Parameters:

Name Type Description Default
block_content str

The content of the pocket structure.

required
block_type str

The format of the block content (default: "pdb").

'pdb'
**kwargs Any

Additional arguments to pass to the Pocket constructor.

{}

Returns:

Name Type Description
Pocket Pocket

A new Pocket instance.

Example:

pocket = Pocket.from_block(pdb_content, block_type="pdb")

from_file classmethod
from_file(file_path: str, **kwargs: Any) -> Pocket

Create a Pocket instance from a file.

Parameters:

Name Type Description Default
file_path str

Path to the pocket structure file.

required
**kwargs Any

Additional arguments to pass to the Pocket constructor.

{}

Returns:

Name Type Description
Pocket Pocket

A new Pocket instance.

Example:

pocket = Pocket.from_file("pocket.pdb")

from_name classmethod
from_name(name: str, **kwargs: Any) -> Pocket

Create a Pocket instance by searching for a file with the given name in the pockets directory.

Parameters:

Name Type Description Default
name str

Name of the pocket to search for.

required
**kwargs Any

Additional arguments to pass to the Pocket constructor.

{}

Returns:

Name Type Description
Pocket Pocket

A new Pocket instance.

Example:

pocket = Pocket.from_name("binding_site_1")

from_pocket_finder_results classmethod
from_pocket_finder_results(
    pocket_finder_results_dir: str | Path,
) -> List[Pocket]

Create a list of Pocket objects from pocket finder results directory.

Parameters:

Name Type Description Default
pocket_finder_results_dir str | Path

Directory containing pocket finder results with PDB files for each pocket and a CSV properties file.

required

Returns:

Type Description
List[Pocket]

List of Pocket objects with properties from the CSV file.

from_structure classmethod
from_structure(
    structure: ndarray,
    name: Optional[str] = None,
    **kwargs: Any
) -> Pocket

Create a Pocket instance directly from a structure array.

Parameters:

Name Type Description Default
structure ndarray

The structure array.

required
name Optional[str]

Name for the pocket.

None
**kwargs Any

Additional arguments to pass to the Pocket constructor.

{}

Returns:

Name Type Description
Pocket Pocket

A new Pocket instance.

Example:

pocket = Pocket.from_structure(atom_array, name="binding_site")

get_center
get_center() -> ndarray

Get the center of the pocket based on its coordinates.

Returns:

Type Description
ndarray

np.ndarray: A numpy array containing the center of the pocket.

get_directory staticmethod
get_directory() -> str

Generates and ensures the existence of a directory for pockets.

Returns:

Name Type Description
str str

The path to the pockets directory.

load_structure
load_structure(structure_file_path: str | Path) -> None

Load a PDB structure from a file path into the structure attribute.

Parameters:

Name Type Description Default
structure_file_path str | Path

Path to the PDB file.

required

load_structure_from_block
load_structure_from_block(
    block_content: str, block_type: str
)

Load a pocket structure from block content.

Parameters:
- block_content (str): String containing the pocket data.
- block_type (str): Format of the block content ('pdb').

Returns: - AtomArray: Loaded structure.

Raises: - ValueError: If the block type is unsupported.

pocket_props
pocket_props()

Get the properties of the pocket.

show
show()

Show the pocket in a Jupyter notebook.

update_coordinates
update_coordinates(coords: ndarray)

Update the coordinates of the pocket structure.

write_to_file
write_to_file(output_path: str, output_format: str = 'pdb')

Write the current state of the structure to a PDB file.

Parameters: - output_path (str): Path where the pocket structure will be written.

Example:

pocket.write_to_file('/path/to/output.pdb')

Protein dataclass

A class representing a protein structure with various manipulation and analysis capabilities.

Attributes

atom_types class-attribute instance-attribute
atom_types: Optional[ndarray] = None
block_content class-attribute instance-attribute
block_content: Optional[str] = None
block_type class-attribute instance-attribute
block_type: str = 'pdb'
coordinates property
coordinates
file_path class-attribute instance-attribute
file_path: Optional[Path] = None
info class-attribute instance-attribute
info: Optional[dict] = None
name instance-attribute
name: str
pdb_id class-attribute instance-attribute
pdb_id: Optional[str] = None
structure class-attribute instance-attribute
structure: ndarray = field(repr=False)

Functions

dock
dock(*, ligand: Ligand, pocket: Pocket) -> str

Dock a ligand into a specific pocket of the protein.

This method performs molecular docking of a ligand into a specified pocket of the protein structure. It uses Deep Origin docking to generate a 3D structure of the docked ligand.

Parameters:

Name Type Description Default
ligand Ligand

The ligand to dock into the protein pocket.

required
pocket Pocket

The specific pocket in the protein where the ligand should be docked.

required

Returns:

Name Type Description
str str

Path to the SDF file containing the docked ligand structure.
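
A typical call sequence combines `find_pockets` with `dock`. The sketch below relies only on the signatures documented on this page and is duck-typed, so it runs without the library. `dock_into_first_pocket` and `_StubProtein` are hypothetical names introduced for illustration; the stub returns placeholder values, not real docking output.

```python
def dock_into_first_pocket(protein, ligand):
    """Find pockets, then dock the ligand into the first one returned."""
    pockets = protein.find_pockets(pocket_count=1)
    if not pockets:
        raise ValueError("No pockets were found")
    # dock() takes keyword-only arguments, per the signature above.
    return protein.dock(ligand=ligand, pocket=pockets[0])


class _StubProtein:
    """Stand-in so the sketch runs without the library or network access."""

    def find_pockets(self, pocket_count=5, pocket_min_size=30):
        return ["pocket_0"][:pocket_count]

    def dock(self, *, ligand, pocket):
        return f"{ligand}_{pocket}.sdf"


sdf_path = dock_into_first_pocket(_StubProtein(), "ethanol")
```

With a real `Protein` and `Ligand`, the returned path could then be passed to `protein.show(sdf_file=...)`.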

download_protein_by_pdb_id staticmethod
download_protein_by_pdb_id(
    pdb_id: str, save_dir: str = ""
) -> str

Download a PDB structure by its PDB ID from RCSB.

extract_metals_and_cofactors
extract_metals_and_cofactors() -> (
    Tuple[list[str], list[str]]
)

Extract metal ions and cofactors from the protein structure.

Returns:

Type Description
Tuple[list[str], list[str]]

Tuple[list[str], list[str]]

find_pockets
find_pockets(
    pocket_count: int = 5, pocket_min_size: int = 30
) -> list[Pocket]

Find potential binding pockets in the protein structure.

This method analyzes the protein structure to identify cavities or pockets that could potentially serve as binding sites for ligands. It uses the Deep Origin pocket finding algorithm to detect and characterize these pockets.

Parameters:

Name Type Description Default
pocket_count int

Maximum number of pockets to identify. Defaults to 5.

5
pocket_min_size int

Minimum size of pockets to consider, measured in cubic Angstroms. Defaults to 30.

30

Returns:

Type Description
list[Pocket]

list[Pocket]: A list of Pocket objects, each representing a potential binding site in the protein. Each Pocket object contains: - The 3D structure of the pocket - Properties such as volume, surface area, hydrophobicity, etc. - Visualization parameters (color, etc.)

Examples:

>>> protein = Protein.from_file("protein.pdb")
>>> pockets = protein.find_pockets(pocket_count=3, pocket_min_size=50)
>>> for pocket in pockets:
...     print(f"Pocket: {pocket.name}, Volume: {pocket.props.get('volume')} ų")

from_file classmethod
from_file(file_path: str, struct_ind: int = 0) -> Protein

Create a Protein instance from a file.

Parameters:

Name Type Description Default
file_path str

Path to the protein PDB file.

required
struct_ind int

Index of the structure to select if multiple are present.

0

Returns:

Name Type Description
Protein Protein

A new Protein instance.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

ValueError

If the structure cannot be loaded.

RuntimeError

If the file cannot be read or processed.

from_pdb_id classmethod
from_pdb_id(pdb_id: str, struct_ind: int = 0) -> Protein

Create a Protein instance from a PDB ID.

Parameters:

Name Type Description Default
pdb_id str

PDB ID of the protein to download.

required
struct_ind int

Index of the structure to select if multiple are present.

0

Returns:

Name Type Description
Protein Protein

A new Protein instance.

Raises:

Type Description
ValueError

If the PDB ID is invalid or the structure cannot be loaded.

RuntimeError

If the download fails.

get_center_by_residues
get_center_by_residues(residues: list[str]) -> ndarray

Get the center of the protein structure based on specific residues.

Parameters:

Name Type Description Default
residues list[str]

List of residue names to include in the calculation.

required

get_directory staticmethod
get_directory() -> str

Get the directory for storing protein files.

list_chain_names
list_chain_names() -> list[str]

List all unique chain IDs in the protein structure.

Returns:

Type Description
list[str]

list[str]: A list of unique chain IDs.
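
For readers unfamiliar with where chain IDs live, the sketch below reads them straight from PDB text: in the fixed-width PDB format the chain identifier is column 22 of each ATOM or HETATM record. This is an illustration only. The library works on a structure array rather than raw text, and `pdb_line` and `chain_ids` are hypothetical helpers.

```python
def pdb_line(record, serial, atom, resname, chain, resseq):
    """Build the first 26 columns of a fixed-width PDB ATOM/HETATM record."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


def chain_ids(pdb_text):
    """Return unique chain IDs in order of first appearance."""
    seen = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]  # column 22, zero-based index 21
            if chain not in seen:
                seen.append(chain)
    return seen


sample = "\n".join([
    pdb_line("ATOM", 1, "CA", "ALA", "A", 1),
    pdb_line("ATOM", 2, "CA", "GLY", "B", 1),
    pdb_line("ATOM", 3, "CA", "SER", "A", 2),
])
chains = chain_ids(sample)
```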

list_hetero_names
list_hetero_names(exclude_water=True) -> list[str]

List all unique hetero residue names in the protein structure.

Parameters:

Name Type Description Default
exclude_water bool

Whether to exclude water molecules from the list.

True

Returns:

Type Description
list[str]

list[str]: A list of unique hetero residue names. Water is excluded when exclude_water is True.
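
The effect of `exclude_water` can be sketched on raw PDB text, where hetero residues appear as HETATM records and the residue name occupies columns 18 to 20. This is a plain-Python illustration, not the library's implementation: `pdb_line` and `hetero_names` are hypothetical helpers, treating `HOH` as the water residue name is an assumption, and the sorted return order is a choice made for this sketch.

```python
def pdb_line(record, serial, atom, resname, chain, resseq):
    """Build the first 26 columns of a fixed-width PDB ATOM/HETATM record."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


def hetero_names(pdb_text, exclude_water=True):
    """Collect unique residue names from HETATM records."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            resname = line[17:20].strip()  # columns 18-20
            if exclude_water and resname == "HOH":
                continue
            names.add(resname)
    return sorted(names)


sample = "\n".join([
    pdb_line("ATOM", 1, "CA", "ALA", "A", 1),
    pdb_line("HETATM", 2, "ZN", "ZN", "A", 201),
    pdb_line("HETATM", 3, "O", "HOH", "A", 301),
])
names = hetero_names(sample)
names_with_water = hetero_names(sample, exclude_water=False)
```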

load_structure_from_block staticmethod
load_structure_from_block(
    block_content: str, block_type: str
) -> ndarray

Load a protein structure from block content.

remove_hetatm
remove_hetatm(
    keep_resnames: Optional[list[str]] = None,
    remove_metals: Optional[list[str]] = None,
) -> None

Remove HETATM records from the protein structure, with options to retain specified residues or exclude certain metals.

Parameters

keep_resnames : Optional[list[str]]
    A list of residue names (strings) to keep in the structure even if they are HETATM records.
exclude_metals : Optional[list[str]]
    A list of metal names (strings) to exclude from removal. These metals will be retained in the structure.

Notes
  • By default, a predefined list of metals is considered for removal unless specified in exclude_metals.
  • If keep_resnames is provided, those residues (along with any metals not excluded) will be retained even if they are HETATM records.
  • The method updates the current protein object in place.

Example:

    protein = Protein(structure)
    protein.remove_hetatm(keep_resnames=['HOH'], exclude_metals=['ZN'])

remove_resnames
remove_resnames(
    exclude_resnames: Optional[list[str]] = None,
) -> None

Remove specific residue names from the protein structure in place.

Parameters:

Name Type Description Default
exclude_resnames Optional[list[str]]

List of residue names to exclude.

None

remove_water
remove_water() -> None

Remove water molecules from the protein structure in place.

Example:

protein.remove_water()

select_chain
select_chain(chain_id: str) -> Optional[Protein]

Select a specific chain by its ID and return a new Protein object.

Parameters: - chain_id (str): Chain ID to select.

Returns: - Protein: A new Protein object containing the selected chain.

Raises: - ValueError: If the chain ID is not found.

Example:

chain_a = protein.select_chain('A')

select_chains
select_chains(chain_ids: list[str]) -> Protein

Select specific chains from the protein structure.

Parameters:

Name Type Description Default
chain_ids list[str]

List of chain IDs to select.

required

select_structure staticmethod
select_structure(structure: ndarray, index: int) -> ndarray

Select a specific structure by index.

show
show(
    pockets: Optional[list[Pocket]] = None,
    sdf_file: Optional[str] = None,
)

Visualize the protein structure in a Jupyter notebook using MolStar viewer.

This method provides interactive 3D visualization of the protein structure with optional highlighting of binding pockets and docked ligands. The visualization is rendered directly in Jupyter notebooks using the MolStar viewer.

Parameters:

Name Type Description Default
pockets Optional[list[Pocket]]

List of Pocket objects to highlight in the visualization. Each pocket will be displayed with its defined color and transparency. Defaults to None.

None
sdf_file Optional[str]

Path to an SDF file containing docked ligand structures. When provided, the ligands will be displayed alongside the protein structure. Defaults to None.

None

Examples:

Visualize the protein structure only:

>>> protein = Protein.from_file("protein.pdb")
>>> protein.show()

Visualize the protein with highlighted pockets:

>>> pockets = protein.find_pockets(pocket_count=3)
>>> protein.show(pockets=pockets)

Visualize the protein with docked ligands:

>>> protein.show(sdf_file="docked_ligands.sdf")

Visualize the protein with both pockets and docked ligands:

>>> protein.show(pockets=pockets, sdf_file="docked_ligands.sdf")

Notes
  • When pockets are provided, they are displayed with semi-transparent surfaces (alpha=0.7) while the protein is shown with a more transparent surface (alpha=0.1)
  • The protein is displayed in cartoon representation when pockets are shown
  • When an SDF file is provided, the visualization includes both the protein and the docked ligands in their respective binding poses

to_pdb
to_pdb(file_path: str)

Write the protein structure to a PDB file.

Parameters:

Name Type Description Default
file_path str

Path where the PDB file will be written.

required

Example:

protein.to_pdb('/path/to/output.pdb')

update_coordinates
update_coordinates(coords: ndarray)

Update the coordinates of the protein structure.