deeporigin.drug_discovery.structures
Classes
Ligand (dataclass)
A class representing a ligand molecule in drug discovery workflows. The Ligand class provides functionality to create, manipulate, and analyze small molecules (ligands) in computational drug discovery. It supports various input formats and provides methods for property prediction, visualization, and file operations.
Attributes:
Name | Type | Description |
---|---|---|
identifier | Optional[str] | Ligand identifier (e.g., PubChem ID) |
file_path | Optional[str] | Path to the ligand file |
smiles | Optional[str] | SMILES string representing the ligand |
block_type | Optional[str] | Format of the block content ('mol', 'mol2', 'sdf', 'pdb') |
block_content | Optional[str] | String containing the molecule data |
name | Optional[str] | Optional name of the ligand |
seed | Optional[int] | Random seed for coordinate generation |
xref_protein | Optional[str] | Cross-reference to the protein |
xref_ins_code | Optional[str] | Cross-reference insertion code |
xref_residue_id | Optional[str] | Cross-reference residue ID |
xref_protein_chain_id | Optional[str] | Cross-reference protein chain ID |
save_to_file | bool | Whether to save the ligand to a file |
properties | dict | Dictionary of ligand properties |
mol | Optional[Molecule] | Molecule object used to initialize the ligand directly |
Examples:
>>> # Create from SMILES
>>> ligand = Ligand.from_smiles("CCO", name="Ethanol")
>>> # Create from SDF file
>>> ligand = Ligand.from_sdf("ligand.sdf")
>>> # Get properties
>>> center = ligand.get_center()
>>> props = ligand.admet_properties()
>>> # Visualize
>>> ligand.show()
>>> # Save to file
>>> ligand.write_to_file("output.pdb")
Attributes
available_for_docking (class-attribute, instance-attribute)
available_for_docking: bool = field(
init=False, default=True
)
protonated_smiles (class-attribute, instance-attribute)
protonated_smiles: str | None = field(
init=False, default=None
)
Functions
admet_properties
admet_properties() -> str
Predict ADMET properties for the ligand.
Returns:
Name | Type | Description |
---|---|---|
str | str | A string containing the predicted ADMET properties. |
convert_to_sdf (classmethod)
convert_to_sdf(block_content: str, block_type: str) -> str
Convert ligand block content to SDF format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block_content | str | The block content of the ligand. | required |
block_type | str | The type of the block content. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The ligand block content in SDF format. |
from_block_content (classmethod)
from_block_content(
block_content: str,
block_type: str,
name: str = "",
save_to_file: bool = False,
**kwargs: Any
) -> Ligand
Create a Ligand instance from block content.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block_content | str | String containing the molecule data | required |
block_type | str | Format of the block content ('mol', 'mol2', 'sdf', 'pdb') | required |
name | str | Name of the ligand. | '' |
save_to_file | bool | Whether to save the ligand to a file. | False |
**kwargs | Any | Additional arguments to pass to the constructor | {} |
Returns:
Name | Type | Description |
---|---|---|
Ligand | Ligand | A new Ligand instance |
from_csv (classmethod)
from_csv(
file_path: str, smiles_column: str = "smiles"
) -> list[Ligand]
Create Ligand instances from a CSV file containing SMILES strings and additional properties.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | The path to the CSV file. | required |
smiles_column | str | The name of the column containing SMILES strings. | 'smiles' |
Returns:
Type | Description |
---|---|
list[Ligand] | A list of Ligand instances created from the CSV file. |
Raises:
Type | Description |
---|---|
FileNotFoundError | If the file does not exist. |
ValueError | If the CSV does not contain the specified SMILES column or if SMILES strings are invalid. |
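The column handling described for from_csv can be sketched with Python's standard csv module. This is a minimal, hypothetical illustration (read_smiles_rows is not part of deeporigin): it checks that the SMILES column exists and turns every other column into a per-row property dictionary. Chemical validation of the SMILES strings would need a toolkit such as RDKit, so the sketch only rejects empty values.

```python
import csv
import io


def read_smiles_rows(
    csv_text: str, smiles_column: str = "smiles"
) -> list[tuple[str, dict[str, str]]]:
    """Hypothetical helper: return (smiles, properties) pairs from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Mirror the documented ValueError when the SMILES column is missing.
    if reader.fieldnames is None or smiles_column not in reader.fieldnames:
        raise ValueError(f"CSV has no column named {smiles_column!r}")
    rows = []
    for row in reader:
        smiles = (row.pop(smiles_column) or "").strip()
        if not smiles:
            # Only empty values are rejected; chemical validation is out of scope.
            raise ValueError("empty SMILES string in CSV row")
        rows.append((smiles, row))  # every other column becomes a property
    return rows


rows = read_smiles_rows("smiles,pIC50\nCCO,5.1\nc1ccccc1,6.2\n")
print(rows[0])  # ('CCO', {'pIC50': '5.1'})
```

Property values stay as strings here; converting numeric columns is left to the caller.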
from_identifier (classmethod)
from_identifier(
identifier: str,
name: Optional[str] = None,
save_to_file: bool = False,
**kwargs: Any
) -> Ligand
Create a Ligand instance from a chemical identifier.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
identifier | str | Chemical identifier (e.g., common name, PubChem name, drug name) | required |
name | Optional[str] | Name of the ligand. If not provided, the identifier is used. | None |
save_to_file | bool | Whether to save the ligand to a file. | False |
**kwargs | Any | Additional arguments to pass to the constructor | {} |
Returns:
Name | Type | Description |
---|---|---|
Ligand | Ligand | A new Ligand instance initialized from the chemical identifier |
Example:
>>> # Create an ATP molecule
>>> atp = Ligand.from_identifier("ATP", name="ATP")
>>> # Create a serotonin molecule
>>> serotonin = Ligand.from_identifier(
...     identifier="serotonin",
...     name="Serotonin",
... )
Raises:
Type | Description |
---|---|
DeepOriginException | If the identifier cannot be resolved to a valid molecule |
from_rdkit_mol (classmethod)
from_rdkit_mol(
mol: Mol,
name: str = "",
save_to_file: bool = False,
**kwargs: Any
) -> Ligand
Create a Ligand instance from an RDKit Mol object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mol | Mol | RDKit molecule object to convert to a Ligand | required |
name | str | Name of the ligand. | '' |
save_to_file | bool | Whether to save the ligand to a file. | False |
**kwargs | Any | Additional arguments to pass to the constructor | {} |
Returns:
Name | Type | Description |
---|---|---|
Ligand | Ligand | A new Ligand instance initialized from the RDKit molecule |
Example:
>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("CCO")
>>> ligand = Ligand.from_rdkit_mol(mol, name="Ethanol")
from_sdf (classmethod)
from_sdf(
file_path: str,
*,
sanitize: bool = True,
removeHs: bool = False
) -> Union[List[Ligand], Ligand]
Create Ligand instances from an SDF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | The path to the SDF file. | required |
sanitize | bool | Keyword-only. Whether to sanitize the molecules while reading. | True |
removeHs | bool | Keyword-only. Whether to remove hydrogens while reading. | False |
Returns:
Type | Description |
---|---|
Union[List[Ligand], Ligand] | A Ligand, or a list of Ligand instances, created from the SDF file. |
Raises:
Type | Description |
---|---|
FileNotFoundError | If the file does not exist. |
ValueError | If the file cannot be parsed correctly. |
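An SDF file stores one molfile record per molecule, each record ends with a `$$$$` line, and the first line of each record holds the molecule name. The sketch below shows only that splitting step in plain Python; the helpers are hypothetical (not deeporigin's parser) and do not parse atoms or apply the sanitize/removeHs options, which a reader such as RDKit handles.

```python
def split_sdf_records(sdf_text: str) -> list[str]:
    """Split SDF text into molfile records separated by '$$$$' lines."""
    records: list[str] = []
    current: list[str] = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Tolerate a final record that lacks its closing '$$$$'.
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


def record_name(record: str) -> str:
    """The first line of a molfile header holds the molecule name."""
    return record.splitlines()[0].strip() if record else ""


sdf = (
    "ethanol\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
    "benzene\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
)
records = split_sdf_records(sdf)
print([record_name(r) for r in records])  # ['ethanol', 'benzene']
```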
from_smiles (classmethod)
from_smiles(
smiles: str,
name: str = "",
save_to_file: bool = False,
**kwargs: Any
) -> Ligand
Create a Ligand instance from a SMILES string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
smiles | str | SMILES string representing the ligand | required |
name | str | Name of the ligand. | '' |
save_to_file | bool | Whether to save the ligand to a file. | False |
**kwargs | Any | Additional arguments to pass to the constructor | {} |
Returns:
Name | Type | Description |
---|---|---|
Ligand | Ligand | A new Ligand instance |
Example:
>>> ligand = Ligand.from_smiles(
...     smiles="CCO",  # ethanol
...     name="Ethanol",
...     save_to_file=False,
... )
>>> print(ligand.smiles)
CCO
get_center
get_center() -> Optional[list[float]]
Get the center of the ligand based on its coordinates.
Returns:
- list[float]: The center coordinates of the ligand.
- None: If coordinates are not available.
Example:
center = ligand.get_center()
print(center)  # e.g., [1.23, 4.56, 7.89]
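A minimal sketch of the center computation, assuming the center is the unweighted mean of the atom coordinates (the documentation does not say whether it is a geometric centroid or, for example, mass-weighted). The centroid helper is hypothetical and mirrors the documented None return when no coordinates exist.

```python
from typing import Optional, Sequence


def centroid(coords: Sequence[Sequence[float]]) -> Optional[list[float]]:
    """Arithmetic mean of 3D coordinates, or None if there are none."""
    if not coords:
        return None  # matches get_center() returning None without coordinates
    n = len(coords)
    return [sum(atom[axis] for atom in coords) / n for axis in range(3)]


print(centroid([[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]]))  # [1.0, 2.0, 3.0]
```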
get_property
get_property(prop_name: str)
Get the value of a property of the ligand molecule.
Parameters:
- prop_name (str): Name of the property to retrieve.
Returns:
- The value of the property if it exists, otherwise None.
Example:
binding_affinity = ligand.get_property("BindingAffinity")
protonate
protonate(pH: float = 7.4, filter_percentage: float = 1)
Predict the protonation state of the ligand at a given pH.
Parameters:
- pH: pH of the solvent used for the concentration calculation. Defaults to 7.4.
- filter_percentage: Percentage threshold below which low-concentration states are filtered out. Defaults to 1.
Returns:
- ProtonationReport: A ProtonationReport instance.
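To illustrate what pH and filter_percentage control, here is a sketch for the simplest case, a single acidic site, using the Henderson-Hasselbalch equation. This is not how deeporigin predicts protonation (its model is not described here); it only shows how state populations shift with pH and how a percentage threshold prunes minor states. The helper names are hypothetical.

```python
def monoprotic_populations(pka: float, ph: float = 7.4) -> dict[str, float]:
    """Percent of the protonated (HA) and deprotonated (A-) forms of one acidic site."""
    ratio = 10 ** (ph - pka)  # Henderson-Hasselbalch: [A-]/[HA] = 10^(pH - pKa)
    deprotonated = 100.0 * ratio / (1.0 + ratio)
    return {"HA": 100.0 - deprotonated, "A-": deprotonated}


def filter_states(
    populations: dict[str, float], filter_percentage: float = 1.0
) -> dict[str, float]:
    """Drop states whose population falls below filter_percentage percent."""
    return {state: pct for state, pct in populations.items() if pct >= filter_percentage}


# Acetic acid (pKa about 4.76) at pH 7.4 is almost fully deprotonated,
# so the protonated form (about 0.2 %) falls below a 1 % threshold.
populations = monoprotic_populations(4.76, ph=7.4)
print(list(filter_states(populations, filter_percentage=1.0)))  # ['A-']
```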
protonate_molecules (classmethod)
protonate_molecules(ligands)
Predict the protonation states of one or more molecules at a given pH.
Parameters:
- ligands: A single ligand or multiple ligands, represented as SMILES strings or Ligand instances.
- pH: pH of the solvent used for the concentration calculation. Defaults to 7.4.
- filter_percentage: Percentage threshold below which low-concentration states are filtered out. Defaults to 1.
Returns:
- ProtonationReport: A ProtonationReport instance.
set_property
set_property(prop_name: str, prop_value)
Set a property for the ligand molecule.
Parameters:
- prop_name (str): Name of the property.
- prop_value: Value of the property.
Example:
ligand.set_property("BindingAffinity", 5.6)
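get_property and set_property behave like lookups in a property dictionary. The sketch below models that contract with a hypothetical PropertyStore class (not deeporigin's Ligand), assuming set_property overwrites any existing value and get_property returns None for a missing name, as documented above. Whether Ligand also mirrors properties into the underlying molecule is not covered here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PropertyStore:
    """Hypothetical stand-in for the property handling on Ligand."""

    properties: dict[str, Any] = field(default_factory=dict)

    def set_property(self, prop_name: str, prop_value: Any) -> None:
        self.properties[prop_name] = prop_value  # overwrites any existing value

    def get_property(self, prop_name: str) -> Optional[Any]:
        return self.properties.get(prop_name)  # None when the property is missing


store = PropertyStore()
store.set_property("BindingAffinity", 5.6)
print(store.get_property("BindingAffinity"), store.get_property("pIC50"))  # 5.6 None
```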
show
show() -> str
Visualize the current state of the ligand molecule.
Returns:
- str: HTML representation of the visualization.
Raises:
- Exception: If visualization fails.
Example:
ligand.show()
visualize_ligands (classmethod)
visualize_ligands(ligands: list[Ligand])
Visualize a list of ligands.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ligands | list[Ligand] | The list of Ligand objects to visualize. | required |
Raises:
Type | Description |
---|---|
FileNotFoundError | If the file does not exist. |
ValueError | If the file cannot be parsed correctly. |
visualize_ligands_from_sdf (classmethod)
visualize_ligands_from_sdf(file_path: str)
Visualize ligands from an SDF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | The path to the SDF file. | required |
Raises:
Type | Description |
---|---|
FileNotFoundError | If the file does not exist. |
ValueError | If the file cannot be parsed correctly. |
write_to_file
write_to_file(
output_path: str = "", output_format: str = ""
)
Write the ligand molecule to a file, including all of its properties.
Parameters:
- output_path (str): Path where the ligand will be written.
Raises:
- ValueError: If the file extension is unsupported.
- Exception: If writing to the file fails.
Example:
ligand.write_to_file('/path/to/output.pdb')
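The documented ValueError for unsupported extensions suggests that, when output_format is empty, the format is derived from output_path. The sketch below shows one way that resolution could work; the helper is hypothetical, and both the precedence rule (an explicit output_format wins) and the extension table are assumptions rather than deeporigin's actual behavior.

```python
from pathlib import Path

# Hypothetical extension table; the formats deeporigin actually supports may differ.
SUPPORTED_FORMATS = {".sdf": "sdf", ".mol": "mol", ".pdb": "pdb"}


def resolve_output_format(output_path: str, output_format: str = "") -> str:
    """Use an explicit format if given, otherwise infer it from the file extension."""
    if output_format:
        return output_format.lower()
    suffix = Path(output_path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return SUPPORTED_FORMATS[suffix]


print(resolve_output_format("/path/to/output.pdb"))  # pdb
```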
Pocket (dataclass)
A class representing a binding pocket in a protein structure.
Attributes
props (class-attribute, instance-attribute)
props: Optional[Dict[str, Any]] = field(
default_factory=dict
)
Functions
from_block (classmethod)
from_block(
block_content: str,
block_type: str = "pdb",
**kwargs: Any
) -> Pocket
Create a Pocket instance from block content.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block_content | str | The content of the pocket structure. | required |
block_type | str | The format of the block content. | 'pdb' |
**kwargs | Any | Additional arguments to pass to the Pocket constructor. | {} |
Returns:
Name | Type | Description |
---|---|---|
Pocket | Pocket | A new Pocket instance. |
Example:
pocket = Pocket.from_block(pdb_content, block_type="pdb")
from_file (classmethod)
from_file(file_path: str, **kwargs: Any) -> Pocket
Create a Pocket instance from a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | Path to the pocket structure file. | required |
**kwargs | Any | Additional arguments to pass to the Pocket constructor. | {} |
Returns:
Name | Type | Description |
---|---|---|
Pocket | Pocket | A new Pocket instance. |
Example:
pocket = Pocket.from_file("pocket.pdb")
from_name (classmethod)
from_name(name: str, **kwargs: Any) -> Pocket
Create a Pocket instance by searching for a file with the given name in the pockets directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the pocket to search for. | required |
**kwargs | Any | Additional arguments to pass to the Pocket constructor. | {} |
Returns:
Name | Type | Description |
---|---|---|
Pocket | Pocket | A new Pocket instance. |
Example:
pocket = Pocket.from_name("binding_site_1")
from_pocket_finder_results (classmethod)
from_pocket_finder_results(
pocket_finder_results_dir: str | Path,
) -> List[Pocket]
Create a list of Pocket objects from a pocket finder results directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pocket_finder_results_dir | Union[str, Path] | Directory containing pocket finder results: a PDB file for each pocket and a CSV properties file. | required |
Returns:
Type | Description |
---|---|
List[Pocket] | List of Pocket objects with properties from the CSV file. |
from_structure (classmethod)
from_structure(
structure: ndarray,
name: Optional[str] = None,
**kwargs: Any
) -> Pocket
Create a Pocket instance directly from a structure array.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
structure | ndarray | The structure array. | required |
name | Optional[str] | Name for the pocket. | None |
**kwargs | Any | Additional arguments to pass to the Pocket constructor. | {} |
Returns:
Name | Type | Description |
---|---|---|
Pocket | Pocket | A new Pocket instance. |
Example:
pocket = Pocket.from_structure(atom_array, name="binding_site")
get_center
get_center() -> ndarray
Get the center of the pocket based on its coordinates.
Returns:
Type | Description |
---|---|
ndarray | A NumPy array containing the center of the pocket. |
get_directory (staticmethod)
get_directory() -> str
Create the pockets directory if it does not already exist and return its path.
Returns:
Name | Type | Description |
---|---|---|
str | str | The path to the pockets directory. |
load_structure
load_structure(structure_file_path: str | Path) -> None
Load a PDB structure from a file path into the structure attribute.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
structure_file_path | Union[str, Path] | Path to the PDB file. | required |
load_structure_from_block
load_structure_from_block(
block_content: str, block_type: str
)
Load a pocket structure from block content.
Parameters:
- block_content (str): String containing the pocket data.
- block_type (str): Format of the block content ('pdb').
Returns:
- AtomArray: The loaded structure.
Raises:
- ValueError: If the block type is unsupported.
write_to_file
write_to_file(output_path: str, output_format: str = 'pdb')
Write the current state of the pocket structure to a file (PDB by default).
Parameters:
- output_path (str): Path where the pocket structure will be written.
- output_format (str): Format of the output file. Defaults to 'pdb'.
Example:
pocket.write_to_file('/path/to/output.pdb')
Protein (dataclass)
A class representing a protein structure with various manipulation and analysis capabilities.
Functions
dock
Dock a ligand into a specific pocket of the protein.
This method performs molecular docking of a ligand into a specified pocket of the protein structure, using Deep Origin's docking tool to generate a 3D structure of the docked ligand.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ligand | Ligand | The ligand to dock into the protein pocket. | required |
pocket | Pocket | The pocket of the protein into which the ligand should be docked. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | Path to the SDF file containing the docked ligand structure. |
download_protein_by_pdb_id (staticmethod)
download_protein_by_pdb_id(
pdb_id: str, save_dir: str = ""
) -> str
Download a PDB structure by its PDB ID from RCSB.
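RCSB serves coordinate files at predictable URLs, so a download like this can be sketched with the standard library alone. The helpers below are hypothetical (not deeporigin's implementation); they handle only classic four-character IDs and the legacy PDB format, and treating an empty save_dir as the current directory is an assumption.

```python
import os
import urllib.request


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the RCSB file-download URL for a classic four-character PDB ID."""
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"Not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id: str, save_dir: str = "") -> str:
    """Download the entry into save_dir (current directory if empty) and return the path."""
    path = os.path.join(save_dir, f"{pdb_id.upper()}.pdb")
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), path)
    return path


print(rcsb_pdb_url("1hsg"))  # https://files.rcsb.org/download/1HSG.pdb
```

Entries too large for the legacy PDB format are distributed only as mmCIF, so a more complete helper would fall back to the `.cif` file.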
extract_metals_and_cofactors
extract_metals_and_cofactors() -> (
Tuple[list[str], list[str]]
)
Extract metal ions and cofactors from the protein structure.
Returns:
Type | Description |
---|---|
Tuple[list[str], list[str]] | A tuple of two lists: the metal ions and the cofactors found in the structure. |
find_pockets
find_pockets(
pocket_count: int = 5, pocket_min_size: int = 30
) -> list[Pocket]
Find potential binding pockets in the protein structure.
This method analyzes the protein structure to identify cavities or pockets that could serve as binding sites for ligands. It uses the Deep Origin pocket-finding algorithm to detect and characterize these pockets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pocket_count | int | Maximum number of pockets to identify. | 5 |
pocket_min_size | int | Minimum size of pockets to consider, in cubic ångströms (Å³). | 30 |
Returns:
Type | Description |
---|---|
list[Pocket] | A list of Pocket objects, each representing a potential binding site in the protein. Each Pocket contains its 3D structure; properties such as volume, surface area, and hydrophobicity; and visualization parameters such as color. |
Examples:
>>> protein = Protein.from_file("protein.pdb")
>>> pockets = protein.find_pockets(pocket_count=3, pocket_min_size=50)
>>> for pocket in pockets:
...     print(f"Pocket: {pocket.name}, Volume: {pocket.props.get('volume')} Å³")
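The two parameters of find_pockets act as a size filter and a cap on the number of results. The sketch below applies that post-processing to hypothetical candidates; the CandidatePocket record and select_pockets helper are illustrative only, and ranking by volume is an assumption, since the documentation does not say how the pocket finder orders its pockets.

```python
from dataclasses import dataclass


@dataclass
class CandidatePocket:
    """Hypothetical candidate record; not deeporigin's Pocket class."""

    name: str
    volume: float  # cubic angstroms


def select_pockets(
    candidates: list[CandidatePocket], pocket_count: int = 5, pocket_min_size: float = 30
) -> list[CandidatePocket]:
    """Keep candidates of at least pocket_min_size, largest first, capped at pocket_count."""
    large_enough = [p for p in candidates if p.volume >= pocket_min_size]
    large_enough.sort(key=lambda p: p.volume, reverse=True)
    return large_enough[:pocket_count]


found = [CandidatePocket("a", 12.0), CandidatePocket("b", 240.5), CandidatePocket("c", 88.0)]
print([p.name for p in select_pockets(found, pocket_count=3, pocket_min_size=50)])  # ['b', 'c']
```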
from_file (classmethod)
from_file(file_path: str, struct_ind: int = 0) -> Protein
Create a Protein instance from a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | Path to the protein PDB file. | required |
struct_ind | int | Index of the structure to select if multiple are present. | 0 |
Returns:
Name | Type | Description |
---|---|---|
Protein | Protein | A new Protein instance. |
Raises:
Type | Description |
---|---|
FileNotFoundError | If the file does not exist. |
ValueError | If the structure cannot be loaded. |
RuntimeError | If the file cannot be read or processed. |
from_pdb_id (classmethod)
from_pdb_id(pdb_id: str, struct_ind: int = 0) -> Protein
Create a Protein instance from a PDB ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pdb_id | str | PDB ID of the protein to download. | required |
struct_ind | int | Index of the structure to select if multiple are present. | 0 |
Returns:
Name | Type | Description |
---|---|---|
Protein | Protein | A new Protein instance. |
Raises:
Type | Description |
---|---|
ValueError | If the PDB ID is invalid or the structure cannot be loaded. |
RuntimeError | If the download fails. |
get_center_by_residues
get_center_by_residues(residues: list[str]) -> ndarray
Get the center of the protein structure based on specific residues.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
residues | list[str] | List of residue names to include in the calculation. | required |
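A minimal sketch of a residue-restricted center, assuming it is the unweighted mean of all atoms whose residue name appears in residues, and that finding no matching atoms is an error; the documentation specifies neither. Atoms are modeled as hypothetical (residue name, coordinates) pairs rather than deeporigin's structure array.

```python
from typing import Sequence


def center_of_residues(
    atoms: Sequence[tuple[str, tuple[float, float, float]]], residues: list[str]
) -> list[float]:
    """Mean coordinate of atoms whose residue name is in `residues`."""
    wanted = set(residues)
    selected = [xyz for resname, xyz in atoms if resname in wanted]
    if not selected:
        raise ValueError(f"No atoms found for residues {residues}")
    n = len(selected)
    return [sum(xyz[axis] for xyz in selected) / n for axis in range(3)]


atoms = [
    ("HIS", (1.0, 0.0, 0.0)),
    ("HIS", (3.0, 2.0, 0.0)),
    ("ASP", (10.0, 10.0, 10.0)),
]
print(center_of_residues(atoms, ["HIS"]))  # [2.0, 1.0, 0.0]
```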
list_chain_names
list_chain_names() -> list[str]
List all unique chain IDs in the protein structure.
Returns:
Type | Description |
---|---|
list[str] | A list of unique chain IDs. |
list_hetero_names
list_hetero_names(exclude_water=True) -> list[str]
List all unique hetero residue names in the protein structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exclude_water | bool | Whether to exclude water molecules from the list. | True |
Returns:
Type | Description |
---|---|
list[str] | A list of unique hetero residue names (water is omitted when exclude_water is True). |
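The sketch below collects unique hetero residue names straight from PDB text, using the fixed column layout of HETATM records (residue name in columns 18-20) and treating only `HOH` as water. It is a hypothetical, stdlib-only illustration: the Protein class operates on a loaded structure array rather than raw text, and other water names such as `WAT` or `DOD` would need to be added for some files.

```python
def hetero_names(pdb_text: str, exclude_water: bool = True) -> list[str]:
    """Unique HETATM residue names in order of first appearance."""
    names: list[str] = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        resname = line[17:20].strip()  # PDB columns 18-20 hold the residue name
        if exclude_water and resname == "HOH":
            continue
        if resname not in names:
            names.append(resname)
    return names


def hetatm(serial: int, atom: str, resname: str, chain: str, resseq: int) -> str:
    """Build a minimal fixed-column HETATM line (coordinates omitted for brevity)."""
    return f"HETATM{serial:5d} {atom:<4} {resname:>3} {chain}{resseq:4d}"


pdb = "\n".join([
    hetatm(1, "ZN", "ZN", "A", 301),
    hetatm(2, "O", "HOH", "A", 401),
    hetatm(3, "C1", "ATP", "A", 302),
])
print(hetero_names(pdb))  # ['ZN', 'ATP']
print(hetero_names(pdb, exclude_water=False))  # ['ZN', 'HOH', 'ATP']
```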
load_structure_from_block (staticmethod)
load_structure_from_block(
block_content: str, block_type: str
) -> ndarray
Load a protein structure from block content.
remove_hetatm
remove_hetatm(
keep_resnames: Optional[list[str]] = None,
remove_metals: Optional[list[str]] = None,
) -> None
Remove HETATM records from the protein structure, with options to retain specified residues or exclude certain metals from removal.
Parameters
keep_resnames : Optional[list[str]]
A list of residue names to keep in the structure even if they are HETATM records.
remove_metals : Optional[list[str]]
A list of metal names to exclude from removal. These metals will be retained in the structure.
Notes
- By default, a predefined list of metals is considered for removal unless they are listed in remove_metals.
- If keep_resnames is provided, those residues (along with any metals excluded from removal) are retained even if they are HETATM records.
- The method updates the current protein object in place.
Example:
protein = Protein(structure)
protein.remove_hetatm(keep_resnames=['HOH'], remove_metals=['ZN'])
remove_resnames
remove_resnames(
exclude_resnames: Optional[list[str]] = None,
) -> None
Remove specific residue names from the protein structure in place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exclude_resnames | Optional[list[str]] | List of residue names to remove. | None |
remove_water
remove_water() -> None
Remove water molecules from the protein structure in place.
Example:
protein.remove_water()
select_chain
select_chain(chain_id: str) -> Optional[Protein]
Select a specific chain by its ID and return a new Protein object.
Parameters:
- chain_id (str): Chain ID to select.
Returns:
- Protein: A new Protein object containing the selected chain.
Raises:
- ValueError: If the chain ID is not found.
Example:
chain_a = protein.select_chain('A')
select_chains
select_chains(chain_ids: list[str]) -> Protein
Select specific chains from the protein structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
chain_ids | list[str] | List of chain IDs to select. | required |
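Chain selection amounts to filtering atoms by chain ID. As a hypothetical, stdlib-only sketch on raw PDB text (the Protein class itself works on a loaded structure array), it could look like this. The chain ID sits in column 22 of ATOM and HETATM records; other record types such as TER are dropped, and raising ValueError when nothing matches follows the behavior documented for select_chain and is an assumption for select_chains.

```python
def select_chains_from_pdb(pdb_text: str, chain_ids: list[str]) -> str:
    """Keep ATOM/HETATM records whose chain ID (column 22) is in chain_ids."""
    wanted = set(chain_ids)
    kept = [
        line
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM  ", "HETATM")) and line[21:22] in wanted
    ]
    if not kept:
        raise ValueError(f"None of the chain IDs {chain_ids} were found")
    return "\n".join(kept)


def atom_line(serial: int, name: str, resname: str, chain: str, resseq: int) -> str:
    """Build a minimal fixed-column ATOM line (coordinates omitted for brevity)."""
    return f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resseq:4d}"


pdb = "\n".join([
    atom_line(1, "CA", "ALA", "A", 1),
    atom_line(2, "CA", "GLY", "B", 1),
    atom_line(3, "CA", "SER", "C", 1),
])
selected = select_chains_from_pdb(pdb, ["A", "C"])
print([line[21] for line in selected.splitlines()])  # ['A', 'C']
```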
select_structure (staticmethod)
select_structure(structure: ndarray, index: int) -> ndarray
Select a specific structure by index.
show
show(
pockets: Optional[list[Pocket]] = None,
sdf_file: Optional[str] = None,
)
Visualize the protein structure in a Jupyter notebook using the MolStar viewer.
This method provides interactive 3D visualization of the protein structure, with optional highlighting of binding pockets and docked ligands. The visualization is rendered directly in the notebook.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pockets | Optional[list[Pocket]] | List of Pocket objects to highlight. Each pocket is displayed with its defined color and transparency. | None |
sdf_file | Optional[str] | Path to an SDF file containing docked ligand structures. When provided, the ligands are displayed alongside the protein structure. | None |
Examples:
>>> # Visualize the protein structure only
>>> protein = Protein.from_file("protein.pdb")
>>> protein.show()
>>> # Visualize the protein with highlighted pockets
>>> pockets = protein.find_pockets(pocket_count=3)
>>> protein.show(pockets=pockets)
>>> # Visualize the protein with docked ligands
>>> protein.show(sdf_file="docked_ligands.sdf")
>>> # Visualize the protein with both pockets and docked ligands
>>> protein.show(pockets=pockets, sdf_file="docked_ligands.sdf")
Notes
- When pockets are provided, they are displayed with semi-transparent surfaces (alpha=0.7), while the protein is shown with a more transparent surface (alpha=0.1).
- The protein is displayed in cartoon representation when pockets are shown.
- When an SDF file is provided, the visualization includes both the protein and the docked ligands in their binding poses.
to_pdb
to_pdb(file_path: str)
Write the protein structure to a PDB file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | Path where the PDB file will be written. | required |
Example:
protein.to_pdb('/path/to/output.pdb')
update_coordinates
update_coordinates(coords: ndarray)
Update the coordinates of the protein structure.