deeporigin.drug_discovery.Protein
Bases: Entity
A class representing a protein structure with various manipulation and analysis capabilities.
Attributes
sequence
property
sequence: list[str]
Retrieve the amino acid sequences of all polypeptide chains in the protein structure.
This property parses the protein structure file with Bio.PDB and extracts the sequence of every peptide chain present. Each sequence is returned as a Bio.Seq object, which can be converted to a string with str() if needed. The property is useful for analyzing the primary structure of the protein or for downstream sequence-based analyses.
Returns:
Type | Description |
---|---|
list[str] | A list of amino acid sequences (as Bio.Seq objects), one for each polypeptide chain found in the protein structure. If the structure contains multiple chains, each chain's sequence is a separate entry in the list. |
Example:
>>> protein = Protein.from_file("example.pdb")
>>> sequences = protein.sequence
>>> for seq in sequences:
...     print(seq)
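To make the idea of per-chain sequence extraction concrete, here is a minimal pure-Python sketch. It is not the library's implementation (which, per the description above, relies on Bio.PDB); the helper names `atom_line` and `chain_sequences` are hypothetical, and the sketch assumes one sequence per chain ID without splitting at chain breaks.

```python
# One-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def atom_line(serial, atom, resname, chain, resseq):
    """Build a minimal fixed-column PDB ATOM line (hypothetical helper, zero coordinates)."""
    return (
        f"{'ATOM':<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"
        f"    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"
    )


def chain_sequences(pdb_text):
    """Return one sequence string per chain, in order of first appearance."""
    residues = {}  # chain ID -> {residue number: one-letter code}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: residue name 18-20, chain ID 22, residue number 23-26.
        resname = line[17:20].strip()
        chain = line[21]
        resseq = int(line[22:26])
        residues.setdefault(chain, {})[resseq] = THREE_TO_ONE.get(resname, "X")
    return [
        "".join(code for _, code in sorted(by_number.items()))
        for by_number in residues.values()
    ]


pdb_text = "\n".join([
    atom_line(1, "CA", "MET", "A", 1),
    atom_line(2, "CA", "GLY", "A", 2),
    atom_line(3, "CA", "LYS", "A", 3),
    atom_line(4, "CA", "SER", "B", 1),
    atom_line(5, "CA", "TRP", "B", 2),
])
sequences = chain_sequences(pdb_text)
print(sequences)  # ['MGK', 'SW']
```

Unknown residue names map to "X" here; that choice is an assumption of the sketch, not documented behavior of the property.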
Functions
dock
dock(*, ligand: Ligand, pocket: Pocket) -> str
Dock a ligand into a specific pocket of the protein.
This method docks a ligand into a specified pocket of the protein structure. It uses Deep Origin docking to generate a 3D structure of the docked ligand.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ligand | Ligand | The ligand to dock into the protein pocket. | required |
pocket | Pocket | The specific pocket in the protein where the ligand should be docked. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | Path to the SDF file containing the docked ligand structure. |
extract_metals_and_cofactors
extract_metals_and_cofactors() -> (
Tuple[list[str], list[str]]
)
Extract metal ions and cofactors from the protein structure.
Returns:
Type | Description |
---|---|
Tuple[list[str], list[str]] | Two lists of names, one for metal ions and one for cofactors. |
find_missing_residues
find_missing_residues() -> dict[str, list[tuple[int, int]]]
Find missing residues in the protein structure.
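The return type, dict[str, list[tuple[int, int]]], suggests a mapping from chain ID to ranges of residue numbers. The sketch below shows one way such gaps could be computed from observed residue numbers. The function name `find_gaps` is hypothetical, and the (first_missing, last_missing) convention for each tuple is an assumption; the library may report gaps differently, for example as the flanking residues.

```python
def find_gaps(observed):
    """Map each chain ID to a list of (first_missing, last_missing) ranges.

    `observed` maps a chain ID to the residue numbers present in that chain.
    Chains without gaps are left out of the result (an assumption of this sketch).
    """
    gaps = {}
    for chain, numbers in observed.items():
        ordered = sorted(set(numbers))
        chain_gaps = [
            (prev + 1, nxt - 1)
            for prev, nxt in zip(ordered, ordered[1:])
            if nxt - prev > 1  # a jump larger than 1 means residues are missing
        ]
        if chain_gaps:
            gaps[chain] = chain_gaps
    return gaps


observed = {"A": [1, 2, 3, 7, 8, 12], "B": [5, 6, 7]}
gaps = find_gaps(observed)
print(gaps)  # {'A': [(4, 6), (9, 11)]}
```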
find_pockets
find_pockets(
pocket_count: int = 1, pocket_min_size: int = 30
) -> list[Pocket]
Find potential binding pockets in the protein structure.
This method analyzes the protein structure to identify cavities or pockets that could potentially serve as binding sites for ligands. It uses the Deep Origin pocket finding algorithm to detect and characterize these pockets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pocket_count | int | Maximum number of pockets to identify. Defaults to 1. | 1 |
pocket_min_size | int | Minimum size of pockets to consider, measured in cubic Angstroms. Defaults to 30. | 30 |
Returns:
Type | Description |
---|---|
list[Pocket] | A list of Pocket objects, each representing a potential binding site in the protein. Each Pocket object contains the 3D structure of the pocket; properties such as volume, surface area, and hydrophobicity; and visualization parameters such as color. |
Examples:
>>> protein = Protein(file="protein.pdb")
>>> pockets = protein.find_pockets(pocket_count=3, pocket_min_size=50)
>>> for pocket in pockets:
...     print(f"Pocket: {pocket.name}, Volume: {pocket.properties.get('volume')} ų")
from_file
classmethod
from_file(file_path: str, struct_ind: int = 0) -> Protein
Create a Protein instance from a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | Path to the protein PDB file. | required |
struct_ind | int | Index of the structure to select if multiple are present. | 0 |
Returns:
Name | Type | Description |
---|---|---|
Protein | Protein | A new Protein instance. |
Raises:
Type | Description |
---|---|
FileNotFoundError | If the file does not exist. |
ValueError | If the structure cannot be loaded. |
RuntimeError | If the file cannot be read or processed. |
from_pdb_id
classmethod
from_pdb_id(pdb_id: str, struct_ind: int = 0) -> Protein
Create a Protein instance from a PDB ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pdb_id | str | PDB ID of the protein to download. | required |
struct_ind | int | Index of the structure to select if multiple are present. | 0 |
Returns:
Name | Type | Description |
---|---|---|
Protein | Protein | A new Protein instance. |
Raises:
Type | Description |
---|---|
ValueError | If the PDB ID is invalid or the structure cannot be loaded. |
RuntimeError | If the download fails. |
get_center_by_residues
get_center_by_residues(residues: list[str]) -> ndarray
Get the center of the protein structure based on specific residues.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
residues | list[str] | List of residue names to include in the calculation. | required |
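As an illustration of what a residue-based center can mean, the sketch below averages the coordinates of all atoms whose residue label is in the requested list. The function name `center_by_residues` and the (label, xyz) input layout are hypothetical, the unweighted mean is an assumption (the library might weight atoms differently), and a plain tuple is returned instead of the ndarray that the documented method returns.

```python
def center_by_residues(atoms, residues):
    """Return the unweighted mean position of atoms in the selected residues.

    `atoms` is a list of (residue_label, (x, y, z)) pairs.
    """
    wanted = set(residues)
    coords = [xyz for label, xyz in atoms if label in wanted]
    if not coords:
        raise ValueError("no atoms matched the requested residues")
    n = len(coords)
    # zip(*coords) groups all x values, all y values, and all z values.
    return tuple(sum(axis) / n for axis in zip(*coords))


atoms = [
    ("HIS", (0.0, 0.0, 0.0)),
    ("HIS", (2.0, 0.0, 0.0)),
    ("ASP", (1.0, 3.0, 0.0)),
    ("GLY", (9.0, 9.0, 9.0)),  # not selected, so it does not shift the center
]
center = center_by_residues(atoms, ["HIS", "ASP"])
print(center)  # (1.0, 1.0, 0.0)
```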
list_chain_names
list_chain_names() -> list[str]
List all unique chain IDs in the protein structure.
Returns:
Type | Description |
---|---|
list[str] | A list of unique chain IDs. |
list_hetero_names
list_hetero_names(exclude_water=True) -> list[str]
List all unique hetero residue names in the protein structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exclude_water | bool | Whether to exclude water molecules from the list. | True |
Returns:
Type | Description |
---|---|
list[str] | A list of unique hetero residue names, with water excluded when exclude_water is True. |
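The effect of exclude_water can be sketched on already-parsed records. The function name `hetero_names`, the (record_type, residue_name) input layout, and the set of water names are all assumptions made for illustration; the library's own definition of water residues may differ.

```python
# Residue names treated as water in this sketch (an assumption).
WATER_NAMES = {"HOH", "WAT"}


def hetero_names(records, exclude_water=True):
    """Return the sorted unique residue names of HETATM records."""
    names = {resname for record, resname in records if record == "HETATM"}
    if exclude_water:
        names -= WATER_NAMES
    return sorted(names)


records = [
    ("ATOM", "MET"),
    ("HETATM", "HOH"),
    ("HETATM", "ZN"),
    ("HETATM", "NAG"),
    ("HETATM", "ZN"),  # duplicates collapse into one entry
]
print(hetero_names(records))                       # ['NAG', 'ZN']
print(hetero_names(records, exclude_water=False))  # ['HOH', 'NAG', 'ZN']
```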
load_structure_from_block
staticmethod
load_structure_from_block(
block_content: str, block_type: str
) -> ndarray
Load a protein structure from block content.
remove_hetatm
remove_hetatm(
keep_resnames: Optional[list[str]] = None,
remove_metals: Optional[list[str]] = None,
) -> None
Remove HETATM records from the protein structure, with options to retain specified residues or exclude certain metals.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keep_resnames | Optional[list[str]] | A list of residue names (strings) to keep in the structure even if they are HETATM records. | None |
remove_metals | Optional[list[str]] | A list of metal names (strings) to exclude from removal. These metals will be retained in the structure. | None |
Notes:
- By default, a predefined list of metals is considered for removal unless specified in remove_metals.
- If keep_resnames is provided, those residues (along with any metals not excluded) are retained even if they are HETATM records.
- The method updates the current protein object in place.
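The keep_resnames behavior described in the notes above can be sketched as a filter over PDB lines. This is a simplified illustration, not the library's code: the names `pdb_line` and `strip_hetatm` are hypothetical, metal handling is left out entirely, and the sketch returns a new list whereas the documented method updates the protein in place.

```python
def pdb_line(record, serial, atom, resname, chain, resseq):
    """Build the leading fixed-width columns of a PDB record (hypothetical helper)."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


def strip_hetatm(pdb_lines, keep_resnames=None):
    """Drop HETATM lines unless their residue name is listed in keep_resnames."""
    keep = set(keep_resnames or [])
    return [
        line for line in pdb_lines
        # The residue name occupies columns 18-20 of a PDB record.
        if not line.startswith("HETATM") or line[17:20].strip() in keep
    ]


lines = [
    pdb_line("ATOM", 1, "CA", "MET", "A", 1),
    pdb_line("HETATM", 2, "O", "HOH", "A", 101),
    pdb_line("HETATM", 3, "ZN", "ZN", "A", 201),
    pdb_line("HETATM", 4, "C1", "NAG", "A", 301),
]
kept = strip_hetatm(lines, keep_resnames=["NAG"])
print([line[17:20].strip() for line in kept])  # ['MET', 'NAG']
```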
remove_resnames
remove_resnames(
exclude_resnames: Optional[list[str]] = None,
) -> None
Remove specific residue names from the protein structure in place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exclude_resnames | Optional[list[str]] | List of residue names to exclude. | None |
select_chain
select_chain(chain_id: str) -> Optional[Protein]
Select a specific chain by its ID and return a new Protein object.
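Parameters:
Name | Type | Description | Default |
---|---|---|---|
chain_id | str | Chain ID to select. | required |
Returns:
Type | Description |
---|---|
Protein | A new Protein object containing the selected chain. |
Raises:
Type | Description |
---|---|
ValueError | If the chain ID is not found. |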
select_chains
select_chains(chain_ids: list[str]) -> Protein
Select specific chains from the protein structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
chain_ids | list[str] | List of chain IDs to select. | required |
select_structure
staticmethod
select_structure(structure: ndarray, index: int) -> ndarray
Select a specific structure by index.
show
show(
pockets: Optional[list[Pocket]] = None,
sdf_file: Optional[str] = None,
)
Visualize the protein structure in a Jupyter notebook using the MolStar viewer.
This method provides interactive 3D visualization of the protein structure with optional highlighting of binding pockets and docked ligands. The visualization is rendered directly in Jupyter notebooks using the MolStar viewer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pockets | Optional[list[Pocket]] | List of Pocket objects to highlight in the visualization. Each pocket will be displayed with its defined color and transparency. Defaults to None. | None |
sdf_file | Optional[str] | Path to an SDF file containing docked ligand structures. When provided, the ligands will be displayed alongside the protein structure. Defaults to None. | None |
Notes:
- When pockets are provided, they are displayed with semi-transparent surfaces (alpha=0.7) while the protein is shown with a more transparent surface (alpha=0.1).
- The protein is displayed in cartoon representation when pockets are shown.
- When an SDF file is provided, the visualization includes both the protein and the docked ligands in their respective binding poses.
to_pdb
to_pdb(file_path: str)
Write the protein structure to a PDB file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | Path where the PDB file will be written. | required |
update_coordinates
update_coordinates(coords: ndarray)
Update the coordinates of the protein structure.