deeporigin.drug_discovery.Protein

Bases: Entity

A class representing a protein structure with various manipulation and analysis capabilities.

Attributes

atom_types class-attribute instance-attribute

atom_types: Optional[ndarray] = None

block_content class-attribute instance-attribute

block_content: Optional[str] = None

block_type class-attribute instance-attribute

block_type: str = 'pdb'

coordinates property

coordinates

file_path class-attribute instance-attribute

file_path: Optional[Path] = None

info class-attribute instance-attribute

info: Optional[dict] = None

name instance-attribute

name: str

pdb_id class-attribute instance-attribute

pdb_id: Optional[str] = None

sequence property

sequence: list[str]

Retrieve the amino acid sequences of all polypeptide chains in the protein structure.

This property parses the protein structure file with Bio.PDB and extracts the sequence of every peptide chain it finds. Although the annotation is list[str], each entry is described as a Bio.Seq object; call str() on it when a plain string is needed. Use the result to inspect the primary structure or to feed downstream sequence-based analyses.

Returns:

Type Description
list[str]

A list with one amino acid sequence (a Bio.Seq object) per polypeptide chain found in the protein structure. A structure with multiple chains yields one entry per chain.

Example

>>> protein = Protein.from_file("example.pdb")
>>> sequences = protein.sequence
>>> for seq in sequences:
...     print(seq)
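
To make concrete what a per-chain sequence is, the sketch below derives one-letter sequences from the ATOM records of PDB text using only the standard library. It is an illustration, not the library's implementation (which, per the description above, relies on Bio.PDB). The helper name `chain_sequences` is hypothetical, and non-standard residues are mapped to "X" as a simplifying assumption.

```python
# Hypothetical helper, not part of deeporigin: per-chain sequences from PDB text.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text: str) -> dict[str, str]:
    """Return {chain_id: one-letter sequence} built from ATOM records."""
    seen = set()
    residues: dict[str, list[str]] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK, and other records
        resname = line[17:20].strip()  # PDB columns 18-20
        chain = line[21]               # PDB column 22
        resid = line[22:27]            # residue number plus insertion code
        if (chain, resid) in seen:
            continue  # count each residue once, not once per atom
        seen.add((chain, resid))
        # Assumption: anything outside the 20 standard residues becomes "X".
        residues.setdefault(chain, []).append(THREE_TO_ONE.get(resname, "X"))
    return {chain: "".join(seq) for chain, seq in residues.items()}
```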

structure class-attribute instance-attribute

structure: ndarray = field(repr=False)

Functions

dock

dock(*, ligand: Ligand, pocket: Pocket) -> str

Dock a ligand into a specific pocket of the protein.

This method docks a ligand into the specified pocket of the protein structure. It uses Deep Origin docking to generate a 3D structure of the docked ligand.

Parameters:

Name Type Description Default
ligand Ligand

The ligand to dock into the protein pocket.

required
pocket Pocket

The specific pocket in the protein where the ligand should be docked.

required

Returns:

Name Type Description
str str

Path to the SDF file containing the docked ligand structure.
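
Since dock needs a Pocket, a common pattern is to chain it with find_pockets. The sketch below does that using only the two signatures documented on this page. The helper name `dock_into_first_pocket` is hypothetical, and taking the first returned pocket is an assumption made for illustration, not a recommendation about pocket ranking.

```python
# Hypothetical convenience wrapper around the documented find_pockets and dock.
def dock_into_first_pocket(protein, ligand, pocket_min_size: int = 30) -> str:
    """Find one pocket on `protein` and dock `ligand` into it.

    Returns whatever `protein.dock` returns, documented above as the path
    to an SDF file.
    """
    pockets = protein.find_pockets(pocket_count=1, pocket_min_size=pocket_min_size)
    if not pockets:
        raise ValueError("no pocket found; try a smaller pocket_min_size")
    # dock takes keyword-only arguments, matching the signature above.
    return protein.dock(ligand=ligand, pocket=pockets[0])
```

The returned path can then be passed to show(sdf_file=...) to inspect the pose.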

extract_metals_and_cofactors

extract_metals_and_cofactors() -> (
    Tuple[list[str], list[str]]
)

Extract metal ions and cofactors from the protein structure.

Returns:

Type Description
Tuple[list[str], list[str]]

Tuple[list[str], list[str]]

find_missing_residues

find_missing_residues() -> dict[str, list[tuple[int, int]]]

Find missing residues in the protein structure.
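
The return type suggests a mapping from chain ID to a list of integer pairs, but this page does not say what each pair means. The sketch below assumes each pair is an inclusive (first, last) range of missing residue numbers and detects such ranges from gaps in residue numbering. The helper name `numbering_gaps` is hypothetical, and numbering gaps are only a heuristic; the library may use a different method.

```python
# Hypothetical helper: assumes tuples are inclusive (first, last) missing ranges.
def numbering_gaps(
    residue_numbers: dict[str, list[int]],
) -> dict[str, list[tuple[int, int]]]:
    """Map each chain ID to the gaps in its observed residue numbering."""
    gaps: dict[str, list[tuple[int, int]]] = {}
    for chain, numbers in residue_numbers.items():
        ordered = sorted(set(numbers))
        chain_gaps = [
            (prev + 1, nxt - 1)
            for prev, nxt in zip(ordered, ordered[1:])
            if nxt - prev > 1  # consecutive numbers leave no gap
        ]
        if chain_gaps:
            gaps[chain] = chain_gaps
    return gaps
```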

find_pockets

find_pockets(
    pocket_count: int = 1, pocket_min_size: int = 30
) -> list[Pocket]

Find potential binding pockets in the protein structure.

This method analyzes the protein structure to identify cavities or pockets that could potentially serve as binding sites for ligands. It uses the Deep Origin pocket finding algorithm to detect and characterize these pockets.

Parameters:

Name Type Description Default
pocket_count int

Maximum number of pockets to identify. Defaults to 1.

1
pocket_min_size int

Minimum size of pockets to consider, measured in cubic Angstroms. Defaults to 30.

30

Returns:

Type Description
list[Pocket]

A list of Pocket objects, each representing a potential binding site in the protein. Each Pocket object contains:

  • The 3D structure of the pocket
  • Properties such as volume, surface area, and hydrophobicity
  • Visualization parameters (such as color)

Examples:

>>> protein = Protein.from_file("protein.pdb")
>>> pockets = protein.find_pockets(pocket_count=3, pocket_min_size=50)
>>> for pocket in pockets:
...     print(f"Pocket: {pocket.name}, Volume: {pocket.properties.get('volume')} ų")

from_file classmethod

from_file(file_path: str, struct_ind: int = 0) -> Protein

Create a Protein instance from a file.

Parameters:

Name Type Description Default
file_path str

Path to the protein PDB file.

required
struct_ind int

Index of the structure to select if multiple are present.

0

Returns:

Name Type Description
Protein Protein

A new Protein instance.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

ValueError

If the structure cannot be loaded.

RuntimeError

If the file cannot be read or processed.

from_pdb_id classmethod

from_pdb_id(pdb_id: str, struct_ind: int = 0) -> Protein

Create a Protein instance from a PDB ID.

Parameters:

Name Type Description Default
pdb_id str

PDB ID of the protein to download.

required
struct_ind int

Index of the structure to select if multiple are present.

0

Returns:

Name Type Description
Protein Protein

A new Protein instance.

Raises:

Type Description
ValueError

If the PDB ID is invalid or the structure cannot be loaded.

RuntimeError

If the download fails.

get_center_by_residues

get_center_by_residues(residues: list[str]) -> ndarray

Get the center of the protein structure based on specific residues.

Parameters:

Name Type Description Default
residues list[str]

List of residue names to include in the calculation.

required
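
This page does not define "center". The sketch below assumes it is the unweighted mean of the coordinates of all atoms belonging to the named residues. The helper name `center_of_residues` and its (residue_name, (x, y, z)) input format are hypothetical and chosen to keep the example free of dependencies; the real method returns an ndarray.

```python
# Hypothetical helper: unweighted centroid of atoms in the selected residues.
def center_of_residues(atoms, residues):
    """Return the (x, y, z) centroid of atoms whose residue name is in `residues`.

    `atoms` is an iterable of (residue_name, (x, y, z)) pairs.
    """
    wanted = set(residues)
    selected = [xyz for name, xyz in atoms if name in wanted]
    if not selected:
        raise ValueError("no atoms match the requested residues")
    n = len(selected)
    # zip(*selected) regroups the points into x, y, and z columns.
    return tuple(sum(axis) / n for axis in zip(*selected))
```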

list_chain_names

list_chain_names() -> list[str]

List all unique chain IDs in the protein structure.

Returns:

Type Description
list[str]

list[str]: A list of unique chain IDs.

list_hetero_names

list_hetero_names(exclude_water=True) -> list[str]

List all unique hetero residue names in the protein structure.

Parameters:

Name Type Description Default
exclude_water bool

Whether to exclude water molecules from the list.

True

Returns:

Type Description
list[str]

list[str]: A list of unique hetero residue names. Water is omitted when exclude_water is True.

load_structure_from_block staticmethod

load_structure_from_block(
    block_content: str, block_type: str
) -> ndarray

Load a protein structure from block content.

model_loops

model_loops() -> None

Model loops in the protein structure.

remove_hetatm

remove_hetatm(
    keep_resnames: Optional[list[str]] = None,
    remove_metals: Optional[list[str]] = None,
) -> None

Remove HETATM records from the protein structure, with options to retain specified residues or exclude certain metals.

Parameters:

Name Type Description Default
keep_resnames Optional[list[str]]

A list of residue names (strings) to keep in the structure even if they are HETATM records.

None
remove_metals Optional[list[str]]

A list of metal names (strings) to exclude from removal. These metals will be retained in the structure.

None

Notes:

  • By default, a predefined list of metals is considered for removal unless specified in remove_metals.
  • If keep_resnames is provided, those residues (along with any metals not excluded) will be retained even if they are HETATM records.
  • The method updates the current protein object in place.
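
The keep_resnames behavior can be illustrated on raw PDB text. The sketch below drops HETATM lines unless their residue name is on a keep list. The helper name `strip_hetatm` is hypothetical; it returns new text instead of updating an object in place, and it deliberately leaves out the metal handling, whose exact rules are not spelled out here.

```python
# Hypothetical helper: filters HETATM records by residue name; no metal logic.
def strip_hetatm(pdb_text: str, keep_resnames=None) -> str:
    """Return `pdb_text` without HETATM lines, except residues in `keep_resnames`."""
    keep = {name.upper() for name in (keep_resnames or [])}
    kept_lines = []
    for line in pdb_text.splitlines():
        # The residue name occupies PDB columns 18-20 (slice 17:20).
        if line.startswith("HETATM") and line[17:20].strip().upper() not in keep:
            continue
        kept_lines.append(line)
    return "\n".join(kept_lines)
```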

remove_resnames

remove_resnames(
    exclude_resnames: Optional[list[str]] = None,
) -> None

Remove specific residue names from the protein structure in place.

Parameters:

Name Type Description Default
exclude_resnames Optional[list[str]]

List of residue names to exclude.

None

remove_water

remove_water() -> None

Remove water molecules from the protein structure in place.

select_chain

select_chain(chain_id: str) -> Optional[Protein]

Select a specific chain by its ID and return a new Protein object.

Parameters:

Name Type Description Default
chain_id str

Chain ID to select.

required

Returns:

Type Description
Protein

A new Protein object containing the selected chain.

Raises:

Type Description
ValueError

If the chain ID is not found.

select_chains

select_chains(chain_ids: list[str]) -> Protein

Select specific chains from the protein structure.

Parameters:

Name Type Description Default
chain_ids list[str]

List of chain IDs to select.

required

select_structure staticmethod

select_structure(structure: ndarray, index: int) -> ndarray

Select a specific structure by index.

show

show(
    pockets: Optional[list[Pocket]] = None,
    sdf_file: Optional[str] = None,
)

Visualize the protein structure in a Jupyter notebook using MolStar viewer.

This method provides interactive 3D visualization of the protein structure with optional highlighting of binding pockets and docked ligands. The visualization is rendered directly in Jupyter notebooks using the MolStar viewer.

Parameters:

Name Type Description Default
pockets Optional[list[Pocket]]

List of Pocket objects to highlight in the visualization. Each pocket will be displayed with its defined color and transparency. Defaults to None.

None
sdf_file Optional[str]

Path to an SDF file containing docked ligand structures. When provided, the ligands will be displayed alongside the protein structure. Defaults to None.

None

Notes:

  • When pockets are provided, they are displayed with semi-transparent surfaces (alpha=0.7) while the protein is shown with a more transparent surface (alpha=0.1)
  • The protein is displayed in cartoon representation when pockets are shown
  • When an SDF file is provided, the visualization includes both the protein and the docked ligands in their respective binding poses

to_pdb

to_pdb(file_path: str)

Write the protein structure to a PDB file.

Parameters:

Name Type Description Default
file_path str

Path where the PDB file will be written.

required
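
As an example of combining to_pdb with the chain helpers documented above, the sketch below writes each chain to its own file. It calls only list_chain_names, select_chain, and to_pdb with the signatures shown on this page. The helper name `write_chains` and the `chain_<id>.pdb` naming scheme are hypothetical, and the output directory is assumed to exist already.

```python
from pathlib import Path


# Hypothetical helper built on the documented chain and export methods.
def write_chains(protein, out_dir: str) -> list[str]:
    """Write every chain of `protein` to `out_dir` and return the file paths."""
    written = []
    for chain_id in protein.list_chain_names():
        chain = protein.select_chain(chain_id)
        if chain is None:  # the signature is Optional[Protein]
            continue
        path = str(Path(out_dir) / f"chain_{chain_id}.pdb")
        chain.to_pdb(path)  # to_pdb takes the destination path as a str
        written.append(path)
    return written
```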

update_coordinates

update_coordinates(coords: ndarray)

Update the coordinates of the protein structure.

upload

upload()

Upload the entity to the remote server.