deeporigin.drug_discovery.Protein

Pocket finding (find_pockets removed)

Protein.find_pockets() was deprecated and has been removed. Use the PocketFinder class instead (see Find pockets).

Bases: Entity

A class representing a protein structure with various manipulation and analysis capabilities.

Attributes

atom_types class-attribute instance-attribute

atom_types: Optional[ndarray] = None

block_content class-attribute instance-attribute

block_content: Optional[str] = None

block_type class-attribute instance-attribute

block_type: str = 'pdb'

coordinates property

coordinates

id class-attribute instance-attribute

id: str | None = field(default=None, kw_only=True)

info class-attribute instance-attribute

info: Optional[dict] = None

length property

length: int

Get the length of the protein structure.

local_path class-attribute instance-attribute

local_path: str | None = field(default=None, kw_only=True)

name instance-attribute

name: str

num_atoms property

num_atoms: int

Count the number of atoms in the PDB file for this protein.

Returns:

Name Type Description
int int

The number of atoms in the PDB file.
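As a rough illustration of what an atom count over a PDB file involves, the sketch below counts coordinate records in PDB text using only the standard library. It assumes that both ATOM and HETATM records are counted, which the description above does not confirm, and `count_atoms` is a hypothetical helper, not part of deeporigin.

```python
def count_atoms(pdb_text: str) -> int:
    """Count coordinate records (ATOM and HETATM) in PDB-formatted text."""
    return sum(
        1
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    )


# Small hand-written PDB fragment: two protein atoms and one water oxygen.
pdb_text = "\n".join([
    "HEADER    EXAMPLE",
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147",
    "HETATM    3  O   HOH A 101       1.000   2.000   3.000",
    "END",
])
n_atoms = count_atoms(pdb_text)
```

Non-coordinate records such as HEADER and END are ignored by the prefix check.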

pdb_id class-attribute instance-attribute

pdb_id: Optional[str] = None

project_id class-attribute instance-attribute

project_id: str | None = field(default=None, kw_only=True)

remote_path class-attribute instance-attribute

remote_path: str | None = field(default=None, kw_only=True)

sequence property

sequence: list[Seq]

Retrieve the amino acid sequences of all polypeptide chains in the protein structure.

This property parses the protein structure file using Bio.PDB and extracts the sequences of all peptide chains present. Each sequence is returned as a Bio.Seq object, which can be converted to a string if needed. It is useful for analyzing the primary structure of the protein or for downstream sequence-based analyses.

Returns:

Type Description
list[Seq]

A list of amino acid sequences (as Bio.Seq objects), one for each polypeptide chain found in the protein structure. If the structure contains multiple chains, each chain's sequence is a separate entry in the list.

Example

protein = Protein.from_file("example.pdb")
sequences = protein.sequence
for seq in sequences:
    print(seq)

structure class-attribute instance-attribute

structure: Any | None = field(default=None, repr=False)

Functions

download

download(
    *,
    lazy: bool = True,
    client: DeepOriginClient | None = None
) -> str

Download the remote structure file and load :attr:structure from it.

If :attr:structure is already loaded (e.g. from :meth:from_pdb_id or :meth:from_file), returns :attr:local_path when set without hitting the files API.

Otherwise delegates to :meth:Entity.download, then parses the returned path with :meth:from_file when :attr:structure is still None.

Parameters:

Name Type Description Default
client DeepOriginClient | None

DeepOriginClient instance. If None, uses the default client.

None

Returns:

Type Description
str

Local file path returned by the files client, or an existing on-disk path when the structure was already loaded from a local file.

ensure_remote_path

ensure_remote_path(
    *, client: DeepOriginClient, label: str
) -> None

Ensure :attr:remote_path is set after a lazy :meth:sync may have no-oped.

If the entity already has a platform id but remote_path was never populated (e.g. rehydrated metadata only), sync(lazy=True) returns early. This performs a full sync when needed, then raises if the path is still missing.

Parameters:

Name Type Description Default
client DeepOriginClient

Authenticated client for sync/upload.

required
label str

Human-readable name for error messages (e.g. "Protein").

required

Raises:

Type Description
ValueError

If remote_path cannot be determined after sync.
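The control flow described above (a lazy sync that may return early, followed by a full sync, followed by an error if the path is still missing) can be sketched with a stub object. Everything here is hypothetical: `StubEntity`, its `sync` behaviour, and the placeholder path are assumptions made for illustration and are not the deeporigin implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StubEntity:
    """Hypothetical stand-in for an entity; not the deeporigin Entity class."""

    id: Optional[str] = None
    remote_path: Optional[str] = None
    sync_calls: list = field(default_factory=list)  # records each lazy flag passed

    def sync(self, *, lazy: bool = False) -> None:
        self.sync_calls.append(lazy)
        if lazy and self.id is not None:
            return  # lazy sync returns early when an id already exists
        # A full sync populates both fields (placeholder values only).
        self.id = self.id or "stub-id"
        self.remote_path = "stub/remote/path.pdb"


def ensure_remote_path(entity: StubEntity, label: str) -> None:
    """Make sure remote_path is set, escalating to a full sync if needed."""
    if entity.remote_path is None:
        entity.sync(lazy=True)
    if entity.remote_path is None:
        entity.sync(lazy=False)
    if entity.remote_path is None:
        raise ValueError(f"{label} has no remote_path after sync")


# An entity rehydrated from metadata only: it has an id but no remote_path.
rehydrated = StubEntity(id="abc123")
ensure_remote_path(rehydrated, label="Protein")
```

In this sketch the lazy call is a no-op because the id is already present, so the second, full sync is what fills in `remote_path`.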

extract_ligand

extract_ligand(
    exclude_resnames: Optional[set[str]] = None,
) -> Ligand

Extracts ligand(s) from a Protein object and removes them from the protein structure. This method mutates the protein object by removing ligand records.

Parameters:

Name Type Description Default
exclude_resnames Optional[set[str]]

Residue names to exclude (e.g., water).

None

Returns:

Name Type Description
Ligand Ligand

The extracted ligand molecule.

extract_metals_and_cofactors

extract_metals_and_cofactors() -> (
    tuple[list[str], list[str]]
)

Extract metal ions and cofactors from the protein structure.

Returns:

Type Description
tuple[list[str], list[str]]

Tuple[list[str], list[str]]

find_missing_residues

find_missing_residues() -> dict[str, list[tuple[int, int]]]

Find missing residues in the protein structure.
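The return type suggests a mapping from chain ID to a list of integer pairs. One plausible reading, sketched below with the standard library, is that each pair is an inclusive (first_missing, last_missing) range found from gaps in residue numbering. That interpretation is an assumption, as is the gap-based detection itself; the library may instead rely on SEQRES or REMARK 465 records. `numbering_gaps` is a hypothetical helper.

```python
def numbering_gaps(pdb_text: str) -> dict[str, list[tuple[int, int]]]:
    """Report gaps in residue numbering per chain as inclusive ranges."""
    residues: dict[str, set[int]] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 26:
            chain = line[21]            # chain ID column
            resseq = int(line[22:26])   # residue sequence number columns
            residues.setdefault(chain, set()).add(resseq)

    gaps: dict[str, list[tuple[int, int]]] = {}
    for chain, numbers in residues.items():
        ordered = sorted(numbers)
        chain_gaps = [
            (prev + 1, nxt - 1)
            for prev, nxt in zip(ordered, ordered[1:])
            if nxt - prev > 1
        ]
        if chain_gaps:
            gaps[chain] = chain_gaps
    return gaps


# Chain A has residues 1, 2 and 5, so residues 3-4 are absent; chain B is complete.
pdb_text = "\n".join([
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000",
    "ATOM      2  CA  GLY A   2       3.800   0.000   0.000",
    "ATOM      3  CA  SER A   5      15.200   0.000   0.000",
    "ATOM      4  CA  LEU B   1       0.000   9.000   0.000",
])
missing = numbering_gaps(pdb_text)
```

Chains without gaps are left out of the result in this sketch; whether the library does the same is not stated in the source.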

from_base64 classmethod

from_base64(
    base64_string: str, name: str = "", **kwargs: Any
) -> Self

Create a Protein instance from a base64 encoded PDB string.

Parameters:

Name Type Description Default
base64_string str

Base64 encoded PDB content

required
name str

Name of the protein. Defaults to "".

''
**kwargs Any

Additional arguments to pass to the constructor

{}

Returns:

Name Type Description
Protein Self

A new Protein instance

Raises:

Type Description
DeepOriginException

If the base64 string cannot be decoded or parsed
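The encoding that from_base64 and to_base64 rely on can be illustrated with a standard-library round trip. This is a minimal sketch, not the library's code: `encode_pdb` and `decode_pdb` are hypothetical helpers, UTF-8 is an assumed text encoding, and `ValueError` stands in for the DeepOriginException that the real method is documented to raise.

```python
import base64
import binascii


def encode_pdb(pdb_text: str) -> str:
    """Encode PDB text as a base64 ASCII string."""
    return base64.b64encode(pdb_text.encode("utf-8")).decode("ascii")


def decode_pdb(base64_string: str) -> str:
    """Decode a base64 string back to PDB text, rejecting malformed input."""
    try:
        return base64.b64decode(base64_string, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError("input is not valid base64-encoded text") from exc


pdb_text = "ATOM      1  N   ALA A   1      11.104   6.134  -6.504\nEND\n"
encoded = encode_pdb(pdb_text)
decoded = decode_pdb(encoded)
```

Passing `validate=True` makes the decoder reject characters outside the base64 alphabet instead of silently discarding them.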

from_file classmethod

from_file(
    file_path: str | Path,
    struct_ind: int = 0,
    *,
    validate: bool = True
) -> Self

Create a Protein instance from a file.

Parameters:

Name Type Description Default
file_path str | Path

Path to the protein PDB or CIF file.

required
struct_ind int

Index of the structure to select if multiple are present.

0

Returns:

Name Type Description
Protein Self

A new Protein instance.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

ValueError

If the file type is unsupported or the structure cannot be loaded.

RuntimeError

If the file cannot be read or processed.

from_id classmethod

from_id(
    id: str,
    *,
    client: Optional[DeepOriginClient] = None,
    download: bool = True,
    remote_path_override: Optional[str] = None
) -> Self

Create a Protein instance from a Deep Origin Data Platform ID.

Parameters:

Name Type Description Default
id str

The Deep Origin Data Platform ID of the protein.

required
client Optional[DeepOriginClient]

Optional DeepOriginClient instance. If not provided, uses the default client.

None
download bool

If True (default), download the structure file and load coordinates. If False, fetch metadata only and set :attr:remote_path to the platform file path (remote_path_override or the record's file_path) without downloading; :attr:structure stays None until :meth:download or :meth:load_structure_from_local.

True
remote_path_override Optional[str]

When download is False, use this as remote_path instead of the API record's file_path (e.g. the path stored on the execution userInputs).

None

Returns:

Name Type Description
Protein Self

A new Protein instance.

Raises:

Type Description
ValueError

If the protein data does not contain a file_path.

RuntimeError

If the file cannot be downloaded or loaded.

from_name classmethod

from_name(name: str) -> Self

Create a Protein instance from a name.

from_pdb_id classmethod

from_pdb_id(pdb_id: str, struct_ind: int = 0) -> Self

Create a Protein instance from a PDB ID.

Parameters:

Name Type Description Default
pdb_id str

PDB ID of the protein to download.

required
struct_ind int

Index of the structure to select if multiple are present.

0

Returns:

Name Type Description
Protein Self

A new Protein instance.

Raises:

Type Description
ValueError

If the PDB ID is invalid or the structure cannot be loaded.

RuntimeError

If the download fails.

from_remote_file classmethod

from_remote_file(
    remote_path: str,
    *,
    client: DeepOriginClient | None = None,
    lazy: bool = True,
    struct_ind: int = 0,
    validate: bool = True
) -> Self

Create a Protein from a structure file stored on the platform.

Downloads the file via :meth:deeporigin.platform.files.FilesClient.download, then loads it with :meth:from_file. Supported formats are PDB, PDBQT, and mmCIF.

Parameters:

Name Type Description Default
remote_path str

Platform file path (e.g. org storage path) to the structure file.

required
client DeepOriginClient | None

DeepOrigin client used for download. If None, uses DeepOriginClient().

None
lazy bool

Passed to files.download; if True, skip download when the file already exists locally at the default cache location.

True
struct_ind int

Index of the structure to select if multiple are present (see :meth:from_file).

0
validate bool

Whether to validate PDB files when reading (see :meth:from_file).

True

Returns:

Name Type Description
Protein Self

A protein with :attr:~deeporigin.drug_discovery.structures.entity.Entity.remote_path set to remote_path and :attr:~deeporigin.drug_discovery.structures.entity.Entity.local_path set to the downloaded file path.

list_chain_names

list_chain_names() -> list[str]

List all unique chain IDs in the protein structure.

Returns:

Type Description
list[str]

list[str]: A list of unique chain IDs.
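For readers unfamiliar with where chain IDs live in a PDB file, the sketch below collects them from the chain ID column of coordinate records. It is a standard-library illustration under the assumption that order of first appearance is preserved; `chain_ids` is a hypothetical helper and the library may order or gather chains differently.

```python
def chain_ids(pdb_text: str) -> list[str]:
    """Collect unique chain IDs in order of first appearance."""
    seen: list[str] = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]  # chain ID is a single character in this column
            if chain not in seen:
                seen.append(chain)
    return seen


pdb_text = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147",
    "ATOM      3  N   GLY B   1       9.560   4.210  -2.100",
])
chains = chain_ids(pdb_text)
```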

list_hetero_names

list_hetero_names(exclude_water=True) -> list[str]

List all unique hetero residue names in the protein structure.

Parameters:

Name Type Description Default
exclude_water bool

Whether to exclude water molecules from the list.

True

Returns:

Type Description
list[str]

list[str]: A list of unique ligand residue names (excluding water).
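The effect of exclude_water can be sketched by scanning HETATM records and reading the residue name columns. This is an illustration only: `hetero_names` is a hypothetical helper, and treating HOH and WAT as the water residue names is an assumption not stated in the source.

```python
WATER_NAMES = {"HOH", "WAT"}  # assumed water residue names


def hetero_names(pdb_text: str, exclude_water: bool = True) -> list[str]:
    """Return sorted unique residue names found on HETATM records."""
    names: set[str] = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            resname = line[17:20].strip()  # residue name columns
            if exclude_water and resname in WATER_NAMES:
                continue
            names.add(resname)
    return sorted(names)


pdb_text = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504",
    "HETATM    2  O   HOH A 101       1.000   2.000   3.000",
    "HETATM    3 ZN    ZN A 201       4.000   5.000   6.000",
    "HETATM    4  C1  LIG A 301       7.000   8.000   9.000",
])
without_water = hetero_names(pdb_text)
with_water = hetero_names(pdb_text, exclude_water=False)
```

Note that a metal ion such as ZN appears alongside the ligand here, since both are stored as HETATM records.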

load_structure_from_block staticmethod

load_structure_from_block(
    block_content: str, block_type: str
)

Load a protein structure from block content.

Parameters:

Name Type Description Default
block_content str

The content of the structure file.

required
block_type str

The type of the structure file (pdb, pdbqt, or cif).

required

Raises:

Type Description
ValueError

If the block type is unsupported.

load_structure_from_local

load_structure_from_local(
    path: str | Path | None = None,
) -> None

Load :attr:structure from disk without using the remote API.

Parameters:

Name Type Description Default
path str | Path | None

Path to a PDB/mmCIF file. If None, uses :attr:local_path (see :class:Entity).

None

model_loops

model_loops(use_cache: bool = True) -> None

Model loops in the protein structure.

register

register(
    *,
    client: Optional[DeepOriginClient] = None,
    remote_path: Optional[str] = None
) -> None

Register the protein as a new record in the data platform.

Uploads the protein file to remote storage and creates a new protein record, regardless of whether one already exists for this file path.

Parameters:

Name Type Description Default
client Optional[DeepOriginClient]

DeepOriginClient instance. If None, uses DeepOriginClient().

None
remote_path Optional[str]

Custom remote path to upload to. Overrides the default hash-based path.

None

Returns:

Type Description
None

None. As a side effect, uploads the protein and sets self.id to the newly created record's ID.

remove_hetatm

remove_hetatm(
    keep_resnames: Optional[list[str]] = None,
    remove_metals: Optional[list[str]] = None,
) -> None

Remove HETATM records from the protein structure, with options to retain specified residues or exclude certain metals.

Parameters:

Name Type Description Default
keep_resnames Optional[list[str]]

A list of residue names (strings) to keep in the structure even if they are HETATM records.

None
remove_metals Optional[list[str]]

A list of metal names (strings) to exclude from removal. These metals will be retained in the structure.

None

Notes:

  • By default, a predefined list of metals is considered for removal unless specified in remove_metals.
  • If keep_resnames is provided, those residues (along with any metals not excluded) will be retained even if they are HETATM records.
  • The method updates the current protein object in place.

remove_resnames

remove_resnames(
    exclude_resnames: Optional[list[str]] = None,
) -> None

Remove specific residue names from the protein structure in place.

Parameters:

Name Type Description Default
exclude_resnames Optional[list[str]]

List of residue names to exclude.

None

remove_water

remove_water() -> None

Remove water molecules from the protein structure in place.
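At the file level, removing water amounts to dropping coordinate records whose residue name is a water name. The sketch below shows that filter on PDB text. `strip_water` is a hypothetical helper, the HOH/WAT set is an assumption, and unlike the method above it returns new text instead of mutating an object in place.

```python
WATER_NAMES = {"HOH", "WAT"}  # assumed water residue names


def strip_water(pdb_text: str) -> str:
    """Return PDB text with water coordinate records removed."""
    kept = [
        line
        for line in pdb_text.splitlines()
        if not (
            line.startswith(("ATOM", "HETATM"))
            and line[17:20] in WATER_NAMES
        )
    ]
    return "\n".join(kept)


pdb_text = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504",
    "HETATM    2  O   HOH A 101       1.000   2.000   3.000",
    "HETATM    3  C1  LIG A 301       7.000   8.000   9.000",
])
dry_text = strip_water(pdb_text)
```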

resolved_project_id

resolved_project_id(
    client: DeepOriginClient | None = None,
) -> str | None

Data platform project id for API calls.

Returns :attr:project_id when set; otherwise client.project_id when client is given; otherwise None. Does not read the filesystem.

Parameters:

Name Type Description Default
client DeepOriginClient | None

Optional platform client (e.g. the one passed to :meth:sync).

None

Returns:

Type Description
str | None

Project id string, or None if neither the entity nor the client provides one.
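The fallback order described above (entity first, then client, then None) can be written as a small pure function. This is a sketch under stated assumptions: `StubClient` and `resolve_project_id` are hypothetical names, and "set" is taken to mean "not None".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubClient:
    """Hypothetical stand-in for a platform client with a project_id."""

    project_id: Optional[str] = None


def resolve_project_id(
    entity_project_id: Optional[str],
    client: Optional[StubClient] = None,
) -> Optional[str]:
    """Prefer the entity's project id, then the client's, else None."""
    if entity_project_id is not None:
        return entity_project_id
    if client is not None:
        return client.project_id
    return None


from_entity = resolve_project_id("proj-entity", StubClient("proj-client"))
from_client = resolve_project_id(None, StubClient("proj-client"))
from_nothing = resolve_project_id(None)
```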

select_chain

select_chain(chain_id: str) -> Optional[Self]

Select a specific chain by its ID and return a new Protein object.

Parameters:

Name Type Description Default
chain_id str

Chain ID to select.

required

Returns:

Name Type Description
Protein Optional[Self]

A new Protein object containing the selected chain.

Raises:

Type Description
ValueError

If the chain ID is not found.
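The select-or-raise behaviour can be illustrated on raw PDB text: keep only coordinate records for the requested chain and raise ValueError when none match. `select_chain_lines` is a hypothetical standard-library helper and returns text, whereas the method above returns a new Protein object.

```python
def select_chain_lines(pdb_text: str, chain_id: str) -> str:
    """Return only the coordinate records that belong to chain_id."""
    kept = [
        line
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
        and len(line) > 21
        and line[21] == chain_id
    ]
    if not kept:
        raise ValueError(f"Chain {chain_id!r} not found")
    return "\n".join(kept)


pdb_text = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147",
    "ATOM      3  N   GLY B   1       9.560   4.210  -2.100",
])
chain_a = select_chain_lines(pdb_text, "A")
```

Because the input is left untouched, the original text still contains every chain after the call.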

select_chains

select_chains(chain_ids: list[str]) -> Self

Select specific chains from the protein structure.

Parameters:

Name Type Description Default
chain_ids list[str]

List of chain IDs to select.

required

select_structure staticmethod

select_structure(structure, index: int)

Select a specific structure by index.

show

show(
    *,
    pockets: Optional[list[Pocket]] = None,
    sdf_file: Optional[str] = None,
    ligand: Optional[Ligand] = None,
    ligands: Optional[LigandSet | list[Ligand]] = None,
    poses: Optional[LigandSet | list[Ligand]] = None
)

Visualize the protein structure in a Jupyter notebook using MolStar viewer.

This method provides interactive 3D visualization of the protein structure with optional highlighting of binding pockets and docked ligands. The visualization is rendered directly in Jupyter notebooks using the MolStar viewer.

Parameters:

Name Type Description Default
pockets Optional[list[Pocket]]

List of Pocket objects to highlight in the visualization. Each pocket will be displayed with its defined color and transparency. Defaults to None.

None
sdf_file Optional[str]

Path to an SDF file containing docked ligand structures. When provided, the ligands will be displayed alongside the protein structure. Defaults to None.

None

Notes:

  • When pockets are provided, they are displayed with semi-transparent surfaces (alpha=0.7) while the protein is shown with a more transparent surface (alpha=0.1)
  • The protein is displayed in cartoon representation when pockets are shown
  • When an SDF file is provided, the visualization includes both the protein and the docked ligands in their respective binding poses

sync

sync(
    *,
    lazy: bool = False,
    client: Optional[DeepOriginClient] = None,
    remote_path: Optional[str] = None
) -> None

Sync the protein to the data platform.

Uploads the protein file and links to an existing record if one with the same file path already exists, otherwise creates a new record via :meth:register.

Parameters:

Name Type Description Default
lazy bool

If True, skip syncing when the protein already has an ID. Defaults to False.

False
client Optional[DeepOriginClient]

DeepOriginClient instance. If None, uses DeepOriginClient().

None
remote_path Optional[str]

Custom remote path to upload to. Overrides the default hash-based path.

None

Returns:

Type Description
None

None. As a side effect, uploads the protein (if necessary) and updates self.id with the ID of the existing or newly created protein record, and sets :attr:project_id when a project scope applies or the platform row includes project_id.

to_base64

to_base64() -> str

Convert the protein to base64 encoded PDB format.

Returns:

Name Type Description
str str

Base64 encoded string of the PDB file content

to_file

to_file(file_path: Optional[str | Path] = None) -> str

Dump state to a file.

Parameters:

Name Type Description Default
file_path Optional[str | Path]

Path where the file will be written. If None, uses default path.

None

Returns:

Name Type Description
str str

Path to the written file.

to_hash

to_hash() -> str

Convert the protein to SHA256 hash of the PDB file content.

Returns:

Name Type Description
str str

SHA256 hash string of the PDB file content
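A content hash of this kind can be computed with the standard library, as sketched below. `pdb_sha256` is a hypothetical helper, and hashing the UTF-8 bytes of the PDB text is an assumption; the library may normalise or serialise the content differently before hashing.

```python
import hashlib


def pdb_sha256(pdb_text: str) -> str:
    """Return the hex SHA256 digest of PDB text encoded as UTF-8."""
    return hashlib.sha256(pdb_text.encode("utf-8")).hexdigest()


pdb_text = "ATOM      1  N   ALA A   1      11.104   6.134  -6.504\nEND\n"
digest = pdb_sha256(pdb_text)
same_digest = pdb_sha256(pdb_text)
other_digest = pdb_sha256(pdb_text + "REMARK changed\n")
```

Identical content always yields the same digest, which is what makes a hash usable as a stable identifier for a structure file.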

to_pdb

to_pdb(file_path: Optional[str | Path] = None) -> str

Write the protein structure to a PDB file.

This is a local operation: it serializes the current :attr:structure. If the protein has :attr:remote_path but no local file yet, this method raises an exception; rehydrate with :meth:download first.

Parameters:

Name Type Description Default
file_path Optional[str | Path]

Path where the PDB file will be written.

None

Raises:

Type Description
DeepOriginException

If remote_path is set but no local file exists yet, or if :attr:structure is not loaded.

update_coordinates

update_coordinates(coords: ndarray)

Update the coordinates of the protein structure.

upload

upload(
    *,
    client: DeepOriginClient | None = None,
    remote_path: str | None = None
) -> None

Upload the entity to the remote server.

Serializes via :meth:to_file with :attr:remote_path temporarily cleared so subclasses that guard exports when only remote metadata is present (e.g. :meth:Ligand.to_sdf) still write from in-memory state on repeat uploads.

Parameters:

Name Type Description Default
client DeepOriginClient | None

DeepOriginClient instance. If None, uses DeepOriginClient().

None
remote_path str | None

Custom remote path to upload to. When provided, sets :attr:remote_path before uploading. If :attr:remote_path is still unset, it is set to the default hash-based path.

None