deeporigin.drug_discovery.Protein
Pocket finding (find_pockets removed)
Protein.find_pockets() was deprecated and has been removed. Use the PocketFinder class instead (see Find pockets).
Bases: Entity
A class representing a protein structure with various manipulation and analysis capabilities.
Attributes
local_path
class-attribute
instance-attribute
local_path: str | None = field(default=None, kw_only=True)
num_atoms
property
num_atoms: int
Count the number of atoms in the PDB file for this protein.
Returns:
| Name | Type | Description |
|---|---|---|
| int | int | The number of atoms in the PDB file. |
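As a rough illustration of what an atom count over a PDB file involves, the sketch below counts coordinate records in PDB-format text using only the standard library. The helper `count_atoms` is hypothetical and not part of deeporigin, and it assumes HETATM records count as atoms; the library may count differently.

```python
def count_atoms(pdb_text: str) -> int:
    """Count coordinate records (ATOM and HETATM) in PDB-format text."""
    # Assumption: HETATM records are counted alongside ATOM records.
    return sum(
        line.startswith(("ATOM  ", "HETATM"))
        for line in pdb_text.splitlines()
    )


# Tiny hand-written PDB fragment (coordinates omitted for brevity).
sample = "\n".join([
    "ATOM      1  N   ALA A   1",
    "ATOM      2  CA  ALA A   1",
    "HETATM    3 ZN    ZN A 101",
    "END",
])
n_atoms = count_atoms(sample)
```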
project_id
class-attribute
instance-attribute
project_id: str | None = field(default=None, kw_only=True)
remote_path
class-attribute
instance-attribute
remote_path: str | None = field(default=None, kw_only=True)
sequence
property
sequence: list[Seq]
Retrieve the amino acid sequences of all polypeptide chains in the protein structure.
This property parses the protein structure file using Bio.PDB and extracts the sequences of all peptide chains present. Each sequence is returned as a Bio.Seq.Seq object, which can be converted to a string if needed. The property is useful for analyzing the primary structure of the protein or for downstream sequence-based analyses.
Returns:
| Type | Description |
|---|---|
| list[Seq] | A list of amino acid sequences (as Bio.Seq.Seq objects), one for each polypeptide chain found in the protein structure. If the structure contains multiple chains, each chain's sequence is included as a separate entry in the list. |
Example
from deeporigin.drug_discovery import Protein
protein = Protein.from_file("example.pdb")
sequences = protein.sequence
for seq in sequences:
    print(seq)
structure
class-attribute
instance-attribute
structure: Any | None = field(default=None, repr=False)
Functions
download
download(
*,
lazy: bool = True,
client: DeepOriginClient | None = None
) -> str
Download the remote structure file and load the structure attribute from it.
If structure is already loaded (e.g. from from_pdb_id() or from_file()), returns local_path when set, without hitting the files API.
Otherwise delegates to Entity.download(), then parses the returned path with from_file() when structure is still None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | DeepOriginClient \| None | DeepOriginClient instance. If None, uses the default client. | None |
Returns:
| Type | Description |
|---|---|
| str | Local file path returned by the files client, or an existing on-disk path when the structure was already loaded from a local file. |
ensure_remote_path
ensure_remote_path(
*, client: DeepOriginClient, label: str
) -> None
Ensure remote_path is set after a lazy sync() that may have been a no-op.
If the entity already has a platform ID but remote_path was never populated (e.g. rehydrated metadata only), sync(lazy=True) returns early. This method performs a full sync when needed, then raises if the path is still missing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | DeepOriginClient | Authenticated client for sync/upload. | required |
| label | str | Human-readable name for error messages (e.g. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
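The control flow described above (lazy sync returns early, full sync as a fallback, then an error if the path is still unset) can be sketched with a stand-in class. Everything here is hypothetical: `StubEntity`, its `full_syncs` counter, the placeholder path, and the error message are illustrative only and do not reflect the deeporigin implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubEntity:
    """Hypothetical stand-in for a platform entity (not the deeporigin class)."""

    id: Optional[str] = None
    remote_path: Optional[str] = None
    full_syncs: int = 0

    def sync(self, *, lazy: bool = False) -> None:
        # A lazy sync returns early when the entity already has an ID.
        if lazy and self.id is not None:
            return
        self.full_syncs += 1
        self.id = self.id or "stub-id"
        self.remote_path = "entities/stub.pdb"  # placeholder path


def ensure_remote_path(entity: StubEntity, *, label: str) -> None:
    """Run a full sync if needed, then fail loudly if the path is still unset."""
    if entity.remote_path is None:
        entity.sync(lazy=False)
    if entity.remote_path is None:
        raise ValueError(f"{label} has no remote_path after sync")


rehydrated = StubEntity(id="abc123")  # has an ID but no remote_path
rehydrated.sync(lazy=True)            # no-op, remote_path stays None
path_after_lazy = rehydrated.remote_path
ensure_remote_path(rehydrated, label="protein")
```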
extract_ligand
extract_ligand(
exclude_resnames: Optional[set[str]] = None,
) -> Ligand
Extracts ligand(s) from a Protein object and removes them from the protein structure. This method mutates the protein object by removing ligand records.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| exclude_resnames | set | Residue names to exclude (e.g., water). | None |
Returns:
| Name | Type | Description |
|---|---|---|
| Ligand | Ligand | The extracted ligand molecule. |
extract_metals_and_cofactors
extract_metals_and_cofactors() -> (
tuple[list[str], list[str]]
)
Extract metal ions and cofactors from the protein structure.
Returns:
| Type | Description |
|---|---|
| tuple[list[str], list[str]] | Tuple[list[str], list[str]] |
find_missing_residues
find_missing_residues() -> dict[str, list[tuple[int, int]]]
Find missing residues in the protein structure.
from_base64
classmethod
from_base64(
base64_string: str, name: str = "", **kwargs: Any
) -> Self
Create a Protein instance from a base64 encoded PDB string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base64_string | str | Base64 encoded PDB content | required |
| name | str | Name of the protein. Defaults to "". | '' |
| **kwargs | Any | Additional arguments to pass to the constructor | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| Protein | Self | A new Protein instance |
Raises:
| Type | Description |
|---|---|
| DeepOriginException | If the base64 string cannot be decoded or parsed |
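To make the base64 handling concrete, here is a minimal standard-library sketch of encoding PDB text and decoding it again, plus how malformed input can be detected. It assumes UTF-8 text and strict decoding; how deeporigin encodes or validates internally is not stated in this reference.

```python
import base64
import binascii

pdb_text = "ATOM      1  N   ALA A   1\nEND\n"

# Encode PDB text to a base64 string (the kind of value from_base64 expects).
encoded = base64.b64encode(pdb_text.encode("utf-8")).decode("ascii")

# Decode it back to text before parsing.
decoded = base64.b64decode(encoded, validate=True).decode("utf-8")

# With validate=True, characters outside the base64 alphabet raise binascii.Error.
try:
    base64.b64decode("not base64!", validate=True)
    bad_input_rejected = False
except binascii.Error:
    bad_input_rejected = True
```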
from_file
classmethod
from_file(
file_path: str | Path,
struct_ind: int = 0,
*,
validate: bool = True
) -> Self
Create a Protein instance from a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str | Path to the protein PDB or CIF file. | required |
| struct_ind | int | Index of the structure to select if multiple are present. | 0 |
Returns:
| Name | Type | Description |
|---|---|---|
| Protein | Self | A new Protein instance. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file does not exist. |
| ValueError | If the file type is unsupported or the structure cannot be loaded. |
| RuntimeError | If the file cannot be read or processed. |
from_id
classmethod
from_id(
id: str,
*,
client: Optional[DeepOriginClient] = None,
download: bool = True,
remote_path_override: Optional[str] = None
) -> Self
Create a Protein instance from a Deep Origin Data Platform ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | The Deep Origin Data Platform ID of the protein. | required |
| client | Optional[DeepOriginClient] | Optional DeepOriginClient instance. If not provided, uses the default client. | None |
| download | bool | If True (default), download the structure file and load coordinates. If False, fetch metadata only and set :attr: | True |
| remote_path_override | Optional[str] | When | None |
Returns:
| Name | Type | Description |
|---|---|---|
| Protein | Self | A new Protein instance. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the protein data does not contain a file_path. |
| RuntimeError | If the file cannot be downloaded or loaded. |
from_pdb_id
classmethod
from_pdb_id(pdb_id: str, struct_ind: int = 0) -> Self
Create a Protein instance from a PDB ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pdb_id | str | PDB ID of the protein to download. | required |
| struct_ind | int | Index of the structure to select if multiple are present. | 0 |
Returns:
| Name | Type | Description |
|---|---|---|
| Protein | Self | A new Protein instance. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the PDB ID is invalid or the structure cannot be loaded. |
| RuntimeError | If the download fails. |
from_remote_file
classmethod
from_remote_file(
remote_path: str,
*,
client: DeepOriginClient | None = None,
lazy: bool = True,
struct_ind: int = 0,
validate: bool = True
) -> Self
Create a Protein from a structure file stored on the platform.
Downloads the file via deeporigin.platform.files.FilesClient.download(),
then loads it with from_file(). Supported formats are PDB, PDBQT, and mmCIF.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| remote_path | str | Platform file path (e.g. org storage path) to the structure file. | required |
| client | DeepOriginClient \| None | DeepOrigin client used for download. If | None |
| lazy | bool | Passed to | True |
| struct_ind | int | Index of the structure to select if multiple are present (see :meth: | 0 |
| validate | bool | Whether to validate PDB files when reading (see :meth: | True |
Returns:
| Name | Type | Description |
|---|---|---|
| Protein | Self | A protein with :attr: |
| | Self | set to |
| | Self | set to the downloaded file path. |
list_chain_names
list_chain_names() -> list[str]
List all unique chain IDs in the protein structure.
Returns:
| Type | Description |
|---|---|
| list[str] | A list of unique chain IDs. |
list_hetero_names
list_hetero_names(exclude_water=True) -> list[str]
List all unique hetero residue names in the protein structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| exclude_water | bool | Whether to exclude water molecules from the list. | True |
Returns:
| Type | Description |
|---|---|
| list[str] | A list of unique hetero residue names (excluding water when exclude_water is True). |
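For intuition about what listing hetero residue names means at the file level, the sketch below collects residue names from HETATM records in PDB-format text. It is a stand-alone illustration, not the library's code: the helper `hetero_names`, the `WATER_RESNAMES` set, and the sorted output order are all assumptions.

```python
# Assumption: these residue names denote water; the library's set may differ.
WATER_RESNAMES = {"HOH", "WAT"}


def hetero_names(pdb_text: str, exclude_water: bool = True) -> list:
    """Collect unique residue names from HETATM records (PDB columns 18-20)."""
    names = set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        resname = line[17:20].strip()
        if exclude_water and resname in WATER_RESNAMES:
            continue
        names.add(resname)
    return sorted(names)


sample = "\n".join([
    "ATOM      1  N   ALA A   1",
    "HETATM    2 ZN    ZN A 101",
    "HETATM    3  O   HOH A 201",
])
without_water = hetero_names(sample)
with_water = hetero_names(sample, exclude_water=False)
```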
load_structure_from_block
staticmethod
load_structure_from_block(
block_content: str, block_type: str
)
Load a protein structure from block content.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| block_content | str | The content of the structure file. | required |
| block_type | str | The type of the structure file (pdb, pdbqt, or cif). | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If the block type is unsupported. |
load_structure_from_local
load_structure_from_local(
path: str | Path | None = None,
) -> None
Load the structure attribute from disk without using the remote API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path \| None | Path to a PDB/mmCIF file. If None, uses :attr: | None |
register
register(
*,
client: Optional[DeepOriginClient] = None,
remote_path: Optional[str] = None
) -> None
Register the protein as a new record in the data platform.
Uploads the protein file to remote storage and creates a new protein record, regardless of whether one already exists for this file path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | Optional[DeepOriginClient] | DeepOriginClient instance. If None, uses DeepOriginClient(). | None |
| remote_path | Optional[str] | Custom remote path to upload to. Overrides the default hash-based path. | None |
Returns:
| Type | Description |
|---|---|
| None | None. As a side effect, uploads the protein and sets |
| None | to the newly created record's ID. |
remove_hetatm
remove_hetatm(
keep_resnames: Optional[list[str]] = None,
remove_metals: Optional[list[str]] = None,
) -> None
Remove HETATM records from the protein structure, with options to retain specified residues or exclude certain metals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| keep_resnames | Optional[list[str]] | A list of residue names (strings) to keep in the structure even if they are HETATM records. | None |
| remove_metals | Optional[list[str]] | A list of metal names (strings) to exclude from removal. These metals will be retained in the structure. | None |
Notes:
- By default, a predefined list of metals is considered for removal unless specified in remove_metals.
- If keep_resnames is provided, those residues (along with any metals not excluded) will be retained even if they are HETATM records.
- The method updates the current protein object in place.
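The keep-list behavior can be illustrated on raw PDB text. This sketch is deliberately simplified and is not the library's implementation: the hypothetical helper `drop_hetatm` returns new text instead of mutating an object, and it ignores the metal handling described in the notes above.

```python
from typing import Optional


def drop_hetatm(pdb_text: str, keep_resnames: Optional[list] = None) -> str:
    """Drop HETATM lines unless their residue name is in keep_resnames."""
    keep = set(keep_resnames or [])
    kept_lines = [
        line
        for line in pdb_text.splitlines()
        # Non-HETATM lines always stay; HETATM lines stay only if whitelisted.
        if not line.startswith("HETATM") or line[17:20].strip() in keep
    ]
    return "\n".join(kept_lines)


sample = "\n".join([
    "ATOM      1  N   ALA A   1",
    "HETATM    2 ZN    ZN A 101",
    "HETATM    3  O   HOH A 201",
    "END",
])
stripped = drop_hetatm(sample)
kept_zinc = drop_hetatm(sample, keep_resnames=["ZN"])
```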
remove_resnames
remove_resnames(
exclude_resnames: Optional[list[str]] = None,
) -> None
Remove specific residue names from the protein structure in place.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| exclude_resnames | Optional[list[str]] | List of residue names to exclude. | None |
resolved_project_id
resolved_project_id(
client: DeepOriginClient | None = None,
) -> str | None
Data platform project ID for API calls.
Returns project_id when set; otherwise client.project_id when client is given; otherwise None. Does not read the filesystem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | DeepOriginClient \| None | Optional platform client (e.g. the one passed to :meth: | None |
Returns:
| Type | Description |
|---|---|
| str \| None | Project id string, or None if neither the entity nor the client provides one. |
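The resolution order stated above (entity first, then client, then None) can be written out as a small sketch. `StubClient` and `StubEntity` are hypothetical stand-ins, and treating "set" as "not None" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubClient:
    """Hypothetical client exposing only project_id."""

    project_id: Optional[str] = None


@dataclass
class StubEntity:
    """Hypothetical entity mirroring the documented fallback order."""

    project_id: Optional[str] = None

    def resolved_project_id(self, client: Optional[StubClient] = None) -> Optional[str]:
        # 1) Prefer the entity's own project_id (assumed: "set" means not None).
        if self.project_id is not None:
            return self.project_id
        # 2) Otherwise use the client's project_id when a client is given.
        if client is not None:
            return client.project_id
        # 3) Nothing to resolve.
        return None


from_entity = StubEntity(project_id="proj-a").resolved_project_id(StubClient("proj-b"))
from_client = StubEntity().resolved_project_id(StubClient("proj-b"))
unresolved = StubEntity().resolved_project_id()
```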
select_chain
select_chain(chain_id: str) -> Optional[Self]
Select a specific chain by its ID and return a new Protein object.
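Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chain_id | str | Chain ID to select. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Self] | A new Protein object containing the selected chain. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the chain ID is not found. |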
select_chains
select_chains(chain_ids: list[str]) -> Self
Select specific chains from the protein structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chain_ids | list[str] | List of chain IDs to select. | required |
select_structure
staticmethod
select_structure(structure, index: int)
Select a specific structure by index.
show
show(
*,
pockets: Optional[list[Pocket]] = None,
sdf_file: Optional[str] = None,
ligand: Optional[Ligand] = None,
ligands: Optional[LigandSet | list[Ligand]] = None,
poses: Optional[LigandSet | list[Ligand]] = None
)
Visualize the protein structure in a Jupyter notebook using MolStar viewer.
This method provides interactive 3D visualization of the protein structure with optional highlighting of binding pockets and docked ligands. The visualization is rendered directly in Jupyter notebooks using the MolStar viewer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pockets | Optional[list[Pocket]] | List of Pocket objects to highlight in the visualization. Each pocket will be displayed with its defined color and transparency. Defaults to None. | None |
| sdf_file | Optional[str] | Path to an SDF file containing docked ligand structures. When provided, the ligands will be displayed alongside the protein structure. Defaults to None. | None |
Notes
- When pockets are provided, they are displayed with semi-transparent surfaces (alpha=0.7) while the protein is shown with a more transparent surface (alpha=0.1)
- The protein is displayed in cartoon representation when pockets are shown
- When an SDF file is provided, the visualization includes both the protein and the docked ligands in their respective binding poses
sync
sync(
*,
lazy: bool = False,
client: Optional[DeepOriginClient] = None,
remote_path: Optional[str] = None
) -> None
Sync the protein to the data platform.
Uploads the protein file and links to an existing record if one with
the same file path already exists; otherwise creates a new record via
register().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lazy | bool | If True, skip syncing when the protein already has an ID. Defaults to False. | False |
| client | Optional[DeepOriginClient] | DeepOriginClient instance. If None, uses DeepOriginClient(). | None |
| remote_path | Optional[str] | Custom remote path to upload to. Overrides the default hash-based path. | None |
Returns:
| Type | Description |
|---|---|
| None | None. As a side effect, uploads the protein (if necessary) and updates |
| None | |
| None | and sets :attr: |
| None | row includes |
to_base64
to_base64() -> str
Convert the protein to base64 encoded PDB format.
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Base64 encoded string of the PDB file content |
to_file
to_file(file_path: Optional[str | Path] = None) -> str
Dump state to a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | Optional[str \| Path] | Path where the file will be written. If None, uses default path. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Path to the written file. |
to_hash
to_hash() -> str
Compute the SHA256 hash of the protein's PDB file content.
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | SHA256 hash string of the PDB file content |
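A content hash like this is useful for detecting identical structures. As a minimal sketch, hashing PDB text with the standard library looks as follows; it assumes UTF-8 text and a hexadecimal digest, neither of which is confirmed by this reference.

```python
import hashlib

pdb_text = "ATOM      1  N   ALA A   1\nEND\n"

# Hash the raw bytes of the PDB content; identical content gives identical digests.
digest = hashlib.sha256(pdb_text.encode("utf-8")).hexdigest()

# Any change to the content, even one character, produces a different digest.
other_digest = hashlib.sha256((pdb_text + "X").encode("utf-8")).hexdigest()
```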
to_pdb
to_pdb(file_path: Optional[str | Path] = None) -> str
Write the protein structure to a PDB file.
This is a local operation: it serializes the current structure. If the
protein has remote_path set but no local file yet, the call raises; rehydrate with
download() first.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str | Path where the PDB file will be written. | None |
Raises:
| Type | Description |
|---|---|
| DeepOriginException | If |
update_coordinates
update_coordinates(coords: ndarray)
Update the coordinates of the protein structure.
upload
upload(
*,
client: DeepOriginClient | None = None,
remote_path: str | None = None
) -> None
Upload the entity to the remote server.
Serializes via to_file() with remote_path temporarily
cleared so subclasses that guard exports when only remote metadata is
present (e.g. Ligand.to_sdf()) still write from in-memory state
on repeat uploads.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | DeepOriginClient \| None | DeepOriginClient instance. If None, uses DeepOriginClient(). | None |
| remote_path | str \| None | Custom remote path to upload to. When provided, sets :attr: | None |