deeporigin.drug_discovery.LigandSet
A class representing a set of Ligand objects.
Attributes:
| Name | Type | Description |
|---|---|---|
| ligands | list[Ligand] | A list of Ligand instances contained in the set. |
| network | dict | A dictionary containing the network of ligands estimated using Konnektor. |
Functions
batches
batches(batch_size: int | None) -> list[list[Ligand]]
Split this set into consecutive chunks of ligands (same order as :attr:ligands).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| batch_size | int \| None | Maximum ligands per chunk. | required |
Returns:
| Type | Description |
|---|---|
| list[list[Ligand]] | Non-empty list of batches when |
| list[list[Ligand]] | of one or more consecutive slices of :attr: |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
| BeartypeCallHintParamViolation | If |
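
The chunking described for batches can be pictured with plain lists. The following is a minimal sketch, not the library's implementation: `chunk_consecutive` is a hypothetical helper, the strings stand in for Ligand objects, and rejecting non-positive sizes with ValueError is an assumption of the sketch. Handling of `batch_size=None` is not covered here.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def chunk_consecutive(items: Sequence[T], batch_size: int) -> list[list[T]]:
    """Split items into consecutive chunks of at most batch_size, preserving order."""
    if batch_size < 1:
        # Assumption of this sketch: non-positive sizes are rejected.
        raise ValueError("batch_size must be a positive integer")
    return [list(items[i : i + batch_size]) for i in range(0, len(items), batch_size)]


# Hypothetical stand-ins for Ligand objects.
names = ["lig-a", "lig-b", "lig-c", "lig-d", "lig-e"]
batches = chunk_consecutive(names, 2)
```

Flattening the chunks gives back the original sequence, which is the "same order" property stated above; the last chunk may be shorter than `batch_size`.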
compute_constraints
compute_constraints(
*, reference: Ligand, mcs_mol=None
) -> list[list[dict]]
Align a set of ligands to a reference ligand.
download
download(
*,
client: Optional[DeepOriginClient] = None,
lazy: bool = True,
max_workers: int = 20,
skip_errors: bool = False,
sanitize: bool = True,
remove_hydrogens: bool = False
) -> None
Download platform files for ligands that have remote_path but no local file.
Selects ligands where :attr:~Ligand.remote_path is set and
:attr:~Ligand.local_path is None, fetches distinct remotes in parallel with
:meth:deeporigin.platform.files.FilesClient.download_many, assigns each
ligand's :attr:~Ligand.local_path from the returned remote→local mapping,
then reloads :attr:~Ligand.mol from SDF when applicable.
Ligands without remote_path, with empty remote_path, or that already
have local_path set are left unchanged.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | Optional[DeepOriginClient] | DeepOrigin client. If | None |
| lazy | bool | Passed to | True |
| max_workers | int | Maximum concurrent downloads for | 20 |
| skip_errors | bool | When | False |
| sanitize | bool | Passed to :meth: | True |
| remove_hydrogens | bool | Passed to :meth: | False |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If |
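
The selection rule for download (a non-empty remote_path and no local_path) and the remote→local assignment can be sketched without the platform. Everything here is illustrative: `FakeLigand`, `needs_download`, `assign_local_paths`, and the paths are hypothetical stand-ins, and the mapping is built by hand rather than returned by a real download call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeLigand:
    """Hypothetical stand-in carrying only the two path attributes."""

    name: str
    remote_path: Optional[str] = None
    local_path: Optional[str] = None


def needs_download(lig: FakeLigand) -> bool:
    # Selected only when a non-empty remote_path exists and no local file is set.
    return bool(lig.remote_path) and lig.local_path is None


def assign_local_paths(ligands, remote_to_local):
    """Apply a remote -> local mapping to the selected ligands only."""
    for lig in ligands:
        if needs_download(lig) and lig.remote_path in remote_to_local:
            lig.local_path = remote_to_local[lig.remote_path]


ligands = [
    FakeLigand("a", remote_path="ligands/a.sdf"),
    FakeLigand("b", remote_path="ligands/a.sdf"),  # shares a remote with "a"
    FakeLigand("c", remote_path=""),  # empty remote_path: left unchanged
    FakeLigand("d", remote_path="ligands/d.sdf", local_path="/tmp/d.sdf"),
    FakeLigand("e"),  # no remote_path: left unchanged
]

# Distinct remotes to fetch, in first-seen order.
to_fetch = list(dict.fromkeys(l.remote_path for l in ligands if needs_download(l)))

# Hypothetical result of a bulk download: one local file per distinct remote.
mapping = {remote: "/tmp/cache/" + remote.split("/")[-1] for remote in to_fetch}
assign_local_paths(ligands, mapping)
```

Two ligands that share a remote file trigger a single fetch and end up with the same local path, while ligands that already have a local file keep it.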
embed
embed()
Minimize all ligands in the set using their 3D optimization routines. This calls the embed() method on each Ligand in the set.
filter_top_poses
filter_top_poses(*, by_pose_score: bool = True) -> Self
Filter ligands to keep only the best pose for each unique molecule.
Groups ligands by SMILES string and retains only the one with:
- Maximum pose score (when by_pose_score=True, the default), or
- Minimum binding energy (when by_pose_score=False)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| by_pose_score | bool | If True, select by maximum pose score. If False, select by minimum binding energy. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| LigandSet | Self | A new LigandSet containing only the best pose for each unique molecule. |
Raises:
| Type | Description |
|---|---|
| DeepOriginException | If required properties are missing from ligands. |
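
The grouping rule can be illustrated with plain dicts: one pose survives per SMILES, chosen by highest pose score when by_pose_score is True and by lowest binding energy otherwise. This is a sketch of the selection logic only. `best_pose_per_smiles` is a hypothetical helper, and the property keys "POSE SCORE" and "Binding Energy" are assumed here because they match the defaults of plot; the real method may read different fields.

```python
def best_pose_per_smiles(poses, by_pose_score=True):
    """Keep one pose dict per SMILES: highest pose score or lowest binding energy."""
    best = {}
    for pose in poses:
        key = pose["smiles"]
        current = best.get(key)
        if current is None:
            best[key] = pose
        elif by_pose_score:
            if pose["POSE SCORE"] > current["POSE SCORE"]:
                best[key] = pose
        elif pose["Binding Energy"] < current["Binding Energy"]:
            best[key] = pose
    return list(best.values())


# Made-up pose records: two poses of ethanol, one of benzene.
poses = [
    {"smiles": "CCO", "POSE SCORE": 0.7, "Binding Energy": -6.1},
    {"smiles": "CCO", "POSE SCORE": 0.9, "Binding Energy": -5.2},
    {"smiles": "c1ccccc1", "POSE SCORE": 0.4, "Binding Energy": -4.0},
]

by_score = best_pose_per_smiles(poses, by_pose_score=True)
by_energy = best_pose_per_smiles(poses, by_pose_score=False)
```

In this data the two criteria disagree for ethanol: the higher-scoring pose is not the one with the lowest binding energy, so the flag changes which pose is retained.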
filter_unsupported
filter_unsupported() -> Self
Return a new set excluding ligands whose molecules contain atom types
outside :data:~deeporigin.drug_discovery.constants.SUPPORTED_ATOM_SYMBOLS
(see :meth:Ligand.has_unsupported_atoms).
from_csv
classmethod
from_csv(
file_path: str | Path, smiles_column: str = "smiles"
) -> Self
Create a LigandSet instance from a CSV file containing SMILES strings and additional properties.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str \| Path | The path to the CSV file. | required |
| smiles_column | str | The name of the column containing SMILES strings. Defaults to "smiles". | 'smiles' |
Returns:
| Name | Type | Description |
|---|---|---|
| LigandSet | Self | A LigandSet instance containing Ligand objects created from the CSV file. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file does not exist. |
| DeepOriginException | If the CSV does not contain the specified smiles column or if SMILES strings are invalid. |
from_dir
classmethod
from_dir(directory: str | Path) -> Self
Create a LigandSet instance from a directory containing SDF files.
from_file
classmethod
from_file(
file_path: str | Path,
*,
sanitize: bool = True,
remove_hydrogens: bool = False,
smiles_column: str = "smiles"
) -> Self
Create a LigandSet from an SDF or CSV file.
.sdf paths are validated as SDF (extension and content) and loaded with
:meth:from_sdf. .csv paths are validated as CSV (extension) and loaded with
:meth:from_csv.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str \| Path | Path to an | required |
| sanitize | bool | Whether to sanitize molecules (SDF only). Defaults to True. | True |
| remove_hydrogens | bool | Whether to remove hydrogens (SDF only). Defaults to False. | False |
| smiles_column | str | Name of the SMILES column (CSV only). Defaults to | 'smiles' |
Returns:
| Name | Type | Description |
|---|---|---|
| LigandSet | Self | Ligands created from the file. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file does not exist. |
| DeepOriginException | If the path is not a supported file type or loading fails. |
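
The extension-based dispatch described for from_file can be sketched as a small lookup. This is an illustration under stated assumptions: `pick_loader` is a hypothetical helper that only returns the name of the loader, ValueError stands in for DeepOriginException, the case-insensitive suffix match is an assumption, and the content validation of SDF files is not modelled.

```python
from pathlib import Path


def pick_loader(file_path) -> str:
    """Return the name of the loader that would handle this path."""
    suffix = Path(file_path).suffix.lower()  # assumption: match is case-insensitive
    if suffix == ".sdf":
        return "from_sdf"
    if suffix == ".csv":
        return "from_csv"
    # Stand-in for the DeepOriginException raised on unsupported file types.
    raise ValueError(f"Unsupported file type: {suffix!r}")


sdf_loader = pick_loader("ligands.sdf")
csv_loader = pick_loader(Path("data/library.csv"))
```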
from_ids
classmethod
from_ids(
ids: list[str],
*,
client: DeepOriginClient | None = None,
download: bool = True,
ligand_inputs: list[dict[str, Any]] | None = None
) -> Self
Create a LigandSet by fetching ligands from the platform by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ids | list[str] | List of Deep Origin Data Platform ligand IDs. | required |
| client | DeepOriginClient \| None | Optional API client. Uses the default if not provided. | None |
| download | bool | If True (default), download mol files when present. If False, hydrate from SMILES and set | True |
| ligand_inputs | list[dict[str, Any]] \| None | Optional list of dicts (e.g. execution | None |
Returns:
| Type | Description |
|---|---|
| Self | A new LigandSet containing the rehydrated ligands. |
Notes
This delegates entity retrieval to client.entities.get_ligands()
and preserves the order of the requested IDs.
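
The ordering guarantee in the note can be pictured with plain dicts. This is only an illustration of the contract, not the library's code: `reorder_by_requested_ids`, the record shape, and the IDs are hypothetical.

```python
def reorder_by_requested_ids(ids, records):
    """Return records in the order of ids, regardless of fetch order."""
    by_id = {record["id"]: record for record in records}
    return [by_id[ligand_id] for ligand_id in ids]


requested = ["lig-3", "lig-1", "lig-2"]

# Hypothetical fetch result that arrives in a different order.
fetched = [
    {"id": "lig-1", "smiles": "CCO"},
    {"id": "lig-2", "smiles": "CCN"},
    {"id": "lig-3", "smiles": "c1ccccc1"},
]

ordered = reorder_by_requested_ids(requested, fetched)
```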
from_json
classmethod
from_json(
data: list[dict[str, Any]],
*,
client: Optional[DeepOriginClient] = None,
sanitize: bool = True,
remove_hydrogens: bool = False
) -> Self
Build a LigandSet from pose metadata (no SDF download).
Each entry follows the docking poses[] shape: a file_path that is
either a platform remote key or an existing local SDF, optional
local_path / remote_path, plus optional smiles /
canonical_smiles. When only a remote path is available, a
non-empty SMILES field is required so an RDKit molecule can be built;
the pose SDF is then loaded lazily via :meth:Ligand.download.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | list[dict[str, Any]] | List of pose dicts (for example merged | required |
| client | Optional[DeepOriginClient] | Optional client; | None |
| sanitize | bool | Passed to :meth: | True |
| remove_hydrogens | bool | Passed to :meth: | False |
Returns:
| Name | Type | Description |
|---|---|---|
| One | Self | class: |
Raises:
| Type | Description |
|---|---|
| ValueError | If a remote-only entry lacks SMILES, or paths are invalid. |
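
The rule that a remote-only entry needs a non-empty SMILES can be sketched as a small classifier over pose dicts. This is a hedged illustration: `classify_pose_entry` is a hypothetical helper, the precedence between file_path, local_path, and remote_path is an assumption of the sketch, and the remote keys are made up.

```python
import os


def classify_pose_entry(entry: dict) -> str:
    """Return "local" or "remote" for a pose dict, or raise ValueError."""
    file_path = entry.get("file_path")
    local = entry.get("local_path")
    if not local and file_path and os.path.isfile(file_path):
        local = file_path  # file_path points at an existing local SDF
    if local:
        return "local"

    remote = entry.get("remote_path") or file_path
    if not remote:
        raise ValueError("pose entry has no usable path")
    smiles = entry.get("smiles") or entry.get("canonical_smiles")
    if not smiles:
        # Without SMILES there is nothing to build a molecule from before download.
        raise ValueError("remote-only pose entry needs a non-empty SMILES")
    return "remote"


# Hypothetical pose entries; neither file exists locally.
ok_entry = {"file_path": "poses/run-1/pose_0.sdf", "canonical_smiles": "CCO"}
bad_entry = {"file_path": "poses/run-1/pose_1.sdf", "smiles": ""}

kind = classify_pose_entry(ok_entry)
```

Calling `classify_pose_entry(bad_entry)` raises ValueError in this sketch, mirroring the documented failure for remote-only entries without SMILES.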
from_rdkit_mols
classmethod
from_rdkit_mols(mols: list[Mol])
Create a LigandSet from a list of RDKit molecules.
from_result
classmethod
from_result(
*,
protein_id: str | None = None,
execution_id: str | None = None,
best_pose: bool | None = None,
client: Optional[DeepOriginClient] = None
) -> Self
Load docking poses from the data platform without downloading SDF files.
Fetches result-explorer rows via client.results.get_poses, resolves
SMILES from each row's compute_job_id execution (userInputs) when
absent on the pose payload, then delegates to :meth:from_json.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| protein_id | str \| None | Optional protein id filter. | None |
| execution_id | str \| None | Optional compute job / execution id filter. | None |
| best_pose | bool \| None | If True, restrict to best pose per ligand. If False, include all poses. If None (default), no filter is applied. | None |
| client | Optional[DeepOriginClient] | Optional | None |
Returns:
| Type | Description |
|---|---|
| Self | A |
Raises:
| Type | Description |
|---|---|
| ValueError | If no pose records match, or a pose cannot be built. |
from_sdf
classmethod
from_sdf(
file_path: str | Path,
*,
sanitize: bool = True,
remove_hydrogens: bool = False
) -> Self
Create a LigandSet instance from an SDF file containing one or more molecules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str \| Path | The path to the SDF file. | required |
| sanitize | bool | Whether to sanitize molecules. Defaults to True. | True |
| remove_hydrogens | bool | Whether to remove hydrogens. Defaults to False. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| LigandSet | Self | A LigandSet instance containing Ligand objects created from the SDF file. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file does not exist. |
| DeepOriginException | If the file cannot be parsed correctly. |
from_sdf_files
classmethod
from_sdf_files(
file_paths: list[str],
*,
sanitize: bool = True,
remove_hydrogens: bool = False
) -> Self
Create a LigandSet instance from multiple SDF files by concatenating them together.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_paths | list[str] | A list of paths to SDF files. | required |
| sanitize | bool | Whether to sanitize molecules. Defaults to True. | True |
| remove_hydrogens | bool | Whether to remove hydrogens. Defaults to False. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| LigandSet | Self | A LigandSet instance containing Ligand objects from all SDF files. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If any of the files do not exist. |
| DeepOriginException | If any of the files cannot be parsed correctly. |
from_smiles
classmethod
from_smiles(smiles: list[str] | set[str]) -> Self
Create a LigandSet from a list or set of SMILES strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| smiles | list[str] \| set[str] | SMILES strings to convert into ligands. | required |
Returns:
| Type | Description |
|---|---|
| Self | A new LigandSet containing one Ligand per SMILES string. |
map_network
map_network(
*,
use_cache: bool = True,
operation: Literal[
"mapping", "network", "full"
] = "network",
network_type: Literal["star", "mst", "cyclic"] = "mst"
)
Map a network of ligands from an SDF file using the DeepOrigin API.
mcs
mcs() -> Mol
Generates the maximum common substructure (MCS) for the ligands in a LigandSet.
Returns:
| Type | Description |
|---|---|
| Mol | smartsString (str) : SMARTS string representing the MCS |
plot
plot(
*,
x_label: str = "Pose Score",
y_label: str = "Binding Energy (kcal/mol)",
x: str = "POSE SCORE",
y: str = "Binding Energy",
output_file: Optional[str] = None,
y_lim_max: Optional[float] = 0,
width: int = 800,
height: int = 800
)
Create a scatter plot of ligands using specified attributes for the axes.
The plot displays molecule images on hover and can be displayed inline or saved to an HTML file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x_label | str | Label for the x-axis. Defaults to "Pose Score". | 'Pose Score' |
| y_label | str | Label for the y-axis. Defaults to "Binding Energy (kcal/mol)". | 'Binding Energy (kcal/mol)' |
| x | str | The name of the ligand property to use for the x-axis. Defaults to "POSE SCORE". | 'POSE SCORE' |
| y | str | The name of the ligand property to use for the y-axis. Defaults to "Binding Energy". | 'Binding Energy' |
| output_file | Optional[str] | Optional file path to save the HTML figure. If provided, the plot is saved to this file instead of being displayed. Defaults to None. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If the specified x or y properties are not found in the ligand data. |
prepare
prepare(*, remove_hydrogens: bool = False) -> Self
Prepare all ligands in the set for downstream workflows.
This calls the prepare() method on each Ligand in the set, which performs:
- Salt removal
- Kekulization
- Fragment validation (rejects multiple non-identical fragments)
- Validation of atom types against supported symbols
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| remove_hydrogens | bool | Whether to remove hydrogens from the SMILES representation. Defaults to False (preserve hydrogens). | False |
Returns:
| Name | Type | Description |
|---|---|---|
| LigandSet | Self | The prepared LigandSet (self), for chaining. |
Raises:
| Type | Description |
|---|---|
| DeepOriginException | If preparation fails for any ligand, unsupported atom types are present, or multiple non-identical fragments are detected. |
random_sample
random_sample(n: int) -> Self
Return a new LigandSet containing n randomly selected ligands.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n | int | Number of ligands to randomly sample. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| LigandSet | Self | A new LigandSet with n randomly selected ligands. |
Raises:
| Type | Description |
|---|---|
| ValueError | If n is greater than the total number of ligands. |
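
The sampling contract (n items, ValueError when n exceeds the set size) matches sampling without replacement, which the standard library provides as random.sample. The sketch below assumes that interpretation; `sample_n` and its `seed` argument are hypothetical and exist only to make the example reproducible.

```python
import random


def sample_n(items, n, seed=None):
    """Pick n distinct items at random; reject n larger than the pool."""
    if n > len(items):
        raise ValueError(f"cannot sample {n} items from a set of {len(items)}")
    rng = random.Random(seed)  # seed is a hypothetical knob for reproducibility
    return rng.sample(list(items), n)


# Hypothetical stand-ins for Ligand objects.
pool = ["lig-a", "lig-b", "lig-c", "lig-d"]
picked = sample_n(pool, 2, seed=0)
```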
show_grid
show_grid(
mols_per_row: int = 3,
sub_img_size: tuple[int, int] = (300, 300),
)
Show all ligands in the LigandSet in a grid.
sync
sync(
*,
lazy: bool = False,
client: Optional[DeepOriginClient] = None
) -> None
Sync the ligand set to the data platform.
For every ligand in the set this method:
- Searches the data platform for existing ligands whose canonical_smiles match (batched into a single request via search_ligands(smiles_list=…)).
- For ligands that already exist remotely, updates the local id and sets remote_path from the record's mol_file when present.
- For ligands that are new, uploads files to remote storage (if a local_path is present) and batch-creates them in a single API call. Ligands sharing a canonical SMILES (e.g. multiple poses of the same molecule in an SDF) are deduplicated before the create call; all duplicates end up pointing at the single platform record.
- Updates the local id values from the created records.
.. note::
The batch-create step is all-or-nothing: if it fails (e.g.
network error, invalid data), none of the new ligands will
receive an id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lazy | bool | If True, skip syncing ligands that already have an id. | False |
| client | Optional[DeepOriginClient] | DeepOriginClient instance. If None, uses DeepOriginClient(). | None |
Raises:
| Type | Description |
|---|---|
| DeepOriginException | If any ligand to be synced contains atom types outside :data: |
| ValueError | If any ligand to be synced has no |
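
The match-then-create flow of sync, including deduplication by canonical SMILES, can be sketched with dicts in place of ligands and platform records. This is an illustration of the bookkeeping only: `plan_sync`, `apply_created`, the record shape, and the IDs are hypothetical, and no upload or API call is modelled.

```python
def plan_sync(ligands, remote_ids_by_smiles):
    """Adopt ids for ligands found remotely; return distinct SMILES still to create."""
    to_create = []
    for lig in ligands:
        smiles = lig["canonical_smiles"]
        if smiles in remote_ids_by_smiles:
            lig["id"] = remote_ids_by_smiles[smiles]
        elif smiles not in to_create:
            # Several poses of one molecule collapse into one create entry.
            to_create.append(smiles)
    return to_create


def apply_created(ligands, created_ids_by_smiles):
    """Point every ligand without an id at the record created for its SMILES."""
    for lig in ligands:
        if lig.get("id") is None:
            lig["id"] = created_ids_by_smiles[lig["canonical_smiles"]]


ligands = [
    {"canonical_smiles": "CCO", "id": None},  # pose 1 of ethanol
    {"canonical_smiles": "CCO", "id": None},  # pose 2 of ethanol
    {"canonical_smiles": "c1ccccc1", "id": None},
]

# Hypothetical search result: benzene already exists on the platform.
to_create = plan_sync(ligands, {"c1ccccc1": "lig-001"})

# Hypothetical batch-create result: one record per distinct new SMILES.
apply_created(ligands, {smiles: "lig-002" for smiles in to_create})
```

Both ethanol poses end up with the same id, which is the "all duplicates point at the single platform record" behaviour described in the list above.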
to_dict
to_dict() -> list[dict[str, str]]
Convert this set to a list of dicts, one per ligand.
Each dict has id (platform id when set, else "0", "1", … by
position in this set) and smiles. For batched API calls with globally
unique ids, ensure each :class:Ligand id is set before building the set.
Returns:
| Type | Description |
|---|---|
| list[dict[str, str]] | One |
Raises:
| Type | Description |
|---|---|
| ValueError | If a ligand has no non-empty |
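
The id fallback described for to_dict (platform id when set, otherwise the ligand's position as a string) can be sketched as follows. `to_records` is a hypothetical helper operating on plain dicts, and the error path is left out of the sketch.

```python
def to_records(ligands):
    """Build one {"id", "smiles"} dict per ligand, using position when id is unset."""
    records = []
    for index, lig in enumerate(ligands):
        ligand_id = lig.get("id") or str(index)  # fall back to position in the set
        records.append({"id": ligand_id, "smiles": lig["smiles"]})
    return records


# Hypothetical ligands: the first has a platform id, the second does not.
ligands = [
    {"id": "lig-9", "smiles": "CCO"},
    {"id": None, "smiles": "CCN"},
]
records = to_records(ligands)
```

Positional ids restart at "0" for every set, so two batches built from different sets can produce colliding ids. That is why the text above recommends setting each Ligand id beforehand when globally unique ids are needed.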
to_sdf
to_sdf(output_path: Optional[str] = None) -> str
Write all ligands to one SDF file, preserving properties from each mol.
This is a local operation. Each ligand must already be rehydrated if it has
remote_path but no local file (call :meth:Ligand.download or
:meth:LigandSet.download first, or use from_id(..., download=True)).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_path | Optional[str] | Path to the output SDF file. | None |
Returns:
| Type | Description |
|---|---|
| str | Path to the written SDF file. |
upload
upload(
*,
client: Optional[DeepOriginClient] = None,
max_workers: int = 20
) -> None
Upload structure files for ligands that have a local file.
For each ligand with non-None :attr:~Ligand.local_path, serializes
and assigns :attr:~Ligand.remote_path (same contract as
:meth:Ligand.upload), then uploads all files in parallel via
:meth:deeporigin.platform.files.FilesClient.upload_many. Ligands
without local_path are skipped.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | Optional[DeepOriginClient] | DeepOrigin client. If None, uses | None |
| max_workers | int | Maximum concurrent uploads (passed to | 20 |