deeporigin.drug_discovery.LigandSet
A class representing a set of Ligand objects.
Attributes:
| Name | Type | Description |
|---|---|---|
| `ligands` | `list[Ligand]` | A list of Ligand instances contained in the set. |
| `network` | `dict` | A dictionary containing the network of ligands estimated using Konnektor. |
Attributes
Functions
batches
batches(batch_size: int | None) -> list[list[Ligand]]
Split this set into consecutive chunks of ligands (same order as the ligands attribute).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `batch_size` | `int \| None` | Maximum ligands per chunk. | required |
Returns:
| Type | Description |
|---|---|
| `list[list[Ligand]]` | Non-empty list of batches when |
| `list[list[Ligand]]` | of one or more consecutive slices of :attr: |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If |
| `BeartypeCallHintParamViolation` | If |
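The chunking described above can be sketched in plain Python without the library. This is only an illustration: the helper name `chunk` is hypothetical, and its treatment of `None` (one batch holding everything) and of non-positive sizes (a `ValueError`) are assumptions, since the reference text for those cases is cut off.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], batch_size: Optional[int]) -> list[list[T]]:
    """Split items into consecutive chunks, preserving the original order."""
    if batch_size is None:
        # Assumption: no limit means a single batch holding every item.
        return [list(items)]
    if batch_size < 1:
        # Assumption: non-positive sizes are rejected.
        raise ValueError("batch_size must be a positive integer or None")
    return [list(items[i : i + batch_size]) for i in range(0, len(items), batch_size)]


smiles = ["CCO", "CCN", "CCC", "c1ccccc1", "CC(=O)O"]
batches = chunk(smiles, 2)
print(batches)
```

Only the last chunk can be shorter than `batch_size`, and concatenating the chunks reproduces the input order.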
compute_constraints
compute_constraints(
*, reference: Ligand, mcs_mol=None
) -> list[list[dict]]
Align a set of ligands to a reference ligand.
embed
embed()
Minimize all ligands in the set using their 3D optimization routines. This calls the embed() method on each Ligand in the set.
filter_top_poses
filter_top_poses(*, by_pose_score: bool = True) -> Self
Filter ligands to keep only the best pose for each unique molecule.
Groups ligands by SMILES string and retains only the one with:
- Maximum pose score (default, by_pose_score=True), or
- Minimum binding energy (when by_pose_score=False)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `by_pose_score` | `bool` | If True, select by maximum pose score. If False, select by minimum binding energy. | `True` |
Returns:
| Name | Type | Description |
|---|---|---|
| `LigandSet` | `Self` | A new LigandSet containing only the best pose for each unique molecule. |
Raises:
| Type | Description |
|---|---|
| `DeepOriginException` | If required properties are missing from ligands. |
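The selection rule can be illustrated with plain dictionaries standing in for ligands. This is a sketch rather than the library's implementation: the function name `best_pose_per_molecule` is hypothetical, the property keys reuse the `"POSE SCORE"` and `"Binding Energy"` names that appear as `plot` defaults on this page, and the numeric values are made up.

```python
def best_pose_per_molecule(poses, *, by_pose_score=True):
    """Keep one pose per SMILES: highest pose score, or lowest binding energy."""
    best = {}
    for pose in poses:
        key = pose["smiles"]
        current = best.get(key)
        if current is None:
            best[key] = pose
        elif by_pose_score:
            if pose["POSE SCORE"] > current["POSE SCORE"]:
                best[key] = pose
        elif pose["Binding Energy"] < current["Binding Energy"]:
            best[key] = pose
    return list(best.values())


# Made-up example values: two poses of one molecule, one pose of another.
poses = [
    {"smiles": "CCO", "POSE SCORE": 0.71, "Binding Energy": -5.2},
    {"smiles": "CCO", "POSE SCORE": 0.64, "Binding Energy": -6.1},
    {"smiles": "CCN", "POSE SCORE": 0.55, "Binding Energy": -4.8},
]
by_score = best_pose_per_molecule(poses)
by_energy = best_pose_per_molecule(poses, by_pose_score=False)
```

The two criteria can disagree: in this example the first `CCO` pose wins on pose score while the second wins on binding energy.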
filter_unsupported
filter_unsupported() -> Self
Return a new set excluding ligands whose molecules contain atom types outside deeporigin.drug_discovery.constants.SUPPORTED_ATOM_SYMBOLS (see Ligand.has_unsupported_atoms).
from_csv
classmethod
from_csv(
file_path: str | Path, smiles_column: str = "smiles"
) -> Self
Create a LigandSet instance from a CSV file containing SMILES strings and additional properties.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `str \| Path` | The path to the CSV file. | required |
| `smiles_column` | `str` | The name of the column containing SMILES strings. Defaults to "smiles". | `'smiles'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `LigandSet` | `Self` | A LigandSet instance containing Ligand objects created from the CSV file. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the file does not exist. |
| `DeepOriginException` | If the CSV does not contain the specified smiles column or if SMILES strings are invalid. |
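To show the expected CSV shape (one SMILES column plus arbitrary property columns), here is a standard-library sketch. It does not call the library: `read_smiles_rows` is a hypothetical helper, the column names and values are invented, and `ValueError` stands in for the `DeepOriginException` documented above.

```python
import csv
import io


def read_smiles_rows(csv_text, smiles_column="smiles"):
    """Return (smiles, properties) pairs, one per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if smiles_column not in (reader.fieldnames or []):
        # Stand-in for the documented DeepOriginException.
        raise ValueError(f"CSV has no column named {smiles_column!r}")
    rows = []
    for row in reader:
        smiles = row.pop(smiles_column)
        rows.append((smiles, row))  # the remaining columns are the properties
    return rows


csv_text = "smiles,name,ic50\nCCO,ethanol,12.5\nCCN,ethylamine,3.1\n"
rows = read_smiles_rows(csv_text)
```

Property values read this way are strings; whether the library converts them to numbers is not stated on this page.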
from_dir
classmethod
from_dir(directory: str | Path) -> Self
Create a LigandSet instance from a directory containing SDF files.
from_docking_result
classmethod
from_docking_result(
*,
protein_id: str | None = None,
execution_id: str | None = None,
client: Optional[DeepOriginClient] = None
) -> Self
Create a LigandSet from docking results in the data platform.
Fetches docking pose results for the given protein, downloads the SDF files, and loads them into a LigandSet.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `protein_id` | `str \| None` | Protein ID to fetch docking results for. | `None` |
| `execution_id` | `str \| None` | Execution ID to fetch docking results for. | `None` |
| `client` | `Optional[DeepOriginClient]` | Optional DeepOriginClient instance. If not provided, uses the default client. | `None` |
Returns:
| Type | Description |
|---|---|
| `Self` | A LigandSet of docked poses. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no docking results are found for the protein. |
from_docking_results
classmethod
from_docking_results(
*, result: FunctionResult, client: DeepOriginClient
) -> Self
Build a LigandSet from function-API docking responses (embedded pose paths).
Reads functionOutputs from each wrapped response, downloads pose SDF files via client.files, and merges ligands with from_sdf_files.
For hydrated poses from the data platform by execution id, use from_docking_result instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `result` | `FunctionResult` | | required |
| `client` | `DeepOriginClient` | Client used to download remote pose files. | required |
Returns:
| Type | Description |
|---|---|
| `Self` | A |
from_file
classmethod
from_file(
file_path: str | Path,
*,
sanitize: bool = True,
remove_hydrogens: bool = False,
smiles_column: str = "smiles"
) -> Self
Create a LigandSet from an SDF or CSV file.
.sdf paths are validated as SDF (extension and content) and loaded with from_sdf. .csv paths are validated as CSV (extension) and loaded with from_csv.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `str \| Path` | Path to an | required |
| `sanitize` | `bool` | Whether to sanitize molecules (SDF only). Defaults to True. | `True` |
| `remove_hydrogens` | `bool` | Whether to remove hydrogens (SDF only). Defaults to False. | `False` |
| `smiles_column` | `str` | Name of the SMILES column (CSV only). Defaults to | `'smiles'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `LigandSet` | `Self` | Ligands created from the file. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the file does not exist. |
| `DeepOriginException` | If the path is not a supported file type or loading fails. |
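The extension-based dispatch can be pictured as below. This sketch only returns the name of the loader that would be chosen. `pick_loader` is a hypothetical helper, case-insensitive matching is an assumption, `ValueError` stands in for `DeepOriginException`, and the content validation mentioned for SDF files is not modelled.

```python
from pathlib import Path


def pick_loader(file_path):
    """Return the name of the classmethod that would handle this path."""
    suffix = Path(file_path).suffix.lower()  # assumption: case-insensitive match
    if suffix == ".sdf":
        return "from_sdf"
    if suffix == ".csv":
        return "from_csv"
    # Stand-in for the documented DeepOriginException.
    raise ValueError(f"Unsupported file type: {suffix!r}")


print(pick_loader("ligands.sdf"))
print(pick_loader(Path("data/hits.csv")))
```

Under this reading, `sanitize` and `remove_hydrogens` matter only on the SDF branch and `smiles_column` only on the CSV branch, matching the "(SDF only)" and "(CSV only)" notes in the parameter table.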
from_ids
classmethod
from_ids(
ids: list[str],
*,
client: DeepOriginClient | None = None,
download: bool = True,
ligand_inputs: list[dict[str, Any]] | None = None
) -> Self
Create a LigandSet by fetching ligands from the platform by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ids` | `list[str]` | List of Deep Origin Data Platform ligand IDs. | required |
| `client` | `DeepOriginClient \| None` | Optional API client. Uses the default if not provided. | `None` |
| `download` | `bool` | If True (default), download mol files when present. If False, hydrate from SMILES and set | `True` |
| `ligand_inputs` | `list[dict[str, Any]] \| None` | Optional list of dicts (e.g. execution | `None` |
Returns:
| Type | Description |
|---|---|
| `Self` | A new LigandSet containing the rehydrated ligands. |
Notes
This delegates entity retrieval to client.entities.get_ligands()
and preserves the order of the requested IDs.
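Order preservation matters because a bulk lookup may return records in any order. A minimal sketch of re-ordering fetched records to match the request follows. The helper `in_requested_order`, the record layout, and the ids are hypothetical, and skipping ids that were not returned is an assumption, not documented behaviour.

```python
def in_requested_order(requested_ids, records):
    """Re-order fetched records so they follow the order of requested_ids."""
    by_id = {record["id"]: record for record in records}
    # Assumption: ids missing from the response are skipped, not raised on.
    return [by_id[ligand_id] for ligand_id in requested_ids if ligand_id in by_id]


# Hypothetical records, returned in a different order than requested.
records = [
    {"id": "lig-b", "smiles": "CCN"},
    {"id": "lig-a", "smiles": "CCO"},
]
ordered = in_requested_order(["lig-a", "lig-b"], records)
```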
from_rdkit_mols
classmethod
from_rdkit_mols(mols: list[Mol])
Create a LigandSet from a list of RDKit molecules.
from_sdf
classmethod
from_sdf(
file_path: str | Path,
*,
sanitize: bool = True,
remove_hydrogens: bool = False
) -> Self
Create a LigandSet instance from an SDF file containing one or more molecules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `str \| Path` | The path to the SDF file. | required |
| `sanitize` | `bool` | Whether to sanitize molecules. Defaults to True. | `True` |
| `remove_hydrogens` | `bool` | Whether to remove hydrogens. Defaults to False. | `False` |
Returns:
| Name | Type | Description |
|---|---|---|
| `LigandSet` | `Self` | A LigandSet instance containing Ligand objects created from the SDF file. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the file does not exist. |
| `DeepOriginException` | If the file cannot be parsed correctly. |
from_sdf_files
classmethod
from_sdf_files(
file_paths: list[str],
*,
sanitize: bool = True,
remove_hydrogens: bool = False
) -> Self
Create a LigandSet instance from multiple SDF files by concatenating them together.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_paths` | `list[str]` | A list of paths to SDF files. | required |
| `sanitize` | `bool` | Whether to sanitize molecules. Defaults to True. | `True` |
| `remove_hydrogens` | `bool` | Whether to remove hydrogens. Defaults to False. | `False` |
Returns:
| Name | Type | Description |
|---|---|---|
| `LigandSet` | `Self` | A LigandSet instance containing Ligand objects from all SDF files. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If any of the files do not exist. |
| `DeepOriginException` | If any of the files cannot be parsed correctly. |
from_smiles
classmethod
from_smiles(smiles: list[str] | set[str]) -> Self
Create a LigandSet from a list or set of SMILES strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `smiles` | `list[str] \| set[str]` | SMILES strings to convert into ligands. | required |
Returns:
| Type | Description |
|---|---|
| `Self` | A new LigandSet containing one Ligand per SMILES string. |
map_network
map_network(
*,
use_cache: bool = True,
operation: Literal[
"mapping", "network", "full"
] = "network",
network_type: Literal["star", "mst", "cyclic"] = "mst"
)
Map a network of ligands from an SDF file using the DeepOrigin API.
mcs
mcs() -> Mol
Generates the Maximum Common Substructure (MCS) for the ligands in a LigandSet.
Returns:
| Type | Description |
|---|---|
| `Mol` | smartsString (str) : SMARTS string representing the MCS |
plot
plot(
*,
x_label: str = "Pose Score",
y_label: str = "Binding Energy (kcal/mol)",
x: str = "POSE SCORE",
y: str = "Binding Energy",
output_file: Optional[str] = None,
y_lim_max: Optional[float] = 0,
width: int = 800,
height: int = 800
)
Create a scatter plot of ligands using specified attributes for the axes.
The plot displays molecule images on hover and can be displayed inline or saved to an HTML file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x_label` | `str` | Label for the x-axis. Defaults to "Pose Score". | `'Pose Score'` |
| `y_label` | `str` | Label for the y-axis. Defaults to "Binding Energy (kcal/mol)". | `'Binding Energy (kcal/mol)'` |
| `x` | `str` | The name of the ligand property to use for the x-axis. Defaults to "POSE SCORE". | `'POSE SCORE'` |
| `y` | `str` | The name of the ligand property to use for the y-axis. Defaults to "Binding Energy". | `'Binding Energy'` |
| `output_file` | `Optional[str]` | Optional file path to save the HTML figure. If provided, the plot is saved to this file instead of being displayed. Defaults to None. | `None` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the specified x or y properties are not found in the ligand data. |
prepare
prepare(*, remove_hydrogens: bool = False) -> Self
Prepare all ligands in the set for downstream workflows.
This calls the prepare() method on each Ligand in the set, which performs:
- Salt removal
- Kekulization
- Fragment validation (rejects multiple non-identical fragments)
- Validation of atom types against supported symbols
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `remove_hydrogens` | `bool` | Whether to remove hydrogens from the SMILES representation. Defaults to False (preserve hydrogens). | `False` |
Returns:
| Name | Type | Description |
|---|---|---|
| `LigandSet` | `Self` | The prepared LigandSet (self), for chaining. |
Raises:
| Type | Description |
|---|---|
| `DeepOriginException` | If preparation fails for any ligand, unsupported atom types are present, or multiple non-identical fragments are detected. |
protonate
protonate(
*,
ph: number = 7.4,
filter_percentage: number = 1.0,
use_cache: bool = True,
client: Optional[DeepOriginClient] = None,
quote: bool = False
) -> FunctionResult
Protonate all ligands in the set.
Returns a FunctionResult whose .ligands attribute contains
the protonated ligands. When quote=True, .ligands is empty
and .estimate gives the cost in dollars. Only the most abundant
species is retained for each ligand.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ph` | `number` | pH value at which to protonate. Defaults to 7.4. | `7.4` |
| `filter_percentage` | `number` | Percentage threshold for filtering protonation states. Defaults to 1.0. | `1.0` |
| `use_cache` | `bool` | Whether to use cached protonation results. | `True` |
| `client` | `Optional[DeepOriginClient]` | DeepOrigin client instance. If None, uses | `None` |
| `quote` | `bool` | If True, request a cost estimate without executing. | `False` |
Returns:
| Name | Type | Description |
|---|---|---|
| `FunctionResult` | `FunctionResult` | A FunctionResult with a |
random_sample
random_sample(n: int) -> Self
Return a new LigandSet containing n randomly selected ligands.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | Number of ligands to randomly sample. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `LigandSet` | `Self` | A new LigandSet with n randomly selected ligands. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If n is greater than the total number of ligands. |
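The documented contract (n items, with a `ValueError` when n exceeds the set size) is consistent with sampling without replacement, which is the assumption behind this sketch. `random_subset` and its `seed` parameter are hypothetical and not part of the documented signature.

```python
import random


def random_subset(items, n, *, seed=None):
    """Pick n distinct items at random; the input is left unchanged."""
    items = list(items)
    if n > len(items):
        raise ValueError(f"Cannot sample {n} items from a set of {len(items)}")
    # Assumption: sampling is without replacement.
    return random.Random(seed).sample(items, n)


smiles = ["CCO", "CCN", "CCC", "c1ccccc1", "CC(=O)O"]
subset = random_subset(smiles, 3, seed=0)
print(subset)
```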
show_grid
show_grid(
mols_per_row: int = 3,
sub_img_size: tuple[int, int] = (300, 300),
)
Show all ligands in the LigandSet in a grid.
sync
sync(
*,
lazy: bool = False,
client: Optional[DeepOriginClient] = None
) -> None
Sync the ligand set to the data platform.
For every ligand in the set this method:
- Searches the data platform for existing ligands whose canonical_smiles match (batched into a single request via search_ligands(smiles_list=…)).
- For ligands that already exist remotely, updates the local id and sets remote_path from the record's mol_file when present.
- For ligands that are new, uploads files to remote storage (if a local_path is present) and batch-creates them in a single API call. Ligands sharing a canonical SMILES (e.g. multiple poses of the same molecule in an SDF) are deduplicated before the create call; all duplicates end up pointing at the single platform record.
- Updates the local id values from the created records.
Note: The batch-create step is all-or-nothing: if it fails (e.g. network error, invalid data), none of the new ligands will receive an id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `lazy` | `bool` | If True, skip syncing ligands that already have an id. | `False` |
| `client` | `Optional[DeepOriginClient]` | DeepOriginClient instance. If None, uses DeepOriginClient(). | `None` |
Raises:
| Type | Description |
|---|---|
| `DeepOriginException` | If any ligand to be synced contains atom types outside :data: |
| `ValueError` | If any ligand to be synced has no |
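The match-then-deduplicate flow in the sync steps can be sketched with dictionaries in place of ligands and of the platform. Both helper functions, the record layout, and the ids below are hypothetical. File upload and the actual API calls are left out.

```python
def split_existing_and_new(ligands, remote_ids_by_smiles):
    """Assign ids to ligands found remotely; group the rest by canonical SMILES."""
    new_by_smiles = {}
    for ligand in ligands:
        key = ligand["canonical_smiles"]
        if key in remote_ids_by_smiles:
            ligand["id"] = remote_ids_by_smiles[key]
        else:
            # Poses sharing a SMILES fall into one group, so only one
            # record per molecule would be sent to the create call.
            new_by_smiles.setdefault(key, []).append(ligand)
    return new_by_smiles


def assign_created_ids(new_by_smiles, created_ids_by_smiles):
    """Point every duplicate at the single record created for its SMILES."""
    for key, group in new_by_smiles.items():
        for ligand in group:
            ligand["id"] = created_ids_by_smiles[key]


ligands = [
    {"canonical_smiles": "CCO", "id": None},  # pose 1 of a new molecule
    {"canonical_smiles": "CCO", "id": None},  # pose 2 of the same molecule
    {"canonical_smiles": "CCN", "id": None},  # already on the platform
]
pending = split_existing_and_new(ligands, {"CCN": "lig-existing"})
assign_created_ids(pending, {"CCO": "lig-created"})
```

Grouping before the create call is what lets every pose of one molecule end up with the same id.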
to_dict
to_dict() -> list[dict[str, str]]
Convert this set to a list of dicts, one per ligand.
Each dict has id (platform id when set, else "0", "1", … by position in this set) and smiles. For batched API calls with globally unique ids, ensure each Ligand id is set before building the set.
Returns:
| Type | Description |
|---|---|
| `list[dict[str, str]]` | One |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If a ligand has no non-empty |
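The id fallback can be made concrete with a small sketch over plain dictionaries. The function name `ligands_to_dicts` is hypothetical. Because the Raises entry above is cut off, raising `ValueError` for a missing or empty SMILES is an assumption.

```python
def ligands_to_dicts(ligands):
    """Build one {"id", "smiles"} dict per ligand, using the index as a fallback id."""
    rows = []
    for index, ligand in enumerate(ligands):
        smiles = ligand.get("smiles")
        if not smiles:
            # Assumption: a missing or empty SMILES is the error case.
            raise ValueError(f"Ligand at position {index} has no SMILES")
        ligand_id = ligand.get("id") or str(index)
        rows.append({"id": ligand_id, "smiles": smiles})
    return rows


rows = ligands_to_dicts(
    [
        {"id": "lig-1", "smiles": "CCO"},  # platform id already set
        {"id": None, "smiles": "CCN"},     # falls back to its position, "1"
    ]
)
```

Positional ids restart at "0" in every set, so two sets converted separately would produce colliding ids. That is why the text recommends setting each ligand's id first when ids must be globally unique.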
to_sdf
to_sdf(output_path: Optional[str] = None) -> str
Write all ligands to one SDF file, preserving properties from each mol.
This is a local operation. Each ligand must already be rehydrated if it has remote_path but no local file (call Ligand.download first, or use from_id(..., download=True)).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `output_path` | `Optional[str]` | Path to the output SDF file. | `None` |
Returns:
| Type | Description |
|---|---|
| `str` | Path to the written SDF file. |
upload
upload(
*,
client: Optional[DeepOriginClient] = None,
max_workers: int = 20
) -> None
Upload structure files for ligands that have a local file.
For each ligand with a non-None Ligand.local_path, serializes and assigns Ligand.remote_path (same contract as Ligand.upload), then uploads all files in parallel via deeporigin.platform.files.FilesClient.upload_many. Ligands without local_path are skipped.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `Optional[DeepOriginClient]` | DeepOrigin client. If None, uses | `None` |
| `max_workers` | `int` | Maximum concurrent uploads (passed to | `20` |