deeporigin.drug_discovery.LigandSet

A class representing a set of Ligand objects.

Attributes:

Name Type Description
ligands list[Ligand]

A list of Ligand instances contained in the set.

network dict

A dictionary containing the network of ligands estimated using Konnektor.

Attributes

ligands class-attribute instance-attribute

ligands: list[Ligand] = field(default_factory=list)

network class-attribute instance-attribute

network: dict = field(default_factory=dict)

Functions

add_hydrogens

add_hydrogens() -> None

Add hydrogens to all ligands in the set.

batches

batches(batch_size: int | None) -> list[list[Ligand]]

Split this set into consecutive chunks of ligands (same order as ligands).

Parameters:

Name Type Description Default
batch_size int | None

Maximum ligands per chunk. None returns a single chunk containing all ligands (including when the set is empty). Must be an int when not None; other types are rejected by runtime type checking. When the number of ligands is not a multiple of batch_size, the last batch is shorter (it holds the remainder only).

required

Returns:

Type Description
list[list[Ligand]]

Non-empty list of batches when batch_size is None; otherwise a list of one or more consecutive slices of ligands.

Raises:

Type Description
ValueError

If batch_size is set and not positive.

BeartypeCallHintParamViolation

If batch_size is neither None nor an int (e.g. a float or str).
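
The chunking rules described for batches can be illustrated with a minimal pure-Python sketch. The helper name chunk and the plain string items are hypothetical stand-ins, not the library's implementation, and the runtime type check that produces BeartypeCallHintParamViolation is not reproduced here.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], batch_size: Optional[int]) -> list[list[T]]:
    """Split items into consecutive chunks of at most batch_size elements."""
    if batch_size is None:
        # A single chunk holding everything, even when items is empty.
        return [list(items)]
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Consecutive slices in the original order; the last slice holds
    # only the remainder when len(items) is not a multiple of batch_size.
    return [
        list(items[start : start + batch_size])
        for start in range(0, len(items), batch_size)
    ]


names = ["lig-a", "lig-b", "lig-c", "lig-d", "lig-e"]
pairs = chunk(names, 2)      # [['lig-a', 'lig-b'], ['lig-c', 'lig-d'], ['lig-e']]
everything = chunk(names, None)  # one chunk with all five names
empty = chunk([], None)      # [[]]
```

For an empty input combined with an int batch_size this sketch returns an empty list; the text above does not spell out that case, so treat it as an assumption of the sketch.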

compute_constraints

compute_constraints(
    *, reference: Ligand, mcs_mol=None
) -> list[list[dict]]

Align a set of ligands to a reference ligand.

compute_rmsd

compute_rmsd()

Compute the pairwise RMSD between all ligands in the set.

download

download(
    *,
    client: Optional[DeepOriginClient] = None,
    lazy: bool = True,
    max_workers: int = 20,
    skip_errors: bool = False,
    sanitize: bool = True,
    remove_hydrogens: bool = False
) -> None

Download platform files for ligands that have remote_path but no local file.

Selects ligands where remote_path is set and local_path is None, fetches distinct remotes in parallel with deeporigin.platform.files.FilesClient.download_many, assigns each ligand's local_path from the returned remote→local mapping, then reloads mol from SDF when applicable.

Ligands without remote_path, with empty remote_path, or that already have local_path set are left unchanged.

Parameters:

Name Type Description Default
client Optional[DeepOriginClient]

DeepOrigin client. If None, uses DeepOriginClient().

None
lazy bool

Passed to download_many; when True, existing cache files are reused.

True
max_workers int

Maximum concurrent downloads for download_many.

20
skip_errors bool

When False (default), any failed download in the batch raises. When True, per-ligand failures during path assignment or SDF reload are skipped and that ligand keeps prior state.

False
sanitize bool

Passed to Ligand.from_sdf when rehydrating SDF downloads.

True
remove_hydrogens bool

Passed to Ligand.from_sdf when rehydrating.

False

Raises:

Type Description
RuntimeError

If skip_errors is False and download_many reports failures.
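
The selection rule for download (remote_path set and non-empty, local_path still None, each distinct remote fetched once) can be sketched without the platform. StubLigand, needs_download, and the /cache/ paths below are hypothetical stand-ins for illustration; the real method delegates the transfer to download_many.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubLigand:
    """Hypothetical stand-in exposing only the two path attributes."""

    name: str
    remote_path: Optional[str] = None
    local_path: Optional[str] = None


def needs_download(ligand: StubLigand) -> bool:
    # Qualifies only with a non-empty remote_path and no local file yet.
    return bool(ligand.remote_path) and ligand.local_path is None


ligands = [
    StubLigand("a", remote_path="poses/a.sdf"),
    StubLigand("b", remote_path="poses/a.sdf"),  # shares a remote with "a"
    StubLigand("c", remote_path="poses/c.sdf", local_path="/cache/c.sdf"),
    StubLigand("d", remote_path=""),  # empty remote_path: left unchanged
    StubLigand("e"),  # no remote_path: left unchanged
]

pending = [lig for lig in ligands if needs_download(lig)]

# Distinct remotes in first-seen order, so a shared file is requested once.
distinct_remotes = list(dict.fromkeys(lig.remote_path for lig in pending))

# Hypothetical remote -> local mapping, standing in for the download result.
fetched = {remote: f"/cache/{remote}" for remote in distinct_remotes}
for lig in pending:
    lig.local_path = fetched[lig.remote_path]
```

In this sketch only "a" and "b" are pending, and both end up pointing at the same local file.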

embed

embed()

Minimize all ligands in the set using their 3D optimization routines. This calls the embed() method on each Ligand in the set.

filter_top_poses

filter_top_poses(*, by_pose_score: bool = True) -> Self

Filter ligands to keep only the best pose for each unique molecule.

Groups ligands by SMILES string and retains only the one with:

  - Maximum pose score (default, by_pose_score=True), or
  - Minimum binding energy (when by_pose_score=False)

Parameters:

Name Type Description Default
by_pose_score bool

If True, select by maximum pose score. If False, select by minimum binding energy.

True

Returns:

Name Type Description
LigandSet Self

A new LigandSet containing only the best pose for each unique molecule.

Raises:

Type Description
DeepOriginException

If required properties are missing from ligands.
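
The grouping rule can be sketched on plain dictionaries: group by SMILES, then keep the maximum pose score when by_pose_score is True or the minimum binding energy when it is False. The helper top_poses and the dictionary layout are hypothetical; the keys "POSE SCORE" and "Binding Energy" are borrowed from the default property names used by plot, and it is an assumption that filter_top_poses reads the same properties.

```python
def top_poses(poses: list[dict], *, by_pose_score: bool = True) -> list[dict]:
    """Keep one pose per SMILES string (ties keep the first pose seen)."""
    best: dict[str, dict] = {}
    for pose in poses:
        key = pose["smiles"]
        current = best.get(key)
        if current is None:
            best[key] = pose
        elif by_pose_score:
            # Higher pose score wins.
            if pose["POSE SCORE"] > current["POSE SCORE"]:
                best[key] = pose
        elif pose["Binding Energy"] < current["Binding Energy"]:
            # Lower (more negative) binding energy wins.
            best[key] = pose
    return list(best.values())


poses = [
    {"smiles": "CCO", "POSE SCORE": 0.6, "Binding Energy": -7.1},
    {"smiles": "CCO", "POSE SCORE": 0.9, "Binding Energy": -6.2},
    {"smiles": "c1ccccc1", "POSE SCORE": 0.4, "Binding Energy": -5.0},
]

by_score = top_poses(poses, by_pose_score=True)
by_energy = top_poses(poses, by_pose_score=False)
```

The two criteria can disagree: for "CCO" the score-based selection keeps the second pose, while the energy-based selection keeps the first.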

Example

Filter by pose score (default)

filtered_ligands = ligand_set.filter_top_poses()

Filter by binding energy

filtered_ligands = ligand_set.filter_top_poses(by_pose_score=False)

filter_unsupported

filter_unsupported() -> Self

Return a new set excluding ligands whose molecules contain atom types outside deeporigin.drug_discovery.constants.SUPPORTED_ATOM_SYMBOLS (see Ligand.has_unsupported_atoms).

from_csv classmethod

from_csv(
    file_path: str | Path, smiles_column: str = "smiles"
) -> Self

Create a LigandSet instance from a CSV file containing SMILES strings and additional properties.

Parameters:

Name Type Description Default
file_path str | Path

The path to the CSV file.

required
smiles_column str

The name of the column containing SMILES strings. Defaults to "smiles".

'smiles'

Returns:

Name Type Description
LigandSet Self

A LigandSet instance containing Ligand objects created from the CSV file.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

DeepOriginException

If the CSV does not contain the specified smiles column or if SMILES strings are invalid.
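
As a rough usage sketch, the snippet below writes a small CSV with a smiles column plus one extra property column and wraps the from_csv call in a helper. The file name, the name column, and the load_ligands helper are hypothetical; the import path is taken from the class path shown at the top of this page, and the helper is not invoked here because it needs the deeporigin package installed.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    {"smiles": "CCO", "name": "ethanol"},
    {"smiles": "c1ccccc1", "name": "benzene"},
]

# Write a two-row CSV with a header line to a temporary directory.
csv_path = Path(tempfile.mkdtemp()) / "ligands.csv"
with csv_path.open("w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["smiles", "name"])
    writer.writeheader()
    writer.writerows(rows)


def load_ligands(path: Path):
    # Imported lazily so the file-writing part runs without the package.
    from deeporigin.drug_discovery import LigandSet

    return LigandSet.from_csv(path, smiles_column="smiles")
```

If the SMILES column has a different header in your file, pass that header as smiles_column instead.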

from_dir classmethod

from_dir(directory: str | Path) -> Self

Create a LigandSet instance from a directory containing SDF files.

from_file classmethod

from_file(
    file_path: str | Path,
    *,
    sanitize: bool = True,
    remove_hydrogens: bool = False,
    smiles_column: str = "smiles"
) -> Self

Create a LigandSet from an SDF or CSV file.

.sdf paths are validated as SDF (extension and content) and loaded with from_sdf. .csv paths are validated as CSV (extension) and loaded with from_csv.

Parameters:

Name Type Description Default
file_path str | Path

Path to an .sdf or .csv file.

required
sanitize bool

Whether to sanitize molecules (SDF only). Defaults to True.

True
remove_hydrogens bool

Whether to remove hydrogens (SDF only). Defaults to False.

False
smiles_column str

Name of the SMILES column (CSV only). Defaults to "smiles".

'smiles'

Returns:

Name Type Description
LigandSet Self

Ligands created from the file.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

DeepOriginException

If the path is not a supported file type or loading fails.
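
The extension-based dispatch described for from_file can be sketched as a small helper that names the loader to use. The function pick_loader is hypothetical, lowercasing the suffix is an assumption of this sketch, and ValueError stands in for the DeepOriginException that the real method raises. The SDF content validation is not reproduced.

```python
from pathlib import Path


def pick_loader(file_path: "str | Path") -> str:
    """Return the name of the classmethod that would handle file_path."""
    suffix = Path(file_path).suffix.lower()  # case-insensitivity is assumed
    if suffix == ".sdf":
        return "from_sdf"
    if suffix == ".csv":
        return "from_csv"
    # Stand-in for DeepOriginException in the real method.
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")


sdf_loader = pick_loader("poses/docked.sdf")
csv_loader = pick_loader(Path("library.csv"))
```

The keyword arguments then apply per format: sanitize and remove_hydrogens matter only on the SDF branch, smiles_column only on the CSV branch.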

from_ids classmethod

from_ids(
    ids: list[str],
    *,
    client: DeepOriginClient | None = None,
    download: bool = True,
    ligand_inputs: list[dict[str, Any]] | None = None
) -> Self

Create a LigandSet by fetching ligands from the platform by ID.

Parameters:

Name Type Description Default
ids list[str]

List of Deep Origin Data Platform ligand IDs.

required
client DeepOriginClient | None

Optional API client. Uses the default if not provided.

None
download bool

If True (default), download mol files when present. If False, hydrate from SMILES and set remote_path from the record (or mol_file on the matching ligand_inputs row) without downloading.

True
ligand_inputs list[dict[str, Any]] | None

Optional list of dicts (e.g. execution userInputs.ligands) keyed by id; mol_file on a row overrides the API path when download is False.

None

Returns:

Type Description
Self

A new LigandSet containing the rehydrated ligands.

Notes

This delegates entity retrieval to client.entities.get_ligands() and preserves the order of the requested IDs.

from_json classmethod

from_json(
    data: list[dict[str, Any]],
    *,
    client: Optional[DeepOriginClient] = None,
    sanitize: bool = True,
    remove_hydrogens: bool = False
) -> Self

Build a LigandSet from pose metadata (no SDF download).

Each entry follows the docking poses[] shape: a file_path that is either a platform remote key or an existing local SDF, optional local_path / remote_path, plus optional smiles / canonical_smiles. When only a remote path is available, a non-empty SMILES field is required so an RDKit molecule can be built; the pose SDF is then loaded lazily via Ligand.download.

Parameters:

Name Type Description Default
data list[dict[str, Any]]

List of pose dicts (for example merged jobOutputs.poses).

required
client Optional[DeepOriginClient]

Optional client; project_id falls back to the client's project_id when missing on an entry.

None
sanitize bool

Passed to from_sdf when a local SDF path is used.

True
remove_hydrogens bool

Passed to from_sdf when a local SDF path is used.

False

Returns:

Name Type Description
LigandSet Self

A LigandSet with one Ligand per pose entry, ordered as in data.

Raises:

Type Description
ValueError

If a remote-only entry lacks SMILES, or paths are invalid.
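
The remote-only rule can be sketched as a check over a single pose dict. The helper check_pose_entry is hypothetical, and the order in which it consults local_path, file_path, smiles, and canonical_smiles is an assumption of this sketch rather than the documented behavior.

```python
import os


def check_pose_entry(entry: dict) -> str:
    """Classify a pose entry as 'local' or 'remote', enforcing the SMILES rule."""
    file_path = entry.get("file_path") or ""
    # Treat the entry as local when local_path is given or file_path
    # points at an existing file on disk (assumed precedence).
    if entry.get("local_path") or os.path.isfile(file_path):
        return "local"
    smiles = entry.get("smiles") or entry.get("canonical_smiles")
    if not smiles:
        raise ValueError("remote-only pose entry needs a non-empty SMILES")
    return "remote"


remote_ok = check_pose_entry(
    {"file_path": "remote/key/pose_0.sdf", "canonical_smiles": "CCO"}
)
local_ok = check_pose_entry(
    {"file_path": "remote/key/pose_1.sdf", "local_path": "/tmp/pose_1.sdf"}
)
```

An entry such as {"file_path": "remote/key/pose_2.sdf", "smiles": ""} fails the check, which mirrors the ValueError listed above for remote-only entries without SMILES.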

from_rdkit_mols classmethod

from_rdkit_mols(mols: list[Mol])

Create a LigandSet from a list of RDKit molecules.

from_result classmethod

from_result(
    *,
    protein_id: str | None = None,
    execution_id: str | None = None,
    best_pose: bool | None = None,
    client: Optional[DeepOriginClient] = None
) -> Self

Load docking poses from the data platform without downloading SDF files.

Fetches result-explorer rows via client.results.get_poses, resolves SMILES from each row's compute_job_id execution (userInputs) when absent on the pose payload, then delegates to from_json.

Parameters:

Name Type Description Default
protein_id str | None

Optional protein id filter.

None
execution_id str | None

Optional compute job / execution id filter.

None
best_pose bool | None

If True, restrict to best pose per ligand. If False, include all poses. If None (default), no filter is applied.

None
client Optional[DeepOriginClient]

Optional DeepOriginClient; defaults to a new client.

None

Returns:

Type Description
Self

A LigandSet with one ligand per pose record.

Raises:

Type Description
ValueError

If no pose records match, or a pose cannot be built.

from_sdf classmethod

from_sdf(
    file_path: str | Path,
    *,
    sanitize: bool = True,
    remove_hydrogens: bool = False
) -> Self

Create a LigandSet instance from an SDF file containing one or more molecules.

Parameters:

Name Type Description Default
file_path str | Path

The path to the SDF file.

required
sanitize bool

Whether to sanitize molecules. Defaults to True.

True
remove_hydrogens bool

Whether to remove hydrogens. Defaults to False.

False

Returns:

Name Type Description
LigandSet Self

A LigandSet instance containing Ligand objects created from the SDF file.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

DeepOriginException

If the file cannot be parsed correctly.

from_sdf_files classmethod

from_sdf_files(
    file_paths: list[str],
    *,
    sanitize: bool = True,
    remove_hydrogens: bool = False
) -> Self

Create a LigandSet instance from multiple SDF files by concatenating them together.

Parameters:

Name Type Description Default
file_paths list[str]

A list of paths to SDF files.

required
sanitize bool

Whether to sanitize molecules. Defaults to True.

True
remove_hydrogens bool

Whether to remove hydrogens. Defaults to False.

False

Returns:

Name Type Description
LigandSet Self

A LigandSet instance containing Ligand objects from all SDF files.

Raises:

Type Description
FileNotFoundError

If any of the files do not exist.

DeepOriginException

If any of the files cannot be parsed correctly.

from_smiles classmethod

from_smiles(smiles: list[str] | set[str]) -> Self

Create a LigandSet from a list of SMILES strings.

Parameters:

Name Type Description Default
smiles list[str] | set[str]

SMILES strings to convert into ligands.

required

Returns:

Type Description
Self

A new LigandSet containing one Ligand per SMILES string.

map_network

map_network(
    *,
    use_cache: bool = True,
    operation: Literal[
        "mapping", "network", "full"
    ] = "network",
    network_type: Literal["star", "mst", "cyclic"] = "mst"
)

Map a network of ligands from an SDF file using the DeepOrigin API.

mcs

mcs() -> Mol

Generates the Maximum Common Substructure (MCS) for the ligands in a LigandSet.

Returns:

Type Description
Mol

smartsString (str) : SMARTS string representing the MCS

plot

plot(
    *,
    x_label: str = "Pose Score",
    y_label: str = "Binding Energy (kcal/mol)",
    x: str = "POSE SCORE",
    y: str = "Binding Energy",
    output_file: Optional[str] = None,
    y_lim_max: Optional[float] = 0,
    width: int = 800,
    height: int = 800
)

Create a scatter plot of ligands using specified attributes for the axes.

The plot displays molecule images on hover and can be displayed inline or saved to an HTML file.

Parameters:

Name Type Description Default
x_label str

Label for the x-axis. Defaults to "Pose Score".

'Pose Score'
y_label str

Label for the y-axis. Defaults to "Binding Energy (kcal/mol)".

'Binding Energy (kcal/mol)'
x str

The name of the ligand property to use for the x-axis. Defaults to "POSE SCORE".

'POSE SCORE'
y str

The name of the ligand property to use for the y-axis. Defaults to "Binding Energy".

'Binding Energy'
output_file Optional[str]

Optional file path to save the HTML figure. If provided, the plot is saved to this file instead of being displayed. Defaults to None.

None

Raises:

Type Description
ValueError

If the specified x or y properties are not found in the ligand data.

prepare

prepare(*, remove_hydrogens: bool = False) -> Self

Prepare all ligands in the set for downstream workflows.

This calls the prepare() method on each Ligand in the set, which performs:

  - Salt removal
  - Kekulization
  - Fragment validation (rejects multiple non-identical fragments)
  - Validation of atom types against supported symbols

Parameters:

Name Type Description Default
remove_hydrogens bool

Whether to remove hydrogens from the SMILES representation. Defaults to False (preserve hydrogens).

False

Returns:

Name Type Description
LigandSet Self

The prepared LigandSet (self), for chaining.

Raises:

Type Description
DeepOriginException

If preparation fails for any ligand, unsupported atom types are present, or multiple non-identical fragments are detected.

random_sample

random_sample(n: int) -> Self

Return a new LigandSet containing n randomly selected ligands.

Parameters:

Name Type Description Default
n int

Number of ligands to randomly sample.

required

Returns:

Name Type Description
LigandSet Self

A new LigandSet with n randomly selected ligands.

Raises:

Type Description
ValueError

If n is greater than the total number of ligands.

show

show() -> str | None

Visualize all ligands in this LigandSet in 3D.

show_df

show_df()

Show ligands in the set in a dataframe with 2D visualizations.

show_grid

show_grid(
    mols_per_row: int = 3,
    sub_img_size: tuple[int, int] = (300, 300),
)

Show all ligands in the LigandSet in a grid.

show_network

show_network()

Show the network of ligands in the set.

sync

sync(
    *,
    lazy: bool = False,
    client: Optional[DeepOriginClient] = None
) -> None

Sync the ligand set to the data platform.

For every ligand in the set this method:

  1. Searches the data platform for existing ligands whose canonical_smiles match (batched into a single request via search_ligands(smiles_list=…)).
  2. For ligands that already exist remotely, updates the local id and sets remote_path from the record's mol_file when present.
  3. For ligands that are new, uploads files to remote storage (if a local_path is present) and batch-creates them in a single API call. Ligands sharing a canonical SMILES (e.g. multiple poses of the same molecule in an SDF) are deduplicated before the create call; all duplicates end up pointing at the single platform record.
  4. Updates the local id values from the created records.

Note: The batch-create step is all-or-nothing: if it fails (e.g. network error, invalid data), none of the new ligands will receive an id.

Parameters:

Name Type Description Default
lazy bool

If True, skip syncing ligands that already have an id.

False
client Optional[DeepOriginClient]

DeepOriginClient instance. If None, uses DeepOriginClient().

None

Raises:

Type Description
DeepOriginException

If any ligand to be synced contains atom types outside deeporigin.drug_discovery.constants.SUPPORTED_ATOM_SYMBOLS.

ValueError

If any ligand to be synced has no canonical_smiles.
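
The id-assignment flow of sync (reuse ids for ligands that already exist, deduplicate new ligands by canonical SMILES, create them in one call, then fan the new ids back out) can be sketched with in-memory stand-ins. StubLigand, assign_ids, fake_create, and the id strings are hypothetical; file uploads, remote_path handling, and the lazy flag are left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StubLigand:
    """Hypothetical stand-in carrying only the fields this sketch needs."""

    canonical_smiles: str
    id: Optional[str] = None


def assign_ids(
    ligands: list[StubLigand],
    existing: dict[str, str],
    create_many: Callable[[list[str]], list[str]],
) -> None:
    # Ligands already known remotely take the id of the matching record.
    for lig in ligands:
        if lig.canonical_smiles in existing:
            lig.id = existing[lig.canonical_smiles]
    # New ligands are deduplicated by canonical SMILES (first-seen order).
    missing = [lig for lig in ligands if lig.canonical_smiles not in existing]
    unique_smiles = list(dict.fromkeys(lig.canonical_smiles for lig in missing))
    # One batch-create call; if it raises, no new ligand receives an id.
    new_ids = create_many(unique_smiles) if unique_smiles else []
    created = dict(zip(unique_smiles, new_ids))
    # Every duplicate points at the single created record.
    for lig in missing:
        lig.id = created[lig.canonical_smiles]


calls: list[list[str]] = []


def fake_create(smiles_list: list[str]) -> list[str]:
    calls.append(list(smiles_list))
    return [f"new-{index}" for index in range(len(smiles_list))]


ligands = [StubLigand("CCO"), StubLigand("CCO"), StubLigand("c1ccccc1"), StubLigand("CCN")]
assign_ids(ligands, existing={"CCN": "lig-42"}, create_many=fake_create)
```

Here the two "CCO" poses share one created id, and the create function is called exactly once with two SMILES strings.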

to_dataframe

to_dataframe() -> DataFrame

Convert the LigandSet to a pandas DataFrame.

to_dict

to_dict() -> list[dict[str, str]]

Convert this set to a list of dicts, one per ligand.

Each dict has id (platform id when set, else "0", "1", … by position in this set) and smiles. For batched API calls with globally unique ids, ensure each Ligand id is set before building the set.

Returns:

Type Description
list[dict[str, str]]

One {"id": ..., "smiles": ...} dict per ligand, in order.

Raises:

Type Description
ValueError

If a ligand has no non-empty smiles.
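
The id fallback can be sketched on (id, smiles) pairs, which also shows why positional ids are not globally unique across batches. The helper to_records and the tuple layout are hypothetical stand-ins for the ligand objects.

```python
from typing import Optional


def to_records(ligands: list[tuple[Optional[str], str]]) -> list[dict[str, str]]:
    """Build one {"id", "smiles"} dict per (platform_id, smiles) pair."""
    records = []
    for position, (ligand_id, smiles) in enumerate(ligands):
        if not smiles:
            raise ValueError(f"ligand at position {position} has no SMILES")
        # Platform id when set, otherwise the position within this set.
        records.append({"id": ligand_id or str(position), "smiles": smiles})
    return records


first_batch = to_records([("lig-7", "CCO"), (None, "CCN")])
second_batch = to_records([(None, "c1ccccc1"), (None, "CC(=O)O")])
```

Both batches contain a record with id "1", because the positional fallback restarts in every set; setting each ligand's id beforehand avoids that collision.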

to_rdkit_mols

to_rdkit_mols() -> list[Mol]

Convert all ligands in the set to RDKit molecules.

to_sdf

to_sdf(output_path: Optional[str] = None) -> str

Write all ligands to one SDF file, preserving properties from each mol.

This is a local operation. Each ligand must already be rehydrated if it has remote_path but no local file (call Ligand.download or LigandSet.download first, or use from_id(..., download=True)).

Parameters:

Name Type Description Default
output_path Optional[str]

Path to the output SDF file.

None

Returns:

Type Description
str

Path to the written SDF file.

to_smiles

to_smiles() -> list[str]

Convert all ligands in the set to SMILES strings.

upload

upload(
    *,
    client: Optional[DeepOriginClient] = None,
    max_workers: int = 20
) -> None

Upload structure files for ligands that have a local file.

For each ligand with non-None local_path, serializes and assigns remote_path (same contract as Ligand.upload), then uploads all files in parallel via deeporigin.platform.files.FilesClient.upload_many. Ligands without local_path are skipped.

Parameters:

Name Type Description Default
client Optional[DeepOriginClient]

DeepOrigin client. If None, uses DeepOriginClient().

None
max_workers int

Maximum concurrent uploads (passed to upload_many).

20