
deeporigin.drug_discovery.LigandSet

A class representing a set of Ligand objects.

Attributes:

Name Type Description
ligands list[Ligand]

A list of Ligand instances contained in the set.

network dict

A dictionary containing the network of ligands estimated using Konnektor.

Attributes

ligands class-attribute instance-attribute

ligands: list[Ligand] = field(default_factory=list)

network class-attribute instance-attribute

network: dict = field(default_factory=dict)

Functions

add_hydrogens

add_hydrogens() -> None

Add hydrogens to all ligands in the set.

batches

batches(batch_size: int | None) -> list[list[Ligand]]

Split this set into consecutive chunks of ligands, in the same order as the ligands attribute.

Parameters:

Name Type Description Default
batch_size int | None

Maximum ligands per chunk. None returns a single chunk containing all ligands (including when the set is empty). Must be an int when not None; other types are rejected by runtime type checking. When the number of ligands is not a multiple of batch_size, the last batch is shorter (it holds the remainder only).

required

Returns:

Type Description
list[list[Ligand]]

A list holding a single batch with every ligand when batch_size is None (an empty batch if the set is empty); otherwise a list of one or more consecutive slices of the ligands attribute.

Raises:

Type Description
ValueError

If batch_size is set and not positive.

BeartypeCallHintParamViolation

If batch_size is neither None nor an int (e.g. a float or str).
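The chunking contract above can be illustrated with plain list slicing. This is a minimal standalone sketch, not the library's implementation: split_into_batches is a hypothetical helper that works on any list and omits the beartype type check. Unlike the documented method, it returns an empty list for an empty input with an integer batch_size.

```python
from __future__ import annotations


def split_into_batches(items: list, batch_size: int | None) -> list[list]:
    """Hypothetical helper mirroring the documented batches() contract."""
    if batch_size is None:
        # None means no batching: one chunk holding every item (possibly empty).
        return [list(items)]
    if batch_size <= 0:
        raise ValueError(f"batch_size must be positive, got {batch_size}")
    # Consecutive slices in the original order; the last one holds the remainder.
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


# Five ligand names in batches of two: the last batch holds the single leftover.
print(split_into_batches(["L1", "L2", "L3", "L4", "L5"], 2))
# [['L1', 'L2'], ['L3', 'L4'], ['L5']]
```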

compute_constraints

compute_constraints(
    *, reference: Ligand, mcs_mol=None
) -> list[list[dict]]

Align the ligands in this set to a reference ligand.

compute_rmsd

compute_rmsd()

Compute pairwise RMSD between all ligands in the set.

embed

embed()

Minimize all ligands in the set using their 3D optimization routines. This calls the embed() method on each Ligand in the set.

filter_top_poses

filter_top_poses(*, by_pose_score: bool = True) -> Self

Filter ligands to keep only the best pose for each unique molecule.

Groups ligands by SMILES string and, within each group, retains only the ligand with:

- Maximum pose score (default, by_pose_score=True), or
- Minimum binding energy (by_pose_score=False)

Parameters:

Name Type Description Default
by_pose_score bool

If True, select by maximum pose score. If False, select by minimum binding energy.

True

Returns:

Name Type Description
LigandSet Self

A new LigandSet containing only the best pose for each unique molecule.

Raises:

Type Description
DeepOriginException

If required properties are missing from ligands.
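The selection rule described above can be sketched over plain dicts. This is an illustration under stated assumptions, not the library's code: best_pose_per_molecule is a hypothetical helper, each pose is assumed to be a dict with a "smiles" key, and the property keys "POSE SCORE" and "Binding Energy" are borrowed from the plot() defaults, so the real method may read different names. It raises KeyError where the documented method raises DeepOriginException.

```python
from __future__ import annotations

from typing import Any

# Assumed property keys, borrowed from the plot() defaults.
POSE_SCORE = "POSE SCORE"
BINDING_ENERGY = "Binding Energy"


def best_pose_per_molecule(
    poses: list[dict[str, Any]], *, by_pose_score: bool = True
) -> list[dict[str, Any]]:
    """Keep one pose per SMILES: highest pose score, or lowest binding energy."""
    key = POSE_SCORE if by_pose_score else BINDING_ENERGY
    best: dict[str, dict[str, Any]] = {}
    for pose in poses:
        if key not in pose:
            # The documented method raises DeepOriginException here instead.
            raise KeyError(f"pose of {pose.get('smiles')!r} has no {key!r} property")
        current = best.get(pose["smiles"])
        if current is None:
            best[pose["smiles"]] = pose
            continue
        better = pose[key] > current[key] if by_pose_score else pose[key] < current[key]
        if better:
            best[pose["smiles"]] = pose
    # Dicts keep insertion order, so molecules come out in first-seen order.
    return list(best.values())
```

Grouping with a dict keyed by SMILES keeps the pass linear in the number of poses; replacing a value never changes a key's position, so the output order follows the first pose seen for each molecule.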

Example

Filter by pose score (default)

filtered_ligands = ligand_set.filter_top_poses()

Filter by binding energy

filtered_ligands = ligand_set.filter_top_poses(by_pose_score=False)

filter_unsupported

filter_unsupported() -> Self

Return a new set excluding ligands whose molecules contain atom types outside deeporigin.drug_discovery.constants.SUPPORTED_ATOM_SYMBOLS (see Ligand.has_unsupported_atoms).

from_csv classmethod

from_csv(
    file_path: str | Path, smiles_column: str = "smiles"
) -> Self

Create a LigandSet instance from a CSV file containing SMILES strings and additional properties.

Parameters:

Name Type Description Default
file_path str | Path

The path to the CSV file.

required
smiles_column str

The name of the column containing SMILES strings. Defaults to "smiles".

'smiles'

Returns:

Name Type Description
LigandSet Self

A LigandSet instance containing Ligand objects created from the CSV file.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

DeepOriginException

If the CSV does not contain the specified smiles column or if SMILES strings are invalid.
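The expected CSV layout (one SMILES column plus any number of property columns) can be shown with the standard csv module. This is a hypothetical sketch, not the library's parser: read_smiles_rows is an invented helper, it assumes every non-SMILES column becomes a per-ligand property kept as a string, and it does not validate SMILES (that needs RDKit). It raises ValueError where the documented method raises DeepOriginException.

```python
from __future__ import annotations

import csv
import io


def read_smiles_rows(text: str, smiles_column: str = "smiles") -> list[dict]:
    """Parse CSV text into one record per row, requiring a SMILES column."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or smiles_column not in reader.fieldnames:
        # The documented method raises DeepOriginException in this case.
        raise ValueError(f"CSV has no {smiles_column!r} column")
    rows = []
    for row in reader:
        smiles = row.pop(smiles_column)
        # Every remaining column is kept as a property; values stay strings.
        rows.append({"smiles": smiles, "properties": row})
    return rows


csv_text = "smiles,name,pIC50\nCCO,ethanol,5.2\nc1ccccc1,benzene,4.1\n"
for record in read_smiles_rows(csv_text):
    print(record["smiles"], record["properties"])
```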

from_dir classmethod

from_dir(directory: str | Path) -> Self

Create a LigandSet instance from a directory containing SDF files.

from_docking_result classmethod

from_docking_result(
    *,
    protein_id: str | None = None,
    execution_id: str | None = None,
    client: Optional[DeepOriginClient] = None
) -> Self

Create a LigandSet from docking results in the data platform.

Fetches docking pose results for the given protein or execution, downloads the SDF files, and loads them into a LigandSet.

Parameters:

Name Type Description Default
protein_id str | None

Protein ID to fetch docking results for.

None
execution_id str | None

Execution ID to fetch docking results for.

None
client Optional[DeepOriginClient]

Optional DeepOriginClient instance. If not provided, uses the default client.

None

Returns:

Type Description
Self

A LigandSet of docked poses.

Raises:

Type Description
ValueError

If no docking results are found for the given protein or execution.

from_docking_results classmethod

from_docking_results(
    *, result: FunctionResult, client: DeepOriginClient
) -> Self

Build a LigandSet from function-API docking responses (embedded pose paths).

Reads functionOutputs from each wrapped response, downloads pose SDF files via client.files, and merges the ligands with from_sdf_files. For hydrated poses fetched from the data platform by execution ID, use from_docking_result instead.

Parameters:

Name Type Description Default
result FunctionResult

FunctionResult wrapping one or more docking API responses.

required
client DeepOriginClient

Client used to download remote pose files.

required

Returns:

Type Description
Self

A LigandSet built from the downloaded SDF files.

from_file classmethod

from_file(
    file_path: str | Path,
    *,
    sanitize: bool = True,
    remove_hydrogens: bool = False,
    smiles_column: str = "smiles"
) -> Self

Create a LigandSet from an SDF or CSV file.

Paths ending in .sdf are validated as SDF (extension and content) and loaded with from_sdf. Paths ending in .csv are validated as CSV (extension only) and loaded with from_csv.

Parameters:

Name Type Description Default
file_path str | Path

Path to an .sdf or .csv file.

required
sanitize bool

Whether to sanitize molecules (SDF only). Defaults to True.

True
remove_hydrogens bool

Whether to remove hydrogens (SDF only). Defaults to False.

False
smiles_column str

Name of the SMILES column (CSV only). Defaults to "smiles".

'smiles'

Returns:

Name Type Description
LigandSet Self

Ligands created from the file.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

DeepOriginException

If the path is not a supported file type or loading fails.
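The extension-based dispatch described above can be sketched as a lookup table of loaders. This is a hypothetical helper, not the library's code: load_by_extension is an invented name, it checks only the extension (the documented method also validates SDF content), and it raises ValueError where the documented method raises DeepOriginException.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable


def load_by_extension(
    file_path: str | Path, loaders: dict[str, Callable[[Path], Any]]
) -> Any:
    """Dispatch to a loader chosen by file extension (hypothetical helper)."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"no such file: {path}")
    loader = loaders.get(path.suffix)
    if loader is None:
        # The documented method raises DeepOriginException for unsupported types.
        raise ValueError(
            f"unsupported extension {path.suffix!r}; expected one of {sorted(loaders)}"
        )
    return loader(path)
```

With the deeporigin package installed, the loaders mapping could be {".sdf": LigandSet.from_sdf, ".csv": LigandSet.from_csv}, since both accept a str | Path as their first argument; in practice from_file already does this dispatch for you.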

from_ids classmethod

from_ids(
    ids: list[str],
    *,
    client: DeepOriginClient | None = None,
    download: bool = True,
    ligand_inputs: list[dict[str, Any]] | None = None
) -> Self

Create a LigandSet by fetching ligands from the platform by ID.

Parameters:

Name Type Description Default
ids list[str]

List of Deep Origin Data Platform ligand IDs.

required
client DeepOriginClient | None

Optional API client. Uses the default if not provided.

None
download bool

If True (default), download mol files when present. If False, hydrate from SMILES and set remote_path from the record (or mol_file on the matching ligand_inputs row) without downloading.

True
ligand_inputs list[dict[str, Any]] | None

Optional list of dicts (e.g. execution userInputs.ligands) keyed by id; mol_file on a row overrides the API path when download is False.

None

Returns:

Type Description
Self

A new LigandSet containing the rehydrated ligands.

Notes

This delegates entity retrieval to client.entities.get_ligands() and preserves the order of the requested IDs.
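Preserving the requested order, as noted above, amounts to indexing the fetched records by ID and reading them back in request order. This is a minimal sketch over plain dicts: order_by_requested_ids is a hypothetical helper, records are assumed to carry an "id" key, and raising KeyError for IDs the platform did not return is an assumption, since the documentation does not say how missing IDs are handled.

```python
from __future__ import annotations

from typing import Any


def order_by_requested_ids(
    requested_ids: list[str], records: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Return fetched records in the order the caller asked for them."""
    by_id = {record["id"]: record for record in records}
    missing = [i for i in requested_ids if i not in by_id]
    if missing:
        # Assumption: the real method's handling of unknown IDs is undocumented.
        raise KeyError(f"no records returned for ids: {missing}")
    return [by_id[i] for i in requested_ids]


# The API may return records in any order; the result follows the request.
fetched = [{"id": "lig-b", "smiles": "CCN"}, {"id": "lig-a", "smiles": "CCO"}]
print([r["smiles"] for r in order_by_requested_ids(["lig-a", "lig-b"], fetched)])
```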

from_rdkit_mols classmethod

from_rdkit_mols(mols: list[Mol])

Create a LigandSet from a list of RDKit molecules.

from_sdf classmethod

from_sdf(
    file_path: str | Path,
    *,
    sanitize: bool = True,
    remove_hydrogens: bool = False
) -> Self

Create a LigandSet instance from an SDF file containing one or more molecules.

Parameters:

Name Type Description Default
file_path str | Path

The path to the SDF file.

required
sanitize bool

Whether to sanitize molecules. Defaults to True.

True
remove_hydrogens bool

Whether to remove hydrogens. Defaults to False.

False

Returns:

Name Type Description
LigandSet Self

A LigandSet instance containing Ligand objects created from the SDF file.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

DeepOriginException

If the file cannot be parsed correctly.

from_sdf_files classmethod

from_sdf_files(
    file_paths: list[str],
    *,
    sanitize: bool = True,
    remove_hydrogens: bool = False
) -> Self

Create a LigandSet instance from multiple SDF files by concatenating them together.

Parameters:

Name Type Description Default
file_paths list[str]

A list of paths to SDF files.

required
sanitize bool

Whether to sanitize molecules. Defaults to True.

True
remove_hydrogens bool

Whether to remove hydrogens. Defaults to False.

False

Returns:

Name Type Description
LigandSet Self

A LigandSet instance containing Ligand objects from all SDF files.

Raises:

Type Description
FileNotFoundError

If any of the files do not exist.

DeepOriginException

If any of the files cannot be parsed correctly.

from_smiles classmethod

from_smiles(smiles: list[str] | set[str]) -> Self

Create a LigandSet from a list or set of SMILES strings.

Parameters:

Name Type Description Default
smiles list[str] | set[str]

SMILES strings to convert into ligands.

required

Returns:

Type Description
Self

A new LigandSet containing one Ligand per SMILES string.

map_network

map_network(
    *,
    use_cache: bool = True,
    operation: Literal[
        "mapping", "network", "full"
    ] = "network",
    network_type: Literal["star", "mst", "cyclic"] = "mst"
)

Map a network of ligands from an SDF file using the DeepOrigin API.

mcs

mcs() -> Mol

Generate the maximum common substructure (MCS) of the ligands in the set.

Returns:

Type Description
Mol

An RDKit Mol representing the MCS.

plot

plot(
    *,
    x_label: str = "Pose Score",
    y_label: str = "Binding Energy (kcal/mol)",
    x: str = "POSE SCORE",
    y: str = "Binding Energy",
    output_file: Optional[str] = None,
    y_lim_max: Optional[float] = 0,
    width: int = 800,
    height: int = 800
)

Create a scatter plot of ligands using specified attributes for the axes.

The plot displays molecule images on hover and can be displayed inline or saved to an HTML file.

Parameters:

Name Type Description Default
x_label str

Label for the x-axis. Defaults to "Pose Score".

'Pose Score'
y_label str

Label for the y-axis. Defaults to "Binding Energy (kcal/mol)".

'Binding Energy (kcal/mol)'
x str

The name of the ligand property to use for the x-axis. Defaults to "POSE SCORE".

'POSE SCORE'
y str

The name of the ligand property to use for the y-axis. Defaults to "Binding Energy".

'Binding Energy'
output_file Optional[str]

Optional file path to save the HTML figure. If provided, the plot is saved to this file instead of being displayed. Defaults to None.

None

Raises:

Type Description
ValueError

If the specified x or y properties are not found in the ligand data.

prepare

prepare(*, remove_hydrogens: bool = False) -> Self

Prepare all ligands in the set for downstream workflows.

This calls the prepare() method on each Ligand in the set, which performs:

- Salt removal
- Kekulization
- Fragment validation (rejects multiple non-identical fragments)
- Validation of atom types against supported symbols

Parameters:

Name Type Description Default
remove_hydrogens bool

Whether to remove hydrogens from the SMILES representation. Defaults to False (preserve hydrogens).

False

Returns:

Name Type Description
LigandSet Self

The prepared LigandSet (self), for chaining.

Raises:

Type Description
DeepOriginException

If preparation fails for any ligand, unsupported atom types are present, or multiple non-identical fragments are detected.

protonate

protonate(
    *,
    ph: number = 7.4,
    filter_percentage: number = 1.0,
    use_cache: bool = True,
    client: Optional[DeepOriginClient] = None,
    quote: bool = False
) -> FunctionResult

Protonate all ligands in the set.

Returns a FunctionResult whose .ligands attribute contains the protonated ligands. When quote=True, .ligands is empty and .estimate gives the cost in dollars. Only the most abundant species is retained for each ligand.

Parameters:

Name Type Description Default
ph number

pH value at which to protonate. Defaults to 7.4.

7.4
filter_percentage number

Percentage threshold for filtering protonation states. Defaults to 1.0.

1.0
use_cache bool

Whether to use cached protonation results.

True
client Optional[DeepOriginClient]

DeepOrigin client instance. If None, uses DeepOriginClient().

None
quote bool

If True, request a cost estimate without executing.

False

Returns:

Name Type Description
FunctionResult FunctionResult

A FunctionResult with a .ligands attribute (list of Ligand).

random_sample

random_sample(n: int) -> Self

Return a new LigandSet containing n randomly selected ligands.
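Sampling n distinct ligands is sampling without replacement, which the standard library's random.Random.sample provides. This is a sketch under assumptions: sample_without_replacement is a hypothetical helper, and its optional seed parameter is an illustrative addition for reproducibility that the documented random_sample does not expose.

```python
from __future__ import annotations

import random
from typing import TypeVar

T = TypeVar("T")


def sample_without_replacement(items: list[T], n: int, seed: int | None = None) -> list[T]:
    """Pick n distinct items; seed is a hypothetical knob for reproducibility."""
    if n > len(items):
        raise ValueError(f"cannot sample {n} ligands from a set of {len(items)}")
    rng = random.Random(seed)
    # random.Random.sample never returns the same position twice.
    return rng.sample(items, n)


print(sample_without_replacement(["L1", "L2", "L3", "L4", "L5"], 2, seed=42))
```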

Parameters:

Name Type Description Default
n int

Number of ligands to randomly sample.

required

Returns:

Name Type Description
LigandSet Self

A new LigandSet with n randomly selected ligands.

Raises:

Type Description
ValueError

If n is greater than the total number of ligands.

show

show() -> str | None

Visualize all ligands in this LigandSet in 3D.

show_df

show_df()

Show ligands in the set in a dataframe with 2D visualizations.

show_grid

show_grid(
    mols_per_row: int = 3,
    sub_img_size: tuple[int, int] = (300, 300),
)

Show all ligands in the LigandSet in a grid.

show_network

show_network()

Show the network of ligands in the set.

sync

sync(
    *,
    lazy: bool = False,
    client: Optional[DeepOriginClient] = None
) -> None

Sync the ligand set to the data platform.

For every ligand in the set this method:

  1. Searches the data platform for existing ligands whose canonical_smiles match (batched into a single request via search_ligands(smiles_list=…)).
  2. For ligands that already exist remotely, updates the local id and sets remote_path from the record's mol_file when present.
  3. For ligands that are new, uploads files to remote storage (if a local_path is present) and batch-creates them in a single API call. Ligands sharing a canonical SMILES (e.g. multiple poses of the same molecule in an SDF) are deduplicated before the create call; all duplicates end up pointing at the single platform record.
  4. Updates the local id values from the created records.

Note: The batch-create step is all-or-nothing. If it fails (e.g. network error, invalid data), none of the new ligands receive an id.

Parameters:

Name Type Description Default
lazy bool

If True, skip syncing ligands that already have an id.

False
client Optional[DeepOriginClient]

DeepOriginClient instance. If None, uses DeepOriginClient().

None

Raises:

Type Description
DeepOriginException

If any ligand to be synced contains atom types outside deeporigin.drug_discovery.constants.SUPPORTED_ATOM_SYMBOLS.

ValueError

If any ligand to be synced has no canonical_smiles.
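The sync steps listed above (leaving out the file upload) can be sketched as an in-memory reconciliation keyed by canonical SMILES. This is an illustration only: LocalLigand, sync_ligands, the existing mapping (standing in for the search_ligands result) and the create_many callback (standing in for the single batch-create API call) are hypothetical names, not part of the deeporigin API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class LocalLigand:  # hypothetical stand-in for Ligand
    canonical_smiles: str | None
    id: str | None = None
    remote_path: str | None = None


def sync_ligands(
    ligands: list[LocalLigand],
    existing: dict[str, dict],  # canonical SMILES -> remote record (step 1)
    create_many: Callable[[list[str]], list[dict]],  # hypothetical batch create
    *,
    lazy: bool = False,
) -> None:
    # lazy=True skips ligands that already have an id.
    todo = [lig for lig in ligands if not (lazy and lig.id is not None)]
    for lig in todo:
        if not lig.canonical_smiles:
            raise ValueError("ligand has no canonical_smiles")
    new_groups: dict[str, list[LocalLigand]] = {}
    for lig in todo:
        record = existing.get(lig.canonical_smiles)
        if record is not None:
            # Step 2: reuse the existing platform record.
            lig.id = record["id"]
            if record.get("mol_file"):
                lig.remote_path = record["mol_file"]
        else:
            # Step 3: group new ligands so duplicate SMILES are created once.
            new_groups.setdefault(lig.canonical_smiles, []).append(lig)
    if not new_groups:
        return
    # One all-or-nothing call: if it raises, no new ligand gets an id.
    created = create_many(list(new_groups))
    # Step 4: every duplicate points at the single created record.
    for record in created:
        for lig in new_groups[record["canonical_smiles"]]:
            lig.id = record["id"]
```

Grouping before the create call is what lets several poses of one molecule share a single platform record, and assigning ids only after create_many returns keeps the all-or-nothing behavior from the note above.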

to_dataframe

to_dataframe() -> DataFrame

Convert the LigandSet to a pandas DataFrame.

to_dict

to_dict() -> list[dict[str, str]]

Convert this set to a list of dicts, one per ligand.

Each dict has id (the platform id when set, otherwise "0", "1", … by position in this set) and smiles. For batched API calls that need globally unique ids, ensure each Ligand's id is set before building the set.

Returns:

Type Description
list[dict[str, str]]

One {"id": ..., "smiles": ...} dict per ligand, in order.

Raises:

Type Description
ValueError

If a ligand has no non-empty smiles.
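The id fallback described above can be sketched with a small stand-in class. SimpleLigand and ligands_to_dicts are hypothetical names used only for illustration. The sketch also shows why the documentation recommends setting ids before batching: positional ids restart at "0" in every set, so two batches built without platform ids produce colliding ids.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SimpleLigand:  # hypothetical stand-in for Ligand
    smiles: str | None
    id: str | None = None


def ligands_to_dicts(ligands: list[SimpleLigand]) -> list[dict[str, str]]:
    out = []
    for index, lig in enumerate(ligands):
        if not lig.smiles:
            raise ValueError(f"ligand at position {index} has no SMILES")
        # Fall back to the position in this set when no platform id is set.
        ligand_id = lig.id if lig.id is not None else str(index)
        out.append({"id": ligand_id, "smiles": lig.smiles})
    return out


# Two separate batches without platform ids both start at "0".
print(ligands_to_dicts([SimpleLigand("CCO")]), ligands_to_dicts([SimpleLigand("CCN")]))
```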

to_rdkit_mols

to_rdkit_mols() -> list[Mol]

Convert all ligands in the set to RDKit molecules.

to_sdf

to_sdf(output_path: Optional[str] = None) -> str

Write all ligands to one SDF file, preserving properties from each mol.

This is a local operation. Each ligand that has a remote_path but no local file must be rehydrated first (call Ligand.download, or use from_id(..., download=True)).

Parameters:

Name Type Description Default
output_path Optional[str]

Path to the output SDF file.

None

Returns:

Type Description
str

Path to the written SDF file.

to_smiles

to_smiles() -> list[str]

Convert all ligands in the set to SMILES strings.

upload

upload(
    *,
    client: Optional[DeepOriginClient] = None,
    max_workers: int = 20
) -> None

Upload structure files for ligands that have a local file.

For each ligand with a non-None Ligand.local_path, serializes the ligand and assigns Ligand.remote_path (same contract as Ligand.upload), then uploads all files in parallel via deeporigin.platform.files.FilesClient.upload_many. Ligands without a local_path are skipped.

Parameters:

Name Type Description Default
client Optional[DeepOriginClient]

DeepOrigin client. If None, uses DeepOriginClient().

None
max_workers int

Maximum concurrent uploads (passed to upload_many).

20
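The skip-then-upload-in-parallel pattern described above can be sketched with concurrent.futures.ThreadPoolExecutor, where max_workers plays the same role as the documented parameter. This is a hypothetical illustration: FileLigand, upload_all, the upload_one callback and the "ligands/<name>.sdf" remote naming are all invented, since the real remote_path contract is defined by Ligand.upload and is not shown here.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class FileLigand:  # hypothetical stand-in for Ligand
    name: str
    local_path: str | None = None
    remote_path: str | None = None


def upload_all(
    ligands: list[FileLigand],
    upload_one: Callable[[str, str], None],  # hypothetical (local, remote) uploader
    *,
    remote_prefix: str = "ligands",  # hypothetical remote layout
    max_workers: int = 20,
) -> None:
    pending = []
    for lig in ligands:
        if lig.local_path is None:
            continue  # nothing local to upload
        # Assign remote_path before uploading, as the description above states.
        lig.remote_path = f"{remote_prefix}/{lig.name}.sdf"
        pending.append((lig.local_path, lig.remote_path))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the results so any upload exception is re-raised here.
        list(pool.map(lambda pair: upload_one(*pair), pending))
```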