
deeporigin.drug_discovery.LigandSet

A class representing a set of Ligand objects.

Attributes:

Name Type Description
ligands list[Ligand]

A list of Ligand instances contained in the set.

network dict

A dictionary containing the network of ligands estimated using Konnektor.

Attributes

ligands class-attribute instance-attribute

ligands: list[Ligand] = field(default_factory=list)

network class-attribute instance-attribute

network: dict = field(default_factory=dict)

Functions

add_hydrogens

add_hydrogens() -> None

Add hydrogens to all ligands in the set.

admet_properties

admet_properties(use_cache: bool = True)

Predict ADMET properties for all ligands in the set. This calls the admet_properties() method on each Ligand in the set. Returns a list of the results for each ligand. Shows a progress bar using tqdm.

compute_constraints

compute_constraints(
    *, reference: Ligand, mcs_mol=None
) -> list[list[dict]]

Align a set of ligands to a reference ligand

compute_rmsd

compute_rmsd()

Compute the pairwise RMSD between all ligands in the set.
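As a rough illustration of what a pairwise RMSD calculation involves, here is a minimal pure-Python sketch. It is not the library's implementation: it assumes every structure has the same atoms in the same order and performs no alignment or symmetry handling, and the helper names `rmsd` and `pairwise_rmsd` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def pairwise_rmsd(structures):
    """Symmetric matrix of RMSD values between every pair of structures."""
    n = len(structures)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # RMSD is symmetric, so fill both halves of the matrix at once.
            matrix[i][j] = matrix[j][i] = rmsd(structures[i], structures[j])
    return matrix


# Three toy two-atom structures (made-up coordinates).
structures = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, 3.0, 4.0)],
]
matrix = pairwise_rmsd(structures)
```

The diagonal is zero by construction, and identical structures also give zero off the diagonal.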

embed

embed()

Minimize all ligands in the set using their 3D optimization routines. This calls the embed() method on each Ligand in the set.

filter_top_poses

filter_top_poses(
    *, by_pose_score: bool = False
) -> LigandSet

Filter ligands to keep only the best pose for each unique molecule.

Groups ligands by their 'initial_smiles' property and retains only the one with:

- Minimum binding energy (default), or
- Maximum pose score (when by_pose_score=True)

Parameters:

Name Type Description Default
by_pose_score bool

If True, select by maximum pose score. If False (default), select by minimum binding energy.

False

Returns:

Name Type Description
LigandSet LigandSet

A new LigandSet containing only the best pose for each unique molecule.

Raises:

Type Description
DeepOriginException

If required properties are missing from ligands.

Example

Filter by binding energy (default):

filtered_ligands = ligand_set.filter_top_poses()

Filter by pose score:

filtered_ligands = ligand_set.filter_top_poses(by_pose_score=True)
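The selection rule described for filter_top_poses can be sketched in plain Python. This is an illustration of the grouping logic only, not the library's implementation: poses are represented as plain dicts, the keys `binding_energy` and `pose_score` are hypothetical stand-ins for whatever property names the library uses (only `initial_smiles` comes from the description), and a `KeyError` stands in for `DeepOriginException`.

```python
def keep_top_poses(poses, by_pose_score=False):
    """Keep the best pose per unique 'initial_smiles' value.

    `poses` is a list of dicts; 'binding_energy' and 'pose_score' are
    hypothetical property names used only for this illustration.
    """
    key = "pose_score" if by_pose_score else "binding_energy"
    best = {}
    for pose in poses:
        # A missing property raises KeyError, standing in for DeepOriginException.
        smiles = pose["initial_smiles"]
        value = pose[key]
        current = best.get(smiles)
        if current is None:
            best[smiles] = pose
        elif by_pose_score and value > current[key]:
            best[smiles] = pose  # higher pose score wins
        elif not by_pose_score and value < current[key]:
            best[smiles] = pose  # lower binding energy wins
    return list(best.values())


# Made-up poses: two for one molecule, one for another.
poses = [
    {"initial_smiles": "CCO", "binding_energy": -5.1, "pose_score": 0.6},
    {"initial_smiles": "CCO", "binding_energy": -6.3, "pose_score": 0.4},
    {"initial_smiles": "c1ccccc1", "binding_energy": -4.0, "pose_score": 0.9},
]
by_energy = keep_top_poses(poses)
by_score = keep_top_poses(poses, by_pose_score=True)
```

Note that the two criteria can disagree: in this toy data the pose with the lowest binding energy is not the one with the highest pose score.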

from_csv classmethod

from_csv(
    file_path: str, smiles_column: str = "smiles"
) -> LigandSet

Create a LigandSet instance from a CSV file containing SMILES strings and additional properties.

Parameters:

Name Type Description Default
file_path str

The path to the CSV file.

required
smiles_column str

The name of the column containing SMILES strings. Defaults to "smiles".

'smiles'

Returns:

Name Type Description
LigandSet LigandSet

A LigandSet instance containing Ligand objects created from the CSV file.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

DeepOriginException

If the CSV does not contain the specified smiles column or if SMILES strings are invalid.
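The documented error conditions for from_csv can be mirrored with the standard library. The sketch below is a hedged illustration of that validation flow, not the library's code: `read_smiles_rows` is a hypothetical helper, `ValueError` stands in for `DeepOriginException`, and SMILES validity is not checked because that would require a cheminformatics toolkit.

```python
import csv
import os
import tempfile


def read_smiles_rows(file_path, smiles_column="smiles"):
    """Return (smiles, other_properties) pairs read from a CSV file."""
    if not os.path.isfile(file_path):
        raise FileNotFoundError(file_path)
    with open(file_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if smiles_column not in (reader.fieldnames or []):
            # ValueError stands in for DeepOriginException in this sketch.
            raise ValueError(f"column {smiles_column!r} not found")
        rows = []
        for row in reader:
            smiles = row.pop(smiles_column)
            rows.append((smiles, row))  # remaining columns become properties
    return rows


# Write a small CSV to a temporary file and read it back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as tmp:
    tmp.write("smiles,name\nCCO,ethanol\nc1ccccc1,benzene\n")
    csv_path = tmp.name

rows = read_smiles_rows(csv_path)
os.remove(csv_path)
```

Every column other than the SMILES column is carried along as a property, which matches the description of a CSV "containing SMILES strings and additional properties".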

from_dir classmethod

from_dir(directory: str) -> LigandSet

Create a LigandSet instance from a directory containing SDF files.

from_rdkit_mols classmethod

from_rdkit_mols(mols: list[Mol])

Create a LigandSet from a list of RDKit molecules.

from_sdf classmethod

from_sdf(
    file_path: str,
    *,
    sanitize: bool = True,
    remove_hydrogens: bool = False
) -> LigandSet

Create a LigandSet instance from an SDF file containing one or more molecules.

Parameters:

Name Type Description Default
file_path str

The path to the SDF file.

required
sanitize bool

Whether to sanitize molecules. Defaults to True.

True
remove_hydrogens bool

Whether to remove hydrogens. Defaults to False.

False

Returns:

Name Type Description
LigandSet LigandSet

A LigandSet instance containing Ligand objects created from the SDF file.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

DeepOriginException

If the file cannot be parsed correctly.

from_sdf_files classmethod

from_sdf_files(
    file_paths: list[str],
    *,
    sanitize: bool = True,
    remove_hydrogens: bool = False
) -> LigandSet

Create a LigandSet instance from multiple SDF files by concatenating them together.

Parameters:

Name Type Description Default
file_paths list[str]

A list of paths to SDF files.

required
sanitize bool

Whether to sanitize molecules. Defaults to True.

True
remove_hydrogens bool

Whether to remove hydrogens. Defaults to False.

False

Returns:

Name Type Description
LigandSet LigandSet

A LigandSet instance containing Ligand objects from all SDF files.

Raises:

Type Description
FileNotFoundError

If any of the files do not exist.

DeepOriginException

If any of the files cannot be parsed correctly.

from_smiles classmethod

from_smiles(smiles: list[str] | set[str]) -> LigandSet

Create a LigandSet from a list or set of SMILES strings.

map_network

map_network(
    *,
    use_cache: bool = True,
    operation: Literal[
        "mapping", "network", "full"
    ] = "network",
    network_type: Literal["star", "mst", "cyclic"] = "mst"
)

Map a network of ligands from an SDF file using the DeepOrigin API.

mcs

mcs() -> str

Generates the Maximum Common Substructure (MCS) for the ligands in a LigandSet.

Returns:

Type Description
str

SMARTS string representing the MCS.

protonate

protonate(
    *, ph: number = 7.4, filter_percentage: number = 1.0
)

Protonate the LigandSet. Only the most abundant species is retained for each ligand.

random_sample

random_sample(n: int) -> LigandSet

Return a new LigandSet containing n randomly selected ligands.

Parameters:

Name Type Description Default
n int

Number of ligands to randomly sample

required

Returns:

Name Type Description
LigandSet LigandSet

A new LigandSet with n randomly selected ligands

Raises:

Type Description
ValueError

If n is greater than the total number of ligands
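The behaviour documented for random_sample resembles sampling without replacement with an explicit size check. A minimal sketch follows, assuming that interpretation (the ValueError condition suggests it, but the source does not state it). The helper name and the `seed` parameter are hypothetical and not part of the documented API.

```python
import random


def sample_without_replacement(items, n, seed=None):
    """Return n randomly chosen items, refusing to oversample."""
    if n > len(items):
        raise ValueError(f"cannot sample {n} items from a set of {len(items)}")
    rng = random.Random(seed)  # optional seed: a hypothetical convenience
    return rng.sample(items, n)


smiles = ["CCO", "CCN", "CCC", "c1ccccc1"]
subset = sample_without_replacement(smiles, 2, seed=0)

# Asking for more items than exist raises ValueError.
try:
    sample_without_replacement(smiles, 5)
    oversample_failed = False
except ValueError:
    oversample_failed = True
```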

show

show()

Visualize all ligands in this LigandSet in 3D

show_df

show_df()

Show ligands in the set in a dataframe with 2D visualizations.

show_grid

show_grid(
    mols_per_row: int = 3,
    sub_img_size: tuple[int, int] = (300, 300),
)

Show all ligands in the LigandSet in a grid.

show_network

show_network()

Show the network of ligands in the set.

to_dataframe

to_dataframe() -> DataFrame

Convert the LigandSet to a pandas DataFrame.

to_rdkit_mols

to_rdkit_mols() -> list[Mol]

Convert all ligands in the set to RDKit molecules.

to_sdf

to_sdf(output_path: Optional[str] = None) -> str

Write all ligands in the set to a single SDF file, preserving all properties from each Ligand's mol field.

Parameters:

Name Type Description Default
output_path Optional[str]

The path to the output SDF file.

None

Returns:

Name Type Description
str str

The path to the written SDF file.

to_smiles

to_smiles() -> list[str]

Convert all ligands in the set to SMILES strings.