`deeporigin.drug_discovery.chemistry`¶

Contains functions for working with SDF files, SMILES strings, and RDKit molecules.

Functions¶

align ¶

align(
    *,
    mols: list[Mol],
    reference: Mol,
    mcs_mol: Mol,
    energy: float = 5
) -> list[list[dict]]

Aligns a set of molecules to a reference and returns MCS atom constraints.

Parameters:

Name	Type	Description	Default
`mols`	`list[Mol]`	Molecules to align.	required
`reference`	`Mol`	Reference molecule (with 3D coords).	required
`mcs_mol`	`Mol`	MCS molecule.	required
`energy`	`float`	Energy weight for constraints.	`5`

Returns:

Type	Description
`list[list[dict]]`	Constraints for each molecule.

canonicalize_smiles ¶

canonicalize_smiles(smiles: str) -> str

Canonicalize a SMILES string.

Parameters:

Name	Type	Description	Default
`smiles`	`str`	SMILES string.	required

Returns:

Name	Type	Description
`str`	`str`	Canonicalized SMILES string.

count_molecules_in_sdf_file ¶

count_molecules_in_sdf_file(sdf_file: str | Path) -> int

Count the number of valid (sanitizable) molecules in an SDF file using RDKit, while suppressing RDKit's error logging for sanitization issues.

Parameters:

Name	Type	Description	Default
`sdf_file`	`str \| Path`	Path to the SDF file.	required

Returns:

Name	Type	Description
`int`	`int`	The number of molecules successfully read in the SDF file.
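
To illustrate what is being counted: each record in an SDF file is terminated by a line containing `$$$$`. The following is a minimal sketch, not the library's implementation, that counts those terminators using only the standard library. Because `count_molecules_in_sdf_file` counts only molecules that RDKit can sanitize, this raw record count is an upper bound on its result. The helper name `count_sdf_records` and the sample text are hypothetical.

```python
def count_sdf_records(sdf_text: str) -> int:
    """Count SDF records by their `$$$$` terminator lines (hypothetical helper)."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


# Two illustrative records; the molblocks are abbreviated placeholders, not real structures
sample = (
    "mol_a\n  placeholder molblock\nM  END\n$$$$\n"
    "mol_b\n  placeholder molblock\nM  END\n$$$$\n"
)
n_records = count_sdf_records(sample)
```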

get_properties_in_sdf_file ¶

get_properties_in_sdf_file(sdf_file: str | Path) -> list

Returns a list of all user-defined properties in an SDF file.

Parameters:

Name	Type	Description	Default
`sdf_file`	`str \| Path`	Path to the SDF file.	required

Returns:

Name	Type	Description
`list`	`list`	A list of the names of all user-defined properties in the SDF file.
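
For context, user-defined properties are stored in an SDF file as data items: a header line that starts with `>` and carries the property name in angle brackets, followed by the value. The sketch below collects property names from raw SDF text with the standard library. It is an illustration of the file format under that assumption, not the library's implementation, and the ordering of names returned by `get_properties_in_sdf_file` may differ. The helper `property_names` and the sample records are hypothetical.

```python
import re

# An SDF data header line starts with ">" and carries the field name in angle brackets
_HEADER = re.compile(r"^>.*<([^<>]+)>")


def property_names(sdf_text: str) -> list[str]:
    """Collect unique property names in order of first appearance (hypothetical helper)."""
    names: list[str] = []
    for line in sdf_text.splitlines():
        match = _HEADER.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# Abbreviated placeholder records with made-up property values
sample = (
    "mol_a\nM  END\n>  <pIC50>\n6.2\n\n>  <series>\nA\n\n$$$$\n"
    "mol_b\nM  END\n>  <pIC50>\n7.1\n\n$$$$\n"
)
names = property_names(sample)
```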

mcs ¶

mcs(mols: list[Mol], *, timeout: int = 10) -> Mol

Generate the Maximum Common Substructure (MCS) for a list of molecules.

Returns:

Name	Type	Description
`Mol`	`Mol`	MCS molecule constructed from the SMARTS string.

merge_sdf_files ¶

merge_sdf_files(
    sdf_file_list: list[str],
    output_path: Optional[str] = None,
) -> str

Merge a list of SDF files into a single SDF file.

Parameters:

Name	Type	Description	Default
`sdf_file_list`	`list[str]`	List of paths to SDF files.	required

Returns:

Name	Type	Description
`str`	`str`	Path to the merged SDF file.
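
Since SDF is a record-oriented text format, merging amounts to writing the records of each input file one after another. The sketch below shows that idea with the standard library only. It is not the library's implementation: the helper `concat_sdf_files` is hypothetical, and this reference does not document where `merge_sdf_files` writes its output when `output_path` is `None`, so the sketch always takes an explicit output path.

```python
import tempfile
from pathlib import Path


def concat_sdf_files(paths: list[str], output_path: str) -> str:
    """Concatenate SDF files into one file (hypothetical helper)."""
    with open(output_path, "w") as out:
        for path in paths:
            text = Path(path).read_text()
            # Make sure each input ends with a newline so records do not run together
            out.write(text if text.endswith("\n") else text + "\n")
    return output_path


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.sdf"
    b = Path(tmp) / "b.sdf"
    # Abbreviated placeholder records
    a.write_text("mol_a\nM  END\n$$$$\n")
    b.write_text("mol_b\nM  END\n$$$$")  # no trailing newline
    merged = concat_sdf_files([str(a), str(b)], str(Path(tmp) / "merged.sdf"))
    merged_text = Path(merged).read_text()
```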

preprocess_mol ¶

preprocess_mol(mol: Mol) -> Mol

Preprocess a molecule for MCS.

Parameters:

Name	Type	Description	Default
`mol`	`Mol`	RDKit molecule	required

Returns:

Type	Description
`Mol`	Preprocessed molecule.

read_molecules_in_sdf_file ¶

read_molecules_in_sdf_file(
    sdf_file: str | Path,
) -> list[dict]

Reads an SDF file containing one or more molecules, and for each molecule:

- Extracts the SMILES string
- Extracts all user-defined properties

Returns:

Type	Description
`list[dict]`	A list of dictionaries, where each dictionary has the keys "smiles" (str) and "properties" (dict).
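
As a sketch of how the documented return shape can be consumed, the example below flattens each entry into a single row, which is convenient for building a table. The `molecules` list is a hypothetical stand-in for the return value of `read_molecules_in_sdf_file`, with made-up values, and `to_rows` is a hypothetical helper.

```python
# Hypothetical return value with the documented shape; the values are made up for illustration
molecules = [
    {"smiles": "CCO", "properties": {"series": "A"}},
    {"smiles": "c1ccccc1", "properties": {"series": "B", "note": "aromatic"}},
]


def to_rows(mols: list[dict]) -> list[dict]:
    """Flatten each entry into one row: the SMILES string plus its properties.

    A property that is itself named "smiles" would overwrite the SMILES key.
    """
    return [{"smiles": m["smiles"], **m["properties"]} for m in mols]


rows = to_rows(molecules)
```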

read_property_values ¶

read_property_values(sdf_file: str | Path, key: str)

Given an SDF file with more than one molecule, return the value of the given property for each molecule.

Parameters:

Name	Type	Description	Default
`sdf_file`	`str \| Path`	Path to the SDF file.	required
`key`	`str`	The key of the property to read.	required

read_sdf_properties ¶

read_sdf_properties(sdf_file: str | Path) -> dict

Reads all user-defined properties from an SDF file (single molecule) and returns them as a dictionary.

Parameters:

Name	Type	Description	Default
`sdf_file`	`str \| Path`	Path to the SDF file.	required

safe_substruct_match ¶

safe_substruct_match(
    mol: Mol, query: Mol, label: str
) -> list[int]

Safely get a substructure match for a molecule.

Parameters:

Name	Type	Description	Default
`mol`	`Mol`	RDKit molecule	required
`query`	`Mol`	Query molecule	required
`label`	`str`	Label for the molecule	required

Returns:

Type	Description
`list[int]`	List of atom indices that match the query.

sdf_to_smiles ¶

sdf_to_smiles(sdf_file: str | Path) -> list[str]

Extracts the SMILES strings of all valid molecules from an SDF file using RDKit.

Parameters:

Name	Type	Description	Default
`sdf_file`	`str \| Path`	Path to the SDF file.	required

Returns:

Type	Description
`list[str]`	A list of SMILES strings for all valid molecules in the file.

show_molecules_in_sdf_file ¶

show_molecules_in_sdf_file(sdf_file: str | Path)

Show molecules in an SDF file in a Jupyter notebook using molstar.

show_molecules_in_sdf_files ¶

show_molecules_in_sdf_files(sdf_files: list[str])

Show molecules in multiple SDF files in a Jupyter notebook using molstar.

smiles_list_to_base64_png_list ¶

smiles_list_to_base64_png_list(
    smiles_list: list[str],
    *,
    size: tuple[int, int] = (300, 100),
    scale_factor: int = 2,
    reference_smiles: Optional[str] = None
) -> list[str]

Convert a list of SMILES strings to a list of base64-encoded PNG tags.

This aligns images so that they have consistent core orientation.

Parameters:

Name	Type	Description	Default
`smiles_list`	`list[str]`	List of SMILES strings.	required
`size`	`tuple[int, int]`	(width, height) of the final rendered image in pixels (CSS downscaled).	`(300, 100)`
`scale_factor`	`int`	Factor to generate higher-resolution images internally.	`2`
`reference_smiles`	`Optional[str]`	If provided, all molecules will be oriented to match the 2D layout of this reference molecule.	`None`

smiles_to_base64_png ¶

smiles_to_base64_png(
    smiles: str, *, size=(300, 100), scale_factor: int = 2
) -> str

Convert a SMILES string to an inline base64 tag. Use this if you want to convert a single molecule into an image. If you want to convert a set of SMILES strings (corresponding to a set of related molecules) to images, use smiles_list_to_base64_png_list.

Parameters:

Name	Type	Description	Default
`smiles`	`str`	SMILES string.	required
`size`	`Tuple[int, int]`	(width, height) of the final rendered image in pixels (CSS downscaled).	`(300, 100)`
`scale_factor`	`int`	Factor to generate higher-resolution images internally.	`2`

smiles_to_sdf ¶

smiles_to_sdf(smiles: str, sdf_path: str) -> None

Convert a SMILES string to an SDF file.

Parameters:

Name	Type	Description	Default
`smiles`	`str`	SMILES string	required
`sdf_path`	`str`	Path to the SDF file	required

split_sdf_file ¶

split_sdf_file(
    *,
    input_sdf_path: str | Path,
    output_prefix: str = "ligand",
    output_dir: Optional[str | Path] = None,
    name_by_property: str = "_Name"
) -> list[Path]

Splits a multi-ligand SDF file into individual SDF files, optionally placing the output in a user-specified directory. Each output SDF is named using the molecule's name (if present) or a fallback prefix.

Parameters:

Name	Type	Description	Default
`input_sdf_path`	`str \| Path`	Path to the input SDF file containing multiple ligands.	required
`output_prefix`	`str`	Prefix for unnamed ligands. Defaults to "ligand".	`'ligand'`
`output_dir`	`Optional[str \| Path]`	Directory to write the output SDF files to. If None, output files are written to the same directory as input_sdf_path.	`None`

Returns:

Type	Description
`list[Path]`	A list of paths to the generated SDF files.
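
The splitting and naming logic described above can be sketched on raw SDF text. In an SDF record the first line of the molblock holds the molecule's title, which RDKit exposes as the `_Name` property, the default for `name_by_property`. The sketch below is not the library's implementation: the helpers `split_records` and `output_names` are hypothetical, and the `{prefix}_{index}` fallback naming is an assumed scheme that may differ from what `split_sdf_file` produces.

```python
def split_records(sdf_text: str) -> list[str]:
    """Split SDF text into records, each ending with its `$$$$` line."""
    records: list[str] = []
    current: list[str] = []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def output_names(records: list[str], output_prefix: str = "ligand") -> list[str]:
    """Name each record by its title line, else by prefix and index (assumed scheme)."""
    names = []
    for i, record in enumerate(records):
        title = record.splitlines()[0].strip()  # the first molblock line holds the name
        names.append(f"{title}.sdf" if title else f"{output_prefix}_{i}.sdf")
    return names


# Two abbreviated placeholder records: one titled, one with an empty title line
sample = "aspirin\nM  END\n$$$$\n\nM  END\n$$$$\n"
records = split_records(sample)
names = output_names(records)
```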
