deeporigin.drug_discovery.chemistry
Contains functions for working with SDF files, SMILES strings, and pose RMSD calculations.
Functions
canonicalize_smiles
canonicalize_smiles(smiles: str) -> str
Canonicalize a SMILES string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`smiles` | `str` | SMILES string. | required |
Returns:
Name | Type | Description |
---|---|---|
`str` | `str` | Canonicalized SMILES string. |
count_molecules_in_sdf_file
count_molecules_in_sdf_file(sdf_file: str | Path) -> int
Count the number of valid (sanitizable) molecules in an SDF file using RDKit, while suppressing RDKit's error logging for sanitization issues.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sdf_file` | `str \| Path` | Path to the SDF file. | required |
Returns:
Name | Type | Description |
---|---|---|
`int` | `int` | The number of molecules successfully read in the SDF file. |
full_graph_map
full_graph_map(
mol_a: Mol, mol_b: Mol, ignore_hs: bool = True
) -> Optional[list[Tuple[int, int]]]
Return an atom map for identical (isomorphic) molecular graphs.
group_by_prop_smiles_to_multiconf
group_by_prop_smiles_to_multiconf(
sdf_path: str,
*,
smiles_prop_name: str = "SMILES",
keep_hs: bool = False,
align_conformers: bool = True,
skip_no_coords: bool = True
) -> dict[str, Mol]
Read an SDF that contains many poses (possibly for multiple ligands) and group them by an SDF property (default: "SMILES", set with smiles_prop_name).
Returns:
dict[str, Chem.Mol]: {prop_smiles_value -> Mol with N conformers}
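The grouping step can be pictured without RDKit. The sketch below is an illustration only, not the library's implementation: `group_poses` and the `(prop_value, coords)` tuple layout are hypothetical, and plain coordinate lists stand in for the conformers that the real function stores on RDKit Mol objects.

```python
from collections import defaultdict

Coords = list[tuple[float, float, float]]


def group_poses(poses: list[tuple[str, Coords]]) -> dict[str, list[Coords]]:
    """Group poses by their property value, preserving input order."""
    grouped: dict[str, list[Coords]] = defaultdict(list)
    for prop_value, coords in poses:
        grouped[prop_value].append(coords)
    return dict(grouped)


# Three poses: two for the ligand tagged "CCO", one for "CCN".
poses = [
    ("CCO", [(0.0, 0.0, 0.0)]),
    ("CCN", [(1.0, 0.0, 0.0)]),
    ("CCO", [(0.0, 1.0, 0.0)]),
]
grouped = group_poses(poses)
```

Each key ends up with as many entries as there were poses carrying that property value, which mirrors the "Mol with N conformers" shape of the documented return value.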
mcs_map
mcs_map(
mol_a: Mol,
mol_b: Mol,
ignore_hs: bool = True,
ring_matches_ring_only: bool = True,
complete_rings_only: bool = True,
match_valences: bool = True,
match_chiral_tag: bool = False,
timeout: int = 10,
) -> Optional[list[Tuple[int, int]]]
Return an atom map for the maximum common substructure (subset comparison).
pairwise_pose_rmsd
pairwise_pose_rmsd(
mols: Sequence[Mol],
*,
conf_id: int = 0,
ignore_hs: bool = True,
use_mcs_if_needed: bool = True,
fill_value_for_unmapped: float = nan
)
Compute an N×N matrix of pose-sensitive RMSDs (no alignment) for N molecules. If two molecules cannot be mapped, the entry is fill_value_for_unmapped (default NaN).
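How the fill value ends up in the matrix can be sketched in plain Python. This is a hedged illustration, not the library's code: `pairwise_matrix` and `toy_distance` are hypothetical names, the pair function stands in for a per-pair RMSD that may return None, and the sketch assumes a symmetric matrix with a zero diagonal.

```python
import math
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def pairwise_matrix(
    items: Sequence[T],
    pair_fn: Callable[[T, T], Optional[float]],
    fill_value: float = math.nan,
) -> list[list[float]]:
    """Build a symmetric N x N matrix; pairs with no result get fill_value."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = pair_fn(items[i], items[j])
            entry = fill_value if value is None else value
            matrix[i][j] = entry
            matrix[j][i] = entry
    return matrix


def toy_distance(a: float, b: float) -> Optional[float]:
    """Stand-in for a per-pair RMSD: negative inputs count as 'unmappable'."""
    if a < 0 or b < 0:
        return None
    return abs(a - b)


matrix = pairwise_matrix([1.0, 3.0, -1.0], toy_distance)
```

Because NaN never compares equal to itself, downstream code should test for unmapped entries with `math.isnan` rather than `==`.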
pose_rmsd
pose_rmsd(
mol_a: Mol,
mol_b: Mol,
*,
conf_id_a: int = 0,
conf_id_b: int = 0,
ignore_hs: bool = True,
use_mcs_if_needed: bool = True
) -> Optional[float]
Pose-sensitive RMSD: NO alignment, NO centering. High if the same structure is translated/rotated. Tries full-graph mapping; if that fails and use_mcs_if_needed=True, uses MCS subset mapping. Returns None if no mapping found.
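The mapping fallback described above can be sketched as a small control-flow helper. The names here (`select_atom_map` and the two zero-argument callables) are hypothetical and only mirror the documented behavior; they are not part of the library.

```python
from typing import Callable, Optional

AtomMap = list[tuple[int, int]]


def select_atom_map(
    full_map: Callable[[], Optional[AtomMap]],
    mcs_map: Callable[[], Optional[AtomMap]],
    use_mcs_if_needed: bool = True,
) -> Optional[AtomMap]:
    """Try the full-graph mapping first, then optionally the MCS mapping."""
    atom_map = full_map()
    if atom_map is None and use_mcs_if_needed:
        atom_map = mcs_map()
    return atom_map  # None means no mapping was found


# Full-graph mapping fails here, so the MCS mapping is used when allowed.
with_fallback = select_atom_map(lambda: None, lambda: [(0, 0), (1, 2)])
without_fallback = select_atom_map(
    lambda: None, lambda: [(0, 0), (1, 2)], use_mcs_if_needed=False
)
```

A caller would compute the RMSD only when the selected map is not None, which matches the documented Optional[float] return type.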
raw_rmsd_from_map
raw_rmsd_from_map(
mol_a: Mol,
mol_b: Mol,
atom_map: list[Tuple[int, int]],
conf_id_a: int = 0,
conf_id_b: int = 0,
) -> float
Compute RMSD directly from coordinates on a given atom mapping. NO alignment, NO centering.
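For reference, an RMSD over mapped atom pairs without alignment or centering is the square root of the mean squared distance between paired coordinates. The sketch below works on plain coordinate lists rather than RDKit conformers; `raw_rmsd` and the sample poses are hypothetical and are not the library's implementation.

```python
import math

Coords = list[tuple[float, float, float]]


def raw_rmsd(
    coords_a: Coords, coords_b: Coords, atom_map: list[tuple[int, int]]
) -> float:
    """RMSD over mapped atom pairs, using coordinates exactly as given."""
    if not atom_map:
        raise ValueError("atom_map must contain at least one pair")
    total = 0.0
    for idx_a, idx_b in atom_map:
        ax, ay, az = coords_a[idx_a]
        bx, by, bz = coords_b[idx_b]
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(atom_map))


pose_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
# Same two atoms, listed in reverse order and shifted by 3 along x.
pose_b = [(4.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
atom_map = [(0, 1), (1, 0)]

shifted = raw_rmsd(pose_a, pose_b, atom_map)
identical = raw_rmsd(pose_a, pose_a, [(0, 0), (1, 1)])
```

The shifted pose has the same internal geometry as the original, yet its RMSD equals the size of the shift, because nothing is aligned or centered before the distances are measured.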
read_property_values
read_property_values(sdf_file: str | Path, key: str)
Given an SDF file containing more than one molecule, return the value of the property `key` for each molecule.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sdf_file` | `str \| Path` | Path to the SDF file. | required |
`key` | `str` | The key of the property to read. | required |
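To show where such property values live in the file, here is a format-level sketch using only the standard library. In an SDF, each record ends with a `$$$$` line and each data item starts with a `> <KEY>` header followed by value lines up to a blank line. The helper names and the sample text are hypothetical, the mol blocks are empty placeholders, and the library function may behave differently (for example in its return type or in how it treats missing keys).

```python
from typing import Optional


def split_records(sdf_text: str) -> list[list[str]]:
    """Split SDF text into records, each terminated by a '$$$$' line."""
    records: list[list[str]] = []
    current: list[str] = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def property_value(record: list[str], key: str) -> Optional[str]:
    """Return the value of data item `key` in one record, or None if absent."""
    for i, line in enumerate(record):
        if line.startswith(">") and f"<{key}>" in line:
            value_lines = []
            for value_line in record[i + 1:]:
                if not value_line.strip():  # a blank line ends the data item
                    break
                value_lines.append(value_line)
            return "\n".join(value_lines)
    return None


def read_property(sdf_text: str, key: str) -> list[Optional[str]]:
    """Collect one value per record, in file order."""
    return [property_value(r, key) for r in split_records(sdf_text)]


SAMPLE = """\
lig_a
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <SCORE>
-7.2

$$$$
lig_b
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <SCORE>
-3.1

$$$$
"""

scores = read_property(SAMPLE, "SCORE")
missing = read_property(SAMPLE, "MISSING")
```

Values come back as strings in this sketch; any numeric conversion is left to the caller.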
sdf_to_smiles
sdf_to_smiles(sdf_file: str | Path) -> list[str]
Extracts the SMILES strings of all valid molecules from an SDF file using RDKit.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sdf_file` | `str \| Path` | Path to the SDF file. | required |
Returns:
Type | Description |
---|---|
`list[str]` | A list of SMILES strings for all valid molecules in the file. |
smiles_to_sdf
smiles_to_sdf(smiles: str, sdf_path: str) -> None
Convert a SMILES string to an SDF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`smiles` | `str` | SMILES string | required |
`sdf_path` | `str` | Path to the SDF file | required |
split_sdf_file
split_sdf_file(
*,
input_sdf_path: str | Path,
output_prefix: str = "ligand",
output_dir: Optional[str | Path] = None,
name_by_property: str = "_Name"
) -> list[Path]
Splits a multi-ligand SDF file into individual SDF files, optionally placing the output in a user-specified directory. Each output SDF is named using the molecule's name (if present) or a fallback prefix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`input_sdf_path` | `str \| Path` | Path to the input SDF file containing multiple ligands. | required |
`output_prefix` | `str` | Prefix for unnamed ligands. Defaults to "ligand". | `'ligand'` |
`output_dir` | `Optional[str \| Path]` | Directory to write the output SDF files to. If None, output files are written to the same directory as input_sdf_path. | `None` |
Returns:
Type | Description |
---|---|
`list[Path]` | A list of paths to the generated SDF files. |
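The name-or-prefix behavior can be illustrated at the file-format level with the standard library. This is a sketch under stated assumptions, not the library's implementation: `split_sdf_text` is a hypothetical helper, the `<prefix>_<index>` fallback naming is an assumption, and the molecule name is taken from the first line of each record (the title line, which RDKit exposes as the `_Name` property). The sample mol blocks are empty placeholders.

```python
import tempfile
from pathlib import Path


def split_sdf_text(
    sdf_text: str, output_dir: Path, output_prefix: str = "ligand"
) -> list[Path]:
    """Write each '$$$$'-terminated record to its own .sdf file."""
    paths: list[Path] = []
    record: list[str] = []
    for line in sdf_text.splitlines():
        record.append(line)
        if line.strip() == "$$$$":
            title = record[0].strip()
            # Assumed fallback scheme: "<prefix>_<index>" for untitled records.
            stem = title or f"{output_prefix}_{len(paths)}"
            path = output_dir / f"{stem}.sdf"
            path.write_text("\n".join(record) + "\n")
            paths.append(path)
            record = []
    return paths


# Two records: the first has a title line, the second leaves it blank.
SAMPLE = """\
aspirin
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
$$$$

  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
$$$$
"""

with tempfile.TemporaryDirectory() as tmp:
    written = split_sdf_text(SAMPLE, Path(tmp))
    names = [p.name for p in written]
```

Titles that contain path separators or other characters that are invalid in file names would need sanitizing before use; the sketch leaves that out.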