deeporigin.drug_discovery.chemistry
Contains classes and functions for working with molecules, proteins, and related files.
Defines the `Ligand` and `Protein` classes, as well as functions for reading and writing SDF files, converting between SMILES and SDF, validating data, integrating with DataFrames, and preparing visualizations.
These can be used together with the `drug_discovery` module for tasks such as docking.
`Ligand`: Represents a small-molecule ligand, accepting either a file path (SDF) or a SMILES string. Provides a `show()` method to display it.
`Protein`: Represents a protein, accepting a local file path (PDB) or a PDB ID. Provides a `show()` method to display it.
Classes
Ligand (dataclass)
Class to represent a ligand (typically backed by an SDF file).
Attributes
Functions
from_csv (classmethod)
from_csv(
    *,
    file: str | Path,
    smiles_column: str,
    properties_columns: list[str] = None
) -> list[Ligand]
Create a list of ligands from a CSV file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`file` | `str \| Path` | Path to the CSV file. | required |
`smiles_column` | `str` | Name of the column containing SMILES strings. | required |
`properties_columns` | `list[str]` | List of column names to extract as properties. | `None` |
Returns:
Type | Description |
---|---|
`list[Ligand]` | List of `Ligand` objects. |
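The row-to-record mapping that `from_csv` describes can be illustrated with the standard library alone. This is a minimal sketch, not the library's implementation: `read_smiles_records` is a hypothetical helper that returns plain dictionaries instead of `Ligand` objects, and the key names `"smiles"` and `"properties"` are assumptions made for illustration.

```python
import csv
from pathlib import Path


def read_smiles_records(file, smiles_column, properties_columns=None):
    """Hypothetical stand-in for from_csv: one plain dict per CSV row."""
    properties_columns = properties_columns or []
    records = []
    with open(Path(file), newline="") as handle:
        for row in csv.DictReader(handle):
            records.append(
                {
                    # SMILES string taken from the named column
                    "smiles": row[smiles_column],
                    # only the requested columns are kept as properties
                    "properties": {name: row[name] for name in properties_columns},
                }
            )
    return records
```

Values are kept as strings here; whether the library converts property values to numbers is not stated in this reference.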
Protein (dataclass)
Functions
canonicalize_smiles
canonicalize_smiles(smiles: str) -> str
Canonicalize a SMILES string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`smiles` | `str` | SMILES string. | required |
Returns:
Name | Type | Description |
---|---|---|
`str` | `str` | Canonicalized SMILES string. |
count_molecules_in_sdf_file
count_molecules_in_sdf_file(sdf_file: str | Path) -> int
Count the number of valid (sanitizable) molecules in an SDF file using RDKit, while suppressing RDKit's error logging for sanitization issues.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sdf_file` | `str \| Path` | Path to the SDF file. | required |
Returns:
Name | Type | Description |
---|---|---|
`int` | `int` | The number of molecules successfully read from the SDF file. |
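For a quick sanity check without RDKit, the number of records in an SDF file can be counted from the `$$$$` lines that terminate each record. The sketch below is an illustration only: `count_sdf_records` is a hypothetical helper, and unlike `count_molecules_in_sdf_file` it does not check that each molecule can be sanitized, so it may report a higher number.

```python
from pathlib import Path


def count_sdf_records(sdf_file):
    """Count '$$$$' record terminators; does not check chemical validity."""
    count = 0
    with open(Path(sdf_file)) as handle:
        for line in handle:
            # every record in an SDF file ends with a line containing only $$$$
            if line.strip() == "$$$$":
                count += 1
    return count
```

A difference between this count and the value returned by `count_molecules_in_sdf_file` would suggest that some records failed to parse or sanitize.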
download_protein
download_protein(pdb_id: str, save_dir: str = '.') -> str
Downloads a PDB structure by its PDB ID from RCSB and saves it to the specified directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`pdb_id` | `str` | PDB ID of the protein. | required |
`save_dir` | `str` | Directory to save the downloaded PDB file. | `'.'` |
Returns:
Name | Type | Description |
---|---|---|
`str` | `str` | Path to the downloaded PDB file. |
Raises:
Type | Description |
---|---|
`Exception` | If the download fails. |
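As an illustration of what a download from RCSB involves, here is a minimal standard-library sketch. It assumes the public RCSB file endpoint `https://files.rcsb.org/download/<ID>.pdb`; the helper names `rcsb_pdb_url` and `fetch_pdb`, and the output file name, are hypothetical and may differ from what `download_protein` does internally.

```python
import urllib.request
from pathlib import Path


def rcsb_pdb_url(pdb_id):
    """Build the RCSB download URL for a PDB ID (IDs are case-insensitive)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_pdb(pdb_id, save_dir="."):
    """Download a PDB file into save_dir and return its path as a string."""
    target = Path(save_dir) / f"{pdb_id.upper()}.pdb"  # assumed file name
    target.parent.mkdir(parents=True, exist_ok=True)
    # raises urllib.error.URLError / HTTPError if the download fails
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), str(target))
    return str(target)
```

Calling `fetch_pdb("1ubq", "structures")` would require network access and write the file into a `structures` directory.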
filter_sdf_by_smiles
filter_sdf_by_smiles(
    *,
    input_sdf_file: str | Path,
    output_sdf_file: str | Path,
    keep_only_smiles: list[str] | Series
)
Filters an SDF file, keeping only the molecules whose SMILES strings appear in `keep_only_smiles`, and writes the result to `output_sdf_file`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`input_sdf_file` | `str \| Path` | Path to the input SDF file. | required |
`output_sdf_file` | `str \| Path` | Path to the output SDF file. | required |
`keep_only_smiles` | `list[str] \| Series` | List or Series of SMILES strings to keep. | required |
get_properties_in_sdf_file
get_properties_in_sdf_file(sdf_file: str | Path) -> list
Returns a list of all user-defined properties in an SDF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sdf_file` | `str \| Path` | Path to the SDF file. | required |
Returns:
Name | Type | Description |
---|---|---|
`list` | `list` | A list of the names of all user-defined properties in the SDF file. |
ligands_to_dataframe
ligands_to_dataframe(ligands: list[Ligand])
Convert a list of ligands to a pandas DataFrame.
merge_sdf_files
merge_sdf_files(
    sdf_file_list: list[str],
    output_path: Optional[str] = None,
) -> str
Merge a list of SDF files into a single SDF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sdf_file_list` | `list[str]` | List of paths to SDF files. | required |
Returns:
Name | Type | Description |
---|---|---|
`str` | `str` | Path to the merged SDF file. |
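Because every SDF record is terminated by a `$$$$` line, merging amounts to concatenating the files while making sure each one ends with that terminator. The following is a minimal sketch under that assumption; `concatenate_sdf_files` is a hypothetical helper, it requires an explicit output path, and it is not the library's implementation.

```python
from pathlib import Path


def concatenate_sdf_files(sdf_file_list, output_path):
    """Concatenate SDF files, adding a '$$$$' terminator where one is missing."""
    with open(output_path, "w") as out:
        for sdf_file in sdf_file_list:
            # drop trailing newlines only; leading blank lines can be part of a record header
            text = Path(sdf_file).read_text().rstrip("\n")
            if not text:
                continue  # skip empty files
            out.write(text + "\n")
            if not text.endswith("$$$$"):
                out.write("$$$$\n")
    return str(output_path)
```

How `merge_sdf_files` chooses a location when `output_path` is `None` is not documented here, so the sketch leaves that case out.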
read_molecules_in_sdf_file
read_molecules_in_sdf_file(
    sdf_file: str | Path,
) -> list[dict]
Reads an SDF file containing one or more molecules and, for each molecule, extracts the SMILES string and all user-defined properties.
Returns:
Type | Description |
---|---|
`list[dict]` | A list of dictionaries, where each dictionary has the keys `"smiles_string"` (`str`) and `"properties"` (`dict`). |
read_property_values
read_property_values(sdf_file: str | Path, key: str)
Given an SDF file containing more than one molecule, return the value of the property `key` for each molecule.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sdf_file` | `str \| Path` | Path to the SDF file. | required |
`key` | `str` | The key of the property to read. | required |
read_sdf_properties
read_sdf_properties(sdf_file: str | Path) -> dict
Reads all user-defined properties from an SDF file (single molecule) and returns them as a dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sdf_file` | `str \| Path` | Path to the SDF file. | required |
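User-defined properties are stored in an SDF record as data items: a header line that starts with `>` and carries the name in angle brackets, followed by the value and a blank line. The sketch below parses the first record of a file with the standard library. `parse_sdf_properties` is a hypothetical helper that returns all values as strings; the library reads properties through RDKit, so its value types may differ.

```python
import re
from pathlib import Path

# data item header, e.g. ">  <IC50>"
_HEADER = re.compile(r"^>.*<([^>]+)>")


def parse_sdf_properties(sdf_file):
    """Return the data items of the first record in an SDF file as a dict of strings."""
    properties = {}
    current_key = None
    value_lines = []
    for line in Path(sdf_file).read_text().splitlines():
        if line.strip() == "$$$$":
            break  # only the first record is read
        match = _HEADER.match(line)
        if match:
            current_key = match.group(1)
            value_lines = []
        elif current_key is not None:
            if line.strip() == "":
                # a blank line closes the current data item
                properties[current_key] = "\n".join(value_lines)
                current_key = None
            else:
                value_lines.append(line)
    if current_key is not None:
        # record ended without a closing blank line
        properties[current_key] = "\n".join(value_lines)
    return properties
```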
sdf_to_smiles
sdf_to_smiles(sdf_file: str | Path) -> list[str]
Extracts the SMILES strings of all valid molecules from an SDF file using RDKit.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`sdf_file` | `str \| Path` | Path to the SDF file. | required |
Returns:
Type | Description |
---|---|
`list[str]` | A list of SMILES strings for all valid molecules in the file. |
show_ligands
show_ligands(ligands: list[Ligand])
Show ligands in the FEP object in a dataframe. This function visualizes the ligands using core-aligned 2D visualizations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`ligands` | `list[Ligand]` | List of ligands. | required |
show_molecules_in_sdf_file
show_molecules_in_sdf_file(sdf_file: str | Path)
Show molecules in an SDF file in a Jupyter notebook using molstar.
show_molecules_in_sdf_files
show_molecules_in_sdf_files(sdf_files: list[str])
Show molecules from multiple SDF files in a Jupyter notebook using molstar.
smiles_list_to_base64_png_list
smiles_list_to_base64_png_list(
    smiles_list: list[str],
    *,
    size: Tuple[int, int] = (300, 100),
    scale_factor: int = 2,
    reference_smiles: Optional[str] = None
) -> list[str]
Convert a list of SMILES strings to a list of base64-encoded PNG `<img>` tags.
The images are aligned so that they have a consistent core orientation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`smiles_list` | `list[str]` | List of SMILES strings. | required |
`size` | `Tuple[int, int]` | (width, height) of the final rendered image in pixels (CSS downscaled). | `(300, 100)` |
`scale_factor` | `int` | Factor to generate higher-resolution images internally. | `2` |
`reference_smiles` | `Optional[str]` | If provided, all molecules will be oriented to match the 2D layout of this reference molecule. | `None` |
smiles_to_base64_png
smiles_to_base64_png(
    smiles: str, *, size=(300, 100), scale_factor: int = 2
) -> str
Convert a SMILES string to an inline base64 `<img>` tag. Use this to convert a single molecule into an image. To convert a set of SMILES strings (corresponding to a set of related molecules) to images, use `smiles_list_to_base64_png_list`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`smiles` | `str` | SMILES string. | required |
`size` | `Tuple[int, int]` | (width, height) of the final rendered image in pixels (CSS downscaled). | `(300, 100)` |
`scale_factor` | `int` | Factor to generate higher-resolution images internally. | `2` |
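The final step of such a conversion, embedding PNG bytes in an inline tag, can be sketched with the standard library. This assumes the returned tag is an HTML `<img>` element with a `data:` URI. `png_bytes_to_img_tag` is a hypothetical helper, the exact markup produced by `smiles_to_base64_png` may differ, and the rendering of the molecule itself is omitted.

```python
import base64


def png_bytes_to_img_tag(png_bytes, size=(300, 100)):
    """Wrap PNG bytes in an inline <img> tag displayed at the given size."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    width, height = size
    # width/height set the displayed size; the PNG itself may be rendered larger
    return (
        f'<img src="data:image/png;base64,{encoded}" '
        f'width="{width}" height="{height}"/>'
    )
```

Rendering the PNG at `scale_factor` times the displayed size and letting the browser downscale it is one way to get sharper images on high-density screens, which matches the "CSS downscaled" note in the parameter table.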
smiles_to_sdf
smiles_to_sdf(smiles: str, sdf_path: str) -> None
Convert a SMILES string to an SDF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`smiles` | `str` | SMILES string. | required |
`sdf_path` | `str` | Path to the SDF file. | required |
split_sdf_file
split_sdf_file(
    *,
    input_sdf_path: str | Path,
    output_prefix: str = "ligand",
    output_dir: Optional[str | Path] = None,
    name_by_property: str = "_Name"
) -> list[Path]
Splits a multi-ligand SDF file into individual SDF files, optionally placing the output in a user-specified directory. Each output SDF is named using the molecule's name (if present) or a fallback prefix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`input_sdf_path` | `str \| Path` | Path to the input SDF file containing multiple ligands. | required |
`output_prefix` | `str` | Prefix for unnamed ligands. | `'ligand'` |
`output_dir` | `Optional[str \| Path]` | Directory to write the output SDF files to. If `None`, output files are written to the same directory as `input_sdf_path`. | `None` |
Returns:
Type | Description |
---|---|
`list[Path]` | A list of paths to the generated SDF files. |
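The splitting behavior described above can be approximated with the standard library by cutting the file at each `$$$$` terminator and using the first line of each record (the molecule title) as the file name. This is a sketch under stated assumptions: `split_sdf_records` is a hypothetical helper, the fallback naming scheme `<prefix>_<index>` is an assumption, and names are not sanitized or checked for duplicates.

```python
from pathlib import Path


def split_sdf_records(input_sdf_path, output_prefix="ligand", output_dir=None):
    """Write each record of an SDF file to its own file and return the paths."""
    input_sdf_path = Path(input_sdf_path)
    # default to the directory that holds the input file
    output_dir = Path(output_dir) if output_dir is not None else input_sdf_path.parent
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    record = []
    for line in input_sdf_path.read_text().splitlines():
        record.append(line)
        if line.strip() == "$$$$":
            # the first line of a record is the molecule title
            title = record[0].strip()
            # assumed fallback: prefix plus the record's zero-based index
            name = title if title else f"{output_prefix}_{len(written)}"
            path = output_dir / f"{name}.sdf"
            path.write_text("\n".join(record) + "\n")
            written.append(path)
            record = []
    return written
```

The library's `name_by_property` parameter suggests that the name can also come from a property other than the title; this sketch only handles the title line.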