Work with Ligands

This document describes how to work with ligands (molecules) and use them in Deep Origin tools.

Two classes help you work with ligands: Ligand, which represents a single molecule, and LigandSet, which represents a collection of ligands.

Constructing a Ligand or LigandSet

From an SDF file

A single Ligand can be constructed from an SDF file:

from deeporigin.drug_discovery import Ligand, BRD_DATA_DIR

ligand = Ligand.from_sdf(BRD_DATA_DIR / "brd-2.sdf")

A LigandSet can be constructed from an SDF file:

from deeporigin.drug_discovery import LigandSet, DATA_DIR

ligands = LigandSet.from_sdf(DATA_DIR / "ligands" / "ligands-brd-all.sdf")

From SMILES string(s)

A ligand can be constructed from a SMILES string, which is a compact way to represent molecular structures:

from deeporigin.drug_discovery import Ligand


# Basic usage with just a SMILES string
ligand = Ligand.from_smiles(smiles="CCO")  # Ethanol

# With additional parameters
ligand = Ligand.from_smiles(
    smiles="c1ccccc1",  # Benzene
    name="Benzene",     # Optional name for the ligand
)

The from_smiles constructor:

  • Takes a SMILES string as input
  • Optionally accepts a name for the ligand
  • Optionally accepts a save_to_file parameter to control file persistence
  • Automatically validates the SMILES string and creates a proper molecular representation
  • Returns a Ligand instance that can be used for further operations

SMILES Validation

The constructor will raise an exception if the provided SMILES string is invalid or cannot be parsed into a valid molecule.
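
Because an invalid SMILES string raises an exception, it can help to guard the call when building many ligands one at a time. The following is a minimal sketch and not part of the Deep Origin API: `build_ligands` is a hypothetical helper, and `fake_constructor` is a stand-in used only so the example runs without the package installed. The exception type is not specified here, so the sketch catches `Exception` broadly. In practice you would pass `Ligand.from_smiles` as the constructor.

```python
from typing import Any, Callable, Iterable


def build_ligands(
    smiles_list: Iterable[str],
    constructor: Callable[[str], Any],
) -> tuple[list[Any], dict[str, str]]:
    """Build one object per SMILES string, collecting failures instead of stopping.

    `constructor` is assumed to raise on invalid input, as described for
    Ligand.from_smiles. The exact exception type is an unknown here, so
    this sketch catches Exception broadly.
    """
    built: list[Any] = []
    failed: dict[str, str] = {}
    for smiles in smiles_list:
        try:
            built.append(constructor(smiles))
        except Exception as exc:  # assumption: exact exception type unknown
            failed[smiles] = str(exc)
    return built, failed


def fake_constructor(smiles: str) -> str:
    """Stand-in for Ligand.from_smiles: rejects empty or unbalanced input."""
    if not smiles or smiles.count("(") != smiles.count(")"):
        raise ValueError(f"invalid SMILES: {smiles!r}")
    return f"Ligand({smiles})"


# "C(C" has an unclosed parenthesis, so the stand-in rejects it.
ligands, errors = build_ligands(["CCO", "c1ccccc1", "C(C"], fake_constructor)
```

After the call, `ligands` holds the objects that were built and `errors` maps each rejected SMILES string to its error message, so a single bad entry does not abort the whole batch.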

A LigandSet can be constructed from a list or set of SMILES strings:

from deeporigin.drug_discovery import LigandSet

smiles = {
    "C/C=C/Cn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
    "C=CCCn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
}

ligands = LigandSet.from_smiles(smiles)

From a Chemical Identifier

You can create a ligand from common chemical identifiers (like PubChem names, common names, or drug names). This is particularly useful when working with well-known biochemical molecules:

from deeporigin.drug_discovery import Ligand

# Create ligands from common biochemical names
atp = Ligand.from_identifier(
    identifier="ATP",  # Adenosine triphosphate
    name="ATP"
)

serotonin = Ligand.from_identifier(
    identifier="serotonin",  # 5-hydroxytryptamine (5-HT)
    name="Serotonin"
)

The from_identifier constructor:

  • Accepts common chemical names and identifiers
  • Automatically resolves the identifier to a molecular structure
  • Creates a 3D conformation of the molecule
  • Particularly useful for well-known biochemical molecules like:
    • Nucleotides (ATP, ADP, GTP, etc.)
    • Neurotransmitters (serotonin, dopamine, etc.)
    • Drug molecules (by their generic names)
    • Common metabolites and cofactors

Identifier Resolution

The constructor will attempt to resolve the identifier using chemical databases. If the identifier cannot be resolved, it will raise an exception.

From an RDKit Mol object

If you're working with RDKit molecules directly, you can create a Ligand from an RDKit Mol object:

from deeporigin.drug_discovery import Ligand
from rdkit import Chem

# Create an RDKit molecule
mol = Chem.MolFromSmiles("CCO")  # Ethanol

# Convert to a Ligand
ligand = Ligand.from_rdkit_mol(
    mol=mol,
    name="Ethanol",  # Optional name for the ligand
    save_to_file=False  # Optional: whether to save the ligand to file
)

This is particularly useful when you're working with RDKit's molecular manipulation functions and want to convert the results into a Deep Origin Ligand object for further processing or visualization.

From a CSV file

You can also create a LigandSet from a CSV file containing SMILES strings and optional properties:

from deeporigin.drug_discovery import LigandSet, DATA_DIR

ligands = LigandSet.from_csv(
    file_path=DATA_DIR / "ligands" / "ligands.csv",
    smiles_column="SMILES"  # Optional, defaults to "smiles"
)

The method will:

  • Read the CSV file using pandas
  • Extract SMILES strings from the specified column
  • Create a Ligand instance for each valid SMILES
  • Store all other columns as properties in each Ligand instance
  • Skip any rows with empty or invalid SMILES strings

Error Handling

The method will raise:

  • FileNotFoundError if the CSV file does not exist
  • ValueError if the specified SMILES column is not found in the CSV file
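
The behavior described for from_csv can be illustrated with a small standalone sketch. This is not the library's implementation: `read_smiles_csv` is a hypothetical function, it uses the standard-library `csv` module rather than pandas, and it only skips empty SMILES entries (it does not check chemical validity). It returns plain tuples instead of Ligand instances.

```python
import csv
import tempfile
from pathlib import Path


def read_smiles_csv(file_path, smiles_column="smiles"):
    """Return a list of (smiles, properties) pairs read from a CSV file.

    Hypothetical stand-in for the documented from_csv behavior:
    every column other than the SMILES column becomes a property.
    """
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"CSV file not found: {path}")

    records = []
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        if smiles_column not in (reader.fieldnames or []):
            raise ValueError(f"column {smiles_column!r} not found in {path}")
        for row in reader:
            smiles = (row[smiles_column] or "").strip()
            if not smiles:
                continue  # skip rows with an empty SMILES entry
            properties = {
                key: value
                for key, value in row.items()
                if key is not None and key != smiles_column
            }
            records.append((smiles, properties))
    return records


# Demo on a throwaway file that contains one row with a blank SMILES entry.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "ligands.csv"
    demo_path.write_text("SMILES,name\nCCO,ethanol\n,blank\nc1ccccc1,benzene\n")
    records = read_smiles_csv(demo_path, smiles_column="SMILES")
```

The row with the blank SMILES entry is dropped, and the `name` column travels with each remaining molecule as a property.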

Visualization

Jupyter notebook required

These visualizations require the code to be run in a Jupyter notebook. We recommend using these instructions to install Jupyter.

Browser support

These visualizations work best on Google Chrome. We are aware of issues on other browsers, especially Safari on macOS.

Ligands

A ligand object can be visualized using show:

from deeporigin.drug_discovery import Ligand

ligand = Ligand.from_identifier("serotonin")

ligand.show()

A visualization similar to the following will be shown:

LigandSets

A LigandSet can be visualized using two different methods. First, simply printing the LigandSet shows a table of ligands in the LigandSet:

from deeporigin.drug_discovery import LigandSet

smiles = {
    "C/C=C/Cn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
    "C=CCCn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
}

ligands = LigandSet.from_smiles(smiles)
ligands

Expected Output

To view 3D structures of all ligands in a LigandSet, use:

from deeporigin.drug_discovery import LigandSet, DATA_DIR

ligands = LigandSet.from_sdf(DATA_DIR / "ligands" / "ligands-brd-all.sdf")
ligands.show()

A visualization similar to this will be shown. Use the arrows to flip between ligands in the LigandSet.

Operations on Ligands

Ligand Minimization

You can minimize the 3D structure of a single ligand or all ligands in a LigandSet. Minimization optimizes the geometry of the molecule(s) using a force field, which is useful for preparing ligands for docking or other modeling tasks.

from deeporigin.drug_discovery import Ligand, BRD_DATA_DIR

ligand = Ligand.from_sdf(BRD_DATA_DIR / "brd-2.sdf")
ligand.minimize()  # Optimizes the 3D coordinates in place

To minimize every ligand in a LigandSet:

from deeporigin.drug_discovery import LigandSet, DATA_DIR

ligands = LigandSet.from_sdf(DATA_DIR / "ligands" / "ligands-brd-all.sdf")
ligands.minimize()  # Optimizes all ligands in the set in place

This will call the minimize() method on each ligand in the set, updating their 3D coordinates. The method returns the LigandSet itself for convenience, so you can chain further operations if desired.

Constructing a network using Konnektor

To run relative binding free energy (RBFE) calculations, it helps to first map out a network over the ligand set; RBFE is then run on the pairs of ligands connected in that network. To do so, use:

# assuming ligands is a LigandSet
ligands.map_network().show_network()

This maps the network and displays a visualization of it.
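
To see why a mapped network matters, consider how many ligand pairs each layout produces. The sketch below is purely illustrative and is not how Konnektor or map_network selects edges: `all_pairs` and `star_network` are hypothetical helpers, and the ligand names are placeholders. It compares connecting every pair of ligands with a star layout in which every ligand is paired with a single hub.

```python
from itertools import combinations


def all_pairs(names):
    """Every unordered pair of ligands: N * (N - 1) / 2 edges."""
    return list(combinations(names, 2))


def star_network(names, hub):
    """Pair each ligand with one hub ligand: N - 1 edges."""
    return [(hub, name) for name in names if name != hub]


# Eight placeholder ligand names, matching the size of a small congeneric series.
names = [f"ligand-{i}" for i in range(1, 9)]

dense_edges = all_pairs(names)
star_edges = star_network(names, hub="ligand-1")
```

For eight ligands the fully connected layout has 28 pairs while the star has 7. Each pair corresponds to one RBFE calculation, so the choice of network directly controls the amount of computation.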

Predicting ADMET Properties

ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties can be predicted for Ligands or LigandSets.

You can predict ADMET properties for a ligand using the admet_properties method:

# Predict ADMET properties
properties = ligand.admet_properties()

The method returns a dictionary containing various ADMET-related predictions:

{
    'smiles': 'Cn1c(=O)n(Cc2ccccc2)c(=O)c2c1nc(SCCO)n2Cc1ccccc1',
    'properties': {
        'logS': -4.004,  # Aqueous solubility
        'logP': 3.686,   # Partition coefficient
        'logD': 2.528,   # Distribution coefficient
        'hERG': {'probability': 0.264},  # hERG inhibition risk
        'ames': {'probability': 0.213}, # Ames mutagenicity
        'cyp': {     # Cytochrome P450 inhibition
            'probabilities': {
                'cyp1a2': 0.134,
                'cyp2c9': 0.744,
                'cyp2c19': 0.853,
                'cyp2d6': 0.0252,
                'cyp3a4': 0.4718
            }
        },
        'pains': {    # PAINS (Pan Assay Interference Compounds)
            'has_pains': None,
            'pains_fragments': []
        }
    }
}
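
The result is a nested dictionary, so individual predictions are reached by indexing through the `properties` key. The following sketch copies the example output shown here into a literal and pulls out a few values. `flag_cyp_inhibition` is a hypothetical helper, and the 0.5 cutoff is an arbitrary assumption for illustration, not a recommended threshold.

```python
# Example result copied from the output shown in this section.
result = {
    "smiles": "Cn1c(=O)n(Cc2ccccc2)c(=O)c2c1nc(SCCO)n2Cc1ccccc1",
    "properties": {
        "logS": -4.004,
        "logP": 3.686,
        "logD": 2.528,
        "hERG": {"probability": 0.264},
        "ames": {"probability": 0.213},
        "cyp": {
            "probabilities": {
                "cyp1a2": 0.134,
                "cyp2c9": 0.744,
                "cyp2c19": 0.853,
                "cyp2d6": 0.0252,
                "cyp3a4": 0.4718,
            }
        },
        "pains": {"has_pains": None, "pains_fragments": []},
    },
}


def flag_cyp_inhibition(admet, threshold=0.5):
    """Return the CYP isoforms whose predicted probability meets the threshold.

    The default threshold is an arbitrary value chosen for this example.
    """
    probabilities = admet["properties"]["cyp"]["probabilities"]
    return sorted(name for name, p in probabilities.items() if p >= threshold)


herg_probability = result["properties"]["hERG"]["probability"]
flagged_isoforms = flag_cyp_inhibition(result)
```

With the example values, `flagged_isoforms` contains `cyp2c19` and `cyp2c9`, the only two isoforms with a predicted probability of at least 0.5.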

The predicted properties are automatically stored in the ligand's properties dictionary and can be accessed later using the get_property method:

# Access a specific property
logP = ligand.get_property('logP')

You can predict ADMET properties for all ligands in a LigandSet using the admet_properties method. This will call the prediction for each ligand and display a progress bar using tqdm:

from deeporigin.drug_discovery import LigandSet, DATA_DIR

ligands = LigandSet.from_csv(
    file_path=DATA_DIR / "ligands" / "ligands.csv",
    smiles_column="SMILES"
)

results = ligands.admet_properties()

Each entry in results is a dictionary of ADMET properties for the corresponding ligand. The properties are also stored in each ligand's .properties attribute for later access.

To view the ADMET properties of all ligands in the set, display the LigandSet, which renders as a table:

ligands

or, optionally, convert to a DataFrame for further analysis:

ligands.to_dataframe()

Maximum Common Substructure

The Maximum Common Substructure (MCS) for a LigandSet can be computed as follows:

from deeporigin.drug_discovery import LigandSet

BRD_SMILES = {
    "C/C=C/Cn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
    "C=CCCn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
    "C=CCn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
    "CCCCn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
    "CCCn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
    "CCn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
    "CN(C)C(=O)c1cccc(-c2cn(C)c(=O)c3[nH]ccc23)c1",
    "COCCn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
}

ligands = LigandSet.from_smiles(BRD_SMILES)
ligands.mcs()

Expected Output

'[#6]1=[#6]-[#6]=[#6]-[#6](=[#6]-1)-[#6]1=[#6]-[#7](-[#6])-[#6](-[#6]2=[#6]-1-[#6]=[#6]-[#7]-2)=[#8]'
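
The returned string is a SMARTS pattern in which each atom is written by atomic number (`[#6]` is carbon, `[#7]` nitrogen, `[#8]` oxygen). As a quick sanity check on the size of the shared core, the atoms can be tallied with a regular expression. This is a minimal sketch that only understands the `[#n]` atom form seen in this output; it is not a general SMARTS parser.

```python
import re
from collections import Counter

# MCS SMARTS string copied from the expected output shown in this section.
MCS_SMARTS = (
    "[#6]1=[#6]-[#6]=[#6]-[#6](=[#6]-1)-[#6]1=[#6]-[#7](-[#6])"
    "-[#6](-[#6]2=[#6]-1-[#6]=[#6]-[#7]-2)=[#8]"
)

# Element symbols for the atomic numbers that appear in this pattern.
SYMBOLS = {6: "C", 7: "N", 8: "O"}

atomic_numbers = [int(z) for z in re.findall(r"\[#(\d+)\]", MCS_SMARTS)]
composition = Counter(SYMBOLS.get(z, f"#{z}") for z in atomic_numbers)
heavy_atom_count = len(atomic_numbers)
```

For this pattern the tally is 14 carbons, 2 nitrogens, and 1 oxygen, for 17 heavy atoms in total, which corresponds to the scaffold shared by all eight molecules in the set.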

Exporting ligands

To SDF files

To write a Ligand to an SDF file, use:

from deeporigin.drug_discovery import Ligand

ligand = Ligand.from_smiles("NCCc1c[nH]c2ccc(O)cc12")
ligand.to_sdf()

To write a LigandSet to an SDF file, use:

from deeporigin.drug_discovery import LigandSet

smiles = {
    "C/C=C/Cn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
    "C=CCCn1cc(-c2cccc(C(=O)N(C)C)c2)c2cc[nH]c2c1=O",
}

ligands = LigandSet.from_smiles(smiles)
ligands.to_sdf()

To mol files

To write a ligand to a mol file, use:

from deeporigin.drug_discovery import Ligand

ligand = Ligand.from_smiles("NCCc1c[nH]c2ccc(O)cc12")
ligand.to_mol()

To PDB files

To write a ligand to a PDB file, use:

from deeporigin.drug_discovery import Ligand

ligand = Ligand.from_smiles("NCCc1c[nH]c2ccc(O)cc12")
ligand.to_pdb()

To Pandas DataFrames

To convert a LigandSet to a Pandas DataFrame, use:

from deeporigin.drug_discovery import LigandSet, DATA_DIR

ligands = LigandSet.from_csv(
    file_path=DATA_DIR / "ligands" / "ligands.csv",
    smiles_column="SMILES"  # Optional, defaults to "smiles"
)
df = ligands.to_dataframe()

To CSV files

To write a LigandSet to a CSV file, use method chaining:

# we're using pandas' native to_csv method here

ligands.to_dataframe().to_csv("temp.csv")