Entities API¶

The Entities API provides access to ligand and protein records in the data platform.

from deeporigin.platform.client import DeepOriginClient

client = DeepOriginClient()

# Search ligands
results = client.entities.search_ligands(limit=10)

# Search proteins
results = client.entities.search_proteins(limit=10)

# Generic entity search
results = client.entities.search("ligands")

# Create a new ligand
ligand = client.entities.create_ligand(
    smiles="CCOc1ccc2nc(S(=O)(=O)N3CCN(CC3)C)c(N)c2c1",
    name="Compound-12345",
    formal_charge=0,
    hbond_donor_count=1,
    hbond_acceptor_count=6,
    rotatable_bond_count=5,
    tpsa=85.12,
    molecular_weight=447.5,
)

# Delete an entity
result = client.entities.delete(entity="proteins", entity_id="08BSPN61NYVE3")

# List public models
models = client.entities.list_models()

CLI¶

Fetch a single protein or ligand by ID (equivalent to client.entities.get_protein(id=...) and client.entities.get_ligand(id=...)). The response JSON is printed to stdout:

deeporigin entities get-protein --id 08BSPN61NYVE3
deeporigin entities get-ligand --id LIG123

Use -i as a short flag for --id. Command names are singular (get-protein, get-ligand) because they match the Python methods get_protein and get_ligand and each looks up a single ID. The get-poses command under results follows the same verb style.

src.platform.entities.Entities ¶

Data Platform entity API wrapper.

Provides access to ligand, protein, and generic entity endpoints through the DeepOriginClient.

Methods:¶

batch_create ¶

batch_create(
    entity: str,
    *,
    rows: list[dict[str, Any]],
    returning: list[str] | None = None
) -> dict

Batch create entity rows.

Calls POST /data-platform/{orgKey}/{entity}/batch/create.

This method is intentionally generic so tools can persist dataset rows into arbitrary target tables (e.g. result tables), not only the built-in convenience entities like ligands.

Parameters:

Name	Type	Description	Default
`entity`	`str`	Entity (table) name to batch-create.	required
`rows`	`list[dict[str, Any]]`	List of row dicts to persist.	required
`returning`	`list[str] \| None`	Optional list of fields to include in the response. For `ligands` and `proteins`, defaults to an INSERT-safe returning list when omitted (avoids the platform COPY ingest path).	`None`

Returns:

Type	Description
`dict`	Dictionary containing the batch creation response.
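Because batch_create accepts any entity name, large row sets can be sent in smaller requests. The following is a minimal sketch, not part of the library: persist_rows, chunk_size, the table name docking_results, and the stand-in class are all hypothetical, and the stub's response shape is an assumption made only so the example runs without credentials.

```python
from typing import Any, Optional


def persist_rows(
    entities: Any,
    entity: str,
    rows: list,
    returning: Optional[list] = None,
    chunk_size: int = 500,
) -> list:
    """Call entities.batch_create once per chunk and collect the responses."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    responses = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        responses.append(
            entities.batch_create(entity, rows=chunk, returning=returning)
        )
    return responses


class _RecordingEntities:
    """Stand-in for client.entities; it only records calls (assumed shape)."""

    def __init__(self) -> None:
        self.calls = []

    def batch_create(self, entity, *, rows, returning=None):
        self.calls.append((entity, len(rows), returning))
        return {"data": rows}  # assumed response shape, for illustration only


fake = _RecordingEntities()
rows = [{"score": i} for i in range(5)]
responses = persist_rows(
    fake, "docking_results", rows, returning=["id"], chunk_size=2
)
```

With a real client, client.entities would be passed in place of the stand-in.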

batch_create_ligands ¶

batch_create_ligands(*, rows: list[dict[str, Any]]) -> dict

Batch create ligands.

Each row must contain at least a smiles key. Optional keys match the fields accepted by create_ligand (e.g. name, formal_charge, molecular_weight). The platform computes canonical_smiles, inchi, and other derived fields.

Parameters:

Name	Type	Description	Default
`rows`	`list[dict[str, Any]]`	List of dicts, each describing one ligand to create. Every dict must contain `smiles` (str). All other keys are optional and mirror the `set` payload of `create_ligand`.	required

Returns:

Type	Description
`dict`	Dictionary containing the batch creation response with a `data` list of created ligand records.
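Since every row must carry a smiles string, it can help to build and validate the rows before sending them. A small sketch of such a pre-check follows; build_ligand_rows is a hypothetical helper, not a library function, and it only uses the smiles and name keys described above.

```python
from typing import Any, Optional


def build_ligand_rows(
    smiles_list: list, names: Optional[list] = None
) -> list:
    """Build row dicts for batch_create_ligands, rejecting empty SMILES."""
    if names is not None and len(names) != len(smiles_list):
        raise ValueError("names must have the same length as smiles_list")
    rows = []
    for index, smiles in enumerate(smiles_list):
        if not isinstance(smiles, str) or not smiles.strip():
            raise ValueError(f"row {index}: smiles must be a non-empty string")
        row: dict[str, Any] = {"smiles": smiles}
        if names is not None:
            row["name"] = names[index]
        rows.append(row)
    return rows


ligand_rows = build_ligand_rows(["CCO", "c1ccccc1"], names=["ethanol", "benzene"])
# The result can then be passed as client.entities.batch_create_ligands(rows=ligand_rows).
```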

create_ligand ¶

create_ligand(
    *,
    smiles: str,
    project_id: str | None = None,
    name: str | None = None,
    mol_file: str | None = None,
    formal_charge: int = 0,
    hbond_donor_count: int | None = None,
    hbond_acceptor_count: int | None = None,
    rotatable_bond_count: int | None = None,
    tpsa: float | None = None,
    molecular_weight: float | None = None,
    variant_name_tag: str = ""
) -> dict

Create a new ligand.

Parameters:

Name	Type	Description	Default
`smiles`	`str`	SMILES string (required).	required
`project_id`	`str \| None`	Project ID for the ligand.	`None`
`name`	`str \| None`	Name of the ligand.	`None`
`mol_file`	`str \| None`	Path to the molecule file (e.g., SDF file) in remote storage.	`None`
`formal_charge`	`int`	Formal charge. Defaults to 0.	`0`
`hbond_donor_count`	`int \| None`	Number of hydrogen bond donors.	`None`
`hbond_acceptor_count`	`int \| None`	Number of hydrogen bond acceptors.	`None`
`rotatable_bond_count`	`int \| None`	Number of rotatable bonds.	`None`
`tpsa`	`float \| None`	Topological polar surface area.	`None`
`molecular_weight`	`float \| None`	Molecular weight.	`None`
`variant_name_tag`	`str`	Variant name tag. Defaults to empty string.	`''`

Returns:

Type	Description
`dict`	Dictionary containing the created ligand data.

create_protein ¶

create_protein(
    *,
    file_path: str,
    gene_symbol: str | None = None,
    pdb_id: str | None = None,
    fasta_sequence: str | None = None,
    protein_name: str | None = None,
    protein_length: int | None = None,
    project_id: str | None = None
) -> dict

Create a new protein.

Parameters:

Name	Type	Description	Default
`file_path`	`str`	Path to the protein file (required).	required
`gene_symbol`	`str \| None`	Gene symbol.	`None`
`pdb_id`	`str \| None`	PDB ID.	`None`
`fasta_sequence`	`str \| None`	FASTA sequence.	`None`
`protein_name`	`str \| None`	Protein name.	`None`
`protein_length`	`int \| None`	Protein length.	`None`
`project_id`	`str \| None`	Project ID for the protein.	`None`

Returns:

Type	Description
`dict`	Dictionary containing the created protein data.

delete ¶

delete(*, entity: str, entity_id: str) -> dict

Delete an entity by ID.

Parameters:

Name	Type	Description	Default
`entity`	`str`	The entity type (e.g., "ligands", "proteins").	required
`entity_id`	`str`	The ID of the entity to delete.	required

Returns:

Type	Description
`dict`	Dictionary containing the deletion result (e.g., `{"deleted": 1}`).
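The deletion result can be checked before continuing. This sketch assumes the response carries a deleted count as in the example above; ensure_deleted is a hypothetical helper, not part of the library.

```python
def ensure_deleted(result: dict, expected: int = 1) -> int:
    """Raise if the reported deleted count differs from the expected count."""
    count = result.get("deleted", 0)  # assumed key, taken from the example result
    if count != expected:
        raise RuntimeError(
            f"expected {expected} deleted row(s), platform reported {count}"
        )
    return count


confirmed = ensure_deleted({"deleted": 1})
```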

get ¶

get(*, entity: str, entity_id: str) -> dict

Get an entity by ID.

Parameters:

Name	Type	Description	Default
`entity`	`str`	The entity type (e.g., "ligands", "proteins").	required
`entity_id`	`str`	The ID of the entity to retrieve.	required

Returns:

Type	Description
`dict`	Dictionary containing the entity data.

get_ligand ¶

get_ligand(id: str) -> dict

Get a ligand by ID.

Parameters:

Name	Type	Description	Default
`id`	`str`	The ID of the ligand to retrieve.	required

Returns:

Type	Description
`dict`	Dictionary containing the ligand data.

get_ligands ¶

get_ligands(ids: list[str]) -> list[dict]

Get multiple ligands by their IDs in a single search request.

Sends one POST /ligands/search request with an {"id": {"in": ids}} filter and limit = len(ids), bypassing search_ligands so that no default pagination limit applies. The platform translates the id filter to the canonical-id column and decodes the string IDs to buffers server-side (see coerceCanonicalIdInFilter in data-platform-service).

The filter includes deleted: false for parity with search. Non-current versions may still be excluded server-side depending on platform rules. Callers that need soft-deleted or historical rows should fall back to per-ID get_ligand calls.

Parameters:

Name	Type	Description	Default
`ids`	`list[str]`	List of ligand IDs to retrieve.	required

Returns:

Type	Description
`list[dict]`	List of dictionaries for the matching ligands. Missing IDs are omitted; callers should diff returned IDs against `ids` when completeness matters.
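The completeness check mentioned above can be sketched as follows. find_missing_ids is a hypothetical helper, and it assumes each returned record exposes its identifier under an "id" key; pass a different id_key if the records use another field.

```python
def find_missing_ids(requested: list, records: list, id_key: str = "id") -> list:
    """Return requested IDs absent from records, in request order, without repeats."""
    returned = {record.get(id_key) for record in records}
    seen = set()
    missing = []
    for ligand_id in requested:
        if ligand_id not in returned and ligand_id not in seen:
            missing.append(ligand_id)
            seen.add(ligand_id)
    return missing


missing = find_missing_ids(["A", "B", "C", "B"], [{"id": "A"}])
```

IDs reported as missing could then be retried one at a time with get_ligand when soft-deleted or historical rows are needed.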

get_protein ¶

get_protein(id: str) -> dict

Get a protein by ID.

Parameters:

Name	Type	Description	Default
`id`	`str`	The ID of the protein to retrieve.	required

Returns:

Type	Description
`dict`	Dictionary containing the protein data.

list_models ¶

list_models() -> dict

List public models.

The result is cached per instance.

Returns:

Type	Description
`dict`	Dictionary containing the list of models.
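Per-instance caching means that repeated calls on one client reuse the first response, while a new client fetches again. The sketch below only illustrates that behavior; ModelCatalog and fake_fetch are hypothetical names and this is not the library's implementation.

```python
from typing import Callable, Optional


class ModelCatalog:
    """Illustrative per-instance cache around a fetch callable."""

    def __init__(self, fetch: Callable[[], dict]) -> None:
        self._fetch = fetch  # hypothetical callable that performs the request
        self._models: Optional[dict] = None

    def list_models(self) -> dict:
        # Only the first call on this instance triggers a fetch.
        if self._models is None:
            self._models = self._fetch()
        return self._models


fetch_calls = []


def fake_fetch() -> dict:
    fetch_calls.append(1)
    return {"data": ["model-a"]}  # made-up payload for the demonstration


catalog = ModelCatalog(fake_fetch)
first = catalog.list_models()
second = catalog.list_models()  # served from this instance's cache
other = ModelCatalog(fake_fetch).list_models()  # a new instance fetches again
```

One consequence is that models published after the first call are not visible until a new instance is created.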

search ¶

search(
    entity: str,
    *,
    cursor: str | None = None,
    filter_dict: dict[str, Any] | None = None,
    limit: int | None = None,
    offset: int | None = None,
    select: list[str] | None = None,
    sort: dict[str, str] | None = None
) -> dict

Search an entity (table).

Parameters:

Name	Type	Description	Default
`entity`	`str`	Entity (table) name to search (e.g., "ligands").	required
`cursor`	`str \| None`	Cursor for pagination.	`None`
`filter_dict`	`dict[str, Any] \| None`	Additional filter criteria as a dictionary.	`None`
`limit`	`int \| None`	Maximum number of results to return. When `None`, a default of 100 applies.	`None`
`offset`	`int \| None`	Number of results to skip.	`None`
`select`	`list[str] \| None`	List of fields to select in the response.	`None`
`sort`	`dict[str, str] \| None`	Dictionary mapping field names to sort order ("asc" or "desc").	`None`

Returns:

Type	Description
`dict`	Dictionary containing the search results.

Raises:

Type	Description
`ValueError`	If the entity is not a valid table name.

search_ligands ¶

search_ligands(
    *,
    filter_dict: dict[str, Any] | None = None,
    smiles: str | None = None,
    smiles_list: list[str] | None = None,
    canonical_smiles: str | None = None,
    min_molecular_weight: float | int | None = None,
    max_molecular_weight: float | int | None = None,
    limit: int | None = 100,
    offset: int | None = None,
    select: list[str] | None = None,
    sort: dict[str, str] | None = None
) -> dict

Search ligands entity with automatic cursor-based pagination.

Convenience method that calls search(entity="ligands") and iterates through all pages using cursor-based pagination, returning all matching records in a single response.

Parameters:

Name	Type	Description	Default
`filter_dict`	`dict[str, Any] \| None`	Additional filter criteria as a dictionary.	`None`
`smiles`	`str \| None`	Filter by a single SMILES string (exact match).	`None`
`smiles_list`	`list[str] \| None`	Filter by multiple SMILES strings. Uses an "in" filter on canonical_smiles. Mutually exclusive with smiles and canonical_smiles.	`None`
`canonical_smiles`	`str \| None`	Filter by canonical SMILES string.	`None`
`min_molecular_weight`	`float \| int \| None`	Minimum molecular weight filter (inclusive).	`None`
`max_molecular_weight`	`float \| int \| None`	Maximum molecular weight filter (inclusive).	`None`
`limit`	`int \| None`	Maximum total number of results to return across all pages.	`100`
`offset`	`int \| None`	Number of results to skip.	`None`
`select`	`list[str] \| None`	List of fields to select in the response.	`None`
`sort`	`dict[str, str] \| None`	Dictionary mapping field names to sort order ("asc" or "desc").	`None`

Returns:

Type	Description
`dict`	Dictionary containing all search results across pages, with `data` holding the full list of records and `meta` from the final response.

Raises:

Type	Description
`ValueError`	If smiles_list is used together with smiles or canonical_smiles, or if ligands is not a valid table name.
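Because smiles_list filters on canonical_smiles, input strings that are not already canonical may not match, so it can be useful to index the results and see which inputs were found. The sketch below assumes each record in data includes a canonical_smiles field; index_by_canonical_smiles is a hypothetical helper and the sample response is made up.

```python
def index_by_canonical_smiles(response: dict) -> dict:
    """Map each canonical SMILES in a search response to its first record."""
    index = {}
    for record in response.get("data", []):
        key = record.get("canonical_smiles")  # assumed field name
        if key is not None:
            index.setdefault(key, record)  # keep the first record per SMILES
    return index


sample_response = {
    "data": [
        {"id": "L1", "canonical_smiles": "CCO"},
        {"id": "L2", "canonical_smiles": "c1ccccc1"},
        {"id": "L3", "canonical_smiles": "CCO"},
    ],
    "meta": {},
}
by_smiles = index_by_canonical_smiles(sample_response)
queried = ["CCO", "OCC"]
unmatched = [smiles for smiles in queried if smiles not in by_smiles]
```

Here "OCC" describes the same molecule as "CCO" but is reported as unmatched, which is the situation to watch for when inputs are not canonicalized first.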

search_proteins ¶

search_proteins(
    *,
    cursor: str | None = None,
    pdb_id: str | None = None,
    file_path: str | None = None,
    project_id: str | None = None,
    min_molecular_weight: float | int | None = None,
    max_molecular_weight: float | int | None = None,
    sequence: str | None = None,
    limit: int | None = 100,
    offset: int | None = None,
    select: list[str] | None = None,
    sort: dict[str, str] | None = None
) -> dict

Search proteins entity.

Convenience method that calls search(entity="proteins").

Parameters:

Name	Type	Description	Default
`cursor`	`str \| None`	Cursor for pagination.	`None`
`pdb_id`	`str \| None`	Filter by PDB ID.	`None`
`file_path`	`str \| None`	Filter by file path.	`None`
`project_id`	`str \| None`	Filter by data platform project id.	`None`
`min_molecular_weight`	`float \| int \| None`	Minimum molecular weight filter (inclusive).	`None`
`max_molecular_weight`	`float \| int \| None`	Maximum molecular weight filter (inclusive).	`None`
`sequence`	`str \| None`	Filter by FASTA sequence (exact match).	`None`
`limit`	`int \| None`	Maximum number of results to return. Defaults to 100.	`100`
`offset`	`int \| None`	Number of results to skip.	`None`
`select`	`list[str] \| None`	List of fields to select in the response.	`None`
`sort`	`dict[str, str] \| None`	Dictionary mapping field names to sort order ("asc" or "desc").	`None`

Returns:

Type	Description
`dict`	Dictionary containing the search results.

Raises:

Type	Description
`ValueError`	If proteins is not a valid table name (should not happen).