Utility functions¶
How to use this reference
This page documents each class and function in this module and is meant as a detailed reference. If you're looking for an introduction, we recommend reviewing the How to section.
This module contains utility functions used during tool execution. In general, you will not need to use many of these functions directly.
add_provider_if_databaseid_found¶
add_provider_if_databaseid_found(data)
Recursively traverse a data structure of nested dictionaries/lists. If a dict contains the key 'databaseId', add a peer key '$provider' set to 'datahub-cell'. Return the modified data structure.
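A minimal usage sketch (the import path is an assumption; adjust it to wherever this module lives):

```python
from tools.utils import add_provider_if_databaseid_found  # hypothetical import path

data = {
    "rows": [
        {"databaseId": "db-123", "name": "example"},  # gains a '$provider' key
        {"name": "no-database-id"},                   # left unchanged
    ]
}

result = add_provider_if_databaseid_found(data)
# result["rows"][0] now also contains {"$provider": "datahub-cell"}
```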
get_job_dataframe¶
get_job_dataframe(update: bool = False) -> Any
Return a dataframe of all jobs and their statuses, reading from the local cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `update` | `bool` | Whether to check for updates on non-terminal jobs. Defaults to `False`. | `False` |
Note that this function is deliberately not annotated with a return type because pandas is imported inside this function.
Returns:
| Type | Description |
|---|---|
| `Any` | `pd.DataFrame`: a dataframe containing job information. |
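A minimal usage sketch (import path assumed):

```python
from tools.utils import get_job_dataframe  # hypothetical import path

# Re-check non-terminal jobs before reading from the local cache
df = get_job_dataframe(update=True)
print(df.head())  # a pd.DataFrame of jobs and their statuses
```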
make_payload¶
make_payload(
*,
inputs: dict,
outputs: dict,
cluster_id: Optional[str] = None,
cols: Optional[list] = None
) -> dict
Helper function to create the payload for tool execution. It is used by all wrapper functions in the run module to create the payload.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `inputs` | `dict` | Inputs. | required |
| `outputs` | `dict` | Outputs. | required |
| `cluster_id` | `Optional[str]` | Cluster ID. Defaults to `None`. If not provided, the default cluster (us-west-2) is used. | `None` |
| `cols` | `Optional[list]` | List of columns. Defaults to `None`. If provided, column names (in inputs or outputs) are converted to column IDs. | `None` |
Returns:
| Name | Type | Description |
|---|---|---|
| `dict` | `dict` | Correctly formatted payload, ready to be passed to `execute_tool`. |
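A sketch of building a payload; the import path and the input/output keys and values are illustrative only:

```python
from tools.utils import make_payload  # hypothetical import path

payload = make_payload(
    inputs={"protein": "file-id"},         # illustrative input mapping
    outputs={"results": "output-target"},  # illustrative output mapping
    cluster_id=None,                       # None -> default cluster (us-west-2)
)
# payload is a dict, ready to be passed to execute_tool
```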
query_run_status¶
query_run_status(execution_id: str) -> str
Determine the status of a run, identified by its execution ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `execution_id` | `str` | Execution ID. | required |
Returns:
| Type | Description |
|---|---|
| `str` | One of `"Created"`, `"Queued"`, `"Running"`, `"Succeeded"`, or `"Failed"`. |
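A minimal usage sketch (import path and execution ID are illustrative):

```python
from tools.utils import query_run_status  # hypothetical import path

status = query_run_status("my-execution-id")  # illustrative execution ID
assert status in {"Created", "Queued", "Running", "Succeeded", "Failed"}
```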
query_run_statuses¶
query_run_statuses(job_ids: list[str]) -> dict
Get statuses for multiple jobs in parallel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `job_ids` | `list[str]` | List of job IDs. | required |
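A minimal usage sketch; this assumes the returned dict maps each job ID to its status, which the reference above does not spell out:

```python
from tools.utils import query_run_statuses  # hypothetical import path

statuses = query_run_statuses(["job-1", "job-2"])  # illustrative job IDs
for job_id, status in statuses.items():            # assumed dict shape
    print(job_id, status)
```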
run_tool¶
run_tool(*, data: dict, tool_key: str)
Run any tool using the provided data transfer object (DTO).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict` | Data transfer object. This is typically generated by the | required |
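A minimal usage sketch; pairing run_tool with a payload produced by make_payload is an assumption based on the (truncated) description above:

```python
from tools.utils import make_payload, run_tool  # hypothetical import path

data = make_payload(
    inputs={"protein": "file-id"},  # illustrative
    outputs={"results": "target"},  # illustrative
)
run_tool(data=data, tool_key="some-tool-key")  # both arguments are keyword-only
```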
wait_for_job¶
wait_for_job(
execution_id: str, *, poll_interval: int = 4
) -> None
Repeatedly poll Deep Origin for the job status, until the status is "Succeeded" or "Failed" (a terminal state).
This function is useful for blocking execution of your code until a specific task is complete.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `execution_id` | `str` | Execution ID. This is typically printed to screen and returned when a job is initialized. | required |
| `poll_interval` | `int` | Number of seconds to wait between polls. Defaults to 4. | `4` |
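A minimal usage sketch (import path and execution ID are illustrative):

```python
from tools.utils import wait_for_job  # hypothetical import path

# Blocks until the job reaches "Succeeded" or "Failed"
wait_for_job("my-execution-id", poll_interval=10)
```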
wait_for_jobs¶
wait_for_jobs(
refresh_time: int = 3, hide_succeeded: bool = True
) -> Any
Wait for all jobs started via this client to complete.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `refresh_time` | `int` | Number of seconds to wait between polls. Defaults to 3. | `3` |
| `hide_succeeded` | `bool` | Whether to hide jobs that have already completed. Defaults to `True`. | `True` |
Note that this function is deliberately not annotated with a return type, to avoid importing pandas outside this function.
Returns:
| Type | Description |
|---|---|
| `Any` | `pd.DataFrame`: dataframe of all jobs. |
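A minimal usage sketch (import path assumed):

```python
from tools.utils import wait_for_jobs  # hypothetical import path

# Poll every 5 seconds and keep succeeded jobs visible in the output
df = wait_for_jobs(refresh_time=5, hide_succeeded=False)
print(df)  # a pd.DataFrame of all jobs started via this client
```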