Skip to content

Work with proteins

This document describes how to work with proteins and use them in Deep Origin tools.

The Protein class

The Protein class is the primary way to work with proteins in Deep Origin.

Constructing a protein

From a file

A protein can be constructed from a file:

from deeporigin.chemistry import Protein
protein = Protein("/path/to/pdb")

From a PDB ID

A protein can also be constructed from a PDB ID:

from deeporigin.chemistry import Protein
protein = Protein(pdb_id="1EBY")

Visualizing a protein

Browser support

These visualizations work best on Google Chrome. We are aware of issues on other browsers, especially Safari on macOS.

A protein object can be visualized using show:

protein.show()

A visualization such as this will be shown:

Jupyter notebook required

Visualizations such as these require this code to be run in a jupyter notebook. We recommend using these instructions to install Jupyter.

Modifying proteins

Removing specific residues

You can remove specific residue names from a protein structure using the remove_resnames method:

# Remove water molecules (HOH) and ions (NA, CL)
protein.remove_resnames(exclude_resnames=["HOH", "NA", "CL"])

Residue name format

Residue names in PDB files are always uppercase (e.g., "HOH" for water, "NA" for sodium, "CL" for chloride).

This method modifies the protein structure in place by removing the specified residue names. If no residue names are provided, the protein structure remains unchanged.

Removing HETATM records

You can remove HETATM records from the protein structure using the remove_hetatm method:

# Remove all HETATM records except water and zinc
protein.remove_hetatm(keep_resnames=["HOH"], remove_metals=["ZN"])

This method modifies the protein structure in place by removing HETATM records (heteroatoms) from the structure. You can: - Keep specific residues by providing their names in keep_resnames - Keep specific metals by providing their names in remove_metals (these metals will be excluded from removal) - If no arguments are provided, all HETATM records will be removed

Metal names

Metal names should be provided in uppercase (e.g., "ZN" for zinc, "FE" for iron).

Removing water molecules

You can remove all water molecules from a protein structure using the remove_water method:

protein.remove_water()

This method modifies the protein structure in place by removing all water molecules (residue name "HOH"). Unlike remove_resnames, this method does not return a new protein structure but instead modifies the existing one.