Structure
Add SDF to PDB
Adds a small molecule to a protein structure. The docked pose of the molecule must be computed beforehand, e.g., with the Diff Dock node. The binding affinity of the combined structure can then be estimated with the Protein Ligand Binding Affinity node.
Input:
- Molecule Structure: The molecule to add to the protein structure, in SDF format.
- Protein Structure: The PDB file containing a protein structure to which the molecule will be added.
Output:
- Combined Structure: The PDB file containing the given protein structure and small molecule.
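Conceptually, combining the two files means appending the ligand atoms to the protein as HETATM records. The following is a minimal Python sketch under stated assumptions, not the node's implementation: the SDF holds a single V2000 molecule, the PDB holds a single model, and the residue name `LIG`, chain `L` and both function names are hypothetical choices. Bond (CONECT) records are not written.

```python
def sdf_atoms(sdf_text: str) -> list[tuple[str, float, float, float]]:
    """Return (element, x, y, z) for each atom of the first molecule in a V2000 SDF."""
    lines = sdf_text.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: columns 1-3 hold the atom count
    atoms = []
    for line in lines[4:4 + n_atoms]:  # the atom block follows the counts line
        x, y, z, element = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


def add_ligand_to_pdb(pdb_text: str, sdf_text: str, resname: str = "LIG",
                      chain: str = "L", resseq: int = 1) -> str:
    """Append ligand atoms as HETATM records (hypothetical helper, single model only)."""
    # Drop blank lines and END/ENDMDL so the ligand can be appended before a new END.
    body = [line for line in pdb_text.splitlines()
            if line.strip() and not line.startswith("END")]
    # Simplification: continue serial numbers from the count of existing atom records.
    serial = sum(1 for line in body if line.startswith(("ATOM", "HETATM")))
    hetatm = []
    for i, (element, x, y, z) in enumerate(sdf_atoms(sdf_text), start=1):
        serial += 1
        name = f"{element}{i}"[:4]  # simple unique atom names such as C1, O2
        # Fixed PDB columns: coordinates in 31-54, occupancy/B-factor, element in 77-78.
        hetatm.append(
            f"HETATM{serial:5d} {name:<4} {resname:>3} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}"
        )
    return "\n".join(body + hetatm + ["END"]) + "\n"
```

Writing fixed-width columns matters because most PDB readers parse ATOM/HETATM records by column position rather than by whitespace.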
Alpha Fold
This node is currently under construction and not available. Please use Omega Fold as an alternative, or Nanobody Builder 2 if your sequence belongs to a nanobody.
Computes a protein tertiary structure using AlphaFold2.
Input: FASTA file containing the protein sequence whose structure is to be predicted.
Input Parameters:
- Max template date: Cutoff date for template structures; only structures from the Alpha Fold databases released before this date are used as templates in the prediction. Useful for repeating older analyses.
- Models to relax: Determines whether a relaxation step is applied to only the best predicted structure (structure 0), to all predicted structures, or to none. Relaxation is a restrained energy minimization with a molecular mechanics force field; it removes steric clashes and corrects bond geometry, potentially yielding a more accurate and physically stable structure.
Output: Predictions of all five Alpha Fold models as PDB files, ranked from best to worst prediction with the first prediction being the best.
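AlphaFold writes its per-residue confidence score (pLDDT, 0 to 100) into the B-factor column of the PDB files it produces, and mean pLDDT is the score it uses to rank monomer predictions. The minimal Python sketch below re-checks the ranking of the output files; it assumes standard fixed-column PDB records, and the function names are hypothetical.

```python
def mean_plddt(pdb_text: str) -> float:
    """Mean pLDDT over CA atoms, read from the B-factor column (columns 61-66)."""
    values = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not values:
        raise ValueError("no CA atoms found")
    return sum(values) / len(values)


def rank_predictions(predictions: dict[str, str]) -> list[tuple[str, float]]:
    """Sort {name: pdb_text} by mean pLDDT, best first."""
    scored = [(name, mean_plddt(text)) for name, text in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Averaging over CA atoms gives one value per residue, since AlphaFold assigns the same pLDDT to every atom of a residue.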
Reference:
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. DOI: 10.1038/s41586-021-03819-2.
Alpha Fold Multimer
This node is currently under construction and therefore not available at the moment.
Predicts protein complex structures from multiple FASTA sequences.
Input: Multi-FASTA file whose complex structure is to be predicted. Each sequence in the file is treated as one chain of a multichain protein complex.
Input Parameters:
- Number of models to relax: Which predicted models to relax. Options are all, none, and best.
- max_template_date: Latest release date of the template structures to use. Useful for replicating older AlphaFold analyses on less recent databases. For a prediction using the up-to-date database, enter the current date in the format yyyy-mm-dd.
Output:
- Prediction1 to Prediction5: Separate structure outputs, one per AlphaFold model (AlphaFold consists of five separate models, each making its own prediction), ordered by model confidence.
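Because each record in the input file is read as one chain, a homodimer is specified by repeating the same sequence under two headers. A minimal Python sketch that writes such a multi-FASTA file; the record names and the function name are illustrative assumptions.

```python
def to_multifasta(chains: dict[str, str], width: int = 60) -> str:
    """Write one FASTA record per chain; each record becomes one chain of the complex."""
    lines = []
    for chain_id, sequence in chains.items():
        sequence = "".join(sequence.split()).upper()  # drop whitespace, normalise case
        lines.append(f">{chain_id}")
        # Wrap long sequences at a fixed line width, as FASTA files usually do.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

For example, `to_multifasta({"A": seq, "B": seq})` describes a homodimer of `seq`, while two different sequences describe a heterodimer.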
Diff Dock
Molecular docking using the DiffDock-L diffusion machine learning model. Docks small-molecule ligands given in an SDF file into a given protein structure. The resulting docked ligand can be combined with the protein structure using the Add SDF to PDB node.
Input:
- Receptor: The protein structure (loaded PDB structure) into which the ligand is docked.
- Ligand: The small molecule to dock as loaded SDF structure (ChemicalStructure).
Output:
- Docked Ligand: Ligand in predicted docked pose (ChemicalStructure, SDF file).
- Confidence Score: Confidence score of the prediction as float.
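The confidence score is relative: higher is better. As rough guidance, the DiffDock README describes scores above 0 as high confidence, between -1.5 and 0 as moderate, and below -1.5 as low. The sketch below applies those bands and picks the highest-scoring pose; the cut-offs are guidance rather than guarantees, and the function names are hypothetical.

```python
def confidence_band(score: float) -> str:
    """Map a DiffDock confidence score to the rough bands described above."""
    if score > 0.0:
        return "high"
    if score > -1.5:
        return "moderate"
    return "low"


def best_pose(poses: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the (pose_name, score) pair with the highest confidence."""
    return max(poses, key=lambda pose: pose[1])
```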
Reference:
Corso, G., Deng, A., Fry, B., Polizzi, N., Barzilay, R., & Jaakkola, T. (2024). Deep Confident Steps to New Pockets: Strategies for Docking Generalization. arXiv preprint arXiv:2402.18396. DOI: https://doi.org/10.48550/arXiv.2402.18396.
Molscrub Ligand Preparation
This tool streamlines the preparation of small molecules (ligands) for molecular docking and molecular dynamics simulations using molscrub.py. It takes an input SDF file and performs a series of cleaning and optimization steps to ensure the ligand structure is suitable for downstream computational workflows.
The preparation process can involve several advanced chemical transformations and optimizations:
- pH Correction/Acid-Base Enumeration: Adjusting the protonation states of the ligand based on a target pH.
- Tautomer Enumeration: Generating different structural isomers (tautomers) that are in equilibrium, focusing on low-energy states.
- Ring Fixes: Correcting distorted 6-membered rings and enumerating different chair conformations (for cyclohexane-like rings).
- 3D Coordinate Generation and Optimization: Creating or refining the 3D atomic coordinates of the ligand and performing energy minimization using a chosen force field to find stable conformations.
Reference:
Forli, S., & The Forli Laboratory. (2024). Molscrub: a tool for cleaning molecular files (version 0.1.1). Retrieved from https://github.com/forlilab/molscrub
Input
- Input Ligand: The ligand structure file, typically in SDF format, that you wish to clean and prepare.
Input Parameters
- Protonation pH: A numerical value (float) representing the target pH for any acid/base transformations. This ensures the ligand's protonation state is appropriate for the simulated environment. (Range: 0.0 to 14.0, Default: 7.4)
- Output Filename: A text field to specify the desired name for the output prepared SDF file. If left empty, a descriptive name will be automatically generated.
Expert Parameters (available via the menu icon in the node)
- Low pH limit: Sets the lower end of the pH range for enumerating acid/base conjugates. This parameter supersedes the single --ph option if used. (Range: 0.0 to 14.0, Default: Not used)
- High pH limit: (Float) Sets the upper end of the pH range for enumerating acid/base conjugates. This parameter supersedes the single --ph option if used. (Range: 0.0 to 14.0, Default: Not used)
- Skip acid/base enumeration: If enabled, the generation of acid/base conjugates (different protonation states) will be disabled. (Default: False)
- Skip tautomer enumeration: If enabled, the generation of tautomers (different arrangements of hydrogens and double bonds) will be skipped. (Default: False)
- Skip six-membered ring fixes: If enabled, corrections for distorted six-membered rings (e.g., converting boats to chairs) will not be performed. (Default: False)
- Skip 3D coordinate generation: If enabled, the generation of 3D atomic coordinates and conformers will be skipped (this also disables ring fixes). (Default: False)
- ETKDG random seed: Sets a specific random seed for the ETKDG conformer generation algorithm. Using a fixed seed makes conformer generation reproducible across runs. A value of -1 uses a different random seed on each run, so results may differ between runs. (Default: -1)
- Force field type: Selects the force field to be used for optimizing the 3D geometry of the ligand. Options include: uff, mmff94, mmff94s, espaloma. (Default: mmff94)
- Max force field iterations: Sets the maximum number of energy minimization steps during the 3D optimization process. A higher number allows for a more thorough search for the lowest energy conformation. (Minimum: 1, Default: 200)
- Energy threshold for conformer distinction: Specifies an energy cutoff (in kcal/mol). Two conformers are considered distinct only if their energy difference is greater than this value. This helps filter out near-duplicate conformers. (Minimum: 0.0 kcal/mol, Default: 1.0 kcal/mol)
- Minimize ring conformers: If enabled, force-field energy minimization will be used specifically to determine the lowest-energy conformation for six-membered rings. (Default: False)
- Template molecule file: Provides the path to a template molecule file (.sdf) that can be used for constrained 3D embedding. This allows the tool to generate conformers that match a known reference geometry. (Default: Not used)
- Template SMARTS pattern: A SMARTS pattern (a chemical language for defining substructures) that matches corresponding atoms in both the template and query molecules for constrained 3D embedding. (Default: Not used)
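The energy-threshold rule described above can be illustrated with a greedy filter: conformers are visited from lowest to highest energy, and one is kept only if its energy differs by more than the threshold from every conformer already kept. This is a sketch of the rule as stated here, using energies alone; it is not molscrub's code, and the function name is hypothetical.

```python
def distinct_conformers(energies: list[float], threshold: float = 1.0) -> list[int]:
    """Indices of conformers kept, lowest energy first (energies in kcal/mol)."""
    kept: list[int] = []
    for idx in sorted(range(len(energies)), key=lambda i: energies[i]):
        # Keep a conformer only if it is more than `threshold` away from every kept one.
        if all(abs(energies[idx] - energies[k]) > threshold for k in kept):
            kept.append(idx)
    return kept
```

Visiting conformers in order of increasing energy means that, within each cluster of near-duplicates, the lowest-energy member is the one that survives.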
Output
- Prepared Ligand File: The resulting ligand structure file in SDF format, after all cleaning, protonation, tautomerization, and 3D optimization steps have been applied. This file is ready for use in subsequent molecular docking or molecular dynamics simulations.
Omega Fold
Computes a protein tertiary structure de novo (without requiring templates or a multiple sequence alignment) using Omega Fold. This is much faster than predicting the structure with Alpha Fold, at the cost of slightly lower accuracy.
Input:
- Protein Sequence: Protein sequence to predict the structure for.
Input Parameters:
- Subbatch size: Reduces VRAM usage. The subbatch size determines how much of the structure is computed in one computational batch. For long protein sequences, large subbatches can exhaust GPU memory and cause the prediction to fail; smaller subbatches need less VRAM but run more slowly. Set -1 to use the number of residues in the sequence and compute everything in one batch.
Output: Predicted protein structure.
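As a rough illustration of the parameter's semantics only (Omega Fold applies subbatching inside its network computations, and this is not its code), the hypothetical helper below shows how a subbatch size partitions a sequence of residues and how -1 collapses everything into a single batch. More chunks mean less memory held at once, but more sequential work.

```python
def subbatch_ranges(n_residues: int, subbatch_size: int) -> list[tuple[int, int]]:
    """Split residue indices into [start, end) chunks; -1 means one chunk for everything."""
    size = n_residues if subbatch_size == -1 else subbatch_size
    if size < 1:
        raise ValueError("subbatch size must be -1 or a positive integer")
    return [(start, min(start + size, n_residues)) for start in range(0, n_residues, size)]
```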
Reference:
Wu, R., Ding, F., Wang, R., Shen, R., Zhang, X., Luo, S., ... & Peng, J. (2022). High-resolution de novo structure prediction from primary sequence. BioRxiv, 2022-07. DOI: https://doi.org/10.1101/2022.07.21.500999.
OpenMM Energy Minimization
This tool optimizes the 3D structure of a protein (or other biomolecule) by performing "energy minimization" using the OpenMM library. In essence, it takes an initial structure and adjusts the positions of its atoms to find a more stable, lower-energy arrangement. This step is important when preparing structures for subsequent work such as molecular docking or molecular dynamics simulations, as it removes steric clashes and relaxes the molecule into a more realistic state. The node first adds any missing hydrogen atoms, then applies a chosen "force field" (a set of rules governing atomic interactions) to move the structure towards the nearest local energy minimum.
Reference:
Eastman, P., Swails, J., Chodera, J. D., McGibbon, R. T., Zhao, Y., Beauchamp, K. A., Wang, L. P., Simmonett, A. C., Harrigan, M. P., Stern, C. D., Wiewiora, R. P., Brooks, B. R., & Pande, V. S. (2017). OpenMM 7: Rapid Development of High Performance Algorithms for Molecular Dynamics. PLOS Computational Biology, 13(7), e1005659. https://doi.org/10.1371/journal.pcbi.1005659
Input
- Input PDB: The PDB file (.pdb) containing the 3D structure of the molecule you wish to energy minimize.
Input Parameters
- Constrain Minimization: A toggle (on/off) option. If enabled, certain parts of the molecule can be held fixed or restrained during the minimization process. When this is 'on', the "Constraint Selection" and "Constraint Stiffness" parameters in the Expert Parameters section become active. (Default: Off)
- Output Filename: A text field to specify the name of the resulting minimized PDB file. If left empty, a name will be automatically generated, typically by appending "_minimized" to the original input file name.
Expert Parameters (available via the menu icon in the node)
This node provides access to advanced parameters that give you fine-grained control over the energy minimization process. These settings allow you to define the physical environment and algorithmic details of the simulation.
- Force Field: A "force field" is a collection of mathematical equations and parameters that describe the potential energy of a molecular system based on the positions of its atoms. Choosing the appropriate force field is critical as it determines how the interactions between atoms are calculated, which in turn influences the resulting minimized structure. Common options include various Amber force fields suitable for proteins and nucleic acids. (Default: amber14/protein.ff14SB.xml)
- Water and Solvent Model: This parameter defines how the surrounding environment (solvent, typically water) is represented. You can choose between "explicit" water models (where individual water molecules are simulated) or "implicit" solvent models (which approximate the solvent's effect without including every water molecule, potentially speeding up simulations). (Default: implicit/obc2.xml)
- Non-bonded Method: This setting dictates how interactions between atoms that are not directly connected by a bond (non-bonded interactions) are calculated. Methods like "PME" (Particle Mesh Ewald) are commonly used for accurate calculations in periodic systems (simulating an infinite bulk environment), while "NoCutoff" computes all pairwise interactions without a cutoff and suits non-periodic systems such as implicit-solvent models. (Default: NoCutoff)
- Integrator: The numerical algorithm used to update atomic positions and velocities during a molecular dynamics simulation. OpenMM needs an integrator to set up a simulation, but energy minimization itself does not step the system forward in time, so this choice has little or no effect on the minimized structure. (Default: Langevin)
- Temperature (K): The temperature of the system in Kelvin, used by temperature-dependent integrators such as Langevin. Energy minimization itself does not use a temperature (it searches for a nearby minimum of the potential energy), so this value has little or no effect on the minimized structure. (Default: 300.0 K)
- Max Iterations: The maximum number of steps the minimization algorithm will take to find a stable structure. If set to 0, the minimization will continue until the Energy Tolerance is met, without a step limit. (Default: 0)
- Energy Tolerance (kJ/mol): This value defines how stable the structure needs to be for the minimization to stop. The process will halt when the change in potential energy between successive steps falls below this specified value. A lower tolerance means a more precise (and potentially longer) minimization. (Default: 2.0 kJ/mol)
- Constraint Stiffness (kJ/mol/nm^2): If "Constrain Minimization" is enabled, this value determines the "strength" of the restraint applied to the selected atoms. A higher stiffness means the atoms are held more rigidly in place. (Default: 1000.0 kJ/mol/nm^2)
- Constraint Selection: Used only if "Constrain Minimization" is enabled. Specifies which group of atoms in the molecule will be restrained. Options include restraining the main chain ("backbone"), all non-hydrogen atoms ("heavy_atoms"), or specific parts of the protein. (Default: backbone)
Output
- Minimized PDB: The 3D structure of your molecule after energy minimization, provided as a PDB file. This output represents a more stable and realistic conformation of your molecule, suitable for further computational experiments.
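To make the interplay of Constraint Stiffness, Max Iterations and Energy Tolerance concrete, here is a toy Python sketch: a single point is pulled toward a target by a harmonic term, optionally restrained to its starting position with stiffness `k_restraint` (energy ½k|r - r0|², an assumed functional form), and minimized by plain steepest descent using the stopping rules described above. It is not OpenMM code: OpenMM uses its own L-BFGS minimizer and real force fields, and all names and units here are illustrative.

```python
def energy(r, r0, target, k_bond, k_restraint):
    """Harmonic pull toward `target` plus a harmonic restraint toward the start `r0`."""
    d_target = sum((a - t) ** 2 for a, t in zip(r, target))
    d_start = sum((a - s) ** 2 for a, s in zip(r, r0))
    return 0.5 * k_bond * d_target + 0.5 * k_restraint * d_start


def minimize(r0, target, k_bond=100.0, k_restraint=0.0,
             max_iterations=0, tolerance=1e-9):
    """Steepest descent; max_iterations=0 means 'run until the tolerance is met'."""
    step = 0.5 / (k_bond + k_restraint)  # stable step size for this quadratic energy
    r = list(r0)
    e_old = energy(r, r0, target, k_bond, k_restraint)
    iterations = 0
    while max_iterations == 0 or iterations < max_iterations:
        grad = [k_bond * (a - t) + k_restraint * (a - s)
                for a, t, s in zip(r, target, r0)]
        r = [a - step * g for a, g in zip(r, grad)]
        e_new = energy(r, r0, target, k_bond, k_restraint)
        iterations += 1
        if abs(e_old - e_new) < tolerance:  # energy change below tolerance: stop
            break
        e_old = e_new
    return tuple(r), iterations
```

The minimum lies at (k_bond·target + k_restraint·r0) / (k_bond + k_restraint): with `k_restraint=0` the point moves all the way to the target, and with equal stiffnesses it stops halfway, which is the trade-off the stiffness setting controls.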
PDBFixer
This tool is designed to "fix" common issues found in protein data bank (PDB) files, which are 3D structural descriptions of biological molecules. PDB files, especially those obtained from experimental sources, often have missing atoms or residues (building blocks of proteins), or require standardization before they can be reliably used in computational simulations.
This node uses the PDBFixer library to address these problems. It can add missing parts, replace unusual (nonstandard) residues with their common equivalents, and ensure the structure is complete and consistent. It can also add a water box around your protein and introduce ions to mimic physiological conditions. After fixing, it uses pdb-tools to renumber residues and atoms sequentially, which can be important for downstream analysis and compatibility with other software. The goal is to provide a clean, complete, and standardized protein structure ready for further processing.
References:
Eastman, P., Swails, J., Chodera, J. D., McGibbon, R. T., Zhao, Y., Beauchamp, K. A., Wang, L. P., Simmonett, A. C., Harrigan, M. P., Stern, C. D., Wiewiora, R. P., Brooks, B. R., & Pande, V. S. (2017). OpenMM 7: Rapid Development of High Performance Algorithms for Molecular Dynamics. PLOS Computational Biology, 13(7), e1005659. https://doi.org/10.1371/journal.pcbi.1005659
Rodrigues, J. P. G. L. M., Teixeira, J. M. C., Trellet, M., & Bonvin, A. M. J. J. (2018). pdb-tools: a swiss army knife for molecular structures. F1000Research, 7, 1961. https://doi.org/10.12688/f1000research.17456.1
Input
- Input PDB: The PDB file (.pdb) of the protein structure that you want to clean up and fix.
Input Parameters
- Add Missing Atoms: A dropdown menu to choose which missing atoms should be added to the PDB file.
- none: No missing atoms are added.
- heavy: Only non-hydrogen atoms (e.g., carbon, oxygen, nitrogen) are added. This is often a safe default as hydrogen positions can be ambiguous. (Default)
- hydrogen: Only missing hydrogen atoms are added.
- all: All missing atoms, both heavy and hydrogen, are added.
- Add Missing Residues: A toggle (on/off) option. If enabled, the tool will attempt to add any missing protein residues to complete the chain. Caution: This can sometimes lead to long, flexible, and potentially unrealistic loops in the structure. It is highly recommended to visually inspect the output structure if you use this option. (Default: Off)
- Replace Nonstandard Residues: A toggle (on/off) option. If enabled, the tool will identify and replace any non-standard or modified amino acid residues with their closest standard equivalents. This is useful for ensuring compatibility with other tools that expect standard protein building blocks. (Default: Off)
- Keep Heterogens: A dropdown menu to specify which non-protein molecules (heterogens, like ligands, cofactors, or water) should be kept in the fixed PDB file.
- all: Keep all heterogen molecules present in the original file.
- water: Only keep water molecules.
- none: Remove all heterogen molecules. (Default)
- Output File Name: A text field to specify the name of the new, fixed PDB file. If left empty, a name will be automatically generated, typically by appending "_fixed" to the original file name.
Expert Parameters (available via the menu icon in the node)
This node provides access to advanced parameters for both the PDBFixer and pdb-tools utilities, allowing for more specific control over the fixing process and system setup.
- PDBFixer Parameters:
- --water-box: Add a rectangular box of water molecules around the protein. You need to provide three numbers representing the dimensions (X, Y, Z) of the box in nanometers (nm). For example, 2.5 2.4 3.0.
- --positive-ion: Specify the type of positive ion (e.g., "Na+", "K+") to add to the water box to neutralize the system's charge. (Default: Na+)
- --negative-ion: Specify the type of negative ion (e.g., "Cl-", "Br-") to add to the water box to neutralize the system's charge. (Default: Cl-)
- --ionic-strength: Set the molar concentration of ions to add to the water box. This controls the salinity of the environment. (Default: 0.0)
- --ph: Set the pH value to use for adding missing hydrogen atoms. This affects the protonation state of certain residues. (Default: 7.0)
- pdb-tools Parameters:
- Renumber Residues: If enabled, all residues in the PDB file will be renumbered sequentially, starting from 1. If disabled (default), residues are renumbered sequentially starting from the first residue number found in the input PDB file, so the original starting number is kept but any gaps in the numbering are closed.
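For a 1:1 salt such as NaCl, the ionic strength equals the salt concentration, so the number of ion pairs in a water box is roughly c × N_A × V. The back-of-envelope Python sketch below assumes the whole box volume counts; PDBFixer's own calculation may differ, ions added to neutralize the net charge come on top, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # 1/mol (exact SI value)
NM3_TO_LITRES = 1e-24     # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def ion_pairs(ionic_strength_molar: float, box_nm: tuple[float, float, float]) -> int:
    """Approximate number of 1:1 salt ion pairs for a given ionic strength and box size."""
    x, y, z = box_nm
    volume_litres = x * y * z * NM3_TO_LITRES
    # For a 1:1 salt, I = 1/2 * (c * 1^2 + c * 1^2) = c, so I can be used as c directly.
    return round(ionic_strength_molar * AVOGADRO * volume_litres)
```

With the example box of 2.5 × 2.4 × 3.0 nm and a physiological-like 0.15 M, `ion_pairs(0.15, (2.5, 2.4, 3.0))` gives only about 2 ion pairs, which is why small boxes give a very coarse approximation of a given salinity.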
Output:
- Output PDB: The cleaned, fixed, and sequentially renumbered PDB file. This output is a complete protein structure that is more suitable for further computational analyses and simulations.
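The two renumbering modes of the pdb-tools step can be sketched as follows. This is a minimal Python illustration of the behaviour described in this section, not pdb-tools' code: it numbers residues continuously across chains, treats a change of chain ID, residue number or insertion code as a new residue, and uses a hypothetical function name.

```python
from typing import Optional


def renumber_residues(pdb_lines: list[str], start: Optional[int] = None) -> list[str]:
    """Renumber residues 1, 2, 3... (start=1) or from the first input number (start=None)."""
    out = []
    previous_key = None
    current = None
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            out.append(line)  # TER, REMARK, END, ... pass through unchanged
            continue
        key = (line[21], line[22:27])  # chain ID plus resSeq and insertion code
        if key != previous_key:
            if current is None:
                current = int(line[22:26]) if start is None else start
            else:
                current += 1
            previous_key = key
        # Write the new number into columns 23-26 and clear the insertion code (column 27).
        out.append(f"{line[:22]}{current:4d} {line[27:]}")
    return out
```

For residues numbered 5, 6, 10, the default mode yields 5, 6, 7, while starting from 1 yields 1, 2, 3.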
Protonate PDB
This tool prepares protein structures by adjusting their "protonation states" and adding missing hydrogen atoms based on a specified pH value. In biological systems, the charge and reactivity of amino acid residues (the building blocks of proteins) can change depending on the acidity or alkalinity (pH) of their environment. Accurately modeling these protonation states is critical for many computational simulations, such as molecular docking or molecular dynamics.
This node uses PROPKA to predict the pKa values (a measure of acidity) of titratable residues (like Aspartic acid, Glutamic acid, Histidine, Lysine, Arginine) and then employs PDB2PQR to add hydrogens and assign atomic charges according to the chosen pH and a specific "force field" (a set of parameters defining atomic interactions). This ensures your protein structure is chemically realistic for downstream simulations.
References:
Dolinsky, T. J., Czodrowski, P., Li, H., Nielsen, J. E., Jensen, J. H., Klebe, G., & Baker, N. A. (2007). PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Research, 35(suppl_2), W522–W525. https://doi.org/10.1093/nar/gkm276
Jurrus, E., Engel, D., Star, K., Monson, K., Brandi, J., Felberg, L. E., Brookes, D. H., Wilson, L., Chen, J., Liles, K., Chun, M., Li, P., Gohara, D. W., Dolinsky, T., Konecny, R., Koes, D. R., Nielsen, J. E., Head-Gordon, T., Geng, W., Krasny, R., Wei, G. W., Holst, M. J., McCammon, J. A., & Baker, N. A. (2018). Improvements to the APBS biomolecular solvation software suite. Protein Science, 27(1), 112–128. https://doi.org/10.1002/pro.3280
Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M., & Jensen, J. H. (2011). PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa Predictions. Journal of Chemical Theory and Computation, 7(2), 525–537. https://doi.org/10.1021/ct100578z
Input
- Input PDB: The PDB file (.pdb) of the protein structure you wish to protonate.
Input Parameters:
- pH Value: A numerical value (float) representing the target pH for adjusting the protein's protonation states. The tool will protonate or deprotonate residues to match their expected state at this pH. (Range: 0.0 to 14.0, Default: 7.0)
- Force Field: A dropdown menu to select the force field that will be used. This choice influences how atomic charges and atom types are assigned during the protonation process, affecting how the protein interacts in subsequent simulations. Options include: AMBER, CHARMM, PARSE, TYL06, PEOEPB, SWANSON. (Default: AMBER)
- Output File Name: A text field to specify the desired name for the output protonated PDB file. If left empty, a descriptive name will be automatically generated, typically including the original filename and the target pH (e.g., myprotein_protonated_pH7.0.pdb).
Expert Parameters (available via the menu icon in the node)
- --drop-water: If enabled, any water molecules present in the input PDB file will be removed before the protonation process begins. (Default: False)
Output
- Output PDB: A new PDB file (.pdb) containing your protein structure with all hydrogen atoms added and the protonation states of its titratable residues adjusted to the specified pH. This file is ready for use in molecular simulations.
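Given pKa values such as those PROPKA predicts, the Henderson-Hasselbalch equation gives the average charge of each titratable group at the chosen pH, and summing them gives an approximate net charge over the groups supplied. As we understand it, PDB2PQR instead assigns each residue a single discrete state (protonated when the pH is below the predicted pKa), which `is_protonated` mirrors. This is a sketch, not PDB2PQR or PROPKA code; function names are hypothetical and any pKa values passed in are the caller's.

```python
def group_charge(pka: float, ph: float, kind: str) -> float:
    """Average charge of one titratable group (Henderson-Hasselbalch)."""
    if kind == "acid":  # e.g. Asp, Glu: 0 when protonated, -1 when deprotonated
        return -1.0 / (1.0 + 10 ** (pka - ph))
    if kind == "base":  # e.g. Lys, Arg, His: +1 when protonated, 0 when deprotonated
        return 1.0 / (1.0 + 10 ** (ph - pka))
    raise ValueError(f"unknown group kind: {kind}")


def is_protonated(pka: float, ph: float) -> bool:
    """Discrete assignment: a group is protonated when the pH is below its pKa."""
    return ph < pka


def net_charge(groups: list[tuple[float, str]], ph: float) -> float:
    """Sum of average charges over (pKa, kind) pairs."""
    return sum(group_charge(pka, ph, kind) for pka, kind in groups)
```

At pH = pKa a group is half protonated, so an acid contributes -0.5; one pH unit away from the pKa the group is already about 91% in one state.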
Select PDB Chain
This tool allows you to isolate and extract a specific protein chain (or multiple chains) from a larger PDB (Protein Data Bank) file. PDB files often contain multiple protein chains, or even complexes of proteins and other molecules. For many bioinformatics tasks, such as molecular docking or structural analysis, you might only need to work with one or a few of the protein chains. This node automates the process of identifying and saving only the chain(s) you are interested in, providing a clean PDB file ready for further steps.
You can choose to extract a chain by providing its specific ID (e.g., 'A, B'), or you can instruct the tool to automatically identify and extract either the largest or the smallest protein chain present in the input file based on residue count.
References:
Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., & de Hoon, M. J. L. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163
Rodrigues, J. P. G. L. M., Teixeira, J. M. C., Trellet, M., & Bonvin, A. M. J. J. (2018). pdb-tools: a swiss army knife for molecular structures. F1000Research, 7, 1961. https://doi.org/10.12688/f1000research.17456.1
Input
- PDB File: The input PDB file (.pdb) from which you want to extract a specific protein chain.
Input Parameters
- Chain Selection:
- User Defined: Select this option if you know the exact ID of the chain you want to extract. You will then need to provide the PDB Chain ID.
- Largest: Automatically identifies and extracts the protein chain with the highest number of residues.
- Smallest: Automatically identifies and extracts the protein chain with the fewest number of residues.
- PDB Chain ID: A text field for the chain ID, a single capital letter (e.g., 'A', 'B', 'L') identifying the chain you wish to extract. To extract multiple chains, separate the IDs with commas, e.g. 'A, E, G'. This parameter is only used if "User Defined" is selected for Chain Selection. If "Largest" or "Smallest" is chosen, this field will be ignored.
- Output File Name: A text field to specify the name of the output PDB file containing only the extracted chain. If left empty, a descriptive name will be generated automatically, typically including the original filename and the extracted chain ID (e.g., "myprotein_chain_A.pdb").
Output
- Extracted PDB File: A new PDB file (.pdb) containing only the protein chain(s) selected and extracted from the input file. This file is suitable for subsequent steps that require only the selected chain(s).
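The three selection modes can be sketched in plain Python over fixed-column PDB records. This is a minimal illustration, assuming residues are counted from ATOM records by residue number plus insertion code; it is not the node's Biopython/pdb-tools implementation, and the function names are hypothetical.

```python
def residues_per_chain(pdb_lines: list[str]) -> dict[str, int]:
    """Count distinct residues (resSeq + insertion code) per chain in ATOM records."""
    residues: dict[str, set[str]] = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            residues.setdefault(line[21], set()).add(line[22:27])
    return {chain: len(ids) for chain, ids in residues.items()}


def select_chains(pdb_lines: list[str], mode: str = "user", chain_ids: str = "") -> list[str]:
    """Keep ATOM/HETATM/TER records of the chosen chain(s): mode is user, largest or smallest."""
    counts = residues_per_chain(pdb_lines)
    if mode == "user":
        # Comma-separated IDs such as "A, E, G"; surrounding spaces are ignored.
        wanted = {c.strip() for c in chain_ids.split(",") if c.strip()}
    elif mode == "largest":
        wanted = {max(counts, key=counts.get)}
    elif mode == "smallest":
        wanted = {min(counts, key=counts.get)}
    else:
        raise ValueError(f"unknown mode: {mode}")
    kept = [line for line in pdb_lines
            if line.startswith(("ATOM", "HETATM", "TER")) and line[21:22] in wanted]
    return kept + ["END"]
```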
Smiles To Structure
Converts a SMILES (Simplified Molecular Input Line Entry System) string, given as text, into an SDF format structure. The SDF file format stores the structural data of molecules such as drugs or metabolites, including their three-dimensional coordinates.
Input:
- SMILES: SMILES string of a chemical compound to convert to an SDF structure.
Input Parameters:
- Optimize Structure: Option that determines whether xyna.bio should optimize the three-dimensional geometry of the molecule using molecular mechanics. This can be useful to obtain a more accurate three-dimensional structure of the molecule.
Output: Generated Chemical Structure as SDF file.