Biology

GC Content

Calculates the proportion of guanine (G) and cytosine (C) among all bases (adenine, guanine, cytosine, and thymine, or uracil in RNA) in a given DNA sequence.

Knowing the GC content of a sequence is useful when planning sequencing experiments, PCR reactions, and other experiments that are sensitive to melting temperature, because a higher proportion of G and C raises the melting temperature of double-stranded DNA.

Input:

Single Sequence Fasta: The FASTA sequence for which to calculate the guanine-cytosine content. The FASTA file must contain only a single DNA sequence entry.

Output:

GC content: The percentage (as float) of nitrogenous bases in a DNA or RNA sequence that are either guanine (G) or cytosine (C).
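
The calculation can be sketched in plain Python as follows. This is a minimal illustration, not the node's actual implementation: the function names read_single_fasta and gc_content are hypothetical, and ignoring characters other than A, C, G, T, and U (for example N) is an assumption, so the node may treat ambiguous bases differently.

```python
def read_single_fasta(text):
    """Return the sequence from FASTA text that holds exactly one record."""
    header_count = 0
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header_count += 1
            continue
        parts.append(line)
    if header_count != 1:
        raise ValueError("expected exactly one FASTA entry, found %d" % header_count)
    return "".join(parts).upper()


def gc_content(sequence):
    """Return the percentage of G and C among the A/C/G/T/U bases."""
    # Assumption: any other character (gaps, N, ...) is left out of the total.
    bases = [b for b in sequence.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, T, or U")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)
```

For the record ">seq1" with the sequence ATGCGCGGAT, six of the ten bases are G or C, so gc_content returns 60.0.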

Molecular Weight of DNA

Calculates the molecular weight of a DNA sequence in daltons (Da).

The weight of a DNA sequence depends on whether the DNA is assumed to be single- or double-stranded, whether it is circular or linear, and which isotope distribution is assumed for its atoms.

Input:

Single Sequence Fasta: FASTA file containing the nucleic acid or amino acid sequence whose molecular weight is to be calculated. The FASTA file must contain only a single entry.

Input Parameters: Characteristics of the sequence can be ticked if they apply (Double Stranded, Circular, Monoisotropic). A monoisotopic DNA sequence is assumed to contain only the most abundant naturally occurring stable isotope of each type of atom.

Output: The molecular weight of the given DNA sequence in daltons (float).
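
How the strandedness and topology options change the result can be sketched as below. This is a hedged illustration only. It assumes approximate average masses for the four deoxynucleoside monophosphates, assumes that each phosphodiester bond releases one water molecule, and assumes a linear strand keeps its 5' phosphate. The function names are hypothetical and the node may compute the weight differently. The monoisotopic option is not covered; it would need a separate table of monoisotopic masses.

```python
# Approximate average masses in Da (assumed values for illustration).
DNMP_AVG = {"A": 331.2218, "C": 307.1971, "G": 347.2212, "T": 322.2085}
WATER_AVG = 18.0153
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def strand_weight(sequence, circular=False):
    """Weight of one strand: monomer masses minus one water per bond."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    total = sum(DNMP_AVG[base] for base in sequence)  # KeyError on non-ACGT
    # A linear strand of n bases has n - 1 bonds; a circular one has n.
    bonds = len(sequence) if circular else len(sequence) - 1
    return total - bonds * WATER_AVG


def dna_molecular_weight(sequence, double_stranded=False, circular=False):
    """Weight in Da, optionally adding the complementary strand."""
    weight = strand_weight(sequence, circular)
    if double_stranded:
        partner = "".join(COMPLEMENT[base] for base in sequence.upper())
        weight += strand_weight(partner, circular)
    return weight
```

Under these assumptions, dna_molecular_weight("AGC") is about 949.61 Da. Setting circular=True removes one further water (about 931.59 Da), and double_stranded=True adds the complementary strand (about 1890.21 Da).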

Protein Instability Index

Computes the protein instability index from the observed frequency of dipeptides in stable and unstable proteins. The larger the value, the more unstable the protein.

The index is a heuristic for the stability of a protein sequence, based on the observed differences in dipeptide frequency between stable and unstable proteins. Values above 40 indicate a short in vivo half-life of the protein. Conversely, values below this threshold indicate relative stability.

Input:

Protein: Protein sequence (fasta) or structure (pdb) to calculate the instability index of.

Output:

Protein Instability Index: Describes how unstable a protein is. A value above 40 means the protein has a short half-life.
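
The index from Guruprasad et al. sums a weight for every pair of adjacent residues and scales the sum by 10 / L, where L is the sequence length. The sketch below shows that arithmetic only. The function names are hypothetical, and the dipeptide weight table must be supplied by the caller; the two weights in the usage note are placeholders, not values from the paper.

```python
def instability_index(sequence, dipeptide_weights):
    """Return (10 / L) * sum of the weights of all adjacent residue pairs."""
    sequence = sequence.upper()
    if len(sequence) < 2:
        raise ValueError("need at least two residues")
    total = 0.0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        total += dipeptide_weights[pair]  # KeyError if the pair is missing
    return 10.0 / len(sequence) * total


def classify(index, threshold=40.0):
    """Apply the threshold described above: above 40 counts as unstable."""
    return "unstable" if index > threshold else "stable"
```

With the placeholder table {"AK": 2.0, "KA": -1.0}, the sequence AKAK contains the pairs AK, KA, and AK, so the sum is 3.0 and the index is 10 / 4 * 3.0 = 7.5, which classify reports as stable.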

Reference:

Guruprasad, K., Reddy, B. B., & Pandit, M. W. (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering, Design and Selection, 4(2), 155-161. DOI: https://doi.org/10.1093/protein/4.2.155

Protein Ligand Binding Affinity

Calculates the protein-ligand binding affinity of a given PDB structure with a docked ligand.

The binding affinity of a protein and a ligand is a measure of how well a given protein structure can bind the ligand. This can be important for judging how well a ligand binds to a structure, for example, to find the most potent inhibitor of an enzyme.

The affinity is calculated with PRODIGY-LIG, a structure-based method for the prediction of binding affinity in protein-small ligand complexes. The tool reports the binding affinity as a Gibbs free energy in kcal/mol. The lower (more negative) the value, the stronger the binding.

Input:

PDB structure: Loaded PDB file containing the structure of a protein.

Requirements: The file must contain exactly one ligand, which needs to be chain H with the name UNL. This is automatically set using the Add SDF to PDB node after docking. Otherwise, it needs to be set by hand. The PDB file can contain more than one chain, but not more than one ligand.
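
The requirements above can be checked before running the node with a small script such as the following. It is a sketch under stated assumptions: the ligand is assumed to be stored in HETATM records, water (HOH) is assumed not to count as a ligand, and the function names are hypothetical. It reads the residue name, chain, and residue number from the fixed columns of the PDB format.

```python
def ligand_residues(pdb_text):
    """Collect (chain, residue name, residue number) of non-water HETATM residues."""
    found = set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM") or len(line) < 26:
            continue
        residue_name = line[17:20].strip()   # PDB columns 18-20
        if residue_name == "HOH":            # assumption: water is not a ligand
            continue
        chain_id = line[21]                  # PDB column 22
        residue_number = line[22:26].strip() # PDB columns 23-26
        found.add((chain_id, residue_name, residue_number))
    return found


def check_ligand(pdb_text, chain="H", name="UNL"):
    """Raise ValueError unless exactly one ligand with the given chain and name exists."""
    ligands = ligand_residues(pdb_text)
    if len(ligands) != 1:
        raise ValueError("expected exactly one ligand, found %d" % len(ligands))
    ligand_chain, ligand_name, _ = next(iter(ligands))
    if (ligand_chain, ligand_name) != (chain, name):
        raise ValueError(
            "ligand is %s on chain %s, expected %s on chain %s"
            % (ligand_name, ligand_chain, name, chain)
        )
    return True
```

Calling check_ligand(open("complex.pdb").read()) on a hypothetical file complex.pdb returns True when the file meets the requirements and raises a ValueError that names the problem otherwise.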

Output:

Protein Ligand Binding Affinity: The predicted binding affinity in kcal/mol (float).
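
To get a feel for the scale of the output, a free energy can be converted into a dissociation constant with the standard relation Kd = exp(ΔG / RT). The sketch below assumes that the predicted value can be read as a standard binding free energy at a 1 M reference concentration and a temperature of 298.15 K. The function name is hypothetical and the conversion is not part of the node.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Return Kd in mol/L for a binding free energy given in kcal/mol."""
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))
```

Under these assumptions, a value of -8.0 kcal/mol corresponds to a Kd on the order of 1e-6 mol/L, and every further decrease in the free energy gives a smaller Kd, that is, tighter binding.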

References:

Vangone A, Schaarschmidt J, Koukos P, Geng C, Citro N, Trellet M, Xue L, Bonvin A.: Large-scale prediction of binding affinity in protein-small ligand complexes: the PRODIGY-LIG web server. Bioinformatics

Kurkcuoglu Z, Koukos P, Citro N, Trellet M, Rodrigues J, Moreira I, Roel-Touris J, Melquiond A, Geng C, Schaarschmidt J, Xue L, Vangone A, Bonvin AMJJ.: Performance of HADDOCK and a simple contact-based protein-ligand binding affinity predictor in the D3R Grand Challenge 2. J Comput Aided Mol Des 32(1):175-185 (2017).