Biology

GC Content

Calculates the fraction of guanine (G) and cytosine (C) bases among all bases (adenine, guanine, cytosine, and thymine, or uracil in RNA) in a given nucleic acid sequence.

Knowing the GC content of a sequence is useful when planning sequencing experiments, PCR reactions, and other experiments that are sensitive to melting temperature, because a higher proportion of G and C raises the melting temperature of a nucleic acid.

Input: Loaded nucleic acid sequence.

Output: GC content as float.
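
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the node's actual implementation: the function name gc_content is hypothetical, the result is assumed to be a fraction between 0 and 1 (not a percentage), and characters other than A, C, G, T, and U are assumed to be ignored.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among the recognised bases.

    Assumptions (not confirmed by the documentation): the result is a
    fraction in [0, 1], and any character other than A, C, G, T, U
    (for example N or a gap) is skipped.
    """
    bases = [base for base in sequence.upper() if base in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no recognised bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return gc_count / len(bases)
```

For example, gc_content("ATGC") counts one G and one C among four bases.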

Molecular Weight of DNA

Calculates the molecular weight of a DNA sequence in Dalton.

The weight of a DNA sequence depends on whether the DNA is assumed to be single- or double-stranded, circular or linear, and on the isotope distribution assumed for its atoms.

Input: Loaded protein or nucleic acid sequence.

Input Parameters: Tick the characteristics that apply to the sequence (Double Stranded, Circular, Monoisotopic). A monoisotopic calculation assumes that every atom is the most abundant naturally occurring stable isotope of its element.

Output: Molecular weight as float.
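
How the Double Stranded and Circular options change the result can be illustrated with a small sketch. It is an assumption-laden example, not the node's implementation: it uses commonly tabulated average masses of the deoxynucleoside 5'-monophosphates, assumes each linear strand carries a 5' phosphate, subtracts one water per phosphodiester bond, and leaves out the monoisotopic option. The function and constant names are hypothetical.

```python
# Average masses in Dalton of the deoxynucleoside 5'-monophosphates
# (commonly tabulated values; assumed here, not taken from the tool).
DNA_MONOPHOSPHATE_WEIGHTS = {
    "A": 331.2218,
    "C": 307.1971,
    "G": 347.2212,
    "T": 322.2085,
}
WATER = 18.0153  # lost once per phosphodiester bond
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _strand_weight(strand: str, circular: bool) -> float:
    """Weight of one strand; a circular strand has one extra bond."""
    total = sum(DNA_MONOPHOSPHATE_WEIGHTS[base] for base in strand)
    bonds = len(strand) if circular else len(strand) - 1
    return total - bonds * WATER


def dna_molecular_weight(sequence: str,
                         double_stranded: bool = False,
                         circular: bool = False) -> float:
    """Average molecular weight in Dalton (hypothetical helper)."""
    strand = sequence.upper()
    if not strand:
        raise ValueError("empty sequence")
    weight = _strand_weight(strand, circular)
    if double_stranded:
        # Strand orientation does not affect the mass, so the
        # complement does not need to be reversed.
        weight += _strand_weight(strand.translate(COMPLEMENT), circular)
    return weight
```

Under these assumptions, closing a strand into a circle lowers its weight by one water molecule, and the double-stranded weight is the sum of both strands.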

Protein Instability Index

Computes the protein instability index from the observed frequency of dipeptides in stable and unstable proteins. The larger the value, the less stable the protein.

The index is a heuristic for the stability of a protein sequence, based on the observed differences in dipeptide frequency between stable and unstable proteins. Values above 40 predict a short in vivo half-life of the protein; values below this threshold indicate relative stability.

Input: Loaded protein sequence.

Output: Protein instability index as float.
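
In the reference below, the index is defined as (10 / L) times the sum of the instability weights of all L - 1 consecutive dipeptides in a sequence of length L. The sketch below shows that sum, assuming the caller supplies the dipeptide weight table from the publication; the table itself is not reproduced here, and the function names and the dipeptide_weights parameter are hypothetical.

```python
def instability_index(sequence: str, dipeptide_weights: dict) -> float:
    """Instability index: (10 / L) * sum of dipeptide weights.

    dipeptide_weights maps two-letter strings such as "AG" to their
    instability weight. The caller must supply the published table;
    a missing dipeptide raises KeyError.
    """
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    total = sum(dipeptide_weights[seq[i:i + 2]] for i in range(len(seq) - 1))
    return 10.0 / len(seq) * total


def is_predicted_stable(index: float, threshold: float = 40.0) -> bool:
    """Apply the threshold of 40 described above."""
    return index < threshold
```

A sequence is then classified by passing the computed index to is_predicted_stable.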

Reference:

Guruprasad, K., Reddy, B. B., & Pandit, M. W. (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering, Design and Selection, 4(2), 155-161. DOI: https://doi.org/10.1093/protein/4.2.155

Protein Ligand Binding Affinity

Calculates the protein-ligand binding affinity of a given PDB structure with a docked ligand.

The binding affinity of a protein and a ligand measures how strongly a given protein structure binds the ligand. It can be used to compare candidate ligands, for example, to find the most potent inhibitor of an enzyme.

The affinity is calculated with PRODIGY-LIG, a structure-based method for predicting binding affinity in protein-small ligand complexes. The tool reports the affinity as a Gibbs free energy of binding (ΔG) in kcal/mol. The lower (more negative) the value, the stronger the binding.
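
To see why a more negative free energy means stronger binding, the value can be converted to a dissociation constant with the standard relation ΔG = RT ln(Kd). The sketch below is only an interpretation aid and not part of the node: the function name is hypothetical and a temperature of 298.15 K is assumed.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol * K)


def dissociation_constant(delta_g: float, temperature: float = 298.15) -> float:
    """Convert a binding free energy in kcal/mol to Kd in mol/L.

    Uses delta_g = R * T * ln(Kd); the default temperature is an
    assumption, not a setting of the tool.
    """
    return math.exp(delta_g / (GAS_CONSTANT * temperature))
```

Under this relation, a ligand with a binding free energy of -10 kcal/mol has a smaller dissociation constant, and therefore binds more tightly, than one with -8 kcal/mol.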

Input: PDB structure.

Requirements: The file must contain exactly one ligand, which must be on chain H and have the residue name UNL. The Add SDF to PDB node sets this automatically; otherwise, it needs to be set by hand. The PDB file can contain more than one protein chain, but not more than one ligand.

Output: Binding affinity as float.
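
The ligand requirements above can be checked before running the node. The following sketch is a hypothetical helper, not part of the tool: it assumes the ligand atoms are stored as HETATM records in standard fixed-column PDB format, ignores water (HOH), and treats every other HETATM residue as a ligand.

```python
def check_ligand(pdb_text: str, chain: str = "H", resname: str = "UNL") -> list:
    """Return a list of problems; an empty list means the file passes.

    Assumes fixed-column PDB records: residue name in columns 18-20,
    chain ID in column 22, residue number in columns 23-26.
    """
    ligands = set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        name = line[17:20].strip()
        if name == "HOH":  # water is not counted as a ligand here
            continue
        ligands.add((line[21:22], name, line[22:26].strip()))

    problems = []
    if len(ligands) != 1:
        problems.append(f"expected exactly one ligand, found {len(ligands)}")
    for lig_chain, lig_name, lig_number in sorted(ligands):
        if lig_chain != chain or lig_name != resname:
            problems.append(
                f"ligand {lig_name} {lig_chain}{lig_number} "
                f"should be {resname} on chain {chain}"
            )
    return problems
```

Reading the file with open(path).read() and passing the text to check_ligand returns an empty list when the file meets the requirements as interpreted here.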

References:

Vangone A, Schaarschmidt J, Koukos P, Geng C, Citro N, Trellet M, Xue L, Bonvin A.: Large-scale prediction of binding affinity in protein-small ligand complexes: the PRODIGY-LIG web server. Bioinformatics

Kurkcuoglu Z, Koukos P, Citro N, Trellet M, Rodrigues J, Moreira I, Roel-Touris J, Melquiond A, Geng C, Schaarschmidt J, Xue L, Vangone A, Bonvin AMJJ.: Performance of HADDOCK and a simple contact-based protein-ligand binding affinity predictor in the D3R Grand Challenge 2. J Comput Aided Mol Des 32(1):175-185 (2017).

GO Enrichment

Computes a Gene Ontology (GO) enrichment based upon a provided gene set.

The GO database classifies gene functions into three main categories: Biological Process, Molecular Function, and Cellular Component. In an enrichment analysis, a set of genes of interest, e.g., differentially expressed genes, is compared against a background set of all genes to determine which GO terms are statistically overrepresented in the gene set. At the moment, the background set always contains all genes of a certain organism. Soon, an option to manually set the background set will be provided.
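
The comparison against the background can be illustrated with a one-sided hypergeometric test for a single GO term, which is a common choice for overrepresentation analysis. Whether the node uses exactly this test is an assumption; the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size: int, term_in_background: int,
                       set_size: int, term_in_set: int) -> float:
    """Probability of seeing at least term_in_set annotated genes.

    background_size:    number of genes in the background
    term_in_background: background genes annotated with the GO term
    set_size:           number of genes in the gene set of interest
    term_in_set:        gene-set genes annotated with the GO term
    """
    if not (0 <= term_in_background <= background_size
            and 0 <= set_size <= background_size
            and 0 <= term_in_set <= min(term_in_background, set_size)):
        raise ValueError("inconsistent counts")
    upper = min(term_in_background, set_size)
    # Sum the hypergeometric probabilities of the upper tail.
    tail = sum(
        comb(term_in_background, i)
        * comb(background_size - term_in_background, set_size - i)
        for i in range(term_in_set, upper + 1)
    )
    return tail / comb(background_size, set_size)
```

A small p value means the term occurs in the gene set more often than expected from the background. Because many terms are tested, the p values would normally be corrected for multiple testing.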

Input: Gene set as CSV file.

Requirements: Each row refers to a gene name. In addition to the gene names, the following columns must exist:

  • p value: Unadjusted p value.
  • avg_log2FC: Logarithmic fold change of average expression. Positive values mean that the gene is more highly expressed in the first group.
  • pct. 1: Percentage of cells in the first group in which the gene is detected.
  • pct. 2: Percentage of cells in the second group in which the gene is detected.
  • p_val_adj: p value adjusted by Bonferroni correction.
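
Before loading a file, the column requirements above can be checked with a short script. This is a sketch under assumptions: the column names are copied exactly as they are written in the list above, and the exact header spelling expected by the node may differ, so adjust REQUIRED_COLUMNS to match the real file. The function name is hypothetical.

```python
import csv
import io

# Column names as written in the list above; the spelling the node
# actually expects is an assumption and may need adjusting.
REQUIRED_COLUMNS = ("p value", "avg_log2FC", "pct. 1", "pct. 2", "p_val_adj")


def missing_columns(csv_text: str, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

An empty result means that all listed columns are present in the header row.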

Input Parameters:

  • Ontology Type: Desired ontology type (BP, MF, CC, all)
  • Gene name format: Format of the gene names provided in the input dataset (SYMBOL, ENTREZID)

Output: Enriched gene set as CSV file. It can be plotted as a bar plot with the GO Term Barplot node. More plots will be provided soon.

Reference:

Ashburner et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000 May;25(1):25-9. DOI: 10.1038/75556

The Gene Ontology Consortium. The Gene Ontology knowledgebase in 2023. Genetics. 2023 May 4;224(1):iyad031. DOI: 10.1093/genetics/iyad031

Thomas PD, Ebert D, Muruganujan A, Mushayahama T, Albou LP, Mi H. PANTHER: Making genome-scale phylogenetics accessible to all. Protein Sci. 2022 Jan;31(1):8-22. DOI: 10.1002/pro.4218

GO Term Barplot

Draws a bar plot of the GO enrichment result (enriched gene set).

Input: Enriched gene/term set from the GO Enrichment node.

Output: Generated plot. Can be viewed with Image Viewer.