Annotation

Add Selection

Defines a selection on a protein (PDB or fasta) or gene (fasta) sequence.

Input:

Selections: The existing selections to which the new selection is added. A loaded sequence can also be passed.

Input Parameters:

Name: Label to assign to the selection. Downstream nodes use this name to refer to the selection.
Start: Start index of the selection in the sequence.
End: End index of the selection in the sequence.
Color: Color to apply to the selection, given as a hex code.
Chain ID: The chain ID of the sequence containing the selection.
Sequence Name: Name of the sequence within a multi-sequence input.

Output: List of selections, including the one added by the node.
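
As an illustration of how the parameters above fit together, a selection can be modelled as a small record that is appended to a list. This is only a sketch: the Selection class, the add_selection function, the 1-based inclusive indexing, the default color, and the unique-name check are all assumptions made for the example, not the node's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Selection:
    """Hypothetical record mirroring the Add Selection parameters."""
    name: str
    start: int                           # assumed 1-based, inclusive
    end: int                             # assumed 1-based, inclusive
    color: str = "#FF0000"               # hex code; the default is arbitrary
    chain_id: Optional[str] = None       # only meaningful for PDB input
    sequence_name: Optional[str] = None  # only meaningful for multi-sequence input


def add_selection(selections: List[Selection], new: Selection) -> List[Selection]:
    """Return a new list containing the existing selections plus `new`."""
    if new.start < 1 or new.end < new.start:
        raise ValueError("start must be >= 1 and end must be >= start")
    if not re.fullmatch(r"#[0-9A-Fa-f]{6}", new.color):
        raise ValueError("color must be a hex code such as #1A2B3C")
    # Assumption: names must be unique because downstream nodes refer to them.
    if any(s.name == new.name for s in selections):
        raise ValueError(f"a selection named {new.name!r} already exists")
    return selections + [new]
```

Calling add_selection([], Selection("active_site", 10, 25, "#00FF00", chain_id="A")) returns a one-element list, which can then be passed on as the Selections input of the next call.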

Fasta from GenBank

Generates a fasta sequence and a selection from a GenBank file.

Input:

GenBank File: Loaded GenBank file, usually from the Load GenBank node.

Output:

Sequence: Generated nucleic acid sequence.
Selections: Sequence annotations from the GenBank input file.
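
The conversion can be pictured with the following minimal sketch, which reads the sequence from the ORIGIN block of a GenBank record and turns simple feature locations into selections. It is not the node's implementation: genbank_to_fasta is a hypothetical helper, the feature key is used as the selection name by assumption, and only plain start..end and complement(start..end) locations are handled (join(...) and other compound locations are skipped).

```python
import re
from typing import Dict, List, Tuple

# Feature key lines are indented by 5 spaces; qualifier lines by 21 spaces.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(complement\()?<?(\d+)\.\.>?(\d+)\)?\s*$")


def genbank_to_fasta(text: str) -> Tuple[str, List[Dict]]:
    """Return (fasta_text, selections) for a single GenBank record."""
    name = "sequence"
    section = None
    seq_parts: List[str] = []
    selections: List[Dict] = []
    for line in text.splitlines():
        if line.startswith("//"):          # end of record
            break
        if section == "ORIGIN":            # sequence lines: drop digits and spaces
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
            continue
        if line and not line[0].isspace():  # top-level keyword line
            fields = line.split()
            section = fields[0]
            if section == "LOCUS" and len(fields) > 1:
                name = fields[1]
            continue
        if section == "FEATURES":
            match = FEATURE_RE.match(line)
            if match:                       # compound locations do not match
                selections.append({
                    "name": match.group(1),  # assumption: feature key as name
                    "start": int(match.group(3)),
                    "end": int(match.group(4)),
                    "strand": -1 if match.group(2) else 1,
                })
    sequence = "".join(seq_parts).upper()
    body = "\n".join(sequence[i:i + 60] for i in range(0, len(sequence), 60))
    return f">{name}\n{body}\n", selections
```

For real files, an established parser such as Biopython's SeqIO handles the full format; the sketch above only shows how the two outputs relate to the two parts of the GenBank file.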

Find ORFs

Finds possible open reading frames (ORFs) in a DNA sequence by searching for start and stop codons in all six possible reading frames. Which codons count as start and stop codons depends on the codon table, and different codon tables define different combinations.

An ORF is a stretch of DNA that can be translated into a protein. Because DNA codes for amino acids in triplets of bases (codons), there are three possible frames in which to read it, depending on which base the reading starts on. Because DNA is a double-stranded, antiparallel molecule, the opposite strand can be read as well (as the reverse complement of the sequence), adding another three possible reading frames.
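
The six reading frames can be made concrete with a short sketch. The function names reverse_complement and six_frames, and the frame labels "+1" to "-3", are chosen here for illustration and are not part of the node.

```python
from typing import Dict, List

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna: str) -> Dict[str, List[str]]:
    """Split a sequence into complete codons for each of the six frames."""
    frames: Dict[str, List[str]] = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            # Stop at len - 2 so that incomplete trailing codons are dropped.
            codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames
```

For the sequence ATGGCCTAA, frame "+1" reads ATG GCC TAA, while frame "-1" reads the reverse complement TTAGGCCAT as TTA GGC CAT.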

Input:

Single Sequence Fasta: The fasta sequence to extract ORFs from. The fasta file must have only a single DNA sequence entry.
Selections (optional): Selections on the sequence (e.g., from the Add Selection node).

Input Parameters:

Codon Table: The codon table to use for translation. Selected from the dropdown menu. Dependent on the source and host organism of the sequence.
Minimum Protein Length: The minimum length of the protein encoded by a potential ORF. Only ORFs that code for a protein longer than the given number are reported.

Output: Annotated ORFs as a GenBank file containing the DNA sequence and the ORF annotation.
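
A simplified version of the search described above might look as follows. This is a sketch under stated assumptions, not the node's algorithm: find_orfs is a hypothetical function, the default start codon (ATG) and stop codons (TAA, TAG, TGA) are those of the standard genetic code, an ORF runs from the first start codon in a frame to the next in-frame stop codon, the stop codon is not counted in the protein length, and coordinates are reported as 1-based inclusive positions on the input strand.

```python
from typing import Dict, FrozenSet, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(
    dna: str,
    min_protein_length: int = 0,
    starts: FrozenSet[str] = frozenset({"ATG"}),                # assumed codon table
    stops: FrozenSet[str] = frozenset({"TAA", "TAG", "TGA"}),   # assumed codon table
) -> List[Dict[str, int]]:
    """Scan all six frames and return ORFs longer than the minimum length."""
    dna = dna.upper()
    n = len(dna)
    orfs: List[Dict[str, int]] = []
    for strand, seq in ((1, dna), (-1, reverse_complement(dna))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if orf_start is None:
                    if codon in starts:
                        orf_start = i
                elif codon in stops:
                    protein_length = (i - orf_start) // 3  # stop codon excluded
                    if protein_length > min_protein_length:
                        if strand == 1:
                            start, end = orf_start + 1, i + 3
                        else:
                            # Map reverse-strand positions back to the input strand.
                            start, end = n - (i + 3) + 1, n - orf_start
                        orfs.append({
                            "strand": strand,
                            "frame": frame + 1,
                            "start": start,
                            "end": end,
                            "protein_length": protein_length,
                        })
                    orf_start = None
    return orfs
```

With these assumptions, find_orfs("CCATGAAATAGGG") reports one ORF on the forward strand in frame 3, spanning positions 3 to 11 (ATG AAA TAG) and coding for a protein of length 2. Passing a different starts or stops set corresponds to choosing a different codon table.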