Nodes

File Operations

File operation nodes serve as file input to the graph. Data can be loaded, extracted, or merged before running the pipeline.

Loading

In each loading node, the input file can be selected from the dropdown menu. The file has to be loaded into the platform first through the Files tab. If it fits the file format of the loading node, it automatically appears in the dropdown menu.

Load Fasta

Loads a fasta file (.fasta) containing a biological sequence, such as a DNA or protein sequence. The node automatically recognizes the sequence type. xyna.bio provides various options to process the sequence further, e.g., protein structure prediction for protein sequences.

Various -omics data types are commonly stored with the same file ending, .fasta. The ending therefore does not reveal which kind of data a file holds, which can lead to mistakes such as chaining together incompatible tools. For this reason, xyna.bio automatically recognizes the contents of fasta files and infers the reference type for further processing. The reference type can also be assigned manually.
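The idea of recognizing the sequence type from the file contents can be illustrated with a small heuristic. This is only a sketch and not xyna.bio's actual detection logic: the function name, the set of nucleotide letters, and the 90% threshold are assumptions made for illustration. The returned type names follow the output types listed for this node.

```python
# Hypothetical helper; xyna.bio's real detection logic is not documented here.
NUCLEOTIDE_CODES = set("ACGTUN")


def infer_sequence_type(sequence: str) -> str:
    """Guess 'NucleicAcid' or 'Protein' from the letters of a sequence."""
    # Ignore gap characters, whitespace, and digits.
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residue letters")
    nucleotide_fraction = sum(c in NUCLEOTIDE_CODES for c in residues) / len(residues)
    # Assumed threshold: a protein is unlikely to consist of more than
    # 90% A/C/G/T/U/N letters.
    return "NucleicAcid" if nucleotide_fraction > 0.9 else "Protein"


print(infer_sequence_type("ATGCGTACGTTAG"))
print(infer_sequence_type("MKTAYIAKQRQISFVKSHFSRQ"))
```

A heuristic like this can misclassify short or unusual sequences, which is one reason a manual override of the reference type is useful.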

Data handling for fasta files is split between gene fasta files and protein fasta files (both with the ending .fasta).

Additionally, there are aligned fasta files and multi fasta files.

Aligned fasta files contain gene or protein sequences that are the output of a multiple sequence alignment. These alignments in fasta format are allowed to contain “-” as a special character indicating a gap in the alignment.

Multi fasta files contain more than one sequence per file and can be manipulated using merge nodes.
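The difference between single, multi, and aligned fasta files can be sketched with a few lines of Python. The function names are hypothetical, and flagging a file as aligned whenever a sequence contains "-" is an assumption for illustration (an alignment without gaps would not be detected this way).

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from fasta-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def describe_fasta(text: str) -> dict[str, bool]:
    """Report whether the text holds several records and any gap characters."""
    records = parse_fasta(text)
    return {
        "multi": len(records) > 1,
        "aligned": any("-" in seq for _, seq in records),
    }


example = ">seq1\nMK-TAY\n>seq2\nMKQTAY\n"
print(describe_fasta(example))
```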

Fasta files can, for instance, be downloaded from GenBank.

Input: (Multi-)Fasta file

Output: Sequence (e.g., Protein, NucleicAcid, MultiProtein)

Load Genbank

Loads a GenBank (.gb) file containing a gene sequence with annotations.

Input (GenBank file):

The GenBank file format (.gb) allows for the storage of gene sequences along with additional information like region annotations, sample information and references to publications.

Output: GenBank file

Load PDB

Loads a PDB (.pdb) file containing a 3D protein structure. The loaded structure can, for instance, be displayed in MolStar or with the Protein Structure Viewer node.

Input (PDB file): The PDB file format (.pdb) contains three-dimensional structural data in the form of atomic coordinates. xyna.bio expects a PDB file to contain a protein structure. Usually, the files are downloaded from RCSB PDB.

Requirements: The file must contain a protein structure. For some use cases, such as Protein Ligand Binding Affinity, the PDB file can contain both a protein structure and a ligand. In the case of AlphaFold Multimer, the file must contain multiple protein chains.
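Before loading, a PDB file can be checked against these requirements outside the platform. The sketch below relies only on the fixed-column layout of PDB ATOM and HETATM records (chain ID in column 22, residue name in columns 18-20). The function name is hypothetical, and treating every ATOM record as protein and every non-water HETATM residue as a ligand is a simplifying assumption.

```python
def summarize_pdb(text: str) -> dict[str, list[str]]:
    """Collect chain IDs from ATOM records and non-water HETATM residue names."""
    chains, ligands = set(), set()
    for line in text.splitlines():
        record = line[:6].strip()
        if record == "ATOM" and len(line) > 21:
            chains.add(line[21])  # chain ID, column 22
        elif record == "HETATM" and len(line) >= 20:
            res_name = line[17:20].strip()  # residue name, columns 18-20
            if res_name != "HOH":  # skip water molecules
                ligands.add(res_name)
    return {"chains": sorted(chains), "ligands": sorted(ligands)}
```

For example, a file intended for AlphaFold Multimer should yield more than one entry in "chains", and a file intended for Protein Ligand Binding Affinity should yield at least one entry in "ligands".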

Output: Protein structure (loaded pdb file)

Load SDF

Loads an SDF (.sdf) file containing the structure of a molecule. The structure can, for instance, be docked to a protein with the DiffDock node.

Input (SDF file): The SDF file format (.sdf) contains three-dimensional structural data, but instead of protein structures, it is used to store coordinates of smaller molecules like drugs or metabolites.

SDF and PDB files can be combined into one PDB file containing the information of both. In xyna.bio, this can be done with the Add SDF to PDB node.

A common source for SDF files is the PubChem database.

Output: Chemical structure (.sdf format)

Load SMILES

The Load SMILES node loads a SMILES file from the workspace files folder. Select a SMILES file from the drop-down list and connect the node’s output to any SMILES input in your workflow.

Input (SMILES file)

A SMILES file selected from the drop-down list

Output: Extracted SMILES string from the .smiles file (e.g., "CCO")
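Extracting the SMILES string from a file can be pictured as follows. This is a minimal sketch, not the node's implementation: it assumes the common convention of one SMILES string per line, optionally followed by whitespace and a molecule name, and it returns only the first entry. The function name is hypothetical.

```python
def read_smiles(text: str) -> str:
    """Return the first SMILES string found in the file contents."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            # Assumed layout: "<SMILES> [optional name]".
            return line.split()[0]
    raise ValueError("no SMILES string found")


print(read_smiles("CCO ethanol\n"))
```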

Load PDBQT

Loads a .pdbqt file, which is a specific file format used in molecular docking simulations.

Input (PDBQT file)

.pdbqt files typically contain the 3D atomic coordinates of a protein or a ligand, along with additional information like atomic charges and atom types that are essential for docking software like AutoDock Vina.

Output: .pdbqt file
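To illustrate the additional information in a .pdbqt file, the sketch below reads the partial charge and atom type that follow the standard PDB columns on each ATOM or HETATM line. It assumes well-formed, fixed-column records as written by AutoDock tools; the function name and the returned dictionary layout are hypothetical.

```python
def read_pdbqt_atoms(text: str) -> list[dict]:
    """Parse coordinates, partial charge, and atom type from PDBQT atom lines."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, ROOT, BRANCH, TORSDOF, and similar records
        # Everything after the temperature factor (column 66) holds the
        # partial charge followed by the atom type.
        charge, atom_type = line[66:].split()[:2]
        atoms.append({
            "name": line[12:16].strip(),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            "charge": float(charge),
            "type": atom_type,
        })
    return atoms
```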

Extracting

Extract Selection From Fasta

Extracts a subsequence from a fasta sequence based on the name of a previously defined selection (see Add Selection node).

Input:

Single Sequence Fasta: The loaded fasta sequence to extract a selection from. The fasta file must have only a single entry.
Selections: A list of selections which mark the positions of the sequence that will be extracted. Only one of the selections can be extracted at a time. The selections can be retrieved from the Add Selection node.

Input Parameters:

Selection Name: The name of the selection to extract.

Output:

Extracted Sequence: Fasta file containing the extracted sequence.
Extracted Selections: Selections that fit the extracted sequence. For example, if the extracted sequence contains residues 10-40, any selection that contains residues outside that range is discarded.
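The behavior described above can be sketched in Python. The selection model used here (a name mapped to a 1-based, inclusive start and end position) is an assumption for illustration, as is renumbering the kept selections relative to the extracted sequence; the documentation does not specify either detail.

```python
# Hypothetical selection model: name -> (start, end), 1-based and inclusive.
Selection = tuple[int, int]


def extract_selection(
    sequence: str, selections: dict[str, Selection], name: str
) -> tuple[str, dict[str, Selection]]:
    """Cut out the named selection and keep only selections lying inside it."""
    start, end = selections[name]
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"selection {name!r} lies outside the sequence")
    extracted = sequence[start - 1:end]
    # Discard selections that reach outside the extracted range; shift the
    # rest so they refer to positions in the extracted sequence (assumption).
    kept = {
        other: (s - start + 1, e - start + 1)
        for other, (s, e) in selections.items()
        if start <= s and e <= end
    }
    return extracted, kept


sequence = "ACDEFGHIKLMNPQRSTVWY" * 3
selections = {"domain": (10, 40), "motif": (15, 20), "tail": (35, 50)}
extracted, kept = extract_selection(sequence, selections, "domain")
print(len(extracted), kept)
```

In this example, "tail" is discarded because it extends past residue 40, while "motif" is kept.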

Extract Selection Structure

Gets a substructure of a given PDB file based on the name of a previously defined selection (see Add Selection node).

Returns all amino acids within the selection range as a new PDB structure. Does not return water molecules or other types of atoms present in the PDB file.

Input:

Protein Structure: The PDB file containing a protein structure from which a selection will be extracted.
Selections: A list of selections which mark the positions of the sequence that will be extracted. Only one of the selections can be extracted at a time. The selections can be retrieved from the Add Selection node.

Input Parameters:

Selection Name: Name of the desired selection from “Selections” input.

Output:

Extracted Structure: The PDB file containing the extracted structure.
Extracted Selections: Selections that fit the extracted structure. For example, if the extracted structure contains residues 10-40, any selection that contains residues outside that range is discarded.
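A substructure extraction of this kind can be approximated on the text of a PDB file. The sketch below keeps only ATOM records within a residue range of one chain, which drops water molecules and other HETATM entries. Describing a selection by a chain ID plus a residue range is an assumption for illustration, and the function name is hypothetical.

```python
def extract_residue_range(pdb_text: str, chain: str, start: int, end: int) -> str:
    """Return a PDB text with only the ATOM records of the given residue range."""
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # drops HETATM (water, ligands) and all other records
        residue_number = int(line[22:26])  # residue sequence number, columns 23-26
        if line[21] == chain and start <= residue_number <= end:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"
```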

Extract Sequence From Fasta

Extracts a single fasta sequence from a multifasta file.

Input:

Fasta: The fasta file to extract one sequence from. The file must have multiple entries (multifasta).
Selections (optional): Selections for the fasta file. The node outputs only those selections that belong to the extracted sequence.

Input Parameters:

Name: Name of the sequence to extract from the provided source fasta file.

Output:

Extracted Sequence: Fasta file containing the extracted sequence.
Selections: Selections that belong to the extracted sequence. For example, if the extracted sequence has the name "ID101245", any selection that is not for this sequence will be discarded.
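Picking one record out of a multifasta file by name, together with its selections, could look like the sketch below. Two assumptions are made for illustration: the sequence name is the first whitespace-delimited token of the header line, and each selection is stored as a tuple whose first element is the name of the sequence it belongs to. The function name is hypothetical.

```python
from typing import Optional


def extract_sequence(fasta_text: str, name: str, selections: Optional[dict] = None):
    """Return the named fasta record and the selections that refer to it."""
    header, chunks, capturing = None, [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if capturing:
                break  # the next record begins, so the wanted one is complete
            capturing = line[1:].split()[:1] == [name]
            if capturing:
                header = line
        elif capturing and line.strip():
            chunks.append(line.strip())
    if header is None:
        raise KeyError(f"no sequence named {name!r} in the fasta text")
    record = f"{header}\n{''.join(chunks)}\n"
    # Hypothetical selection model: selection name -> (sequence name, start, end).
    kept = {k: v for k, v in (selections or {}).items() if v[0] == name}
    return record, kept


text = ">ID101245 first\nMKT\nAYI\n>ID2\nGGG\n"
sels = {"site": ("ID101245", 2, 4), "other": ("ID2", 1, 2)}
print(extract_sequence(text, "ID101245", sels))
```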

Merging

Merge Fasta

Merges two fasta sequences of the same type (DNA or protein) into one multi fasta file. A multi fasta file is a file in fasta format containing multiple sequences. The selections belonging to the two inputs, which can be created with the Add Selection node, are merged as well.

Input:

Fasta 1: First fasta file to combine. Both files must contain the same type of entries (DNA or Protein).
Fasta 2: Second fasta file to combine. Both files must contain the same type of entries (DNA or Protein).
Selections 1: Selections for the first fasta file. Selections can be added through the Add Selection node.
Selections 2: Selections for the second fasta file. Selections can be added through the Add Selection node.

Output:

Merged Fasta: Merged fasta sequence (MultiProtein or MultiNucleicAcid)
Selections: Merged selections of the two fasta files.
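Conceptually, the merge concatenates the records of both files and combines their selections. The sketch below shows this on plain text and dictionaries. The function name and the selection model (selection name mapped to sequence name, start, end) are hypothetical, rejecting duplicate selection names is an assumption, and the check that both inputs hold the same sequence type is left out.

```python
def merge_fasta(fasta_1: str, fasta_2: str, selections_1: dict, selections_2: dict):
    """Concatenate two fasta texts and combine their selections."""
    merged_fasta = fasta_1.rstrip("\n") + "\n" + fasta_2.rstrip("\n") + "\n"
    # Assumed policy: refuse to merge if both inputs use the same selection name.
    clashes = selections_1.keys() & selections_2.keys()
    if clashes:
        raise ValueError(f"duplicate selection names: {sorted(clashes)}")
    return merged_fasta, {**selections_1, **selections_2}


merged, selections = merge_fasta(
    ">a\nMKT\n", ">b\nGGA",
    {"s1": ("a", 1, 2)}, {"s2": ("b", 1, 3)},
)
print(merged)
print(selections)
```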

Other

Select File

Selects a loaded file from the file storage according to its ID and type. This node is helpful to access intermediate files or to avoid loading datasets multiple times (e.g., if loading takes long). For more details on file management, check out the Getting Started section.

Input Parameters:

File ID: ID of the file which should be loaded. Can be retrieved from the Job Spreadsheet or the Files tab (see System Features).
File Type: Type of the file to load. Can be retrieved from the output node or the Job Spreadsheet.

Output: Loaded file.