Nanobody Aggregation Score Calculation

The nanobody aggregation score pipeline will be available online soon via xyna.bio. Stay tuned for upcoming updates and public launch announcements!

This pipeline enables users to predict nanobody aggregation properties starting from VHH FASTA sequences. It is based on the nanobody aggregation scoring method described in the preprint by Geyer et al., 2025 (citation TBD).

This method considers the following properties of the FR2 region and of residue 118 at the FR4 interface:

exposed surface hydrophobicity
intramolecular hydrophobic interactions
instability index


Figure 1: Graphical representation of framework region 2 (FR2) and residue 118. The region is colored according to the hydrophobicity of its residues: green for hydrophobic and red for hydrophilic residues. Cartoon and molecular surface representations were created using Mol* (https://doi.org/10.1093/nar/gkab314). Figure adapted from Geyer et al., 2025 (citation TBD).

Disclaimer: We are currently optimizing the "Intramolecular Hydrophobic Interactions" node. Its output deviates from the manual analysis presented in Geyer et al., 2025 (citation TBD), because the algorithm used to quantify intramolecular hydrophobic interactions, implemented in Mol*, has a higher resolution and thus captures, on average, more interactions. Results may deviate accordingly.


Figure 2: Schematic overview of the xyna.bio nanobody aggregation score calculation pipeline.

1. Load FASTA

Begin by loading a FASTA or MultiFASTA file containing one or more VHH sequences. Each FASTA header should contain only the ID/name of the corresponding VHH sequence.

Example:

>Sequence_1
QVQLQESGGGSVQAG...

>Sequence_2
QVQLQESGGGSVQAG...

>Sequence_3
QVQLQESGGGSVQAG...
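A minimal Python sketch of reading such a file and checking the header convention before the sequences enter the pipeline. The function name `parse_fasta` and the validation rules (non-empty, unique IDs; only the 20 standard one-letter amino acid codes) are assumptions for illustration, not the pipeline's own loader.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard one-letter codes only


def parse_fasta(text):
    """Parse (Multi)FASTA text into {sequence_id: sequence}."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines between records are tolerated
        if line.startswith(">"):
            current = line[1:].strip()  # the header holds only the VHH ID/name
            if not current:
                raise ValueError("empty FASTA header")
            if current in records:
                raise ValueError(f"duplicate sequence ID: {current}")
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data before the first header")
        else:
            records[current] += line.upper()  # sequences may span several lines
    for seq_id, seq in records.items():
        invalid = set(seq) - AMINO_ACIDS
        if not seq or invalid:
            raise ValueError(f"{seq_id}: empty sequence or invalid characters {sorted(invalid)}")
    return records
```

Sequences wrapped over several lines are concatenated, so the file does not need one sequence per line.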

2. NanobodyBuilder2

NanobodyBuilder2 is a tool that predicts the 3D structure of a nanobody from its amino acid sequence using a deep learning model. The residues of the generated .pdb files are numbered with ANARCI.

Reference:

Abanades, B., Wong, W. K., Boyles, F., Georges, G., Bujotzek, A., & Deane, C. M. (2023). ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins. Communications Biology, 6(1), 575. https://doi.org/10.1038/s42003-023-04927-7

Browser application: NanoBodyBuilder2

Input

Sequences: A FASTA file containing the sequence(s) of the prospective nanobodies to be analysed.

Input Parameters

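Numbering Scheme: The ANARCI numbering scheme used to number the predicted structures. The default regions of interest of the Nanobody Aggregation Score node refer to IMGT numbering.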

Output

Structures: A list of predicted 3D structures in .pdb format. This output should be directed into a Batch node to process and calculate the aggregation score of each nanobody in a single run.

3. Intramolecular Hydrophobic Interactions

This node implements a simplified algorithm to determine intramolecular hydrophobic interactions between the amino acid residues of a nanobody. The algorithm is partially inspired by the Contacts of Structural Units (CSU) algorithm. For each residue, all hydrophobic side-chain C-atoms are considered, i.e., all side-chain C-atoms that are not covalently bound to an oxygen, nitrogen, or sulfur atom. For each hydrophobic side-chain C-atom of a residue R_i, the distances to all hydrophobic side-chain C-atoms of every other residue R_j are computed from the coordinates in the PDB file. If any of these distances is below the distance cutoff, residue R_j is counted as an interaction partner of R_i. The number of intramolecular hydrophobic interactions of R_i is its number of interaction partners.

Reference:

Sobolev, V., Sorokine, A., Prilusky, J., Abola, E. E., & Edelman, M. (1999). Automated analysis of interatomic contacts in proteins. Bioinformatics (Oxford, England), 15(4), 327-332. https://doi.org/10.1093/bioinformatics/15.4.327

Input

Input Structure: A single .pdb file containing the ANARCI-numbered structure of an antibody or nanobody.

Input Parameters

Distance Cutoff: The distance (in Å) between hydrophobic side-chain C-atoms below which two residues are counted as interaction partners. Set it to 4 Å to reproduce the parameter settings of Geyer et al., 2025 (citation TBD).

Output

Hydrophobic Interactions: A list containing the number of interaction partners for each amino acid of the given structure.
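The counting scheme described above can be sketched in pure Python as follows, assuming the atoms have already been read from the PDB file (e.g. with Biopython's `PDBParser`) into simple `Atom` records. This is not the node's code: the `Atom` class, the function names, the set of backbone atom names, and the 1.9 Å distance used to decide whether a carbon is "covalently bound" to O, N, or S are all assumptions.

```python
import math
from dataclasses import dataclass
from itertools import combinations

BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}
POLAR_ELEMENTS = {"O", "N", "S"}
# Assumption: heavy atoms closer than this (Å) are treated as covalently bonded.
BOND_CUTOFF = 1.9


@dataclass(frozen=True)
class Atom:
    res_id: int   # residue number, e.g. the ANARCI/IMGT position
    name: str     # PDB atom name, e.g. "CB"
    element: str  # element symbol, e.g. "C"
    xyz: tuple    # Cartesian coordinates (x, y, z) in Å


def hydrophobic_carbons(atoms):
    """Side-chain C atoms with no O, N or S atom within bonding distance."""
    polar = [a for a in atoms if a.element in POLAR_ELEMENTS]
    return [
        a for a in atoms
        if a.element == "C"
        and a.name not in BACKBONE_ATOMS
        and all(math.dist(a.xyz, p.xyz) > BOND_CUTOFF for p in polar)
    ]


def count_interaction_partners(atoms, cutoff=4.0):
    """Map each residue to the number of other residues with a hydrophobic C-C distance < cutoff."""
    partners = {a.res_id: set() for a in atoms}  # every residue starts with 0 partners
    for a, b in combinations(hydrophobic_carbons(atoms), 2):
        if a.res_id != b.res_id and math.dist(a.xyz, b.xyz) < cutoff:
            partners[a.res_id].add(b.res_id)
            partners[b.res_id].add(a.res_id)
    return {res: len(p) for res, p in sorted(partners.items())}
```

Counting distinct partner residues (a set per residue) rather than atom pairs matches the definition above: several close C-atom pairs between the same two residues still count as one interaction partner.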

4. Exposed Surface Hydropathy

A major factor in aggregation is the exposed surface hydropathy of the former VH-VL interface, known as the FR2 region (residues 39-55, IMGT numbering). This node determines the exposed surface hydrophobicity by calculating the mean product of the exposed surface area (Shrake-Rupley algorithm) and the hydrophobicity (Wimley-White hydrophobicity scale) of each amino acid. A more negative value for a residue indicates a more hydrophilic exposed surface; a more positive value indicates a more hydrophobic one.

References:

Shrake-Rupley algorithm: Shrake, A., & Rupley, J. A. (1973). Environment and exposure to solvent of protein atoms. Lysozyme and insulin. Journal of Molecular Biology, 79(2), 351-371. https://doi.org/10.1016/0022-2836(73)90011-9 (Implementation: https://github.com/biopython/biopython/blob/master/Bio/PDB/SASA.py)

Hydrophobicity scale: Wimley, W. C., & White, S. H. (1996). Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nature Structural Biology, 3(10), 842-848. https://doi.org/10.1038/nsb1096-842

Input

Nanobody Structure: A single .pdb file containing an ANARCI-numbered nanobody structure, for instance the output of the ANARCI PDB node or a batch item of the NanobodyBuilder2 output.

Input Parameters

N Points: Number of points per atom at which the surface is probed. A higher number may increase accuracy but also increases runtime. Unless stated otherwise, we recommend starting with the default of 100.

Output

Surface Properties: A list containing the hydrophobicity of the exposed surface for each amino acid of the given structure.
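The per-residue computation can be sketched in plain Python as below, assuming atoms are given as `(res_id, res_name, (x, y, z), radius)` tuples. The Shrake-Rupley part places `n_points` test points on each atom's sphere, expanded by a 1.4 Å water probe, and counts the points not buried inside a neighbouring expanded sphere; the node itself links Biopython's implementation (`Bio.PDB.SASA.ShrakeRupley`, whose `n_points` argument also defaults to 100). The function names, the golden-spiral point distribution, and the `scale` mapping are assumptions: supply the actual Wimley-White values and atomic radii yourself. How the node averages these products (the "mean product") is not reproduced here.

```python
import math

PROBE_RADIUS = 1.4  # Å, radius of the water probe


def sphere_points(n):
    """Return n roughly evenly spaced points on the unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k), r * math.sin(golden_angle * k), z))
    return points


def shrake_rupley(atoms, n_points=100, probe=PROBE_RADIUS):
    """Per-atom solvent-accessible surface area in Å^2.

    atoms: list of (res_id, res_name, (x, y, z), vdw_radius) tuples.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (_, _, ci, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            # The point is exposed if no other probe-expanded atom sphere contains it.
            if all(math.dist(p, cj) >= rj + probe
                   for j, (_, _, cj, rj) in enumerate(atoms) if j != i):
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


def residue_surface_hydropathy(atoms, scale, n_points=100):
    """Per residue: exposed area (summed over its atoms) times the residue's scale value."""
    area_by_res, name_by_res = {}, {}
    for (res_id, res_name, _, _), area in zip(atoms, shrake_rupley(atoms, n_points)):
        area_by_res[res_id] = area_by_res.get(res_id, 0.0) + area
        name_by_res[res_id] = res_name
    return {res: area * scale[name_by_res[res]] for res, area in area_by_res.items()}
```

The brute-force neighbour check scales with the square of the atom count times `n_points`, which is fine for illustration; Biopython's implementation uses a KD-tree for the neighbour search instead.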

5. Nanobody Aggregation Score

This is the final node of the pipeline; it calculates the aggregation score of a given nanobody. Following the formula in Geyer et al., 2025 (citation TBD), it predicts the likelihood that the nanobody aggregates based on the exposed surface area hydropathy, the intramolecular hydrophobic interactions, and the inherent (in)stability of the FR2 region.


Figure 3: Formula for the aggregation score computed by the Nanobody Aggregation Score node. It combines (I) the hydrophobicity of the conserved immunoglobulin domain interaction interface (FR2 + residue 118), (II) the mean number of hydrophobic intramolecular interactions possible for each residue of the contact interface (FR2 + residue 118) within a radius of 4 Å, and (III) the instability index of FR2. Adapted from Geyer et al., 2025 (citation TBD).
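Term (III) uses the instability index of Guruprasad et al. (1990), II = (10 / L) · Σ DIWV(x_i, x_{i+1}), where L is the length of the analysed sequence (here the FR2 residues) and DIWV is the dipeptide instability weight value of each pair of consecutive residues; proteins with an II above 40 are predicted to be unstable. The sketch below implements this sum with a placeholder weight table, because the published 20 × 20 DIWV table is not reproduced here; Biopython's `ProteinAnalysis(seq).instability_index()` (module `Bio.SeqUtils.ProtParam`) ships the published values. The function name and the toy weights are hypothetical.

```python
def instability_index(seq, diwv):
    """Instability index II = (10 / L) * sum of DIWV(x[i], x[i+1]) over consecutive residues.

    diwv maps a residue to a dict {following residue: dipeptide instability weight value}.
    The published Guruprasad table is NOT included here.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    total = sum(diwv[a][b] for a, b in zip(seq, seq[1:]))  # all dipeptides x[i], x[i+1]
    return 10.0 / len(seq) * total


# Placeholder weights for illustration only (not the Guruprasad et al. values).
TOY_DIWV = {"A": {"A": 1.0, "G": 2.0}, "G": {"A": 3.0, "G": 4.0}}
```

For example, `instability_index("AGA", TOY_DIWV)` sums the toy weights for AG (2.0) and GA (3.0) and scales by 10/3.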

Input

Nanobody Structure: A single .pdb file containing the ANARCI-numbered structure of the nanobody, for instance the output of the ANARCI PDB node or a batch item of the NanobodyBuilder2 output.
Surface Properties: A list containing the hydropathy of the exposed surface for each amino acid of the given structure; output of the Exposed Surface Hydropathy node.
Interactions: A list containing the number of interaction partners for each amino acid of the given structure; output of the Intramolecular Hydrophobic Interactions node.

Input Parameters

Surface Area Hydropathy ROI: Region of interest (ROI) on which the surface area hydropathy analysis is performed. The default is the FR2 region plus residue 118 (IMGT numbering). Provide the indices according to the ANARCI numbering scheme used: separate ranges with '-' and single indices with ',' (e.g., 39-55,118 for the proposed region). If the field is left blank, the entire nanobody is used.
Hydrophobic Interactions ROI: ROI on which the hydrophobic interactions analysis is performed. Again, the default is the FR2 region plus residue 118 (IMGT numbering). Provide the indices as described for the Surface Area Hydropathy ROI parameter.
Instability Index ROI: ROI on which the instability index (Guruprasad et al., 1990) is calculated. The default is the FR2 region (IMGT numbering). Provide the indices as described for the Surface Area Hydropathy ROI parameter.
Job Name: A name for the job, used to collect the results of one batch into a single output file.

Reference:

Guruprasad, K., Reddy, B. B., & Pandit, M. W. (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering, Design and Selection, 4(2), 155-161. https://doi.org/10.1093/protein/4.2.155
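The ROI syntax described above can be sketched in Python as follows: `parse_roi` turns a string such as `39-55,118` into residue numbers, and `roi_mean` averages per-residue values (for example the interaction counts behind term (II) of the score) over those residues, falling back to the entire nanobody when the field is blank. Both functions are hypothetical helpers, not the node's code; they assume plain integer positions and ignore IMGT insertion codes such as 111A.

```python
def parse_roi(roi):
    """Parse e.g. '39-55,118' into sorted residue numbers; None means 'use all residues'."""
    roi = roi.strip()
    if not roi:
        return None  # blank field: the entire nanobody is used
    indices = set()
    for part in roi.split(","):
        part = part.strip()
        if "-" in part:  # a range such as 39-55 (inclusive on both ends)
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            indices.update(range(start, end + 1))
        else:  # a single index such as 118
            indices.add(int(part))
    return sorted(indices)


def roi_mean(per_residue, roi):
    """Mean of per-residue values ({residue number: value}) over the residues in the ROI."""
    selected = parse_roi(roi)
    if selected is None:
        values = list(per_residue.values())
    else:
        values = [per_residue[r] for r in selected if r in per_residue]
    if not values:
        raise ValueError("none of the ROI residues are present in the input")
    return sum(values) / len(values)
```

Residues listed in the ROI but missing from the structure (e.g. unused IMGT positions) are skipped rather than treated as zero, so they do not pull the mean down.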

To reproduce the settings of Geyer et al., 2025 (citation TBD), select the following regions of interest:

[Figure: Region-of-interest settings in the Nanobody Aggregation Score node used to reproduce Geyer et al., 2025]

Output

Aggregation Score: The final aggregation score of each nanobody in the batch, stored in a single .csv file.

Advanced settings

Open the side panel and toggle "Save AA specific CSV data" to additionally store the per-amino-acid values for interactions, surface area, and hydrophobicity.

[Figure: Advanced settings of the Nanobody Aggregation Score node]

Start your own analysis now: xyna.bio

This pipeline will soon be freely available for academic use in the xyna.bio pipeline store.