Nanobody Aggregation
These nodes form a toolset for predicting nanobody aggregation properties. The toolset will soon be available online via xyna.bio. Stay tuned for upcoming updates and the public launch announcement!
For more detail, refer to the pipeline documentation.
ANARCI
Antigen Receptor Numbering And Receptor ClassIfication (ANARCI) is a tool for classifying antibodies and nanobodies and numbering each of their amino acids. Because these proteins vary greatly in length and amino acid composition, a residue cannot reliably be assigned to a functional region (such as the CDR loops and framework regions) based on its sequence position alone. For this reason, several numbering schemes with different applications exist. Keep in mind that some numbering schemes may represent long sequences with many insertions incorrectly; interpret such results with caution.
Reference:
Dunbar, J., & Deane, C. M. (2016). ANARCI: antigen receptor numbering and receptor classification. Bioinformatics, 32(2), 298-300. https://doi.org/10.1093/bioinformatics/btv552
The user can choose between the following numbering schemes:
- IMGT: IMGT stands for the international ImMunoGeneTics information system. The scheme ensures standardization and structural consistency by aligning and annotating sequences based on conserved residues; for instance, the Complementarity Determining Regions (CDRs) are defined at fixed positions. It is among the most widely used numbering schemes and is recommended for comparisons across datasets, which is why it is the default in xyna.bio.
Reference: Lefranc MP, Pommié C, Ruiz M, Giudicelli V, Foulquier E, Truong L, Thouvenin-Contet V, Lefranc G. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev Comp Immunol. 2003 Jan;27(1):55-77.
- Kabat: The Kabat scheme is based on the location of regions of high sequence variability among sequences of the same domain type; accordingly, CDRs are defined as the regions of highest variability. Its strength is that it captures the natural sequence diversity of antibodies and nanobodies.
Reference: Kabat E.A., et al. (1991) Sequences of Proteins of Immunological Interest. Fifth Edition. NIH Publication No. 91-3242.
- Chothia: Chothia's scheme refines Kabat by aligning sequences to known antibody structures, making it suitable for 3D modeling and structural prediction.
Reference: Al-Lazikani B., et al. (1997) Standard conformations for the canonical structures of immunoglobulins. J. Mol. Biol., 273, 927–948.
- Martin: Martin's scheme is an enhanced version of Chothia with further structural corrections for higher accuracy.
Reference: Abhinandan K.R., Martin A.C.R. (2008) Analysis and improvements to Kabat and structurally correct numbering of antibody variable domains. Mol. Immunol., 45, 3832–3839.
- AHo: AHo stands for Antibody Homology. The scheme is based on structure-based homology and uses fixed lengths to place residues. It is well-suited for machine learning and structural bioinformatics.
Reference: Honegger A., Plückthun A. (2001) Yet another numbering scheme for immunoglobulin variable domains: an automatic modeling and analysis tool. J. Mol. Biol., 309, 657–670.
- Wolfguy: The Wolfguy scheme is also structure-based. It defines the CDRs by combining the Kabat and Chothia definitions.
Reference: Bujotzek A, Dunbar J, Lipsmeier F et al. Prediction of VH–VL domain orientation for antibody variable domain modeling. Proteins Struct. Funct. Bioinforma. 2015;83:681–95. 10.1002/prot.24756.
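As an illustration of how a numbering scheme maps residues to functional regions, here is a minimal Python sketch for IMGT numbering. It assumes ANARCI-style numbered residues, i.e. a list of ((position, insertion_code), amino_acid) tuples in which '-' marks a gap, and uses the IMGT V-domain region boundaries (FR1 1-26, CDR1 27-38, FR2 39-55, CDR2 56-65, FR3 66-104, CDR3 105-117, FR4 118-128). The helper names and the example fragment are hypothetical and not part of xyna.bio.

```python
from typing import List, Tuple

# IMGT region boundaries (inclusive); only meaningful for IMGT-numbered residues.
IMGT_REGIONS = [
    ("FR1", 1, 26),
    ("CDR1", 27, 38),
    ("FR2", 39, 55),
    ("CDR2", 56, 65),
    ("FR3", 66, 104),
    ("CDR3", 105, 117),
    ("FR4", 118, 128),
]

# ANARCI-style numbering: ((position, insertion code), amino acid), '-' marks a gap.
Numbered = List[Tuple[Tuple[int, str], str]]


def imgt_region(position: int) -> str:
    """Return the IMGT region of a position; insertions (e.g. 111A) share their base position's region."""
    for name, start, end in IMGT_REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside the IMGT range 1-128")


def residues_in_region(numbered: Numbered, region: str) -> str:
    """Concatenate the amino acids of one region, skipping gaps."""
    return "".join(
        aa for (pos, _ins), aa in numbered
        if aa != "-" and imgt_region(pos) == region
    )


# Hypothetical fragment: start of FR2, a CDR3 insertion at 111A, a gap at 112, and FR4 position 118.
fragment: Numbered = [
    ((39, " "), "M"), ((40, " "), "G"), ((41, " "), "W"),
    ((111, " "), "A"), ((111, "A"), "R"), ((112, " "), "-"),
    ((118, " "), "W"),
]
print(residues_in_region(fragment, "FR2"))   # MGW
print(residues_in_region(fragment, "CDR3"))  # AR
```

Because region boundaries differ between schemes, the same lookup on Kabat- or Chothia-numbered residues would need a different boundary table.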
ANARCI PDB
The ANARCI PDB node renumbers an anti- or nanobody PDB structure using ANARCI.
Input
- Structure: The anti- or nanobody structure to renumber in .pdb format.
- Numbering Scheme: The ANARCI numbering scheme to apply (IMGT, Kabat, Chothia, Martin, AHo, or Wolfguy).
Output
- Renumbered Structure: The renumbered structure in .pdb format.
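At the file level, renumbering means rewriting the residue number (columns 23-26) and insertion code (column 27) of each ATOM/HETATM record, as defined by the PDB file format. The following Python sketch does this for a given mapping; in practice, the mapping would be derived from the ANARCI numbering of the chain's sequence. The function name and the mapping format are hypothetical, and the sketch is not the node's actual implementation.

```python
from typing import Dict, List, Tuple

ResidueKey = Tuple[str, int, str]  # (chain ID, old residue number, old insertion code)
NewNumber = Tuple[int, str]        # (new residue number, new insertion code)


def renumber_pdb_lines(lines: List[str], mapping: Dict[ResidueKey, NewNumber]) -> List[str]:
    """Rewrite resSeq (columns 23-26) and iCode (column 27) of ATOM/HETATM records.

    Records whose residue is not in `mapping`, and all other record types, stay unchanged.
    """
    renumbered = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 27:
            key = (line[21], int(line[22:26]), line[26])
            if key in mapping:
                new_number, new_icode = mapping[key]
                # Right-align the number in 4 columns; a blank insertion code is a single space.
                line = f"{line[:22]}{new_number:>4d}{(new_icode or ' ')[:1]}{line[27:]}"
        renumbered.append(line)
    return renumbered


record = "ATOM      1  N   GLN H   1      11.104  13.207   2.100  1.00  0.00           N"
# Hypothetical mapping: residue 1 of chain H becomes IMGT position 111A.
print(renumber_pdb_lines([record], {("H", 1, " "): (111, "A")})[0][:30])
```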
Nanobody Builder 2
NanoBodyBuilder2, part of the ImmuneBuilder suite, predicts the 3D structure of nanobodies from their amino acid sequence using a deep learning model. The resulting .pdb files are numbered with ANARCI.
Reference:
Abanades, B., Wong, W. K., Boyles, F., Georges, G., Bujotzek, A., & Deane, C. M. (2023). ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins. Communications Biology, 6(1), 575. https://doi.org/10.1038/s42003-023-04927-7
Browser application: NanoBodyBuilder2
Input
- Sequences: A .fasta file containing the sequence(s) of the candidate nanobodies to be analyzed.
Input Parameters
- Numbering Scheme: The ANARCI numbering scheme to apply (IMGT, Kabat, Chothia, Martin, AHo, or Wolfguy).
Output
- Structures: A list of predicted 3D structures in .pdb format. Direct this output into a Batch node so that the aggregation score of each nanobody is calculated in a single run. For robustness, the Batch node is also required when only one nanobody is analyzed.
- Numbered Sequences: The sequences numbered according to the selected numbering scheme, as a .csv file. Each column corresponds to a position index assigned to the residues of the input sequences; a missing cell or index indicates a gap. For context and interpretation, see ANARCI.
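To work with this file outside the pipeline, a sketch like the following could rebuild each ungapped sequence from the numbered columns. It assumes, hypothetically, that the first column is called id and holds the sequence name, and that every other cell holds a one-letter amino acid code or is empty (or '-') for a gap; check the actual file layout before relying on it.

```python
import csv
import io
from typing import Dict


def read_numbered_csv(text: str, id_column: str = "id") -> Dict[str, Dict[str, str]]:
    """Map each sequence ID to {position label: amino acid}, dropping empty or '-' gap cells."""
    table: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        seq_id = row.pop(id_column)
        table[seq_id] = {pos: aa for pos, aa in row.items() if aa and aa != "-"}
    return table


def ungapped_sequence(numbered: Dict[str, str]) -> str:
    """Rebuild the plain sequence; dicts preserve the CSV column order."""
    return "".join(numbered.values())


# Hypothetical two-sequence file with an insertion column (111A).
example_csv = """id,1,2,3,111,111A,112
Nb1,Q,V,Q,A,R,D
Nb2,E,V,,A,,D
"""
table = read_numbered_csv(example_csv)
print(ungapped_sequence(table["Nb1"]))  # QVQARD
print(ungapped_sequence(table["Nb2"]))  # EVAD
```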
Extract Nanobody from List
This node extracts a single nanobody structure, identified by its ID, from a list of nanobody structures in .pdb format.
Input
- Input List: The list of .pdb files containing nanobody structures. Usually, this list is created by the Nanobody Builder 2 node.
Input Parameters
- Nanobody ID: The ID of the nanobody to extract from the list. It must match the sequence name given in the .fasta file; matching is not case-sensitive.
Output
- Extracted Nanobody: Reference to the .pdb file of the extracted nanobody. It can be linked directly to the Protein Structure Viewer node.
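The ID matching described for this node can be sketched in Python as follows. The sketch assumes, hypothetically, that a record's ID is the first word of its FASTA header line and that each predicted structure is stored as <ID>.pdb; the actual file naming in xyna.bio may differ. Using casefold() on both sides makes the comparison case-insensitive.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def fasta_ids(fasta_text: str) -> List[str]:
    """Return each record's ID, taken as the first word after '>' in its header line."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


def find_structure(pdb_files: List[str], nanobody_id: str) -> Optional[str]:
    """Return the path whose file stem equals the ID, ignoring case (hypothetical naming scheme)."""
    wanted = nanobody_id.strip().casefold()
    for path in pdb_files:
        if PurePosixPath(path).stem.casefold() == wanted:
            return path
    return None


fasta = ">Nb_A12 llama VHH\nQVQLQESGG\n>Nb_B07\nEVQLVESGG\n"
files = [f"results/{seq_id}.pdb" for seq_id in fasta_ids(fasta)]
print(find_structure(files, "nb_b07"))  # results/Nb_B07.pdb
```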
Intramolecular Hydrophobic Interactions
This node fetches precomputed intramolecular hydrophobic interactions directly from the built-in analysis module of Mol*. It accesses the residue-level hydrophobic contacts derived from Mol*'s spatial analysis routines, which take atomic proximity, physicochemical properties, and interaction types into account; these are the same interactions shown in the interactive Mol* structure visualization. The output consists of per-residue hydrophobic interaction counts, i.e. the number of unique, spatially accessible hydrophobic contacts each residue forms with other residues. For more detail, refer to the pipeline documentation.
References:
David Sehnal, Sebastian Bittrich, Mandar Deshpande, Radka Svobodová, Karel Berka, Václav Bazgier, Sameer Velankar, Stephen K Burley, Jaroslav Koča, Alexander S Rose: Mol* Viewer: modern web app for 3D visualization and analysis of large biomolecular structures, Nucleic Acids Research, 2021; 10.1093/nar/gkab31
Sobolev, V., Sorokine, A., Prilusky, J., Abola, E. E., & Edelman, M. (1999). Automated analysis of interatomic contacts in proteins. Bioinformatics (Oxford, England), 15(4), 327-332. https://doi.org/10.1093/bioinformatics/15.4.327
Input
- Input Structure: A single .pdb structure file. For nanobodies and antibodies, the structure should be annotated with ANARCI; it can, for instance, be the output of the ANARCI PDB node or a batch item of the Nanobody Builder 2 output.
Input Parameters
- Distance Cutoff: The maximum distance, in Ångström (Å), at which two residues are considered to interact. Set it to 4 Å to reproduce the parameter settings of Geyer et al., 2025 (citation TBD).
Output
- Hydrophobic Interactions: A list containing the number of interaction partners for each amino acid of the given structure.
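The node itself reads the contacts computed by Mol*. Purely to illustrate what the Distance Cutoff controls, here is a simplified, distance-only Python sketch that counts, for each residue, the distinct other residues having at least one pair of hydrophobic atoms within the cutoff. It is not Mol*'s algorithm: Mol* also considers physicochemical properties and interaction types, whereas here the caller decides which atoms count as hydrophobic. All names and coordinates are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def hydrophobic_contact_counts(atoms: Dict[str, List[Coord]], cutoff: float = 4.0) -> Dict[str, int]:
    """For each residue, count the distinct other residues with any atom pair within `cutoff` (Å).

    `atoms` maps a residue label (e.g. an IMGT position such as "111A") to the
    coordinates of the atoms the caller treats as hydrophobic.
    """
    counts = {residue: 0 for residue in atoms}
    for res_a, res_b in combinations(atoms, 2):
        if any(math.dist(p, q) <= cutoff for p in atoms[res_a] for q in atoms[res_b]):
            counts[res_a] += 1
            counts[res_b] += 1
    return counts


# Hypothetical single-atom residues, coordinates in Å.
toy = {"39": [(0.0, 0.0, 0.0)], "41": [(3.5, 0.0, 0.0)],
       "50": [(0.0, 3.9, 0.0)], "118": [(20.0, 0.0, 0.0)]}
print(hydrophobic_contact_counts(toy, cutoff=4.0))  # {'39': 2, '41': 1, '50': 1, '118': 0}
```

Lowering the cutoff to 3.6 Å drops the 3.9 Å contact between residues 39 and 50, which shows how sensitive the counts are to this parameter.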
Exposed Surface Hydropathy
A major factor in nanobody aggregation is the hydropathy of the exposed surface of the former VH-VL interface, the FR2 region (residues 39-55 in IMGT numbering). This node determines the exposed surface hydrophobicity by calculating the mean product of the exposed surface area (Shrake-Rupley algorithm) and the hydrophobicity (Wimley-White hydrophobicity scale) of each amino acid. A more negative value for a residue reflects a more hydrophilic exposed surface; a more positive value reflects a more hydrophobic one.
References:
Shrake-Rupley algorithm: Shrake, A., & Rupley, J. A. (1973). Environment and exposure to solvent of protein atoms. Lysozyme and insulin. Journal of molecular biology, 79(2), 351-371. https://doi.org/10.1016/0022-2836(73)90011-9 (Implementation: https://github.com/biopython/biopython/blob/master/Bio/PDB/SASA.py)
Hydrophobicity scale: Wimley, W. C., & White, S. H. (1996). Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nature structural biology, 3(10), 842-848. https://doi.org/10.1038/nsb1096-842
Input
- Nanobody Structure: A single .pdb file containing a nanobody structure annotated with ANARCI. It can, for instance, be the output of the ANARCI PDB node or a batch item of the Nanobody Builder 2 Output.
Input Parameters
- N Points: Number of points at which the surface of each atom is probed. Higher values may increase accuracy but also increase runtime. Unless stated otherwise, start with the default of 100.
Output
- Exposed Area: A list containing the hydropathy of the exposed surface for each amino acid of the given structure.
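The two ingredients of this node can be sketched in plain Python: a Shrake-Rupley estimate of per-atom solvent-accessible surface area, in which n_points plays the role of the N Points parameter, and the per-residue product of exposed area and a hydrophobicity value. This is a standalone illustration, not the Biopython implementation referenced above, and the node's averaging step is not reproduced. The 1.4 Å probe radius is a common default (also Biopython's) and an assumption here; the scale values in the example are placeholders, not the Wimley-White values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def sphere_points(n: int) -> List[Coord]:
    """Approximately uniform points on the unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * i), r * math.sin(golden_angle * i), z))
    return points


def shrake_rupley(coords: Sequence[Coord], radii: Sequence[float],
                  n_points: int = 100, probe: float = 1.4) -> List[float]:
    """Per-atom solvent-accessible surface area in Å^2 via Shrake-Rupley point sampling."""
    unit = sphere_points(n_points)
    expanded = [r + probe for r in radii]  # atom radius + probe radius
    areas = []
    for i, (center, radius) in enumerate(zip(coords, expanded)):
        exposed = 0
        for ux, uy, uz in unit:
            point = (center[0] + radius * ux, center[1] + radius * uy, center[2] + radius * uz)
            # The point is buried if it lies inside any other atom's expanded sphere.
            buried = any(j != i and math.dist(point, other) < other_radius
                         for j, (other, other_radius) in enumerate(zip(coords, expanded)))
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * radius ** 2 * exposed / n_points)
    return areas


def residue_hydropathy(atom_residues: Sequence[str], atom_areas: Sequence[float],
                       residue_types: Dict[str, str], scale: Dict[str, float]) -> Dict[str, float]:
    """Sum atom areas per residue, then multiply by the residue type's hydrophobicity value."""
    sasa: Dict[str, float] = {}
    for residue, area in zip(atom_residues, atom_areas):
        sasa[residue] = sasa.get(residue, 0.0) + area
    return {residue: area * scale[residue_types[residue]] for residue, area in sasa.items()}


# Two well-separated atoms are both fully exposed: 4 * pi * (1.7 + 1.4)^2 ≈ 120.8 Å^2 each.
areas = shrake_rupley([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], [1.7, 1.7])
print([round(a, 1) for a in areas])  # [120.8, 120.8]

placeholder_scale = {"MET": 0.5, "TRP": -1.0}  # placeholder values, NOT the Wimley-White scale
print(residue_hydropathy(["39", "39", "41"], [10.0, 5.0, 20.0],
                         {"39": "MET", "41": "TRP"}, placeholder_scale))  # {'39': 7.5, '41': -20.0}
```

The sign of each product follows the sign convention of the chosen scale, so the scale must be oriented consistently with the interpretation described for this node.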
Nanobody Aggregation Score
This final pipeline node calculates the aggregation score of a given nanobody. It follows the formula laid out in Geyer et al., 2025 (citation TBD) to predict the nanobody's likelihood of aggregation based on the exposed surface area hydropathy, the internal hydrophobic interactions, and the inherent (in)stability of the FR2 region. For more detail, refer to the pipeline documentation.
Input
- Nanobody Structure: A single .pdb file containing a nanobody structure annotated with ANARCI. It can, for instance, be the output of the ANARCI PDB node or a batch item of the Nanobody Builder 2 output.
- Surface Area Hydropathy: A list containing the hydropathy of the exposed surface for each amino acid of the given structure; output of the Exposed Surface Hydropathy node.
- Hydrophobic Interactions: A list of the number of interaction partners for each amino acid of the given structure; output of the Intramolecular Hydrophobic Interactions node.
Input Parameters
- Surface Area Hydropathy ROI: Region of Interest (ROI) of the surface area hydropathy on which the analysis is performed. The default is the FR2 region plus residue 118 (IMGT numbering). Provide the indices according to the ANARCI numbering scheme in use: use '-' for ranges and ',' to separate entries (e.g., 39-55,118 for the default region). If the field is left blank, the entire nanobody is used.
- Hydrophobic Interactions ROI: ROI for the hydrophobic interactions on which the analysis should be performed. Again, the selected default is the FR2 region and residue 118 (according to IMGT numbering). The indices should be provided in the same format as stated for the Surface Area Hydropathy ROI parameter.
- Instability Index ROI: ROI for the instability index on which the analysis should be performed. The selected default is the FR2 region (according to IMGT numbering). The indices should be provided in the same format as stated for the Surface Area Hydropathy ROI parameter.
Reference: Guruprasad, K., Reddy, B. B., & Pandit, M. W. (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering, Design and Selection, 4(2), 155-161. DOI: https://doi.org/10.1093/protein/4.2.155
- Job Name: A name for the job, used to collect the results of one batch into a single output file.
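The ROI syntax shared by these parameters can be parsed with a few lines of Python. This is a minimal sketch with hypothetical function names; it handles integer positions only, so insertion positions such as 111A would need additional handling, and it returns None for a blank field to signal that the entire nanobody is used.

```python
from typing import Dict, Optional, Set


def parse_roi(spec: str) -> Optional[Set[int]]:
    """Parse an ROI string such as '39-55,118' into a set of positions; blank means None (whole nanobody)."""
    spec = spec.strip()
    if not spec:
        return None
    positions: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"range {part!r} is reversed")
            positions.update(range(start, end + 1))  # ranges are inclusive
        else:
            positions.add(int(part))
    return positions


def select_roi(per_residue: Dict[int, float], roi: Optional[Set[int]]) -> Dict[int, float]:
    """Keep only the residues inside the ROI; None keeps every residue."""
    if roi is None:
        return dict(per_residue)
    return {pos: value for pos, value in per_residue.items() if pos in roi}


default_roi = parse_roi("39-55,118")
print(len(default_roi))  # 18
print(select_roi({38: 1.0, 39: 2.0, 118: 3.0}, default_roi))  # {39: 2.0, 118: 3.0}
```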
Output
- Aggregation Score: The final aggregation score of each nanobody in the batch and its sub-components, stored as one .csv file.
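For reference, Guruprasad et al. (1990) define the instability index of a sequence x_1, ..., x_L as shown below, where DIWV(x_i, x_{i+1}) is the empirical dipeptide instability weight value of the dipeptide starting at position i; proteins with an index above 40 are predicted to be unstable. That the node evaluates this sum over the sub-sequence selected by the Instability Index ROI is an assumption based on the parameter description. Biopython exposes the same index as ProteinAnalysis.instability_index() in Bio.SeqUtils.ProtParam.

```latex
\mathrm{II} = \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}(x_i, x_{i+1})
```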
Advanced settings
Open the side panel and enable Save AA specific CSV data to additionally store the per-amino-acid values for the surface area hydropathy, hydrophobic interactions, and hydrophobicity.
Disclaimer: The columns of the output files are currently hard-coded and depend on the selected numbering scheme. As a result, insertions and deletions are likely not represented correctly: for short sequences, many columns are empty, while very long sequences might be represented incorrectly. An improved version is under construction. For a correct representation of the sequence numbering, refer to the Numbered Sequences output of the Nanobody Builder 2 node.