Article

The human antibody sequence space and structural design of the V, J regions, and CDRH3 with Rosetta

MAbs, 14 (1): 2068212 (January 2022)

Abstract

The human adaptive immune response enables the targeting of epitopes on pathogens with high specificity. Infection with a pathogen induces somatic hypermutation and B-cell selection processes that govern the shape and diversity of the antibody sequence landscape. To date, even the largest immunome repertoires of adaptive immune receptors acquired by next-generation sequencing cannot fully capture the vast antibody sequence space of a single individual, which is estimated to be at least 10^12 potential sequences. Degeneracy of the genetic code means that the number of possible nucleotide triplets (64) is greater than the number of canonical amino acids (20), so that some amino acids are encoded by multiple triplets and different amino acids share the same nucleotide in 1 or 2 positions of the triplet. We hypothesize that the degeneracy of the genetic code can be used to statistically model an enlarged space of human antibody amino acid sequences, accounting for the discrepancy between the observed and the hypothesized antibody sequence space. Facilitated by Bayesian statistics and immunome repertoire clustering, we calculated amino acid probabilities from single nucleotide frequencies to infer a human amino acid sequence space that is used to design human-like antibodies with Rosetta. We show that antibodies designed with our restraints are on average up to 16.6% more human-like in the V and J regions compared to the Rosetta designs produced without constraints. The human-likeness of the heavy-chain CDR3 region (CDRH3) could be increased for 8 of 27 antibodies compared to Rosetta designs with a similar number of mutations, and the approach could be successfully applied to Mus musculus antibodies to demonstrate humanization.
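The idea of deriving amino acid probabilities from single nucleotide frequencies can be illustrated with a minimal sketch. It is not the authors' pipeline, which additionally relies on Bayesian statistics and repertoire clustering. The sketch assumes that the three codon positions are statistically independent and that a frequency for each base is available per position; the function name `amino_acid_probabilities` and its `drop_stop` parameter are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated
# with bases in T, C, A, G order at each of the three positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def amino_acid_probabilities(pos_freqs, drop_stop=True):
    """Map per-position nucleotide frequencies to amino acid probabilities.

    pos_freqs: list of three dicts, one per codon position, mapping a base
               (T, C, A, G) to its frequency. Missing bases count as 0.
    drop_stop: if True, discard stop codons ('*') and renormalize.

    Assumption (illustrative only): codon positions are independent, so
    P(codon) is the product of the three single-nucleotide frequencies.
    """
    probs = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for position, base in enumerate(codon):
            p *= pos_freqs[position].get(base, 0.0)
        # Degeneracy: several codons contribute to the same amino acid.
        probs[aa] = probs.get(aa, 0.0) + p
    if drop_stop:
        probs.pop("*", None)
        total = sum(probs.values())
        if total > 0:
            probs = {aa: p / total for aa, p in probs.items()}
    return probs


if __name__ == "__main__":
    # Uniform base usage: each amino acid's probability is proportional
    # to the number of codons that encode it.
    uniform = [{b: 0.25 for b in BASES}] * 3
    for aa, p in sorted(amino_acid_probabilities(uniform).items()):
        print(aa, round(p, 4))
```

Because several codons feed into the same amino acid, a residue that was never observed at a position can still receive non-zero probability when the nucleotides of its codons were observed there individually, which is how degeneracy can enlarge the modeled sequence space.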
