The challenges proposed in this inaugural competition are based on unpublished SARS-CoV-2 next-generation sequencing (NGS) datasets from which some antibodies were characterized21,22. Given the vast amount of additional public data available for SARS-CoV-2–binding antibodies (for example, in COVIC19 and CoV-AbDab23), the challenges described below should provide the best possible scenario for AI/ML task success.
Once designed or identified sequences have been uploaded, Azenta will synthesize up to 1,200 genes for antibodies, express and purify them, carry out size-exclusion chromatography and provide coded antibodies to the other partners. Antibody affinities will be assessed by Carterra using surface plasmon resonance (LSA-XT) and, for the strongest binders, by Sapidyne using the kinetic exclusion assay (KinExA). Mosaic will assess developability using five assays: hydrophobic interaction chromatography (HIC) high-performance liquid chromatography (HPLC), baculovirus particle (BVP) enzyme-linked immunosorbent assay (ELISA), affinity-capture self-interaction nanoparticle spectroscopy (AC-SINS), melting temperature (Tm) and aggregation temperature (Tagg). Bio-Techne will provide the target. These assays will ensure standardized conditions and unbiased head-to-head comparison of predicted sequences.
Competition 1: In silico antibody affinity maturation
Participants will be provided with experimental datasets derived from the affinity maturation of an antibody recognizing the RBD of SARS-CoV-2 (Table 1). Each dataset comprises three NGS sub-datasets of CDR sequences (LCDR1+2, LCDR3 and HCDR1+2, in which the remaining CDRs are constant) generated during phase 1 of a previously described experimental affinity maturation method20 (Fig. 1). In phase 2 of that method, all phase 1 outputs are combined and experimentally selected for higher affinity. Each antibody population has diversity only in the indicated CDRs (Fig. 1a), with the remaining CDRs being parental, and has been displayed on yeast and sorted for target binding (Fig. 1b). Although each population binds the target more tightly than the parental population, individual NGS sequences have not been assessed for their ability to encode antibodies with improved binding activity and may include PCR or sequencing errors. The amino acid sequences and affinities of the antibodies characterized in phase 2 have been determined (Fig. 1c,d), but these data will not be provided, so that the computational methods are given the opportunity to generate the same (or better) sequences.
a, Phase 1: DNA library diversity is introduced into L1+L2, L3 or H1+H2 of an anti-RBD scFv. b, Selective pressure is applied by FACS using RBD to select for improved binders. c, Phase 2: RBD binding diversity from each of the three arms (L1+L2, L3, H1+H2) is recovered, PCR-amplified and recombined into a yeast display vector. d, The combined diversity is transformed back into yeast and sorted for improved affinity and expression. e–g, Sequencing identifies final improved variants, which are reformatted into IgG for expression (e), binding affinity (f) and developability (g) measurements.
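The phase 2 recombination step described above can be pictured computationally as a Cartesian product over the three phase 1 arms. The sketch below is only an illustration under stated assumptions: the function name `recombine`, the dictionary layout and the toy placeholder strings are hypothetical, HCDR3 is assumed to stay parental, and the real experimental library is built by PCR recombination and sorting rather than exhaustive enumeration.

```python
from itertools import product


def recombine(l12_arm, l3_arm, h12_arm):
    """Enumerate every combination of one entry from each phase 1 arm.

    l12_arm: list of (LCDR1, LCDR2) pairs
    l3_arm:  list of LCDR3 strings
    h12_arm: list of (HCDR1, HCDR2) pairs
    HCDR3 is not listed because it is assumed to remain parental.
    """
    return [
        {"LCDR1": l1, "LCDR2": l2, "LCDR3": l3, "HCDR1": h1, "HCDR2": h2}
        for (l1, l2), l3, (h1, h2) in product(l12_arm, l3_arm, h12_arm)
    ]


# Toy placeholder labels, not real CDR sequences.
l12 = [("L1a", "L2a"), ("L1b", "L2b")]
l3 = ["L3a", "L3b", "L3c"]
h12 = [("H1a", "H2a"), ("H1b", "H2b")]

library = recombine(l12, l3, h12)
print(len(library))  # 2 * 3 * 2 = 12 combinations
```

Because the number of combinations is the product of the arm sizes, a real design method would likely need to rank or sample candidates rather than enumerate them all.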
The computational goal of Competition 1 is to design antibodies (Fig. 1e) with improved affinity (Fig. 1f) for the RBD of SARS-CoV-2 that also exhibit favorable developability properties (Fig. 1g) using the NGS datasets. Designs should modify only heavy chain complementarity-determining regions 1 and 2 (HCDR1–2) and light chain complementarity-determining regions 1, 2 and 3 (LCDR1–3), leaving the frameworks and heavy chain complementarity-determining region 3 (HCDR3) unchanged. The blinded assessment will determine the affinities of the designed antibodies and how well they compare to those of the antibodies obtained experimentally in phase 2. Although the experimental affinity maturation was carried out on single-chain variable fragments (scFvs) displayed on yeast, antibodies were, and will be, tested as full-length IgGs. Results will be compared to the affinities of experimentally derived sequences generated by combining the three phase 1 outputs and selecting from the corresponding combinatorial library displayed on yeast.
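The restriction on which regions may be designed lends itself to a simple pre-submission check. The following is a minimal sketch, assuming each antibody is represented as a dictionary mapping region names to amino acid strings; the region names (for example, `HFR1`), the function `disallowed_changes` and the toy strings are hypothetical and are not part of the competition materials.

```python
# Regions that Competition 1 designs are allowed to change.
DESIGNABLE = {"HCDR1", "HCDR2", "LCDR1", "LCDR2", "LCDR3"}


def disallowed_changes(parental, design):
    """Return the sorted names of regions that differ from the parental
    antibody but are not designable (frameworks or HCDR3)."""
    if parental.keys() != design.keys():
        raise ValueError("design and parental must define the same regions")
    return sorted(
        region
        for region in parental
        if design[region] != parental[region] and region not in DESIGNABLE
    )


# Toy placeholder strings, not real antibody sequences.
parental = {"HFR1": "AAA", "HCDR1": "BBB", "HCDR3": "CCC", "LCDR3": "DDD"}
good = dict(parental, HCDR1="BXB", LCDR3="DXD")  # only designable CDRs changed
bad = dict(parental, HCDR3="CXC", HFR1="AXA")    # HCDR3 and a framework changed

print(disallowed_changes(parental, good))  # []
print(disallowed_changes(parental, bad))   # ['HCDR3', 'HFR1']
```

An empty result means the design changes only the permitted CDRs; it says nothing about affinity or developability.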
Competition 2: In silico affinity rank prediction for antibody discovery
Participants will be provided with an NGS dataset of a single selection output recognizing the RBD of SARS-CoV-2, clustered by HCDR3 sequence22. The selection used a previously published library24, which comprises natural CDRs embedded within well-behaved therapeutic scaffolds. Not all individual NGS sequences have been assessed for their ability to encode antibodies with binding activity, and the dataset may therefore include PCR or sequencing errors. Those sequences encoding antibodies (as IgG) demonstrated to bind the target will be identified and provided with their corresponding affinities. Furthermore, the relative frequency of different sequences within the clusters will be provided; see Table 2 for a representative dataset.
The computational goal of Competition 2 is to identify those sequences within the existing NGS dataset that encode the highest-affinity antibodies and have not already had their affinities determined. Predictions are restricted to three HCDR3 clusters: the two largest clusters, ranked by number of VL+VH sequences, from experimental bin group 1 (that is, 28F and 27F) and the largest cluster from experimental bin group 2 (that is, 47F; Table 2). Results will be compared to the affinities of those antibodies that were experimentally derived and for which sequences were provided.
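The cluster selection rule can be sketched as a small ranking function. This is an illustration only: the record fields (`cluster`, `bin_group`, `vl_vh`), the function `largest_clusters` and the toy data are hypothetical, and the sketch assumes that cluster size means the number of distinct VL+VH sequences rather than read counts.

```python
from collections import defaultdict


def largest_clusters(records, bin_group, n):
    """Return the n HCDR3 clusters in a bin group with the most distinct
    VL+VH sequences. Ties are broken alphabetically by cluster name."""
    sequences = defaultdict(set)
    for rec in records:
        if rec["bin_group"] == bin_group:
            sequences[rec["cluster"]].add(rec["vl_vh"])
    ranked = sorted(sequences, key=lambda c: (-len(sequences[c]), c))
    return ranked[:n]


# Toy records with placeholder cluster names and sequences.
records = [
    {"cluster": "A", "bin_group": 1, "vl_vh": "s1"},
    {"cluster": "A", "bin_group": 1, "vl_vh": "s2"},
    {"cluster": "A", "bin_group": 1, "vl_vh": "s3"},
    {"cluster": "B", "bin_group": 1, "vl_vh": "s4"},
    {"cluster": "B", "bin_group": 1, "vl_vh": "s5"},
    {"cluster": "C", "bin_group": 1, "vl_vh": "s6"},
    {"cluster": "D", "bin_group": 2, "vl_vh": "s7"},
    {"cluster": "D", "bin_group": 2, "vl_vh": "s8"},
    {"cluster": "E", "bin_group": 2, "vl_vh": "s9"},
]

print(largest_clusters(records, bin_group=1, n=2))  # ['A', 'B']
print(largest_clusters(records, bin_group=2, n=1))  # ['D']
```

In the competition itself the clusters are already named, so a function like this would serve only to confirm that a local parse of the dataset agrees with the stated selection.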
Competition 3: NGS-inspired computational antibody design
Participants will be given the same NGS output as in Competition 2. The AI/ML goal is to generate out-of-library sequences of antibodies that bind the same target with affinities as high as possible and that also exhibit favorable developability properties, using the provided NGS datasets described here and any other useful publicly available data. Only CDRs should be designed, and frameworks should remain unmodified (Table 2). The out-of-library sequences should not be present within the NGS dataset itself, nor be derived from any previous independent experiments. Results will be compared to the affinities of those antibodies that were experimentally derived and for which sequences were provided.
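The out-of-library requirement can be partly checked before submission by removing any candidate that already occurs in the NGS dataset. The sketch below assumes exact string matching on full amino acid sequences; the function `out_of_library` and the toy strings are hypothetical. It cannot verify the separate requirement that designs are not derived from previous independent experiments.

```python
def out_of_library(designs, ngs_sequences):
    """Keep designs that are absent from the NGS dataset, dropping
    duplicate designs while preserving their original order."""
    library = set(ngs_sequences)
    seen = set()
    kept = []
    for seq in designs:
        if seq not in library and seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept


# Toy placeholder strings, not real antibody sequences.
ngs = ["AAAA", "AAAC", "AACC"]
candidates = ["AAAC", "ACCC", "CCCC", "ACCC"]

print(out_of_library(candidates, ngs))  # ['ACCC', 'CCCC']
```

Exact matching is the weakest reasonable reading of "not present within the NGS dataset"; a stricter filter might also exclude near-identical sequences, but no such threshold is stated here.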
Each of these challenges has high affinity as an endpoint, but developability will also be assessed to ensure that affinity is not generated at the expense of developability. Antibodies will be judged as passing, questionable or failing on each of the five developability assays described above, with failing antibodies scored 2, questionable antibodies scored 1 and passing antibodies scored 0; any antibody with a total score of 4 or above will be considered failing.
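The scoring rule above can be written out as a short worked example. This is a minimal sketch: the assay keys, the call labels (`pass`, `questionable`, `fail`) and the function names are hypothetical, while the point values and the threshold of 4 are taken from the text.

```python
# Points per call and failure threshold, as stated in the text.
POINTS = {"pass": 0, "questionable": 1, "fail": 2}
ASSAYS = ("HIC", "BVP", "AC-SINS", "Tm", "Tagg")
FAIL_THRESHOLD = 4


def developability_score(calls):
    """Sum the points over the five developability assays."""
    missing = [assay for assay in ASSAYS if assay not in calls]
    if missing:
        raise ValueError(f"missing assay calls: {missing}")
    return sum(POINTS[calls[assay]] for assay in ASSAYS)


def is_failing(calls):
    """An antibody fails overall if its total score is 4 or above."""
    return developability_score(calls) >= FAIL_THRESHOLD


# Hypothetical antibodies: two failed assays already reach the threshold.
two_fails = {"HIC": "fail", "BVP": "fail", "AC-SINS": "pass", "Tm": "pass", "Tagg": "pass"}
borderline = {"HIC": "fail", "BVP": "questionable", "AC-SINS": "pass", "Tm": "pass", "Tagg": "pass"}

print(developability_score(two_fails), is_failing(two_fails))    # 4 True
print(developability_score(borderline), is_failing(borderline))  # 3 False
```

With five assays the total ranges from 0 to 10, so an antibody is considered failing after two failed assays, one failed plus two questionable assays, or four questionable assays.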