Patent 3206795 Summary

(12) Patent Application: (11) CA 3206795
(54) English Title: METHODS AND SYSTEMS FOR GENERATING NUCLEIC ACID DIVERSITY
(54) French Title: PROCEDES ET SYSTEMES POUR GENERER UNE DIVERSITE D'ACIDES NUCLEIQUES
Status: Compliant
Bibliographic Data
(51) International Patent Classification (IPC):
  • C12N 15/113 (2010.01)
(72) Inventors :
  • BIKARD, DAVID (France)
  • LAURENCEAU, RAPHAEL (France)
  • ROSTAIN, WILLIAM (France)
(73) Owners :
  • INSTITUT PASTEUR (France)
(71) Applicants :
  • INSTITUT PASTEUR (France)
(74) Agent: ROBIC AGENCE PI S.E.C./ROBIC IP AGENCY LP
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 2022-02-17
(87) Open to Public Inspection: 2022-08-25
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/EP2022/053934
(87) International Publication Number: WO2022/175383
(85) National Entry: 2023-07-27

(30) Application Priority Data:
Application No. Country/Territory Date
63/150,563 United States of America 2021-02-17
21305785.4 European Patent Office (EPO) 2021-06-09

Abstracts

English Abstract

Provided are methods comprising expressing in a recombinant cell a recombinant error-prone reverse transcriptase (RT) and recombinant spacer RNA comprising a target sequence; making a mutagenized cDNA polynucleotide homologous to a DNA sequence in the recombinant cell; expressing a recombinant recombineering system in the recombinant cell; and recombining the mutagenized cDNA with the homologous DNA sequence in the recombinant cell. Also provided are recombinant cells comprising recombinant coding sequences for a recombinant error-prone reverse transcriptase (RT), recombinant spacer RNA comprising a target sequence, and recombinant recombineering system.


French Abstract

Procédés comprenant l'expression dans une cellule recombinée d'une transcriptase inverse (RT) recombinée sujette aux erreurs et d'un ARN espaceur recombiné comprenant une séquence cible; la fabrication d'un polynucléotide d'ADNc mutagéné homologue à une séquence d'ADN dans la cellule recombinée; l'expression d'un système de recombinaison recombiné dans la cellule recombinée; et la recombinaison de l'ADNc mutagéné avec la séquence d'ADN homologue dans la cellule recombinée. L'invention concerne également des cellules recombinées comprenant des séquences codantes recombinées pour une transcriptase inverse (RT) recombinée sujette aux erreurs, un ARN espaceur recombiné comprenant une séquence cible, et un système de recombinaison recombiné.

Claims

Note: Claims are shown in the official language in which they were submitted.


WO 2022/175383
PCT/EP2022/053934
76
CLAIMS
1. A method of generating targeted nucleic acid diversity, comprising
expressing in a
recombinant cell a recombinant error-prone reverse transcriptase (RT) and a
recombinant
spacer RNA comprising a target sequence; making a mutagenized cDNA
polynucleotide
homologous to a DNA sequence in the recombinant cell; expressing a recombinant
recombineering system in the recombinant cell; and recombining the mutagenized
cDNA
with the homologous DNA sequence in the recombinant cell.
2. The method according to claim 1, wherein the recombinant error-prone
reverse
transcriptase (RT) comprises a recombinant DGR reverse transcriptase major
subunit
(RT) and a recombinant DGR accessory subunit (Avd), and the recombinant spacer
RNA
comprises a recombinant DGR spacer RNA comprising a target sequence.
3. The method according to claim 1 or 2, wherein the recombinant error-prone
reverse
transcriptase (RT) comprises the motif I/LGXXXSQ (SEQ ID NO: 2).
4. The method according to claim 1 or 3, wherein the recombinant error-prone
RT is an
engineered recombinant error-prone RT derived from a non-mutagenic reverse-
transcriptase; preferably the recombinant error-prone RT is a mutant Ec86
retron reverse
transcriptase comprising the replacement of the motif QGXXXSP (SEQ ID NO: 1)
with
the motif I/LGXXXSQ (SEQ ID NO: 2).
5. The method according to claim 2 or 3, wherein the recombinant DGR RT, the
recombinant DGR Avd, and the recombinant DGR spacer RNA are from the
Bordetella
bacteriophage BPP-1.
6. The method according to any one of claims 1 to 5, wherein, the
recombinant error-prone
RT has adenine mutagenesis activity; preferably wherein the recombinant error-
prone RT
is a DGR RT comprising a mutation that decreases its error rate at adenine
position
selected from the group consisting of: R74A and I181N, the positions being
indicated by
alignment with SEQ ID NO: 4.
CA 03206795 2023- 7- 27

7. The method according to any one of claims 1 to 6, wherein the recombinant
recombineering system is different from the DGR retrohoming.
8. The method according to any one of claims 1 to 7, wherein the recombinant
recombineering system is a recombinant single-stranded annealing protein
mediating
oligo recombineering; preferably selected from the group consisting of: the phage
lambda's Red Beta protein, RecT, PapRecT and CspRecT.
9. The method according to any one of claims 2, 3 and 5 to 8, wherein the
recombinant DGR
RT, recombinant DGR Avd, recombinant DGR spacer RNA, and recombinant
recombineering system are all expressed from one or a plurality of recombinant
plasmids
together comprising coding sequences for the recombinant DGR RT, recombinant
DGR
Avd, recombinant DGR spacer RNA and recombinant recombineering system.
10. The method according to any one of claims 1 to 9, wherein the mutagenized
target
sequence is from 40 to 200 base pairs long or more.
11. The method according to any one of claims 1 to 10, wherein the adenine
content and/or
position(s) in the target sequence and/or homologous DNA sequence in the
recombinant
cell is modified to modulate recombination frequency or control sequence
diversity.
12. The method according to any one of claims 1 to 11, wherein the
recombination
frequency is at least 1%; preferably 3% or more; more preferably 10% or more.
13. The method according to any one of claims 1 to 12, wherein the recombinant
cell
comprises at least two spacer RNAs comprising a target sequence; in particular
at least
two DGR spacer RNAs comprising a target sequence; preferably wherein the
multiple
spacer RNAs target the same gene in the recombinant cell.
14. The method according to any one of claims 1 to 13, wherein the recombinant
cell is a
prokaryotic cell; preferably a bacterial cell; more preferably an E. coli
cell.
15. The method according to claim 14, wherein the bacterial cell expresses dominant
negative mutL; and/or the E. coli cell is deleted for the two exonucleases SbcB and
RecJ to increase recombineering efficiency.
16. A recombinant cell comprising recombinant coding sequences for a
recombinant
error-prone reverse transcriptase (RT) and at least one recombinant spacer RNA
comprising a target sequence, and a coding sequence that expresses a
recombinant
recombineering system as defined in any one of claims 1 to 11, 13, 14 and 16
to 18.
17. The recombinant cell according to claim 16, wherein the cell further
comprises the
recombinant error-prone reverse transcriptase (RT), at least one recombinant
spacer
RNA comprising a target sequence and recombinant recombineering system.
18. A kit for generating targeted nucleic acid diversity, comprising one or a
plurality of
recombinant expression vectors together comprising coding sequences for a
recombinant error-prone reverse transcriptase (RT) and at least one
recombinant
spacer RNA comprising a target sequence, and a coding sequence that expresses
a
recombinant recombineering system, as defined in any one of claims 1 to 11, 13, 14
and 16 to 18.
19. The kit according to claim 18, comprising one or a plurality of
recombinant expression
plasmids together comprising coding sequences for the recombinant DGR RT,
recombinant DGR Avd, recombinant DGR spacer RNA(s) and recombinant SSAP
mediating oligonucleotide recombineering as defined in claim 8; preferably
comprising the plasmid pRL014 having the sequence SEQ ID NO: 17.

Description

Note: Descriptions are shown in the official language in which they were submitted.


METHODS AND SYSTEMS FOR GENERATING NUCLEIC ACID DIVERSITY
FIELD OF THE INVENTION
[0001] The invention relates to a method for generating targeted nucleic acid
diversity in vivo
in a recombinant cell. The invention further relates to a recombinant cell
system for generating
targeted nucleic acid diversity and to their uses.
BACKGROUND
[0002] Directed evolution mimics natural selection with the goal to generate
useful variants of
nucleic acids and/or proteins of interest. Mutations can be introduced in
genes either randomly,
through mutagenic agents, or in a targeted manner in a gene of interest,
optionally followed by
selection for a trait of interest. When the goal is to evolve a specific gene
or set of genes, targeted
diversity generation may be useful to limit the chances that mutations outside
of the genes of
interest will be selected. Targeted mutagenesis can also ensure that many more
sequences of the
target gene are being evaluated than what would otherwise be possible through
purely random
mutagenesis approaches. Careful design of the targeted approach can also
ensure an efficient
exploration of the sequence space, for instance by exploring sequence
variation at specific
residues of interest or by avoiding non-sense mutations. This targeted
mutagenesis has typically
been conducted in vitro through various molecular biology techniques including
error-prone
PCR, or through the rational design and construction of plasmid libraries.
These steps can,
however, be cumbersome, especially when many cycles of evolution are
performed. The ability
to diversify sequences in a targeted manner directly in vivo is a long-
standing goal of directed
evolution and a step towards continuous evolution setups where both
diversification and
selection can happen in vivo.
[0003] Examples of targeted diversity generation exist in nature. Diversity generation in
antibodies is a key feature of the human adaptive immune system. In bacteria,
diversity generating
retroelements (DGRs) are able to introduce controlled sequence diversity in
phage proteins and
bacterial proteins involved in the interaction with their environment. DGRs,
initially
characterized in the Bordetella bacteriophage BPP-1 [1], are found in a wide
range of phage,
bacteria, and archaea [2]. In DGR recombination, a variable region within the
genome will be
overwritten by a DNA fragment produced from a near repeat template region in a
process
involving transcription, error-prone reverse transcription of the template and
recombination. The
error-prone reverse transcription ensures the introduction of genetic
diversity at the variable
region. In the DGR systems characterized to date, two DGR proteins are necessary for this process,
a reverse transcriptase major subunit (RT) and an accessory subunit (Avd) that together form the
active reverse transcriptase complex ([1]; [3]; [4]; [5]; [6]; [7]). An alternative accessory gene
consisting of an HRDC (helicase and RNase D C-terminal) domain was also identified in some
DGRs by bioinformatic analysis [3]. Most variable regions have been identified within a few
kilobase pairs (kb) of the template region and the two DGR proteins ([3]; [2]). The template region
that defines the mutagenesis window is embedded within the Avd and RT coding
sequences, inside
a transcribed RNA segment starting from the end of the AVD gene to the start
of the RT gene,
named Spacer RNA, the DGR RNA or DGR Spacer RNA. A cDNA copy is unfaithfully
generated from the mRNA by the DGR RT complex in a self-priming process [6]. A
specific
bias in the DGR RT incorporates random nucleotides in place of adenines. The
variable region
is then overwritten using this cDNA copy, resulting in the acquisition of A to
N mutations in the
gene. Due to the location of A residues within the sequence, the overall protein structure,
typically a C-type lectin fold, is generally preserved while key residues in the binding groove are
changed ([1]; [8]). In the case of Bordetella, DGR recombination can introduce a diversity of
10^13 unique amino acid sequences. However, the positions of the A nucleotides in the codons
(i.e. exclusively in the first and second positions of the codons) rule out the occurrence of
non-sense mutations ([1]; [8]). A DGR system has already been harnessed to redirect the
mutagenesis towards a target sequence of choice [9]; however, this was achieved only by using
the DGR in its native host, a Bordetella strain, and by retaining the requirement that a recognition
sequence (the IMH sequence) be placed next to the desired mutagenesis window, which
dramatically limits its possible applications as a genetic tool.
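The link between adenine position and the absence of non-sense mutations can be checked by enumeration. The following is a minimal Python sketch, not part of the disclosure: it assumes that every A in a codon can independently become any of the four bases, and the example codons AAC and AAG are chosen here purely for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def adenine_variants(codon):
    """Return every codon reachable by replacing each A with any base."""
    options = [BASES if base == "A" else base for base in codon]
    return {"".join(variant) for variant in product(*options)}


def encoded_residues(codon):
    """Translate all A-to-N variants of a codon ('*' marks a stop codon)."""
    return {CODON_TABLE[variant] for variant in adenine_variants(codon)}


if __name__ == "__main__":
    for codon in ("AAC", "AAG"):
        residues = encoded_residues(codon)
        print(codon, len(residues), "stop possible:", "*" in residues)
```

Under these assumptions a codon such as AAC, with adenines only in the first and second positions and a C in the third, cannot mutate into a stop codon, whereas AAG can become TAG. Adenine placement alone is therefore not sufficient; the fixed third base matters as well.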
[0004] While DGRs have yet to be harnessed in directed evolution setups, a
large number of
artificial targeted mutagenesis strategies have been proposed, and have
multiplied in recent years,
demonstrating a pressing need for improvement in this field ([10]; [11]).
Indeed, the ability to precisely mutagenize a particular segment of coding DNA is the cornerstone
of applications that extend to all subfields of biotechnology, from enzyme engineering and
vaccine development to diagnostics development. As recently reviewed by Csorgo et al. [10],
targeted mutagenesis technologies can be classified according to several parameters, including
mutagenesis rate and span and the conditions in which the library of variant sequences is generated.
[0005] Only a handful of targeted mutagenesis technologies, out of the dozen
that have been
developed to date, allow for in vivo mutagenesis.
[0006] In the EvolvR system, a D10A Cas9 nickase (Cas9n1) is used to localize a fused
error-prone nick-translating DNA polymerase to a desired region of the genome (Halperin et al. 2018).
Cas9n1 nicks one strand, generating a 3' end that can be extended by the fused DNA polymerase
followed by repair [13]. Such re-polymerization results in nucleotide misincorporation and can
cause a peak 10^8-fold increase in the DNA mutation rate immediately upstream of the Cas9 nick
site, to around 1 mutation per 10^2 nucleotides per generation [13]. By altering the fused polymerase,
the EvolvR system can be modulated to alter the mutation rate as well as increase or decrease the
size of the window where mutations preferentially occur. A limitation of EvolvR is its propensity
to introduce nonsense mutations. The presence of the mutagenic polymerase fusion also raises the
overall E. coli mutation rate by between 120-fold and 555-fold, increasing the risk
of selecting mutations outside the region of interest.
[0007] The T7-DIVA system relies on a mutagenic T7 RNA polymerase-Base
Deaminase
fusion (BD-T7RNAP). The mutagenesis window is delineated upstream by the T7
promoter,
and downstream by the targeting with dCas9 to serve as a "roadblock" for BD-
T7RNAP
elongation [14]. The requirement for a T7 promoter means that mutagenesis of
the target
sequence in its native genomic context is not feasible, and the Base Deaminase
mutation profile
being restricted to a single possible nucleotide substitution (for example C >
T) limits its ability
to generate tailored mutagenesis for exploring protein sequence diversity.
[0008] A system developed by Simon et al. relies on engineered retrons
(another bacterial
retroelement, unrelated to DGRs). The mutagenesis activity results from
coupling the retron with
a mutagenic T7 RNA polymerase [15]. Simon et al. obtained mutation rates in the targeted region
190-fold higher than background cellular mutation rates (up to 6.3 x 10^-7 per generation) over a
mutagenesis window restricted to 31 bp (thus covering at most 10 amino acids in a
protein-coding sequence). This limits its ability to generate tailored mutagenesis for exploring
protein sequence diversity.
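The figure of at most 10 amino acids for a 31 bp window follows from integer division by the codon length. A minimal Python sketch of that arithmetic is given below; the function name and the frame_offset parameter are illustrative assumptions, not terms used in the cited work.

```python
def full_codons(window_bp, frame_offset=0):
    """Count complete codons inside a mutagenesis window.

    window_bp: window length in base pairs.
    frame_offset: bases at the start of the window that belong to a codon
    beginning upstream of it (0, 1 or 2).
    """
    if frame_offset not in (0, 1, 2):
        raise ValueError("frame_offset must be 0, 1 or 2")
    return max(window_bp - frame_offset, 0) // 3


if __name__ == "__main__":
    for length in (31, 70, 200):
        print(length, "bp ->", full_codons(length), "complete codons")
```

For a 31 bp window aligned with the reading frame this gives 10 complete codons, which matches the bound stated above.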
[0009] Overall, these methods suffer from a low mutagenesis rate. In addition,
none of the
techniques available to date provide control over the exact position of the
bases that are mutated
nor offer mechanisms to ensure that the mutations introduced will not generate
stop codons.
Accordingly, there exists a great need to develop additional methods, systems,
compositions, and
manufactures for generating sequence diversity and applications of using it.
This invention meets
these and other needs in certain embodiments.
SUMMARY
[0010] This invention provides an in vivo targeted diversity generation
strategy based on the use
of a mutagenic reverse transcriptase, producing mutagenized cDNA oligos
homologous to a
desired target sequence, which are then recombined within a target region
anywhere on the
genome or recombinant vector via oligo recombineering (Figure 1). A functional
implementation
of the strategy in the model laboratory organism E. coli is demonstrated,
enabling various
applications in directed evolution. In certain embodiments, the invention
allows an increase in
the in vivo mutagenic potential of any target in its native genomic context by
several orders of
magnitude, in a more precisely tuned genomic region, all of it encoded from a
compact plasmid-
borne system.
[0011] The approach relies on two critical achievements disclosed herein for the first time: 1)
the expression of a functional plasmid-based mutagenic retroelement platform (or system) in
E. coli (inspired by natural DGRs); and 2) the coupling of this system with oligonucleotide
recombineering, enabling the incorporation of mutations in a target region anywhere on the
genome or recombinant vector (Figure 1). This system is named DGR Recombineering or
DGRec.
[0012] These two combined elements represent a major achievement for directed evolution
applications, as an unprecedented number of protein sequence variants can be produced in vivo,
in a highly targeted manner, from a flexible plasmid-borne system. In certain embodiments,
virtually any 20 to 500 bp DNA sequence from a host genome or recombinant vector can be densely
mutagenized, simply by specifying the mutagenesis target in the DGR Spacer RNA locus. In
some embodiments, a plurality of DGR spacer RNAs is used, which increases the achievable target
size beyond the size limits of a single DGR spacer RNA.
[0013] Moreover, the mutagenesis profile may be highly specific and predictable. When using
a reverse transcriptase from a DGR system, each adenine position may in certain embodiments be
replaced by an A, T, C or G nucleotide with roughly 25% probability each [7]. This predictable
mutagenesis provides flexibility in designing the cDNA template and gives the
option to recode the target gene sequence, placing codons that favor some amino acids over
others.
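The predictability described in this paragraph can be illustrated with a small simulation. The Python sketch below is an illustration only: it assumes an idealized model in which every adenine is replaced independently by a base drawn uniformly from A, C, G and T, and the target sequence used is hypothetical.

```python
import random


def mutagenize_adenines(seq, rng):
    """Replace each A by a base drawn uniformly from A, C, G, T (idealized model)."""
    return "".join(rng.choice("ACGT") if base == "A" else base for base in seq)


def simulate_library(seq, n_variants, seed=0):
    """Generate n_variants independently mutagenized copies of seq."""
    rng = random.Random(seed)
    return [mutagenize_adenines(seq, rng) for _ in range(n_variants)]


if __name__ == "__main__":
    target = "GATTACAGAAC"  # hypothetical target sequence
    library = simulate_library(target, 2000)
    a_positions = [i for i, base in enumerate(target) if base == "A"]
    changed = sum(variant[i] != "A" for variant in library for i in a_positions)
    print("fraction of adenines changed:", changed / (len(library) * len(a_positions)))
```

In this model an adenine is redrawn as A one time in four, so about three quarters of adenine positions carry a substitution in any given variant, while all other positions stay unchanged.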
[0014] Finally, the DGRec system has great potential for transfer to eukaryotic cells.
Another bacterial retroelement (the Ec86 retron) has recently been successfully expressed for
genetic editing applications in different eukaryotic cells, including human cells [18]-[20].
Furthermore, although DNA repair mechanisms differ significantly between eukaryotic and
prokaryotic cells, oligonucleotide recombineering, originally developed only
in bacteria, has also been used successfully in eukaryotic cells [21], suggesting that the DGRec
method should be readily transposable to eukaryotes.
[0015] In a first aspect, the invention provides methods comprising expressing
in a recombinant
cell a recombinant error-prone reverse transcriptase (RT) and recombinant
spacer RNA
comprising a target sequence; making a mutagenized cDNA polynucleotide
homologous to a DNA
sequence in the recombinant cell; expressing a recombinant recombineering
system in the
recombinant cell; and recombining the mutagenized cDNA with the homologous DNA
sequence
in the recombinant cell. In some embodiments of the methods, the recombinant
error-prone reverse
transcriptase (RT) comprises the motif I/LGXXXSQ (SEQ ID NO: 2). In some
embodiments, the
recombinant error-prone RT is an engineered recombinant error-prone RT derived
from a non-
mutagenic reverse-transcriptase; preferably the recombinant error-prone RT is
a mutant Ec86
retron reverse transcriptase comprising the replacement of the motif QGXXXSP
(SEQ ID NO: 1)
with the motif I/LGXXXSQ (SEQ ID NO: 2).
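The two motifs named above can be expressed as simple sequence patterns. The following Python sketch shows one way to detect them and to model the replacement in a protein sequence given as one-letter codes; the function names and the toy sequence are hypothetical, X is read as any single residue, and only the first match is rewritten, which may not reflect how an actual engineered enzyme is designed.

```python
import re

# QGXXXSP (SEQ ID NO: 1): motif of the non-mutagenic reverse transcriptase.
NON_MUTAGENIC_MOTIF = re.compile(r"QG([A-Z]{3})SP")
# I/LGXXXSQ (SEQ ID NO: 2): motif of the error-prone reverse transcriptase.
MUTAGENIC_MOTIF = re.compile(r"[IL]G[A-Z]{3}SQ")


def has_mutagenic_motif(protein):
    """Return True if the sequence contains I/LGXXXSQ."""
    return MUTAGENIC_MOTIF.search(protein) is not None


def swap_motif(protein, first_residue="I"):
    """Rewrite the first QGXXXSP as I/LGXXXSQ, keeping the three X residues."""
    if first_residue not in ("I", "L"):
        raise ValueError("first_residue must be 'I' or 'L'")
    return NON_MUTAGENIC_MOTIF.sub(
        lambda m: f"{first_residue}G{m.group(1)}SQ", protein, count=1
    )


if __name__ == "__main__":
    toy = "MKTQGAAASPLV"  # hypothetical fragment, not a real RT sequence
    print(swap_motif(toy), has_mutagenic_motif(swap_motif(toy)))
```

The sketch only manipulates strings; whether a given substitution actually confers error-prone activity is an experimental question that it does not address.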
[0016] In a second aspect, the invention provides methods comprising
expressing in a
recombinant cell a recombinant DGR reverse transcriptase major subunit (RT),
recombinant DGR
accessory subunit (Avd), and recombinant DGR spacer RNA comprising a target
sequence;
making a mutagenized cDNA polynucleotide homologous to a DNA sequence in the
recombinant
cell; expressing a recombinant recombineering system in the recombinant cell;
and recombining
the mutagenized cDNA with a homologous DNA sequence in the recombinant cell.
In some
embodiments the recombinant DGR RT, recombinant DGR Avd, recombinant DGR
spacer RNA
and recombinant recombineering system are all expressed from one or a
plurality of recombinant
plasmids together comprising coding sequences for the recombinant DGR RT,
recombinant DGR
Avd, recombinant DGR spacer RNA, and recombinant recombineering system. In
some
embodiments the coding sequences for the recombinant DGR RT and recombinant
DGR Avd are
present on the same plasmid. In some embodiments the coding sequence for the
DGR RT is
operatively linked to an inducible promoter. In some embodiments the coding
sequences for the
recombinant DGR Avd and recombinant DGR spacer RNA are operatively linked to
constitutive
promoter(s). In some embodiments the recombinant DGR RT, the recombinant DGR
Avd, and
recombinant DGR spacer RNA are from the Bordetella bacteriophage BPP-1.
[0017] In some embodiments, the recombinant error-prone RT has adenine
mutagenesis activity;
preferably wherein the recombinant error-prone RT is a DGR RT comprising a
mutation that
decreases its error rate at adenine position selected from the group consisting of: R74A and
I181N, the positions being indicated by alignment with SEQ ID NO: 4.
[0018] In some embodiments of the methods the mutagenized target sequence
comprises 70
base pairs. In some embodiments of the methods the mutagenized target sequence
is from 50 to
120 base pairs long. In some embodiments of the methods the mutagenized target
sequence is
from 70 to 100 base pairs long. In some embodiments of the method the
mutagenized target
sequence is from 40 to 200 (40, 50, 70, 100, 120, 150, 175, 200) base pairs
long or more, in
particular 40 to 300 (40, 50, 70, 100, 120, 150, 175, 200, 225, 250, 275 or
300) base pairs long
or more. In some embodiments of the methods, the mutagenized target sequence
comprises less
than 40 base pairs, in particular 30, 20 base pairs or less.
[0019] In some embodiments of the methods the recombinant recombineering
system is
different from DGR retrohoming. In some embodiments of the methods the
recombinant
recombineering system is single-stranded annealing protein mediating oligo
recombineering,
preferably selected from the group consisting of: the phage lambda's Red Beta
protein, the
functional homolog RecT and variants thereof such as PapRecT and CspRecT, in
particular
CspRecT. In some embodiments of the methods the recombination frequency is at
least 0.01%.
[0020] In some embodiments, the adenine content and/or position(s) in the target sequence
and/or homologous DNA sequence in the recombinant cell is modified to modulate
recombination frequency or control sequence diversity.
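How adenine content bounds sequence diversity can be estimated with a short calculation. The Python sketch below is illustrative only: it assumes that each adenine can independently become any of four bases, so that a target with n adenines yields at most 4^n DNA variants, and it does not model the effect of adenine content on recombination frequency. The function names and the example sequence are hypothetical.

```python
def adenine_positions(target):
    """Return the 0-based indices of adenines in the target sequence."""
    return [i for i, base in enumerate(target.upper()) if base == "A"]


def max_dna_variants(target):
    """Upper bound on distinct DNA variants: 4 ** (number of adenines)."""
    return 4 ** len(adenine_positions(target))


def adenines_by_codon_position(target, frame_offset=0):
    """Count adenines at codon positions 1, 2 and 3 for a given reading frame."""
    counts = {1: 0, 2: 0, 3: 0}
    for i in adenine_positions(target):
        if i >= frame_offset:
            counts[(i - frame_offset) % 3 + 1] += 1
    return counts


if __name__ == "__main__":
    target = "AACGCTAAC"  # hypothetical in-frame target
    print(max_dna_variants(target), adenines_by_codon_position(target))
```

Recoding a target with synonymous codons that add or remove adenines changes both numbers, which is one way to read the control over sequence diversity mentioned in this paragraph.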
[0021] In some embodiments of the methods the recombination frequency is 0.1%.
In some
embodiments of the methods the recombination frequency is at least 1%;
preferably 3% or more;
more preferably 10% or more. In some embodiments of the methods the target
sequence is a non-
bacterial sequence. In some embodiments the methods further comprise
expressing the
mutagenized sequence.
[0022] In some embodiments of the methods the recombinant cell is a eukaryotic
cell. In some
embodiments of the methods the recombinant cell is a prokaryotic cell. In some
embodiments of
the methods the prokaryotic cell is a bacterial cell. In some embodiments of
the methods the
bacterial cell expresses mutL* (dominant negative mutL). In some embodiments
of the methods
the bacterial cell is an E. coli cell. In some embodiments of the methods the
K co/i is deleted
for the two exonucleases SbcB and RecJ to increase recombineering efficiency.
[0023] In some embodiments of the methods, the recombinant cell comprises at
least two spacer
RNAs comprising a target sequence; in particular at least two DGR spacer RNAs
comprising a
target sequence; preferably wherein the multiple spacer RNAs target the same
gene in the
recombinant cell.
[0024] Also provided are libraries of mutagenized sequences made according to
a method of
this invention.
[0025] Also provided are libraries of recombinant cells comprising the library
of mutagenized
sequences.
[0026] Also provided are recombinant cells comprising recombinant coding
sequences for a
recombinant error-prone reverse transcriptase (RT) and at least one
recombinant spacer RNA
comprising a target sequence. In some embodiments the cell further comprises
the recombinant
error-prone reverse transcriptase (RT) and recombinant spacer RNA comprising a
target
sequence.
[0027] Also provided are recombinant cells comprising recombinant coding
sequences for a
recombinant DGR RT, recombinant DGR Avd, and at least one recombinant DGR
spacer RNA
comprising a target sequence. In some embodiments, the recombinant cell
comprises one or a
plurality of recombinant plasmids that together comprise the coding sequences
for the
recombinant DGR RT, recombinant DGR Avd, and recombinant DGR spacer RNA
comprising
a target sequence. In some embodiments the recombinant cell further comprises
the recombinant
DGR RT, recombinant DGR Avd, and recombinant DGR spacer RNA comprising a
target
sequence. In some embodiments the coding sequences for the recombinant DGR RT
and
recombinant DGR Avd are present on the same plasmid. In some embodiments the
coding
sequence for the DGR RT is operatively linked to an inducible promoter. In
some embodiments
the coding sequences for the recombinant DGR Avd and recombinant DGR spacer
RNA are
operatively linked to constitutive promoters. In some embodiments the recombinant DGR RT,
the recombinant DGR Avd, and recombinant DGR spacer RNA are from the Bordetella
bacteriophage BPP-1.
[0028] In some embodiments the target sequence comprises 70 base pairs. In
some
embodiments the target sequence is from 50 to 120 base pairs long. In some
embodiments the
target sequence is from 70 to 100 base pairs long. In some embodiments of the
method the target
sequence is from 40 to 200 (40, 50, 70, 100, 120, 150, 175, 200) base pairs
long or more, in
particular 40 to 300 (40, 50, 70, 100, 120, 150, 175, 200, 225, 250, 275 or
300) base pairs long
or more. In some embodiments, the target sequence comprises less than 40 base
pairs, in
particular 30, 20 base pairs or less.
[0029] In some embodiments the recombinant cell further comprises a coding
sequence that
expresses a recombinant recombineering system. In some embodiments the target
sequence is a
non-bacterial sequence. In some embodiments the recombinant cell further
comprises the
expression product of the mutagenized sequence.
[0030] In some embodiments the recombinant cell is a eukaryotic cell. In some
embodiments
the recombinant cell is a prokaryotic cell. In some embodiments the
prokaryotic cell is a bacterial
cell. In some embodiments the bacterial cell expresses mutL* (dominant
negative mutL). In some
embodiments the bacterial cell is an E. coli cell. In some embodiments the E.
coli is deleted for
the two exonucleases SbcB and RecJ to increase recombineering efficiency.
[0031] The invention further provides a kit for generating targeted nucleic
acid diversity,
comprising one or a plurality of recombinant expression plasmids together
comprising coding
sequences for the recombinant error-prone reverse transcriptase (RT) and for
the at least one
recombinant spacer RNA comprising a target sequence, and coding sequence that
expresses a
recombinant recombineering system according to the present disclosure; in
particular comprising
coding sequences for the recombinant DGR RT, recombinant DGR Avd, recombinant
DGR spacer
RNA(s) and recombinant SSAP mediating oligonucleotide recombineering according
to the
present disclosure; preferably comprising the plasmid pRL014 having the
sequence SEQ ID NO:
17.
DETAILED DESCRIPTION
[0032] This disclosure reports the first targeted diversity generation system
based on the use of
a mutagenic reverse transcriptase from a natural Diversity Generating
Retroelements (DGRs)
system. An embodiment of the system is exemplified herein in the model laboratory organism
E. coli, enabling various applications in directed evolution setups. Based on this initial embodiment,
several other embodiments are disclosed. The exemplified embodiment is in no way limiting.
[0033] In certain embodiments the system of the invention comprises any combination of one
or more of the following features:
1) in vivo mutagenesis, so that the library of sequence variants does not need to be created
in vitro (for example through expensive oligonucleotide library synthesis) and does not need
to be transformed into the bacterium, a technical bottleneck that limits the flexibility of the
technique. In certain embodiments, in vivo mutagenesis may be coupled to a selection framework
to enable continuous evolution, which may be a powerful combination for directed evolution.
2) mutagenesis of the target sequence in its native genomic context, which may enable
transferability of the system to various targets of choice, and transferability of the system to
different bacterial taxa.
3) tailored mutagenesis for exploring protein sequence diversity: by incorporating an
error-prone reverse transcriptase from a DGR system, the system gains the ability to selectively
mutate adenines into any nucleotide, which allows dense mutagenesis over small protein
domain-sized windows while maintaining a usefully low rate of nonsense mutations.
Method
[0034] In a first aspect, the invention provides methods of generating
targeted nucleic acid
diversity comprising expressing in a recombinant cell a recombinant error-
prone reverse
transcriptase (RT) and recombinant spacer RNA comprising a target sequence;
making a
mutagenized cDNA polynucleotide homologous to a DNA sequence in the
recombinant cell;
expressing a recombineering system in the recombinant cell; and recombining
the mutagenized
cDNA with the homologous DNA sequence in the recombinant cell.
[0035] The diversity generation system according to the present invention has
a modular
arrangement as the different parts of both the diversity generating module and
the recombineering
module are independent, as shown in the examples. Therefore, they can a priori
be arranged in
several ways to function. The different parts of the diversity generating
module can thus be placed
all on the same recombinant vector(s) such as plasmids, split in different
vectors, placed inside
the host cell chromosome, or placed on vector(s) such as plasmids and inside
the host cell
chromosome. Similarly, the recombineering module can be vector-borne such as
plasmid-borne,
inside the host genome, or mixed. Furthermore, the results obtained in the
model laboratory
organism E. coli presented in the examples show that the diversity generating
module does not
require the host cell environment to function and can thus be used in various
host cells.
[0036] The recombinant error-prone reverse transcriptase (RT) and recombinant
spacer RNA
form a functional enzymatic complex able to use the spacer RNA comprising the
target sequence
as a specific template for mutagenic reverse transcription. The target
sequence called template
region (TR) corresponds to the editable part of the reverse transcribed region
of the spacer RNA.
The recombinant error-prone reverse transcriptase (RT) uses the spacer RNA
comprising the target
sequence as RNA template to carry out the polymerization of the mutagenized
cDNA
polynucleotide homologous to a DNA sequence in the recombinant cell.
[0037] The method according to the invention may use any error-prone reverse
transcriptase
(RT) capable of forming a functional enzymatic complex with the spacer RNA
that is able to use
the spacer RNA comprising the target sequence as a specific template for
mutagenic reverse
transcription in the host cell. The recombinant error-prone reverse
transcriptase (RT) may
comprise the sequence of a natural error-prone reverse transcriptase (RT), or
a variant or fragment
thereof, that is functional in the host cell. Alternatively, the recombinant
error-prone reverse
transcriptase (RT) may be an engineered error-prone reverse transcriptase
(RT), for example
engineered from a non-mutagenic reverse-transcriptase. Most canonical RT have
a conserved
motif QGXXXSP (SEQ ID NO: 1) which directly interacts with the RT template. In
all DGR RT,
this motif is modified to I/LGXXXSQ (SEQ ID NO: 2), that has been linked to
their selective
infidelity at adenine positions (Handa et al., [25]). Non-limiting examples of
error-prone reverse
transcriptase (RT) that may be used to carry out the method of the invention
include: reverse
transcriptase from Diversity Generating retroelements and engineered error-
prone reverse
transcriptase. In some embodiments, the recombinant error-prone reverse
transcriptase (RT)
comprises the motif QGXXXSP or I/LGXXXSQ. In some particular embodiments, the
recombinant error-prone reverse transcriptase (RT) is engineered from a non-
mutagenic reverse-
transcriptase by replacement of the QGXXXSP motif (canonical RT motif) with
the I/LGXXXSQ
motif (canonical DGR RT motif).
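By way of non-limiting illustration, the two motifs discussed above can be written as regular expressions over a protein sequence. This is only a sketch: the function names `classify_rt_motif` and `swap_to_dgr_motif` and the toy sequences in the usage note are hypothetical, and a simple text substitution says nothing about whether the resulting enzyme would be functional.

```python
import re

# QGXXXSP: canonical RT motif; [IL]GXXXSQ: DGR RT motif (X = any residue).
CANONICAL_RT = re.compile(r"QG[A-Z]{3}SP")
DGR_RT = re.compile(r"[IL]G[A-Z]{3}SQ")


def classify_rt_motif(protein: str) -> str:
    """Return 'DGR-like', 'canonical', 'both' or 'none' for a protein sequence."""
    seq = protein.upper()
    has_dgr = DGR_RT.search(seq) is not None
    has_canonical = CANONICAL_RT.search(seq) is not None
    if has_dgr and has_canonical:
        return "both"
    if has_dgr:
        return "DGR-like"
    if has_canonical:
        return "canonical"
    return "none"


def swap_to_dgr_motif(protein: str, first: str = "I") -> str:
    """Rewrite the first QGXXXSP occurrence as I/LGXXXSQ, keeping the X residues."""
    if first not in ("I", "L"):
        raise ValueError("first residue of the DGR motif must be 'I' or 'L'")
    return re.sub(
        r"Q(G[A-Z]{3}S)P",
        lambda m: first + m.group(1) + "Q",
        protein.upper(),
        count=1,
    )
```

For example, `swap_to_dgr_motif("MKQGAAASPTT")` rewrites the toy sequence so that `classify_rt_motif` reports it as DGR-like rather than canonical.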
[0038] In some embodiments, the recombinant error-prone reverse transcriptase
and spacer RNA
are from Diversity-generating retroelement (DGR). Diversity-generating
retroelements (DGRs)
are a unique family of retroelements that generate sequence diversity of DNA
to benefit their hosts
by introducing sequence variations and accelerating the evolution of target
proteins. They exist
widely at least in bacteria, archaea, phages and plasmids. The prototype DGR was
found in Bordetella
phage (BPP-1) and two other DGRs have been characterized in Legionella
pneumophila and
Treponema denticola (Wu et al., [3]). There are more than a thousand distinct
DGR systems that
have been predicted bioinformatically (Paul et al., [2]). The examples of the
present application
show that three components of the DGR are necessary and sufficient to assemble
a functional
diversity generation system, the reverse transcriptase major subunit RT, the
accessory subunit
such as Avd, and the spacer RNA (see Figure 1). These three components have
been identified
in the putative DGR systems indicating that various known DGR systems can be
used in the
method according to the invention. Alternative DGR systems from these various
native DGR
systems could be screened for activity, using methods that are well-known in
the art such as the
mCherry fluorescence assay herein disclosed or similar screening systems that
may be easily
derived from this system. Known methods may be adapted to design a cell-free
expression system
(Garamella et al., [27]).
[0039] The two DGR proteins necessary to generate sequence diversity of DNA,
the reverse
transcriptase major subunit (RT) and accessory subunit such as Avd, together
form the active
mutagenic reverse transcriptase complex. The DGR spacer RNA is capable of
recruiting the
mutagenic reverse transcriptase complex and priming cDNA synthesis upstream
of a modifiable
part called TR (template region) (Handa et al., [6]). The spacer RNA
(secondary and possibly
tertiary) structure formation is important in this process in natural DGR
systems (Handa et al., [6]).
The spacer RNA sequence comprises a modifiable part called TR (template
region) corresponding
to the editable part of the reverse transcribed region, flanked by 5' and 3'
conserved regions, as
illustrated in Figure 4 for BPP-1 DGR spacer RNA. The TR may correspond to all
or part of the
reverse transcribed region. The template region (TR) which can be modified
within a flexible size
range corresponds to the target sequence in recombinant DGR spacer RNAs
according to the
present invention. The 3' region comprises a self-priming hairpin containing
two self-annealing
segments that are necessary to prime the mutagenic RT complex. The starting
point of the cDNA
polymerization corresponds to the A56 ribonucleotide in BPP-1 DGR spacer RNA
and is about 4
nucleotides upstream of the TR region in BPP-1 DGR spacer RNA. This
ribonucleotide is
covalently bound to the cDNA to form a DNA/RNA hybrid comprising a short RNA
tail at the 5'
end of the cDNA (Figure 4). Using BPP-1 DGR spacer RNA coding sequence (DNA
sequence
of SEQ ID NO: 3) as reference sequence, the 5' conserved region is from
positions 1 to 20; the
template region (TR) from positions 21 to 136; and the 3' conserved region is
from position 137
to 158. The indicated positions are determined by alignment with BPP-1 DGR
spacer RNA
reference sequence. One skilled in the art can easily determine the sequence
of another DGR
spacer RNA and positions of the 5', TR and 3' regions in said DGR spacer RNA,
by alignment
with the reference sequence using appropriate software available in the art
such as BLAST,
CLUSTALW and others. In recombinant DGR spacer RNAs, the template region is
replaced with
a target sequence of interest. The target sequence thus corresponds to all or
a subset of the reverse
transcribed region of the DGR spacer RNA (the template region), where it is
operably linked to
the DGR spacer RNA, and in particular to its cDNA polymerization starting
point. In recombinant
DGR spacer RNAs, the template region sequence of the DGR spacer RNA is deleted
and replaced
with a target sequence of interest; usually, the target sequence replaces the
entire template region
sequence. The activity of a recombinant DGR RNA may be assessed using methods
known by
the skilled person such as the mCherry fluorescence assay herein disclosed.
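By way of non-limiting illustration, the replacement of the template region by a target sequence can be sketched as a string operation using the coordinates given above for the BPP-1 reference (5' conserved region 1 to 20, TR 21 to 136, 3' conserved region 137 to 158; 1-based, inclusive). The function names are hypothetical, and the spacer used in the usage note is a placeholder of the right length, not SEQ ID NO: 3.

```python
def split_spacer(spacer: str, tr_start: int = 21, tr_end: int = 136):
    """Split a spacer coding sequence into (5' region, TR, 3' region).

    Coordinates are 1-based and inclusive, as in the text.
    """
    if not (1 <= tr_start <= tr_end <= len(spacer)):
        raise ValueError("TR coordinates fall outside the spacer sequence")
    five_prime = spacer[: tr_start - 1]
    template_region = spacer[tr_start - 1 : tr_end]
    three_prime = spacer[tr_end:]
    return five_prime, template_region, three_prime


def replace_template_region(
    spacer: str, target: str, tr_start: int = 21, tr_end: int = 136
) -> str:
    """Return a recombinant spacer in which the TR is replaced by `target`."""
    five_prime, _, three_prime = split_spacer(spacer, tr_start, tr_end)
    return five_prime + target + three_prime


# Placeholder with the same region lengths as the reference (20 + 116 + 22 = 158).
# It is NOT the BPP-1 sequence; the real sequence would be taken from SEQ ID NO: 3.
placeholder_spacer = "C" * 20 + "T" * 116 + "G" * 22
```

Calling `replace_template_region(placeholder_spacer, target)` keeps the 20-nucleotide 5' region and the 22-nucleotide 3' region and swaps only the middle segment, so the target may be shorter or longer than the original 116-nucleotide TR.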
[0040] DGR RTs are error-prone reverse transcriptases which range in size from
about 300 to
about 500 amino acids and contain RT motifs 1-7, which correspond to the palm
and finger
domain of other polymerases. DGR RTs contain motif 2a, located between
motifs 2 and 3, which
is found among group II introns, non-LTR retroelements and retrons, but not
among other RTs
such as retroviral or telomerase RTs (review in Wu et al., [3]). DGR RTs may
be chosen from the
RVT_1 Pfam family (PF00078) that carry the I/LGXXXXSQ motif in place of the
prototypical
QGXXXSP motif (positions 133-140 of the pfam HMM logo).
[0041] The accessory gene avd encodes an essential 128 aa protein that has a
barrel structure
and forms a homopentamer. The avd genes are very poorly conserved but of
similar length. Avd
protein binds the reverse transcriptase (RT), and association between these
two proteins is
required for mutagenesis. Avd is highly basic and binds to both DNA and RNA in
vitro, but
without detectable sequence specificity. Consistent with a role in nucleic
acid binding, Avd is
highly basic with the average of calculated pI's being 9.5 ± 0.7 (review in Wu
et al., [3]).
[0042] In Bordetella bacteriophage BPP-1, the DGR reverse transcriptase is
encoded by the brt
gene (Gene ID: 2717203) which corresponds to the 987 bp sequence from the
complement of
positions 1756 to 2742 of BPP-1 complete genome sequence (GenBank/NCBI
accession number
NC_005357.1 as accessed on 20 December 2020). BPP-1 DGR reverse transcriptase
(bRT) has
the 328 amino acid sequence GenBank/NCBI accession number NP_958675.1 as
accessed on 20
December 2020 or UniProtKB accession number Q775D8 as accessed on 2 December
2020 (SEQ
ID NO: 4). BPP-1 DGR accessory protein Avd is encoded by the avd gene (Gene
ID: 2717200)
which corresponds to the 387 bp sequence from the complement of positions 3021
to 3407 of BPP-
1 complete genome sequence (GenBank/NCBI accession number NC_005357.1 as
accessed on 20
December 2020). BPP-1 Avd (bAvd) protein has the 128 amino acid sequence
GenBank/NCBI
accession number NP_958676.1 as accessed on 20 December 2020 (SEQ ID NO: 5).
One skilled
in the art can easily determine the sequence of another DGR reverse
transcriptase and accessory
protein such as Avd, by alignment with the reference sequence using
appropriate software
available in the art such as BLAST, CLUSTALW and others.
[0043] The recombinant DGR RT, the recombinant DGR accessory protein such as
Avd, and
recombinant DGR spacer RNA according to the invention may be selected from the
DGR of
Bordetella bacteriophage BPP-1, Legionella pneumophila, Treponema denticola or
their
functional orthologs (Paul et al., [2]; Wu et al., [3]) and functional
variants or fragments thereof.
[0044] Functional orthologs of the Bordetella BPP-1, Legionella or Treponema
DGR are understood to mean
orthologous RT, accessory protein(s) such as Avd or others, and spacer RNA
encoded by orthologous
genes that form a functional enzymatic complex able to use the spacer RNA
as a specific
template for mutagenic reverse transcription.
[0045] Mutagenic reverse transcription on spacer RNA template may be assessed
in assays that
are well-known by the skilled person such as the mCherry fluorescence assay
disclosed in the examples.
Briefly, a reporter E. coli strain (sRL002) comprising a mCherry gene
expression cassette
integrated in its genome is co-transformed with a plasmid for expression of
the tested DGR RT
and Avd proteins derived from pRL014 and a plasmid for expression of the
tested DGR spacer
RNA engineered to target mCherry gene and oligonucleotide recombineering
enzyme CspRecT
derived from pAM011. The DGR RT to be assayed is cloned under the control of
the PhlF
promoter inducible by DAPG, replacing bRT in pRL014. The Avd protein to
be assayed is cloned
under the control of the J23119 promoter, replacing bAvd in pRL014. The DGR
spacer RNA to
be assayed is engineered to target mCherry gene by replacing its TR region
with TR_AM011
(SEQ ID NO: 19; Figure 3). The engineered DGR is then cloned under the control
of the J23119
promoter, replacing the spacer RNA in pAM011. sRL002 co-transformed with a
control plasmid
encoding inactivated RT is used as negative control. 48 h post-induction
of protein expression,
the activity of the DGR system (RT, Avd, Spacer RNA) is measured by the
percentage of non-
fluorescent colonies. Non-fluorescent colonies are not detected in the
negative control showing
the specificity of the assay.
[0046] The use of functional orthologs of the previously characterized DGRs
might improve the
DGRec efficiency in E. coli, and the variety of DGRec variants will render the
technology more
amenable to transfer in other bacterial species or to be adapted in eukaryotic
organisms.
[0047] In some particular embodiments, the recombinant DGR RT, the recombinant
DGR Avd,
and recombinant DGR spacer RNA are from bacteria, archaea, phage or plasmid
selected from the
group consisting of: Legionella or Treponema chromosomal DGR, Bacteroides
Hankyphage
DGR or Bordetella bacteriophage BPP-1; preferably from the Bordetella
bacteriophage BPP-1.
[0048] The recombinant DGR RT, the recombinant DGR accessory protein such as
Avd, and
recombinant DGR spacer RNA according to the invention may be from the same DGR
(e.g. the
same organism) or from different DGRs (e.g. from different organisms). In some
embodiments,
the recombinant DGR accessory protein such as Avd, and recombinant DGR spacer
RNA
according to the invention are from the same DGR; preferably from the
Bordetella bacteriophage
BPP-1.
[0049] In some particular embodiments, the recombinant DGR RT comprises the
canonical
motif I/LGXXXSQ.
[0050] In some particular embodiments, the recombinant DGR RT comprises a
sequence having
at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% identity, or 100 %
identity with
SEQ ID NO: 4; preferably, the sequence comprises the canonical motif I/LGXXXSQ.
[0051] In some particular embodiments, the recombinant DGR accessory subunit,
in particular
recombinant DGR Avd, comprises a sequence having at least 70%, 75%, 80%, 85%,
90%, 95%,
96%, 97%, 98%, 99% identity, or 100 % identity with SEQ ID NO:5.
[0052] As used herein, the term "variant" refers to a polypeptide comprising
an amino acid
sequence having at least 70% sequence identity with the native sequence. The
term "variant"
refers to a functional variant having the activity of the native sequence.
Functional fragments of
the native sequence or variant thereof are also encompassed by the present
disclosure. The activity
of a variant or fragment may be assessed using methods well-known by the
skilled person such
as those disclosed herein. In particular, functional RT variant, accessory
protein(s) variant and
spacer RNA variant form a functional enzymatic complex able to use the spacer
RNA as a specific
template for mutagenic reverse transcription.
[0053] The percent amino acid sequence or nucleotide sequence identity is
defined as the
percent of amino acid residues or nucleotides in a Compared Sequence that are
identical to the
Reference Sequence after aligning the sequences and introducing gaps if
necessary, to achieve
the maximum sequence identity and not considering any conservative
substitutions for amino acid
sequences as part of the sequence identity. Alignment for purposes of
determining percent amino
acid sequence identity can be achieved in various ways known to a person of
skill in the art, for
instance using publicly available computer software such as the GCG (Genetics
Computer Group,
Program Manual for the GCG Package, Version 7, Madison, Wisconsin) pileup
program, or any
of sequence comparison algorithms such as BLAST (Altschul et al., J. Mol.
Biol., 1990, 215, 403-
), FASTA or CLUSTALW. When using such software, the default parameters are
preferably
used.
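By way of non-limiting illustration, the percent identity defined above can be computed from two sequences that have already been aligned (for instance by one of the programs cited). The sketch below assumes that gaps are written as "-" and that the denominator is the number of residues in the Compared Sequence; other conventions exist, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_reference: str, aligned_compared: str) -> float:
    """Percent of residues in the compared sequence identical to the reference.

    Both inputs must come from the same alignment (equal length, '-' for gaps).
    Conservative substitutions are counted as mismatches.
    """
    if len(aligned_reference) != len(aligned_compared):
        raise ValueError("aligned sequences must have the same length")
    compared_residues = sum(1 for c in aligned_compared if c != "-")
    if compared_residues == 0:
        raise ValueError("compared sequence contains no residues")
    identical = sum(
        1
        for r, c in zip(aligned_reference.upper(), aligned_compared.upper())
        if c != "-" and r == c
    )
    return 100.0 * identical / compared_residues
```

A sequence could then be checked against a threshold such as the 70% identity mentioned in the text with `percent_identity(ref, cmp) >= 70.0`.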
[0054] In some embodiments, the term "variant" refers to a polypeptide having
an amino acid
sequence that differs from a native sequence by the substitution, insertion
and/or deletion of less
than 30, 25, 20, 15, 10 or 5 amino acids. In a preferred embodiment, the
variant differs from the
native sequence by one or more conservative substitutions, preferably by less
than 15, 10 or 5
conservative substitutions. Examples of conservative substitutions are within
the groups of basic
amino acids (arginine, lysine and histidine), acidic amino acids (glutamic
acid and aspartic acid),
polar amino acids (glutamine and asparagine), hydrophobic amino acids
(methionine, leucine,
isoleucine and valine), aromatic amino acids (phenylalanine, tryptophan and
tyrosine), and small
amino acids (glycine, alanine, serine and threonine).
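By way of non-limiting illustration, the conservative-substitution groups listed above can be encoded as sets and used to count differences between a native sequence and a variant of the same length. The sketch handles substitutions only (no insertions or deletions), and the names `is_conservative` and `count_substitutions` are hypothetical.

```python
# Groups taken from the text; one-letter amino acid codes.
CONSERVATIVE_GROUPS = (
    frozenset("RKH"),   # basic
    frozenset("ED"),    # acidic
    frozenset("QN"),    # polar
    frozenset("MLIV"),  # hydrophobic
    frozenset("FWY"),   # aromatic
    frozenset("GAST"),  # small
)


def is_conservative(a: str, b: str) -> bool:
    """True if two different residues belong to the same group."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def count_substitutions(native: str, variant: str) -> tuple[int, int]:
    """Return (conservative, non_conservative) substitution counts."""
    if len(native) != len(variant):
        raise ValueError("this sketch only compares sequences of equal length")
    conservative = non_conservative = 0
    for a, b in zip(native.upper(), variant.upper()):
        if a == b:
            continue
        if is_conservative(a, b):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative
```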
[0055] In some embodiments, the recombinant error-prone RT is an engineered
recombinant
error-prone RT derived from a non-mutagenic reverse-transcriptase such as the
Ec86 retron
reverse transcriptase. In some preferred embodiments, the recombinant error-
prone RT is a mutant
Ec86 retron reverse transcriptase substituted to carry the motif I/LGXXXSQ
replacing the
prototypical QGXXXSP motif. This conserved motif is present in DGR Reverse
Transcriptase
and has been linked to their selective infidelity at adenine positions (Handa
et al., [25]).
[0056] In some embodiments, the recombinant error-prone RT, in particular
recombinant DGR
RT, has adenine mutagenesis activity. This means that the mutagenesis will
happen randomly at
adenine positions. Assuming a 25% chance of incorporating each of the four
nucleotides at
adenine (A) positions gives a convenient model to predict the variants and
library size. However,
the actual RT errors can deviate from this rule [25]: they can vary from one A
position to another,
and errors can also happen at much lower frequencies at non-A nucleotides.
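By way of non-limiting illustration, the simplified 25% model can be turned into a small calculation: with n adenines in the target sequence, the model predicts 4^n possible nucleotide variants and an expected 0.75 × n substituted positions per cDNA. The sketch below counts adenines on the sequence as supplied, ignores the position-dependent deviations and non-A errors mentioned above, and uses hypothetical function names.

```python
from itertools import product


def count_adenines(target: str) -> int:
    """Number of adenines in the target sequence as written."""
    return target.upper().count("A")


def library_size(target: str) -> int:
    """Number of distinct variants under the uniform 25% model: 4 ** n_A."""
    return 4 ** count_adenines(target)


def expected_substitutions(target: str) -> float:
    """Expected number of changed positions: each A changes with probability 0.75."""
    return 0.75 * count_adenines(target)


def enumerate_variants(target: str):
    """Yield every variant in which each A is replaced by A, C, G or T.

    Only practical for a handful of adenines, since the count grows as 4 ** n_A.
    """
    choices = ["ACGT" if base == "A" else base for base in target.upper()]
    for combo in product(*choices):
        yield "".join(combo)
```

For example, a target with 10 adenines gives `4 ** 10 = 1048576` possible variants under this model, which can be compared with the number of cells screened.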
[0057] In some particular embodiments, the recombinant error-prone RT, in
particular
recombinant DGR RT, comprises a mutation that modulates (increases or
decreases) its error rate.
In some preferred embodiments, the recombinant DGR RT comprises a mutation
that decreases
its error rate at adenine positions, selected from the group consisting of: R74A
and I181N, the
positions being indicated by alignment with SEQ ID NO: 4. Such variants are
disclosed in Handa
et al., [25]. In some more preferred embodiments, the recombinant DGR RT
comprising the R74A
mutation is encoded by the sequence SEQ ID NO: 9; and/or the recombinant DGR
RT comprising
the I181N mutation is encoded by the sequence SEQ ID NO: 10.
[0058] The method according to the invention uses a recombineering system
which is different
from the natural DGR recombination system ("retrohoming"). The recombineering
system is a
recombinant system comprising or consisting of a recombinant recombineering
enzyme. The
method according to the invention may use any single-stranded oligonucleotide-
based
recombineering method that is well-known in the art (Wannier et al., 2021
[26]).
Recombineering is in vivo homologous recombination-mediated genetic
engineering. This
process allows the incorporation of genetic DNA alterations to any DNA
sequence, either in the
chromosome or cloned onto a vector that replicates in E. coli or other
recombineering-proficient
cell. Recombineering with single-strand DNA can be used to create single or
multiple clustered
point mutations, small or large deletions and small insertions.
Oligonucleotide recombineering
relies on the annealing of synthetic single-stranded oligonucleotides to the
lagging strands at open
replication forks onto targeted DNA loci (Csörgő et al., [10]).
Oligonucleotide recombineering
requires specific single-stranded DNA annealing proteins (SSAP) such as those
derived from the
Red/ET recombination system, a powerful homologous recombination system based
on the Red
operon of lambda phage or RecE/RecT from Rac prophage. Single-stranded DNA
annealing
proteins include in particular, the phage lambda's Red Beta protein for E.
coli, the functional
homolog RecT and variants thereof such as PapRecT and CspRecT, as well as
similar systems
(Wannier et al., PNAS, 2020, 117, 13689-13698 [40]). CspRecT protein has the
270 amino acid
sequence GenBank/NCBI accession number WP_00672078.2 as accessed on 01 June
2019 (SEQ
ID NO: 6).
[0059] In some preferred embodiments, the cell, error-prone RT such as DGR RT,
spacer RNA
such as DGR spacer RNA and recombineering system are not from the same
organism, which
means that they are never found together in nature. The error-prone RT such as
DGR RT, and
spacer RNA such as DGR spacer RNA may be from the same organism or a different
organism;
preferably the DGR RT and DGR spacer RNA are from the same organism. In some
preferred
embodiments, the recombineering system is heterologous to the error-prone RT
and spacer RNA,
which means that the recombineering system originates from a different
organism than the error-
prone RT and spacer RNA. In some preferred embodiments, the cell is
heterologous to the error-
prone RT and spacer RNA, which means that the cell originates from a different
organism than
the error-prone RT and spacer RNA. In some preferred embodiments, the
recombineering system
is also heterologous to the cell and the error-prone RT and spacer, which
means that the cell
originates from a different organism than the error-prone RT and spacer RNA
and also the
recombineering system.
[0060] In some embodiments of the method, the recombineering system or enzyme
is a
recombinant single-stranded annealing protein (SSAP) mediating
oligonucleotide
recombineering selected from the group consisting of: the phage lambda's Red
Beta protein, the
functional homolog RecT or RecT and variants thereof such as PapRecT and
CspRecT;
preferably CspRecT.
[0061] In some embodiments, the recombinant single-stranded annealing protein
(SSAP)
mediating oligonucleotide recombineering comprises a sequence having at least
70%, 75%, 80%,
85%, 90%, 95%, 96%, 97%, 98%, 99% identity, or 100 % identity with SEQ ID NO:
6.
[0062] The error-prone RT such as DGR RT uses the spacer RNA comprising the
target
sequence as template to generate a mutagenized target sequence in the form of
a cDNA
polynucleotide homologous to a DNA sequence in the recombinant cell. The
recombineering
system that is expressed in the recombinant cell will then recombine the
mutagenized cDNA
polynucleotide with the homologous DNA sequence in the recombinant cell to
generate a DNA
sequence variant comprising the mutagenized target sequence (mutagenized DNA
sequence).
The homologous DNA sequence in the recombinant cell is named mutagenesis
target,
mutagenesis window, variable region, target gene region, targeted region or
targeted sequence.
The target sequence in the spacer RNA defines the mutagenesis window on the
genome or
recombinant vector in the recombinant cell. The target sequence does not have
to be identical to
the mutagenesis window but can have several mismatches compared to the
targeted sequence.
As explained just below, the target sequence may comprise a recoded version or
mutated version
of the mutagenesis window to allow more flexibility in the mutagenesis of the
targeted sequence.
The reverse transcribed region must contain homologies to the targeted region
on the genome
or recombinant vector that will enable recombination of the cDNA.
Homology to the targeted
region can occur throughout the cDNA, or only in part of the cDNA. Several
discontiguous
homology regions might exist in the cDNA. The non-homologous region present in
between
two homology regions will then replace the corresponding sequence in the
targeted region after
recombination.
[0063] The target sequence may be any nucleic acid sequence of interest
for mutagenesis or
diversification using the method of the invention, including coding and non-
coding sequences.
The target sequence and mutagenized target sequence are usually from 20 to 500
bases/base
pairs. In some embodiments of the methods the target sequence and/or
mutagenized target
sequence comprises 70 base pairs. In some embodiments of the method the target
sequence
and/or mutagenized target sequence is from 50 to 120 base pairs long. In
some embodiments of
the methods the target sequence and/or mutagenized target sequence is from 70
to 100 base pairs
long. In some embodiments of the method the target sequence and/or mutagenized
target
sequence is from 40 to 200 (40, 50, 70, 100, 120, 150, 175, 200) base pairs or
more, in particular
40 to 300 (40, 50, 70, 100, 120, 150, 175, 200, 225, 250, 275 or 300) base
pairs long or more. In
some embodiments of the method the target sequence and/or mutagenized
target sequence
comprises less than 40 base pairs, in particular 30, 20 base pairs or less.
[0064] The mutagenized target sequence and mutagenesis target share a
sufficient amount of
sequence identity to allow homologous recombination to occur between them.
The minimum lengths
of sequence homology required for in vivo recombination are well-known in the
art (see in
particular Wannier et al., 2021 [26]; Thomason, Curr. Protoc. Mol. Biol.,
2014, 106:1.16.1-39). Homology to the targeted region can occur throughout
the cDNA, or only in part of
the cDNA. Several discontiguous homology regions might exist in the cDNA. The
non-
homologous region present in between two homology regions will then replace
the
corresponding sequence in the targeted region on the genome or recombinant
vector after
recombination.
[0065] In some embodiments, the adenine content (percentage) and/or
position(s) in the target
sequence (TR region) and/or homologous DNA sequence (mutagenesis target or
targeted
sequence) in the recombinant cell is modified to modulate recombination
frequency or control
sequence diversity. In some preferred embodiments, the target sequence
contains no more than
16% of adenines.
[0066] Recombineering efficiency decreases with the number of mismatches
between the
ssDNA and the targeted sequence. As a consequence of these constraints, it may
be desirable to
maximize the identity between the cDNA produced by the RT and the targeted
sequence. This
can be done by minimizing the number of adenines in the target sequence (TR
region). It is also
possible to recode the target gene region in order to minimize the number of
adenines in the
targeted sequence, thereby also making it possible to reduce the number of adenines in
the TR region. As
an example, a target sequence (TR region) containing 16% of adenines has been
used with
success. Importantly, recoding the target gene region also offers the benefit
of giving more
flexibility in the design of the TR to choose the positions that will be
mutagenized by
strategically selecting codons containing more adenines at those positions
(thanks to codon
redundancy). Finally, the TR design provides another layer of flexibility and
control in the
mutagenesis profile, when adding mismatches between the TR sequence and its
target sequence.
A TR mismatch can 'force' the incorporation of a given nucleotide other than
an adenine (thus
forcing a given amino acid in a library of protein variants), or the mismatch
can 'force' higher
variability at this position by the addition of adenines.
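By way of non-limiting illustration, the adenine content of a candidate target sequence and a naive synonymous recoding that minimizes adenines can be sketched as follows. The sketch assumes the standard genetic code, counts adenines on the coding sequence as supplied, and ignores codon usage and any other design constraint; the function names and the 0.16 default (taken from the 16% figure above) are illustrative choices only.

```python
from itertools import product

# Standard genetic code in the usual compact TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def adenine_fraction(seq: str) -> float:
    """Fraction of positions that are adenine."""
    if not seq:
        raise ValueError("empty sequence")
    return seq.upper().count("A") / len(seq)


def within_adenine_budget(seq: str, max_fraction: float = 0.16) -> bool:
    """True if the adenine fraction does not exceed max_fraction."""
    return adenine_fraction(seq) <= max_fraction


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence with the standard code."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def recode_min_adenine(cds: str) -> str:
    """Replace each codon by a synonymous codon with the fewest adenines.

    The original codon is kept whenever it is already minimal.
    """
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        recoded.append(min(synonyms, key=lambda c: (c.count("A"), c != codon)))
    return "".join(recoded)
```

The same codon table could be used in the opposite direction, choosing adenine-rich synonymous codons at positions where more variability is wanted.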
[0067] In some embodiments, the target sequence orientation is designed to
optimize
recombination efficiency. Maximum recombineering efficiency is achieved when
oligos anneal
to the lagging strand during DNA replication, which can be identified for a
given gene according
to its position and orientation in the chromosome relative to its origin of
replication and terminus
(a process detailed in Wannier et al., [26]). Therefore, recombineering
efficiency may be
improved by designing target sequence orientation appropriately. If a doubt
remains concerning
the lagging strand of a genetic element (for example, phages or plasmids), it
is always possible
to design both TR orientations to ensure one will be annealing to the lagging
strand of the
targeted sequence.
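By way of non-limiting illustration, designing both TR orientations amounts to preparing the target sequence and its reverse complement. The sketch below handles plain A/C/G/T sequences only, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G and T."""
    return seq.translate(_COMPLEMENT)[::-1]


def both_tr_orientations(target: str) -> dict[str, str]:
    """Return the target in both orientations for use as alternative TRs."""
    return {"forward": target, "reverse": reverse_complement(target)}
```

Because an A on one strand corresponds to a T on the other, the two orientations generally contain different numbers of adenines, so the adenine content would need to be checked separately for each design.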
[0068] In some embodiments of the method, the recombination frequency is at
least 0.01%. In
some embodiments of the methods the recombination frequency is 0.1%. In some
embodiments
of the method, the recombination frequency is at least 1%; preferably 3% or
more; more
preferably 10% or more.
[0069] In some embodiments of the method, the target sequence is a non-
bacterial sequence.
[0070] In some embodiments of the method, the recombinant cell comprises at
least two spacer
RNAs comprising a target sequence; in particular at least two DGR spacer RNAs
comprising a
target sequence. In some preferred embodiments, the multiple spacer RNAs
target the same gene
in the recombinant cell.
[0071] As used herein, "expressing" a recombinant protein or RNA in a
recombinant cell (host
cell) refers to the process resulting from the introduction of the recombinant
protein or RNA in the
cell; the introduction of a nucleic acid molecule encoding said protein or RNA
in expressible form
or a combination thereof.
[0072] In some embodiments of the method, the recombinant cell comprises
coding sequences
for the recombinant error-prone reverse transcriptase (RT), the recombinant
spacer RNA(s)
comprising a target sequence, and the recombineering system; in particular the
recombinant cell
comprises coding sequences for the recombinant DGR reverse transcriptase major
subunit (RT),
the recombinant DGR accessory subunit (Avd), the recombinant DGR spacer RNA(s)
comprising
a target sequence and the recombineering system.
[0073] In some particular embodiments, at least one of the coding sequences
for the recombinant
error-prone reverse transcriptase (RT), in particular the recombinant DGR
reverse transcriptase
major subunit (RT), the recombinant DGR accessory subunit (Avd) and the
recombineering system,
such as the recombinant SSAP, in particular CspRecT, are codon optimized for
expression in the
host cell. Codon optimization is used to improve protein expression level in
living organism by
increasing translational efficiency of target gene. Appropriate methods and
software for codon
optimization in the desired host are well-known in the art and publicly
available (see for
example the GeneOptimizer software suite in Raab et al., Systems and Synthetic
Biology, 2010,
4, (3), 215-225). Codon optimization of a nucleic acid construct sequence
relates to the (protein)
coding sequences but not to the other (non-coding) sequences of the nucleic
acid construct.
[0074] In some preferred embodiments, the coding sequence according to the
present disclosure
is codon optimized for expression in E. coli.
[0075] In some particular embodiments, the coding sequence for the recombinant
DGR reverse
transcriptase major subunit (RT) has at least 80%, 85%, 90%, 95%, 96%, 97%,
98%, 99% identity,
or 100 % identity with any one of SEQ ID NO: 7, 9 or 10. In some particular
embodiments, the
coding sequence for the recombinant DGR accessory subunit (Avd) has at least
80%, 85%, 90%,
95%, 96%, 97%, 98%, 99% identity, or 100 % identity with SEQ ID NO: 11. In
some particular
embodiments, the coding sequence for the recombinant CspRecT has at least 80%,
85%, 90%,
95%, 96%, 97%, 98%, 99% identity, or 100 % identity with SEQ ID NO: 14.
[0076] The coding sequences according to the present disclosure are
expressible in the
recombinant cell (host cell or host). In some embodiments, the coding sequence
is operably linked
to appropriate regulatory sequence(s) for its expression in the recombinant
cell (host cell). Such
sequences, which are well known in the art, include in particular a promoter
and further regulatory
sequences capable of further controlling the expression of a transgene, such
as, without limitation,
an enhancer or activator, a terminator, a Kozak sequence and intron (in eukaryotes),
or a ribosome-binding
site (RBS) (in prokaryotes).
[0077] In some particular embodiments, the coding sequence is operably linked
to a promoter.
The promoter may be a ubiquitous, constitutive or inducible promoter that is
functional in the
recombinant cell. Non-limiting examples of promoters suitable for expression
in E. coli include:
inducible promoters such as PhlF (inducible by DAPG), Pm (inducible by XylS),
Ptet (inducible
by Atc), Pbad (inducible by arabinose) and constitutive promoters such as
J23119 (strong
constitutive promoter), Pr (strong constitutive promoter from the Lambda
phage). In some
preferred embodiments, the coding sequence for the recombinant DGR RT is
operatively linked
to an inducible promoter, in particular the PhlF promoter comprising the sequence
SEQ ID NO: 13.
In some preferred embodiments the coding sequences for the recombinant DGR Avd
and
recombinant DGR spacer RNA(s) are operatively linked to constitutive
promoter(s). Polycistronic
expression systems that are well-known in the art may be used to drive the
expression of several
DGR spacer RNAs from the same promoter. In some preferred embodiments, the
coding sequence
for the recombinant SSAP, in particular CspRecT, is operably linked to an
inducible promoter, in
particular the Pm promoter/XylS activator. In some preferred embodiments, the
coding sequence is
further operably linked to a ribosome binding site.
[0078] The nucleic acid comprising the coding sequence according to the
present disclosure
may be recombinant, synthetic or semi-synthetic nucleic acid which is
expressible in the
recombinant cell. The nucleic acid may be DNA, RNA, or a mixed molecule, which
may further be
modified and/or included in any suitable expression vector. As used
herein, the terms
"vector" and "expression vector" mean the vehicle by which a DNA or RNA
sequence (e.g. a
foreign gene) can be introduced into and maintained in a host cell, so as to
transform the host and
promote expression (e.g. transcription and translation) of the introduced
sequence. The
recombinant vector can be a vector for eukaryotic or prokaryotic expression,
such as a plasmid,
a phage for bacterium introduction, a YAC able to transform yeast, a
transposon, a mini-circle, a
viral vector, or any other expression vector. The vector may be a replicating
vector such as a
replicating plasmid. The replicating vector such as replicating plasmid may be
a low-copy or
high-copy number vector or plasmid.
[0079] In some embodiments, the coding sequence is DNA that is integrated into
the
recombinant cell genome or inserted in an expression vector. In some
particular embodiments,
the expression vector is a prokaryote expression vector such as plasmid,
phage, or transposon.
[0080] The diversity generation system has a modular arrangement as the
different parts of both
the diversity generating module and the recombineering module are independent,
as shown in the
examples. The different parts of the diversity generating and recombineering
modules can thus
be placed all on the same recombinant vector(s) such as plasmids, split in
different vectors, placed
inside the host cell chromosome, or placed on vector(s) such as plasmids and
inside the host cell
chromosome. Similarly, the recombineering module can be vector-borne such as
plasmid-borne,
encoded within the host genome, or mixed.
[0081] In some embodiments, the recombinant DGR RT, recombinant DGR Avd, and
recombinant DGR spacer RNA(s) are all expressed from one or a plurality of
recombinant
plasmids together comprising coding sequences for the recombinant DGR RT,
recombinant DGR
Avd, and recombinant DGR spacer RNA(s) (DGRec system plasmid(s)). In some
embodiments,
the coding sequence for the recombinant recombineering system, in particular
recombinant single-
stranded annealing protein (SSAP) mediating oligonucleotide recombineering,
more particularly
CspRecT is on a plasmid. In some particular embodiments, the recombinant DGR
RT,
recombinant DGR Avd, recombinant DGR spacer RNA(s), and recombinant
recombineering
system, in particular recombinant SSAP mediating oligonucleotide
recombineering are all
expressed from one or a plurality of recombinant plasmids together comprising
coding sequences
for the recombinant DGR RT, recombinant DGR Avd, recombinant DGR spacer RNA(s)
and
recombinant recombineering system, in particular recombinant SSAP mediating
oligonucleotide
recombineering (DGRec system plasmid(s)).
[0082] In some embodiments, the coding sequences for the recombinant DGR RT
and
recombinant DGR Avd are present on the same plasmid. In some preferred
embodiments, the
plasmid is pRL014 (Figure 2) or pRL038 (Figure 5). pRL014 has the sequence SEQ
ID NO: 17.
In some embodiments, the coding sequences for the recombinant DGR RT,
recombinant DGR
Avd and recombinant DGR spacer RNA are present on the same plasmid. In
some preferred
embodiments, the plasmid is pRL038 (Figure 5). pRL038 has the sequence SEQ ID
NO: 20.
[0083] In some embodiments, the coding sequences for the recombinant
recombineering system,
in particular recombinant single-stranded annealing protein (SSAP) mediating
oligonucleotide
recombineering, more particularly CspRecT, and recombinant DGR spacer RNA are
present on
the same plasmid.
[0084] In some embodiments, the method comprises the step of cloning the
target sequence into
a plasmid comprising an engineered DGR spacer RNA comprising a cloning
cassette in
replacement of the template region (TR), preferably operably linked to a
constitutive promoter. In
some particular embodiments, the cloning cassette comprises a CcdB gene
flanked by copies of
the same type IIS restriction site in convergent orientation, forming
non-identical single-stranded
overhangs (sticky ends), and the target sequence is cloned into the plasmid
using a synthetic
double-stranded oligonucleotide comprising the target sequence flanked by
copies of the same type
IIS restriction site in divergent orientation, or double-stranded nucleotides
with 4 bases of
single-stranded overhangs (sticky ends) matching the recipient vector type IIS
restriction site overhangs.
In some particular embodiments, a first type of plasmid further comprises the
coding sequence for
the recombinant recombineering system, in particular recombinant single-
stranded annealing
protein (SSAP) mediating oligonucleotide recombineering, more particularly
CspRecT;
preferably operably linked to an inducible promoter. In some preferred
embodiments, the plasmid
is pRL021 (Figure 5). pRL021 has the sequence SEQ ID NO: 18. In some preferred
embodiments,
a second type of plasmid further comprises the coding sequence for the
recombinant DGR RT and
recombinant DGR Avd. In some more preferred embodiments, the plasmid is pRL038
(Figure
5). pRL038 has the sequence SEQ ID NO: 20. In some particular embodiments, the
plasmid
comprises at least two cloning cassettes flanked by different type IIS
restriction sites. This allows
the cloning of different targets into the same plasmid. In some preferred
embodiments, the method
uses a first type and a second type of plasmid as defined above. This allows
the mutagenesis of
multiple targets simultaneously using only two plasmids for the cloning of the
targets and
expression of the DGRec.
[0085] There is complete freedom in the placement of the mutagenesis target,
broadening the
application possibilities of DGRec mutagenesis. Notably, the target can be
anywhere in the host
chromosome; it can be on a resident plasmid (for example, it can be added onto
one of the DGRec
system plasmids); it can be placed on a mobile genetic element to
be transferred or
received by the host; or it can be inside a phage genome that will serve to
infect the host cell. Of
note, if the target is in a high copy number within the host cell (for
example, on a high-copy
plasmid), not all targets will be mutagenized simultaneously. To observe the
effect of a single
variant of the target gene, cells will need to be grown until they segregate
the plasmids carrying
the distinct variants. On the other hand, a higher copy number of the target
genes might favor
more numerous DGR mutagenesis events, increasing the variant library size
faster than with a
single-copy target gene per cell. Multiple copies of a targeted sequence can
also be placed in
different locations inside the chromosome, or as repeated sequences inside a
single gene to
mutagenize in both positions in parallel. The target can be mutagenized during
the lysogenic cycle
or lytic cycle of a phage.
[0086] In some embodiments, the targeted sequence (mutagenesis target) is in
the cell genome
or on a mobile genetic element such as a plasmid, transposon or a phage. The
mobile genetic
element replicates in the recombinant cell. In some particular embodiments,
the mutagenesis
target is in the cell genome, on one of the DGRec plasmids or inside a phage
genome of a
recombinant phage that infects the recombinant cell.
[0087] In some embodiments of the methods the recombinant cell is a eukaryotic
cell. In some
embodiments of the methods the recombinant cell is a prokaryotic cell.
Prokaryotic cells are in
particular bacteria. Eukaryotic cells include yeast, insect and mammalian
cells. In some
embodiments of the methods the prokaryotic cell is a bacterial cell. In some
embodiments of the
methods the bacterial cell is an E. coli cell. The recombinant
error-prone RT, in
particular recombinant DGR RT, and recombinant recombineering system may be
chosen so as
to achieve optimal efficiency in the recombinant cell. For example, PapRecT
might be chosen
to implement DGRec in Pseudomonas aeruginosa.
[0088] To increase recombineering efficiency, it may be advantageous to shut
off some
endogenous DNA repair genes in the host, in particular mutL/S, sbcB, and/or
recJ in bacteria.
In some embodiments of the method, at least one of the DNA repair genes is
inactivated in the
recombinant cell. In some particular embodiments, at least one of mutL/S,
sbcB, and recJ is
inactivated. The DNA repair gene may be inactivated by standard methods that
are known in
the art such as deletion of the gene or expression of a dominant negative
mutant of the gene. In
some embodiments of the methods, the E. coli is deleted for the two
exonucleases SbcB and
RecJ to increase recombineering efficiency. In some embodiments of the
methods, the bacterial
cell expresses mutL* (dominant negative mutL), in particular mutL* is encoded
by a nucleotide
sequence comprising the sequence SEQ ID NO: 15.
[0089] In some embodiments the methods further comprise expressing the
mutagenized DNA
sequence.
[0090] Because of its adenine randomization mechanism, this technique produces
libraries of
variants that vary by several orders of magnitude depending on the number of
adenines and their
placement in the coding sequence. For a TR sequence containing 7 adenines, the
potential library
size reaches 4⁷ (≈ 10⁴) DNA sequence variants. For a TR sequence containing 16
adenines, it
reaches 4¹⁶ (≈ 10⁹) DNA sequence variants. In terms of protein sequence
variants, library sizes
vary even more broadly, depending on the strategic placement of adenines
within codons. For
example, the different TRs designed against sacB disclosed in the examples are
able to generate
library sizes ranging from 10⁹ to 10¹⁵ potential protein sequence variants.
However, there is still
potential for improvement as the naturally occurring DGR system in Bordetella
phage can
potentially generate 10¹³ protein sequence variants, while another DGR system
in Treponema
can potentially generate 10²⁰ protein sequence variants.
Library, cell, vector, system, kit
[0091] Also provided are libraries of mutagenized sequences made according to
a method of
this invention.
[0092] In some embodiments, a library of distinct TR sequences is made of
sheared DNA
fragments, for example using sonication. The fragments are repaired, tailed,
and cloned into a
custom vector for TR cloning such as pRL021 or pRL038. The creation of DGRec
TR libraries
- using, for example, a TR library made of sheared DNA fragments - allows a
broader
mutagenesis approach that can span entire biosynthetic gene clusters, as each
individual DGRec
system inside cells will be mutagenizing a different portion of the DNA region
that was sheared
in the first place. A similar approach was used for the Ec86 bacterial
retroelement (Schubert et
al., bioRxiv 2020, [23]).
[0093] Also provided are libraries of recombinant cells comprising the library
of mutagenized
sequences.
[0094] Also provided are recombinant cells comprising recombinant coding
sequences for a
recombinant error-prone reverse transcriptase (RT) and at least one
recombinant spacer RNA
comprising a target sequence according to the present disclosure. In some
embodiments the cell
further comprises the recombinant error-prone reverse transcriptase (RT) and
at least one
recombinant spacer RNA comprising a target sequence.
[0095] In some embodiments, the recombinant cell comprises recombinant coding
sequences
for a recombinant DGR RT, recombinant DGR Avd, and at least one recombinant
DGR spacer
RNA comprising a target sequence according to the present disclosure. In some
particular
embodiments, the cell comprises one or a plurality of recombinant plasmids
that together
comprise the coding sequences for the recombinant DGR RT, recombinant DGR Avd,
and
recombinant DGR spacer RNA comprising a target sequence. In some particular
embodiments,
the cell further comprises the recombinant DGR RT, recombinant DGR Avd, and
recombinant
DGR spacer RNA comprising a target sequence. In some preferred embodiments,
the coding
sequence for the DGR RT is operatively linked to an inducible promoter. In
some preferred
embodiments, the coding sequences for the recombinant DGR Avd and recombinant
DGR spacer
RNA are operatively linked to constitutive promoters. In some preferred
embodiments, the
recombinant DGR RT, the recombinant DGR Avd, and recombinant DGR spacer RNA
are from
the Bordetella bacteriophage BPP-1. In some preferred embodiments, the coding
sequences for
the recombinant DGR RT and recombinant DGR Avd are present on the same
plasmid, in
particular pRL014.
[0096] In some preferred embodiments, the cell further comprises a coding
sequence that
expresses a recombinant recombineering system such as a recombinant single-
stranded annealing
protein (SSAP) mediating oligonucleotide recombineering, in particular
recombinant CspRecT
according to the present disclosure. In some particular embodiments, the
coding sequences for
the recombinant single-stranded annealing protein (SSAP) mediating
oligonucleotide
recombineering, in particular recombinant CspRecT, and DGR spacer RNA
comprising a target
sequence are present on the same plasmid. In some preferred embodiments the
cell further
comprises the recombinant single-stranded annealing protein (SSAP) mediating
oligonucleotide
recombineering, in particular the recombinant CspRecT according to the present
disclosure. In
some preferred embodiments, the cell comprises the plasmid pRL021.
[0097] In some embodiments, the recombinant cell is a eukaryotic cell. In some
embodiments,
the recombinant cell is a prokaryotic cell. In some particular embodiments,
the prokaryotic cell
is a bacterial cell. In some particular embodiments, the bacterial cell
is an E. coli cell. In some
embodiments the bacterial cell expresses mutL* (dominant negative mutL), in
particular mutL*
comprising the sequence SEQ ID NO: 15. In some embodiments the E. coli is
deleted for the two
exonucleases SbcB and RecJ to increase recombineering efficiency.
[0098] In some embodiments of the recombinant cell, the target sequence
comprises 70 base
pairs. In some embodiments of the recombinant cell, the target sequence
is from 50 to 120 base
pairs long. In some embodiments of the recombinant cell, the target sequence
is from 70 to 100
base pairs long. In some embodiments of the recombinant cell, the target
sequence is from 50 to
200 (50, 75, 100, 125, 150, 175, 200) base pairs long or more, for example 50
to 300 (50, 100,
125, 150, 175, 200, 225, 250, 275 or 300) base pairs long or more. In some
embodiments of the
recombinant cell, the target sequence comprises less than 50 base pairs,
in particular 40, 30, 20
base pairs or less.
[0099] In some embodiments of the recombinant cell, the target sequence is a
non-bacterial
sequence.
[0100] In some embodiments the recombinant cell further comprises the
expression product of
the mutagenized sequence.
[0101] Another aspect of the invention relates to a recombinant cell system
for generating
targeted nucleic acid diversity, comprising a recombinant cell according to
the present disclosure.
[0102] Another aspect of the invention relates to a first kit for performing
the method according
to the present disclosure, comprising one or a plurality of recombinant
expression vectors
comprising coding sequences for the recombinant error-prone reverse
transcriptase (RT), the
recombinant spacer RNA(s) comprising a target sequence, and the recombineering
system. In
some particular embodiments, the kit comprises one or a plurality of
recombinant expression
plasmids together comprising coding sequences for the recombinant DGR RT,
recombinant DGR
Avd, recombinant DGR spacer RNA(s) and recombinant SSAP mediating
oligonucleotide
recombineering (DGRec system plasmid(s)). In some preferred embodiments, the
system
comprises the plasmid pRL014.
[0103] Another aspect of the invention relates to a second kit for performing
the method
according to the present disclosure, comprising:
- a first recombinant expression plasmid comprising coding sequences for
the recombinant
DGR RT and recombinant DGR Avd according to the present disclosure;
- a second recombinant expression plasmid comprising coding sequences for
the
recombinant single-stranded annealing protein (SSAP) mediating oligonucleotide
recombineering; and
- an engineered DGR spacer RNA comprising a cloning cassette in replacement
of the
template region (TR) according to the present disclosure inserted on at least
one,
preferably both first and second recombinant plasmids.
[0104] In some embodiments of the second kit, the coding sequence for the DGR
RT is
operatively linked to an inducible promoter. In some preferred embodiments,
the coding
sequences for the recombinant DGR Avd and recombinant DGR spacer RNA are
operatively
linked to constitutive promoters. In some preferred embodiments, the
recombinant DGR RT, the
recombinant DGR Avd, and recombinant DGR spacer RNA are from the Bordetella
bacteriophage BPP-1. In some preferred embodiments, the first plasmid is
pRL014 or pRL038.
[0105] In some embodiments of the second kit, the recombinant single-stranded
annealing
protein (SSAP) mediating oligonucleotide recombineering is recombinant
CspRecT. In some
embodiments of the second kit, the recombinant single-stranded annealing
protein (SSAP)
mediating oligonucleotide recombineering is operably linked to an inducible
promoter. In some
embodiments, the cloning cassette comprises a CcdB gene flanked by copies of
the same type IIS
restriction site in convergent orientation. In some preferred embodiments, the
second plasmid is
pRL038. In some particular embodiments, the second plasmid comprises at least
two cloning
cassettes flanked by different type IIS restriction sites, thereby allowing
cloning of different
targets into the same plasmid. In some preferred embodiments, the first and
second plasmids
comprise a cloning cassette. This allows the mutagenesis of multiple targets
simultaneously using
only two plasmids for the cloning of the targets and expression of the DGR
recombineering
system.
[0106] In some embodiments, the second kit further comprises the target
sequence; preferably
a synthetic double-stranded oligonucleotide comprising the target sequence
flanked by copies of
the same type IIS restriction site in divergent orientation, forming
non-complementary sticky ends.
Uses
[0107] Another aspect of the invention relates to the in vitro use of the
recombinant cell system
according to the present disclosure for the generation of targeted nucleic
acid diversity.
[0108] Another aspect of the invention relates to a method of engineering a
protein having a
desired function, comprising:
- providing a sequence coding for a protein;
- generating a library of mutagenized sequences of the protein according to a
method of the
present disclosure;
- expressing the library, preferably in a cell;
- screening the activity of the expressed proteins; and
- identifying protein(s) having the desired function.
[0109] The activity of the expressed proteins may be assessed by assays that
are known in the
art such as colorimetric enzymatic assays, or the binding of the expressed
protein to a desired
partner can be assessed by assays that are known in the art such as phage
display, bacterial display
or yeast display.
[0110] The DGRec in vivo targeted diversity system could be implemented in a
vast number of
applications in which one wants to improve, or change, a given protein
function. Because of the
unique DGR mechanism of adenine mutagenesis, diversity can be targeted with
precision and
multiple amino acid changes can occur in a single recombination event within
the mutagenesis
window (Figure 3C). The mutagenesis window being flexible in size, DGRec can
be applied to
mutagenize a specific protein location, such as an enzyme active site, or an
exposed domain
mediating interaction. For example, DGRec can be used to diversify the surface-
exposed domains
of bacterial receptors to create variants disrupting phage attachment,
creating bacterial strains
resistant to phage(s). The DGRec system can also be used to extend the host-
range of a phage by
mutagenesis of its tail fiber, thus reproducing and extending the ability of
natural phage DGR
systems onto phages devoid of these retroelements. In addition, the
predictability of adenine
mutagenesis provides the option of recoding the
target region to optimize
the mutagenesis profile within the window, mutating more intensively some
critical amino acid
positions of choice. The ability to multiplex the targeted mutagenesis window
opens the possibility
of driving intense mutagenesis on different genomic locations in parallel.
Finally, the creation of
DGRec libraries - using, for example, a library made of sheared DNA fragments -
allows a
broader mutagenesis approach that can span entire biosynthetic gene clusters.
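The idea of recoding a target so that adenines fall at chosen codon positions can be illustrated with a minimal Python sketch. It lists the amino acids reachable from a single codon when every adenine in that codon may become any of the four bases. It assumes the standard genetic code and that the codon is read on the strand whose adenines are randomized; the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reachable_amino_acids(codon: str) -> set:
    """Amino acids ('*' = stop) reachable when each A in the codon is randomized."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in BASES for base in codon):
        raise ValueError("codon must be three DNA bases")
    # An adenine may become any base; every other base stays fixed.
    options = [BASES if base == "A" else base for base in codon]
    return {CODON_TABLE["".join(variant)] for variant in product(*options)}


for codon in ("AAC", "GAC", "GGC"):
    reached = reachable_amino_acids(codon)
    print(codon, len(reached), "".join(sorted(reached)))
```

Under these assumptions an AAC codon reaches 15 different amino acids and no stop codon, a codon with a single adenine reaches at most four, and a codon without adenine is left unchanged, which is why the choice of synonymous codons in the target region shapes the mutagenesis profile.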
[0111] The practice of the present invention will employ, unless otherwise
indicated,
conventional techniques, which are within the skill of the art. Such
techniques are explained fully
in the literature.
[0112] The invention will now be exemplified with the following examples,
which are not
limitative, with reference to the attached drawings in which:
FIGURE LEGENDS
[0113] Figure 1 shows a non-limiting general scheme for practicing certain
embodiments of the
invention.
[0114] Figure 2 shows plasmid constructs successful for expression of a
synthetic DGR system.
CmR: chloramphenicol resistance gene; KanR: kanamycin resistance gene;
CspRecT: single-stranded annealing protein mediating oligo recombineering;
mutL*: a dominant
negative mutL
allele shutting down the DNA mismatch repair system, increasing recombineering
efficiency.
[0115] Figure 3 - DGRec mutagenesis with varying TR targets. A) Serial
dilution of two
replicate cultures plated after 48h DGRec induction, showing the emergence of
sucrose-resistant
colonies with a functional DGRec system targeting the sacB gene (pRL014 +
pAM009), but not
in a negative control containing an inactivated RT enzyme (pRL034 + pAM009).
B) Colonies
after 48h DGRec induction of plasmids pRL014 + pAM011 targeting the mCherry
gene in the
host chromosome. The picture is an overlay of mCherry fluorescence with
bright field. Colonies
indicated with white arrows have lost their mCherry fluorescence due to DGRec
mutagenesis. C)
and D) The TR sequences used in the DGRec system are displayed in a box above
its target
region. For each TR tested, a selection of a few DGRec mutants obtained by
Sanger sequencing
of the target region are aligned to the reference. Mutations are highlighted
by grey boxes on
nucleotides, and adenine positions in the TR target are highlighted in grey.
The mutations
obtained predominantly follow the known DGR mutagenesis pattern of adenine
mutagenesis.
Figure 3C: TR_AM009 (SEQ ID NO: 24); TR_AM009 target wt/nt strand 1 (SEQ ID
NO: 43);
TR_AM009 target wt/nt strand 2 (SEQ ID NO: 44); TR_AM009 target wt/aa (SEQ ID
NO: 45);
Variant-TR_AM009 n°1 to 4 (SEQ ID NO: 46 to 49). TR_AM010 (SEQ ID NO: 25);
TR_AM010
target wt/nt strand 1 (SEQ ID NO: 50); TR_AM010 target wt/nt strand 2 (SEQ ID
NO: 51);
TR_AM010 target wt/aa (SEQ ID NO: 52); Variant-TR_AM010 n°1 to 4 (SEQ ID NO:
53 to
56). TR_RL016 (SEQ ID NO: 42); TR_RL016 target wt/nt strand 1 (SEQ ID NO: 57);
TR_RL016 target wt/nt strand 2 (SEQ ID NO: 58); TR_RL016 target wt/aa (SEQ ID
NO: 59);
Variant-TR_RL016 n°1 to 4 (SEQ ID NO: 60 to 64). Figure 3D: TR_AM004 (SEQ ID
NO:
22); TR_AM004 target wt/nt strand 1 (SEQ ID NO: 64); TR_AM004 target wt/nt
strand 2 (SEQ
ID NO: 65); TR_AM004 target wt/aa (SEQ ID NO: 66); Variant-TR_AM004 (SEQ ID
NO: 67).
TR_AM007 (SEQ ID NO: 23); TR_AM007 target wt/nt strand 1 (SEQ ID NO: 68);
TR_AM007
target wt/nt strand 2 (SEQ ID NO: 69); TR_AM007 target wt/aa (SEQ ID NO: 70);
Variant-TR_AM007 n°1 to 4 (SEQ ID NO: 71 to 74). TR_AM011 (SEQ ID NO: 19);
TR_AM011 target
wt/nt strand 1 (SEQ ID NO: 75); TR_AM011 target wt/nt strand 2 (SEQ ID NO:
76); Variant-TR_AM011 n°1 to 4 (SEQ ID NO: 77 to 80).
[0116] Figure 4 - Spacer RNA structure in the DGRec system. A) Annotation of
the Spacer
RNA important features. Two grey boxes indicate the self-annealing segments
necessary to prime
the Reverse transcriptase complex. A triangle shows the A56 nucleotide which
forms the starting
point of the cDNA polymerization. B) Cartoon of the 3D conformation adopted by
the spacer
RNA allowing recruitment/priming of the Reverse Transcriptase complex.
[0117] Figure 5 - Plasmid map of pRL038 and pRL021. Detailed view section that
enables
fast cloning of new TR sequences inside the spacer RNA by Golden Gate
assembly. T symbols
indicate terminators. Brackets on each plasmid indicate ccdB cloning site.
[0118] Figure 6 - Multiplex DGRec mutagenesis. A) A selection of DGRec mutants
sequenced after 48h DGRec induction of plasmids pAM030 + pAM001. The
results show that
pAM030, derived from the pRL038 plasmid, is functional to drive DGRec
mutagenesis through
its encoded spacer RNA locus. B) Sequence of two clones obtained after 48h
DGRec induction
of plasmids pAM030 + pAM011, which contain a TR driving mutagenesis in the
sacB and
mCherry genes, respectively. These clones, obtained by combining the sucrose
and mCherry
fluorescence assay, were simultaneously mutagenized in both target
regions. Figure 6A:
TR_AM009 (SEQ ID NO: 24); TR_AM009 target wt/nt strand 1 (SEQ ID NO: 43);
TR_AM009
target wt/nt strand 2 (SEQ ID NO: 44); TR_AM009 target wt/aa (SEQ ID NO: 45);
Variant-TR_AM009 n°5 to 8 (SEQ ID NO: 80 to 84). Figure 6B: TR_AM011 (SEQ ID NO: 19);
TR_AM011 target wt/nt strand 1 (SEQ ID NO: 85); TR_AM011 target wt/nt strand 2
(SEQ ID
NO: 86); Variant-TR_AM011 n°5 to 6 (SEQ ID NO: 87 to 88). TR_AM009 (SEQ
ID NO: 24);
TR_AM009 target wt/nt strand 1 (SEQ ID NO: 89); TR_AM009 target wt/nt strand 2
(SEQ ID
NO: 90); TR_AM009 target wt/aa (SEQ ID NO: 45); Variant-TR_AM009 n°9 to 10
(SEQ ID
NO: 91 to 92).
[0119] Figure 7 - Amplicon sequencing of mutagenesis target regions. A) A
selection of a
few sucrose-resistant mutants of the sacB gene obtained after 48h DGRec
mutagenesis inside the
sacB gene and Sanger sequenced are aligned over the same mutagenesis target
analyzed by
Illumina amplicon sequencing after 48h DGRec induction (and no selection). The
mutagenesis
target sequence is highlighted in grey as well as adenine positions within
this window. The
mutations obtained predominantly follow the known DGR mutagenesis pattern of
adenine
mutagenesis and remain well-delineated within the target region. B) Same
Illumina sequencing
analysis plots for different targeted regions. Figure 7A: mutagenesis target
(SEQ ID NO: 24);
wt/nt strand 1 (SEQ ID NO: 43); wt/nt strand 2 (SEQ ID NO: 44); wt/aa (SEQ ID
NO: 45);
Variant n°1 to 4 (SEQ ID NO: 46 to 49). Sequence including mutagenesis target
shown below
plot (SEQ ID NO: 93).
[0120] Figure 8 - Phage host-range engineering. A) Cartoon representation of
various DGRec
strategies to manipulate phages and phage/host interactions. B) Selection of a
few lamB mutants
resistant to λ phage attachment obtained by DGRec mutagenesis. C) Selection
of a few gpJ
mutants able to infect a resistant lamB clone obtained by DGRec mutagenesis.
Figure 8B:
TR_RL055 (SEQ ID NO: 101); TR_RL055 target wt/nt strand 1 (SEQ ID NO: 107);
TR_RL055
target wt/nt strand 2 (SEQ ID NO: 108); TR_RL055 target wt/aa (SEQ ID NO: 109);
Variant-TR_RL055 n°1 to 7 (SEQ ID NO: 110 to 116). Figure 8C: TR_RL029 (SEQ ID NO:
97);
TR_RL029 target wt/nt strand 1 (SEQ ID NO: 117); TR_RL029 target wt/nt strand
2 (SEQ ID
NO: 118); TR_RL029 target wt/aa (SEQ ID NO: 119); Variant-TR_RL029 n°1 to 7
(SEQ ID NO:
120 to 126).
EXAMPLES
Material and Methods
Bacterial strains, plasmids, media, and growth conditions
[0121] All bacterial strains and plasmids used in this work are listed in
Table 4. For plasmid
propagation and cloning, the E. coli strain MG1655* was used. All the strains
were grown in
lysogeny broth (LB) at 37 °C with shaking at 180 RPM. For solid medium, 1.5 %
(w/v) agar was
added to LB. The following antibiotics were added to the medium when needed:
50 µg ml⁻¹
kanamycin (Kan), 30 µg ml⁻¹ chloramphenicol (Cm). For counterselection with
sacB, 5%
sucrose was added to the plating media before pouring.
Cloning procedures
[0122] Deletions were obtained by clonetegration [34], and combined by P1
transduction [35].
The sacB-mCherry cassette was inserted using OSIP plasmid pFD148.
[0123] Plasmids were constructed by Gibson Assembly [36] unless specified.
Plasmid
sequences are presented in the sequence listing, plasmid maps are displayed in
Figure 2 and
Figure 5, and the relevant recoded gene sequences are listed in Table 5.
[0124] Novel TR sequences can be cloned on pRL021 or pRL038 (Figure 5) using
Golden
Gate assembly with BsaI restriction sites [37]. The plasmids contain a ccdB
counter-selection
cassette in between two BsaI restriction sites [38]. This ensures the
selection of clones in which
a TR was successfully added to the plasmid during cloning. All oligonucleotide
sequences used
for TR assembly are listed in Table 6.
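Because the TR is cloned between two BsaI sites, a candidate TR that itself contains a BsaI recognition sequence would be expected to be cut during Golden Gate assembly. The following Python sketch is an illustrative pre-check, not a step described in this disclosure: it scans a candidate TR for the BsaI recognition sequence GGTCTC on either strand. The function names and example sequences are hypothetical.

```python
BSAI_SITE = "GGTCTC"  # BsaI recognition sequence, read 5' to 3'


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def internal_bsai_sites(tr_sequence: str) -> list:
    """Return (0-based position, strand) for every BsaI site in the sequence."""
    seq = tr_sequence.upper()
    hits = []
    for motif, strand in ((BSAI_SITE, "+"), (reverse_complement(BSAI_SITE), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical candidate TRs: the first is clean, the second carries a site.
print(internal_bsai_sites("ACGTTGCAACGGTACCTTAG"))
print(internal_bsai_sites("ACGTGGTCTCAACGGTACCT"))
```

A candidate that returns a non-empty list could be recoded with a synonymous change before ordering the oligonucleotides.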
Induction of the DGRec system
[0125] To perform mutagenesis, the DGRec recipient strains listed in Table 4
were transformed
with the two DGRec plasmids via electroporation and plated on Kan and Cm
selective media.
After overnight growth at 37 °C, colonies were picked into 1 mL of LB Kan, Cm
in a 96-well
plate and allowed to grow 6-8 hours. These un-induced pre-cultures were
diluted 500-fold into
1 mL of LB Kan, Cm, containing 1 mM m-toluic acid and 50 µM DAPG (inducing
the recombineering module and the RT, respectively) in a 96 deep-well plate, and
allowed to grow
for 24 hours at 34 °C with shaking at 700 rpm, reaching stationary phase. This
500-fold dilution
and growth was repeated once more for all cultures to perform a 48h time
point.
Evaluation of recombination efficiency
[0126] Sucrose assay: After 24h and 48h of DGRec mutagenesis targeted at sacB (plasmids
pRL014 combined with pRL016, pAM004, pAM007, pAM009 or pAM010 in strain sRL002,
compared with the negative control reverse transcriptase plasmid pRL034), the cells were
serially diluted in LB and plated on selective media with and without 5% sucrose.
The fraction of sucrose-resistant cells per sample was estimated for 4 biological replicates. Eight
sucrose-resistant colonies were sent for Sanger sequencing and were confirmed to be DGRec
mutants. Of note, the spontaneous rate of sacB mutations is elevated in this assay (reaching 10⁻⁴
in the negative control samples), and some spontaneous sacB mutants could outcompete other cells
during the 48h growth, resulting in a large uncertainty in the estimated recombination efficiency
(value ranges reported in Figure 3C).
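The fraction of sucrose-resistant cells can be computed from plate counts as sketched below. This is only an illustration under stated assumptions: the function names, the plated volume and the colony counts are hypothetical and are not data from this work. The titer on each medium is back-calculated from one countable plate, and the two titers are divided.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Viable cell titer back-calculated from one countable plate."""
    return colonies * dilution_factor / plated_volume_ml


def resistant_fraction(sucrose_plate, control_plate):
    """Each plate is a tuple (colonies, dilution_factor, plated_volume_ml)."""
    return cfu_per_ml(*sucrose_plate) / cfu_per_ml(*control_plate)


# Hypothetical counts for a single replicate (illustration only):
# 30 colonies at a 10^3 dilution with sucrose, 150 colonies at a 10^6
# dilution without sucrose, 0.1 mL plated in both cases.
fraction = resistant_fraction((30, 1e3, 0.1), (150, 1e6, 0.1))
print(f"sucrose-resistant fraction: {fraction:.1e}")
```

Because spontaneous sacB mutants also grow on sucrose, such a fraction is best read against the value obtained with the negative control plasmid.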
[0127] mCherry fluorescence assay: After 48h of DGRec mutagenesis targeted at mCherry
(plasmids pRL014+pAM011 in strain sRL002, compared with negative control plasmids
pRL034+pAM011), cultures were diluted and plated on LB plates to obtain ~200 colonies per
plate. Plates were then imaged using an Azure Biosystems Fluorescence Imager, and images were
processed with ImageJ [39]. Colonies with and without fluorescence were counted for 4 biological
replicates. Eight non-fluorescent colonies (only seen in pRL014+pAM011 replicates) were sent for
Sanger sequencing and were confirmed to be DGRec mutants.
Production of DGRec mutated samples
[0128] Induction of the DGRec system (see all DGRec constructs in Table 4) was
performed
as previously described: the DGRec recipient strains were transformed with the
two DGRec
plasmids via electroporation and plated on Kan and Cm selective media. After
overnight growth
at 37 °C, colonies were picked into 1 mL of LB Kan, Cm in a 96-well plate and allowed to grow
for 6-8 hours. These un-induced pre-cultures were diluted 500-fold into 1 mL of LB Kan, Cm,
containing 1 mM m-toluic acid and 50 µM DAPG (inducing the recombineering module and the RT,
respectively) in a 96 deep-well plate, and allowed to grow for 24 hours at 34 °C with shaking at
700 rpm, reaching stationary phase. This 500-fold dilution and growth was repeated once more
for all cultures to reach 48h of induction.
Genomic and plasmid DNA extraction
[0129] Genomic DNA was extracted from mutagenized strains using the NucleoSpin
96 Tissue,
96-well kit for DNA from cells and tissue (Macherey-Nagel), following
manufacturer's protocols.
When the DGRec targeted region was located on a plasmid, then plasmids were
extracted using
the QIAprep Spin Miniprep Kit (Qiagen).
Example 1: Expression of a functional plasmid-based DGR system in Escherichia coli
[0130] Heterologous expression of a protein is always a challenge, due to the
possible problems
in protein folding, toxicity, or lack of function in the new host. However,
making a system work
in E. coli multiplies its usability, as these bacteria have become by far the
most widely used
bacterial chassis for genetic applications. Indeed, the fact that DGRs are
naturally absent from
common laboratory bacterial and phage cloning strains[2] is probably the main
reason why these
attractive retroelements have not yielded any genetic tools so far.
[0131] Several approaches were employed by the inventors to express a
functional reverse
transcriptase complex in E. coli, and herein described is the one that was
successful in the
inventors' hands: a 'refactored' version of the native DGR system from the Bordetella phage
BPP-1 was built, so that each of the DGR components is expressed independently from each
other. There are three elements in the system that generate mutagenic cDNA:
the reverse
transcriptase major subunit (bRT), the reverse transcriptase accessory subunit
(Avd), and the
spacer RNA. These three elements are combined into an operon structure in the
native DGR
structure. In the method used in this example each of these elements was
cloned under a separate
promoter (Figure 2).
[0132] This setup allowed for more flexibility in tuning the relative amount of each element: the
bRT protein was expressed under a PhlF promoter (inducible by DAPG), while the Avd accessory
protein and the spacer RNA were both expressed under a strong constitutive promoter (J23119),
thus providing these components (required in higher copy numbers) in excess for the system.
Furthermore, the bRT and avd coding sequences were codon-optimized for expression in E. coli.
[0133] Example 2 shows that this approach succeeded in assembling a functional RT-Avd
enzymatic complex in E. coli, able to use the spacer RNA as a specific template for mutagenic
reverse transcription.
Example 2: Coupling DGR cDNA production with oligonucleotide Recombineering
[0134] Natural DGRs require a recognition sequence called IMH flanking their
target sequence
to enable the 'retrohoming' step (the introduction of mutations in the target region) [1], [9]. The
inventors looked into oligonucleotide recombineering as a way to entirely bypass this poorly
understood 'retrohoming' step of natural DGRs.
[0135] Oligo-mediated recombineering introduces genomic modifications through annealing of
oligonucleotides onto target genomic loci at the replication fork [10]. A
recombineering module was added onto one of the plasmids used for DGR expression (Figure
2), and the inventors screened for activity in an E. coli strain deleted for SbcB and RecJ, two
exonucleases shown to reduce recombineering efficiency [23].
[0136] To detect the mutagenesis activity of the system, a sacB counter-selection assay in
the recipient E. coli strain was used. SacB, encoded in the host genome, makes sucrose toxic to
the cells, which provides a way to select against them (see methods for detail). By engineering
the DGR RNA to target the sacB gene, the appearance of sucrose-resistant mutants in the
population could be detected. Those mutants were detected upon induction of the plasmid-borne
DGR system, and Sanger sequencing of the area targeted by the synthetic DGR showed that a majority
of these mutants resulted from DGR mutagenesis activity (Figure 3). Indeed, mutagenesis
happened primarily at adenine positions, the hallmark pattern of DGR systems. Moreover, no
such mutant was ever obtained using an inactive RT variant (Table 1; Figure 3A).
[0137]
DGRec component | Obtention of confirmed DGR mutants upon inactivation
Reverse Transcriptase | No
Avd | No
TR | No
CspRecT | No
mutL* | Yes
ΔsbcB + ΔrecJ in host genome | Yes
Table 1 - Essentiality of DGR components. The DGR components were inactivated as follows.
Reverse Transcriptase: an SMAA substitution in the enzyme active site (plasmid pRL034); Avd:
removal from plasmid (plasmid pRL035); TR: use of a TR with no corresponding target inside the
host (plasmid pAM001); CspRecT: removal from plasmid (plasmid pAM014); mutL*: removal
from plasmid (plasmid pAM015); ΔsbcB + ΔrecJ in host genome: strain without deletions (strain
sRL003). To look for DGR mutants, the sacB target TR region from 4 sucrose-resistant colonies
was amplified by PCR and sent for Sanger sequencing. Any mutation in the target region was
counted as a 'confirmed DGR mutant'.
[0138] Recombination efficiency within the sacB gene can be estimated thanks
to a sucrose
counter-selection assay (see methods for details). Of note, TR_AM010 and
TR_AM009 which
target the active site position of SacB had much higher efficiencies (reaching
10% in some
samples) than TR_RL016 targeting the C-terminal region of SacB, consistent
with the fact that a
larger number of DGRec variants will inactivate the enzyme within its active
site (Figure 3C).
[0139] The mCherry mutagenesis provides a different and more robust assay to
estimate the
DGRec recombination efficiency (no selection required), by counting the
fraction of cells losing
the mCherry fluorescence (see methods for details) (Figure 3B). The average
recombination
efficiency obtained from 4 biological replicates after 48h of DGRec mutagenesis is 3.6%
(standard deviation 1.6%) (Figure 3C). Of note, as for the sucrose assay, this value is
necessarily an underestimation of the actual mutagenesis frequency, since only
the subset of
mCherry variants that have lost fluorescence are counted in this process.
[0140] The essentiality of the various DGRec components was assessed by removing or
inactivating these components one by one and testing whether DGRec mutants could still be
obtained. The drop in recombination efficiency when removing those components was further
assessed by amplicon sequencing (Example 4).
[0141] These results confirm the ability of the DGRec system to mutagenize
multiple targets,
in different genes, and using mutagenesis windows of varying sizes (Figure 3).
Example 3: Multiplex DGRec mutagenesis
[0142] The sucrose and mCherry fluorescence assays were combined to mutagenize both target
regions simultaneously. pAM030, derived from the pRL038 plasmid, contains bRT, bAvd and a
DGR RNA carrying TR_AM009 (targeting sacB). pAM001 contains the CspRecT recombineering
module and a DGR RNA with no target in the genome. pAM011 contains the CspRecT
recombineering module and a DGR RNA carrying TR_AM011 (targeting mCherry). DGRec mutants
were sequenced after 48h of DGRec induction of plasmids pAM030 + pAM001. The results show
that pAM030, derived from the pRL038 plasmid, is functional to drive DGRec mutagenesis through
its encoded spacer RNA locus (Figure 6A). DGRec mutants were also sequenced after 48h of DGRec
induction of plasmids pAM030 + pAM011, which contain a TR driving mutagenesis in the sacB and
mCherry genes, respectively. These clones, obtained by combining the sucrose and mCherry
fluorescence assays, were simultaneously mutagenized in both target regions (Figure 6B).
[0143] These results confirm the ability of the DGRec system to mutagenize
multiple targets
simultaneously in different genes.
Example 4: Amplicon sequencing of mutagenesis target regions
[0144] Amplicon sequencing results confirmed and strengthened the previous observations of DGRec
mutagenesis made by Sanger sequencing in Example 2 (Figure 7A). A high level of mutagenesis,
well constrained within the targeted region and mainly concentrated on the RNA template
adenine positions was observed. Moreover, deep sequencing made it possible to detect mutagenesis on
multiple gene targets without the need for selection of the mutants (Figure 7B).
[0145] After 48h induction of the DGRec system, between 1,000 and 10,000 gene variants
could be detected inside the targeted region (a large underestimate of the actual number of
variants), with variant genotypes typically representing 20 to 100% of all genotypes sequenced
within the cell population.
[0146] A measure of the DGRec mutagenesis in each sample can be obtained from
a measure
of the increase in mutation rate within the DGRec targeted region (mutation
rate of adenines
within the targeted region divided by the mutation rate of adenines outside of
the targeted region).
This value is named "Amut" in the following paragraphs. Note that mutations outside of the target
region might be sequencing errors rather than actual mutations. This metric is thus a measure
of signal over background rather than a measure of how much DGRec increases the mutation rate over
the spontaneous mutation rate of E. coli. Nonetheless, this metric makes it possible to compare the
DGRec mutagenesis efficiency of different samples.
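The Amut ratio defined above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used for the amplicon data: the function names, the per-position input format (mismatch counts and read depth along a reference oriented like the TR) and the pooling of counts over adenine positions are all assumptions made for the example.

```python
def adenine_mutation_rate(ref, mismatches, depth, positions):
    """Pooled mismatch rate over the adenine positions among `positions`."""
    adenines = [i for i in positions if ref[i] == "A"]
    covered = sum(depth[i] for i in adenines)
    if covered == 0:
        return float("nan")
    return sum(mismatches[i] for i in adenines) / covered


def amut(ref, mismatches, depth, target_start, target_end):
    """Adenine mutation rate inside [target_start, target_end) over the rate outside."""
    inside = range(target_start, target_end)
    outside = [i for i in range(len(ref)) if not target_start <= i < target_end]
    rate_in = adenine_mutation_rate(ref, mismatches, depth, inside)
    rate_out = adenine_mutation_rate(ref, mismatches, depth, outside)
    # No background mismatches observed: the ratio is unbounded
    return float("inf") if rate_out == 0 else rate_in / rate_out


# Toy input (illustration only): 10-nt reference, target window at positions 3 to 6
ref = "ACGATTAGCA"
mismatches = [1, 0, 0, 100, 0, 0, 50, 0, 0, 2]
depth = [1000] * len(ref)
print(amut(ref, mismatches, depth, 3, 7))
```

Since the denominator mostly reflects sequencing errors, a sketch like this compares samples with each other rather than giving an absolute fold increase over the spontaneous mutation rate.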
[0147] In the following, for each sample analyzed, the plasmids and E. coli strains are indicated
in brackets.
Essentiality of DGRec components
[0148] Samples lacking a functional Reverse Transcriptase [pRL034+pRL016 in
sRL002],
lacking the Avd protein [pRL035+pRL016 in sRL002], or lacking CspRecT
[pRL014+pAM014
in sRL002] show no detectable DGRec mutagenesis (Amut on average 1.56 for all
these samples),
confirming the essentiality of these components of the system.
SbcB and RecJ DNA repair gene shutdown effect
[0149] On one targeted region, the effect of deleting the sbcB and recJ exonuclease genes was
assessed: the absence of these deletions resulted in a reduction of DGRec efficiency of about
2-fold (Amut = 97.0 with deletions [pRL014+pAM009 in sRL002] against 52.5 without deletions
[pRL014+pAM009 in sRL003]).
Reverse Transcriptase variants with altered adenine infidelity
[0150] The Reverse Transcriptase variant I181N is functional and shows, as expected, a reduced
level of DGRec mutagenesis ([pRL037+pRL031 in sRL002] Amut = 9.0 compared to Amut =
36.3 for the wild type Reverse Transcriptase [pRL014+pRL031 in sRL002]).
[0151] The Reverse Transcriptase variant R74N did not show detectable levels
of DGRec
mutagenesis [pRL036+pRL031 in sRL002] (Amut = 1.9), but would require
additional controls
to ensure that this variant is functional for the production of cDNA.
[0152] In conclusion, these results are consistent with previous results indicating that these
variants of the DGR reverse transcriptase have a reduced error rate at adenine positions in the
RNA template.
pRL038 backbone compared to the pRL021 backbone
[0153] These two plasmids have a cloning site allowing the addition of
different TR sequences
and their subsequent transcription as part of the DGR RNA. pRL038 is a medium
copy plasmid,
pRL021 is a high copy plasmid, and the DGR RNA surroundings are entirely
different in those
two plasmids, so that one could expect differences in the DGRec mutagenesis
resulting from these
two backbones. It was observed that sacB mutagenesis was 3 to 4 times higher
when driven from
the pRL021 backbone [pRL014+pAM009 in sRL002] (Amut = 97.0) than from the
pRL038
backbone [pAM030+pAM001 in sRL002] (Amut = 37.3).
[0154] A caveat in this comparison, however, is that for the pRL038 DGR RNA
expression, the
partner plasmid was also producing a distinct DGR RNA with no targeted regions
within the cell
(pAM001 plasmid), which might have competed for the reverse transcriptase
availability.
Double loci targeting
[0155] Two DGR RNAs were introduced in E. coli on two different backbones: pRL038 and
pRL021. The first was programmed to target sacB and the second mCherry [pAM030+pAM011
in sRL002]. Mutagenesis was detected with good efficiency in both the sacB (Amut = 33.14)
and the mCherry targeted regions (Amut = 19.47), showing that two DGR RNAs
expressed simultaneously in the same cells can both be active.
Template RNA self-targeting
[0156] Since the targeting in the DGRec system is solely driven by homology to
the cDNA
oligos, as opposed to the IMH requirement of the natural DGR systems, it was
hypothesized that
the DGRec system might be able to mutagenize the TR sequence carried on the
DGRec plasmid,
in addition to its target region within the E. coli chromosome. Indeed, self-targeting
was detected for the pRL021 backbone plasmid (Amut = 93.5) and for the pRL038 backbone plasmid
(Amut = 113.8) within [pAM030+pAM011 in sRL002] cells.
[0157] Since the mutagenesis of the desired target could be obtained at high
efficiency in some
of those samples, the self-targeting of the DGR RNA is not an obstacle for the
DGRec system.
However, it should be taken into consideration in setups that would
require longer mutagenesis
induction times, as the TR sequence will likely mutate and degenerate over
time, gradually losing
its adenine nucleotides.
[0158] Note that it is also possible to take advantage of this phenomenon in a
directed evolution
setup where the TR and the target sequence will co-evolve to reach the desired
phenotype. In such
a setup, the sequence landscape explored by the DGRec system would
initially be large,
proportionally to the number of adenines in the TR. As adenines are
progressively lost from the
TR, the diversity of sequences that are explored in the target (VR) will
reduce progressively. This
phenomenon might help refine the desired activity without losing too many
sequences to the
exploration of invalid sequence space. Note that in this process, when an
adenine in the TR is
mutated to another base, this mutation will be transferred at a high
rate to the target, thereby
maintaining homology between TR and target during this evolutionary process.
One can thus
design TR sequences that contain A-rich segments, enabling a vast exploration
of the sequence
space and a progressive refinement over cycles of directed evolution.
DGRec mutagenesis on a plasmid target
[0159] It was possible to detect the mutagenesis of a target region located inside the GFP gene
carried by a plasmid (the pAM020 plasmid, with a pSC101 origin compatible with the DGRec plasmids)
(Figure 7B). Interestingly, both orientations of the TR showed similar levels of mutagenesis
(Amut = 6.4 in forward direction [pRL014+pAM023+pAM020 in sRL001], Amut = 14.9 in
reverse direction [pRL014+pAM024+pAM020 in sRL001]), suggesting that the
plasmid
replication system produces single stranded DNA available for recombination on
both strands.
This is in contrast to the known preference of recombineering for the lagging
strand when
targeting the chromosome.
[0160] Note that the self-targeting of the DGR RNA described in the section
above also occurs
on a plasmid, demonstrating the ability of the DGRec system to mutagenize
targeted regions on
plasmids with different backbones (p15A ori and pUC ori plasmids).
Mutagenesis of an integrated prophage
[0161] Using a strain that was lysogenized with the λ phage (strain sRL004),
high mutagenesis
levels inside the targeted region of that phage [pRL014+pRL029 in sRL004] were
detected (Amut
= 65.3) (Figure 7B).
Example 5: TR and targeted region design rules
[0162] Next, the rules helping to properly design a TR sequence to tune the
DGRec system
towards producing the desired mutagenesis pattern were refined.
Top and bottom strands relation to the lagging strand
[0163] The Reverse Transcriptase can only randomize adenine nucleotides from
the template
RNA, but according to whether the TR sequence targets the coding or template
strand of the target
ORF, it can result in mutating either the A or T nucleotides of the coding
sequence. This modifies
the attainable amino acids, and which ones get mutated. If the target protein
can be moved in
forward or reverse orientation to be on the correct strand for mutagenesis,
then even if limited to
mutating the lagging strand, the DGRec system gives the option to target As or
Ts.
Attainable amino acids
[0164] "Attainable" amino acids were defined as the amino acids one can access
using DGRec
from a codon by mutating As (or Ts when targeting the reverse complement
strand). For example,
TTA can be mutated into 4 codons (TTA, TTG, TTC, TTT) and has 2 "attainable
amino acids":
Leu (TTA/TTG) and Phe (TTC/TTT).
[0165] If randomizing Ts when targeting the reverse complement strand,
attainable amino acids
are very different. For instance, TTA has 13 "attainable amino acids reverse".
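The definition of attainable amino acids can be illustrated with a short sketch. It assumes the standard genetic code, assumes that every adenine (forward TR) or every thymine (reverse TR) of the codon can be replaced by any of the four bases, and counts a stop codon as one attainable outcome. The helper name `attainable` is introduced here for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def attainable(codon, randomized_base):
    """Amino acids ('*' = stop) reachable when every `randomized_base` in the codon is randomized."""
    options = [BASES if base == randomized_base else base for base in codon]
    return {CODON_TABLE["".join(variant)] for variant in product(*options)}


forward = attainable("TTA", "A")  # TR in forward orientation: adenines are randomized
reverse = attainable("TTA", "T")  # TR in reverse orientation: thymines are randomized
print(sorted(forward), len(reverse))
```

For TTA this gives Leu and Phe in the forward orientation and 13 outcomes in the reverse orientation, one of which is a stop codon.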
[0166] The DGRec codon mutagenesis table (Table 2) shows, for each codon, the attainable
amino acids, number of amino acids, and probability of attaining each amino acid (assuming
random mutations), in forward and reverse orientation. There are large differences in the number
of attainable amino acids between codons, even when they code for the same amino acid. For
instance, AGA and CGC both code for arginine, and have 6 and 1 attainable amino acids, respectively.
[0167]
Triplet | Amino acid | Number of attainable aas fwd | Number of attainable aas rvs | Triplet | Amino acid | Number of attainable aas fwd | Number of attainable aas rvs
TTT | F | 1 | 21(*) | TCT | S | 1 | 4
TTC | F | 1 | 15 | TCC | S | 1 | 4
TTA | L | 2 | 13(*) | TCA | S | 1 | 4
TTG | L | 1 | 14(*) | TCG | S | 1 | 4
TAT | Y | 4 | 8(*) | TGT | C | 1 | 6(*)
TAC | Y | 4 | 4 | TGC | C | 1 | 4
TAA | * | 7(*) | 4(*) | TGA | * | 3(*) | 3(*)
TAG | * | 4(*) | 4(*) | TGG | W | 1 | 3
CTT | L | 1 | 5 | CCT | P | 1 | 1
CTC | L | 1 | 4 | CCC | P | 1 | 1
CTA | L | 1 | 4 | CCA | P | 1 | 1
CTG | L | 1 | 4 | CCG | P | 1 | 1
CAT | H | 4 | 2 | CGT | R | 1 | 1
CAC | H | 4 | 1 | CGC | R | 1 | 1
CAA | Q | 5 | 1 | CGA | R | 1 | 1
CAG | Q | 4 | 1 | CGG | R | 1 | 1
ATT | I | 4 | 7 | ACT | T | 4 | 1
ATC | I | 4 | 4 | ACC | T | 4 | 1
ATA | I | 5 | 4 | ACA | T | 4 | 1
ATG | M | 3 | 4 | ACG | T | 4 | 1
AAT | N | 15 | 2 | AGT | S | 4 | 2
AAC | N | 15 | 1 | AGC | S | 4 | 1
AAA | K | 21(*) | 1 | AGA | R | 6(*) | 1
AAG | K | 14(*) | 1 | AGG | R | 3 | 1
GTT | V | 1 | 5 | GCT | A | 1 | 1
GTC | V | 1 | 4 | GCC | A | 1 | 1
GTA | V | 1 | 4 | GCA | A | 1 | 1
GTG | V | 1 | 4 | GCG | A | 1 | 1
GAT | D | 4 | 2 | GGT | G | 1 | 1
GAC | D | 4 | 1 | GGC | G | 1 | 1
GAA | E | 5 | 1 | GGA | G | 1 | 1
GAG | E | 4 | 1 | GGG | G | 1 | 1
Table 2 - DGRec codon mutagenesis table. For each codon, the table reports the number of
attainable amino acids (aas) with a TR in forward (fwd) direction compared to its targeted ORF
(randomizing adenines) and with a TR in reverse (rvs) direction compared to its targeted ORF
(randomizing thymines). Codons that can be mutated by DGRec towards stop codons are
marked with (*). These codons should be avoided in the TR design.
Theoretical library size and ORF recoding
[0168] The theoretical DNA library size for a given TR sequence can simply be
approximated
to 4^(number of adenines), corresponding to the total number of DNA sequences
that can be
obtained by randomization of each adenine position within the TR sequence. For
the theoretical
peptide library size, the calculation depends on codons and their number of
attainable amino
acids. As a consequence, an ORF can be recoded to keep the same protein
sequence but decrease
or increase the size of the peptide library that can be attained.
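Both library sizes can be estimated with the sketch below. It rests on assumptions that are not taken from the source: the TR is read in frame from its first nucleotide, the TR is in forward orientation (adenines randomized), the standard genetic code applies, a stop codon counts as one attainable outcome, and the peptide library size is taken as the product of the per-codon numbers of attainable amino acids. The function names and the 9-nt example sequence are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def attainable(codon, randomized_base="A"):
    """Amino acids ('*' = stop) reachable when every `randomized_base` in the codon is randomized."""
    options = [BASES if base == randomized_base else base for base in codon]
    return {CODON_TABLE["".join(variant)] for variant in product(*options)}


def dna_library_size(tr):
    """4^(number of adenines): each adenine of the TR can become any of the 4 bases."""
    return 4 ** tr.upper().count("A")


def peptide_library_size(tr):
    """Product over in-frame codons of the number of attainable amino acids."""
    tr = tr.upper()
    if len(tr) % 3:
        raise ValueError("TR length must be a multiple of 3 for an in-frame estimate")
    return prod(len(attainable(tr[i:i + 3])) for i in range(0, len(tr), 3))


tr = "AATCCAAAG"  # hypothetical TR segment: codons AAT, CCA, AAG
print(dna_library_size(tr), peptide_library_size(tr))
```

In this example the CCA codon adds a factor of 4 to the DNA library but a factor of 1 to the peptide library, which is the effect that recoding can exploit.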
Recoding ORF for low diversity
[0169] While recoding to increase library size might seem like the obvious
choice, there can be
instances in which a portion of the targeted region of a protein must be
conserved. There can also
be instances in which the library size exceeds the selection capacity to
screen it, making the
recoding for low diversity useful when there is a need to comprehensively
screen a (DNA)
sequence space.
[0170] It was shown that it is also possible to recode a sequence in order to
increase the peptide
library size while keeping the DNA library size to a minimum, by removing
"useless" codons
such as CCA (Proline), which can mutate only to CCG, CCT or CCC, which all
also code for Pro.
These "useless" codons can decrease the recombineering efficiency of a cDNA
oligo onto its
targeted region, without adding any exploration of the protein sequence space.
Internal control
[0171] Of note, codons like CCA, which can mutate but only attain one amino
acid, could also
be used as a form of internal control to check for diversification without
changing the amino acid
sequence.
Recoding for adenines or for thymines
[0172] There are significant differences between recoding for high/low
diversity by changing
adenines or thymines. This is due to two reasons:
- After selecting the "best" codon (for high or low diversity), the average number of
attainable amino acids is different for the best adenine codons and the best thymine codons
(Table 3).
- Not all amino acids have the same frequency inside proteins. For example, the amino acids
generating high diversity when recoding adenines (asparagine (N) and lysine (K), with 15
and 14 attainable amino acids, respectively) tend to be frequent in proteins, while their
counterpart when recoding thymines (phenylalanine (F), with 15 attainable amino acids) is rarer.
[0173]
Recoding strategy | A mutagenesis | T mutagenesis
Low diversity recoding | 3.5 aas | 2.7 aas
High diversity recoding | 4.5 aas | 4.3 aas
Table 3 - Mean number of attainable amino acids after recoding for high or low diversity
or low diversity
[0174] Consequently, regardless of whether the targeted region is recoded for
high or low
diversity, mutating adenines generally leads to higher library sizes than
mutating thymines.
Enforcing mismatch between the TR and the targeted ORF
[0175] In addition to recoding the ORF, the DGRec system offers the
flexibility of adding
mismatches between the TR sequence and the targeted region to "force"
variability at any given
amino acid whether its codon contains adenines or thymines.
Saturation mutagenesis
[0176] It is sometimes of interest to explore the largest possible number of
amino acids at a few
given positions. This might be achieved by optimizing for low diversity at
positions that should
stay constant and introducing adenines in the TR at positions to diversify.
The design of the TR
should avoid sequences that will lead to the introduction of stop codons in
the targeted sequence.
When the TR sequence matches that of the targeted coding strand this can be
achieved using AAT
or AAC codons. When the TR sequence matches that of the non-coding (template)
strand, the TR
should rather contain 5'-GAA-3' at the desired position to diversify, which will lead to the
generation of all 5'-NNC-3' codons at the target position in the coding sequence. In this
orientation the second codon with the highest diversity generation potential is obtained by using
5'-AAT-3' in the TR, which will lead to all 5'-ANN-3' codons in the coding sequence, none of
which are stop codons. Note that these codons reach amino acids that cannot be encoded by the
NNC or NNT codons (lysine and methionine). The use of multiple DGR RNAs in the same cell,
targeting the same position but on different strands and with different codons, can thus be
advantageous to explore the full diversity of amino acids while ensuring that no stop codons are
introduced.
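The codon sets named above can be checked with a short sketch. It assumes the standard genetic code and models mutagenesis simply as the replacement of every adenine of the TR triplet by any of the four bases, followed by reverse complementation when the TR matches the non-coding strand. The function names are introduced for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def coding_codons(tr_triplet, tr_matches_coding_strand):
    """Codons obtained in the coding sequence when the adenines of a TR triplet are randomized."""
    options = [BASES if base == "A" else base for base in tr_triplet]
    variants = {"".join(v) for v in product(*options)}
    if tr_matches_coding_strand:
        return variants
    return {reverse_complement(v) for v in variants}


def amino_acids(codons):
    return {CODON_TABLE[c] for c in codons}


nnc = coding_codons("GAA", tr_matches_coding_strand=False)  # 5'-GAA-3' on the non-coding strand
ann = coding_codons("AAT", tr_matches_coding_strand=False)  # 5'-AAT-3' on the non-coding strand
nnt = coding_codons("AAT", tr_matches_coding_strand=True)   # AAT on the coding strand
print(sorted(amino_acids(ann) - amino_acids(nnc | nnt)))
```

None of the three codon sets contains a stop codon, and the ANN set contributes lysine and methionine, which the NNC and NNT sets cannot encode.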
Using stop codons to remove the WT amino acid sequence from the screen
[0177] It was shown that it is possible to introduce stop codons to "break" a
targeted ORF, then
fix it with DGRec mutagenesis, a strategy that might be useful to ensure
the selection of variants
only (removal of the wild type ORF sequence).
Example 6: Phage host-range engineering
[0178] Using the lambda phage as a model system, the DGRec system was used to
mutagenize
both the phage tail fiber (GpJ) and its bacterial receptor (LamB) (Figure 8A).
[0179] Firstly, mutations were introduced in the lamB gene inside the bacterial chromosome,
using DGRec plasmids pRL061 + pRL055 (Table 4). Amplicon sequencing revealed high
diversification of the targeted region. This LamB variant library was then infected with λvir, a
modified λ phage that cannot lysogenize and is therefore strictly lytic. After infection, a large
number of resistant bacterial clones were isolated and their lamB sequenced, revealing the presence
of adenine mutations within the targeted region that were absent from non DGRec-mutagenized
resistant clones. These results demonstrate that DGRec mutagenesis can be used to diversify the
surface-exposed domains of bacterial receptors to create variants disrupting
the phage attachment,
creating bacterial strains resistant to the phage (Figure 8B).
[0180] Secondly, a library of the λvir gpJ gene was created by infecting E. coli cells carrying
induced plasmids pRL043 + pRL029 (Table 4). After 4 rounds of 2-hour infections, λvir lysates
were harvested and used to infect the resistant lamB clones isolated in the previous experiment.
Multiple plaques infecting the lamB mutant were obtained, and sequencing of the gpJ gene from the
phage genome revealed extensive mutations at adenine nucleotides in the targeted region (Figure
8C).
[0181] These results demonstrate the capacity of the DGRec system to mutagenize a phage
during its lytic cycle. Given that DGRec was also shown to mutagenize a phage in its
lysogenic cycle (Figure 7B), these results indicate the broad applicability of the DGRec system to
engineer virtually any phage. The results also demonstrate the capacity of the DGRec system to
extend the host range of a phage by mutagenesis of its tail fiber, thus reproducing and extending
the ability of natural phage DGR systems onto phages devoid of these retroelements.
[0182] Table 4 - Strains and Plasmids. CmR, chloramphenicol; KmR, kanamycin; mutL*, mutL
dominant negative allele; RT, Bordetella phage BPP-1 DGR Reverse Transcriptase
Name | Description/relevant characteristics | Reference
E. coli strains
MG1655 | F- lambda- ilvG- rfb-50 rph-1, derived from E. coli K12 |
MG1655* | MG1655 ΔfhuA |
MG1655 ΔrecA | MG1655 recA::Tn10 |
sRL001 | MG1655 ΔrecJ, ΔsbcB; recipient strain for DGRec plasmids allowing targeted mutagenesis. | This work
sRL002 | MG1655 ΔrecJ, ΔsbcB, mCherry-sacB at λ site; strain for TR targeting sacB or mCherry. | This work
sRL003 | MG1655 mCherry-sacB at λ site; strain for evaluation of sbcB and recJ deletions. | This work
sRL004 | MG1655::λ, ΔrecJ, ΔsbcB; strain for mutagenesis of the λ prophage | This work
Plasmids
Construction plasmids
pORTMAGE-Ec1 | Used as the source of recombineering module (CspRecT, mutL* under Pm promoter) | [40]
pFD148 | Derived from pOSIP-KL for mCherry-sacB integration at λ site, KmR | This work
pAM020 | sfGFP under Ptet inducible promoter, pSC101 ori, AmpR. | This work
Reverse Transcriptase plasmids
pRL014 | RT under PhlF inducible promoter, Avd, p15A ori, CmR. | This work
pRL034 | pRL014, but RT with YMDD box in active site replaced with residues SMAA | This work
pRL036 | pRL014, but RT with R74A mutation | This work
pRL037 | pRL014, but RT with I181N mutation | This work
pRL035 | pRL014, but Avd deleted | This work
TR/Recombineering plasmids
pRL021-ccdB | DGR RNA with BsaI/ccdB cassette for Golden Gate assembly of TR, CspRecT-mutL* under Pm promoter, pUC ori, KmR. | This work
pRL016 | pRL021-ccdB with TR targeting sacB (residues 20-43) (TR_RL016) | This work
pAM001 | pRL021-ccdB with the wild type BPP-1 phage TR sequence (TR_AM001) | This work
pAM004 | pRL021-ccdB with a 40 bp TR targeting sacB (residues 25-38) (TR_AM004) | This work
pAM007 | pRL021-ccdB with a 100 bp TR targeting sacB (residues 10-43) (TR_AM007) | This work
pAM009 | pRL021-ccdB with TR targeting sacB active site region (residues 235-237) (TR_AM009) | This work
pRL031 | pAM009 but with TR adding mismatch T>A at nucleotide 4877 (TR_RL031) | This work
pAM010 | pRL021-ccdB with TR targeting sacB active site region (residues 79-102) (TR_AM010) | This work
pAM011 | pRL021-ccdB with TR targeting mCherry (residues 28-51) (TR_AM011) | This work
pAM014 | pRL016, but with CspRecT deleted (TR_RL016) | This work
pAM015 | pRL016, but with mutL* deleted (TR_RL016) | This work
pRL038-ccdB | pRL014, but with addition under a Pr promoter of a DGR RNA with BsaI/ccdB cassette for Golden Gate assembly of TR. | This work
pAM030 | pRL038-ccdB with TR targeting sacB active site region (residues 235-237) (TR_AM009) | This work
pRL029 | pRL021-ccdB with TR forward targeting gpJ (residues 1075-1111) (TR_RL029) | This work
pRL039 | pRL021-ccdB with TR forward targeting gpJ (residues 994-1043) (TR_RL039) | This work
pRL043 | pRL021-ccdB with TR forward targeting gpJ (residues 986-1022) (TR_RL043) | This work
pRL055 | pRL021-ccdB with TR forward targeting lamB (residues 237-266) (TR_RL055) | This work
pRL061 | pRL038-ccdB with TR forward targeting lamB (residues 147-172) (TR_RL061) | This work
pAM021 | pRL021-ccdB with TR forward targeting lacZ (residues 451-476) (TR_AM021) | This work
pAM022 | pRL021-ccdB with TR reverse targeting lacZ (residues 451-476) (TR_AM022) | This work
pAM023 | pRL021-ccdB with TR forward targeting sfGFP (residues 50-76) (TR_AM023) | This work
pAM024 | pRL021-ccdB with TR reverse targeting sfGFP (residues 50-76) (TR_AM024) | This work
[0183] Table 5 - Sequences disclosed in the present application
Name | SEQ ID NO | Sequence
RT canonical motif | 1 | QGXXXSP
DGR RT canonical motif | 2 | I/LGXXXSQ
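The two motifs above can be read as simple patterns. The sketch below is illustrative only: it assumes that "X" stands for any amino acid and that "I/L" means isoleucine or leucine at that position; the function name `find_motif` is hypothetical. The test fragment is taken from the bRT protein sequence (SEQ ID NO: 4) listed in this table.

```python
import re

# Assumption: "X" in the table denotes any single amino acid.
RT_MOTIF = re.compile(r"QG.{3}SP")         # SEQ ID NO: 1, QGXXXSP
DGR_RT_MOTIF = re.compile(r"[IL]G.{3}SQ")  # SEQ ID NO: 2, I/LGXXXSQ


def find_motif(pattern, protein):
    """Return (1-based start, matched text) for every hit of pattern in protein."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein.upper())]


# Thirty residues from the bRT protein row of this table (SEQ ID NO: 4).
fragment = "vlpdegvgipigsltsqlfanvyggavdrl"

print(find_motif(DGR_RT_MOTIF, fragment))  # DGR-type motif hits
print(find_motif(RT_MOTIF, fragment))      # canonical RT motif hits
```

Positions are reported relative to the fragment passed in, not to the full-length protein.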
BPP-1 spacer 3
AAGGGCAGGCTGGGAAATAACGCTGCTGCGCTATTCGGCGGCAACTGGAACAACACGTCGAA
RNA (DGR RNA)
CTCGGGTTCTCGCGCTGCGAACTGGAACAACGGGCCGTCGAACTCGAACGCGAACATCGGGS
CGCGCGGCGICTGTGC,CCATCACCITCTTGCATGGCTCTGCCAACGCTACGGCTIGGCGGGC
sequence .1=Gc4cci .1 .I=cci'CAA' l'AGG'I G'I'CC'l
C;C:1"I'CGGCGAACACG'I 'I'ACACGC-;
TOGGCAAAACOTCGATTACTGAAAATGGAAAC3'COGGGGCCCAC7TC
BPP-1 reverse transcriptase (bRT) protein | 4 |
mgkrhrnlid qittwenlld ayrktshgkr rtwgylefke
ydlanllalq aelkagnyer gpyreflvye pkprlisale
fkdrlvqhal cnivapifea gllpytyacr pdkgthagvc
hvqaelrrtr athflksdfs kffpsidraa lyamidkkih
caatrrllrv vlpdegvgip igsltsqlfa nvyggavdrl
lhdelkqrhw arymddivvl gddpeelrav fyrlrdfase
rlglkishwq vapvsrginf lgyriwpthk llrkssvkra
krkvanfikh gedeslqrfl aswsghaqwa dthnlftwme
eqygiach
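Variant names in these tables, such as R74A and I181N, follow the usual wild-type residue, position, new residue notation. As a minimal sketch of how such a name maps onto a protein sequence, the hypothetical helper below applies one substitution and checks that the stated wild-type residue is actually present at that position. The peptide in the example is a toy sequence, not one from this application.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a substitution written as <wild-type><position><new>, e.g. 'R74A'.

    Positions are 1-based. Raises ValueError if the notation is malformed,
    the position is out of range, or the wild-type residue does not match.
    """
    m = re.fullmatch(r"([A-Za-z])(\d+)([A-Za-z])", mutation)
    if m is None:
        raise ValueError(f"unrecognised mutation: {mutation!r}")
    wild_type = m.group(1).upper()
    position = int(m.group(2))
    new = m.group(3).upper()
    seq = protein.upper()
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} outside sequence of length {len(seq)}")
    if seq[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {seq[position - 1]}"
        )
    return seq[: position - 1] + new + seq[position:]


# Toy example (hypothetical peptide): replace arginine 3 with alanine.
print(apply_substitution("MKRLI", "R3A"))
```

The wild-type check is a cheap guard against off-by-one numbering when a variant name is applied to a sequence that was numbered differently.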
BPP-1 Avd (bAvd) protein | 5 |
mepieeatkc ydqmlivery ervisylypi aqsiprkhgv
aremflkcll gqvelfivag ksnqvsklya adaglamlrf
wlrflagiqk phamtphqve taqvliaevg rilgswiarv
nrkgqagk
CspRecT protein | 6 |
mnqivkftdd sglavqvtpd dvrryicena tekevglflq
lcqtqrlnpf vkdaylvkyg gapasmitsy qvfnrracrd
anydgiksgv vvlrdgdvvh krgaacykka geeliggwae
vrfkdgreta yaevalddys tgksnwakmp gvmiekcaka
aawrlafpdt fqqmyaaeem dqaqqpeqvr aqaeqpvdlq
pirelfkpyc ehfgitpaeg mtavcgavga egmhsmteqq
arrarawmee emaapaveae yevvdegevf
Bordetella 7 AT GGGAAAGCGCCAT C GTAAC GT TAT TGAC
CAAATGACCAC CT GCCAAAATTT CITAGAT CO
CTACCGTAACACTACCCACCCTAACCGCCCCACTTCGCCCTATCTTCAATTTAACCAGTACC
phage B-PP
AT TT GGCCAAT TT GC T TGCAC TT CAGGCGGAGTT GAAGGCAGGCAATTACGAGCGCGGACCG
Reverse TATC GCGAGTT TC TGGTTTACGAACCGAAACCCCGC T T
GAT TAGCGCAC TGGAATT TAAAGA
Transcriptase
TCGTCTTGTICAACACGCGC7GTGCAACATCGTTGCGCCAATC7TTGAAGCAGGTTTGCTTC
gene* CATATACATACGCGT GTCGTCCGGATAAGGGGACCCAT
GCAGGAGTGTGTCAT CT TCAGGCA
GAATTGCGTCGCACGCGCGCGACTCATTTTTTGAAGAGTGACT7TAGCAAGTTCTTCCCAAC
GATT CACCC TCCCCC TCTT TATC CCATCAT TGATAAAAAAATCCACTCCCCCCCTACACCTO
GCCTTTTGCGCGTTGTCCTGCCGGATGAGGGAGTTGGAATTCCCATTGGCAGCTTAACCTCT
CAGTTATTTGCCAACGTTTACGGGGGCGCGGTAGATCGTTTGT7ACACGACGAGTTAAAGCA
GCGCCACTGEGCTCGTTACATGGATGACATTGICGTACTTCGGGATGATCCAGAAGAACTGC
GCGCGGTCTTCTATCGTTTGCGT GATTTTGCGTCCGAACGCTTAGGTTT GAAGATTTCACAT
TGGCAGGTAGCGCCTOTGTC7CGTGGAATCAATTTTCTTGGTTACCGCATCTGGCCGACCCA
TAAATTGCTGCGTAACAGTAGTGTGAAGCGCGCTAAGCGTAAGGTGGCAAATTTCATCAAAC
ACGGAGAAGACGAGTCACT GCAACGC TT CC TT GCCTCGTGGTCGGGTCACGCCCAA TGGGCC
GACACTCAC.A_ATT TAT TGAC7 TGGAT GGAGGAGCAATATGGCA7C GCGT GT CATTAA
Inactive RT 8 AT CC GAAACCCCCAT C CTAAC GT TAT ICAC
CAA_ATGACCAGCT GGGAAAATTT GITACAT GO
variant with
GTACCGTAAGACTAGCCACGGTAAGCGCCGCACTTGGGGGTATCTTGAATTTAAGGAGTACG
AT TT GGCCAAT TT GC T TGCAC TT CAGGCGGAGTT GAAGGCAGGCAATTACGAGCGCGGACCG
SMAA residues TATC GCGAGTT TC TGGTTTACGAACCGAAACCCCGGT T
GAT TAGCGCAC TGGAATT TAAAGA
gene*
TCGTCTTGTTCAACACGCGC7GTGCAACATCGTTGCGCCAATC7TTGAAGCAGGTTTGCTTC
CATATACATACGCGTOTCGTCCGGATAAGGGGACCCATGCAGGAGTGTGTCATCITCAGGCA
CAATTCCGTCCCACCCCCGCGACTCATTTTTTCAMACTCACTTTACCAAGTTCTTCCCAAC
GATT GACCGTGCCGC TCTT TATGCGATGAT TGATAAAAAAATCCACTGCGCCGCTACACGTO
GCCTTTTGCGCGTTGTCCTGCCGGATGAGGGAGTTGGAATTCCCATTGGCAGCTTAACCTCT
CAGT TATTTECCAACGTTTACGGGGGCGCGGTAGATCGTTT GT TACACGACGAGTTAAAGCA
GCGCCACTGGGCTCGTTCTA:GGCGGCGATTGTCGTACTTGGGGATGATCCAGAAGAACTGC
GCGCGGTCTTCTATCGITTGCGTGATTTTGCGTCCGAACGCTTAGGTTTGAAGATTTCACAT
TGGCAGGTAGCGCCTOTGTC7CGTGGAATCAATTTTCTTGGTTACCGCATCTGGCCGACCC21
TAAATTGCTGCGTAAGAGTAGTGTGAAGCGCGCTAAGCGTAAGGTGGCAAATTTCATCAAAC
AGGGAGAAGACGAGTCACT GCAACGC TT CC TT GCCTCGTGGTCGGGTCACGCCCAA TGGGGC
GACACTCACAATT TAT TGAGG TGGAT GGAGGAGGAATATGGCAGC GCGT GT CATTAA
RT variant R7 4A 9 AT GGGAAAGCGCCAT C GTAAC GT TAT TGAC
CAAAGGACCAGCT GGGAAAATTT GTTAGAT GO
GTAC CGTAAGACTAGCCACGGTAAGCGCGGCACT TGGGGGTATCT TGAA TT TAAGGAGTAC G
gene*
AT TT GGCCAAT TT GC T TGCAC TT CAGGCGGAGTT GAAGGCAGGCAATTACGAGCGCGGACC G
TATC GCGAGTT TC TGGTTTACGAACCGAAACCCGCAT T GAT TAGCGCAC TGGAATT TAAAGA
TO CT CT TGT TCAACACGCGCT CT GCAACAT CGTT GCGCCAATC7T TGAAGCAGGTT TGCTTG
CATATACATACGCGTOTCGTC=ATAA=CACCCATCCAGCACTCTGTCATGITCACCCA
GAATTGCGTCGCACGC,GCGCGACTCATTTTTTGAAGAGTGACT7TAGCAAGTTCTTCCCAAG
CATTGACCGTGCCGCTCTTTATGCGATGAT TGATAAAAAAATCCACTGC GCCGCTACACGTO
GCCTTITGCECGTTGTCCTGCCGGATGAGGGAETTGGAATTCCCATTGGCAGCTTAACCTCT
CAGTTATTTGCCAACGTTTACGGGGGCGCGGTAGATCGTTTGT7ACACGACGAGTTAAAGCA
GCGCCACTGGGCTCGTTACA:GGATGACAT TGICGTAC TTGGGGATGAT CCAGAAGAACT CO
CCGCGGTCTTCTATCGTTTGCGT GATTTTGCCITCCGAACGCTTAGGTTT GAAGATTTCACAT
TGGGAGGTAGGGC CT GTGT C7GG TGGAATCAATT TT C T TGGTTAC CGGATC T GGGC GACGCA
TAAATTGCTCCGTAACAGTAGTGTGAAGCGCGCTAAGC=AAGGTGOCAAATTTCATCAAAC
AC GGAGAAGACGAGTCACTGCAACGCTTCC TTGCCTCGTGGTCGGGTCACGCCCAATGGGCC
GACACTCACAATT TAT TCAC7 TGGAT GGAGGACCAATATGGCA7C GCGT CT CATTAA
RT variant 13 AT GGGAAAGCGCCAT CGTAAC CT TAT TGAC
CAAATCACCAC CT GOGAAAAT TT GITAGAT GO
I18 1N gene*
CTACCCTAACACTACCCACCCTAACCCCCCCACTTCGCCCTATCTTCAATTTAACCACTACC
AT TT GGCCAAT TT GC T TGCAC TT CAGGCGGAGTTGAAGGCAGGCAATTACGAGCGCGGACCG
TA TC GC GAGTT TC TGGTTTAC GAACC GAAACCCC GC T T GAT TAGC GCAC TGGAATTTAAAGA
IC GTCTTGTICAACACGCGC=GTGCAACATCGTTGCGCCAATC-3TTGAAGCAGGTTTGCTIC
CA TATACATAC GC GT GTCGTCCGGATAAGGGGACCCAT GCAGGAGTGTG TCAT GIT CAGGCA
CAATTGCGTCGCACGCGCGCGACTCATTTT TTGAAGAGTGACT TTAGCAAGTTCTTCCCAAC
CATTGACCGTGCCGCTCTTTATGCGATGAT TGATAAAAAAATCCACTGCGCCGCTACACGTC
GC CTTTTGCGCGTTGTCCTGCCGGATGAGGGAGTTGGAATTCCCAACGGCAGCTTAACCTCT
CAGT TATTT GCCAAC GTTTAC GGGGGCGCGGTAGAT C GTTT GT TACACGAC GAGTTAAAGCA
GC GCCACTGGGCTCGTTACATGGATGACAT TGTCGTACTTGGGGATGAT CCAGAAGAACT GC
(7C(4CGC4iC:II C' I 'A 1"I " 'GC C4T
I" " TG 1:G' I cCc4AACc4cvIAce7iiv GA AGA' " "I 'C AC A' I =
TG GCAGGTAGC GC CT GTGT CT CG TGGAATCAATT TT C T TGGTTAC CGCA TC TGGCC GACC CA

TAAATT GCT GC GTAACAGTAGTG TGAAGCG CGCTAAGC GTAAGGT GGCAAATT TCA TCAAAC
AC GGAGAAGAC GAGT CACT GCAACGC TT CC TT GCCTC GTGGTC GGGTCACGCC CAA TGGGCG
GACACT CACAATT TAT TCAC7 TGGAT GGAGGACCAATATGGCA7C GCGT GT CATTAA
Bordetella BPP- 11 TTAT TT TCCGGCT TGICCT T7AC GGT
TCACACGAGCAATCCAGGACCCTAAGATAC GGCCAA
I phage Avd CC TC CGCGATAAGCAC CTGAGCAGTC TC GACC TGGT
GC GGAGT CATGGCAT GC GGC TT TT GA
AT GCCT GCCAAAAAGC GAAGCCAGAAGC GTAACATC GCCAAACCC GCGT CAGCAGCATACAG
gene* TT TC GAGACCT GGTT T GAT T=AC
CTGCTACAATGAACAGTTCCACCTGT CCCAACAGACAT
TCAAAAACATT TC GC GTGC GACT CCATGTT TACGAGGGATTGACTGAGCAATGGGATACAAC
TA TGAGATGAC GC GT T CATAGCGCTC TACAAT CAACAT CTGAT CGTAACAC TT GCTAGCC T C
TT CAATAGGTT C CAT
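The Avd gene above is listed on the strand that begins with TTA and ends with CAT, so the coding sequence is obtained by taking the reverse complement. The sketch below illustrates that step with the standard genetic code; the helper names are hypothetical, and the input is only the last 18 nucleotides of the listed sequence with the extraction spaces removed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T sequence (spaces ignored)."""
    return dna.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the first base, stopping at a stop codon."""
    seq = dna.replace(" ", "").upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Last 18 nt of the listed Avd gene (SEQ ID NO: 11), spaces removed.
tail = "CTCTTCAATAGGTTCCAT"
coding_start = reverse_complement(tail)
print(coding_start)             # begins with the ATG start codon
print(translate(coding_start))  # first residues of the Avd protein
```

The translated residues can be compared with the start of the Avd protein sequence (SEQ ID NO: 5) as a consistency check.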
Xy1S (modified ) 2 TC AA CC CAC TT CC TT T TTCCA TT C;AC GC
TGTC.C,C4AAGCC AA CTCC CC C;A AC".C;CC;CT
to remove Bs aI TATAGT TTT CAGC GAAGCGTCCCAAATGTAAGAAGCC
GTAGTC TAGGGC TATC TCAGT TATA
CTACGCACATTGGCACTGGGATC GTT CAAG CAGGCGC GGAT GC IT TCGAGC TT GCG GT TGC
gene
GATGTAGTTCTTCGGCGTGG7GCCGGCGTGCTTCTCGAACAAATTGTAGAGCGAGCGTGGAC
restriction
TCATCATCGCCAGCTCCGCTAACCGCTCAAGGCTGATATTCCGTITGAGATTCTOCTCAATG
site)* AATT GAACGAC TC GC T CGAAAGACGGGT TACO TT
TGC T GAAAATT TCAC GGCTGACATTGC
CC CCAGCATTTCGAGCAGCT7GGAAGCGAT GATCCCCGCATAGTGCTCT TGGACCCGAGGCA
TO GACT TTGTATGTT C CCCTT CG TCACAAACTAACC C GAGTAGAT T CAT AA AGCCA TC GACT
TGCTGGAGATTGTGTCGCGCGGC GAAACGGATACCCTCCCTCGGCTTGT GCCAATTGTTGTC
AC TGCACGC CC GATCAAGGAC CACTGAGGG CAAT TTAACGATAAATTTC TC GCAAT CT TC T C
AATAGGTCAGGTCGGCTTGG7CATCCGGAT TGAGCAGCAATAGTTCGCCCGGCGCAAAATAG
TO CT CC TGGCCAT GGCCAC GCCACAGGCAATGGCCT T T GAGTATTATTT GCAGATGATAACA
GG TT TC TAATCCAGGC GAGA= TACCC TCAC GC TACC GCCGTAGCT GATT CGACACAGATC GA
CCCATCCGAAGATTCTGTGG7GCAGCCTGCCTGCCGGGCGCCCGCCCTTGGGCAGGCGAATA
CACTCCCTACCCACATACTCGTTAACATAATCCCACACTCCATACCCCTCCCCCTCCACCAA
GA TC T GACT IT TO TC ITT CAATAAGCAAAAAT C CAT
ph1F promoter* 13 AT GG CACGTAC CC CGT CAC G=AG TAG CAT T GS
TACO C T SC TACT CCGCATACCCATAAAGC
AATT CT GAC CAGTAC CATC GAGATCC TGAAAGAATGT GGTTATAGCGGACT GAGCA TT GAAA
GC GT TGCAC GT CGTGCCGGAGCAAGCAAAC CGACCAT T TAT CG7T GGTGGACGAATAAAGCA
GCAC TGATT GCCGAAGTGTAT GAAAATGAAAGCGAACAGGT GC GTAAAT TT CC GGA IC TGGS
'1 AGM"! AAAGCAGATCTGGA'l 'I"1"I AA'I"1"I
'1"IT;GCC; MAAACI 4\
TT TGCGGTGAAGCATTTCGT7GT GTTATTGCAGAAGCTCAGCTGGATCC TGCAACCCTGACC
CA GT TAAAGGATCAAT TTATGGAACGTC GT CGTGAGATGCCGAAAAAAC TGGT TGAAAAT GC
CA T TAC CAATC CT CAACTC CC GAAACATAC CAAT CG T CAAC TT CT TCTC CATATCA TT TT
TC
GT TT T T GTT GGTATC GCCT G7 TAACC GAACAGCT GAC C GTT GAACAGGA TATT GAAGRAT T
AC CT TC OTT CT GATTAATGG7GT TTGTCCGGGTACTCAGCGT
cspRecT gene* 14 AT GAACCAAAT CCTGAAGT 'ICAO TGACCAO TC
TCGCCTC;GC GG7T CAAG TTAC TCCAGAC GA
TG TT CGCCGTTATAT C TGT GAGAACGCTAC TGAAAAAGAGGTGGGCCTC TT TC TGCAACT C
GT CAGACTCAACGTCT CAATCCG TTT GT GAAAGACGC T TACCT GGTGAAATAC GGC GGTGC
CCAGCT TCTAT GATTACTT CC TATCAAGTT TT TAACC GTCGCGCGTGTC GT GATGC TAAC TA
TCAT GGTAT CAAATC T GGT =GC TTGTT CT GC GT GAC GGTGAT GT TGTGCATAAAC GT GGT C
CT GC GT GCTACAAAAAGGC GGGT GAGGAGC TCAT CGGT SGT TGGCOGGAAGTT CSC TT TAAC
GATGGCCGCGAGACTOCGTATGCTGAGGTGGCGCTCGACGACTATTCCACCGGCAAATCTAA
TTGGGCGAAAATGCCOGGTGTTATGATCGAAAAATGCGCGAAGGCTGCTGCTTGGCGCCTCG
CGTTCCCGGACACTTTTCAGGGCATGTACGCTGCGGAGGAAATGGATCAAGCGCAACAGCCA
GAACAGGTGCGCGCTCAGGCGGAGCAACCAGTGGATCTCCAGCCAATCCGCGAACTCTTCAA
GCCATATTGCGAACACTTCGGCATCACTCCGGCTGAGGGTATGACTGCTGTTTGIGGTGCGG
TGGGCGCTGAAGGCAIGCAC:CIATGACCGAGCAGCAAGCTCGCCGIGCTCGCGCTIGGAIG
GAGGAAGAAATGGCTGCGCCAGCTGTGGAAGCGGAGTATGAGGTTGTTGACGAGGGCGAGGT
GTTTTAA
mutL gene* 15
ATGCCAATTCAGGTCTTACCGCCACAACTGGCGAACCAGATTGCCGCAGGTGAGGTGGTCGA
(E32K* GCGACCTGCGTCGGTAGTCAAAGAACTAGTGAAAAACAGCCTCGATGCAGGTGCGACGCGTA
TCGATATTGATATCGAACGCGGTGGGGCGAAACTTATCCGCATTCGTGATAACGGCTGCGGT
ATCAAAAAAGATGAGCTGGCGCTGGCGCTGGCTCGTCATGCCACCAGTAAAATCGCCTCTCT
GGACGATCTCGAAGCCATTATCAGCCTGGGCTTTCGCGGTGAGGCGCTGGCGAGTATCAGTT
CGGTTTCCCGCCTGACGCTCACTTCACGCACCGCAGAACAGCAGGAAGCCTGGCAGGCCTAT
CG AAGGGCGCC; A' i'ATGA ACGT GACC;GTA AA ACCGGC:;;GCGC ATCC: TGGGGAC:GACGC:
GGAGGTGCTGGATCTOTTCTACAACACCCCGGCGCGGCGMAATTCCTGCGCACCGAGAAAA
CCGAATTTAACCACATTGATGAGATCATCCGCCGCATTGCGCTGGCGCGTTTCGACGTCACG
ATCAACCTGTCGCATAACGGTAAAATTGTGCGTCAGTACCGCGCAGTGCCGGAAGGCGGGCA
AAAAGAACGGCGCTTAGGCGCGATTTGCGGCACCGCTTTTCTTGAACAAGCGCTGGCGATTG
AATGGCAACACGGCGATCTCACGCTACGCGGCTGGGTGGCCGA7CCAAATCACACCACGCCC
GC: AC GC A A A A T* AG'l A'
TCACGCGATCCGCCAGGCCTGCGAAGACAAACTGGGGGCCGATCAGCAACCGGCATTTGTGT
TGTATCTGGAGATCGACCCACATCAGGTGGACGTCAACGTGCACCCCGCCAAACACGAAGTG
CGTTTCCATCAGTCGCGTCTGGTGCATGATTTTATCTATCAGGGCGTGCTGAGCGTGCTACA
ACAGCAACTGGAAACGCCGCTACCGCTGGACGATGAACCCCAACCTGCACCGCGTTCCATTC
CGGAAAACCGCGTGGCGGCGGGGCGCAATCACTTTGCAGAACCGGCAGCTCGTGAGCCGGTA
GCTCCGCGCTACACTCCTGCGCCAGCATCAGGCAGTCGTCCGGCTGCCCCCTGGCCGAATGC
GCAGCCAGGCTACCAGAAACAGCAAGGTGAAGTGTATCGCCAGCTTTTGCAAACGCCCGCGC
CGATGCAAAAATTAAAAGCGCCGGAACCGCAGGAACCTGCACTTGCGGCGAACAGTCAGAGT
TTTCCTCCCCTACTCACTATCGTCCATTCCCACTCTCCOTTCCTCCACCCCCACCCCAACAT
TTCACTTTTATCCTTGCCAGTGGCAGAACGTTGGCTGCGTCAGGCACAATTGACGCCGGGTG
AAGCGCCCGTTTGCGCCCAGCCGCTGCTGATTCCGTTGCGGCTAAAAGTTTCTGCCGAAGAA
AAATCGGCATTAGAAAAAGCGCAGTCTGCCCTGGCGGAATTGGGTATTGATTTCCAGTCAGA
TGCACAGCATGTGACCATCAGGGCAGTGCCTTTACCCTTACGCCAACAP_AATTTACAAATCT
TGATTCCTGAACTGATAGGCTACCTGGCGAAGCAGTCCGTATTCGAACCTGGCAATATTGCG
CAGEGGATTGCACGAANECTGATGAGCGAACMGCGCAGTGGTCANEGGCACAGGCCATAAC
CC TGCTGGCGGACGTGGAACGGE TATGTCC GCAACTTGIGAAAACGCCG CCGGGIGGTCTGI
TACANECTGITGATTEACATCCGGCGATAAAAGCCCTGAAAGN2GAGTGA
Engineered DGR
16 AAGGGCAGGCTGGGAAATAACGAGACCTGAATTGCGCGCAATTAACCCTCACTAAAGGGAAC
Spacer RNA+ccdB
AAAAGCTGGAGCTCTTATATTCCCCAGAACATCAGGTTAATGGCGTTTTTGATGTCATTTTC
GCGGTGGCTGAGATCAGCCACTTCTTCCCCGATAACGGACACCGGCACACTGGCCATATCGG
TGGTCATCATGCGCCAGCTTTCATCCCCGATATGCACCACCGGGTAAAGTTCACGGGAGACT
TTATCTGACAGCAGACGTGCACTGGCCAGGGGGATCACCATCCGTCGCCCGGGCGTGTCAAT
AATATCACTCTGTACATCCACAAACAGACGATAACGGCTCTCTCTTTTATAGGTGTAAACCT
TAAACTGCATCGTTTCACTCCATCCAAAAAAACGGGTATGGAGAAACAGTAGAGAGTTGCGA
TAAAAAGCGTCAGGTAGGATCCGCTGGTCTCATCTGTGCCCATCACCTTCTTGCATGGCTCT
GCCAACGCTACGGCTTGGCGGGCTGGCCTTTCCTCAATAGGTGGTCAGCCGGTTCTGTCCTG
CTTCGGCGAACACGTIACACGGTTCGGCAAAACGTCGATTACTGAAAATGGAAAGGCGGGGC
CGACTTC
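The engineered spacer RNA above carries a BsaI/ccdB cassette for Golden Gate assembly of the TR. As an illustrative check, the sketch below scans a sequence for the BsaI recognition site GGTCTC on either strand. The function name is hypothetical, and the two fragments are short excerpts of SEQ ID NO: 16 as listed above.

```python
BSAI_SITE = "GGTCTC"     # BsaI recognition sequence, top strand
BSAI_SITE_RC = "GAGACC"  # the same site read on the opposite strand


def bsai_positions(dna):
    """Return 0-based start positions of BsaI sites on either strand."""
    seq = dna.replace(" ", "").upper()
    return [
        i
        for i in range(len(seq) - len(BSAI_SITE) + 1)
        if seq[i:i + len(BSAI_SITE)] in (BSAI_SITE, BSAI_SITE_RC)
    ]


# Excerpts from the start and the end of the cassette in SEQ ID NO: 16.
upstream = "GGGAAATAACGAGACCTGAATTGCGC"
downstream = "GGATCCGCTGGTCTCATCTGTGCC"

print(bsai_positions(upstream))
print(bsai_positions(downstream))
```

The same scan, run over a full plasmid sequence, would show whether any BsaI site remains outside the cassette, which is the reason the XylS gene in this table is described as modified to remove its BsaI site.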
pRL014
17 GAAGATCATCTTATTAATCAGATAAAATATTTCTAGATTTCAG7GCAATTTATCICTTCAAA
(4421 bp) TG TAGC ACgat. t. r. t. cggor. gntnagt. cc
t. ggt. ca tgot. gnga toattaaagagga
ga a a gg= a ctATGGCACGTACCCCGTCACGTACTACCATTGGTAGCCTGCGTAGTCCGCATA
CCCATAAAGCAATTCTGACCAGTACCATCGAGATCCTGAAAGAATGTGGTTATAGCGGACTG
AGCATTGAAAGCGTTGCACGTCGTGCCGGAGCAAGCAAACCGACCATTTATCGTTGGTGGAC
GAATAAAGCAGCACTGATTGCCGAAGTGTATGAAAATGAAAGCGAACAGGTGCGTAAATTTC
CGGATCTGGGTAGCTITAAAGCAGATCTGGATTTTTTACTGCG7AATTTATGGAAAGTTTGG
CGTGAAACTATTTGCGGTGAAGCATTTCGTTGTGTTATTGCAGAAGCTCAGCTGGATCCTGC
AACCCTGACCCAGTTAAAGGATCAATTTATGGAACGTCGTCGTGAGATGCCGAAAAAACTGG
TTGAAAATGCCATTAGCAATGGTGAACTGCCGAAAGATACCAATCGTGAACTTCTTCTGGAT
ATGATTTTTGGTTTTTGTTGGTATCGCCTGTTAACCGAACAGCTGACCGTTGAACAGGATAT
TGAAGAATTTACCTTCCTTCTGATTAATGGTGTTTGTCCGGGTACTCAGCGTTAACTAGGCC
ATAATCGCTACCAAATTCCAGAMACAGACGCTTTCGAGCGTCTTTTTTCGTTTTGGTCACG
AC GTACTGAATCTGATTCGTTAC CAATTGACATGATACGAAACGTACCG TATCGTTAAGGTG
GAGGCATATCAAAGGACGAGTGCAGGTGGCAAAAATGGGAAAGCGCCATCGTAACCTTATTG
AC CAAATCACCACCTGGGAAAAT TTGTTAGATGCGTACCGTAAGACTAGCCACGGTAAGCGC
CGCACTTGGGGGTATCTTGAATT TAAGGAGTACGATTTGGCCAATTTGCTTGCACTTCAGGC
GGAGTTGAAGGCAGGCAATTACGAGCGCGGACCGTATCGCGAGTTTCTGGTTTACGAACCGA
AACCCCGCTTGATTAGCGCACTGGAATTTAAAGATCGTCTTGTTCAACACGCGCTGTGCAAC
AT CCTTGCGCCAATCTTTGAAGCACGTTTG CTTCCATATACATACCCCT GTCCTCCGGATAA
GG GGACCCATGCAGGAGTGTGTCATGTTCAGGCAGAATTGCGTCGCACG CGCGCGACTCATT
TT TTGAAGAGTGACTTTAGCAAGTTCTTCCCAAGCATTGACCGTGCCGCTCTTTATGCGATG
AT TGATAAAAAAATCCACTGCGCCGCTACACGTCGCCTTTTGCGCGTTGTCCTGCCGGATGA
GG GAGTTGGAATTCCCATTGGCAGCTTAAC CTCTCAGTTATTTGCCAAC GTTTACGGGGGCG
CGGTAGATCGTTTGTTACACGACGAGTTAAAGCAGCGCCACTGGGCTCGTTACATGGATGAC
AT TCTCGTACTTCGGCATGATCCACAAGAACTCCGCGCCGTCTTCTATCGTTTGCGTGATTT
TGCGTCCGAACGCTTAGGTTTGAAGATTTCACATTGGCAGGTAGCGCCTGTGTCTCGTGGAA
TCAATTTTCTTGGTTACCGCATC TGGCCGACCCATAAATTGCTGCGTAAGAGTAGTGTGAAG
CG CGCTAAGCGTAAGGTGGCAAATTTCATCAAACACGGAGAAGACGAGT CACTGCAACGCTT
CC TTGCCTCGTGGTCGGGTCACG CCCAATG GGCGGACACTCACAATTTATTCACTTGGATGG
AG GAGCAATATGGCATCGCGTGT CATTAATAACGTTAAAGTCAGTTTCACCTGTTTTACGTT
AflAACCCGCTTCGGCGGGTTTTTACTTTTG Gt tt AGCCGAACGCCCCAMAAGCCTCGCTTT
CAGCACCTGTCGTTTCCTTTCTT TTCAGAGGGTATTTTAAATAAAAACATTAAGTTATGACG
AAGAAGAACGGAAACGCCTTAAACCGGAAAATTTTCATAAATAGCGAAAACCCGCGAGGTCG
CC GCCCCGTAACCTGTCGGA*_.CACCGGAAAGGACCCGTAAAGIGATAAT GATTATCATCTAC
ATATcAcAAcc-ri-c-x:GTAAA(--;GGAcragrggarc-rrrx4A-rci-cAAAAAAAGcAccrrArrrit:
CGGCTTGTCCTTTACGGTTCACACGAGCAATCCAGGACCCTAAGATACGGCCAACCTCCGCG
ATAAGCACCTGAGCAGTCTCGACCTGGTGCGGAGTCATGGCATGCGGCT TT TGAATGCCTGC
CAAAAAGCGAAGCCAGAAGCGTAACATCGCCAAACCCGCGTCAGCAGCATACAGITTCGAGA
CC TGGT TTGAT TTACCTGCTACAATGAACAGT TCCACCTGTCCCAACAGACAT TTCAAAAAC
AT TTCGCGTGCGACTCCATG7TTACGAGGGATTGACTGAGCAATGGGATACAAGTATGAGAT
GACGCGTTCATAGCGCTCTACAATCAACAT CTGATCGTAACACTTGGTAGCCTCITCAATAG
GTTCCATagaaact t t ct cct ct tt aat aCTAGTat t at acct aggact gagctag ct gt ca

gTCGGGTAGCACCAGAAGTCTATAGCATGt gc at aCCTTTGGTCGAAAAAAAAAGCCCGCAC
TCTCACCTGCCGCCTTTTTTCaCTCTTTCCt t gccggaTTACGCCCCCCCCTOCCACTCATC
GCAGTATTGTTGTAATTCATTAAGCATTCT GCCGACATGGAAGCCATCACAAACGGCATGAT
GAACTTGGATCGCCAGTGGCATTAACACCT TGTCGCCT TGCGTATAATA TT TTCCCATAGTG
AAAACGGGGGCGAAGAAGT TGTC CATAT TT GCTACGTTTAAATCAAAAC TGGTGAAACTCAC
CCACGGATTGGCACTGACGAAAAACATATT TTCGATAAACCCTTTAGGGAAATATGCTAAGT
TT TCACCGTAACACGCCACATCT TGACTATATATGTGTAGAAACTGCCGGAAATCGTCGTGG
TATTCTGACCACACCCATCAAAACCTTTCACTITCCTCATCCAAAACCGTGTAACAAGGCTC
AACACTATCCCATATCACCACCT CACCGTC TTICATTGCCATACCAAAC TCCGCATCTCCAT
TCATCAGGCCCGCAACAATCTGAATAAAGGCCCCATAAAACTTGTCCTTATTTTTCTTTACC
GT TTTTAAAAAGGCCGTAATATCCAGCTGAACGGTTTGGTTATAGGTGCACTGAGCAACTGA
CT GGAATGCCTCAAAATGTTCTT TACGATGCCATTGACTTATATCAACT GTAGTATATCCAG
TGAT TT TTTTCTCCAT TTTAGCT TCCTTAG CT TGCGAAATCTCGATAAC TCAAAAAATAGTA
GT GATCTTATTTCATTATGGTGAAAGTTGT CT TACGTGCAACATT TTCG CAAAAAGTTGGCG
CT TTATCAACACTGTCCCTCCTGTTCAGCTACCGGCCAGCCTCGCAGAGCAGGATTCCCGTT
GAGCACCGCCAGGTGCGAATAAGGGACAGT GAAGAAGGAACACCCGCTCGCGGGTGGGCCTA
CT TCACCTATCCTGCCCGGC7GACGCCGTT GGATACACC.AAGGAAAGTC TACACGAACCCTT
TGGCAAAATCCTGTATATCC-;''GCGAAAAAGGAlGGATATACCGAAPAAATCGCTATAATGAC:
CCCGAAGCAGGGTTATGCAGCGGAAAAGCGCTGGTACCCAATTCGCCCTATAGTGAGTCTCC
TGGAAGTGAGAGGGCCGCGGCAAAGCCGTT TTICCATAGGCTCCGCCCCCCTGACAAGCATC
ACGAAA T CT C=ACC;C: TC: A A A TC: TMITGGC. GA A A CC:t. GACAC;(1 AC:T AT A
AGA TAC:C:AGGC.C;
TT TCCC:C:CT C=GCC;GC:T CCC:TC:GT GCGC:T C:::T C;T T C:C:T GC:CT C:GGT TT
AC:CSGTGTC
TCCGCTGTTATGC;CC:SCGTT^GTCTC.ATTCCAC.C;CC:TGACACTC:AGTTC.CGGGTAGGC:AGTT
C:GCTCC:AAGC.TGC;AC:TGTATGC:ACGAAC:CCC:C:C.C;TTCASTC:CGAC:CGCTGC:GCCTTATCC:GC;

TAACTATCGTCTTGAGTCCAACCCGGAAAGACATGCAAAAGCACCACTGGCAGCAGCCACTG
GTAATTGATTTAGAGGAGTTAGT CTTGAAG TCATGCGCCGGTTAAGGCTAAACTGAAAGGAC
AAGTTTTCGTCACTGCGCTCCTCCAACCCAGTTACCTCCGTTCAAAGAGTTGCTAGCTCAGA
GAACCTTCGAAAAACCGCCCTGCAAGGCGGTTTTTTCGTTTTCAGAGCAAGAGATTACGCGC
AGACCAAAACGATCTCAA
pRL021-ccdB 18
GAAGATCCTITGATCTTTTCTACGGGGTCT GACGCTCAGTGGAACGAAAACTCACGTTAAGG
GATTTTGGTCATGACTAGTGCTT GGAGAAGGCCATCCTGACGGATGGCC TTTTat gcct t t a
gobeobboovebeobqgweagevabobbbbobbobbgboboovevebbooggeooglbabo
zob400ve0000eebgeboebbgabooe4obooboeeebb weeobeoeeoe gob 4bobeb
lob4bobbbeo7e104e414.4ebleoblbblo7bobolbeole00444.bobqbeeboeoeee
cob000peoblboeeo4boebblEbeoleor000eboqebebblolelb4161b4z4eobbo
cevabeogeboobbbbbwee9owbeebob400bbeoobo3leboboeogeeogebqoobo4
obobobwb4robogbborrbgborlob4.4.04.broggrurbrobbwrob000boroororo
geevoogeboobbgbbbqobbobovloboeogo4ebobboeoevobb4eebggebobbgabo
beeoee643._ a 34oboovobbof:444ebobobbe4 qobobboeebeeeeeobbbobbeebb
L.:ob.Lbe.2e- f, leeee pub .2 16 OUVE-'01 p6upu4boebo
44
t.. . ebeb . . bõõ,, -bebozm=Db
1' - - 6f) 466 ebb e6L.,ebbbf,
4fa 01,0b.60 550:-/ V V V ,.' b -- L -- 4 ebobobbbeeboob
4E,4.,..)6beObb 4
cobeebbeobeoeebeobooeoboeo 000epooboeb000b000 O.O obbo obeo leobelo
bboobobbebobbob000 gobbb000beogeo geopbeebo oeboebboog000pbooee
be _Lb vooeoub 4e...).1bu _Lobe, 4o5obb 4o6obb 4obeE,4 &bee ee eu e 4bbob
_obbo ee
4eb4bo4geoboo4e4.4oveebo5abblbboboeeboge4eb44e4eboqvgboboebobqb
beobleboloobeopeweeblbelovebeeeo 16e4Bb3 lb3blo3eb3befolb64bbebi.
bbeoboob44ebeooeebobbweeoeoobooe44o4bbeo44.eeoob4eoeoe4eebebbe
oo 4ebbee444464bbebobbbeborb44.644bbeb4e4bebbobeebbqb4obeoobob 4o
564evebeebbebbqebb44obobo4ob4.600bo4obeeobeobebooeb4e4o4oeob4eo
bbeeb4obobbb4bbob4b6464441,4ob4oeb4e4bbbeb4obboo4oeo 4eobbo; ooeo
eefob oeoeo obeeo oo oosebaboo 4.evo obeoo go gebbobeooeeobebbobbea
bobobobbroref000broerobobreooebblerrBEoBbobloboelbotobbf000l
otoubb000gobobogoobobbogobooboobberbobobotorrebogeboegobobbboo
boverebobbbooev000rtoobbootoogougorborb000bobbobbebooboroboboo
ebeboboobboebbeeogoobogobeebbobbboobbobboge000bebbebobbbobbeee
eeoe oobobob oobobboboevegeoboboob oebobboebobob0000boobbobobb000
eveo obboeboeooeeloboebobooboboboboobooeeOZOOObeeo oe000 OW?. Ol
eblelollobpoolo646bobbowleepB166l3oelqob3pbpweb4.6414booepbloi.
boas234312bao4b4o4ozreob4.34443400bbb4bbebavavab4.3124oboaebab4.643432
qt.44600bo44.64eb3rbr33gor44bruo44bbobbwobb4.34.orb3ub4orozgbrr..b4
foleerooreborooegegeebebbeebeoloreboboboobbboboobbr000f,eooebi.
vo obebbleelee oveoeeebeoweobovo olobbel l0000e 000bbeeobe go :o ow oob
bleeeeeeobobebbeoeoebbobeebeeobo 000beoo obeeoebbbeleb011b500161
beob obeeeleobeopeeoeobeoobeobor0000blee opeep00000bboo leeeoeb
oobqoobbovebeobevezeoo4.-ebloo 44.44.44.beqeevoevbeebqoebqeooegbeo
vebeoqeoeqbbeobeovoPOvewebbeoow4E4bbobbOPE004e4obebbobe=bboovb
ceeooe64o4eb4lobe4leebbb4.-eobqeooe04411b00000bo44.04.4oeeoebb4e4e
eoobb4.6oveel4leb4444beoovo4.446eblbbbl0004veoobeololbo44=44ble4
eebeb.1.-LU1 Abbbeee _Lou:" -1-1.1U.LoobbaLweeeebAbboe.Lab abobfr.1.6
Aebeeob:..)4
4eae aeoeoeaoaaabeobbooaaaeboeboeooeueeb 1bebbao 1oboaeo 44 zuboeee
Eloppeobebleoo4141booeop1464g000eo4461belebbblegebqbblobeb16bo
ebee-ebqweobbqeqbov44e94.boo4eowbleebwbqoob000b4lolqeoeo4.4-e44.4
cobbooqe4.4.4.4beeoeobeeqeeevebeevq.booebeee444.44.00bboe44.e4ebb4ob4
cqebbbbeveq-ebbobbqleeboebobbb64vobbbboqb4ob4bo44bevbeoovowebvb
..-;beebee44444o44boebeebovebeobevooboee4eeeeeobevoo 33 qqoepoef)4be
cqeeob444440oleoobob44boeoob44e00000eqqbbboeqe4bob7e64boebo4o4
eobleeeeblebogbeeoobobbewebegb4eblbgbbllo6gobbblbgerglgzelgo go
44.bbovoqq.belvooqeepeeobeeleeo4.4bo4o4.4.4.4.oeb4.34.ebeeboebb4bobbo qo
bbbr lrob4orbrbboqur4rorr4.4bbwrquorboor 4bob4brbelurbobbrobbboo
coob000bobbboob000b000beobobbob000lebeeb000vobbebolebeovoebo 44
efq.obegboobopegobovo q000el4ebebobbeooqee4o41.4.bbeoeeqeb4ebeob 44
le44e4beb447oobbleeobbeoeooboeoobb7eoobbloo4oblbeqeeeeobobb000
bollbeleeobeobeb4lebboo4eolbbl4obbogbbeol6begeeblolgoleeobo434
44.ve-eqebove44.4.evobbbebwvooebbevo4eb000bovobweolb4464.4veoobqlo
44.obbowoowoorwbborcrbobbabobolb484.4.rforbb4ob44buboquoobrrow
b4.4.ebeqbeb000eeweveasoqba4lobool4b4e4b444oeboqeobbeb000ebb44.o
4obqbeqeob0000qebTebobeebb4lobeobebo4.4.4.eobe000b4oblqeoebqobboe
c444eeeeb4ob44400e74.bbboebeee6o4oboqoeboeeb44eeb7eeoqoozo74ebe
6444bool4e4eb4obbeeo4oboove7oboo7obeooboleoqeo4oebb46obebobebe
lbllepeopebologlobqbobbooblbblbobbogqol lbegblebbobvIbboB44o6pb
..-.:44.4.ofq.ebbobobbeobeeo4.4.bolebb64oeobb4.4.voeoboewele446vo=ole4ob
bbeq.o4.6e4boobeebeelb4see000lbobeebobeoq444beqeqoaqaboboeeboabo
4oveobbeebboqbqbbbeoboebqleob4.44440044.ovoobevoqeeqbo4.4.boeve44.e (dcf
17Z6L)
VVOIDIVD9VVVVVVVDNLMO9DVIIVDVDDVDDVVDDIIIDIIIIIII5
DIDDODVIDDIODDWODVVVOVWDDDOD.WDZIDIDDVIDDIIDVDIVVVVDDOSIDOVI
I9VDO9VV9IDDIDIDSDDIDIV.199IIIVM9VDVDDVVTLIDVDVIDDDDVIDVVIDD99I
ODIOVVOIIDIIDVDVOVLDDIDEDDOWIDMVIDWDDOVDVDOVIIVDDVDVVIDDIDVDD
DVDDVDDSIOVOD DDIVILDVDDVDVDVVISO DDDVVDD IDVDIIDIDDIMIDWIDODD IV
IIDODODIODODVDOODDYDII9COODOOVVDDIZODIDI910999ICDVYDOIDDDII9D ID
UVIt9.11-LtIrdjijIVItUd.12J1jtUdji:;W:CdjiaLlijIVV.1.1jjji
:11.11JjtUJItUjjVIVtU,YJVIljtUJUIL:jjVLULLLIU.I.jj1j1LUJUIWIjjal.:X)TV
UtUjJjj:;11.124UtradjjVJ.V:YiVVIVI:Yd:)tYCJVULNYJ TV V W:.W.1.0t)VL)NeJltrd
V'JJ.0190
V.9j1VVVVVLYCILiQtAWJVLLUU,KUJWL:Ij9LANACJW.-.1.1.=VW:JW.IWW:YOUNif
VVVVIDDr.:VVDDVDDDDVVVVDDVDDDDVVVVDDVDIDIVDVVDVVVDDVDDDVVIVDDD DV
OIVVDVDYDOIVIIDDOVLVVID5D5DYVVCIDVDIDDVDIVIDDD.WDODDDDIDDDDIID
JtWtYJA.A.A.A.Vt)Jt)VV V9A0:-.M.WVDV!-;:) V V VtX9VV V9VV V VtUJA.DDJV A.VV V
V
MODOIIIIDDDODDIOOVDDODOSDDDOVIMDWOVDOOIVDIDDOZalIODIDDDOVIII0
VOIVIIVZ.IIVVIVDIVDVLDIVIVDIDIIDVDDOVDOIDDOVDVI:DOODOODOOIVWDVD
DVIDVDI:10MOIODOVVDLDDOI5IIIDII5IDIVIIMIDOZII005DDIOVDVVVDOIDVO
MODDVVV5OVVYVIWVOLVDDDVDVIDDI5ODOVIDDIVOODIVD5VOIVVIDDIVDOD OD
vmmv/spo 4 4o eboobbbbobbevebb4eeeeb4oe44ebo4boeeeeobbo4lbbo7. oe
44bo E. eebobbo 440b4004b4044bboobe 34.bb qbb e4eeo 400434o obb qobbbo bb
1136bo eloBo rr oo61313661rob413113 oro voo o6164:41volayDD/0000yyDD
VIDOVOIDOWYVVVIVOODIMOVOYWIDNOVVVOVDDIVZODDOVVVVVVV001VDOIONE
OIMIDOIVODIOVVVIMOOVVVIOIDDIDLIIIIDIDIOIDDOOVVIVDDVDVOVVVOVO OM
VOVIDIOMOVOIVIMVMVVOIDISODDDO=D DIDODIVDOVOIVDSDDDVOODDIOVOD ID
OVDVDOW;VDIDIVIIMOV9VDDSOYDIIDVVVIDD9DOVDOVDDIVIVDOODDIVOIII OD
V00909IVOIMOIDDI990IVINED099IONLViD9900V0V990VVIVD00001.1.0110VO OD
VOIVDVDMODDIDDODOILIIVOMDIVDIIMIIDOODIVVIZODVOIVOVVDVOODOIIVIV
MIDIODVEMMODVVVVOVVDDDVVVIDIDIOODVVIIVVODOODDIIVVOIODVDVD0127412
uubbb4obboobbbvu.10V.I.OVI.W.I.V.1.07V.1.00.1.0VOIDDVIDONiZrd0.1.1.1DINFOWed.1.
0.1.
=61b0 14
bbliepeoz64444 ef6vor. 4e lebopobo 4 144400 leo o 4e le 1B8D41414 wboBw4e4
4R3 1fibobo 111451poo aR0540b001141001511530onoo 1 MD no Deb0= 1R-241R
R1RE.4.414111PRO1R554:461R8101R0141Rp3nloRpp3ppb1R4R5B3olmq1E.641E.
cbb4-6.4.evvbE.34.4.4144o4bovvwweob qopbRb000bv444RE.bobbb 4006 qou,Rbb4b4
c4.4yoobbqoqybbeogy44boqebbobqqyobbbqbgboobbbqbgeb4.4yob4obb44yb
qoeeyobbeeb4ebo4.44.4ebqopee4e86400be4.4.4.4ebobboe000eb46b4bobe4eb
b000beqq4boobqobqoyeeqeob4eyebeeebbqbqbbeoeybbqbboobbqobboveqb
L:vebuebuebuuu4ebubebobuebbubbuoubbouvovebueuboobebeobobbe000bb
aboaa2bababoaebobeotsaaab000bapeeoba62bbooaaeobeaeobzouboabo
blolllblboobbqoboblybblbllyoyerebobbobylybboolylyebyebb=o616by
c444bobooyypyobbboo449bobooeopyblob4obbgbobwbgebboopoyoboggy4
444eoeee44eooeboobloboob4e444eybooeb4obbweeb4o4bob.46b4eeebgeb
coeb4b4eboeybobbgbobeobbeyeob64eoeyeb4o444b4oeebboobob4ebbooee
eobb 4e4b4o4bo 4e444yoosbobobbobebeo 4bbb4boee4eb lbobobbb4eee 4e 4o
bb4e4b4o4ebbobqebbqepeeobeoee64o4bobooqboobeqbqobeboeeebqbobeo
41elyoofieblyqqbqnbbbyeoelyelbeoeyelypellobw4bloyyypleeppybqeo
gyo4y4egyvyyy4ebyyoeob44yoe4464yb4o4o4yeeyo4o4b4b44.4eye=44-eboy
44yr lrep000 lbbbbroo4ecboblbb440444444bobbbooboob44bbo 4obourbro
44b444ebb4olepoweybeogooeblee4beoogebegeb44b400loeb644o4begoe
bobbloopeeeebbb4oeb4bo4boboobbobl4obeepb4eobbeoblooebo4bebe4o4
ebqbebqebeeebqoopbeeeeqebobbooqeoel4lebllb401ee0Y41b404bblbbbo
cbooboyeeeblbqqopeobool8Tellbboeybb4boebbobbqoblooppelepobbeoe
cbb4pyoqbb4bypbob4yoeybobybleb4oleeeboyoblgebbqbeobob44y4yyobb
goorybog4r4boo4brobrebobbloougobbelub4oreb400ggeb4logreyou444e
eeeoeeooboel4pooe44400bqbeobbbeoleopeb4b4eobeoeob4ebeo4beoo444
eb44e4bbb44eebbobbl000bwlbeobobeeeeebeq4eobbogeeeeebeeboob4o4
44beeeelobbobqqbooqlebqobqoboobepoobob444b000bobeeb46bbooboeb4
leeoeobbeo4bob4obbllboyebyobb4bepob41001elqaloeolqleDeeobbo?bo
bp 6Pb61.7.611b36464oPbo o 14Po o 16o 1P loP64oP 1666316511416P6P346Po PP
bobbob4goyob4poyybbeobooyybboobobeeey44yeyeeobgeboobob000boeye
cb4444obeoobo4e4b4beeb4bbeeobeoeeebepoegobbeoobeobobgeeboobb4o
c000b4obboolbo4beobbeogeobeophobloo4peoe4oboboo4obeghboobeb4bo
TR_AM011 | 19 | GTTTGGCGGTCTGGGTGCCTTCATACGGACGGCCCTCGCCTTCGCCTTCGATCTCGAACTCGTGACCGTT
pRL038-ccdB 20
GAAGAT CAT CT TATTAATCAGAT AAAATAT rECTAGArrICAG:GCAAT TTAT CiC CAAA
(5265 bp) TGTAGCACgat tt tacggctagctcagt cctaggtacaatgct agcgaat cat
taaagagga
gaaaggmactATGGCACGTACCCCGTCACGTAGTAGCATTGGTAGCCTGCGTAGICCGCATA
CC CATAAAGCAATTCTGACCAGTACCATCGAGATCCTGAAAGAATGTGG TTATAGCGGACTG
AGCATTGAAAGCGTTGCACG7CGIGCCGGAGCAAGCAAACCGACCATTIATCGTIGGTGGAC
GAATAAAGCAGCACTGATTGCCGAAGTGTATGAAAATGAAAGCGAACAGGTGCGTAAATTTC
CGGATCTGGGTAGCTTTAAAGCAGATCTGGATITTTTACTGCGTAATTTATGGAAAGTTTGG
CG TGAAACTATTTGCGGTGAAGCATTTCGT TGTGTTATTGCAGAAGCTCAGCTGGATCCTGC
AACCCTGACCCAGTTAAAGGATCAATTTAT GGAACGTCGTCGTGAGATGCCGAAAAAACTGG
TT GAAAATGCCATTAGCAATGGT GAACTGC CGAAAGATACCAATCGTGAACTTCTTCTGGAT
AT GATTTTTGGTTTTTGTTGGTATCGCCTG TTAACCGAACAGCTGACCG TTGAACAGGATAT
TGAAGAATTTACCTTCCTTC7GATTAATGGTGITTGTCCGGGTACTCAGCGTTAACTAGGCC
ATAATCGCTACCAAATTCCAGAAAACAGAC GCTTTCGAGCGTCTTTTTT CGTTTTGGTCACG
AC GTACTGAATCTGATTCGTTAC CAATTGACATGATACGAAACGTACCG TATCGTTAAGGTG
CACCCATATCAAACCACCACTCCACCTCCCAAAAATCCCAAACCCCCATCCTAACCTTATTC
AC CAAATCACCACCTGGGAAAAT TTGTTAGATGCGTACCGTAAGACTAGCCACGGTAAGCGC
CGCACTTGGGGGTATCTTGAATTTAAGGAGTACGATTTGGCCAATTTGCTTGCACTTCAGGC
GGAGTTGAAGGCAGGCAATTACGAGCGCGGACCGTATCGCGAGTTTCTGGTTTACGAACCGA
AACCCCCCTTCATTACCCCACTC CAATTTAAACATCCTCTTCTTCAACACCCCCTCTCCAAC
AT CCTTCCCCCAATCTTTCAACCACCTTTC CTICCATATACATACCCCT CTCCTCCCCATAA
CC CCACCCATCCACCACTCTCTCATCTTCACCCACAATTCCCTCCCACC CCCCCCACTCATT
TT TTGAAGAGTGACTTTAGCAAG TTCTTCC CAAGCATTGACCGTGCCGC TCTTTATGCGATG
AT TGATAAAAAAATCCACTGCGC CGCTACACGTCGCCTTTTGCGCGTTG TCCTGCCGGATGA
GGGAGTTGGAATTCCCATTGGCAGCTTAACCTCTCAGTTATTTGCCAACGTTTACGGGGGCG
CGGTAGATCGTTTGTTACACGAC GAGTTAAAGCAGCGCCACTGGGCTCG TTACATGGATGAC
AT TGTCGTACTTGGGGATGATCCAGAAGAACTGCGCGCGGTCTTCTATC GTTTGCGTGATTT
TGCGTCCGAACGCTTAGGTTTGAAGATTTCACATTGGCAGGTAGCGCCTGTGTCTCGTGGAA
TCAATTTTCTTGGTTACCGCATC TGGCCGACCCATAAATTGCTGCGTAAGAGTAGTGTGAAG
CGCGCTAAGCGTAAGGTGGCAAATTTCATCAAACACGGAGAAGACGAGT CACTGCAACGCTT
CC TTGCCTCGTGGTCGGGTCACGCCCAATGGGCGGACACTCACAATTTATTCACITGGATGG
AGGP.GC:AATATGC;CATCGC:C;-C;TCATTAATAAC.C;TTAAAGTC:PATTTC:PC.CTGTTTTACATT
AAAACCCGCTTCGGCGGGT=TACTTTTGGGtt tAGCCGAACGCCCCALAAAAGCCTCGCTT
TCAGCACCTGTCGTTTCCTTTCTTTTCAGAGGGTATTTTAAATAAAAACATTAAGTTATGAC
GAAGAAGAACGGAAACGCCTTAAACCGGAAAATTTTCATAAATAGCGAAAACCCGCGAGGTC
GC CGCCCCGTAACCTGTCGGATCACCGGAAAGGACCCGTAAAGTGATAATGATTATCATCTA
CATATCACAACGTGCGTAAAGGGACTagtggatGtt tGATCTCAAAAAAAGCACCTTATTTT
CC GGCTTGTCCTTTACGGTTCACACGAGCAATCCAGGACCCTAAGATAC GGCCAACCTCCGC
GATAAGCACCTGAGCAGTCTCGACCTGGTGCGGAGTCATGGCATGCGGC TTTTGAATGCCTG
CCAAAAAGCGAAGCCAGAAGCGTAACATCGCCAAACCCGCGTCAGCAGCATACAGTTTCGAG
AC CTGCTTTCATTTACCTGCTACAATCAACACTTCCACCTCTCCCAACACACATTTCAAAAA
CATTTCGCGTGCGACTCCATGTT TACGAGGGATTGACTGAGCAATGGGATACAAGTATGAGA
TGACGCGTTCATAGCGCTCTACAATCAACATCTGATCGTAACACTTGGTAGCCTCTTCAATA
GGTTCCATaciaaact tt ct cct ctttaataCTAGTatt atacctaggactgagctagctgt c
agTCGGGTAGCACCAGAAGTCTATAGCATGtgcataCCTTTGGTCGAAA_AAAAAAGCCCGCA
CTGTCAGGTGCGGGCTTTTT7CaGTGTTTCCt tgccggagaagt cggccccgcctt t ccatt
tt cagt aat cgacgt tttgccgaaccgtgt aacgtgtt cgccgaagcaggacagaa ccggct
ga ccacctatt gaggaaaggccagcccgccaagccgtagcgtt ggcagagccatgcaagaag
gt gatgggcacagaTGAGACCAGCGGATCCTACCTGACGCTTTTTATCGCAACTCTCTACTG
TT TCTCCATACCCGTITTTTTGGATGGAGT GAAACGATGCAGT7TAAGG TTTACACCTATAA
AAGAGAGAGCCGTTATCGTC: T;TTTGIT;GAT1 ACAGA;;TGATATTATTGACACC;CCMGGC
GACGGATGGTGATCCCCCTGGCCAGTGCACGTCTGCTGTCAGATAAAGTCTCCCGTGAACTT
TACCCGGTGGTGCATATCGGGGATGA.AAGCTGGCGCATGATGACCACCGATATGGCCAGTGT
GC CGGTGTCCGTTATCGGGGAAGAAGTGGC TGATCTCAGCCACCGCGAAAATGACATCAAAA
AC GCCATTAACCTGATGTTCTGGGGAATATAAGAGCTCCAGCTTTTGTT CCCTTTAGTGAGG
GT TAATTGCGCGCAATTCAGGTC TCGt t at tt cccagcctgccct tACTAt gcaaccatt at
ca ccgccagaggt aaaat t gt caacacgca cggt gt t agct caaaaat a aa caaaa gagt tt
gt agaaacgcaaaaaggccat ccgt caggatggcct t ctgctt aat t t gat gcctggcagtt
t a tggcgggcgt cct gcccgcca ccct ccgggccgt tgctt cgcaacgt t caaat ccgct cc
cggcggat t tgt cct TTACCCCC CCCCCTGCCACTCATCCCACTATTCT TCTAATTCATTAA
GCATTCTGCCGACATGGAAGCCATCACAAACGGCATGATGAACTTGGAT CGCCAGTGGCATT
AACACCTIGTCGCCTTGCGTATAATATTTTCCCATAGTGAAAACGGGGGCGAAGAAGTTOTC
CATAT TTGC TACGTT TAAATCAAAAC TGGT GAAACTCACCCACGGATTGGCACTGACGAAAA
ACATAT TT T CGATAAACCC TAGGGAAATAT GC TAAGTTT TCAC CGTAACAC GCCACAT C T
TGAC TATATAT GT GTAGAAAC TG CCGGAAATC GT CGT GGTATT CT GACCAGAGCGATGAAAA
CG TT TCAGT TT GC TCATGGAAAACGGIGTAACAAGGGT GAACACTATCC CATATCACCAGCT
CACC GI Cl riCArEGC CATAC GAAAC IC CG GA1GIGCATTCAT CAGGCG GGCAAGAAT GT GA
ATAAAGGCC GGATAAAACT TGTG CTTAT TT TT CT TTAC GGT TT TTAAAAAGGC CGTAATAT C
CAGC TGAAC GGTT TGOTTATAGG TGCAC TGAGCAAC T GACT GGAATGCC TCAAAAT GT TC TT
TACCAT CCCAT TCAC T TATA: CAACT CTAC TATATCCACTCAT TT TT TT CT CCATT TTACCT
TO CT TAGCT TGCGAAATCT CGATAAC TCAAAAAATAGTAGT GATO TTAT TT CATTATGGT GA
AA GT TGTCT TACGTGCAACAT TT TCGCAAAAAGT TGGC GCT TTAT CAACAC TGTCCCT CC T G
TT CAGC TAC CGGC CAGCCT CGCAGAGCAGGAT IC CC GT TGAGCAC CGCCAGGT GCGAATAAG
GGACAGTGAAGAAGGAACACC CG CTC GC GC GT GGGC C TACT TCAC CTAT CC TGCCC GGCT GA
CG CC GT TGGATACAC C,AAGGAAAGTC TACACGAACC C T TTGGCAA AATC CT GTATA TC GT CO
CAAAAACCATC CATATACC CAAAAAATC CC TATAATCACCCCCAACCAC CC TTATC CACC C C
AAAAGC GCT GGTACC CAAT TC GC CCTATAGTGAGTCTCCTGGAAGTGAGAGGGCCGCGGCAA
AG CC GT TTT IC CATAGGCT CC GC CCC CC TGACAAGCAT CAC GAAATCTGAC GC TCAAATCAG
TGGT GGCGAAACC t GACAGGACTATAAAGATACCAGGC GTT TCCCCCTGGC GGCTCCC TC GT
CC GC TC TCC TGTT CC T GCC TC GGTTTAC CGGT GT CATTCCGCT GTTA TGGCCGC GT TT GT
CT CATT CCACGCC TGACAC TCAG TTCCGGG TAGGCAGT TCGCT CCAAGC TGGACTGTATGCA
CGAACCCCCCGTTCAGTCCGACC GCT GC GC CT TATCC GGTAAC TATCGT CT TGAGT CCAACC
CG GAAAGACAT GCAAAAGCAC CACTGGCAG CAGC CAC T GGTAATT GATT TAGAGGAGTTAGT
CT TGAAGTCAT GC GC C GGT TAAG GCTAAAC TGAAAGGACAAGT TGGT GACT GCG CT CC T C
CAAGCCAG1 IACC1CGGJICAAAGAG 1 GG1AGC1CAGAGAACC1 ZCGAAAAACCG CC C.: GL_:
,ACSGC:c60I 1 1 1 1 IT:(-il I 1 I C:AC,AGC:AACA(-:A I I
ACC7C:(7L:AC6ACCAAAACGATC: I L:AA
TR_AM001 | 21 | cgctgctgcgctattcggcggcaactggaacaacacgtcgaactcgggttctcgcgctgagaactggaacaacgggccgtcgaactcgaacgcgaacatcggggcgcgcggcg
TR_AM004 | 22 | ttatatggcttttggttcgtttctttcgcaaacgcttgag
TR_AM007 | 23 | tgccgtatgtttccttatatggcttttggttcgtttctttcgcaaacgcttgagttgcgcctcctgccagcagtgcggtagtaaaggttaatactgttgc
TR AMO 09 24 tttgtggcetttatettetaL:gtdgtgaggdt L-
tdgL:gtatgyttgtegoetgagetgt
agtt gcct
TR_AMO 10 25
cgtgatagtttgL:gdL:dgtgL:cgtL:dgL:gttttgtaatggL:LatgtLL:cdddL:gtcL:dgg
cc tttt go
AM001 26
ATAACGCTGCTGCGCTATTCGGCGGCAACTGGAACAACACGTCGAACTCGGGTTCTCGCGC
AM002 27
CGCAGCGCGAGAACCCGAGTTCGACGTGTTGTTCCAGTTGCCGCCGAATAGCGCAGCAGCG
AM030 28
TGCGAACTGGAACAACGGGCCGTCGAACTCGAACGCGAACATCGGGGCGCGCGGCG
AM031 29
CAGACGCCGCGCGCCCCGATGTTCGCGTTCGAGTTCGACGGCCCGTTGTTCCAGTT
AM007 30
ATAATTATATGGCTTTTGGTTCGTTTCTTTCGCAAACGCTTGAG
AM008 31
CAGACTCAAGCGTTTGCGAAAGAAACGAACCAAAAGCCATATAA
AM017 32
ATAATGCCGTATGTTTCCTTATATGGCTTTTGGTTCGTTTCTTTCGCAAACGCT
AM018 33
CTCAAGCGTTTGCGAAAGAAACGAACCAAAAGCCATATAAGGAAACATACGGCA
AM019 34
TGAGTTGCGCCTCCTGCCAGCAGTGCGGTAGTAAAGGTTAATACTGTTGC
AM020 35
CAGAGCAACAGTATTAACCTTTACTACCGCACTGCTGGCAGGAGGCGCAA
AM021 36
ATAATTTGTGGCCTTTATCTTCTACGTAGTGAGGATCTCTCAGCGTATGGTTGTCGCCTGAG
CTGTAGTTGCCT
AM022 37
CAGAAGGCAACTACAGCTCAGGCGACAACCATACGCTGAGAGATCCTCACTACGTAGAAGAT
AAAGGCCACAAA
AM024 38
ATAACGTGATAGTTTGCGACAGTGCCGTCAGCGTTTTGTAATGGCCAGCTGTCCCAAACGTC
CAGGCCTTTTGC
AM025 39
CAGAGCAAAAGGCCTGGACGTTTGGGACAGCTGGCCATTACAAAACGCTGACGGCACTGTCG
CAAACTATCACG
AM027 40
ATAAGTTTGGCGGTCTGGGTGCCTTCATACGGACGGCCCTCGCCTTCGCCTTCGATCTCGAA
CTCGTGACCGTT
AM028 41
CAGAAACGGTCACGAGTTCGAGATCGAAGGCGAAGGCGAGGGCCGTCCGTATGAAGGCACCC
AGACCGCCAAAC
TR_RL016 42
tgccgtatgtttccttatatggcttttggttcgtttctttcgcaaacgcttgagttgcgcct
cctgcc
TR_A94009 target 43
ATACTAACTATTTCTCCCCIGTATCTICTACCTACTCACCATC7CICACCGTATCCTICIC2
(wt/nt strand 1; CC TGAGCTGTAGT TGCCTTCATC GATGA
Fig. 30)
TR_AM009 target 44
TATGATICATAA/ICACCGGAAATAGAAGAIGCTCACTICTAGAGAGTCGCATACCAACAG7
(wt/nt strand 2; GGACTCGACATCAACGGAAGEAGCTACT
Fig. 30)
TR_A94009 target 45 VLYKEIGKDEVYHPDRLTHNDCSS YNGED IF
(wt/aa; Fig.30)
VaLiallL-TR_AM009 46
ATACTAACIPITTGIGGCCT7TICCTICIACSIAGIGAGGATC7CICAGCCIAIGGITGICC
(Fic3C n 1) CC TGAGCTGIACTIGGCTICATC GATGA
Variant-OR_AM009 47
ATACTAAGTATTTCTSGCCIGTGTCTICTGCSTGCTGAGGATC7CICAGCGTATSCTIGICG
(Fic3C n 2) CC TCACCTCTACT TCCCTTCATC CATCA
Variant-TR_AM009 46
ATACTAAGTAITTGT'GGCCT7TAICTICIACSTAGIGASCATC7CICOGCCIATSGTIGICC
(Fic3C n 3) CC TGAGCTCTACTIGCCITCATC CATGA
Variant-OR_AM009 49
ATACTAACTATTTGTCGCCIGGATCTICTACGTACTGAGGIT070ICAGCGTOTGCTIGICC
(Fic3C n 4) CC TGAGCTOTACT TGCCTTCATC GATGA
TR_Alv0 10 target 53 AT CTCGTAGCCGTGATAGT TGGC
GACACTGCCCTCACCGTITTGTAATCGCCAGCTGICCCA
(wt/nt strand 1; AA CGTC CAGGC CT T T I GCAGAAGAGATA
Fig .SC)
TR_At60 10 target 51
TACACCATCCGCACTATCAAACGCTOTCACGC1CAGTOCCAAAACATTAC000TCCACA000T
(wt/nt strand 2; TTGCAGGTCCGGA71A1CGTC7TCTCTAT
Fig. 3C)
TR AM010 target 52 IHYCHYNAVTGD/2NQLPWSDWVDLOKY2SSI
(wt/aa; Fig.3C)
Variant-SR_AMOTO 53
ATGTGGTACCCGTGATAGIT7GOCACAGTCCCnTnAGCSTTITGTAATCCOCACC7CTCCC1
(Flo-3C n 1) CEnC7CnACCOnT7TTOCAGAAGACATA
Variant-TR AMOTO 54
ATGTGGTAGCCGTGATAGITTGCGACAGTGCOCTCAGCGTITTGTAATGGCOGGOTGTOCCA
(Fin-3C n 2) AACGTOCAGGCCITTIGCAGAAGAGATA
Variant-TR AMOTO 55
_ATGTGUEACCCGTGTTAGIT7GOCACAGTCCCnTnAGCSTTITG7CATCCOCACC7CTCCC1
(Fic3C n 3) AACCICCACCOnT7TTOCAGAAGACATA
Variant-SR AMOTO 56
ATCTCCTACCOCTCATACT=COCTCACTCCOCTCACCCTITTOTACTCCCCACOTCTOCCA
(Fic3C n04) AACC7CCACCOCT7T7CCACAAnACATA
TR_RL016 target 57
ATATGCCAAATGCCGTATGITTCCTTATATCSCTITTGSTTC=TCTITCGCAAACGCTTC
(wt/nt strand 1; ACTTGCGCCTCCTGCCAGCAGTGCGG
Fig.3C)
TR_FLO16 target 53
TATACCCITTACGCCATACAAAGGAATATACCCAAAACCAACCAAAGAAACCGTITGOGAAC
(wt/nt strand 2; TCAACGCGGAGGACGOTCGTCACGCC
Fig.3C)
TR_2M009 target 59 IHSIGYTEKYPKONTEKAFAQTAGGALLA2
(wt/aa; Fig.3C)
Variant-TR_191,916 63
ATATGGGAAATGCCGTATGT7TOCTTATA=CTITTG=TCG7TTOTTTCGOTTACGOTTC
(Fic3C n 1) AGTTGCGCCTOCTGCCACCAGTGOGG
Variart-TF_FF016 61
ATATGGGAAATGCCGIGTGT7TOCTICTAIGGCT2T2GSTTCG7T2CTITCGCGAACGOTTG
(Flo-3C n 2) AGITGCGCCInnTGCCAGnAGTGCGG
Variant-SR_R1,016 62
ATATGOGAAATGCCGTATGITTOCTITTATOSCTOTTOSTTC=TOTTTCGCGTACGOTTC
(Fic3C n 3) AGITGCGCCICCTGCCAGnAGTGOGG
Variant-TR_RT.016 63
ATATGGGAAATGCCGTGTGT7TOCTICTATGGCTITTGSTTCG7TTOTTTCGCTITCGOTTG
(Flo-3C n 4) 667TGCGCCICCTGCCAGCAGTGCGG
TR_AM004 target 64
GTATGT7TOCTIATAT000T7TIGGITC=TCTIT000AAACGC117GAGT7GCSOCTCC
(wt/nt strand
1;Fig.3D)
TR_AM004 target 65
CATACAAAGGAATATACCGAAAACCAACCAAACAAAGCSTTTGOGAACTCAACCOGGAGG
(wt/nt strand 2;
Fig. 3D)
TR AMOU4 target 66 YTEKYPKQNTEKAFA.2TAGG
(wt/nt aa;
Fig 3D)
Variant-SR_AM004 67
GTATGTTTCCTTATAIGGCTTTIGGITCGTTTCTITCGCCTACGCTTGAGTTGCSCCTCC
(Fic3D)
TR AM007 target 64
OGGAAAT0000TATOTTTCC7TATATOGOTTTTOGTTOSTITC7TT000AAA000TTGAGTT
(wt/nt strand 1;
CCGCCTCCTGCCAGCAGTGCGGTAGTAAAGCTTAATACTGTTCCTTCTTT
Fig. 3D)
TR_AM007 target 69
CCOTTTACCCCATACAAACCAATATACCCAAAACCAACCAAACAAACCCTTTCCCAACTCAA
(wt/nt strand 2;
CGCGGAGGACGGTCGTCACGCCATCATTTCCAATTATGACAACGAACAAA
Fig. 3D)
TR_AM007 target 73 SICYTEKYPKQNTEKAFAQTACCALLATTFTLVTAQK
(wL/aa; Fig.3D)
Variant-DR_AM007 71
GGGAAATGCCGIGTGITTCC7TATATGGCTTTIGGTTCGTITC7TTCGOTTACGCTTGAGTT
(Fic3D n 1)
GCGCCTCCTGCCAGCAGTGCGGTAGTRAAGGTTRATACTGITGOTTGITT
Variant-DR_AM007 72 C;(;(-4AAAT=C;TA=TTCC-TATATCTTTTTTMTFTTC-
TTCC;C;TTTC=TC;AC;TT
(Fic3D n'2)
CCCCCTCCTCCCCCCACTCCCGTACTAAACCTICATACTCTICCTTCTIT
Variant-DR_AM007 73
GGGAAATGOCGTATGITTOC7TTTATGGOTTTIGGTTOSTITC7TTCGCARACGCTTGGGTT
(Fic3D n'3)
GCGOCTOCTECCAGCSGTGCGGIAGTRAAGGTITATACTGITGCTTGITT
Variant-DR AM007 74
GOGAAATG000TATGITTCC7TATATOGOTTTIGGTTOSTITC7TT0000TT000TTGAGTT
(Fic3D n'4)
COCCCTOCTCCCACCCGTCCCGTACTAAACCTTAATACTCTICCTTCTIT
TR_AM011 target 75
GTCACTTICAGITTGGOGGICTGGGIGCCTTCATACGGACGGCCOTCGCCTTCGOCTTCGAT
(wt/nt strand 1; CTCGAACTCGTGACCGTTAACAGAACCC
Fig .30)
TR AM011 target 76
CAGTGAAAGTCAAACCGCCAGACCCACGGAAGTATGOCTGCCGGGAGOGGAAGOGGAAGOTA
(wt/nt strand 2; GAGCTTGAGCACTGGCAATTGTCTTGGG
Fig. 3D)
VarianL-TR AM011 77
GTCACTTICAGITTGGOGGICTGGGIGCCTTCATACGGACGGCCOTCGCCTTCGOOTTCGAT
(Fic3D n 1) CTOGGTOTCGTGACCGTTAACAGAACCC
Variant-DR_AM011 73 prcig,rim:r(-;cci
1/V1(;CG(;;;C(;(;CCC'I'C(;CC'l LC(,=1-VCC,CI
(Fic3D n 2) CTOGAACTOCTGACCTTAACAGAACCC
Variant-DR AM011 79
OTCACTTICAGITTOSCOGICTGOGTOCCTTCACACGOACCGCCOTCGCCTTCOCCTTOGAT
(tic 3D n 3) CTOGGCCTOCTGACCSTTAACAGAACCC
Variant-DR_AM011 82 FLCA(,11M;;CIC'll,1(,CCI 1/V1(;C(-
4(;ACC;(;CCC'LCC;CCT'1==r1
(Fic3D n 4) CTCGTOCTOCTGACCTTAACAGAACCC
Variant-TR_AM 09 81
ATACTAACTATTTCTCCOCT7TOTOTTOTACCTCCTCACCATCTOTCACCGTATCCTTCTCC
(Fic6A n 5) COTGAGOTGTAGTTGOCTICATCGATGA
Variant-TR_AM009 82
ATACTAAGTATTIGTGGCCT7TGTOTTOTACGTAGTGAGGATC7CTOGGCGTATSGTTGTOG
(Fic6A n 6) CCTGAGCTGTAGTTGCCTICATOGATGA
Variant-DR_AM009 83
ATACTAACTATTTCTCCOCT7CTTOTTOTACCTACTCACCATCTOTCACCGTCTCCTTCTCC
(Fic6A n07) COTGGGCTGTAGTTGOCTICATCGATGA
Variant-SR AM009 84
ATACTAACTATTTCT8CCCT7TATCTICTAC=ACTCAC1CTTC7CTCA000TOTCCTTCTCC
(F c6A n 8) CCTGAGCTCTACTTGCCTICATCCATGA
TR_AM011 target 85
CACTITCAGTTIG000GTCTGGGTGGCTTCATACGGACGG00070000ITCGCCITCGATCT
(wt/nt strand 1; CGAACTCCTCACCCTTAACACAAC
Fig. 6B)
TR AM01 target 1 86
GTGAAAGTCAAACCGCCAGACCOACGGAAGTATGOCTGOCGGGAGOGGAAGOGGAAGOTAGA
(wt/nt strand 2; GoTTGAGCACTGGCAATTGTCTTG
Fig. 6B)
Variant-SR_AMOTT 87
GAGTTTCAGTTIGGCGGGCTCGGTGGCTTCATACCGGCGGCCC70000ITCGCCITCGATCT
(Fic6B n 5) GGAACTCGTGACCGTTAACACAAC
Variant-SR_AM011 88
CACTITCAGITTGGCSGTCTGGCTGCCITCGTCCGGCCSGCCCTCGCCTTCGCCIAGGATCT
(Fic6B n 6) CCTCCTCCTCACCCTTAACACAAC
TR_AM009 target 89
ACTAACTATTT=000TITATCTTGTACCTACTCACCATCTC7CA000TATCCITCT0000
(wt/nt strand 1; TGAGCTGTAGTTGCCTTCATCGATGA
Fig.6B)
TR_AM009 target 93
T0ATT0ATAAACACCOGA0A7ACAA0ATC0ATCA0T00TA0ACA0T0GCATA007\1\0A0000
(wt/nt strand 2; ACTCGACATCAACGGAAGTAGCTACT
Fig. 6B)
Variant-SR_AM009 91
ACTAACTATTTGT0000TITTTOTTGTACCTACTGAGGTTCT070A000TTTGGITGT0000
(Fic6B n 9) TCACCTCTACTICCCITCATCCATCA
Variant-SR_AM009 92
ACTAAGTATTTGT0000TITATCTTCGACGTGCTGGGGATCTC7CA000TCTGGIAGTCGCC
(Fic6B n 10) TCAGCTGTACTTGCCTTCATCGATGA
Sequence below 93
0000ITCCTA000ATCTTCACTICCAGTCTTTCCITCAAATAC7AACTATTTOT0000TTTA
ploL in Figure
TCTTCTACGTAGTGAGGATC7CTCAGCGTATSGTTGTCGCCTGAGCTGTAGTTGCCTTCATC
7A GATGAACTGCTGTACATTTTGATACG
lamB reference 94
ATGAT1T11LACT(TCGCAAACTICC1=G17(1(AFEGCCGICGCAGCCTG2GTAATGTCTGC
sequence
TCAGGCAATGGCTGI2GACCACGGClATGCACGT1CCGGTA1TGGTTGGACAGGTAGC5
GOGGTGAACAACAGTGTTTCCAGACTACCGGTGCTCAAAGTAAATACCGICTTGGCAACGAA
TOTGAAACTTATGCTGAATTAAAATIGGGTCAGGAAGTGIGGAAAGAGGGCGATAAGAGCTT
CTATTICGACACTAACGIGGCCTATICCGTCGCACAACAGAATGACTGGGAAGCTACCGATC
GGGCCTICCGTGAAGCAAACGTGCAGGGTAAAAACCTGATCGAATGGCTGCCAGGCTCCACC
ATCTGGGCAGGTAAGCGCTTCTACCAACGTCATGACGTTCATATGATCGACTICTACTACTG
GGATATTTCTGGTCCTGGTGCCGGTCTGGAAAACATCGATGTTGGCTTCGGTAAACTCTCTC
TGCCACCAACCCCCTCCTCTCAACCTCCTCCTTCTTCCTCTTTCCCCACCAACAATATTTAT
GACTATACCAACGAAACCGCGAACGACGTITTCGATGTGCGTTTAGCGCAGATGGAAATCAA
CCCGGGCGGCACATTAGAACTGGGTGTCGACTACGGTCGTGCCAACTTGCGTGATAACTATC
GTCTGGTTGATGGCGCATCGAAAGACGGCTGGTTATTCACTGCTGAACATACTCAGAGTGTC
CTGAAGGGCTTT1\ACAAGTT-GTTSTTCAGTACGCTACTGACTCGATgACCTCG0AGGGTA4
AGGGCTGTCGCAGGGTTCTGGCGTTGCATTTGATAACGAAAAATTTGCCTACAATATCAACA
ACAACGGTCACATGCTGCGTATCCTCGACCACGGTGCGATCTCCATGGGCGACAACTGGGAC
ATGATGTACGTGGGTATGTACCAGGATATCAACTGGGATAACGACAACGGCACCAAGTGGTG
GACCGTCGGTATTCGCCCGATGTACAAGTGGACGCCAATCATGAGCACCGTGATGGAAATCG
GCTACGACAACGTCGAATCCCAGCGCACCGGCGACAAGAACAATCAGTACAAAATTACCCTC
CCACAACAATCCCACCCTCCCCACACCATCTCCTCACCOCCCCCTATTCCTCTCTICCCAAC
CTACGCCAAGTGGGATGAGAAATGGGGTTACGACTACACCGGTAACGCTGATAACAACGCGA
ACTTCGGCAAAGCCGTTCCTGCTGATTTCAACGGCGGCAGCTTCGGTCGTGGCGACAGCGAC
GACTCGACCTTCCCTCCCCACATCGAAATCTCCTCCTAA
gpJ reference sequence 95
ATGGGTAAAGGAAGCAGTAAGGGGCATACCCCGCGCGAAGCGAAGGACAACCTGAAGTCCAC
GCAGTTGCTGAGTGTGATCGATGCCATCAGCGAAGGGCCGATTGAAGGTCCGGTGGATGGCT
TAAAAAGCGTGCTGCTGAACAGTACGCCGG TGCTGGACACTGAGGGGAATACCAACATATCC
GG TGTCACGGTGGTGTTCCGGGC TGGTGAG CAGGAGCAGACTCCGCCGGAGGGATTTGAATC
CT CCGGCTCCGAGACGGTGCTGG GTACGGAAGTGAAATATGACACGCCGATCACCCGCACCA
TTACGTCTGCAAACATCGACCGTCTGCGCTTEACCTTCGGEGTACAGGCACTGGIGGAAACC
AC CTCAAAGGGTGACAGGAATCC GTCGGAAGTCCGCCTGCTGGTTCAGATACAACGTAACGG
TGGCTGGGTGACGGAAAAAGACATCACCAT TAAGGGCAAAACCACCTCGCAGTATCTGGCCT
CC CTCCTCATCCCTAACCTCCCC CCCCCCC CCTTTAATATCCCGATCCC CACCATCACCCCC
GACAGCACCACAGACCAGCTGCAGAACAAAACGCTCTGGTCGTCATACACTGAAATCATCGA
TGTGAAACAGTGCTACCCGAACACGGCACTGGTCGGCGTGCAGGTGGACTCGGAGCAGTTCG
GCAGCCAGCAGGTGAGCCGTAAT TATCATCTGCGCGGGCGTAT7CTGCAGGTGCCGTCGAAC
TATAACCCGCAGACGCGGCAATACAGCGGTATCTGGGACGGAACGTTTA_AACCGGCATACAG
CAACAACATGGCCTGOTGTC7GTGGGATATGCTGACCCATCCGCGCTACGGCATGGGGAAAC
CT CTTCCTCCGCCCCATCTCCATAAATCCC CCCTCTATCTCATCCCCCACTACTCCCACCAC
TCAGTGCCGGACGGCTTTGGCGGCACGGAGCCGCGCATCACCTGTAATGCGTACCTGACCAC
ACAGCGTAAGGCGTGGGATGTGCTCAGCGATTTCTGCTCGGCGATGCGCTGTATGCCGGTAT
GGAACGGGCAGACGCTGACGTTCGTGCAGGACCGACCGTCGGA7AAGACGTGGACCTATAAC
CG CAGTAATGTGGTGATGCCGGATGATGGC GCGCCGTTCCGCTACAGCT TCAGCGCCCTGAA
GGACCGCCATAATGCC.GTTGAGGTGAACTGGATTGACCC.GAACAACGGCTGGGAGACGGCGA
CAGAGCTTGTTGAAGATACGCAGGCCATTGCCCGTTACGGTCGTAATGTTACGAAGATGGAT
GC CTTTGGCTGTACCAGCCGGGG GCAGGCACACCGCGCCGGGCTGTGGC TGATTAAAACAGA
AC TGCTGGAAACGCAGACCGTGGATTTCAG CGTCGGCGCAGAAGGGCTT CGCCATGTACCGG
GC GAM T Al GAAAT CIGCGAT GAT GACT AI GC CG GIATCAG CACCGG
CGT GT GC IG
GC GG' I'GA AC: AG CC: AGA CC:CGG Ai: GC' G A CC; i
ACC C; = "GA A A* T: ACC;C:* Ti CC A- i -CC:*i*CC: GC;
TACCGCGCTGATAAGC.CTGGCTGACGGAAGTGGCAATCCGGTCAGCGTGGAGGTICAGTCCG
TCACCGACGGCGTGAAGGTAAAAGTGAGCCGTGTTCCTGACGG7GTTGCTGAATACAGCGT21
TG GGAGCTGAAGCTGCCGACGCT GCGCCAG CGACTGTTCCGCTGCGTGAGTATCCGTGAGAA
CGACGACGGCACGTATGCCACCACCGCCGT GCAGCATGTGCCGGAAAAAGAGGCCATCGTGG
ATAACGGGGCGCACTTTGACGGC GAACAGAGTGGCACGGTGAA7GGTGT CACGCCGCCAGCG
GT GCAGCACCTGACCGCAGAAGT CACTGCAGACAGCGGGGAATATCAGG TGCTGGCGCGATG
GGACACACCGAAGGTGGTGAAGG GCGTGAG TTTCCTGCTCCGTCTGACC GTAACAGCGGACG
AC GGCAGTGAGCGGCTGGTCAGCACGGCCC GGACGACGGAAACCACATACCGCTTCACGCAA
CT COCCCTCCCCAACTACACCCT CACACTC CCCOCCCTAAATCCCTCCC CGCACCACCCCCA
TCCGGCGTCGGTATCGTTCCGGATTGCCGCACCGGCAGCACCGCCGAGGATTGAGCTGACGC
CGGGCTATTITCAGATAACCGCCACGCCGCATCTTGCCGTTTATGACCCGACGGTACAGTTT
GAGTTCTGGITCTCGGAAAAGCAGATTGCGGATATCAGACAGG7TGAAACCAGCACGCGTTA
TC TTGGTACGGCGCTGTACTGGATAGCCGC CAGTATCAATATCAAACCG GGCCATGATTATT
AC TTTTATATCCGCAGTGTGAACACCGTTG GCAAATCGGCATTCGTGGAGGCCGTCGGTCGG
CC CACCCATCATCCCGAACCTTACCTCCAT TTITTCAAACCCAACATAACCCAATCCCATCT
CC CCAACCACCTCCTGCAAAAAC TCCACCT CACCCACCATAACGCCACCAGACTCCACCACT
TT TCCAAACACTCCAACCATCCCACTCATAACTCCAATCCCATGTCCCCTGTCAAAATTCAC
CAGACCAAAGACGGCAAACACTATGTCGCG GGTATTGGCCTCAGCATGGAGGACACGGAGGA
AG GCAAACTGAGCCAGTTTCTGG TTGCCGC CAATCGTATCGCATTTATT GACCCGGCAAACG
GGAATGAAACGCCGATGTTTGTGGCGCAGGGCAACCAGATATTCATGAACGACGTGTTCCTG
AAGCGCCTGACGGCCCCCACCATTACCAGCGGCGGCAATCCTCCGGCCTTTTCCCTGACACC
GGACGGAAAGCTGACCGCTAAAAATGCGGATATCAGTGGCAGTGTGAAT GCGAACTCCGGGA
CGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACGGTACGCTGAGGGCGGAAAAA
AT CGTCGGGGACATTGTAAAGGC GGCGAGC GCGGCTTTTCCGCGCCAGC GTGAAAGCAGTGT
(-X; ACTGGC:CC:i 'CAGGT AC:CCGTA CM' re AC C:;-;*i
GACCGATGACCATC:CTTTTGA'irGCCAGA
TAGTGGTGCTTCCGCTGACGTTT CGCGGAAGTAAGCGTACTGTCAGCGG CAGGACAACGTAT
TCGATGTGTTATCTGAAAGTACTGATGAACGCTGCGGTGATTTATGATGGCGCGOCGAACGA
GGC.C4GT AC:AC=GTGTTC:TCC:C:GTATTGTTGACATGC.CAGCGGGTCGGGC;A AACGTSATCCTGA
CGTTC:AC:C;C:TT AC:GTC:C:AC:ACGGCATTCGSCAC=ATATTCCGCC:GT ATAC GTTTC.C.C:AGCG
AT
GT GCAGGTT ATGGTGATTA AG AA ACAGGCC4CTMGC ATCAC;C(1^(1GTCT
pAM020
96 AC TCACACATACACAGCCTCAAACACCCCATCCTCCTTATCCAATCAAACCTCCCCACAACA
CGGGAGCCAGTGACGCCTCCCGTGGGGAAAAAATCATGGCAATTCTGGAAGAAATAGCGCTT
TCAGCCGGCAAACCTGAAGCCGGATCTGCGATTCTGATAACAAACTAGCAACACCAGAACAG
CC CGTTTGCGGGCAGCAAAACCC GTACCCTAGGTCTAGGGCGGCGGATT TGTCCTACTCAGG
AGAGCGTTCACCGACAAACAACAGATAAAACGAAAGGCCCAGTCTTTCGACTGAGCCTTTCG
TT TTATTTGATGCCTGGTTTGTAGAGTTCATCCATGCCGTGCGTGATACCTGCTGCAGTAAC
GAACTCCAGCAGCACCATGTGGTCGCGCTTTTCGTTCGGGTC=GGACAGt t t AGACTGGG
TGGACAGGTAGTGGTTATCCGGCAGCAGAACCGGACCATCACCGATCGGAGTGTTCTGCTGG
TAGTGGTCCGCCAGCTGTACGCTACCGTCTTCAACGTTATGGCGAATTTTGAAGTTAGCTTT
GATACCGTTCTTCTGTTTGTCTGCGGTGATGTAAACGTTATGGGAGTTGAAGTTATATTCCA
GTTTGTGGCCCAGGATGTTGCCGTCCTCTTTGAAATCAATGCCTTTCAGTTCAATACGGTTC
ACCAGAGTATCACCTTCAAATTTAACCTCTGCACGGGTTTTGTAGGTGCCATCGTCTTTGAA
AGAAATGGTGCGCTCCTGTACATAACCTTCCGGCATTGCAGATTTGAAGAAATCATGCTGCT
'ECATGTGATCCGGGTAACGAGAAAAACACTGAACACCATAGGICAGGGTAGICACCAGAGIC
GGCCATGGAACCGGCAGTTTACCGGTAGTGCAGATGAATTTCAGGGTCAGTTTACCGTTGGT
TGCATCACCTTCACCTTCACCACGAACAGAGAATTTGTGGCCGTTAACATCACCATCCAGTT
CAACCACCATCCCAACAACACCGCTCAACACTTCTTCACCTTTACTCATTTTTCCCTCCTAA
CTAGGTCATTTGATATGCCTCCGGATATCACTCTATCAATGATAGAGAGCTTATTTTAATTA
TGCTCTATCAATGATAGAGTGTCAATATTTTTTTTAGTTTTTCATGAACTCGAGGGGATCCA
AATAAAAAACTAGTTIGACAAATAACTCTATCAATGATATAATGTCAACAAAAAGGAGGAAT
TAATGATGTCTAGATTAGATAAAAGTAAAGTGATTAACAGCGCATTAGAGCTGCTTAATGAG
GTCGGAATCGAAGGTTTAACAACCCGTAAACTCGCCCAGAAGCTAGGTGTAGAGCAGCCTAC
ATTCTATTCCCATCTAAAAAATAACCCCCCTTTCCTCCACCCCTTACCCATTCACATGTTAC
ATAGGCACCATACTCACTTTTGCCCTTTAGAAGGGGAAAGCTGGCAAGATTTTTTACGTAAT
AACGCTAAAAGTTTTAGATGTGCTTTACTAAGTCATCGCGATGGAGCAAAAGTACATTTAGG
TACACGGCCTACAGAAAAACAGTATGAAACTCTCGAAAATCAATTAGCCTTTTTATGCCAAC
AAGGTTTTTCACTAGAGAATGCATTATATGCACTCAGCGCTGTGGGGCATTTTACTTTAGGT
TGCGTATTGGAAGATC.AAGAGCATCAAGTCGC.TAAAGAAGAAAGGGAAACACCTACTACTGA
TAGTATGCCGCCATTATTACGACAAGCTATCGAATTATTTGATCACCAAGGTGCAGAGCCAG
CCTTCTTATTCGGCCTTGAATTGATCATATGCGGATTAGAAAAACAACTTAAATGTGAAAGT
GGGTCTTAAGACGTCGGAATTGCCAGCTGGGGCGCCCTCTGGTAAGGTTGGGAAGCCCTGCA
AAGIAAACIGGAIGGCTITC:IGCCGCCAAGGATCTGAIGGCGCAGGGGATCAAGATCTGAT
CA AC-;AG ACAC-G ATGAGATCGTVICGCMIT C-11-1 T ATT*1"1-ICTA AATACA TTCA AAT
AM'!" Al"
CCGCTCATGAGACAATAACCCTGATAAATGCTICAATAATATTGAAAAAGGAAGAGTATGAG
T2\TTC7111C1TTTCCGTGTCGCCCTT1TTCCCTITTTTGCGGCATTTTGCCTTCCIGTTTTTG
CTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGI
TACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGT=CGCCCCGAAGAACGTTI
TCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTGTIGACGCCG
GGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCA
GTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAAC
CATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAA
CCCCTTTTTICCACAACATCCGCCATCATCTAACTCCCCTTCATCCTTCCCAACCCCACCTC
AATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGCAGCAATGGCAACAACGTT
GCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGA
TGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATT
GCTGATAAATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGA
TGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAAC
CAAATACACACATCCCTCACATACCTCCCTCACTCATTAACCATTCCTAACCCCCACTCTCC
CCTTCCACACCTCCCTTCCACTCCTCTTCATACATCCACTAATCACCTCACAACTCCATCTC
CATTTCTTCACAACCCTCCOTTCCCCCCGCCCCTTTTTTATTCCTCACAATCCAACCACTAC
GGACAGTAAGACGGGTAAGCCTGTTGATGATACCGCTGCCTTACTGGGTGCATTAGCCAGTC
TGAATGACCTGTCACCGGATAATCCGAAGTGGICAGACTGGAAAATCAGAGGGCAGGAACTG
CTGAACAGCAAAAAGTCAGATAGCACCACATAGCAGACCCGCCATAAAACGCCCTGAGAAGC
CCGTGACGGGCTTTTCTTGTATTATGGGTAGTITCCTTGCATGAATCCATAAAAGGCGCCTG
TAGTGCCATTTACCCCCATTCACTGCCAGAGCCGTGAGCGCAGCGAACTGAATGTCACGAAA
AAGACAGCGACTCAGGTGCCTGATGGTCGGAGACAAAAGGAATATTCAGCGATTTGCCCGAG
CTTGCGAGGGTGCTAC.TTAAGCCTTTAGGGTTITAAGGTCTGT7TTGTAGAGGAGCAAACAG
''rC-; CC; A C A I 'C' A AT AC:*I'GC C-;(-; AA
C' 0; A C A AA Or T-; *I- I' Al = AC A CA GC-;
TGGGATCTATTCTTTTTATCTTTTTTTATTCTITCTTTATTCTATAAATTATAACCACTTGA
ATATAAAC1121.21AAAAACACACAAAGGTCTAGCGGAATTTACAGAGGGTCTAGCAGAATTTAC
AAGTTTTC.C:AGCAAASGTC:TAGC:AGAATTTAC:AGATAC:MACW-C:TCC.C.CAGAAAAGGACTA
C;TAATTATC:ATTGAC:TAGC:C:CATCTC.AATTGSTATAGTSATTAAAATC:ACCTAGAC:C:AATTC;
AGATGTATGTCTGAATTAGT^GTTTTC:AAAGC:AAATGAACTAGCGATTAGTCGCTATGACTT
AACGGAGCATGAAAC:::APAC:-AATTTTATGCTC:TGTGGC:AC:TAC:TC:AACCC:CACSATTGAAA
ACCCTACAAGGAAAGAACGGACGGTATCGTTCACTTATAACCAATACGCTCAGATGATGAAC
ATCAGTAGGGAAAATOCTTATGGTGTATTAGCTAAAGCAACCAGAGAGCTGATGACGAGAAC
TCTCCAAATCACCAATCCTTTCGTTAAACCCTTTCACATTTTCCACTCCACAAACTATCCCA
AGTTCTCAAGCGAAAAATTAGAATTAGTTTTTAGTGAAGAGATATTGCCTTATCTTTTCCAG
TTAAAAAAATTCATAAAATATAATCTGGAACATGTTAAGTCTTTTGAAAACAAATACTCTAT
GAGGATTTATGAGTGGTTATTAAAAGAACTAACACAAAAGAAAACTCACAAGGCAAATATAG
AGATTAGCCTTGATGAATTTAAGTTCATGTTAATGCTTGAAAATAACTACCATGAGTTTAAA
AGGCTTAACCAATGGOTTTTGAAACCAATAAGTAAAGATTTAAACACTTACAGCAATATGAA
ATTOGTOGTTGATAASCGAGGCCGCCCGACTGATACGT7GATT7TCCAAG7TGAACTAGATA
GACAAATGGATCTCGTAACCGAACTTGAGAACAACCAGATAAAAATGAATGGTGACAAAATA
CCAACAACCATTACATCAGNETCCTACCTACGTAACGGACTAAGAAAAACACTACACGATGC
TTTAACTGCAAAAATTCAGC7CACCAGTTTTGAGGCAA1ATTT7TGAGTGACATGCAAAGTA
AGCATGATCTCAATGGTTCG=TCTCATGGCTCACGCAAAAACAACGAACCACACTAGAGAAC
ATACIGGCTAAATAGGGAAGGATCTGAGGTIC1TAIGGCTC1IGTAICGATCAGIGAAGCA1
CAAGACTAACAAACAAAAGTAGAACAACTGTTCACCGTTAGATATCAAAGGGAAAACTGTCC
ATATGCACAGATGAAAACGG7GTAAPAAAGATAGATACATCAGAGCTTTTACGAGTTTTTGG
TGCATTTAAACCTGTTCACCATCAACAGATCCACAATCTAACAGATCAACACCATCTAACAC
CTAATAGAACAGGTGAAACCAGTAAAACAAAGCAACTAGAACATGAAATTGAACACCTGAGA
CAACTTGTTACAGCTCAAC
TRRL 0 29 97
GGCGGTACAGGIGTTCTCCCGTATTGTTGACATGCCAGCGGGTCGGGGAAACGTCATCCTGA
_
CGTTCACGCTTACGTCCACACGGCATTCGGCAAATATTCCGCCGT
TRRLO 31 9B
TTIGTOGCCITTATOTTOTACGTAGTGAGGATCTCACASCGTA7GGTTGTOGOCTGAGCTOT
_ A(4T'1(;CC'l
TFLJRL039 99
GTCGGGGACATTG2AAAGGCGGCGAGCGCGGCTITTCCSCGCCACCGTGAAAGCAGTGTGGA
CTGGCCGTCAGGTACCCGTACTGTCACCGTGACCGATGACCATCCTTTTGATCGCCAGATAG
TGCTC(a_TrICCGCTGACSTICSCCCGC
TR RL 043 100
COGTACGCTGAGGGCOGAAAAAATCGTCGGGGACATTOTAAAGGTGOCGAGCGCSGC7T7TC
CGCGCCAGCGTGAAAGCAGTGTGGACTGGCCGTCAGGTACCCGGACTGIC
TRRL 0 55 101
CGGCTGGITATTCACTGCTGAACATACTCAGAGTGTCC7GAAGGGCTITAACAAGTT7G7TG
_ TTCAGTACGCTACTGACTCGATGACCTCGC
TRRL 0 61 102
TTICTGGICSTGGTGSCGGICTGGARAACATGEATGTTGGCTTCGGTARACTSTSTCTGGCA
_ GCAACCCGCTCCTCTS
TPLJW_021 103
CCCGAGTGTCATCATCTOGICGCTOGGGAATSAATCAGSCCACGOCGCTAATCACGACGCGC
TCTATCGCTCCATC
T R_AN: 0 22 104
GA7CCAGCT;ATACAG=C;TCGTC;ATTAGC(=G7GC4C7,TGAT-CATTCCCCA=ACCAC;A
1MATCACACTC
TR_AMO 23 105
COGGGTAACCAGAAAAACAC7GAACACCATAGETCAGGGTAG7CACCAGAGTOGGCCATGGA
AC C:(4(4C:AC1 IIAC:CC4:4
TR 11/.024 106
CCGGTAAACTOCCCGTTCCAGGGCCGACTC=TGACTACCCTGACCTATGOTOTTCAGTOT
TTITCTCGTTACCCGS
TR FLO55 target 107 GACGC1CTMTTATTCACTC;C-
GAACATACTCAC,AC1T=CTC;AACW=FTTAACAA=MTTT
(wt/nt strand 1; TGTTCAGTACSCTAGTGACTCGATGAGCTCGCA
Fig. 8B)
TR_RL055 target 108
CTCCCCACCAATAACTGACCACTTECTATCACTOTCACACCACTSCCCCAAATTGITCAAACA
(wt/nt strand 2 ACAAGTCATGCGATGACTGAGCTACTGGAGCGT
from 3' to 5';
Fig. 8B)
TR_RL055 target 109 DOWLFTAEHTQSVLKSFNKFVVQYATDSMTS2
(wL/aa; Fiy.8B)
Variant-TR_R1,055 110
GACGGCTGOTTATTCACTGC7GAACATACTCACGGTGTOCTGAAGGGCTTTACCSGOTT7GT
(Fic8B n 1) TCTTCCGTACCCTACTCCCTCCATCACCTCCCA
Variant-SR R11055 111
GACGGCTGGITATT000TGC7GAGCATACTCAGAGTGGCCTGAAGGGCTTT000AAGTTTGT
(Fic8B n 2) TOTTCAGTACGCTACTGACTCGATGACCTCGCA
Variant-TR_R1,055 112
GACGGCTGGTTATTCACTGC7GGACATCCTCAGAGTGTOCTGAAGGGCTTTAACAAGTTTGT
(Fic8B n'3) TCTTCAGTACCOTACTCACTOCATCACCTCCCA
VariantTR_RL055 113
GACGGCTGGITATTCACTGC7GTCCGTCCTCGGAGTGTCCTGAAGGGCTTTAACAAGTTTGT
(Fic8B n 4) TGTTCAGTCCGCTACTGACTCGATGACCTCGCA
Variant-SR RL055 114
CACGCCTCCTTATTCACTCC70GC0ATCCTO6CACTGTOCTCAACCCCITTCACAACTTTCT
(Fic8B n 5) TCTTCACTACCCTACTCACTCGATCACCTOCCA
Variant-SR_RL055 115 CACCC_71117-kl1t7CCTCCICGCA1AC1CA6AGIGCC1CAGC7T1MCAATiThi
(Fic8B n 6) TOTTCAGTACCCTACTGACTOGATCACCTOCCA
Variant-SR RL055 116
GACGGCTGGITATTCACTGC7GAACATACTCAGAGTGTOCTGAA6660TTTAACAAGTTTGT
(Fic8B n 7) TGTTCAGTACGCTGCTGTCTCGATGACCTCGCA
TR_BL029 target 117
AACGAGGCGGTACAGGTGITCTCCCGTATTGTTGACATSCCAGCGGGICGGGGAAACGTGAT
(wt/nt strand 1;
CCTGACGTTCACGCTTACGTCCACACGGCATTCGGCAAATATTCCGCCGTATA
Fig. 8C)
TR RL029 target 11S (;(:'I
(;;ACJAA(;1\(;(:;CA'l A ACAAC7 ICL A;(;(;'1 (:(;C:CCAC;(:(:(:(:'1"1 'I
(;CAC'l 4 \
(wt/nt strand 2
GGACTGCAAGTGCGAATGCAGGTGTGCCGTAAGCCGTTTATAAGGCGGCATAT
from 3' to 5';
Fig. 86)
TR_FL029 target 119 NEAVWFSRIVDMPAGRCNVILIFTLTSTRHSADIPPY
(wt/aa; Fig.8C)
Variant-SR RL029 120
AACGAGGCGGTACGGSTGTICTOCCGTATTGTTGACATSCCGGCGGGICGGGGAAACGTGAT
(Fic8C n 1)
CCTGACGTTCGCGCTTACGTGCACACGGCATTCGGCAAATATTCCGCCGTAT
Variant-SR_R1,029 121
AACGACCTCCGACACCTCTICTCCCGTGTTCTTCACATSCCAGCCGGICCGCCAAACGTCCT
(Fic8C n 2)
COTCACniTnACCOTTACninCACACCGOATTnGnOACATATTCOCCOnTAT
Variant-TR_RL029 122
AACGAGGCGGTACAGSTGTICTOCCGTGTTGTTGACATSCCGGCGGGICGGGGASACGTGAT
(Fiq80 n 3)
CCTGGCGTTCACGCTTACGTCCACACGGCATTCGGCAASTATTCCGCCGTAT
Variant-SR_RL029 123
AACCACCCCGTACACCTCTICTOCCCTGTTCTTGACGTCCCACCCCCTCCGCCAAACGTCAT
(Fic8C n 4)
CCTCACCTTCACCCTTACCTCCACACCCCATTCCCCAAATATTCCCCCCTAT
Variant-SR_RL029 124
AACGAGGCGGTACAGSTGTICTOCCGTGTTGTTGACATSCCAGCGGGICGGGGAAACGTGAT
(tic 80 n 5)
CCTGACGTTCACGCTTACGTCCACACGGCATTCGGCAAATATTCCGCCGTAT
Variant-TR_R12029 125
AACCAGGCGGSACAGGTOTTCTCCCGTATTGTTGGCATGCCAGCGCGTCGGGGAAACGTGAT
(Fic8C n 6)
OCTGACOTTCACGOTTACGTOCACACCGOATTOGGOAGATATTOCCCOGTAT
VarianL-TR_RI,029 126
AACGAGGCGCGACAGGTGTICTOCCGTATTGTTGACATSCCAGCGGGTOGGGGASACGTGAT
(Fic8C n 7)
CCTGACGTTCGCGOTTGCGTOCACTCGGCATTCGGCAAGTATTCCGCCGTAT
* Recoded gene sequences
Table 6 - TR cloning oligonucleotide sequences. Oligonucleotide sequences used for TR cloning by Golden Gate assembly. Forward (fwd) and reverse (rvs) oligos are annealed, producing sticky ends compatible for Golden Gate assembly into plasmid pRL021. The longer TR sequences can be assembled by two or three pairs of oligos, annealed independently and further joined during the Golden Gate assembly reaction.

Oligo name  Sequence  TR
AM001  ATAACGCTGCTGCGCTATTCGGCGGCAACTGGAACAACACGTCGAACTCGGGTTCTCGCGC (SEQ ID NO: 26)  Plasmid pAM001 TR pair 1, fwd
AM002  CGCAGCGCGAGAACCCGAGTTCGACGTGTTGTTCCAGTTGCCGCCGAATAGCGCAGCAGCG (SEQ ID NO: 27)  Plasmid pAM001 TR pair 1, rvs
AM030  TGCGAACTGGAACAACGGGCCGTCGAACTCGAACGCGAACATCGGGGCGCGCGGCG (SEQ ID NO: 28)  Plasmid pAM001 TR pair 2, fwd
AM031  CAGACGCCGCGCGCCCCGATGTTCGCGTTCGAGTTCGACGGCCCGTTGTTCCAGTT (SEQ ID NO: 29)  Plasmid pAM001 TR pair 2, rvs
AM007  ATAATTATATGGCTTTTGGTTCGTTTCTTTCGCAAACGCTTGAG (SEQ ID NO: 30)  Plasmid pAM004 TR fwd
AM008  CAGACTCAAGCGTTTGCGAAAGAAACGAACCAAAAGCCATATAA (SEQ ID NO: 31)  Plasmid pAM004 TR rvs
AM017  ATAATGCCGTATGTTTCCTTATATGGCTTTTGGTTCGTTTCTTTCGCAAACGCT (SEQ ID NO: 32)  Plasmid pAM007 TR pair 1, fwd
AM018  CTCAAGCGTTTGCGAAAGAAACGAACCAAAAGCCATATAAGGAAACATACGGCA (SEQ ID NO: 33)  Plasmid pAM007 TR pair 1, rvs
AM019  TGAGTTGCGCCTCCTGCCAGCAGTGCGGTAGTAAAGGTTAATACTGTTGC (SEQ ID NO: 34)  Plasmid pAM007 TR pair 2, fwd
AM020  CAGAGCAACAGTATTAACCTTTACTACCGCACTGCTGGCAGGAGGCGCAA (SEQ ID NO: 35)  Plasmid pAM007 TR pair 2, rvs
AM021  ATAATTTGTGGCCTTTATCTTCTACGTAGTGAGGATCTCTCAGCGTATGGTTGTCGCCTGAGCTGTAGTTGCCT (SEQ ID NO: 36)  Plasmid pAM009 TR fwd
AM022  CAGAAGGCAACTACAGCTCAGGCGACAACCATACGCTGAGAGATCCTCACTACGTAGAAGATAAAGGCCACAAA (SEQ ID NO: 37)  Plasmid pAM009 TR rvs
AM024  ATAACGTGATAGTTTGCGACAGTGCCGTCAGCGTTTTGTAATGGCCAGCTGTCCCAAACGTCCAGGCCTTTTGC (SEQ ID NO: 38)  Plasmid pAM010 TR fwd
AM025  CAGAGCAAAAGGCCTGGACGTTTGGGACAGCTGGCCATTACAAAACGCTGACGGCACTGTCGCAAACTATCACG (SEQ ID NO: 39)  Plasmid pAM010 TR rvs
AM027  ATAAGTTTGGCGGTCTGGGTGCCTTCATACGGACGGCCCTCGCCTTCGCCTTCGATCTCGAACTCGTGACCGTT (SEQ ID NO: 40)  Plasmid pAM011 TR fwd
AM028  CAGAAACGGTCACGAGTTCGAGATCGAAGGCGAAGGCGAGGGCCGTCCGTATGAAGGCACCCAGACCGCCAAAC (SEQ ID NO: 41)  Plasmid pAM011 TR rvs
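
The annealing described in the Table 6 caption can be checked computationally. The following Python sketch is illustrative only and is not part of the original disclosure: it assumes that each oligo carries a 4-nucleotide 5' overhang (an assumption inferred from the listed sequences, not stated in the text) and that the remainder of the forward oligo is the exact reverse complement of the remainder of the reverse oligo. The function names `reverse_complement` and `anneal` are hypothetical helpers introduced for this example.

```python
"""Sketch: check that a fwd/rvs oligo pair anneals into a duplex with 5' overhangs."""

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def anneal(fwd: str, rvs: str, overhang_len: int = 4):
    """Return (fwd_overhang, rvs_overhang, duplex) for an annealed oligo pair.

    Assumes both oligos start with a 5' overhang of `overhang_len` nucleotides
    (hypothetical default of 4) and that the remaining cores pair perfectly.
    Raises ValueError if the cores are not reverse complements of each other.
    """
    fwd, rvs = fwd.upper(), rvs.upper()
    fwd_core = fwd[overhang_len:]
    rvs_core = rvs[overhang_len:]
    if fwd_core != reverse_complement(rvs_core):
        raise ValueError("oligo cores are not reverse complements")
    return fwd[:overhang_len], rvs[:overhang_len], fwd_core


# Oligo pair for the pAM004 TR, copied from Table 6 (SEQ ID NO: 30 and 31).
AM007_FWD = "ATAATTATATGGCTTTTGGTTCGTTTCTTTCGCAAACGCTTGAG"
AM008_RVS = "CAGACTCAAGCGTTTGCGAAAGAAACGAACCAAAAGCCATATAA"

if __name__ == "__main__":
    fwd_oh, rvs_oh, duplex = anneal(AM007_FWD, AM008_RVS)
    print("fwd 5' overhang:", fwd_oh)
    print("rvs 5' overhang:", rvs_oh)
    print("duplex length:", len(duplex))
```

Under these assumptions, the same check can be run on each pair in Table 6; for TRs built from several pairs, the overhang of one pair's reverse oligo would be expected to be complementary to the start of the next pair's forward oligo.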
REFERENCES
[1] S. Doulatov et al., "Tropism switching in Bordetella bacteriophage defines a family of diversity-generating retroelements," Nature, vol. 431, no. 7007, pp. 476-481, Sep. 2004.
[2] B. G. Paul et al., "Retroelement-guided protein diversification abounds in vast lineages of Bacteria and Archaea," Nature Microbiology, vol. 2, no. 6, pp. 1-7, Apr. 2017.
[3] L. Wu et al., "Diversity-generating retroelements: natural variation, classification and evolution inferred from a large-scale genomic survey," Nucleic Acids Res., vol. 46, no. 1, pp. 11-24, Jan. 2018.
[4] W. Dai, A. Hodes, W. H. Hui, M. Gingery, J. F. Miller, and Z. H. Zhou, "Three-dimensional structure of tropism-switching Bordetella bacteriophage," Proc. Natl. Acad. Sci. U. S. A., vol. 107, no. 9, pp. 4347-4352, Mar. 2010.
[5] H. Guo et al., "Target site recognition by a diversity-generating retroelement," PLoS Genet., vol. 7, no. 12, p. e1002414, Dec. 2011.
[6] S. Handa et al., "Template-assisted synthesis of adenine-mutagenized cDNA by a retroelement protein complex," Nucleic Acids Res., vol. 46, no. 18, pp. 9711-9725, Oct. 2018.
[7] S. A. McMahon et al., "The C-type lectin fold as an evolutionary solution for massive sequence variation," Nat. Struct. Mol. Biol., vol. 12, no. 10, pp. 886-892, Oct. 2005.
[8] S. Handa, B. G. Paul, J. F. Miller, D. L. Valentine, and P. Ghosh, "Conservation of the C-type lectin fold for accommodating massive sequence variation in archaeal diversity-generating retroelements," BMC Struct. Biol., vol. 16, no. 1, p. 13, Aug. 2016.
[9] S. S. Naorem et al., "DGR mutagenic transposition occurs via hypermutagenic reverse transcription primed by nicked template RNA," Proc. Natl. Acad. Sci. U. S. A., vol. 114, no. 47, pp. E10187-E10195, Nov. 2017.
[10] B. Csörgő, A. Nyerges, and C. Pál, "Targeted mutagenesis of multiple chromosomal regions in microbes," Curr. Opin. Microbiol., vol. 57, pp. 22-30, Jun. 2020.
[11] A. J. Simon, S. d'Oelsnitz, and A. D. Ellington, "Synthetic evolution," Nat. Biotechnol., vol. 37, no. 7, pp. 730-743, Jul. 2019.
[12] K. M. Esvelt, J. C. Carlson, and D. R. Liu, "A system for the continuous directed evolution of biomolecules," Nature, vol. 472, no. 7344, pp. 499-503, Apr. 2011.
[13] S. O. Halperin, C. J. Tou, E. B. Wong, C. Modavi, D. V. Schaffer, and J. E. Dueber, "CRISPR-guided DNA polymerases enable diversification of all nucleotides in a tunable window," Nature, vol. 560, no. 7717, pp. 248-252, Aug. 2018.
[14] B. Alvarez, M. Mencia, V. de Lorenzo, and L. A. Fernandez, "In vivo diversification of target genomic sites using processive base deaminase fusions blocked by dCas9," Nat. Commun., vol. 11, no. 1, p. 6436, Dec. 2020.
[15] A. J. Simon, B. R. Morrow, and A. D. Ellington, "Retroelement-Based Genome Editing and Evolution," ACS Synth. Biol., vol. 7, no. 11, pp. 2600-2611, Nov. 2018.
[16] N. Crook, J. Abatemarco, J. Sun, J. M. Wagner, A. Schmitz, and H. S. Alper, "In vivo continuous evolution of genes and pathways in yeast," Nat. Commun., vol. 7, p. 13051, Oct. 2016.
[17] S. P. Finney-Manchester and N. Maheshri, "Harnessing mutagenic homologous recombination for targeted mutagenesis in vivo by TaGTEAM," Nucleic Acids Res., vol. 41, no. 9, p. e99, May 2013.
[18] E. Sharon, S.-A. A. Chen, N. M. Khosla, J. D. Smith, J. K. Pritchard, and H. B. Fraser, "Functional Genetic Variants Revealed by Massively Parallel Precise Genome Editing," Cell, vol. 175, no. 2, pp. 544-557.e16, Oct. 2018.
[19] S. C. Lopez, K. D. Crawford, S. Bhattarai-Kline, and S. L. Shipman, "Improved architectures for flexible DNA production using retrons across kingdoms of life," bioRxiv, p. 2021.03.26.437017, Mar. 26, 2021.
[20] B. Zhao, S.-A. A. Chen, J. Lee, and H. B. Fraser, "Bacterial retrons enable precise gene editing in human cells," bioRxiv, p. 2021.03.29.437260, Mar. 29, 2021.
[21] E. M. Barbieri, P. Muir, B. O. Akhuetie-Oni, C. M. Yellman, and F. J. Isaacs, "Precise Editing at DNA Replication Forks Enables Multiplex Genome Engineering in Eukaryotes," Cell, vol. 171, no. 6, pp. 1453-1467.e13, Nov. 2017.
[22] F. Farzadfard and T. K. Lu, "Synthetic biology. Genomically encoded analog memory with precise in vivo DNA writing in living cell populations," Science, vol. 346, no. 6211, p. 1256272, Nov. 2014.
[23] M. G. Schubert et al., "High throughput functional variant screens via in-vivo production of single-stranded DNA," bioRxiv, p. 2020.03.05.975441, Mar. 06, 2020.
[24] F. Farzadfard, N. Gharaei, R. J. Citorik, and T. K. Lu, "Efficient Retroelement-Mediated DNA Writing in Bacteria," Cold Spring Harbor Laboratory, p. 2020.02.21.958983, Feb. 22, 2020.
[25] S. Handa, A. Reyna, T. Wiryaman, and P. Ghosh, "Determinants of Selective
Fidelity
in Diversity-Generating Retroelements," Cold Spring Harbor Laboratory. p.
2020.04.29.068544, Apr. 30, 2020.
[26] T. M. Wannier etal., "Recombineering and MAGE," Nature Reviews Methods
Primers,
vol. 1, no. 1. pp. 1-24, Jan. 2021
[27] J. Garamella, R. Marshall, M. Rustad, and V. Noireaux, "The All E. coli
TX-TL
Toolbox 2.0: A System for Cell-Free Synthetic Biology," ACS Synth. Biol., vol.
5, no.
4, pp. 344-355, Apr. 2016.
[28] K. Yehl et al., "Engineering Phage Host-Range and Suppressing Bacterial
Resistance
through Phage Tail Fiber Mutagenesis." Cell, vol. 179. no. 2, pp. 459-469.e9.
Oct. 2019.
[29] S. Lemire, K. M. Yehl, and T. K. Lu, "Phage-Based Applications in
Synthetic Biology,"
Annu Rev Vim!, vol. 5, no. 1, pp. 453-476, Sep. 2018.
[30] S. Chatterjee and E. Rothenberg, -Interaction of bacteriophagel with its
E. coli receptor,
LamB," Viruses, vol. 4, no. 11, pp. 3162-3178, Nov. 2012.
[31] E. Berkane, F. Orlik, J. F. Stegmeier, A. Charbit, M. Winterhalter, and
R. Benz,
"Interaction of bacteriophage lambda with its cell surface receptor: an in
vitro study of
binding of the viral tail protein gpJ to LamB (Maltoporin)," Biochemistry,
vol. 45, no.
8, pp. 2708-2720, Feb. 2006.
[32] J. R. Meyer, D. T. Dobias, J. S. Weitz, J. E. Barrick, R. T. Quick, and
R. E. Lenski,
"Repeatability and contingency in the evolution of a key innovation in phage
lambda,"
Science, vol. 335. no. 6067, pp. 428-432, Jan. 2012.
[33] C. Anders, 0. Niewoehner, A. Duerst, and M. Jinek, "Structural basis of
PAM-
dependent target DNA recognition by the Cas9 endonuclease," Nature, vol. 513,
no.
7519, pp. 569-573, Sep. 2014.
[34] F. St-Pierre, L. Cui, D. G. Priest, D. Endy, I. B. Dodd, and K. E.
Shearwin, "One-Step
Cloning and Chromosomal Integration of DNA," ACS Synth. Biol., vol. 2, no. 9,
pp.
537-541, Sep. 2013.
[35] L. C. Thomason, N. Costantino, and D. L. Court, "E. coli genome
manipulation by P1
transduction," Curr. Protoc. Mal. Biol., vol. Chapter 1, p. Unit 1.17, Jul.
2007.
[36] D. G. Gibson, L. Young, R.-Y. Chuang, J. C. Venter, C. A. Hutchison 3rd,
and H. 0.
Smith, "Enzymatic assembly of DNA molecules up to several hundred kilobases,"
Nat.
Methods, vol. 6, no. 5, pp. 343-345, May 2009.
[37] C. Engler, R. Gruetzner, R. Kandzia, and S. Marillonnet, "Golden gate
shuffling: a one-
CA 03206795 2023- 7- 27

WO 2022/175383
PCT/EP2022/053934
pot DNA shuffling method based on type Hs restriction enzymes," PLoS One, vol.
4,
no. 5, p. c5553, May 2009.
[38] J. L. Hartley, G. F. Temple, and M. A. Brasch, "DNA cloning using in
vitro site-specific
recombination," Genonte Res., vol. 10, no. 11, pp. 1788-1795, Nov. 2000.
[39] C. A. Schneider, W. S. Rasband, and K. W. Eliceiri, "NIH Image to ImageJ: 25 years of image analysis," Nat. Methods, vol. 9, no. 7, pp. 671-675, Jul. 2012.
[40] T. M. Wannier et al., "Improved bacterial recombineering by parallelized protein discovery," Proc. Natl. Acad. Sci. U. S. A., vol. 117, no. 24, pp. 13689-13698, Jun. 2020.

Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Administrative Status, Maintenance Fee and Payment History, should be consulted.

Title Date
Forecasted Issue Date Unavailable
(86) PCT Filing Date 2022-02-17
(87) PCT Publication Date 2022-08-25
(85) National Entry 2023-07-27

Abandonment History

There is no abandonment history.

Maintenance Fee

Last Payment of $125.00 was received on 2024-01-23

Upcoming maintenance fee amounts

Description Date Amount
Next Payment if standard fee 2025-02-17 $125.00
Next Payment if small entity fee 2025-02-17 $50.00

Note: If the full payment has not been received on or before the date indicated, a further fee may be required, which may be one of the following:

  • the reinstatement fee;
  • the late payment fee; or
  • additional fee to reverse deemed expiry.

Patent fees are adjusted on the 1st of January every year. The amounts above are the current amounts if received by December 31 of the current year.
Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Payment History

Fee Type Anniversary Year Due Date Amount Paid Paid Date
Application Fee $421.02 2023-07-27
Maintenance Fee - Application - New Act 2 2024-02-19 $125.00 2024-01-23
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
INSTITUT PASTEUR
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

List of published and non-published patent-specific documents on the CPD.

If you have any difficulty accessing content, you can call the Client Service Centre at 1-866-997-1936 or send them an e-mail at CIPO Client Service Centre.


Document Description  Date (yyyy-mm-dd)  Number of pages  Size of Image (KB)
Patent Cooperation Treaty (PCT) 2023-07-27 1 54
Drawings 2023-07-27 14 3,371
Description 2023-07-27 75 3,646
International Search Report 2023-07-27 4 108
Claims 2023-07-27 3 117
Patent Cooperation Treaty (PCT) 2023-07-27 1 62
Patent Cooperation Treaty (PCT) 2023-07-27 1 34
Correspondence 2023-07-27 2 49
National Entry Request 2023-07-27 9 262
Abstract 2023-07-27 1 16
Cover Page 2023-10-06 1 34

Biological Sequence Listings

Please note that files with extensions .pep and .seq that were created by CIPO as working files might be incomplete and are not to be considered official communication.