Patent 2296814 Summary

(12) Patent Application: (11) CA 2296814
(54) English Title: TREPONEMA PALLIDUM POLYNUCLEOTIDES AND SEQUENCES
(54) French Title: POLYNUCLEOTIDES ET SEQUENCES DE TREPONEMA PALLIDIUM
Status: Deemed Abandoned and Beyond the Period of Reinstatement - Pending Response to Notice of Disregarded Communication
Bibliographic Data
(51) International Patent Classification (IPC):
  • C12N 15/12 (2006.01)
  • A61K 38/00 (2006.01)
  • A61K 39/00 (2006.01)
  • C07H 21/00 (2006.01)
  • C07K 14/20 (2006.01)
  • C12N 15/11 (2006.01)
  • C12N 15/63 (2006.01)
  • G11B 23/00 (2006.01)
(72) Inventors :
  • FRASER, CLAIRE M. (United States of America)
(73) Owners :
  • HUMAN GENOME SCIENCES, INC.
(71) Applicants :
  • HUMAN GENOME SCIENCES, INC. (United States of America)
(74) Agent: MBM INTELLECTUAL PROPERTY AGENCY
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 1998-06-23
(87) Open to Public Inspection: 1998-12-30
Examination requested: 2003-06-12
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/US1998/013041
(87) International Publication Number: WO 98/59034
(85) National Entry: 1999-12-21

(30) Application Priority Data:
Application No. Country/Territory Date
60/050,667 (United States of America) 1997-06-24

Abstracts

English Abstract


The present invention provides polynucleotide sequences of the genome of T.
pallidum, polypeptide sequences encoded by the polynucleotide sequences,
corresponding polynucleotides and polypeptides, vectors and hosts comprising
the polynucleotides, and assays and other uses thereof. The present invention
further provides polynucleotide and polypeptide sequence information stored on
computer readable media, and computer-based systems and methods which
facilitate its use.


French Abstract

La présente invention concerne des séquences polynucléotidiques du génome du T. Pallidium, les séquences polypeptidiques codées par les séquences polynucléotidiques, les polynucléotides et polypeptides correspondants, les vecteurs et hôtes comportant ces polynucléotides, ainsi que leurs dosages et autres utilisations. La présente invention concerne également les informations relatives aux séquences polynucléotidiques et polypeptidiques qui sont stockées sur supports exploitables par ordinateur, ainsi que les systèmes et procédés informatiques facilitant leur utilisation.

Claims

Note: Claims are shown in the official language in which they were submitted.


What Is Claimed Is:
1. Computer readable medium having recorded thereon the nucleotide sequence
depicted
in SEQ ID NOS:1-744, a representative fragment thereof or a nucleotide
sequence at least 95%
identical to a nucleotide sequence depicted in SEQ ID NOS:.
2. Computer readable medium having recorded thereon any one of the fragments
of SEQ
ID NOS:1-744 depicted in Tables 2 and 3 or a degenerate variant thereof.
3. The computer readable medium of claim 1, wherein said medium is selected
from the
group consisting of a floppy disc, a hard disc, random access memory (RAM),
read only memory
(ROM), and CD-ROM.
4. The computer readable medium of claim 3, wherein said medium is selected
from the
group consisting of a floppy disc, a hard disc, random access memory (RAM),
read only memory
(ROM), and CD-ROM.
5. A computer-based system for identifying fragments of the T. pallidum genome
of
commercial importance comprising the following elements:
a) a data storage means comprising the nucleotide sequence of SEQ ID NOS:1-
744, a
representative fragment thereof, or a nucleotide sequence at least 95%
identical to a nucleotide
sequence of SEQ ID NOS: 1-744;
b) search means for comparing a target sequence to the nucleotide sequence of
the data
storage means of step (a) to identify homologous sequence(s), and
c) retrieval means for obtaining said homologous sequence(s) of step (b).
6. A method for identifying commercially important nucleic acid fragments of
the T.
pallidum genome comprising the step of comparing a database comprising the
nucleotide
sequences depicted in SEQ ID NOS: 1-744, a representative fragment thereof, or
a nucleotide
sequence at least 95% identical to a nucleotide sequence of SEQ ID NOS: 1-744
with a target
sequence to obtain a nucleic acid molecule comprised of a complementary
nucleotide sequence to
said target sequence, wherein said target sequence is not randomly selected.
7. A method for identifying an expression modulating fragment of T. pallidum
genome
comprising the step of comparing a database comprising the nucleotide
sequences depicted in SEQ
ID NOS: 1-744, a representative fragment thereof, or a nucleotide sequence at
least 95% identical
to the nucleotide sequence of SEQ ID NOS: 1-744 with a target sequence to
obtain a nucleic acid
molecule comprised of a complementary nucleotide sequence to said target
sequence, wherein said
target sequence comprises sequences known to regulate gene expression.

8. An isolated protein-encoding nucleic acid fragment of the T. pallidum
genome, wherein
said fragment consists of the nucleotide sequence of any one of the fragments
of SEQ ID NOS:
1-744 depicted in Tables 2 and 3, or a degenerate variant thereof.
9. A vector comprising any one of the fragments of the T. pallidum genome SEQ
ID
NOS: 1-744 depicted in Tables 2 and 3 or a degenerate variant thereof.
10. An isolated fragment of the T. pallidum genome, wherein said fragment
modulates the
expression of an operably linked open reading frame, wherein said fragment
consists of the
nucleotide sequence from about 10 to 200 bases in length which is 5' to any one
of the open
reading frames depicted in Tables 2 and 3 or a degenerate variant thereof.
11. A vector comprising any one of the fragments of the T. pallidum genome of
claim 8.
12. An organism which has been altered to contain any one of the fragments of
the T.
pallidum genome of claim 8.
13. An organism which has been altered to contain any one of the fragments of
the T.
pallidum genome of claim 10.
14. A method for regulating the expression of a nucleic acid molecule
comprising the step
of covalently attaching to said nucleic acid molecule a nucleic acid molecule
consisting of the
nucleotide sequence from about 10 to 100 bases 5' to any one of the fragments
of the T. pallidum
genome depicted in SEQ ID NOS: 1-744 and Tables 2 and 3 or a degenerate
variant thereof.
15. An isolated nucleic acid molecule encoding a homolog of any of the
fragments of the
T. pallidum genome of SEQ ID NOS: 1-744 and Tables 2 and 3, wherein said
nucleic acid
molecule is produced by a process comprising steps of:
a) screening a genomic DNA library using as a probe a target sequence defined
by any of
SEQ ID NOS: 1-744 and Tables 2 and 3, including fragments thereof;
b) identifying members of said library which contain sequences that hybridize
to said target
sequence; and
c) isolating the nucleic acid molecules from said members identified in step
(b).
16. An isolated DNA molecule encoding a homolog of any one of the fragments of
the T.
pallidum genome of SEQ ID NOS: 1-744 and Tables 2 and 3, wherein said nucleic
acid molecule
is produced by a process comprising steps of:

a) isolating mRNA, DNA, or cDNA produced from an organism;
b) amplifying nucleic acid molecules whose nucleotide sequence is homologous
to
amplification primers derived from said fragment of said T. pallidum genome to
prime said
amplification;
c) isolating said amplified sequences produced in step (b).
17. An isolated polypeptide encoded by any of the fragments of the T. pallidum
genome
of SEQ ID NOS:1-744 and depicted in Table 2 and 3 or by a degenerate variant
of said fragments.
18. An isolated polynucleotide molecule encoding any one of the polypeptides
of claim
17.
19. An antibody which selectively binds to any one of the polypeptides of
claim 17.
20. A method for producing a polypeptide in a host cell comprising the steps
of:
a) incubating a host containing a heterologous nucleic acid molecule whose
nucleotide
sequence consists of any one of the fragments of the T. pallidum genome of SEQ
ID NOS:
1-744 and depicted in Tables 2 and 3, under conditions where said heterologous
nucleic acid
molecule is expressed to produce said protein, and
b) isolating said protein.

Description

Note: Descriptions are shown in the official language in which they were submitted.


JUMBO APPLICATIONS/PATENTS
THIS SECTION OF THE APPLICATION/PATENT CONTAINS MORE
THAN ONE VOLUME
THIS IS VOLUME OF
NOTE: For additional volumes, please contact the Canadian Patent Office.

CA 02296814 1999-12-21
WO 98/59034 PCT/US98/13041
Treponema pallidum Polynucleotides and Sequences
FIELD OF THE INVENTION
The present invention relates to the field of molecular biology. In
particular, it relates to,
among other things, nucleotide sequences of Treponema pallidum, contigs, ORFs,
fragments,
probes, primers and related polynucleotides thereof, peptides and polypeptides
encoded by the
sequences, and uses of the polynucleotides and sequences thereof, such as in
fermentation,
polypeptide production, assays and pharmaceutical development, among others.
BACKGROUND OF THE INVENTION
Spirochetes are a family of motile, unicellular, spiral-shaped bacteria which
share a
number of structural characteristics. Three genera of the spirochetes are
pathogenic in humans:
(a) Treponema, which includes the pathogens that cause syphilis (T.
pallidum), yaws (T.
pertenue), and pinta (T. carateum); (b) Borrelia, which includes the pathogens
that cause
epidemic and endemic relapsing fever and Lyme disease; and (c) Leptospira,
which includes a
wide variety of small spirochetes that cause mild to serious systemic human
illness (Koff, A. B.
and Rosen, T. J. Am. Acad. Dermatol. 29:519-535 (1993)). In 1986, more than
27,000 cases
of early infectious syphilis were diagnosed in the United States alone. Such
statistics indicate
that infection with T. pallidum is the largest source of human disease
resulting from the
spirochetes.
T. pallidum is morphologically indistinguishable from several other pathogenic
spirochetes, but, in general, treponemes and other spirochetes are easily
identifiable when
compared to other bacteria. A key morphological characteristic of T. pallidum,
and other
spirochetes, is the presence of a central protoplasmic cylinder composed
primarily of
peptidoglycan and one or more adjacent axial fibrils (also designated
periplasmic flagella or
endoflagella; Charon, N. W., et al., Res. Microbiol. 143:597-603 (1992)).
These structures
provide a source of corkscrew-like motion to the treponemes. In aqueous media,
treponemes
move in an apparently random fashion and, unlike the majority of motile
bacteria, continue to
move in a more viscous medium. In tissues, treponemes are highly moldable to
intercellular
spaces, a characteristic which is thought to be mediated by the interactions
of bacterial adhesins
and cellular fibronectins.
Syphilis is the primary clinical manifestation of infection with T. pallidum.
The clinical
manifestations of syphilis can resemble many diseases. Syphilis is typically
transmitted by
sexual contact, but can also be transmitted transplacentally. The infecting
organism multiplies at
the site of infection within 10 to 60 days postinfection and results in a
primary ulcer-like lesion
termed a chancre. A small number of organisms move from the primary lesion to
the regional
lymph nodes and establish small infectious centers termed satellite buboes.
Organisms from
these locations enter the blood stream and result in a systemic infection
(Goens, J. L., et al., Am.
Fam. Physician 50:1013-1020 (1994)).
The secondary stage of syphilis manifests itself as a widespread skin rash and
begins
between two and twelve weeks following the primary infection. During this
stage, the infected
individual often experiences a low grade fever coupled with swollen lymph
nodes. Also during
this period, lesions of various degrees of severity may develop in a number of
physical locations
including bone, liver, kidney, central nervous system (CNS), and other organs
(Veeravahu, M.
Arch. Intern. Med. 145:132-134 (1985)). Such secondary infections are highly
infectious, but
will, in time, subside spontaneously.
A third stage of syphilis occurs in approximately 30% of infected, but not
treated,
individuals. The third stage occurs several years following the first and
second stages. The
lesions which characterize the third stage of infection are minor in terms of
the number of
organisms, but may be severe in terms of tissue damage. Such lesions may
result in necrosis,
scar formation, general paresis, damage to aortic valves, permanent blindness,
and other
extensive tissue damage, all probably related to a delayed type
hypersensitivity reaction by the
host to the T. pallidum organisms (Scheck, D. N. and Hook, E. W. 3rd
Infect. Dis. Clin. North
Am. 8:769-795 (1994)).
A further, and increasingly common, complication of syphilis infection is
coinfection
with the human immunodeficiency virus (HIV). In fact, a recent study indicates
that ulcerous
genital diseases such as those exhibited during the primary stages of
infection with syphilis may
facilitate the transmission of HIV (Rufli, T. Dermatologica 179:113-117
(1989)). In addition, it
is clear that the CNS is regularly involved in the early stages of syphilis.
In the timespan
between the introduction of penicillin and other antibiotics and the spread of
HIV, early
neurosyphilis was an exceptionally uncommon development. However, since the
standard
antibiotic dosage used to treat syphilis is not exceptionally high and since a
successful treatment
requires an adequate host immune response, individuals infected with HIV often
exhibit a highly
increased occurrence of many neurosyphilis-related sequelae including
asymptomatic
neurosyphilis, syphilitic meningitis, cranial nerve abnormalities, or
cerebrovascular problems
(Musher, D. M., et al., Ann. Intern. Med. 113:872-881 (1990)).
T. pallidum has a remarkable ability to evade both the humoral and cellular
components of
the immune system. It was originally thought that the ability of T. pallidum
to evade the immune
system of the host organism was due to the presence of an outer coat of
mucopolysaccharides.
However, recent evidence suggests it is more likely that T. pallidum makes use
of the organization
of the relative immunogenicity of its complement of outer membrane proteins to
evade the
immune system (Radolf, J. D. Mol. Microbiol. 16:1067-1073 (1995)). Unlike most
other
bacterial outer membranes characterized thus far, the T. pallidum outer
membrane contains a
scarcity of immunogenic transmembrane proteins (with regard to T. pallidum,
these are termed
"rare outer membrane proteins"). Among the highly immunogenic proteins of
treponemes are a
number of lipoproteins anchored to the periplasmic leaflet of the cytoplasmic
membrane. As a
result of their physical location, the lipoproteins may be less susceptible to
typical immunologic
surveillance (Norris, S. Microbiol. Rev. 57:750-779 (1993)). In addition to
the periplasmic
lipoproteins, T. pallidum also secretes a number of small, but immunogenic
proteins which may
induce an immune response (Hindersson, P. et al., Res. Microbiol. 143:629-639
(1992)).
It is clear that the etiology of diseases mediated or exacerbated by T.
pallidum involves T. pallidum genes, and
that characterizing the genes and their patterns of expression would add
dramatically to our
understanding of the organism and its host interactions. Knowledge of T.
pallidum genes and
genomic organization would dramatically improve understanding of disease
etiology and lead to
improved and new ways of preventing, ameliorating, arresting and reversing
diseases.
Moreover, characterized genes and genomic fragments of T. pallidum would
provide reagents
for, among other things, detecting, characterizing and controlling T. pallidum
infections. There
is a need therefore to characterize the genome of T. pallidum and for
polynucleotides and
sequences of this organism.
SUMMARY OF THE INVENTION
The present invention is based on the sequencing of fragments of the T.
pallidum
genome. The primary nucleotide sequences which were generated are provided in
SEQ ID
NOS:1-744.
The present invention provides the nucleotide sequence of several thousand
contigs of the
T. pallidum genome, which are listed in tables below and set out in the
Sequence Listing
submitted herewith, and representative fragments thereof, in a form which can
be readily used,
analyzed, and interpreted by a skilled artisan. In one embodiment, the present
invention is
provided as contiguous strings of primary sequence information corresponding
to the nucleotide
sequences depicted in SEQ ID NOS: 1-744.
The present invention further provides nucleotide sequences which are at least
95%
identical to the nucleotide sequences of SEQ ID NOS: 1-744.
The nucleotide sequence of SEQ ID NOS: 1-744, a representative fragment
thereof, or a
nucleotide sequence which is at least 95% identical to the nucleotide sequence
of SEQ ID NOS:
1-744 may be provided in a variety of media to facilitate its use. In one
application of this
embodiment, the sequences of the present invention are recorded on computer
readable media.
Such media include, but are not limited to: magnetic storage media, such as
floppy discs, hard
disc storage medium, and magnetic tape; optical storage media such as CD-ROM;
electrical
storage media such as RAM and ROM; and hybrids of these categories such as
magnetic/optical
storage media.
The present invention further provides systems, particularly computer-based
systems
which contain the sequence information herein described stored in a data
storage means. Such
systems are designed to identify commercially important fragments of the T.
pallidum genome.
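As an illustration only, the three elements recited for such a system (a data storage means, a search means and a retrieval means) might be sketched as follows. This is a minimal sketch and not the system of the invention: the names SequenceStore, Hit and reverse_complement are hypothetical, the stored records are placeholders rather than SEQ ID NOS: 1-744, and homology is reduced to exact substring matching on either strand, where a practical system would use an alignment program.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    seq_id: str  # key of the stored sequence that matched
    start: int   # 0-based offset of the match within the stored sequence
    strand: str  # "+" if the target matched as given, "-" if its reverse complement matched


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (non-ACGT symbols pass through)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


class SequenceStore:
    """Hypothetical in-memory stand-in for the 'data storage means'."""

    def __init__(self, records: dict[str, str]):
        self._records = {key: seq.upper() for key, seq in records.items()}

    def search(self, target: str) -> list[Hit]:
        """'Search means': exact matches of the target on either strand (assumption)."""
        if not target:
            raise ValueError("target sequence must not be empty")
        target = target.upper()
        queries = [("+", target)]
        rc = reverse_complement(target)
        if rc != target:  # skip the second pass for palindromic targets
            queries.append(("-", rc))
        hits = []
        for seq_id, seq in self._records.items():
            for strand, query in queries:
                pos = seq.find(query)
                while pos != -1:
                    hits.append(Hit(seq_id, pos, strand))
                    pos = seq.find(query, pos + 1)
        return hits

    def retrieve(self, hit: Hit, length: int) -> str:
        """'Retrieval means': return the stored bases covered by a hit."""
        return self._records[hit.seq_id][hit.start:hit.start + length]
```

A caller would build the store from identifier-to-sequence pairs, pass a target to search, and hand any returned hit to retrieve together with the target length.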
Another embodiment of the present invention is directed to fragments of the T.
pallidum
genome having particular structural or functional attributes. Such fragments
of the T. pallidum
genome of the present invention include, but are not limited to, fragments
which encode peptides,
hereinafter referred to as open reading frames or ORFs, fragments which
modulate the
expression of an operably linked ORF, hereinafter referred to as expression
modulating
fragments or EMFs, and fragments which can be used to diagnose the presence of
T. pallidum
in a sample, hereinafter referred to as diagnostic fragments or DFs.
Each of the ORFs in fragments of the T. pallidum genome disclosed in Tables 1,
2 and 3,
and the EMFs found 5' to the ORFs, can be used in numerous ways as
polynucleotide reagents.
For instance, the sequences can be used as diagnostic probes or amplification
primers for
detecting or determining the presence of a specific microbe in a sample, to
selectively control
gene expression in a host and in the production of polypeptides, such as
polypeptides encoded by
ORFs of the present invention, particularly those polypeptides that have a
pharmacological activity.
The present invention further includes recombinant constructs comprising one
or more
fragments of the T. pallidum genome of the present invention. The recombinant
constructs of the
present invention comprise vectors, such as a plasmid or viral vector, into
which a fragment of
the T. pallidum genome has been inserted.
The present invention further provides host cells containing any of the
isolated fragments
of the T. pallidum genome of the present invention. The host cells can be a
higher eukaryotic
host cell, such as a mammalian cell, a lower eukaryotic cell, such as a yeast
cell, or a procaryotic
cell such as a bacterial cell.
The present invention is further directed to isolated polypeptides and
proteins encoded by
ORFs of the present invention. A variety of methods, well known to those of
skill in the art,
routinely may be utilized to obtain any of the polypeptides and proteins of
the present invention.
For instance, polypeptides and proteins of the present invention having
relatively short, simple
amino acid sequences readily can be synthesized using commercially available
automated peptide
synthesizers. Polypeptides and proteins of the present invention also may be
purified from
bacterial cells which naturally produce the protein. Yet another alternative
is to purify
polypeptides and proteins of the present invention from cells which have been
altered to express
them.
The invention further provides methods of obtaining homologs of the fragments
of the T.
pallidum genome of the present invention and homologs of the proteins encoded
by the ORFs of
the present invention. Specifically, by using the nucleotide and amino acid
sequences disclosed
herein as a probe or as primers, and techniques such as PCR cloning and
colony/plaque
hybridization, one skilled in the art can obtain homologs.
The invention further provides antibodies which selectively bind polypeptides
and
proteins of the present invention. Such antibodies include both monoclonal and
polyclonal
antibodies.
The invention further provides hybridomas which produce the above-described
antibodies. A hybridoma is an immortalized cell line which is capable of
secreting a specific
monoclonal antibody.

The present invention further provides methods of identifying test samples
derived from
cells which express one of the ORFs of the present invention, or a homolog
thereof. Such
methods comprise incubating a test sample with one or more of the antibodies
of the present
invention, or one or more of the DFs of the present invention, under
conditions which allow a
skilled artisan to determine if the sample contains the ORF or product
produced therefrom.
In another embodiment of the present invention, kits are provided which
contain the
necessary reagents to carry out the above-described assays.
Specifically, the invention provides a compartmentalized kit to receive, in
close
confinement, one or more containers which comprises: (a) a first container
comprising one of the
antibodies, or one of the DFs of the present invention; and (b) one or more
other containers
comprising one or more of the following: wash reagents, reagents capable of
detecting presence
of bound antibodies or hybridized DFs.
Using the isolated proteins of the present invention, the present invention
further provides
methods of obtaining and identifying agents capable of binding to a
polypeptide or protein
encoded by one of the ORFs of the present invention. Specifically, such agents
include, as
further described below, antibodies, peptides, carbohydrates, pharmaceutical
agents and the like.
Such methods comprise steps of: (a) contacting an agent with an isolated
protein encoded by one
of the ORFs of the present invention; and (b) determining whether the agent
binds to said protein.
The present genomic sequences of T. pallidum will be of great value to all
laboratories
working with this organism and for a variety of commercial purposes. Many
fragments of the T.
pallidum genome will be immediately identified by similarity searches against
GenBank or
protein databases and will be of immediate value to T. pallidum researchers
and for immediate
commercial value for the production of proteins or to control gene expression.
The methodology and technology for elucidating extensive genomic sequences of
bacterial and other genomes have enhanced, and will greatly enhance, the ability to
analyze and understand
chromosomal organization. In particular, sequenced contigs and genomes will
provide the
models for developing tools for the analysis of chromosome structure and
function, including the
ability to identify genes within large segments of genomic DNA, the structure,
position, and
spacing of regulatory elements, the identification of genes with potential
industrial applications,
and the ability to do comparative genomics and molecular phylogeny.
DESCRIPTION OF THE FIGURES
FIGURE 1 is a block diagram of a computer system (102) that can be used to
implement computer-based systems of the present invention.
FIGURE 2 is a schematic diagram depicting the data flow and computer programs
used
to collect, assemble, edit and annotate the contigs of the T. pallidum genome
of the present
invention. Both Macintosh and Unix platforms are used to handle the AB 373 and
377 sequence
data files, largely as described in Kerlavage et al., Proceedings of the
Twenty-Sixth Annual
Hawaii International Conference on System Sciences, 585, IEEE Computer Society
Press,
Washington D.C. (1993). Factura (AB) is a Macintosh program designed for
automatic vector
sequence removal and end-trimming of sequence files. The program Loadis runs
on a Macintosh
platform and parses the feature data extracted from the sequence files by
Factura to the Unix
based T. pallidum relational database. Assembly of contigs (and whole genome
sequences) is
accomplished by retrieving a specific set of sequence files and their
associated features using
Extrseq, a Unix utility for retrieving sequences from an SQL database. The
resulting sequence
file is processed to trim portions of the sequences with a high rate of ambiguous
nucleotides. The
sequence files were assembled using TIGR Assembler, an assembly engine
designed at The
Institute for Genomic Research (TIGR) for rapid and accurate assembly of
thousands of
sequence fragments. The collection of contigs generated by the assembly step
is loaded into the
database with the lassie program. Identification of open reading frames (ORFs)
is accomplished
by processing contigs with zorf. The ORFs are searched against T. pallidum
sequences from
GenBank and against all protein sequences using the BLASTN and BLASTP programs
(using
default parameters), described in Altschul et al., J. Mol. Biol. 215: 403-410
(1990). Results of
the ORF determination and similarity searching steps were loaded into the
database. As
described below, some results of the determination and the searches are set
out in Tables 1-3.
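The ORF identification step of this data flow can be illustrated with a short sketch. The zorf program itself is not described here, so the following does not reproduce it: find_orfs is a hypothetical helper that scans only the three forward reading frames, treats ATG as the sole start codon and TAA, TAG and TGA as stop codons, and applies an arbitrary minimum length. A real pipeline would also scan the reverse strand and consider alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end) pairs for forward-strand ORFs in a contig.

    Coordinates are 0-based and half-open, so seq[start:end] runs from the
    ATG through the stop codon. min_codons counts codons before the stop
    and is an illustrative threshold, not a value taken from the text.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon; a trailing partial codon is ignored.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # resume scanning after the stop codon
    return orfs
```

Each reported interval could then be translated and searched against nucleotide and protein databases, which is the role BLASTN and BLASTP play in the pipeline described above.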
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
The present invention is based on the sequencing of fragments of the T.
pallidum genome
and analysis of the sequences. The primary nucleotide sequences generated by
sequencing the
fragments are provided in SEQ ID NOS: 1-744. As used herein, the "primary
sequence" refers
to the nucleotide sequence represented by the IUPAC nomenclature system.
In addition to the aforementioned T. pallidum polynucleotide and
polynucleotide
sequences, the present invention provides the nucleotide sequences of SEQ ID
NOS: 1-744, ORF
IDs and ORFs within, or representative fragments thereof, in a form which can
be readily used,
analyzed, and interpreted by a skilled artisan.
As used herein, a "representative fragment of the nucleotide sequence depicted
in SEQ ID
NOS:1-744" refers to any portion of the SEQ ID NOS: 1-744 which is not
presently represented
within a publicly available database. Preferred representative fragments of
the present invention
are T. pallidum open reading frames (ORFs), expression modulating fragments
(EMFs) and
fragments which can be used to diagnose the presence of T. pallidum in a sample
(DFs). A non-limiting
identification of preferred representative fragments is provided in
Tables 1-3. As
discussed in detail below, the information provided in SEQ ID NOS:1-744 and
in Tables 1-3
together with routine cloning, synthesis, sequencing and assay methods will
enable those skilled
in the art to clone and sequence all "representative fragments" of interest,
including open reading
frames encoding a large variety of T. pallidum proteins.

The present invention is further directed to nucleic acid molecules encoding
portions or
fragments of the nucleotide sequences described herein. Fragments include
portions of the
nucleotide sequences of SEQ ID NOS:1-744, at least 10 contiguous nucleotides
in length selected
from any two integers, one of which representing a 5' nucleotide position and
a second of which
representing a 3' nucleotide position, where the first nucleotide for each
nucleotide sequence in
SEQ ID NOS:1-744 is position 1. That is, every combination of a 5' and 3'
nucleotide position
that a fragment at least 10 contiguous nucleotides in length could occupy is
included in the
invention. At least means a fragment may be 10 contiguous nucleotide bases in
length or any
integer between 10 and the length of an entire nucleotide sequence of SEQ ID
NOS: 1-744 minus
1. Therefore, included in the invention are contiguous fragments specified by
any 5' and 3'
nucleotide base positions of a nucleotide sequence of SEQ ID NOS:1-744
wherein the
contiguous fragment is any integer between 10 and the length of an entire
nucleotide sequence
minus 1.
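To make the size of this fragment space concrete, the following sketch enumerates every 5' and 3' position pair for a sequence of a given length. It rests on one reading of the passage, namely that fragment lengths run from 10 up to the full length minus 1, inclusive, with positions numbered from 1. The function names are hypothetical.

```python
def iter_fragments(seq_len, min_len=10):
    """Yield (five_prime, three_prime) positions, 1-based and inclusive.

    Fragments are at least min_len bases long and shorter than the full
    sequence (assumed reading of 'between 10 and the length minus 1').
    """
    for start in range(1, seq_len + 1):
        for end in range(start + min_len - 1, seq_len + 1):
            if end - start + 1 < seq_len:
                yield start, end


def count_fragments(seq_len, min_len=10):
    """Count the same fragments directly: a length-k fragment has seq_len - k + 1 positions."""
    return sum(seq_len - k + 1 for k in range(min_len, seq_len))
```

Under that reading, a sequence of N bases with N of at least 10 contains (N - 9)(N - 8)/2 - 1 such fragments, so the count grows quadratically with sequence length.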
Further, the invention includes polynucleotides comprising fragments specified
by size,
in nucleotides, rather than by nucleotide positions. The invention includes
any fragment size, in
contiguous nucleotides, selected from integers between 10 and the length of an
entire ORF ID,
ORF, or SEQ ID NO:, minus 1. Preferred sizes of contiguous nucleotide
fragments include 20
nucleotides, 30 nucleotides, 40 nucleotides, 50 nucleotides. Other preferred
sizes of contiguous
nucleotide fragments, which may be useful as diagnostic probes and primers,
include fragments
50-300 nucleotides in length which include, as discussed above, fragment sizes
representing each
integer between 50-300. Larger fragments are also useful according to the
present invention
corresponding to most, if not all, of the nucleotide sequences shown in Tables
1-3 (ORF IDs)
and SEQ ID NOS:1-744. The preferred sizes are, of course, meant to exemplify
not limit the
present invention as all size fragments, representing any integer between 10
and the length of an
entire nucleotide sequence minus 1, of each ORF ID, ORF, and SEQ ID NO:, are
included in the
invention.
The present invention also provides for the exclusion of any fragment,
specified by 5'
and 3' base positions or by size in nucleotide bases as described above for
any ORF ID or SEQ
ID NOS:1-744. Any number of fragments of nucleotide sequences in ORF IDs or
SEQ ID
NOS:1-744, specified by 5' and 3' base positions or by size in nucleotides, as
described above,
may be excluded from the present invention.
While the presently disclosed sequences of SEQ ID NOS: 1-744 are highly
accurate,
sequencing techniques are not perfect and, in relatively rare instances,
further investigation of a
fragment or sequence of the invention may reveal a nucleotide sequence error
present in a
nucleotide sequence disclosed in SEQ ID NOS: 1-744. However, once the present
invention is
made available (i.e., once the information in SEQ ID NOS: 1-744 and Tables 1-3
has been made
available), resolving a rare sequencing error in SEQ ID NOS: 1-744 will be
well within the skill
of the art. The present disclosure makes available sufficient sequence
information to allow any of
the described contigs or portions thereof to be obtained readily by
straightforward application of

CA 02296814 1999-12-21
WO 98/59034 PCT/US98/13041
routine techniques. Further sequencing of such polynucleotide may proceed in
like manner using
manual and automated sequencing methods which are employed ubiquitously in the
art. Nucleotide
sequence editing software is publicly available. For example, Applied
Biosystems' (AB)
AutoAssembler can be used as an aid during visual inspection of nucleotide
sequences. By
employing such routine techniques potential errors readily may be identified
and the correct
sequence then may be ascertained by targeting further sequencing effort, also
of a routine nature,
to the region containing the potential error.
Even if all of the very rare sequencing errors in SEQ ID NOS: 1-744 were
corrected, the
resulting nucleotide sequences would still be at least 95% identical, nearly
all would be at least
99% identical, and the great majority would be at least 99.9% identical to the
nucleotide
sequences of SEQ ID NOS: 1-744.
As discussed elsewhere herein, polynucleotides of the present invention
readily may be
obtained by routine application of well known and standard procedures for
cloning and
sequencing DNA. Detailed methods for obtaining libraries and for sequencing
are provided
below, for instance. A wide variety of T. pallidum strains known in the art
can be used to prepare T. pallidum
genomic DNA for cloning and for obtaining polynucleotides of the present
invention.
The nucleotide sequences of the genomes from different strains of T. pallidum
differ
somewhat. However, the nucleotide sequences of the genomes of all T. pallidum
strains will be
at least 95% identical, in corresponding part, to the nucleotide sequences
provided in SEQ ID
NOS: 1-744 and the ORF IDs and ORFs within. Nearly all will be at least 99%
identical and the
great majority will be 99.9% identical.
The present application is further directed to nucleic acid molecules at least
90%, 95%,
96%, 97%, 98% or 99% identical to a nucleic acid sequence shown in SEQ ID NOS:
1-744, the
ORF IDs and ORFs within. The above nucleic acid sequences are included
irrespective of
whether they encode a polypeptide having T. pallidum activity. This is because
even where a
particular nucleic acid molecule does not encode a polypeptide having T.
pallidum activity, one of
skill in the art would still know how to use the nucleic acid molecule, for
instance, as a
hybridization probe. Uses of the nucleic acid molecules of the present
invention that do not
encode a polypeptide having T. pallidum activity include, inter alia,
isolating a T. pallidum gene
or allelic variants thereof from a DNA library, and detecting T. pallidum mRNA
expression
in samples, including environmental samples, suspected of containing T. pallidum by
Northern Blot, PCR, or
similar analysis.
Preferred are nucleic acid molecules having sequences at least 90%, 95%, 96%,
97%,
98% or 99% identical to the nucleic acid sequence shown in SEQ ID NOS: 1-744,
the ORF IDs,
and the ORF within each ORF ID, which do, in fact, encode a polypeptide having
T. pallidum
protein activity. By "a polypeptide having T. pallidum activity" is intended
polypeptides
exhibiting activity similar, but not necessarily identical, to an activity of
the T. pallidum protein of
the invention, as measured in a particular biological assay suitable for
measuring activity of the
specified protein.
Due to the degeneracy of the genetic code, one of ordinary skill in the art
will immediately
recognize that a large number of the nucleic acid molecules having a sequence
at least 90%, 95%,
96%, 97%, 98%, or 99% identical to the nucleic acid sequences shown in SEQ ID
NOS: 1-744,
the ORF IDs, and the ORF within each ORF ID, will encode a polypeptide having
T. pallidum
protein activity. In fact, since degenerate variants of these nucleotide
sequences all encode the
same polypeptide, this will be clear to the skilled artisan even without
performing the above
described comparison assay. It will be further recognized in the art that, for
such nucleic acid
molecules that are not degenerate variants, a reasonable number will also
encode a polypeptide
having T. pallidum protein activity. This is because the skilled artisan is
fully aware of amino
acid substitutions that are either less likely or not likely to significantly
affect protein function
(e.g., replacing one aliphatic amino acid with a second aliphatic amino acid),
as further described
below.
The biological activity or function of the polypeptides of the present
invention is
expected to be similar or identical to that of polypeptides from other bacteria that
share a high degree of
structural identity/similarity. Tables 1-3 list accession numbers and
descriptions for the closest
matching sequences of polypeptides available through GenBank. It is therefore
expected that the
biological activity or function of the polypeptides of the present invention
will be similar or
identical to that of those polypeptides from other bacterial genera, species, or
strains listed in Tables 1-3.
By a polynucleotide having a nucleotide sequence at least, for example, 95%
"identical"
to a reference nucleotide sequence of the present invention, it is intended
that the nucleotide
sequence of the polynucleotide is identical to the reference sequence except
that the
polynucleotide sequence may include up to five point mutations per each 100
nucleotides of the
reference nucleotide sequence encoding the T. pallidum polypeptide. In other
words, to obtain a
polynucleotide having a nucleotide sequence at least 95% identical to a
reference nucleotide
sequence, up to 5% of the nucleotides in the reference sequence may be
deleted, inserted, or
substituted with another nucleotide. The query sequence may be an entire
sequence shown in
SEQ ID NOS: 1-744, the ORF IDs, or the ORF within each ORF ID, or any fragment
specified
as described herein.
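The arithmetic behind this definition can be illustrated with a short Python sketch. The function name `max_differences` is an assumption made for the example, and the thresholds are passed as strings so that values such as 99.9 convert exactly.

```python
from fractions import Fraction
from math import floor


def max_differences(reference_length, percent_identity):
    """Largest number of point differences that still meets the identity threshold.

    percent_identity is given as a string (e.g. "99.9") so it converts exactly.
    """
    allowed_fraction = 1 - Fraction(percent_identity) / 100
    return floor(reference_length * allowed_fraction)


five_per_hundred = max_differences(100, "95")    # 5 differences per 100 nucleotides
for_300_nt = max_differences(300, "95")          # 15 differences
strict_1000_nt = max_differences(1000, "99.9")   # 1 difference
```

Under these assumptions, a 300-nucleotide reference sequence tolerates up to 15 deleted, inserted, or substituted nucleotides at the 95% level.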
As a practical matter, whether any particular nucleic acid molecule or
polypeptide is at
least 90%, 95%, 96%, 97%, 98% or 99% identical to a nucleotide sequence of the
present
invention can be determined conventionally using known computer programs. A
preferred
method for determining the best overall match between a query sequence (a
sequence of the
present invention) and a subject sequence, also referred to as a global
sequence alignment, can be
determined using the FASTDB computer program based on the algorithm of Brutlag
et al. See
Brutlag et al. (1990) Comp. App. Biosci. 6:237-245. In a sequence alignment
the query and
subject sequences are both DNA sequences. An RNA sequence can be compared by
first
converting U's to T's. The result of said global sequence alignment is in
percent identity.
Preferred parameters used in a FASTDB alignment of DNA sequences to calculate
percent
identity are: Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining
Penalty=30,
Randomization Group Length=0, Cutoff Score=1, Gap Penalty=5, Gap Size
Penalty=0.05,
Window Size=500 or the length of the subject nucleotide sequence, whichever is
shorter.
If the subject sequence is shorter than the query sequence because of 5' or
3' deletions,
not because of internal deletions, a manual correction must be made to the
results. This is
because the FASTDB program does not account for 5' and 3' truncations of the
subject sequence
when calculating percent identity. For subject sequences truncated at the 5'
or 3' ends, relative to
the query sequence, the percent identity is corrected by calculating the
number of bases of the
query sequence that are 5' and 3' of the subject sequence, which are not
matched/aligned, as a
percent of the total bases of the query sequence. Whether a nucleotide is
matched/aligned is
determined by results of the FASTDB sequence alignment. This percentage is
then subtracted
from the percent identity, calculated by the above FASTDB program using the
specified
parameters, to arrive at a final percent identity score. This corrected score
is what is used for the
purposes of the present invention. Only nucleotides outside the 5' and 3'
nucleotides of the
subject sequence, as displayed by the FASTDB alignment, which are not
matched/aligned with
the query sequence, are calculated for the purposes of manually adjusting the
percent identity
score.
For example, a 90 nucleotide subject sequence is aligned to a 100 nucleotide
query sequence to
determine percent identity. The deletions occur at the 5' end of the subject
sequence and
therefore, the FASTDB alignment does not show a match/alignment of the first
10 nucleotides
at the 5' end. The 10 unpaired nucleotides represent 10% of the sequence (number
of nucleotides at
the 5' and 3' ends not matched/total number of nucleotides in the query
sequence) so 10% is
subtracted from the percent identity score calculated by the FASTDB program.
If the remaining
90 nucleotides were perfectly matched the final percent identity would be 90%.
In another
example, a 90 nucleotide subject sequence is compared with a 100 nucleotide
query sequence.
This time the deletions are internal deletions so that there are no
nucleotides on the 5' or 3' of the
subject sequence which are not matched/aligned with the query. In this case
the percent identity
calculated by FASTDB is not manually corrected. Once again, only nucleotides
5' and 3' of the
subject sequence which are not matched/aligned with the query sequence are
manually corrected
for. No other manual corrections are to be made for the purposes of the present
invention.
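The manual correction described above may be sketched as follows. This is a minimal illustration that assumes the FASTDB percent identity and the counts of unmatched 5' and 3' query bases have already been read from a FASTDB alignment; the function does not call FASTDB, and its name and arguments are hypothetical.

```python
def corrected_percent_identity(fastdb_percent, query_length,
                               unmatched_5prime, unmatched_3prime):
    """Subtract the share of query bases that overhang the subject's 5'/3' ends.

    Internal deletions are not counted here; they are already reflected
    in the score reported by the alignment program.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    overhang = unmatched_5prime + unmatched_3prime
    penalty = 100.0 * overhang / query_length
    return fastdb_percent - penalty


# First example from the text: 10 unmatched bases at the 5' end of a
# 100-nucleotide query, remaining 90 bases perfectly matched.
truncated = corrected_percent_identity(100.0, 100, 10, 0)

# Second example: internal deletions only, so nothing is subtracted.
# The 90.0 input is a placeholder score, not a value taken from the text.
internal_only = corrected_percent_identity(90.0, 100, 0, 0)
```

In the first case the corrected score is 90%, in agreement with the worked example; in the second case the score passes through unchanged.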
COMPUTER RELATED EMBODIMENTS
The nucleotide sequences provided in SEQ ID NOS: 1-744, including ORF IDs and
corresponding ORFs, a representative fragment thereof, or a nucleotide
sequence at least 95%,
preferably at least 99% and most preferably at least 99.9% identical to said
polynucleotide
sequences may be "provided" in a variety of mediums to facilitate use thereof.
As used herein,
"provided" refers to a manufacture, other than an isolated nucleic acid
molecule, which contains a
nucleotide sequence of the present invention. Such a manufacture provides a
large portion of the
T. pallidum genome and parts thereof (e.g., a T. pallidum open reading frame
(ORF)) in a form
which allows a skilled artisan to examine the manufacture using means not
directly applicable to
examining the T. pallidum genome or a subset thereof as it exists in nature or
in purified form.
In one application of this embodiment, a nucleotide sequence of the present
invention can
be recorded on computer readable media. As used herein, "computer readable
media" refers to
any medium which can be read and accessed directly by a computer. Such media
include, but are
not limited to: magnetic storage media, such as floppy discs, hard disc
storage medium, and
magnetic tape; optical storage media such as CD-ROM; electrical storage media
such as RAM
and ROM; and hybrids of these categories, such as magnetic/optical storage
media. A skilled
artisan can readily appreciate how any of the presently known computer
readable mediums can be
used to create a manufacture comprising computer readable medium having
recorded thereon a
nucleotide sequence of the present invention. Likewise, it will be clear to
those of skill how
additional computer readable media that may be developed also can be used to
create analogous
manufactures having recorded thereon a nucleotide sequence of the present
invention.
As used herein, "recorded" refers to a process for storing information on
computer
readable medium. A skilled artisan can readily adopt any of the presently known
methods for
recording information on computer readable medium to generate manufactures
comprising the
nucleotide sequence information of the present invention.
A variety of data storage structures are available to a skilled artisan for
creating a
computer readable medium having recorded thereon a nucleotide sequence of the
present
invention. The choice of the data storage structure will generally be based on
the means chosen
to access the stored information. In addition, a variety of data processor
programs and formats
can be used to store the nucleotide sequence information of the present
invention on computer
readable medium. The sequence information can be represented in a word
processing text file,
formatted in commercially available software such as WordPerfect and
Microsoft Word, or
represented in the form of an ASCII file, stored in a database application,
such as DB2, Sybase,
Oracle, or the like. A skilled artisan can readily adapt any number of data-
processor structuring
formats (e.g., text file or database) in order to obtain computer readable
medium having recorded
thereon the nucleotide sequence information of the present invention.
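As one possible illustration of recording sequence information in an ASCII file, the following Python sketch writes sequences to a plain text file and reads them back. The FASTA-style layout (a ">" header line followed by sequence lines), the file name, and the record identifier are assumptions chosen for the example; the text above does not prescribe any particular layout.

```python
import os
import tempfile


def record_sequences(path, records):
    """Write {identifier: sequence} pairs as plain ASCII, 60 bases per line."""
    with open(path, "w", encoding="ascii") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n")
            for i in range(0, len(seq), 60):
                handle.write(seq[i:i + 60] + "\n")


def read_sequences(path):
    """Read the file back into a {identifier: sequence} dictionary."""
    records, name = {}, None
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                records[name] = ""
            elif line and name is not None:
                records[name] += line
    return records


# Placeholder identifier and bases; not taken from the sequence listing.
example = {"hypothetical_contig_1": "ATGC" * 40}
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sequences.txt")
    record_sequences(path, example)
    restored = read_sequences(path)
```

A database application could hold the same identifier and sequence pairs in a two-column table; the flat file is used here only because it needs no additional software.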
Computer software is publicly available which allows a skilled artisan to
access sequence
information provided in a computer readable medium. Thus, by providing in
computer readable
form the nucleotide sequences of SEQ ID NOS: 1-744, including ORF IDs and
corresponding
ORFs, a representative fragment thereof, or a nucleotide sequence at least
95%, preferably at
least 99% and most preferably at least 99.9% identical to said
polynucleotide sequences, the
present invention enables the skilled artisan routinely to access the provided
sequence information
for a wide variety of purposes.
The examples which follow demonstrate how software which implements the BLAST
(Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et
al., Comp. Chem.
17:203-207 (1993)) search algorithms on a Sybase system was used to identify
open reading
frames (ORFs) within the T. pallidum genome which contain homology to ORFs or
proteins
from both T. pallidum and from other organisms. Among the ORFs discussed
herein are protein
encoding fragments of the T. pallidum genome useful in producing commercially
important
proteins, such as enzymes used in fermentation reactions and in the production
of commercially
useful metabolites.
The present invention further provides systems, particularly computer-based
systems,
which contain the sequence information described herein. Such systems are
designed to identify,
among other things, commercially important fragments of the T. pallidum
genome.
As used herein, "a computer-based system" refers to the hardware means,
software
means, and data storage means used to analyze the nucleotide sequence
information of the present
invention. The minimum hardware means of the computer-based systems of the
present
invention comprises a central processing unit (CPU), input means, output
means, and data
storage means. A skilled artisan can readily appreciate that any one of the
currently available
computer-based systems is suitable for use in the present invention.
As stated above, the computer-based systems of the present invention
comprise a data
storage means having stored therein a nucleotide sequence of the present
invention and the
necessary hardware means and software means for supporting and implementing a
search means.
As used herein, "data storage means" refers to memory which can store
nucleotide
sequence information of the present invention, or a memory access means which
can access
manufactures having recorded thereon the nucleotide sequence information of
the present
invention.
As used herein, "search means" refers to one or more programs which are
implemented
on the computer-based system to compare a target sequence or target
structural motif with the
sequence information stored within the data storage means. Search means are
used to identify
fragments or regions of the present genomic sequences which match a particular
target sequence
or target motif. A variety of known algorithms are disclosed publicly and a
variety of
commercially available software for conducting search means is available and can be
used in the
computer-based systems of the present invention. Examples of such software
include, but are
not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBIA). A skilled
artisan can
readily recognize that any one of the available algorithms or implementing
software packages for
conducting homology searches can be adapted for use in the present computer-
based systems.
As used herein, a "target sequence" can be any DNA or amino acid sequence of
six or
more nucleotides or two or more amino acids. A skilled artisan can readily
recognize that the
longer a target sequence is, the less likely a target sequence will be present
as a random
occurrence in the database. The most preferred sequence length of a target
sequence is from
about 10 to 100 amino acids or from about 30 to 300 nucleotide residues.
However, it is well
recognized that searches for commercially important fragments, such as
sequence fragments
involved in gene expression and protein processing, may be of shorter length.
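The relationship between target length and chance matches can be made concrete with a rough calculation. The Python sketch below assumes independent, equally frequent residues, which real genomes only approximate, and uses an arbitrary database size of 1,000,000 bases; both are assumptions for illustration.

```python
def expected_random_hits(target_length, database_length, alphabet_size=4):
    """Expected chance occurrences of one specific target in a random database.

    Assumes independent, equally frequent residues (alphabet_size of 4 for
    nucleotides, 20 for amino acids).
    """
    positions = max(database_length - target_length + 1, 0)
    return positions / alphabet_size ** target_length


# The database size of 1,000,000 bases is an arbitrary illustration.
hits_6 = expected_random_hits(6, 1_000_000)    # roughly 244 chance matches
hits_30 = expected_random_hits(30, 1_000_000)  # far below one chance match
```

Under this simple model a six-nucleotide target is expected to occur by chance a few hundred times in a database of that size, whereas a 30-nucleotide target is not expected to occur by chance at all.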

As used herein, "a target structural motif," or "target motif," refers to any
rationally
selected sequence or combination of sequences in which the sequence(s) are
chosen based on a
three-dimensional configuration which is formed upon the folding of the target
motif. There are a
variety of target motifs known in the art. Protein target motifs include, but
are not limited to,
enzymic active sites and signal sequences. Nucleic acid target motifs include,
but are not limited
to, promoter sequences, hairpin structures and inducible expression elements
(protein binding
sequences).
A variety of structural formats for the input and output means can be used to
input and
output the information in the computer-based systems of the present invention.
A preferred
format for an output means ranks fragments of the T. pallidum genomic
sequences possessing
varying degrees of homology to the target sequence or target motif. Such
presentation provides a
skilled artisan with a ranking of sequences which contain various amounts of
the target sequence
or target motif and identifies the degree of homology contained in the
identified fragment.
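A ranked output of this kind might be produced along the following lines. The sketch is a deliberately simplified stand-in for a search means: it scores only ungapped windows by percent identity and is not an implementation of BLAST or BLAZE. The fragment names and bases are invented for the example.

```python
def best_window_identity(target, sequence):
    """Return (percent identity, 1-based start) of the best ungapped window."""
    best = (0.0, 0)
    for start in range(len(sequence) - len(target) + 1):
        window = sequence[start:start + len(target)]
        matches = sum(a == b for a, b in zip(target, window))
        score = 100.0 * matches / len(target)
        if score > best[0]:
            best = (score, start + 1)
    return best


def rank_fragments(target, fragments):
    """Rank stored fragments from highest to lowest identity to the target."""
    scored = [(name, *best_window_identity(target, seq))
              for name, seq in fragments.items()]
    return sorted(scored, key=lambda row: row[1], reverse=True)


# Fragment names and bases are invented for the example.
stored = {
    "frag_A": "GGGTTGACATTT",
    "frag_B": "CCTTGTCACC",
    "frag_C": "AAAAAAAAAA",
}
ranking = rank_fragments("TTGACA", stored)
```

Each row of `ranking` gives a fragment name, its best percent identity to the target, and the position of the best window, so the fragment containing the exact target appears first.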
A variety of comparing means can be used to compare a target sequence or
target motif
with the data storage means to identify sequence fragments of the T. pallidum
genome. In the
present examples, software which implements the BLAST and BLAZE
algorithms,
described in Altschul et al., J. Mol. Biol. 215: 403-410 (1990), is used to
identify open reading
frames within the T. pallidum genome. A skilled artisan can readily recognize
that any one of the
publicly available homology search programs can be used as the search means
for the computer-
based systems of the present invention. Of course, suitable proprietary
systems that may be
known to those of skill also may be employed in this regard.
Figure 1 provides a block diagram of a computer system illustrative of
embodiments of
this aspect of the present invention. The computer system 102 includes a processor
106 connected to
a bus 104. Also connected to the bus 104 are a main memory 108 (preferably
implemented as
random access memory, RAM) and a variety of secondary storage devices 110,
such as a hard
drive 112 and a removable medium storage device 114. The removable medium
storage device
114 may represent, for example, a floppy disk drive, a CD-ROM drive, a
magnetic tape drive,
etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a
magnetic tape,
etc.) containing control logic and/or data recorded therein may be inserted
into the removable
medium storage device 114. The computer system 102 includes appropriate
software for reading
the control logic and/or the data from the removable medium storage device
114, once it is
inserted into the removable medium storage device 114.
A nucleotide sequence of the present invention may be stored in a well known
manner in
the main memory 108, any of the secondary storage devices 110, and/or a
removable storage
medium 116. During execution, software for accessing and processing the
genomic sequence
(such as search tools, comparing tools, etc.) resides in main memory 108, in
accordance with the
requirements and operating parameters of the operating system, the hardware
system and the
software program or programs.

BIOCHEMICAL EMBODIMENTS
Other embodiments of the present invention are directed to isolated fragments
of the T.
pallidum genome. The fragments of the T. pallidum genome of the present
invention include, but
are not limited to fragments which encode peptides, hereinafter open reading
frames (ORFs),
fragments which modulate the expression of an operably linked ORF, hereinafter
expression
modulating fragments (EMFs) and fragments which can be used to diagnose the
presence of T.
pallidum in a sample, hereinafter diagnostic fragments (DFs).
As used herein, an "isolated nucleic acid molecule" or an "isolated fragment
of the T.
pallidum genome" refers to a nucleic acid molecule possessing a specific
nucleotide sequence
which has been subjected to purification means to reduce, from the
composition, the number of
compounds which are normally associated with the composition. Particularly,
the term refers to
the nucleic acid molecules having the sequences set out in SEQ ID NOS: 1-744,
to representative
fragments thereof as described above including ORF IDs and ORFs, to
polynucleotides at least
95%, preferably at least 96%, 97%, 98%, or 99% and especially preferably at
least 99.9%
identical in sequence thereto, also as set out above.
A variety of purification means can be used to generate the isolated fragments
of the
present invention. These include, but are not limited to methods which
separate constituents of a
solution based on charge, solubility, or size.
In one embodiment, T. pallidum DNA can be enzymatically sheared to produce
fragments
of 15-20 kb in length. These fragments can then be used to generate a T.
pallidum library by
inserting them into lambda clones as described in the Examples below. Primers
flanking, for
example, an ORF, such as those enumerated in the ORF IDs of Tables 1-3, can
then be generated
using nucleotide sequence information provided in SEQ ID NOS: 1-744. Well
known and
routine techniques of PCR cloning then can be used to isolate the ORF from the
lambda DNA
library or T. pallidum genomic DNA. Thus, given the availability of SEQ ID
NOS:1-744, the
information in Tables 1, 2 and 3, and the information that may be obtained
readily by analysis of
the sequences of SEQ ID NOS:1-744 using methods set out above, those of skill
will be enabled
by the present disclosure to isolate any ORF-containing or other nucleic acid
fragment of the
present invention.
The isolated nucleic acid molecules of the present invention include, but are
not limited to
single stranded and double stranded DNA, and single stranded RNA. For purposes
of
numbering and reference to polynucleotide and polypeptide sequences the entire
sequence of each
sequence of SEQ ID NOS:1-744 is included with the first nucleotide being
position 1.
Therefore, for reference purposes the numbering used in the present invention
is that provided in
the sequence listing for SEQ ID NOS:1-744.
As used herein, an open reading frame (ORF), means a series of nucleotide
triplets
coding for amino acid residues without any termination codons and is a
sequence translatable into
protein. Further, unless specified, the term "ORF" for each ORF ID is defined
by the termination
codon at the 3' end and the 5' most methionine codon, at the 5' end, in frame
with said 3'
termination codon. Unless specified, the term "ORF" also refers to a
particular polypeptide
sequence defined by the ORF polynucleotide sequence, wherein the N-terminus is
defined by the
5' most methionine codon in frame with the termination codon at the 3' end of
the ORF ID and
the C-terminus is defined by the last codon before the said 3' termination
codon. As used herein,
an ORF ID represents a sequence without any internal termination codons
flanked by termination
codons.
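The two definitions above (an ORF ID as a stop-to-stop stretch, and an ORF as the part of it beginning at the 5' most methionine codon) can be sketched in Python. The sketch scans a single reading frame of one strand of an upper-case DNA string, and it returns the ORF without its termination codon; these simplifications, the function names, and the toy sequence are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_ids_in_frame(seq, frame):
    """Yield (start, end) 0-based half-open spans lying between two in-frame stops."""
    previous_stop = None
    for pos in range(frame, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            if previous_stop is not None and pos > previous_stop + 3:
                yield previous_stop + 3, pos
            previous_stop = pos


def orf_within(seq, start, end):
    """Return the coding part from the 5' most in-frame ATG to the last codon
    before the 3' stop, or None if the span holds no ATG."""
    for pos in range(start, end, 3):
        if seq[pos:pos + 3] == "ATG":
            return seq[pos:end]
    return None


# Invented sequence: stop, GCT, ATG, AAA, ATG, CCC, stop.
toy = "TAAGCTATGAAAATGCCCTGA"
spans = list(orf_ids_in_frame(toy, 0))
orf = orf_within(toy, *spans[0])
```

Here the ORF ID spans the five codons between the two stops, while the ORF starts at the first of the two in-frame ATG codons. A fuller treatment would repeat the scan for all three frames of both strands.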
Tables 1, 2, and 3 list ORF IDs in the T. pallidum genomic contigs of the
present
invention that were identified as putative coding regions by the GeneMark
software using
organism-specific second-order Markov probability transition matrices. It
will be appreciated that
other criteria can be used, in accordance with well known analytical methods,
such as those
discussed herein, to generate more inclusive, more restrictive, or more
selective lists.
Table 1 sets out ORF IDs in the T. pallidum contigs of the present invention
that over a
continuous region of at least 50 bases are 95% or more identical (by BLAST
analysis) to a
nucleotide sequence available through GenBank in June, 1997.
Table 2 sets out ORF IDs in the T. pallidum contigs of the present invention
that are not
in Table 1 and match, with a BLASTP probability score of 0.01 or less, a
polypeptide sequence
available through GenBank in July, 1996.
Table 3 sets out ORF IDs in the T. pallidum contigs of the present invention
that do not
match significantly, by BLASTP analysis, a polypeptide sequence available
through GenBank in
July, 1996.
In each table, the first and second columns identify the ORF ID by,
respectively, contig
number and ORF ID number within the contig; the third column indicates the
first nucleotide of
the ORF ID, counting from the 5' end of the contig strand; and the fourth
column indicates the
last nucleotide of the ORF ID, counting from the 5' end of the contig strand.
In Tables 1 and 2, column six lists the Reference for the closest matching
sequence
available through GenBank. These reference numbers are the database entry
numbers
commonly used by those of skill in the art, who will be familiar with their
denominators.
Descriptions of the nomenclature are available from the National Center for
Biotechnology
Information. Column seven in Tables 1 and 2 provides the gene name of the
matching
sequence; column eight provides the BLAST identity score from the comparison
of the ORF and
the homologous gene; and column nine indicates the length in nucleotides of
the highest scoring
segment pair identified by the BLAST identity analysis.
In Table 3, the last column, column six, indicates the length of each ORF ID
in amino
acid residues.
The concepts of percent identity and percent similarity of two polypeptide
sequences are
well understood in the art. For example, two polypeptides 10 amino acids in
length which differ
at three amino acid positions (e.g., at positions 1, 3 and 5) are said to have
a percent identity of
70%. However, the same two polypeptides would be deemed to have a percent
similarity of
80% if, for example at position 5, the amino acid moieties, although not
identical, were
"similar" (i.e., possessed similar biochemical characteristics). Many programs
for analysis of
for analysis of
nucleotide or amino acid sequence similarity, such as FASTA and BLAST
specifically list percent
identity of a matching region as an output parameter. Thus, for instance,
Tables 1 and 2 herein
enumerate the percent identity of the highest scoring segment pair in each ORF
and its listed
relative. Further details concerning the algorithms and criteria used for
homology searches are
provided below and are described in the pertinent literature highlighted by
the citations provided
below.
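The 70% identity and 80% similarity figures in the example above can be reproduced with a short Python sketch. The two peptides and the residue classes used to decide similarity are invented for illustration; programs such as FASTA and BLAST rely on substitution matrices rather than fixed classes.

```python
# Illustrative similarity classes only; real programs use substitution matrices.
SIMILAR_GROUPS = [set("AVLIM"), set("DE"), set("KRH"),
                  set("ST"), set("FWY"), set("NQ")]


def percent_identity(a, b):
    """Percentage of aligned positions carrying the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def percent_similarity(a, b):
    """Percentage of positions that are identical or fall in the same class."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")

    def alike(x, y):
        return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)

    return 100.0 * sum(alike(x, y) for x, y in zip(a, b)) / len(a)


# Invented 10-residue peptides differing at positions 1, 3 and 5;
# only position 5 (L versus I) is a similar pair under the classes above.
first = "MKTALGSPEW"
second = "GKDAIGSPEW"
identity = percent_identity(first, second)      # 70.0
similarity = percent_similarity(first, second)  # 80.0
```

The gap between the two values comes entirely from position 5, where leucine and isoleucine differ but share a class.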
It will be appreciated that other criteria can be used to generate more
inclusive and more
exclusive listings of the types set out in the tables. As those of skill will
appreciate, narrow and
broad searches both are useful. Thus, a skilled artisan can readily identify
ORFs in contigs of the
T. pallidum genome other than those specified for Tables 1-3, such as ORFs
which are
overlapping or encoded by the opposite strand of an identified ORF in addition
to those
ascertainable using the computer-based systems of the present invention.
As used herein, an "expression modulating fragment," EMF, means a series of
nucleotide
molecules which modulates the expression of an operably linked ORF or EMF.
As used herein, a sequence is said to "modulate the expression of an operably
linked
sequence" when the expression of the sequence is altered by the presence of
the EMF. EMFs
include, but are not limited to, promoters, and promoter modulating sequences
(inducible
elements). One class of EMFs are fragments which induce the expression of an
operably linked
ORF in response to a specific regulatory factor or physiological event.
EMF sequences can be identified within the contigs of the T. pallidum genome
by their
proximity to the ORF IDs provided in Tables 1-3 and ORFs within each ORF ID.
An intergenic
segment, or a fragment of the intergenic segment, from about 10 to 200
nucleotides in length,
taken from any one of the ORFs of Tables 1-3 will modulate the expression of
an operably linked
ORF in a fashion similar to that found with the naturally linked ORF sequence.
As used herein,
an "intergenic segment" refers to fragments of the T. pallidum genome which
are between two
ORF(s) herein described. EMFs also can be identified using known EMFs as a
target sequence
or target motif in the computer-based systems of the present invention.
Further, the two methods
can be combined and used together.
The presence and activity of an EMF can be confirmed using an EMF trap vector.
An
EMF trap vector contains a cloning site linked to a marker sequence. A marker
sequence encodes
an identifiable phenotype, such as antibiotic resistance or a complementing
nutrition auxotrophic
factor, which can be identified or assayed when the EMF trap vector is placed
within an
appropriate host under appropriate conditions. As described above, an EMF will
modulate the
expression of an operably linked marker sequence. A more detailed discussion
of various marker
sequences is provided below. A sequence which is suspected as being an EMF is
cloned in all
three reading frames in one or more restriction sites upstream from the marker
sequence in the
EMF trap vector. The vector is then transformed into an appropriate host using
known
procedures and the phenotype of the transformed host is examined under
appropriate conditions.
As described above, an EMF will modulate the expression of an operably linked
marker
sequence.
As used herein, a "diagnostic fragment," DF, means a series of nucleotide
molecules
which selectively hybridize to T. pallidum sequences. DFs can be readily
identified by
identifying unique sequences within contigs of the T. pallidum genome, such as
by using well-
known computer analysis software, and by generating and testing probes or
amplification
primers consisting of the DF sequence in an appropriate diagnostic format
which determines
amplification or hybridization selectivity.
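One simple computational starting point for identifying unique sequences is to list short subsequences that occur in a target sequence but in none of a set of comparison sequences. The Python sketch below does this by exact matching on one strand only; the function names and toy sequences are assumptions, and any candidate found this way would still need to be tested for hybridization or amplification selectivity as described above.

```python
def kmers(seq, k):
    """Return the set of all k-length subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_diagnostic_kmers(target, background, k):
    """k-mers found in the target but in none of the background sequences."""
    unwanted = set()
    for other in background:
        unwanted |= kmers(other, k)
    return kmers(target, k) - unwanted


# Invented sequences standing in for a target contig and unrelated genomes.
target = "ACGTACGGA"
background = ["ACGTACC", "TTTT"]
candidates = candidate_diagnostic_kmers(target, background, 5)
```

With these toy inputs the candidates are the three 5-mers that do not appear in the background sequences.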
The sequences falling within the scope of the present invention are not
limited to the
specific sequences herein described, but also include allelic and species
variations thereof. Allelic
and species variations can be routinely determined by comparing the
polynucleotide sequences
provided in SEQ ID NOS:1-744, ORF IDs and ORFs within, a representative
fragment thereof,
or a nucleotide sequence at least 99% and preferably 99.9% identical to said
polynucleotide
sequences, with a sequence from another isolate of the same species.
Furthermore, to
accommodate codon variability, the invention includes nucleic acid molecules
coding for the same
amino acid sequences as do the specific ORFs disclosed herein. In other words,
in the coding
region of an ORF, substitution of one codon for another which encodes the same
amino acid is
expressly contemplated.
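The codon substitution described above can be sketched in Python as follows. The codon table is a deliberately small subset of the standard genetic code, and the two sequences are hypothetical examples rather than sequences from the disclosure; the sketch only shows that two different coding sequences can translate to the same amino acid sequence.

```python
# Hypothetical sketch: two coding sequences that differ in nucleotide
# sequence but encode the same amino acids.
# CODON_TABLE is a partial subset of the standard genetic code.

CODON_TABLE = {
    "ATG": "M",
    "CTG": "L", "TTA": "L", "CTC": "L",
    "TCT": "S", "AGC": "S",
    "AAA": "K", "AAG": "K",
    "TAA": "*", "TGA": "*",   # stop codons
}


def translate(dna: str) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def encodes_same_polypeptide(orf_a: str, orf_b: str) -> bool:
    """True if the sequences differ at the nucleotide level but translate identically."""
    return orf_a.upper() != orf_b.upper() and translate(orf_a) == translate(orf_b)


original = "ATGCTGTCTAAATAA"   # hypothetical ORF
variant = "ATGTTAAGCAAGTGA"    # synonymous codons substituted throughout

if __name__ == "__main__":
    print(translate(original), translate(variant))
    print(encodes_same_polypeptide(original, variant))
```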
Any specific sequence disclosed herein can be readily screened for errors by
resequencing
a particular fragment, such as an ORF, in both directions (i.e., sequence both
strands).
Alternatively, error screening can be performed by sequencing corresponding
polynucleotides of
T. pallidum origin isolated by using part or all of the fragments in question
as a probe or primer.
Each of the ORFs of the T. pallidum genome within the ORF IDs of Tables 1, 2
and 3,
and the EMFs found 5' to the ORFs, can be used as polynucleotide reagents in
numerous ways.
For example, the sequences can be used as diagnostic probes or diagnostic
amplification primers
to detect the presence of a specific microbe in a sample, particularly T.
pallidum. Especially
preferred in this regard are ORFs such as those of Table 3, which do not match
previously
characterized sequences from other organisms and thus are most likely to be
highly selective for
T. pallidum. Also particularly preferred are ORFs that can be used to
distinguish between strains
of T. pallidum, particularly those that distinguish medically important
strains, such as drug-
resistant strains.
In addition, the fragments of the present invention, as broadly described, can
be used to
control gene expression through triple helix formation or antisense DNA or
RNA, both of which
methods are based on the binding of a polynucleotide sequence to DNA or RNA.
Triple helix-
formation optimally results in a shut-off of RNA transcription from DNA, while
antisense RNA
hybridization blocks translation of an mRNA molecule into polypeptide.
Information from the
sequences of the present invention can be used to design antisense and triple
helix-forming
oligonucleotides. Polynucleotides suitable for use in these methods are
usually 20 to 40 bases in
length and are designed to be complementary to a region of the gene involved
in transcription, for
triple-helix formation, or to the mRNA itself, for antisense inhibition. Both
techniques have been
demonstrated to be effective in model systems, and the requisite techniques
are well known and
involve routine procedures. Triple helix techniques are discussed in, for
example, Lee et al.,
Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and
Dervan et al.,
Science 251:1360 (1991). Antisense techniques in general are discussed in,
for instance, Okano,
J. Neurochem. 56:560 (1991) and Oligodeoxynucleotides as Antisense Inhibitors
of Gene
Expression, CRC Press, Boca Raton, FL (1988).
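By way of a minimal sketch, an antisense oligonucleotide of the length mentioned above (20 to 40 bases) can be derived as the reverse complement of a chosen region of the sense strand. The function names and the example sequence are hypothetical, and the sketch ignores practical design concerns such as secondary structure and off-target binding.

```python
# Hypothetical sketch: derive an antisense oligonucleotide from a
# sense-strand (mRNA-like) DNA sequence. Names and sequence are illustrative.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_oligo(sense: str, start: int, length: int = 20) -> str:
    """Return an oligo complementary to sense[start:start + length].

    The 20 to 40 base limits follow the range given in the text.
    """
    if not 20 <= length <= 40:
        raise ValueError("length should be between 20 and 40 bases")
    region = sense[start:start + length]
    if len(region) < length:
        raise ValueError("target region runs past the end of the sequence")
    return reverse_complement(region)


if __name__ == "__main__":
    sense_strand = "ATGGCTAAGCTTGGATCCGATTACGCTAGC"  # hypothetical 30-base region
    print(antisense_oligo(sense_strand, start=0, length=20))
```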
The present invention further provides recombinant constructs comprising one
or more
fragments of the T. pallidum genomic fragments and contigs of the present
invention. Certain
preferred recombinant constructs of the present invention comprise a vector,
such as a plasmid or
viral vector, into which a fragment of the T. pallidum genome has been
inserted, in a forward or
reverse orientation. In the case of a vector comprising one of the ORFs of the
present invention,
the vector may further comprise regulatory sequences, including for example, a
promoter,
operably linked to the ORF. For vectors comprising the EMFs of the present
invention, the
vector may further comprise a marker sequence or heterologous ORF operably
linked to the
EMF.
Large numbers of suitable vectors and promoters are known to those of skill in
the art and
are commercially available for generating the recombinant constructs of the
present invention.
The following vectors are provided by way of example. Useful bacterial vectors
include
phagescript, PsiX174, pBluescript SK, pBS KS, pNH8a, pNH16a, pNH18a, pNH46a
(available from Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5
(available from
Pharmacia). Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXT1,
pSG
(available from Stratagene); pSVK3, pBPV, pMSG, pSVL (available from
Pharmacia).
Promoter regions can be selected from any desired gene using CAT
(chloramphenicol
acetyltransferase) vectors or other vectors with selectable markers. Two appropriate
vectors are
pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ,
T3, T7, gpt,
lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV
thymidine
kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-
I. Selection of
the appropriate vector and promoter is well within the level of ordinary skill
in the art.
The present invention further provides host cells containing any one of the
isolated
fragments of the T. pallidum genomic fragments and contigs of the present
invention, wherein
the fragment has been introduced into the host cell using known methods. The
host cell can be
a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic
host cell, such as a
yeast cell, or a procaryotic cell, such as a bacterial cell.
A polynucleotide of the present invention, such as a recombinant construct
comprising an
ORF of the present invention, may be introduced into the host by a variety of
well established
techniques that are standard in the art, such as calcium phosphate
transfection, DEAE-dextran
mediated transfection and electroporation, which are described in, for
instance, Davis, L. et al.,
BASIC METHODS IN MOLECULAR BIOLOGY (1986).
A host cell containing one of the fragments of the T. pallidum genomic
fragments and
contigs of the present invention, can be used in conventional manners to
produce the gene
product encoded by the isolated fragment (in the case of an ORF) or can be
used to produce a
heterologous protein under the control of the EMF.
The present invention further provides isolated polypeptides encoded by the
nucleic acid
fragments of the present invention or by degenerate variants of the nucleic
acid fragments of the
present invention. By "degenerate variant" is intended nucleotide fragments
which differ from a
nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide
sequence but, due to
the degeneracy of the Genetic Code, encode an identical polypeptide sequence.
Preferred nucleic acid fragments of the present invention are the ORF IDs
depicted in
Tables 2 and 3 and the ORFs within which encode proteins.
A variety of methodologies known in the art can be utilized to obtain any one
of the
isolated polypeptides or proteins of the present invention. At the simplest
level, the amino acid
sequence can be synthesized using commercially available peptide synthesizers.
This is
particularly useful in producing small peptides and fragments of larger
polypeptides. Such short
fragments as may be obtained most readily by synthesis are useful, for
example, in generating
antibodies against the native polypeptide, as discussed further below.
In an alternative method, the polypeptide or protein is purified from
bacterial cells which
naturally produce the polypeptide or protein. One skilled in the art can
readily employ well-
known methods for isolating polypeptides and proteins to isolate and purify
polypeptides or
proteins of the present invention produced naturally by a bacterial strain, or
by other methods.
Methods for isolation and purification that can be employed in this regard
include, but are not
limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-
exchange
chromatography, and immuno-affinity chromatography.
The polypeptides and proteins of the present invention also can be purified
from cells
which have been altered to express the desired polypeptide or protein. As used
herein, a cell is
said to be altered to express a desired polypeptide or protein when the cell,
through genetic
manipulation, is made to produce a polypeptide or protein which it normally
does not produce or
which the cell normally produces at a lower level. Those skilled in the art
can readily adapt
procedures for introducing and expressing either recombinant or synthetic
sequences into
eukaryotic or prokaryotic cells in order to generate a cell which produces one
of the polypeptides
or proteins of the present invention.
The polypeptides of the present invention are preferably provided in an
isolated form, and
preferably are substantially purified. A recombinantly produced version of the
T. pallidum
polypeptide can be substantially purified by the one-step method described by
Smith et al. (1988)
Gene 67:31-40. Polypeptides of the invention also can be purified from natural
or recombinant
sources using antibodies directed against the polypeptides of the invention in
methods which are
well known in the art of protein purification.
The invention further provides for isolated T. pallidum polypeptides
comprising an amino
acid sequence selected from the group including: (a) the amino acid sequence
of a full-length T.
pallidum polypeptide having the complete amino acid sequence from the first
methionine codon to
the termination codon of each sequence listed in SEQ ID NOS:1-744, wherein
said termination
codon is at the end of each SEQ ID NO: and said first methionine is the first
methionine in frame
with said termination codon; and (b) the amino acid sequence of a full-length
T. pallidum
polypeptide having the complete amino acid sequence in (a) excepting the N-
terminal methionine.
The polypeptides of the present invention also include polypeptides having an
amino acid
sequence at least 80% identical, more preferably at least 90% identical,
and still more preferably
95%, 96%, 97%, 98% or 99% identical to those described in (a) and (b) above.
The present invention is further directed to polynucleotides encoding portions
or
fragments of the amino acid sequences described herein as well as to portions
or fragments of the
isolated amino acid sequences described herein. Fragments include portions of
the amino acid
sequences described herein at least 5 contiguous amino acids in length and
selected from any two
integers, one of which representing an N-terminal position and another
representing a C-terminal
position. The initiation codon of the ORFs of the present invention is
position 1. The initiation
codon (position 1) for purposes of the present invention is the first
methionine codon of each
ORF ID which is in frame with the termination codon at the end of each said
sequence. Every
combination of an N-terminal and C-terminal position that a fragment at
least 5 contiguous amino
acid residues in length could occupy, on any given ORF is included in the
invention, i.e., from
initiation codon up to the termination codon. "At least" means a fragment may
be 5 contiguous
amino acid residues in length or any integer between 5 and the number of
residues in an ORF,
minus 1. Therefore, included in the invention are contiguous fragments
specified by any N-
terminal and C-terminal positions of amino acid sequence set forth in SEQ ID
NOS:1-744 or
Tables 1-3 wherein the contiguous fragment is any integer between 5 and the
number of residues
in an ORF minus 1.
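The set of fragments defined above by N-terminal and C-terminal positions can be enumerated mechanically. The following Python sketch assumes 1-based residue positions and treats the minimum length of 5 and the maximum length of one less than the number of residues as stated in the text; the function names are hypothetical.

```python
# Hypothetical sketch: enumerate (N-terminal, C-terminal) positions of every
# contiguous fragment at least 5 residues long and shorter than the full ORF.
# Positions are 1-based and inclusive.

def fragment_positions(n_residues: int, min_len: int = 5):
    """Yield (n_term, c_term) pairs for fragments of length min_len .. n_residues - 1."""
    for length in range(min_len, n_residues):            # longest is n_residues - 1
        for n_term in range(1, n_residues - length + 2):
            yield n_term, n_term + length - 1


def count_fragments(n_residues: int, min_len: int = 5) -> int:
    """Count the fragments without enumerating them."""
    return sum(n_residues - length + 1 for length in range(min_len, n_residues))


if __name__ == "__main__":
    # A 7-residue polypeptide is used only to keep the printed list short.
    print(list(fragment_positions(7)))
    print(count_fragments(7))
```

For a polypeptide of realistic length the number of such fragments grows roughly with the square of the number of residues, which is why the text defines the fragments by rule rather than listing them.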
Further, the invention includes polypeptides comprising fragments specified by
size, in
amino acid residues, rather than by N-terminal and C-terminal positions. The
invention includes
any fragment size, in contiguous amino acid residues, selected from integers
between 5 and the
number of residues in an ORF, minus 1. Preferred sizes of contiguous
polypeptide fragments
include about 5 amino acid residues, about 10 amino acid residues, about 20
amino acid residues,
about 30 amino acid residues, about 40 amino acid residues, about 50 amino
acid residues, about
100 amino acid residues, about 200 amino acid residues, about 300 amino acid
residues, and
about 400 amino acid residues. The preferred sizes are, of course, meant to
exemplify, not limit,
the present invention as all size fragments representing any integer between 5
and the number of
residues in a full length sequence minus 1 are included in the invention. The
present invention
also provides for the exclusion of any fragments specified by N-terminal and C-
terminal
positions or by size in amino acid residues as described above. Any number of
fragments
specified by N-terminal and C-terminal positions or by size in amino acid
residues as described
above may be excluded.
The above fragments need not be active since they would be useful, for
example, in
immunoassays, in epitope mapping, epitope tagging, to generate antibodies to a
particular portion
of the protein, as vaccines, and as molecular weight markers.
Further polypeptides of the present invention include polypeptides which have
at least
90% similarity, more preferably at least 95% similarity, and still more
preferably at least 96%,
97%, 98% or 99% similarity to those described above.
A further embodiment of the invention relates to a polypeptide which comprises
the amino
acid sequence of a T. pallidum polypeptide having an amino acid sequence which
contains at least
one conservative amino acid substitution, but not more than 50 conservative
amino acid
substitutions, not more than 40 conservative amino acid substitutions, not
more than 30
conservative amino acid substitutions, and not more than 20 conservative amino
acid
substitutions. Also provided are polypeptides which comprise the amino acid
sequence of a T.
pallidum polypeptide, having at least one, but not more than 10, 9, 8, 7, 6,
5, 4, 3, 2 or 1
conservative amino acid substitutions.
By a polypeptide having an amino acid sequence at least, for example, 95%
"identical" to
a query amino acid sequence of the present invention, it is intended that the
amino acid sequence
of the subject polypeptide is identical to the query sequence except that the
subject polypeptide
sequence may include up to five amino acid alterations per each 100 amino
acids of the query
amino acid sequence. In other words, to obtain a polypeptide having an amino
acid sequence at
least 95% identical to a query amino acid sequence, up to 5% of the amino acid
residues in the
subject sequence may be inserted, deleted, (indels) or substituted with
another amino acid. These
alterations of the reference sequence may occur at the amino or carboxy
terminal positions of the
reference amino acid sequence or anywhere between those terminal positions,
interspersed either
individually among residues in the reference sequence or in one or more
contiguous groups within the reference sequence.
As a practical matter, whether any particular polypeptide is at least 90%,
95%, 96%,
97%, 98% or 99% identical to the ORF amino acid sequences encoded by the
sequences of SEQ
ID NOS:1-744, as described herein, can be determined conventionally using
known computer
programs. A preferred method for determining the best overall match between a
query sequence
(a sequence of the present invention) and a subject sequence, also referred to
as a global sequence
alignment, can be determined using the FASTDB computer program based on the
algorithm of
Brutlag et al., (1990) Comp. App. Biosci. 6:237-245. In a sequence alignment
the query and
subject sequences are both amino acid sequences. The result of said global
sequence alignment is
in percent identity. Preferred parameters used in a FASTDB amino acid
alignment are:
Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining Penalty=20, Randomization
Group
Length=0, Cutoff Score=1, Window Size=sequence length, Gap Penalty=5, Gap Size
Penalty=0.05, Window Size=500 or the length of the subject amino acid
sequence, whichever is
shorter.

If the subject sequence is shorter than the query sequence due to N- or C-
terminal
deletions, not because of internal deletions, the results, in percent
identity, must be manually
corrected. This is because the FASTDB program does not account for N- and C-
terminal
truncations of the subject sequence when calculating global percent identity.
For subject
sequences truncated at the N- and C-termini, relative to the query sequence,
the percent identity is
corrected by calculating the number of residues of the query sequence that are
N- and C-terminal
of the subject sequence, which are not matched/aligned with a corresponding
subject residue, as a
percent of the total residues of the query sequence. Whether a residue is
matched/aligned is
determined by results of the FASTDB sequence alignment. This percentage is
then subtracted
from the percent identity, calculated by the above FASTDB program using the
specified
parameters, to arrive at a final percent identity score. This final percent
identity score is what is
used for the purposes of the present invention. Only residues to the N- and C-
termini of the
subject sequence, which are not matched/aligned with the query sequence, are
considered for the
purposes of manually adjusting the percent identity score. That is, only query
amino acid
residues outside the farthest N- and C-terminal residues of the subject
sequence.
For example, a 90 amino acid residue subject sequence is aligned with a 100
residue
query sequence to determine percent identity. The deletion occurs at the N-
terminus of the
subject sequence and therefore, the FASTDB alignment does not match/align with
the first 10
residues at the N-terminus. The 10 unpaired residues represent 10% of the
sequence (number of
residues at the N- and C- termini not matched/total number of residues in the
query sequence) so
10% is subtracted from the percent identity score calculated by the FASTDB
program. If the
remaining 90 residues were perfectly matched the final percent identity would
be 90%. In
another example, a 90 residue subject sequence is compared with a 100 residue
query sequence.
This time the deletions are internal so there are no residues at the N- or C-
termini of the subject
sequence which are not matched/aligned with the query. In this case the
percent identity
calculated by FASTDB is not manually corrected. Once again, only residue
positions outside the
N- and C-terminal ends of the subject sequence, as displayed in the FASTDB
alignment, which
are not matched/aligned with the query sequence are manually corrected. No
other manual
corrections are to be made for the purposes of the present invention.
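The manual correction described above reduces to a single subtraction, sketched below in Python. The function and parameter names are hypothetical; the inputs are assumed to be the percent identity reported by FASTDB, the query length in residues, and the counts of query residues lying beyond the N- and C-terminal ends of the subject sequence that are not matched/aligned.

```python
# Hypothetical sketch of the manual correction for terminal truncations.
# Only unmatched query residues outside the ends of the subject sequence count;
# internal deletions are not corrected.

def corrected_percent_identity(fastdb_identity: float,
                               query_length: int,
                               unmatched_n_term: int = 0,
                               unmatched_c_term: int = 0) -> float:
    """Subtract the terminal overhang, as a percent of the query length, from the FASTDB score."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    overhang = unmatched_n_term + unmatched_c_term
    penalty = 100.0 * overhang / query_length
    return fastdb_identity - penalty


if __name__ == "__main__":
    # First example in the text: 90-residue subject, 100-residue query,
    # 10 unmatched residues at the N-terminus, remaining 90 perfectly matched.
    print(corrected_percent_identity(100.0, 100, unmatched_n_term=10))

    # Second example: deletions are internal, so no correction is applied.
    # The FASTDB score of 90.0 here is an assumed value for illustration.
    print(corrected_percent_identity(90.0, 100))
```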
The above polypeptide sequences are included irrespective of whether they have
their
normal biological activity. This is because even where a particular
polypeptide molecule does not
have biological activity, one of skill in the art would still know how to use
the polypeptide, for
instance, as a vaccine or to generate antibodies. Other uses of the
polypeptides of the present
invention that do not have T. pallidum activity include, inter alia, as
epitope tags, in epitope
mapping, and as molecular weight markers on SDS-PAGE gels or on molecular
sieve gel
filtration columns using methods known to those of skill in the art.
As described below, the polypeptides of the present invention can also be used
to raise polyclonal
and monoclonal antibodies, which are useful in assays for detecting T.
pallidum protein
expression or as agonists and antagonists capable of enhancing or inhibiting
T. pallidum protein
function. Further, such polypeptides can be used in the yeast two-hybrid
system to "capture" T.
pallidum protein binding proteins which are also candidate agonists and
antagonists according to
the present invention. See, e.g., Fields et al. (1989) Nature 340:245-246.
Any host/vector system can be used to express one or more of the ORFs of the
present
invention. These include, but are not limited to, eukaryotic hosts such as
HeLa cells, CV-1 cells,
COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B.
subtilis. The most
preferred cells are those which do not normally express the particular
polypeptide or protein or
which express the polypeptide or protein at a low natural level.
"Recombinant," as used herein, means that a polypeptide or protein is derived
from
recombinant (e.g., microbial or mammalian) expression systems. "Microbial"
refers to
recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast)
expression
systems. As a product, "recombinant microbial" defines a polypeptide or protein
essentially free
of native endogenous substances and unaccompanied by associated native
glycosylation.
Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli,
will be free of
glycosylation modifications; polypeptides or proteins expressed in yeast will
have a glycosylation
pattern different from that expressed in mammalian cells.
"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides.
Generally,
DNA segments encoding the polypeptides and proteins provided by this invention
are assembled
from fragments of the T. pallidum genome and short oligonucleotide linkers, or
from a series of
oligonucleotides, to provide a synthetic gene which is capable of being
expressed in a
recombinant transcriptional unit comprising regulatory elements derived from a
microbial or viral
operon.
"Recombinant expression vehicle or vector" refers to a plasmid or phage or
virus or
vector, for expressing a polypeptide from a DNA (RNA) sequence. The expression
vehicle can
comprise a transcriptional unit comprising an assembly of (1) genetic
regulatory elements
necessary for gene expression in the host, including elements required to
initiate and maintain
transcription at a level sufficient for suitable expression of the desired
polypeptide, including, for
example, promoters and, where necessary, an enhancer and a polyadenylation
signal; (2) a
structural or coding sequence which is transcribed into mRNA and translated
into protein, and (3)
appropriate signals to initiate translation at the beginning of the desired
coding region and
terminate translation at its end. Structural units intended for use in yeast
or eukaryotic expression
systems preferably include a leader sequence enabling extracellular secretion
of translated protein
by a host cell. Alternatively, where recombinant protein is expressed without
a leader or
transport sequence, it may include an N-terminal methionine residue. This
residue may or may
not be subsequently cleaved from the expressed recombinant protein to provide
a final product.
"Recombinant expression system" means host cells which have stably integrated
a
recombinant transcriptional unit into chromosomal DNA or carry the recombinant
transcriptional
unit extrachromosomally. The cells can be prokaryotic or eukaryotic.
Recombinant expression
systems as defined herein will express heterologous polypeptides or proteins
upon induction of
the regulatory elements linked to the DNA segment or synthetic gene to be
expressed.
Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other
cells under
the control of appropriate promoters. Cell-free translation systems can also
be employed to
produce such proteins using RNAs derived from the DNA constructs of the
present invention.
Appropriate cloning and expression vectors for use with prokaryotic and
eukaryotic hosts are
described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd
Edition, Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989), the
disclosure of
which is hereby incorporated by reference in its entirety.
Generally, recombinant expression vectors will include origins of replication
and
selectable markers permitting transformation of the host cell, e.g., the
ampicillin resistance gene
of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly
expressed gene to
direct transcription of a downstream structural sequence. Such promoters can
be derived from
operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK),
alpha-factor,
acid phosphatase, or heat shock proteins, among others. The heterologous
structural sequence is
assembled in appropriate phase with translation initiation and termination
sequences, and
preferably, a leader sequence capable of directing secretion of translated
protein into the
periplasmic space or extracellular medium. Optionally, the heterologous
sequence can encode a
fusion protein including an N-terminal identification peptide imparting
desired characteristics,
e.g., stabilization or simplified purification of expressed recombinant
product.
Useful expression vectors for bacterial use are constructed by inserting a
structural DNA
sequence encoding a desired protein together with suitable translation
initiation and termination
signals in operable reading phase with a functional promoter. The vector will
comprise one or
more phenotypic selectable markers and an origin of replication to ensure
maintenance of the
vector and, when desirable, provide amplification within the host.
Suitable prokaryotic hosts for transformation include strains of E. coli, B.
subtilis,
Salmonella typhimurium and various species within the genera Pseudomonas and
Streptomyces.
Others may also be employed as a matter of choice.
As a representative but non-limiting example, useful expression vectors for
bacterial use
can comprise a selectable marker and bacterial origin of replication derived
from commercially
available plasmids comprising genetic elements of the well known cloning
vector pBR322
(ATCC 37017). Such commercial vectors include, for example, pKK223-3
(available from
Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (available from Promega
Biotec,
Madison, WI, USA). These pBR322 "backbone" sections are combined with an
appropriate
promoter and the structural sequence to be expressed.
Following transformation of a suitable host strain and growth of the host
strain to an
appropriate cell density, the selected promoter, where it is inducible, is
derepressed or induced by
appropriate means (e.g., temperature shift or chemical induction) and cells
are cultured for an
additional period to provide for expression of the induced gene product.
Thereafter cells are
typically harvested, generally by centrifugation, disrupted to release
expressed protein, generally
by physical or chemical means, and the resulting crude extract is retained for
further purification.
Various mammalian cell culture systems can also be employed to express
recombinant
protein. Examples of mammalian expression systems include the COS-7 lines of
monkey kidney
fibroblasts, described in Gluzman, Cell 23:175 (1981), and other cell lines
capable of expressing
a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell
lines.
Mammalian expression vectors will comprise an origin of replication, a
suitable promoter
and enhancer, and also any necessary ribosome binding sites, polyadenylation
site, splice donor
and acceptor sites, transcriptional termination sequences, and 5' flanking
nontranscribed
sequences. DNA sequences derived from the SV40 viral genome, for example,
SV40 origin,
early promoter, enhancer, splice, and polyadenylation sites may be used to
provide the required
nontranscribed genetic elements.
Recombinant polypeptides and proteins produced in bacterial culture are usually
isolated by
initial extraction from cell pellets, followed by one or more salting-out,
aqueous ion exchange or
size exclusion chromatography steps. Microbial cells employed in expression
of proteins can be
disrupted by any convenient method, including freeze-thaw cycling, sonication,
mechanical
disruption, or use of cell lysing agents. Protein refolding steps can be used,
as necessary, in
completing configuration of the mature protein. Finally, high performance
liquid
chromatography (HPLC) can be employed for final purification steps.
The present invention further includes isolated polypeptides, proteins and
nucleic acid
molecules which are substantially equivalent to those herein described. As
used herein,
substantially equivalent can refer both to nucleic acid and amino acid
sequences, for example a
mutant sequence, that varies from a reference sequence by one or more
substitutions, deletions,
or additions, the net effect of which does not result in an adverse functional
dissimilarity between
reference and subject sequences. For purposes of the present invention,
sequences having
equivalent biological activity, and equivalent expression characteristics are
considered
substantially equivalent. For purposes of determining equivalence, truncation
of the mature
sequence should be disregarded.
The invention further provides methods of obtaining homologs from other
strains of T.
pallidum, of the fragments of the T. pallidum genome of the present invention
and homologs of
the proteins encoded by the ORFs of the present invention. As used herein, a
sequence or
protein of T. pallidum is defined as a homolog of a fragment of the T.
pallidum fragments or
contigs or a protein encoded by one of the ORFs of the present invention, if
it shares significant
homology to one of the fragments of the T. pallidum genome of the present
invention or a protein
encoded by one of the ORFs of the present invention. Specifically, by using
the sequence
disclosed herein as a probe or as primers, and techniques such as PCR cloning
and colony/plaque
hybridization, one skilled in the art can obtain homologs.
As used herein, two nucleic acid molecules or proteins are said to "share
significant
homology" if the two contain regions which possess greater than 85%
sequence (amino acid or
nucleic acid) homology. Preferred homologs in this regard are those with more
than 90%
homology. Especially preferred are those with 93% or more homology. Among
especially
preferred homologs those with 95% or more homology are particularly preferred.
Very
particularly preferred among these are those with 97% and even more
particularly preferred
among those are homologs with 99% or more homology. The most preferred
homologs among
these are those with 99.9% homology or more. It will be understood that, among
measures of
homology, identity is particularly preferred in this regard.
Region specific primers or probes derived from the nucleotide sequence
provided in SEQ
ID NOS: 1-744 or from a nucleotide sequence at least 95%, particularly at
least 99%, especially
at least 99.5% identical to a sequence of SEQ ID NOS: 1-744 can be used to
prime DNA
synthesis and PCR amplification, as well as to identify colonies containing
cloned DNA encoding
a homolog. Methods suitable to this aspect of the present invention are well
known and have
been described in great detail in many publications such as, for example,
Innis et al., PCR
Protocols, Academic Press, San Diego, CA (1990).
When using primers derived from SEQ ID NOS: 1-744 or from a nucleotide
sequence
having an aforementioned identity to a sequence of SEQ ID NOS:1-744, one
skilled in the art will
recognize that by employing high stringency conditions (e.g., annealing at
50-60°C in 6X SSPC
and 50% formamide, and washing at 50-65°C in 0.5X SSPC) only sequences
which are greater
than 75% homologous to the primer will be amplified. By employing lower
stringency
conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45%
formamide, and washing at
42°C in 0.5X SSPC), sequences which are greater than 40-50% homologous
to the primer will
also be amplified.
When using DNA probes derived from SEQ ID NOS:1-744, or from a nucleotide
sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-744,
for
colony/plaque hybridization, one skilled in the art will recognize that by
employing high
stringency conditions (e.g., hybridizing at 50-65°C in 5X SSPC and 50%
formamide, and
washing at 50-65°C in 0.5X SSPC), sequences having regions which are
greater than 90%
homologous to the probe can be obtained, and that by employing lower
stringency conditions
(e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and
washing at 42°C in 0.5X
SSPC), sequences having regions which are greater than 35-45% homologous to
the probe will
be obtained.
Any organism can be used as the source for homologs of the present invention
so long as
the organism naturally expresses such a protein or contains genes encoding the
same. The most
preferred organisms for isolating homologs are bacteria which are closely
related to T. pallidum.
ILLUSTRATIVE USES OF COMPOSITIONS
OF THE INVENTION
Each ORF corresponding to the ORF IDs provided in Tables 1 and 2 is identified
with a
function by homology to a known gene or polypeptide. As a result, one skilled
in the art can use
the polypeptides of the present invention for commercial, therapeutic and
industrial purposes
consistent with the type of putative identification of the polypeptide. Such
identifications permit
one skilled in the art to use the T. pallidum ORFs in a manner similar to the
known type of
sequences for which the identification is made; for example, to ferment a
particular sugar source
or to produce a particular metabolite. A variety of reviews illustrative of
this aspect of the
invention are available, including the following reviews on the industrial use
of enzymes, for
example, BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY HANDBOOK, 2nd
Ed., MacMillan Publications, Ltd. NY (1991) and BIOCATALYSTS IN ORGANIC
SYNTHESES, Tramper et al., Eds., Elsevier Science Publishers, Amsterdam, The
Netherlands
(1985). A variety of exemplary uses that illustrate this and similar aspects
of the present
invention are discussed below.
1. Biosynthetic Enzymes
Open reading frames encoding proteins that mediate the catalytic reactions
of intermediary and macromolecular metabolism, the biosynthesis of
small molecules, cellular processes and other functions can be used for
industrial biosynthesis. Such proteins include enzymes involved in the
degradation of the intermediary products of metabolism, enzymes involved in
central intermediary metabolism, enzymes involved in respiration, both aerobic
and anaerobic, enzymes involved in fermentation, enzymes involved in ATP
proton motive force conversion, enzymes involved in broad regulatory
function, enzymes involved in amino acid synthesis, enzymes involved in
nucleotide synthesis, and enzymes involved in cofactor and vitamin synthesis.
The various metabolic pathways present in T. pallidum can be identified based
on
absolute nutritional requirements as well as by examining the various enzymes
identified in Tables
1-3 and SEQ ID NOS:1-744.
Of particular interest are polypeptides involved in the degradation of
intermediary
metabolites as well as non-macromolecular metabolism. Such enzymes include
amylases,
glucose oxidases, and catalase.
Proteolytic enzymes are another class of commercially important enzymes.
Proteolytic
enzymes find use in a number of industrial processes including the processing
of flax and other
vegetable fibers, in the extraction, clarification and depectinization of
fruit juices, in the extraction
of vegetable oil and in the maceration of fruits and vegetables to give
unicellular fruits. A
detailed review of the proteolytic enzymes used in the food industry is
provided in Rombouts et
al., Symbiosis 21:79 (1986) and Voragen et al. in Biocatalysts In
Agricultural Biotechnology,
Whitaker et al., Eds., American Chemical Society Symposium Series 389:93
(1989).
The metabolism of sugars is an important aspect of the primary metabolism of
T.
pallidum. Enzymes involved in the degradation of sugars, such as,
particularly, glucose,
galactose, fructose and xylose, can be used in industrial fermentation. Some
of the important
sugar transforming enzymes, from a commercial viewpoint, include sugar
isomerases such as
glucose isomerase. Other metabolic enzymes have found commercial use such as
glucose
oxidase, which produces ketogulonic acid (KGA). KGA is an intermediate in the
commercial
production of ascorbic acid using Reichstein's procedure, as described in
Krueger et al.,
Biotechnology 6(A), Rhine et al., Eds., Verlag Press, Weinheim, Germany
(1984).
Glucose oxidase (GOD) is commercially available and has been used in purified
form as
well as in an immobilized form for the deoxygenation of beer. See, for
instance, Hartmeir et al.,
Biotechnology Letters 1:21 (1979). The most important application of GOD is
the industrial
scale fermentation of gluconic acid. There is a market for gluconic acids, which
are used in the detergent,
textile, leather, photographic, pharmaceutical, food, feed and concrete
industries, as described, for
example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS IN
FUNGI,
Bennett et al., Eds., Academic Press, New York (1985). In addition to
industrial applications,
GOD has found applications in medicine for quantitative determination of
glucose in body fluids
and, recently, in biotechnology for analyzing syrups from starch and cellulose
hydrolysates. This
application is described in Owusu et al., Biochem. et Biophysica. Acta. 872:83
(1986), for
instance.
The main sweetener used in the world today is sugar which comes from sugar
beets and
sugar cane. In the field of industrial enzymes, the glucose isomerase process
shows the largest
expansion in the market today. Initially, soluble enzymes were used and later
immobilized
enzymes were developed (Krueger et al., Biotechnology, The Textbook of
Industrial
Microbiology, Sinauer Associated Incorporated, Sunderland, Massachusetts
(1990)). Today, the
use of glucose-produced high fructose syrups is by far the largest industrial
business using
immobilized enzymes. A review of the industrial use of these enzymes is
provided by
Jorgensen, Starch 40:307 (1988).
Proteinases, such as alkaline serine proteinases, are used as detergent
additives and thus
represent one of the largest volumes of microbial enzymes used in the
industrial sector. Because
of their industrial importance, there is a large body of published and
unpublished information
regarding the use of these enzymes in industrial processes. (See Faultman et
al., Acid Proteases
Structure Function and Biology, Tang, J., ed., Plenum Press, New York (1977)
and Godfrey et
al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et
al., Report
Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986)).
Another class of commercially usable proteins of the present invention are the
microbial
lipases, described by, for instance, Macrae et al., Philosophical Transactions
of the Royal
Society of London 310:227 (1985) and Poserke, Journal of the American Oil
Chemists' Society
61:1758 (1984). A major use of lipases is in the fat and oil industry for the
production of neutral
glycerides using lipase catalyzed inter-esterification of readily available
triglycerides. Applications
of lipases include the use as a detergent additive to facilitate the removal
of fats from fabrics in the
course of the washing procedures.
The use of enzymes, and in particular microbial enzymes, as catalysts for key
steps in the
synthesis of complex organic molecules is gaining popularity at a great rate.
One area of great
interest is the preparation of chiral intermediates. Preparation of chiral
intermediates is of interest
to a wide range of synthetic chemists particularly those scientists involved
with the preparation of
new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et
al., Recent
Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press,
Boca Raton,
Florida (1990)). The following reactions catalyzed by enzymes are of interest
to organic
chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and
nitriles,
esterification reactions, trans-esterification reactions, synthesis of amides,
reduction of alkanones
and oxoalkanoates, oxidation of alcohols to carbonyl compounds, oxidation of
sulfides to
sulfoxides, and carbon bond forming reactions such as the aldol reaction.
When considering the use of an enzyme encoded by one of the ORFs of the
present
invention for biotransformation and organic synthesis it is sometimes
necessary to consider the
respective advantages and disadvantages of using a microorganism as opposed to
an isolated
enzyme. The pros and cons of using a whole cell system on the one hand or an
isolated partially
purified enzyme on the other hand have been described in detail by Bud et al.,
Chemistry in
Britain (1987), p. 127.
Amino transferases, enzymes involved in the biosynthesis and metabolism of
amino
acids, are useful in the catalytic production of amino acids. An advantage
of using microbial
based enzyme systems is that the amino transferase enzymes catalyze the
stereoselective
synthesis of only L-amino acids and generally possess uniformly high catalytic
rates. A
description of the use of amino transferases for amino acid production is
provided by Roselle-David,
Methods of Enzymology 136:479 (1987).
Another category of useful proteins encoded by the ORFs of the present
invention includes
enzymes involved in nucleic acid synthesis, repair, and recombination.
2. Generation of Antibodies
As described here, the proteins of the present invention, as well as homologs
thereof, can
be used in a variety of procedures and methods known in the art which are
currently applied to
other proteins. The proteins of the present invention can further be used to
generate an antibody
which selectively binds the protein.
T. pallidum protein-specific antibodies for use in the present invention can
be raised
against the intact T. pallidum protein or an antigenic polypeptide fragment
thereof, which may be
presented together with a carrier protein, such as an albumin, to an animal
system (such as rabbit
or mouse) or, if it is long enough (at least about 25 amino acids), without a
carrier.
As used herein, the term "antibody" (Ab) or "monoclonal antibody" (Mab) is
meant to
include intact molecules, single chain whole antibodies, and antibody
fragments. Antibody
fragments of the present invention include Fab and F(ab')2 and other fragments
including single-
chain Fvs (scFv) and disulfide-linked Fvs (sdFv). Also included in the present
invention are
chimeric and humanized monoclonal antibodies and polyclonal antibodies
specific for the
polypeptides of the present invention. The antibodies of the present invention
may be prepared
by any of a variety of methods. For example, cells expressing a polypeptide of
the present
invention or an antigenic fragment thereof can be administered to an animal in
order to induce the
production of sera containing polyclonal antibodies. For example, a
preparation of T. pallidum
polypeptide or fragment thereof is prepared and purified to render it
substantially free of natural
contaminants. Such a preparation is then introduced into an animal in order to
produce
polyclonal antisera of greater specific activity.
In a preferred method, the antibodies of the present invention are monoclonal
antibodies
or binding fragments thereof. Such monoclonal antibodies can be prepared using
hybridoma
technology. See, e.g., Harlow et al., ANTIBODIES: A LABORATORY MANUAL, (Cold
Spring Harbor Laboratory Press, 2nd ed. 1988); Hammerling, et al., in:
MONOCLONAL
ANTIBODIES AND T-CELL HYBRIDOMAS 563-681 (Elsevier, N.Y., 1981). Fab and
F(ab')2 fragments may be produced by proteolytic cleavage, using enzymes such
as papain (to
produce Fab fragments) or pepsin (to produce F(ab')2 fragments).
Alternatively, T. pallidum
polypeptide-binding fragments, chimeric, and humanized antibodies can be
produced through the
application of recombinant DNA technology or through synthetic chemistry using
methods
known in the art.
Alternatively, additional antibodies capable of binding to the polypeptide
antigen of the
present invention may be produced in a two-step procedure through the use of
anti-idiotypic
antibodies. Such a method makes use of the fact that antibodies are themselves
antigens, and
that, therefore, it is possible to obtain an antibody which binds to a second
antibody. In
accordance with this method, T. pallidum polypeptide-specific antibodies
are used to immunize
an animal, preferably a mouse. The splenocytes of such an animal are then used
to produce
hybridoma cells, and the hybridoma cells are screened to identify clones which
produce an
antibody whose ability to bind to the T. pallidum polypeptide-specific
antibody can be blocked
by the T. pallidum polypeptide antigen. Such antibodies comprise anti-idiotypic
antibodies to
the T. pallidum polypeptide-specific antibody and can be used to immunize
an animal to induce
formation of further T. pallidum polypeptide-specific antibodies.
Antibodies and fragments thereof of the present invention may be described by
the
portion of a polypeptide of the present invention recognized or specifically
bound by the
antibody. Antibody binding fragments of a polypeptide of the present
invention may be
described or specified in the same manner as for polypeptide fragments
discussed above, i.e.,
by N-terminal and C-terminal positions or by size in contiguous amino acid
residues. Any
number of antibody binding fragments of a polypeptide of the present
invention, specified by N-terminal
and C-terminal positions or by size in amino acid residues, as
described above, may also
be excluded from the present invention. Therefore, the present invention
includes antibodies that
specifically bind a particularly described fragment of a polypeptide of the
present invention and
allows for the exclusion of the same.
Antibodies and fragments thereof of the present invention may also be
described or specified in
terms of their cross-reactivity. Antibodies and fragments that do not bind
polypeptides of any
other species of Borrelia other than T. pallidum are included in the present
invention. Likewise,
antibodies and fragments that bind only species of Borrelia, i.e. antibodies
and fragments that
do not bind bacteria from any genus other than Borrelia, are included in the
present invention.
The present invention further provides the above-described antibodies in
detectably
labelled form. Antibodies can be detectably labelled through the use of
radioisotopes, affinity
labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish
peroxidase, alkaline
phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.),
paramagnetic atoms, etc.
Procedures for accomplishing such labeling are well-known in the art, for
example see
Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et
al., Meth. Enzym.
62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. W., J.
Immunol.
Meth. 13:215 (1976).
The labeled antibodies of the present invention can be used for in vitro, in
vivo, and in
situ assays to identify cells or tissues in which a fragment of the T.
pallidum genome is
expressed.
The present invention further provides the above-described antibodies
immobilized on a
solid support. Examples of such solid supports include plastics such as
polycarbonate, complex
carbohydrates such as agarose and sepharose, acrylic resins such as
polyacrylamide and latex
beads. Techniques for coupling antibodies to such solid supports are well
known in the art
(Weir, D. M. et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell
Scientific
Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth.
Enzym. 34
Academic Press, N. Y. (1974)). The immobilized antibodies of the present
invention can be
used for in vitro, in vivo, and in situ assays as well as for immunoaffinity
purification of the
proteins of the present invention.
3. Epitope-Bearing Portions
In another aspect, the invention provides peptides and polypeptides comprising
epitope-bearing portions of the T. pallidum polypeptides of the present
invention. These epitopes
are immunogenic or antigenic epitopes of the polypeptides of the present
invention. An
"immunogenic epitope" is defined as a part of a protein that elicits an
antibody response when the
whole protein or polypeptide is the immunogen. These immunogenic epitopes are
believed to be
confined to a few loci on the molecule. On the other hand, a region of a
protein molecule to
which an antibody can bind is defined as an "antigenic determinant" or
"antigenic epitope." The
number of immunogenic epitopes of a protein generally is less than the number
of antigenic
epitopes. See, e.g., Geysen, et al. (1983) Proc. Natl. Acad. Sci. USA 81:3998-4002.
Amino
acid residues comprising antigenic epitopes may be determined by algorithms
such as the
Jameson-Wolf analysis or similar algorithms or by in vivo testing for an
antigenic response using
the methods described herein or those known in the art.
As to the selection of peptides or polypeptides bearing an antigenic epitope
(i.e., that
contain a region of a protein molecule to which an antibody can bind), it is
well known in the art
that relatively short synthetic peptides that mimic part of a protein sequence
are routinely capable
of eliciting an antiserum that reacts with the partially mimicked protein.
See, e.g., Sutcliffe, et
al., (1983) Science 219:660-666. Peptides capable of eliciting protein-reactive
sera are
frequently represented in the primary sequence of a protein, can be
characterized by a set of
simple chemical rules, and are confined neither to immunodominant regions of
intact proteins
(i.e., immunogenic epitopes) nor to the amino or carboxyl terminals. Peptides
that are extremely
hydrophobic and those of six or fewer residues generally are ineffective at
inducing antibodies
that bind to the mimicked protein; longer peptides, especially those
containing proline residues,
usually are effective. See, Sutcliffe, et al., supra, p. 661. For instance, 18
of 20 peptides
designed according to these guidelines, containing 8-39 residues covering 75%
of the sequence
of the influenza virus hemagglutinin HA1 polypeptide chain, induced antibodies
that reacted with
the HA1 protein or intact virus; and 12/12 peptides from the MuLV polymerase
and 18/18 from
the rabies glycoprotein induced antibodies that precipitated the respective
proteins.
Antigenic epitope-bearing peptides and polypeptides of the invention are
therefore useful
to raise antibodies, including monoclonal antibodies, that bind specifically
to a polypeptide of the
invention. Thus, a high proportion of hybridomas obtained by fusion of spleen
cells from
donors immunized with an antigen epitope-bearing peptide generally secrete
antibody reactive
with the native protein. See Sutcliffe, et al., supra, p. 663. The antibodies
raised by antigenic
epitope-bearing peptides or polypeptides are useful to detect the mimicked
protein, and antibodies
to different peptides may be used for tracking the fate of various regions of
a protein precursor
which undergoes post-translational processing. The peptides and anti-peptide
antibodies may be
used in a variety of qualitative or quantitative assays for the mimicked
protein, for instance in
competition assays since it has been shown that even short peptides (e.g.,
about 9 amino acids)
can bind and displace the larger peptides in immunoprecipitation assays. See,
e.g., Wilson, et
al., (1984) Cell 37:767-778. The anti-peptide antibodies of the invention
also are useful for
purification of the mimicked protein, for instance, by adsorption
chromatography using methods
known in the art.
Antigenic epitope-bearing peptides and polypeptides of the invention designed
according
to the above guidelines preferably contain a sequence of at least seven, more
preferably at least
nine and most preferably between about 10 and about 50 amino acids (i.e. any
integer between 7
and 50) contained within the amino acid sequence of a polypeptide of the
invention. However,
peptides or polypeptides comprising a larger portion of an amino acid sequence
of a polypeptide
of the invention, containing about 50 to about 100 amino acids, or any length
up to and including
the entire amino acid sequence of a polypeptide of the invention, also are
considered
epitope-bearing peptides or polypeptides of the invention and also are useful
for inducing
antibodies that react with the mimicked protein. Preferably, the amino acid
sequence of the
epitope-bearing peptide is selected to provide substantial solubility in
aqueous solvents (i.e., the
sequence includes relatively hydrophilic residues and highly hydrophobic
sequences are
preferably avoided); and sequences containing proline residues are
particularly preferred.
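The selection guidelines above (a length of 7 to 50 residues, relatively hydrophilic sequences, and a preference for proline) can be sketched as a simple sliding-window filter. The sketch below is a hedged illustration only: the use of the Kyte-Doolittle hydropathy scale and the cutoff of 0.0 are assumptions not found in the text, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic). The choice of
# this scale is an assumption made for illustration; the text names no scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide: str) -> float:
    """Average Kyte-Doolittle value; raises KeyError on non-standard residues."""
    return sum(KD[r] for r in peptide) / len(peptide)


def candidate_peptides(protein, length=10, max_hydropathy=0.0, require_proline=False):
    """Return (1-based start, peptide) windows that pass the filters.

    length must lie in the 7-50 range given in the text; max_hydropathy is a
    hypothetical cutoff used to discard highly hydrophobic windows.
    """
    if not 7 <= length <= 50:
        raise ValueError("length must be between 7 and 50 residues")
    hits = []
    for i in range(len(protein) - length + 1):
        pep = protein[i:i + length]
        if mean_hydropathy(pep) > max_hydropathy:
            continue  # avoid highly hydrophobic sequences
        if require_proline and "P" not in pep:
            continue  # optionally keep only proline-containing windows
        hits.append((i + 1, pep))
    return hits
```

A filter of this kind only narrows the list of candidates; whether a given peptide actually elicits protein-reactive antibodies remains an experimental question.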

The epitope-bearing peptides and polypeptides of the present invention may be
produced
by any conventional means for making peptides or polypeptides including
recombinant means
using nucleic acid molecules of the invention. For instance, an epitope-bearing
amino acid
sequence of the present invention may be fused to a larger polypeptide which
acts as a carrier
during recombinant production and purification, as well as during immunization
to produce
anti-peptide antibodies. Epitope-bearing peptides also may be synthesized
using known methods
of chemical synthesis. For instance, Houghten has described a simple method
for synthesis of
large numbers of peptides, such as 10-20 mg of 248 different 13 residue
peptides representing
single amino acid variants of a segment of the HA1 polypeptide which were
prepared and
characterized (by ELISA-type binding studies) in less than four weeks
(Houghten, R. A. Proc.
Natl. Acad. Sci. USA 82:5131-5135 (1985)). This "Simultaneous Multiple Peptide
Synthesis
(SMPS)" process is further described in U.S. Patent No. 4,631,211 to Houghten
and coworkers
(1986). In this procedure the individual resins for the solid-phase synthesis
of various peptides
are contained in separate solvent-permeable packets, enabling the optimal use
of the many
identical repetitive steps involved in solid-phase methods. A completely
manual procedure
allows 500-1000 or more syntheses to be conducted simultaneously (Houghten et
al. (1985)
Proc. Natl. Acad. Sci. 82:5131-5135 at 5134).
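The figure of 248 different 13-residue peptides cited above is consistent with a simple count: 13 positions times 19 alternative amino acids gives 247 single-residue variants, and the unmodified parent peptide brings the total to 248. That breakdown is an inference from the arithmetic rather than something the text states. A minimal sketch, using a hypothetical placeholder 13-mer rather than the actual HA1 segment:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitution_variants(peptide: str) -> list:
    """All peptides differing from `peptide` at exactly one position."""
    variants = []
    for i, original in enumerate(peptide):
        for aa in AMINO_ACIDS:
            if aa != original:
                variants.append(peptide[:i] + aa + peptide[i + 1:])
    return variants


# Hypothetical 13-residue placeholder; not the HA1 segment used by Houghten.
parent = "ACDEFGHIKLMNP"
variants = single_substitution_variants(parent)
total_peptides = len(variants) + 1  # variants plus the parent peptide
```

The same enumeration applies to the "complete replacement set" approach described elsewhere in this section, in which every position of an epitope is substituted in turn.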
Epitope-bearing peptides and polypeptides of the invention are used to induce
antibodies
according to methods well known in the art. See, e.g., Sutcliffe, et al.,
supra; Wilson, et al.,
supra; and Bittle, et al. (1985) J. Gen. Virol. 66:2347-2354. Generally,
animals may be
immunized with free peptide; however, anti-peptide antibody titer may be
boosted by coupling of
the peptide to a macromolecular carrier, such as keyhole limpet hemocyanin
(KLH) or tetanus
toxoid. For instance, peptides containing cysteine may be coupled to carrier
using a linker such
as m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS), while other peptides
may be
coupled to carrier using a more general linking agent such as glutaraldehyde.
Animals such as
rabbits, rats and mice are immunized with either free or carrier-coupled
peptides, for instance, by
intraperitoneal and/or intradermal injection of emulsions containing about
100 µg peptide or
carrier protein and Freund's adjuvant. Several booster injections may be
needed, for instance, at
intervals of about two weeks, to provide a useful titer of anti-peptide
antibody which can be
detected, for example, by ELISA assay using free peptide adsorbed to a solid
surface. The titer
of anti-peptide antibodies in serum from an immunized animal may be increased
by selection of
anti-peptide antibodies, for instance, by adsorption to the peptide on a solid
support and elution
of the selected antibodies according to methods well known in the art.
Immunogenic epitope-bearing peptides of the invention, i.e., those parts of a
protein that
elicit an antibody response when the whole protein is the immunogen, are
identified according to
methods known in the art. For instance, Geysen, et al., supra, discloses a
procedure for rapid
concurrent synthesis on solid supports of hundreds of peptides of sufficient
purity to react in an
ELISA. Interaction of synthesized peptides with antibodies is then easily
detected without
removing them from the support. In this manner a peptide bearing an
immunogenic epitope of a
desired protein may be identified routinely by one of ordinary skill in the
art. For instance, the
immunologically important epitope in the coat protein of foot-and-mouth
disease virus was
located by Geysen et al. supra with a resolution of seven amino acids by
synthesis of an
overlapping set of all 208 possible hexapeptides covering the entire 213 amino
acid sequence of
the protein. Then, a complete replacement set of peptides in which all 20
amino acids were
substituted in turn at every position within the epitope was synthesized, and
the particular amino
acids conferring specificity for the reaction with antibody were determined.
Thus, peptide
analogs of the epitope-bearing peptides of the invention can be made routinely
by this method.
U.S. Patent No. 4,708,781 to Geysen (1987) further describes this method of
identifying a
peptide bearing an immunogenic epitope of a desired protein.
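The count of 208 hexapeptides for a 213-residue protein follows from the number of windows of width 6 that fit in the sequence: 213 - 6 + 1 = 208. A minimal sketch of generating such an overlapping set is given below; the function name is hypothetical and the test sequence is a placeholder, not the foot-and-mouth disease virus coat protein.

```python
def overlapping_peptides(sequence: str, window: int = 6) -> list:
    """Every contiguous peptide of length `window`, stepping one residue at a time."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


# Placeholder 213-residue sequence used only to check the count.
hexapeptides = overlapping_peptides("A" * 213, window=6)
```

Stepping by a single residue is what allows an epitope to be located at fine resolution: each residue of the protein appears in up to six consecutive hexapeptides.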
Further still, U.S. Patent No. 5,194,392, to Geysen (1990), describes a
general method
of detecting or determining the sequence of monomers (amino acids or other
compounds) which
is a topological equivalent of the epitope (i.e., a "mimotope") which is
complementary to a
particular paratope (antigen binding site) of an antibody of interest. More
generally, U.S. Patent
No. 4,433,092, also to Geysen (1989), describes a method of detecting or
determining a
sequence of monomers which is a topographical equivalent of a ligand which is
complementary
to the ligand binding site of a particular receptor of interest. Similarly,
U.S. Patent No.
5,480,971 to Houghten, R. A. et al. (1996) discloses linear C1-C7 alkyl
peralkylated
oligopeptides and sets and libraries of such peptides, as well as methods for
using such
oligopeptide sets and libraries for determining the sequence of a peralkylated
oligopeptide that
preferentially binds to an acceptor molecule of interest. Thus, non-peptide
analogs of the
epitope-bearing peptides of the invention also can be made routinely by these
methods. The
entire disclosure of each document cited in this section on "Polypeptides and
Fragments" is
hereby incorporated herein by reference.
As one of skill in the art will appreciate, the polypeptides of the present
invention and the
epitope-bearing fragments thereof described above can be combined with parts
of the constant
domain of immunoglobulins (IgG), resulting in chimeric polypeptides. These
fusion proteins
facilitate purification and show an increased half life in vivo. This has been
shown, e.g., for
chimeric proteins consisting of the first two domains of the human CD4-polypeptide
and various
domains of the constant regions of the heavy or light chains of mammalian
immunoglobulins
(EPA 0,394,827; Traunecker et al. (1988) Nature 331:84-86). Fusion proteins
that have a
disulfide-linked dimeric structure due to the IgG part can also be more
efficient in binding and
neutralizing other molecules than a monomeric T. pallidum polypeptide or
fragment thereof
alone. See Fountoulakis et al. (1995) J. Biochem. 270:3958-3964. Nucleic acids
encoding the
above epitopes of T. pallidum polypeptides can also be recombined with a gene
of interest as an
epitope tag to aid in detection and purification of the expressed polypeptide.
3. Diagnostic Assays and Kits

The present invention further relates to methods for assaying Borrelia
infection in an
animal by detecting the expression of genes encoding Borrelia polypeptides of
the present
invention. The methods comprise analyzing tissue or body fluid from the animal
for
Borrelia-specific antibodies, nucleic acids, or proteins. Nucleic acid
specific to
Borrelia is assayed by PCR or hybridization techniques using nucleic acid
sequences of the
present invention as either hybridization probes or primers. See, e.g.,
Sambrook et al.
Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory Press,
2nd ed., 1989,
page 54 reference); Eremeeva et al. (1994) J. Clin. Microbiol. 32:803-810
(describing
differentiation among spotted fever group Rickettsiae species by analysis
of restriction fragment
length polymorphism of PCR-amplified DNA) and Chen et al. (1994) J. Clin.
Microbiol. 32:589-595
(detecting T. pallidum nucleic acids via PCR).
Where diagnosis of a disease state related to infection with Borrelia has
already been
made, the present invention is useful for monitoring progression or regression
of the disease state
whereby patients exhibiting enhanced Borrelia gene expression will
experience a worse clinical
outcome relative to patients expressing these genes at a lower level.
By "biological sample" is intended any biological sample obtained from an
animal, cell
line, tissue culture, or other source which contains Borrelia polypeptide,
mRNA, or DNA.
Biological samples include body fluids (such as saliva, blood, plasma, urine,
mucus, synovial
fluid, etc.), tissues (such as muscle, skin, and cartilage) and any other
biological source suspected
of containing Borrelia polypeptides or nucleic acids. Methods for obtaining
biological samples
such as tissue are well known in the art.
The present invention is useful for detecting diseases related to Borrelia
infections in
animals. Preferred animals include monkeys, apes, cats, dogs, birds, cows,
pigs, mice, horses,
rabbits and humans. Particularly preferred are humans.
Total RNA can be isolated from a biological sample using any suitable
technique such as
the single-step guanidinium-thiocyanate-phenol-chloroform method described in
Chomczynski et
al. (1987) Anal. Biochem. 162:156-159. mRNA encoding Borrelia polypeptides
having
sufficient homology to the nucleic acid sequences identified in SEQ ID NOS:1-744
to allow for
hybridization between complementary sequences is then assayed using any
appropriate method.
These include Northern blot analysis, S1 nuclease mapping, the polymerase
chain reaction
(PCR), reverse transcription in combination with the polymerase chain reaction
(RT-PCR), and
reverse transcription in combination with the ligase chain reaction (RT-LCR).
Northern blot analysis can be performed as described in Harada et al. (1990)
Cell
63:303-312. Briefly, total RNA is prepared from a biological sample as
described above. For
the Northern blot, the RNA is denatured in an appropriate buffer (such as
glyoxal/dimethyl
sulfoxide/sodium phosphate buffer), subjected to agarose gel electrophoresis,
and transferred
onto a nitrocellulose filter. After the RNAs have been linked to the filter by
a UV linker, the filter
is prehybridized in a solution containing formamide, SSC, Denhardt's solution,
denatured
salmon sperm DNA, SDS, and sodium phosphate buffer. A T. pallidum polynucleotide
sequence
shown in SEQ ID NOS:1-744, or portion thereof, labeled according to any
appropriate method
(such as the 32P-multiprimed DNA labeling system (Amersham)) is used as probe.
After
hybridization overnight, the filter is washed and exposed to x-ray film. DNA
for use as probe
according to the present invention is described in the sections above and will
preferably at least
nucleotides in length.
S1 mapping can be performed as described in Fujita et al. (1987) Cell 49:357-367.
To
prepare probe DNA for use in S1 mapping, the sense strand of an above-described
T. pallidum
DNA sequence of the present invention is used as a template to synthesize
labeled antisense
DNA. The antisense DNA can then be digested using an appropriate
restriction endonuclease to
generate further DNA probes of a desired length. Such antisense probes are
useful for
visualizing protected bands corresponding to the target mRNA (i.e., mRNA
encoding Borrelia
polypeptides).
Levels of mRNA encoding Borrelia polypeptides are assayed, for example, using the
RT-PCR method described in Makino et al. (1990) Technique 2:295-301. By this method, the
radioactivities of the "amplicons" in the polyacrylamide gel bands are
linearly related to the initial
concentration of the target mRNA. Briefly, this method involves adding total
RNA isolated from
a biological sample in a reaction mixture containing a RT primer and
appropriate buffer. After
incubating for primer annealing, the mixture can be supplemented with a RT
buffer, dNTPs,
DTT, RNase inhibitor and reverse transcriptase. After incubation to achieve
reverse transcription
of the RNA, the RT products are then subject to PCR using labeled primers.
Alternatively, rather
than labeling the primers, a labeled dNTP can be included in the PCR reaction
mixture. PCR
amplification can be performed in a DNA thermal cycler according to
conventional techniques.
After a suitable number of rounds to achieve amplification, the PCR reaction
mixture is
electrophoresed on a polyacrylamide gel. After drying the gel, the radioactivity of the appropriate
bands (corresponding to the mRNA encoding the Borrelia polypeptides of the present invention)
is quantified using an imaging analyzer. RT and PCR reaction ingredients and conditions,
reagent and gel concentrations, and labeling methods are well known in the art. Variations on the
RT-PCR method will be apparent to the skilled artisan. Other PCR methods that can detect the
nucleic acid of the present invention can be found in PCR PRIMER: A LABORATORY
MANUAL (C.W. Dieffenbach et al. eds., Cold Spring Harbor Lab Press, 1995).
The polynucleotides of the present invention, including both DNA and RNA, may
be
used to detect polynucleotides of the present invention or Borrelia species
including T. pallidum
using bio chip technology. The present invention includes both high density chip arrays (>1000
oligonucleotides per cm2) and low density chip arrays (<1000 oligonucleotides per cm2). Bio
chips comprising arrays of polynucleotides of the present invention may be used to detect
Borrelia species, including T. pallidum, in biological and environmental samples and to diagnose
an animal, including humans, with a T. pallidum or other Borrelia infection. The bio chips of
the present invention may comprise polynucleotide sequences of other pathogens including
bacterial, viral, parasitic, and fungal polynucleotide sequences, in addition to the polynucleotide
sequences of the present invention, for use in rapid differential pathogenic detection and
diagnosis. The bio chips can also be used to monitor a T. pallidum or other Borrelia infection
and to monitor the genetic changes (deletions, insertions, mismatches, etc.) in response to drug
therapy in the clinic and drug development in the laboratory. The bio chip technology comprising
arrays of polynucleotides of the present invention may also be used to simultaneously monitor the
expression of a multiplicity of genes, including those of the present invention. The
polynucleotides used to comprise a selected array may be specified in the same manner as for the
fragments, i.e., by their 5' and 3' positions or length in contiguous base pairs and include from.
Methods and particular uses of the polynucleotides of the present invention to detect Borrelia
species, including T. pallidum, using bio chip technology include those known in the art and
those of: U.S. Patent Nos. 5510270, 5545531, 5445934, 5677195, 5532128, 5556752,
5527681, 5451683, 5424186, 5607646, 5658732 and World Patent Nos. WO/9710365,
WO/9511995, WO/9743447, WO/9535505, each incorporated herein in their entireties.
Biosensors using the polynucleotides of the present invention may also be used
to detect,
diagnose, and monitor T. pallidum or other Borrelia species and infections
thereof. Biosensors
using the polynucleotides of the present invention may also be used to detect
particular
polynucleotides of the present invention. Biosensors using the polynucleotides
of the present
invention may also be used to monitor the genetic changes (deletions,
insertions, mismatches,
etc.) in response to drug therapy in the clinic and drug development in the
laboratory. Methods
and particular uses of the polynucleotides of the present invention to detect Borrelia species,
including T. pallidum, using biosensors include those known in the art and those of: U.S. Patent
Nos. 5721102, 5658732, 5631170, and World Patent Nos. WO97/35011, WO/9720203, each
incorporated herein in their entireties.
Thus, the present invention includes both bio chips and biosensors comprising
polynucleotides of the present invention and methods of their use.
Assaying Borrelia polypeptide levels in a biological sample can occur using
any
art-known method, such as antibody-based techniques. For example, Borrelia
polypeptide
expression in tissues can be studied with classical immunohistological
methods. In these, the
specific recognition is provided by the primary antibody (polyclonal or
monoclonal) but the
secondary detection system can utilize fluorescent, enzyme, or other
conjugated secondary
antibodies. As a result, an immunohistological staining of a tissue section for pathological
examination is obtained. Tissues can also be extracted, e.g., with urea and neutral detergent, for
the liberation of Borrelia polypeptides for Western-blot or dot/slot assay. See, e.g., Jalkanen,
M. et al. (1985) J. Cell. Biol. 101:976-985; Jalkanen, M. et al. (1987) J. Cell. Biol.
105:3087-3096. In this technique, which is based on the use of cationic solid phases,
quantitation of a Borrelia polypeptide can be accomplished using an isolated Borrelia polypeptide
as a standard. This technique can also be applied to body fluids.
Other antibody-based methods useful for detecting Borrelia polypeptide gene
expression
include immunoassays, such as the ELISA and the radioimmunoassay (RIA). For example,
Borrelia polypeptide-specific monoclonal antibodies can be used both as an immunoabsorbent
and as an enzyme-labeled probe to detect and quantify a Borrelia polypeptide. The amount of a
Borrelia polypeptide present in the sample can be calculated by reference to the amount present in
a standard preparation using a linear regression computer algorithm. Such an ELISA is described
in Iacobelli et al. (1988) Breast Cancer Research and Treatment 11:19-30. In another ELISA
assay, two distinct specific monoclonal antibodies can be used to detect
Borrelia polypeptides in a
body fluid. In this assay, one of the antibodies is used as the
immunoabsorbent and the other as
the enzyme-labeled probe.
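The calculation by reference to a standard preparation can be illustrated with a minimal sketch. It assumes an ordinary least-squares straight-line fit of signal against known standard amounts, which is only one way to realize a "linear regression computer algorithm"; the function names, units, and all numeric values below are hypothetical and are not taken from the cited reference.

```python
def fit_standard_curve(amounts, signals):
    """Least-squares fit of signal = slope * amount + intercept.

    amounts: known amounts in the standard preparation (hypothetical units)
    signals: measured assay readouts for those standards
    """
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired standard points")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standard amounts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_signal(signal, slope, intercept):
    """Invert the fitted line to estimate the amount in an unknown sample."""
    if slope == 0:
        raise ValueError("flat standard curve cannot be inverted")
    return (signal - intercept) / slope


# Hypothetical standards (e.g., ng per well) and readouts (e.g., absorbance).
standard_amounts = [0.0, 1.0, 2.0, 4.0, 8.0]
standard_signals = [0.05, 0.25, 0.45, 0.85, 1.65]

slope, intercept = fit_standard_curve(standard_amounts, standard_signals)
sample_amount = amount_from_signal(1.05, slope, intercept)
```

A straight-line fit is only appropriate within the linear range of the assay; readouts outside the range spanned by the standards should not be extrapolated with this sketch.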
The above techniques may be conducted essentially as a "one-step" or "two-
step" assay.
The "one-step" assay involves contacting the Borrelia polypeptide with
immobilized antibody
and, without washing, contacting the mixture with the labeled antibody. The
"two-step" assay
involves washing before contacting the mixture with the labeled antibody.
Other conventional
methods may also be employed as suitable. It is usually desirable to
immobilize one component
of the assay system on a support, thereby allowing other components of the
system to be brought
into contact with the component and readily removed from the sample.
Variations of the above
and other immunological methods included in the present invention can also be
found in Harlow
et al., ANTIBODIES: A LABORATORY MANUAL, (Cold Spring Harbor Laboratory Press,
2nd ed. 1988).
Suitable enzyme labels include, for example, those from the oxidase group,
which
catalyze the production of hydrogen peroxide by reacting with substrate.
Glucose oxidase is
particularly preferred as it has good stability and its substrate (glucose) is
readily available.
Activity of an oxidase label may be assayed by measuring the concentration of
hydrogen peroxide
formed by the enzyme-labeled antibody/substrate reaction. Besides enzymes,
other suitable
labels include radioisotopes, such as iodine (125I, 121I), carbon (14C), sulphur (35S), tritium (3H),
indium (112In), and technetium (99mTc), and fluorescent labels, such as fluorescein and
rhodamine, and biotin.
Further suitable labels for the Borrelia polypeptide-specific antibodies of
the present
invention are provided below. Examples of suitable enzyme labels include
malate
dehydrogenase, Borrelia nuclease, delta-5-steroid isomerase, yeast-alcohol
dehydrogenase,
alpha-glycerol phosphate dehydrogenase, triose phosphate isomerase,
peroxidase, alkaline
phosphatase, asparaginase, glucose oxidase, beta-galactosidase, ribonuclease,
urease, catalase,
glucose-6-phosphate dehydrogenase, glucoamylase, and acetylcholine esterase.
Examples of suitable radioisotopic labels include 3H, 111In, 125I, 131I, 32P, 35S, 14C, 51Cr,
57To, 58Co, 59Fe, 75Se, 152Eu, 90Y, 67Cu, 217Ci, 211At, 212Pb, 47Sc, 109Pd, etc. 111In is a preferred
isotope where in vivo imaging is used since it avoids the problem of dehalogenation of the 125I-
or 131I-labeled monoclonal antibody by the liver. In addition, this radionuclide has a more
favorable gamma emission energy for imaging. See, e.g., Perkins et al. (1985) Eur. J. Nucl.
Med. 10:296-301; Carasquillo et al. (1987) J. Nucl. Med. 28:281-287. For example, 111In
coupled to monoclonal antibodies with 1-(P-isothiocyanatobenzyl)-DPTA has shown little uptake
in non-tumorous tissues, particularly the liver, and therefore enhances specificity of tumor
localization. See, Esteban et al. (1987) J. Nucl. Med. 28:861-870.
Examples of suitable non-radioactive isotopic labels include 157Gd, 55Mn, 162Dy, 52Tr,
and 56Fe.
Examples of suitable fluorescent labels include a 152Eu label, a fluorescein label, an
isothiocyanate label, a rhodamine label, a phycoerythrin label, a phycocyanin
label, an
allophycocyanin label, an o-phthaldehyde label, and a fluorescamine label.
Examples of suitable toxin labels include Pseudomonas toxin, diphtheria toxin, ricin,
and cholera toxin.
Examples of chemiluminescent labels include a luminol label, an isoluminol label, an
aromatic acridinium ester label, an imidazole label, an acridinium salt label,
an oxalate ester label,
a luciferin label, a luciferase label, and an aequorin label.
Examples of nuclear magnetic resonance contrasting agents include heavy metal
nuclei
such as Gd, Mn, and iron.
Typical techniques for binding the above-described labels to antibodies are
provided by
Kennedy et al. (1976) Clin. Chim. Acta 70:1-31, and Schurs et al. (1977) Clin.
Chim. Acta
81:1-40. Coupling techniques mentioned in the latter are the glutaraldehyde method, the
periodate method, the dimaleimide method, and the m-maleimidobenzyl-N-hydroxy-succinimide ester
method, all of which methods are incorporated by reference herein.
In a related aspect, the invention includes a diagnostic kit for use in
screening serum
containing antibodies specific against T. pallidum infection. Such a kit may
include an isolated
T. pallidum antigen comprising an epitope which is specifically immunoreactive
with at least one
anti-T. pallidum antibody. Such a kit also includes means for detecting the
binding of said
antibody to the antigen. In specific embodiments, the kit may include a
recombinantly produced
or chemically synthesized peptide or polypeptide antigen. The peptide or
polypeptide antigen
may be attached to a solid support.
In a more specific embodiment, the detecting means of the above-described kit
includes a
solid support to which said peptide or polypeptide antigen is attached. Such a
kit may also
include a non-attached reporter-labeled anti-human antibody. In this
embodiment, binding of the
antibody to the T. pallidum antigen can be detected by binding of the reporter-labeled antibody to
the anti-T. pallidum polypeptide antibody.
Specifically, the invention provides a compartmentalized kit to receive, in
close
confinement, one or more containers which comprises: (a) a first container
comprising one of the
DFs or antibodies of the present invention; and (b) one or more other
containers comprising one
or more of the following: wash reagents, reagents capable of detecting
presence of a bound DF or
antibody.

In detail, a compartmentalized kit includes any kit in which reagents are
contained in
separate containers. Such containers include small glass containers, plastic
containers or strips of
plastic or paper. Such containers allow one to efficiently transfer reagents from one
compartment to another compartment such that the samples and reagents are not
cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion
from one compartment to another. Such containers will include a container
which will accept the
test sample, a container which contains the antibodies used in the assay,
containers which contain
wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and
containers which
contain the reagents used to detect the bound antibody or DF.
In a related aspect, the invention includes a method of detecting T. pallidum
infection in a
subject. This detection method includes reacting a body fluid, preferably
serum, from the subject
with an isolated T. pallidum antigen, and examining the antigen for the
presence of bound
antibody. In a specific embodiment, the method includes a polypeptide antigen
attached to a solid
support, and serum is reacted with the support. Subsequently, the support is
reacted with a
reporter-labeled anti-human antibody. The support is then examined for the
presence of reporter-
labeled antibody.
The solid surface reagent employed in the above assays and kits is prepared by
known
techniques for attaching protein material to solid support material, such as
polymeric beads, dip
sticks, 96-well plates or filter material. These attachment methods generally
include non-specific
adsorption of the protein to the support or covalent attachment of the protein, typically through a
free amine group, to a chemically reactive group on the solid support, such as an activated
carboxyl, hydroxyl, or aldehyde group. Alternatively, streptavidin coated plates can be used in
conjunction with biotinylated antigen(s).
The polypeptides and antibodies of the present invention, including fragments
thereof,
may be used to detect Borrelia species including T. pallidum using bio chip
and biosensor
technology. Bio chip and biosensors of the present invention may comprise the
polypeptides of
the present invention to detect antibodies, which specifically recognize
Borrelia species, including
T. pallidum. Bio chip and biosensors of the present invention may also
comprise antibodies
which specifically recognize the polypeptides of the present invention to detect Borrelia species,
including T. pallidum, or specific polypeptides of the present invention. Bio chips or biosensors
comprising polypeptides or antibodies of the present invention may be used to detect Borrelia
species, including T. pallidum, in biological and environmental samples and to diagnose an
animal, including humans, with a T. pallidum or other Borrelia infection. Thus, the present
invention includes both bio chips and biosensors comprising polypeptides or
antibodies of the
present invention and methods of their use.
The bio chips of the present invention may further comprise polypeptide
sequences of
other pathogens including bacterial, viral, parasitic, and fungal polypeptide sequences, in addition
to the polypeptide sequences of the present invention, for use in rapid differential pathogenic
detection and diagnosis. The bio chips of the present invention may further comprise antibodies
or fragments thereof specific for other pathogens including bacterial, viral, parasitic, and fungal
polypeptide sequences, in addition to the antibodies or fragments thereof of the present
invention, for use in rapid differential pathogenic detection and diagnosis. The bio chips and
biosensors of the present invention may also be used to monitor a T. pallidum or other Borrelia
infection and to monitor the genetic changes (amino acid deletions, insertions, substitutions, etc.)
in response to drug therapy in the clinic and drug development in the laboratory. The bio chips
and biosensors comprising polypeptides or antibodies of the present invention may also be used
to simultaneously monitor the expression of a multiplicity of polypeptides, including those of the
present invention. The polypeptides used to comprise a bio chip or biosensor of the present
invention may be specified in the same manner as for the fragments, i.e., by their N-terminal and
C-terminal positions or length in contiguous amino acid residues. Methods and particular uses of
the polypeptides and antibodies of the present invention to detect Borrelia species, including T.
pallidum, or specific polypeptides using bio chip and biosensor technology include those known
in the art, those of the U.S. Patent Nos. and World Patent Nos. listed above for bio chips and
biosensors using polynucleotides of the present invention, and those of: U.S. Patent Nos.
5658732, 5135852, 5567301, 5677196, 5690894 and World Patent Nos. WO9729366,
WO9612957, each incorporated herein in their entireties.
4. Screening Assay for Binding Agents
Using the isolated proteins of the present invention, the present invention
further provides
methods of obtaining and identifying agents which bind to a protein encoded by
one of the ORFs
of the present invention or to one of the fragments and the T. pallidum
fragment and contigs
herein described.
In general, such methods comprise steps of:
(a) contacting an agent with an isolated protein encoded by one of the ORFs of
the
present invention, or an isolated fragment of the T. pallidum genome; and
(b) determining whether the agent binds to said protein or said fragment.
The agents screened in the above assay can be, but are not limited to,
peptides,
carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents
can be selected
and screened at random or rationally selected or designed using protein
modeling techniques.
For random screening, agents such as peptides, carbohydrates, pharmaceutical
agents and
the like are selected at random and are assayed for their ability to bind to
the protein encoded by
the ORF of the present invention.
Alternatively, agents may be rationally selected or designed. As used herein,
an agent is
said to be "rationally selected or designed" when the agent is chosen based on
the configuration
of the particular protein. For example, one skilled in the art can readily
adapt currently available
procedures to generate peptides, pharmaceutical agents and the like capable of
binding to a
specific peptide sequence in order to generate rationally designed antipeptide peptides, for
example see Hurby et al., "Application of Synthetic Peptides: Antisense
Peptides," in Synthetic
Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspczak
et al.,
Biochemistry 28: 9230-8 (1989), or pharmaceutical agents, or the like.
In addition to the foregoing, one class of agents of the present invention, as
broadly
described, can be used to control gene expression through binding to one of
the ORFs or EMFs
of the present invention. As described above, such agents can be randomly
screened or rationally
designed/selected. Targeting the ORF or EMF allows a skilled artisan to design
sequence
specific or element specific agents, modulating the expression of either a
single ORF or multiple
ORFs which rely on the same EMF for expression control.
One class of DNA binding agents are agents which contain base residues which
hybridize
or form a triple helix by binding to DNA or RNA. Such agents can be based on
the classic
phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl
or polymeric
derivatives which have base attachment capacity.
Agents suitable for use in these methods usually contain 20 to 40 bases and
are designed
to be complementary to a region of the gene involved in transcription (triple
helix - see Lee et al.,
Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and
Dervan et al.,
Science 251:1360 (1991)) or to the mRNA itself (antisense - Okano, J. Neurochem. 56: 560
(1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca
Raton, FL (1988)). Triple helix formation optimally results in a shut-off of RNA transcription
RNA transcription
from DNA, while antisense RNA hybridization blocks translation of an mRNA
molecule into
polypeptide. Both techniques have been demonstrated to be effective in model
systems.
Information contained in the sequences of the present invention can be used to
design antisense
and triple helix-forming oligonucleotides, and other DNA binding agents.
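As an illustration of how sequence information can be turned into an antisense oligonucleotide, the following minimal sketch takes a stretch of mRNA and returns the complementary DNA oligonucleotide written 5' to 3'. It only enforces the 20 to 40 base length mentioned above and plain Watson-Crick complementarity; the function name and the example sequence are hypothetical, and real oligonucleotide design would weigh further factors (target accessibility, specificity, chemistry) that are not modeled here.

```python
# Watson-Crick complement of each mRNA (or DNA) base, expressed as DNA.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_oligo(mrna, start, length):
    """Return a DNA oligonucleotide (5'->3') complementary to mrna[start:start+length]."""
    if not 20 <= length <= 40:
        raise ValueError("length must be between 20 and 40 bases")
    target = mrna.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("target region runs past the end of the sequence")
    try:
        # Reverse the target, then complement each base, so the result reads 5'->3'.
        return "".join(_COMPLEMENT[base] for base in reversed(target))
    except KeyError as exc:
        raise ValueError(f"unexpected base in sequence: {exc.args[0]!r}") from exc


# Made-up mRNA fragment used purely for illustration.
example_mrna = "AUGGCUAGCUAGGCUUACGGAUCCGAUUACGCUAGC"
oligo = antisense_oligo(example_mrna, 0, 20)
```

The start position and length are left to the caller so that the same helper can be pointed at any region of a sequence of interest.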
5. Pharmaceutical Compositions and Vaccines
The present invention further provides pharmaceutical agents which can be used
to
modulate the growth or pathogenicity of T. pallidum, or another related
organism, in vivo or in
vitro. As used herein, a "pharmaceutical agent" is defined as a composition of matter which can
be formulated using known techniques to provide a pharmaceutical composition. As used
herein, the "pharmaceutical agents of the present invention" refers to the pharmaceutical agents
which are derived from the proteins encoded by the ORFs of the present
invention or are agents
which are identified using the herein described assays.
As used herein, a pharmaceutical agent is said to "modulate the growth or pathogenicity of
T. pallidum or a related organism, in vivo or in vitro," when the agent
reduces the rate of growth,
rate of division, or viability of the organism in question. The pharmaceutical
agents of the
present invention can modulate the growth or pathogenicity of an organism in
many fashions,
although an understanding of the underlying mechanism of action is not needed
to practice the
use of the pharmaceutical agents of the present invention. Some agents will
modulate the growth
by binding to an important protein thus blocking the biological activity of
the protein, while other
agents may bind to a component of the outer surface of the organism, blocking attachment or
rendering the organism more prone to attack by the body's natural immune system. Alternatively, the
agent may comprise a protein encoded by one of the ORFs of the present
invention and serve as a
vaccine. The development and use of a vaccine based on outer membrane
components are well
known in the art.
As used herein, a "related organism" is a broad term which refers to any
organism whose
growth can be modulated by one of the pharmaceutical agents of the present
invention. In
general, such an organism will contain a homolog of the protein which is the
target of the
pharmaceutical agent or the protein used as a vaccine. As such, related
organisms do not need to
be bacterial but may be fungal or viral pathogens.
The pharmaceutical agents and compositions of the present invention may be
administered
in a convenient manner, such as by the oral, topical, intravenous,
intraperitoneal, intramuscular,
subcutaneous, intranasal or intradermal routes. The pharmaceutical
compositions are
administered in an amount which is effective for treating and/or prophylaxis
of the specific
indication. In general, they are administered in an amount of at least about 1
mg/kg body weight
and in most cases they will be administered in an amount not in excess of
about 1 g/kg body
weight per day. In most cases, the dosage is from about 0.1 mg/kg to about 10
g/kg body
weight daily, taking into account the routes of administration, symptoms, etc.
The agents of the present invention can be used in native form or can be
modified to form
a chemical derivative. As used herein, a molecule is said to be a "chemical
derivative" of another
molecule when it contains additional chemical moieties not normally a part of
the molecule. Such
moieties may improve the molecule's solubility, absorption, biological half
life, etc. The
moieties may alternatively decrease the toxicity of the molecule, eliminate or
attenuate any
undesirable side effect of the molecule, etc. Moieties capable of mediating
such effects are
disclosed in, among other sources, REMINGTON'S PHARMACEUTICAL SCIENCES (1980)
cited elsewhere herein.
For example, such moieties may change an immunological character of the
functional
derivative, such as affinity for a given antibody. Such changes in
immunomodulation activity are
measured by the appropriate assay, such as a competitive type immunoassay.
Modifications of
such protein properties as redox or thermal stability, biological half life,
hydrophobicity,
susceptibility to proteolytic degradation or the tendency to aggregate with
carriers or into
multimers also may be effected in this way and can be assayed by methods well
known to the
skilled artisan.
The therapeutic effects of the agents of the present invention may be obtained
by
providing the agent to a patient by any suitable means (e.g., inhalation,
intravenously,
intramuscularly, subcutaneously, enterally, or parenterally). It is preferred
to administer the
agent of the present invention so as to achieve an effective concentration
within the blood or
tissue in which the growth of the organism is to be controlled. To achieve an
effective blood
concentration, the preferred method is to administer the agent by injection.
The administration
may be by continuous infusion, or by single or multiple injections.
In providing a patient with one of the agents of the present invention, the
dosage of the
administered agent will vary depending upon such factors as the patient's age,
weight, height,
sex, general medical condition, previous medical history, etc. In general, it
is desirable to
provide the recipient with a dosage of agent which is in the range of from
about 1 pg/kg to 10
mg/kg (body weight of patient), although a lower or higher dosage may be
administered. The
therapeutically effective dose can be lowered by using combinations of the
agents of the present
invention or another agent.
As used herein, two or more compounds or agents are said to be administered
"in
combination" with each other when either (1) the physiological effects of
each compound, or (2)
the serum concentrations of each compound can be measured at the same time.
The composition
of the present invention can be administered concurrently with, prior to, or
following the
administration of the other agent.
The agents of the present invention are intended to be provided to recipient
subjects in an
amount sufficient to decrease the rate of growth (as defined above) of the
target organism.
The administration of the agent(s) of the invention may be for either a "prophylactic" or
"therapeutic" purpose. When provided prophylactically, the agent(s) are provided in advance of
any symptoms indicative of the organism's growth. The prophylactic administration of the
agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection.
When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an
indication of infection. The therapeutic administration of the compound(s) serves to attenuate the
pathological symptoms of the infection and to increase the rate of recovery.
The agents of the present invention are administered to a subject, such as a
mammal, or a
patient, in a pharmaceutically acceptable form and in a therapeutically
effective concentration. A
composition is said to be "pharmacologically acceptable" if its administration
can be tolerated by a
recipient patient. Such an agent is said to be administered in a
"therapeutically effective amount"
if the amount administered is physiologically significant. An agent is
physiologically significant
if its presence results in a detectable change in the physiology of a
recipient patient.
The agents of the present invention can be formulated according to known
methods to
prepare pharmaceutically useful compositions, whereby these materials, or
their functional
derivatives, are combined in a mixture with a pharmaceutically acceptable
carrier vehicle.
Suitable vehicles and their formulation, inclusive of other human proteins,
e.g., human serum
albumin, are described, for example, in REMINGTON'S PHARMACEUTICAL SCIENCES,
16th Ed., Osol, A., Ed., Mack Publishing, Easton PA (1980). In order to form
a
pharmaceutically acceptable composition suitable for effective administration,
such compositions
will contain an effective amount of one or more of the agents of the present
invention, together
with a suitable amount of carrier vehicle.

Additional pharmaceutical methods may be employed to control the duration of action.
Controlled release preparations may be achieved through the use of polymers to complex or absorb
one or more of the agents of the present invention. The controlled delivery may be effectuated by
a variety of well known techniques, including formulation with macromolecules such as, for
example, polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate,
methylcellulose, carboxymethylcellulose, or protamine sulfate, adjusting the concentration of the
macromolecules and the agent in the formulation, and by appropriate use of methods of
incorporation, which can be manipulated to effectuate a desired time course of release. Another
possible method to control the duration of action by controlled release preparations is to
incorporate agents of the present invention into particles of a polymeric material such as
polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers.
Alternatively, instead of incorporating these agents into polymeric particles, it is possible to
entrap these materials in microcapsules prepared, for example, by coacervation techniques or by
interfacial polymerization with, for example, hydroxymethylcellulose or gelatine-microcapsules
and poly(methylmethacrylate) microcapsules, respectively, or in colloidal drug delivery systems,
for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and
nanocapsules or in macroemulsions. Such techniques are disclosed in REMINGTON'S
PHARMACEUTICAL SCIENCES (1980).
The invention further provides a pharmaceutical pack or kit comprising one or more
containers filled with one or more of the ingredients of the pharmaceutical compositions of the
invention. Associated with such container(s) can be a notice in the form prescribed by a
governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological
products, which notice reflects approval by the agency of manufacture, use or sale for human
administration.
In addition, the agents of the present invention may be employed in conjunction with
other therapeutic compounds.
6. Shot-Gun Approach to Megabase DNA Sequencing
The present invention further demonstrates that a large sequence can be sequenced using a
random shotgun approach. This procedure, described in detail in the examples that follow, has
eliminated the up-front cost of isolating and ordering overlapping or contiguous subclones prior
to the start of the sequencing protocols.
Certain aspects of the present invention are described in greater detail in the examples that
follow. The examples are provided by way of illustration. Other aspects and embodiments of
the present invention are contemplated by the inventors, as will be clear to those of skill in the art
from reading the present disclosure.

LIBRARIES AND SEQUENCING
1. Shotgun Sequencing Probability Analysis
The overall strategy for a shotgun approach to whole genome sequencing follows
from the Lander and Waterman (Lander and Waterman, Genomics 2: 231 (1988))
application of the equation for the Poisson distribution. According to this
treatment, the probability, $P_0$, that any given base in a sequence of size L,
in nucleotides, is not sequenced after a certain amount, n, in nucleotides, of
random sequence has been determined can be calculated by the equation
$P_0 = e^{-m}$, where m is n/L, the fold coverage. For instance, for a genome
of 2.8 Mb, m = 1 when 2.8 Mb of sequence has been randomly generated (1X
coverage). At that point, $P_0 = e^{-1} = 0.37$. The probability that any given
base has not been sequenced is the same as the probability that any region of
the whole sequence L has not been determined and, therefore, is equivalent to
the fraction of the whole sequence that has yet to be determined. Thus, at
one-fold coverage, approximately 37% of a polynucleotide of size L, in
nucleotides, has not been sequenced. When 14 Mb of sequence has been generated,
coverage is 5X for a 2.8 Mb genome and the unsequenced fraction drops to 0.0067
or 0.67%. 5X coverage of a 2.8 Mb sequence can be attained by sequencing
approximately 17,000 random clones from both insert ends with an average
sequence read length of 410 bp.
Similarly, the total gap length, G, is determined by the equation
$G = L e^{-m}$, and the average gap size, g, follows the equation g = L/n,
where n in this equation is the number of random sequence reads rather than the
number of nucleotides sequenced. Thus, 5X coverage leaves about 240 gaps
averaging about 82 bp in size in a sequence of a polynucleotide 2.8 Mb long.
The treatment above is essentially that of Lander and Waterman, Genomics 2:
231 (1988).
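The figures quoted above can be checked with a short calculation. The following Python sketch is illustrative only; the function and variable names are assumptions introduced here, and the read count of 34,000 is inferred from 17,000 clones sequenced from both ends.

```python
import math


def unsequenced_fraction(genome_bp, sequenced_bp):
    """P0 = e^(-m), where m = sequenced_bp / genome_bp is the fold coverage."""
    m = sequenced_bp / genome_bp
    return math.exp(-m)


def shotgun_summary(genome_bp, n_reads, read_len_bp):
    """Poisson estimates for a random shotgun project."""
    m = n_reads * read_len_bp / genome_bp   # fold coverage
    p0 = math.exp(-m)                       # fraction of bases not yet sequenced
    total_gap_bp = genome_bp * p0           # G = L * e^(-m)
    avg_gap_bp = genome_bp / n_reads        # g = L / (number of reads)
    return {
        "coverage": m,
        "p0": p0,
        "total_gap_bp": total_gap_bp,
        "avg_gap_bp": avg_gap_bp,
        "n_gaps": total_gap_bp / avg_gap_bp,
    }


# 17,000 clones read from both ends gives 34,000 reads of about 410 bp each
summary = shotgun_summary(2_800_000, 34_000, 410)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

Under these assumptions the coverage comes out slightly below 5X, and the estimated gap count and average gap size are close to the approximate values stated in the text.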
2. Random Library Construction
In order to approximate the random model described above during actual
sequencing, a
nearly ideal library of cloned genomic fragments is required. The following
library construction
procedure was developed to achieve this end.
T. pallidum DNA is prepared by phenol extraction. A mixture containing 200 µg
DNA in 1.0 ml of 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, 50%
glycerol is processed through a nebulizer (IPI Medical Products) with a stream
of nitrogen adjusted to 35 kPa for 2 minutes. The nebulized DNA is ethanol
precipitated and redissolved in 500 µl TE buffer.
To create blunt ends, a 100 µl aliquot of the resuspended DNA is digested with
5 units of BAL31 nuclease (New England BioLabs) for 10 min at 30°C in 200 µl
BAL31 buffer. The digested DNA is phenol-extracted, ethanol-precipitated,
redissolved in 100 µl TE buffer, and then size-fractionated by electrophoresis
through a 1.0% low melting temperature agarose gel. The section containing DNA
fragments 1.6-2.0 kb in size is excised from the gel, and the LGT agarose is
melted and the resulting solution is extracted with phenol to separate the
agarose from
the DNA. DNA is ethanol precipitated and redissolved in 20 µl of TE buffer for
ligation to vector.
A two-step ligation procedure is used to produce a plasmid library with 97%
inserts, of which >99% are single inserts. The first ligation mixture (50 µl)
contains 2 µg of DNA fragments, 2 µg pUC18 DNA (Pharmacia) cut with SmaI and
dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 ligase
(GIBCO/BRL) and is incubated at 14°C for 4 hr. The ligation mixture then is
phenol extracted and ethanol precipitated, and the precipitated DNA is
dissolved in 20 µl TE buffer and electrophoresed on a 1.0% low melting agarose
gel. Discrete bands in a ladder are visualized by ethidium bromide-staining and
UV illumination and identified by size as insert (i), vector (v), v+i, v+2i,
v+3i, etc. The portion of the gel containing v+i DNA is excised and the v+i DNA
is recovered and resuspended into 20 µl TE. The v+i DNA then is blunt-ended by
T4 polymerase treatment for 5 min. at 37°C in a reaction mixture (50 µl)
containing the v+i linears, 500 µM each of the 4 dNTPs, and 9 units of T4
polymerase (New England BioLabs), under recommended buffer conditions. After
phenol extraction and ethanol precipitation the repaired v+i linears are
dissolved in 20 µl TE. The final ligation to produce circles is carried out in
a 50 µl reaction containing 5 µl of v+i linears and 5 units of T4 ligase at
14°C overnight. After 10 min. at 70°C the following day, the reaction mixture
is stored at -20°C.
This two-stage procedure results in a molecularly random collection of
single-insert plasmid recombinants with minimal contamination from
double-insert chimeras (<1%) or free vector (<3%).
Since deviation from randomness can arise from propagation of the DNA in the
host, E. coli host cells deficient in all recombination and restriction
functions (A. Greener, Strategies 3(1):5 (1990)) are used to prevent
rearrangements, deletions, and loss of clones by restriction. Furthermore,
transformed cells are plated directly on antibiotic diffusion plates to avoid
the usual broth recovery phase which allows multiplication and selection of the
most rapidly growing cells.
Plating is carried out as follows. A 100 µl aliquot of Epicurian Coli SURE II
Supercompetent Cells (Stratagene 200152) is thawed on ice and transferred to a
chilled Falcon 2059 tube on ice. A 1.7 µl aliquot of 1.42 M
beta-mercaptoethanol is added to the aliquot of cells to a final concentration
of 25 mM. Cells are incubated on ice for 10 min. A 1 µl aliquot of the final
ligation is added to the cells and incubated on ice for 30 min. The cells are
heat pulsed for 30 sec. at 42°C and placed back on ice for 2 min. The outgrowth
period in liquid culture is eliminated from this protocol in order to minimize
the preferential growth of any given transformed cell. Instead, the
transformation mixture is plated directly on a nutrient rich SOB plate
containing a 5 ml bottom layer of SOB agar (5% SOB agar: 20 g tryptone, 5 g
yeast extract, 0.5 g NaCl, 1.5% Difco Agar per liter of media). The 5 ml bottom
layer is supplemented with 0.4 ml of 50 mg/ml ampicillin per 100 ml SOB agar.
The 15 ml top layer of SOB agar is supplemented with 1 ml X-Gal (2%), 1 ml
MgCl2 (1 M), and 1 ml MgSO4/100 ml SOB agar. The 15 ml top layer is poured just
prior to plating. Our titer is approximately 100 colonies/10 µl aliquot of
transformation.
All colonies are picked for template preparation regardless of size. Thus,
only clones lost
due to "poison" DNA or deleterious gene products are deleted from the library,
resulting in a
slight increase in gap number over that expected.
3. Random DNA Sequencing
High quality double stranded DNA plasmid templates are prepared using a
"boiling bead" method developed in collaboration with Advanced Genetic
Technology Corp. (Gaithersburg, MD) (Adams et al., Science 252:1651 (1991);
Adams et al., Nature 355:632 (1992)). Plasmid preparation is performed in a
96-well format for all stages of DNA preparation from bacterial growth through
final DNA purification. Template concentration is determined using Hoechst Dye
and a Millipore Cytofluor. DNA concentrations are not adjusted, but
low-yielding templates are identified where possible and not sequenced.
Templates are also prepared from two T. pallidum lambda genomic libraries. An
amplified library is constructed in the vector Lambda GEM-12 (Promega) and an
unamplified library is constructed in Lambda DASH II (Stratagene). In
particular, for the unamplified lambda library, T. pallidum DNA (> 100 kb) is
partially digested in a reaction mixture (200 µl) containing 50 µg DNA, 1X
Sau3AI buffer, 20 units Sau3AI for 6 min. at 23°C. The digested DNA is
phenol-extracted and electrophoresed on a 0.5% low melting agarose gel at
2 V/cm for 7 hours. Fragments from 15 to 25 kb are excised and recovered in a
final volume of 6 µl. One µl of fragments is used with 1 µl of DASH II vector
(Stratagene) in the recommended ligation reaction. One µl of the ligation
mixture is used per packaging reaction following the recommended protocol with
the Gigapack II XL Packaging Extract (Stratagene, #227711). Phage are plated
directly without amplification from the packaging mixture (after dilution with
500 µl of recommended SM buffer and chloroform treatment). Yield is about
2.5 x 10^3 pfu/µl. The amplified library is prepared essentially as above
except the Lambda GEM-12 vector is used. After packaging, about 3.5 x 10^4 pfu
are plated on the restrictive NM539 host. The lysate is harvested in 2 ml of SM
buffer and stored frozen in 7% dimethylsulfoxide. The phage titer is
approximately 1 x 10^9 pfu/ml.
Liquid lysates (100 µl) are prepared from randomly selected plaques (from the
unamplified library) and template is prepared by long-range PCR using T7 and
T3 vector-specific primers.
Sequencing reactions are carried out on plasmid and/or PCR templates using the
AB Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer
Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse
(M13RP1) primers (Adams et al., Nature 368:474 (1994)). Dye terminator
sequencing reactions are carried out on the lambda templates on a Perkin-Elmer
9600 Thermocycler using the Applied Biosystems Ready Reaction Dye Terminator
Cycle Sequencing kits. T7 and SP6 primers are used to sequence the ends of the
inserts from the Lambda GEM-12 library and T7 and T3 primers are used to
sequence the ends of the inserts from the Lambda DASH II library. Sequencing
reactions are performed
by eight individuals using an average of fourteen AB 373 DNA Sequencers per
day. All sequencing reactions are analyzed using the Stretch modification of
the AB 373, primarily using a 34 cm well-to-read distance. The overall
sequencing success rate is approximately 85% for M13-21 and M13RP1 sequences
and 65% for dye-terminator reactions. The average usable read length is 485 bp
for M13-21 sequences, 445 bp for M13RP1 sequences, and 375 bp for
dye-terminator reactions.
Richards et al., Chapter 28 in AUTOMATED DNA SEQUENCING AND ANALYSIS, M. D.
Adams, C. Fields, J. C. Venter, Eds., Academic Press, London, (1994) described
the value of using sequence from both ends of sequencing templates to
facilitate ordering of contigs in shotgun assembly projects of lambda and
cosmid clones. We balance the desirability of both-end sequencing (including
the reduced cost of lower total number of templates) against shorter
read-lengths for sequencing reactions performed with the M13RP1 (reverse)
primer compared to the M13-21 (forward) primer. Approximately one-half of the
templates are sequenced from both ends. Random reverse sequencing reactions are
done based on successful forward sequencing reactions. Some M13RP1 sequences
are obtained in a semi-directed fashion: M13-21 sequences pointing outward at
the ends of contigs are chosen for M13RP1 sequencing in an effort to
specifically order contigs.
4. Protocol for Automated Cycle Sequencing
The sequencing is carried out using ABI Catalyst robots and AB 373 Automated
DNA
Sequencers. The Catalyst robot is a publicly available sophisticated pipetting
and temperature
control robot which has been developed specifically for DNA sequencing
reactions. The Catalyst
combines pre-aliquoted templates and reaction mixes consisting of deoxy- and
dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently-
labelled sequencing
primers, and reaction buffer. Reaction mixes and templates are combined in the
wells of an
aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear
amplification (i.e., one primer synthesis) steps are performed including
denaturation, annealing of primer and template, and extension; i.e., DNA
synthesis. A heated lid with rubber gaskets on the thermocycling plate prevents
evaporation without the need for an oil overlay.
Two sequencing protocols are used: one for dye-labelled primers and a second
for dye-labelled dideoxy chain terminators. The shotgun sequencing involves use
of four dye-labelled sequencing primers, one for each of the four terminator
nucleotides. Each dye-primer is labelled with a different fluorescent dye,
permitting the four individual reactions to be combined into one lane of the
373 DNA Sequencer for electrophoresis, detection, and base-calling. ABI
currently supplies pre-mixed reaction mixes in bulk packages containing all the
necessary non-template reagents for sequencing. Sequencing can be done with
both plasmid and PCR-generated templates with both dye-primers and
dye-terminators with approximately equal fidelity, although plasmid templates
generally give longer usable sequences.
Thirty-two reactions are loaded per AB 373 Sequencer each day, for a total of
960 samples. Electrophoresis is run overnight following the manufacturer's
protocols, and the data is collected for twelve hours. Following
electrophoresis and fluorescence detection, the ABI 373 performs automatic lane
tracking and base-calling. The lane-tracking is confirmed visually. Each
sequence electropherogram (or fluorescence lane trace) is inspected visually
and assessed for quality. Trailing sequences of low quality are removed and the
sequence itself is loaded via software to a Sybase database (archived daily to
8mm tape). Leading vector polylinker sequence is removed automatically by a
software program. Average edited lengths of sequences from the standard ABI 373
are around 400 bp and depend mostly on the quality of the template used for the
sequencing reaction. ABI 373 Sequencers converted to Stretch Liners provide a
longer electrophoresis path prior to fluorescence detection and increase the
average number of usable bases to 500-600 bp.
INFORMATICS
1. Data Management
A number of information management systems for a large-scale sequencing lab
have been developed. (For review see, for instance, Kerlavage et al.,
Proceedings of the Twenty-Sixth Annual Hawaii International Conference on
System Sciences, IEEE Computer Society Press, Washington D. C., 585 (1993).)
The system used to collect and assemble the sequence data was developed using
the Sybase relational database management system and was designed to automate
data flow wherever possible and to reduce user error. The database stores and
correlates all information collected during the entire operation from template
preparation to final analysis of the genome. Because the raw output of the ABI
373 Sequencers was based on a Macintosh platform and the data management system
chosen was based on a Unix platform, it was necessary to design and implement a
variety of multi-user, client-server applications which allow the raw data as
well as analysis results to flow seamlessly into the database with a minimum of
user effort.
2. Assembly
An assembly engine (TIGR Assembler) developed for the rapid and accurate
assembly of thousands of sequence fragments was employed to generate contigs.
The TIGR assembler simultaneously clusters and assembles fragments of the
genome. In order to obtain the speed necessary to assemble more than 10^4
fragments, the algorithm builds a hash table of 12 bp oligonucleotide
subsequences to generate a list of potential sequence fragment overlaps. The
number of potential overlaps for each fragment determines which fragments are
likely to fall into repetitive elements. Beginning with a single seed sequence
fragment, TIGR Assembler extends the current contig by attempting to add the
best matching fragment based on oligonucleotide content. The contig and
candidate fragment are aligned using a modified version of the Smith-Waterman
algorithm which provides for optimal gapped alignments (Waterman, M. S.,
Methods
in Enzymology 164:765 (1988)). The contig is extended by the fragment only if
strict criteria for the quality of the match are met. The match criteria
include the minimum length of overlap, the maximum length of an unmatched end,
and the minimum percentage match. These criteria are automatically lowered by
the algorithm in regions of minimal coverage and raised in regions with a
possible repetitive element. Fragments representing the boundaries of
repetitive elements and potentially chimeric fragments are often rejected based
on partial mismatches at the ends of alignments and excluded from the current
contig. TIGR Assembler is designed to take advantage of clone size information
coupled with sequencing from both ends of each template. It enforces the
constraint that sequence fragments from two ends of the same template point
toward one another in the contig and are located within a certain range of base
pairs (definable for each clone based on the known clone size range for a given
library).
The process resulted in 744 contigs as represented by SEQ ID NOs:1-744.
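The hash-table step described above can be illustrated with a minimal Python sketch. This is not the TIGR Assembler implementation; the function names and the toy fragments are assumptions made for illustration, and reverse-complement matches, alignment, and the match criteria are deliberately omitted.

```python
from collections import defaultdict
from itertools import combinations


def build_kmer_index(fragments, k=12):
    """Map each k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index


def candidate_overlaps(fragments, k=12):
    """Count shared k-mers for every pair of fragments that share at least one."""
    index = build_kmer_index(fragments, k)
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


# Toy fragments (hypothetical): f1 ends with the 15 bases that begin f2
fragments = {
    "f1": "ACGTACGTTAGCCGATAGGCTTAACG",
    "f2": "CCGATAGGCTTAACGGTTCAGTCAAT",
    "f3": "TTTTTTGGGGGGCCCCCCAAAAAATT",
}
pairs = candidate_overlaps(fragments)
print(pairs)
```

In a sketch of this kind, a fragment that shares k-mers with an unusually large number of other fragments would be a candidate for a repetitive element, and each candidate pair would still need a gapped alignment before a contig is extended.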
3. Identifying Genes
The predicted coding regions of the T. pallidum genome were initially defined
with the
program GeneMark, which finds ORFs using a probabilistic classification
technique. The
predicted coding region sequences were used in searches against a database of
all nucleotide
sequences from GenBank (June, 1997), using the BLASTN search method to
identify overlaps
of 50 or more nucleotides with at least a 95% identity. Those ORFs with
nucleotide sequence
matches are shown in Table 1. The ORFs without such matches were translated to
protein
sequences and compared to a non-redundant database of known proteins generated
by combining
the Swiss-Prot, PIR and GenPept databases. ORFs that matched a database
protein with
BLASTP probability less than or equal to 0.01 are shown in Table 2. The table
also lists
assigned functions based on the closest match in the databases. ORFs that did
not match protein
or nucleotide sequences in the databases at these levels are shown in Table 3.
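The sorting of ORFs into Tables 1-3 follows the thresholds stated above and can be sketched as a simple decision rule. The Python below is illustrative only; the function name and the way hits are represented are assumptions, and parsing of actual BLASTN or BLASTP output is not shown.

```python
def classify_orf(nt_hit=None, protein_probability=None):
    """Assign an ORF to Table 1, 2 or 3 using the thresholds given in the text.

    nt_hit: (overlap_length_nt, percent_identity) of the best BLASTN match,
            or None if there is no nucleotide match.
    protein_probability: BLASTP probability of the best protein match,
            or None if there is no protein match.
    """
    if nt_hit is not None:
        overlap, identity = nt_hit
        # Table 1: overlap of 50 or more nucleotides with at least 95% identity
        if overlap >= 50 and identity >= 95.0:
            return "Table 1"
    # Table 2: BLASTP probability less than or equal to 0.01
    if protein_probability is not None and protein_probability <= 0.01:
        return "Table 2"
    # Table 3: no match at these levels
    return "Table 3"


print(classify_orf(nt_hit=(120, 98.5)))
print(classify_orf(nt_hit=(40, 99.0), protein_probability=1e-20))
print(classify_orf(protein_probability=0.5))
```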
ILLUSTRATIVE APPLICATIONS
1. Production of an Antibody to a T. pallidum Protein
Substantially pure protein or polypeptide is isolated from the transfected or
transformed
cells using any one of the methods known in the art. The protein can also be
produced in a
recombinant prokaryotic expression system, such as E. coli, or can be
chemically synthesized.
Concentration of protein in the final preparation is adjusted, for example, by
concentration on an
Amicon filter device, to the level of a few micrograms/ml. Monoclonal or
polyclonal antibody to
the protein can then be prepared as follows.
2. Monoclonal Antibody Production by Hybridoma Fusion
Monoclonal antibody to epitopes of any of the peptides identified and isolated
as described can be prepared from murine hybridomas according to the classical
method of Kohler,
G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods
thereof. Briefly, a
mouse is repetitively inoculated with a few micrograms of the selected protein
over a period of a
few weeks. The mouse is then sacrificed, and the antibody producing cells of
the spleen
isolated. The spleen cells are fused by means of polyethylene glycol with
mouse myeloma cells,
and the excess unfused cells destroyed by growth of the system on selective
media comprising
aminopterin (HAT media). The successfully fused cells are diluted and aliquots
of the dilution
placed in wells of a microtiter plate where growth of the culture is
continued. Antibody-
producing clones are identified by detection of antibody in the supernatant
fluid of the wells by
immunoassay procedures, such as ELISA, as originally described by Engvall, E.,
Meth. Enzymol. 70:419 (1980), and modified methods thereof. Selected positive
clones can be expanded and their monoclonal antibody product harvested for use.
Detailed procedures for monoclonal antibody production are described in Davis,
L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2
(1989).
3. Polyclonal Antibody Production by Immunization
Polyclonal antiserum containing antibodies to heterogeneous epitopes of a
single protein can be prepared by immunizing suitable animals with the
expressed protein described above, which can be unmodified or modified to
enhance immunogenicity. Effective polyclonal antibody production is affected by
many factors related both to the antigen and the host species. For example,
small molecules tend to be less immunogenic than others and may require the use
of carriers and adjuvant. Also, host animals vary in response to site of
inoculations and dose, with both inadequate and excessive doses of antigen
resulting in low titer antisera. Small doses (ng level) of antigen administered
at multiple intradermal sites appear to be most reliable. An effective
immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J.
Clin. Endocrinol. Metab. 33:988-991 (1971).
Booster injections can be given at regular intervals, and antiserum harvested
when antibody titer thereof, as determined semi-quantitatively, for example, by
double immunodiffusion in agar against known concentrations of the antigen,
begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook
of Experimental Immunology, Wier, D., ed, Blackwell (1973). Plateau
concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum
(about 12 µM). Affinity of the antisera for the antigen is determined by
preparing competitive binding curves, as described, for example, by Fisher, D.,
Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman,
eds., Amer. Soc. For Microbiology, Washington, D. C. (1980).
Antibody preparations prepared according to either protocol are useful in
quantitative immunoassays which determine concentrations of antigen-bearing
substances in biological samples; they are also used semi-quantitatively or
qualitatively to identify the presence of antigen in a biological sample. In
addition, antibodies are useful in various animal models of pneumococcal
disease as a means of evaluating the protein used to make the antibody as a
potential vaccine target or as a means of evaluating the antibody as a
potential immunotherapeutic
or immunoprophylactic reagent.
4. Preparation of PCR Primers and Amplification of DNA
Various fragments of the T. pallidum genome, such as those of Tables 1-3 and
SEQ ID NOS:1-744, can be used, in accordance with the present invention, to
prepare PCR primers for a
variety of uses. The PCR primers are preferably at least 15 bases, and more
preferably at least
18 bases in length. When selecting a primer sequence, it is preferred that the
primer pairs have
approximately the same G/C ratio, so that melting temperatures are
approximately the same. The
PCR primers and amplified DNA of this Example find use in the Examples that
follow.
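The primer-selection preferences above (minimum length, similar G/C content, similar melting temperature) can be checked with a short Python sketch. The melting temperature estimate uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees C, which is an assumption introduced here and not a method stated in this disclosure; the primer sequences and the tolerance value are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough melting temperature (deg C) for short oligos: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primers_compatible(fwd, rev, min_len=15, max_tm_diff=4):
    """Check the length preference and that the two estimated Tm values are close."""
    if len(fwd) < min_len or len(rev) < min_len:
        return False
    return abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_diff


# Hypothetical 18-base primers used only to exercise the functions
fwd = "ATGCGTACCGGTTAAGCT"
rev = "TTAGCCGATCGGATACCA"
print(gc_fraction(fwd), wallace_tm(fwd))
print(gc_fraction(rev), wallace_tm(rev))
print(primers_compatible(fwd, rev))
```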
5. Isolation of a Selected DNA Clone From T. pallidum
Three approaches are used to isolate a T. pallidum clone comprising a
polynucleotide of the present invention from any T. pallidum genomic DNA
library. The T. pallidum strain B31PU has been deposited as a convenient source
for obtaining a T. pallidum strain, although a wide variety of T. pallidum
strains which are known in the art can be used.
T. pallidum genomic DNA is prepared using the following method. A 20 ml
overnight bacterial culture grown in a rich medium (e.g., Trypticase Soy Broth,
Brain Heart Infusion broth or Super broth) is pelleted, washed two times with
TES (30 mM Tris-pH 8.0, 25 mM EDTA, 50 mM NaCl), and resuspended in 5 ml high
salt TES (2.5 M NaCl). Lysostaphin is added to a final concentration of approx.
50 µg/ml and the mixture is rotated slowly for 1 hour at 37°C to make
protoplast cells. The solution is then placed in an incubator (or in a shaking
water bath) and warmed to 55°C. Five hundred microliters of 20% sarcosyl in TES
(final concentration 2%) is then added to lyse the cells. Next, guanidine HCl
is added to a final concentration of 7 M (3.69 g in 5.5 ml). The mixture is
swirled slowly at 55°C for 60-90 min (solution should clear). A CsCl gradient
is then set up in SW41 ultra clear tubes using 2.0 ml of 5.7 M CsCl and
overlaying with 2.85 M CsCl. The gradient is carefully overlaid with the
DNA-containing GuHCl solution. The gradient is spun at 30,000 rpm, 20°C for 24
hr and the lower DNA band is collected. The volume is increased to 5 ml with TE
buffer. The DNA is then treated with protease K (10 µg/ml) overnight at 37°C,
and precipitated with ethanol. The precipitated DNA is resuspended in a desired
buffer.
In the first method, a plasmid is directly isolated by screening a plasmid T.
pallidum genomic DNA library using a polynucleotide probe corresponding to a
polynucleotide of the present invention. Particularly, a specific
polynucleotide with 30-40 nucleotides is synthesized using an Applied
Biosystems DNA synthesizer according to the sequence reported. The
oligonucleotide is labeled, for instance, with 32P-γ-ATP using T4
polynucleotide kinase and purified according to routine methods. (See, e.g.,
Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Press, Cold Spring, NY (1982).) The library is
transformed into a suitable host, as indicated above (such as XL-1 Blue
(Stratagene)) using techniques known to those of skill in the art. See, e.g.,
Sambrook et al. MOLECULAR CLONING: A LABORATORY MANUAL (Cold Spring Harbor,
N.Y. 2nd ed. 1989); Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY
(John Wiley and Sons, N.Y. 1989). The transformants are plated on 1.5% agar
plates (containing the appropriate selection agent, e.g., ampicillin) to a
density of about 150 transformants (colonies) per plate. These plates are
screened using Nylon membranes according to routine methods for bacterial
colony screening. See, e.g., Sambrook et al. MOLECULAR CLONING: A LABORATORY
MANUAL (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel et al., CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY (John Wiley and Sons, N.Y. 1989) or other
techniques known to those of skill in the art.
Alternatively, two primers of 15-25 nucleotides derived from the 5' and 3'
ends of a polynucleotide of SEQ ID NOS:1-744 are synthesized and used to
amplify the desired DNA by PCR using a T. pallidum genomic DNA prep as a
template. PCR is carried out under routine conditions, for instance, in 25 µl
of reaction mixture with 0.5 µg of the above DNA template. A convenient
reaction mixture is 1.5-5 mM MgCl2, 0.01% (w/v) gelatin, 20 µM each of dATP,
dCTP, dGTP, dTTP, 25 pmol of each primer and 0.25 Unit of Taq polymerase.
Thirty five cycles of PCR (denaturation at 94°C for 1 min; annealing at 55°C
for 1 min; elongation at 72°C for 1 min) are performed with a Perkin-Elmer
Cetus automated thermal cycler. The amplified product is analyzed by agarose
gel electrophoresis and the DNA band with expected molecular weight is excised
and purified. The PCR product is verified to be the selected sequence by
subcloning and sequencing the DNA product.
Finally, overlapping oligos of the DNA sequences of SEQ ID NOS:1-744 can be
chemically synthesized and used to generate a nucleotide sequence of desired
length using PCR
methods known in the art.
6(a). Expression and Purification Borrelia polypeptides in E. coli
The bacterial expression vector pQE60 is used for bacterial expression of some
of the polypeptide fragments of the present invention (QIAGEN, Inc., 9259 Eton
Avenue, Chatsworth, CA, 91311). pQE60 encodes ampicillin antibiotic resistance
("Ampr") and contains a bacterial origin of replication ("ori"), an IPTG
inducible promoter, a ribosome binding site ("RBS"), six codons encoding
histidine residues that allow affinity purification using
nickel-nitrilo-tri-acetic acid ("Ni-NTA") affinity resin (QIAGEN, Inc., supra)
and suitable single restriction enzyme cleavage sites. These elements are
arranged such that an inserted DNA fragment encoding a polypeptide expresses
that polypeptide with the six His residues (i.e., a "6 X His tag") covalently
linked to the carboxyl terminus of that polypeptide.
The DNA sequence encoding the desired portion of a T. pallidum protein of the
present invention is amplified from T. pallidum genomic DNA using PCR
oligonucleotide primers
which anneal to the 5' and 3' sequences coding for the portions of the T.
pallidum polynucleotide
shown in SEQ ID NOS:1-744. Additional nucleotides containing restriction sites
to facilitate
cloning in the pQE60 vector are added to the 5' and 3' sequences,
respectively.
For cloning the mature protein, the 5' primer has a sequence containing an
appropriate restriction site followed by nucleotides of the amino terminal
coding sequence of the desired T. pallidum polynucleotide sequence in SEQ ID
NOS:1-744. One of ordinary skill in the art would appreciate that the point in
the protein coding sequence where the 5' and 3' primers begin may be varied to
amplify a DNA segment encoding any desired portion of the complete protein
shorter or longer than the mature form. The 3' primer has a sequence containing
an appropriate restriction site followed by nucleotides complementary to the 3'
end of the polypeptide coding sequence of SEQ ID NOS:1-744, excluding a stop
codon, with the coding sequence aligned with the restriction site so as to
maintain its reading frame with that of the six His codons in the pQE60 vector.
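The primer layout just described (restriction site, then coding-sequence nucleotides, with the stop codon excluded from the 3' primer) can be sketched in Python as follows. The restriction sites, the annealing length, and the example coding sequence are hypothetical placeholders; the appropriate sites depend on the cloning sites of the vector, and reading-frame alignment with the His codons must still be verified for any real construct.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(cds, site_5prime, site_3prime, anneal_len=18):
    """Return (5' primer, 3' primer) as restriction site plus annealing region.

    The stop codon, if present, is dropped so that the insert can read through
    into a downstream tag. Frame alignment with the vector is not checked here.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] in STOP_CODONS:
        cds = cds[:-3]
    forward = site_5prime + cds[:anneal_len]
    reverse = site_3prime + reverse_complement(cds[-anneal_len:])
    return forward, reverse


# Hypothetical coding sequence and placeholder sites (GGATCC, AAGCTT)
cds = "ATGAAACCCGGGTTTACGCATCATTAA"
forward, reverse = design_primers(cds, "GGATCC", "AAGCTT", anneal_len=12)
print(forward)
print(reverse)
```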
The amplified T. pallidum DNA fragment and the vector pQE60 are digested with
restriction enzymes which recognize the sites in the primers and the digested
DNAs are then ligated together. The T. pallidum DNA is inserted into the
restricted pQE60 vector in a manner which places the T. pallidum protein coding
region downstream from the IPTG-inducible promoter and in-frame with an
initiating AUG and the six histidine codons.
The ligation mixture is transformed into competent E. coli cells using
standard procedures such as those described by Sambrook et al., supra. E. coli
strain M15/rep4, containing multiple copies of the plasmid pREP4, which
expresses the lac repressor and confers kanamycin resistance ("Kanr"), is used
in carrying out the illustrative example described herein. This strain, which
is only one of many that are suitable for expressing a T. pallidum polypeptide,
is available commercially (QIAGEN, Inc., supra). Transformants are identified
by their ability to grow on LB agar plates in the presence of ampicillin and
kanamycin. Plasmid DNA is isolated from resistant colonies and the identity of
the cloned DNA confirmed by restriction analysis, PCR and DNA sequencing.
Clones containing the desired constructs are grown overnight ("O/N") in liquid
culture in LB media supplemented with both ampicillin (100 µg/ml) and kanamycin
(25 µg/ml). The O/N culture is used to inoculate a large culture, at a dilution
of approximately 1:25 to 1:250. The cells are grown to an optical density at
600 nm ("OD600") of between 0.4 and 0.6. Isopropyl-β-D-thiogalactopyranoside
("IPTG") is then added to a final concentration of 1 mM to induce transcription
from the lac repressor sensitive promoter, by inactivating the lacI repressor.
Cells subsequently are incubated further for 3 to 4 hours. Cells then are
harvested by centrifugation.
The cells are then stirred for 3-4 hours at 4°C in 6 M guanidine-HCl, pH 8.
The cell debris is removed by centrifugation, and the supernatant containing
the T. pallidum polypeptide is loaded onto a nickel-nitrilo-tri-acetic acid
("Ni-NTA") affinity resin column (QIAGEN, Inc., supra). Proteins with a 6 x His
tag bind to the Ni-NTA resin with high affinity and can be purified in a
simple one-step procedure (for details see: The QIAexpressionist, 1995,
QIAGEN, Inc., supra).
Briefly, the supernatant is loaded onto the column in 6 M guanidine-HCl, pH 8,
the column is
first washed with 10 volumes of 6 M guanidine-HCl, pH 8, then washed with 10
volumes of 6
M guanidine-HCl, pH 6, and finally the T. pallidum polypeptide is eluted with 6
M guanidine-HCl,
pH 5.
The purified protein is then renatured by dialyzing it against phosphate-
buffered saline
(PBS) or 50 mM Na-acetate, pH 6 buffer plus 200 mM NaCl. Alternatively, the
protein can be
successfully refolded while immobilized on the Ni-NTA column. The recommended
conditions
are as follows: renature using a linear 6M-1M urea gradient in 500 mM NaCl,
20% glycerol, 20
mM Tris/HCl pH 7.4, containing protease inhibitors. The renaturation should be
performed over
a period of 1.5 hours or more. After renaturation the proteins can be eluted
by the addition of
250 mM imidazole. Imidazole is removed by a final dialyzing step against PBS
or 50 mM
sodium acetate pH 6 buffer plus 200 mM NaCl. The purified protein is stored at
4°C or frozen at
-80°C.
The polypeptides of the present invention are also prepared using a
non-denaturing protein
purification method. For these polypeptides, the cell pellet from each liter
of culture is
resuspended in 25 ml of Lysis Buffer A at 4°C (Lysis Buffer A = 50 mM
Na-phosphate, 300
mM NaCl, 10 mM 2-mercaptoethanol, 10% Glycerol, pH 7.5 with 1 tablet of
Complete EDTA-free
protease inhibitor cocktail (Boehringer Mannheim #1873580) per 50 ml of
buffer).
Absorbance at 550 nm is approximately 10-20 O.D./ml. The suspension is then
put through
three freeze/thaw cycles from -70°C (using an ethanol-dry ice bath) up
to room temperature. The
cells are lysed via sonication in short 10 sec bursts over 3 minutes at
approximately 80W while
kept on ice. The sonicated sample is then centrifuged at 15,000 RPM for 30
minutes at 4°C. The
supernatant is passed through a column containing 1.0 ml of CL-4B resin to pre-
clear the sample
of any proteins that may bind to agarose non-specifically, and the flow-
through fraction is
collected.
The pre-cleared flow-through is applied to a nickel-nitrilo-tri-acetic acid
("Ni-NTA")
affinity resin column (QIAGEN, Inc., supra). Proteins with a 6 X His tag bind
to the Ni-NTA
resin with high affinity and can be purified in a simple one-step procedure.
Briefly, the
supernatant is loaded onto the column in Lysis Buffer A at 4°C, the
column is first washed with
10 volumes of Lysis Buffer A until the A280 of the eluate returns to the
baseline. Then, the
column is washed with 5 volumes of 40 mM Imidazole (92% Lysis Buffer A / 8%
Buffer B)
(Buffer B = 50 mM Na-Phosphate, 300 mM NaCl, 10% Glycerol, 10 mM
2-mercaptoethanol,
500 mM Imidazole, pH of the final buffer should be 7.5). The protein is eluted
off of the column
with a series of increasing Imidazole solutions made by adjusting the ratios
of Lysis Buffer A to
Buffer B. Three different concentrations are used: 3 volumes of 75 mM
Imidazole, 3 volumes of
150 mM Imidazole, 5 volumes of 500 mM Imidazole. The fractions containing the
purified
protein are analyzed using 8%, 10% or 14% SDS-PAGE depending on the protein
size. The
purified protein is then dialyzed 2X against phosphate-buffered saline (PBS)
in order to place it
into an easily workable buffer. The purified protein is stored at 4°C
or frozen at -80°C.
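The imidazole wash and elution steps in the non-denaturing procedure follow from mixing Lysis Buffer A with Buffer B (500 mM imidazole). The Python sketch below reproduces that arithmetic. It assumes that Lysis Buffer A contains no imidazole, consistent with its stated recipe; the function names and the 100 ml batch size are hypothetical and serve only as illustration.

```python
BUFFER_A_IMIDAZOLE_MM = 0.0    # assumption: Lysis Buffer A recipe lists no imidazole
BUFFER_B_IMIDAZOLE_MM = 500.0  # stated in the Buffer B recipe


def buffer_b_fraction(target_mM):
    """Fraction (0..1) of Buffer B needed to reach target_mM imidazole."""
    if not BUFFER_A_IMIDAZOLE_MM <= target_mM <= BUFFER_B_IMIDAZOLE_MM:
        raise ValueError("target must lie between the two buffer concentrations")
    span = BUFFER_B_IMIDAZOLE_MM - BUFFER_A_IMIDAZOLE_MM
    return (target_mM - BUFFER_A_IMIDAZOLE_MM) / span


def mix_volumes_ml(target_mM, total_ml):
    """Return (ml of Lysis Buffer A, ml of Buffer B) for a batch of total_ml."""
    fraction_b = buffer_b_fraction(target_mM)
    return total_ml * (1.0 - fraction_b), total_ml * fraction_b


if __name__ == "__main__":
    # 40 mM wash, then the three elution steps named in the text
    for target in (40.0, 75.0, 150.0, 500.0):
        vol_a, vol_b = mix_volumes_ml(target, 100.0)  # hypothetical 100 ml batch
        print(f"{target:5.0f} mM imidazole: {vol_a:5.1f} ml A + {vol_b:5.1f} ml B")
```

For the 40 mM wash this gives 92% Lysis Buffer A and 8% Buffer B, matching the ratio stated in the procedure.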
The following alternative method may be used to purify T. pallidum
polypeptides expressed in E. coli
when they are present in the form of inclusion bodies. Unless otherwise
specified, all of the
following steps are conducted at 4-10°C.
Upon completion of the production phase of the E. coli fermentation, the cell
culture is
cooled to 4-10°C and the cells are harvested by continuous
centrifugation at 15,000 rpm
(Heraeus Sepatech). On the basis of the expected yield of protein per unit
weight of cell paste
and the amount of purified protein required, an appropriate amount of cell
paste, by weight, is
suspended in a buffer solution containing 100 mM Tris, 50 mM EDTA, pH 7.4. The
cells are
dispersed to a homogeneous suspension using a high shear mixer.
The cells are then lysed by passing the solution through a microfluidizer
(Microfluidics
Corp. or APV Gaulin, Inc.) twice at 4000-6000 psi. The homogenate is then
mixed with NaCl
solution to a final concentration of 0.5 M NaCl, followed by centrifugation at
7000 x g for 15
min. The resultant pellet is washed again using 0.5 M NaCl, 100 mM Tris, 50 mM
EDTA, pH
7.4.
The resulting washed inclusion bodies are solubilized with 1.5 M guanidine
hydrochloride (GuHCl) for 2-4 hours. After 7000 x g centrifugation for 15
min., the pellet is
discarded and the T. pallidum polypeptide-containing supernatant is incubated
at 4°C overnight to
allow further GuHCl extraction.
Following high speed centrifugation (30,000 x g) to remove insoluble
particles, the
GuHCl solubilized protein is refolded by quickly mixing the GuHCl extract with
20 volumes of
buffer containing 50 mM sodium, pH 4.5, 150 mM NaCl, 2 mM EDTA by vigorous
stirring.
The refolded diluted protein solution is kept at 4°C without mixing for
12 hours prior to further
purification steps.
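The refolding step works by rapid dilution of the denaturant. As a rough sketch only, the Python snippet below estimates the residual GuHCl concentration after mixing one volume of extract with 20 volumes of refolding buffer. It assumes the extract still contains GuHCl at the 1.5 M solubilization concentration and that the refolding buffer contains none; the function name is hypothetical.

```python
def diluted_concentration(stock_conc, buffer_volumes):
    """Concentration after mixing 1 volume of stock with buffer_volumes of diluent.

    Assumes the diluent contains none of the solute and volumes are additive.
    """
    if buffer_volumes < 0:
        raise ValueError("buffer_volumes must be non-negative")
    return stock_conc / (1.0 + buffer_volumes)


if __name__ == "__main__":
    residual_m = diluted_concentration(1.5, 20)  # 1.5 M GuHCl, 20 volumes of buffer
    print(f"Residual GuHCl after dilution: {residual_m * 1000:.0f} mM")
```

Under these assumptions the denaturant falls to roughly one twenty-first of its starting concentration.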
To clarify the refolded T. pallidum polypeptide solution, a previously
prepared tangential
filtration unit equipped with 0.16 µm membrane filter with appropriate surface
area (e.g.,
Filtron), equilibrated with 40 mM sodium acetate, pH 6.0 is employed. The
filtered sample is
loaded onto a cation exchange resin (e.g., Poros HS-50, Perseptive
Biosystems). The column is
washed with 40 mM sodium acetate, pH 6.0 and eluted with 250 mM, 500 mM, 1000
mM, and
1500 mM NaCl in the same buffer, in a stepwise manner. The absorbance at 280
nm of the
effluent is continuously monitored. Fractions are collected and further
analyzed by SDS-PAGE.
Fractions containing the T. pallidum polypeptide are then pooled and mixed
with 4
volumes of water. The diluted sample is then loaded onto a previously prepared
set of tandem
columns of strong anion (Poros HQ-50, Perseptive Biosystems) and weak anion
(Poros CM-20,
Perseptive Biosystems) exchange resins. The columns are equilibrated with 40
mM sodium
acetate, pH 6.0. Both columns are washed with 40 mM sodium acetate, pH 6.0,
200 mM NaCl.
The CM-20 column is then eluted using a 10 column volume linear gradient
ranging from 0.2 M
NaCl, 50 mM sodium acetate, pH 6.0 to 1.0 M NaCl, 50 mM sodium acetate, pH
6.5. Fractions
are collected under constant A280 monitoring of the effluent. Fractions
containing the T. pallidum
polypeptide (determined, for instance, by 16% SDS-PAGE) are then pooled.
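The linear gradient used to elute the CM-20 column can be sketched as a simple interpolation. The Python snippet below estimates the nominal NaCl concentration entering the column after a given number of column volumes. It models only the salt component (the stated gradient also shifts pH from 6.0 to 6.5), ignores system dead volume, and uses a hypothetical function name.

```python
def gradient_nacl_molar(cv_elapsed, start_m=0.2, end_m=1.0, gradient_cv=10.0):
    """Nominal NaCl molarity at cv_elapsed column volumes into a linear gradient.

    Defaults follow the text: 0.2 M to 1.0 M NaCl over 10 column volumes.
    """
    if not 0.0 <= cv_elapsed <= gradient_cv:
        raise ValueError("cv_elapsed must lie within the gradient")
    return start_m + (end_m - start_m) * cv_elapsed / gradient_cv


if __name__ == "__main__":
    for cv in (0, 2.5, 5, 7.5, 10):
        print(f"{cv:>4} CV: {gradient_nacl_molar(cv):.2f} M NaCl")
```

A fraction collected at a known column volume can then be related to the approximate salt concentration at which the polypeptide eluted.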
The resultant T. pallidum polypeptide exhibits greater than 95% purity after
the above
refolding and purification steps. No major contaminant bands are observed from
Coomassie
blue stained 16% SDS-PAGE gel when 5 µg of purified protein is loaded. The
purified protein
is also tested for endotoxin/LPS contamination, and typically the LPS content
is less than 0.1
ng/ml according to LAL assays.
6(b). Alternative Expression and Purification of T. pallidum polypeptides in E.
coli
The vector pQE10 is alternatively used to clone and express some of the
polypeptides of
the present invention for use in the soft tissue and systemic infection models
discussed below.
The difference is that an inserted DNA fragment encoding a polypeptide
expresses that
polypeptide with the six His residues (i.e., a "6 X His tag") covalently
linked to the amino
terminus of that polypeptide. The bacterial expression vector pQE10 (QIAGEN,
Inc., 9259 Eton
Avenue, Chatsworth, CA, 91311) was used in this example. The components of
the pQE10
plasmid are arranged such that the inserted DNA sequence encoding a
polypeptide of the present
invention expresses the polypeptide with the six His residues (i.e., a "6 X His
tag") covalently
linked to the amino terminus.
The DNA sequences encoding the desired portions of a polypeptide of SEQ ID
NOS:1-
744 were amplified using PCR oligonucleotide primers from genomic T. pallidum
DNA. The
PCR primers anneal to the nucleotide sequences encoding the desired amino acid
sequence of a
polypeptide of the present invention. Additional nucleotides containing
restriction sites to
facilitate cloning in the pQE10 vector were added to the 5' and 3' primer
sequences, respectively.
For cloning a polypeptide of the present invention, the 5' and 3' primers were
selected to
amplify their respective nucleotide coding sequences. One of ordinary skill in
the art would
appreciate that the point in the protein coding sequence where the 5' and 3'
primers begin may
be varied to amplify a DNA segment encoding any desired portion of a
polypeptide of the present
invention. The 5' primer was designed so the coding sequence of the 6 X His
tag is aligned with
the restriction site so as to maintain its reading frame with that of the T.
pallidum polypeptide. The
3' primer was designed to include a stop codon. The amplified DNA fragment was then
cloned, and
the protein expressed, as described above for the pQE60 plasmid.
The DNA sequences encoding the amino acid sequences of SEQ ID NOS:1-744 may
also
be cloned and expressed as fusion proteins by a protocol similar to that
described directly above,
wherein the pET-32b(+) vector (Novagen, 601 Science Drive, Madison, WI 53711)
is
preferentially used in place of pQE10.
The above methods are not limited to the polypeptide fragments actually
produced. The
above methods, like the methods below, can be used to produce either full
length polypeptides or
desired fragments thereof.
6(c). Alternative Expression and Purification of T. pallidum polypeptides in
E. coli
The bacterial expression vector pQE60 is used for bacterial expression in this
example
(QIAGEN, Inc., 9259 Eton Avenue, Chatsworth, CA, 91311). However, in this
example, the
polypeptide coding sequence is inserted such that translation of the six His
codons is prevented
and, therefore, the polypeptide is produced with no 6 X His tag.
The DNA sequence encoding the desired portion of the T. pallidum amino acid
sequence
is amplified from a T. pallidum genomic DNA prep or the deposited DNA clones
using PCR
oligonucleotide primers which anneal to the 5' and 3' nucleotide sequences
corresponding to the
desired portion of the T. pallidum polypeptides. Additional nucleotides
containing restriction
sites to facilitate cloning in the pQE60 vector are added to the 5' and 3'
primer sequences.
For cloning a T. pallidum polypeptide of the present invention, 5' and 3'
primers are
selected to amplify their respective nucleotide coding sequences. One of
ordinary skill in the art
would appreciate that the point in the protein coding sequence where the 5'
and 3' primers begin
may be varied to amplify a DNA segment encoding any desired portion of a
polypeptide of the
present invention. The 3' and 5' primers contain appropriate restriction sites
followed by
nucleotides complementary to the 5' and 3' ends of the coding sequence
respectively. The 3'
primer is additionally designed to include an in-frame stop codon.
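The primer layout described above (a restriction site followed by nucleotides matching an end of the coding sequence, with an in-frame stop codon in the 3' primer) can be sketched in Python as follows. This is an illustration under stated assumptions, not the primers actually used: the restriction sites (BamHI GGATCC and HindIII AAGCTT), the two-nucleotide 5' clamp, the annealing length, the TAA stop codon, and the example coding sequence are all hypothetical, and the reading-frame requirements of a particular vector are not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def forward_primer(coding_seq, site, anneal_len=18, clamp="GC"):
    """5' primer: clamp + restriction site + start of the coding sequence."""
    return clamp + site + coding_seq[:anneal_len].upper()


def reverse_primer(coding_seq, site, anneal_len=18, stop="TAA", clamp="GC"):
    """3' primer: clamp + restriction site + reverse complement of
    (end of the coding sequence + in-frame stop codon)."""
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding segment must be a whole number of codons")
    tail = coding_seq[-anneal_len:].upper() + stop
    return clamp + site + reverse_complement(tail)


if __name__ == "__main__":
    coding = "ATGAAACCCGGGTTTACGCATGCA"  # hypothetical 24-nt coding segment
    print("5' primer:", forward_primer(coding, "GGATCC", anneal_len=12))
    print("3' primer:", reverse_primer(coding, "AAGCTT", anneal_len=12))
```

Because the stop codon is appended directly to a segment that is a whole number of codons, it remains in frame with the amplified coding sequence; moving the start or end point of the segment selects a different portion of the polypeptide.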
The amplified T. pallidum DNA fragments and the vector pQE60 are digested with
restriction enzymes recognizing the sites in the primers and the digested DNAs
are then ligated
together. Insertion of the T. pallidum DNA into the restricted pQE60 vector
places the T.
pallidum protein coding region including its associated stop codon downstream
from the IPTG-
inducible promoter and in-frame with an initiating AUG. The associated stop
codon prevents
translation of the six histidine codons downstream of the insertion point.
The ligation mixture is transformed into competent E. coli cells using
standard procedures
such as those described by Sambrook et al. E. coli strain M15/rep4, containing
multiple copies
of the plasmid pREP4, which expresses the lac repressor and confers kanamycin
resistance
("Kanr"), is used in carrying out the illustrative example described herein.
This strain, which is
only one of many that are suitable for expressing T. pallidum polypeptide, is
available
commercially (QIAGEN, Inc., supra). Transformants are identified by their
ability to grow on
LB plates in the presence of ampicillin and kanamycin. Plasmid DNA is isolated
from resistant
colonies and the identity of the cloned DNA confirmed by restriction analysis,
PCR and DNA
sequencing.
Clones containing the desired constructs are grown overnight ("O/N") in liquid
culture in
LB media supplemented with both ampicillin (100 µg/ml) and kanamycin (25
µg/ml). The O/N
culture is used to inoculate a large culture, at a dilution of approximately
1:25 to 1:250. The cells
are grown to an optical density at 600 nm ("OD600") of between 0.4 and 0.6.
Isopropyl-β-D-thiogalactopyranoside
("IPTG") is then added to a final concentration of 1 mM
to induce
transcription from the lac repressor sensitive promoter, by inactivating the
lacI repressor. Cells
subsequently are incubated further for 3 to 4 hours. Cells then are
harvested by centrifugation.
To purify the T. pallidum polypeptide, the cells are then stirred for 3-4
hours at 4°C in
6 M guanidine-HCl, pH 8. The cell debris is removed by centrifugation, and the
supernatant
containing the T. pallidum polypeptide is dialyzed against 50 mM Na-acetate
buffer pH 6,
supplemented with 200 mM NaCl. Alternatively, the protein can be successfully
refolded by
dialyzing it against 500 mM NaCl, 20% glycerol, 25 mM Tris/HCl pH 7.4,
containing protease
inhibitors. After renaturation the protein can be purified by ion exchange,
hydrophobic
interaction and size exclusion chromatography. Alternatively, an affinity
chromatography step
such as an antibody column can be used to obtain pure T. pallidum polypeptide.
The purified
protein is stored at 4° C or frozen at -80° C.
The following alternative method may be used to purify T. pallidum
polypeptides
expressed in E. coli when they are present in the form of inclusion bodies. Unless
otherwise
specified, all of the following steps are conducted at 4-10°C.
Upon completion of the production phase of the E. coli fermentation, the cell
culture is
cooled to 4-10°C and the cells are harvested by continuous
centrifugation at 15,000 rpm
(Heraeus Sepatech). On the basis of the expected yield of protein per unit
weight of cell paste
and the amount of purified protein required, an appropriate amount of cell
paste, by weight, is
suspended in a buffer solution containing 100 mM Tris, 50 mM EDTA, pH 7.4. The
cells are
dispersed to a homogeneous suspension using a high shear mixer.
The cells are then lysed by passing the solution through a microfluidizer
(Microfluidics
Corp. or APV Gaulin, Inc.) twice at 4000-6000 psi. The homogenate is then
mixed with NaCl
solution to a final concentration of 0.5 M NaCl, followed by centrifugation at
7000 x g for 15
min. The resultant pellet is washed again using 0.5 M NaCl, 100 mM Tris, 50 mM
EDTA, pH
7.4.
The resulting washed inclusion bodies are solubilized with 1.5 M guanidine
hydrochloride (GuHCl) for 2-4 hours. After 7000 x g centrifugation for 15
min., the pellet is
discarded and the T. pallidum polypeptide-containing supernatant is incubated
at 4°C overnight to
allow further GuHCl extraction.
Following high speed centrifugation (30,000 x g) to remove insoluble
particles, the
GuHCl solubilized protein is refolded by quickly mixing the GuHCl extract with
20 volumes of
buffer containing 50 mM sodium, pH 4.5, 150 mM NaCl, 2 mM EDTA by vigorous
stirring.
The refolded diluted protein solution is kept at 4°C without mixing for
12 hours prior to further
purification steps.
To clarify the refolded T. pallidum polypeptide solution, a previously
prepared tangential
filtration unit equipped with 0.16 µm membrane filter with appropriate
surface area (e.g.,
Filtron), equilibrated with 40 mM sodium acetate, pH 6.0 is employed. The
filtered sample is
loaded onto a cation exchange resin (e.g., Poros HS-50, Perseptive
Biosystems). The column is
washed with 40 mM sodium acetate, pH 6.0 and eluted with 250 mM, 500 mM, 1000
mM, and
1500 mM NaCl in the same buffer, in a stepwise manner. The absorbance at 280
nm of the
effluent is continuously monitored. Fractions are collected and further
analyzed by SDS-PAGE.
Fractions containing the T. pallidum polypeptide are then pooled and mixed
with 4
volumes of water. The diluted sample is then loaded onto a previously prepared
set of tandem
columns of strong anion (Poros HQ-50, Perseptive Biosystems) and weak anion
(Poros CM-20,
Perseptive Biosystems) exchange resins. The columns are equilibrated with 40
mM sodium
acetate, pH 6.0. Both columns are washed with 40 mM sodium acetate, pH 6.0,
200 mM NaCl.
The CM-20 column is then eluted using a 10 column volume linear gradient
ranging from 0.2 M
NaCl, 50 mM sodium acetate, pH 6.0 to 1.0 M NaCl, 50 mM sodium acetate, pH
6.5. Fractions
are collected under constant A280 monitoring of the effluent. Fractions
containing the T. pallidum
polypeptide (determined, for instance, by 16% SDS-PAGE) are then pooled.
The resultant T. pallidum polypeptide exhibits greater than 95% purity after
the above
refolding and purification steps. No major contaminant bands are observed from
Coomassie
blue stained 16% SDS-PAGE gel when 5 µg of purified protein is loaded. The
purified protein
is also tested for endotoxin/LPS contamination, and typically the LPS content
is less than 0.1
ng/ml according to LAL assays.
6(d). Cloning and Expression of T. pallidum in Other Bacteria
T. pallidum polypeptides can also be produced in: T. pallidum using the
methods of S.
Skinner et al., (1988) Mol. Microbiol. 2:289-297 or J. I. Moreno (1996)
Protein Expr. Purif.
8(3):332-340; Lactobacillus using the methods of C. Rush et al., 1997 Appl.
Microbiol.
Biotechnol. 47(5):537-542; or in Bacillus subtilis using the methods of Chang et
al., U.S. Patent
No. 4,952,508.
7. Cloning and Expression in COS Cells
A T. pallidum expression plasmid is made by cloning a portion of the DNA
encoding a T.
pallidum polypeptide into the expression vector pDNAI/Amp or pDNAIII (which
can be
obtained from Invitrogen, Inc.). The expression vector pDNAI/Amp contains: (1)
an E. coli
origin of replication effective for propagation in E. coli and other
prokaryotic cells; (2) an
ampicillin resistance gene for selection of plasmid-containing prokaryotic
cells; (3) an SV40
origin of replication for propagation in eukaryotic cells; (4) a CMV promoter,
a polylinker, an
SV40 intron; (5) several codons encoding a hemagglutinin fragment (i.e., an
"HA" tag to
facilitate purification) followed by a termination codon and polyadenylation
signal arranged so
that a DNA can be conveniently placed under expression control of the CMV
promoter and
operably linked to the SV40 intron and the polyadenylation signal by means of
restriction sites in
the polylinker. The HA tag corresponds to an epitope derived from the
influenza hemagglutinin
protein described by Wilson et al. 1984 Cell 37:767. The fusion of the HA tag
to the target
protein allows easy detection and recovery of the recombinant protein with an
antibody that
recognizes the HA epitope. pDNAIII contains, in addition, the selectable
neomycin marker.
A DNA fragment encoding a T. pallidum polypeptide is cloned into the
polylinker region
of the vector so that recombinant protein expression is directed by the CMV
promoter. The
plasmid construction strategy is as follows. The DNA from a T. pallidum
genomic DNA prep is
amplified using primers that contain convenient restriction sites, much as
described above for
construction of vectors for expression of T. pallidum in E. coli. The 5'
primer contains a Kozak
sequence, an AUG start codon, and nucleotides of the 5' coding region of the
T. pallidum
polypeptide. The 3' primer contains nucleotides complementary to the 3'
coding sequence of the
T. pallidum DNA, a stop codon, and a convenient restriction site.
The PCR amplified DNA fragment and the vector, pDNAI/Amp, are digested with
appropriate restriction enzymes and then ligated. The ligation mixture is
transformed into an
appropriate E. coli strain such as SURE™ (Stratagene Cloning Systems, La
Jolla, CA 92037),
and the transformed culture is plated on ampicillin media plates which then
are incubated to allow
growth of ampicillin resistant colonies. Plasmid DNA is isolated from
resistant colonies and
examined by restriction analysis or other means for the presence of the
fragment encoding the T.
pallidum polypeptide.
For expression of a recombinant T. pallidum polypeptide, COS cells are
transfected with
an expression vector, as described above, using DEAE-dextran, as described,
for instance, by
Sambrook et al. (supra). Cells are incubated under conditions for expression
of T. pallidum by
the vector.
Expression of the T. pallidum-HA fusion protein is detected by radiolabeling
and
immunoprecipitation, using methods described in, for example, Harlow et al.,
supra. To this
end, two days after transfection, the cells are labeled by incubation in media
containing 35S-
cysteine for 8 hours. The cells and the media are collected, and the cells are
washed and then
lysed with detergent-containing RIPA buffer: 150 mM NaCl, 1% NP-40, 0.1%
SDS,
0.5% DOC, 50 mM TRIS, pH 7.5, as described by Wilson et al. (supra).
Proteins are
precipitated from the cell lysate and from the culture media using an HA-
specific monoclonal
antibody. The precipitated proteins then are analyzed by SDS-PAGE and
autoradiography. An
expression product of the expected size is seen in the cell lysate, which is
not seen in negative
controls.
8. Cloning and Expression in CHO Cells
The vector pC4 is used for the expression of T. pallidum polypeptide in this
example.
Plasmid pC4 is a derivative of the plasmid pSV2-dhfr (ATCC Accession No.
37146). The
plasmid contains the mouse DHFR gene under control of the SV40 early promoter.
Chinese
hamster ovary cells or other cells lacking dihydrofolate reductase activity that are
transfected with these
plasmids can be selected by growing the cells in a selective medium (alpha
minus MEM, Life
Technologies) supplemented with the chemotherapeutic agent methotrexate. The
amplification of
the DHFR genes in cells resistant to methotrexate (MTX) has been well
documented. See, e.g.,
Alt et al., 1978, J. Biol. Chem. 253:1357-1370; Hamlin et al., 1990, Biochem.
et Biophys.
Acta, 1097:107-143; Page et al., 1991, Biotechnology 9:64-68. Cells grown in
increasing
concentrations of MTX develop resistance to the drug by overproducing the
target enzyme,
DHFR, as a result of amplification of the DHFR gene. If a second gene is
linked to the DHFR
gene, it is usually co-amplified and over-expressed. It is known in the art
that this approach may
be used to develop cell lines carrying more than 1,000 copies of the amplified
gene(s).
Subsequently, when the methotrexate is withdrawn, cell lines are obtained
which contain the
amplified gene integrated into one or more chromosome(s) of the host cell.
Plasmid pC4 contains the strong promoter of the long terminal repeat (LTR) of
the Rous
Sarcoma Virus, for expressing a polypeptide of interest, Cullen, et al.
(1985) Mol. Cell. Biol.
5:438-447; plus a fragment isolated from the enhancer of the immediate early
gene of human
cytomegalovirus (CMV), Boshart, et al., 1985, Cell 41:521-530. Downstream of
the promoter
are the following single restriction enzyme cleavage sites that allow the
integration of the genes:
Bam HI, Xba I, and Asp 718. Behind these cloning sites the plasmid contains
the 3' intron and
polyadenylation site of the rat preproinsulin gene. Other high efficiency
promoters can also be
used for the expression, e.g., the human β-actin promoter, the SV40 early or
late promoters or
the long terminal repeats from other retroviruses, e.g., HIV and HTLVI.
Clontech's Tet-Off and
Tet-On gene expression systems and similar systems can be used to express the
T. pallidum
polypeptide in a regulated way in mammalian cells (Gossen et al., 1992, Proc.
Natl. Acad. Sci.
USA 89:5547-5551). For the polyadenylation of the mRNA other signals, e.g.,
from the human
growth hormone or globin genes can be used as well. Stable cell lines carrying
a gene of interest
integrated into the chromosomes can also be selected upon co-transfection with
a selectable
marker such as gpt, G418 or hygromycin. It is advantageous to use more than
one selectable
marker in the beginning, e.g., G418 plus methotrexate.
The plasmid pC4 is digested with the restriction enzymes and then
dephosphorylated
using calf intestinal phosphatase by procedures known in the art. The vector is
then isolated from
a 1% agarose gel. The DNA sequence encoding the T. pallidum polypeptide is
amplified using
PCR oligonucleotide primers corresponding to the 5' and 3' sequences of the
desired portion of
the gene. A 5' primer containing a restriction site, a Kozak sequence, an AUG
start codon, and
nucleotides of the 5' coding region of the T. pallidum polypeptide is
synthesized and used. A 3'
primer, containing a restriction site, stop codon, and nucleotides
complementary to the 3' coding
sequence of the T. pallidum polypeptide is synthesized and used. The
amplified fragment is
digested with the restriction endonucleases and then purified again on a 1%
agarose gel. The
isolated fragment and the dephosphorylated vector are then ligated with T4
DNA ligase. E. coli
HB101 or XL-1 Blue cells are then transformed and bacteria are identified
that contain the
fragment inserted into plasmid pC4 using, for instance, restriction enzyme
analysis.
Chinese hamster ovary cells lacking an active DHFR gene are used for
transfection. Five
µg of the expression plasmid pC4 is cotransfected with 0.5 µg of the plasmid
pSVneo using a
lipid-mediated transfection agent such as Lipofectin™ or LipofectAMINE™
(Life Technologies,
Gaithersburg, MD). The plasmid pSV2-neo contains a dominant selectable marker,
the neo gene
from Tn5 encoding an enzyme that confers resistance to a group of antibiotics
including G418.
The cells are seeded in alpha minus MEM supplemented with 1 mg/ml G418. After
2 days, the
cells are trypsinized and seeded in hybridoma cloning plates (Greiner,
Germany) in alpha minus
MEM supplemented with 10, 25, or 50 ng/ml of methotrexate plus 1 mg/ml G418.
After about
10-14 days single clones are trypsinized and then seeded in 6-well petri
dishes or 10 ml flasks
using different concentrations of methotrexate (50 nM, 100 nM, 200 nM, 400 nM,
800 nM).
Clones growing at the highest concentrations of methotrexate are then
transferred to new 6-well
plates containing even higher concentrations of methotrexate (1 µM, 2 µM, 5
µM, 10 mM, 20
mM). The same procedure is repeated until clones are obtained which grow at a
concentration of
100-200 µM. Expression of the desired gene product is analyzed, for instance,
by SDS-PAGE
and Western blot or by reversed phase HPLC analysis.
The disclosures of all publications (including patents, patent applications,
journal articles,
laboratory manuals, books, or other documents) cited herein are hereby
incorporated by reference
in their entireties. SEQ ID NOS: 1-744 are hereby incorporated into the
specification by reference.
The present invention is not to be limited in scope by the specific
embodiments described
herein, which are intended as single illustrations of individual aspects of
the invention.
Functionally equivalent methods and components are within the scope of the
invention, in
addition to those shown and described herein and will become apparent to those
skilled in the art
from the foregoing description and accompanying drawings. Such modifications
are intended to
fall within the scope of the appended claims.

JUMBO APPLICATIONS/PATENTS
THIS SECTION OF THE APPLICATION/PATENT CONTAINS MORE
THAN ONE VOLUME.
NOTE: For additional volumes, please contact the Canadian Patent Office.

Administrative Status

2024-08-01:As part of the Next Generation Patents (NGP) transition, the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new back-office solution.

Please note that "Inactive:" events refers to events no longer in use in our new back-office solution.

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer , as well as the definitions for Patent , Event History , Maintenance Fee  and Payment History  should be consulted.

Event History

Description Date
Inactive: IPC expired 2018-01-01
Inactive: IPC deactivated 2011-07-29
Inactive: IPC from MCD 2006-03-12
Inactive: IPC from MCD 2006-03-12
Inactive: IPC from MCD 2006-03-12
Time Limit for Reversal Expired 2005-06-23
Application Not Reinstated by Deadline 2005-06-23
Deemed Abandoned - Failure to Respond to Maintenance Fee Notice 2004-06-23
Letter Sent 2003-07-17
Request for Examination Received 2003-06-12
Request for Examination Requirements Determined Compliant 2003-06-12
All Requirements for Examination Determined Compliant 2003-06-12
Inactive: Correspondence - Formalities 2000-06-19
Letter Sent 2000-05-24
Inactive: Single transfer 2000-04-12
Inactive: Cover page published 2000-03-21
Inactive: IPC assigned 2000-03-17
Inactive: IPC assigned 2000-03-17
Inactive: IPC assigned 2000-03-17
Inactive: IPC assigned 2000-03-17
Inactive: First IPC assigned 2000-03-17
Inactive: IPC assigned 2000-03-17
Inactive: IPC assigned 2000-03-17
Inactive: Incomplete PCT application letter 2000-03-07
Inactive: Notice - National entry - No RFE 2000-02-24
Application Received - PCT 2000-02-23
Application Published (Open to Public Inspection) 1998-12-30

Abandonment History

Abandonment Date Reason Reinstatement Date
2004-06-23

Maintenance Fee

The last payment was received on 2003-06-09

Note : If the full payment has not been received on or before the date indicated, a further fee may be required which may be one of the following

  • the reinstatement fee;
  • the late payment fee; or
  • additional fee to reverse deemed expiry.

Patent fees are adjusted on the 1st of January every year. The amounts above are the current amounts if received by December 31 of the current year.
Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Fee History

Fee Type Anniversary Year Due Date Paid Date
Basic national fee - standard 1999-12-21
Registration of a document 2000-04-12
MF (application, 2nd anniv.) - standard 02 2000-06-23 2000-06-07
MF (application, 3rd anniv.) - standard 03 2001-06-26 2001-06-11
MF (application, 4th anniv.) - standard 04 2002-06-24 2002-06-03
MF (application, 5th anniv.) - standard 05 2003-06-23 2003-06-09
Request for examination - standard 2003-06-12
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
HUMAN GENOME SCIENCES, INC.
Past Owners on Record
CLAIRE M. FRASER
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



Document Description  Date (yyyy-mm-dd)  Number of pages  Size of Image (KB)
Representative drawing 2000-03-20 1 6
Description 1999-12-20 302 15,517
Description 1999-12-20 245 10,163
Description 1999-12-20 302 16,448
Description 1999-12-20 238 13,933
Description 1999-12-20 66 4,899
Claims 1999-12-20 3 142
Abstract 1999-12-20 1 55
Drawings 1999-12-20 2 36
Reminder of maintenance fee due 2000-02-23 1 113
Notice of National Entry 2000-02-23 1 195
Courtesy - Certificate of registration (related document(s)) 2000-05-23 1 113
Reminder - Request for Examination 2003-02-24 1 120
Acknowledgement of Request for Examination 2003-07-16 1 173
Courtesy - Abandonment Letter (Maintenance Fee) 2004-08-17 1 175
Correspondence 2000-02-28 2 22
PCT 1999-12-20 6 205
Correspondence 2000-06-18 1 32
