Patent 2887830 Summary

(12) Patent: (11) CA 2887830
(54) English Title: GENETIC POLYMORPHISMS ASSOCIATED WITH LIVER FIBROSIS METHODS OF DETECTION AND USES THEREOF
(54) French Title: POLYMORPHISMES GENETIQUES ASSOCIES A DES TECHNIQUES DE DETECTION DE CIRRHOSE DU FOIE ET UTILISATION DE CES POLYMORPHISMES
Status: Granted
Bibliographic Data
(51) International Patent Classification (IPC):
  • C12Q 1/6883 (2018.01)
  • G16B 20/00 (2019.01)
  • C07H 21/00 (2006.01)
  • C07K 16/18 (2006.01)
  • C12N 15/12 (2006.01)
  • C12N 15/55 (2006.01)
  • C12Q 1/68 (2018.01)
  • C40B 30/04 (2006.01)
(72) Inventors :
  • CARGILL, MICHELE (United States of America)
  • HUANG, HONGJIN (United States of America)
(73) Owners :
  • CELERA CORPORATION (United States of America)
(71) Applicants :
  • CELERA CORPORATION (United States of America)
(74) Agent: SMART & BIGGAR LP
(74) Associate agent:
(45) Issued: 2017-06-20
(22) Filed Date: 2005-05-09
(41) Open to Public Inspection: 2005-11-24
Examination requested: 2015-05-15
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
60/568,846 United States of America 2004-05-07
60/582,609 United States of America 2004-06-25
60/599,554 United States of America 2004-08-09

Abstracts

English Abstract

The present invention is based on the discovery of genetic polymorphisms that are associated with liver fibrosis and related pathologies. In particular, the present invention relates to nucleic acid molecules containing the polymorphisms, variant proteins encoded by such nucleic acid molecules, reagents for detecting the polymorphic nucleic acid molecules and proteins, and methods of using the nucleic acid and proteins as well as methods of using reagents for their detection.


French Abstract

La présente invention concerne la découverte que des polymorphismes génétiques sont associés à la cirrhose et à des pathologies associées. Cette invention concerne en particulier des molécules d'acides nucléiques contenant ces polymorphismes, des protéines de variant codées par ces molécules d'acides nucléiques, des réactifs permettant de détecter ces molécules et ces protéines d'acides nucléiques polymorphiques, des techniques d'utilisation de ces acides nucléiques et de ces protéines ainsi que des techniques d'utilisation de réactifs qui permettent leur détection.

Claims

Note: Claims are shown in the official language in which they were submitted.



What Is Claimed Is:

1. A method for determining whether a human has an increased risk for developing liver fibrosis, comprising testing nucleic acid from said human to determine the presence or absence of a polymorphism in gene MICROSOMAL TRIGLYCERIDE TRANSFER PROTEIN (MTP) at position 101 of the nucleotide sequence defined by SEQ ID NO:55 or its complement, wherein the presence of C at position 101 of SEQ ID NO:55 or G at position 101 of its complement with said human has said increased risk for developing liver fibrosis.
2. The method of claim 1, wherein said nucleic acid is a nucleic acid extract from a biological sample from said human.
3. The method of claim 2, wherein said biological sample is blood, saliva, or buccal cells.
4. The method of claim 2 or 3, further comprising preparing said nucleic acid extract from said biological sample prior to said testing.
5. The method of any one of claims 1 to 4, wherein said testing comprises nucleic acid amplification.
6. The method of claim 5, wherein said nucleic acid amplification is carried out by polymerase chain reaction.
7. The method of any one of claims 1 to 6, wherein said testing is performed using sequencing, 5' nuclease digestion, molecular beacon assay, oligonucleotide ligation assay, size analysis, single-stranded conformation polymorphism analysis, or denaturing gradient gel electrophoresis (DGGE).
8. The method of any one of claims 1 to 7, wherein said testing is performed using an allele-specific method.
9. The method of claim 8, wherein said allele-specific method is allele-specific probe hybridization, allele-specific primer extension, or allele-specific amplification.
10. The method of claim 8 or 9, wherein said testing is carried out using an allele-specific primer that comprises a sequence selected from the group consisting of SEQ ID NOS: 71, 72, and sequences fully complementary thereto.
11. The method of any one of claims 1 to 10, wherein said human is homozygous for said C or said G.
12. The method of any one of claims 1 to 10, wherein said human is heterozygous for said C or said G.
13. The method of any one of claims 1 to 12, which is an automated method.
14. The method of any one of claims 1 to 13, wherein the human is a hepatitis C virus-infected human.
15. An allele-specific polynucleotide for use in a method as defined in any one of claims 1 to 14, wherein said polynucleotide is specific for a polymorphism comprising C at position 101 of SEQ ID NO:55 or G at position 101 of its complement, and wherein said allele-specific polynucleotide is at least 16 nucleotides in length.
16. The allele-specific polynucleotide of claim 15, wherein said polynucleotide is detectably labeled.
17. The allele-specific polynucleotide of claim 16, wherein said polynucleotide is labeled with a fluorescent dye.
18. A kit for use in a method as defined in any one of claims 1 to 14, wherein said kit comprises at least one polynucleotide as defined in any one of claims 15 to 17 and at least one further component, wherein the at least one further component is a buffer, deoxynucleotide triphosphates (dNTPs), an amplification primer pair, an enzyme, or any combination thereof.
19. The kit of claim 18, wherein said enzyme is a polymerase or a ligase.

Description

Note: Descriptions are shown in the official language in which they were submitted.


JUMBO APPLICATIONS / PATENTS
THIS SECTION OF THE APPLICATION / PATENT CONTAINS MORE
THAN ONE VOLUME.
THIS IS VOLUME 1 OF 2
NOTE: For additional volumes please contact the Canadian Patent Office.

CA 02887830 2015-08-28
CA 2887830
GENETIC POLYMORPHISMS ASSOCIATED WITH LIVER FIBROSIS METHODS
OF DETECTION AND USES THEREOF
FIELD OF THE INVENTION
The present invention is in the field of fibrosis diagnosis and therapy and in
particular
liver fibrosis diagnosis and therapy, and more particularly, liver fibrosis
associated with
hepatitis C virus (HCV) infection. More specifically, the present invention
relates to specific
single nucleotide polymorphisms (SNPs) in the human genome, and their
association with liver
fibrosis and related pathologies. Based on differences in allele frequencies
in the patient
population with advanced or bridging fibrosis/cirrhosis relative to
individuals with no or
minimal fibrosis, the naturally-occurring SNPs disclosed herein can be used as
targets for the
design of diagnostic reagents and the development of therapeutic agents, as
well as for disease
association and linkage analysis. In particular, the SNPs of the present
invention are useful for
identifying an individual who is at an increased or decreased risk of
developing liver fibrosis
and for early detection of the disease, for providing clinically important
information for the
prevention and/or treatment of liver fibrosis, and for screening and selecting
therapeutic agents.
The SNPs disclosed herein may also be useful for human identification
applications. Methods,
assays, kits, and reagents for detecting the presence of these polymorphisms
and their encoded
products are provided.
BACKGROUND OF THE INVENTION
Fibrosis
Fibrosis is a quantitative and qualitative change in the extracellular matrix
that
surrounds cells as a response to tissue injury. The trauma that generates
fibrosis is varied and
includes radiological trauma (e.g., x-ray, gamma ray, etc.), chemical trauma
(e.g., radicals, ethanol, phenols, etc.), viral infection and physical trauma. Fibrosis
encompasses pathological
conditions in a variety of tissues such as pulmonary fibrosis, retroperitoneal
fibrosis, epidural
fibrosis, congenital fibrosis, focal fibrosis, muscle fibrosis, massive
fibrosis, radiation fibrosis
(e.g. radiation induced lung fibrosis), liver fibrosis and cardiac fibrosis.
CA 02887830 2014-11-27
WO 2005/111241 PCT/US2005/016051
Liver Fibrosis in HCV-Infected Subjects
HCV affects about 4 million people in the United States and more than
170 million people worldwide. Approximately 85% of the infected individuals
develop chronic hepatitis, and up to 20% progress to bridging
fibrosis/cirrhosis, which is end-stage severe liver fibrosis and is generally
irreversible (Lauer et al., 2001, N Engl J Med 345: 41-52). HCV infection is
the major cause of cirrhosis and hepatocellular carcinoma (HCC), and
accounts for one third of liver transplantations. The interval between
infection and the development of cirrhosis may exceed 30 years but varies
widely among individuals. Based on fibrosis progression rate, chronic HCV
patients can be roughly divided into three groups (Poynard et al., 1997, Lancet
349: 825-832): rapid, median, and slow fibrosers.
Previous studies have indicated that host factors may play a role in the
progression of fibrosis, and these include age at infection, duration of
infection, alcohol consumption, and gender. However, these host factors
account for only 17%-29% of the variability in fibrosis progression (Poynard
et al., 1997, Lancet 349: 825-832; Wright et al., Gut. 2003, 52(4):574-9). Viral
load or viral genotype has not shown significant correlation with fibrosis
progression (Poynard et al., 1997, Lancet 349: 825-832). Thus, other factors,
such as host genetic factors, are likely to play an important role in
determining the rate of fibrosis progression.
Recent studies suggest that some genetic polymorphisms influence the
progression of fibrosis in patients with HCV infection (Powell et al.,
Hepatology 31(4):828-33, 2000), autoimmune chronic cholestasis (Tanaka et
al., J. Infec. Dis. 187:1822-5, 2003), alcohol induced liver diseases (Yamauchi
et al., J. Hepatology 23(5):519-23, 1995), and nonalcoholic fatty liver
diseases (Bernard et al., Diabetologia 2000, 43(8):995-9). However, none of these
genetic polymorphisms have been integrated into clinical practice for various
reasons (Bataller et al., Hepatology. 2003, 37(3):493-503). For example,
limitations in study design, such as small study populations, lack of
replication sample sets, and lack of proper control groups have contributed to
contradictory results; an example being the conflicting results reported on
the role of mutations in the hemochromatosis gene (HFE) on fibrosis progression
in HCV-infected patients (Smith et al., Hepatology. 1998, 27(6):1695-9;
Thorburn et al., Gut. 2002, 50(2):248-52).
Currently, there is no diagnostic test that can identify patients who
are predisposed to developing liver damage from chronic HCV infection,
despite the large variability in fibrosis progression rate among HCV patients.
Furthermore, diagnosis of fibrosis stage (early, middle or late) and
monitoring of fibrosis progression is currently accomplished by liver biopsy,
which is invasive, painful, and costly, and generally must be performed
multiple times to assess fibrosis status. The discovery of genetic markers
which are useful in identifying HCV-infected individuals who are at
increased risk for advancing from early stage fibrosis to cirrhosis and/or HCC
may lead to, for example, better therapeutic strategies, economic models, and
health care policy decisions.
SNPs
The genomes of all organisms undergo spontaneous mutation in the course of
their continuing evolution, generating variant forms of progenitor genetic
sequences (Gusella, Ann. Rev. Biochem. 55, 831-854 (1986)). A variant form may
confer an evolutionary advantage or disadvantage relative to a progenitor form
or may be neutral. In some instances, a variant form confers an evolutionary
advantage to the species and is eventually incorporated into the DNA of many or
most members of the species and effectively becomes the progenitor form.
Additionally, the effects of a variant form may be both beneficial and
detrimental, depending on the circumstances. For example, a heterozygous sickle
cell mutation confers resistance to malaria, but a homozygous sickle cell
mutation is usually lethal. In many cases, both progenitor and variant forms
survive and co-exist in a species population. The coexistence of multiple forms
of a genetic sequence gives rise to genetic polymorphisms, including SNPs.
Approximately 90% of all polymorphisms in the human genome are SNPs. SNPs
are single base positions in DNA at which different alleles, or alternative
nucleotides,
exist in a population. The SNP position (interchangeably referred to herein as
SNP, SNP
site, SNP locus, SNP marker, or marker) is usually preceded by and followed by
highly
conserved sequences of the allele (e.g., sequences that vary in less than
1/100 or 1/1000
members of the populations). An individual may be homozygous or heterozygous
for an
allele at each SNP position. A SNP can, in some instances, be referred to as a
"cSNP" to
denote that the nucleotide sequence containing the SNP is an amino acid coding
sequence.
A SNP may arise from a substitution of one nucleotide for another at the
polymorphic site. Substitutions can be transitions or transversions. A
transition is the
replacement of one purine nucleotide by another purine nucleotide, or one
pyrimidine by
another pyrimidine. A transversion is the replacement of a purine by a
pyrimidine, or
vice versa. A SNP may also be a single base insertion or deletion variant
referred to as
an "indel" (Weber et al., "Human diallelic insertion/deletion polymorphisms",
Am J Hum
Genet 2002 Oct;71(4):854-62).
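As an illustration only, the transition/transversion distinction described above can be sketched in Python. The function name classify_substitution is hypothetical and is not part of the disclosure; the sketch assumes DNA bases written as A, C, G, and T.

```python
# Purines and pyrimidines among the four DNA bases.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion.

    A transition keeps the base class (purine to purine, or pyrimidine to
    pyrimidine); a transversion swaps the class.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(ref, ">", alt, classify_substitution(ref, alt))
```

Single-base insertions and deletions ("indels") are outside the scope of this sketch, since they are not substitutions.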
A synonymous codon change, or silent mutation/SNP (terms such as "SNP",
"polymorphism", "mutation", "mutant", "variation", and "variant" are used herein
interchangeably), is one that does not result in a change of amino acid due to
the degeneracy of the genetic code. A substitution that changes a codon coding
for one amino acid to a codon coding for a different amino acid (i.e., a
non-synonymous codon change) is referred to as a missense mutation. A nonsense
mutation results in a type of non-synonymous codon change in which a stop codon
is formed, thereby leading to premature termination of a polypeptide chain and a
truncated protein. A read-through mutation is another type of non-synonymous
codon change that causes the destruction of a stop codon, thereby resulting in
an extended polypeptide product. While SNPs can be bi-, tri-, or tetra-allelic,
the vast majority of the SNPs are bi-allelic, and are thus often referred to as
"bi-allelic markers", or "di-allelic markers".
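The four kinds of codon change named above can be told apart mechanically by translating the codon before and after the substitution. The following Python sketch is illustrative only: it assumes the standard genetic code, codons written as DNA triplets, and a hypothetical helper name classify_codon_change that does not appear in the disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
# "*" marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change using the categories described in the text."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"    # same amino acid (or stop to stop)
    if alt_aa == "*":
        return "nonsense"      # a stop codon is formed
    if ref_aa == "*":
        return "read-through"  # a stop codon is destroyed
    return "missense"          # a different amino acid is encoded


if __name__ == "__main__":
    examples = [("GAA", "GAG"), ("GAG", "GTG"), ("TGG", "TGA"), ("TAA", "CAA")]
    for ref, alt in examples:
        print(ref, ">", alt, classify_codon_change(ref, alt))
```

The GAG to GTG example (glutamate to valine) is used here simply as a familiar missense change; it is not one of the SNPs of Tables 1-2.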
As used herein, references to SNPs and SNP genotypes include individual SNPs
and/or haplotypes, which are groups of SNPs that are generally inherited
together. Haplotypes can have stronger correlations with diseases or other
phenotypic effects compared with individual SNPs, and therefore may provide
increased diagnostic accuracy in some cases (Stephens et al., Science 293,
489-493, 20 July 2001).
Causative SNPs are those SNPs that produce alterations in gene expression or in
the expression, structure, and/or function of a gene product, and therefore are
most predictive of a possible clinical phenotype. One such class includes SNPs
falling within regions of genes encoding a polypeptide product, i.e. cSNPs.
These SNPs may result in an alteration of the amino acid sequence of the
polypeptide product (i.e., non-synonymous codon changes) and give rise to the
expression of a defective or other variant protein. Furthermore, in the case of
nonsense mutations, a SNP may lead to premature termination of a polypeptide
product. Such variant products can result in a pathological condition, e.g.,
genetic disease. Examples of genetic diseases caused by a SNP within a coding
sequence include sickle cell anemia and cystic fibrosis.
Causative SNPs do not necessarily have to occur in coding regions; causative
SNPs can occur in, for example, any genetic region that can ultimately affect
the expression, structure, and/or activity of the protein encoded by a nucleic
acid. Such genetic regions include, for example, those involved in
transcription, such as SNPs in transcription factor binding domains, SNPs in
promoter regions, in areas involved in transcript processing, such as SNPs at
intron-exon boundaries that may cause defective splicing, or SNPs in mRNA
processing signal sequences such as polyadenylation signal regions. Some SNPs
that are not causative SNPs nevertheless are in close association with, and
therefore segregate with, a disease-causing sequence. In this situation, the
presence of a SNP correlates with the presence of, or predisposition to, or an
increased risk in developing the disease. These SNPs, although not causative,
are nonetheless also useful for diagnostics, disease predisposition screening,
and other uses.
An association study of a SNP and a specific disorder involves determining the
presence or frequency of the SNP allele in biological samples from individuals
with the disorder of interest, such as liver fibrosis and related pathologies,
and comparing the information to that of controls (i.e., individuals who do not
have the disorder; controls may be also referred to as "healthy" or "normal"
individuals) who are preferably of similar age and race. The appropriate
selection of patients and controls is important to the success of SNP
association studies. Therefore, a pool of individuals with well-characterized
phenotypes is extremely desirable.
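To make the case/control comparison concrete, the following Python sketch counts one allele in two groups of diploid genotypes and computes allele frequencies and an allelic odds ratio. It is a minimal illustration, not the statistical method of the disclosure: the genotype lists are invented toy data, and the function names allele_counts and allelic_odds_ratio are hypothetical.

```python
def allele_counts(genotypes, allele):
    """Return (copies of `allele`, total alleles) for diploid genotypes like 'CT'."""
    carried = sum(genotype.upper().count(allele) for genotype in genotypes)
    return carried, 2 * len(genotypes)


def allelic_odds_ratio(cases, controls, allele):
    """Odds of carrying `allele` in cases divided by the odds in controls."""
    a, n_cases = allele_counts(cases, allele)        # allele copies in cases
    c, n_controls = allele_counts(controls, allele)  # allele copies in controls
    b, d = n_cases - a, n_controls - c               # copies of other alleles
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (a * d) / (b * c)


# Invented toy genotypes, for illustration only.
toy_cases = ["CC", "CT", "CT", "TT"]
toy_controls = ["CT", "TT", "TT", "TT"]

if __name__ == "__main__":
    case_c, case_n = allele_counts(toy_cases, "C")
    ctrl_c, ctrl_n = allele_counts(toy_controls, "C")
    print("C allele frequency in cases:   ", case_c / case_n)
    print("C allele frequency in controls:", ctrl_c / ctrl_n)
    print("allelic odds ratio:", allelic_odds_ratio(toy_cases, toy_controls, "C"))
```

A real study would add a significance test and adjust for covariates such as age; those steps are omitted here.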
A SNP may be screened in diseased tissue samples or any biological sample
obtained from a diseased individual, and compared to control samples, and
selected for its increased (or decreased) occurrence in a specific pathological
condition, such as pathologies related to liver fibrosis, increased or decreased
risk of developing bridging fibrosis/cirrhosis, and progression of liver
fibrosis. Once a statistically significant association is established between
one or more SNP(s) and a pathological condition (or other phenotype) of
interest, then the region around the SNP can optionally be thoroughly screened
to identify the causative genetic locus/sequence(s) (e.g., causative
SNP/mutation, gene, regulatory region, etc.) that influences the pathological
condition or phenotype. Association studies may be conducted within the general
population and are not limited to studies performed on related individuals in
affected families (linkage studies).
Clinical trials have shown that patient response to treatment with
pharmaceuticals is often heterogeneous. There is a continuing need to improve
pharmaceutical agent design and therapy. In that regard, SNPs can be used to
identify patients most suited to therapy with particular pharmaceutical agents
(this is often termed "pharmacogenomics"). Similarly, SNPs can be used to
exclude patients from certain treatment due to the patient's increased
likelihood of developing toxic side effects or their likelihood of not
responding to the treatment. Pharmacogenomics can also be used in
pharmaceutical research to assist the drug development and selection process
(Linder et al. (1997), Clinical Chemistry, 43, 254; Marshall (1997), Nature
Biotechnology, 15, 1249; International Patent Application WO 97/40462, Spectra
Biomedical; and Schafer et al. (1998), Nature Biotechnology, 16:3).
SUMMARY OF THE INVENTION
The present invention relates to the identification of novel SNPs, unique
combinations of such SNPs, and haplotypes of SNPs that are associated with
liver fibrosis and in particular the increased or decreased risk of developing
bridging fibrosis/cirrhosis, and the rate of progression of liver fibrosis. The
polymorphisms disclosed herein may be useful as targets for the design of
diagnostic reagents and the development of therapeutic agents for use in the
diagnosis and treatment of liver fibrosis and related pathologies.
Based on the identification of SNPs associated with liver fibrosis, the
present disclosure
is also of methods of detecting these variants as well as the design and
preparation of detection
reagents needed to accomplish this task. The disclosure is specifically of,
for example, novel
SNPs in genetic sequences involved in liver fibrosis and related pathologies,
isolated nucleic
acid molecules (including, for example, DNA and RNA molecules) containing
these SNPs,
variant proteins encoded by nucleic acid molecules containing such SNPs,
antibodies to the
encoded variant proteins, computer-based and data storage systems containing
the novel SNP
information, methods of detecting these SNPs in a test sample, methods of
identifying
individuals who have an altered (i.e., increased or decreased) risk of
developing liver fibrosis
based on the presence or absence of one or more particular nucleotides
(alleles) at one or more
SNP sites disclosed herein or the detection of one or more encoded variant
products (e.g.,
variant mRNA transcripts or variant proteins), methods of identifying
individuals who are more
or less likely to respond to a treatment (or more or less likely to experience
undesirable side
effects from a treatment, etc.), methods of screening for compounds useful in
the treatment of a
disorder associated with a variant gene/protein, compounds identified by these
methods,
methods of treating disorders mediated by a variant gene/protein, methods of
using the novel
SNPs of the present invention for human identification, etc.
In Tables 1-2, the present disclosure is of gene information, transcript
sequences (SEQ
ID NOS:1-14), encoded amino acid sequences (SEQ ID NOS:15-28), genomic
sequences (SEQ
ID NOS:43-50), transcript-based context sequences (SEQ ID NOS:29-42) and
genomic-based
context sequences (SEQ ID NOS:51-58) that contain the SNPs of the present
invention, and
extensive SNP information that includes observed alleles, allele frequencies,
populations/ethnic
groups in which alleles have been observed, information about the type of SNP
and
corresponding functional effect, and, for cSNPs, information about the encoded
polypeptide product. The transcript sequences (SEQ ID NOS:1-14), amino acid
sequences
(SEQ ID NOS:15-28), genomic sequences (SEQ ID NOS:43-50), transcript-based SNP
context
sequences (SEQ ID NOS:29-42), and genomic-based SNP context sequences (SEQ ID
NOS:51-58) are also provided in the Sequence Listing.
In a specific embodiment of the present invention, SNPs that occur naturally
in the
human genome are provided as isolated nucleic acid molecules. These SNPs are
associated
with liver fibrosis and related pathologies. In particular the SNPs are
associated with either an
increased or decreased risk of developing bridging fibrosis/cirrhosis and
affect the rate of
progression of liver fibrosis. As such, they can have a variety of uses in the
diagnosis and/or
treatment of liver fibrosis and related pathologies. In an alternative
embodiment, a nucleic acid
of the invention is an amplified polynucleotide, which is produced by
amplification of a SNP-containing nucleic acid template. In another embodiment, the invention
provides for a variant
protein that is encoded by a nucleic acid molecule containing a SNP disclosed
herein.
In yet another embodiment of the invention, a reagent for detecting a SNP in
the context
of its naturally-occurring flanking nucleotide sequences (which can be, e.g.,
either DNA or
mRNA) is provided. In particular, such a reagent may be in the form of, for
example, a
hybridization probe or an amplification primer that is useful in the specific
detection of a SNP
of interest. In an alternative embodiment, a protein detection reagent is used
to detect a variant
protein that is encoded by a nucleic acid molecule containing a SNP disclosed
herein. A
preferred embodiment of a protein detection reagent is an antibody or an
antigen-reactive
antibody fragment.
Various embodiments of the invention also provide kits comprising SNP
detection
reagents, and methods for detecting the SNPs disclosed herein by employing
detection
reagents. In a specific embodiment, the present invention provides for a
method of identifying
an individual having an increased or decreased risk of developing liver
fibrosis by detecting the
presence or absence of one or more SNP alleles disclosed herein.
In another embodiment, a method for diagnosis of liver fibrosis and related
pathologies by
detecting the presence or absence of one or more SNP alleles disclosed herein
is provided.
The nucleic acid molecules of the invention can be inserted in an expression
vector,
such as to produce a variant protein in a host cell. Thus, the present
disclosure is also of a
vector comprising a SNP-containing nucleic acid molecule, genetically-engineered host cells
containing the vector, and methods for expressing a recombinant variant
protein using such
host cells. In another specific embodiment, the host cells, SNP-containing
nucleic acid
molecules, and/or variant proteins can be used as targets in a method for
screening and
identifying therapeutic agents or pharmaceutical compounds useful in the
treatment of liver
fibrosis and related pathologies.
An aspect of this disclosure is a method for treating liver fibrosis in a
human subject
wherein said human subject harbors a SNP, gene, transcript, and/or encoded
protein identified
in Tables 1-2, which method comprises administering to said human subject a
therapeutically
or prophylactically effective amount of one or more agents counteracting the
effects of the
disease, such as by inhibiting (or stimulating) the activity of the gene,
transcript, and/or
encoded protein identified in Tables 1-2.
Another aspect of this disclosure is a method for identifying an agent useful
in
therapeutically or prophylactically treating liver fibrosis and related
pathologies in a human
subject wherein said human subject harbors a SNP, gene, transcript, and/or
encoded protein
identified in Tables 1-2, which method comprises contacting the gene,
transcript, or encoded
protein with a candidate agent under conditions suitable to allow formation of
a binding
complex between the gene, transcript, or encoded protein and the candidate
agent and detecting
the formation of the binding complex, wherein the presence of the complex
identifies said
agent.
Another aspect of this disclosure is a method for treating liver fibrosis and
related
pathologies in a human subject, which method comprises:
(i) determining that said human subject harbors a SNP, gene, transcript,
and/or encoded
protein identified in Tables 1-2, and
CA 02887830 2016-10-14
CA 2887830
(ii) administering to said subject a therapeutically or prophylactically
effective amount
of one or more agents counteracting the effects of the disease.
Various embodiments of the claimed invention relate to a method for
determining
whether a human has an increased risk for developing liver fibrosis,
comprising testing nucleic
acid from said human to determine the presence or absence of a polymorphism in
gene
MICROSOMAL TRIGLYCERIDE TRANSFER PROTEIN (MTP) at position 101 of the
nucleotide sequence defined by SEQ ID NO:55 or its complement, wherein the
presence of C at
position 101 of SEQ ID NO:55 or G at position 101 of its complement with said
human has
said increased risk for developing liver fibrosis.
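The allele check described above (C at position 101 of the forward sequence, or G at position 101 of its complement) can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed assay: the sequences are invented toy strings rather than SEQ ID NO:55, "complement" is taken to mean the reverse complement written 5' to 3', and the names base_at, risk_allele_present, and zygosity are hypothetical.

```python
# Complement lookup for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

SNP_POSITION = 101  # 1-based position named in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def base_at(seq: str, position: int = SNP_POSITION) -> str:
    """Return the base at a 1-based position."""
    if not 1 <= position <= len(seq):
        raise ValueError("position lies outside the sequence")
    return seq[position - 1].upper()


def risk_allele_present(seq: str, strand: str = "forward") -> bool:
    """True if the base at position 101 is C (forward) or G (complement)."""
    expected = {"forward": "C", "complement": "G"}
    if strand not in expected:
        raise ValueError("strand must be 'forward' or 'complement'")
    return base_at(seq) == expected[strand]


def zygosity(allele_1: str, allele_2: str) -> str:
    """Describe a diploid genotype from its two alleles."""
    same = allele_1.upper() == allele_2.upper()
    return "homozygous" if same else "heterozygous"


# Toy 201-base sequences (NOT SEQ ID NO:55), with the variable base in the middle.
toy_forward = "A" * 100 + "C" + "T" * 100
toy_complement = reverse_complement(toy_forward)

if __name__ == "__main__":
    print(base_at(toy_forward), base_at(toy_complement))
    print(risk_allele_present(toy_forward, "forward"))
    print(zygosity("C", "T"))
```

Because the variable base sits at the centre of a 201-base sequence, position 101 refers to the same site on either strand in this sketch.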
Many other uses and advantages of the present invention will be apparent to
those
skilled in the art upon review of the detailed description of the preferred
embodiments herein.
Solely for clarity of discussion, the invention is described in the sections
below by way of non-limiting examples.
The Sequence Listing provides the transcript sequences (SEQ ID NOS:1-14) and
protein sequences (SEQ ID NOS:15-28) as shown in Table 1, and genomic
sequences (SEQ ID NOS:43-50) as shown in Table 2, for each liver
fibrosis-associated gene that contains one or more SNPs of the present
invention. Also provided in the Sequence Listing are context sequences flanking
each SNP, including both transcript-based context sequences as shown in Table 1
(SEQ ID NOS:29-42) and genomic-based context sequences as shown in Table 2 (SEQ
ID NOS:51-58). The context sequences generally provide 100bp upstream (5') and
100bp downstream (3') of each SNP, with the SNP in the middle of the context
sequence, for a total of 200bp of context sequence surrounding each SNP.
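The layout of a context sequence (100 bases of 5' flank, the SNP, then 100 bases of 3' flank) can be sketched in Python as below. The function name context_sequence and the toy genomic string are hypothetical and serve only to show that, with this layout, the SNP falls at 1-based position 101 of a 201-base window.

```python
def context_sequence(genomic: str, snp_index: int, flank: int = 100) -> str:
    """Return `flank` bases upstream, the SNP base, and `flank` bases downstream.

    `snp_index` is the 0-based index of the SNP within `genomic`.
    """
    start = snp_index - flank
    end = snp_index + flank + 1  # slice end is exclusive
    if start < 0 or end > len(genomic):
        raise ValueError("not enough flanking sequence around the SNP")
    return genomic[start:end]


# Invented toy sequence: the SNP base 'C' sits at 0-based index 150.
toy_genomic = "G" * 150 + "C" + "A" * 150
context = context_sequence(toy_genomic, snp_index=150)

if __name__ == "__main__":
    print(len(context))   # flank + SNP + flank
    print(context[100])   # 0-based index 100 is 1-based position 101
```

The sketch raises an error rather than returning a shorter window when the SNP lies too close to either end of the input.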

DESCRIPTION OF TABLE 1 AND TABLE 2
Table 1 and Table 2 disclose the SNP and associated gene/transcript/protein
information
of the present disclosure. For each gene, Table 1 and Table 2 each provide a
header containing
gene/transcript/protein information, followed by a transcript and protein
sequence (in Table 1)
or genomic sequence (in Table 2), and then SNP information regarding each SNP
found in that
gene/transcript.
NOTE: SNPs may be included in both Table 1 and Table 2; Table 1 presents the
SNPs
relative to their transcript sequences and encoded protein sequences, whereas
Table 2 presents
the SNPs relative to their genomic sequences (in some instances Table 2 may
also include, after
the last gene sequence, genomic sequences of one or more intergenic regions,
as well as SNP
context sequences and other SNP information for any SNPs that lie within these
intergenic
regions). SNPs can readily be cross-referenced between Tables based on their
hCV (or, in
some instances, hDV) identification numbers.
The gene/transcript/protein information includes:
- a gene number (1 through n, where n = the total number of genes in the
Table)
- Celera hCG and UID internal identification numbers for the gene
- Celera hCT and UID internal identification numbers for the transcript
(Table 1 only)
- a public Genbank accession number (e.g., RefSeq NM number) for the
transcript (Table 1 only)
- Celera hCP and UID internal identification numbers for the protein encoded
by the hCT transcript (Table 1 only)
- a public Genbank accession number (e.g., RefSeq NP number) for the protein
(Table 1
only)
- an art-known gene symbol
- an art-known gene/protein name
- Celera genomic axis position (indicating start nucleotide position-stop
nucleotide
position)
- the chromosome number of the chromosome on which the gene is located
- an OMIM (Online Mendelian Inheritance in Man; Johns Hopkins
University/NCBI)
public reference number for obtaining further information regarding the
medical significance of
each gene
- alternative gene/protein name(s) and/or symbol(s) in the OMIM entry
NOTE: Due to the presence of alternative splice forms, multiple
transcript/protein
entries can be provided for a single gene entry in Table 1; i.e., for a single
Gene Number,
multiple entries may be provided in series that differ in their
transcript/protein information and
sequences.
Following the gene/transcript/protein information is a transcript sequence and
protein
sequence (in Table 1), or a genomic sequence (in Table 2), for each gene, as
follows:
- transcript sequence (Table 1 only) (corresponding to SEQ ID NOS:1-14 of the
Sequence Listing), with SNPs identified by their IUB codes (transcript
sequences can include
5' UTR, protein coding, and 3' UTR regions). (NOTE: If there are differences
between the
nucleotide sequence of the hCT transcript and the corresponding public
transcript sequence
identified by the Genbank accession number, the hCT transcript sequence (and
encoded
protein) is provided, unless the public sequence is a RefSeq transcript
sequence identified by an
NM number, in which case the RefSeq NM transcript sequence (and encoded
protein) is
provided. However, whether the hCT transcript or RefSeq NM transcript is used
as the
transcript sequence, the disclosed SNPs are represented by their IUB codes
within the
transcript.)
- the encoded protein sequence (Table 1 only) (corresponding to SEQ ID NOS:15-
28 of
the Sequence Listing)
- the genomic sequence of the gene (Table 2 only), including 6kb on each side
of the
gene boundaries (i.e., 6kb on the 5' side of the gene plus 6kb on the 3' side
of the gene)
(corresponding to SEQ ID NOS:43-50 of the Sequence Listing).
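As an illustration of how a SNP written as an IUB (IUPAC) ambiguity code can be expanded into its alternative alleles, a minimal Python sketch follows. The function name `find_snps` and the example sequence are hypothetical and are not taken from Table 1, Table 2, or the Sequence Listing; only the standard ambiguity-code assignments are assumed.

```python
# Standard IUPAC/IUB nucleotide ambiguity codes and the bases each one stands for.
IUB_CODES = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def find_snps(sequence):
    """Return (1-based position, alleles) for each ambiguity code in a sequence."""
    hits = []
    for index, base in enumerate(sequence.upper()):
        if base in IUB_CODES:
            hits.append((index + 1, IUB_CODES[base]))
    return hits

# Hypothetical transcript fragment with one SNP written as "R" (A or G).
example = "ATGCCRTTAG"
print(find_snps(example))
```

A sequence that contains only A, C, G, and T returns an empty list, which corresponds to a sequence with no annotated polymorphic positions.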
After the last gene sequence, Table 2 may include additional genomic sequences
of
intergenic regions (in such instances, these sequences are identified as
"Intergenic region:"
followed by a numerical identification number), as well as SNP context
sequences and other
SNP information for any SNPs that lie within each intergenic region (and such
SNPs are
identified as "INTERGENIC" for SNP type).
NOTE: The transcript, protein, and transcript-based SNP context sequences are
provided in both Table 1 and in the Sequence Listing. The genomic and genomic-
based SNP
context sequences are provided in both Table 2 and in the Sequence Listing.
SEQ ID NOS are
indicated in Table 1 for each transcript sequence (SEQ ID NOS:1-14), protein
sequence (SEQ
ID NOS:15-28), and transcript-based SNP context sequence (SEQ ID NOS:29-42),
and SEQ ID
NOS are indicated in Table 2 for each genomic sequence (SEQ ID NOS:43-50), and
genomic-
based SNP context sequence (SEQ ID NOS:51-58).
The SNP information includes:
- context sequence (taken from the transcript sequence in Table 1, and taken
from the
genomic sequence in Table 2) with the SNP represented by its IUB code,
including 100 bp
upstream (5') of the SNP position plus 100 bp downstream (3') of the SNP
position (the
transcript-based SNP context sequences in Table 1 are provided in the Sequence
Listing as
SEQ ID NOS:29-42; the genomic-based SNP context sequences in Table 2 are
provided in the
Sequence Listing as SEQ ID NOS:51-58).
- Celera hCV internal identification number for the SNP (in some instances,
an "hDV"
number is given instead of an "hCV" number)
- SNP position [position of the SNP within the given transcript sequence
(Table 1) or
within the given genomic sequence (Table 2)]
- SNP source (may include any combination of one or more of the following
codes, depending on which internal sequencing projects and/or public databases
the SNP has been observed in: "Applera" = SNP observed during the re-sequencing
of genes and regulatory regions of 39 individuals, "Celera" = SNP observed
during shotgun sequencing and assembly of the Celera human genome sequence,
"Celera Diagnostics" = SNP observed during re-sequencing of nucleic acid
samples from individuals who have a disease, "dbSNP" = SNP observed in the
dbSNP public database, "HGBASE" = SNP observed in the HGBASE public database,
"HGMD" = SNP observed in the Human Gene Mutation Database (HGMD) public
database, "HapMap" = SNP observed in the International HapMap Project public
database, "CSNP" = SNP observed in an internal Applied Biosystems (Foster City,
CA) database of coding SNPs (cSNPs)) (NOTE: multiple "Applera" source entries
for a single SNP indicate that the same SNP was covered by multiple overlapping
amplification products and the re-sequencing results (e.g., observed allele
counts) from each of these amplification products are being provided).
- Population/allele/allele count information in the format of
[population1(first_allele,count|second_allele,count)population2(first_allele,count|second_allele,count)
total (first_allele,total count|second_allele,total count)]. The information in
this field includes populations/ethnic groups in which particular SNP alleles
have been observed ("cau" = Caucasian, "his" = Hispanic, "chn" = Chinese, and
"afr" = African-American, "jpn" = Japanese, "ind" = Indian, "mex" = Mexican,
"ain" = American Indian, "cra" = Celera donor, "no_pop" = no population
information available), identified SNP alleles, and observed allele counts
(within each population group and total allele counts), where available ["-" in
the allele field represents a deletion allele of an insertion/deletion
("indel") polymorphism (in which case the corresponding insertion allele, which
may be comprised of one or more nucleotides, is indicated in the allele field
on the opposite side of the "|"); "-" in the count field indicates that allele
count information is not available]. For certain SNPs from the public dbSNP
database,

population/ethnic information is indicated as follows (this population
information is publicly available in dbSNP): "HISP1" = human individual DNA
(anonymized samples) from 23 individuals of self-described HISPANIC heritage;
"PAC" = human individual DNA (anonymized samples) from 24 individuals of
self-described PACIFIC RIM heritage; "CAUC1" = human individual DNA (anonymized
samples) from 31 individuals of self-described CAUCASIAN heritage; "AFR1" =
human individual DNA (anonymized samples) from 24 individuals of self-described
AFRICAN/AFRICAN AMERICAN heritage; "P1" = human individual DNA (anonymized
samples) from 102 individuals of self-described heritage; "PA130299515";
"SC_12_A" = SANGER 12 DNAs of Asian origin from Corielle cell repositories, 6
of which are male and 6 female; "SC_12_C" = SANGER 12 DNAs of Caucasian origin
from Corielle cell repositories from the CEPH/UTAH library. Six male and 6
female; "SC_12_AA" = SANGER 12 DNAs of African-American origin from Corielle
cell repositories 6 of which are male and 6 female; "SC_95_C" = SANGER 95 DNAs
of Caucasian origin from Corielle cell repositories from the CEPH/UTAH library;
and "SC_12_CA" = Caucasians - 12 DNAs from Corielle cell repositories that are
from the CEPH/UTAH library. Six male and 6 female.
NOTE: For SNPs of "Applera" SNP source, genes/regulatory regions of 39
individuals (20 Caucasians and 19 African Americans) were re-sequenced and,
since each SNP position is represented by two chromosomes in each individual
(with the exception of SNPs on X and Y chromosomes in males, for which each SNP
position is represented by a single chromosome), up to 78 chromosomes were
genotyped for each SNP position. Thus, the sum of the African-American ("afr")
allele counts is up to 38, the sum of the Caucasian allele counts ("cau") is up
to 40, and the total sum of all allele counts is up to 78.
(NOTE: semicolons separate population/allele/count information corresponding
to
each indicated SNP source; i.e., if four SNP sources are indicated, such as
"Celera",
"dbSNP", "HGBASE", and "HGMD", then population/allele/count information is
provided in four groups which are separated by semicolons and listed in the
same order
as the listing of SNP sources, with each population/allele/count information
group
corresponding to the respective SNP source based on order; thus, in this
example, the first
population/allele/count information group would correspond to the first listed
SNP source
(Celera) and the third population/allele/count information group separated by
semicolons
would correspond to the third listed SNP source (HGBASE); if
population/allele/count
information is not available for any particular SNP source, then a pair of
semicolons is
still inserted as a place-holder in order to maintain correspondence between
the list of
SNP sources and the corresponding listing of population/allele/count
information)
- SNP type (e.g., location within gene/transcript and/or predicted functional
effect) ["MISSENSE MUTATION" = SNP causes a change in the encoded amino acid
(i.e., a non-synonymous coding SNP); "SILENT MUTATION" = SNP does not cause a
change in the encoded amino acid (i.e., a synonymous
coding SNP); "STOP CODON MUTATION" = SNP is located in a stop codon; "NONSENSE
MUTATION" = SNP creates or destroys a stop codon; "UTR 5" = SNP is located in
a 5' UTR
of a transcript; "UTR 3" = SNP is located in a 3' UTR of a transcript;
"PUTATIVE UTR 5" =
SNP is located in a putative 5' UTR; "PUTATIVE UTR 3" = SNP is located in a
putative 3'
UTR; "DONOR SPLICE SITE" = SNP is located in a donor splice site (5' intron
boundary);
"ACCEPTOR SPLICE SITE" = SNP is located in an acceptor splice site (3' intron
boundary);
"CODING REGION" = SNP is located in a protein-coding region of the transcript;
"EXON" =
SNP is located in an exon; "INTRON" = SNP is located in an intron; "hmCS" =
SNP is located
in a human-mouse conserved segment; "TFBS" = SNP is located in a transcription
factor
binding site; "UNKNOWN" = SNP type is not defined; "INTERGENIC" = SNP is
intergenic,
i.e., outside of any gene boundary]
- Protein coding information (Table 1 only), where relevant, in the format of
[protein
SEQ ID NO:#, amino acid position, (amino acid-1, codon1) (amino acid-2,
codon2)]. The
information in this field includes SEQ ID NO of the encoded protein sequence,
position of the
amino acid residue within the protein identified by the SEQ ID NO that is
encoded by the
codon containing the SNP, amino acids (represented by one-letter amino acid
codes) that are
encoded by the alternative SNP alleles (in the case of stop codons, "X" is
used for the one-letter
amino acid code), and alternative codons containing the alternative SNP
nucleotides which
encode the amino acid residues (thus, for example, for missense mutation-type
SNPs, at least
two different amino acids and at least two different codons are generally
indicated; for silent
mutation-type SNPs, one amino acid and at least two different codons are
generally indicated,
etc.). In instances where the SNP is located outside of a protein-coding
region (e.g., in a UTR
region), "None" is indicated following the protein SEQ ID NO.
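The two bracketed field formats described in the list above (population/allele/count information, and the codon pairs of the protein coding information) could be read programmatically along the following lines. This is a minimal sketch under stated assumptions: the function names, the example field string, and the decision to treat "total" like a population label are all hypothetical, and the example counts are invented to agree with the 40/38/78 chromosome limits noted for the "Applera" source. The codon classifier uses the standard genetic code with "X" for stop codons and is a simplification of the SNP type labels.

```python
import re
from itertools import product

# One "population(allele,count|allele,count)" entry; "total" is parsed like a population.
ENTRY = re.compile(r"(\w+)\s*\(([^)]*)\)")

def parse_group(group):
    """Parse one SNP source's population/allele/count text into nested dicts."""
    result = {}
    for population, body in ENTRY.findall(group):
        counts = {}
        for pair in body.split("|"):
            allele, count = pair.split(",")
            # "-" in the count field means the allele count is not available.
            counts[allele.strip()] = None if count.strip() == "-" else int(count)
        result[population] = counts
    return result

def parse_field(sources, field):
    """Pair each listed SNP source with its semicolon-separated group, in order."""
    return {s: parse_group(g) for s, g in zip(sources, field.split(";"))}

# Standard genetic code, with "X" for stop codons as in the protein coding field.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYYXXCCXWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def coding_effect(codon1, codon2):
    """Classify a pair of alternative codons by the amino acids they encode."""
    aa1, aa2 = CODONS[codon1.upper()], CODONS[codon2.upper()]
    if aa1 == aa2:
        return "silent"
    if "X" in (aa1, aa2):
        return "nonsense"  # creates or destroys a stop codon
    return "missense"

# Hypothetical field for a SNP with three listed sources; the middle group is empty.
field = "cau(A,30|G,10)afr(A,20|G,18) total (A,50|G,28);;no_pop(A,-|G,-)"
parsed = parse_field(["Applera", "HGBASE", "dbSNP"], field)
print(parsed["Applera"]["total"])
print(coding_effect("GAA", "GAC"), coding_effect("GAA", "GAG"), coding_effect("TGG", "TGA"))
```

Because the groups are matched to sources purely by order, the empty placeholder group is what keeps the third group aligned with the third source.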
DESCRIPTION OF TABLE 3
Table 3 provides sequences (SEQ ID NOS: 59-82) of primers that have been
synthesized and used in the laboratory to carry out allele-specific PCR
reactions in order to
assay the SNPs disclosed in Tables 4 and 5 during the course of association
studies to verify the
association of these SNPs with liver fibrosis.
Table 3 provides the following:
- the column labeled "Marker" provides an hCV identification number for
each SNP site
- the column labeled "Alleles" designates the two alternative alleles at
the SNP site
identified by the hCV identification number that are targeted by the allele-
specific primers (the
allele-specific primers are shown as "Sequence A" and "Sequence B") [NOTE:
Alleles may be
presented in Table 3 based on a different orientation (i.e., the reverse
complement) relative to
how the same alleles are presented in Tables 1, 2, 4, and 5].
- the column labeled "Sequence A (allele-specific primer)" provides an allele-
specific
primer that is specific for an allele designated in the "Alleles" column
- the column labeled "Sequence B (allele-specific primer)" provides an allele-
specific
primer that is specific for the other allele designated in the "Alleles"
column
- the column labeled "Sequence C (common primer)" provides a common primer
that is
used in conjunction with each of the allele-specific primers (the "Sequence A"
primer and the "Sequence B" primer) and which hybridizes at a site away from
the SNP
position.
All primer sequences are given in the 5' to 3' direction.
Each of the nucleotides designated in the "Alleles" column matches or is the
reverse
complement of (depending on the orientation of the primer relative to the
designated allele) the
3' nucleotide of the allele-specific primer (either "Sequence A" or "Sequence
B") that is
specific for that allele.
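The relationship between an allele-specific primer and the allele it targets can be checked with a short sketch such as the one below. The primer sequences shown are invented for illustration and are not taken from Table 3; the function name is likewise hypothetical. For a single nucleotide, the reverse complement is simply the complementary base.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def primer_targets_allele(primer, allele):
    """Describe how the primer's 3' nucleotide relates to the designated allele."""
    three_prime = primer.upper()[-1]  # primers are written 5' to 3'
    allele = allele.upper()
    if three_prime == allele:
        return "match"
    if three_prime == COMPLEMENT[allele]:
        return "complement"
    return "neither"

# Hypothetical primers for a hypothetical A/C SNP.
print(primer_targets_allele("GGTCAGTTCCA", "A"))
print(primer_targets_allele("GGTCAGTTCCG", "C"))
```

For SNPs whose two alleles are complementary to each other (A/T or C/G), the 3' base alone cannot show which allele a primer targets without also knowing the primer's orientation.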
DESCRIPTION OF TABLE 4
Table 4 provides results of statistical analyses for SNPs disclosed in Tables
1-2 (SNPs
can be cross-referenced between tables based on their hCV identification
numbers), and the
association of these SNPs with early and late stages of fibrosis (minimal or
moderate to severe
fibrosis). The statistical results shown in Table 4 provide support for the
association of a SNP
with minimal to severe fibrosis. Table 4 shows the association of this SNP
with fibrosis is
supported by p-values <0.05 in a genotype association based on ordinal (ord)
(major
homozygotes, heterozygotes and minor homozygotes) or dominant/recessive (dom)
modes
(major homozygotes vs. heterozygotes and minor homozygotes) of inheritance.
Table 4 presents statistical associations of the SNP with the trial endpoint.
The column
labeled "Marker" presents the SNP as identified by its unique identifier
number and its mode of
association with the fibrosis stage endpoint. The column labeled "Gene symbol"
presents the
common gene name of the gene containing the SNP. The data obtained from the
individual
sample sets are presented in two groups of columns. The groups of columns
labeled "Stanford
Samples" means the samples were obtained from patients at Stanford. This
sample set
contains samples obtained from patients that had extreme cases of fibrosis.
62% of the patients
had a minimum fibrosis stage (level 0-2) (controls) and 38% had a severe
fibrosis stage (level
3-4) (cases). The groups of columns labeled "UCSF Samples" means the samples
were
obtained from a study performed at the University of California, San
Francisco. These samples
were obtained from patients that had a variety of stages of fibrosis including
minimal, moderate
and severe stages of fibrosis (46%, 26% and 28% respectively), which reflects
the distribution
of fibrosis patients in clinics. The column labeled "OR" indicates the Odds
Ratio, an
approximation of the relative risk for an individual for the defined endpoint
associated with the
SNP. ORs less than 1 indicate the risk allele is protective for the defined
endpoint, and ORs
greater than 1 indicate the risk allele increases the risk of having the
defined endpoint. The
columns labeled "LCL" and "UCL" give the lower and upper confidence limits of
the ORs.
The column labeled "p val" indicates the results of either the chi-square test
(Dom) or the
Fisher Exact test (Ord) to determine if the qualitative phenotype is a
function of the SNP
genotype.
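For readers unfamiliar with the "OR", "LCL", and "UCL" columns, the following sketch shows one common way to obtain an odds ratio and its confidence limits from a 2x2 table of genotype counts. The specification does not state how the limits in Table 4 were computed, so the use of Woolf's logit method at a 95% level is an assumption, and the counts are invented.

```python
import math

def odds_ratio_with_limits(case_carrier, case_noncarrier,
                           ctrl_carrier, ctrl_noncarrier, z=1.96):
    """Return (OR, LCL, UCL) using the log-odds (Woolf) approximation."""
    odds_ratio = (case_carrier * ctrl_noncarrier) / (case_noncarrier * ctrl_carrier)
    # Standard error of ln(OR) is the root of the summed reciprocal cell counts.
    se = math.sqrt(1 / case_carrier + 1 / case_noncarrier
                   + 1 / ctrl_carrier + 1 / ctrl_noncarrier)
    lcl = math.exp(math.log(odds_ratio) - z * se)
    ucl = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lcl, ucl

# Hypothetical dominant-mode counts: carriers vs. non-carriers of the minor allele.
or_value, lcl, ucl = odds_ratio_with_limits(30, 20, 20, 40)
print(round(or_value, 2), round(lcl, 2), round(ucl, 2))
```

When the interval between LCL and UCL excludes 1, the association is conventionally regarded as significant at the corresponding level; the method fails if any cell count is zero.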
DESCRIPTION OF TABLE 5
Table 5 provides results of statistical analyses for SNPs disclosed in Tables
1-2 (SNPs
can be cross-referenced between tables based on their hCV identification
numbers), and the
association of these SNPs with mild or severe fibrosis stage. The statistical
results shown in
Table 5 provide support for the association of these SNPs with bridging
fibrosis/cirrhosis. For
example, the statistical results provided in Table 5 show that the association
of these SNPs with bridging
fibrosis/cirrhosis is supported by p-values <0.1 in an allelic association test
in the University of California, San Francisco
(UCSF) and the Virginia Commonwealth University (VCU) sample sets in at least
one of the
following strata: all patients (A), Caucasian only (C), or other than
Caucasian (O). Additional
SNP association with bridging fibrosis/cirrhosis is seen in the sample sets
obtained from the
University of Illinois, Chicago (UIC) and Stanford University (Stanford).
Table 5 presents statistical associations of SNPs with trial endpoints. The
column
labeled "Marker" presents each SNP as identified by its unique identifier
number. The column
labeled "Risk allele" presents the risk allele for each of the identified
SNPs. The risk allele
may also be presented in the Tables 1-2 as the reverse complement of the
allele presented in
Table 4. The column labeled "Strata" indicates the group of individuals in
which the
association was observed. "A" indicates that the association was observed in
all individuals,
"C" indicates that the association was observed in Caucasians, "O" indicates
the association
was observed in other than Caucasians. The groups of columns labeled "UCSF"
means the
samples were obtained from the University of California, San Francisco. Among
the 537
patients from UCSF, the samples had minimal (stage 0-1,
52%), moderate (stage 2, 23%) or severe (stage 3-4, 25%) fibrosis. The groups
of columns
labeled "VCU" means the samples were obtained from the Virginia Commonwealth
University.
These samples were obtained from 483 patients that had minimal (stage 0-1,
18%), moderate
(stage 2, 34%) or severe (stage 3-4, 48%) fibrosis. The groups of columns
labeled "UIC"
means the samples were obtained from the University of Illinois, Chicago.
These samples were
obtained from 115 patients that had minimal (stage 0-1, 29%), moderate (stage
2, 30%) or
severe (stage 3-4, 41%) fibrosis. The groups of columns labeled "Stanford"
means the samples
were obtained from Stanford University. These samples were obtained from
extreme cases,
62% contained minimal (stage 0-1) fibrosis and 38% contained severe (stage 3-
4) fibrosis. The
column labeled "CT AF" gives the control allele frequency of that SNP in that
stratum. The
column labeled "CASE AF" gives the case allele frequency of that SNP in that
stratum. The
column labeled "OR" indicates an approximation of the relative risk for an
individual for the
defined endpoint associated with the SNP. ORs less than 1 indicate the risk
allele is protective
for the defined endpoint, and ORs greater than 1 indicate the risk allele
increases the risk of
having the defined endpoint. The column labeled "p_2tail" indicates the p-
value generated by
the Fisher Exact test (allelic association) to determine if the qualitative
phenotype is a function
of the SNP genotype and is either a protective or risk allele in the UCSF
sample set. The
column labeled "p_1tail" indicates the p-value generated by the Fisher Exact
test to determine
if the qualitative phenotype is a function of the SNP genotype in the VCU, UIC
or Stanford
samples and the OR is going in the same direction as the OR for that SNP in
the UCSF sample.
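The allelic quantities described for Table 5 (control and case allele frequencies, and one-tailed and two-tailed Fisher Exact p-values) can be illustrated with a standard-library sketch. The counts are hypothetical, the function names are not from the specification, and the two-tailed convention used here (summing all tables no more probable than the observed one) is an assumption, since the specification does not state which convention was applied.

```python
from math import comb

def allele_frequency(risk_count, other_count):
    """Fraction of chromosomes that carry the risk allele."""
    return risk_count / (risk_count + other_count)

def fisher_exact(a, b, c, d):
    """Return (one-tailed, two-tailed) p-values for the 2x2 table [[a, b], [c, d]].

    The one-tailed value tests for enrichment of cell `a` (risk allele in cases).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in cell `a` with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = prob(a)
    p_one = sum(prob(x) for x in range(a, hi + 1))
    p_two = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= observed * (1 + 1e-9))
    return p_one, min(p_two, 1.0)

# Hypothetical allele counts: cases (risk, other) and controls (risk, other).
case_af = allele_frequency(60, 40)
ctrl_af = allele_frequency(45, 55)
p_one, p_two = fisher_exact(60, 40, 45, 55)
print(case_af, ctrl_af, p_one < p_two)
```

In a replication setting such as the one described for the VCU, UIC, and Stanford samples, a one-tailed value is only meaningful when the direction of the effect has been fixed in advance by the discovery sample.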
DESCRIPTION OF THE FIGURE
Figure 1 provides a diagrammatic representation of a computer-based discovery
system
containing the SNP information of the present invention in computer readable
form.
DETAILED DESCRIPTION OF THE INVENTION
The present disclosure is of SNPs associated with liver fibrosis and related
pathologies, nucleic acid molecules containing SNPs, methods and reagents for
the detection of the SNPs disclosed herein, uses of these SNPs for the
development of
detection reagents, and assays or kits that utilize such reagents. The liver
fibrosis-associated

SNPs disclosed herein may be useful for diagnosing, screening for, and
evaluating
predisposition to liver fibrosis, including an increased or decreased risk of
developing bridging
fibrosis/cirrhosis, the rate of progression of fibrosis, and related
pathologies in humans.
Furthermore, such SNPs and their encoded products may be useful targets for
the development
of therapeutic agents.
A large number of SNPs have been identified from re-sequencing DNA from 39
individuals, and they are indicated as "Applera" SNP source in Tables 1-2.
Their allele
frequencies observed in each of the Caucasian and African-American ethnic
groups are
provided. Additional SNPs included herein were previously identified during
shotgun
sequencing and assembly of the human genome, and they are indicated as
"Celera" SNP source
in Tables 1-2. Furthermore, the information provided in Tables 1-2,
particularly the allele
frequency information obtained from 39 individuals and the identification of
the precise
position of each SNP within each gene/transcript, allows haplotypes (i.e.,
groups of SNPs that
are co-inherited) to be readily inferred. The present invention encompasses
SNP haplotypes, as
well as individual SNPs.
Thus, the present disclosure is of individual SNPs associated with liver
fibrosis, as well
as combinations of SNPs and haplotypes in genetic regions associated with
liver fibrosis,
polymorphic/variant transcript sequences (SEQ ID NOS:1-14) and genomic
sequences (SEQ
ID NOS:43-50) containing SNPs, encoded amino acid sequences (SEQ ID NOS: 15-
28), and
both transcript-based SNP context sequences (SEQ ID NOS: 29-42) and genomic-
based SNP
context sequences (SEQ ID NOS:51-58) (transcript sequences, protein sequences,
and
transcript-based SNP context sequences are provided in Table 1 and the
Sequence Listing;
genomic sequences and genomic-based SNP context sequences are provided in
Table 2 and the
Sequence Listing), methods of detecting these polymorphisms in a test sample,
methods of
determining the risk of an individual of having or developing liver fibrosis,
methods of
screening for compounds useful for treating disorders associated with a
variant gene/protein
such as liver fibrosis, compounds identified by these screening methods,
methods of using the
disclosed SNPs to select a treatment strategy, methods of treating a disorder
associated with a
variant gene/protein (i.e., therapeutic methods), and methods of using the
SNPs disclosed
herein for human identification.
The present disclosure is of novel SNPs associated with liver fibrosis and
related
pathologies, as well as SNPs that were previously known in the art, but were
not previously
known to be associated with liver fibrosis. Accordingly, the present
disclosure is of novel
compositions and methods based on the novel SNPs disclosed herein, and also of
novel
methods of using the known, but previously unassociated, SNPs in methods
relating to liver
fibrosis (e.g., for diagnosing liver fibrosis, etc.). In Tables 1-2, known
SNPs are identified
based on the public database in which they have been observed, which is
indicated as one or
more of the following SNP types: "dbSNP" = SNP observed in dbSNP, "HGBASE" =
SNP
observed in HGBASE, and "HGMD" = SNP observed in the Human Gene Mutation
Database
(HGMD).
Particular SNP alleles disclosed herein can be associated with either an
increased risk of
having or developing liver fibrosis and related pathologies, or a decreased
risk of having or
developing liver fibrosis. SNP alleles that are associated with a decreased
risk of having or
developing liver fibrosis may be referred to as "protective" alleles, and SNP
alleles that are
associated with an increased risk of having or developing liver fibrosis may
be referred to as
"susceptibility" alleles, "risk" alleles, or "risk factors". Thus, whereas
certain SNPs (or their
encoded products) can be assayed to determine whether an individual possesses
a SNP allele
that is indicative of an increased risk of having or developing liver fibrosis
(i.e., a susceptibility
allele), other SNPs (or their encoded products) can be assayed to determine
whether an
individual possesses a SNP allele that is indicative of a decreased risk of
having or developing
liver fibrosis (i.e., a protective allele). Similarly, particular SNP alleles
disclosed herein can be
associated with either an increased or decreased likelihood of responding to a
particular
treatment or therapeutic compound, or an increased or decreased likelihood of
experiencing
toxic effects from a particular treatment or therapeutic compound. The term
"altered" may be
used herein to encompass either of these two possibilities (e.g., an increased
or a decreased
risk/likelihood).
Those skilled in the art will readily recognize that nucleic acid molecules
may be
double-stranded molecules and that reference to a particular site on one
strand refers, as well, to
the corresponding site on a complementary strand. In defining a SNP position,
SNP allele, or
nucleotide sequence, reference to an adenine, a thymine (uridine), a cytosine,
or a guanine at a
particular site on one strand of a nucleic acid molecule also defines the
thymine (uridine),
adenine, guanine, or cytosine (respectively) at the corresponding site on a
complementary
strand of the nucleic acid molecule. Thus, reference may be made to either
strand in order to
refer to a particular SNP position, SNP allele, or nucleotide sequence. Probes
and primers, may
be designed to hybridize to either strand and SNP genotyping methods disclosed
herein may
generally target either strand. Throughout the specification, in identifying a
SNP position,
reference is generally made to the protein-encoding strand, only for the
purpose of
convenience.
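The strand equivalence described above can be made concrete with a small sketch that converts a sequence and a SNP position to the complementary strand. The sequence is hypothetical and the helper names are not from the specification.

```python
# Translation table for DNA/RNA bases; uridine pairs with adenine.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")

def reverse_complement(sequence):
    """Return the complementary strand, written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]

def position_on_other_strand(position, length):
    """Map a 1-based position to its 1-based position on the complementary strand."""
    return length - position + 1

# Hypothetical fragment with an "A" allele at position 6.
forward = "ATGCCATTAG"
reverse = reverse_complement(forward)
other_position = position_on_other_strand(6, len(forward))
print(reverse, other_position, reverse[other_position - 1])
```

The base printed last is the complement of the forward-strand allele, which is why the same SNP can appear as different nucleotides depending on the strand chosen for reporting.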
References to variant peptides, polypeptides, or proteins described herein
include
peptides, polypeptides, proteins, or fragments thereof, that contain at least
one amino acid
residue that differs from the corresponding amino acid sequence of the art-
known
peptide/polypeptide/protein (the art-known protein may be interchangeably
referred to as the
"wild-type", "reference", or "normal" protein). Such variant
peptides/polypeptides/proteins
can result from a codon change caused by a nonsynonymous nucleotide
substitution at a
protein-coding SNP position (i.e., a missense mutation) disclosed herein.
Variant
peptides/polypeptides/proteins described herein can also result from a
nonsense mutation, i.e., a
SNP that creates a premature stop codon, a SNP that generates a read-through
mutation by
abolishing a stop codon, or due to any SNP disclosed herein that otherwise
alters the structure,
function/activity, or expression of a protein, such as a SNP in a regulatory
region (e.g. a
promoter or enhancer) or a SNP that leads to alternative or defective
splicing, such as a SNP in
an intron or a SNP at an exon/intron boundary. As used herein, the terms
"polypeptide",
"peptide", and "protein" are used interchangeably.
ISOLATED NUCLEIC ACID MOLECULES
AND SNP DETECTION REAGENTS & KITS
Tables 1 and 2 provide a variety of information about each SNP disclosed
herein that is
associated with liver fibrosis, including the transcript sequences (SEQ ID
NOS:1-14), genomic
sequences (SEQ ID NOS:43-50), and protein sequences (SEQ ID NOS:15-28) of the
encoded
gene products (with the SNPs indicated by IUB codes in the nucleic acid
sequences). In
addition, Tables 1 and 2 include SNP context sequences, which generally
include 100
nucleotides upstream (5') plus 100 nucleotides downstream (3') of each SNP
position (SEQ ID
NOS:29-42 correspond to transcript-based SNP context sequences disclosed in
Table 1, and
SEQ ID NOS:51-58 correspond to genomic-based context sequences disclosed in
Table 2), the
alternative nucleotides (alleles) at each SNP position, and additional
information about the
variant where relevant, such as SNP type (coding, missense, splice site, UTR,
etc.), human
populations in which the SNP was observed, observed allele frequencies,
information about the
encoded protein, etc.
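A SNP context sequence of the kind described above (100 nucleotides on each side of the SNP, with the SNP written as an IUB code) could be assembled as in the following sketch. The input sequence is synthetic, the function name is hypothetical, and truncating the flank when the SNP lies near an end of the sequence is an assumption rather than something the specification states.

```python
def context_sequence(sequence, snp_position, iub_code, flank=100):
    """Return up to `flank` bases 5' of the SNP, the IUB code, and up to `flank` bases 3'."""
    index = snp_position - 1  # convert the 1-based position to a 0-based index
    upstream = sequence[max(0, index - flank):index]
    downstream = sequence[index + 1:index + 1 + flank]
    return upstream + iub_code + downstream

# Synthetic 300-nucleotide sequence with a hypothetical A/G SNP at position 150.
sequence = "ACGT" * 75
context = context_sequence(sequence, 150, "R")
print(len(context), context[100])
```

With full flanks on both sides the result is 201 characters long and the ambiguity code sits at the centre, matching the layout used for the context sequences.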
Isolated Nucleic Acid Molecules
The present disclosure is of isolated nucleic acid molecules that contain one
or more SNPs
disclosed in Table 1 and/or Table 2. Isolated nucleic acid molecules containing
one or more
SNPs disclosed in at least one of Tables 1-2 may be interchangeably referred
to throughout
the present text as "SNP-containing nucleic acid molecules". Isolated nucleic
acid
molecules may optionally encode a full-length variant protein or fragment
thereof. The
isolated nucleic acid molecules of the present invention also include probes
and primers
(which are described in greater detail below in the section entitled "SNP
Detection
Reagents"), which may be used for assaying the disclosed SNPs, and isolated
full-length
genes, transcripts, cDNA molecules, and fragments thereof, which may be used
for such
purposes as expressing an encoded protein.
As used herein, an "isolated nucleic acid molecule" generally is one that
contains a SNP of
the present invention or one that hybridizes to such molecule such as a
nucleic acid with a
complementary sequence, and is separated from most other nucleic acids present
in the natural
source of the nucleic acid molecule. Moreover, an "isolated" nucleic acid
molecule, such as a
cDNA molecule containing a SNP of the present invention, can be substantially
free of other
cellular material, or culture medium when produced by recombinant techniques,
or chemical
precursors or other chemicals when chemically synthesized. A nucleic acid
molecule can be fused
to other coding or regulatory sequences and still be considered "isolated".
Nucleic acid molecules
present in non-human transgenic animals, which do not naturally occur in the
animal, are also
considered "isolated". For example, recombinant DNA molecules contained in a
vector are
considered "isolated". Further examples of "isolated" DNA molecules include
recombinant DNA
molecules maintained in heterologous host cells, and purified (partially or
substantially) DNA
molecules in solution. Isolated RNA molecules include in vivo or in vitro RNA
transcripts of the
isolated SNP-containing DNA molecules of the present invention. Isolated
nucleic acid molecules
according to the present invention further include such molecules produced
synthetically.
Generally, an isolated SNP-containing nucleic acid molecule comprises one or
more SNP
positions disclosed herein with flanking nucleotide sequences on either side
of the SNP positions.
A flanking sequence can include nucleotide residues that are naturally
associated with the SNP
site and/or heterologous nucleotide sequences. Preferably the flanking
sequence is up to about
500, 300, 100, 60, 50, 30, 25, 20, 15, 10, 8, or 4 nucleotides (or any other
length in-between) on
either side of a SNP position, or as long as the full-length gene or entire
protein-coding sequence
(or any portion thereof such as an exon), especially if the SNP-containing
nucleic acid molecule is
to be used to produce a protein or protein fragment.
For full-length genes and entire protein-coding sequences, a SNP flanking
sequence can
be, for example, up to about 5KB, 4KB, 3KB, 2KB, or 1KB on either side of the
SNP. Furthermore,
in such instances, the isolated nucleic acid molecule comprises exonic
sequences (including
protein-coding and/or non-coding exonic sequences), but may also include
intronic sequences.
Thus, any protein coding sequence may be either contiguous or separated by
introns. The
important point is that the nucleic acid is isolated from remote and
unimportant flanking sequences
and is of appropriate length such that it can be subjected to the specific
manipulations or uses
described herein such as recombinant protein expression, preparation of probes
and primers for
assaying the SNP position, and other uses specific to the SNP-containing
nucleic acid sequences.
An isolated SNP-containing nucleic acid molecule can comprise, for example, a
full-length
gene or transcript, such as a gene isolated from genomic DNA (e.g., by cloning
or PCR
amplification), a cDNA molecule, or an mRNA transcript molecule. Polymorphic
transcript
sequences are provided in Table 1 and in the Sequence Listing (SEQ ID NOS:1-
14), and
polymorphic genomic sequences are provided in Table 2 and in the Sequence
Listing (SEQ ID
NOS:43-50). Furthermore, fragments of such full-length genes and transcripts
that contain one or
more SNPs disclosed herein are also encompassed by the present invention, and
such fragments
may be used, for example, to express any part of a protein, such as a
particular functional domain
or an antigenic epitope.
Thus, the present disclosure also encompasses fragments of the nucleic acid
sequences
provided in Tables 1-2 (transcript sequences are provided in Table 1 as SEQ ID
NOS:1-14,
genomic sequences are provided in Table 2 as SEQ ID NOS:43-50, transcript-
based SNP context
sequences are provided in Table 1 as SEQ ID NO:29-42, and genomic-based SNP
context
sequences are provided in Table 2 as SEQ ID NO:51-58) and their complements. A
fragment
typically comprises a contiguous nucleotide sequence at least about 8 or more
nucleotides, more
preferably at least about 12 or more nucleotides, and even more preferably at
least about 16 or
more nucleotides. Further, a fragment could comprise at least about 18, 20,
22, 25, 30, 40, 50, 60,
80, 100, 150, 200, 250 or 500 (or any other number in-between) nucleotides in
length. The length
of the fragment will be based on its intended use. For example, the fragment
can encode epitope-
bearing regions of a variant peptide or regions of a variant peptide that
differ from the
normal/wild-type protein, or can be useful as a polynucleotide probe or
primer. Such fragments
can be isolated using the nucleotide sequences provided in Table 1 and/or
Table 2 for the synthesis
of a polynucleotide probe. A labeled probe can then be used, for example, to
screen a cDNA
library, genomic DNA library, or mRNA to isolate nucleic acid corresponding to
the coding
region. Further, primers can be used in amplification reactions, such as for
purposes of assaying
one or more SNPs sites or for cloning specific regions of a gene.
An isolated nucleic acid molecule described herein may further encompass a
SNP-
containing polynucleotide that is the product of any one of a variety of
nucleic acid
amplification methods, which are used to increase the copy numbers of a
polynucleotide of
interest in a nucleic acid sample. Such amplification methods are well known
in the art, and
they include but are not limited to, polymerase chain reaction (PCR) (U.S.
Patent Nos.
4,683,195; and 4,683,202; PCR Technology: Principles and Applications for DNA
Amplification, ed. H.A. Erlich, Freeman Press, NY, NY, 1992), ligase chain
reaction (LCR)
(Wu and Wallace, Genomics 4:560, 1989; Landegren et al., Science 241:1077,
1988), strand
displacement amplification (SDA) (U.S. Patent Nos. 5,270,184; and 5,422,252),
transcription-
mediated amplification (TMA) (U.S. Patent No. 5,399,491), linked linear
amplification (LLA)
(U.S. Patent No. 6,027,923), and the like, and isothermal amplification
methods such as nucleic
acid sequence based amplification (NASBA), and self-sustained sequence
replication (Guatelli
et al., Proc. Natl. Acad. Sci. USA 87:1874, 1990). Based on such
methodologies, a person
skilled in the art can readily design primers in any suitable regions 5' and
3' to a SNP disclosed
herein. Such primers may be used to amplify DNA of any length so long as it
contains the
SNP of interest in its sequence.
As used herein, an "amplified polynucleotide" is a SNP-containing nucleic acid
molecule whose amount has been increased at least two fold by any nucleic acid
amplification
method performed in vitro as compared to its starting amount in a test sample.
In other
preferred embodiments, an amplified polynucleotide is the result of at least
ten fold, fifty fold,
one hundred fold, one thousand fold, or even ten thousand fold increase as
compared to its
starting amount in a test sample. In a typical PCR amplification, a
polynucleotide of interest is
often amplified at least fifty thousand fold in amount over the unamplified
genomic DNA, but
the precise amount of amplification needed for an assay depends on the
sensitivity of the
subsequent detection method used.
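As a rough illustration of the fold-increase figures above, the following Python sketch models amplification under the textbook idealization that each PCR cycle multiplies the amount of target by (1 + efficiency). The function names and the efficiency parameter are assumptions introduced for this illustration and are not part of the disclosure; real reactions plateau and deviate from this model.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical fold increase after `cycles` PCR cycles.

    Idealized model (an assumption): each cycle multiplies the amount of
    target by (1 + efficiency), where 0 < efficiency <= 1.
    """
    if cycles < 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency in (0, 1]")
    return (1.0 + efficiency) ** cycles


def cycles_needed(target_fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles reaching at least `target_fold`."""
    if target_fold < 1.0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("target_fold must be >= 1 and efficiency in (0, 1]")
    cycles = 0
    amount = 1.0
    # Multiply cycle by cycle to avoid rounding issues with logarithms.
    while amount < target_fold:
        amount *= 1.0 + efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Under perfect doubling, one cycle meets the "at least two fold" definition.
    print(fold_amplification(1))
    # Cycles required under perfect doubling to exceed fifty thousand fold.
    print(cycles_needed(50_000))
```

Under this idealized model a single cycle already satisfies the two fold threshold, while the fifty thousand fold figure corresponds to a modest number of doublings; lower efficiencies require more cycles.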
Generally, an amplified polynucleotide is at least about 16 nucleotides in
length. More
typically, an amplified polynucleotide is at least about 20 nucleotides in
length. In a preferred
embodiment, an amplified polynucleotide is at least about 30 nucleotides in
length. In a more
preferred embodiment, an amplified polynucleotide is at least about 32, 40,
45, 50, or 60
nucleotides in length. In yet another preferred embodiment, an amplified
polynucleotide is at
least about 100, 200, 300, 400, or 500 nucleotides in length. While the total
length of an
amplified polynucleotide can be as long as an exon, an intron or the entire
gene where the SNP
of interest resides, an amplified product is typically up to about 1,000
nucleotides in length
(although certain amplification methods may generate amplified products
greater than 1000
nucleotides in length). More preferably, an amplified polynucleotide is not
greater than about
600-700 nucleotides in length. It is understood that irrespective of the
length of an amplified
polynucleotide, a SNP of interest may be located anywhere along its sequence.
In a specific embodiment, the amplified product is at least about 201
nucleotides in
length and comprises one of the transcript-based context sequences or the genomic-
based context
sequences shown in Tables 1-2. Such a product may have additional sequences on
its 5' end or
3' end or both. In another embodiment, the amplified product is about 101
nucleotides in
length, and it contains a SNP disclosed herein. Preferably, the SNP is located
at the middle of
the amplified product (e.g., at position 101 in an amplified product that is
201 nucleotides in
length, or at position 51 in an amplified product that is 101 nucleotides in
length), or within 1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, or 20 nucleotides from the middle of the
amplified product
(however, as indicated above, the SNP of interest may be located anywhere
along the length of
the amplified product).
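The positional arithmetic in this paragraph can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, positions are 1-based as in the text, and an exact middle is defined only for odd lengths such as 101 or 201.

```python
def middle_position(length: int) -> int:
    """1-based middle position of an odd-length amplified product."""
    if length < 1 or length % 2 == 0:
        raise ValueError("an exact middle exists only for odd, positive lengths")
    return (length + 1) // 2


def snp_near_middle(length: int, snp_position: int, tolerance: int = 0) -> bool:
    """True if the SNP lies within `tolerance` nucleotides of the middle."""
    if not 1 <= snp_position <= length:
        raise ValueError("snp_position must lie within the amplified product")
    return abs(snp_position - middle_position(length)) <= tolerance


if __name__ == "__main__":
    print(middle_position(201))           # middle of a 201-nucleotide product
    print(middle_position(101))           # middle of a 101-nucleotide product
    print(snp_near_middle(201, 121, 20))  # SNP 20 nucleotides from the middle
```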
The present disclosure is also of isolated nucleic acid molecules that
comprise, consist of,
or consist essentially of one or more polynucleotide sequences that contain
one or more SNPs
disclosed herein, complements thereof, and SNP-containing fragments thereof.
Accordingly, the present disclosure is of nucleic acid molecules that consist
of any of the
nucleotide sequences shown in Table 1 and/or Table 2 (transcript sequences are
provided in Table
1 as SEQ ID NOS:1-14, genomic sequences are provided in Table 2 as SEQ ID
NOS:43-50,
transcript-based SNP context sequences are provided in Table 1 as SEQ ID NO:29-
42, and
genomic-based SNP context sequences are provided in Table 2 as SEQ ID NO:51-
58), or any
nucleic acid molecule that encodes any of the variant proteins provided in
Table 1 (SEQ ID
NOS:15-28). A nucleic acid molecule consists of a nucleotide sequence when the
nucleotide
sequence is the complete nucleotide sequence of the nucleic acid molecule.
The present disclosure is further of nucleic acid molecules that consist
essentially of any of
the nucleotide sequences shown in Table 1 and/or Table 2 (transcript sequences
are provided in
Table 1 as SEQ ID NOS:1-14, genomic sequences are provided in Table 2 as SEQ
ID NOS:43-50,
transcript-based SNP context sequences are provided in Table 1 as SEQ ID NO:29-
42, and
genomic-based SNP context sequences are provided in Table 2 as SEQ ID NO:51-
58), or any
nucleic acid molecule that encodes any of the variant proteins provided in
Table 1 (SEQ ID
NOS:15-28). A nucleic acid molecule consists essentially of a nucleotide
sequence when such a
nucleotide sequence is present with only a few additional nucleotide residues
in the final nucleic
acid molecule.
The present disclosure is further of nucleic acid molecules that comprise any
of the
nucleotide sequences shown in Table 1 and/or Table 2 or a SNP-containing
fragment thereof
(transcript sequences are provided in Table 1 as SEQ ID NOS:1-14, genomic
sequences are
provided in Table 2 as SEQ ID NOS:43-50, transcript-based SNP context
sequences are provided
in Table 1 as SEQ ID NO:29-42, and genomic-based SNP context sequences are
provided in Table
2 as SEQ ID NO:51-58), or any nucleic acid molecule that encodes any of the
variant proteins
provided in Table 1 (SEQ ID NOS:15-28). A nucleic acid molecule comprises a
nucleotide
sequence when the nucleotide sequence is at least part of the final nucleotide
sequence of the
nucleic acid molecule. In such a fashion, the nucleic acid molecule can be
only the nucleotide
sequence or have additional nucleotide residues, such as residues that are
naturally associated with
it or heterologous nucleotide sequences. Such a nucleic acid molecule can have
one to a few
additional nucleotides or can comprise many more additional nucleotides. A
brief
description of how various types of these nucleic acid molecules can be
readily made and
isolated is provided below, and such techniques are well known to those
of ordinary skill in
the art (Sambrook and Russell, 2000, Molecular Cloning: A Laboratory Manual,
Cold
Spring Harbor Press, NY).
The isolated nucleic acid molecules can encode mature proteins plus additional
amino or carboxyl-terminal amino acids or both, or amino acids interior
to the mature
peptide (when the mature form has more than one peptide chain, for instance).
Such
sequences may play a role in processing of a protein from precursor to a
mature form,
facilitate protein trafficking, prolong or shorten protein half-life, or
facilitate manipulation of
a protein for assay or production. As generally is the case in situ, the
additional amino acids
may be processed away from the mature protein by cellular enzymes.
Thus, the isolated nucleic acid molecules include, but are not limited to,
nucleic acid
molecules having a sequence encoding a peptide alone, a sequence encoding
a mature
peptide and additional coding sequences such as a leader or secretory sequence
(e.g., a pre-
pro or pro-protein sequence), a sequence encoding a mature peptide with or
without
additional coding sequences, plus additional non-coding sequences, for example
introns and
non-coding 5' and 3' sequences such as transcribed but untranslated sequences
that play a
role in, for example, transcription, mRNA processing (including splicing and
polyadenylation signals), ribosome binding, and/or stability of mRNA. In
addition, the
nucleic acid molecules may be fused to heterologous marker sequences
encoding, for
example, a peptide that facilitates purification.
Isolated nucleic acid molecules can be in the form of RNA, such as mRNA, or in
the form of DNA, including cDNA and genomic DNA, which may be obtained,
for
example, by molecular cloning or produced by chemical synthetic techniques or
by a
combination thereof (Sambrook and Russell, 2000, Molecular Cloning: A
Laboratory
Manual, Cold Spring Harbor Press, NY). Furthermore, isolated nucleic acid
molecules,
particularly SNP detection reagents such as probes and primers, can also be
partially or
completely in the form of one or more types of nucleic acid analogs, such as
peptide
nucleic acid (PNA) (U.S. Patent Nos. 5,539,082; 5,527,675; 5,623,049;
5,714,331). The
nucleic acid, especially DNA, can be double-stranded or single-stranded.
Single-stranded
nucleic acid can be the coding strand (sense strand) or the complementary non-
coding
strand (anti-sense strand). DNA, RNA, or PNA segments can be assembled, for
example, from
fragments of the human genome (in the case of DNA or RNA) or single
nucleotides, short
oligonucleotide linkers, or from a series of oligonucleotides, to provide a
synthetic nucleic acid
molecule. Nucleic acid molecules can be readily synthesized using the
sequences provided
herein as a reference; oligonucleotide and PNA oligomer synthesis techniques
are well known
in the art (see, e.g., Corey, "Peptide nucleic acids: expanding the scope of
nucleic acid
recognition", Trends Biotechnol. 1997 Jun;15(6):224-9, and Hyrup et al.,
"Peptide nucleic acids
(PNA): synthesis, properties and potential applications", Bioorg Med Chem.
1996 Jan;4(1):5-
23). Furthermore, large-scale automated oligonucleotide/PNA synthesis
(including synthesis
on an array or bead surface or other solid support) can readily be
accomplished using
commercially available nucleic acid synthesizers, such as the Applied
Biosystems™ (Foster
City, CA) 3900 High-Throughput DNA Synthesizer or Expedite 8909 Nucleic Acid
Synthesis
System, and the sequence information provided herein.
The present disclosure encompasses nucleic acid analogs that contain modified,
synthetic, or non-naturally occurring nucleotides or structural elements or
other
alternative/modified nucleic acid chemistries known in the art. Such nucleic
acid analogs are
useful, for example, as detection reagents (e.g., primers/probes) for
detecting one or more SNPs
identified in Table 1 and/or Table 2. Furthermore, kits/systems (such as
beads, arrays, etc.) that
include these analogs are also encompassed by the present invention. For
example, PNA
oligomers that are based on the polymorphic sequences disclosed herein are
specifically
contemplated. PNA oligomers are analogs of DNA in which the phosphate backbone
is
replaced with a peptide-like backbone (Lagriffoul et al., Bioorganic &
Medicinal Chemistry
Letters, 4: 1081-1082 (1994), Petersen et al., Bioorganic & Medicinal
Chemistry Letters, 6:
793-796 (1996), Kumar et al., Organic Letters 3(9): 1269-1272 (2001),
WO 96/04000). PNA
hybridizes to complementary RNA or DNA with higher affinity and specificity
than
conventional oligonucleotides and oligonucleotide analogs. The properties of
PNA enable
novel molecular biology and biochemistry applications unachievable with
traditional
oligonucleotides and peptides.
Additional examples of nucleic acid modifications that improve the binding
properties
and/or stability of a nucleic acid include the use of base analogs such as
inosine, intercalators
(U.S. Patent No. 4,835,263) and the minor groove binders (U.S. Patent No.
5,801,115). Thus,
references herein to nucleic acid molecules, SNP-containing nucleic acid
molecules, SNP
detection reagents (e.g., probes and primers), and
oligonucleotides/polynucleotides include PNA
oligomers and other nucleic acid analogs. Other examples of nucleic acid
analogs and
alternative/modified nucleic acid chemistries known in the art are described
in Current
Protocols in Nucleic Acid Chemistry, John Wiley & Sons, N.Y. (2002).
The present disclosure is further of nucleic acid molecules that encode
fragments of the
variant polypeptides disclosed herein as well as nucleic acid molecules that
encode obvious
variants of such variant polypeptides. Such nucleic acid molecules may be
naturally occurring,
such as paralogs (different locus) and orthologs (different organism), or may
be constructed by
recombinant DNA methods or by chemical synthesis. Non-naturally occurring
variants may be
made by mutagenesis techniques, including those applied to nucleic acid
molecules, cells, or
organisms. Accordingly, the variants can contain nucleotide substitutions,
deletions, inversions
and insertions (in addition to the SNPs disclosed in Tables 1-2). Variation
can occur in either
or both the coding and non-coding regions. The variations can produce
conservative and/or
non-conservative amino acid substitutions.
Further variants of the nucleic acid molecules disclosed in Tables 1-2, such
as naturally
occurring allelic variants (as well as orthologs and paralogs) and synthetic
variants produced by
mutagenesis techniques, can be identified and/or produced using methods well
known in the
art. Such further variants can comprise a nucleotide sequence that shares at
least 70-80%, 80-
85%, 85-90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity
with a
nucleic acid sequence disclosed in Table 1 and/or Table 2 (or a fragment
thereof) and that
includes a novel SNP allele disclosed in Table 1 and/or Table 2. Further,
variants can comprise
a nucleotide sequence that encodes a polypeptide that shares at least 70-80%,
80-85%, 85-90%,
91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity with a
polypeptide
sequence disclosed in Table 1 (or a fragment thereof) and that includes a
novel SNP allele
disclosed in Table 1 and/or Table 2. Thus, an aspect of the present disclosure
that is
specifically contemplated is isolated nucleic acid molecules that have a
certain degree of
sequence variation compared with the sequences shown in Tables 1-2, but that
contain a novel
SNP allele disclosed herein. In other words, as long as an isolated nucleic
acid molecule
contains a novel SNP allele disclosed herein, other portions of the nucleic
acid molecule that
flank the novel SNP allele can vary to some degree from the specific
transcript, genomic, and
context sequences shown in Tables 1-2, and can encode a polypeptide that
varies to some
degree from the specific polypeptide sequences shown in Table 1.
To determine the percent identity of two amino acid sequences or two
nucleotide
sequences of two molecules that share sequence homology, the sequences are
aligned for
optimal comparison purposes (e.g., gaps can be introduced in one or both of a
first and a second
amino acid or nucleic acid sequence for optimal alignment and non-homologous
sequences can
be disregarded for comparison purposes). In a preferred embodiment, at least
30%, 40%, 50%,
60%, 70%, 80%, or 90% or more of the length of a reference sequence is aligned
for
comparison purposes. The amino acid residues or nucleotides at corresponding
amino acid
positions or nucleotide positions are then compared. When a position in the
first sequence is
occupied by the same amino acid residue or nucleotide as the corresponding
position in the
second sequence, then the molecules are identical at that position (as used
herein, amino acid or
nucleic acid "identity" is equivalent to amino acid or nucleic acid
"homology"). The percent
identity between the two sequences is a function of the number of identical
positions shared by
the sequences, taking into account the number of gaps, and the length of each
gap, which need
to be introduced for optimal alignment of the two sequences.
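The comparison described above can be sketched in Python for two sequences that have already been aligned. The convention used here (identical columns divided by all alignment columns, so that each gap position lowers the result) is one common choice and is an assumption of this sketch; the text does not fix a single formula. The function name and the "-" gap symbol are likewise illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical columns / total columns * 100, where
    columns that are gaps in both sequences are ignored and every other
    gap column counts as a non-identical position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # uninformative column, skip it
        columns += 1
        if x == y:  # both residues present and the same
            identical += 1
    if columns == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / columns


if __name__ == "__main__":
    # One gap column out of five lowers identity to 80%.
    print(percent_identity("AC-GT", "ACTGT"))
```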
The comparison of sequences and determination of percent identity between two
sequences can be accomplished using a mathematical algorithm. (Computational
Molecular
Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988;
Biocomputing: Informatics
and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993;
Computer Analysis of
Sequence Data, Part 1, Griffin, A.M., and Griffin, H.G., eds., Humana Press,
New Jersey, 1994;
Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987;
and Sequence
Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New
York, 1991). In a
preferred embodiment, the percent identity between two amino acid sequences is
determined
using the Needleman and Wunsch algorithm (J. Mol. Biol. (48):444-453 (1970)),
which has
been incorporated into the GAP program in the GCG software package, using
either a BLOSUM
62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4
and a length
weight of 1, 2, 3, 4, 5, or 6.
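For readers unfamiliar with global alignment, the following Python sketch implements a simplified Needleman and Wunsch style dynamic program. It is not the GAP program: it assumes a flat match/mismatch score instead of a substitution matrix, and a single linear gap penalty instead of separate gap and length weights. All names and default values are illustrative assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Global alignment of `a` and `b`; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The aligned strings returned here can then be compared column by column to obtain a percent identity under whatever convention is chosen.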
In yet another preferred embodiment, the percent identity between two
nucleotide
sequences is determined using the GAP program in the GCG software package
(Devereux, J., et
al., Nucleic Acids Res. 12(1):387 (1984)), using a NWSgapdna.CMP matrix and a
gap weight of
40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. In another
embodiment, the
percent identity between two amino acid or nucleotide sequences is determined
using the
algorithm of E. Myers and W. Miller (CABIOS, 4:11-17 (1989)) which has been
incorporated
into the ALIGN program (version 2.0), using a PAM120 weight residue table, a
gap length
penalty of 12, and a gap penalty of 4.
The nucleotide and amino acid sequences disclosed herein can further be used
as a
"query sequence" to perform a search against sequence databases to, for
example, identify other
family members or related sequences. Such searches can be performed using the
NBLAST and
XBLAST programs (version 2.0) of Altschul, et al. (J. Mol. Biol. 215:403-10
(1990)). BLAST
nucleotide searches can be performed with the NBLAST program, score = 100,
wordlength =
12 to obtain nucleotide sequences homologous to the nucleic acid molecules
disclosed herein.
BLAST protein searches can be performed with the XBLAST program, score = 50,
wordlength
= 3 to obtain amino acid sequences homologous to the proteins of the
invention. To obtain
gapped alignments for comparison purposes, Gapped BLAST can be utilized as
described in
Altschul et al. (Nucleic Acids Res. 25(17):3389-3402 (1997)). When utilizing
BLAST and
gapped BLAST programs, the default parameters of the respective programs
(e.g., XBLAST
and NBLAST) can be used. In addition to BLAST, examples of other search and
sequence
comparison programs used in the art include, but are not limited to, FASTA
(Pearson, Methods
Mol. Biol. 25, 365-389 (1994)) and KERR (Dufresne et al., Nat Biotechnol 2002
Dec;20(12):1269-71). For further information regarding bioinformatics
techniques, see
Current Protocols in Bioinformatics, John Wiley & Sons, Inc., N.Y.
The present disclosure is further of non-coding fragments of the nucleic acid
molecules
disclosed in Table 1 and/or Table 2. Preferred non-coding fragments include,
but are not
limited to, promoter sequences, enhancer sequences, intronic sequences, 5'
untranslated regions
(UTRs), 3' untranslated regions, gene modulating sequences and gene
termination sequences.
Such fragments are useful, for example, in controlling heterologous gene
expression and in
developing screens to identify gene-modulating agents.
SNP Detection Reagents
In a specific aspect of the present disclosure, the SNPs disclosed in Table 1
and/or Table 2,
and their associated transcript sequences (provided in Table 1 as SEQ ID NOS:1-
14), genomic
sequences (provided in Table 2 as SEQ ID NOS:43-50), and context sequences
(transcript-based
context sequences are provided in Table 1 as SEQ ID NOS:29-42; genomic-based
context
sequences are provided in Table 2 as SEQ ID NOS:51-58), can be used for the
design of SNP
detection reagents. As used herein, a "SNP detection reagent" is a reagent
that specifically detects
a specific target SNP position disclosed herein, and that is preferably
specific for a particular
nucleotide (allele) of the target SNP position (i.e., the detection reagent
preferably can differentiate
between different alternative nucleotides at a target SNP position, thereby
allowing the identity of
the nucleotide present at the target SNP position to be determined).
Typically, such detection
reagent hybridizes to a target SNP-containing nucleic acid molecule by
complementary base-
pairing in a sequence specific manner, and discriminates the target variant
sequence from other
nucleic acid sequences such as an art-known form in a test sample. An example
of a detection
reagent is a probe that hybridizes to a target nucleic acid containing one or
more of the SNPs
provided in Table 1 and/or Table 2. In a preferred embodiment, such a probe
can differentiate
between nucleic acids having a particular nucleotide (allele) at a target SNP
position from other
nucleic acids that have a different nucleotide at the same target SNP
position. In addition, a
detection reagent may hybridize to a specific region 5' and/or 3' to a SNP
position, particularly a
region corresponding to the context sequences (transcript-based context
sequences are provided in Table 1 as SEQ ID NOS:29-42;
genomic-based context sequences are provided in Table 2 as SEQ ID NOS:51-58).
Another
example of a detection reagent is a primer which acts as an initiation point
of nucleotide extension
along a complementary strand of a target polynucleotide. The SNP sequence
information
provided herein is also useful for designing primers, e.g. allele-specific
primers, to amplify (e.g.,
using PCR) any SNP of the present invention.
In one preferred embodiment, a SNP detection reagent is an isolated or
synthetic DNA
or RNA polynucleotide probe or primer or PNA oligomer, or a combination of
DNA, RNA
and/or PNA, that hybridizes to a segment of a target nucleic acid molecule
containing a SNP
identified in Table 1 and/or Table 2. A detection reagent in the form of a
polynucleotide may
optionally contain modified base analogs, intercalators or minor groove
binders. Multiple
detection reagents such as probes may be, for example, affixed to a solid
support (e.g., arrays or
beads) or supplied in solution (e.g., probe/primer sets for enzymatic
reactions such as PCR, RT-
PCR, TaqMan assays, or primer-extension reactions) to form a SNP detection
kit.
A probe or primer typically is a substantially purified oligonucleotide or PNA
oligomer.
Such oligonucleotide typically comprises a region of complementary nucleotide
sequence that
hybridizes under stringent conditions to at least about 8, 10, 12, 16, 18, 20,
22, 25, 30, 40, 50, 55,
60, 65, 70, 80, 90, 100, 120 (or any other number in-between) or more
consecutive nucleotides in
a target nucleic acid molecule. Depending on the particular assay, the
consecutive nucleotides can
either include the target SNP position, or be a specific region in close
enough proximity 5' and/or
3' to the SNP position to carry out the desired assay.
Other preferred primer and probe sequences can readily be determined using the
transcript sequences (SEQ ID NOS:1-14), genomic sequences (SEQ ID NOS:43-50),
and SNP
context sequences (transcript-based context sequences are provided in Table 1
as SEQ ID NOS:
29-42; genomic-based context sequences are provided in Table 2 as SEQ ID
NOS:51-58)
disclosed in the Sequence Listing and in Tables 1-2. It will be apparent to
one of skill in the art
that such primers and probes are directly useful as reagents for genotyping
the SNPs disclosed
herein, and can be incorporated into any kit/system format.
In order to produce a probe or primer specific for a target SNP-containing
sequence, the
gene/transcript and/or context sequence surrounding the SNP of interest is
typically examined
using a computer algorithm which starts at the 5' or at the 3' end of the
nucleotide sequence.
Typical algorithms will then identify oligomers of defined length that are
unique to the
gene/SNP context sequence, have a GC content within a range suitable for
hybridization, lack
predicted secondary structure that may interfere with hybridization, and/or
possess other
desired characteristics or that lack other undesired characteristics.
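A minimal Python sketch of such a scan is shown below. It slides a fixed-length window from the 5' end of a context sequence, keeps windows that cover the SNP, fall within a GC-content range, and occur only once in the context sequence. The function names, the 0-based SNP index, and the 40-60% GC defaults are assumptions made for illustration; secondary-structure prediction and genome-wide uniqueness checks are deliberately left out.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq`."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_oligos(context: str, snp_index: int, length: int,
                     gc_min: float = 0.40, gc_max: float = 0.60):
    """Return (start, oligo) pairs covering the 0-based `snp_index`.

    A window is kept if its GC fraction lies in [gc_min, gc_max] and it
    occurs exactly once in `context` (overlapping repeats included).
    """
    context = context.upper()
    if not 0 <= snp_index < len(context):
        raise ValueError("snp_index must lie within the context sequence")
    results = []
    first = max(0, snp_index - length + 1)
    last = min(snp_index, len(context) - length)
    for start in range(first, last + 1):
        oligo = context[start:start + length]
        if not gc_min <= gc_fraction(oligo) <= gc_max:
            continue
        # Uniqueness: no second occurrence after the first one found.
        if context.find(oligo, context.find(oligo) + 1) != -1:
            continue
        results.append((start, oligo))
    return results


if __name__ == "__main__":
    # Toy context sequence; the SNP is assumed to sit at index 5.
    print(candidate_oligos("AAAAGCGCTTTT", snp_index=5, length=4))
```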
A primer or probe of the present invention is typically at least about 8
nucleotides in
length. In one embodiment, a primer or a probe is at least about 10
nucleotides in length. In a
preferred embodiment, a primer or a probe is at least about 12 nucleotides in
length. In a more
preferred embodiment, a primer or probe is at least about 16, 17, 18, 19, 20,
21, 22, 23, 24 or
25 nucleotides in length. While the maximal length of a probe can be as long
as the target
sequence to be detected, depending on the type of assay in which it is
employed, it is typically
less than about 50, 60, 65, or 70 nucleotides in length. In the case of a
primer, it is typically
less than about 30 nucleotides in length. In a specific preferred embodiment,
a primer or a
probe is within the length of about 18 and about 28 nucleotides. However, in
other
embodiments, such as nucleic acid arrays and other embodiments in which probes
are affixed
to a substrate, the probes can be longer, such as on the order of 30-70, 75,
80, 90, 100, or more
nucleotides in length (see the section below entitled "SNP Detection Kits and
Systems").
For analyzing SNPs, it may be appropriate to use oligonucleotides specific for
alternative
SNP alleles. Such oligonucleotides which detect single nucleotide variations
in target sequences
may be referred to by such terms as "allele-specific oligonucleotides",
"allele-specific probes", or
"allele-specific primers". The design and use of allele-specific probes for
analyzing
polymorphisms is described in, e.g., Mutation Detection A Practical Approach,
ed. Cotton et
al., Oxford University Press, 1998; Saiki et al., Nature 324, 163-166 (1986);
Dattagupta,
EP235,726; and Saiki, WO 89/11548.
While the design of each allele-specific primer or probe depends on variables
such as the precise composition of the nucleotide sequences flanking a
SNP position in a
target nucleic acid molecule, and the length of the primer or probe, another
factor in the
use of primers and probes is the stringency of the condition under which the
hybridization
between the probe or primer and the target sequence is performed. Higher
stringency
conditions utilize buffers with lower ionic strength and/or a higher reaction
temperature,
and tend to require a more perfect match between probe/primer and a target
sequence in
order to form a stable duplex. If the stringency is too high, however,
hybridization may
not occur at all. In contrast, lower stringency conditions utilize buffers
with higher ionic
strength and/or a lower reaction temperature, and permit the formation of
stable duplexes
with more mismatched bases between a probe/primer and a target sequence. By
way of
example and not limitation, exemplary conditions for high stringency
hybridization
conditions using an allele-specific probe are as follows: Prehybridization
with a solution
containing 5X standard saline phosphate EDTA (SSPE), 0.5% NaDodSO4 (SDS) at
55°C,
and incubating probe with target nucleic acid molecules in the same solution
at the same
temperature, followed by washing with a solution containing 2X SSPE, and
0.1% SDS at
55°C or room temperature.
Moderate stringency hybridization conditions may be used for allele-specific
primer extension reactions with a solution containing, e.g., about 50 mM
KCl at about
46°C. Alternatively, the reaction may be carried out at an elevated
temperature such as
60°C. In another embodiment, a moderately stringent hybridization condition
suitable for
oligonucleotide ligation assay (OLA) reactions wherein two probes are
ligated if they are
completely complementary to the target sequence may utilize a solution of
about 100 mM
KCl at a temperature of 46°C.
In a hybridization-based assay, allele-specific probes can be designed that
hybridize to a segment of target DNA from one individual but do not
hybridize to the
corresponding segment from another individual due to the presence of different
polymorphic forms (e.g., alternative SNP alleles/nucleotides) in the
respective DNA
segments from the two individuals. Hybridization conditions should be
sufficiently
stringent that there is a significant detectable difference in hybridization
intensity
between alleles, and preferably an essentially binary response, whereby a
probe
hybridizes to only one of the alleles or significantly more strongly to one
allele. While a
probe may be designed to hybridize to a target sequence that contains a SNP
site such
that the SNP site aligns anywhere along the sequence of the probe, the probe
is preferably
designed to hybridize to a segment of the target sequence such that the SNP
site aligns
with a central position of the probe (e.g., a position within the probe that
is at least three
nucleotides from either end of the probe). This design of probe generally
achieves good
discrimination in hybridization between different allelic forms.
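By way of illustration only, the central-position design described above can be sketched in Python. The function name, its parameters and the example sequences are hypothetical and are not taken from the present disclosure; the sketch simply places each allele between equal-length flanks so that the SNP sits at the center of every probe and at least three nucleotides from either end.

```python
def allele_specific_probes(upstream, downstream, alleles, flank=10):
    """Build one probe per allele with the SNP at the central position.

    upstream / downstream: target sequence 5' and 3' of the SNP, written on
    the same strand (hypothetical inputs). Each probe is written on that
    strand too, so it would hybridize to the complementary strand.
    flank: nucleotides kept on each side of the SNP; must be at least 3 so
    the SNP is at least three nucleotides from either probe end.
    """
    if flank < 3:
        raise ValueError("SNP must be at least three nucleotides from each end")
    if len(upstream) < flank or len(downstream) < flank:
        raise ValueError("not enough flanking sequence for the requested flank")
    left = upstream[-flank:]      # bases immediately 5' of the SNP
    right = downstream[:flank]    # bases immediately 3' of the SNP
    return {allele: left + allele + right for allele in alleles}


# Illustrative (made-up) flanking sequences and alleles.
probes = allele_specific_probes("ACGTACGTACGT", "TTGCATGCATGC", ("A", "G"), flank=5)
```

The resulting probes are identical except at the central position, which is the property relied upon for discrimination between allelic forms.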
In another embodiment, a probe or primer may be designed to hybridize to a
segment of target DNA such that the SNP aligns with either the 5' most end or
the 3'
most end of the probe or primer. In a specific preferred embodiment
which is particularly
suitable for use in an oligonucleotide ligation assay (U.S. Patent No.
4,988,617), the
3'-most nucleotide of the probe aligns with the SNP position in the target
sequence.
Oligonucleotide probes and primers may be prepared by methods well known in
the art. Chemical synthetic methods include, but are not limited to, the
phosphotriester
method described by Narang et al., 1979, Methods in Enzymology 68:90;
the
phosphodiester method described by Brown et al., 1979, Methods in Enzymology
68:109; the diethylphosphoramidate method described by Beaucage et al., 1981,
Tetrahedron Letters 22:1859; and the solid support method described in U.S.
Patent No.
4,458,066.
Allele-specific probes are often used in pairs (or, less commonly,
in sets of 3 or 4,
such as if a SNP position is known to have 3 or 4 alleles, respectively, or to
assay both
strands of a nucleic acid molecule for a target SNP allele), and such pairs
may be identical
except for a one nucleotide mismatch that represents the allelic variants at
the SNP position.
Commonly, one member of a pair perfectly matches a reference form of a target
sequence
that has a more common SNP allele (i.e., the allele that is more
frequent in the target
population) and the other member of the pair perfectly matches a form of the
target
sequence that has a less common SNP allele (i.e., the allele that is rarer in
the target
population). In the case of an array, multiple pairs of probes can be
immobilized on the
same support for simultaneous analysis of multiple different polymorphisms.
In one type of PCR-based assay, an allele-specific primer hybridizes
to a region
on a target nucleic acid molecule that overlaps a SNP position and only primes
39

CA 02887830 2015-08-28
CA 2887830
amplification of an allelic form to which the primer exhibits perfect
complementarity (Gibbs,
1989, Nucleic Acids Res. 17:2427-2448). Typically, the primer's 3'-most
nucleotide is aligned
with and complementary to the SNP position of the target nucleic acid
molecule. This primer
is used in conjunction with a second primer that hybridizes at a distal site.
Amplification
proceeds from the two primers, producing a detectable product that indicates
which allelic form
is present in the test sample. A control is usually performed with a second
pair of primers, one
of which shows a single base mismatch at the polymorphic site and the other of
which exhibits
perfect complementarity to a distal site. The single-base mismatch prevents
amplification or
substantially reduces amplification efficiency, so that either no detectable
product is formed or
it is formed in lower amounts or at a slower pace. The method generally works
most
effectively when the mismatch is at the 3'-most position of the
oligonucleotide (i.e., the 3'-most
position of the oligonucleotide aligns with the target SNP position) because
this position is
most destabilizing to elongation from the primer (see, e.g., WO 93/22456).
This PCR-based
assay can be utilized as part of the TaqMan assay, described below.
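As a non-limiting sketch of the allele-specific primer arrangement described above, the following Python places each SNP allele at the 3'-most position of a primer. The function names, the default primer length and the example sequence are hypothetical. The second function is a deliberately simplified model that assumes a 3'-terminal mismatch abolishes amplification, whereas in practice a mismatch may only reduce amplification efficiency.

```python
def allele_specific_primers(upstream, alleles, length=20):
    """Return one primer per allele, with the allele as the 3'-most base.

    upstream: sequence immediately 5' of the SNP, written on the same strand
    as the alleles (hypothetical input). Each primer is written 5'->3' on
    that strand and would anneal to the complementary strand.
    """
    if length < 2:
        raise ValueError("primer must contain at least one base besides the SNP")
    if len(upstream) < length - 1:
        raise ValueError("not enough upstream sequence for the requested length")
    body = upstream[len(upstream) - (length - 1):]
    return {allele: body + allele for allele in alleles}


def expected_products(primers, sample_alleles):
    """Alleles whose primer is expected to amplify (simplified model).

    Assumes a primer yields product only if its 3'-most base matches an
    allele present in the sample.
    """
    return sorted(a for a, p in primers.items() if p[-1] in sample_alleles)


# Illustrative (made-up) upstream sequence and alleles.
primers = allele_specific_primers("GATTACAGATTACA", ("C", "T"), length=8)
```

Under this model a homozygous sample yields product from one primer of the pair, and a heterozygous sample yields product from both.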
In a specific embodiment, a primer described herein contains a sequence
substantially
complementary to a segment of a target SNP-containing nucleic acid molecule
except that the
primer has a mismatched nucleotide in one of the three nucleotide positions at
the 3'-most end of
the primer, such that the mismatched nucleotide does not base pair with a
particular allele at the
SNP site. In a preferred embodiment, the mismatched nucleotide in the primer
is the second from
the last nucleotide at the 3'-most position of the primer. In a more preferred
embodiment, the
mismatched nucleotide in the primer is the last nucleotide at the 3'-most
position of the primer.
In another embodiment, a SNP detection reagent of the invention is labeled
with a
fluorogenic reporter dye that emits a detectable signal. While the preferred
reporter dye is a
fluorescent dye, any reporter dye that can be attached to a detection reagent
such as an
oligonucleotide probe or primer is suitable for use in the invention. Such
dyes include, but are not
limited to, Acridine, AMCA, BODIPY, Cascade Blue, Cy2, Cy3, Cy5, Cy7, Dabcyl,
Edans,
Eosin, Erythrosin, Fluorescein, 6-Fam, Tet, Joe, Hex, Oregon Green, Rhodamine,
Rhodol Green,
Tamra, Rox, and Texas Red.
In yet another embodiment, the detection reagent may be further labeled with a
quencher
dye such as Tamra, especially when the reagent is used as a self-quenching
probe such as a
TaqMan (U.S. Patent Nos. 5,210,015 and 5,538,848) or Molecular Beacon probe
(U.S. Patent
Nos. 5,118,801 and 5,312,728), or other stemless or linear beacon probe (Livak
et al., 1995, PCR
Methods Appl. 4:357-362; Tyagi et al., 1996, Nature Biotechnology 14:303-308;
Nazarenko et
al., 1997, Nucl. Acids Res. 25:2516-2521; U.S. Patent Nos. 5,866,336 and
6,117,635).
The detection reagents described herein may also contain other labels,
including but not
limited to, biotin for streptavidin binding, hapten for antibody binding, and
oligonucleotide for
binding to another complementary oligonucleotide such as pairs of zipcodes.
The present disclosure also contemplates reagents that do not contain (or that
are
complementary to) a SNP nucleotide identified herein but that are used to
assay one or more
SNPs disclosed herein. For example, primers that flank, but do not hybridize
directly to a
target SNP position provided herein are useful in primer extension reactions
in which the
primers hybridize to a region adjacent to the target SNP position (i.e.,
within one or more
nucleotides from the target SNP site). During the primer extension reaction, a
primer is
typically not able to extend past a target SNP site if a particular nucleotide
(allele) is present at
that target SNP site, and the primer extension product can be detected in
order to determine
which SNP allele is present at the target SNP site. For example, particular
ddNTPs are
typically used in the primer extension reaction to terminate primer extension
once a ddNTP is
incorporated into the extension product (a primer extension product which
includes a ddNTP at
the 3'-most end of the primer extension product, and in which the ddNTP is a
nucleotide of a
SNP disclosed herein, is a composition that is specifically contemplated by
the present
invention). Thus, reagents that bind to a nucleic acid molecule in a region
adjacent to a SNP site
and that are used for assaying the SNP site, even though the bound sequences
do not necessarily
include the SNP site itself, are also contemplated.
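For illustration, the single-nucleotide primer extension principle described above can be sketched as follows. This is a minimal model under stated assumptions: the primer is perfectly complementary to the template, its 3' end anneals immediately adjacent to the SNP, and exactly one ddNTP is incorporated. The names `single_base_extension` and `reverse_complement` and the example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def single_base_extension(template, primer):
    """Return the base of the ddNTP expected to be incorporated, or None.

    template: strand to which the primer anneals, written 5'->3'.
    primer: written 5'->3'; assumed perfectly complementary to the template.
    The primer's 3' end pairs with the 5'-most base of its annealing site, so
    the next template base copied is the one just 5' of that site.
    """
    site = reverse_complement(primer)
    pos = template.find(site)
    if pos < 1:
        return None  # no annealing site, or no template base left to copy
    return COMPLEMENT[template[pos - 1]]


# Illustrative (made-up) template: 5' flank, SNP base, primer annealing site.
annealing_site = "GATTACAGG"
primer = reverse_complement(annealing_site)
template_allele_a = "CCTGA" + "A" + annealing_site
template_allele_g = "CCTGA" + "G" + annealing_site
```

In this sketch the identity of the incorporated ddNTP reports the allele on the template strand: a template "A" at the SNP site gives incorporation of "T", and a template "G" gives incorporation of "C".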
SNP Detection Kits and Systems
A person skilled in the art will recognize that, based on the SNP and
associated
sequence information disclosed herein, detection reagents can be developed and
used to assay
any SNP of the present invention individually or in combination, and such
detection reagents
can be readily incorporated into one of the established kit or system formats
which are well
known in the art. The terms "kits" and "systems", as used herein in the
context of SNP
detection reagents, are intended to refer to such things as combinations of
multiple SNP
detection reagents, or one or more SNP detection reagents in combination with
one or more
other types of elements or components (e.g., other types of biochemical
reagents, containers,
packages such as packaging intended for commercial sale, substrates to which
SNP detection
reagents are attached, electronic hardware components, etc.). Accordingly, the
present
disclosure is further of SNP detection kits and systems, including but not
limited to, packaged
probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays of
nucleic acid
molecules, and beads that contain one or more probes, primers, or other
detection reagents for
detecting one or more SNPs of the present invention. The kits/systems can
optionally include
various electronic hardware components; for example, arrays ("DNA chips") and
micro-fluidic
systems ("lab-on-a-chip" systems) provided by various manufacturers typically
comprise
hardware components. Other kits/systems (e.g., probe/primer sets) may not
include electronic
hardware components, but may be comprised of, for example, one or more SNP
detection
reagents (along with, optionally, other biochemical reagents) packaged in one
or more
containers.
In some embodiments, a SNP detection kit typically contains one or more
detection
reagents and other components (e.g., a buffer, enzymes such as DNA polymerases
or ligases,
chain extension nucleotides such as deoxynucleotide triphosphates, and in the
case of Sanger-
type DNA sequencing reactions, chain terminating nucleotides, positive control
sequences,
negative control sequences, and the like) necessary to carry out an assay or
reaction, such as
amplification and/or detection of a SNP-containing nucleic acid molecule. A
kit may further
contain means for determining the amount of a target nucleic acid, and means
for comparing
the amount with a standard, and can comprise instructions for using the kit to
detect the SNP-
containing nucleic acid molecule of interest. In one embodiment, kits are
provided which
contain the necessary reagents to carry out one or more assays to detect one
or more SNPs
disclosed herein. In a preferred embodiment, SNP detection kits/systems are in
the form of
nucleic acid arrays, or compartmentalized kits, including microfluidic/lab-on-
a-chip systems.
SNP detection kits/systems may contain, for example, one or more probes, or
pairs of
probes, that hybridize to a nucleic acid molecule at or near each target SNP
position. Multiple
pairs of allele-specific probes may be included in the kit/system to
simultaneously assay large
numbers of SNPs, at least one of which is a SNP of the present invention. In
some
kits/systems, the allele-specific probes are immobilized to a substrate such
as an array or bead.
For example, the same substrate can comprise allele-specific probes for
detecting at least 1; 10;
100; 1000; 10,000; 100,000 (or any other number in-between) or substantially
all of the SNPs
shown in Table 1 and/or Table 2.
The terms "arrays", "microarrays", and "DNA chips" are used herein
interchangeably to
refer to an array of distinct polynucleotides affixed to a substrate, such as
glass, plastic, paper,
nylon or other type of membrane, filter, chip, or any other suitable solid
support. The
polynucleotides can be synthesized directly on the substrate, or synthesized
separate from the
substrate and then affixed to the substrate. In one embodiment, the microarray
is prepared and
used according to the methods described in U.S. Patent No. 5,837,832, Chee et
al., PCT
application WO95/11995 (Chee et al.), Lockhart, D. J. et al. (1996; Nat.
Biotech. 14: 1675-
1680) and Schena, M. et al. (1996; Proc. Natl. Acad. Sci. 93: 10614-10619). In
other
embodiments, such arrays are produced by the methods described by Brown et
al., U.S. Patent
No. 5,807,522.
Nucleic acid arrays are reviewed in the following references: Zammatteo et
al., "New
chips for molecular biology and diagnostics", Biotechnol Annu Rev. 2002;8:85-
101; Sosnowski
et al., "Active microelectronic
array system for DNA hybridization, genotyping and pharmacogenomic
applications", Psychiatr Genet. 2002 Dec;12(4):181-92; Heller, "DNA
microarray technology: devices, systems, and applications", Annu Rev Biomed
Eng. 2002;4:129-53. Epub 2002 Mar 22; Kolchinsky et al., "Analysis of SNPs
and other genomic variations using gel-based chips", Hum Mutat. 2002
Apr;19(4):343-60; and McGall et al., "High-density genechip oligonucleotide
probe arrays", Adv Biochem Eng Biotechnol. 2002;77:21-42.
Any number of probes, such as allele-specific probes, may be
implemented in an array, and each probe or pair of probes can hybridize to a
different SNP position. In the case of polynucleotide probes, they can be
synthesized at designated areas (or synthesized separately and then affixed to
designated areas) on a substrate using a light-directed chemical process. Each
DNA chip can contain, for example, thousands to millions of individual
synthetic polynucleotide probes arranged in a grid-like pattern and
miniaturized (e.g., to the size of a dime). Preferably, probes are attached to
a
solid support in an ordered, addressable array.
A microarray can be composed of a large number of unique, single-stranded
polynucleotides, usually either synthetic antisense polynucleotides or
fragments of
cDNAs, fixed to a solid support. Typical polynucleotides are preferably about
6-60
nucleotides in length, more preferably about 15-30 nucleotides in length, and
most
preferably about 18-25 nucleotides in length. For certain types of microarrays
or other
detection kits/systems, it may be preferable to use oligonucleotides that are
only about 7-
20 nucleotides in length. In other types of arrays, such as arrays used in
conjunction with
chemiluminescent detection technology, preferred probe lengths can be, for
example,
about 15-80 nucleotides in length, preferably about 50-70 nucleotides in
length, more
preferably about 55-65 nucleotides in length, and most preferably about 60
nucleotides in
length. The microarray or detection kit can contain polynucleotides that cover
the known
5' or 3' sequence of a gene/transcript or target SNP site, sequential
polynucleotides that
cover the full-length sequence of a gene/transcript; or unique polynucleotides
selected
from particular areas along the length of a target gene/transcript sequence,
particularly
areas corresponding to one or more SNPs disclosed in Table 1 and/or Table 2.
Polynucleotides
used in the microarray or detection kit can be specific to a SNP or SNPs of
interest (e.g.,
specific to a particular SNP allele at a target SNP site, or specific to
particular SNP alleles at
multiple different SNP sites), or specific to a polymorphic gene/transcript or
genes/transcripts
of interest.
Hybridization assays based on polynucleotide arrays rely on the differences in
hybridization stability of the probes to perfectly matched and mismatched
target sequence
variants. For SNP genotyping, it is generally preferable that stringency
conditions used in
hybridization assays are high enough such that nucleic acid molecules that
differ from one another
at as little as a single SNP position can be differentiated (e.g., typical SNP
hybridization assays are
designed so that hybridization will occur only if one particular nucleotide is
present at a SNP
position, but will not occur if an alternative nucleotide is present at that
SNP position). Such high
stringency conditions may be preferable when using, for example, nucleic acid
arrays of allele-
specific probes for SNP detection. Such high stringency conditions are
described in the preceding
section, and are well known to those skilled in the art and can be found in,
for example, Current
Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6.
In other embodiments, the arrays are used in conjunction with chemiluminescent
detection technology. The following patents and patent applications provide
additional
information pertaining to chemiluminescent detection: U.S. patent applications
10/620332 and
10/620333 describe chemiluminescent approaches for microarray detection; U.S.
Patent Nos.
6124478, 6107024, 5994073, 5981768, 5871938, 5843681, 5800999, and 5773628
describe
methods and compositions of dioxetane for performing chemiluminescent
detection; and U.S.
published application US2002/0110828 discloses methods and compositions for
microarray
controls.
In one embodiment, a nucleic acid array can comprise an array of probes of
about 15-25
nucleotides in length. In further embodiments, a nucleic acid array can
comprise any number
of probes, in which at least one probe is capable of detecting one or more
SNPs disclosed in
Table 1 and/or Table 2, and/or at least one probe comprises a fragment of one
of the sequences
selected from the group consisting of those disclosed in Table 1, Table 2, the
Sequence Listing,
and sequences complementary thereto, said fragment comprising at least about 8
consecutive

nucleotides, preferably 10, 12, 15, 16, 18, 20, more preferably 22, 25, 30,
40, 47, 50, 55, 60, 65,
70, 80, 90, 100, or more consecutive nucleotides (or any other number in-
between) and
containing (or being complementary to) a novel SNP allele disclosed in Table 1
and/or Table 2.
In some embodiments, the nucleotide complementary to the SNP site is within 5,
4, 3, 2, or 1
nucleotide from the center of the probe, more preferably at the center of said
probe.
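The positional preference stated above (the nucleotide complementary to the SNP site lying within 5, 4, 3, 2, or 1 nucleotides of the probe center) can be checked with a short sketch such as the following. The function name and its zero-based indexing convention are assumptions made for illustration only. For probes of even length the center is taken to lie between the two middle nucleotides, so the comparison is carried out on doubled values to stay in integer arithmetic.

```python
def snp_near_center(probe_length, snp_index, max_offset=5):
    """True if the SNP-aligned base lies within max_offset of the probe center.

    probe_length: number of nucleotides in the probe.
    snp_index: zero-based position of the SNP-aligned base within the probe.
    The center of an n-mer is (n - 1) / 2; both sides of the comparison are
    doubled to avoid fractions.
    """
    if not 0 <= snp_index < probe_length:
        raise ValueError("snp_index must fall inside the probe")
    doubled_offset = abs(2 * snp_index - (probe_length - 1))
    return doubled_offset <= 2 * max_offset


# A 25-mer with the SNP-aligned base at index 12 is exactly centered.
centered = snp_near_center(25, 12, max_offset=0)
```

Setting `max_offset=0` corresponds to the more preferred case in which the SNP-aligned nucleotide is at the center of the probe.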
A polynucleotide probe can be synthesized on the surface of the substrate by
using a
chemical coupling procedure and an ink jet application apparatus, as described
in PCT application
WO95/251116 (Baldeschweiler et al.). In another aspect, a "gridded" array
analogous to a dot (or
slot) blot may be used to arrange and link cDNA fragments or oligonucleotides
to the surface of a
substrate using a vacuum system, thermal, UV, mechanical or chemical bonding
procedures. An
array, such as those described above, may be produced by hand or by using
available devices (slot
blot or dot blot apparatus), materials (any suitable solid support), and
machines (including robotic
instruments), and may contain 8, 24, 96, 384, 1536, 6144 or more
polynucleotides, or any other
number which lends itself to the efficient use of commercially available
instrumentation.
Using such arrays or other kits/systems, the present disclosure is of methods
of identifying
the SNPs disclosed herein in a test sample. Such methods typically involve
incubating a test
sample of nucleic acids with an array comprising one or more probes
corresponding to at least one
SNP position of the present invention, and assaying for binding of a nucleic
acid from the test
sample with one or more of the probes. Conditions for incubating a SNP
detection reagent (or a
kit/system that employs one or more such SNP detection reagents) with a test
sample vary.
Incubation conditions depend on such factors as the format employed in the
assay, the detection
methods employed, and the type and nature of the detection reagents used in
the assay. One
skilled in the art will recognize that any one of the commonly available
hybridization,
amplification and array assay formats can readily be adapted to detect the
SNPs disclosed herein.
A SNP detection kit/system disclosed herein may include components that are
used to
prepare nucleic acids from a test sample for the subsequent amplification
and/or detection of a
SNP-containing nucleic acid molecule. Such sample preparation components can
be used to
produce nucleic acid extracts (including DNA and/or RNA), proteins or membrane
extracts
from any bodily fluids (such as blood, serum, plasma, urine, saliva, phlegm,
gastric juices,
semen, tears, sweat, etc.), skin, hair, cells (especially nucleated cells),
biopsies, buccal swabs or
46

CA 02887830 2016-10-14
CA 2887830
tissue specimens. The test samples used in the above-described methods will
vary
based on such factors as the assay format, nature of the detection method, and
the specific
tissues, cells or extracts used as the test sample to be assayed. Methods of
preparing nucleic
acids, proteins, and cell extracts are well known in the art and can be
readily adapted to obtain a
sample that is compatible with the system utilized. Automated sample
preparation systems for
extracting nucleic acids from a test sample are commercially available, and
examples are
Qiagen's BioRobot 9600, Applied Biosystems™ PRISM™ 6700 sample preparation
system,
and Roche™ Molecular Systems' COBAS AmpliPrep System.
Another form of kit contemplated by the present disclosure is a
compartmentalized kit. A
compartmentalized kit includes any kit in which reagents are contained in
separate containers.
Such containers include, for example, small glass containers, plastic
containers, strips of
plastic, glass or paper, or arraying material such as silica. Such containers
allow one to
efficiently transfer reagents from one compartment to another compartment such
that the test
samples and reagents are not cross-contaminated, or from one container to
another vessel not
included in the kit, and the agents or solutions of each container can be
added in a quantitative
fashion from one compartment to another or to another vessel. Such containers
may include,
for example, one or more containers which will accept the test sample, one or
more containers
which contain at least one probe or other SNP detection reagent for detecting
one or more SNPs
of the present invention, one or more containers which contain wash reagents
(such as
phosphate buffered saline, Tris-buffers, etc.), and one or more containers
which contain the
reagents used to reveal the presence of the bound probe or other SNP detection
reagents. The
kit can optionally further comprise compartments and/or reagents for, for
example, nucleic acid
amplification or other enzymatic reactions such as primer extension reactions,
hybridization,
ligation, electrophoresis (preferably capillary electrophoresis), mass
spectrometry, and/or laser-
induced fluorescent detection. The kit may also include instructions for using
the kit. Exemplary
compartmentalized kits include microfluidic devices known in the art (see,
e.g., Weigl et al., "Lab-
on-a-chip for drug development", Adv Drug Deliv Rev. 2003 Feb 24;55(3):349-
77). In such
microfluidic devices, the containers may be referred to as, for example,
microfluidic
"compartments", "chambers", or "channels".
Microfluidic devices, which may also be referred to as "lab-on-a-chip"
systems,
biomedical micro-electro-mechanical systems (bioMEMs), or multicomponent
integrated
systems, are exemplary kits/systems of the present invention for analyzing
SNPs. Such
systems miniaturize and compartmentalize processes such as probe/target
hybridization, nucleic
acid amplification, and capillary electrophoresis reactions in a single
functional device. Such
microfluidic devices typically utilize detection reagents in at least one
aspect of the system, and
such detection reagents may be used to detect one or more SNPs disclosed
herein. One
example of a microfluidic system is disclosed in U.S. Patent No. 5,589,136,
which describes
the integration of PCR amplification and capillary electrophoresis in chips.
Exemplary
microfluidic systems comprise a pattern of microchannels designed onto a
glass, silicon, quartz,
or plastic wafer included on a microchip. The movements of the samples may be
controlled by
electric, electroosmotic or hydrostatic forces applied across different areas
of the microchip to
create functional microscopic valves and pumps with no moving parts. Varying
the voltage can
be used as a means to control the liquid flow at intersections between the
micro-machined
channels and to change the liquid flow rate for pumping across different
sections of the
microchip. See, for example, U.S. Patent Nos. 6,153,073, Dubrow et al., and
6,156,181, Parce
et al.
For genotyping SNPs, an exemplary microfluidic system may integrate, for
example,
nucleic acid amplification, primer extension, capillary electrophoresis, and a
detection method
such as laser induced fluorescence detection. In a first step of an exemplary
process for using
such an exemplary system, nucleic acid samples are amplified, preferably by
PCR. Then, the
amplification products are subjected to automated primer extension reactions
using ddNTPs
(specific fluorescence for each ddNTP) and the appropriate oligonucleotide
primers to carry out
primer extension reactions which hybridize just upstream of the targeted SNP.
Once the
extension at the 3' end is completed, the primers are separated from the
unincorporated
fluorescent ddNTPs by capillary electrophoresis. The separation medium used in
capillary
electrophoresis can be, for example, polyacrylamide, polyethyleneglycol or
dextran. The
incorporated ddNTPs in the single nucleotide primer extension products are
identified by laser-
induced fluorescence detection. Such an exemplary microchip can be used to
process, for
example, at least 96 to 384 samples, or more, in parallel.
USES OF NUCLEIC ACID MOLECULES
The nucleic acid molecules described herein have a variety of uses, especially
in the
diagnosis and treatment of liver fibrosis and related pathologies. For
example, the nucleic acid
molecules are useful as hybridization probes, such as for genotyping SNPs in
messenger RNA,
transcript, cDNA, genomic DNA, amplified DNA or other nucleic acid molecules,
and for
isolating full-length cDNA and genomic clones encoding the variant peptides
disclosed in Table 1
as well as their orthologs.
A probe can hybridize to any nucleotide sequence along the entire length of a
nucleic acid
molecule provided in Table 1 and/or Table 2. Preferably, a probe as described
herein hybridizes to
a region of a target sequence that encompasses a SNP position indicated in
Table 1 and/or Table 2.
More preferably, a probe hybridizes to a SNP-containing target sequence in a
sequence-specific
manner such that it distinguishes the target sequence from other nucleotide
sequences which vary
from the target sequence only by which nucleotide is present at the SNP site.
Such a probe is
particularly useful for detecting the presence of a SNP-containing nucleic
acid in a test sample, or
for determining which nucleotide (allele) is present at a particular SNP site
(i.e., genotyping the
SNP site).
A nucleic acid hybridization probe may be used for determining the presence,
level,
form, and/or distribution of nucleic acid expression. The nucleic acid whose
level is
determined can be DNA or RNA. Accordingly, probes specific for the SNPs
described herein
can be used to assess the presence, expression and/or gene copy number in a
given cell, tissue,
or organism. These uses are relevant for diagnosis of disorders involving an
increase or
decrease in gene expression relative to normal levels. In vitro techniques for
detection of
mRNA include, for example, Northern blot hybridizations and in situ
hybridizations. In vitro
techniques for detecting DNA include Southern blot hybridizations and in situ
hybridizations
(Sambrook and Russell, 2000, Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor
Press, Cold Spring Harbor, NY).
Probes can be used as part of a diagnostic test kit for identifying cells or
tissues in which a
variant protein is expressed, such as by measuring the level of a variant
protein-encoding nucleic
acid (e.g., mRNA) in a sample of cells from a subject or determining if a
polynucleotide contains a
SNP of interest.
Thus, the nucleic acid molecules described herein can be used as hybridization
probes
to detect the SNPs disclosed herein, thereby determining whether an individual
with the
polymorphisms is at risk for liver fibrosis and related pathologies or has
developed early stage
liver fibrosis. Detection of a SNP associated with a disease phenotype
provides a diagnostic
tool for an active disease and/or genetic predisposition to the disease.
Furthermore, the nucleic acid molecules described herein are therefore useful
for
detecting a gene (gene information is disclosed in Table 2, for example) which
contains a SNP
disclosed herein and/or products of such genes, such as expressed mRNA
transcript molecules
(transcript information is disclosed in Table 1, for example), and are thus
useful for detecting
gene expression. The nucleic acid molecules can optionally be implemented in,
for example,
an array or kit format for use in detecting gene expression.
The nucleic acid molecules as described herein may also be useful as primers
to amplify
any given region of a nucleic acid molecule, particularly a region containing
a SNP identified in
Table 1 and/or Table 2.
The nucleic acid molecules as described herein may also be useful for
constructing
recombinant vectors (described in greater detail below). Such vectors include
expression vectors
that express a portion of, or all of, any of the variant peptide sequences
provided in Table 1.
Vectors also include insertion vectors, used to integrate into another nucleic
acid molecule
sequence, such as into the cellular genome, to alter in situ expression of a
gene and/or gene
product. For example, an endogenous coding sequence can be replaced via
homologous
recombination with all or part of the coding region containing one or more
specifically introduced
SNPs.
The nucleic acid molecules as described herein may also be useful for
expressing
antigenic portions of the variant proteins, particularly antigenic portions
that contain a variant
amino acid sequence (e.g., an amino acid substitution) caused by a SNP
disclosed in Table 1
and/or Table 2.
The nucleic acid molecules as described herein may also be useful for
constructing vectors
containing a gene regulatory region of the nucleic acid molecules of the
present invention.

The nucleic acid molecules as described herein may also be useful for
designing
ribozymes corresponding to all, or a part, of an mRNA molecule expressed from
a SNP-containing
nucleic acid molecule described herein.
The nucleic acid molecules as described herein may also be useful for
constructing host
cells expressing a part, or all, of the nucleic acid molecules and variant
peptides.
The nucleic acid molecules as described herein may also be useful for
constructing
transgenic animals expressing all, or a part, of the nucleic acid molecules
and variant peptides.
The production of recombinant cells and transgenic animals having nucleic acid
molecules which
contain the SNPs disclosed in Table 1 and/or Table 2 allows, for example,
effective clinical design
of treatment compounds and dosage regimens.
The nucleic acid molecules as described herein may also be useful in assays
for drug
screening to identify compounds that, for example, modulate nucleic acid
expression.
The nucleic acid molecules described herein may also be useful in gene therapy
in
patients whose cells have aberrant gene expression. Thus, recombinant cells,
which include a
patient's cells that have been engineered ex vivo and returned to the patient,
can be introduced
into an individual where the recombinant cells produce the desired protein to
treat the
individual.
SNP Genotyping Methods
The process of determining which specific nucleotide (i.e., allele) is present
at each of one
or more SNP positions, such as a SNP position in a nucleic acid molecule
disclosed in Table 1
and/or Table 2, is referred to as SNP genotyping. The present disclosure is of
methods of SNP
genotyping, such as for use in screening for liver fibrosis or related
pathologies, or determining
predisposition thereto, or determining responsiveness to a form of treatment,
or in genome
mapping or SNP association analysis, etc.
Nucleic acid samples can be genotyped to determine which allele(s) is/are
present at
any given genetic region (e.g., SNP position) of interest by methods well
known in the art. The
neighboring sequence can be used to design SNP detection reagents such as
oligonucleotide
probes, which may optionally be implemented in a kit format. Exemplary SNP
genotyping
methods are described in Chen et al., "Single nucleotide polymorphism
genotyping: biochemistry,
protocol, cost and throughput", Pharmacogenomics J. 2003;3(2):77-96; Kwok
et al., "Detection of single nucleotide polymorphisms", Curr Issues Mol Biol.
2003 Apr;5(2):43-60; Shi, "Technologies for individual genotyping: detection of
genetic polymorphisms in drug targets and disease genes", Am J
Pharmacogenomics. 2002;2(3):197-205; and Kwok, "Methods for genotyping single
nucleotide polymorphisms", Annu Rev Genomics Hum Genet 2001;2:235-58.
Exemplary techniques for high-throughput SNP genotyping are described in
Marnellos, "High-
throughput SNP analysis for genetic association studies", Curr Opin Drug
Discov Devel. 2003
May;6(3):317-21. Common SNP genotyping methods include, but are not limited
to, TaqMan
assays, molecular beacon assays, nucleic acid arrays, allele-specific primer
extension, allele-
specific PCR, arrayed primer extension, homogeneous primer extension assays,
primer extension
with detection by mass spectrometry, pyrosequencing, multiplex primer
extension sorted on
genetic arrays, ligation with rolling circle amplification, homogeneous
ligation, OLA (U.S. Patent
No.
4,988,617), multiplex ligation reaction sorted on genetic arrays, restriction
fragment length polymorphism, single base extension-tag assays, and the
Invader assay. Such methods may
be used in combination with detection mechanisms such as, for example,
luminescence or
chemiluminescence detection, fluorescence detection, time-resolved
fluorescence detection,
fluorescence resonance energy transfer, fluorescence polarization, mass
spectrometry, and
electrical detection.
Various methods for detecting polymorphisms include, but are not limited to,
methods in which protection from cleavage agents is used to detect mismatched
bases in RNA/RNA or RNA/DNA duplexes (Myers et al., Science 230:1242 (1985);
Cotton et al., PNAS 85:4397 (1988); and Saleeba et al., Meth. Enzymol.
217:286-295 (1992)), comparison of the electrophoretic mobility of variant and
wild type nucleic acid molecules (Orita et al., PNAS 86:2766 (1989); Cotton et
al., Mutat. Res. 285:125-144 (1993); and Hayashi et al., Genet. Anal. Tech.
Appl. 9:73-79 (1992)), and assaying the movement of polymorphic or wild-type
fragments in polyacrylamide gels containing a gradient of denaturant using
denaturing gradient gel electrophoresis (DGGE) (Myers et al., Nature 313:495
(1985)). Sequence variations at specific locations can also be assessed by
nuclease protection assays such as RNase and S1 protection or chemical cleavage
methods.
In a preferred embodiment, SNP genotyping is performed using the
TaqMan assay, which is also known as the 5' nuclease assay (U.S. Patent
Nos. 5,210,015 and 5,538,848). The TaqMan assay detects the accumulation
of a specific amplified product during PCR. The TaqMan assay utilizes an
oligonucleotide probe labeled with a fluorescent reporter dye and a quencher
dye. When the reporter dye is excited by irradiation at an appropriate
wavelength, it transfers energy to the quencher dye in the same probe via a
process called fluorescence resonance energy transfer (FRET). When attached to
the probe, the excited reporter dye does not emit a signal. The proximity of
the quencher dye to the reporter dye in the intact probe maintains a reduced
fluorescence for the reporter. The reporter dye and quencher dye may be at
the 5' most and the 3' most ends, respectively, or vice versa. Alternatively,
the reporter dye may be at the 5' or 3' most end while the quencher dye is
attached to an internal nucleotide, or vice versa. In yet another embodiment,
both the reporter and the quencher may be attached to internal nucleotides at
a distance from
each other such that fluorescence of the reporter is reduced.
During PCR, the 5' nuclease activity of DNA polymerase cleaves the probe,
thereby
separating the reporter dye and the quencher dye and resulting in increased
fluorescence of the
reporter. Accumulation of PCR product is detected directly by monitoring the
increase in
fluorescence of the reporter dye. The DNA polymerase cleaves the probe between
the reporter
dye and the quencher dye only if the probe hybridizes to the target SNP-
containing template
which is amplified during PCR, and the probe is designed to hybridize to the
target SNP site
only if a particular SNP allele is present.
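The allele-specific read-out described above can be illustrated with a minimal sketch. It assumes a common two-probe arrangement that the text does not spell out: one probe per allele, each carrying a different reporter dye, with the end-point fluorescence increase of each dye compared against a cut-off. The function name `call_genotype`, the normalized signal scale, and the threshold of 1.0 are all hypothetical; real instruments typically cluster signals rather than apply a fixed cut-off.

```python
def call_genotype(allele1_signal, allele2_signal, threshold=1.0):
    """Classify a sample from the fluorescence increase of two reporter dyes.

    Each signal is the (hypothetical, normalized) increase in reporter
    fluorescence for the probe specific to that allele. A probe is cleaved,
    and its reporter released, only when its allele is present in the template.
    """
    allele1_present = allele1_signal >= threshold
    allele2_present = allele2_signal >= threshold
    if allele1_present and allele2_present:
        return "heterozygous"
    if allele1_present:
        return "homozygous allele 1"
    if allele2_present:
        return "homozygous allele 2"
    return "no call"  # neither probe was cleaved, e.g. failed amplification


if __name__ == "__main__":
    for signals in [(2.5, 0.1), (2.0, 2.2), (0.2, 3.1), (0.1, 0.2)]:
        print(signals, "->", call_genotype(*signals))
```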
Preferred TaqMan primer and probe sequences can readily be determined using
the SNP
and associated nucleic acid sequence information provided herein. A number of
computer
programs, such as Primer Express (Applied BiosystemsTM, Foster City, CA), can
be used to
rapidly obtain optimal primer/probe sets. It will be apparent to one of skill
in the art that such
primers and probes for detecting the SNPs disclosed herein are useful in
diagnostic assays for
liver fibrosis and related pathologies, and can be readily incorporated into a
kit format. The
present disclosure is also of modifications of the TaqMan assay well known in
the art such as
the use of Molecular Beacon probes (U.S. Patent Nos. 5,118,801 and 5,312,728)
and other
variant formats (U.S. Patent Nos. 5,866,336 and 6,117,635).
Another preferred method for genotyping the SNPs disclosed herein is the use
of two
oligonucleotide probes in an OLA (see, e.g., U.S. Patent No. 4,988,617). In
this method, one
probe hybridizes to a segment of a target nucleic acid with its 3' most end
aligned with the SNP
site. A second probe hybridizes to an adjacent segment of the target nucleic
acid molecule
directly 3' to the first probe. The two juxtaposed probes hybridize to the
target nucleic acid
molecule, and are ligated in the presence of a linking agent such as a ligase
if there is perfect
complementarity between the 3' most nucleotide of the first probe with the SNP
site. If there is
a mismatch, ligation would not occur. After the reaction, the ligated probes
are separated from
the target nucleic acid molecule, and detected as indicators of the presence
of a SNP.
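The ligation rule in the OLA paragraph can be sketched as a simple string check. This is an illustration only: to avoid strand-orientation bookkeeping, the sample is written in the same sense as the probes (that is, as the sequence the probes would match exactly), and the function name `ligates`, the example sequence, and the probe sequences are hypothetical.

```python
def ligates(sample_sense, snp_pos, allele_probe, common_probe):
    """Return True if the two probes would be joined on this sample.

    sample_sense : sample sequence written in the same sense as the probes
    snp_pos      : 0-based index of the SNP in sample_sense
    allele_probe : first probe; its 3'-most (last) base sits on the SNP
    common_probe : second probe; hybridizes directly 3' of the first probe
    """
    start = snp_pos - len(allele_probe) + 1
    if start < 0:
        return False
    body_matches = sample_sense[start:snp_pos] == allele_probe[:-1]
    # Ligation requires perfect complementarity at the 3'-most nucleotide.
    three_prime_matches = sample_sense[snp_pos] == allele_probe[-1]
    end = snp_pos + 1 + len(common_probe)
    common_matches = sample_sense[snp_pos + 1:end] == common_probe
    return body_matches and three_prime_matches and common_matches


if __name__ == "__main__":
    sample = "ACGTTGCAAGGCT"          # SNP at index 6 (allele C in this sample)
    print(ligates(sample, 6, "GTTGC", "AAGG"))  # probe for allele C
    print(ligates(sample, 6, "GTTGT", "AAGG"))  # probe for allele T
```

Running one allele-specific first probe per allele against the same sample distinguishes the alleles: only the probe whose 3' base matches the SNP yields a ligated product.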
The following patents, patent applications, and published international patent
applications provide additional information pertaining to techniques for
carrying out various
types of OLA: U.S. Patent Nos. 6027889, 6268148, 5494810, 5830711, and 6054564
describe
OLA strategies for performing SNP detection; WO 97/31256 and WO 00/56927
describe OLA
strategies for performing SNP detection using universal arrays, wherein a
zipcode sequence can
be introduced into one of the hybridization probes, and the resulting product,
or amplified
product, hybridized to a universal zip code array; U.S. application US01/17329
(and
09/584,905) describes OLA (or LDR) followed by PCR, wherein zipcodes are
incorporated into
OLA probes, and amplified PCR products are determined by electrophoretic or
universal
zipcode array readout; U.S. applications 60/427818, 60/445636, and 60/445494
describe
SNPlex methods and software for multiplexed SNP detection using OLA followed
by PCR,
wherein zipcodes are incorporated into OLA probes, and amplified PCR products
are
hybridized with a zipchute reagent, and the identity of the SNP determined
from electrophoretic
readout of the zipchute. In some embodiments, OLA is carried out prior to PCR
(or another
method of nucleic acid amplification). In other embodiments, PCR (or another
method of
nucleic acid amplification) is carried out prior to OLA.
Another method for SNP genotyping is based on mass spectrometry. Mass
spectrometry takes advantage of the unique mass of each of the four
nucleotides of DNA.
SNPs can be unambiguously genotyped by mass spectrometry by measuring the
differences in
the mass of nucleic acids having alternative SNP alleles. MALDI-TOF (Matrix
Assisted Laser Desorption Ionization - Time of Flight) mass spectrometry
technology is preferred for extremely precise determinations of molecular
mass, such as SNPs. Numerous
approaches to
SNP analysis have been developed based on mass spectrometry. Preferred mass
spectrometry-
based methods of SNP genotyping include primer extension assays, which can
also be utilized
in combination with other approaches, such as traditional gel-based formats
and microarrays.

Typically, the primer extension assay involves designing and annealing a primer
to a template PCR amplicon upstream (5') from a target SNP position. A mix of
dideoxynucleotide triphosphates (ddNTPs) and/or deoxynucleotide triphosphates
(dNTPs) is added to a reaction mixture containing template (e.g., a
SNP-containing nucleic acid molecule which has typically been amplified, such
as by PCR), primer, and DNA polymerase. Extension of the primer terminates at
the first position in the template where a nucleotide complementary to one of
the ddNTPs in the mix occurs. The primer can be either immediately adjacent
(i.e., the nucleotide at the 3' end of the primer hybridizes to the nucleotide
next to the target SNP site) or two or more nucleotides removed from the SNP
position. If the primer is several nucleotides removed from the target SNP
position, the only limitation is that the template sequence between the 3' end
of the primer and the SNP position cannot contain a nucleotide of the same type
as the one to be detected, or this will cause premature termination of the
extension primer. Alternatively, if all four ddNTPs alone, with no dNTPs, are
added to the reaction mixture, the primer will always be extended by only one
nucleotide, corresponding to the target SNP position. In this instance, primers
are designed to bind one nucleotide upstream from the SNP position (i.e., the
nucleotide at the 3' end of the primer hybridizes to the nucleotide that is
immediately adjacent to the target SNP site on the 5' side of the target SNP
site). Extension by only one nucleotide is preferable, as it minimizes the
overall mass of the extended primer, thereby increasing the resolution of mass
differences between alternative SNP nucleotides. Furthermore, mass-tagged
ddNTPs can be employed in the primer extension reactions in place of unmodified
ddNTPs. This increases the mass difference between primers extended with these
ddNTPs, thereby providing increased sensitivity and accuracy, and is
particularly useful for typing heterozygous base positions. Mass-tagging also
alleviates the need for intensive sample-preparation procedures and decreases
the necessary resolving power of the mass spectrometer.
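The termination rule for primer extension can be illustrated with a short sketch. It is a simplification under stated assumptions: the input `bases_to_add` is the sequence the polymerase would synthesize starting right after the primer's 3' end (i.e., already the complement of the template), every base not supplied as a ddNTP is assumed to be available as a dNTP, and the function name and example sequences are hypothetical.

```python
def extend_primer(bases_to_add, ddntps):
    """Return the bases actually added to the primer.

    bases_to_add : bases the polymerase would incorporate, in order, starting
                   immediately 3' of the primer (complement of the template)
    ddntps       : set of bases supplied as chain-terminating ddNTPs
    """
    added = []
    for base in bases_to_add:
        added.append(base)
        if base in ddntps:
            break  # a dideoxynucleotide was incorporated: extension stops
    return "".join(added)


if __name__ == "__main__":
    # Primer two bases upstream of a G/C SNP; ddG and ddC terminate.
    print(extend_primer("ATGTT", {"G", "C"}))   # stops at the G allele
    print(extend_primer("ATCTT", {"G", "C"}))   # stops at the C allele
    # An intervening G between primer and SNP causes premature termination.
    print(extend_primer("GTCAA", {"G", "C"}))
    # All four ddNTPs and no dNTPs: always a single-base extension.
    print(extend_primer("ATGTT", set("ACGT")))
```

The third call shows why the intervening sequence must not contain a base of the type being detected, and the fourth shows why single-base extension requires the primer to end directly next to the SNP.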
The extended primers can then be purified and analyzed by MALDI-TOF mass
spectrometry to determine the identity of the nucleotide present at the target
SNP
position. In one method of analysis, the products from the primer extension
reaction are
combined with light absorbing crystals that form a matrix. The matrix is then
hit with an
energy source such as a laser to ionize and desorb the nucleic acid molecules
into the gas-
phase. The ionized molecules are then ejected into a flight tube and
accelerated down the tube
towards a detector. The time between the ionization event, such as a laser
pulse, and collision
of the molecule with the detector is the time of flight of that molecule. The
time of flight is
precisely correlated with the mass-to-charge ratio (m/z) of the ionized
molecule. Ions with
smaller m/z travel down the tube faster than ions with larger m/z and
therefore the lighter ions
reach the detector before the heavier ions. The time-of-flight is then
converted into a
corresponding, and highly precise, m/z. In this manner, SNPs can be identified
based on the
slight differences in mass, and the corresponding time of flight differences,
inherent in nucleic
acid molecules having different nucleotides at a single base position. For
further information
regarding the use of primer extension assays in conjunction with MALDI-TOF
mass
spectrometry for SNP genotyping, see, e.g., Wise et al., "A standard protocol
for single
nucleotide primer extension in the human genome using matrix-assisted laser
desorption/ionization time-of-flight mass spectrometry", Rapid Commun Mass
Spectrom.
2003;17(11):1195-202.
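The relationship between time of flight and m/z can be made concrete with an idealized calculation. The sketch below assumes a simple linear flight tube in which an ion of charge z is accelerated through a potential U and then drifts a length L, so that t = L * sqrt(m / (2 z e U)); it ignores initial velocity spread, delayed extraction, and reflectron geometry. The tube length, accelerating voltage, and the two example masses are hypothetical values chosen only for illustration.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs
DALTON_KG = 1.66053906660e-27           # kilograms per dalton


def time_of_flight(mass_da, charge=1, tube_length_m=1.0, accel_voltage_v=20000.0):
    """Idealized drift time (seconds) of an ion in a linear TOF analyzer."""
    mass_kg = mass_da * DALTON_KG
    # Kinetic energy gained in the source: z * e * U = (1/2) * m * v**2
    velocity = math.sqrt(2.0 * charge * ELEMENTARY_CHARGE_C * accel_voltage_v / mass_kg)
    return tube_length_m / velocity


if __name__ == "__main__":
    # Two hypothetical extended primers differing by 9 Da at the SNP position.
    t_light = time_of_flight(6000.0)
    t_heavy = time_of_flight(6009.0)
    print(f"lighter ion: {t_light * 1e6:.3f} microseconds")
    print(f"heavier ion: {t_heavy * 1e6:.3f} microseconds")
    print(f"difference : {(t_heavy - t_light) * 1e9:.1f} nanoseconds")
```

Because the flight time scales with the square root of m/z, the lighter ion arrives first, and a small mass difference at a single base position maps to a small but measurable difference in arrival time.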
The following references provide further information describing mass
spectrometry-
based methods for SNP genotyping: Bocker, "SNP and mutation discovery using
base-specific
cleavage and MALDI-TOF mass spectrometry", Bioinformatics. 2003 Jul;19 Suppl
1:144-153;
Storm et al., "MALDI-TOF mass spectrometry-based SNP genotyping", Methods Mol
Biol. 2003;212:241-62; Jurinke et al., "The use of MassARRAY technology for
high throughput genotyping", Adv Biochem Eng Biotechnol. 2002;77:57-74; and
Jurinke et al., "Automated genotyping using the DNA MassArray technology",
Methods Mol Biol. 2002;187:179-92.
SNPs can also be scored by direct DNA sequencing. A variety of automated
sequencing procedures can be utilized (Biotechniques 19:448 (1995)), including
sequencing by mass spectrometry (see, e.g., PCT International Publication No.
WO 94/16101; Cohen et al., Adv. Chromatogr. 36:127-162 (1996); and Griffin et
al., Appl. Biochem. Biotechnol. 38:147-159 (1993)). The nucleic acid sequences
disclosed herein enable one of ordinary skill in the art to readily design
sequencing primers for such automated sequencing procedures. Commercial
instrumentation, such as the Applied BiosystemsTM 377, 3100, 3700, 3730, and
3730xl DNA Analyzers (Foster City, CA), is commonly used in the art for
automated sequencing.
Other methods that can be used to genotype the SNPs disclosed herein include
single-
strand conformational polymorphism (SSCP), and denaturing gradient gel
electrophoresis
(DGGE) (Myers et al., Nature 313:495 (1985)). SSCP identifies base differences
by alteration
in electrophoretic migration of single stranded PCR products, as described in
Orita et al., Proc.
Nat. Acad. Single-stranded PCR products can be generated by heating or
otherwise denaturing
double stranded PCR products. Single-stranded nucleic acids may refold or form
secondary
structures that are partially dependent on the base sequence. The different
electrophoretic
mobilities of single-stranded amplification products are related to base-
sequence differences at
SNP positions. DGGE differentiates SNP alleles based on the different sequence-
dependent
stabilities and melting properties inherent in polymorphic DNA and the
corresponding
differences in electrophoretic migration patterns in a denaturing gradient gel
(Erlich, ed., PCR
Technology, Principles and Applications for DNA Amplification, W.H. Freeman
and Co, New
York, 1992, Chapter 7).
Sequence-specific ribozymes (U.S. Patent No. 5,498,531) can also be used to
score
SNPs based on the development or loss of a ribozyme cleavage site. Perfectly
matched
sequences can be distinguished from mismatched sequences by nuclease cleavage
digestion
assays or by differences in melting temperature. If the SNP affects a
restriction enzyme
cleavage site, the SNP can be identified by alterations in restriction enzyme
digestion patterns,
and the corresponding changes in nucleic acid fragment lengths determined by
gel
electrophoresis.
SNP genotyping can include the steps of, for example, collecting a biological
sample
from a human subject (e.g., sample of tissues, cells, fluids, secretions,
etc.), isolating nucleic
acids (e.g., genomic DNA, mRNA or both) from the cells of the sample,
contacting the nucleic
acids with one or more primers which specifically hybridize to a region of the
isolated nucleic
acid containing a target SNP under conditions such that hybridization and
amplification of the
target nucleic acid region occurs, and determining the nucleotide present at
the SNP position of
interest, or, in some assays,
detecting the presence or absence of an amplification product (assays can be
designed so that hybridization and/or amplification will only occur if a
particular SNP allele is present or absent). In some assays, the size of the
amplification product is detected and compared to the length of a control
sample; for example, deletions and insertions can be detected by a change in
size of the amplified product compared to a normal genotype.
SNP genotyping is useful for numerous practical applications, as described
below. Examples of such applications include, but are not limited to,
SNP-disease association analysis, disease predisposition screening, disease
diagnosis, disease prognosis, disease progression monitoring, determining
therapeutic strategies based on an individual's genotype ("pharmacogenomics"),
developing therapeutic agents based on SNP genotypes associated with a disease
or likelihood of responding to a drug, stratifying a patient population for
clinical trial for a treatment regimen, predicting the likelihood that an
individual will experience toxic side effects from a therapeutic agent, and
human identification applications such as forensics.
Analysis of Genetic Association Between SNPs and Phenotypic Traits
SNP genotyping for disease diagnosis, disease predisposition screening, disease
prognosis, determining drug responsiveness (pharmacogenomics), drug toxicity
screening, and other uses described herein, typically relies on initially
establishing a genetic association between one or more specific SNPs and the
particular phenotypic traits of interest.
Different study designs may be used for genetic association studies (Modern
Epidemiology, Lippincott Williams & Wilkins (1998), 609-622). Observational
studies
are most frequently carried out in which the response of the patients is not
interfered
with. The first type of observational study identifies a sample of persons in
whom the
suspected cause of the disease is present and another sample of persons in
whom the
suspected cause is absent, and then the frequency of development of disease in
the two
samples is compared. These sampled populations are called cohorts, and the
study is a
prospective study. The other type of observational study is case-control or a
retrospective
study. In typical case-control studies, samples are collected from individuals
with the
phenotype of interest (cases) such as certain manifestations of a disease, and
from individuals without the phenotype (controls) in a population (target
population) that conclusions are to be drawn from. Then the possible causes of
the disease are investigated retrospectively. As the time and costs of
collecting samples in case-control studies are considerably less than those for
prospective studies, case-control studies are the more commonly used study
design in genetic association studies, at least during the exploration and
discovery stage.
In both types of observational studies, there may be potential confounding
factors that should be taken into consideration. Confounding factors are those
that are associated with both the real cause(s) of the disease and the disease
itself, and they include demographic information such as age, gender, ethnicity
as well as environmental factors. When confounding factors are not matched in
cases and controls in a study, and are not controlled properly, spurious
association results can arise. If potential confounding factors are identified,
they should be controlled for by analysis methods explained below.
In a genetic association study, the cause of interest to be tested is a
certain allele
or a SNP or a combination of alleles or a haplotype from several SNPs. Thus,
tissue
specimens (e.g., whole blood) from the sampled individuals may be collected and
genomic DNA genotyped for the SNP(s) of interest. In addition to the phenotypic
trait of interest, other information such as demographic (e.g., age, gender,
ethnicity, etc.), clinical, and environmental information that may influence
the outcome of the trait can be collected to further characterize and define
the sample set. In many cases, these factors are known to be associated with
diseases and/or SNP allele frequencies. There are likely gene-environment
and/or gene-gene interactions as well. Analysis methods to
address
gene-environment and gene-gene interactions (for example, the effects of the
presence of
both susceptibility alleles at two different genes can be greater than the
effects of the
individual alleles at two genes combined) are discussed below.
After all the relevant phenotypic and genotypic information has been obtained,
statistical analyses are carried out to determine if there is any significant
correlation
between the presence of an allele or a genotype with the phenotypic
characteristics of an
individual. Preferably, data inspection and cleaning are first performed
before carrying
out statistical tests for genetic association. Epidemiological and clinical
data of the

CA 02887830 2014-11-27
WO 2005/111241 PCT/US2005/016051
samples can be summarized by descriptive statistics with tables and graphs.
Data validation is preferably performed to check for data completion,
inconsistent entries, and outliers. Chi-squared tests and t-tests (Wilcoxon
rank-sum tests if distributions are not normal) may then be used to check for
significant differences between cases and controls for discrete and continuous
variables, respectively. To ensure genotyping quality, Hardy-Weinberg
disequilibrium tests can be performed on cases and controls separately.
Significant deviation from Hardy-Weinberg equilibrium (HWE) in both cases and
controls for individual markers can be indicative of genotyping errors. If HWE
is violated in a majority of markers, it is indicative of population
substructure that should be further investigated. Moreover, Hardy-Weinberg
disequilibrium in cases only can indicate genetic association of the markers
with the disease (Genetic Data Analysis, Weir B., Sinauer (1990)).
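As an illustration of the Hardy-Weinberg check described above, the following sketch computes a one-degree-of-freedom chi-squared goodness-of-fit statistic from genotype counts at a biallelic SNP. This is one common way to test HWE, not necessarily the test used by the authors; the function name and the example counts are hypothetical, and an exact test is usually preferable when expected counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-squared test (1 df) of Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb : observed counts of the three genotypes
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))   # counts in exact HWE proportions
    print(hwe_chi_square(50, 0, 50))    # no heterozygotes at all
```

Applying the function to cases and controls separately mirrors the procedure in the text: deviation in both groups points toward genotyping error, while deviation in cases only may reflect association.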
To test whether an allele of a single SNP is associated with the case or
control status of a phenotypic trait, one skilled in the art can compare allele
frequencies in cases and controls. Standard chi-squared tests and Fisher exact
tests can be carried out on a 2x2 table (2 SNP alleles x 2 outcomes in the
categorical trait of interest). To test whether genotypes of a SNP are
associated, chi-squared tests can be carried out on a 3x2 table (3 genotypes x
2 outcomes). Score tests are also carried out for genotypic association to
contrast the three genotypic frequencies (major homozygotes, heterozygotes and
minor homozygotes) in cases and controls, and to look for trends using 3
different modes of inheritance, namely dominant (with contrast coefficients 2,
-1, -1), additive (with contrast coefficients 1, 0, -1) and recessive (with
contrast coefficients 1, 1, -2). Odds ratios for minor versus major alleles,
and odds ratios for heterozygote and homozygote variants versus the wild type
genotypes are calculated with the desired confidence limits, usually 95%.
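The allelic 2x2 comparison and odds ratio can be sketched as follows. The example uses the Pearson chi-squared statistic without continuity correction and a Woolf (log-based) 95% confidence interval; both are standard choices but are assumptions here, since the text does not specify which variants were used. The function name and the allele counts are hypothetical.

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Chi-squared test and odds ratio for a 2x2 allele-by-status table.

    Returns (chi2, p_value, odds_ratio, ci_low, ci_high) with a 95% CI.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this sketch")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))       # chi-squared, 1 df
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return chi2, p_value, odds_ratio, ci_low, ci_high


if __name__ == "__main__":
    # 100 cases and 100 controls, i.e. 200 alleles in each group.
    chi2, p, oratio, lo, hi = allelic_association(60, 140, 40, 160)
    print(f"chi2={chi2:.3f} p={p:.4f} OR={oratio:.3f} 95% CI=({lo:.3f}, {hi:.3f})")
```

For sparse tables, a Fisher exact test, as mentioned in the text, would be the safer choice than the chi-squared approximation used here.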
In order to control for confounders and to test for interaction and effect
modifiers,
stratified analyses may be performed using stratified factors that are likely
to be
confounding, including demographic information such as age, ethnicity, and
gender, or
an interacting element or effect modifier, such as a known major gene (e.g.,
APOE for
Alzheimer's disease or HLA genes for autoimmune diseases), or environmental
factors
such as smoking in lung cancer. Stratified association tests may be carried
out using
Cochran-Mantel-Haenszel tests that take into account the ordinal nature of
genotypes with 0, 1, and 2 variant alleles. Exact tests by StatXact may also be
performed when computationally possible. Another way to adjust for confounding
effects and test for interactions is to perform stepwise multiple logistic
regression analysis using statistical packages such as SAS or R. Logistic
regression is a model-building technique in which the best fitting and most
parsimonious model is built to describe the relation between the dichotomous
outcome (for instance, getting a certain disease or not) and a set of
independent variables (for instance, genotypes of different associated genes,
and the
associated demographic and environmental factors). The most common model is
one in
which the logit transformation of the odds ratios is expressed as a linear
combination of
the variables (main effects) and their cross-product terms (interactions)
(Applied Logistic
Regression, Hosmer and Lemeshow, Wiley (2000)). To test whether a certain
variable or
interaction is significantly associated with the outcome, coefficients in the
model are first
estimated and then tested for statistical significance of their departure from
zero.
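The model described in this paragraph can be written out explicitly. In the sketch below the symbols are illustrative and not taken from the text: p is the probability of the outcome (e.g., disease), x_1 through x_k are the independent variables (genotype codes, demographic and environmental factors), the beta_i are main-effect coefficients and the beta_ij are interaction coefficients. The Wald statistic in the second line is one common way to test a coefficient's departure from zero; the text does not say which test was intended.

```latex
\operatorname{logit}(p) \;=\; \ln\frac{p}{1-p}
  \;=\; \beta_0 \;+\; \sum_{i=1}^{k} \beta_i\, x_i
  \;+\; \sum_{i<j} \beta_{ij}\, x_i x_j

H_0:\ \beta_i = 0, \qquad
z \;=\; \frac{\hat{\beta}_i}{\operatorname{SE}(\hat{\beta}_i)}
```

Under this parameterization, exp(beta_i) is the odds ratio associated with a one-unit increase in x_i when the other variables are held fixed and no interaction involving x_i is present.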
In addition to performing association tests one marker at a time, haplotype
association analysis may also be performed to study a number of markers that
are closely
linked together. Haplotype association tests can have better power than
genotypic or allelic association tests when the tested markers are not the
disease-causing mutations themselves but are in linkage disequilibrium with
such mutations. The test will even be more powerful if the disease is indeed
caused by a combination of alleles on a haplotype (e.g., APOE is a haplotype
formed by 2 SNPs that are very close to each other). In order to perform
haplotype association effectively, marker-marker linkage disequilibrium
measures, both D' and R2, are typically calculated for the markers within a
gene to elucidate the haplotype structure. Recent studies (Daly et al., Nature
Genetics, 29, 232-235, 2001) in linkage disequilibrium indicate that SNPs
within a gene are
organized in
block pattern, and a high degree of linkage disequilibrium exists within
blocks and very
little linkage disequilibrium exists between blocks. Haplotype association
with the
disease status can be performed using such blocks once they have been
elucidated.
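The two pairwise linkage disequilibrium measures mentioned above can be computed from haplotype and allele frequencies as in the following sketch. It uses the standard definitions (D as the deviation of the haplotype frequency from the product of allele frequencies, D' as |D| scaled by its maximum attainable value, and the squared correlation as D squared over the product of the four allele frequencies). The function name and the example frequencies are hypothetical, and in practice haplotype frequencies would first have to be estimated from unphased genotypes.

```python
def ld_measures(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic markers.

    p_ab : frequency of the haplotype carrying allele A (marker 1) and B (marker 2)
    p_a  : frequency of allele A at marker 1
    p_b  : frequency of allele B at marker 2
    Returns (D, D_prime, r_squared).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both markers must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


if __name__ == "__main__":
    print(ld_measures(0.50, 0.5, 0.5))   # complete LD, identical frequencies
    print(ld_measures(0.25, 0.5, 0.5))   # linkage equilibrium
    print(ld_measures(0.20, 0.5, 0.2))   # D' = 1 but squared correlation < 1
```

The third example shows why both measures are reported: D' reaches 1 whenever at least one of the four haplotypes is absent, whereas the squared correlation reaches 1 only when the two markers also have matching allele frequencies.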
Haplotype association tests can be carried out in a similar fashion as the
allelic
and genotypic association tests. Each haplotype in a gene is analogous to an
allele in a
multi-allelic marker. One skilled in the art can either compare the haplotype
frequencies
in cases and controls or test genetic association with different pairs of
haplotypes. It has been proposed (Schaid et al., Am. J. Hum. Genet., 70,
425-434, 2002) that score tests can be done on haplotypes using the program
"haplo.score". In that method, haplotypes are first inferred by EM algorithm
and score tests are carried out with a generalized linear model (GLM) framework
that allows the adjustment of other factors.
An important decision in the performance of genetic association tests is the
determination of the significance level at which significant association can be
declared when the p-value of the tests reaches that level. In an exploratory
analysis, where positive hits will be followed up in subsequent confirmatory
testing, an unadjusted p-value < 0.2 (a significance level on the lenient
side), for example, may be used for generating hypotheses for significant
association of a SNP with certain phenotypic characteristics of a disease. It
is preferred that a p-value < 0.05 (a significance level traditionally used in
the art) is achieved in order for a SNP to be considered to have an association
with a disease. It is more preferred that a p-value < 0.01 (a significance
level on the stringent side) is achieved for an association to be declared.
When hits are followed up in confirmatory analyses in more samples of the same
source or in different samples from different sources, adjustment for multiple
testing will be performed so as to avoid an excess number of hits while
maintaining the experiment-wise error rates at 0.05. While there are different
methods to adjust for multiple testing to control for different kinds of error
rates, a commonly used but rather conservative method is Bonferroni correction
to control the experiment-wise or family-wise error rate (Multiple comparisons
and multiple tests, Westfall et al, SAS Institute (1999)). Permutation tests to
control for the false discovery rates, FDR, can be more powerful (Benjamini and
Hochberg, Journal of the Royal Statistical Society, Series B 57, 1289-1300,
1995; Resampling-based Multiple Testing, Westfall and Young, Wiley (1993)).
Such methods to control for multiplicity would be preferred when the tests are
dependent and controlling for false discovery rates is sufficient as opposed to
controlling for the experiment-wise error rates.
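The difference between family-wise and false-discovery-rate control can be illustrated with a small sketch. It implements the Bonferroni correction and the Benjamini-Hochberg step-up procedure; note that the latter is the analytic procedure from the cited Benjamini and Hochberg paper, not the permutation-based approach the paragraph also mentions. The function names and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for p-values at or below alpha / m (family-wise error control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    p_vals = [0.001, 0.012, 0.025, 0.039, 0.6]   # five hypothetical SNP tests
    print("Bonferroni        :", bonferroni(p_vals))
    print("Benjamini-Hochberg:", benjamini_hochberg(p_vals))
```

On these illustrative values the Bonferroni threshold of 0.05/5 = 0.01 retains only the first test, whereas the step-up procedure retains the first four, which reflects the greater power (and weaker error guarantee) of FDR control.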
In replication studies using samples from different populations after
statistically
significant markers have been identified in the exploratory stage, meta-
analyses can then
be performed by combining evidence of different studies (Modern Epidemiology,
Lippincott Williams & Wilkins, 1998, 643-673). If available, association
results known in the art for the same SNPs can be included in the
meta-analyses.
Since both genotyping and disease status classification can involve errors,
sensitivity analyses may be performed to see how odds ratios and p-values would
change upon various estimates on genotyping and disease classification error
rates.
It has been well known that subpopulation-based sampling bias between cases and
controls can lead to spurious results in case-control association studies
(Ewens and Spielman, Am. J. Hum. Genet. 62, 450-458, 1995) when prevalence of
the disease is associated with different subpopulation groups. Such bias can
also lead to a loss of statistical power in genetic association studies. To
detect population stratification, Pritchard and Rosenberg (Pritchard et al. Am.
J. Hum. Genet. 1999, 65:220-228) suggested typing markers that are unlinked to
the disease and using results of association tests on those markers to
determine whether there is any population stratification. When stratification
is detected, the genomic control (GC) method as proposed by Devlin and Roeder
(Devlin et al. Biometrics 1999, 55:997-1004) can be used to adjust for the
inflation of test statistics due to population stratification. The GC method is
robust to changes in population structure levels as well as being applicable to
DNA pooling designs (Devlin et al. Genet. Epidem. 2001, 21:273-284).
While Pritchard's method recommended using 15-20 unlinked microsatellite
markers, it suggested using more than 30 biallelic markers to get enough power
to detect
population stratification. For the GC method, it has been shown (Bacanu et al.
Am. J.
Hum. Genet. 2000, 66:1933-1944) that about 60-70 biallelic markers are
sufficient to
estimate the inflation factor for the test statistics due to population
stratification. Hence,
70 intergenic SNPs can be chosen in unlinked regions as indicated in a genome
scan
(Kehoe et al. Hum. Mol. Genet. 1999, 8:237-245).
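The genomic control adjustment can be sketched as follows. The example uses the median-based estimator commonly associated with the GC method: the inflation factor is the median of the 1-df chi-squared statistics at the unlinked (null) markers divided by the theoretical median of that distribution (approximately 0.4549), and candidate statistics are divided by that factor. Flooring the factor at 1 is a common convention and an assumption here; the function name and the example statistics are hypothetical.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549   # approximate median of a chi-squared variate with 1 df


def genomic_control(null_marker_chi2, candidate_chi2):
    """Estimate the inflation factor from null markers and deflate candidates.

    null_marker_chi2 : 1-df chi-squared statistics at unlinked markers
    candidate_chi2   : 1-df chi-squared statistics at the SNPs of interest
    Returns (inflation_factor, adjusted_candidate_chi2).
    """
    inflation = statistics.median(null_marker_chi2) / CHI2_1DF_MEDIAN
    inflation = max(1.0, inflation)          # never inflate the statistics
    adjusted = [x / inflation for x in candidate_chi2]
    return inflation, adjusted


if __name__ == "__main__":
    null_stats = [0.10, 0.35, 0.9098, 1.80, 3.00]   # hypothetical unlinked SNPs
    lam, adjusted = genomic_control(null_stats, [8.0, 4.2])
    print(f"inflation factor: {lam:.2f}")
    print("adjusted statistics:", [round(x, 2) for x in adjusted])
```

With roughly 60-70 unlinked biallelic markers, as suggested in the text, the median of the null statistics gives a reasonably stable estimate of the inflation factor.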
Once individual risk factors, genetic or non-genetic, have been found for the
predisposition to disease, the next step is to set up a
classification/prediction scheme to
predict the category (for instance, disease or no-disease) that an individual
will be in depending
on his genotypes of associated SNPs and other non-genetic risk factors.
Logistic regression for
discrete trait and linear regression for continuous trait are standard
techniques for such tasks
(Applied Regression Analysis, Draper and Smith, Wiley (1998)). Moreover, other
techniques
can also be used for setting up classification. Such techniques include, but
are not limited to,
MART, CART, neural network, and discriminant analyses that are suitable for
use in
comparing the performance of different methods (The Elements of Statistical
Learning, Hastie,
Tibshirani & Friedman, Springer (2002)).
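A logistic regression classifier of the kind mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: each individual is coded by a hypothetical risk-allele count (0, 1, or 2) and one hypothetical binary non-genetic factor, the data are invented, and the model is fitted by plain batch gradient ascent rather than by any particular statistics package.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (intercept first) by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict_risk(w, x):
    """Predicted probability of the disease category for feature vector x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical data: [risk-allele count, non-genetic factor] -> disease (1) or not (0).
X = [[0, 0], [0, 0], [0, 1], [1, 0], [1, 0], [1, 1],
     [2, 0], [2, 1], [2, 1], [0, 1], [1, 1], [2, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
weights = fit_logistic(X, y)
high_risk = predict_risk(weights, [2, 1])
low_risk = predict_risk(weights, [0, 0])
```

For a continuous trait the same feature coding could be passed to a linear regression instead.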
Disease Diagnosis and Predisposition Screening
Information on association/correlation between genotypes and disease-related
phenotypes can be exploited in several ways. For example, in the case of a
highly statistically
significant association between one or more SNPs with predisposition to a
disease for which
treatment is available, detection of such a genotype pattern in an individual
may justify
immediate administration of treatment, or at least the institution of regular
monitoring of the
individual. Detection of the susceptibility alleles associated with serious
disease in a couple
contemplating having children may also be valuable to the couple in their
reproductive
decisions. In the case of a weaker but still statistically significant
association between a SNP
and a human disease, immediate therapeutic intervention or monitoring may not
be justified
after detecting the susceptibility allele or SNP. Nevertheless, the subject
can be motivated to
begin simple life-style changes (e.g., diet, exercise) that can be
accomplished at little or no cost
to the individual but would confer potential benefits in reducing the risk of
developing
conditions for which that individual may have an increased risk by virtue of
having the
susceptibility allele(s).
The SNPs disclosed herein may contribute to liver fibrosis and related
pathologies in an
individual in different ways. Some polymorphisms occur within a protein coding
sequence and
contribute to disease phenotype by affecting protein structure. Other
polymorphisms occur in
noncoding regions but may exert phenotypic effects indirectly via influence
on, for example,
replication, transcription, and/or translation. A single SNP

CA 02887830 2014-11-27
WO 2005/111241 PCT/US2005/016051
may affect more than one phenotypic trait. Likewise, a single phenotypic trait
may be
affected by multiple SNPs in different genes.
As used herein, the terms "diagnose", "diagnosis", and "diagnostics" include,
but are not limited to any of the following: detection of liver fibrosis that an
individual may presently have, predisposition/susceptibility screening (i.e.,
determining the increased risk of an individual in developing liver fibrosis in the
future, or determining whether an individual has a decreased risk of developing liver
fibrosis in the future, determining the rate of progression of fibrosis to bridging
fibrosis/cirrhosis), determining a particular type or subclass of liver fibrosis in an
individual known to have liver fibrosis, confirming or reinforcing a previously made
diagnosis of liver fibrosis, pharmacogenomic evaluation of an individual to determine
which therapeutic strategy that individual is most likely to positively respond to or to
predict whether a patient is likely to respond to a particular treatment, predicting
whether a patient is likely to experience toxic effects from a particular treatment or
therapeutic compound, and evaluating the future prognosis of an individual having liver
fibrosis. Such diagnostic uses are based on the SNPs individually
or in a unique combination or SNP haplotypes of the present invention.
Haplotypes are particularly useful in that, for example, fewer SNPs can be
genotyped to determine if a particular genomic region harbors a locus that
influences a
particular phenotype, such as in linkage disequilibrium-based SNP association
analysis.
Linkage disequilibrium (LD) refers to the co-inheritance of alleles (e.g.,
alternative nucleotides) at two or more different SNP sites at frequencies
greater than
would be expected from the separate frequencies of occurrence of each allele
in a given
population. The expected frequency of co-occurrence of two alleles that are
inherited
independently is the frequency of the first allele multiplied by the frequency
of the
second allele. Alleles that co-occur at expected frequencies are said to be in
"linkage
equilibrium". In contrast, LD refers to any non-random genetic association
between
allele(s) at two or more different SNP sites, which is generally due to the
physical
proximity of the two loci along a chromosome. LD can occur when two or more
SNP sites are in close physical proximity to each other on a given chromosome and
therefore
alleles at these SNP sites will tend to remain unseparated for multiple
generations with
the consequence that a particular nucleotide (allele) at one SNP site will
show a non-random association with a particular nucleotide (allele) at a different SNP
site located nearby. Hence, genotyping one of the SNP sites will give almost the same
information as
genotyping the other SNP site that is in LD.
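The definition above (observed co-occurrence compared with the product of the separate allele frequencies) corresponds to the standard LD coefficient D, from which the normalized measures D' and r^2 are usually derived. The following is a minimal sketch; the function name is hypothetical, and the inputs are assumed to be a haplotype frequency and two allele frequencies estimated from the same population.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for allele A at one SNP site and allele B at another.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    # Deviation of the observed co-occurrence from the linkage-equilibrium
    # expectation p_a * p_b.
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # D' keeps the sign of D here; many reports quote its absolute value.
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Linkage equilibrium: co-occurrence equals the product of the frequencies.
equilibrium = ld_stats(0.25, 0.5, 0.5)
# Complete LD: allele A is always found together with allele B.
complete = ld_stats(0.5, 0.5, 0.5)
```

An r^2 near 1 corresponds to the situation described above in which genotyping one SNP site gives almost the same information as genotyping the other.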
Various degrees of LD can be encountered between two or more SNPs with the
result being that some SNPs are more closely associated (i.e., in stronger LD)
than others.
Furthermore, the physical distance over which LD extends along a chromosome
differs between different regions of the genome, and therefore the degree of physical
separation between two or more SNP sites necessary for LD to occur can differ
between different regions of the genome.
For diagnostic purposes and similar uses, if a particular SNP site is found to be
useful for diagnosing liver fibrosis and related pathologies (e.g., has a significant
statistical association with the condition and/or is recognized as a causative
polymorphism for the condition), then the skilled artisan would recognize that other SNP
sites which are in LD with this SNP site would also be useful for diagnosing the
condition. Thus, polymorphisms (e.g., SNPs and/or haplotypes) that are not the
actual
disease-causing (causative) polymorphisms, but are in LD with such causative
polymorphisms, are also useful. In such instances, the genotype of the
polymorphism(s)
that is/are in LD with the causative polymorphism is predictive of the
genotype of the
causative polymorphism and, consequently, predictive of the phenotype (e.g.,
liver
fibrosis) that is influenced by the causative SNP(s). Therefore, polymorphic
markers that
are in LD with causative polymorphisms are useful as diagnostic markers, and
are
particularly useful when the actual causative polymorphism(s) is/are unknown.
Examples of polymorphisms that can be in LD with one or more causative
polymorphisms (and/or in LD with one or more polymorphisms that have a significant
statistical association with a condition) and therefore useful for diagnosing the same
condition that the causative/associated SNP(s) is used to diagnose, include, for example,
other SNPs in the same gene, protein-coding, or mRNA transcript-coding region as the
causative/associated SNP, other SNPs in the same exon or same intron as the
causative/associated SNP, other SNPs in the same haplotype block as the
causative/associated SNP, other SNPs in the same intergenic region as the
causative/associated SNP, SNPs that are outside but near a gene (e.g., within 6 kb on
either side, 5' or 3', of a gene boundary) that harbors a causative/associated
SNP, etc. Such
useful LD SNPs can be selected from among the SNPs disclosed in Tables 1-2,
for example.
Linkage disequilibrium in the human genome is reviewed in: Wall et al., "Haplotype
blocks and linkage disequilibrium in the human genome", Nat Rev Genet. 2003
Aug;4(8):587-97; Garner et al., "On selecting markers for association studies: patterns of
linkage disequilibrium between two and three diallelic loci", Genet Epidemiol. 2003
Jan;24(1):57-67; Ardlie et al., "Patterns of linkage disequilibrium in the human genome", Nat
Rev Genet. 2002 Apr;3(4):299-309 (erratum in Nat Rev Genet 2002 Jul;3(7):566); and Remm et al.,
"High-density genotyping and linkage disequilibrium in the human genome using
chromosome 22 as a model", Curr Opin Chem Biol. 2002 Feb;6(1):24-30.
The contribution or association of particular SNPs and/or SNP haplotypes with
disease
phenotypes, such as liver fibrosis, enables the SNPs disclosed herein to be
used to develop
superior diagnostic tests capable of identifying individuals who express a
detectable trait, such
as liver fibrosis, as the result of a specific genotype, or individuals whose
genotype places them
at an increased or decreased risk of developing a detectable trait at a
subsequent time as
compared to individuals who do not have that genotype. As described herein,
diagnostics may
be based on a single SNP or a group of SNPs. Combined detection of a plurality
of SNPs (for
example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
24, 25, 30, 32, 48, 50,
64, 96, 100, or any other number in-between, or more, of the SNPs provided in
Table 1 and/or
Table 2) typically increases the probability of an accurate diagnosis. For
example, the presence
of a single SNP known to correlate with liver fibrosis might indicate a
probability of 20% that
an individual has or is at risk of developing liver fibrosis, whereas
detection of five SNPs, each
of which correlates with liver fibrosis, might indicate a probability of 80%
that an individual
has or is at risk of developing liver fibrosis. To further increase the
accuracy of diagnosis or
predisposition screening, analysis of the SNPs disclosed herein can be
combined with that of
other polymorphisms or other risk factors of liver fibrosis, such as disease
symptoms,
pathological characteristics, family history, diet, environmental factors or
lifestyle factors.
It will, of course, be understood by practitioners skilled in the treatment or
diagnosis of
liver fibrosis that the present disclosure generally does not intend to
provide an absolute
identification of individuals who are at risk (or less at risk) of developing
liver fibrosis, and/or
pathologies related to liver fibrosis, but rather to indicate a certain
increased (or decreased)
degree or likelihood of developing the disease based on statistically
significant association
results. However, this information is extremely valuable as it can be used to,
for example,
initiate preventive treatments or to allow an individual carrying one or more
significant SNPs
or SNP haplotypes to foresee warning signs such as minor clinical symptoms, or
to have
regularly scheduled physical exams to monitor for appearance of a condition in
order to
identify and begin treatment of the condition at an early stage. Particularly
with diseases that
are extremely debilitating or fatal if not treated on time, the knowledge of a
potential
predisposition, even if this predisposition is not absolute, would likely
contribute in a very
significant manner to treatment efficacy.
The diagnostic techniques disclosed herein may employ a variety of
methodologies to
determine whether a test subject has a SNP or a SNP pattern associated with an
increased or
decreased risk of developing a detectable trait or whether the individual
suffers from a
detectable trait as a result of a particular polymorphism/mutation, including,
for example,
methods which enable the analysis of individual chromosomes for haplotyping,
family studies,
single sperm DNA analysis, or somatic hybrids. The trait analyzed using the
diagnostics of the
invention may be any detectable trait that is commonly observed in pathologies
and disorders
related to liver fibrosis.
Another aspect of the present disclosure relates to a method of determining
whether an
individual is at risk (or less at risk) of developing one or more traits or
whether an individual
expresses one or more traits as a consequence of possessing a particular trait-
causing or trait-
influencing allele. These methods generally involve obtaining a nucleic acid
sample from an
individual and assaying the nucleic acid sample to determine which
nucleotide(s) is/are present
at one or more SNP positions, wherein the assayed nucleotide(s) is/are
indicative of an
increased or decreased risk of developing the trait or indicative that the
individual expresses the
trait as a result of possessing a particular trait-causing or trait-
influencing allele.
In another embodiment, the SNP detection reagents described herein are used to
determine whether an individual has one or more SNP allele(s) affecting the
level (e.g., the
concentration of mRNA or protein in a sample, etc.) or pattern (e.g., the
kinetics of expression,
rate of decomposition, stability profile, Km, Vmax, etc.) of gene expression
(collectively, the
"gene response" of a cell or bodily fluid). Such a determination can be
accomplished by
screening for mRNA or protein expression (e.g., by using nucleic acid arrays,
RT-PCR,
TaqMan assays, or mass spectrometry), identifying genes having altered
expression in an
individual, genotyping SNPs disclosed in Table 1 and/or Table 2 that could
affect the
expression of the genes having altered expression (e.g., SNPs that are in
and/or around the
gene(s) having altered expression, SNPs in regulatory/control regions, SNPs in
and/or around
other genes that are involved in pathways that could affect the expression of
the gene(s) having
altered expression, or all SNPs could be genotyped), and correlating SNP
genotypes with
altered gene expression. In this manner, specific SNP alleles at particular
SNP sites can be
identified that affect gene expression.
Pharmacogenomics and Therapeutics/Drug Development
The present disclosure is also of methods for assessing the pharmacogenomics
of a
subject harboring particular SNP alleles or haplotypes to a particular
therapeutic agent or
pharmaceutical compound, or to a class of such compounds. Pharmacogenomics
deals with the
roles which clinically significant hereditary variations (e.g., SNPs) play in
the response to drugs
due to altered drug disposition and/or abnormal action in affected persons.
See, e.g., Roses,
Nature 405, 857-865 (2000); Gould Rothberg, Nature Biotechnology 19, 209-211
(2001);
Eichelbaum, Clin. Exp. Pharmacol. Physiol. 23(10-11):983-985 (1996); and Linder,
Clin. Chem.
43(2):254-266 (1997). The clinical outcomes of these variations can result in
severe toxicity of
therapeutic drugs in certain individuals or therapeutic failure of drugs in
certain individuals as a
result of individual variation in metabolism. Thus, the SNP genotype of an
individual can
determine the way a therapeutic compound acts on the body or the way the body
metabolizes the
compound. For example, SNPs in drug metabolizing enzymes can affect the
activity of these
enzymes, which in turn can affect both the intensity and duration of drug
action, as well as drug
metabolism and clearance.
The discovery of SNPs in drug metabolizing enzymes, drug transporters,
proteins for
pharmaceutical agents, and other drug targets has explained why some patients
do not obtain the
expected drug effects, show an exaggerated drug effect, or experience serious
toxicity from
standard drug dosages. SNPs can be expressed in the phenotype of the extensive
metabolizer and
in the phenotype of the poor metabolizer. Accordingly, SNPs may lead to
allelic variants of a
protein in which one or more of the protein functions in one population are
different from those in
another population. SNPs and the encoded variant peptides thus provide targets
to ascertain a
genetic predisposition that can affect treatment modality. For example, in a
ligand-based
treatment, SNPs may give rise to amino terminal extracellular domains and/or
other ligand-
binding regions of a receptor that are more or less active in ligand binding,
thereby affecting
subsequent protein activation. Accordingly, ligand dosage would necessarily be
modified to
maximize the therapeutic effect within a given population containing
particular SNP alleles or
haplotypes.
As an alternative to genotyping, specific variant proteins containing variant
amino acid
sequences encoded by alternative SNP alleles could be identified. Thus,
pharmacogenomic
characterization of an individual permits the selection of effective compounds
and effective
dosages of such compounds for prophylactic or therapeutic uses based on the
individual's SNP
genotype, thereby enhancing and optimizing the effectiveness of the therapy.
Furthermore, the
production of recombinant cells and transgenic animals containing particular
SNPs/haplotypes
allows effective clinical design and testing of treatment compounds and dosage
regimens. For
example, transgenic animals can be produced that differ only in specific SNP
alleles in a gene that
is orthologous to a human disease susceptibility gene.
Pharmacogenomic uses of the SNPs of the present disclosure provide several
significant
advantages for patient care, particularly in treating liver fibrosis.
Pharmacogenomic
characterization of an individual, based on an individual's SNP genotype, can
identify those
individuals unlikely to respond to treatment with a particular medication and
thereby allows
physicians to avoid prescribing the ineffective medication to those
individuals. On the other hand,
SNP genotyping of an individual may enable physicians to select the
appropriate medication and
dosage regimen that will be most effective based on an individual's SNP
genotype. This
information increases a physician's confidence in prescribing medications and
motivates patients
to comply with their drug regimens. Furthermore, pharmacogenomics may identify
patients
predisposed to toxicity and adverse reactions to particular drugs or drug
dosages. Adverse drug
reactions lead to more than 100,000 avoidable deaths per year in the United
States alone and
therefore represent a significant cause of hospitalization and death, as well
as a significant
economic burden on the healthcare system (Pfost et al., Trends in
Biotechnology, Aug. 2000).
Thus, pharmacogenomics based on the SNPs disclosed herein has the potential to
both save lives
and reduce healthcare costs substantially.
Pharmacogenomics in general is discussed further in Rose et al.,
"Pharmacogenetic
analysis of clinically relevant genetic polymorphisms", Methods Mol Med.
2003;85:225-37.
Pharmacogenomics as it relates to Alzheimer's disease and other
neurodegenerative disorders
is discussed in Cacabelos, "Pharmacogenomics for the treatment of dementia",
Ann Med.
2002;34(5):357-79, Maimone et al., "Pharmacogenomics of neurodegenerative
diseases", Eur J Pharmacol. 2001 Feb 9;413(1):11-29, and Poirier, "Apolipoprotein E: a
pharmacogenetic target for the treatment of Alzheimer's disease", Mol Diagn. 1999
Dec;4(4):335-41.
Pharmacogenomics as it relates to cardiovascular disorders is discussed in
Siest et al.,
"Pharmacogenomics of drugs affecting the cardiovascular system", Clin Chem Lab
Med. 2003
Apr;41(4):590-9, Mukherjee et al., "Pharmacogenomics in cardiovascular
diseases", Prog
Cardiovasc Dis. 2002 May-Jun;44(6):479-98, and Mooser et al., "Cardiovascular
pharmacogenetics in the SNP era", J Thromb Haemost. 2003 Jul;1(7):1398-402.
Pharmacogenomics as it relates to cancer is discussed in McLeod et al.,
"Cancer
pharmacogenomics: SNPs, chips, and the individual patient", Cancer Invest.
2003;21(4):630-40 and Watters et al., "Cancer pharmacogenomics: current and future
applications", Biochim Biophys Acta. 2003 Mar 17;1603(2):99-111.
The SNPs disclosed herein may also be used to identify novel therapeutic
targets for
liver fibrosis. For example, genes containing the
disease-associated variants ("variant genes") or their products, as well as
genes or their products that are directly or indirectly regulated by or
interacting with these variant genes or their products, can be targeted for
the development of therapeutics that, for example, treat the disease, or prevent
or delay disease onset. The therapeutics may be composed of, for example,
small molecules, proteins, protein fragments or peptides, antibodies, nucleic
acids, or their derivatives or mimetics which modulate the functions or
levels of the target genes or gene products.
The SNP-containing nucleic acid molecules disclosed herein, and their
complementary nucleic acid molecules, may be used as antisense constructs to
control
gene expression in cells, tissues, and organisms. Antisense technology is well
established
in the art and extensively reviewed in Antisense Drug Technology: Principles,
Strategies,
and Applications, Crooke (ed.), Marcel Dekker, Inc.: New York (2001). An
antisense
nucleic acid molecule is generally designed to be complementary to a region
of mRNA
expressed by a gene so that the antisense molecule hybridizes to the mRNA and
thereby
blocks translation of mRNA into protein. Various classes of antisense
oligonucleotides
are used in the art, two of which are cleavers and blockers. Cleavers, by
binding to target
RNAs, activate intracellular nucleases (e.g., RNaseH or RNase L) that cleave
the target
RNA. Blockers, which also bind to target RNAs, inhibit protein translation
through steric
hindrance of ribosomes. Exemplary blockers include peptide nucleic acids,
morpholinos,
locked nucleic acids, and methylphosphonates (see, e.g., Thompson, Drug
Discovery
Today, 7 (17): 912-917 (2002)). Antisense oligonucleotides are directly useful
as
therapeutic agents, and are also useful for determining and validating gene
function (e.g.,
in gene knock-out or knock-down experiments).
Antisense technology is further reviewed in: Lavery et al., "Antisense and RNAi:
powerful tools in drug target discovery and validation", Curr Opin Drug Discov Devel.
2003 Jul;6(4):561-9; Stephens et al., "Antisense oligonucleotide therapy in cancer", Curr
Opin Mol Ther. 2003 Apr;5(2):118-22; Kurreck, "Antisense technologies. Improvement
through novel chemical modifications", Eur J Biochem. 2003 Apr;270(8):1628-44; Dias
et al., "Antisense oligonucleotides: basic concepts and mechanisms", Mol Cancer Ther.
2002 Mar;1(5):347-55; Chen, "Clinical development of antisense oligonucleotides as
anti-cancer therapeutics", Methods Mol Med. 2003;75:621-36; Wang et al., "Antisense
anticancer oligonucleotide therapeutics", Curr Cancer Drug Targets. 2001 Nov;1(3):177-96;
and Bennett, "Efficiency of antisense oligonucleotide drug discovery", Antisense Nucleic Acid
Drug Dev. 2002 Jun;12(3):215-24.
The SNPs disclosed herein may be particularly useful for designing antisense
reagents
that are specific for particular nucleic acid variants. Based on the SNP
information disclosed
herein, antisense oligonucleotides can be produced that specifically target
mRNA molecules
that contain one or more particular SNP nucleotides. In this manner,
expression of mRNA
molecules that contain one or more undesired polymorphisms (e.g., SNP
nucleotides that lead
to a defective protein such as an amino acid substitution in a catalytic
domain) can be inhibited
or completely blocked. Thus, antisense oligonucleotides can be used to
specifically bind a
particular polymorphic form (e.g., a SNP allele that encodes a defective
protein), thereby
inhibiting translation of this form, but which do not bind an alternative
polymorphic form (e.g.,
an alternative SNP nucleotide that encodes a protein having normal function).
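The sequence logic behind an allele-specific antisense oligonucleotide can be sketched as a reverse complement of the target region carrying the chosen SNP allele. This is a minimal illustration, assuming the sense-strand sequence is given in the DNA alphabet (A, C, G, T); the function names, the flank length, and the example sequence are hypothetical, and real designs would also weigh chemistry, secondary structure, and specificity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def allele_specific_antisense(sense, snp_index, allele, flank=9):
    """Build an antisense oligo complementary to the region around a SNP.

    sense: sense-strand sequence, 5' to 3'
    snp_index: 0-based position of the SNP within `sense`
    allele: nucleotide of the polymorphic form to be targeted
    flank: number of bases kept on each side of the SNP (hypothetical default)
    """
    if not 0 <= snp_index < len(sense):
        raise ValueError("snp_index outside sequence")
    start = max(0, snp_index - flank)
    end = min(len(sense), snp_index + flank + 1)
    # Place the targeted allele at the SNP position, then take the complement strand.
    target = sense[start:snp_index] + allele + sense[snp_index + 1:end]
    return reverse_complement(target)

# Hypothetical 25-nt sense sequence with a G/T SNP at index 12.
sense_seq = "ATGGCCTTAGCAGGTACCGATTGCA"
oligo_t = allele_specific_antisense(sense_seq, 12, "T", flank=3)
oligo_g = allele_specific_antisense(sense_seq, 12, "G", flank=3)
```

The two oligos differ at a single base, which is the position that would discriminate one polymorphic form from the other.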
Antisense molecules can be used to inactivate mRNA in order to inhibit gene
expression and production of defective proteins. Accordingly, these molecules
can be used to
treat a disorder, such as liver fibrosis, characterized by abnormal or
undesired gene expression
or expression of certain defective proteins. This technique can involve
cleavage by means of
ribozymes containing nucleotide sequences complementary to one or more regions
in the
mRNA that attenuate the ability of the mRNA to be translated. Possible mRNA
regions
include, for example, protein-coding regions and particularly protein-coding
regions
corresponding to catalytic activities, substrate/ligand binding, or other
functional activities of a
protein.
The SNPs disclosed herein may also be useful for designing RNA interference
reagents
that specifically target nucleic acid molecules having particular SNP
variants. RNA
interference (RNAi), also referred to as gene silencing, is based on using
double-stranded RNA
(dsRNA) molecules to turn genes off. When introduced into a cell, dsRNAs are
processed by
the cell into short fragments (generally about 21, 22, or 23 nucleotides in
length) known as
small interfering RNAs (siRNAs) which the cell uses in a sequence-specific
manner to
recognize and destroy complementary RNAs (Thompson, Drug Discovery Today, 7
(17): 912-917 (2002)). Accordingly, an aspect of the present disclosure specifically
contemplates
isolated nucleic acid molecules that are about 18-26 nucleotides in length,
preferably 19-25
nucleotides in length, and more preferably 20, 21, 22, or 23 nucleotides in
length, and the use
of these nucleic acid molecules for RNAi. Because RNAi molecules, including
siRNAs, act in
a sequence-specific manner, the SNPs of the present invention can be used to
design RNAi
reagents that recognize and destroy nucleic acid molecules having specific SNP
alleles/nucleotides (such as deleterious alleles that lead to the production
of defective proteins),
while not affecting nucleic acid molecules having alternative SNP alleles
(such as alleles that
encode proteins having normal function). As with antisense reagents, RNAi
reagents may be
directly useful as therapeutic agents (e.g., for turning off defective,
disease-causing genes), and
are also useful for characterizing and validating gene function (e.g., in gene
knock-out or
knock-down experiments).
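As a rough sketch of how SNP-specific RNAi target sites might be enumerated, the function below lists every window of a chosen length that contains the SNP position. The function name and the example sequence are hypothetical; the default length of 21 nucleotides is one of the lengths named above, and no design rules beyond covering the SNP are applied.

```python
def sirna_target_windows(sense, snp_index, length=21):
    """List (start, subsequence) for every `length`-nt window that contains the SNP."""
    windows = []
    # Earliest window that still reaches the SNP, and latest window that fits.
    first = max(0, snp_index - length + 1)
    last = min(snp_index, len(sense) - length)
    for start in range(first, last + 1):
        windows.append((start, sense[start:start + length]))
    return windows

# Hypothetical 40-nt target sequence with the SNP at index 20.
target_seq = "ACGT" * 10
candidates = sirna_target_windows(target_seq, 20)
```

Each candidate window could then be filtered with whatever design criteria are in use, and the allele of interest substituted at the SNP position before synthesis.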
The following references provide a further review of RNAi: Reynolds et al., "Rational
siRNA design for RNA interference", Nat Biotechnol. 2004 Mar;22(3):326-30. Epub 2004 Feb
01; Chi et al., "Genomewide view of gene silencing by small interfering RNAs", PNAS
100(11):6343-6346, 2003; Vickers et al., "Efficient Reduction of Target RNAs by Small
Interfering RNA and RNase H-dependent Antisense Agents", J. Biol. Chem. 278: 7108-7118,
2003; Agami, "RNAi and related mechanisms and their potential use for therapy", Curr Opin
Chem Biol. 2002 Dec;6(6):829-34; Lavery et al., "Antisense and RNAi: powerful tools in drug
target discovery and validation", Curr Opin Drug Discov Devel. 2003 Jul;6(4):561-9; Shi,
"Mammalian RNAi for the masses", Trends Genet 2003 Jan;19(1):9-12; Shuey et al., "RNAi:
gene-silencing in therapeutic intervention", Drug Discovery Today 2002 Oct;7(20):1040-1046;
McManus et al., Nat Rev Genet 2002 Oct;3(10):737-47; Xia et al., Nat Biotechnol 2002
Oct;20(10):1006-10; Plasterk et al., Curr Opin Genet Dev 2000 Oct;10(5):562-7; Bosher et al.,
Nat Cell Biol 2000 Feb;2(2):E31-6; and Hunter, Curr Biol 1999 Jun 17;9(12):R440-2.
A subject suffering from a pathological condition, such as liver fibrosis,
ascribed to a
SNP may be treated so as to correct the genetic defect (see Kren et al., Proc.
Natl. Acad. Sci.
USA 96:10349-10354 (1999)). Such a subject can be identified by any method
that can detect
the polymorphism in a biological sample drawn from the subject. Such a genetic
defect may be
permanently corrected by administering to such a subject a nucleic acid
fragment incorporating
a repair sequence that supplies the normal/wild-type nucleotide at the
position of the SNP. This
site-specific repair sequence can encompass an RNA/DNA oligonucleotide that
operates to
promote endogenous repair of a subject's genomic DNA. The site-specific repair
sequence is
administered in an appropriate vehicle, such as a complex with
polyethylenimine, encapsulated
in anionic liposomes, a viral vector such as an adenovirus, or other
pharmaceutical composition
that promotes intracellular uptake of the administered nucleic acid. A genetic
defect leading to
an inborn pathology may then be overcome, as the chimeric oligonucleotides
induce
incorporation of the normal sequence into the subject's genome. Upon
incorporation, the
normal gene product is expressed, and the replacement is propagated, thereby
engendering a
permanent repair and therapeutic enhancement of the clinical condition of the
subject.
In cases in which a cSNP results in a variant protein that is ascribed to be
the cause of,
or a contributing factor to, a pathological condition, a method of treating
such a condition can
include administering to a subject experiencing the pathology the wild-
type/normal cognate of
the variant protein. Once administered in an effective dosing regimen, the
wild-type cognate
provides complementation or remediation of the pathological condition.
The disclosure further is of a method for identifying a compound or agent that
can be used
to treat liver fibrosis. The SNPs disclosed herein may be useful as targets
for the identification
and/or development of therapeutic agents. A method for identifying a
therapeutic agent or
compound typically includes assaying the ability of the agent or compound to
modulate the
activity and/or expression of a SNP-containing nucleic acid or the encoded
product and thus
identifying an agent or a compound that can be used to treat a disorder
characterized by undesired
activity or expression of the SNP-containing nucleic acid or the encoded
product. The assays can
be performed in cell-based and cell-free systems. Cell-based assays can
include cells naturally
expressing the nucleic acid molecules of interest or recombinant cells
genetically engineered to
express certain nucleic acid molecules.
Variant gene expression in a liver fibrosis patient can include, for example,
either
expression of a SNP-containing nucleic acid sequence (for instance, a gene
that contains a SNP
can be transcribed into an mRNA transcript molecule containing the SNP, which
can in turn be
translated into a variant protein) or altered expression of a normal/wild-type
nucleic acid sequence
due to one or more SNPs (for instance, a regulatory/control region can contain
a SNP that affects
the level or pattern of expression of a normal transcript).
Assays for variant gene expression can involve direct assays of nucleic acid
levels (e.g.,
mRNA levels), expressed protein levels, or of collateral compounds involved in
a signal pathway.
Further, the expression of genes that are up- or down-regulated in response to
the signal pathway
can also be assayed. In this embodiment, the regulatory regions of these genes
can be operably
linked to a reporter gene such as luciferase.
Modulators of variant gene expression can be identified in a method wherein,
for example,
a cell is contacted with a candidate compound/agent and the expression of mRNA
determined.
The level of expression of mRNA in the presence of the candidate compound is
compared to the
level of expression of mRNA in the absence of the candidate compound. The
candidate
compound can then be identified as a modulator of variant gene expression
based on this
comparison and be used to treat a disorder such as liver fibrosis that is
characterized by variant
gene expression (e.g., either expression of a SNP-containing nucleic acid or
altered expression of a
normal/wild-type nucleic acid molecule due to one or more SNPs that affect
expression of the
nucleic acid molecule) due to one or more SNPs as disclosed herein. When
expression of mRNA
is statistically significantly greater in the presence of the candidate
compound than in its absence,
the candidate compound is identified as a stimulator of nucleic acid
expression. When nucleic
acid expression is statistically significantly less in the presence of the
candidate compound than in
its absence, the candidate compound is identified as an inhibitor of nucleic
acid expression.
The disclosure is further of methods of treatment, with the SNP or associated
nucleic acid
domain (e.g., catalytic domain, ligand/substrate-binding domain,
regulatory/control region, etc.) or
gene, or the encoded mRNA transcript, as a target, using a compound identified
through drug
screening as a gene modulator to modulate variant nucleic acid expression.
Modulation can
include either up-regulation (i.e., activation or agonization) or down-
regulation (i.e., suppression
or antagonization) of nucleic acid expression.
Expression of mRNA transcripts and encoded proteins, either wild type or
variant, may be
altered in individuals with a particular SNP allele in a regulatory/control
element, such as a
promoter or transcription factor binding domain, that regulates expression. In
this situation,
methods of treatment and compounds can be identified, as discussed herein,
that regulate or
overcome the variant regulatory/control element, thereby generating normal, or
healthy,
expression levels of either the wild type or variant protein.
The SNP-containing nucleic acid molecules described herein may also be useful
for
monitoring the effectiveness of modulating compounds on the expression or
activity of a variant
gene, or encoded product, in clinical trials or in a treatment regimen. Thus,
the gene expression
pattern can serve as an indicator for the continuing effectiveness of
treatment with the compound,
particularly with compounds to which a patient can develop resistance, as well
as an indicator for
toxicities. The gene expression pattern can also serve as a marker indicative
of a physiological
response of the affected cells to the compound. Accordingly, such monitoring
would allow either
increased administration of the compound or the administration of alternative
compounds to which
the patient has not become resistant. Similarly, if the level of nucleic acid
expression falls below a
desirable level, administration of the compound could be commensurately
decreased.
In another aspect of the present disclosure, there is provided a
pharmaceutical pack
comprising a therapeutic agent (e.g., a small molecule drug, antibody,
peptide, antisense or
RNAi nucleic acid molecule, etc.) and a set of instructions for administration
of the therapeutic
agent to humans diagnostically tested for one or more SNPs or SNP haplotypes
as described
herein.
The SNPs/haplotypes described herein may also be useful for improving many
different
aspects of the drug development process. For instance, an aspect of the
present disclosure
includes selecting individuals for clinical trials based on their SNP
genotype. For example,
individuals with SNP genotypes that indicate that they are likely to
positively respond to a drug
can be included in the trials, whereas those individuals whose SNP genotypes
indicate that they
are less likely to or would not respond to the drug, or who are at risk for
suffering toxic effects
or other adverse reactions, can be excluded from the clinical trials. This not
only can improve
the safety of clinical trials, but also can enhance the chances that the trial
will demonstrate
statistically significant efficacy. Furthermore, the SNPs disclosed herein may
explain why
certain previously developed drugs performed poorly in clinical trials and may
help identify a
subset of the population that would benefit from a drug that had previously
performed poorly in
clinical trials, thereby "rescuing" previously developed drugs, and enabling
the drug to be made
available to a particular liver fibrosis patient population that can benefit
from it.
SNPs have many important uses in drug discovery, screening, and development. A
high
probability exists that, for any gene/protein selected as a potential drug
target, variants of that
gene/protein will exist in a patient population. Thus, determining the impact
of gene/protein
variants on the selection and delivery of a therapeutic agent should be an
integral aspect of the
drug discovery and development process. (Jazwinska, A Trends Guide to Genetic
Variation and
Genomic Medicine, 2002 Mar; S30-S36).
Knowledge of variants (e.g., SNPs and any corresponding amino acid
polymorphisms)
of a particular therapeutic target (e.g., a gene, mRNA transcript, or protein)
enables parallel
screening of the variants in order to identify therapeutic candidates (e.g.,
small molecule
compounds, antibodies, antisense or RNAi nucleic acid compounds, etc.) that
demonstrate
efficacy across variants (Rothberg, Nat Biotechnol 2001 Mar;19(3):209-11).
Such therapeutic
candidates would be expected to show equal efficacy across a larger segment of
the patient
population, thereby leading to a larger potential market for the therapeutic
candidate.
Furthermore, identifying variants of a potential therapeutic target enables
the most
common form of the target to be used for selection of therapeutic candidates,
thereby helping to
ensure that the experimental activity that is observed for the selected
candidates reflects the real
activity expected in the largest proportion of a patient population
(Jazwinska, A Trends Guide
to Genetic Variation and Genomic Medicine, 2002 Mar; S30-S36).
Additionally, screening therapeutic candidates against all known variants of a
target can
enable the early identification of potential toxicities and adverse reactions
relating to particular
variants. For example, variability in drug absorption, distribution,
metabolism and excretion
(ADME) caused by, for example, SNPs in therapeutic targets or drug
metabolizing genes, can
be identified, and this information can be utilized during the drug
development process to
minimize variability in drug disposition and develop therapeutic agents that
are safer across a
wider range of a patient population. The SNPs disclosed herein, including the
variant proteins
and encoding polymorphic nucleic acid molecules provided in Tables 1-2, may be
useful in
conjunction with a variety of toxicology methods established in the art, such
as those set forth
in Current Protocols in Toxicology, John Wiley & Sons, Inc., N.Y.
Furthermore, therapeutic agents that target any art-known proteins (or nucleic
acid
molecules, either RNA or DNA) may cross-react with the variant proteins (or
polymorphic
nucleic acid molecules) disclosed in Table 1, thereby significantly affecting
the
pharmacokinetic properties of the drug. Consequently, the protein variants and
the SNP-
containing nucleic acid molecules disclosed in Tables 1-2 may be useful in
developing,
screening, and evaluating therapeutic agents that target corresponding art-
known protein forms
(or nucleic acid molecules). Additionally, as discussed above, knowledge of
all polymorphic
forms of a particular drug target enables the design of therapeutic agents
that are effective
against most or all such polymorphic forms of the drug target.
Pharmaceutical Compositions and Administration Thereof
Any of the liver fibrosis-associated proteins, and encoding nucleic acid
molecules,
disclosed herein can be used as therapeutic targets (or directly used
themselves as therapeutic
compounds) for treating liver fibrosis and related pathologies, and the
present disclosure
enables therapeutic compounds (e.g., small molecules, antibodies, therapeutic
proteins, RNAi
and antisense molecules, etc.) to be developed that target (or are comprised
of) any of these
therapeutic targets.
In general, a therapeutic compound will be administered in a therapeutically
effective
amount by any of the accepted modes of administration for agents that serve
similar utilities.
The actual amount of the therapeutic compound described herein, i.e., the
active ingredient,
will depend upon numerous factors such as the severity of the

CA 02887830 2014-11-27
WO 2005/111241 PCT/US2005/016051
disease to be treated, the age and relative health of the subject, the potency
of the compound used, the route and form of administration, and other factors.
Therapeutically effective amounts of therapeutic compounds may range from, for
example, approximately 0.01-50 mg per kilogram body weight of the recipient
per day; preferably about 0.1-20 mg/kg/day. Thus, as an example, for
administration to a 70 kg person, the dosage range would most preferably be
about 7 mg to 1.4 g per day.
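The arithmetic behind the 70 kg example can be checked with a short sketch. The function name is hypothetical; the weight and the per-kilogram range are the figures stated above.

```python
def daily_dose_range_mg(weight_kg, low_mg_per_kg, high_mg_per_kg):
    """Return (low, high) total daily dose in mg for a given body weight."""
    return weight_kg * low_mg_per_kg, weight_kg * high_mg_per_kg


# Preferred range of about 0.1-20 mg/kg/day applied to a 70 kg person.
low, high = daily_dose_range_mg(70, 0.1, 20)
print(f"{low:.0f} mg to {high / 1000:.1f} g per day")  # prints "7 mg to 1.4 g per day"
```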
In general, therapeutic compounds will be administered as pharmaceutical
compositions by any one of the following routes: oral, systemic (e.g.,
transdermal, intranasal, or by suppository), or parenteral (e.g.,
intramuscular, intravenous, or subcutaneous) administration. The preferred
manner of administration is oral or parenteral using a convenient daily dosage
regimen, which can be adjusted according to the degree of affliction. Oral
compositions can take the form of tablets, pills, capsules, semisolids,
powders, sustained release formulations, solutions, suspensions, elixirs,
aerosols, or any other appropriate compositions.
The choice of formulation depends on various factors such as the mode of drug
administration (e.g., for oral administration, formulations in the form of
tablets, pills, or capsules are preferred) and the bioavailability of the drug
substance. Recently, pharmaceutical formulations have been developed
especially for drugs that show poor bioavailability based upon the principle
that bioavailability can be increased by increasing the surface area, i.e.,
decreasing particle size. For example, U.S. Patent No. 4,107,288 describes a
pharmaceutical formulation having particles in the size range from 10 to
1,000 nm in which the active material is supported on a cross-linked matrix of
macromolecules. U.S. Patent No. 5,145,684 describes the production of a
pharmaceutical formulation in which the drug substance is pulverized to
nanoparticles (average particle size of 400 nm) in the presence of a surface
modifier and then dispersed in a liquid medium to give a pharmaceutical
formulation that exhibits remarkably high bioavailability.
Pharmaceutical compositions are comprised of, in general, a therapeutic
compound in combination with at least one pharmaceutically acceptable
excipient. Acceptable excipients are non-toxic, aid administration, and do not
adversely affect the therapeutic benefit of the therapeutic compound. Such
excipients may be any solid,
liquid, semi-solid or, in the case of an aerosol composition, gaseous
excipient that is generally
available to one skilled in the art.
Solid pharmaceutical excipients include starch, cellulose, talc, glucose,
lactose, sucrose,
gelatin, malt, rice, flour, chalk, silica gel, magnesium stearate, sodium
stearate, glycerol
monostearate, sodium chloride, dried skim milk and the like. Liquid and
semisolid excipients
may be selected from glycerol, propylene glycol, water, ethanol and various
oils, including
those of petroleum, animal, vegetable or synthetic origin, e.g., peanut oil,
soybean oil, mineral
oil, sesame oil, etc. Preferred liquid carriers, particularly for injectable
solutions, include water,
saline, aqueous dextrose, and glycols.
Compressed gases may be used to disperse a compound described herein in
aerosol
form. Inert gases suitable for this purpose are nitrogen, carbon dioxide, etc.
Other suitable pharmaceutical excipients and their formulations are described
in
Remington's Pharmaceutical Sciences, edited by E. W. Martin (Mack Publishing
Company,
18th ed., 1990).
The amount of the therapeutic compound in a formulation can vary within the
full range
employed by those skilled in the art. Typically, the formulation will contain,
on a weight
percent (wt %) basis, from about 0.01-99.99 wt % of the therapeutic compound
based on the
total formulation, with the balance being one or more suitable pharmaceutical
excipients.
Preferably, the compound is present at a level of about 1-80 wt %.
Therapeutic compounds can be administered alone or in combination with other
therapeutic compounds or in combination with one or more other active
ingredient(s). For
example, an inhibitor or stimulator of a liver fibrosis-associated protein can
be administered in
combination with another agent that inhibits or stimulates the activity of the
same or a different
liver fibrosis-associated protein to thereby counteract the effects of liver
fibrosis.
For further information regarding pharmacology, see Current Protocols in
Pharmacology, John Wiley & Sons, Inc., N.Y.
Human Identification Applications
In addition to their diagnostic and therapeutic uses in liver fibrosis and
related
pathologies, the SNPs disclosed herein may also be useful as human
identification markers for
such applications as forensics, paternity testing, and biometrics (see, e.g.,
Gill, "An assessment
of the utility of single nucleotide polymorphisms (SNPs) for forensic
purposes", Int J Legal
Med. 2001;114(4-5):204-10). Genetic variations in the nucleic acid sequences
between
individuals can be used as genetic markers to identify individuals and to
associate a biological
sample with an individual. Determination of which nucleotides occupy a set of
SNP positions
in an individual identifies a set of SNP markers that distinguishes the
individual. The more
SNP positions that are analyzed, the lower the probability that the set of
SNPs in one individual
is the same as that in an unrelated individual. Preferably, if multiple sites
are analyzed, the
sites are unlinked (i.e., inherited independently). Thus, preferred sets of
SNPs can be selected
from among the SNPs disclosed herein, which may include SNPs on different
chromosomes,
SNPs on different chromosome arms, and/or SNPs that are dispersed over
substantial distances
along the same chromosome arm.
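The effect of analyzing more unlinked SNP positions can be illustrated with a minimal sketch. It assumes biallelic SNPs in Hardy-Weinberg equilibrium and independent inheritance of the sites, and it uses hypothetical allele frequencies; none of these assumptions or function names is taken from Tables 1-2.

```python
def genotype_match_probability(p):
    """Chance that two unrelated individuals share a genotype at one
    biallelic SNP with allele frequencies p and 1 - p (Hardy-Weinberg assumed)."""
    q = 1.0 - p
    # Both homozygous for allele 1, both heterozygous, or both homozygous for allele 2.
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def random_match_probability(allele_freqs):
    """Chance of a coincidental match across unlinked SNPs (product rule)."""
    prob = 1.0
    for p in allele_freqs:
        prob *= genotype_match_probability(p)
    return prob


# Hypothetical panels of 10 and 20 SNPs, each with a 50% minor allele frequency.
print(random_match_probability([0.5] * 10))
print(random_match_probability([0.5] * 20))
```

The product rule is valid only when the sites are inherited independently, which is why unlinked SNPs are preferred for such panels.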
Furthermore, among the SNPs disclosed herein, preferred SNPs for use in
certain
forensic/human identification applications include SNPs located at degenerate
codon positions
(i.e., the third position in certain codons which can be one of two or more
alternative
nucleotides and still encode the same amino acid), since these SNPs do not
affect the encoded
protein. SNPs that do not affect the encoded protein are expected to be under
less selective
pressure and are therefore expected to be more polymorphic in a population,
which is typically
an advantage for forensic/human identification applications. However, for
certain
forensics/human identification applications, such as predicting phenotypic
characteristics (e.g.,
inferring ancestry or inferring one or more physical characteristics of an
individual) from a
DNA sample, it may be desirable to utilize SNPs that affect the encoded
protein.
For many of the SNPs disclosed in Tables 1-2 (which are identified as
"Applera" SNP
source), Tables 1-2 provide SNP allele frequencies obtained by re-sequencing
the DNA of
chromosomes from 39 individuals (Tables 1-2 also provide allele frequency
information for "Celera" source SNPs and, where available, public SNPs from
dbEST,
HGBASE, and/or HGMD). The allele frequencies provided in Tables 1-2 enable
these SNPs to be readily used for human identification applications. Although
any SNP disclosed in Table 1 and/or Table 2 could be used for human
identification, the closer that the frequency of the minor allele at a
particular SNP site is to 50%, the greater the ability of that SNP to
discriminate between different individuals in a population since it becomes
increasingly likely that two randomly selected individuals would have
different alleles at that SNP site. Using the SNP allele frequencies provided
in Tables 1-2, one of ordinary skill in the art could readily select a subset
of SNPs for which the frequency of the minor allele is, for example, at least
1%, 2%, 5%, 10%, 20%, 25%, 30%, 40%, 45%, or 50%, or any other frequency
in-between. Thus, since Tables 1-2 provide allele frequencies based on the
re-sequencing of the chromosomes from 39 individuals, a subset of SNPs could
readily be selected for human identification in which the total allele count
of the minor allele at a particular SNP site is, for example, at least 1, 2,
4, 8, 10, 16, 20, 24, 30, 32, 36, 38, 39, 40, or any other number in-between.
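Selection of a SNP subset by minor allele frequency can be sketched as follows. The SNP identifiers and allele counts are hypothetical, and the sketch assumes two chromosomes per individual (78 chromosomes for 39 individuals) and biallelic sites; the actual frequencies would be read from Tables 1-2.

```python
CHROMOSOMES = 2 * 39  # assumed: 39 diploid individuals re-sequenced


def minor_allele_frequency(allele_counts):
    """allele_counts maps each of two alleles to its observed chromosome count."""
    total = sum(allele_counts.values())
    return min(allele_counts.values()) / total


def select_snps(snp_counts, min_maf):
    """Return the IDs of SNPs whose minor allele frequency is at least min_maf."""
    return sorted(
        snp_id
        for snp_id, counts in snp_counts.items()
        if minor_allele_frequency(counts) >= min_maf
    )


# Hypothetical allele counts; each SNP's counts sum to CHROMOSOMES.
example = {
    "snpA": {"A": 39, "G": 39},  # minor allele frequency 50%
    "snpB": {"C": 70, "T": 8},   # about 10%
    "snpC": {"G": 77, "T": 1},   # about 1%
}
print(select_snps(example, min_maf=0.10))
```

A threshold on minor allele count works the same way, since with a fixed number of chromosomes the count and the frequency are proportional.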
Furthermore, Tables 1-2 also provide population group (interchangeably
referred to herein as ethnic or racial groups) information coupled with the
extensive allele frequency information. For example, the group of 39
individuals whose DNA was re-sequenced was made up of 20 Caucasians and 19
African-Americans. This population group information enables further
refinement of SNP selection for human identification. For example, preferred
SNPs for human identification can be selected from Tables 1-2 that have
similar allele frequencies in both the Caucasian and African-American
populations; thus, for example, SNPs can be selected that have equally high
discriminatory power in both populations. Alternatively, SNPs can be selected
for which there is a statistically significant difference in allele
frequencies between the Caucasian and African-American populations (as an
extreme example, a particular allele may be observed only in either the
Caucasian or the African-American population group but not observed in the
other population group); such SNPs are useful, for example, for predicting the
race/ethnicity of an unknown perpetrator from a biological sample such as a
hair or blood stain recovered at a crime scene. For a discussion of using SNPs
to predict ancestry from a DNA sample, including statistical methods, see
Frudakis et al.,
"A Classifier for the SNP-Based Inference of Ancestry", Journal of Forensic
Sciences 2003;48(4):771-782.
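One way to test whether allele frequencies differ between two population groups is a Fisher exact test on the 2x2 table of allele counts. The sketch below is illustrative only: the choice of test, the 0.05 threshold, the function names, and the counts are all assumptions, and the chromosome totals (40 and 38) assume two chromosomes for each of the 20 and 19 individuals.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are population groups; columns are counts of allele 1 and allele 2.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x copies of allele 1 in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def frequencies_differ(group1_counts, group2_counts, alpha=0.05):
    """True if allele counts differ significantly between two groups."""
    (a, b), (c, d) = group1_counts, group2_counts
    return fisher_exact_two_sided(a, b, c, d) < alpha


# Hypothetical counts of (allele 1, allele 2) in 40 and 38 chromosomes.
print(frequencies_differ((20, 20), (19, 19)))  # similar frequencies
print(frequencies_differ((40, 0), (0, 38)))    # allele seen in one group only
```

SNPs for which the test finds no difference are candidates for panels with comparable discriminatory power in both groups; the second case corresponds to the extreme example described above.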
SNPs have numerous advantages over other types of polymorphic markers, such
as short tandem repeats (STRs). For example, SNPs can be easily scored and are
amenable to automation, making SNPs the markers of choice for large-scale
forensic
databases. SNPs are found in much greater abundance throughout the genome than
repeat polymorphisms. Population frequencies of two polymorphic forms can
usually be determined with greater accuracy than those of multiple polymorphic
forms at multi-allelic loci. SNPs are mutationally more stable than repeat
polymorphisms. SNPs are not susceptible to artefacts such as stutter bands
that can hinder analysis. Stutter bands are frequently encountered when
analyzing repeat polymorphisms, and are particularly troublesome when
analyzing samples such as crime scene samples that may contain mixtures of DNA
from multiple sources. Another significant advantage of SNP markers over STR
markers is the much shorter length of nucleic acid needed to score a SNP. For
example, STR markers are generally several hundred base pairs in length. A
SNP, on the other hand, comprises a single nucleotide, and generally a short
conserved region on either side of the SNP position for primer and/or probe
binding. This makes SNPs more
amenable to typing in highly degraded or aged biological samples that are
frequently
encountered in forensic casework in which DNA may be fragmented into short
pieces.
SNPs also are not subject to microvariant and "off-ladder" alleles frequently
encountered when analyzing STR loci. Microvariants are deletions or insertions
within a
repeat unit that change the size of the amplified DNA product so that the
amplified
product does not migrate at the same rate as reference alleles with normal
sized repeat
units. When separated by size, such as by electrophoresis on a polyacrylamide
gel,
microvariants do not align with a reference allelic ladder of standard sized
repeat units,
but rather migrate between the reference alleles. The reference allelic ladder
is used for
precise sizing of alleles for allele classification; therefore alleles that do
not align with the
reference allelic ladder lead to substantial analysis problems. Furthermore,
when
analyzing multi-allelic repeat polymorphisms, occasionally an allele is found
that consists of more or fewer repeat units than have been previously seen in
the population, or more or fewer repeat alleles than are included in a
reference allelic ladder. These alleles will

migrate outside the size range of known alleles in a reference allelic ladder,
and therefore are
referred to as "off-ladder" alleles. In extreme cases, the allele may contain
so few or so many
repeats that it migrates well out of the range of the reference allelic
ladder. In this situation, the
allele may not even be observed, or, with multiplex analysis, it may migrate
within or close to
the size range for another locus, further confounding analysis.
SNP analysis avoids the problems of microvariants and off-ladder alleles
encountered in
STR analysis. Importantly, microvariants and off-ladder alleles may provide
significant
problems, and may be completely missed, when using analysis methods such as
oligonucleotide
hybridization arrays, which utilize oligonucleotide probes specific for
certain known alleles.
Furthermore, off-ladder alleles and microvariants encountered with STR
analysis, even when
correctly typed, may lead to improper statistical analysis, since their
frequencies in the
population are generally unknown or poorly characterized, and therefore the
statistical
significance of a matching genotype may be questionable. All these advantages
of SNP
analysis are considerable in light of the consequences of most DNA
identification cases, which
may lead to life imprisonment for an individual, or re-association of remains
to the family of a
deceased individual.
DNA can be isolated from biological samples such as blood, bone, hair, saliva,
or
semen, and compared with the DNA from a reference source at particular SNP
positions.
Multiple SNP markers can be assayed simultaneously in order to increase the
power of
discrimination and the statistical significance of a matching genotype. For
example,
oligonucleotide arrays can be used to genotype a large number of SNPs
simultaneously. The
SNPs provided by the present invention can be assayed in combination with
other polymorphic
genetic markers, such as other SNPs known in the art or STRs, in order to
identify an
individual or to associate an individual with a particular biological sample.
Furthermore, the SNPs disclosed herein can be genotyped for inclusion in a
database of
DNA genotypes, for example, a criminal DNA databank such as the FBI's Combined
DNA
Index System (CODIS) database. A genotype obtained from a biological sample of
unknown
source can then be queried against the database to find a matching genotype,
with the SNPs
disclosed herein providing nucleotide positions at which to compare the known
and unknown
DNA sequences for identity.
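A genotype query of the kind described above can be sketched with an in-memory dictionary. This is a hypothetical data layout for illustration only; it is not the interface of CODIS or of any real databank, and the sample and SNP identifiers are invented.

```python
def find_matches(database, query):
    """Return IDs of stored profiles that match the query genotype at every
    SNP position typed in both the stored profile and the query."""
    matches = []
    for sample_id, profile in database.items():
        shared = [snp for snp in query if snp in profile]
        # Sorting the two alleles makes "AG" and "GA" compare as equal.
        if shared and all(sorted(profile[s]) == sorted(query[s]) for s in shared):
            matches.append(sample_id)
    return sorted(matches)


# Hypothetical reference profiles: SNP ID -> two-letter genotype.
database = {
    "ref1": {"snp1": "AG", "snp2": "CC", "snp3": "TT"},
    "ref2": {"snp1": "AA", "snp2": "CT", "snp3": "TT"},
}
unknown_sample = {"snp1": "GA", "snp2": "CC", "snp3": "TT"}
print(find_matches(database, unknown_sample))
```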
Accordingly, the present disclosure is of a database comprising novel SNPs or
SNP alleles of
the present invention (e.g., the database can comprise information indicating
which alleles are
possessed by individual members of a population at one or more novel SNP sites
of the present
invention), such as for use in forensics, biometrics, or other human
identification applications.
Such a database typically comprises a computer-based system in which the SNPs
or SNP
alleles disclosed herein are recorded on a computer readable medium (see the
section of the
present specification entitled "Computer-Related Embodiments").
The SNPs disclosed herein can also be assayed for use in paternity testing.
The object of
paternity testing is usually to determine whether a male is the father of a
child. In most cases,
the mother of the child is known and thus, the mother's contribution to the
child's genotype can
be traced. Paternity testing investigates whether the part of the child's
genotype not attributable
to the mother is consistent with that of the putative father. Paternity
testing can be performed
by analyzing sets of polymorphisms in the putative father and the child, with
the SNPs
disclosed herein providing nucleotide positions at which to compare the
putative father's and
child's DNA sequences for identity. If the set of polymorphisms in the child
attributable to the
father does not match the set of polymorphisms of the putative father, it can
be concluded,
barring experimental error, that the putative father is not the father of the
child. If the set of
polymorphisms in the child attributable to the father match the set of
polymorphisms of the
putative father, a statistical calculation can be performed to determine the
probability of
coincidental match, and a conclusion drawn as to the likelihood that the
putative father is the
true biological father of the child.
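The exclusion step of paternity testing can be sketched as follows, assuming biallelic SNP genotypes written as two-letter strings and a known mother. The function names and genotypes are hypothetical, and the sketch covers only exclusion; it does not perform the statistical calculation applied when the genotypes are consistent.

```python
def paternal_allele_candidates(child, mother):
    """Alleles the child could have inherited from the father at one SNP,
    given the child's and the mother's genotypes."""
    c1, c2 = child
    candidates = set()
    if c1 in mother:  # mother could have supplied c1, so father supplied c2
        candidates.add(c2)
    if c2 in mother:  # mother could have supplied c2, so father supplied c1
        candidates.add(c1)
    return candidates


def is_excluded(child_profile, mother_profile, father_profile):
    """True if, at any SNP, the putative father carries none of the alleles
    the child must have received from the biological father."""
    for snp, child in child_profile.items():
        candidates = paternal_allele_candidates(child, mother_profile[snp])
        if not candidates:
            raise ValueError(f"child and mother genotypes inconsistent at {snp}")
        if not candidates & set(father_profile[snp]):
            return True
    return False


# Hypothetical genotypes at two SNP positions.
child = {"snp1": "AG", "snp2": "CC"}
mother = {"snp1": "AA", "snp2": "CT"}
print(is_excluded(child, mother, {"snp1": "GG", "snp2": "CT"}))  # consistent
print(is_excluded(child, mother, {"snp1": "AA", "snp2": "CC"}))  # no G at snp1
```

In practice a single inconsistent site would be weighed against the possibility of experimental error, as noted above, before any conclusion is drawn.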
In addition to paternity testing, SNPs are also useful for other types of
kinship testing,
such as for verifying familial relationships for immigration purposes, or for
cases in which an
individual alleges to be related to a deceased individual in order to claim an
inheritance from
the deceased individual, etc. For further information regarding the utility of
SNPs for paternity
testing and other types of kinship testing, including methods for statistical
analysis, see
Krawczak, "Informativity assessment for biallelic single nucleotide
polymorphisms",
Electrophoresis 1999 Jun;20(8):1676-81.
The use of the SNPs disclosed herein for human identification further extends
to various
authentication systems, commonly referred to as biometric systems, which
typically convert
physical characteristics of humans (or other organisms) into digital data.
Biometric systems
include various technological devices that measure such unique anatomical or
physiological
characteristics as finger, thumb, or palm prints; hand geometry; vein
patterning on the back of the
hand; blood vessel patterning of the retina and color and texture of the iris;
facial characteristics;
voice patterns; signature and typing dynamics; and DNA. Such physiological
measurements can
be used to verify identity and, for example, restrict or allow access based on
the identification.
Examples of applications for biometrics include physical area security,
computer and network
security, aircraft passenger check-in and boarding, financial transactions,
medical records access,
government benefit distribution, voting, law enforcement, passports, visas and
immigration,
prisons, various military applications, and for restricting access to
expensive or dangerous items,
such as automobiles or guns (see, for example, O'Connor, Stanford Technology
Law Review and
U.S. Patent No. 6,119,096).
Groups of SNPs, particularly the SNPs disclosed herein, can be typed to
uniquely identify
an individual for biometric applications such as those described above. Such
SNP typing can
readily be accomplished using, for example, DNA chips/arrays. Preferably, a
minimally invasive
means for obtaining a DNA sample is utilized. For example, PCR amplification
enables sufficient
quantities of DNA for analysis to be obtained from buccal swabs or
fingerprints, which contain
DNA-containing skin cells and oils that are naturally transferred during
contact.
Further information regarding techniques for using SNPs in forensic/human
identification
applications can be found in, for example, Current Protocols in Human
Genetics, John
Wiley & Sons, N.Y. (2002), 14.1-14.7.
VARIANT PROTEINS, ANTIBODIES,
VECTORS & HOST CELLS, & USES THEREOF
Variant Proteins Encoded by SNP-Containing Nucleic Acid Molecules
The present disclosure is of SNP-containing nucleic acid molecules, many of
which
encode proteins having variant amino acid sequences as compared to the art-
known (i.e., wild-
type) proteins. Amino acid sequences encoded by the polymorphic nucleic acid
molecules of the
present invention are provided as SEQ ID NOS:15-28 in Table 1 and the Sequence
Listing. These
variants will generally be referred to herein as variant
proteins/peptides/polypeptides, or
polymorphic proteins/peptides/polypeptides. The terms "protein", "peptide",
and
"polypeptide" are used herein interchangeably.
A variant protein may be encoded by, for example, a nonsynonymous nucleotide
substitution at any one of the cSNP positions disclosed herein. In addition,
variant proteins
may also include proteins whose expression, structure, and/or function is
altered by a SNP
disclosed herein, such as a SNP that creates or destroys a stop codon, a SNP
that affects
splicing, and a SNP in control/regulatory elements, e.g. promoters, enhancers,
or transcription
factor binding domains.
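The effect of a coding SNP on the encoded amino acid can be illustrated with the standard genetic code. The sketch below classifies a single-base change within one codon; the function name and the example codons are hypothetical, and SNPs affecting splicing or regulatory elements are outside its scope.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change in a codon (position is 0, 1, or 2)."""
    variant = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[variant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop codon created"
    if ref_aa == "*":
        return "stop codon destroyed"
    return "nonsynonymous"


print(classify_coding_snp("GGT", 2, "C"))  # Gly -> Gly at a degenerate third position
print(classify_coding_snp("GAT", 1, "T"))  # Asp -> Val
print(classify_coding_snp("TGG", 2, "A"))  # Trp -> stop
```

A synonymous result corresponds to a SNP at a degenerate codon position, while the other three outcomes yield a variant protein in the sense used here.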
As used herein, a protein or peptide is said to be "isolated" or "purified"
when it is
substantially free of cellular material or chemical precursors or other
chemicals. The variant
proteins can be purified to homogeneity or other lower degrees of purity. The
level of purification
will be based on the intended use. The key feature is that the preparation
allows for the desired
function of the variant protein, even if in the presence of considerable
amounts of other
components.
As used herein, "substantially free of cellular material" includes
preparations of the variant
protein having less than about 30% (by dry weight) other proteins (i.e.,
contaminating protein),
less than about 20% other proteins, less than about 10% other proteins, or
less than about 5% other
proteins. When the variant protein is recombinantly produced, it can also be
substantially free of
culture medium, i.e., culture medium represents less than about 20% of the
volume of the protein
preparation.
The language "substantially free of chemical precursors or other chemicals"
includes
preparations of the variant protein in which it is separated from chemical
precursors or other
chemicals that are involved in its synthesis. In one embodiment, the language
"substantially free
of chemical precursors or other chemicals" includes preparations of the
variant protein having less
than about 30% (by dry weight) chemical precursors or other chemicals, less
than about 20% chemical precursors or other chemicals, less than about 10%
chemical precursors
or other chemicals, or less than about 5% chemical precursors or other
chemicals.
An isolated variant protein may be purified from cells that naturally express
it, purified
from cells that have been altered to express it (recombinant host cells), or
synthesized using
known protein synthesis methods. For example, a nucleic acid molecule
containing SNP(s)
encoding the variant protein can be cloned into an expression vector, the
expression vector
introduced into a host cell, and the variant protein expressed in the host
cell. The variant protein
can then be isolated from the cells by any appropriate purification scheme
using standard protein
purification techniques. Examples of these techniques are described in detail
below (Sambrook
and Russell, 2000, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory
Press, Cold Spring Harbor, NY).
The present disclosure is of isolated variant proteins that comprise, consist
of or consist
essentially of amino acid sequences that contain one or more variant amino
acids encoded by
one or more codons which contain a SNP of the present invention.
Accordingly, the present disclosure is of variant proteins that consist of
amino acid
sequences that contain one or more amino acid polymorphisms (or truncations or
extensions due
to creation or destruction of a stop codon, respectively) encoded by the SNPs
provided in Table 1
and/or Table 2. A protein consists of an amino acid sequence when the amino
acid sequence is the
entire amino acid sequence of the protein.
The present disclosure is further of variant proteins that consist essentially
of amino acid
sequences that contain one or more amino acid polymorphisms (or truncations or
extensions due
to creation or destruction of a stop codon, respectively) encoded by the SNPs
provided in Table 1
and/or Table 2. A protein consists essentially of an amino acid sequence when
such an amino acid
sequence is present with only a few additional amino acid residues in the
final protein.
The present disclosure is further of variant proteins that comprise amino acid
sequences
that contain one or more amino acid polymorphisms (or truncations or
extensions due to creation
or destruction of a stop codon, respectively) encoded by the SNPs provided in
Table 1 and/or
Table 2. A protein comprises an amino acid sequence when the amino acid
sequence is at least
part of the final amino acid sequence of the protein. In such a fashion, the
protein may contain
only the variant amino acid sequence or have additional amino acid residues,
such as a contiguous
encoded sequence that is naturally associated with it or heterologous amino
acid residues. Such a
protein can have a few additional amino acid residues or can comprise many
more additional
amino acids. A brief description of how various types of these proteins can be
made and isolated
is provided below.
The variant proteins can be attached to heterologous sequences to form
chimeric or
fusion proteins. Such chimeric and fusion proteins comprise a variant protein
operatively
linked to a heterologous protein having an amino acid sequence not
substantially homologous
to the variant protein. "Operatively linked" indicates that the coding
sequences for the variant
protein and the heterologous protein are ligated in-frame. The heterologous
protein can be
fused to the N-terminus or C-terminus of the variant protein. In another
embodiment, the
fusion protein is encoded by a fusion polynucleotide that is synthesized by
conventional
techniques including automated DNA synthesizers. Alternatively, PCR
amplification of gene
fragments can be carried out using anchor primers which give rise to
complementary overhangs
between two consecutive gene fragments which can subsequently be annealed and
re-amplified
to generate a chimeric gene sequence (see Ausubel et al., Current Protocols in
Molecular
Biology, 1992). Moreover, many expression vectors are commercially available
that already
encode a fusion moiety (e.g., a GST protein). A variant protein-encoding
nucleic acid can be
cloned into such an expression vector such that the fusion moiety is linked in-
frame to the
variant protein.
In many uses, the fusion protein does not affect the activity of the variant
protein. The
fusion protein can include, but is not limited to, enzymatic fusion proteins,
for example, beta-
galactosidase fusions, yeast two-hybrid GAL fusions, poly-His fusions, MYC-
tagged, HI-tagged
and Ig fusions. Such fusion proteins, particularly poly-His fusions, can
facilitate their purification
following recombinant expression. In certain host cells (e.g., mammalian host
cells), expression
and/or secretion of a protein can be increased by using a heterologous signal
sequence. Fusion
proteins are further described in, for example, Terpe, "Overview of tag
protein fusions: from
molecular and biochemical fundamentals to commercial systems", Appl Microbiol
Biotechnol.
2003 Jan;60(5):523-33. Epub 2002 Nov 07; Graddis et al., "Designing proteins
that work using
recombinant technologies", Curr Pharm Biotechnol. 2002 Dec;3(4):285-97; and
Nilsson et al.,
"Affinity fusion strategies for detection, purification, and immobilization of
recombinant
proteins", Protein Expr Purif. 1997 Oct;11(1):1-16.
The present disclosure also relates to further obvious variants of the variant
polypeptides
described herein, such as naturally-occurring mature forms (e.g., allelic
variants), non-naturally
occurring recombinantly-derived variants, and orthologs and paralogs of such
proteins that share
sequence homology. Such variants can readily be generated using art-known
techniques in the
fields of recombinant nucleic acid technology and protein biochemistry. It is
understood,
however, that variants exclude those known in the prior art before the present
invention.
Further variants of the variant polypeptides disclosed in Table 1 can comprise
an amino
acid sequence that shares at least 70-80%, 80-85%, 85-90%, 91%, 92%, 93%, 94%,
95%, 96%,
97%, 98%, or 99% sequence identity with an amino acid sequence disclosed in
Table 1 (or a
fragment thereof) and that includes a novel amino acid residue (allele)
disclosed in Table 1
(which is encoded by a novel SNP allele). Thus, an aspect of the present
disclosure that is
specifically contemplated are polypeptides that have a certain degree of
sequence variation
compared with the polypeptide sequences shown in Table 1, but that contain a
novel amino
acid residue (allele) encoded by a novel SNP allele disclosed herein. In other
words, as long as
a polypeptide contains a novel amino acid residue disclosed herein, other
portions of the
polypeptide that flank the novel amino acid residue can vary to some degree
from the
polypeptide sequences shown in Table 1.
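The identity criterion above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (gaps written as "-") and counts identity only over columns without gaps; other definitions of percent identity exist, and the sequences, function names, and 70% default threshold used here are hypothetical illustrations rather than entries from Table 1.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b)
                if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no aligned residues to compare")
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)


def qualifies(candidate: str, reference: str, variant_index: int,
              variant_residue: str, threshold: float = 70.0) -> bool:
    """True if the candidate keeps the variant residue and meets the threshold."""
    return (candidate[variant_index] == variant_residue
            and percent_identity(candidate, reference) >= threshold)


# Hypothetical sequences: the variant residue is Y at 0-based index 4.
reference = "MKTAYIAKQR"
candidate = "MKSAYIAKQK"
```

With these hypothetical inputs, the candidate differs from the reference at two flanking positions yet retains the variant residue, so it passes a 70% threshold but not a 90% threshold.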
Full-length pre-processed forms, as well as mature processed forms, of
proteins that
comprise one of the amino acid sequences disclosed herein can readily be
identified as having
complete sequence identity to one of the variant proteins described herein as
well as being
encoded by the same genetic locus as the variant proteins provided herein.
Orthologs of a variant peptide can readily be identified as having some degree
of
significant sequence homology/identity to at least a portion of a variant
peptide as well as being
encoded by a gene from another organism. Preferred orthologs will be isolated
from non-human
mammals, preferably primates, for the development of human therapeutic targets
and agents.
Such orthologs can be encoded by a nucleic acid sequence that hybridizes to a
variant peptide-
encoding nucleic acid molecule under moderate to stringent conditions
depending on the degree
of relatedness of the two organisms yielding the homologous proteins.
Variant proteins include, but are not limited to, proteins containing
deletions, additions and
substitutions in the amino acid sequence caused by the SNPs disclosed herein.
One class of
substitutions is conserved amino acid substitutions in which a given amino
acid in a polypeptide is
substituted for another amino acid of like characteristics. Typical
conservative substitutions are
replacements, one for another, among the aliphatic amino acids Ala, Val, Leu,
and Ile; interchange
of the hydroxyl residues Ser and Thr; exchange of the acidic residues Asp and
Glu; substitution
between the amide residues Asn and Gln; exchange of the basic residues Lys and
Arg; and
replacements among the aromatic residues Phe and Tyr. Guidance concerning
which amino acid
changes are likely to be phenotypically silent is found in, for example,
Bowie et al., Science
247:1306-1310 (1990).
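As a rough illustration, the conservative substitution groups listed above can be encoded as sets of one-letter amino acid codes. This is only a sketch: the grouping follows the text, while the function name and the decision to treat an unchanged residue as "not a substitution" are assumptions made for the example.

```python
# Groups taken from the text, written with one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"A", "V", "L", "I"},  # aliphatic: Ala, Val, Leu, Ile
    {"S", "T"},            # hydroxyl: Ser, Thr
    {"D", "E"},            # acidic: Asp, Glu
    {"N", "Q"},            # amide: Asn, Gln
    {"K", "R"},            # basic: Lys, Arg
    {"F", "Y"},            # aromatic: Phe, Tyr
]


def is_conservative(original: str, substitute: str) -> bool:
    """True if both residues differ and fall in the same group above."""
    if original == substitute:
        return False  # identical residues are not a substitution
    return any(original in group and substitute in group
               for group in CONSERVATIVE_GROUPS)
```

For example, a Leu to Ile change is classified as conservative under this scheme, whereas a Lys to Glu change is not.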
Variant proteins can be fully functional or can lack function in one or more
activities,
e.g. ability to bind another molecule, ability to catalyze a substrate,
ability to mediate signaling,
etc. Fully functional variants typically contain only conservative variations
or variations in
non-critical residues or in non-critical regions. Functional variants can also
contain substitution
of similar amino acids that result in no change or an insignificant change in
function.
Alternatively, such substitutions may positively or negatively affect function
to some degree.
Non-functional variants typically contain one or more non-conservative amino
acid
substitutions, deletions, insertions, inversions, truncations or extensions,
or a substitution,
insertion, inversion, or deletion of a critical residue or in a critical
region.
Amino acids that are essential for function of a protein can be identified by
methods
known in the art, such as site-directed mutagenesis or alanine-scanning
mutagenesis (Cunningham
et al., Science 244:1081-1085 (1989)), particularly using the amino acid
sequence and
polymorphism information provided in Table 1. The latter procedure introduces
single alanine
mutations at every residue in the molecule. The resulting mutant molecules are
then tested for
biological activity such as enzyme activity or in assays such as an in vitro
proliferative activity.
Sites that are critical for binding partner/substrate binding can also be
determined by structural
analysis such as crystallization, nuclear magnetic resonance or photoaffinity
labeling (Smith et al.,
J. Mol. Biol. 224:899-904 (1992); de Vos et al., Science 255:306-312 (1992)).
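The set of constructs needed for alanine-scanning mutagenesis can be enumerated in silico before any are built. The following is a minimal sketch under stated assumptions: the input sequence is hypothetical, positions are labeled 1-based in the usual "M1A" style, and residues that are already alanine are skipped because substituting them would change nothing.

```python
def alanine_scan(sequence: str):
    """Return (label, mutant_sequence) pairs, one per single-alanine mutant."""
    mutants = []
    for i, residue in enumerate(sequence):
        if residue == "A":
            continue  # already alanine; no mutant to make at this position
        label = f"{residue}{i + 1}A"  # e.g. "K2A" = Lys at position 2 to Ala
        mutants.append((label, sequence[:i] + "A" + sequence[i + 1:]))
    return mutants
```

Each resulting sequence would then correspond to one mutant molecule to be tested for biological activity, as described above.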
Polypeptides can contain amino acids other than the 20 amino acids commonly
referred
to as the 20 naturally occurring amino acids. Further, many amino acids,
including the terminal
amino acids, may be modified by natural processes, such as processing and
other post-
translational modifications, or by chemical modification techniques well known
in the art.
Accordingly, the variant proteins described herein also encompass derivatives
or analogs in
which a substituted amino acid residue is not one encoded by the genetic code,
in which a
substituent group is included, in which the mature polypeptide is fused with
another compound,
such as a compound to increase the half-life of the polypeptide (e.g.,
polyethylene glycol), or in
which additional amino acids are fused to the mature polypeptide, such as a
leader or secretory
sequence or a sequence for purification of the mature polypeptide or a pro-
protein sequence.
Known protein modifications include, but are not limited to, acetylation,
acylation, ADP-
ribosylation, amidation, covalent attachment of flavin, covalent attachment of
a heme moiety,
covalent attachment of a nucleotide or nucleotide derivative, covalent
attachment of a lipid or lipid
derivative, covalent attachment of phosphatidylinositol, cross-linking,
cyclization, disulfide bond
formation, demethylation, formation of covalent crosslinks, formation of
cystine, formation of
pyroglutamate, formylation, gamma carboxylation, glycosylation, GPI anchor
formation,
hydroxylation, iodination, methylation, myristoylation, oxidation, proteolytic
processing,
phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-
RNA mediated
addition of amino acids to proteins such as arginylation, and ubiquitination.
Such protein modifications are well known to those of skill in the art and
have been
described in great detail in the scientific literature. Several particularly
common modifications,
glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic
acid residues,
hydroxylation and ADP-ribosylation, for instance, are described in most basic
texts, such as
Proteins - Structure and Molecular Properties, 2nd Ed., T.E. Creighton, W. H.
Freeman and
Company, New York (1993); Wold, F., Posttranslational Covalent Modification of
Proteins, B.C.
Johnson, Ed., Academic Press, New York 1-12 (1983); Seifter et al., Meth.
Enzymol. 182: 626-646
(1990); and Rattan et al., Ann. N.Y. Acad. Sci. 663:48-62 (1992).
The present disclosure is further of fragments of the variant proteins in
which the
fragments contain one or more amino acid sequence variations (e.g.,
substitutions, or truncations
or extensions due to creation or destruction of a stop codon) encoded by one
or more SNPs
disclosed herein. The fragments to which the disclosure pertains, however, are
not to be construed
as encompassing fragments that have been disclosed in the prior art before the
present invention.
As used herein, a fragment may comprise at least about 4, 8, 10, 12, 14, 16,
18, 20, 25, 30,
50, 100 (or any other number in-between) or more contiguous amino acid
residues from a variant
protein, wherein at least one amino acid residue is affected by a SNP of the
present invention, e.g.,
a variant amino acid residue encoded by a nonsynonymous nucleotide
substitution at a cSNP
position. The variant amino acid encoded by a cSNP may occupy any residue
position along the
sequence of the fragment. Such fragments can be chosen based on the ability to
retain one or
more of the biological activities of the variant protein or the ability to
perform a function, e.g., act
as an immunogen. Particularly important fragments are biologically active
fragments. Such
fragments will typically comprise a domain or motif of a variant protein of
the present invention,
e.g., active site, transmembrane domain, or ligand/substrate binding domain.
Other fragments
include, but are not limited to, domain or motif-containing fragments, soluble
peptide fragments,
and fragments containing immunogenic structures. Predicted domains and
functional sites are
readily identifiable by computer programs well known to those of skill in the
art (e.g., PROSITE
analysis) (Current Protocols in Protein Science, John Wiley & Sons, N.Y.
(2002)).
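Because the variant residue may occupy any position within a fragment, the candidate fragments of a given length form a sliding window over the variant position. A minimal sketch follows, assuming a hypothetical sequence and a 0-based index for the variant residue; the function name is illustrative only.

```python
def fragments_with_variant(sequence: str, variant_index: int, length: int):
    """List every contiguous fragment of `length` that contains the variant."""
    if not 0 <= variant_index < len(sequence):
        raise ValueError("variant_index is outside the sequence")
    if length < 1 or length > len(sequence):
        return []
    # Earliest start still covering the variant, and latest start that fits.
    first = max(0, variant_index - length + 1)
    last = min(variant_index, len(sequence) - length)
    return [sequence[start:start + length] for start in range(first, last + 1)]


# Hypothetical 10-residue sequence with the variant residue Y at index 4.
example = fragments_with_variant("MKTAYIAKQR", 4, 4)
```

Here the four-residue windows place the variant residue at the last, third, second, and first position of the fragment in turn.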
Uses of Variant Proteins
The variant proteins described herein can be used in a variety of ways,
including but not
limited to, in assays to determine the biological activity of a variant
protein, such as in a panel
of multiple proteins for high-throughput screening; to raise antibodies or to
elicit another type
of immune response; as a reagent (including the labeled reagent) in assays
designed to
quantitatively determine levels of the variant protein (or its binding
partner) in biological
fluids; as a marker for cells or tissues in which it is preferentially
expressed (either
constitutively or at a particular stage of tissue differentiation or
development or in a disease
state); as a target for screening for a therapeutic agent; and as a direct
therapeutic agent to be
administered into a human subject. Any of the variant proteins disclosed
herein may be
developed into reagent grade or kit format for commercialization as research
products.
Methods for performing the uses listed above are well known to those skilled
in the art (see,
e.g., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory
Press,
Sambrook and Russell, 2000, and Methods in Enzymology: Guide to Molecular
Cloning
Techniques, Academic Press, Berger, S. L. and A. R. Kimmel eds., 1987).

In a specific embodiment, the methods described herein include detection of
one or
more variant proteins disclosed herein. Variant proteins are disclosed in
Table 1 and in the
Sequence Listing as SEQ ID NOS: 15-28. Detection of such proteins can be
accomplished
using, for example, antibodies, small molecule compounds, aptamers,
ligands/substrates, other
proteins or protein fragments, or other protein-binding agents. Preferably,
protein detection
agents are specific for a variant protein as described herein and can
therefore discriminate
between a variant protein and the wild-type protein or another variant form.
This can generally
be accomplished by, for example, selecting or designing detection agents that
bind to the region
of a protein that differs between the variant and wild-type protein, such as a
region of a protein
that contains one or more amino acid substitutions that is/are encoded by a
non-synonymous
cSNP as described herein, or a region of a protein that follows a nonsense
mutation-type SNP
that creates a stop codon thereby leading to a shorter polypeptide, or a
region of a protein that
follows a read-through mutation-type SNP that destroys a stop codon thereby
leading to a
longer polypeptide in which a portion of the polypeptide is present in one
version of the
polypeptide but not the other.
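The effect of nonsense and read-through SNPs on polypeptide length can be sketched with a small translation routine. This is an illustration under stated assumptions: it uses the standard genetic code, a hypothetical coding sequence that begins in frame, and 0-based SNP positions; none of the sequences correspond to entries in Table 1 or the Sequence Listing.

```python
# Standard genetic code, built in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in frame from position 0 until the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_snp(dna: str, index: int, allele: str) -> str:
    """Return the sequence with a single-base substitution at `index`."""
    return dna[:index] + allele + dna[index + 1:]


wild_type = "ATGAAATGGTAAGGCTGA"               # hypothetical: M K W stop G stop
nonsense = apply_snp(wild_type, 8, "A")        # TGG -> TGA creates a stop
read_through = apply_snp(wild_type, 9, "C")    # TAA -> CAA destroys a stop
```

In this example the wild-type sequence translates to MKW, the nonsense allele to the shorter MK, and the read-through allele to the longer MKWQG. The residues present in only one version (W, or the added QG) mark the region a discriminating detection agent would need to recognize.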
In another specific aspect of the disclosure, the variant proteins described
herein are used
as targets for diagnosing liver fibrosis or for determining predisposition to
liver fibrosis in a
human. Accordingly, the disclosure is of methods for detecting the presence
of, or levels of, one
or more variant proteins described herein in a cell, tissue, or organism. Such
methods typically
involve contacting a test sample with an agent (e.g., an antibody, small
molecule compound, or
peptide) capable of interacting with the variant protein such that specific
binding of the agent to
the variant protein can be detected. Such an assay can be provided in a single
detection format or
a multi-detection format such as an array, for example, an antibody or aptamer
array (arrays for
protein detection may also be referred to as "protein chips"). The variant
protein of interest can be
isolated from a test sample and assayed for the presence of a variant amino
acid sequence encoded
by one or more SNPs disclosed herein. The SNPs may cause changes to the
protein and the
corresponding protein function/activity, such as through non-synonymous
substitutions in protein
coding regions that can lead to amino acid substitutions, deletions,
insertions, and/or
rearrangements; formation or destruction of stop codons; or alteration of
control elements such as
promoters. SNPs may also cause inappropriate post-translational modifications.
One preferred agent for detecting a variant protein in a sample is an antibody
capable of
selectively binding to a variant form of the protein (antibodies are described
in greater detail in the
next section). Such samples include, for example, tissues, cells, and
biological fluids isolated from
a subject, as well as tissues, cells and fluids present within a subject.
In vitro methods for detection of the variant proteins associated with liver
fibrosis that are
disclosed herein and fragments thereof include, but are not limited to, enzyme
linked
immunosorbent assays (ELISAs), radioimmunoassays (RIA), Western blots,
immunoprecipitations, immunofluorescence, and protein arrays/chips (e.g.,
arrays of antibodies or
aptamers). For further information regarding immunoassays and related protein
detection
methods, see Current Protocols in Immunology, John Wiley & Sons, N.Y., and
Hage,
"Immunoassays", Anal Chem. 1999 Jun 15;71(12):294R-304R.
Additional analytic methods of detecting amino acid variants include, but are
not limited
to, altered electrophoretic mobility, altered tryptic peptide digest, altered
protein activity in cell-
based or cell-free assay, alteration in ligand or antibody-binding pattern,
altered isoelectric point,
and direct amino acid sequencing.
Alternatively, variant proteins can be detected in vivo in a subject by
introducing into the
subject a labeled antibody (or other type of detection reagent) specific for a
variant protein. For
example, the antibody can be labeled with a radioactive marker whose presence
and location in a
subject can be detected by standard imaging techniques.
Other uses of the variant peptides as described herein are based on the class
or action of
the protein. For example, proteins isolated from humans and their mammalian
orthologs serve
as targets for identifying agents (e.g., small molecule drugs or antibodies)
for use in therapeutic
applications, particularly for modulating a biological or pathological
response in a cell or tissue
that expresses the protein. Pharmaceutical agents can be developed that
modulate protein
activity.
As an alternative to modulating gene expression, therapeutic compounds can be
developed
that modulate protein function. For example, many SNPs disclosed herein affect
the amino acid
sequence of the encoded protein (e.g., non-synonymous cSNPs and nonsense
mutation-type
SNPs). Such alterations in the encoded amino acid sequence may affect protein
function,
particularly if such amino acid sequence variations occur in functional
protein domains, such as
catalytic domains, ATP-binding domains, or ligand/substrate binding domains.
It is well
established in the art that variant proteins having amino acid sequence
variations in functional
domains can cause or influence pathological conditions. In such instances,
compounds (e.g., small
molecule drugs or antibodies) can be developed that target the variant protein
and modulate (e.g.,
up- or down-regulate) protein function/activity.
The therapeutic methods of the present disclosure further include methods that
target
one or more variant proteins as described herein. Variant proteins can be
targeted using, for
example, small molecule compounds, antibodies, aptamers, ligands/substrates,
other proteins,
or other protein-binding agents. Additionally, the skilled artisan will
recognize that the novel
protein variants (and polymorphic nucleic acid molecules) disclosed in Table 1
may themselves
be directly used as therapeutic agents by acting as competitive inhibitors of
corresponding art-
known proteins (or nucleic acid molecules such as mRNA molecules).
The variant proteins described herein are particularly useful in drug
screening assays, in
cell-based or cell-free systems. Cell-based systems can utilize cells that
naturally express the
protein, a biopsy specimen, or cell cultures. In one embodiment, cell-based
assays involve
recombinant host cells expressing the variant protein. Cell-free assays can be
used to detect the
ability of a compound to directly bind to a variant protein or to the
corresponding SNP-containing
nucleic acid fragment that encodes the variant protein.
A variant protein as described herein, as well as appropriate fragments
thereof, can be used
in high-throughput screening assays to test candidate compounds for the
ability to bind and/or
modulate the activity of the variant protein. These candidate compounds can be
further screened
against a protein having normal function (e.g., a wild-type/non-variant
protein) to further
determine the effect of the compound on the protein activity. Furthermore,
these compounds can
be tested in animal or invertebrate systems to determine in vivo
activity/effectiveness. Compounds
can be identified that activate (agonists) or inactivate (antagonists) the
variant protein, and
different compounds can be identified that cause various degrees of activation
or inactivation of
the variant protein.
Further, the variant proteins can be used to screen a compound for the ability
to stimulate
or inhibit interaction between the variant protein and a target molecule that
normally interacts with
the protein. The target can be a ligand, a substrate or a binding partner that
the protein normally
interacts with (for example, epinephrine or norepinephrine). Such assays
typically include the
steps of combining the variant protein with a candidate compound under
conditions that allow the
variant protein, or fragment thereof, to interact with the target molecule,
and to detect the
formation of a complex between the protein and the target or to detect the
biochemical
consequence of the interaction with the variant protein and the target, such
as any of the associated
effects of signal transduction.
Candidate compounds include, for example, 1) peptides such as soluble
peptides, including
Ig-tailed fusion peptides and members of random peptide libraries (see, e.g.,
Lam et al., Nature
354:82-84 (1991); Houghten et al., Nature 354:84-86 (1991)) and combinatorial
chemistry-
derived molecular libraries made of D- and/or L- configuration amino acids; 2)
phosphopeptides
(e.g., members of random and partially degenerate, directed phosphopeptide
libraries, see, e.g.,
Songyang et al., Cell 72:767-778 (1993)); 3) antibodies (e.g., polyclonal,
monoclonal, humanized,
anti-idiotypic, chimeric, and single chain antibodies as well as Fab, F(ab')2,
Fab expression library
fragments, and epitope-binding fragments of antibodies); and 4) small organic
and inorganic
molecules (e.g., molecules obtained from combinatorial and natural product
libraries).
One candidate compound is a soluble fragment of the variant protein that
competes for
ligand binding. Other candidate compounds include mutant proteins or
appropriate fragments
containing mutations that affect variant protein function and thus compete for
ligand.
Accordingly, a fragment that competes for ligand, for example with a higher
affinity, or a
fragment that binds ligand but does not allow release, is encompassed by the
disclosure.
The disclosure is further of other end point assays to identify compounds that
modulate
(stimulate or inhibit) variant protein activity. The assays typically involve
an assay of events in
the signal transduction pathway that indicate protein activity. Thus, the
expression of genes that
are up or down-regulated in response to the variant protein dependent signal
cascade can be
assayed. In one embodiment, the regulatory region of such genes can be
operably linked to a
marker that is easily detectable, such as luciferase. Alternatively,
phosphorylation of the variant
protein, or a variant protein target, could also be measured. Any of the
biological or biochemical
functions mediated by the variant protein can be used as an endpoint assay.
These include all of
the biochemical or biological events described herein, in the references cited
herein, incorporated
by reference for these endpoint assay targets, and other functions known to
those of ordinary skill
in the art.
Binding and/or activating compounds can also be screened by using chimeric
variant
proteins in which an amino terminal extracellular domain or parts thereof, an
entire
transmembrane domain or subregions, and/or the carboxyl terminal intracellular
domain or parts
thereof, can be replaced by heterologous domains or subregions. For example, a
substrate-binding
region can be used that interacts with a different substrate than that which
is normally recognized
by a variant protein. Accordingly, a different set of signal transduction
components is available as
an end-point assay for activation. This allows for assays to be performed in
other than the specific
host cell from which the variant protein is derived.

CA 02887830 2014-11-27
WO 2005/111241
PCT/US2005/016051
The variant proteins are also useful in competition binding assays in methods
designed to discover compounds that interact with the variant protein. Thus, a
compound can be exposed to a variant protein under conditions that allow the
compound to bind or to otherwise interact with the variant protein. A binding
partner, such as ligand, that normally interacts with the variant protein is
also added to the mixture. If the test compound interacts with the variant
protein or its binding partner, it decreases the amount of complex formed or
activity from the variant protein. This type of assay is particularly useful
in screening for compounds that interact with specific regions of the variant
protein (Hodgson, Bio/technology, 1992, Sept 10(9), 973-80).
To perform cell-free drug screening assays, it is sometimes desirable to
immobilize either the variant protein or a fragment thereof, or its target
molecule, to facilitate separation of complexes from uncomplexed forms of one
or both of the proteins, as well as to accommodate automation of the assay.
Any method for immobilizing proteins on matrices can be used in drug screening
assays. In one embodiment, a fusion protein containing an added domain allows
the protein to be bound to a matrix. For example, glutathione-S-
transferase/125I fusion proteins can be adsorbed onto glutathione sepharose
beads (Sigma Chemical, St. Louis, MO) or glutathione derivatized microtitre
plates, which are then combined with the cell lysates (e.g., 35S-labeled) and
a candidate compound, such as a drug candidate, and the mixture incubated
under conditions conducive to complex formation (e.g., at physiological
conditions for salt and pH). Following incubation, the beads can be washed to
remove any unbound label, and the matrix immobilized and radiolabel determined
directly, or in the supernatant after the complexes are dissociated.
Alternatively, the complexes can be dissociated from the matrix, separated by
SDS-PAGE, and the level of bound material found in the bead fraction
quantitated from the gel using standard electrophoretic techniques.
Either the variant protein or its target molecule can be immobilized utilizing
conjugation of biotin and streptavidin. Alternatively, antibodies reactive
with the variant protein but which do not interfere with binding of the
variant protein to its target molecule can be derivatized to the wells of the
plate, and the variant protein trapped in the wells by antibody conjugation.
Preparations of the target molecule and a candidate compound are incubated in
the variant protein-presenting wells and the amount of complex trapped in the
well can be quantitated. Methods for detecting such complexes, in addition to
those described
above for the GST-immobilized complexes, include immunodetection of complexes
using
antibodies reactive with the protein target molecule, or which are reactive
with variant protein and
compete with the target molecule, and enzyme-linked assays that rely on
detecting an enzymatic
activity associated with the target molecule.
Modulators of variant protein activity identified according to these drug
screening
assays can be used to treat a subject with a disorder mediated by the protein
pathway, such as
liver fibrosis. These methods of treatment typically include the steps of
administering the
modulators of protein activity in a pharmaceutical composition to a subject in
need of such
treatment.
The variant proteins, or fragments thereof, disclosed herein can themselves be
directly
used to treat a disorder characterized by an absence of, inappropriate, or
unwanted expression or
activity of the variant protein. Accordingly, methods for treatment include
the use of a variant
protein disclosed herein or fragments thereof.
In yet another aspect of the disclosure, variant proteins can be used as "bait
proteins" in
a two-hybrid assay or three-hybrid assay (see, e.g., U.S. Patent No.
5,283,317; Zervos et al.
(1993) Cell 72:223-232; Madura et al. (1993) J. Biol. Chem. 268:12046-12054;
Bartel et al.
(1993) Biotechniques 14:920-924; Iwabuchi et al. (1993) Oncogene 8:1693-1696;
and Brent
WO 94/10300) to identify other proteins that bind to or interact with the
variant protein and are
involved in variant protein activity. Such variant protein-binding proteins
are also likely to be
involved in the propagation of signals by the variant proteins or variant
protein targets as, for
example, elements of a protein-mediated signaling pathway. Alternatively, such
variant
protein-binding proteins are inhibitors of the variant protein.
The two-hybrid system is based on the modular nature of most transcription
factors,
which typically consist of separable DNA-binding and activation domains.
Briefly, the assay
typically utilizes two different DNA constructs. In one construct, the gene
that codes for a
variant protein is fused to a gene encoding the DNA binding domain of a known
transcription
factor (e.g., GAL-4). In the other construct, a DNA sequence, from a library
of DNA
sequences, that encodes an unidentified protein ("prey" or "sample") is fused
to a gene that
codes for the activation domain of the known transcription factor. If the
"bait" and the "prey"
proteins are able to interact, in vivo, forming a variant protein-dependent
complex, the DNA-
binding and activation domains of the transcription factor are brought into
close proximity.
This proximity allows transcription of a reporter gene (e.g., LacZ) that is
operably linked to a
transcriptional regulatory site responsive to the transcription factor.
Expression of the reporter
gene can be detected, and cell colonies containing the functional
transcription factor can be
isolated and used to obtain the cloned gene that encodes the protein that
interacts with the
variant protein.
Antibodies Directed to Variant Proteins
The present disclosure is also of antibodies that selectively bind to the
variant proteins
disclosed herein and fragments thereof. Such antibodies may be used to
quantitatively or
qualitatively detect the variant proteins described herein. As used herein, an
antibody selectively
binds a target variant protein when it binds the variant protein and does not
significantly bind to
non-variant proteins, i.e., the antibody does not significantly bind to
normal, wild-type, or art-
known proteins that do not contain a variant amino acid sequence due to one or
more SNPs
disclosed herein (variant amino acid sequences may be due to, for example,
nonsynonymous
cSNPs, nonsense SNPs that create a stop codon, thereby causing a truncation of
a polypeptide or
SNPs that cause read-through mutations resulting in an extension of a
polypeptide).
As used herein, an antibody is defined in terms consistent with that
recognized in the art:
they are multi-subunit proteins produced by an organism in response to an
antigen challenge. The
antibodies described herein include both monoclonal antibodies and polyclonal
antibodies, as well
as antigen-reactive proteolytic fragments of such antibodies, such as Fab,
F(ab')2, and Fv
fragments. In addition, an antibody further includes any of a variety of
engineered antigen-
binding molecules such as a chimeric antibody (U.S. Patent Nos. 4,816,567 and
4,816,397;
Morrison et al., Proc. Natl. Acad. Sci. USA, 81:6851, 1984; Neuberger et al.,
Nature 312:604,
1984), a humanized antibody (U.S. Patent Nos. 5,693,762; 5,585,089; and
5,565,332), a single-
chain Fv (U.S. Patent No. 4,946,778; Ward et al., Nature 334:544, 1989), a
bispecific antibody
with two binding specificities (Segal et al., J. Immunol. Methods 248:1, 2001;
Carter, J. Immunol.
Methods 248:7, 2001), a diabody, a triabody, and a tetrabody (Todorovska et
al., J. Immunol.
Methods, 248:47, 2001), as well as a Fab conjugate (dimer or trimer), and a
minibody.
Many methods are known in the art for generating and/or identifying antibodies
to a given
target antigen (Harlow, Antibodies, Cold Spring Harbor Press, (1989)). In
general, an isolated
peptide (e.g., a variant protein as described herein) is used as an immunogen
and is administered to
a mammalian organism, such as a rat, rabbit, hamster or mouse. Either a full-
length protein, an
antigenic peptide fragment (e.g., a peptide fragment containing a region that
varies between a
variant protein and a corresponding wild-type protein), or a fusion protein
can be used. A protein
used as an immunogen may be naturally-occurring, synthetic or recombinantly
produced, and may
be administered in combination with an adjuvant, including but not limited to,
Freund's (complete
and incomplete), mineral gels such as aluminum hydroxide, surface active
substance such as
lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole
limpet hemocyanin,
dinitrophenol, and the like.
Monoclonal antibodies can be produced by hybridoma technology (Kohler and
Milstein, Nature, 256:495, 1975), which immortalizes cells secreting a
specific monoclonal
antibody. The immortalized cell lines can be created in vitro by fusing two
different cell types,
typically lymphocytes, and tumor cells. The hybridoma cells may be cultivated
in vitro or in
vivo. Additionally, fully human antibodies can be generated by transgenic
animals (He et al., J.
Immunol., 169:595, 2002). Fd phage and Fd phagemid technologies may be used to
generate
and select recombinant antibodies in vitro (Hoogenboom and Chames, Immunol.
Today 21:371,
2000; Liu et al., J. Mol. Biol. 315:1063, 2002). The
complementarity-determining regions of
an antibody can be identified, and synthetic peptides corresponding to such
regions may be
used to mediate antigen binding (U.S. Patent No. 5,637,677).
Antibodies are preferably prepared against regions or discrete fragments of a
variant
protein containing a variant amino acid sequence as compared to the
corresponding wild-type
protein (e.g., a region of a variant protein that includes an amino acid
encoded by a
nonsynonymous cSNP, a region affected by truncation caused by a nonsense SNP
that creates a
stop codon, or a region resulting from the destruction of a stop codon due to
read-through
mutation caused by a SNP). Furthermore, preferred regions will include those
involved in
function/activity and/or protein/binding partner interaction. Such fragments
can be selected on a
physical property, such as fragments corresponding to regions that are located
on the surface of the
protein, e.g., hydrophilic regions, or can be selected based on sequence
uniqueness, or based on
the position of the variant amino acid residue(s) encoded by the SNPs
disclosed herein. An
antigenic fragment will typically comprise at least about 8-10 contiguous
amino acid residues in
which at least one of the amino acid residues is an amino acid affected by a
SNP disclosed herein.
The antigenic peptide can comprise, however, at least 12, 14, 16, 20, 25, 50,
100 (or any other
number in-between) or more amino acid residues, provided that at least one
amino acid is affected
by a SNP disclosed herein.
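The fragment-length criterion above can be illustrated with a minimal Python sketch that lists every contiguous window of a chosen length containing the residue affected by a SNP. The sequence, the residue position, and the function name peptide_windows are hypothetical and are not taken from the disclosure; the sketch addresses only the length and position criteria, not antigenicity.

```python
def peptide_windows(protein, variant_index, length=10):
    """Return every contiguous fragment of `length` residues that
    includes the residue at 0-based position `variant_index`."""
    if not 0 <= variant_index < len(protein):
        raise ValueError("variant_index is outside the protein sequence")
    if length < 1 or length > len(protein):
        raise ValueError("length must be between 1 and the protein length")
    # Earliest and latest window start positions that still cover the residue.
    first = max(0, variant_index - length + 1)
    last = min(variant_index, len(protein) - length)
    return [protein[start:start + length] for start in range(first, last + 1)]


# Hypothetical 20-residue sequence; the variant residue is assumed at index 9.
example = "MKTAYIAKQRQISFVKSHFS"
fragments = peptide_windows(example, 9, length=8)
```

Candidate fragments produced this way could then be ranked by other properties mentioned in the text, such as hydrophilicity or sequence uniqueness.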
Detection of an antibody described herein can be facilitated by coupling
(i.e., physically
linking) the antibody or an antigen-reactive fragment thereof to a detectable
substance. Detectable
substances include, but are not limited to, various enzymes, prosthetic
groups, fluorescent
materials, luminescent materials, bioluminescent materials, and radioactive
materials. Examples
of suitable enzymes include horseradish peroxidase, alkaline phosphatase,
β-galactosidase, or
acetylcholinesterase; examples of suitable prosthetic group complexes include
streptavidin/biotin
and avidin/biotin; examples of suitable fluorescent materials include
umbelliferone, fluorescein,
fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein,
dansyl chloride or
phycoerythrin; an example of a luminescent material includes luminol; examples
of
bioluminescent materials include luciferase, luciferin, and aequorin, and
examples of suitable
radioactive material include 125I, 131I, 35S or 3H.
Antibodies, particularly the use of antibodies as therapeutic agents, are
reviewed in:
Morgan, "Antibody therapy for Alzheimer's disease", Expert Rev Vaccines. 2003
Feb;2(1):53-9;
Ross et al., "Anticancer antibodies", Am J Clin Pathol. 2003 Apr;119(4):472-85;
Goldenberg,
"Advancing role of radiolabeled antibodies in the therapy of cancer", Cancer
Immunol
Immunother. 2003 May;52(5):281-96. Epub 2003 Mar 11; Ross et al., "Antibody-
based
therapeutics in oncology", Expert Rev Anticancer Ther. 2003 Feb;3(1):107-21;
Cao et al.,
"Bispecific antibody conjugates in therapeutics", Adv Drug Deliv Rev. 2003 Feb
10;55(2):171-97;
von Mehren et al., "Monoclonal antibody therapy for cancer", Annu Rev Med.
2003;54:343-69.
Epub 2001 Dec 03; Hudson et al., "Engineered antibodies", Nat Med. 2003
Jan;9(1):129-34;
Brekke et al., "Therapeutic antibodies for human diseases at the dawn of the
twenty-first century",
Nat Rev Drug Discov. 2003 Jan;2(1):52-62 (Erratum in: Nat Rev Drug Discov.
2003
Mar;2(3):240); Houdebine, "Antibody manufacture in transgenic animals and
comparisons with
other systems", Curr Opin Biotechnol. 2002 Dec;13(6):625-9; Andreakos et al.,
"Monoclonal
antibodies in immune and inflammatory diseases", Curr Opin Biotechnol. 2002
Dec;13(6):615-20;
Kellermann et al., "Antibody discovery: the use of transgenic mice to generate
human monoclonal
antibodies for therapeutics", Curr Opin Biotechnol. 2002 Dec;13(6):593-7;
Pini et al., "Phage
display and colony filter screening for high-throughput selection of antibody
libraries", Comb
Chem High Throughput Screen. 2002 Nov;5(7):503-10; Batra et al.,
"Pharmacokinetics and
biodistribution of genetically engineered antibodies", Curr Opin Biotechnol.
2002 Dec;13(6):603-
8; and Tangri et al., "Rationally engineered proteins or antibodies with
absent or reduced
immunogenicity", Curr Med Chem. 2002 Dec;9(24):2191-9.
Uses of Antibodies
Antibodies can be used to isolate the variant proteins described herein from a
natural cell
source or from recombinant host cells by standard techniques, such as affinity
chromatography or
immunoprecipitation. In addition, antibodies are useful for detecting the
presence of a variant
protein described herein in cells or tissues to determine the pattern of
expression of the variant
protein among various tissues in an organism and over the course of normal
development or
disease progression. Further, antibodies can be used to detect variant protein
in situ, in vitro, in a
bodily fluid, or in a cell lysate or supernatant in order to evaluate the
amount and pattern of
expression. Also, antibodies can be used to assess abnormal tissue
distribution, abnormal
expression during development, or expression in an abnormal condition, such as
liver fibrosis.
Additionally, antibody detection of circulating fragments of the full-length
variant protein can be
used to identify turnover.
Antibodies to the variant proteins described herein are also useful in
pharmacogenomic
analysis. Thus, antibodies against variant proteins encoded by alternative SNP
alleles can be used
to identify individuals that require modified treatment modalities.
Further, antibodies can be used to assess expression of the variant protein in
disease states
such as in active stages of the disease or in an individual with a
predisposition to a disease related
to the protein's function, particularly liver fibrosis. Antibodies specific
for a variant protein
encoded by a SNP-containing nucleic acid molecule described herein can be used
to assay for the
presence of the variant protein, such as to screen for predisposition to liver
fibrosis as indicated by
the presence of the variant protein.
Antibodies are also useful as diagnostic tools for evaluating the variant
proteins in
conjunction with analysis by electrophoretic mobility, isoelectric point,
tryptic peptide digest, and
other physical assays well known in the art.
Antibodies are also useful for tissue typing. Thus, where a specific variant
protein has
been correlated with expression in a specific tissue, antibodies that are
specific for this protein can
be used to identify a tissue type.
Antibodies can also be used to assess aberrant subcellular localization of a
variant protein
in cells in various tissues. The diagnostic uses can be applied, not only in
genetic testing, but also
in monitoring a treatment modality. Accordingly, where treatment is ultimately
aimed at
correcting the expression level or the presence of variant protein or aberrant
tissue distribution or
developmental expression of a variant protein, antibodies directed against the
variant protein or
relevant fragments can be used to monitor therapeutic efficacy.
The antibodies are also useful for inhibiting variant protein function, for
example, by
blocking the binding of a variant protein to a binding partner. These uses can
also be applied in a
therapeutic context in which treatment involves inhibiting a variant protein's
function. An
antibody can be used, for example, to block or competitively inhibit binding,
thus modulating
(agonizing or antagonizing) the activity of a variant protein. Antibodies can
be prepared against
specific variant protein fragments containing sites required for function or
against an intact variant
protein that is associated with a cell or cell membrane. For in vivo
administration, an antibody may
be linked with an additional therapeutic payload such as a radionuclide, an
enzyme, an
immunogenic epitope, or a cytotoxic agent. Suitable cytotoxic agents include,
but are not limited
to, bacterial toxin such as diphtheria, and plant toxin such as ricin. The in
vivo half-life of an
antibody or a fragment thereof may be lengthened by pegylation through
conjugation to
polyethylene glycol (Leong et al., Cytokine 16:106, 2001).
The disclosure is also of kits for using antibodies, such as kits for
detecting the presence of
a variant protein in a test sample. An exemplary kit can comprise antibodies
such as a labeled or
labelable antibody and a compound or agent for detecting variant proteins in a
biological sample;
means for determining the amount, or presence/absence of variant protein in
the sample; means for
comparing the amount of variant protein in the sample with a standard; and
instructions for use.
Vectors and Host Cells
The present disclosure is also of vectors containing the SNP-containing
nucleic acid
molecules described herein. The term "vector" refers to a vehicle, preferably
a nucleic acid
molecule, which can transport a SNP-containing nucleic acid molecule. When the
vector is a
nucleic acid molecule, the SNP-containing nucleic acid molecule can be
covalently linked to the
vector nucleic acid. Such vectors include, but are not limited to, a plasmid,
single or double
stranded phage, a single or double stranded RNA or DNA viral vector, or
artificial chromosome,
such as a BAC, PAC, YAC, or MAC.
A vector can be maintained in a host cell as an extrachromosomal element where
it
replicates and produces additional copies of the SNP-containing nucleic acid
molecules.
Alternatively, the vector may integrate into the host cell genome and produce
additional copies of
the SNP-containing nucleic acid molecules when the host cell replicates.
The disclosure is of vectors for the maintenance (cloning vectors) or vectors
for expression
(expression vectors) of the SNP-containing nucleic acid molecules. The vectors
can function in
prokaryotic or eukaryotic cells or in both (shuttle vectors).
Expression vectors typically contain cis-acting regulatory regions that are
operably linked
in the vector to the SNP-containing nucleic acid molecules such that
transcription of the SNP-
containing nucleic acid molecules is allowed in a host cell. The SNP-
containing
nucleic acid molecules can also be introduced into the host cell with a
separate nucleic acid
molecule capable of affecting transcription. Thus, the second nucleic acid
molecule may
provide a trans-acting factor interacting with the cis-regulatory control
region to allow
transcription of the SNP-containing nucleic acid molecules from the vector.
Alternatively, a
trans-acting factor may be supplied by the host cell. Finally, a trans-acting
factor can be
produced from the vector itself. It is understood, however, that in some
embodiments,
transcription and/or translation of the nucleic acid molecules can occur in a
cell-free system.
The regulatory sequences to which the SNP-containing nucleic acid molecules
described herein can be operably linked include promoters for directing mRNA
transcription. These include, but are not limited to, the left promoter from
bacteriophage λ,
the lac, TRP, and TAC promoters from E. coli, the early and late promoters
from SV40, the
CMV immediate early promoter, the adenovirus early and late promoters, and
retrovirus
long-terminal repeats.
In addition to control regions that promote transcription, expression vectors
may also
include regions that modulate transcription, such as repressor binding sites
and enhancers.
Examples include the SV40 enhancer, the cytomegalovirus immediate early
enhancer,
polyoma enhancer, adenovirus enhancers, and retrovirus LTR enhancers.
In addition to containing sites for transcription initiation and control,
expression
vectors can also contain sequences necessary for transcription
termination and, in the
transcribed region, a ribosome-binding site for translation. Other regulatory
control
elements for expression include initiation and termination codons as well as
polyadenylation
signals. A person of ordinary skill in the art would be aware of the
numerous regulatory
sequences that are useful in expression vectors (see, e.g., Sambrook and
Russell, 2000,
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press,
Cold
Spring Harbor, NY).
A variety of expression vectors can be used to express a SNP-containing
nucleic
acid molecule. Such vectors include chromosomal, episomal, and virus-derived
vectors, for
example, vectors derived from bacterial plasmids, from bacteriophage, from
yeast episomes,
from yeast chromosomal elements, including yeast artificial chromosomes, from
viruses
such as baculoviruses, papovaviruses such as SV40, Vaccinia viruses,
adenoviruses,
poxviruses, pseudorabies viruses, and retroviruses. Vectors can also be
derived from
combinations of these sources such as those derived from plasmid and
bacteriophage genetic
elements, e.g., cosmids and phagemids. Appropriate cloning and expression
vectors for
prokaryotic and eukaryotic hosts are described in Sambrook and Russell, 2000,
Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, NY.
The regulatory sequence in a vector may provide constitutive expression in one
or more
host cells (e.g., tissue specific expression) or may provide for inducible
expression in one or more
cell types such as by temperature, nutrient additive, or exogenous factor,
e.g., a hormone or other
ligand. A variety of vectors that provide constitutive or inducible expression
of a nucleic acid
sequence in prokaryotic and eukaryotic host cells are well known to those of
ordinary skill in the
art.
A SNP-containing nucleic acid molecule can be inserted into the vector by
methodology
well-known in the art. Generally, the SNP-containing nucleic acid molecule
that will ultimately
be expressed is joined to an expression vector by cleaving the SNP-containing
nucleic acid
molecule and the expression vector with one or more restriction enzymes and
then ligating the
fragments together. Procedures for restriction enzyme digestion and ligation
are well known to
those of ordinary skill in the art.
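As a rough planning aid for the restriction-and-ligation step described above, the following Python sketch reports which enzymes from a small panel cut a vector sequence exactly once and do not cut inside the insert. The recognition sequences for EcoRI, BamHI, and HindIII are the standard ones; the vector and insert sequences and the function names are hypothetical, and the search treats the vector as linear, which is a simplification.

```python
# Standard recognition sequences; all three are palindromic, so scanning
# one strand is sufficient.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def count_sites(seq, site):
    """Count occurrences (including overlapping ones) of `site` in `seq`."""
    seq = seq.upper()
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == site)


def usable_enzymes(vector, insert):
    """Enzymes that cut the vector exactly once and leave the insert intact."""
    return sorted(
        name
        for name, site in SITES.items()
        if count_sites(vector, site) == 1 and count_sites(insert, site) == 0
    )


# Hypothetical sequences chosen only to exercise the logic.
vector = "TTGACAGAATTCGGATCCAAGCTTGGATCCTT"
insert = "ATGAAGCTTGCCTAA"
candidates = usable_enzymes(vector, insert)
```

Here BamHI is excluded because it cuts the hypothetical vector twice, and HindIII is excluded because its site occurs within the insert.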
The vector containing the appropriate nucleic acid molecule can be introduced
into an
appropriate host cell for propagation or expression using well-known
techniques. Bacterial host
cells include, but are not limited to, E. coli, Streptomyces, and Salmonella
typhimurium.
Eukaryotic host cells include, but are not limited to, yeast, insect cells
such as Drosophila, animal
cells such as COS and CHO cells, and plant cells.
As described herein, it may be desirable to express the variant peptide as a
fusion protein.
Accordingly, the disclosure is of fusion vectors that allow for the production
of the variant
peptides. Fusion vectors can, for example, increase the expression of a
recombinant protein,
increase the solubility of the recombinant protein, and aid in the
purification of the protein by
acting, for example, as a ligand for affinity purification. A proteolytic
cleavage site may be
introduced at the junction of the fusion moiety so that the desired variant
peptide can ultimately be
separated from the fusion moiety. Proteolytic enzymes suitable for such use
include, but are not
limited to, factor Xa, thrombin, and enterokinase. Typical fusion expression
vectors include
pGEX (Smith et al., Gene 67:31-40 (1988)), pMAL (New England Biolabs, Beverly,
MA) and
pRIT5 (Pharmacia, Piscataway, NJ) which fuse glutathione S-transferase (GST),
maltose E
binding protein, or protein A, respectively, to the target recombinant
protein. Examples of suitable
inducible non-fusion E. coli expression vectors include pTrc (Amann et al.,
Gene 69:301-315
(1988)) and pET 11d (Studier et al., Gene Expression Technology: Methods in
Enzymology
185:60-89 (1990)).
Recombinant protein expression can be maximized in a bacterial host by
providing a
genetic background wherein the host cell has an impaired capacity to
proteolytically cleave the
recombinant protein (Gottesman, S., Gene Expression Technology: Methods in
Enzymology 185,
Academic Press, San Diego, California (1990) 119-128). Alternatively, the
sequence of the SNP-
containing nucleic acid molecule of interest can be altered to provide
preferential codon usage for
a specific host cell, for example, E. coli (Wada et al., Nucleic Acids Res.
20:2111-2118 (1992)).
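The codon-usage adjustment mentioned above can be sketched in Python as a synonymous recoding of a coding sequence: each codon is replaced by a host-preferred codon for the same amino acid, so the encoded protein is unchanged. The preferred-codon entries below are illustrative placeholders rather than measured usage data, and the sequence and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds, preferred):
    """Swap each codon for the preferred synonymous codon, where one is given."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of three")
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)


# Placeholder preferences (amino acid -> codon); assumed for illustration only.
PREFERRED = {"L": "CTG", "R": "CGT", "G": "GGC"}
original = "ATGCTAAGAGGGTAA"
recoded = recode(original, PREFERRED)
```

Because only synonymous substitutions are made, translating the original and recoded sequences gives the same protein; a codon encoding an amino acid affected by a SNP is preserved at the protein level as well.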
The SNP-containing nucleic acid molecules can also be expressed by expression
vectors
that are operative in yeast. Examples of vectors for expression in yeast
(e.g., S. cerevisiae) include
pYepSec1 (Baldari, et al., EMBO J. 6:229-234 (1987)), pMFa (Kurjan et al.,
Cell 30:933-943 (1982)),
pJRY88 (Schultz et al., Gene 54:113-123 (1987)), and pYES2
(Invitrogen
Corporation, San Diego, CA).
The SNP-containing nucleic acid molecules can also be expressed in insect
cells using, for
example, baculovirus expression vectors. Baculovirus vectors available for
expression of proteins
in cultured insect cells (e.g., Sf 9 cells) include the pAc series (Smith et
al., Mol. Cell Biol. 3:2156-
2165 (1983)) and the pVL series (Lucklow et al., Virology 170:31-39 (1989)).
In certain embodiments, the SNP-containing nucleic acid molecules described
herein are
expressed in mammalian cells using mammalian expression vectors. Examples of
mammalian
expression vectors include pCDM8 (Seed, B. Nature 329:840(1987)) and pMT2PC
(Kaufman et
al., EMBO J. 6:187-195 (1987)).
The disclosure is also of vectors in which the SNP-containing nucleic acid
molecules
described herein are cloned into the vector in reverse orientation, but
operably linked to a
regulatory sequence that permits transcription of antisense RNA. Thus, an
antisense transcript can
be produced to the SNP-containing nucleic acid sequences described herein,
including both coding
and non-coding regions. Expression of this antisense RNA is subject to each of
the parameters
described above in relation to expression of the sense RNA (regulatory
sequences, constitutive or
inducible expression, tissue-specific expression).
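The relationship between a sense sequence and the antisense transcript obtained from a reverse-orientation insert can be shown with a short Python sketch: the antisense RNA is the reverse complement of the sense strand, written with U in place of T. The example sequence and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase or lowercase DNA sequence."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G, and T")
    return dna.translate(COMPLEMENT)[::-1]


def antisense_rna(sense_dna):
    """RNA complementary to the sense transcript, written 5' to 3'."""
    return reverse_complement(sense_dna).replace("T", "U")


# Hypothetical sense-strand fragment.
sense = "ATGGCCTTAG"
antisense = antisense_rna(sense)
```

The same function applies to coding and non-coding regions alike, since it depends only on base complementarity.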
The disclosure also relates to recombinant host cells containing the vectors
described
herein. Host cells therefore include, for example, prokaryotic cells, lower
eukaryotic cells such as
yeast, other eukaryotic cells such as insect cells, and higher eukaryotic
cells such as mammalian
cells.
The recombinant host cells can be prepared by introducing the vector
constructs described
herein into the cells by techniques readily available to persons of ordinary
skill in the art. These
include, but are not limited to, calcium phosphate transfection, DEAE-dextran-
mediated
transfection, cationic lipid-mediated transfection, electroporation,
transduction, infection,
lipofection, and other techniques such as those described in Sambrook and
Russell, 2000,
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY.
Host cells can contain more than one vector. Thus, different SNP-containing
nucleotide
sequences can be introduced in different vectors into the same cell.
Similarly, the SNP-containing
nucleic acid molecules can be introduced either alone or with other nucleic
acid molecules that are
not related to the SNP-containing nucleic acid molecules, such as those
providing trans-acting
factors for expression vectors. When more than one vector is introduced into a
cell, the vectors
can be introduced independently, co-introduced, or joined to the nucleic acid
molecule vector.
In the case of bacteriophage and viral vectors, these can be introduced into
cells as
packaged or encapsulated virus by standard procedures for infection and
transduction. Viral
vectors can be replication-competent or replication-defective. In the case in
which viral
replication is defective, replication can occur in host cells that provide
functions that complement
the defects.
Vectors generally include selectable markers that enable the selection of the
subpopulation
of cells that contain the recombinant vector constructs. The marker can be
inserted in the same
vector that contains the SNP-containing nucleic acid molecules described
herein or may be in a
separate vector. Markers include, for example, tetracycline or ampicillin-
resistance genes for
prokaryotic host cells, and dihydrofolate reductase or
neomycin resistance genes for eukaryotic host cells. However, any marker that
provides
selection for a phenotypic trait can be effective.
While the mature variant proteins can be produced in bacteria, yeast,
mammalian
cells, and other cells under the control of the appropriate regulatory
sequences, cell-free
transcription and translation systems can also be used to produce these
variant proteins using
RNA derived from the DNA constructs described herein.
Where secretion of the variant protein is desired, which is difficult to
achieve with
multi-transmembrane domain containing proteins such as G-protein-coupled
receptors
(GPCRs), appropriate secretion signals can be incorporated into the
vector. The signal
sequence can be endogenous to the peptides or heterologous to these peptides.
Where the variant protein is not secreted into the medium, the protein can be
isolated
from the host cell by standard disruption procedures, including freeze/thaw,
sonication,
mechanical disruption, use of lysing agents, and the like. The variant
protein can then be
recovered and purified by well-known purification methods including, for
example,
ammonium sulfate precipitation, acid extraction, anion or cation exchange
chromatography, phosphocellulose chromatography, hydrophobic-interaction
chromatography, affinity chromatography, hydroxylapatite chromatography,
lectin
chromatography, or high performance liquid chromatography.
It is also understood that, depending upon the host cell in which
recombinant production of the variant proteins described herein occurs, they
can have various glycosylation patterns, or may be non-glycosylated, as when
produced in bacteria. In addition, the variant proteins may include an initial
modified methionine in some cases as a result of a host-mediated process.
For further information regarding vectors and host cells, see Current
Protocols in Molecular Biology, John Wiley & Sons, N.Y.
Uses of Vectors and Host Cells, and Transgenic Animals
Recombinant host cells that express the variant proteins described herein have
a
variety of uses. For example, the cells are useful for producing a variant
protein that can be
further purified into a preparation of desired amounts of the variant protein
or fragments
thereof. Thus, host cells containing expression vectors are useful for variant
protein production.
Host cells are also useful for conducting cell-based assays involving the
variant protein
or variant protein fragments, such as those described above as well as other
formats known in
the art. Thus, a recombinant host cell expressing a variant protein is useful
for assaying
compounds that stimulate or inhibit variant protein function. Such an ability
of a compound to
modulate variant protein function may not be apparent from assays of the
compound on the
native/wild-type protein, or from cell-free assays of the compound.
Recombinant host cells are
also useful for assaying functional alterations in the variant proteins as
compared with a known
function.
Genetically-engineered host cells can be further used to produce non-human
transgenic
animals. A transgenic animal is preferably a non-human mammal, for example, a
rodent, such as a
rat or mouse, in which one or more of the cells of the animal include a
transgene. A transgene is
exogenous DNA containing a SNP of the present invention which is integrated
into the genome of
a cell from which a transgenic animal develops and which remains in the genome
of the mature
animal in one or more of its cell types or tissues. Such animals are useful
for studying the function
of a variant protein in vivo, and identifying and evaluating modulators of
variant protein activity.
Other examples of transgenic animals include, but are not limited to, non-
human primates, sheep,
dogs, cows, goats, chickens, and amphibians. Transgenic non-human mammals such
as cows and
goats can be used to produce variant proteins which can be secreted in the
animal's milk and then
recovered.
A transgenic animal can be produced by introducing a SNP-containing nucleic
acid
molecule into the male pronuclei of a fertilized oocyte, e.g., by
microinjection or retroviral
infection, and allowing the oocyte to develop in a pseudopregnant female
foster animal. Any
nucleic acid molecules that contain one or more SNPs as disclosed herein can
potentially be
introduced as a transgene into the genome of a non-human animal.
Any of the regulatory or other sequences useful in expression vectors can form
part of the
transgenic sequence. This includes intronic sequences and polyadenylation
signals, if not already
included. A tissue-specific regulatory sequence(s) can be operably linked to
the transgene to
direct expression of the variant protein in particular cells or tissues.
Methods for generating transgenic animals via embryo manipulation and
microinjection, particularly animals such as mice, have become conventional in
the art and
are described in, for example, U.S. Patent Nos. 4,736,866 and 4,870,009, both
by Leder et
al., U.S. Patent No. 4,873,191 by Wagner et al., and in Hogan, B., Manipulating
the Mouse
Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986).
Similar
methods are used for production of other transgenic animals. A transgenic
founder animal
can be identified based upon the presence of the transgene in its genome
and/or expression
of transgenic mRNA in tissues or cells of the animals. A transgenic founder
animal can then
be used to breed additional animals carrying the transgene. Moreover,
transgenic animals
carrying a transgene can further be bred to other transgenic animals carrying
other
transgenes. A transgenic animal also includes a non-human animal in which the
entire
animal or tissues in the animal have been produced using the homologously
recombinant
host cells described herein.
In another embodiment, transgenic non-human animals can be produced which
contain selected systems that allow for regulated expression of the transgene.
One example
of such a system is the cre/loxP recombinase system of bacteriophage P1 (Lakso
et al. PNAS
89:6232-6236 (1992)). Another example of a recombinase system is the FLP
recombinase
system of S. cerevisiae (O'Gorman et al. Science 251:1351-1355 (1991)). If a
cre/loxP
recombinase system is used to regulate expression of the transgene, animals
containing
transgenes encoding both the Cre recombinase and a selected protein are
generally needed.
Such animals can be provided through the construction of "double" transgenic
animals, e.g.,
by mating two transgenic animals, one containing a transgene encoding a
selected variant
protein and the other containing a transgene encoding a recombinase.
Clones of the non-human transgenic animals described herein can also be
produced
according to the methods described in, for example, Wilmut, I. et al. Nature
385:810-813
(1997) and PCT International Publication Nos. WO 97/07668 and WO 97/07669. In
brief, a
cell (e.g., a somatic cell) from the transgenic animal can be isolated and
induced to exit the
growth cycle and enter G0 phase. The quiescent cell can then be fused, e.g.,
through the use
of electrical pulses, to an enucleated oocyte from an animal of the same
species from which
the quiescent cell is isolated. The reconstructed oocyte is then cultured such
that it develops
to morula or blastocyst and then transferred to a pseudopregnant female foster
animal. The
offspring born of this female foster animal will be a clone of the animal from
which the cell (e.g., a
somatic cell) is isolated.
Transgenic animals containing recombinant cells that express the variant
proteins
described herein are useful for conducting the assays described herein in an
in vivo context.
Accordingly, the various physiological factors that are present in vivo and
that could influence
ligand or substrate binding, variant protein activation, signal transduction,
or other processes or
interactions, may not be evident from in vitro cell-free or cell-based assays.
Thus, non-human
transgenic animals of the present invention may be used to assay in vivo
variant protein function as
well as the activities of a therapeutic agent or compound that modulates
variant protein
function/activity or expression. Such animals are also suitable for assessing
the effects of null
mutations (i.e., mutations that substantially or completely eliminate one or
more variant protein
functions).
For further information regarding transgenic animals, see Houdebine, "Antibody
manufacture in transgenic animals and comparisons with other systems", Curr
Opin Biotechnol. 2002 Dec;13(6):625-9; Petters et al., "Transgenic animals as
models for human disease", Transgenic Res. 2000;9(4-5):347-51; discussion
345-6; Wolf et al., "Use of transgenic animals in understanding molecular
mechanisms of toxicity", J Pharm Pharmacol. 1998 Jun;50(6):567-74; Echelard,
"Recombinant protein production in transgenic animals", Curr Opin Biotechnol.
1996 Oct;7(5):536-40; Houdebine, "Transgenic animal bioreactors", Transgenic
Res. 2000;9(4-5):305-20; Pirity et al., "Embryonic stem cells. Creating
transgenic animals", Methods Cell Biol. 1998;57:279-93; and Robl et al.,
"Artificial chromosome vectors and expression of complex proteins in
transgenic animals", Theriogenology. 2003 Jan 1;59(1):107-13.
COMPUTER-RELATED EMBODIMENTS
The SNPs disclosed herein may be "provided" in a variety of mediums to
facilitate use thereof.
As used in this section, "provided" refers to a manufacture, other than an
isolated nucleic acid
molecule, that contains SNP information of the present invention. Such a
manufacture
provides the SNP information in a form that allows a skilled artisan to
examine the
manufacture using means not directly applicable to examining the SNPs or a
subset thereof as
they exist in nature or in purified form. The SNP information that may be
provided in such a
form includes any of the SNP information disclosed herein such as, for
example, polymorphic
nucleic acid and/or amino acid sequence information such as SEQ ID NOS:1-14,
SEQ ID
NOS:15-28, SEQ ID NOS:43-50, SEQ ID NOS:29-42, and SEQ ID NOS:51-58;
information
about observed SNP alleles, alternative codons, populations, allele
frequencies, SNP types,
and/or affected proteins; or any other information disclosed in Tables 1-2
and/or the Sequence
Listing.
In one application of this embodiment, the SNPs disclosed herein can be
recorded on a
computer readable medium. As used herein, "computer readable medium" refers to
any
medium that can be read and accessed directly by a computer. Such media
include, but are not
limited to: magnetic storage media, such as floppy discs, hard disc storage
medium, and
magnetic tape; optical storage media such as CD-ROM; electrical storage media
such as RAM
and ROM; and hybrids of these categories such as magnetic/optical storage
media. A skilled
artisan can readily appreciate how any of the presently known computer
readable media can be
used to create a manufacture comprising computer readable medium having
recorded thereon a
nucleotide sequence described herein. One such medium is provided with the
present
application, namely, the present application contains computer readable medium
(CD-R) that
has nucleic acid sequences (and encoded protein sequences) containing SNPs
provided/recorded thereon in ASCII text format in a Sequence Listing along
with
accompanying Tables that contain detailed SNP and sequence information
(transcript
sequences are provided as SEQ ID NOS:1-14, protein sequences are provided as
SEQ ID
NOS:15-28, genomic sequences are provided as SEQ ID NOS:43-50,
transcript-based context
sequences are provided as SEQ ID NOS:29-42, and genomic-based context
sequences are
provided as SEQ ID NOS:51-58).
As used herein, "recorded" refers to a process for storing information on
computer
readable medium. A skilled artisan can readily adopt any of the presently
known methods for
recording information on computer readable medium to generate manufactures
comprising the
SNP information disclosed herein.
A variety of data storage structures are available to a skilled artisan for
creating a
computer readable medium having recorded thereon a nucleotide or amino acid
sequence
described herein. The choice of the data storage structure will generally be
based on the means
chosen to access the stored information. In addition, a variety of data
processor programs and
formats can be used to store the nucleotide/amino acid sequence information
described herein
on computer readable medium. For example, the sequence information can be
represented in a
word processing text file, formatted in commercially-available software such
as WordPerfect
and Microsoft Word, represented in the form of an ASCII file, or stored in a
database
application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can
readily adapt any
number of data processor structuring formats (e.g., text file or database) in
order to obtain
computer readable medium having recorded thereon the SNP information of the
present
invention.
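As a minimal sketch of the two storage options named above (an ASCII text file and a database application), the following Python example serializes a few SNP records both ways. The record layout and the `SnpRecord`, `to_ascii_text`, and `to_database` names are hypothetical and chosen only for illustration; SQLite stands in for the commercial database products mentioned in the text, and the two sample records reuse identifiers that appear in Table 1 and the Examples.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass, astuple, fields


@dataclass
class SnpRecord:
    # Hypothetical record layout; the field names are illustrative only.
    snp_id: str        # e.g. a Celera SNP ID such as "hCV7450990"
    gene_symbol: str
    chromosome: str
    alleles: str       # observed alleles, e.g. "T/G"
    snp_type: str


def to_ascii_text(records):
    """Serialize records as tab-delimited ASCII text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([f.name for f in fields(SnpRecord)])
    for rec in records:
        writer.writerow(astuple(rec))
    return buf.getvalue()


def to_database(records):
    """Store records in an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snp (snp_id TEXT PRIMARY KEY, gene_symbol TEXT, "
        "chromosome TEXT, alleles TEXT, snp_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO snp VALUES (?, ?, ?, ?, ?)",
        [astuple(r) for r in records],
    )
    conn.commit()
    return conn


records = [
    SnpRecord("hCV7450990", "DDX5", "17", "T/G", "Missense Mutation"),
    SnpRecord("hCV11722237", "TLR4", "9", "C/T", "Missense Mutation"),
]
text = to_ascii_text(records)
conn = to_database(records)
row = conn.execute(
    "SELECT gene_symbol FROM snp WHERE snp_id = ?", ("hCV11722237",)
).fetchone()
```

The text form is convenient for exchange on removable media, while the database form supports lookups by SNP identifier; either could be substituted by any of the other formats listed above.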
By providing the SNPs disclosed herein in computer readable form, a skilled
artisan can
routinely access the SNP information for a variety of purposes. Computer
software is publicly
available which allows a skilled artisan to access sequence information
provided in a computer
readable medium. Examples of publicly available computer software include
BLAST (Altschul
et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp.
Chem. 17:203-207
(1993)) search algorithms.
The present disclosure further provides systems, particularly computer-based
systems,
which contain the SNP information described herein. Such systems may be
designed to store
and/or analyze information on, for example, a large number of SNP positions,
or information
on SNP genotypes from a large number of individuals. The SNP information
disclosed herein
represents a valuable information source. The SNP information disclosed
herein
stored/analyzed in a computer-based system may be used for such computer-
intensive
applications as determining or analyzing SNP allele frequencies in a
population, mapping
disease genes, genotype-phenotype association studies, grouping SNPs into
haplotypes,
correlating SNP haplotypes with response to particular drugs, or for various
other
bioinformatic, pharmacogenomic, drug development, or human
identification/forensic
applications.
As used herein, "a computer-based system" refers to the hardware means,
software
means, and data storage means used to analyze the SNP information disclosed
herein. The
minimum hardware means of the computer-based systems of the present invention
typically
comprises a central processing unit (CPU), input means, output means, and data
storage means.
A skilled artisan can readily appreciate that any one of the currently
available computer-based
systems is suitable for use in the present invention. Such a system can be
changed into a
system described herein by utilizing the SNP information provided on the CD-R,
or a subset
thereof, without any experimentation.
As stated above, the computer-based systems described herein comprise a data
storage
means having stored therein SNPs disclosed herein and the necessary hardware
means and
software means for supporting and implementing a search means. As used herein,
"data storage
means" refers to memory which can store SNP information disclosed herein, or a
memory
access means which can access manufactures having recorded thereon the SNP
information
disclosed herein.
As used herein, "search means" refers to one or more programs or algorithms
that are
implemented on the computer-based system to identify or analyze SNPs in a
target sequence
based on the SNP information stored within the data storage means. Search
means can be used
to determine which nucleotide is present at a particular SNP position in the
target sequence. As
used herein, a "target sequence" can be any DNA sequence containing the SNP
position(s) to
be searched or queried.
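One way to picture a very simple "search means" is a routine that locates a SNP position in a target sequence by its flanking context and reports the nucleotide found there. The sketch below is an assumption-laden illustration, not the search algorithm of the disclosure: the function name `allele_at_snp` is hypothetical, exact string matching is used in place of an alignment tool such as BLAST, and the short flanks are toy strings.

```python
def allele_at_snp(target, left_flank, right_flank):
    """Return the base lying between the two flanks, or None if not found.

    Exact matching only: a real search means would tolerate mismatches
    and would also consider the reverse-complement strand.
    """
    target = target.upper()
    left = left_flank.upper()
    right = right_flank.upper()
    start = target.find(left)
    while start != -1:
        snp_index = start + len(left)
        # The SNP base must exist and be followed by the right flank.
        if snp_index < len(target) and target.startswith(right, snp_index + 1):
            return target[snp_index]
        start = target.find(left, start + 1)
    return None


# Toy target sequence: left flank, one variable base, right flank.
left = "GACAGAGGT"
right = "CAGGTCGTT"
target = "TTGGTCGAA" + left + "G" + right + "CCACC"
base = allele_at_snp(target, left, right)
```

Here `base` is the nucleotide observed at the queried position; a caller could compare it against the alleles recorded for that SNP in the data storage means.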
As used herein, "a target structural motif," or "target motif," refers to any
rationally
selected sequence or combination of sequences containing a SNP position in
which the
sequence(s) is chosen based on a three-dimensional configuration that is
formed upon the
folding of the target motif. There are a variety of target motifs known in the
art. Protein target
motifs include, but are not limited to, enzymatic active sites and signal
sequences. Nucleic acid
target motifs include, but are not limited to, promoter sequences, hairpin
structures, and
inducible expression elements (protein binding sequences).
A variety of structural formats for the input and output means can be used to
input and
output the information in the computer-based systems of the present invention.
An exemplary
format for an output means is a display that depicts the presence or absence
of specified
nucleotides (alleles) at particular SNP positions of interest. Such
presentation can provide a
rapid, binary scoring system for many SNPs simultaneously.
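The binary presence/absence display described above can be sketched as a small scoring table. This is an illustrative example only: the function names, the sample identifiers, and the genotype calls are hypothetical, and each genotype is assumed to be given as a two-letter string of allele calls.

```python
def binary_scores(genotypes, alleles_of_interest):
    """Map each sample to a row of 1/0 flags, one flag per SNP of interest.

    A flag is 1 when the specified allele is present in the sample's
    genotype call for that SNP, and 0 when it is absent or uncalled.
    """
    snp_ids = sorted(alleles_of_interest)
    table = {}
    for sample, calls in genotypes.items():
        table[sample] = [
            1 if alleles_of_interest[snp] in calls.get(snp, "") else 0
            for snp in snp_ids
        ]
    return snp_ids, table


def render(snp_ids, table):
    """Format the score table as tab-delimited text for display."""
    lines = ["sample\t" + "\t".join(snp_ids)]
    for sample in sorted(table):
        lines.append(sample + "\t" + "\t".join(str(v) for v in table[sample]))
    return "\n".join(lines)


# Hypothetical genotype calls for two samples at two SNP positions.
genotypes = {
    "S1": {"hCV7450990": "TG", "hCV11722237": "CC"},
    "S2": {"hCV7450990": "TT", "hCV11722237": "CT"},
}
alleles_of_interest = {"hCV7450990": "G", "hCV11722237": "T"}
snp_ids, table = binary_scores(genotypes, alleles_of_interest)
display = render(snp_ids, table)
```

Each row of `display` gives one sample and each column one SNP, so many positions can be scanned at a glance.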
One exemplary embodiment of a computer-based system comprising SNP information
disclosed herein is provided in Figure 1. Figure 1 provides a block diagram of
a computer
system 102 that can be used to implement the disclosed methods. The computer
system 102
includes a processor 106 connected to a bus 104. Also connected to the bus 104
are a main
memory 108 (preferably implemented as random access memory, RAM) and a variety
of
secondary storage devices 110, such as a hard drive 112 and a removable medium
storage
device 114. The removable medium storage device 114 may represent, for
example, a floppy
disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage
medium 116
(such as a floppy disk, a compact disk, a magnetic tape, etc.) containing
control logic and/or
data recorded therein may be inserted into the removable medium storage device
114. The
computer system 102 includes appropriate software for reading the control
logic and/or the data
from the removable storage medium 116 once inserted in the removable medium
storage device
114.
The SNP information disclosed herein may be stored in a well-known manner in
the
main memory 108, any of the secondary storage devices 110, and/or a removable
storage
medium 116. Software for accessing and processing the SNP information (such as
SNP
scoring tools, search tools, comparing tools, etc.) preferably resides in main
memory 108
during execution.
EXAMPLES
The following examples are offered to illustrate, but not to limit the claimed
invention.
STATISTICAL ANALYSIS OF SNP ASSOCIATION WITH LIVER FIBROSIS
IN HCV-INFECTED INDIVIDUALS
Example 1:
A case-control genetic study was performed to determine the association of
SNPs in the
human genome with liver fibrosis and in particular the increased or decreased
risk of
developing bridging fibrosis/cirrhosis, and the rate of progression of
fibrosis in HCV infected
patients. The study involved genotyping SNPs in DNA samples obtained from
>500 HCV-infected patients. The study population came from 2 clinic sites, the
University of
California, San Francisco (UCSF) and Stanford University (Stanford). Among the
435 patients
from UCSF, the percentage for minimal, moderate, and severe fibrosis was 46%,
26%, and
28%, respectively, which reflects the distribution of HCV patients in their
clinics. The 100
samples obtained from Stanford were intentionally collected on extreme cases,
and therefore
comprised 62% minimal fibrosers and 38% severe fibrosers. Samples were divided
into a case
group or a control group. The cases comprised those samples obtained from
individuals
determined to have severe fibrosis (bridging fibrosis/cirrhosis) and the
controls comprised those
samples obtained from individuals with minimal and moderate fibrosis. The
stage of fibrosis in
each individual was determined according to the system of Batts et al.,
Am J Surg. Pathol.
19:1409-1417 (1995) as reviewed by Brunt, Hepatology 31:241-246 (2000). All
patients who
donated samples had signed informed, written consent, and the study protocols
were approved
by the respective Institutional Review Boards (IRB).
DNA was extracted from individual blood samples using conventional DNA
extraction
methods or by using commercially available kits according to manufacturer's
suggested
conditions, such as the QIA-amp kit from Qiagen™ (Valencia, CA). SNP markers
in the
extracted DNA samples were analyzed by genotyping using primers such as those
presented in
Table 5. While some samples were individually genotyped, the same samples and
any
remaining samples were also used for pooling studies, in which DNA samples
from about 50
individuals were pooled and allele frequencies were obtained using a PRISM
7900 HT
Sequence Detection System (Applied Biosystems™, Foster City, CA) by kinetic
allele-specific
PCR similar to the method described by Germer et al., Genome Research
10:258-266 (2000).
The results of statistical analysis of association of a SNP with a decreased
risk of
developing bridging fibrosis/cirrhosis are presented in Table 4. For
statistical analysis, the
outcomes include only fibrosis stage (categorized into 0+1 and 2 for controls
and 3+4 for cases)
(and identified as "stage" in Table 4). Genotypes were categorized into
ordinal (three groups,
including major homozygotes, heterozygotes, and minor homozygotes) as well as
using a
dominant model assumption (two groups, including major homozygotes versus
heterozygotes +
minor homozygotes). Multiple logistic regression as well as
proportional logistic regression analysis was used to generate age-adjusted
odds ratios and 95% confidence intervals to assess the association between
the genotypes and fibrosis stage. All reported p-values are two-sided.
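The genotype and outcome categories described above can be written down as a small coding step that would precede any regression. This is a sketch under stated assumptions: the function names are hypothetical, genotypes are assumed to be two-letter allele strings, and the ordinal category is represented as the count of minor alleles (0, 1, or 2), which is one common convention rather than something the text specifies.

```python
def is_case(stage):
    """Stages 3 and 4 are cases; stages 0, 1, and 2 are controls."""
    if stage not in (0, 1, 2, 3, 4):
        raise ValueError("fibrosis stage must be an integer from 0 to 4")
    return stage >= 3


def code_genotype(genotype, minor_allele):
    """Return the ordinal and dominant-model codes for one genotype.

    ordinal:  0 = major homozygote, 1 = heterozygote, 2 = minor homozygote
    dominant: 0 = major homozygote, 1 = heterozygote or minor homozygote
    """
    copies = genotype.upper().count(minor_allele.upper())
    return {"ordinal": copies, "dominant": 1 if copies > 0 else 0}


# Hypothetical genotypes at a SNP whose minor allele is G.
het = code_genotype("TG", "G")
minor_hom = code_genotype("GG", "G")
major_hom = code_genotype("TT", "G")
```

The resulting codes, together with age as a covariate, are the kind of inputs a logistic regression routine would take to produce the age-adjusted odds ratios reported in Table 4.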
A marker having an odds ratio (OR) < 1.0 is protective (e.g., an individual is
less likely to develop severe liver fibrosis), whereas a marker having an odds
ratio (OR) > 1.0 is associated with an increased risk (e.g., an individual is
more likely to develop severe liver fibrosis).
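To make the interpretation of the odds ratio concrete, the sketch below computes a crude (unadjusted) odds ratio and an approximate 95% confidence interval from a 2x2 table of carrier status by case/control status. It is an illustration only: the study itself reports age-adjusted odds ratios from logistic regression, whereas this example uses the standard log-odds (Woolf) approximation, and the counts and function names are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with an approximate 95% confidence interval.

    a: carriers among cases       b: carriers among controls
    c: non-carriers among cases   d: non-carriers among controls
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def interpret(odds_ratio):
    """Label an odds ratio following the convention stated in the text."""
    if odds_ratio < 1.0:
        return "protective"
    if odds_ratio > 1.0:
        return "increased risk"
    return "no association"


# Hypothetical counts, not taken from the study.
odds_ratio, lower, upper = odds_ratio_ci(a=20, b=80, c=40, d=60)
label = interpret(odds_ratio)
```

With these made-up counts the odds ratio is 0.375, so the carried allele would be labeled protective; whether such an estimate is statistically meaningful depends on the interval and the p-value, not on the point estimate alone.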
Among the 120 SNPs tested in the two sample sets, hCV11638783 is a marker
that is associated with decreased risk for severe fibrosis. hCV11638783 is a
replicated marker because it shows significant association with severe
fibrosis in both the UCSF and Stanford sample sets (p-values of 0.0014 and
0.0175, respectively, in the ordinal analyses and 0.0055 and 0.0071,
respectively, in the dominant analyses). The odds ratio in both sample sets
was less than 1.0 (in the UCSF sample set, for ordinal, the odds ratio was
0.583 for fibrosis stage, and, for dominant, the odds ratio was 0.586 for
fibrosis stage. In the Stanford sample set, for ordinal, the odds ratio was
0.408 for fibrosis stage; in "Sample Set 2", for dominant, the odds ratio was
0.291 for fibrosis stage). Thus, this SNP may be used to identify individuals
with a decreased risk of developing fibrosis, especially bridging
fibrosis/cirrhosis.
Example 2
Sample Set Description
In a second case control study, DNA samples obtained from the University of
California San Francisco (UCSF) were used as a discovery sample set to
initially identify SNPs in association with severe fibrosis. Among the 537
patients in the discovery sample set, the percentage for minimal stage 0-1,
moderate stage 2, and severe stage 3-4 fibrosers was 52%, 23%, and 25%,
respectively, which reflects the typical distribution of HCV infected
patients in clinics.
In addition, sample sets were collected from 3 additional but different
clinic sites for use in replication studies: Virginia Commonwealth University
(VCU), University of Illinois, Chicago (UIC) and Stanford University
(Stanford). Among the approximately 483 patients in the sample set from VCU,
the percentage for minimal, moderate, and
severe fibrosis was approximately 18%, 34%, and 48%, respectively. Among the
115 patients in the sample set from UIC, the percentage for minimal, moderate,
and severe fibrosis was 29%, 30%, and 41%, respectively. Samples from the
Stanford sample set were intentionally collected on extreme cases, which
contained 62% minimal stage 0-1 fibrosers and 38% severe stage 3-4 fibrosers.
The stage of fibrosis in each individual in the VCU sample set was determined
according to the system of Knodell et al., Hepatology 1:431-435 (1981). The
stage of fibrosis in individuals in the UCSF, UIC and Stanford sample sets was
determined according to the system of Batts et al., Am J. Surg. Pathol.
19:1409-1417 (1995). Both scoring systems are reviewed by Brunt, Hepatology
31:241-246 (2000). All patients who donated samples had signed informed,
written consent, and the study protocols were approved by their respective
IRBs.
All patients in the sample sets met the inclusion/exclusion criteria in the
study protocol as follows:
Inclusion criteria:
• Adults (Age 18-75).
• HCV positive patients who have undergone a full course (at least 24 weeks)
of Interferon (IFN) treatment (any formulation +/- ribavirin) and for whom
six month follow-up viral load data was available/potentially available.
Exclusion criteria:
• Discontinuation of IFN treatment secondary to poor tolerance of side effects
• Evidence of other chronic active viral hepatitis including positive
hepatitis antigen
• Evidence of co-infection with human immunodeficiency virus (HIV), e.g.,
positive anti-HIV antibody.
• Evidence of other serious liver disease, e.g., Wilson's disease,
hemochromatosis, etc.
• Other serious medical conditions: rheumatic/renal/lung diseases,
cardiovascular disease, cancer
Additional information used for data analysis:
• Age
• Race
• Gender
• HCV genotype
• Viral load
• Ethanol use
• Intravenous drug use
• Other medications
• Exact treatment regimen
• Alanine amino transferase levels
• Response to IFN treatment
• Other medical history, including serious medical illness such as kidney
disease, cardiovascular disease, autoimmune disease, and cancer
Pooling and whole genome scan on discovery sample set
Association of SNP alleles in the human genome with fibrosis stage in HCV
patients was tested in the discovery sample set. DNA was extracted from blood
samples using a standard protocol or DNA extraction kits as described above.
DNA samples from patients were pooled based on similar clinical phenotypes of
the patients. While some samples were individually genotyped, the same samples
were also used for pooling studies, in which DNA samples from about 50
individuals were pooled and allele frequencies in the pools were obtained
using primers such as those presented in Table 5. Genotypes and pool allele
frequencies were measured using a PRISM 7900HT™ Sequence Detection System
(Applied Biosystems) by kinetic allele-specific PCR, similar to the method
described by Germer et al. (Germer S., Holland M.J., Higuchi R. 2000, Genome
Res. 10: 258-266).
Data analysis on whole genome scan using pooled DNA
Approximately 21,470 SNPs throughout the genome were genotyped in the
discovery sample set. Allele odds ratios and p-values were generated comparing
the advanced or high fibrosis stage group (case group) (also known as bridging
fibrosis/cirrhosis) vs. medium and low groups (mild or no fibrosis) (control
group). Results were stratified by ethnicity (all ethnic groups (A), Caucasian
(C), and other than Caucasian (O)) for the stage outcome to assess any
confounding by these factors.
Data analysis of individual genotyping results on replication sample sets
About 175 SNPs were selected from a study using the discovery sample set based
on their association with severe fibrosis. These SNPs were then retested by
individual genotyping in the UCSF sample set to confirm the initial results
and in the VCU, UIC or Stanford sample sets. The data obtained from the VCU
sample set was used to replicate the results obtained from the UCSF sample
set. The UIC and Stanford sample sets provided additional replication data.
The Allelic Association, Fisher Exact Test was used to analyze the association
of a SNP with fibrosis stage. In replication studies, a SNP
was considered a replicated marker only if it had a significant p-value <0.1
for a particular stage in the UCSF sample set and the VCU sample set, and the
Odds Ratio (OR) had to go in the same direction, that is, the regression
coefficient had to have the same sign in each of the UCSF and VCU sample sets.
67 markers were replicated in these two sample sets in showing statistically
significant association with severe fibrosis (Table 7). For example, marker
hCV7450990 is a replicated marker when all patients as well as patients other
than Caucasian populations (but not the Caucasian-only population) were
analyzed, whereas marker hCV11935588 is a replicated marker when all patients
as well as the Caucasian-only population were analyzed. Both of these SNPs are
protective alleles with ORs <1.
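The allelic association test and the replication rule described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the analysis code used in the study: the two-sided Fisher exact p-value is computed by summing the probabilities of all 2x2 tables that are no more likely than the observed one (one common definition of the two-sided test), the function names are hypothetical, and the numbers in the usage lines are made up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    For an allelic test the rows could be cases and controls and the
    columns the counts of each of the two alleles.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        # Small tolerance so that ties with the observed table are included.
        if p <= p_observed * (1 + 1e-9):
            total += p
    return min(1.0, total)


def is_replicated(p_first, or_first, p_second, or_second, alpha=0.1):
    """Apply the replication rule: both p-values below alpha and both
    odds ratios on the same side of 1.0 (same sign of the log odds ratio)."""
    same_direction = (or_first - 1.0) * (or_second - 1.0) > 0
    return p_first < alpha and p_second < alpha and same_direction


# Hypothetical allele counts and hypothetical summary statistics.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
replicated = is_replicated(0.02, 0.6, 0.05, 0.7)
opposite = is_replicated(0.02, 0.6, 0.05, 1.4)
```

In the second usage line both p-values are below 0.1 and both odds ratios are below 1.0, so the marker would count as replicated; in the third the odds ratios point in opposite directions, so it would not.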
hCV7450990 is a missense SNP in DDX5, which encodes a DEAD (Asp-Glu-Ala-Asp)
box polypeptide 5, and it is shown to have an association with severe fibrosis
in both the UCSF and VCU sample sets. Previous studies in the art have shown
that this gene is expressed in multiple tissues including the liver. A recent
report indicates that DDX5, an RNA helicase also known as p68, interacts with
HCV NS5B (HCV RNA-dependent RNA polymerase), suggesting that DDX5 is a human
cellular factor involved in HCV RNA replication (Goh et al., J Virol., 2004,
78: 5288-98). Therefore, SNPs in the DDX5 gene, particularly missense SNPs
such as hCV7450990, might render a protective effect by affecting the ability
of HCV to replicate.
SNP (hCV11935588), which is located in a gene on chromosome 16, is an example
of a marker associated with severe fibrosis in Caucasians in all four sample
sets. In addition, hCV15851335 also showed association with severe fibrosis
when all patients as well as a Caucasian-only population were analyzed in the
UCSF and VCU sample sets (Table 7). hCV15851335 is a missense SNP in CPT1A
(carnitine palmitoyltransferase 1A, liver). CPT1A is a key enzyme in
carnitine-dependent transport across the mitochondrial inner membrane, and its
deficiency results in a decreased rate of fatty acid beta-oxidation, which
causes fatty liver diseases.
The SNPs disclosed herein could be used to identify other markers that are
associated with an increased risk of developing bridging fibrosis/cirrhosis,
and
progression of fibrosis in HCV-infected individuals and other liver disease
patients. For
example, the markers listed in Tables 4-5 can be used to identify other
mutations (preferably
SNPs), such as those that exhibit similar or enhanced predictive value, in the
identified genes or
surrounding nucleotide sequence (e.g., 500 Kb upstream to 500 Kb downstream of
the marker)
through database searches or through sequencing of DNA samples. Specifically,
marker
hCV7450990 is an A480S change in DDX5, a DEAD-Box RNA helicase. DEAD-Box
proteins
are characterized by an Asp-Glu-Ala-Asp motif. DDX5 is located on the long arm
of
chromosome 17, 17q24.1. The sex-averaged recombination rate in the region is
estimated to be
0.6 cM/Mb. The SNP appears to fall within a region of high linkage
disequilibrium that
extends roughly 20Kb centromeric of the SNP and extends roughly 244Kb
telomeric to the
SNP. Other SNPs in these two regions may be associated with fibrosis
progression rate or
inflammation. Given the high homology within the DEAD-Box protein family, all
DEAD-box
genes and SNPs in those genes are likely to play a role in advanced fibrosis
stage.
Marker hCV11935588 is located in a gene on chromosome 16. The following genes
are
located in the region and may also play a role in advanced fibrosis stage: 1)
CTRB1
(chymotrypsinogen B1) is located within roughly 10kb of marker hCV11935588.
CTRB1 is a
zymogen secreted by the pancreas (highly expressed in the pancreas) and
cleaved by trypsin to
become a protease in the small intestine. It is also expressed in the liver.
2) BCARI (breast
cancer antiestrogen resistance) is within 50kb of marker hCV11935588 and is
involved in
apoptosis. 3) LDHD (lactate dehydrogenase D) is roughly 100kb away from marker
hCV11935588. LDHD is in the electron transport chain and highly expressed in
the liver. It's
also expressed in the kidneys. Two isoforms of LDHD exist. 4) KARS (lysyl-tRNA
synthetase) is within 400kb of marker hCV11935588. KARS is expressed in many
immune
cells (NK, T-cells, B cells, etc.), and is also expressed in BM tissue. A p-
value of 0.01 was
observed in Sample Set 1 at a SNP in the KARS gene. KARS has been shown to
play a role in
autoimmune diseases and is a target for autoantibodies in polymyositis and
dermatomyositis.
The SNPs disclosed herein, alone, or in combination with other risk factors,
such as age,
gender, and alcohol consumption, can provide a non-invasive test that enables
physicians to
assess the fibrosis risk in HCV-infected individuals. Such a test offers
several advantages, such
as: 1) enabling better treatment strategies (for example,
individuals with a higher fibrosis risk can be given higher priority for
treatment, while treatment for individuals with a lower fibrosis risk can be
delayed, thereby sparing them the side effects and high cost of treatment);
and 2) reducing the need for repeated liver biopsies for all patients.
Furthermore, the SNPs disclosed herein could be used in diagnostic kits to
assess the increased or decreased risk of developing bridging
fibrosis/cirrhosis and progression of fibrosis for patients with other liver
diseases, such as hepatitis B, any co-infection with other viruses (such as
HIV, etc.), non-alcoholic fatty liver diseases (NAFLD), drug-induced liver
diseases, alcoholic liver diseases (ALD), primary biliary cirrhosis (PBC),
primary sclerosing cholangitis (PSC), autoimmune hepatitis (AIH) and
cryptogenic cirrhosis. Depending on the genotypes of one or multiple markers
disclosed in any of Tables 6-7, alone or in combination with other risk
factors, physicians could categorize these patients into slow, median, or
rapid fibrosers.
Various modifications and variations of the described compositions, methods
and systems of the invention will be apparent to those skilled in the art
without departing from the scope of the invention. Although the invention has
been described in connection with specific preferred embodiments and certain
working examples, it should be understood that the invention as claimed should
not be unduly limited to such specific embodiments. Indeed, various
modifications of the above-described modes for carrying out the invention that
are obvious to those skilled in the field of molecular biology, genetics and
related fields are intended to be within the scope of the following Claims.
TABLE 1
Gene Number: 25
Celera Gene: hCG1810767 - 64000126973272
Celera Transcript: hCT1950036 - 64000126973273
Public Transcript Accession: NM_025225
Celera Protein: hCP1765925 - 197000069451968
Public Protein Accession: NP_079501
Gene Symbol: C22orf20
Protein Name: chromosome 22 open reading frame 20
Celera Genomic Axis: GA_x5YUV32VY8D(1403768..1440680)
Chromosome: 22
OMIM NUMBER:
OMIM Information:
Transcript SEQ ID NO: 1
Protein SEQ ID NO: 15
SNP Information:
Context (SEQ ID NO: 29):
GCATCTCTCTTACCAGAGTGTCTGATGGGGAAAACGTTCTGGTGTCTGACTTTCGGTCCAAAGACGAAGTCGTG
GATGCCTTGGTATGTTCCTGCTTCAT
CCCTTCTACAGTGGCCTTATCCCTCCTTCCTTCAGAGGCGTGCGATATGTGGATGGAGGAGTGAGTGACAACGT
ACCCTTGATTGATGCCAAAACAACCA
Celera SNP ID: hCV7241
SNP Position Transcript: 617
SNP Source: dbSNP
Population(Allele,Count): no_pop(C,71|G,21)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 15, 148,(I,ATC) (M,ATG)
Gene Number: 51
Celera Gene: hCG27399 - 146000220312482
Celera Transcript: hCT18539 - 146000220312504
Public Transcript Accession: NM_003266
Celera Protein: hC-1543686 - 197000064928737
Public Protein Accession: NP_003257
Gene Symbol: TLR4
Protein Name: toll-like receptor 4
Celera Genomic Axis: GA_x5YUV32W1V9(4596116..4621277)
Chromosome: 9
OMIM NUMBER: 603030
OMIM Information: Endotoxin hyporesponsiveness (3)
Transcript SEQ ID NO: 2
Protein SEQ ID NO: 16
SNP Information:
Context (SEQ ID NO: 30):
GCTTTTTCAGAAGTTGATCTACCAAGCCTTGAGTTTCTAGATCTCAGTAGAAATGGCTTGAGTTTCAAAGGTTG
CTGTTCTCAAAGTGATTTTGGGACAA
CAGCCTAAAGTATTTAGATCTGAGCTTCAATGGTGTTATTACCATGAGTTCAAACTTCTTGGGCTTAGAACAAC
TAGAACATCTGGATTTCCAGCATTCC
Celera SNP ID: hCV11722237
SNP Position Transcript: 1429
SNP Source: Applera
Population(Allele,Count): caucasian(C,37|T,3) african american(C,36|T,2) total(C,73|T,5)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 16, 399,(T,ACC) (I,ATC)
SNP Source: HGMD; dbSNP; Nickerson
Population(Allele,Count): no_pop(C,-|T,-) ; no_pop(C,-|T,-) ; no_pop(C,436|T,26)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 16, 399,(T,ACC) (I,ATC)
Gene Number: 51
Celera Gene: hCG27399 - 146000220312482
Celera Transcript: hCT1961394 - 146000220312489
Public Transcript Accession: NM_003266
Celera Protein: hCP1774277 - 191000064928735
Public Protein Accession: NP_003257
Gene Symbol: TLR4
Protein Name: toll-like receptor 4
Celera Genomic Axis: GA_x5YUV32W1V9(4596116..4621277)
Chromosome: 9
OMIM NUMBER: 603030
OMIM Information: Endotoxin hyporesponsiveness (3)
Transcript SEQ ID NO: 3
Protein SEQ ID NO: 17
SNP Information:
Context (SEQ ID NO: 31):
GCTTTTTCAGAAGTTGATCTACCAAGCCTTGAGTTTCTAGATCTCAGTAGAAATGGCTTGAGTTTCAAAGGTTG
CTGTTCTCAAAGTGATTTTGGGACAA
CAGCCTAAAGTATTTAGATCTGAGCTTCAATGGTGTTATTACCATGAGTTCAAACTTCTTGGGCTTAGAACAAC
TAGAACATCTGGATTTCCAGCATTCC
Celera SNP ID: hCV11722237
SNP Position Transcript: 1549
SNP Source: Applera
Population(Allele,Count): caucasian(C,37|T,3) african american(C,36|T,2) total(C,73|T,5)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 17, 359,(T,ACC) (I,ATC)
SNP Source: HGMD; dbSNP; Nickerson
Population(Allele,Count): no_pop(C,-|T,-) ; no_pop(C,-|T,-) ; no_pop(C,436|T,26)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 17, 359,(T,ACC) (I,ATC)
Gene Number: 51
Celera Gene: hCG27399 - 146000220312482
Celera Transcript: hCT1961395 - 146000220312483
Public Transcript Accession: NM_138554
Celera Protein: hCP1774243 - 197000064928734
Public Protein Accession: NP_612564
Gene Symbol: TLR4
Protein Name: toll-like receptor 4
Celera Genomic Axis: GA_x5YUV32W1V9(4596116..4621277)
Chromosome: 9
OMIM NUMBER: 603030
OMIM Information: Endotoxin hyporesponsiveness (3)
Transcript SEQ ID NO: 4
Protein SEQ ID NO: 18
SNP Information:
Context (SEQ ID NO: 32):
GCTTTTTCAGAAGTTGATCTACCAAGCCTTGAGTTTCTAGATCTCAGTAGAAATGGCTTGAGTTTCAAAGGTTG
CTGTTCTCAAAGTGATTTTGGGACAA
CAGCCTAAAGTATTTAGATCTGAGCTTCAATGGTGTTATTACCATGAGTTCAAACTTCTTGGGCTTAGAACAAC
TAGAACATCTGGATTTCCAGCATTCC
Celera SNP ID: hCV11722237
SNP Position Transcript: 1262
SNP Source: Applera
Population(Allele,Count): caucasian(C,37|T,3) african american(C,36|T,2) total(C,73|T,5)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 18, 199,(T,ACC) (I,ATC)
SNP Source: HGMD; dbSNP; Nickerson
Population(Allele,Count): no_pop(C,-|T,-) ; no_pop(C,-|T,-) ; no_pop(C,436|T,26)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 18, 199,(T,ACC) (I,ATC)
Gene Number: 51
Celera Gene: hCG27399 - 146000220312482
Celera Transcript: hCT2316295 - 146000220312497
Public Transcript Accession: NM_138554
Celera Protein: hCP1796095 - 197000064928736
Public Protein Accession: NP_612564
Gene Symbol: TLR4
Protein Name: toll-like receptor 4
Celera Genomic Axis: GA_x5YUV32W1V9(4596116..4621277)
Chromosome: 9
OMIM NUMBER: 603030
OMIM Information: Endotoxin hyporesponsiveness (3)
Transcript SEQ ID NO: 5
Protein SEQ ID NO: 19
SNP Information:
Context (SEQ ID NO: 33):
GCTTTTTCAGAAGTTGATCTACCAAGCCTTGAGTTTCTAGATCTCAGTAGAAATGGCTTGAGTTTCAAAGGTTG
CTGTTCTCAAAGTGATTTTGGGACAA
CAGCCTAAAGTATTTAGATCTGAGCTTCAATGGTGTTATTACCATGAGTTCAAACTTCTTGGGCTTAGAACAAC
TAGAACATCTGGATTTCCAGCATTCC
Celera SNP ID: hCV11722237
SNP Position Transcript: 1382
SNP Source: Applera
Population(Allele,Count): caucasian(C,37|T,3) african american(C,36|T,2) total(C,73|T,5)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 19, 342,(T,ACC) (I,ATC)
SNP Source: HGMD; dbSNP; Nickerson
Population(Allele,Count): no_pop(C,-|T,-) ; no_pop(C,-|T,-) ; no_pop(C,436|T,26)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 19, 342,(T,ACC) (I,ATC)
Gene Number: 53
Celera Gene: hCG27643 - 62000133384069
Celera Transcript: hCT18784 - 62000133384087
Public Transcript Accession: NM_004396
Celera Protein: hCP43680 - 197000064924833
Public Protein Accession: NP_004387
Gene Symbol: DDX5
Protein Name: DEAD (Asp-Glu-Ala-Asp) box polypeptide 5
Celera Genomic Axis: GA_x5YUV32W3KM(353933..374581)
Chromosome: 17
OMIM NUMBER: 180630
OMIM Information:
Transcript SEQ ID NO: 6
Protein SEQ ID NO: 20
SNP Information:
Context (SEQ ID NO: 34):
ACCTAATAACATAAAGCAAGTGAGCGACCTTATCTCTGTGCTTCGTGAAGCTAATCAAGCAATTAATCCCAAGT
TGCTTCAGTTGGTCGAAGACAGAGGT
CAGGTCGTTCCAGGGGTAGAGGAGGCATGAAGGATGACCGTCGGGACAGATACTCTGCGGGCAAAAGGGGTGGA
TTTAATACCTTTAGAGACAGGGAAAA
Celera SNP ID: hCV7450990
SNP Position Transcript: 1729
SNP Source: dbSNP; Nickerson; HapMap
Population(Allele,Count): no_pop(T,172|G,12) ; no_pop(T,-|G,-) ; no_pop(T,-|G,-)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 20, 480,(S,TCA) (A,GCA)
Gene Number: 53
Celera Gene: hCG27643 - 62000133384069
Celera Transcript: hCT1971014 - 62000133384070
Public Transcript Accession: NM_004396
Celera Protein: hCP1783369 - 197000064924832
Public Protein Accession: NP_004387
Gene Symbol: DDX5
Protein Name: DEAD (Asp-Glu-Ala-Asp) box polypeptide 5
Celera Genomic Axis: GA_x5YUV32W3KM(353933..374581)
Chromosome: 17
OMIM NUMBER: 180630
OMIM Information:
Transcript SEQ ID NO: 7
Protein SEQ ID NO: 21
SNP Information:
Context (SEQ ID NO: 35):
ACCTAATAACATAAAGCAAGTGAGCGACCTTATCTCTGTGCTTCGTGAAGCTAATCAAGCAATTAATCCCAAGT
TGCTTCAGTTGGTCGAAGACAGAGGT
CAGGTCGTTCCAGGGGTAGAGGAGGCATGAAGGATGACCGTCGGGACAGATACTCTGCGGGCAAAAGGGGTGGA
TTTAATACCTTTAGAGACAGGGAAAA
Celera SNP ID: hCV7450990
SNP Position Transcript: 1690
SNP Source: dbSNP; Nickerson; HapMap
Population(Allele,Count): no_pop(T,172|G,12) ; no_pop(T,-|G,-) ; no_pop(T,-|G,-)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 21, 480,(S,TCA) (A,GCA)
Gene Number: 67
Celera Gene: hCG37774 - 104000117648431
Celera Transcript: hCT29008 - 104000117648432
Public Transcript Accession: NM_000253
Celera Protein: hCP48619 - 197000069479734
Public Protein Accession: NP_000244
Gene Symbol: MTP
Protein Name: microsomal triglyceride transfer protein
(large polypeptide, 88kDa)
Celera Genomic Axis: GA_x5YUV32W7K2(47612373..47673550)
Chromosome: 4
OMIM NUMBER: 157147
OMIM Information: Abetalipoproteinemia, 200100 (3)
Transcript SEQ ID NO: 8
Protein SEQ ID NO: 22
SNP Information:
Context (SEQ ID NO: 36):
CAGAGAGGAGAGAAGAGCATCTTCAAAGGAAAAAGCCCATCTAAAATAATGGGAAAGGAAAACTTGGAAGCTCT
GCAAAGACCTACGCTCCTTCATCTAA
CCATGGAAAGGTCAAAGAGTTCTACTCATATCAAAATGAGGCAGTGGCCATAGAAAATATCAAGAGAGGCCTGG
CTAGCCTATTTCAGACACAGTTAAGC
Celera SNP ID: hCV22274307
SNP Position Transcript: 469
SNP Source: Applera
Population(Allele,Count): caucasian(C,10|T,26) african american(C,16|T,22) total(C,26|T,48)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 22, 128,(I,ATC) (T,ACC)
SNP Source: dbSNP; Nickerson
Population(Allele,Count): no_pop(T,2502|C,490) ; no_pop(T,-|C,-)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 22, 128,(I,ATC) (T,ACC)
Gene Number: 98
Celera Gene: hCG2039431 - 208000027149733
Celera Transcript: hCT2258512 - 208000027149752
Public Transcript Accession: NM_000120
Celera Protein: hCP1806343 - 208000027149651
Public Protein Accession: NP_000111
Gene Symbol: EPHX1
Protein Name: epoxide hydrolase 1, microsomal (xenobiotic)
Celera Genomic Axis: GA_x5YUV32VWMC(1764047..1812133)
Chromosome: 1
OMIM NUMBER: 132810
OMIM Information: ?Fetal hydantoin syndrome (1); Diphenylhydantoin toxicity (1); Hypercholanemia, familial, 607748 (3)
Transcript SEQ ID NO: 9
Protein SEQ ID NO: 23
SNP Information:
Context (SEQ ID NO: 37):
CAGGTGGAGATTCTCAACAGATACCCTCACTTCAAGACTAAGATTGAAGGGCTGGACATCCACTTCATCCACGT
GAAGCCCCCCCAGCTGCCCGCAGGCC
TACCCCGAAGCCCTTGCTGATGGTGCACGGCTGGCCCGGCTCTTTCTACGAGTTTTATAAGATCATCCCACTCC
TGACTGACCCCAAGAACCATGGCCTG
Celera SNP ID: hCV11638783
SNP Position Transcript: 917
SNP Source: HGMD; dbSNP; Nickerson; HapMap
Population(Allele,Count): no_pop(A,-|G,-) ; no_pop(A,2488|G,504) ; no_pop(A,632|G,1904) ; no_pop(A,2176|G,544)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 23, 139,(H,CAT) (R,CGT)
Gene Number: 98
Celera Gene: hCG2039431 - 208000027149733
Celera Transcript: hCT2258514 - 208000027149726
Public Transcript Accession: NM_000120
Celera Protein: hCP1806345 - 208000027149688
Public Protein Accession: NP_000111
Gene Symbol: EPHX1
Protein Name: epoxide hydrolase 1, microsomal (xenobiotic)
Celera Genomic Axis: GA_x5YUV32VWMC(1764047..1812133)
Chromosome: 1
OMIM NUMBER: 132810
OMIM Information: ?Fetal hydantoin syndrome (1); Diphenylhydantoin toxicity (1); Hypercholanemia, familial, 607748 (3)
Transcript SEQ ID NO: 10
Protein SEQ ID NO: 24
SNP Information:
Context (SEQ ID NO: 38):
CAGGTGGAGATTCTCAACAGATACCCTCACTTCAAGACTAAGATTGAAGGGCTGGACATCCACTTCATCCACGT
GAAGCCCCCCCAGCTGCCCGCAGGCC
TACCCCGAAGCCCTTGCTGATGGTGCACGGCTGGCCCGGCTCTTTCTACGAGTTTTATAAGATCATCCCACTCC
TGACTGACCCCAAGAACCATGGCCTG
Celera SNP ID: hCV11638783
SNP Position Transcript: 438
SNP Source: HGMD; dbSNP; Nickerson; HapMap
Population(Allele,Count): no_pop(A,-|G,-) ; no_pop(A,2488|G,504) ; no_pop(A,632|G,1904) ; no_pop(A,2176|G,544)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 24, 139,(H,CAT) (R,CGT)
Gene Number: 98
Celera Gene: hCG2039431 - 208000027149733
Celera Transcript: hCT2258516 - 208000027149772
Public Transcript Accession: NM_000120
Celera Protein: hCP1806342 - 208000027149690
Public Protein Accession: NP_000111
Gene Symbol: EPHX1
Protein Name: epoxide hydrolase 1, microsomal
(xenobiotic)
Celera Genomic Axis: GA_x5YUV32VWMC(1764047..1812133)
Chromosome: 1
OMIM NUMBER: 132810
OMIM Information: ?Fetal hydantoin syndrome (1); Diphenylhydantoin toxicity (1); Hypercholanemia, familial, 607748 (3)
Transcript SEQ ID NO: 11
Protein SEQ ID NO: 25
SNP Information:
Context (SEQ ID NO: 39):
CAGGTGGAGATTCTCAACAGATACCCTCACTTCAAGACTAAGATTGAAGGGCTGGACATCCACTTCATCCACGT
GAAGCCCCCCCAGCTGCCCGCAGGCC
TACCCCGAAGCCCTTGCTGATGGTGCACGGCTGGCCCGGCTCTTTCTACGAGTTTTATAAGATCATCCCACTCC
TGACTGACCCCAAGAACCATGGCCTG
Celera SNP ID: hCV11638783
SNP Position Transcript: 691
SNP Source: HGMD; dbSNP; Nickerson; HapMap
Population(Allele,Count): no_pop(A,-|G,-) ; no_pop(A,2488|G,504) ; no_pop(A,632|G,1904) ; no_pop(A,2176|G,544)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 25, 139,(H,CAT) (R,CGT)
Gene Number: 98
Celera Gene: hCG2039431 - 208000027149733
Celera Transcript: hCT2344145 - 208000027149694
Public Transcript Accession: NM_000120
Celera Protein: hCP1909440 - 208000027149693
Public Protein Accession: NP_000111
Gene Symbol: EPHX1
Protein Name: epoxide hydrolase 1, microsomal (xenobiotic)
Celera Genomic Axis: GA_x5YUV32VWMC(1764047..1812133)
Chromosome: 1
OMIM NUMBER: 132810
OMIM Information: ?Fetal hydantoin syndrome (1); Diphenylhydantoin toxicity (1); Hypercholanemia, familial, 607748 (3)
Transcript SEQ ID NO: 12
Protein SEQ ID NO: 26
SNP Information:
Context (SEQ ID NO: 40):
CAGGTGGAGATTCTCAACAGATACCCTCACTTCAAGACTAAGATTGAAGGGCTGGACATCCACTTCATCCACGT
GAAGCCCCCCCAGCTGCCCGCAGGCC
TACCCCGAAGCCCTTGCTGATGGTGCACGGCTGGCCCGGCTCTTTCTACGAGTTTTATAAGATCATCCCACTCC
TGACTGACCCCAAGAACCATGGCCTG
Celera SNP ID: hCV11638783
SNP Position Transcript: 659
SNP Source: HGMD; dbSNP; Nickerson; HapMap
Population(Allele,Count): no_pop(A,-|G,-) ; no_pop(A,2488|G,504) ; no_pop(A,632|G,1904) ; no_pop(A,2176|G,544)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 26, 139,(H,CAT) (R,CGT)
Gene Number: 99
Celera Gene: hCG2039450 - 208000027162714
Celera Transcript: hCT2309553 - 208000027162700
Public Transcript Accession: NM_001876
Celera Protein: hCP1901811 - 208000027162705
Public Protein Accession: NP_001867
Gene Symbol: CPT1A
Protein Name: carnitine palmitoyltransferase 1A (liver)
Celera Genomic Axis: GA_x5YUV32VYAU(14202585..14299809)
Chromosome: 11
OMIM NUMBER: 600528
OMIM Information: CPT deficiency, hepatic, type IA, 255120 (3)
Transcript SEQ ID NO: 13
Protein SEQ ID NO: 27
SNP Information:
Context (SEQ ID NO: 41):
CCTCCGAGGACGAGGGCCGCTCATGGTGAACAGCAACTATTATGCCATGGATCTGCTGTATATCCTTCCAACTC
ACATTCAGGCAGCAAGAGCCGGCAAC
CCATCCATGCCATCCTGCTTTACAGGCGCAAACTGGACCGGGAGGAAATCAAACCAATTCGTCTTTTGGGATCC
ACGATTCCACTCTGCTCCGCTCAGTG
Celera SNP ID: hCV15851335
SNP Position Transcript: 920
SNP Source: Applera
Population(Allele,Count): caucasian(G,39|A,1) african american(G,36|A,0) total(G,75|A,1)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 27, 275,(A,GCC) (T,ACC)
SNP Source: HGMD
Population(Allele,Count): no_pop(G,-|A,-)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 27, 275,(A,GCC) (T,ACC)
Gene Number: 99
Celera Gene: hCG2039450 - 208000027162714
Celera Transcript: hCT2309554 - 208000027162777
Public Transcript Accession: NM_001876
Celera Protein: hCP1901807 - 208000027162702
Public Protein Accession: NP_001867
Gene Symbol: CPT1A
Protein Name: carnitine palmitoyltransferase 1A (liver)
Celera Genomic Axis: GA_x5YUV32VYAU(14202585..14299809)
Chromosome: 11
OMIM NUMBER: 600528
OMIM Information: CPT deficiency, hepatic, type IA, 255120 (3)
Transcript SEQ ID NO: 14
Protein SEQ ID NO: 28
SNP Information:
Context (SEQ ID NO: 42):
CCTCCGAGGACGAGGGCCGCTCATGGTGAACAGCAACTATTATGCCATGGATCTGCTGTATATCCTTCCAACTC
ACATTCAGGCAGCAAGAGCCGGCAAC
CCATCCATGCCATCCTGCTTTACAGGCGCAAACTGGACCGGGAGGAAATCAAACCAATTCGTCTTTTGGGATCC
ACCATTCCACTCTGCTCCGCTCAGTG
Celera SNP ID: hCV15851335
SNP Position Transcript: 977
SNP Source: Applera
Population(Allele,Count): caucasian(G,39|A,1) african american(G,36|A,0) total(G,75|A,1)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 28, 275,(A,GCC) (T,ACC)
SNP Source: HGMD
Population(Allele,Count): no_pop(G,-|A,-)
SNP Type: Missense Mutation
Protein Coding: SEQ ID NO: 28, 275,(A,GCC) (T,ACC)
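
The Population(Allele,Count) fields in the records above can be read programmatically. The following is a minimal sketch, assuming each group is written as name(allele,count|allele,count) with a vertical bar between the two alleles and "-" where no count was reported; that layout is inferred from the listing itself, and the helper names are hypothetical.

```python
import re

# Hypothetical pattern: "group(allele,count|allele,count)", inferred from the
# records in this listing rather than from any published specification.
GROUP = re.compile(r"([A-Za-z_ ]+?)\(([ACGT]),(\d+|-)\|([ACGT]),(\d+|-)\)")


def to_count(text):
    """Convert a count string to int; '-' means the count was not reported."""
    return None if text == "-" else int(text)


def parse_population(field):
    """Return a list of (group_name, {allele: count}) in order of appearance.

    A list is used because group names such as 'no_pop' can repeat.
    """
    groups = []
    for name, a1, c1, a2, c2 in GROUP.findall(field):
        groups.append((name.strip(), {a1: to_count(c1), a2: to_count(c2)}))
    return groups


def minor_allele_frequency(counts):
    """Frequency of the rarer allele, or None if any count is missing."""
    values = list(counts.values())
    if None in values or sum(values) == 0:
        return None
    return min(values) / sum(values)


field = "caucasian(C,37|T,3) african american(C,36|T,2) total(C,73|T,5)"
parsed = parse_population(field)
maf_total = minor_allele_frequency(dict(parsed)["total"])
```

For the TLR4 marker hCV11722237 this gives a T-allele frequency of 5/78 in the combined Applera panel. Groups reported as "-" yield None rather than a frequency.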
Table 2
Gene Number: 25
Celera Gene: hCG1810767 - 64000126973272
Gene Symbol: C22orf20
Protein Name: chromosome 22 open reading frame 20
Celera Genomic Axis: GA_x5YUV32VY8D(1403768..1440680)
Chromosome: 22
OMIM NUMBER:
OMIM Information:
Genomic SEQ ID NO: 43
SNP Information:
Context (SEQ ID NO: 51):
TGGAGAAAGCTTATGAAGGATCAGGAAAATTAAAAGGGTGCTCTCGCCTATAACTTCTCTCTCCT=CTTTCA
CAGGCCTTGGTATGTTCCTGCTTCAT
CCCTTCTACAGTGGCCTTATCCCTCCTTCCTTCAGAGGCGTGGTAAGTCGCOTTTCTCTGCTAGCGCTGAGTCC
TGGGGGCCTCTGAAGTGTGCTCACAO
Celera SNP ID: hCV7241
SNP Position Genomic: -1403769
SNP Source: dbSNP
Population(Allele,Count): no_pop(C,71|G,21)
SNP Type: MISSENSE MUTATION;HUMAN-MOUSE SYNTENIC REGION
Gene Number: 50
Celera Gene: hCG27192 - 104000117137572
Gene Symbol: LRP5
Protein Name: low density lipoprotein receptor-related
protein 5
Celera Genomic Axis: GA_x5YUV32VYAU(13757051..13906097)
Chromosome: 11
OMIM NUMBER: 603506
OMIM Information: Osteoporosis-pseudoglioma syndrome, 259770 (3); [Bone mineral density variability 1], 601884 (3); Osteopetrosis, autosomal dominant, type I, 607634 (3); Hyperostosis, endosteal, 144750 (3); van Buchem disease, type 2, 607636 (3); {Osteoporosis}, 166710 (3); Exudative vitreoretinopathy, dominant, 133783 (3); Exudative vitreoretinopathy, recessive, 601813 (3)
Genomic SEQ ID NO: 44
SNP Information:
Context (SEQ ID NO: 52):
GAGGCTTGCAAAGGTTAAGGGGCTGTTCGAGGCCCAGGCTGGCAGGAGATGGGCCTGGGCCAGAGTCTGGGACT
TCCCATGCCTGGGCTGTCTTTGGTCC
GTTGCTCACCATCCCTCCCTGGGGCCATGACCTTAGAGAGCCAAATGGAGGTGCAGGTAACCCACGGCAAGGAG
GGGTTGCCATGACTCAGAGTCCCCGT
Celera SNP ID: hCV8761599
SNP Position Genomic: -13757052
SNP Source: dbSNP; Celera; Nickerson; ABI_Val
Population(Allele,Count): no_pop(T,516|C,92) ; no_pop(T,144|C,36) ; no_pop(T,402|C,99) ; no_pop(T,104|C,16)
SNP Type: INTRON
Gene Number: 51
Celera Gene: hCG27399 - 146000220312482
Gene Symbol: TLR4
Protein Name: toll-like receptor 4
Celera Genomic Axis: GA_x5YUV32W1V9(4596116..4621277)
Chromosome: 9
OMIM NUMBER: 603030
OMIM Information: Endotoxin hyporesponsiveness (3)
Genomic SEQ ID NO: 45
SNP Information:
Context (SEQ ID NO: 53):
GCTTTTTCAGAAGTTGATCTACCAAGCCTTGAGTTTCTAGATCTCAGTAGAAATGGCTTGAGTTTCAAAGGTTG
CTGTTCTCAAAGTGATTTTGGGACAA
CAGCCTAAAGTATTTAGATCTGAGCTTCAATGGTGTTATTACCATGAGTTCAAACTTCTTGGGCTTAGAACAAC
TAGAACATCTGGATTTCCAGCATTCC
Celera SNP ID: hCV11722237
SNP Position Genomic: -4596117
SNP Source: Applera
Population(Allele,Count): caucasian(C,37|T,3) african american(C,36|T,2) total(C,73|T,5)
SNP Type: MISSENSE MUTATION;HUMAN-MOUSE SYNTENIC REGION
SNP Source: HGMD; dbSNP; Nickerson
Population(Allele,Count): no_pop(C,-|T,-) ; no_pop(C,-|T,-) ; no_pop(C,436|T,26)
SNP Type: MISSENSE MUTATION;HUMAN-MOUSE SYNTENIC REGION
Gene Number: 53
Celera Gene: hCG27643 - 62000133384069
Gene Symbol: DDX5
Protein Name: DEAD (Asp-Glu-Ala-Asp) box polypeptide 5
Celera Genomic Axis: GA_x5YUV32W3KM(353933..374581)
Chromosome: 17
OMIM NUMBER: 180630
OMIM Information:
Genomic SEQ ID NO: 46
SNP Information:
Context (SEQ ID NO: 54):
CATTCAAGGTTTTACTCACCCTCCAATACCATTTAAATOGATTTGTAGACAACGATGTGACTCGTAACTACCAA
CATTTCCTATCAGTCATCCTTACCTG
ACCTCTGTCTTCGACCAACTGAAGCAACTTGGGATTAATTGCTTGATTAGCTTCACGAAGCACAGAGATAAGGT
CGCTCACTTGCTTTATGTTATTAGGT
Celera SNP ID: hCV7450990
SNP Position Genomic: 374583
SNP Source: dbSNP; Nickerson; HapMap
Population(Allele,Count): no_pop(A,172|C,12) ; no_pop(A,-|C,-) ; no_pop(A,-|C,-)
SNP Type: MISSENSE MUTATION;TRANSCRIPTION FACTOR BINDING SITE;HUMAN-MOUSE SYNTENIC REGION
Gene Number: 67
Celera Gene: hCG37774 - 104000117648431
Gene Symbol: MTP
Protein Name: microsomal triglyceride transfer protein
(large polypeptide, 88kDa)
Celera Genomic Axis: GA_x5YUV32W7K2(47612373..47673550)
Chromosome: 4
OMIM NUMBER: 157147
OMIM Information: Abetalipoproteinemia, 200100 (3)
Genomic SEQ ID NO: 47
SNP Information:
Context (SEQ ID NO: 55):
CAGAGAGGAGAGAAGAGCATCTTCAAAGGAAAAAGCCCATCTAAAATAATGGGAAAGGAAAACTTGGAAGCTCT
GCAAAGACCTACGCTCCTTCATCTAA
CCATGGAAAGGTAAAGGGGCGTTTAGATTCCACAACTTTTTCTCCAACTTCATATTTTTCTTCCCTTCAGTAGA
TATTATTTTGAGGTAATCACATTGTA
Celera SNP ID: hCV22274307
SNP Position Genomic: -47612374
SNP Source: Applera
Population(Allele,Count): caucasian(C,10|T,26) african american(C,16|T,22) total(C,26|T,48)
SNP Type: MISSENSE MUTATION;HUMAN-MOUSE SYNTENIC REGION
SNP Source: dbSNP; Nickerson
Population(Allele,Count): no_pop(T,2502|C,490) ; no_pop(T,-|C,-)
SNP Type: MISSENSE MUTATION;HUMAN-MOUSE SYNTENIC REGION
Gene Number: 98
Celera Gene: hCG2039431 - 208000027149733
Gene Symbol: EPHX1
Protein Name: epoxide hydrolase 1, microsomal
(xenobiotic)
Celera Genomic Axis: GA_x5YUV32VWMC(1764047..1812133)
Chromosome: 1
OMIM NUMBER: 132810
OMIM Information: ?Fetal hydantoin syndrome (1); Diphenylhydantoin toxicity (1); Hypercholanemia, familial, 607748 (3)
Genomic SEQ ID NO: 48
SNP Information:
Context (SEQ ID NO: 56):
TGCAGGGTCTTCTCTCTCCCTCCACCCTGACTGTGCTCTGTCCCCCCAGGGCTGGACATCCACTTCATCCACGT
GAAGCCCCCCCAGCTGCCCGCAGGCC
TACCCCGAAGCCCTTGCTGATGGTGCACGGCTGGCCCGGCTCTTTCTACGAGTTTTATAAGATCATCCCACTCC
TGACTGACCCCAAGAACCATGGCCTG
Celera SNP ID: hCV11638783
SNP Position Genomic: -1764048
SNP Source: HGMD; dbSNP; Nickerson; HapMap
Population(Allele,Count): no_pop(A,-|G,-) ; no_pop(A,2488|G,504) ; no_pop(A,632|G,1904) ; no_pop(A,2176|G,544)
SNP Type: MISSENSE MUTATION
Gene Number: 99
Celera Gene: hCG2039450 - 208000027162714
Gene Symbol: CPT1A
Protein Name: carnitine palmitoyltransferase 1A (liver)
Celera Genomic Axis: GA_x5YUV32VYAU(14202585..14299809)
Chromosome: 11
OMIM NUMBER: 600528
OMIM Information: CPT deficiency, hepatic, type IA, 255120 (3)
Genomic SEQ ID NO: 49
SNP Information:
Context (SEQ ID NO: 57):
AAGCTGTTTGAAAATAATTTTTTTAAAGCAATTTGTTTGCCTACTGGTTTGATTTCCTCCCGGTCCAGTTTGCG
CCTGTAAAGCAGGATGGCATGGATGG
GTTGCCGGCTCTTGCTGCCTGAATGTGAGTTGGAAGGATATACAGCAGATCCTGAAAAGCGACAAAGGTGGAGA
GAATTTGCATAGGGAARGATAAGGAA
Celera SNP ID: hCV15851335
SNP Position Genomic: -14202586
SNP Source: Applera
Population(Allele,Count): caucasian(C,39|T,1) african american(C,36|T,0) total(C,75|T,1)
SNP Type: MISSENSE MUTATION;TRANSCRIPTION FACTOR BINDING SITE;HUMAN-MOUSE SYNTENIC REGION
SNP Source: HGMD
Population(Allele,Count): no_pop(C,-|T,-)
SNP Type: MISSENSE MUTATION;TRANSCRIPTION FACTOR BINDING SITE;HUMAN-MOUSE SYNTENIC REGION
Gene Number: 112
Celera Gene: hCG1642252 -
Gene Symbol:
Protein Name:
Celera Genomic Axis: GA_x5YUV32W3V0(13011322..13024095)
Chromosome: 16
OMIM NUMBER:
OMIM Information:
Genomic SEQ ID NO: 50
SNP Information:
Context (SEQ ID NO: 58):
GTTGGATTCCCTTCATCCCCATAGTCACCTTCCTCACTTTGCACAGGTTGTACTTTGCCTCTTCGACTGTAACG
CGATCAACAGCAACACAGCCCTTGGG
CACAGACCAGGCAGAAATGCTCACCTGTCTTGATGCCGATGACATCCGTGAATCCAGCAGGGTATSTGATAGCC
ACTCGAACCTTGCCATCAATTTTAAT
Celera SNP ID: hCV11935588
SNP Position Genomic: 13024097
SNP Source: dbSNP; Celera; Nickerson; HapMap
Population(Allele,Count): no_pop(T,-|A,-) ; no_pop(T,7|A,1) ; no_pop(T,-|A,-) ; no_pop(T,-|A,-)
SNP Type: UTR5
TABLE 3
Marker       Alleles  Sequence A (allele-specific primer)      Sequence B (allele-specific primer)      Sequence C (common primer)
hCV11638783  A/G      AAGGGCTTCGGGGTAC (SEQ ID NO: 59)         AAGGGCTTCGGGGTAT (SEQ ID NO: 60)         ACCGTGCAGGGTCTTCT (SEQ ID NO: 61)
hCV11722237  C/T      CAAAGTGATTTTGGGACAAC (SEQ ID NO: 62)     CAAAGTGATTTTGGGACAAT (SEQ ID NO: 63)     GAATACTGAAAACTCACTCATTTGT (SEQ ID NO: 64)
hCV11935588  A/T      ATTTCTGCCTGGTCTGTGT (SEQ ID NO: 65)      TTTCTGCCTGGTCTGTGA (SEQ ID NO: 66)       CTCACTTTGCACAGGTTGTACT (SEQ ID NO: 67)
hCV15851335  C/T      GATGGCATGGATGGC (SEQ ID NO: 68)          GGATGGCATGGATGGT (SEQ ID NO: 69)         GCAGGATGTGCTGTGATTAT (SEQ ID NO: 70)
hCV22274307  C/T      GACCTACGCTCCTTCATCTAAC (SEQ ID NO: 71)   GACCTACGCTCCTTCATCTAAT (SEQ ID NO: 72)   CAATGTGATTACCTCAAAATAATATCTAC (SEQ ID NO: 73)
hCV7241      C/G      TGGTATGTTCCTGCTTCATC (SEQ ID NO: 74)     TGGTATGTTCCTGCTTCATG (SEQ ID NO: 75)     CAGGCAGGAGATGTGTGAG (SEQ ID NO: 76)
hCV7450990   A/C      TGGTCGAAGACAGAGGTG (SEQ ID NO: 77)       TTGGTCGAAGACAGAGGTT (SEQ ID NO: 78)      GTTTTCACATTCAAGGTTTTACTC (SEQ ID NO: 79)
hCV8761599   C/T      GGGATGGTGAGCAACG (SEQ ID NO: 80)         AGGGATGGTGAGCAACA (SEQ ID NO: 81)        GAACTTGAGGCTTGCAAAGGTTAAG (SEQ ID NO: 82)
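
In Table 3, each pair of allele-specific primers appears to share one sequence and to differ only at the final (3') base, which sits on the polymorphic position. The sketch below checks that property for three primer pairs copied from the table and confirms that one primer matches the end of the 5' flanking context given for hCV11722237. The function name and the dictionary layout are hypothetical, and aligning the primers from their 3' ends is an assumption made because some pairs differ in length by one 5' base.

```python
def differ_only_at_3prime(primer_a, primer_b):
    """True if the primers agree everywhere except their last (3') base.

    Primers are aligned from the 3' end, so an extra 5' base on one
    primer is ignored.
    """
    n = min(len(primer_a), len(primer_b))
    a, b = primer_a[-n:], primer_b[-n:]
    return a[:-1] == b[:-1] and a[-1] != b[-1]


# Allele-specific primer pairs (Sequence A, Sequence B) from Table 3.
PRIMERS = {
    "hCV11722237": ("CAAAGTGATTTTGGGACAAC", "CAAAGTGATTTTGGGACAAT"),
    "hCV22274307": ("GACCTACGCTCCTTCATCTAAC", "GACCTACGCTCCTTCATCTAAT"),
    "hCV7241": ("TGGTATGTTCCTGCTTCATC", "TGGTATGTTCCTGCTTCATG"),
}

checks = {marker: differ_only_at_3prime(a, b)
          for marker, (a, b) in PRIMERS.items()}

# Last 26 bases of the 5' flanking context listed for hCV11722237 (TLR4).
flank_5prime_tail = "CTGTTCTCAAAGTGATTTTGGGACAA"

# The primer minus its allele-specific 3' base should end the 5' flank,
# which places the final base on the SNP itself.
primer_core = PRIMERS["hCV11722237"][0][:-1]
anneals_next_to_snp = flank_5prime_tail.endswith(primer_core)
```

Primers designed on the opposite strand (for example those for hCV11638783) would need to be reverse-complemented before the flank comparison; that step is omitted here.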
TABLE 4
Markers replicated between "Stanford Samples" and "UCSF Samples": Stage analysis
                            "Stanford Samples" (stage)         "UCSF Samples" (stage)
Marker        Gene symbol   OR     LCL    UCL    p-val         OR     LCL    UCL    p-val
11638783 ord  EPHX1         0.408  0.195  0.855  0.0175        0.583  0.419  0.811  0.0014
11638783 dom  EPHX1         0.291  0.118  0.715  0.0071        0.536  0.402  0.855  0.0055
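
The odds ratios (OR), confidence limits (LCL, UCL) and p-values in Table 4 can be cross-checked against one another. The sketch below assumes 95% Wald-type intervals that are symmetric on the log-odds scale; the source does not state how the intervals were computed, so this is an illustrative consistency check and not the authors' method. The function name is hypothetical.

```python
import math


def wald_summary(lcl, ucl, odds_ratio, z_crit=1.959964):
    """Back-calculate summary statistics from an OR and its 95% limits.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    Returns (standard error, z statistic, two-sided p, geometric midpoint).
    """
    se = (math.log(ucl) - math.log(lcl)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    midpoint = math.sqrt(lcl * ucl)       # equals the OR for a Wald interval
    return se, z, p, midpoint


# "ord" row of Table 4.
stanford_ord = wald_summary(0.195, 0.855, 0.408)
ucsf_ord = wald_summary(0.419, 0.811, 0.583)

# "dom" row, "UCSF Samples" columns.
ucsf_dom = wald_summary(0.402, 0.855, 0.536)
```

Under this assumption the "ord" rows reproduce the printed values closely (p of roughly 0.017 and 0.0014). For the "dom" row of the "UCSF Samples" columns, the geometric midpoint of the printed limits is about 0.586, not the printed OR of 0.536. One of those printed values may have been mis-transcribed, or the interval may have been computed differently.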
JUMBO APPLICATIONS / PATENTS
THIS SECTION OF THE APPLICATION / PATENT CONTAINS MORE
THAN ONE VOLUME.
THIS IS VOLUME 1 OF 2
NOTE: For additional volumes please contact the Canadian Patent Office.

Administrative Status


Title Date
Forecasted Issue Date 2017-06-20
(22) Filed 2005-05-09
(41) Open to Public Inspection 2005-11-24
Examination Requested 2015-05-15
(45) Issued 2017-06-20

Abandonment History

Abandonment Date Reason Reinstatement Date
2015-05-11 FAILURE TO PAY APPLICATION MAINTENANCE FEE 2015-05-12

Maintenance Fee

Last Payment of $473.65 was received on 2023-05-05


 Upcoming maintenance fee amounts

Description Date Amount
Next Payment if small entity fee 2024-05-09 $253.00
Next Payment if standard fee 2024-05-09 $624.00

Note : If the full payment has not been received on or before the date indicated, a further fee may be required which may be one of the following

  • the reinstatement fee;
  • the late payment fee; or
  • additional fee to reverse deemed expiry.

Patent fees are adjusted on the 1st of January every year. The amounts above are the current amounts if received by December 31 of the current year.
Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Payment History

Fee Type Anniversary Year Due Date Amount Paid Paid Date
Application Fee $400.00 2014-11-27
Maintenance Fee - Application - New Act 2 2007-05-09 $100.00 2014-11-27
Maintenance Fee - Application - New Act 3 2008-05-09 $100.00 2014-11-27
Maintenance Fee - Application - New Act 4 2009-05-11 $100.00 2014-11-27
Maintenance Fee - Application - New Act 5 2010-05-10 $200.00 2014-11-27
Maintenance Fee - Application - New Act 6 2011-05-09 $200.00 2014-11-27
Maintenance Fee - Application - New Act 7 2012-05-09 $200.00 2014-11-27
Maintenance Fee - Application - New Act 8 2013-05-09 $200.00 2014-11-27
Maintenance Fee - Application - New Act 9 2014-05-09 $200.00 2014-11-27
Reinstatement: Failure to Pay Application Maintenance Fees $200.00 2015-05-12
Maintenance Fee - Application - New Act 10 2015-05-11 $250.00 2015-05-12
Request for Examination $800.00 2015-05-15
Maintenance Fee - Application - New Act 11 2016-05-09 $250.00 2016-04-19
Maintenance Fee - Application - New Act 12 2017-05-09 $250.00 2017-04-19
Final Fee $1,632.00 2017-05-05
Maintenance Fee - Patent - New Act 13 2018-05-09 $250.00 2018-05-07
Maintenance Fee - Patent - New Act 14 2019-05-09 $250.00 2019-05-03
Maintenance Fee - Patent - New Act 15 2020-05-11 $450.00 2020-05-01
Maintenance Fee - Patent - New Act 16 2021-05-10 $459.00 2021-04-30
Maintenance Fee - Patent - New Act 17 2022-05-09 $458.08 2022-04-29
Maintenance Fee - Patent - New Act 18 2023-05-09 $473.65 2023-05-05
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
CELERA CORPORATION
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

Document Description   Date (yyyy-mm-dd)   Number of pages   Size of Image (KB)
Description 2014-11-27 600 27,414
Description 2014-11-27 600 24,835
Description 2014-11-27 600 25,438
Description 2014-11-27 600 23,518
Description 2014-11-27 600 27,274
Description 2014-11-27 600 23,569
Description 2014-11-27 600 20,986
Description 2014-11-27 600 20,794
Description 2014-11-27 600 21,205
Description 2014-11-27 600 21,176
Description 2014-11-27 600 21,172
Description 2014-11-27 600 21,250
Description 2014-11-27 600 21,203
Description 2014-11-27 600 21,577
Description 2014-11-27 600 21,910
Description 2014-11-27 600 21,443
Description 2014-11-27 281 11,026
Description 2014-11-27 600 35,534
Description 2014-11-27 600 21,046
Description 2014-11-27 600 37,798
Description 2014-11-27 500 46,380
Description 2014-11-27 500 46,188
Description 2014-11-27 500 19,698
Description 2014-11-27 600 20,853
Description 2014-11-27 600 20,702
Description 2014-11-27 600 21,022
Description 2014-11-27 600 20,958
Description 2014-11-27 600 21,085
Description 2014-11-27 600 21,147
Description 2014-11-27 600 20,937
Description 2014-11-27 400 13,950
Description 2014-11-27 435 14,598
Abstract 2014-11-27 1 13
Claims 2014-11-27 5 181
Drawings 2014-11-27 1 12
Representative Drawing 2015-05-15 1 7
Cover Page 2015-05-15 1 40
Claims 2015-08-28 3 72
Claims 2016-10-14 3 76
Final Fee 2017-05-05 2 67
Cover Page 2017-05-23 1 49
Representative Drawing 2017-06-20 1 15
Correspondence 2015-05-28 2 59
Assignment 2014-11-27 15 300
Correspondence 2015-05-01 1 156
Fees 2015-05-12 3 106
Prosecution-Amendment 2015-05-15 2 81
Amendment 2015-12-08 3 142
Correspondence 2016-01-27 2 22
Office Letter 2016-08-05 1 23
Prosecution-Amendment 2015-08-28 294 19,698
Examiner Requisition 2016-09-21 3 203
Amendment 2016-10-14 13 618
Description 2015-08-28 145 6,876
Description 2015-08-28 177 13,200
Description 2015-12-08 145 6,882
Description 2015-12-08 177 13,200
Description 2016-10-14 145 6,892
Description 2016-10-14 177 13,200

Biological Sequence Listings

Please note that files with extensions .pep and .seq that were created by CIPO as working files might be incomplete and are not to be considered official communication.

No BSL files available.