Patent 2651934 Summary

(12) Patent Application:	(11) CA 2651934
(54) English Title:	MASS SPECTROMETRY BIOMARKER ASSAY
(54) French Title:	DOSAGE DE MARQUEUR BIOLOGIQUE PAR SPECTROMETRIE DE MASSE
Status:	Deemed Abandoned and Beyond the Period of Reinstatement - Pending Response to Notice of Disregarded Communication

Bibliographic Data

(51) International Patent Classification (IPC):	G01N 33/50 (2006.01); G01N 33/574 (2006.01); G01N 33/68 (2006.01)
(72) Inventors :	NISHIMURA, TOSHIHIDE (Japan); OGIWARA, ATSUSHI (Japan); KAWAMURA, TAKESHI (Japan); KAWAKAMI, TAKAO (Japan); KYONO, YUTAKA (Japan); KANAZAWA, MITSUHIRO (Japan); NYBERG, FREDRIK (Sweden); MARKO-VARGA, GYOERGY (Sweden); ANYOJI, HISASE (Japan)
(73) Owners :	ASTRAZENECA UK LIMITED; MEDICAL PROTEOSCOPE CO LTD
(71) Applicants :	ASTRAZENECA UK LIMITED (United Kingdom); MEDICAL PROTEOSCOPE CO LTD (Japan)
(74) Agent:	SMART & BIGGAR LP
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date:	2007-06-13
(87) Open to Public Inspection:	2007-12-21
Availability of licence:	N/A
Dedicated to the Public:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	Yes
(86) PCT Filing Number:	PCT/GB2007/002187
(87) International Publication Number:	WO 2007144606
(85) National Entry:	2008-11-12

(30) Application Priority Data:

Application No.	Country/Territory	Date
0611669.3	(United Kingdom)	2006-06-13

Abstracts

English Abstract

The invention provides a method for determining the presence of one or more polypeptide biomarkers in a sample, comprising the steps of: (a) subjecting the sample to a mass spectrometric (MS) analysis and recording retention time index and corresponding mass for each signal detected; (b) correlating the mass corresponding to each signal to a reference database of biomarker masses to form a correlation between each signal and a reference biomarker, and discarding those signals whose masses do not correlate to a reference biomarker mass; (c) storing those signals whose masses correlate with a reference biomarker; (d) confirming the correlation between each stored signal and a reference biomarker by matching the MS spectrum of each signal with the MS spectrum of the reference biomarker in the database using a similarity measure, to define a set of positively correlating signals; (e) measuring the intensity of each positively correlating signal and scoring its absolute signal intensity or its relative signal intensity using a discrimination function; (f) applying a threshold to the score values obtained from the discrimination function to determine the presence or absence of the biomarker.

French Abstract

L'invention a pour objet un procédé permettant de déterminer la présence d'un ou de plusieurs marqueurs biologiques polypeptidiques dans un échantillon. Le procédé comprend les étapes consistant à : (a) soumettre l'échantillon à une analyse spectrométrique de masse (SM) et enregistrer le temps de rétention et la masse correspondante de chaque signal détecté ; (b) comparer la masse correspondant à chaque signal avec une base de données de référence de masses de marqueurs biologiques afin d'obtenir une corrélation entre chaque signal et un marqueur biologique de référence, puis se débarrasser des signaux dont les masses ne correspondent à celle d'aucun marqueur biologique de référence ; (c) stocker les signaux dont les masses corrèlent avec un marqueur biologique de référence ; (d) confirmer la corrélation entre chaque signal stocké et un marqueur biologique de référence en opposant le spectre SM de chaque signal au spectre SM du marqueur biologique de référence dans la base de données au moyen d'une mesure de similitude afin de définir un ensemble de signaux corrélant positivement ; (e) mesurer l'intensité de chaque signal corrélant positivement et enregistrer son intensité absolue ou son intensité relative au moyen d'une fonction de discrimination ; (f) appliquer un seuil aux valeurs obtenues au moyen de la fonction de discrimination afin de déterminer la présence ou l'absence du marqueur biologique recherché.

Claims

Note: Claims are shown in the official language in which they were submitted.

1. A method for determining the presence of one or more polypeptide biomarkers in a sample, comprising the steps of:
(a) subjecting the sample to a mass spectrometric (MS) analysis and recording retention time index and corresponding mass for each signal detected;
(b) correlating the mass corresponding to each signal to a reference database of biomarker masses to form a correlation between each signal and a reference biomarker, and discarding those signals whose masses do not correlate to a reference biomarker mass;
(c) storing those signals whose masses correlate with a reference biomarker;
(d) confirming the correlation between each stored signal and a reference biomarker by matching the MS spectrum of each signal with the MS spectrum of the reference biomarker in the database using a similarity measure, to define a set of positively correlating signals;
(e) measuring the intensity of each positively correlating signal and scoring its absolute signal intensity or its relative signal intensity using a discrimination function;
(f) applying a threshold to the score values obtained from the discrimination function to determine the presence or absence of the biomarker.
2. A method according to claim 1, wherein the test sample is subjected to MS analysis without prior separation procedures.
3. A method according to claim 2, wherein the test sample is analysed by direct infusion using static nano-electrospray principles, flow injection analysis or flow injection with sample enrichment.
4. A method according to claim 1, wherein the test sample is processed prior to MS analysis.
5. A method according to claim 4, wherein the sample processing comprises sample separation by single- or multi-phase high-pressure liquid chromatography (HPLC).
6. A method according to any preceding claim, wherein the MS is electrospray ionisation (ESI) MS, matrix-assisted laser desorption ionisation - time of flight (MALDI-TOF) MS or surface enhanced laser desorption ionisation - time of flight (SELDI-TOF) MS.
7. A method according to any preceding claim, wherein reference mass and MS spectral data for a plurality of biomarkers are stored in electronic or paper form.
8. A method according to any preceding claim, wherein reference MS spectra for a defined biomarker are averaged spectra from actual and measured data obtained by a clustering calculation.
9. A method according to any preceding claim, wherein one or more internal standards of reference peptides are added to the sample prior to analysis by MS.
10. A method according to claim 9, wherein the internal standards are labelled with a molecular tag.
11. A method according to claim 9, wherein the internal standards are labelled and included in the master data set.
12. A method according to claim 11, wherein the absolute signal intensity is scored by measuring the biomarker signal intensity and comparing it to the signal intensity of one or more known internal standards.
13. A method according to any one of claims 1 to 9, wherein the sample is processed without the addition of internal standards.
14. A method according to claim 13, wherein the relative signal intensity is scored by measuring the ratio between the individual biomarker signal intensities in a patient and the reference signal intensity for a patient group.
15. A method according to claim 13, which is fully automated.
16. A method according to claim 1, wherein the discrimination function to calculate the score from MS signal intensity optionally includes the use of any clinical variables such as clinical examination results and/or phenotyping of clinical observation and/or medical records.
17. A diagnostic method for determining the presence of a disease which comprises comparing the protein sequence biomarkers of a test sample with reference biomarkers, wherein the reference biomarkers comprise peptides identified in Table 1.

Description

Note: Descriptions are shown in the official language in which they were submitted.

CA 02651934 2008-11-12
WO 2007/144606 PCT/GB2007/002187
1
Mass Spectrometry Biomarker Assay
The present invention relates to an assay for biomarkers. In particular, the invention describes a multiplex assay capable of automatically screening for the presence of biomarkers in samples by mass spectrometry.
Various biological markers, known as biomarkers, have been identified and studied through the application of biochemistry and molecular biology to medical and toxicological states. Biomarkers can be discovered in both tissues and biofluids, where blood is the most common biofluid used in biomarker studies.
Biomarkers may have a predictive power, and as such may be used to predict or detect the presence, level, type or stage of particular conditions or diseases (including the presence or level of particular microorganisms or toxins), the susceptibility (including genetic susceptibility) to particular conditions or diseases, or the response to particular treatments (including drug treatments). It is thought that biomarkers will play an increasingly important role in the future of drug discovery and development, by improving the efficiency of research and development programs. Biomarkers can be used as diagnostic agents, monitors of disease progression, monitors of treatment and predictors of clinical outcome. For example, various biomarker research projects are attempting to identify markers of specific cancers and of specific cardiovascular and immunological diseases.
Intact proteins can be assayed in a number of ways utilizing both gel-based and liquid phase separation technologies. Two-dimensional gel electrophoresis is used with solubilised protein mixtures, where the proteins are separated based upon charge and size. The proteins are resolved such that both isomeric forms and post-translational modifications are resolved. Quantitation of the proteins is made by staining techniques, where both pre- and post-staining techniques can be applied. Metabolic labelling also allows the linear range to be extended up to 5 orders of magnitude, offering sensitivities within the femtomolar range. Protein identification is performed from excised gel spots. The proteins are digested after chemical degradation and modification. The resulting peptide mixtures are extracted from the isolated gel sample and subsequently identified by mass spectrometry.
Multidimensional HPLC (High Performance Liquid Chromatography) can be used as a good alternative for separating proteins or peptides. The protein or peptide mixture is passed through a succession of chromatographic stationary phases or dimensions, which gives a higher resolving power. HPLC is flexible for many experimental approaches, and various stationary and mobile phases can be selected for their suitability in resolving specific protein or peptide classes of interest and for compatibility with each other and with downstream mass spectrometric methods of detection and identification. High Performance Liquid Chromatography is currently the best methodology for solute separations which also allows for automated operation with a high degree of reproducibility. On-line configurations of these types of multi-mechanism separation platforms are commonly applied within proteomics studies.
Mass spectrometry (MS) is also an essential element of the proteomics field. In fact, MS is the major tool used to study and characterize purified proteins in this field. The interface link in proteomics and MS, displaying hundreds or thousands of proteins, is made by gel technology, where high resolution can be reached on a single gel. Researchers are successfully harnessing the power of MS to supersede the two-dimensional gels that originally gave proteomics its impetus.
The application and development of mass spectrometry (MS) to identify proteins or peptides separated via liquid phase separation techniques and/or gel-based separation techniques have led to significant technological advances in protein and peptide expression analysis. There are two main methods for the mass spectrometric characterization of proteins and peptides: matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI). Using various approaches, MALDI and ESI ion sources can be combined with time-of-flight (TOF) or other types of mass spectrometric analyzers to determine the mass or the sequence of peptides.
In MALDI, peptides are co-crystallized with the matrix, and pulsed with lasers. This treatment vaporizes and ionizes the peptides. The molecular weights (masses) of the charged peptides are then determined in a TOF analyzer. In this device, an electric field accelerates the charged molecules toward a detector, and the differences in the length of time it takes ionized peptides to reach the detector (their time-of-flight) reveal the molecular weights of the peptides; smaller peptides reach the detector more quickly. This method generates mass profiles of the peptide mixtures - that is, profiles of the molecular weights and amounts of peptides in the mixture. These profiles can then be used to identify known proteins from protein sequence databases.
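The relationship between flight time and mass described above can be illustrated with a minimal sketch of an idealised linear TOF analyser, in which an ion of mass m and charge z accelerated through a potential U travels a field-free drift length L in time t = L * sqrt(m / (2 z e U)). The function name, the 20 kV accelerating voltage and the 1 m drift length are illustrative assumptions and are not taken from the specification; real instruments add corrections (reflectrons, delayed extraction) that are ignored here.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da: float, charge: int,
                accel_voltage: float = 20_000.0,
                drift_length: float = 1.0) -> float:
    """Drift time in seconds for an ion in an idealised linear TOF analyser.

    accel_voltage (volts) and drift_length (metres) are hypothetical
    instrument settings chosen only for illustration.
    """
    if mass_da <= 0 or charge <= 0:
        raise ValueError("mass and charge must be positive")
    mass_kg = mass_da * DALTON
    # Energy gained in the accelerating field: E = z * e * U
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_voltage
    # E = 1/2 m v^2, so v = sqrt(2E / m)
    velocity = math.sqrt(2.0 * kinetic_energy / mass_kg)
    return drift_length / velocity


if __name__ == "__main__":
    for mass in (1000.0, 2000.0, 4000.0):
        print(f"{mass:7.1f} Da -> {flight_time(mass, 1) * 1e6:.2f} microseconds")
```

Under these assumptions, doubling the mass at constant charge lengthens the flight time by a factor of sqrt(2), which is why lighter peptides arrive first.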
By making an ESI-MS interface to liquid chromatography (LC/MS/MS), the eluting peptides from the LC-column are introduced into the ion source of the mass spectrometer. A voltage is applied to a very fine needle. The needle then sprays droplets into a mass spectrometric analyzer, where the droplets evaporate and peptide ions are released corresponding to a variety of charge states that are fragmented and from where the sequence can be determined. In LC/MS/MS, researchers use microcapillary LC devices to initially separate peptides.
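Because ESI releases the same peptide in several charge states, one neutral mass appears at several m/z values. The sketch below shows the standard conversion for positive-mode protonated ions, m/z = (M + z * m_p) / z, where m_p is the proton mass. The function names are hypothetical and the example mass is arbitrary; adducts other than protons are not considered.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (rounded)


def mz_for_charge(neutral_mass: float, charge: int) -> float:
    """m/z of a peptide of the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relation above to recover the neutral mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


def charge_envelope(neutral_mass: float, max_charge: int = 4) -> dict:
    """m/z values for charge states 1..max_charge of one peptide."""
    return {z: mz_for_charge(neutral_mass, z) for z in range(1, max_charge + 1)}


if __name__ == "__main__":
    # Arbitrary example mass, not a biomarker from the specification.
    for z, mz in charge_envelope(2000.0).items():
        print(f"z={z}: m/z = {mz:.4f}")
```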
Mass spectrometry (MS) is a valuable analytical technique because it measures an intrinsic property of a bio-molecule, its mass, with very high sensitivity. MS can therefore be used to measure a wide range of molecule types (proteins, peptides, or any other bio-molecules) and a wide range of sample types/biological materials. Correct sample preparation is known to be crucial for the MS signal generation and for spectral resolution and sensitivity. Sample preparation is therefore a crucial area for the overall feasibility and sensitivity of the analysis.
Proteins are bio-macromolecules that are difficult to separate by liquid phase chromatographic separation techniques, due to the unfavorable mass transfer within the particles of the chromatographic column material, the stationary phase. However, proteins can be rendered into smaller unit (peptide or polypeptide) form by breaking the peptide bond joining two adjacent amino acids. This can be accomplished by enzymatic cleavage by proteases, proteins that are capable of interacting with and dissolving peptide bonds on other proteins. Trypsin is the most commonly used protease in protein expression analysis studies. After the enzymatic degradation, the resulting complex mixture of peptides will be separated and fractionated by capillary chromatography. All peptides that are the sum of the digested proteins in the sample will be unresolved at this stage. The peptides that have been generated from the corresponding protein will not be separated as one unit in the chromatographic fractionation step, but rather will be separated together with the resulting peptides from all other proteins in the sample. The highly resolved and separated peptides eluting from the capillary will be fractionated most commonly based upon charge and hydrophobicity. The separated peptides are introduced on-line from the chromatographic part of the platform into the mass spectrometer, thereby circumventing possible contaminations. The peptides are then mass determined (m/z), in order to capture all the peptides present in that given time window. Next, a number of peptide masses are selected for sequencing (MS/MS), based upon their abundance in the given time window. This is performed by a new ion sampling interface by an electrospray ionization ion trap mass spectrometer system. The interface uses linear quadrupoles as ion guides and ion traps to enhance the performance of the trap. Trapping ions in the linear quadrupoles is demonstrated to improve the duty cycle of the system. Dipolar excitation of ions trapped in a linear quadrupole is used to eject unwanted ions.
After the first appearance of successful instrumentation in 1990, ion trap mass spectrometry with electrospray ionization (ESI) has become a widely used tool for trace analysis. Electrospray is a gentle source that can ionize important analytes such as peptides and proteins. Highly charged ions produced in ESI can extend the range of mass analyzers. Trap mass spectrometers have favorable capabilities such as flexible tandem MS capability (MS^n). In this ionization process, the precursor ion is activated by acceleration into a mass-selective linear ion trap under conditions whereby some of the fragment ions formed are unstable within the trap. After a time delay, the stability parameters of the ion trap are changed to allow capture of fragments that were previously unstable. The result is a product ion spectrum that originates from precursor ions with a modified internal energy distribution. It is possible to follow the evolution of the precursor internal energy distribution for many milliseconds after admittance of the precursor ions into the linear ion trap. Time-delayed fragmentation product ion spectra typically display reduced sequential fragmentation products, leading to spectra that are more easily interpreted. Several experimental parameters important to time-delayed fragmentation have been identified and are discussed. The technique has applications for both small precursor ions and multiply charged peptides.
Tandem mass spectrometry (MS/MS) is at the heart of most modern mass spectrometric investigations of complex mixtures. The fragmentation involves activation of a precursor ion via collisions with a target gas and may produce charged and neutral fragments. The nature of the fragment ions, as well as their intensities, is often indicative of the structure of the precursor ion and thus can yield useful information for the identification of unknown analytes, as well as providing a useful screening technique for different classes of analytes. Activation via multiple collisions both prolongs the activation time and enables higher energies to be deposited into precursor ions. Higher collision gas pressures also imply higher collision relaxation rates.
Whilst the combination of protein separation by 2D gel electrophoresis and analysis by mass spectrometry has been established to be useful for biomarker analysis, multiplex systems capable of analysing several biomarkers are currently at the experimental stage. Many diseases have been shown to be associated with a complex pattern of biomarkers, which may be diagnostic for the disease or indicative of the response to drug treatment by a patient. These patterns often involve several biomarkers, requiring multiple simultaneous analyses. There is a need, therefore, for a system capable of assaying multiple biomarkers simultaneously. Ideally, the system could be automated.
Summary of the Invention
The invention provides an assay for biomarkers in a biological sample which is automated and accurate. The assay relies on mass spectrometry to identify biomarkers, and is referred to herein as the mass spectrometry biomarker assay (MSBA).
The invention provides a method for determining the presence of one or more polypeptide biomarkers in a test sample, preferably a human test sample although non-human test samples may also be used, which is typically confined in a volume of a biofluid containing naturally occurring proteins and peptides contained within an amount of tissue, blood, or other clinically obtained specimens.
The method preferably comprises the following steps:
(a) subjecting the sample to a mass spectrometric (MS) analysis and recording retention time index and corresponding mass for each signal detected;
(b) correlating the mass corresponding to each signal to a reference database holding a master set of biomarker masses from a known disease or biological alteration, to form a correlation between each test sample signal and a biomarker from the master set of biomarkers within the reference database, and discarding those test signals whose masses do not correlate to a reference biomarker mass in the master data set;
(c) storing those test sample signals whose masses correlate with a reference biomarker in the master data;
(d) confirming the correlation between each stored signal and a reference biomarker by matching the MS spectrum of each signal with the MS spectrum of the reference biomarker in the database using a similarity measure, to define a set of positively correlating signals;
(e) measuring the intensity of each stored, positively correlating test signal and scoring its absolute signal intensity or its relative signal intensity using a discrimination function;
(f) applying a threshold to the score values obtained from the discrimination function to determine the presence or absence of the biomarker.
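The mass correlation and storage steps listed above can be sketched as a tolerance-based lookup. The specification does not state how closely a signal mass must agree with a reference mass, so the parts-per-million tolerance used here, along with the class and function names, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Signal:
    retention_index: float  # retention time index recorded for the signal
    mass: float             # observed mass in Da


@dataclass(frozen=True)
class ReferenceBiomarker:
    name: str
    mass: float             # reference mass in Da from the master data set


def ppm_error(observed: float, reference: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - reference) / reference * 1e6


def correlate_masses(signals: List[Signal],
                     references: List[ReferenceBiomarker],
                     tolerance_ppm: float = 10.0
                     ) -> List[Tuple[Signal, ReferenceBiomarker]]:
    """Keep signals whose mass matches a reference; discard the rest.

    tolerance_ppm is a hypothetical setting, not a value from the source.
    When several references fall within tolerance, the closest one wins.
    """
    stored = []
    for signal in signals:
        best, best_err = None, tolerance_ppm
        for ref in references:
            err = abs(ppm_error(signal.mass, ref.mass))
            if err <= best_err:
                best, best_err = ref, err
        if best is not None:
            stored.append((signal, best))  # stored for spectral confirmation
    return stored


if __name__ == "__main__":
    refs = [ReferenceBiomarker("pepA", 1000.0), ReferenceBiomarker("pepB", 1500.0)]
    sigs = [Signal(10.0, 1000.005), Signal(12.0, 1200.0), Signal(15.0, 1500.1)]
    for sig, ref in correlate_masses(sigs, refs):
        print(sig, "->", ref.name)
```

A linear scan is adequate for a sketch; for a large master data set, a sorted list with binary search would be the more natural choice.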
Preferably, the method of the invention uses the master data set in the test sample screening phase.
Advantageously, the method filters and screens mass and sequence identities of data sets that are based on each of the unique properties of charge, mass and sequence spectra associated with certain identified protein sequences in the master data set.
In a first aspect, therefore, the invention provides a method for determining the presence of one or more polypeptide biomarkers in a sample, comprising the steps of:
(a) subjecting the sample to a mass spectrometric (MS) analysis and recording retention time index and corresponding mass for each signal detected;
(b) correlating the mass corresponding to each signal to a reference database of biomarker masses to form a correlation between each signal and a reference biomarker, and discarding those signals whose masses do not correlate to a reference biomarker mass;
(c) storing those signals whose masses correlate with a reference biomarker;
(d) confirming the correlation between each stored signal and a reference biomarker by matching the MS spectrum of each signal with the MS spectrum of the reference biomarker in the database using a similarity measure, to define a set of positively correlating signals;
(e) measuring the intensity of each positively correlating signal and scoring its absolute signal intensity or its relative signal intensity using a discrimination function;
(f) applying a threshold to the score values obtained from the discrimination function to determine the presence or absence of the biomarker.
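The confirmation step above calls for "a similarity measure" between the signal spectrum and the reference spectrum without naming one. A common choice, used here only as an assumed stand-in, is cosine similarity between binned spectra. The bin width, the 0.8 acceptance level and all function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(spectrum_a: List[Peak], spectrum_b: List[Peak],
                      bin_width: float = 1.0) -> float:
    """Cosine of the angle between two binned spectra (0 = unrelated, 1 = same shape)."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def is_positive(spectrum: List[Peak], reference: List[Peak],
                min_similarity: float = 0.8) -> bool:
    """Classify a stored signal as positively correlating (assumed cut-off)."""
    return cosine_similarity(spectrum, reference) >= min_similarity


if __name__ == "__main__":
    reference = [(175.1, 10.0), (300.2, 5.0), (412.3, 8.0)]
    observed = [(175.1, 9.0), (300.2, 6.0), (412.3, 7.5)]
    print(f"similarity = {cosine_similarity(observed, reference):.3f}")
```

Cosine similarity ignores overall intensity scale, which suits this step because quantitation is handled separately in the subsequent intensity scoring.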
The method of the invention allows users to analyse, simultaneously, hundreds or thousands of biomarkers in a sample. The method relies on a database of biomarkers which have been shown to be associated with a disease; the database comprises mass and spectral data for each of the biomarkers and allows the said biomarkers to be identified precisely by the MSBA software in a given sample. By screening the peptides present in a sample and eliminating undesired sequences on the basis of the retention time index, which correlates with the time of arrival of the peptide at the MS detector, upwards of 30,000 sequences can be analysed in minutes and given biomarkers identified with high confidence. The method is automatable, high-throughput and operable by relatively unskilled technicians.
The sample can be subjected to MS analysis without prior separation procedures. In such an embodiment, the sample is preferably analysed by direct infusion using static nano-electrospray principles, flow injection analysis or flow injection with sample enrichment.
Advantageously, the sample is processed prior to MS analysis, preferably to separate sample components prior to loading them into the MS. For example, the sample processing comprises sample separation by single- or multi-phase high-pressure liquid chromatography (HPLC).
The MS system itself is preferably electrospray ionisation (ESI) MS, matrix-assisted laser desorption ionisation - time of flight (MALDI-TOF) MS or surface enhanced laser desorption ionisation - time of flight (SELDI-TOF) MS.
The method according to the invention is advantageously automated and performed under computer control. Identification of biomarkers in a sample is made by comparison with reference data for said biomarkers; preferably, reference mass and MS spectral data for a plurality of biomarkers are stored on a computer.
Reference MS spectra for a defined biomarker are preferably averaged spectra obtained from actual and measured data by a clustering calculation.
The method of the invention may be implemented in two ways: using internal standards to provide a reference for quantitating signal intensity, and without such standards. Thus, in one embodiment, one or more internal standards are added to the sample prior to analysis by MS. Preferably, the internal standards are labelled.
In such an implementation of the invention, the absolute signal intensity for each biomarker signal is scored by measuring the biomarker signal intensity and comparing it to the signal intensity of one or more known internal standards.
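A minimal sketch of this comparison is single-point calibration against a spiked standard of known amount. It assumes that the biomarker and its standard respond identically in the instrument, which is a simplification introduced here and not stated in the text; the function name and units are likewise hypothetical.

```python
def absolute_amount(biomarker_intensity: float,
                    standard_intensity: float,
                    standard_amount: float) -> float:
    """Estimate the biomarker amount from one internal standard.

    Assumes equal response factors for biomarker and standard, so that
    amount scales linearly with the intensity ratio. The result is in
    the same units as `standard_amount` (for example fmol).
    """
    if standard_intensity <= 0:
        raise ValueError("standard intensity must be positive")
    if biomarker_intensity < 0 or standard_amount < 0:
        raise ValueError("intensity and amount must be non-negative")
    return biomarker_intensity / standard_intensity * standard_amount


if __name__ == "__main__":
    # Arbitrary numbers: a standard spiked at 2.0 units gives 10000 counts,
    # and the biomarker signal is 5000 counts.
    print(absolute_amount(5000.0, 10000.0, 2.0))
```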
In the alternative implementation, the sample is processed without the addition of internal standards. In such an embodiment, the relative signal intensity is scored by measuring the ratio between the individual biomarker signal intensities in a patient and the reference signal intensity for a patient group.
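The ratio described above can be sketched as follows. The text does not say how the reference signal intensity for a patient group is derived, so taking the median of the group's intensities is an assumption made here for illustration, as are the function name and example values.

```python
from statistics import median
from typing import Sequence


def relative_intensity(patient_intensity: float,
                       group_intensities: Sequence[float]) -> float:
    """Ratio of one patient's biomarker intensity to a group reference.

    The group reference is taken as the median of the group's
    intensities; this choice is an assumption, not the source's.
    """
    if not group_intensities:
        raise ValueError("the patient group must not be empty")
    reference = median(group_intensities)
    if reference <= 0:
        raise ValueError("group reference intensity must be positive")
    return patient_intensity / reference


if __name__ == "__main__":
    # Arbitrary example intensities for one biomarker.
    print(relative_intensity(300.0, [100.0, 150.0, 200.0]))
```

A ratio above 1 indicates a signal stronger than the group reference and a ratio below 1 a weaker one; the median is used because it is less sensitive to outlying patients than the mean.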
A biomarker can be described as "a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention". A biomarker is any identifiable and measurable indicator associated with a particular condition or disease where there is a correlation between the presence or level of the biomarker and some aspect of the condition or disease (including the presence of, the level or changing level of, the type of, the stage of, the susceptibility to the condition or disease, or the responsiveness to a drug used for treating the condition or disease). The correlation may be qualitative, quantitative, or both qualitative and quantitative. Typically a biomarker is a compound, compound fragment or group of compounds. Such compounds may be any compounds found in or produced by an organism, including proteins (and peptides), nucleic acids and other compounds.
The sample may be any biological substance of interest, but is advantageously a biological tissue and preferably a biological fluid such as blood or plasma.
The method of the invention relies upon correlation of observed MS signals with reference masses and MS spectra of known biomarkers. The reference data is preferably stored on a computer server, which allows the entire procedure to be carried out under computer control.
Signals are correlated to reference standards by comparison, for example using computational functions as described herein. Preferably, signals are characterised as "positive" or "negative" according to whether a threshold level of similarity is achieved; signals which are negative and do not achieve the threshold level of similarity are discarded in the MSBA process, whilst those signals which are positive are matched with biomarkers and result in a diagnosis of the presence of said biomarkers in a biological sample.
Signal intensity is measured with reference to known control standards added to the biological sample, or by comparison with a reference intensity calculated across a patient group, depending on the implementation of the MSBA assay.
In the case of implementation with standards, the MSBA scoring of the biomarker signals is calculated by the ratio between the signal of the biomarker present in the sample and the internal standard added to the sample. All biomarkers in a multiplex assay will be analysed the same way, resulting in a final MSBA scoring factor.
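How the per-biomarker ratios are combined into a single scoring factor is not specified at this point in the text. One simple possibility, offered only as a hedged illustration, is a linear discrimination function over log2 ratios followed by a threshold. The weights, bias, threshold and function names below are all hypothetical and would in practice have to be derived from reference patient data.

```python
import math
from typing import Dict


def msba_score(ratios: Dict[str, float],
               weights: Dict[str, float],
               bias: float = 0.0) -> float:
    """Weighted sum of log2 intensity ratios (an assumed discrimination function).

    `ratios` maps biomarker name to its signal ratio (biomarker versus
    internal standard, or patient versus group reference).
    `weights` maps the same names to hypothetical coefficients.
    """
    missing = set(weights) - set(ratios)
    if missing:
        raise KeyError(f"no ratio measured for: {sorted(missing)}")
    for name in weights:
        if ratios[name] <= 0:
            raise ValueError(f"ratio for {name} must be positive")
    return bias + sum(weights[name] * math.log2(ratios[name]) for name in weights)


def exceeds_threshold(score: float, threshold: float = 0.0) -> bool:
    """Apply a threshold to the score to give a present/absent call."""
    return score >= threshold


if __name__ == "__main__":
    # Arbitrary illustration: pepA elevated fourfold, pepB halved.
    ratios = {"pepA": 4.0, "pepB": 0.5}
    weights = {"pepA": 1.0, "pepB": -1.0}
    score = msba_score(ratios, weights)
    print(score, exceeds_threshold(score, threshold=2.5))
```

Working in log2 space treats a doubling and a halving symmetrically, which is a common convention for intensity ratios but remains a design choice of this sketch.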
In order to have absolute quantitation built into the MSBA methodology, the use of internal calibrant standards is preferred. Such standards are, for instance, isotope labelled, making the assay read-out highly accurate in terms of protein sequence, as well as advantageous in terms of absolute quantitation. Built-in calibration sequences within the MSBA screening will allow the measurement of absolute protein biomarker levels in blood, or any other clinical sample.
A new method is provided, composed of multiple linked steps, for detecting and quantifying protein sequence biomarkers with a multiplex read-out where the expression levels of, but not restricted to, 2-100 biomarkers can be mapped in one single MSBA read-out. The MSBA system is built on a liquid phase platform that can handle single line diagnostic mapping, or a multiple flow configuration with simultaneous parallel processing of samples, thereby increasing the capacity and throughput of the system. The detection mode of the MSBA method is the accurate mass identification and sequence determination and subsequent quantitation by mass spectrometry.
This methodology may be applied to any type of biological sample that is in, or can be transformed into, a liquid form. The MSBA methodology can also process samples from any type of cellular or biotechnology process where, for instance, kinetic profiles over time are measured. This analysis over time is performed by successive, automatic introduction of samples into the MSBA platform. The entire analysis capability of the MSBA diagnostic profiling is under computer control, including the mass signal evaluation, the sequence analysis, the multiplex quantitation by weighted discrimination and finally the MSBA SCORE diagnosis. All of the intermediate steps within the MSBA cycle run on this platform are evaluated by dedicated algorithms that make accurate decisions from the massive amount of data generated in each cycle of MSBA analysis from any given biological sample.
The MSBA method results in the identification of specific peptides, as well as biochemically modified variants thereof, present as separate entities or present within complex mixtures of proteins and peptides. Each peptide may be defined by a specific sequence of amino acids that can be selectively identified by either its precise mass, or its unique immuno-affinity binding properties to a given immunological reagent.
The method allows the identification of statistically significant protein identities and modified versions thereof. Moreover, it is possible to measure relative quantities of each biomarker in the sample even without the use of internal standards. Alternatively, absolute quantitations can be made of each biomarker separately in any given biological sample by the use of internal standards, where these internal standards (e.g. n=1-20) are the protein sequences of the biomarkers. These internal standards can then be made as cold amino acid sequences, or as isotope labelled amino acid sequences. The standards have identical sequences to the selected biomarkers, with the possible exception of the labelling.
The method combines several key steps which result in the specific processing,
separation, isolation, and identification of unique protein sequences present in
a biological material sample. The method may be applied to human clinical
samples. The method may also be applied to samples derived from non-human
animals.
We provide a multi-step method for determining the identity of unique protein
sequences, presented for example as atomic mass units of entities, from a
biological sample that has been proven to have a quantitative alteration in a
given multiplex biomarker group, the size of which can range for example between
2-100, of a given sample.
We moreover provide a method to determine or confirm that the biomarkers in any
specific biological sample, whether clinical, cellular, or any other type of
sample, have the multiplex quantitative shift of a pre-determined biomarker set
of protein sequences. This quantitative alteration is finally calculated by the
MSBA algorithms to generate an MSBA SCORE that will be the diagnostic read-out.
Further, statistically significant similarities may be detected and registered
as unique protein sequence identities or multiple-peptide identities.
Determining statistically significant similarities involves using publicly
available protein and gene sequence databases as well as algorithms developed
specifically to meet the demands of the MSBA methodology.

The integration of process steps for biomarker identification is advantageous.
The integrated process relies on the following principles: 1) high quality
biomedical clinical material, 2) reproducible and high speed sample processing
with subsequent liquid phase separations, 3) accurate quantitative and
qualitative determination of a multiplex set of biomarkers and 4) algorithms
that will control the data generation and calculate and allow the isolation of
the biomarkers in the multiplex protein sequence group, one by one.
Brief Description of the Figures
Figure 1 shows a schematic illustration of the MSBA principle.
Figure 2 illustrates in more detail the data handling procedures involved in
MSBA.
Figure 3 shows a mass spectrum from a blood sample from a lung disease patient.
Multiple biomarkers are identified in the sample.
Figure 4 shows an example of biomarker annotation made from the multiplex assay,
presented by the MS spectrum where the biomarker was recognised by the MSBA
software, and the follow-up MS/MS spectrum that represents the resulting
CVLFPYGGCQGNGNK biomarker.
Figure 5 shows an example of evaluating the predictability of an MSBA model with
11 biomarker signals on sample data of 19 patients, as described in Example 2.
10 cases and 9 controls were used as if they were blinded samples. The MSBA
score for each subject was calculated using Eq. 5. In this example, subjects
whose MSBA score was equal to or greater than 1 were diagnosed as cases (red
circles). Otherwise the subject was considered to be a control (blue circles).
The prediction accuracy was 100%.
Figure 6 shows the auto-discrimination results using an MSBA model with 10
signals on sample data of 96 patients, as described in Example 3. Each dot
represents a patient, and the vertical axis represents the discriminant score
(z), calculated using Eq. 6. If this score was >0, it was interpreted as a case
of the disease (red circles). Otherwise the subject was considered to be a
control (blue circles). The prediction accuracy was 83.3%.
Detailed Description of the Invention
Biomarkers
The FDA definition of biomarker is "a characteristic that is objectively
measured and
evaluated as an indicator of normal biologic processes, pathogenic processes,
or
pharmacologic responses to a therapeutic intervention".
As used herein, the term "biomarker" refers to a polypeptide which can be used
to monitor the presence or the progress of a disease, consistent with the above
FDA definition.
Biomarkers can be used as diagnostic agents, monitors of disease progression,
monitors of treatment and predictors of clinical outcome. For example, various
biomarker research projects are attempting to identify markers of specific
cancers and of specific cardiovascular and immunological diseases.
Some of these disease-associated proteins may be identified as novel drug
targets and some may be useful as biomarkers of disease progression. Such
biomarkers may be used to improve clinical development of a new drug or to
develop new diagnostics for the particular disease.
Disease-associated proteins are known in the art, and their use as biomarkers
for the disease is established. Such biomarkers can be monitored by means of the
present invention. Novel disease-associated proteins, however, may be
identified. Detection of disease-associated proteins may be achieved, for
example, by the following method.
Protein samples are taken from single patients or groups of patients. These
samples may be cells, tissues, or biological fluids that are processed to
extract and enrich protein and/or peptide constituents. Typically the process
entails partitioning into solution phase but may also include the establishment
of protein and/or peptide components attached to solid matrixes. After
separation and analysis (proteomics, peptidomics), protein expression
fingerprints are produced for both diseased and healthy subjects by qualitative
and quantitative measurement. These fingerprints may be used as unique
identifiers to distinguish individuals and/or establish and/or track certain
natural or disease processes. These prototype fingerprints are established for
each individual sample/subject and are recorded as numerical values in a
computer database. The fingerprints are then analysed using bioinformatic tools
to identify and select the proteins or peptides that are present in the
prototype fingerprints and whose expression may or may not be differentially
present in the samples derived from the healthy and diseased subjects. These
proteins/peptides are then further characterised and detailed profiles are
produced which identify the characteristic physical properties of the proteins
or peptides. Either a single protein/peptide or groups of proteins/peptides may
be determined to be significantly associated with certain natural or diseased
processes.
Mass Spectrometry
Mass spectrometry is the method of choice for the analysis of proteins and
peptides. Modern biomarker discovery research employs two major mass
spectrometry principles: MALDI-TOF (matrix assisted laser desorption ionisation
time of flight) mass spectrometry, where the proteins are analysed in a
crystalline state, and ESI (electrospray ionisation) mass spectrometry, where
the proteins are analysed in liquid state. In addition, a surface enhanced chip
application of MALDI named surface-enhanced laser desorption ionisation (SELDI)
has been used extensively in biomarker discovery studies. See, for example,
Petricoin et al., Lancet, 16, 572-577, 2002; Alexe et al., Proteomics. 2004,
4766-4783; and Liotta et al., Endocr Relat Cancer. 2004 Dec;11(4):585-7.
The surface-enhanced laser desorption/ionization (SELDI)-TOF-MS technology uses
chromatographic surfaces coupled to the assay target plate. The protein-bound
material on the plates is then directly analyzed by MALDI-MS. SELDI assays
peptides and proteins predominantly in the low molecular mass range. This
technology is applicable to the major- to medium-abundance peptides and proteins
where a suitable upfront purification scheme is not integrated. The SELDI
technology leads primarily to a pattern from which sequencing can be performed
using MALDI-TOF-TOF identification of peptides.
Multi-mechanism separation platforms enable high resolution peptide separation
configured on-line with electrospray ionization mass spectrometry, or off-line
with ionization principles such as matrix assisted laser desorption ionization
mass spectrometry. See, for example, Aebersold, R. & Goodlett, D.R. Chem. Rev.
2001, 101, 269-295; Mann et al., Annu. Rev. Biochem. 2001, 70, 437-473; Wolters
et al., Anal. Chem. 73, 5683-5690 (2001); and Washburn et al., Nat. Biotechnol.
19, 242-247 (2001).
Mass spectrometry (MS) is also an essential element of the proteomics field. In
fact, MS is the major tool used to study and characterise protein structure and
sequence within this field. See Aebersold, R. & Mann, M. Mass spectrometry-based
proteomics. Nature 422, 198-207 (2003); Steen, H. and M. Mann (2004). Nat Rev
Mol Cell Biol 5(9): 699-711; and Olsen, J. V. and M. Mann (2004) Proc Natl Acad
Sci U S A 101(37): 13417-22.
Researchers are successfully harnessing the power of MS to supersede the
two-dimensional gels that originally gave proteomics its impetus. Using ESI and
liquid chromatography (LC)/MS/MS, a voltage is applied to a very fine needle
that contains the mixture of peptides eluting from the LC column. The needle
then sprays droplets into a mass spectrometric analyzer where the droplets
evaporate and peptide ions are released. In LC/MS/MS, researchers use
microcapillary LC devices to initially separate peptides.
Mass spectrometry (MS) is a valuable analytical technique because it measures an
intrinsic property of a bio-molecule, its mass, with very high sensitivity. MS
can therefore be used to measure a wide range of molecule types (proteins,
peptides, or any other bio-molecules) in a wide range of sample types/biological
materials.
Correct sample preparation can influence MS signal generation and spectrum
resolution and sensitivity. High resolution separation systems such as
single-dimensional high-pressure Liquid Chromatography (LC) and multidimensional
Liquid Chromatography (LC/LC) can be directly interfaced with Mass Spectrometry.
This interface allows fast automated acquisition and collection of large data
sets that represent both quantification and sequence information within the mass
spectra generated. This integrated shotgun proteomics technology is known as
MudPIT (Multidimensional Protein Identification Technology). See Eng et al., J
Am Soc Mass Spectrom 1994, 5: 976-989; Link et al., Nat Biotechnol. 1999
Jul;17(7):676-82; Washburn et al., Nat Biotechnol. 2001 Mar;19(3):242-7; Lin et
al., American Genomic/Proteomic Technology, 2001 1(1): 38-46; and Tabb et al.,
J. Proteome Res. 2002 1:21-26.
In the shotgun proteomics approach, peptides generated by specific protein
digesting enzymes such as trypsin and other endo- and exo-peptidases/proteinases
are analysed rather than intact proteins. This fragmentation offers definite
advantages because even very large proteins, with varying physical and chemical
characteristics such as very hydrophobic or very basic proteins, can be
analysed. Such protein classes can otherwise be difficult to handle. These
proteins will give rise to peptide mixtures of sufficient size and number to
allow accurate protein annotation and identification. However, since several
peptides are generated from each respective protein, the complexity of the
mixture to be analyzed is increased. Consequently, considerable instrument time
and computing power are needed for the shotgun approach. However, the wealth of
protein expression information is extensive, and is generated in a fully
automated setting with simultaneous real-time protein identification.
In order to be able to handle different patient samples which present varying
degrees of disease, different methods can be applied to adjust and align the
resulting liquid chromatography chromatograms and mass spectra. This
normalisation can be performed by a software approach whereby the total signal
generated in the entire experiment is used and compared to that of the various
patient samples analyzed. The mean values and commonalities of all the signals
will be aligned to allow differential quantitations.
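
The software normalisation described above can be illustrated with a minimal
sketch. It assumes one simple reading of the text, namely that each sample is
rescaled so that its total signal equals the mean total signal of all samples.
The matrix values and the function name normalise_total_signal are hypothetical
and are not taken from the MSBA software.

```r
# Hypothetical intensity matrix: rows are signals, columns are patient samples.
intensities <- matrix(
  c(100, 200, 700,
     50, 100, 350),
  nrow = 3,
  dimnames = list(c("sig1", "sig2", "sig3"), c("patientA", "patientB"))
)

# Rescale each sample (column) so that its total signal equals the mean total
# signal across all samples. This is an assumed normalisation scheme.
normalise_total_signal <- function(x) {
  totals <- colSums(x)
  target <- mean(totals)
  sweep(x, 2, totals / target, FUN = "/")
}

normalised <- normalise_total_signal(intensities)
print(colSums(normalised))  # every sample now has the same total signal
```

After this rescaling, signals from different samples are on a common scale and
can be compared for differential quantitation.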
In a second approach, a pre-determined amount of peptide standard is added to
the sample. This addition will be made before the digestion of the samples,
after it, or both. The standards used will be the actual biomarker sequences,
synthesized either as isotope labelled sequences or without isotope labelling,
and spiked into the samples.
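
As an illustration of how a spiked standard can yield an absolute quantity, the
following sketch assumes the common isotope-dilution relationship, in which the
analyte amount equals the spiked amount multiplied by the ratio of the analyte
signal to the standard signal. The text does not state this formula; the
function name and the numbers are hypothetical.

```r
# Assumed isotope-dilution estimate: analyte amount = spiked standard amount
# multiplied by the ratio of analyte intensity to standard intensity.
quantify_with_standard <- function(analyte_intensity, standard_intensity,
                                   standard_amount) {
  stopifnot(standard_intensity > 0)
  analyte_intensity / standard_intensity * standard_amount
}

# Hypothetical values: 50 fmol of labelled standard spiked into the sample.
amount <- quantify_with_standard(analyte_intensity = 3.0e5,
                                 standard_intensity = 1.5e5,
                                 standard_amount = 50)
print(amount)
```

The estimate is in the same unit as the spiked amount and relies on the
labelled and unlabelled sequences ionising with the same efficiency.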
The use of labelling technologies within the proteomics field for quantitative
clinical protein regulation studies is highly common. Various labelling
techniques have been developed and applied utilizing a variety of binding
chemistries. Amongst the most commonly used labels within the proteomics field
are the ICAT and ITRAQ labels. See Parker et al., Mol. Cell. Proteomics,
625-659, 3, 2004; Ross et al., Cell. Proteomics, 3, 1153-1169, 2004; and DeSouza
et al., J. Proteome Res., 2005, 4, 377-386.
Sample separation
Single- or multidimensional HPLC (High Performance Liquid Chromatography) will
be used as the preferred alternative for separating proteins or peptides. The
protein or peptide mixture is passed through a succession of chromatographic
stationary phases or dimensions, which gives a higher resolving power. HPLC is
adaptable for many experimental approaches, and various stationary and mobile
phases can be selected for their suitability in resolving specific protein or
peptide classes of interest and for compatibility with each other and with
downstream mass spectrometric methods of detection and identification. HPLC is
used to separate clinical samples that have been digested by a proteolytic
enzyme, where the corresponding enzyme products, the peptide mixtures, are
generated. Sample preparation procedures are applied to protein samples such as
blood, tissue, or any other type of biofluid. See, for instance, Schulte et al.,
Expert Rev. Diagn., 5(2), 2005, 145-157; Chertov et al., Expert Rev. Diagn.,
5(2), 2005, 139-145; Adkins et al., Mol. Cell. Proteomics, 1 (12), 2002 947-955;
Pieper et al., Proteomics. 2003 Jul;3(7):1345-64; and Anderson, N. L. &
Anderson, N. G. Mol. Cell Proteom. 2002 1, 845-867.
The corresponding peptide mixture is passed through a succession of
chromatographic stationary phases or dimensions, which gives a high resolving
power. HPLC is flexible for many experimental approaches; in the setting of the
present invention an optimization is made that specifically eliminates the high
abundance fraction of proteins expressed in human blood samples, whereby
enrichment is made of proteins in the medium- and low-abundance region. The
separation of peptides and proteins is based on the peptide sequence, the
functional groups of the peptide sequence, as well as the physical properties.
MSBA-OPERATION PRINCIPLES
Prior to exposing samples to MSBA, a sample handling and preparation step is
required in most cases. The aim of introducing this step prior to the MSBA
methodology is to eliminate interfering agents and matrix components, thereby
improving overall detectability and resulting in an increase in annotation as
well as overall sensitivity. However, in certain embodiments sample preparation
can be dispensed with, particularly if the biomarker is in higher abundance and
the sample is of low complexity. Those skilled in the art will be able to
determine whether a preparation step is essential.
The MSBA platform can be operated in a number of different ways, predominantly
determined by the nature of the sample and its complexity.
The biomarker protein sequences are determined qualitatively and
quantitatively in the
patient sample by multiplex analysis. Both Labelled and Unlabelled MSBA
principles can
be applied, employing configurations of the MSBA assay according to two
possible
principles: the internal standard addition principle and the no internal
standard principle.
General Methodology
After sample preparation, the sample is injected into the MSBA platform.
Next, the following operations are undertaken;
STEP 1A
Firstly, the biomarker MS signals need to be identified within the sample.
The biofluid sample is screened against a predefined list of biomarker masses
(+/- 1 Dalton), each of which is associated with the retention time index and
corresponding mass of the respective biomarker. The relative retention time
indexes obtained in most MSBA assays are defined in minutes and have a
variability of about +/- 2%, although this figure may vary.
These steps are performed by real-time mass spectral matching to the MSBA
reference spectra repository, as illustrated in Figure 1.
Next, the masses in the MS spectra are compared with the masses of the reference
list of biomarkers, to about +/- 1 Dalton.
STEP 1B
When the biomarker candidate mass is identified as that of a biomarker having a
matching MS spectrum within the reference list, within +/- 1 Da, the information
therein is saved on the MSBA server. If the mass is incorrect, the MSBA
screening saves no spectral file to the server.
These operations are performed by a file sharing and inter-process communication
mechanism (such as client-server-type communication).
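
The screening of STEP 1A and STEP 1B can be sketched as follows. This is a
minimal illustration, not the MSBA software: the reference list, the biomarker
names and the function screen_signal are hypothetical, and applying the +/- 2%
retention time variability as a second matching criterion is an assumption
based on the description above.

```r
# Hypothetical reference list of biomarker masses and retention time indexes.
reference <- data.frame(
  biomarker = c("BM1", "BM2", "BM3"),
  mass      = c(1543.7, 2210.1, 987.5),
  rt_index  = c(22.0, 35.5, 12.3)   # minutes
)

# Return the reference rows matching a detected signal. Zero rows means the
# signal does not correlate with any reference biomarker and is discarded.
screen_signal <- function(mass, rt, ref, mass_tol = 1, rt_tol = 0.02) {
  mass_ok <- abs(ref$mass - mass) <= mass_tol               # +/- 1 Da
  rt_ok   <- abs(ref$rt_index - rt) <= rt_tol * ref$rt_index  # +/- 2 %
  ref[mass_ok & rt_ok, , drop = FALSE]
}

hit  <- screen_signal(1544.2, 22.3, reference)  # within both tolerances of BM1
miss <- screen_signal(1600.0, 22.3, reference)  # no reference mass within 1 Da
print(hit)
print(nrow(miss))
```

Only signals that return at least one row would be stored on the server; all
others are dropped before any spectral matching takes place.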
STEP 2A
When the mass identity in the MS spectra has been identified, mass
identification and sequence identity analysis is initiated.
The pattern matching step within the MSBA software calculates a similarity
measure, for example the cosine correlation. Using the similarity measure, the
correct protein sequence is confirmed. This confirmation is made by spectral
matching. The spectral matching is performed by comparison of the sample spectra
and the reference spectra in the MSBA database. For a positive identity at this
stage a cosine correlation factor of 0.8 or higher is required in order to
confirm the accurate protein sequence. Equivalent threshold values for
alternative similarity measures will be apparent to those skilled in the art.
The reference spectral comparison and evaluation is performed in the following
way. The MS/MS spectrum is represented as a list of doublets (m, v), where m
represents the mass-to-charge ratio and v the ion signal intensity value. By
binning m with an interval of mU (mU = the actual width of the bin), the MS/MS
spectrum can also be expressed as a vector V = {v_1, ..., v_n}, where its length
(n) equals the number of bins and the value of each element is the sum of the
intensities of all signals within each bin. This is a profile representation of
an MS/MS spectrum.
The cosine correlation (S) of two different MS/MS spectra (V_1, V_2) can be
calculated according to Eq 1:

    S = \frac{V_1 \cdot V_2}{|V_1| \, |V_2|}        (Eq 1)

The value of S varies from 0 to 1. A value of 0 signifies that the two vectors
are completely independent; a value of S = 1 signifies that the direction of the
two vectors is the same.
Note that the two MS/MS spectra vectors must have the same binning, i.e., if the
binning of m of one vector is 500-501, 501-502, ... 1999-2000, then the other
spectrum must be binned in the same manner. Consequently, the length of the two
vectors must be the same.
In order to judge whether measured MS signals are the correct biomarker or not,
the MS/MS portion of the measured signals is extracted and compared with the
MS/MS reference spectrum of the sample by using, for example, the cosine
correlation described above.
If the cosine correlation value S is equal to or greater than a pre-defined
threshold value, for example 0.8, then the measured signals are judged to be
derived from the putative biomarker in the reference set.
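
The binning and the cosine correlation of Eq 1 can be sketched as below. The
bin range of 500-2000 with a width of 1 follows the binning example in the
text; the peak lists and the function names bin_spectrum and
cosine_correlation are hypothetical and serve only as an illustration.

```r
# Bin an MS/MS peak list (m/z, intensity) into a fixed-width profile vector.
# Each element holds the summed intensity of all peaks falling in that bin.
bin_spectrum <- function(mz, intensity, mz_min = 500, mz_max = 2000, width = 1) {
  n_bins  <- ceiling((mz_max - mz_min) / width)
  profile <- numeric(n_bins)
  keep    <- mz >= mz_min & mz < mz_max
  idx     <- floor((mz[keep] - mz_min) / width) + 1
  kept    <- intensity[keep]
  for (k in seq_along(idx)) {
    profile[idx[k]] <- profile[idx[k]] + kept[k]
  }
  profile
}

# Eq 1: dot product divided by the product of the vector lengths.
cosine_correlation <- function(v1, v2) {
  stopifnot(length(v1) == length(v2))  # both spectra must share the binning
  sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
}

# Hypothetical peak lists for a measured and a reference spectrum.
measured      <- bin_spectrum(c(600.2, 600.7, 850.4, 1200.9), c(10, 5, 40, 20))
reference_vec <- bin_spectrum(c(600.4, 850.1, 1200.3), c(12, 45, 18))

s <- cosine_correlation(measured, reference_vec)
is_match <- s >= 0.8   # threshold value given in the text
print(s)
print(is_match)
```

Because both spectra are binned with identical parameters, the two vectors have
the same length, which is the precondition stated above for applying Eq 1. The
sketch does not handle an all-zero spectrum, for which the ratio is undefined.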
The following section describes how to construct reference spectra that are
obtained as a
group specific spectrum from many individual patients.

For each candidate biomarker, once such biomarker is established, several MS/MS
spectra should be collected to construct a reference MS/MS spectrum map. This is
an averaged spectrum from actual and measured data sets and is obtained by a
clustering calculation.
An example of the construction of such reference spectrum is as follows:
(1) Collection of multiple MS/MS spectra for the targeted biomarker. These MS/MS
spectra must be confirmed to be derived from the target biomarker by MASCOT,
SEQUEST or other programs with a given confidence level.
(2) Investigation of the similarity of each collected MS/MS spectrum using the
above-mentioned similarity measure. This is performed by a clustering
calculation using the similarity measure. The clustering calculation is
continued to the point where the similarity measure decreases to the pre-defined
threshold value. Following these clustering calculations, a summary list of all
remaining protein sequence ions within the MS/MS spectra is generated. The next
step is the removal of the remaining protein sequence ions in the summary list
from the cluster calculation.
(3) Using the established and qualified summary MS/MS spectrum from the
cluster, it is
possible to calculate the arithmetic average for each element of the spectrum
vector. The
averaged vector can now be used as the MS/MS reference spectrum.
(4) During the clustering process, it is possible to come up with a result
that
generates more than one reference from the patient group.
a) In that case, there is more than one cluster, each with a different MS/MS
spectral profile map. The criterion set for these situations is that each group
needs to have reliable target biomarker identification. It is then possible to
generate and make use of more than one reference spectrum for one target. Such
cases would appear if there are MS/MS fragment ions that differ between the
groups but correlate to the same annotated protein.
b) It is also possible to arrive at situations with the cluster analysis data
where there is a difference in the biomarker profile map. That would mean that
several individual groups can be established from the patient cohort, e.g.
different phenotypes. In these cases, the comparative multiplex pattern will be
phenotype specific. However, there will also be possibilities of biomarker
overlaps between the phenotypes.
These confirmation algorithms will be applied and used in real-time within the
high-throughput screening operations of the MSBA platform, exemplified below.
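
The construction of reference spectra in steps (1) to (4) can be sketched as
follows. The text does not name a clustering algorithm, so this sketch assumes
complete-linkage hierarchical clustering on the distance 1 - S, cut where the
cosine correlation S would fall below the threshold. The spectra and the
function name build_reference_spectra are hypothetical.

```r
cosine_correlation <- function(v1, v2) {
  sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
}

# spectra: one binned MS/MS spectrum per row, all with the same binning.
build_reference_spectra <- function(spectra, threshold = 0.8) {
  n <- nrow(spectra)
  sim <- matrix(1, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      sim[i, j] <- cosine_correlation(spectra[i, ], spectra[j, ])
    }
  }
  # Assumed clustering scheme: merge spectra until the similarity within a
  # cluster would drop below the threshold.
  tree    <- hclust(as.dist(pmax(1 - sim, 0)), method = "complete")
  cluster <- cutree(tree, h = 1 - threshold)
  # Step (3): arithmetic average of each vector element, one row per cluster.
  refs <- do.call(rbind, lapply(split(seq_len(n), cluster), function(rows) {
    colMeans(spectra[rows, , drop = FALSE])
  }))
  list(cluster = cluster, references = refs)
}

# Hypothetical binned spectra: two groups with different fragment ions.
spectra <- rbind(c(10, 0, 5, 0),
                 c(12, 0, 4, 0),
                 c(0, 8, 0, 6),
                 c(0, 9, 0, 5))
result <- build_reference_spectra(spectra)
print(result$cluster)
print(result$references)
```

In this hypothetical data set the spectra fall into two clusters, so two
averaged reference spectra are returned, which corresponds to case (4) a)
above where more than one reference spectrum is kept for one target.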
STEP 2B
The mass spectrometer (for example the Finnigan LTQ), once a positive biomarker
mass has been identified, will stay on that mass target in order to make
repeated scans of the biomarker ion signal. The number of scans will be
dependent on the score match generated for each particular protein sequence, but
will be aligned to the positive identity of the biomarker. The scanning window
will be determined automatically by the MSBA software.
The criterion for a positive correlation is a value of 0.8 or higher in a cosine
correlation similarity measure.
The next step will be to make a statistically significant identification of the
protein sequence by utilizing commercial search engines such as MASCOT or
SEQUEST, or any other search engine, with the protein databases, to confirm that
it is the correct biomarker identity.
The MSBA system will only store and archive those signals and data files that
are within the mass and sequence area of the biomarkers. All other data
generated from the assay are not transferred to the MSBA database.
STEP 3
Calculation of the multiplex biomarker assay read-out

The calculation of the multiplex biomarker assay read-out is performed by the
application of the MSBA algorithm, which consists of a discrimination function
that will calculate the diagnostic MSBA score.
A discrimination function is defined as a function of x_1, ..., x_n, where x_i
represents the absolute or relative signal intensity of the i:th biomarker. The
output of a discrimination function must be either a positive or a negative
value according to the diagnosis result. For example, if the diagnosis is
positive, the output value of the discrimination function must be positive, and
vice versa.
For example, the discrimination function used is outlined in Eq 2:

    \sum_{i=1}^{n} a_i x_i - a_0 x_{total}        (Eq 2)

where n is the number of multiple biomarkers used for the diagnosis, x_i is the
absolute or relative signal intensity of the i:th biomarker, and x_total is the
total signal intensity of the MS measurement.
There is also a weight factor included in the algorithms of the MSBA software.
The vector {a_1, ..., a_n, a_0} is a weight vector that determines the direction
of the normal vector of a separating hyper-plane that divides the n-dimensional
signal intensity space into two: diagnosis positive and diagnosis negative. An
example of the procedure to determine the weight vector is described below;
however, various kinds of algorithms, e.g. Support Vector Machine, Artificial
Neural Network and others, can be used to determine the weight vector.
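
A minimal sketch of evaluating the discrimination function of Eq 2 is given
below, reading Eq 2 as the weighted sum of the biomarker intensities minus
a_0 times the total signal intensity. The weights and intensities are
hypothetical values chosen for illustration; in practice the weight vector
would come from a trained model.

```r
# Eq 2: sum over i of a_i * x_i, minus a_0 * x_total.
msba_score_eq2 <- function(x, x_total, a, a0) {
  stopifnot(length(x) == length(a))
  sum(a * x) - a0 * x_total
}

# Hypothetical weight vector and signal intensities for three biomarkers.
a  <- c(0.5, 1.2, -0.3)
a0 <- 0.01
x  <- c(120, 80, 40)

score <- msba_score_eq2(x, x_total = 5000, a = a, a0 = a0)
# A positive output corresponds to a positive diagnosis, and vice versa.
diagnosis <- if (score > 0) "Positive" else "Negative"
print(score)
print(diagnosis)
```

Geometrically, the sign of the score tells on which side of the separating
hyper-plane the patient's signal intensity vector lies.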
Another example of a discrimination function is included in the MSBA algorithms
and is defined as follows:

    \alpha f_p(x_1, \ldots, x_n) - \beta f_n(x_1, \ldots, x_n)        (Eq 3)

The functions f_p and f_n are arbitrary functions that give a measure of either
similarity or distance between a set of measured biomarkers in a patient to be
diagnosed and sets of reference biomarker signals in the MSBA server. f_p
denotes the similarity or distance function from the diagnosis-positive
references, and f_n denotes that from the diagnosis-negative ones.
α and β are coefficients that can be used to unequally weight the
diagnosis-positive and the diagnosis-negative metrics.
If the functions f_p and f_n give similarity measures, then a patient sample
will be diagnosed as positive when Eq 3 generates a positive value. If the
functions give distance measures, a positive value of Eq 3 means that the
diagnosis is negative.
An example of such a function is the Euclidean distance:

    \sqrt{\sum_{i=1}^{n} (y_i - x_i)^2}        (Eq 4)

where y_i is the i:th biomarker signal intensity in a patient to be diagnosed,
and x_i is the i:th biomarker signal intensity of the reference set.
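
The use of the Euclidean distance within Eq 3 can be sketched as follows. The
sketch assumes that Eq 3 takes the form alpha * f_p - beta * f_n, with f_p the
distance to a diagnosis-positive reference set and f_n the distance to a
diagnosis-negative one. The reference intensities and function names are
hypothetical.

```r
# Eq 4: Euclidean distance between patient intensities y and a reference set x.
euclidean_distance <- function(y, x) {
  stopifnot(length(y) == length(x))
  sqrt(sum((y - x)^2))
}

# Eq 3 with distance measures (assumed form): alpha * f_p - beta * f_n.
eq3_distance_score <- function(y, ref_pos, ref_neg, alpha = 1, beta = 1) {
  alpha * euclidean_distance(y, ref_pos) - beta * euclidean_distance(y, ref_neg)
}

# Hypothetical intensities of three biomarkers.
y       <- c(10, 4, 7)    # patient to be diagnosed
ref_pos <- c(10, 4, 4)    # diagnosis-positive reference
ref_neg <- c(2, 10, 7)    # diagnosis-negative reference

score <- eq3_distance_score(y, ref_pos, ref_neg)
# With distance measures, a positive value means diagnosis negative.
diagnosis <- if (score > 0) "Negative" else "Positive"
print(score)
print(diagnosis)
```

Here the patient lies closer to the positive reference than to the negative
one, so the score is negative and the sample is read as diagnosis positive.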
Another example is a standard error of the predicted value in the regression:

    \sqrt{ \frac{1}{n(n-2)} \left[ n \sum_{i=1}^{n} y_i^2
        - \left( \sum_{i=1}^{n} y_i \right)^2
        - \frac{ \left( n \sum_{i=1}^{n} x_i y_i
            - \left( \sum_{i=1}^{n} x_i \right)
              \left( \sum_{i=1}^{n} y_i \right) \right)^2 }
          { n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 }
      \right] }        (Eq 5)

where n is the number of biomarker signals, x_i is the measured i:th signal
intensity of a patient sample, and y_i is the predicted value from each x_i by
using a linear regression line that was calculated by the least square fitting
between the measured x_i's and the reference signals.
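
The standard error of a least-squares regression can be computed from the sums
in Eq 5, as in the sketch below. The sketch assumes that x holds the measured
intensities and y the corresponding reference intensities; this assignment of
roles, the numbers and the function name are assumptions for illustration. The
result is cross-checked against the residual standard error reported by R's
lm().

```r
# Standard error of the regression of y on x, written with the sums of Eq 5.
standard_error_of_prediction <- function(x, y) {
  n <- length(x)
  stopifnot(n == length(y), n > 2)
  sxx <- n * sum(x^2) - sum(x)^2
  syy <- n * sum(y^2) - sum(y)^2
  sxy <- n * sum(x * y) - sum(x) * sum(y)
  sqrt((syy - sxy^2 / sxx) / (n * (n - 2)))
}

# Hypothetical measured (x) and reference (y) intensities for five signals.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

se <- standard_error_of_prediction(x, y)
lm_sigma <- summary(lm(y ~ x))$sigma   # residual standard error from lm()
print(se)
print(isTRUE(all.equal(se, lm_sigma)))
```

A small standard error indicates that the patient's signal pattern follows the
reference pattern closely, which is why it can serve as a distance-type
measure in Eq 3.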
The entire software scheme, including the algorithms that control each specific
step within the process, is outlined in Figure 2.
STEP 4
Biomarker annotation and quantitation
The MSBA assay platform builds on:
A) a separating principle
B) a non-separating principle
In the case of the non-separating principle, we are able to make the biomarker
annotations and quantitations by:
(a) Direct MS-analysis
(i) direct infusion of the biological sample by using static nano-electrospray
principles. Disposable nanospray needles are used, where each nano-electrospray
needle will only be exposed to one biological sample, thereby circumventing
sample overload and memory effects.
(ii) flow injection analysis mode, where the sample is injected as a plug. The
sample volume chosen within the plug is directly related to the signal intensity
of the respective biomarker protein sequence. For low-abundance biomarkers it is
also possible to use large (several ml) sample injection volumes, thereby
reaching a saturation (steady state) of the ion signal efficiency of the mass
spectrometer.
(iii) flow injection with sample enrichment by MS-analysis utilizing a
chromatographic solid phase extraction enrichment column. This step allows for
simultaneous clean-up, by elimination of matrix components within the sample,
and trace enrichment of biomarkers. The advantage of this approach is that it is
possible to analyze biomarkers from sample origins with high complexity, e.g.
tissue extracts.
Additionally, in the sample enrichment mode (iii), we are able to generate
signal amplification factors ranging from, but not restricted to, 2-500. This
approach will improve not only the detectability of biomarkers expressed at low
levels, but also the accuracy of the protein sequence annotation.
B) In separating analysis, liquid chromatography (LC) integrated biomarker
identification
relies upon the high resolving power of LC that can be operated in the single
column
mode (see Figure x) or in the multi-column mode utilizing column switching
where the
samples are analyzed in a sequential mode, thereby improving the sample
throughput.
MSBA Programming
Here is example code for the core part that calculates a weight vector (or a
model) and predicts a diagnosis using the Support Vector Machine algorithm. The
code is written in the R language. A model (model) will be constructed from a
training data set (train), and will then be utilized to generate a prediction
(pred) for a given test data set (test). The data sets train and test are data
frames containing a plural number of data points, each of which consists of an
object Diag containing the diagnosis result (a categorical value, either
"Positive" or "Negative"; the values are empty in the data set used for
prediction) and a vector containing the signal intensity for each biomarker and
the total signal intensity. The MSBA programming will rely on data generated
from the protein sequence screening performed on the two patient groups from
which the biomarkers have been generated.
The programming within the MSBA software is performed in the following way:

    library(e1071)
    ## Fit a support vector machine; Diag is the diagnosis factor and all
    ## remaining columns of train are used as predictors.
    model <- svm(Diag ~ ., data = train)
    ## The call above calculates the model, but it may be too simple to
    ## reflect Eq 2.
    ## Selected support vectors are: model$SV
    ## and their coefficients (weights) are: model$coefs
    pred <- predict(model, test)
EXAMPLES
The following examples are illustrations from a lung cancer study that was
performed by LC-MS protein profiling in human blood samples. Two patient groups,
the CASE and the CONTROL cancer group, were analysed for differential protein
expression.
EXAMPLE 1
Experimental details
From each patient approximately 6 ml of blood was taken into a sampling tube
containing heparin sodium salt and mixed by inverting the tube 2-3 times. It was
then subjected to centrifugation at 2,000 x g for 10 min at 4 °C. Three ml of
plasma was obtained from the supernatant. The sample was stored frozen at
-80 °C. Next, the proteins were extracted from plasma and were subjected to
tryptic digestion after depleting abundant human plasma albumin and IgG.
Aliquots of the fractionated plasma sample were then analyzed by LC-MS, and
sequenced by MS/MS, as previously described [WO06100446].
The MSBA Plot
The Multiplex biomarlcer suinmary plot presents the inultiplex expression data
of the
patient biomarkers within the Lung cancer study.
The 10-inultiplex biomarker diagnostic read-out generated from the MSBA
methodology
illustrates each and every biomarker separately (see Figure 3). The
quantitative

difference, i.e. the fold-change difference that is already known and stored
within the MSBA database (see Figure 1), is used together with the qualitative
differences to assay the biomarkers. Biomarker data generated from the lung
cancer study correctly identify all of these patients as positive from the
diagnostic multiplex MSBA read-out.
The scores of all 10 of these patients were found to be in the range of 0.80-1.0.
Figure 4 shows an example of biomarker annotation made from the multiplex assay,
presented by the MS spectrum where the biomarker was recognised by the MSBA
software, and the follow-up MS/MS spectrum (see Figure 4) that represents the
resulting CVLFPYGGCQGNGNK biomarker. The MSBA matching, which compares the signal
with the reference biomarker spectra in the MSBA database using cosine
correlation, shows the cosine correlation factor to be equal to or higher than
0.8.
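The cosine correlation used for spectrum matching can be sketched as follows. This is a minimal illustration, not the MSBA software itself: it assumes both spectra have already been binned onto the same m/z grid, and the function names and intensity values are hypothetical. Only the 0.8 cut-off is taken from the text.

```python
import math


def cosine_correlation(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    if len(a) != len(b):
        raise ValueError("spectra must be binned onto the same m/z grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def matches_reference(observed, reference, threshold=0.8):
    """True when the observed spectrum is similar enough to the reference."""
    return cosine_correlation(observed, reference) >= threshold


# Hypothetical binned intensities (illustrative values only).
reference = [0.0, 120.0, 30.0, 0.0, 450.0, 60.0]
observed = [5.0, 110.0, 25.0, 0.0, 470.0, 40.0]
print(matches_reference(observed, reference))
```

Because the cosine measure depends only on the relative shape of the two spectra, it is insensitive to an overall scaling of the signal intensity.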
Table 1 presents the details of the MSBA-data generation, where pre-defined
masses of
the regulated biomarkers are analyzed.

Table 1
[Table contents are not legible in the source text.]

EXAMPLE 2
Another example, described below, is derived from a lung disease CASE-CONTROL
study that was performed by LC-MS protein profiling in human blood samples. Two
patient groups, the CASE and the CONTROL lung disease groups, were analyzed by
differential protein expression analysis.
Experimental details
The procedures for plasma sample collection and preparation were the same as in
the example described above.
MSBA model building and evaluation
Data from 46 patient samples, consisting of 10 cases and 36 controls, were used
to construct MSBA models. We constructed 5 different MSBA models containing 14,
8, 26, 8, and 11 signals, respectively. After combining the 5 models, it was
revealed that the final (5th) MSBA model had the dominant discrimination ability.
Thus we used only the 5th model in the following step. The LC-MS information
(retention time (min) / MS-value (m/z)) of the 11 signals of the final model was
as follows: 11.6/485.2, 12.5/608.1, 18.3/547.0, 20.2/681.3, 21.1/575.1,
21.3/531.5, 25.5/561.6, 23.1/514.5, 32.5/682.2, 44.0/985.2, 48.7/945.8.
In order to evaluate the predictability of the model, we tried to predict CASE /
CONTROL using 10 cases and 9 controls as if they were blinded samples. According
to the MSBA diagnosis procedure described above (also illustrated in Figure 2),
the 11-multiplex biomarker signals were identified for each test sample, and each
peak signal was quantified. From the quantities of all 11 multiplex signals, the
MSBA score for each subject was calculated using the formula described above
(Eq. 5). In this example, subjects whose MSBA score was equal to or greater than
1 were diagnosed as cases (Table 2, MSBA diagnosis). Consequently, all the
samples were predicted correctly, i.e. the discrimination ability was 100% (see
Figure 5).
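The thresholding step can be checked against the scores listed in Table 2. The sketch below is only an illustration: it takes the MSBA scores as given rather than recomputing them from Eq. 5, and the function and variable names are hypothetical.

```python
CASE_THRESHOLD = 1.0  # scores equal to or greater than 1 are diagnosed as cases

# (sample, MSBA score, clinical diagnosis) as listed in Table 2.
TABLE_2 = [
    ("SBJ201", 0.542, "control"), ("SBJ202", 1.586, "case"),
    ("SBJ203", 3.688, "case"), ("SBJ204", 1.964, "case"),
    ("SBJ205", 3.339, "case"), ("SBJ206", 0.682, "control"),
    ("SBJ207", 4.430, "case"), ("SBJ208", 0.241, "control"),
    ("SBJ209", 1.947, "case"), ("SBJ210", 3.456, "case"),
    ("SBJ211", 0.483, "control"), ("SBJ212", 2.835, "case"),
    ("SBJ213", 0.268, "control"), ("SBJ214", 0.522, "control"),
    ("SBJ215", 0.332, "control"), ("SBJ216", 1.091, "case"),
    ("SBJ217", 0.840, "control"), ("SBJ218", 0.269, "control"),
    ("SBJ219", 1.691, "case"),
]


def msba_diagnosis(score, threshold=CASE_THRESHOLD):
    """Apply the score threshold to obtain the MSBA diagnosis."""
    return "case" if score >= threshold else "control"


def agreement(rows):
    """Fraction of samples whose MSBA diagnosis equals the clinical one."""
    hits = sum(1 for _, score, clinical in rows
               if msba_diagnosis(score) == clinical)
    return hits / len(rows)


print(f"agreement: {agreement(TABLE_2):.0%}")
```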

EXAMPLE 3
The following example was also derived from the lung disease CASE-CONTROL study
performed by LC-MS protein profiling in human blood samples with two patient
groups (CASE and CONTROL). In this example, another set of multiplex biomarkers
was used to construct an MSBA model, with a different patient dataset of much
larger size.
Experimental details
The procedures for plasma sample collection and preparation were the same as in
the example described above.
MSBA model building and evaluation
Data from 96 patient samples, consisting of 21 cases and 75 controls, were used
as the training dataset. As the number of samples increased, sample variability
also increased. Thus we first applied the Smirnov test to remove outlier signals.
Consequently, 5 samples that contained many outliers were also removed from the
analysis set. Using the result of the t-test, we constructed an initial MSBA
model containing 100 candidate biomarker signals. Then, by recursively applying
discriminant analysis to remove the signal contributing least to the
discrimination model, we finally obtained an MSBA model with 10 signals. See
Table 3 for the listing of the 10 multiplex marker signals.
In this example, to calculate the discriminant score (z), we used another scoring
function, given by the following equation.
z = \sum_{i} a_i x_i + C    (Eq. 6)
(x_i: signal intensity; a_i and C: coefficients described in Table 3)
If the score value is positive, it is interpreted as CASE, otherwise CONTROL.
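As an illustration, the linear scoring function of Eq. 6 can be written out with the coefficients listed in Table 3. This is a sketch under stated assumptions: the function names are hypothetical, the input intensities are made-up values (the source does not state their scale or normalisation), and the positive-score rule follows the sentence above.

```python
# Coefficients a_1..a_10 and constant C as listed in Table 3.
COEFFICIENTS = [
    13.00464, 17.09257, 2.55337, 3.74875, -32.15044,
    -18.14263, 2.41990, -29.15126, -17.91525, 5.34939,
]
CONSTANT = -5.92638


def discriminant_score(intensities):
    """z = sum_i a_i * x_i + C for the 10 multiplex signals."""
    if len(intensities) != len(COEFFICIENTS):
        raise ValueError("expected one intensity per Table 3 signal")
    return sum(a * x for a, x in zip(COEFFICIENTS, intensities)) + CONSTANT


def interpret(z):
    """A positive score is read as CASE, otherwise CONTROL."""
    return "CASE" if z > 0 else "CONTROL"


# Hypothetical intensities for one sample (illustrative values only).
sample = [0.4, 0.3, 0.1, 0.2, 0.05, 0.1, 0.2, 0.05, 0.1, 0.3]
z = discriminant_score(sample)
print(round(z, 4), interpret(z))
```

With all intensities at zero the score reduces to the constant C, so a sample with no detected signal falls on the CONTROL side.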
In order to evaluate the predictability of the model, we tried to predict CASE /
CONTROL using the same dataset (21 cases + 75 controls, 96 in total). According
to the MSBA diagnosis procedure described above (also illustrated in Figure 2),
the 10-multiplex biomarker signals were identified for each sample, and each peak
signal was quantified. From the quantities of all 10 multiplex signals, the
MSBA score for each subject was calculated using the formula described above
(Eq. 6). In this example, subjects whose MSBA score was equal to or greater than
0 were diagnosed as cases. Table 4 and Figure 6 show the auto-discrimination
results. In Figure 6, each dot represents a patient, and the vertical axis
represents the discriminant score (z). In this example, sensitivity was 85.7%,
specificity was 82.7%, and prediction accuracy was 83.3%.
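The reported figures can be reproduced from a confusion matrix. The counts in the sketch below are tallied from the MSBA and clinical diagnosis columns of Table 4 (18 of the 21 cases and 62 of the 75 controls called correctly). The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # cases correctly called case
    specificity = tn / (tn + fp)   # controls correctly called control
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts tallied from the diagnosis columns of Table 4.
sens, spec, acc = diagnostic_metrics(tp=18, fn=3, tn=62, fp=13)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, accuracy {acc:.1%}")
```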

Table 2
Sample#   MSBA score   MSBA diagnosis   Clinical diagnosis
SBJ201    0.542        control          control
SBJ202    1.586        case             case
SBJ203    3.688        case             case
SBJ204    1.964        case             case
SBJ205    3.339        case             case
SBJ206    0.682        control          control
SBJ207    4.430        case             case
SBJ208    0.241        control          control
SBJ209    1.947        case             case
SBJ210    3.456        case             case
SBJ211    0.483        control          control
SBJ212    2.835        case             case
SBJ213    0.268        control          control
SBJ214    0.522        control          control
SBJ215    0.332        control          control
SBJ216    1.091        case             case
SBJ217    0.840        control          control
SBJ218    0.269        control          control
SBJ219    1.691        case             case

Table 3
i    RT (min)   m/z     a_i
1    36.44      758.5   13.00464
2    37.04      509.2   17.09257
3    37.81      611.2   2.55337
4    39.68      652.3   3.74875
5    39.84      671.2   -32.15044
6    39.97      523.1   -18.14263
7    50.35      974.3   2.41990
8    52.94      758.7   -29.15126
9    59.27      800.8   -17.91525
10   59.41      786.7   5.34939
C = -5.92638

Table 4
Sample#    z-score  MSBA       Clinical     Sample#    z-score  MSBA       Clinical
                    diagnosis  diagnosis                        diagnosis  diagnosis
SBJ301     -0.6753  control    case         SBJ349      9.3784  case       case
SBJ302    -11.1851  control    control      SBJ350      9.4884  case       case
SBJ303     -6.5025  control    control      SBJ351     -8.1428  control    control
SBJ304     -5.9519  control    control      SBJ352     -9.2091  control    control
SBJ305     -6.0320  control    control      SBJ353     -4.8211  control    control
SBJ306     -2.6594  control    control      SBJ354     22.5948  control    control
SBJ307      4.9968  case       control      SBJ355     -8.7626  control    control
SBJ308     -9.1398  control    control      SBJ356     10.5248  control    control
SBJ309     19.1731  case       control      SBJ357      6.5157  case       case
SBJ310     -7.1342  control    control      SBJ358     -4.0407  control    control
SBJ311     -8.4585  control    control      SBJ359     25.0186  case       case
SBJ312     -6.7038  control    control      SBJ360      4.9111  case       control
SBJ313     -3.6724  control    control      SBJ361     -6.5589  control    control
SBJ314     -5.8528  control    control      SBJ362     -3.6577  control    control
SBJ315      8.8020  case       control      SBJ363     11.7341  case       control
SBJ316     -2.6856  control    control      SBJ364      1.7725  case       control
SBJ317     -7.5656  control    control      SBJ365     -8.0470  control    control
SBJ318     -1.9188  control    control      SBJ366      0.0504  case       control
SBJ319      9.9846  case       case         SBJ367      2.5321  case       case
SBJ320     -4.2516  control    control      SBJ368     38.8935  case       control
SBJ321      6.9731  case       control      SBJ369      6.2574  case       case
SBJ322     -6.2063  control    control      SBJ370     -1.2407  control    control
SBJ323     10.9300  case       control      SBJ371     -7.9560  control    control
SBJ324     -5.6956  control    control      SBJ372     -8.4344  control    control
SBJ325      2.3495  case       case         SBJ373     -8.2976  control    control
SBJ326      3.4061  case       control      SBJ374     10.4185  control    control
SBJ327     -3.8970  control    control      SBJ375     -6.5147  control    control
SBJ328     -0.7222  control    control      SBJ376      2.6958  case       case
SBJ329      2.3875  case       case         SBJ377     -5.1208  control    control
SBJ330     -1.6660  control    control      SBJ378      4.0404  case       case
SBJ331     -1.2825  control    control      SBJ379      3.0357  case       case
SBJ332     -8.4760  control    control      SBJ380     10.3597  case       case
SBJ333     -3.8589  control    control      SBJ381     -3.2779  control    control
SBJ334     -3.9578  control    control      SBJ382     -6.5229  control    control
SBJ335     -2.6746  control    case         SBJ383     14.4020  case       control
SBJ336     -5.2540  control    control      SBJ384     -6.7897  control    control
SBJ337      2.9959  case       case         SBJ385     24.2026  control    case
SBJ338     -5.6114  control    control      SBJ386     -8.7277  control    control
SBJ339     -2.9495  control    control      SBJ387      6.4477  case       case
SBJ340     -4.8496  control    control      SBJ388     -2.8463  control    control
SBJ341     -9.7494  control    control      SBJ389     -8.2789  control    control
SBJ342     -8.3938  control    control      SBJ390     -3.9892  control    control
SBJ343     -6.4816  control    control      SBJ391     -9.9261  control    control
SBJ344      5.4095  case       case         SBJ392      3.9925  case       case
SBJ345      7.5053  case       case         SBJ393     -4.5450  control    control
SBJ346     -7.3492  control    control      SBJ394     -0.3773  control    control
SBJ347    -11.2632  control    control      SBJ395     -4.7587  control    control
SBJ348    124.7669  case       control      SBJ396     -3.9212  control    control

Administrative Status

Event History

Description	Date
Application Not Reinstated by Deadline	2011-06-13
Time Limit for Reversal Expired	2011-06-13
Deemed Abandoned - Failure to Respond to Maintenance Fee Notice	2010-06-14
Inactive: Applicant deleted	2010-01-14
Inactive: Cover page published	2009-03-19
Correct Applicant Requirements Determined Compliant	2009-03-16
Inactive: Notice - National entry - No RFE	2009-03-16
Inactive: First IPC assigned	2009-02-26
Application Received - PCT	2009-02-25
Inactive: Correspondence - PCT	2009-02-19
Inactive: Sequence listing - Amendment	2009-02-11
Inactive: Declaration of entitlement - PCT	2009-02-09
National Entry Requirements Determined Compliant	2008-11-12
Application Published (Open to Public Inspection)	2007-12-21

Abandonment History

Abandonment Date	Reason	Reinstatement Date
2010-06-14

Maintenance Fee

The last payment was received on 2009-05-19

Fee History

Fee Type	Anniversary Year	Due Date	Paid Date
Basic national fee - standard			2008-11-12
MF (application, 2nd anniv.) - standard	02	2009-06-15	2009-05-19

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
ASTRAZENECA UK LIMITED
MEDICAL PROTEOSCOPE CO LTD

Past Owners on Record
ATSUSHI OGIWARA
FREDRIK NYBERG
GYOERGY MARKO-VARGA
HISASE ANYOJI
MITSUHIRO KANAZAWA
TAKAO KAWAKAMI
TAKESHI KAWAMURA
TOSHIHIDE NISHIMURA
YUTAKA KYONO

Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.

Documents

Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
Description	2009-02-11	3	52
Description	2008-11-12	38	1,801
Claims	2008-11-12	3	103
Abstract	2008-11-12	1	85
Drawings	2008-11-12	6	138
Cover Page	2009-03-19	2	49
Description	2009-02-11	40	1,835
Reminder of maintenance fee due	2009-03-16	1	111
Notice of National Entry	2009-03-16	1	193
Courtesy - Abandonment Letter (Maintenance Fee)	2010-08-09	1	172
PCT	2008-11-12	5	158
Correspondence	2009-02-09	2	74
Correspondence	2009-02-19	1	42

Biological Sequence Listings

BSL Files

File Name	Received On	Size (bytes)
A651934.TXT	2009-02-11	543
A651934.PEP	2009-02-11	655
