Patent Application No. 2,694,828 by Gerald Buckler
of 1205 Dorchester Avenue
Ottawa, Ontario
K1Z 8E3
Title of Invention:
Prototype of a computerized system to automatically acquire, statistically
test and supply the
parameters needed (standard deviation and percent recovery--obtained by
running control
samples) to compute 95% confidence intervals for chemical (and other)
measurements contained
in an organization's main database (or locally) and ultimately to unbias
those measurements and
their confidence intervals.
Table of Contents:
Operational Description and Theory of Invention, page 1 to page 25.
Special Note to Patent Office: This file is to be included with Description-
file1.pdf,
Description-file3.pdf and Description-file4.pdf as previously submitted for
this application. This
particular file (Description-file2-corrected.pdf) contains a number of
corrections and is being
submitted to replace the file (Description-file2.pdf) as previously submitted.
Operational Description and Theory of Invention:
Some Basic Premises:
1) Quantitative chemical analysis is done according to high technological
standards. The
analytical methods are documented and often published. The analytical
chemistry methods are
given different identification numbers and are always followed to the letter
for every analytical
run. Chemists and Chemical Technicians know how to carry out their trade. They
know how to
do exact weighings on five decimal place high precision balances. They know
how to
quantitatively transfer substances in solution from one flask to another
without losing any. They
know how to prepare solutions to exact volumes and exact concentrations. They
know the theory
of matter, basic chemistry and physics. There is absolutely no reason
whatsoever for one
chemical analyst to get a different percent recovery or standard deviation for
the same material
sample under identical conditions. Great care is taken by laboratory staff to
ensure that specific
analytical methods can be repeated over and over again in an identical manner.
However,
notwithstanding all of this, there will be random variation occurring in the
various stages of the
chemical processing and in some of those stages, small losses will also occur
which leads to one
obtaining something less than 100% recovery. However, the better methods have a greater number of stages in them to take care of all potential interferences. This leads to a slightly less than desirable percent recovery at times, but this can be offset by allowing the chemical measurements to be unbiased by the DBMS in the main database. Under the proposed computerized system, this latter facility would be transparent to all laboratory staff. This has all been said to justify the
making of the first premise: The within-run variances of the premeasurements
and
submeasurements of a particular analytical method in a particular laboratory
can be considered
to be more or less constant over the several ongoing analytical runs that are
routinely being
made in the laboratory even though different laboratory analysts are
performing the analyses.
2) The second premise that needs to be made is that: All random variation that
is present in
analytical chemistry measurements comes from the various stages of the
chemical processing
that occurs when performing the analyses. The specific analytical chemistry
method as it is
being done in a particular laboratory is a specific stochastic process
generator. Therefore only
the particular stochastic characteristics of the particular chemistry method
need be determined
in order to obtain the standard deviations for all the measurements to be
generated by the
analytical method. This obviates the need to be continually determining
confidence intervals
from chemical measurement data.
3) A third premise that can be made is that: While obtaining the particular
stochastic
characteristics of a particular analytical method in a particular laboratory
at a specific
measurement level, the percent recovery can also often be concurrently
determined from the
same control sample data for that measurement level.
4) A fourth premise: The nature of the stochastic variation that occurs in
each of the various
stages of the chemical processing is well known and understood by professional
chemists. If a
modification to the method ever needs to be done, the determined stochastic
characteristics can
often be modified by careful thought, chemical process stage testing, such as
of a new model of
chemical instrumentation, and a minimum of re-running of control samples.
5) A fifth premise is that: Stochastic variation is inherited from one stage
of the chemical
processing to another in the analytical method in such stochastic manner so as
to be
effects-additive. Even the tolerances of standard laboratory labware such as
volumetric flasks
and pipettes are inherited in this stochastic manner for both regular samples
and calibration
standards. This means that the particular stochastic characteristics of a
particular analytical
chemistry method in a particular laboratory can be determined by running the
appropriate
control samples at specific measurement levels. As to be explained more fully
later on, all of this
control sampling can be routinely done at a leisurely pace over several
analytical runs if only
the collection of control sample data can be given its justifiable priority
and initiated promptly
by management officials. Often, standard deviations can be obtained from
database records of
regular sample duplicates.
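For illustration only, the effects-additive premise can be expressed in a few lines of code. The sketch below (in Python, which forms no part of the invention; the stage values shown are hypothetical) adds the stage variances and takes the square root:

    import math

    # Hypothetical within-run standard deviations, in PPM at M-level, for
    # the stages of one analytical method: weighing, extraction, dilution
    # and the instrument reading.
    stage_sds = [0.02, 0.15, 0.05, 0.10]

    # Effects-additive inheritance: the stage variances add, so the overall
    # standard deviation is the square root of the summed variances.
    overall_sd = math.sqrt(sum(sd ** 2 for sd in stage_sds))
    print(f"overall SD = {overall_sd:.3f} PPM")  # about 0.188 PPM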
6) A sixth premise is that: It can be shown that systematic error in chemical
measurements
cannot properly exist beyond the level at which the measurements are properly
unbiased. The
levels of concern are: (1) within analytical runs, (2) between analytical runs
in the same
laboratory, and (3) between-laboratory biases. For reasons that are quite self-evident in light of what has been presented in this paper, the best level at which to unbias is (2), the laboratory level. In other words, the measurements from the individual
analytical methods in
each particular laboratory would be unbiased in such manner as this
computerized system is
capable of doing, as has been explained, and if possible, this would be done
by the DBMS in the
main database, using the proper parameters that are supplied to it, but it can
also be done within
the laboratory, if necessary.
7) In paragraph (6), it was noted that the unbiasing of the chemical
measurements is best done
within the main database, but that it could be done within the laboratory, if
necessary. A
particular example where this might find application could be that of a
typical government
research scientist. It is well known that research scientists almost invariably adopt the strategy of choosing the particular analytical methods they need at the beginning of their careers and then keeping them for the duration of their research tenure. This is done to overcome
the problem of bias
between analytical methods, but as is evident from this paper, there could be
systematic error
between analytical runs. In addition, they are often driven to produce reams
of analytical data in
order to obtain sufficiently high statistical sample sizes for statistical
testing purposes and for
comparison to the data of other scientists. Often, it is desired to obtain a high-degrees-of-freedom
t-distribution confidence interval for publication. It can be shown that a t-
distribution
confidence interval is valid for significance testing but is useless and
deceiving as a descriptive
statistic. The proposed computerized system solves all these problems by
determining the high
degrees of freedom standard deviations needed to obtain the proper confidence
intervals from
the very beginning of the research project, from the analytical method itself,
rather than from
the reams of data produced by it for each new data set, and the research
scientists can now
compare unbiased data and confidence intervals with each other, resulting in
huge savings in
time and money. This is the seventh premise, the research scientist
functioning as the
administrator of the computerized system.
Programming the DPSP:
Variances are never entered as predefined program variables into the DPSP,
only their standard
deviation counterparts (this helps to control the number of decimal places
needed). With the
exception of the standard deviation of the slope, which is entered in terms of
(AU, XAU, or
AREA) per PPM at Q1-level, all standard deviations must be entered into the
DPSP in terms of
PPM at Q2-level. This subsection and the next one deal with how to estimate
the sample
standard deviations of the premeasurement and submeasurement random variables
that are
inherent in almost every analytical chemistry method that is out there. First
of all, it should be
documented that the author is recommending that a minimum of 15 degrees of
freedom be
established as a minimum industry standard for these standard deviations
before they can be
thought of as being a substitute for their population parameter counterparts
for routine
applications and reports. It can be shown that, at a 95% confidence coefficient, the sample standard deviation at 15 degrees of freedom will be about 55% too high
2.5% of the time
and 26% too low 2.5% of the time. What this translates into is that a 95%
confidence interval for
the mean of measurements calculated as plus or minus 2.0 sample standard
deviations at
15 degrees of freedom will produce an actual confidence coefficient between
95% and 97.4%
about 68% of the time and between 85.4% and 95% about 32% of the time. But this
should be
acceptable for routine applications and reports. It can be shown, using the
theory of multiple
tests, that a 95% minimum confidence coefficient confidence interval (MCCCI)
for the population
mean of measurements should be calculated as plus or minus 3.08 sample
standard deviations at
15 degrees of freedom. This includes unbiasing of the sample standard
deviation [7]. Such a
confidence interval will be at 95% confidence coefficient, or above, all of
the time and would
therefore be more suitable for legal purposes such as court proceedings. Some
chemical analysts
may want to make do with some lesser number of degrees of freedom, say a
minimum of ten,
where, as in gas chromatography, it can take up to an hour to get a single
reading on the gas
chromatograph. In this case, it would be presumed that an exception could be
made. But the
reliability of the 95% confidence intervals calculated as plus or minus 2.0
sample standard
deviations at 10 degrees of freedom will be much less. However, the multiplier
for the sample
standard deviation could be increased as an expedient measure.
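The figures just quoted can be checked against the chi-square distribution: the 95% confidence limits for the population standard deviation, given a sample standard deviation s at 15 degrees of freedom, run from about 26% below s to about 55% above s. A minimal sketch of that computation follows (assuming Python with SciPy, used here for illustration only):

    from scipy.stats import chi2

    df = 15  # the recommended minimum degrees of freedom

    # 95% confidence limits for sigma, expressed as multiples of the
    # sample standard deviation s, from (df * s^2 / sigma^2) ~ chi2(df).
    lower = (df / chi2.ppf(0.975, df)) ** 0.5  # about 0.74: 26% below s
    upper = (df / chi2.ppf(0.025, df)) ** 0.5  # about 1.55: 55% above s

    print(f"95% limits for sigma: [{lower:.3f} s, {upper:.3f} s]")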
Note that these standard deviations with this many degrees of freedom do not
need to be
determined in a single analytical run. They can be obtained at a leisurely
pace by running the
appropriate control samples as time and circumstances permit and the results
entered into the
appropriate PAF's. After a period of some weeks, months, or even in some
cases, a couple of
years, the estimates for these standard deviations, at the minimum standard of
15 degrees of
freedom per measurement level, will be achieved. But the sooner one starts
collecting the data,
the better. Any authoritative reference on industrial quality control will
specify that such control
sampling must take place for some required period of time before legitimate
quality control
charting can begin. It is the same principle. In the meantime, before the
required minimum
standard is achieved, the regular measurements that are routinely being
generated in the
laboratory can be entered into the DPSP for the particular analytical
chemistry method and from
there eventually will be entered into the main database. From time to time,
the DBMS of the
main database will check the DPSP for each particular analytical chemistry
method in each
particular laboratory to see if the required minimum standard, standard
deviations and percent
recoveries, have been entered into the temporary database of the DPSP
alongside the
identification numbers for the respective samples. When this happens, the
required standard
deviations and percent recoveries for the particular samples will be uploaded
and entered into the
main database. Of course, all of this, or any part of it, can be done manually
with now
commonplace computer spreadsheet technology. The minimum standard for the
percent
recovery is four recovery constants (RC) or four recovery samples (RS) per
measurement level
(one RC or one RS per measurement level per run) to be obtained over four
analytical runs for
each of the required measurement levels. If a recovery sample or recovery
constant cannot be
run, the developer of the analytical method will supply the estimate. The
percent recovery is
entered into the DPSP as a percentage (this is the most straightforward and
intuitive way) for
uploading into the main database where it is then converted by the DBMS to its
decimal
equivalent.
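The periodic check just described can be sketched as follows (the record and field names here are hypothetical, for illustration only; any DBMS or spreadsheet arrangement holding the same fields would serve):

    def upload_ready_samples(dpsp_temp_db, main_db):
        """Upload samples whose minimum-standard parameters are complete."""
        for record in dpsp_temp_db:
            # A sample is ready only once the required standard deviation
            # and percent recovery sit alongside its identification number.
            if record.get("std_dev") is None or record.get("pct_recovery") is None:
                continue
            main_db[record["sample_id"]] = {
                "measurement": record["measurement"],
                "std_dev": record["std_dev"],
                # The percent recovery is entered as a percentage and is
                # converted by the DBMS to its decimal equivalent.
                "recovery": record["pct_recovery"] / 100.0,
            }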
It should be noted that the primary standards for very new and exotic
chemicals are often far
from being ideally pure. If a recovery sample is run using a primary standard
chemical of, say,
80% theoretical purity, and the same primary standard chemical is used to make
up the
calibration standards, and if, in neither case, the actual purity is being taken into account to
determine the theoretical weights of primary standard chemical required to
make up the recovery
sample and standards, then the percent recovery obtained is only for the
chemical processing
stages of the analytical method and not the whole method. The recovery run
could turn out to be
100% in this hypothetical case (it is as though the primary standard chemical were being considered to be 100% pure). If, indeed, this were the case, then the measurements being produced by this method would be, consistently, 25% too high (since 1/0.80 = 1.25) throughout the whole measurement spectrum (method bias). It is common practice in many laboratories to do a recovery run in just
this way. That is
why it is absolutely stipulated for the purposes of this system that the
actual lot analysis or purity
of the primary standard chemical to three significant figures always be taken
into account in
determining the theoretical weights used for the recovery sample run. Then the actual percent recovery for this hypothetical method will turn out to be 125%, when the
actual lot analysis or
purity of the primary standard chemical is taken into account in determining
the theoretical
weights used to make up the recovery sample but not the instrument calibration
standards. This
common practice with the instrument calibration standards does not matter to
this computerized
system. But when the system is implemented, a decision must be made, whether
or not to
continue not taking into account the actual lot analysis or purity of the
primary standard
chemical in determining the theoretical weights used to make up the instrument
calibration
standards for all future analytical runs of the analytical chemistry method in
the particular
laboratory.
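The purity bookkeeping in this hypothetical case reduces to simple ratios, as the following sketch shows (Python, for illustration only; values taken from the 80%-purity example above):

    purity = 0.80  # actual lot analysis of the primary standard chemical

    # Calibration standards made up ignoring purity: a standard labelled
    # c PPM actually contains only purity * c PPM of analyte, so any true
    # concentration is reported too high by a factor of 1 / purity.
    reporting_factor = 1.0 / purity              # 1.25

    # Recovery sample made up taking purity into account, read against the
    # impure standards: the observed recovery is the reporting factor.
    percent_recovery = reporting_factor * 100.0  # 125%

    # Recovery sample also made up ignoring purity: the two errors cancel
    # and the run reflects only the chemical processing stages.
    processing_only = (purity / purity) * 100.0  # 100%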
Generally speaking, though not always, the required minimum standard, standard
deviations and
percent recoveries, need to be determined at three different specific
measurement levels, low,
medium and high, before being entered into the DPSP for each particular
analytical chemistry
method being used in the laboratory. The DPSP has been programmed to adjust,
usually by
some form of interpolation or extrapolation to be explained later, the required
minimum standard,
standard deviations and percent recoveries, determined at low, medium and high
measurement
levels, that have been entered into it as predefined program variables, so
that they can be applied
to the routine overall measurements at M-level of the material samples at
their various
measurement levels. The DPSP has been programmed to further adjust the
required minimum
standard, standard deviations for application to the routine overall
measurements at M-level of
the material samples being analyzed according to the following data that is to
be input into the
data entry screen of the DPSP by the chemical analyst doing the particular
analytical run:
1) the deviation of the material sample weight or volume of the sample or
subsample replicates
being analyzed from the standard nominal value required by the analytical
chemistry method in
the particular laboratory. A simple ratio, called an 'f' factor, is calculated
by the chemical
analyst and entered into the data entry screen of the DPSP. The 'f' factor is
calculated as:
f = (nominal standard sample weight or volume) / ((actual or nominal) non-standard sample weight or volume)
2) the number of material sample subsample replicates being processed in the
particular
analytical run that is being entered into the data entry screen of the DPSP.
3) the number of reagent blanks being processed in the particular analytical
run that is being
entered into the data entry screen of the DPSP. This includes the number
"zero" if there are no
reagent blanks being processed. Alternatively, a different version of the
program will not have a
data entry column for this or it will be hidden.
4) the number of runs being made on the instrument calibration standards for
the particular
block of samples and/or subsample replicates that is going to be applied to
them (by averaging
the slopes, if necessary) that is being entered into the data entry screen of
the DPSP. The
possibilities are: one slope or two slopes (being averaged), if calibration
standards are being run.
More than one run is sometimes made on the instrument calibration standards if
there are any
sensitivity changes occurring in the instrument during the course of reading
all the sample or
subsample extracts on the instrument.
5) any front-end or back-end dilutions or concentrations that are required for
any individual
samples or subsample replicates that are over and above all of those that are
specified in the
documented analytical chemistry method (that is, superimposed) for all samples
or subsample
replicates that are being entered into the data entry screen of the DPSP.
6) the number of replicate instrument readings, that are being made on each
individual sample
extract and/or on each replicate subsample extract.
The computer data entry screen contains the following columns:
Column 1: the current date.
Column 2: the lab-method identifier. This identifies the particular analytical
chemistry method
being done in the particular analytical laboratory.
Column 3: the unique sample identifier.
Column 4: the single or average (if more than one subsample replicate was
done) original
measurement for the sample.
Column 5: the number of subsample replicates done on the sample, for the
analytical run.
Column 6: the front-end overall superimposed (that is, over and above any
dilutions/
concentrations specifically indicated to be done in the analytical method
during the regular
chemical processing) dilution/concentration factor for the sample.
Column 7: the back-end overall superimposed (that is, over and above any
dilutions/
concentrations specifically indicated to be done in the analytical method
during the regular
chemical processing) dilution/concentration factor for the sample.
Column 8: the "f" factor for the sample, as explained above.
Column 9: the number of reagent blanks that were run for the block of samples
or subsample
replicates in the analytical run. This value can be "zero," if no reagent
blanks have been included
in the current analytical run.
Column 10: the number of calibration slopes (zero, one or two) that were run
for the block of
samples or subsample replicates in the analytical run. This value can be
"zero," if no calibration
standards are being used in the particular analytical chemistry method. Note
that in a titrimetric
analytical method, the titer [4] is equivalent to the value of the slope but
it usually has no
significant variance, so a "zero" should be entered into column (10) or else
the standard deviation
of the titer would have to be determined and entered into the DPSP and a "1"
entered into
column (10).
Column 11: the number of replicate standard instrument readings that were made
on each
sample or subsample replicate being run. Note that all replicate instrument
readings must consist
of one or more (all to be averaged along with the original reading) standard
readings which may
already consist of one or more (averaged) standard sub-readings such as occurs
with standards
additions "at the instrument" or as an expedient (when the sub-readings are
averaged) to help
normalize the output of the instrument while reducing the variation thereof.
If multiple
(averaged) instrument sub-readings are a part of standard processing
conditions (that is, they are
done on each regular or control sample extract, each replicate subsample
extract, and each
calibration standard), then these same multiple sub-readings must be done when
determining the
various standard deviations on all of the various PAF-forms, including the
standard deviation of
the instrument as it is being determined on the STAN-DUP, CAL-DUP or CAL-DATA
forms. In
the latter case though, the standard deviation of the instrument could
alternatively be determined
as the parent random variable of the instrument (that is, considering each
individual
non-composite reading to be a single outcome from the instrument) and then the
variance thereof
(obtained from multiple consecutive individual non-composite instrument
readings using a single
sample extract or standard solution) can be adjusted so as to comply with the
number of multiple
sub-readings which are standard. Only the respective standard deviation
determined from that
variance so adjusted can be entered as an alternative predefined program
variable into the DPSP
once it is converted to PPM at M-level by multiplying by the standard "c"
factor for the specific
analytical chemistry method. On the other hand, the standard deviation of "y" given "x" (also the
given "x" (also the
standard deviation of the instrument response variable) determined from each
run on the
calibration standards, would normally be calculated from the standard number
of instrument
sub-readings already having been made on each calibration standard so that it
would not
normally need to be adjusted before entering it as a predefined program
variable into the DPSP, it
having been converted to PPM at M-level by multiplying by the standard "c"
factor for the
specific analytical chemistry method.
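For orientation, the eleven data entry columns map naturally onto a single record per sample or group of subsample replicates. The sketch below (Python; the field names are illustrative only and form no part of the specification) also shows the pre-loaded defaults mentioned further on:

    from dataclasses import dataclass

    @dataclass
    class DataEntryRow:
        """One row of the DPSP data entry screen (columns 1 to 11)."""
        date: str                       # col 1: current date
        lab_method_id: str              # col 2: lab-method identifier
        sample_id: str                  # col 3: unique sample identifier
        measurement: float              # col 4: single or average measurement
        n_subsamples: int = 1           # col 5: subsample replicates ("n")
        front_end_factor: float = 1.00  # col 6: superimposed dil./conc. factor
        back_end_factor: float = 1.00   # col 7: superimposed dil./conc. factor
        f_factor: float = 1.00          # col 8: "f" factor, as explained above
        n_reagent_blanks: int = 0       # col 9: reagent blanks in the run
        n_slopes: int = 0               # col 10: calibration slopes (0, 1 or 2)
        n_readings: int = 1             # col 11: replicate instrument readings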
All of the above adjustments to the required minimum standard, standard
deviations and percent
recoveries, for the particular DPSP that are to accompany the overall
measurements at M-level as
they are being generated by the particular analytical chemistry method as it
is being done in a
particular laboratory and entered into the main database, are pretty
straightforward to program
into the DPSP although a lot of definitions had to be formulated in order to
control the data entry
process on behalf of chemical analysts performing the analyses. Insofar as the
"calculations
formula" of the particular analytical chemistry method is concerned, it is
almost invariably made
up of "statistical constants," including the required standard nominal sample
weight or volume
in the denominator thereof, so that the entire formula is almost invariably
reducible to a single
standard "c" factor. What is meant by "standard" here is that analytical
chemistry methods
generally call for a specific "nominal" sample weight or volume to be measured
out for each
sample or subsample replicate to be run. For example, this could be 10.0 grams
of material
sample homogenate. By "nominal" is meant that the chemical analyst could, for
example, weigh
out 9.88, 9.93, 10.03 and 10.11 grams for a group of four subsample
replicates. In this case, the "f" factor, as explained above, would be equal to 1.00. The "f" factor column
is therefore
pre-loaded with "1.00's" for every row for the convenience of the chemical
analyst in reducing
data entry time and to help reduce data entry errors. But if only
approximately 3.00 grams were
available for analyses, most likely the actual sample weight, say 3.08 grams,
would be used to
calculate the "f" factor. But if approximately 3.00 grams of an SRM material
were to be run as a
control sample for every analytical run, on an ongoing basis, then even though
actual weights
would be used for each run, the number 3.00 would be used to calculate the 'f'
factor since 3.00
grams would be the "target weight" for each actual weighed-out portion of SRM
material. This
example is given here but there were other such definitions that had to be
formulated.
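Concretely, the three weighing cases just described give the following "f" factors (a sketch, assuming the 10.0 gram nominal standard weight of the example):

    nominal = 10.0  # standard nominal sample weight (grams) for the method

    # Weighings in the normal range of the nominal value: f stays at 1.00,
    # the pre-loaded default.
    f_routine = nominal / nominal  # 1.00

    # Only about 3 grams available for a one-off sample: the actual weight
    # is used.
    f_short = nominal / 3.08       # about 3.25

    # About 3 grams of SRM material run as an ongoing control sample: the
    # 3.00 gram target weight is used, even though actual weights vary.
    f_srm = nominal / 3.00         # about 3.33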
There are some more statements that are required about how to program the DPSP
to do the
interpolation and extrapolation required in order to adjust the required
minimum standard,
standard deviations and percent recoveries, determined at low, medium and high
measurement
levels that have been entered into the DPSP as predefined program variables,
so that they can be
applied, after being adjusted within the DPSP, to the routine overall
measurements at M-level of
all the material samples being done by the analytical method, at their various
measurement
levels. Originally, it was decided to include also the number of degrees of
freedom as a separate
adjusted parameter (for the adjusted standard deviations) to be included along
with the adjusted
standard deviations and percent recoveries which were to eventually be entered
into the main
database alongside the material sample and its measurement. But the approach
taken in this
paper is to establish a defined "minimum standard" for the number of degrees
of freedom for the
standard deviation, and statistical sample size for the percent recovery,
eliminating the need for
this option. But, of course, it can be done if desired. It should be noted
here also that, although
the interpolation and extrapolation techniques that are to be described here
are in terms of the
required minimum standard, standard deviations and percent recoveries,
determined at low,
medium and high measurement levels, there are many cases where only two or
even one
measurement level would suffice. For example, a particular analytical
chemistry method may
only be in need of standard deviations and percent recoveries, for a
particular restricted range of
measurement levels, the ones being used, for example, to test for compliance
of a certain food
product to government imposed standards and regulations. But, for the purpose
of explaining
the techniques, it will be assumed that there are three.
The main strategy used to describe the interpolation and extrapolation
techniques will be to
construct in one's imagination, a graph of the three plotted points using
standard deviations or
percent recoveries on the y-axis and measurement level on the x-axis. Taking
the standard
deviations first, linear interpolation would most likely be used, exclusively,
to determine the
adjusted standard deviations between points 1 (low measurement level), 2
(medium measurement
level) and 3 (high measurement level). Subsequent adjustments will be made
further on in the
DPSP to the adjusted standard deviation determined here. Each of the three
original plotted
points must be for the standard deviation of a single analytical determination
at M-level.
Originally, it was thought that a "computer table" of values of the standard
deviations and
percent recoveries would be needed, but it was found that simple mathematical
formulas would
suffice. Between point 1 and the origin (0,0) of the imagined graph, linear
interpolation or a line
constructed from a plot of the standard deviation according to the holding of
the coefficient of
relative variance (crv) of point 1 constant throughout the interval could be
used, depending on
what points 1, 2, and 3 are seen to be doing. Such a plot makes a very nice
curved line passing
through the origin in somewhat of a logarithmic fashion. The system
administrator, laboratory
supervisor or analytical chemistry method developer would be the one making
the choices. For
points above point 3, linear extrapolation could be used, or extrapolation by
means of holding the
coefficient of relative standard deviation (crsd) of point 3 constant could be
used, or
extrapolation by means of holding the "crv" of point 3 constant, as previously
explained, could
be used. Again, it depends on what the points 1, 2 and 3 are seen to be doing.
For the percent
recoveries, the task is even easier. Only linear interpolation need be used
from the origin
through to point 3 and beyond that the value at point 3 is extrapolated as a
maximum.
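A sketch of these choices follows (Python; it assumes, as one reading of the text, that the "crv" is the variance divided by the measurement level, which yields the curved line through the origin, that the "crsd" is the standard deviation divided by the level, that "from the origin" means the first recovery segment runs from (0, 0) to point 1, and that linear extrapolation above point 3 continues the slope of the last segment):

    import math

    def adjusted_sd(x, pts, below_p1="linear", above_p3="linear"):
        """Standard deviation at level x from the three plotted points
        pts = [(x1, sd1), (x2, sd2), (x3, sd3)]: low, medium and high."""
        (x1, s1), (x2, s2), (x3, s3) = pts
        if x <= x1:
            if below_p1 == "crv":
                # Hold crv = sd^2 / x of point 1 constant: a curve through (0, 0).
                return math.sqrt((s1 ** 2 / x1) * x)
            return s1 * x / x1  # linear interpolation through the origin
        if x <= x2:             # linear between points 1 and 2
            return s1 + (s2 - s1) * (x - x1) / (x2 - x1)
        if x <= x3:             # linear between points 2 and 3
            return s2 + (s3 - s2) * (x - x2) / (x3 - x2)
        if above_p3 == "crsd":  # hold crsd = sd / x of point 3 constant
            return (s3 / x3) * x
        if above_p3 == "crv":   # hold crv = sd^2 / x of point 3 constant
            return math.sqrt((s3 ** 2 / x3) * x)
        return s3 + (s3 - s2) * (x - x3) / (x3 - x2)  # linear extrapolation

    def adjusted_recovery(x, pts):
        """Percent recovery at level x: linear from the origin through
        point 3; above point 3 the point-3 value is carried as a maximum."""
        (x1, r1), (x2, r2), (x3, r3) = pts
        if x >= x3:
            return r3
        segments = [((0.0, 0.0), (x1, r1)), ((x1, r1), (x2, r2)),
                    ((x2, r2), (x3, r3))]
        for (xa, ra), (xb, rb) in segments:
            if x <= xb:
                return ra + (rb - ra) * (x - xa) / (xb - xa)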
Note: Unlimited extrapolation for the standard deviation is allowed to be made
for all
measurement levels above the highest measurement level (point 3) where the
standard deviations
were determined and for the percent recovery, the value at this point is
extrapolated as a
maximum for all measurement levels above it. This is allowed for the purposes
of the algorithm
that is going to be used to determine the adjusted standard deviations and
percent recoveries.
For example, the extrapolation may exceed the highest measurement level (point
3) by a factor of
ten times, if there is a back-end overall superimposed dilution/concentration
factor equal to ten.
This may not seem very reasonable but the limiting factor for the percent
recovery and standard
deviation is usually not the chemical processing stages themselves (overall
measurement
spectrum) but the limited measurement spectrum of the instrument.
It is necessary, at this point, to fully describe how the algorithm is used to
adjust the required
minimum standard, standard deviations and percent recoveries, determined at
low, medium and
high measurement levels that have been entered into the DPSP as predefined
program variables,
so that they can be applied, after adjustment, to the routine overall
measurements at M-level of
all the material samples being done by the analytical method, at their various
measurement
levels. To understand this is to understand how the system works. First of
all, it needs to be
pointed out that along with each of the predefined program variables for the
standard deviations
and percent recoveries, determined at low, medium and high measurement levels,
there are other
predefined program variables entered in the DPSP that record the number of
reagent blanks and
slopes that were being run when the various PAF-forms were being used to
determine the
minimum standard, standard deviations and percent recoveries for the
measurement levels. The
standard deviation of the chemical instrumentation being used and the standard
deviation of the
slopes, both of which were determined under standard processing conditions,
are also to be
entered into the DPSP. These are obtainable from any of the STAN-DUP, CAL-DUP
or
CAL-DATA forms. It is the responsibility of the system administrator,
laboratory supervisor or
analytical method developer, to enter all these predefined program variables
into the DPSP.
Then, for the purpose of describing the algorithm below, it will be assumed
that the chemical
analyst will have also entered into the data entry screen of the DPSP, the
required variables
concerning each material sample or group of subsample replicates that have
been run. The
algorithm will be described in stepwise fashion with annotation:
Data Processing Algorithm:
Note: There are three possible "steps" that can be superimposed onto the
standard chemical
processing stages of the analytical chemistry method and each has its
equivalent "factor" to be
used in calculating the overall measurement. For example, there can be a front-
end overall
superimposed dilution/concentration giving rise to a front-end overall
superimposed dilution/
concentration factor and there can be a back-end overall superimposed
dilution/concentration
giving rise to a back-end overall superimposed dilution/concentration factor.
In other words, a
superimposed dilution/concentration factor is the reciprocal of the degree of
superimposed
dilution/concentration that was used for the sample. A non-standard sample
weight or volume
may also be used. An "f" factor has been created for the chemical analyst to
enter into the data
entry screen so that the standard deviations may be adjusted according to the
ratio of the standard
to non-standard sample weight or volume. It can be shown that the non-standard
sample weight
or volume and the front-end overall superimposed dilution/concentration both
affect the input to
the standard chemical processing stages while the back-end overall
superimposed dilution/
concentration only affects the output. It can further be shown that, for
purposes of
determining a mock measurement for entering the computer table at the correct
g-amount of
analyte flowing through the various standard chemical processing stages, that
the "f" factor
and/or the front-end overall superimposed dilution/concentration factor should
be removed from
the original overall single or average measurement for the sample that has
been entered into
column (4). This is done by dividing by the respective "factors." The "f"
factor is an implicit
multiplicand in the calculations formula because the actual non-standard
sample weight or
volume will have been used in the denominator of the calculations formula
instead of the actual
standard sample weight or volume. The back-end overall superimposed
dilution/concentration
factor is not taken out in this manner because then the mock measurement would
no longer be
representative of the correct g-amount of analyte flowing through the various
standard chemical
processing stages of the analytical chemistry method. While the standard
deviation of the
various standard chemical processing stages of the analytical method are
unaffected by this
choice (the back-end dilution/concentration, a divisor, and the back-end
dilution/concentration
factor, a reciprocal multiplicand, cancel each other off), the standard
deviation of the instrument
can be magnified (or diminished) because it is only being multiplied by the
back-end overall
superimposed dilution/concentration factor and nothing is cancelling it off. A
back-end
superimposed dilution is usually not made unless the concentration of the
sample extract is very
high and above the range of the calibration standards. To compensate for this
possibility, the
DPSP does unlimited extrapolation above the highest measurement level at which
the standard
deviations for the DPSP were determined so that the standard deviation of the
sample will
continue to vary as it has been doing over the standard measurement levels.
Since the system
administrator, laboratory supervisor or analytical method developer will have
entered the
standard deviation of the instrument and the standard deviation of the slopes
into the DPSP as
predefined program variables and the chemical analyst will have entered the
number of replicate
(and averaged) instrument readings that were made on each sample or subsample
replicate along
with the back-end overall superimposed dilution/concentration factor, the
algorithm given below
will be adjusted to deal with these possibilities.
1) Divide the single or average measurement for the sample that has been
entered into
column (4) by the "f" factor entered in column (8). Call this result "mock
measurement-1" and
store it in computer memory. The "f" factor would have been used implicitly as
a multiplier in
the traditional calculation procedure when an (actual or nominal) non-standard
sample weight or
volume was used in determining the single or average measurement for the
sample that was
entered into column (4). Thus, by this action, it is removed.
2) Divide the mock measurement-1 determined in step (1) by the front-end
overall
superimposed dilution/concentration factor from column (6). Call this result
"mock
measurement-2" and store it in computer memory. The front-end overall
superimposed
dilution/concentration factor would have been used as a multiplier in the
traditional calculation
procedure for determining the single or average measurement for the sample
that was entered
into column (4). Thus, by this action, it is removed. This mock measurement-2
is the most
representative measurement for entering the computer table in order to
determine the eventual
standard deviation and the percent recovery for the single or average
measurement that was
entered into column (4).
Note: If there have been no superimposed dilutions/concentrations, then "1.00"
will have been
automatically entered into both column (6) for the front-end overall
superimposed dilution/
concentration factor and column (7) for the back-end overall superimposed
dilution/
concentration factor. This is also true for the "f" factor entered in column
(8), if there have been
no non-standard sample weights or volumes used. "n," the number of subsample
replicates done
on the sample (and averaged) for the current analytical run in column (5) and
the number of
replicate (and averaged) instrument readings that were made on the sample or
on each subsample
replicate in column (11) will also have been pre-set to the positive whole
number "1."
Note: There may have been more than one back-end superimposed
dilution/concentration. Thus,
the word "overall" is used to reflect this.
Note: All subsample replicates must have the same degree of front-end and/or
back-end overall
superimposed dilutions/concentrations and their associated reagent blank or "reagent blanks"
(to be averaged) must also (each of them) have the same degree of back-end
overall
superimposed dilutions/concentrations. In addition, all subsample replicates
must also have the
same number of replicate instrument readings. Note that each replicate
instrument reading may
consist of more than one standard sub-reading such as occurs with standards
additions "at the
instrument" or as an expedient (when the sub-readings are averaged) to help normalize the output of the instrument while reducing the variation thereof.
Note: For the rest of the algorithm, the word "sample" will refer to each
material sample or
group of subsample replicates that have been run for which a single or average
measurement is
to be calculated and entered into the main database. Also, the "columns" refer
to the various data
entry columns described above. The algorithm will be described as though a
particular material
sample or group of subsample replicates from a single sample homogenate is being processed
for which the single or average measurement for the sample has been entered
into column (4).
Note: At each step described in this algorithm, the intermediate calculated
results are stored in a
computer memory input and output grid for further computer data processing and
error checking.
The details of where and how they are stored are not given.
3) Using the above described procedure for interpolation and extrapolation,
the DPSP
determines the standard deviation and percent recovery for the exact
measurement level given for
mock measurement-2. This will be the true percent recovery for the original
single or average
measurement for the sample that has been entered into column (4). It is stored
in the computer
memory grid for output later on. Further data processing is done on the
standard deviation. Call
this standard deviation "mock standard deviation-1."
Note: All the standard deviations entered as predefined program variables
in the DPSP are
either BAV-standard deviations, corrected BAV-standard deviations or corrected
WAV-standard
deviations so that they only apply to single determinations at M-level. These
standard deviations
have been determined under the standard or "corrected to standard" processing
conditions
specified in the particular analytical chemistry method to which the
particular DPSP program
applies. The number of reagent blanks and/or slopes that were used to
determine the standard
deviations under these standard processing conditions have also been recorded
in the DPSP as
predefined program variables as well as the standard deviations for the parent
random variables
of the reagent blanks and the slopes. If they are BAV-standard deviations such
as are determined
on the RS-form, they will contain the correct proportion of all forms of
between-run systematic
error (BRSE), including any between-run systematic measurement error, BRSME
(RBV and/or
SRLV), being generated by the WRME (RBV and/or SRLV) of the reagent blank(s)
or slopes(s)
that are being run under standard conditions using the traditional calculation
procedure. If they
are WAV-standard deviations, such as are obtained on the SAM-DUP form, they
will have been
corrected by having had added to them the appropriate terms for the WRME (RBV
and/or SRLV)
variation of the reagent blank(s) and/or slopes(s) that were being run to
determine the standard
deviations of the reagent blank(s) and/or slopes(s) under standard conditions.
4) The mock standard deviation-1 determined in step (3) is squared giving mock
variance-1.
5) If the number of reagent blanks that were being run when the standard
deviations for the
regular samples were being determined under standard processing conditions is
not "zero," the
standard deviation for the parent random variable of the reagent blanks
determined under
standard processing conditions is squared, giving the respective variance.
This variance for the
parent random variable of the reagent blanks is then divided by the number of
reagent blanks that
were used to determine the standard deviation of the reagent blanks on the RB-
DUP form under
standard processing conditions and the result is subtracted from mock variance-
1 giving mock
variance-2. If the number of reagent blanks that were being run when the
standard deviations for
the regular samples were being determined under standard processing conditions
is "zero," then
the value in mock variance-1 is assigned to the memory location for mock
variance-2.
Note: Not every analytical method runs a reagent blank or count blank as part
of standard
conditions. In this case, a "zero" would automatically be entered into column
(9) and the column
hidden on the data entry screen. The standard deviation for the reagent blanks
(or count blanks)
would automatically be set to "zero" as a predefined program variable in the
DPSP.
6) If the number of calibration slopes (zero, one or two) that were being run
when the standard
deviations for the regular samples were being determined under standard
processing conditions
is not "zero," the standard deviation for the parent random variable of the
slopes determined
under standard processing conditions is squared, giving the respective
variance. The variance
for the parent random variable of the slopes is then divided by the number of
slopes (one or two)
that were used to determine the standard deviation of the slopes on the STAN-
DUP, CAL-DUP or
CAL-DATA forms under standard processing conditions. This result is then
multiplied by the
square of mock measurement-2. This, in turn, is divided by the square of the
mean or grand
mean of the slopes as determined under standard processing conditions.
Finally, this last result
is subtracted from mock variance-2 giving mock variance-3. If the number of
slopes that were
being run when the standard deviations for the regular samples were being
determined under
standard processing conditions is "zero," then the value in mock variance-2 is
assigned to the
memory location for mock variance-3.
Note: Not every analytical method runs instrument calibration standards as a
part of standard
conditions. In this case, a "zero" would automatically be entered into column
(10) and the
column hidden on the data entry screen. The standard deviation for the slopes
would then also be
automatically set to "zero" as a predefined program variable in the DPSP as a
precaution.
Another special case is with standard additions "at the instrument." In this
case, the calibration
standards (including a "zero" standard) are added on top of each sample or
subsample replicate
injection or else mixed with each sample or subsample extract before
injection. Therefore, a run
on the calibration standards is being done for each sample or subsample
extract as a part of
obtaining an overall individual instrument reading (one sub-reading from each
injection) for each
extract. In this case, the number of calibration slopes would also be
automatically set to "zero"
in column (10) and the column hidden on the data entry screen, since the
variation in the
individual "standard additions" slopes for each overall reading per
determination will be included
(as inherited variation), in the standard deviation of the instrument as
determined for, and/or
corrected to, a single instrument reading (composed of more than one sub-
reading). The
"standard additions" technique is too complex to be described here but note
that this
computerized system is not applicable to doing standard additions "through the
method," in
which case, the overall standard deviation at M-level for each determination
is obtainable from
the technique itself and the overall recovery is normally 100%.
Note: Mock variance-3 is an unmixed variance, not containing any BRSME (RBV
and/or SRLV)
that would have been generated by the WRME (RBV and/or SRLV) of the reagent
blank(s) and/or
slopes(s) that were being run under standard processing conditions, nor
likewise any variation
from the WRME (RBV and/or SRLV) itself, when the standard deviations were
being determined.
As a result, this variance can now be manipulated by standard statistical
procedures. If it were
BAV-standard deviations that had been entered as predefined program variables
into the DPSP,
there could be some other form of BRSE contained in this variance, but in
theory, there shouldn't
be any, and if there is, it has a right to be included as long as it is
random. Any form of
non-random BRSE should have been screened out on the respective PAF-forms. The
basic idea
is to remove, in steps (5) and (6), all variation due to the reagent blank(s)
and/or slopes(s) that
were being run under standard processing conditions using the traditional
calculation procedure
when the standard deviations for the measurement levels were being determined
for the DPSP.
In steps (13) and (15), the variance for the parent random variable of the
reagent blanks divided
by the number of reagent blanks that are entered into column (9) for the
current analytical run
and the variance term for the number of calibration slopes {one or two, as
entered into
column (10)} for the current analytical run, will be put back into the
overall variance for the
sample in their stead.
7) The standard deviation of the instrument, such as determined on the STAN-
DUP, CAL-DUP
or CAL-DATA forms under standard processing conditions, must be for only a
single standard
instrument reading per sample or per subsample replicate with no back-end
overall superimposed
dilution/concentration factor employed. This standard deviation of the
instrument, having been
entered into the DPSP as a predefined program variable, is squared giving the
respective
variance of the instrument. This variance is then subtracted from mock
variance-3 giving mock
variance-4. Note that under standard processing conditions, each standard
individual instrument
reading may consist of more than one standard sub-reading such as occurs with
standard
additions "at the instrument" or as an expedient (when the sub-readings are
averaged) to help
normalize the output of the instrument while reducing the variation thereof.
8) The back-end overall superimposed dilution/concentration factor from column
(7) is
squared.
Note: Both the front-end overall superimposed dilution/concentration factor
and the back-end
overall superimposed dilution/concentration factor must be very clearly
defined to the user,
especially what is meant by "superimposed." A message concerning this must
always be output
to the user on the data entry screen. An example to explain this point is
given here concerning
the back-end overall superimposed dilution/concentration factor. Suppose an
analytical
chemistry method requires under standard processing conditions, a
concentration of 10 ml to 5 ml at the back-end of the analytical method. Then, a concentration factor of
0.5 will appear in
the numerator of the calculations formula, as part of standard conditions.
Suppose that the
chemical analyst decides to concentrate further, in the above mentioned step,
down to 1 ml. This
is an overall concentration of 10 ml to 1 ml and the overall concentration
factor is 0.1. But only
the 5 ml to 1 ml is "superimposed." Consequently, the chemical analyst should
enter 0.2 into
column (7) of the data entry screen as the back-end overall superimposed
dilution/concentration
factor. To check, 0.2 which is entered into column (7) times 0.5 which is in
the numerator of the
calculations formula, is equal to 0.1 which is the correct overall
concentration factor.
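The check at the end of this example is a simple product of factors, as this fragment shows (values from the 10 ml example above):

    standard_factor = 5.0 / 10.0   # 0.5: concentration required by the method
    overall_factor = 1.0 / 10.0    # 0.1: what the analyst actually did

    # Only the extra 5 ml to 1 ml step is "superimposed":
    superimposed = overall_factor / standard_factor  # 0.2, entered in column (7)

    # Check: column (7) times the standard factor recovers the overall factor.
    assert abs(superimposed * standard_factor - overall_factor) < 1e-12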
9) Multiply the variance of the instrument as determined in step (7) by the
result from step (8)
and divide this result by the number of replicate instrument readings that
were made on the
sample, or on each subsample replicate, that is entered into column (11) of
the data entry screen.
Note: Each standard individual instrument reading may consist of more than one
standard
sub-reading such as occurs with standard additions "at the instrument" or as
an expedient (when
the sub-readings are averaged) to help normalize the output of the instrument
while reducing the
variation thereof. Only replicate instrument readings are being dealt with
here, not sub-readings.
Refer to column (11) in the data entry section for an explanation of the
number of replicate
instrument readings that have been made on each sample or subsample replicate
being run. See
also the first note for step (6).
10) The result from step (9) is added to mock variance-4 from step (7), giving
mock variance-5.
Note: Thus, the variance of the instrument is either put back into the overall
variance for the
sample the way it was, or as modified by steps (8), (9) and (10).
11) Mock variance-5 is then divided by "n," the number of subsample replicates
done on the
sample, for the current analytical run, as entered into column (5), provided
"n" is a positive
whole number greater than or equal to "1," giving mock variance-6.
12) If the number of reagent blanks entered into column (9) that were being
run for the block of
samples or subsample replicates in the current analytical run is not "zero,"
the variance for the
parent random variable of the reagent blanks from step (5), is divided by the
number of reagent
blanks that are entered into column (9). Call this result the "blank variance
correction term"
(BVCT). If the value entered into column (9) is "zero," then the value of
"zero" is assigned to
the BVCT.
Note: The same note as for step (5) applies to this step.
13) The BVCT, determined in step (12), is added to mock variance-6 giving mock
variance-7.
14) If the number of calibration slopes (zero, one or two) entered into column
(10) that were run
for the block of samples or subsample replicates in the current analytical run
is not "zero," the
variance for the parent random variable of the slopes from step (6), is
divided by the number of
slopes (one or two) that are entered into column (10). This result is then
multiplied by the
square of mock measurement-2. This, in turn, is divided by the square of the
mean or grand
mean of the slopes as determined under standard processing conditions. Call
this result the
"slope variance correction term" (SVCT). If the value entered into column (10)
is "zero," then
the value of "zero" is assigned to the SVCT.
Note: The same notes as for step (6) apply to this step.
Note: In a titrimetric analytical method, the titer [4] is equivalent to the
value of the slope but it
usually has no significant variance, so a "zero" should be entered into column
(10) or else the
standard deviation of the titer would have to be determined and entered into
the DPSP and a "1"
entered into column (10).
15) The SVCT, determined in step (14), is added to mock variance-7 giving mock
variance-8.
16) Mock variance-8 is then converted to a standard deviation by taking the
square root of it.
17) The result from step (16) is then multiplied by the "f' factor entered in
column (8) and this
result is further multiplied by the front-end overall superimposed
dilution/concentration factor
from column (6). This last result will then be the computed standard deviation
at M-level for the
original single or average (if more than one subsample replicate was done)
measurement for the
sample that was entered into column (4).
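Steps (1) through (17) can be collected into a single routine. The sketch below (Python; the parameter names are hypothetical, the data entry row is as sketched earlier, and the interpolation functions are assumed to behave as described above) follows the steps literally and is offered as an illustration rather than a definitive implementation:

    import math

    def dpsp_sd_and_recovery(row, params, interp_sd, interp_recovery):
        """Steps (1)-(17): percent recovery and standard deviation at
        M-level for the measurement entered in column (4).

        row    -- the data entry row (columns 4 to 11)
        params -- predefined program variables: standard-conditions blank
                  and slope counts, parent-variable standard deviations of
                  the blanks and slopes, mean (or grand mean) slope, and
                  the standard deviation of the instrument
        """
        # Steps (1)-(2): remove the implicit "f" factor and the front-end
        # superimposed factor, giving the mock measurement representative
        # of the g-amount of analyte in the standard processing stages.
        mock1 = row.measurement / row.f_factor
        mock2 = mock1 / row.front_end_factor

        # Steps (3)-(4): interpolated recovery and standard deviation.
        recovery = interp_recovery(mock2)
        mock_var = interp_sd(mock2) ** 2

        # Step (5): strip out the reagent-blank variation present under
        # standard processing conditions, if blanks were run then.
        if params["std_n_blanks"]:
            mock_var -= params["blank_sd"] ** 2 / params["std_n_blanks"]

        # Step (6): likewise strip out the standard-conditions slope term.
        if params["std_n_slopes"]:
            mock_var -= (params["slope_sd"] ** 2 / params["std_n_slopes"]) \
                        * mock2 ** 2 / params["mean_slope"] ** 2

        # Step (7): take out the instrument variance (one standard reading).
        mock_var -= params["instrument_sd"] ** 2

        # Steps (8)-(10): put the instrument variance back, scaled by the
        # squared back-end factor of column (7) and divided by the number
        # of replicate instrument readings of column (11).
        mock_var += (params["instrument_sd"] ** 2) \
                    * row.back_end_factor ** 2 / row.n_readings

        # Step (11): divide by "n", the subsample replicates of column (5).
        mock_var /= row.n_subsamples

        # Steps (12)-(13): blank variance correction term for THIS run.
        if row.n_reagent_blanks:
            mock_var += params["blank_sd"] ** 2 / row.n_reagent_blanks

        # Steps (14)-(15): slope variance correction term for THIS run.
        if row.n_slopes:
            mock_var += (params["slope_sd"] ** 2 / row.n_slopes) \
                        * mock2 ** 2 / params["mean_slope"] ** 2

        # Steps (16)-(17): back to a standard deviation at M-level for the
        # measurement as originally entered in column (4).
        sd = math.sqrt(mock_var) * row.f_factor * row.front_end_factor
        return recovery, sd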
Note: It would be prudent to remember at this point, the two basic assumptions
that underlie this
algorithm in its present form, which are that (1) any unbiasing that needs to
be done will be done
by the DBMS in the main database with the parameters that are supplied to it,
and that (2) the
percent recoveries that have been entered into the DPSP, being averages based
on a minimum of
four (and where possible, sixteen) recovery samples that have been run, one
per run, over the
required number of analytical runs, means that the random variation in these
average percent
recoveries can usually be ignored. If, in defiance of the first assumption,
the unbiasing is to be
optionally done within the DPSP, then the algorithm would have to be modified
at this point to
allow for it. But these two basic assumptions will be maintained for the
purpose of the algorithm
as it is being presented here, on the basis that a bias-error tolerance of
approximately plus or
minus 1.00% would likely be acceptable to the user. However, in defiance of
the second
assumption, it might be desirable to have the DBMS further adjust the unbiased
standard
deviation that was computed by it from the (possibly biased) standard
deviation that was
obtained from step (17) so that a further corrected version could be applied
to determine the
unbiased 95% confidence interval for the unbiased single or average
measurement in such
manner that would take into account the random variation in the average
percent recovery. This
would require that an additional parameter, the standard deviation of the
average percent
recovery, also called the standard error of the percent recovery, be entered
as a predefined
program variable in the DPSP and stored in the temporary database in the DPSP
to be uploaded
along with the other two parameters that were determined by the algorithm for
the particular
sample. This would only be done in the event that the standard error of the
percent recovery is
unusually high and/or the degree of bias-error tolerance in the unbiased
measurement is
unacceptable. [The standard error of the percent recovery, as obtainable from
either the
RC-form or the RS-form, is not independent from the standard deviation of the
measurement as
calculated on these same forms, but, nevertheless, it should be suitable for
the purpose of
correcting the overall standard deviation of the single or average
measurement. The standard
error of the percent recovery will be independent if the data set obtained by
running the recovery
samples over several analytical runs is used to calculate the standard error
independently from
all other calculations.] This further adjustment of the unbiased standard
deviation for the
unbiased single or average measurement of the sample would then be done by the
DBMS
according to the final term that is given below in the general equation for
the overall variance of
a single or average determination (including the dividing by the percent
recovery in decimal
form).
To summarize, adhering to the above two assumptions, the DBMS will calculate
the (possibly
biased) 95% confidence interval for the (possibly biased) single or average
measurement for the
sample as plus or minus two of the standard deviations that were determined in
step (17). This
(possibly biased) 95% confidence interval and (possibly biased) measurement
data are to be
maintained (not deleted) in separate columns in the main database (necessary
for a variety of
reasons) despite the unbiasing operation which is to be done next. The DBMS
will then unbias
the (possibly biased) single or average measurement for the sample and the
(possibly biased)
standard deviation that was obtained from step (17) by dividing both of them
by the uploaded
percent recovery for the measurement level in decimal form (this uploaded
percent recovery
value can be equal to 100% depending on the analytical method). The resulting
unbiased single
or average measurement and unbiased standard deviation are then stored in
separate and hidden
password-protected columns. The DBMS will then calculate the unbiased 95%
confidence
interval for the unbiased single or average measurement as plus or minus two
of the unbiased
standard deviations. The resulting unbiased 95% confidence interval will then
be stored in a
separate and hidden password-protected column in the main database. This
unbiasing operation
assumes, as already stated, that the uploaded percent recovery is regarded as
being a statistical
constant. If, in defiance of the second assumption above, the standard error
of the percent
recovery has also been uploaded, then the DBMS will further adjust the
unbiased standard
deviation so that a corrected version of it can be applied to determine a
corrected version of the
unbiased 95% confidence interval. The unbiased standard deviation will then be
corrected
according to the final term that is given below in the general equation for
the overall variance of
a single or average determination (including the dividing by the percent
recovery in decimal
form) so as to take into account the random variation in the standard error of
the percent
recovery. The resulting corrected unbiased standard deviation will then be
stored in a separate
and hidden password-protected column. The DBMS will then alternatively
calculate the
corrected unbiased 95% confidence interval for the unbiased single or average
measurement that
was calculated above as plus or minus two of the corrected unbiased standard
deviations. The
resulting corrected unbiased 95% confidence interval will then be stored in a
separate and hidden
password-protected column in the main database. As previously stated, all of
the above
unbiasing operations can be done within the DPSP if required.
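To illustrate the sequence just described, a minimal Python sketch is given below. This is an illustration only, not the actual DBMS implementation; the function and variable names are invented, and the optional correction applies the final term of the general variance equation given later in this document (with the division by the percent recovery in decimal form included):

    import math

    def unbias(measurement, sd, recovery_pct, se_recovery_pct=None):
        """Sketch of the DBMS unbiasing sequence (all names hypothetical).

        measurement     -- (possibly biased) single or average measurement at M-level
        sd              -- (possibly biased) standard deviation from step (17)
        recovery_pct    -- uploaded average percent recovery, e.g. 98.0
        se_recovery_pct -- optional standard error of the percent recovery
        """
        u = recovery_pct / 100.0  # percent recovery in decimal form

        # (Possibly biased) 95% confidence interval: plus or minus two S.D.
        biased_ci = (measurement - 2.0 * sd, measurement + 2.0 * sd)

        # Unbias the measurement and the S.D. by dividing by the recovery.
        unbiased_meas = measurement / u
        unbiased_sd = sd / u
        unbiased_ci = (unbiased_meas - 2.0 * unbiased_sd,
                       unbiased_meas + 2.0 * unbiased_sd)

        corrected_ci = None
        if se_recovery_pct is not None:
            # Optional further correction: add the final term of the general
            # variance equation, Var(u-bar) * E(UMAC)^2 / E(u)^2, to the
            # unbiased variance before taking plus or minus two S.D.
            var_u = (se_recovery_pct / 100.0) ** 2
            corrected_var = unbiased_sd ** 2 + var_u * unbiased_meas ** 2 / u ** 2
            corrected_sd = math.sqrt(corrected_var)
            corrected_ci = (unbiased_meas - 2.0 * corrected_sd,
                            unbiased_meas + 2.0 * corrected_sd)

        return biased_ci, unbiased_meas, unbiased_sd, unbiased_ci, corrected_ci

For example, unbias(100.0, 5.0, 98.0, se_recovery_pct=1.5) returns the (possibly biased) and unbiased intervals together with the corrected interval that takes into account the random variation in the average percent recovery.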
18) The percent recovery value from step (3) and the computed standard
deviation at M-level
from step (17) are then output in the output screen to the user along with the
original single or
average (if more than one subsample replicate was done) measurement at M-level
that was
entered into column (4) and these values are stored (along with the sample
identifier and other
relevant data) in a temporary database in the DPSP for uploading into the main
database when
accessed by the DBMS--unless step (19) applies.
19) If the data processing that has just been done to determine the standard
deviation and
percent recovery for the original single or average (if more than one
subsample replicate was
done) measurement that was entered into column (4) applies to transformed
biological,
microbiological or radiological data, then a 95% confidence interval is
calculated for the single
or average measurement by the DPSP. The endpoints for this 95% confidence
interval are then
retransformed and output in the output screen to the user along with the
original single or
average (if more than one subsample replicate was done) measurement that was
entered into
column (4) and these values are stored (along with the sample identifier and
other relevant data)
in a temporary database in the DPSP for uploading into the main database when
accessed by the
DBMS. The percent recovery determined by the DPSP would normally always be set
to 100% in
this case or else this parameter is omitted altogether. The transformational
and
retransformational formulas would normally be entered into the DPSP by the
user.
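As a minimal sketch of step (19), assuming a natural-logarithm transformation purely for illustration (the actual transformational and retransformational formulas would be entered into the DPSP by the user, and the function name is invented):

    import math

    def retransformed_ci(transformed_mean, transformed_sd, retransform=math.exp):
        """Compute a 95% confidence interval on the transformed scale, then
        retransform its endpoints (placeholder retransformation: exp)."""
        lo = transformed_mean - 2.0 * transformed_sd
        hi = transformed_mean + 2.0 * transformed_sd
        return retransform(lo), retransform(hi)

    # Example: a count of 200 analyzed as a natural logarithm.
    print(retransformed_ci(math.log(200.0), 0.15))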
The following are examples of experiments, thinking in terms of a computer
table:
Explanation of Computer Table Experiment 1:
1) Suppose the DPSP is using a computer table instead of simple formulas.
2) Suppose for purposes of checking the algorithm that the coefficient of
relative standard
deviation (crsd) is constant throughout the measurement spectrum.
3) Create two computer tables, one for 10g of sample and one for 5g of sample.

10g table ("c" factor = 0.1):

  M-level       M-level      Q2-level          Q2-level     Q2-level
  Meas. (PPM)   S.D. (PPM)   µg-output (µg)    S.D. (µg)    %Rec
  100           5.0          1000              50           99
  50            2.5          500               25           98
  25            1.25         250               12.5         97

5g table ("c" factor = 0.2):

  M-level       M-level      Q2-level          Q2-level     Q2-level
  Meas. (PPM)   S.D. (PPM)   µg-output (µg)    S.D. (µg)    %Rec
  100           5.0          500               25           98
  50            2.5          250               12.5         97
  25            1.25         125               6.3          96
4) The same µg-output at the back-end of the anal. chem. method should give the same S.D. at
Q2-level. The output and S.D. at Q2-level are given in µg instead of PPM for simplification.
Therefore the "c" factors are purely hypothetical, but they are in the correct proportion for 5g
and 10g in the denominator of the calculations formula.
5) There is only one computer table available and it is for 10g of sample, but there is only 5g of
sample available to be run.
6) The overall measurement at M-level is 100 PPM for 5g of sample.
7) The "f" factor for 5g of sample, when 10g of sample is standard, is 2.0.
8) If the overall measurement at M-level (100 PPM) is divided by the "f" factor, a mock
measurement of 50 PPM is obtained.
9) The table for 10g is accessed at 50 PPM, and a S.D. of 2.5 PPM is obtained. This is the
correct S.D. at M-level for 500 µg of output at Q2-level in the 10g table. The percent recovery of
98% is also obtained at this time. If any adjustments need to be made to the S.D., they are done
here at 2.5 PPM. It is assumed that none are needed.
10) The standard deviation (2.5 PPM) obtained in step (9) is multiplied by the "f" factor, giving
a value of 5.0 PPM.
11) By inspection of the hypothetical 5g table, this is the correct S.D. for the 5g sample at
M-level for 500 µg of output at Q2-level in the 5g table.
12) By inspection of the hypothetical 5g table, the percent recovery is also correct since,
although the measurement is divided by the "f" factor before accessing the computer table in
step (9), the percent recovery obtained is not multiplied by the "f" factor.
Explanation of Computer Table Experiment 2:
1) Suppose the DPSP is using a computer table instead of simple formulas.
2) Suppose for purposes of checking the algorithm that the standard deviation
(S.D.) is constant
at Q2-level throughout the measurement spectrum.
3) Create two computer tables, one for 10g of sample and one for 5g of sample.

10g table ("c" factor = 0.1):

  M-level       M-level      Q2-level          Q2-level     Q2-level
  Meas. (PPM)   S.D. (PPM)   µg-output (µg)    S.D. (µg)    %Rec
  100           2.5          1000              25           99
  50            2.5          500               25           98
  25            2.5          250               25           97

5g table ("c" factor = 0.2):

  M-level       M-level      Q2-level          Q2-level     Q2-level
  Meas. (PPM)   S.D. (PPM)   µg-output (µg)    S.D. (µg)    %Rec
  100           5.0          500               25           98
  50            5.0          250               25           97
  25            5.0          125               25           96
4) The same µg-output at the back-end of the anal. chem. method should give the same S.D. at
Q2-level. The output and S.D. at Q2-level are given in µg instead of PPM for simplification.
Therefore the "c" factors are purely hypothetical, but they are in the correct proportion for 5g
and 10g in the denominator of the calculations formula.
5) There is only one computer table available and it is for 10g of sample, but there is only 5g of
sample available to be run.
6) The overall measurement at M-level is 100 PPM for 5g of sample.
7) The "f" factor for 5g of sample, when 10g of sample is standard, is 2.0.
8) If the overall measurement at M-level (100 PPM) is divided by the "f" factor, a mock
measurement of 50 PPM is obtained.
9) The table for 10g is accessed at 50 PPM, and a S.D. of 2.5 PPM is obtained. This is the
correct S.D. at M-level for 500 µg of output at Q2-level in the 10g table. The percent recovery of
98% is also obtained at this time. If any adjustments need to be made to the S.D., they are done
here at 2.5 PPM. It is assumed that none are needed.
10) The standard deviation (2.5 PPM) obtained in step (9) is multiplied by the "f" factor, giving
a value of 5.0 PPM.
11) By inspection of the hypothetical 5g table, this is the correct S.D. for the 5g sample at
M-level for 500 µg of output at Q2-level in the 5g table.
12) By inspection of the hypothetical 5g table, the percent recovery is also correct since,
although the measurement is divided by the "f" factor before accessing the computer table in
step (9), the percent recovery obtained is not multiplied by the "f" factor.
Conclusion of experiments 1 and 2:
The correct standard deviation and percent recovery are obtained in both experiments. If any
adjustments had been made, they would have been made in approximately the correct proportions
for the final overall standard deviations. The only error remaining will be due to the uncertainty
in the standard deviations themselves in the computer table. These two experiments only deal
with the "f" factor but, for example, the "f" factor could have been replaced with the "f" factor
times the front-end overall superimposed dilution/concentration factor. The table-access
procedure that both experiments exercise is sketched below.
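In the sketch (Python, names invented), the table holds the hypothetical 10g values from Experiment 1; a real DPSP table would interpolate between measurement levels rather than require an exact match:

    # Hypothetical 10g computer table from Experiment 1:
    # M-level measurement (PPM) -> (S.D. at M-level in PPM, percent recovery).
    TABLE_10G = {100.0: (5.0, 99.0), 50.0: (2.5, 98.0), 25.0: (1.25, 97.0)}

    def lookup_with_f_factor(measurement_ppm, f_factor, table=TABLE_10G):
        """Divide by the "f" factor, read the standard-weight table, then
        multiply the S.D. (but not the percent recovery) by the "f" factor."""
        mock = measurement_ppm / f_factor   # e.g. 100 / 2.0 = 50 PPM
        sd, pct_rec = table[mock]
        return sd * f_factor, pct_rec       # e.g. 2.5 * 2.0 = 5.0 PPM, 98%

    print(lookup_with_f_factor(100.0, 2.0))   # -> (5.0, 98.0)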
Some Statistical Formulas:
1) Sample variance of "x":

\[ s_x^2 \;=\; \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2}{k - 1} \qquad \text{(Formula-1)} \]

"k" is the number of analytical runs.
The degrees of freedom (df) for the sample variance is equal to "k - 1".

2) (s_x), the sample standard deviation, is equal to the square root of (s_x^2).
The degrees of freedom (df) for the sample standard deviation is equal to "k - 1".

3) Sample variance of "x":

\[ s_x^2 \;=\; \frac{\sum d^2}{2k} \qquad \text{(Formula-2)} \]

"d" is equal to (x1 - x2), the difference between the duplicate measurements.
"k" is the number of sample duplicates.
The degrees of freedom (df) for the sample variance is equal to "k".

4) (s_x), the sample standard deviation, is equal to the square root of (s_x^2).
The degrees of freedom (df) for the sample standard deviation is equal to "k".

5) Sample pseudo-variance of |d|:

\[ s_{|d|}^2 \;=\; \frac{\sum d^2}{k} \qquad \text{(Formula-3)} \]

"d" is equal to |x1 - x2|, the absolute value of the difference between the duplicate
measurements (also called the range of duplicates).
"k" is the number of sample duplicates.
The degrees of freedom (df) for the sample pseudo-variance is equal to "k".

6) (s_|d|), the sample pseudo-standard deviation, is equal to the square root of (s_|d|^2).
The degrees of freedom (df) for the sample pseudo-standard deviation is equal to "k".
Notes:
(1) Both Formula-1 and Formula-2 can be utilized under either BAV or WAV statistical
sampling conditions, depending on the application.
(2) Formula-2 is easy to derive. Just let "d/2" = (x_i - x-bar) in Formula-1, but with (n - 1) in
the denominator instead of (k - 1). The sign of "d/2," of course, doesn't matter due to squaring.
This yields the intermediate formula "(d^2)/2" divided by (n - 1), which is the formula for
determining the sample variance of "x" from two outcomes from a non-composite primary
random variable "X" in terms of the "difference" between the two outcomes. "n" is always equal
to "2," so the denominator is usually omitted, but it will be needed here. Plug this intermediate
formula into the general formula for the pooled variance [8], using (n - 1) as the degrees of
freedom in the denominators of the variances to be pooled, substituting "(d^2)/2" divided by
(n - 1) for each of the "k" variances in the numerator of the general pooled variance formula. In
the denominator of the general pooled variance formula, we have "k" times (n - 1), which is
equal to "k." By using a summation identity ("2/4 = 1/2," since each pair contributes two squared
deviations of (d^2)/4), the factor "1/2" is factored entirely out of the numerator of the general
pooled variance formula and placed to the left of the summation sign. It is then taken out of the
numerator of the general pooled variance formula altogether by putting a "2" in the denominator.
This yields Formula-2, which is sometimes called the "pooled variance formula for duplicates."
It is an unbiased estimator of the population variance of "X" since it is the unbiased form of the
general pooled variance formula that has been used to derive it. But it must be remembered that
the sample variance is not for "d" but for "x" and the number of degrees of freedom for it is not
"2k" but "k." Because of its ease of programming into the computer spreadsheets, Formula-2 is
used to determine the WAV-variances and the WAV-standard deviations in all of the
"duplicates" PAF-forms.
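Written out in full, the derivation in note (2) proceeds as follows (a restatement of the note's own steps, not new material):

\[
s^2_{x,\,\text{pair}} \;=\; \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}{n - 1}
\;=\; \frac{(d/2)^2 + (-d/2)^2}{2 - 1} \;=\; \frac{d^2}{2},
\qquad d = x_1 - x_2,\; n = 2,
\]

and pooling the "k" pair variances with "k(n - 1) = k" degrees of freedom in the denominator,

\[
s_x^2 \;=\; \frac{\sum_{j=1}^{k} \big(d_j^2/2\big)}{k} \;=\; \frac{\sum d^2}{2k}. \qquad \text{(Formula-2)}
\]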
(3) Another strategy, used by Pearson and Hartley [3], to determine the probabilities for the
range at "n = 2" from the standard normal probability table is a little more difficult to describe
without a diagram, but it can be shown that these probabilities can be obtained from the
right-hand side of the standard normal probability table. Basically, by taking the absolute values
of the distribution of (x1 - x2), which is composed of equal frequencies of both positive and
negative values, we get the distribution of |x1 - x2|, which is composed of only positive values.
The frequencies of the positive values are doubled but this doesn't affect the probabilities. The
variance of (x1 - x2) is double the variance of "x," so the variance of "x," as defined in
Formula-2, is multiplied by "2," cancelling off the "2" in the denominator. This is the real
variance of (x1 - x2) but not of |x1 - x2|, so it is called a pseudo-variance for the distribution of
|x1 - x2|, and the square root of it is called a pseudo-standard deviation for the distribution of
|x1 - x2|. Thus, the standard normal probability table can still be used to determine the
probabilities for the distribution of |x1 - x2|. For example, 95% of outcomes from the
distribution of (x1 - x2) will be between -2 and +2 standard deviations for (x1 - x2), and 95% of
outcomes from the distribution of |x1 - x2| will be between "zero" and +2 pseudo-standard
deviations for |x1 - x2|. The pseudo-variance of |x1 - x2| is shown above as Formula-3. The
respective pseudo-standard deviation of |x1 - x2| is used to determine the control limits for the
range charts in all of the "duplicates" PAF-forms.
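The stated 95% coverage can be checked numerically. The following sketch (illustrative only, not part of the system) draws pairs of standard normal variates and counts how often the range of a pair falls below two pseudo-standard deviations:

    import math
    import random

    random.seed(1)
    sigma = 1.0
    pseudo_sd = math.sqrt(2.0) * sigma   # S.D. of (x1 - x2) is sqrt(2) * sigma
    n = 100000
    hits = sum(abs(random.gauss(0.0, sigma) - random.gauss(0.0, sigma))
               < 2.0 * pseudo_sd
               for _ in range(n))
    print(hits / n)   # approximately 0.954, in line with the note above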
(4) Another strategy adapted by the author is called "chain-link-sampling." To explain this,
imagine three identical series of outcomes, labelled S1, S2 and S3, directly on top of one
another, from the same non-composite primary random variable "X," the population mean of
which can be premised to be absolutely constant. The members of each series, S1, S2 and S3,
are labelled by subscripting "x" as a, b, c, d, e, f, g, and so on, say for about 500 outcomes. Then,
referring to each of the outcomes by their subscripts, S1 and S2 will first be sampled according to
the traditional sampling method: S1: a_b, c_d, e_f, and so on; S2: b_c, d_e, f_g, and so on.
Then the samples for S1 could be used to calculate a sample mean from the sets of pairs,
averaging each pair and then averaging the individual averages, and likewise for S2. Then the
two overall means could be averaged, giving a grand mean. Then, applying "chain-link-
sampling" to S3, the sampling would be: S3: a_b, b_c, c_d, d_e, e_f, f_g, and so on. The
overall mean calculated from the sets of pairs from S3, averaging each pair and then averaging
the individual averages, will obviously be equal to the grand mean calculated from S1 and S2.
This is not "overlapped sampling." There is no overlapping of any of the means in each of the
pairs from S3. Nor is it related in any way to any form of "re-sampling."
The same principle can be applied to sampling for the variance and standard deviation using
Formula-2. In this case, the "difference between duplicates" is obtained from each pair and
applied to Formula-2 to calculate a variance. Then the variances obtained from S1 and S2 could
be pooled. It can be shown that the variances from S1 and S2 are not entirely independent. In
fact, in the extreme hypothetical case, they are inversely correlated. But this is an advantage: if
the variance from S1 is too small, then the variance from S2 will be too big. But when the two
variances are pooled, a better estimate is obtained with double the degrees of freedom. Of
course, with random sampling, the two variances will be similar anyway. Research, using the
random generation capability of the computer spreadsheet to generate random normal variates,
confirms these statements. Then it can be shown that the variance obtained by applying the
"difference between duplicates" obtained from S3 to Formula-2 will give the exact same variance
as the former pooled variance from S1 and S2. The same justification applies to both the
WAV-variances and the WAV-standard deviations determined on the "duplicates" PAF-forms.
This is not "overlapped sampling" nor any form of "re-sampling." There is no overlapping of
any of the deviations inherent in each of the differences obtained from each of the pairs from S3.
Note that ANOVA cannot be done using the "chain-link-sampled" pairs from S3.
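The equality claimed here is easy to confirm numerically. The following sketch (illustrative only) applies Formula-2 to the traditionally sampled pairs from two identical series and to the chain-link-sampled pairs from a third identical series:

    import random

    random.seed(2)
    series = [random.gauss(100.0, 5.0) for _ in range(500)]

    def formula_2(diffs):
        """Formula-2 applied to a list of differences between duplicates."""
        return sum(d * d for d in diffs) / (2 * len(diffs))

    s1 = [series[i] - series[i + 1] for i in range(0, len(series) - 1, 2)]  # a_b, c_d, ...
    s2 = [series[i] - series[i + 1] for i in range(1, len(series) - 1, 2)]  # b_c, d_e, ...
    s3 = [series[i] - series[i + 1] for i in range(len(series) - 1)]        # chain-link

    # Pool S1 and S2 weighted by their degrees of freedom, then compare with S3.
    pooled = (formula_2(s1) * len(s1) + formula_2(s2) * len(s2)) / (len(s1) + len(s2))
    print(pooled, formula_2(s3))   # equal, up to floating-point rounding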
(5) "Chain-link-sampling" is considered to be absolutely essential for this
computerized system.
The time and cost of obtaining the required number of degrees of freedom for
the standard
deviations from some of the "duplicates" PAF-forms is quite high having to use
stratified
sampling according to measurement level and having to obtain the various
duplicates at
"random" measurement levels, since the concentrations of analyte in the
regular material samples
are unknown before analysis. "Chain-link-sampling" cuts this time and cost in
half. In practice,
any number of subsample replicates can be run on any material sample
homogenate by labelling
their respective flasks as: a, b, c, d, e, f, g, and so on. A rule is made to
subtract "b" from "a",
"c" from "b", "d" from "c", "e" from "d", "f' from "e", "g" from "f', and so
on. Six pairs of
"differences between duplicates" are obtained if "chain-link-sampling" is
used, whereas a
maximum of only three is available by using the regular sampling. Over and
above this stated
advantage, additional PAF-forms would otherwise have to be created for
triplicates,
quadruplicates, quintuplicates, and so on when running this many subsample
replicates. This
CA 02694828 2010-04-01
would be an enormous task in itself and would make the computerized system so
much more
confusing and awkward and irksome to use. "Chain-link-sampling" can only be
used with the
SAM-DUP, RB-DUP, RS-DUP, COUNT-DUP (on transformed data), CAL-DUP and STAN-DUP
forms. One big precaution: ANOVA cannot be done using "chain-link-sampled"
duplicates.
General Equation for the Overall Variance of a Single or Average
Determination:
The general equation for the overall variance of a single or average unbiased measurement of the
concentration of a single ingredient in a single sample homogenate, in PPM^2 at M-level
(each term is to be referenced in serial order from top to bottom), is given by:

\[
\begin{aligned}
\mathrm{Var}_{\text{overall}} \;=\;\; & (FE)^2\,(f)^2\,\frac{1}{E(u)^2}\Bigg[\,
\frac{c^2}{E(m)^2}\,\frac{1}{N_d}\left\{\mathrm{Var}(\text{all chemical processing stages})
+ \frac{\mathrm{Var}(\text{IRV + IBV of a single instrument reading})\,(BE)^2}{N_r}\right\} \\
& + \frac{c^2}{E(m)^2}\left\{\frac{\mathrm{Var}(rb)}{N_{rb}}\right\}
+ \frac{c^2}{E(m)^2}\Big\{\mathrm{Var}(\text{measured VSAM in the material sample homogenate})\Big\} \\
& + \frac{1}{E(m)^2}\left\{\frac{\mathrm{Var}(m)}{N_m}\,E(\mathrm{BMAC})^2\right\}\Bigg]
\;+\; \frac{1}{E(u)^2}\Big\{\mathrm{Var}(\bar{u})\,E(\mathrm{UMAC})^2\Big\}
\end{aligned}
\]

For the single or average measurement obtained for a particular material sample at a particular
measurement level in a single analytical run of a particular analytical chemistry method
(equal sample weights or volumes), the measurement being calculated in PPM at M-level as:

\[
\frac{FE \cdot f \cdot c}{\bar{u}}\;\{X \text{ or } \bar{X}\}
\;=\; \frac{FE \cdot f \cdot c}{\bar{u}}
\left\{\frac{BE}{(m \text{ or } \bar{m})}\Big[(Y \text{ or } \bar{Y}) - (rb \text{ or } \overline{rb})\Big]\right\}
\]
(X or X-bar) is the concentration obtained from the calibration graph in PPM (µg/ml) at Q2-level.
(Y or Y-bar) is the instrument reading in AU, XAU, or AREA for the sample at Q1-level.
"Var" is the variance operator.
"E" is the expectation operator.
"u" and "u-bar" are the percent recovery (decimal equivalent) at the particular measurement level.
"c" is the "c" factor for the standard calculations formula (must be a statistical constant).
"f" is the "f" factor for the standard/non-standard sample weight or volume ratio.
"m" and "m-bar" are the single/average slope of the calibration line (regression line).
"rb" and "rb-bar" are the single/average reading of one or more reagent blanks.
"Nrb" is the number of (averaged) reagent blanks being run.
"Nm" is the number of (averaged) slopes (one or two) for a block of samples in the run.
"Nr" is the number of (averaged) instrument readings on the sample extract for each single or
replicate determination.
"Nd" is the number of (averaged) replicate determinations done on the particular sample
homogenate.
"FE" is the front-end overall superimposed dilution/concentration factor.
"BE" is the back-end overall superimposed dilution/concentration factor.
"VSAM" is the measured (as opposed to actual) residual variation of the concentration of the
ingredient (analyte) in the single material sample homogenate.
"BMAC" is the possibly biased measurement at M-level in PPM (µg/g or µg/ml) of the actual
concentration of the ingredient (analyte) in the single material sample homogenate as determined
by the particular analytical chemistry method.
"UMAC" is the unbiased (having been unbiased--the verb) measurement at M-level in PPM
(µg/g or µg/ml) of the actual concentration of the ingredient (analyte) in the single material
sample homogenate and the closest practicable approximation to the actual concentration.
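For illustration only, the measurement formula and the general variance equation above might be coded term by term as follows. All function and parameter names are invented, and any values supplied to them would be hypothetical placeholders:

    def measurement_m_level(FE, f, c, u_bar, BE, m, Y, rb):
        """Single measurement in PPM at M-level:
        (FE * f * c / u-bar) * (BE / m) * (Y - rb)."""
        return (FE * f * c / u_bar) * (BE / m) * (Y - rb)

    def overall_variance(FE, f, c, E_u, E_m, BE,
                         var_stages, var_reading, Nr, Nd,
                         var_rb, Nrb, var_vsam,
                         var_m, Nm, E_bmac, var_u_bar, E_umac):
        """Term-by-term transcription of the general equation (PPM^2 at M-level)."""
        term1 = (c ** 2 / E_m ** 2) * (var_stages
                                       + var_reading * BE ** 2 / Nr) / Nd
        term2 = (c ** 2 / E_m ** 2) * (var_rb / Nrb)
        term3 = (c ** 2 / E_m ** 2) * var_vsam
        term4 = (1.0 / E_m ** 2) * (var_m / Nm) * E_bmac ** 2
        final_term = (1.0 / E_u ** 2) * var_u_bar * E_umac ** 2
        return FE ** 2 * f ** 2 * (1.0 / E_u ** 2) * (term1 + term2 + term3
                                                      + term4) + final_term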