Note: Descriptions are shown in the official language in which they were submitted.
" 20190~
10577-209/T18
DEFERRED COMPARISON MULTIPLIER CHECKER
BACKGROUND OF THE INVENTION
The present invention relates generally to data
processing equipment, and more particularly to a method, and
apparatus for implementing that method, of checking the
result of each multiplier operation.
As today s data processing equipment finds its way
into more critical uses, the need to check the integrity of
various of the operations performed by the data processing
equipment, including its computational operations~ becomes
important. Balancing this importance is a need to keep cost
lS and circuit count low. Thus, there often is a conflict
between the necessity of reliable error detection and
expense.
A variety of error detection routines are
available for the more simplistic arithmetic operations sùch
as, for example, addition, subtraction, and shifting.
However, for the more complicated arithmetic operations,
such as multiplication implemented by special purpose
circuitry, the error detection methods can become complex,
necessitating large amounts of circuitry.
One approach to error detection of multiplication
is found in an article by Janak H. Patel and Leona Y. Fung,
entitled "Concurrent Error Detection i Il Multiply and Divide
Arrays," IEEE Transactions on Computers, Vol. C-32, No. 4,
April, 1983 (pps. 417-422). This article describes an error
detection technique in which, after the initial
multiplication operation, one of the operands is shifted,
and a redundant multiplication operation performed. The
result of the initial multiplication operation is thus
checked by comparing it with the result of the redundant
multiplication operation. However, the technique leaves
much to be desired: First, although the technique taught
involves some data alteration before performing the
~ (~ 7 ~
redundant multipllcatlon operatlon, lt is believed that the
- partlcular alteratlon taught (shlftlng one operand) is not as
complete as could be. In addition, the technique withholds
use of the result of the inltlal multipllcation operation
until the check is complete. This withholding is costly in
terms of CPU tlme for each multlplication operation.
Accordingly, it is apparent that error detectlon for
multlplication operatlons can be improved.
SUMMARY OF THE INVENTION
According to the present invention there is provided
a method, and apparatus for implementing that method, of
checking multiplication operations later in time than
heretofore known in the prior art.
In accordance with the present invention there is
further provided in a data processing system includlng a
processor unit of the type having an instruction unit operable
to execute a plurality of instructions, and an autonomous
arithmetic unit operable to receive multl-bit operands to
perform arithmetlc operations substantlally in parallel with
operation of the lnstruction unit and produce therefrom a
result, a method of checking the result comprising the steps
of providing at least first and second operands to the
arithmetic unit; the arithmetlc unit performing a first
arlthmetic operatlon on the first and second operands to
obtaln therefrom a flrst arlthmetlc result; temporarlly
retainlng the first arlthmetlc result and, at the same tlme,
maklng the first arlthmetic result available to the processor
64157-330
2a
unlt; providlng agaln the flrst and second operands ln swapped
- order so that the flrst operand takes the place of the second
operand as a thlrd operand, and the second operand takes the
place of the flrst operand as a fourth operand; the arlthmetlc
unlt performlng a second arlthmetlc operatlon on the thlrd and
fourth operands to obtaln therefrom a second arlthmetlc
result; holdlng the second arlthmetlc result untll a next
lmmedlately successlve arlthmetlc operatlon ls requested whlle
permlttlng use of the flrst arlthmetlc result by the
lnstructlon unlt; when the next lmmedlately successlve
arlthmetlc operatlon ls requested, comparlng the flrst and
second arlthmetlc results to one another for equallty when the
another arlthmetlc operatlon ls requested; and lssulng an
error slgnal ln the event the comparlng step does not result
ln equallty.
In accordance wlth the present lnventlon there ls
also provlded ln a data processlng system of the type
lncludlng at least one processor unlt, the processor unlt
lncludlng a state machlne-controlled multlpller unlt for
performlng multlpllcatlon operatlons, a method of checklng
multlpllcatlon operatlons, comprlslng the steps of: the
multlpllcatlon unlt performlng a multlpllcatlon operatlon upon
flrst and second operands to produce a flrst result that ls
lmmedlately avallable to the processor unlt for use;
lnltlating a multlpllcatlon operation upon the flrst and
second operands to produce a second result; holdlng the second
result untll another multlpllcatlon operatlon ls requested, or
64157-330
2b
the first result ls to be used outside the processor unlt,
whlchever occurs flrst; comparlng the flrst and second
results; and haltlng the processor unlt lf the comparlng step
lndlcates the flrst result may not be correct.
In accordance wlth the the present lnventlon there
ls further provlded a method of detectlng errors ln arlthmetlc
operatlons performed by an autonomous arlthmetlc unlt that
forms a part of a controlllng processor unlt, the method
comprlslng the steps of: performlng a flrst arlthmetlc
operatlon to produce a flrst result that ls then substantlally
lmmedlately avallable to the processor unlt for use;
performlng a second arlthmetlc operatlon to produce a second
result that can be used to check the operatlon of the
arlthmetlc unlt ln a check operatlon lncorporatlng the flrst
result; holdlng the second result untll the flrst result ls to
leave the processor unlt; then, before the flrst result ls
permltted to leave the processor unlt, checklng the arlthmetlc
unlt, uslng the flrst and second results; and lssulng an error
slgnal ln response to the checklng step lndlcatlng that the
flrst and second results to not compare.
Brlefly, the method of the present lnventlon
lnvolves performlng an lnltlal multlpllcatlon operatlon
involvlng flrst and second operands, uslng an autonomous
multlpller array. Then, the operands are altered by flrst
swapplng them, and shlftlng one of them one blt posltlon, and
the multlpllcatlon operatlon performed agaln. However, the
check ls not made untll the next multlpller operatlon called
64157-330
- 2c -
for. Alternatlvely, lf no subsequent multlpller lnstruction
ls encountered, and the result of the lnltlal multlpller
operatlon ls to exlt the processor, the data lntegrlty check
ls made before allowlng the lnltlal result to propagate
through the system.
A number of advantages flow from the present
lnventlon. Flrst, by deferrlng the error checklng, the
inltlal result of the multlpller operatlon can be used
lmmedlately, so that each multlpller operatlon need not walt
for the check to be performed. The error checklng routlne ls
performed ln parallel wlth a subsequent multlpller operatlon,
thereby savlng tlme, and clrcultry.
Further, the redundant multlply operatlon forms a
result for the deferred check uslng permutated operands; here,
the operands are swapped (l.e., each operand replaces
64157-330
- 20~9~
~,_
the other), and one is shifted left (i.e., multiplied) one
bit position. This performs a more complete check of the
multiplier unit to ensure that a fault will produce
different multiplication results. Swapping only may not
detect certain faults internal to the multiplier unit, and
shifting only can fail to detect faulty (e.g., stuck HIGH or
LOW) input pins of the multiplier unit. Swapping the
operands and shifting one of them left one bit position
advantageously achieves a more complete multiplication
check.
These, and other advantages and features of the
present invention, will become evident to those skilled in
the art upon a reading of the following detailed description
of the invention, which should be taken in conjunction with
the accompanying drawings.
DESCRIPTION OF THE DRAWINGS
Fig. 1 is a simplified block diagram schematic of
a processor unit implementing the present invention;
Fig. 2 is a state diagram of the state'machine
that controls the multiplier of the processor shown in Fig.
l; and
Fig. 3 is a flow chart illustrating the steps
taken in the multiplier operation, including the error
checking routine.
DETAILED DESCRIPTION
Referring first to Fig. 1, there is illustrated a
processor unit, designated generally with the reference
numeral 10, shown as including an instruction unit 12, a
memory unit 14, a control unit 16, and a data unit 18. Data
communication between the various units 12-18 of the
processor 10 is established, in part, by an Bus 20, a multi-
bit data/address bus of conventional fashion.
The instruction unit 12 is also of conventional
design, and functions to sequentially access instructions
from the memory unit 14, decode them, and communicate an
20190~
~ "
initial microcode address for the sequencer 24 via signals
19 .
The memory unit 14 holds both data and the
instructions, and includes random access memory (RAM). Its
5 architecture is also conventional and, therefore, is not
discussed in further detail here. Synchronous operation of
the elements (e.g., instruction unit 12) is accomplished
with a periodic clock signal (CLK) generated by a clock
circuit 22.
The control unit 16 is designed to include various
control circuitry (not shown); including a sequencer 24.
Sequencer 24 is responsible for generating the various
control signals that operate the processor 10. Although,
for reasons of clarity, sequencer 24 is shown here as a
15 single unit, those skilled in this art will readily
recognize that sequencer 24 is the microcode engine of the
processor; and that it will include a control store from
which microcode instructions sequentially issue to form,
with other qualifying signals (not shown) the control
20 signals that guide, direct, and control operation of the
processor 10. The signaling on signal lines l9-determine
which of various microcode sequences will issue, which in
turn, depends upon the instructions decoded by the
instruction unit 12. The control unit (CU) 16 additionally
25 includes the circuitry for performing multiply operations as
well as assisting in divide operations (by multiplying a
reciprocal of a divisor by a dividend). As further
illustrated in Fig. 1, the control unit 16 includes X and Y
registers 26, 28 which each receive data from the Bus 20 via
30 signal lines 30. Data is loaded in the X, Y registers 26,
28 in response to appropriate load commands (not shown) that
issue from the sequencer 24.
The data content of the X and Y registers are
applied to respective operand inputs of a multiplier array
35 32, the output of which is coupled to a multiplier (MPX) 34.
The multiplier array is formed from combinatorial logic of
conventional design, such as that manufactured and
2019~
~_ 5
distributed by Cypress Semiconductor, Inc., 3901 North First
Street, San Jose, California 95134, and sold under the Part
No. CY7C517.
The output of the multiplier array 32, i8 passed
to a control unit (CU) bus register 36 by the MPX 34, where
it is made available via the Bus 20 to the processor 10.
Control of the MPX 34 and bus register 36 are effected by a
multiplier state machine 40. The multiplier state machine
40 is of conventional design, and its operation will be
discussed further with respect to the state diagram shown in
Fig. 2.
In addition to the multiplier unit itself, the
control unit 16 includes other circuitry (not shown) used
for various control and data processing purposes. However,
only the multiplier circuitry is shown for reasons of
clarity, and also because this other circuitry is not
germane to the present invention. The CU output register
36, however, is used for the entire control unit 16, and,
therefore, the necessity of multiplexing these various other
units (identified as "OTHER" in Fig. 1) with thè-output of
the multiplier array 32 by the MPX 34. '~
Continuing with Fig. 1, the data unit 18 is shown
as including a register file 50, constructed from a
plurality of register circuits in conventional fashion, and
forming a high-speed, temporary storage for, among other
things, the operands used in the multiplier operation. As
Fig. 1 indicates, the register file 50 has two sections:
One section is a number of software visible registers that,
in the implementation of the present invention, forms a
last-in-first-out (LIFO) stack of conventional design,
although it should be evident to those skilled in this art
that other storage arrangements can be used. A second
portion forms a number of "scratch" registers, used for
high-speed, temporary storage. The actual implementation of
the register file 50 is as a multi-ported (both input and
output) device. However, Fig. 1 illustrates the register
file 50 in simplistic form, showing the data paths.
201~0~
The outputs of the register file 50 are coupled to
J and K registers 54, 55, via multiplexers 56 and 57,
respectively, and through these registers to an arithmetic
logic unit (ALU) 58. The output of the ALU 58 is coupled
back to the input of the register file 50 by a multiplexer
52.
The ALU is also of conventional design, capable of
performing such various logic functions as exclusive-ORing,
shifting left or right, or merely passing the data
unaltered, in addition to the usual add and subtract
operations.
Additional storage is provided by the data unit 18
in the form ~f a scratch path (SPAD) memory 64, that
receives data directly from a multiplexer 60 which selects
between the outputs of the J and K registers 54, 55. The
output of the SPAD 64 is, in turn, coupled to an input of
each of the multiplexers 56, 57. Thus, data accessed from
the SPAD 64 is communicated to the register file 50 via one
or the other of the multiplexer-register combinations 56/54
or 57/55, the ALU 58 and multiplexer 52, convers'ely, data
from the SPAD may be communicated to the Bus 20 via the
multiplexer-register combinations 56/54~ 57/55, and
multiplexer 60. Data may be transferred from the register
file 50 to the SPAD 64 by this latter data path.
Data path widths are as shown in Fig. 1. Thus,
for example, the Bus 20 will carry 32 bits of parallel data,
as will the signal lines 30 (control unit 16), and the lines
of the data unit 18 from the multiplexer 60 to the Bus 20
and the SPAD 64, or the signal lines that communicate the
Bus 20 to the multiplexer 52.
Generally, operation of the processor 10 proceeds
as follows: Instructions are sequentially accessed from the
memory unit 14 by the instruction unit 12, decoded, and
resulting in issuance of various control signals (not shown)
as indicated above. Thus, depending upon the instruction
type, control signals (not shown) will be generated to
select the source of the operands for the ALU 58 of the data
201g0~
unit 18 (e.g ., the register file 50 or the SPAD 64), or,
for that matter, the source of the X and Y registers 26, 28
te.g., the register file 50, the SPAD 64, or either of the
J, K registers 54, 55). As can be seen, there is a large
amount of flexibility in what source is available for
multiplier operands.
Although execution of a multiply instruction will
be discussed in connection with the deferred comparison
checking that is performed, it will be advantageous to
describe its basic operation here. The execution of a
multiplier instruction will cause the instruction unit 12 to
issue appropriate control signals (not shown) that will
effect transfer of the multiply operands from the register
file 50 (more specifically, the LIF0 stack 50a) to the X and
Y registers 26, 28. The (16-bit) operands have been
previously loaded so that they are at the top of the LIF0
stack 50a. They are "popped" from the LIF0 stack and
respectively loaded in the J and K registers 54, 55, and
transferred from there, in parallel (as a 32-bit piece of
data) to the (16-bit) X and Y registers 26, 28 via the Bus
and signal lines 30.
Essentially simultaneous with the loading of the X
and Y registers, a control signal (not shown) initiates
operation of the multiplier state machine, which then takes
over control of the multiply operation to load the result
ultimately produced by the multiplier array 32 in the CU
output register 36. Under control of signals from the
instruction unit 12, the result is transferred from the CU
output register 36 to a one of the scratch registers 50b of
the register file 50 (data unit 18) via the Bus 50 and the
multiplexer 52.
At about the time the operands are transferred
from the J and K registers 54, 56 to the X and Y registers
26, 28, they are also being transferred to certain ones of
the scratch registers 50b. The content of the J-register 54
is altered, however, during this transfer: It is
communicated through the ALU 58 in a manner that causes the
201~
.._
operand to be shifted one bit left, in effect doubling the
operand; the content of the K-register 55 is stored in a
scratch register unaltered. The operands will be returned
to the J and K registers 54, 56, for communication to the X
and Y registers (for execution of a redundant multiply
operation) in a manner that causes what was previously set
in the J-register 54 to be set in the K-register 55, and
vice versa, i.e., the operands are swapped.
When the initial multiply operation is complete,
the content of the CU output register 36, the result of the
initial multiply operation, is transferred to a scratch
register 50b. Ultimately it will be pushed onto the top of
the LIF0 stack 50a, where it is made available to the
processor 10. At the same time, the contents of the J and K
registers (i.e, the swapped and altered operands) are
transferred to the X and Y registers, initiating the
redundant multiply operation, and the instruction unit 12
proceeds to other tasks. The result of the redundant
multiply operation ultimately is placed in the CU output
register 36, where is resides until a data check'is
performed by comparing the original or initial result (which
has been stored in the SPAD 64, shifted left one bit to
compensate for the alteration of one of the operands) with
the redundant result. That check is performed upon the
first to occur of two actions: When a subsequent multiply
(or divide) operation is initiated, or when the initial
result of the multiply operation is to leave the processor
10 for other destinations. Most times it will be the former
activity, and the check (as will be discussed further below)
is performed in parallel with the initiation of the
subsequent multiply operation. In this manner, by deferring
the check and then performing it in parallel with another
(multiply) operation, the time penalty is minimized, as is
the circuitry necessarily needed to make the comparison.
The comparison itself is performed by transferring
the redundant result from the CU output register 36 to one
or the other of the J or K registers 54, 56, transferring
201~6~1
the (shifted) initial result from the SPAD 64 to the other
of the J or K registers, and logically exclusive-ORing them
via the ALU 58. A ZERO detect circuit receives the output
of the ALU 58 to check the result of the comparison, issuing
an ERROR signal in the event the comparison indicates a
faulty operation. The ERROR signal is communicated to the
instruction unit 12 for handling in a manner that will be
discussed more fully below.
Turning now to Fig. 2, there is illustrated the
state diagram of the multiplier state machine 40, from which
those skilled in the art will see indicates that the design
of the multiplier state machine 40 itself is of a relatively
simple design.
As indicated in Fig. 2, the multiplier state
machine resides first in an IDLE state. After the X and Y
registers are loaded (preferably simultaneous therewith),
the multiplier state machine 40 is caused to move from the
IDLE state to a DRIVE_l state, during which the content of
the X and Y registers 26, 28 are applied to the multiplier
array 32. ~'
From the DRIVE_l state, the multipliër state
machine 40 moves to a DRIVE_2 state, during which a SEL
signal is issued by the state machine 40 to cause the
multiplier 34 to select the multiplier array 32 for
communication to the CU output register 36. At
approximately the same time, the multiplier state machine 40
will issue a load (LD) signal that causes the output of the
multiplier array 32 to be loaded in the CU output register
36. Thereafter, the multiplier state machine 40 will return
from the DRIVE_2 state to the IDLE state, where it will
remain until the next multiply instruction.
Having now described the circuitry used to
implement the present invention, and briefly discussing a
multiply operation, the deferred comparison multiplier check
can now be described in greater detail with respect to Fig.
3. For that discussion, assume that a previous multiplier
operation had been performed, resulting in (1) an initial
20 19~6~
multiplier result and (2) a redundant result. The initial
multiplier result is stored in the SPAD 64 (Fig. 1) of the
data unit 18 - shifted one bit. The redundant multiplier
result is presumed to be presently contained in the control
unit output register 36.
As Fig. 3 indicates, at step 80, the operands for
the present multiply operation are transferred to the X and
Y registers, and the multiply operation started. At the
same time, one of the operands is communicated to a scratch
register file 50b via the ALU 58, so that it is doubled
(i.e., shifted left one bit); and the other operand is also
stored in a scratch register 50b, unaltered.
At step 82, while the present multiply operation
is in progress, the result of the previous multiply
operation is checked as follows: The redundant multiplier
result is transferred from the CU output register 36 to a
scratch register 50b of the register file 50, and from there
to one of the J or K registers 54, 56 of the data unit 18.
At the same time, the (shifted) initial result is
transferred from the SPAD 64 to the other of the'J or K
registers 54, 56. Then, at step 84, the contents of the two
registers are exclusive-ORed with one another by the ALU 58,
and the result monitored by the zero detect circuit 70,
effecting the compare operation. If no error is detected,
the output of the ALU 58 will be all ZEROs, and the ERROR
signal is not asserted by the zero detect circuit. If an
error is detected, another series of steps are taken to
ensure that the error is a multiplication error, as will be
described more fully below.
Assuming the compare step 84 detects no error,
step 86 is proceeded to in which the operands of the present
multiplication are moved from their scratch registers 50b to
the SPAD 64 via the J and K registers 54, 55. In step 80,
the result of the present multiply operation is moved from
the CU output register 36 (where it has been loaded by the
multiply operation) and stored in one of the scratch
register files 50b. At the same time, the operands for the
20190~
.",..
11
redundant multiply operation of the present multiply
operation are transferred to the J and K registers, and from
there to the X and Y registers to start a redundant
multiply.
In step 90, the present multiply result is passed
through the ALU 58 to double it (i.e., shift left one bit
position) and stored back in a scratch register file 50b.
In step 92, the doubled result is moved to the SPAD 64, and
the present (unaltered) result is moved to the top of the
register stack (TOS) of the LIFO 50a, where it is available
for the processor 10. That completes the steps of the
multiplier operation.
As indicated above, the CU output register 36 is
available for use by other circuitry, delineated OTHER in
Fig. 1 coupled to one of the inputs of the multiplexer 34.
Thus, there is occasion when the data from such OTHER
circuitry is loaded in the CU output register before a check
of a multiplier result can be made, destroying the content
(i.e., the redundant result that sits in the CU output
register 36 until the compare step 84 step can b'e made).
Thus, step 84 will cause the zero detect circui-t 70 to
produce an error, and the step 84 most likely will be
followed by a series of steps, under control of the
instruction unit, to ensure that the redundant result is
compared to the shifted initial result.
Digressing for a moment, there is, as those
skilled in this art will recognize, a chance that in the
circumstance described above, the CU output register 36 will
be loaded with data by such OTHER circuitry which,
coincidentally, may have a bit configuration identical to
the initial (but possibly incorrect) multiplier result. In
this case the multiply operation will go unchecked.
However, as the skilled artisan will also readily recognize,
the chance of this occurrence, coupled with the chance of
multiplier error, is so small as to be infinitesimal -- and
an acceptable risk.
20190~
12
Thus, at step 100, the result of the present
multiply is moved from the CU output register 36 and stored
in a scratch register fi~e 50b. Next, in step 102, the
operands for the redundant multiply operation are moved from
the scratch register file 50b to the X and Y registers 26,
28, to start the redundant multiply operation. The
processor 10 waits until completion in order to ensure that
the CU output register 36 will contain the redundant result.
In step 104, the initial (doubled) result is moved from the
SPAD 64 to one of the J, K registers 54, 55, while the
redundant result is moved from the CU output register 36 to
the other of the J, K, registers, and exclusive-ORed with
one another via the ALU 58, as described above. This time,
if the zero detect circuit determines an error, and the
method of the present invention is used in a multi-processor
~fault-tolerant) system, the processor 10 can preferably be
halted, and a back-up processor (not shown) would take over
the work of processor 10. Other procedures may well be
taken in a single processor (or non-fault tolerant) system.
If no error is detected, the procedure moves back to step 86
and progresses through steps 88, 90, 94 as described above.
A word about certain of the features of the
invention: First, as has been seen, the multiplier
operation is checked by creating a redundant result from
swapped operand, one of which is shifted left (i.e.,
multiplied). A multiplication (shift left) of the one
operand is preferred over a divide (shift right) to preserve
the integrity of the shifted operand (i.e., bits are not
lost shifting left, and any overflow can be handled
specially).
Second, although the check is performed when a
subsequent multiply instruction is executed, as noted above,
there is another instance that will cause a check to be
initiated. Since the initial result of the multiply
operation is immediately available to the processor for use,
and a subsequent check indicates that the result may be
erroneous, it is relatively easy to back-up and make
2019064
13
appropriate adjustments -- as long as the use of the
(unchecked) initial result remains within the confines of
the processor 10. This back-up capability may be lost
however, if the unchecked initial result is allowed to
propagate beyond the processor 10. Thus, it is preferred
that if the unchecked initial result is to be sent beyond
the processor 10 (e.g., to secondary storage, another
processor, or the like) before a subsequent multiply
instruction causes initiation of a check, a multiplier check
is initiated before the initial result is allowed to leave
the processor. This is what is done. The steps of Fig. 3
are performed, as described above, except that no current
multiply is performed; only the check is made.
Finally, there are various other methods of
handling detection of an error in step 104 (Fig. 3) than
halting operation of the processor 10, many depending upon
the environment within which the present invention is used.
For example, as known to those skilled in this art, more
than one process can be running on the processor 10. Thus,
it could happen that a current process has initiated a
multiply operation, but before that multiply operation is
checked (e.g., by a check initiated by a subsequent multiply
or an operation that will send the initial result beyond the
confines of the processor 10) the current process is
momentarily swapped in favor of another process (or
processes). In such an instance, it would be preferable to
initiate a multiplier check before the process swap takes
place.
Thus, the deferred multiplier checking method of
the present invention is applicable to a wide variety of
computing systems, even single processor systems. And,
while a full and complete disclosure of the invention has
been disclosed, it will be evident to those skilled in this
art that modifications can be made without departing from
its essence. For example, while permutation of the operands
used in the redundant multiply operation involves swapping,
and shifting one operand, other techniques are available.
2019~6~
~ ,.", ~
14
One such other technique may involve complementing one of
the operands used in the redundant multiply operation, or
shifting one operand left while the other operand is shifted
right (if loss of resolution is not a problem). Thus, those
skilled in the art will readily recognize that one may use a
variety of combinations of the operand permutations
heretofore discussed, i.e., swapping complementation and/or
shifting (right or left).