Patent 1165005 Summary

(12) Patent:	(11) CA 1165005
(21) Application Number:	1165005
(54) English Title:	PROCESSING ELEMENT FOR PARALLEL ARRAY PROCESSORS
(54) French Title:	ELEMENT DE TRAITEMENT POUR GROUPE DE PROCESSEURS EN PARALLELE
Status:	Term Expired - Post Grant

Bibliographic Data

(51) International Patent Classification (IPC):	G06F 15/16 (2006.01) G06F 13/00 (2006.01)
(72) Inventors :	BATCHER, KENNETH E. (United States of America)
(73) Owners :	GOODYEAR AEROSPACE CORPORATION
(71) Applicants :	GOODYEAR AEROSPACE CORPORATION
(74) Agent:	MARKS & CLERK
(74) Associate agent:
(45) Issued:	1984-04-03
(22) Filed Date:	1983-04-12
Availability of licence:	N/A
Dedicated to the Public:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	No

(30) Application Priority Data:

Application No.	Country/Territory	Date
108,883	(United States of America)	1979-12-31

Abstracts

English Abstract

PROCESSING ELEMENT FOR
PARALLEL ARRAY PROCESSORS
ABSTRACT OF THE DISCLOSURE
A processing element constituting the basic
building block of a massively-parallel processor.
Fundamentally, the processing element includes an
arithmetic sub-unit comprising registers for operands,
a sum-bit register, a carry-bit register, a shift
register of selectively variable length, and a full
adder. A logic network is included with each proces-
sing element for performing the basic Boolean logic
functions between two bits of data. There is also
included a multiplexer for intercommunicating with
neighboring processing elements and a register for
receiving data from and transferring data to neighbor-
ing processing elements. Each such processing element
includes its own random access memory which communicates
with the arithmetic sub-unit and the logic network of
the processing element.

Claims

Note: Claims are shown in the official language in which they were submitted.

The embodiments of the invention in which an exclusive
property or privilege is claimed are defined as follows:
1. An array of a plurality of processing elements
interconnected with each other and wherein each processing
element comprises:
an adder;
first and second data registers connected
with and supplying data bits to said adder;
a carry register connected to said adder and
receiving therefrom data bits resulting from arithmetic
operations and which functions according to the rule:
$C \leftarrow AP \lor PC \lor AC$ where A is the state of said first register,
P is the state of said second register, and C is the
state of said carry register;
a memory; and
a data bus interconnecting said first, second,
and carry registers and said memory for the transfer
of data thereamong.
2. The array as recited in claim 1 wherein each
processing element further includes a shift register
of selectably variable length interconnected between
said first data register and said adder.
3. The array as recited in claim 1 wherein said
carry register comprises a J-K flip-flop.
4. The array as recited in claim 3 wherein each
processing element further includes a sum register inter-
connected between said shift register and said adder,
said sum register functioning according to the rule
$B \leftarrow A \oplus P \oplus C$, where B, A, P, and C are respectively the
states of said sum, first, second, and carry registers.
5. The array as recited in claim 1 wherein each
said processing element includes logic means interconnected
with said second register for performing the sixteen
logic functions possible between the data of said second
register and a data bit from said data bus.
6. The array as recited in claim 1 wherein said
second register of each said processing element is communicat-
ingly interconnected with said second register of orthogonally
neighboring processing elements within the array.

Description

Note: Descriptions are shown in the official language in which they were submitted.

PROCESSING ELEMENT FOR
PARALLEL ARRAY PROCESSORS
BACKGROUND OF THE INVENTION
The instant invention resides in the art
of data processors and, more particularly, with large
scale parallel processors capable of handling large
volumes of data in a rapid and cost-effective manner.
Presently, the demands on data processors are such
that large pluralities of data must be arithmetically
and logically processed in short periods of time for
purposes of constantly updating previously obtained
results or, alternatively, for monitoring large
fields from which data may be acquired and in which
correlations must be made. For example, this country
is presently intending to orbit imaging sensors which
can generate data at rates up to 10^13 bits per day.
For such an imaging system, a variety of image
processing tasks such as geometric correction,
correlation, image registration, feature selection,
multispectral classification, and area measurement are
required to extract useful information from the mass
of data obtained. Indeed, it is expected that the
work load for a data processing system utilized in
association with such orbiting image sensors would
fall somewhere between 10^9 and 10^10 operations per
second.
High speed processing systems and sophisticated
parallel processors, capable of simultaneously
operating on a plurality of data, have been known for
a number of years. Indeed, applicant's prior U.S.
Patents 3,800,289; 3,812,467; and 3,936,806, all
relate to a structure for vastly increasing the
data processing capability of digital computers.
Similarly, U.S. Patent 3,863,233, assigned to Goodyear
Aerospace Corporation, the assignee of the
instant application, relates specifically to a
data processing element for an associative or parallel
processor which also increases data processing
speed by including a plurality of arithmetic units,
one for each word in the memory array. However, even
the great advancements of these prior art teachings
do not possess the capability of cost effectively
handling the large volume of data previously described.
A system of the required nature includes thousands of
processing elements, each including its own arithmetic
and logic network operating in conjunction with its
own memory, while possessing the capability of
communicating with other similar processing elements
within the system. With thousands of such processing
elements operating simultaneously (massive-parallelism),
the requisite speed may be achieved. Further, since
typical satellite images include millions of
picture elements, or pixels, that can generally be
processed at the same time, such a structure lends itself
well to the solution of the aforementioned problem.
In a system capable of processing a large
volume of data in a massively-parallel manner, it is
most desirable that the system be capable of performing
bit-serial mathematics for cost effectiveness.
However, in order to increase speed in the bit-serial
computation, it is most desirable that a variable
length shift register be included such that various
word lengths may be accommodated. Further, it is
desirable that the massive array of processing elements
be capable of intercommunication such that
data may be moved between and among at least neighboring
processing elements. Further, it is desirable
that each processing element be capable of performing
all of the Boolean operations possible between two
bits of data, and that each such processing element
include its own random access memory. Yet further,
for such a system to be efficient, it should include
means for bypassing inoperative or malfunctioning processing
elements without diminishing system integrity.
OBJECTS OF THE INVENTION
In light of the foregoing, it is an object
of an aspect of the invention to provide a plurality
of processing elements for a parallel array processor
wherein each such element includes a variable length
shift register for at least assisting in arithmetic
computations.
An object of an aspect of the invention is
to provide a plurality of processing elements for a
parallel array processor wherein each such processing
element is capable of intercommunicating with at least
certain neighboring processing elements.
An object of an aspect of the invention is to
provide a plurality of processing elements for a parallel
array processor wherein each such processing element is
capable of performing bit-serial mathematical computations.
An object of an aspect of the invention is to
provide a plurality of processing elements for a parallel
array processor wherein each such processing element is
capable of performing all of the Boolean functions capable
of being performed between two bits of binary data.
An object of an aspect of the invention is to
provide a plurality of processing elements for a parallel
array processor wherein each such processing element includes
its own memory and data bus.
An object of an aspect of the invention is to
provide a plurality of processing elements for a parallel
array processor wherein certain of said processing elements
may be bypassed should they be found to be inoperative
or malfunctioning, such bypassing not diminishing the system
integrity.
An object of an aspect of the invention is to
provide a plurality of processing elements for a parallel
array processor which achieves cost-effective processing
of a large plurality of data in a time-efficient manner.
SUMMARY OF THE INVENTION
An aspect of the invention is as follows:
An array of a plurality of processing elements
interconnected with each other and wherein each processing
element comprises:
an adder;
first and second data registers connected with
and supplying data bits to said adder;
a carry register connected to said adder and
receiving therefrom data bits resulting from arithmetic
operations and which functions according to the rule:
$C \leftarrow AP \lor PC \lor AC$ where A is the state of said first
register, P is the state of said second register, and C
is the state of said carry register;
a memory; and
a data bus interconnecting said first, second,
and carry registers and said memory for the transfer of
data thereamong.
DESCRIPTION OF DRAWINGS
For a complete understanding of the objects,
techniques, and structure of the invention, reference
should be had to the following detailed description and
accompanying drawings wherein:
Fig. 1 is a block diagram of a massively-parallel
processing system according to the invention, showing the
interconnection of the array unit incorporating a plurality
of processing elements;
Fig. 2 is a block diagram of a single processing
element, comprising the basic building block of the array
unit of Fig. l;
Fig. 3, consisting of Figs. 3A-3C,
constitutes a circuit schematic of the control signal
generating circuitry of the processing elements maintained
upon a chip and including the sum-or and parity
trees;
Fig. 4 is a detailed circuit schematic
of the fundamental circuitry of a processing
element of the invention;
Fig. 5, comprising Figs. 5A and 5B, presents
circuit schematics of the switching circuitry utilized
in removing an inoperative or malfunctioning processing
element from the array unit.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENT
Referring now to the drawings and more
particularly Fig. 1, it can be seen that a massively-
parallel processor is designated generally by the
numeral 10. A key element of the processor 10 is
the array unit 12 which, in a preferred embodiment
of the invention, includes a matrix of 128 X 128
processing elements, for a total of 16,384 processing
elements, to be described in detail hereinafter.
The array unit 12 inputs data on its left side and
outputs data on its right side over 128 parallel lines.
The maximum transfer rate of 128-bit columns of data
is 10 MHz for a maximum bandwidth of 1.28 billion bits
per second. Input, output, or both, can occur
simultaneously with processing.
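The element count and the bandwidth figure quoted above follow from simple arithmetic, which may be checked with the short sketch below. The constant and function names are illustrative only and do not appear in the disclosure.

```python
# Figures taken from the description; the names are illustrative assumptions.
ROWS = 128                    # processing-element rows in array unit 12
COLUMNS = 128                 # processing-element columns in array unit 12
COLUMN_RATE_HZ = 10_000_000   # 128-bit columns transferred per second (10 MHz)


def total_processing_elements(rows: int = ROWS, columns: int = COLUMNS) -> int:
    """Number of processing elements in a rows x columns matrix."""
    return rows * columns


def peak_io_bits_per_second(lines: int = ROWS, rate_hz: int = COLUMN_RATE_HZ) -> int:
    """Peak I/O bandwidth: one bit per parallel line per column transfer."""
    return lines * rate_hz
```

With the default values, the first function gives 16,384 elements and the second gives 1.28 billion bits per second.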
Electronic switches 24 select the input of
the array unit 12 from the 128-bit interface of the
processor 10, or from the input register 16. Similarly,
the array 12 output may be steered to the
128-bit output interface of the processor 10 or to the
output register 14 via switches 26. These switches
24,26 are controlled by the program and data management
unit 18 under suitable program control. Control
signals to the array unit 12 and status bits from the
array unit may be connected to the external control
interface of the processor 10 or to the array control
unit 20. Again, this transfer is achieved by electronic
switches 22, which are under program control of
the unit 18.
The array control unit 20 broadcasts
control signals and memory addresses to all processing
elements of the array unit 12 and receives
status bits therefrom. It is designed to perform
bookkeeping operations such as address calculation,
loop control, branching, subroutine calling, and the
like. It operates simultaneously with the processing
element control such that full processing power of the
processing elements of the array unit 12 can be applied
to the data to be handled. The control unit 20 includes
three separate control units: the processing
element control unit executes micro-coded vector
processing routines and controls the processing elements
and their associated memories; the input-output
control unit controls the shifting of data through the
array unit 12; and the main control unit executes the
application programs, performs the scalar processing
internally, and makes calls to the processing element
control unit for all vector processing.
The program and data management unit 18
manages data flow between the units of the processor
10, loads programs into the control unit 20, executes
system tests and diagnostic routines, and provides
program development facilities. The details of such
structure are not important for an understanding of the
instant invention, but it should be noted that the unit
18 may readily comprise a mini-computer such as the
Digital Equipment Corporation (DEC) PDP-11/34 with
interfaces to the control unit 20, array unit 12
(registers 14,16), and the external computer interface.
As is well known in the art, the unit 18 may also
include peripheral equipment such as magnetic tape drive
28, disks 30, a line printer 32, and an alphanumeric
terminal 34.
While the structure of Fig. 1 is of some
significance for an appreciation of the overall
system incorporating the invention, it is to be
understood that the details thereof are not necessary
for an appreciation of the scope and breadth of
applicant's inventive concept. Suffice it to say at
this time that the array unit 12 comprises the
inventive concept to be described in detail herein
and that such array includes a large plurality of
interconnected processing elements, each of which has
its own local memory, is capable of performing
arithmetic computations, is capable of performing a full
complement of Boolean functions, and is further
capable of communicating with at least the processing
elements orthogonally neighboring it on each side,
hereinafter referenced as north, south, east, and west.
With specific reference now to Fig. 2, it
can be seen that a single processing element is
designated generally by the numeral 36. The processing
element itself includes a P register 38 which,
together with its input logic 40, performs all logic
and routing functions for the processing element 36.
The A, B, and C registers 42-46, the variable length
shift register 48 and the associated logic of the
full adder 50 comprise the arithmetic unit of the
processing element 36. The G register 52 is provided
to control masking of both arithmetic and logical
operations, while the S register 54 is used to shift
data into and out of the processing element 36
without disturbing operations thereof. Finally, the
aforementioned elements of the processing element 36
are connected to a uniquely associated random access
memory 56 by means of a bi-directional data bus 58.
As presently designed, the processing
element 36 is reduced by large scale integration to
such a size that a single chip may include eight such
processing elements along with a parity tree, a sum-or
circuit, and associated control decode. In the preferred
embodiment of the invention, the eight processing
elements on a chip are provided in a two row
by four column arrangement. Since the size of random
access memories presently available through large
scale integration is rapidly changing, it is preferred
that the memory 56, while comprising a portion of the
processing element 36, be maintained separate from the
integrated circuitry of the remaining structure of the
processing elements such that, when technology allows,
larger memories may be incorporated with the processing
elements without altering the total system design.
The data bus 58 is the main data path for
the processing element 36. During each machine cycle
it can transfer one bit of data from any one of six
sources to one or more destinations. The sources
include a bit read from the addressed location in
the random access memory 56, the state of the B, C,
P, or S registers, or the state of the equivalence
function generated by the element 60 and indicating
the state of equivalence existing between the outputs
of the P and G registers. The equivalence
function is used as a source during a masked-negate
operation.
The destinations of a data bit on the data
bus 58 are the addressed location of the random access
memory 56, the A, G, or S registers, the logic
associated with the P register, the input to the sum-or
tree, and the input to the parity tree.
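The one-source, many-destination behavior of the data bus may be illustrated by the following minimal sketch. It is a behavioral model only: the names `bus_value` and `bus_cycle`, the string labels for sources, and the dictionary of register states are assumptions made for illustration, and only the A, G, and S register destinations are modeled (the memory write, the P register logic, and the two trees are omitted).

```python
REGISTER_SOURCES = ("B", "C", "P", "S")
REGISTER_DESTINATIONS = ("A", "G", "S")  # other destinations are not modeled here


def bus_value(source: str, regs: dict, memory_bit: int = 0) -> int:
    """Return the single bit that the selected source places on the bus."""
    if source == "MEM":                  # bit read from the addressed RAM location
        return memory_bit
    if source in REGISTER_SOURCES:       # state of the B, C, P, or S register
        return regs[source]
    if source == "EQ":                   # equivalence of the P and G registers
        return 1 if regs["P"] == regs["G"] else 0
    raise ValueError(f"unknown bus source: {source}")


def bus_cycle(source: str, destinations, regs: dict, memory_bit: int = 0):
    """One machine cycle: one source bit is copied to every listed destination."""
    bit = bus_value(source, regs, memory_bit)
    updated = dict(regs)                 # leave the caller's state untouched
    for name in destinations:
        if name not in REGISTER_DESTINATIONS:
            raise ValueError(f"destination not modeled: {name}")
        updated[name] = bit
    return bit, updated
```

For example, `bus_cycle("B", ["A", "S"], regs)` copies the B register state into both the A and S registers in a single cycle, leaving the remaining registers unchanged.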
Before considering the detailed circuitry
of the processing element 36, attention should be
given to Fig. 3 wherein the circuitry 62 for generating
the control signals for operating the processing
elements is shown. The circuitry of Fig. 3 is
included in a large scale integrated chip which
includes eight processing elements, and is responsible
for controlling those associated elements. Fundamentally,
the circuitry of Fig. 3 includes decode
logic receiving control signals on lines L0-LF
under program control and converts those signals
into the control signals K1-K27 for application to
the processing elements 36, sum-or tree, and parity
tree. Additionally, the circuitry of Fig. 3 generates
from the main clock of the system all other
clock pulses necessary for control of the processing
element 36.
One skilled in the art may readily deduce
from the circuitry of Fig. 3 the relationship between
the programmed input function on the lines L0-LF and
the control signals K1-K27. For example, the
inverters 64,66 result in K1 = LC. Similarly, inverters
68-72 and NAND gate 74 result in K16 = L0·L1. By the
same token, K18 = L2·L3·L4·L6.
Clock pulses for controlling the processing
elements 36 are generated in substantially the
same manner as the control signals. The same would
be readily apparent to those skilled in the art
from a review of the circuitry 62 of Fig. 3. For
example, the clock S-CLK = S-CLK-ENABLE · MAIN CLK
by virtue of inverters 76,78 and NAND gate 80.
Similarly, clock G-CLK = L8 · MAIN CLK by virtue of
inverters 76,82 and NAND gate 84.
With further respect to the circuitry 62
of Fig. 3, it can be seen that there is provided
means for determining parity error and the sum-or
of the data on the data bus of all processing elements.
The data bit on the data bus may be presented
to the sum-or tree, which is a tree of inclusive-or
logic elements which forms the inclusive-or of all
processing element data bus states and presents the
results to the array control unit 20.
In order to detect the presence of processing
elements in certain states, groups of eight
processing elements are ORed together in an eight-input
sum-or tree whose output is then fed to a 2048-input
or-tree external to the chip to achieve a sum-or
of all 16,384 processing elements.
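The two-level sum-or arrangement may be sketched as follows. This is an illustrative model only: the function names are hypothetical, and grouping consecutive list entries into a chip is an assumption made for simplicity (the physical chips hold a two row by four column block of elements).

```python
PES_PER_CHIP = 8


def chip_sum_or(bus_bits) -> int:
    """Inclusive-OR of the eight data bus bits on one chip."""
    if len(bus_bits) != PES_PER_CHIP:
        raise ValueError("a chip holds exactly eight processing elements")
    return int(any(bus_bits))


def chip_outputs(all_bus_bits) -> list:
    """On-chip sum-or outputs, one per group of eight processing elements."""
    if len(all_bus_bits) % PES_PER_CHIP:
        raise ValueError("element count must be a multiple of eight")
    return [
        chip_sum_or(all_bus_bits[i:i + PES_PER_CHIP])
        for i in range(0, len(all_bus_bits), PES_PER_CHIP)
    ]


def array_sum_or(all_bus_bits) -> int:
    """External or-tree: inclusive-OR of every chip's sum-or output."""
    return int(any(chip_outputs(all_bus_bits)))
```

For 16,384 elements, `chip_outputs` yields 2048 values, which matches the width of the external or-tree described above.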
Errors in the random access memory 56 may be
determined in standard fashion by parity-generation
and checking circuitry. With each group of eight
processing elements 36 there is a parity-error flip-flop
86 which is set to a logic 1 whenever a parity error
is detected in an associated random access memory
56. As shown in the circuitry 62, the sum-or tree
comprises the three gates designated by the numeral
88 while the parity error tree consists of the seven
exclusive-OR gates designated by the numeral 90.
During read operations, the parity output is latched in
the flip-flop 86 at the end of the cycle by the M-clock.
During write operations, parity is outputted
to a parity memory through the parity-bit pin of the
chip. The parity memory comprises a ninth random
access memory similar to the elements 56. The parity
state stored at the parity bit during write operations
is exclusive-ORed with the output of the parity
tree 90 during read operations to affect the latch 86.
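The parity scheme just described may be sketched as below. The class `ParityChecker` and its method names are hypothetical; the dictionary standing in for the ninth (parity) memory and the treatment of the latch as a simple sticky bit are assumptions made for illustration.

```python
from functools import reduce
from operator import xor


def parity_tree(bits8) -> int:
    """Exclusive-OR of eight bits; reducing eight inputs takes seven XORs."""
    if len(bits8) != 8:
        raise ValueError("the parity tree takes exactly eight data bits")
    return reduce(xor, bits8)


class ParityChecker:
    """Behavioral model of the parity memory and parity-error flip-flop."""

    def __init__(self):
        self.parity_memory = {}   # stands in for the ninth random access memory
        self.error_latch = 0      # stands in for flip-flop 86

    def write(self, address: int, bits8) -> None:
        """Write cycle: store the generated parity alongside the data."""
        self.parity_memory[address] = parity_tree(bits8)

    def read(self, address: int, bits8) -> None:
        """Read cycle: compare stored parity with parity of the bits read."""
        if parity_tree(bits8) ^ self.parity_memory[address]:
            self.error_latch = 1  # once set, the latch stays set

    def clear(self) -> None:
        """Clear the error latch (the role played by K24 in the text)."""
        self.error_latch = 0
```

Because the latch is sticky, a single bad read remains visible to the control unit until the latch is explicitly cleared.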
As shown, control signal K23 determines
whether a read or write operation is being performed,
while K24 is used for clearing the parity-error
flip-flop 86. The sum-or tree 88 ORs all of the
data bits D0-D7 on the associated data bus lines of
the eight processing elements 36 of the chip. As can
be seen, both the parity outputs and the sum-or outputs
are transferred via the same gating matrix 92,
which is controlled by K27 to determine whether parity
or sum-or will be transferred from the chip to
the array control unit 20. The outputs of the
flip-flops 86 of each of the processing elements are
connected to the 2048-input sum-or tree such that the
presence of any set flip-flop 86 might be sensed.
By using a flip-flop which latches upon an error, the
array control unit 20 can sequentially disable
columns of processing elements until that column
containing the faulty element is found.
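One way the sequential search might proceed is sketched below. The description does not give the search procedure, so this is an assumption: the function name is hypothetical, exactly one column of chips is disabled at a time, and a single faulty column is presumed.

```python
def locate_faulty_chip_column(error_latches):
    """Return the index of the one chip column whose error latch is set.

    error_latches[i] is the OR of the parity-error flip-flops in chip
    column i. Returns None when no latch is set, or when disabling any
    single column does not silence the combined output (more than one
    faulty column).
    """
    def sum_or_with_disabled(disabled: int) -> int:
        # Combined output with one column's parity output disabled.
        return int(any(
            latch for index, latch in enumerate(error_latches)
            if index != disabled
        ))

    if not any(error_latches):
        return None
    for column in range(len(error_latches)):
        if sum_or_with_disabled(column) == 0:
            return column
    return None
```

The search relies on the latching behavior: because a set flip-flop stays set, the combined output remains at logic 1 until the offending column is the one disabled.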
Finally, and as will be discussed further
hereinafter, control signal K25 is used to disable
the parity and sum-or outputs from the chip when the
chip is disabled and no longer used in the system.
While the utilization of sum-or and parity
functions is known in the art, their utilization in
the instant invention is important to assist in
locating faulty processing elements such that those
elements may be removed from the operative system.
The trees 88,90, mutually exclusively gated via the
network 92, provide the capability for columns of
processing elements 36 to be checked for parity and
further provide the sum-or network to determine the
presence of processing elements in particular logic
states, such as to determine the responder to a search
operation. The number of circuit elements necessary
for this technique has been kept to a minimum by
utilizing a single output for the two trees, with
that output being multiplexed under program control.
With final attention to Fig. 3, it can
be seen that the disable signal, utilized for removing
an entire column of processing element chips from
the array unit 12, generates the signals K25,K26 for
this purpose. As mentioned above, the control signal
K25 disables the sum-or and parity outputs for associated
processing elements. Further functions of the
signals K25,K26 with respect to removing selected
processing elements will be discussed with respect
to Fig. 5 hereinafter.
With reference now to Fig. 4, and correlat-
ing the same to Fig. 2, it can be seen that the full
adder of the invention comprises logic gates 94-100.
This full adder communicates with the B register
comprising flip-flop 102 which receives the sum bit,
the C register which comprises flip-flop 104 which
receives the carry bit, and further communicates
with the variable length shift register 48 which
comprises 16, 8, and 4 bit shift registers 106-110,
flip-flops 112,114, and multiplexers 116-120.
The adder receives an input from the shift
register, namely the output of the A register 122, and an
input from the logic and routing sub-unit, namely the output
of the P register 124. Whenever control line K21 is
a logic 1 and BC-CLK is clocked, the adder adds the
two input bits from registers A and P to the carry
bit stored in the C register 104 to form a two-bit
sum. The least significant bit of the sum is clocked
into the B register 102 and the most significant bit
of the sum is clocked into the C register 104 so
that it becomes the carry bit for the next machine
cycle. If K21 is at a logic 0, a 0 is substituted
for the P bit.
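A single clocked step of the adder may be modeled as follows. The function name `adder_step` and its argument order are illustrative assumptions; the arithmetic follows the two-bit sum described above.

```python
def adder_step(a: int, p: int, c: int, k21: int):
    """One BC-CLK step: return the new (B, C) register states.

    a, p, c are the current A, P, and C register bits. When k21 is 0,
    a 0 is substituted for the P bit, as stated in the description.
    """
    p_in = p if k21 else 0
    total = a + p_in + c          # two-bit sum, value 0..3
    new_b = total & 1             # least significant bit goes to B
    new_c = total >> 1            # most significant bit goes to C
    return new_b, new_c
```

For instance, with A, P, and C all at logic 1 and K21 at logic 1, the two-bit sum is binary 11, so both B and C are loaded with a 1.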
As shown, control line K12 sets the C
register 104 to the logic 1 state while control line
K13 resets the C register to the logic 0 state.
Control line K16 passes the state of the B register
102 onto the bi-directional data bus 58, while
control line K22 transfers the output of the C regis-
ter to the data bus.
In operation, the full adder of Fig. 4
incorporates a carry function expressed as follows:
$C \leftarrow AP \lor PC \lor AC$.
The new state of the carry register C, flip-flop 104,
is equivalent to the states of the A and P registers
ANDed together, or the states of the P and C registers
ANDed together, or the states of the A and C
registers ANDed together. This carry function is
achieved, notwithstanding the fact that there is no
feedback of C register outputs to C register inputs,
because the JK flip-flop 104 follows the rule:
$C \leftarrow J\bar{C} \lor \bar{K}C$.
The new state of the C register is the complement of
the present state of the C register ANDed with the J
input or the complement of the K input ANDed with the
present state of the C register. Accordingly, in the
circuit of Fig. 4, the flip-flop 104 follows the rule:
$C \leftarrow AP\bar{C} \lor (A \lor P)C$.
The expression immediately above is equivalent to the
carry function first given.
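The stated equivalence can be confirmed by exhausting the eight input combinations, as in the sketch below. The function names are hypothetical, and the choice of J = A AND P and K = NOT (A OR P) is inferred from the flip-flop rule given in the text rather than read from the schematic.

```python
from itertools import product


def carry_majority(a: int, p: int, c: int) -> int:
    """Carry rule as first given: AP or PC or AC."""
    return (a & p) | (p & c) | (a & c)


def jk_next(j: int, k: int, q: int) -> int:
    """J-K flip-flop rule: J and not-Q, or not-K and Q."""
    return (j & (1 - q)) | ((1 - k) & q)


def carry_via_jk(a: int, p: int, c: int) -> int:
    """Carry produced by a J-K flip-flop with no feedback of C to its inputs."""
    j = a & p             # assumed J input: A AND P
    k = 1 - (a | p)       # assumed K input, so that not-K equals A OR P
    return jk_next(j, k, c)


def rules_agree() -> bool:
    """True when both forms give the same carry for all eight input states."""
    return all(
        carry_majority(a, p, c) == carry_via_jk(a, p, c)
        for a, p, c in product((0, 1), repeat=3)
    )
```

The inputs J and K depend only on A and P, which is why no feedback path from the C register output is required.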
With respect to the sum expression, the B
register, flip-flop 102, receives a sum bit which is
an exclusive OR function of the states of the A, P,
and C registers according to the expression:
$B \leftarrow A \oplus P \oplus C$.
The gate 98 generates $A \oplus P$ from gates 94 and 96, while
gate 100 exclusive-ORs that result with C to achieve
the sum expression.
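Taken together, the sum and carry rules permit two words to be added one bit per cycle, least significant bit first. The following sketch illustrates this; the helper names and the list-of-bits representation are assumptions made for illustration and do not model the shift register or clocking.

```python
def to_bits(value: int, width: int) -> list:
    """Little-endian bit list (least significant bit first)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits, p_bits):
    """Add two equal-length words bit by bit using the B and C rules.

    Returns (sum_bits, final_carry).
    """
    if len(a_bits) != len(p_bits):
        raise ValueError("operands must have the same word length")
    carry = 0                                  # C register cleared beforehand
    sum_bits = []
    for a, p in zip(a_bits, p_bits):
        b = a ^ p ^ carry                      # sum rule: A xor P xor C
        carry = (a & p) | (p & carry) | (a & carry)   # carry rule
        sum_bits.append(b)
    return sum_bits, carry
```

An n-bit addition therefore takes n adder steps, which is the cost that the variable length shift register is intended to match to the word size in use.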
The shift register of the arithmetic unit
of the processing element 36 has 30 stages. These
stages allow for the shift registers to have varying
lengths so as to accommodate various word sizes,
substantially reducing the time for arithmetic operations
in serial-by-bit calculations, such as occur
in multiplication. Control lines K1-K4 control
multiplexers 116-120 so that certain parts of the
shift register may be bypassed, causing the length
of the shift register to be selectively set at either
2, 6, 10, 14, 18, 22, 26, or 30 stages. Data bits
are entered into the shift register through the B
register 102, these being the sum bits from the adder.
The data bits leave the shift register through the A
register 122 and recirculate back through the adder.
The A and B registers add two stages of delay to the
round-trip path. Accordingly, the round-trip length
of an arithmetic process is either 4, 8, 12, 16, 20,
24, 28, or 32 stages, depending upon the states of the
control lines K1-K4 as they regulate the multiplexers
116-120.
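The eight selectable lengths follow from bypassing any combination of the 16, 8, and 4 bit sections, as the sketch below illustrates. The boolean arguments are a hypothetical stand-in for the multiplexer settings, since the mapping of K1-K4 onto the individual sections is not stated here; treating the two flip-flops 112,114 as the stages that always remain in the path is likewise an assumption consistent with the 30-stage total.

```python
from itertools import product

SECTION_STAGES = (16, 8, 4)   # shift registers 106, 108, 110
FIXED_STAGES = 2              # assumed: flip-flops 112, 114 are never bypassed
A_AND_B_DELAY = 2             # the A and B registers in the round-trip path


def shift_register_length(use_16: bool, use_8: bool, use_4: bool) -> int:
    """Stages in the shift register for a given bypass selection."""
    enabled = (use_16, use_8, use_4)
    return FIXED_STAGES + sum(
        stages for stages, on in zip(SECTION_STAGES, enabled) if on
    )


def round_trip_length(use_16: bool, use_8: bool, use_4: bool) -> int:
    """Stages around the full adder loop, including the A and B registers."""
    return shift_register_length(use_16, use_8, use_4) + A_AND_B_DELAY


def all_lengths() -> list:
    """Every selectable shift register length, in ascending order."""
    return sorted(
        shift_register_length(*choice)
        for choice in product((False, True), repeat=3)
    )
```

Enumerating the eight selections reproduces the lengths 2 through 30 in steps of four, and adding the two-stage delay gives round-trip lengths of 4 through 32.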
The shift register outputs data to the A
register 122 which has two other inputs selectable
via control lines K1,K2, and multiplexer 120. One
input is a logic 0. This is used to clear the shift
register to an all-zero state. The other input is
the bi-directional data bus 58. This may be used to
enter data directly into the adder.
The A register 122 is clocked by A-CLK, and
the other thirty stages of the shift register are
clocked by SR-CLK. Since the last stage of the shift
register has a separate clock, data from the
bi-directional data bus 58 or logic 0 may be entered
into the adder without disturbing data in the shift
register.
As discussed above, the P register 124
provides an input to the adder 50 with such input
being supplied from one of the orthogonally contiguous
processing elements 36, or from the data bus 58.
Data is received by the P register 124 from the P
register of neighboring processing elements 36 by
means of the multiplexer 126 under control of control
signals K5,K6. In transferring data to the P register
124 from the multiplexer 126, transfer is made via
inverter 128 and AND gates 130,132. The transfer
is effectuated under control of the control signal
K7 to apply the true and complement of the data to
the J and K inputs respectively of the flip-flop
124. The data is latched under control of the
clock P-CLK. As noted, the true and complement
outputs of the P flip-flop 124 are also adapted to
be passed to the P flip-flops of neighboring
processing elements 36. The complement is passed off
of the chip containing the immediate processing
element, but is inverted by a driver at the destination
to supply the true state of the P flip-flop.
The true state is not inverted and is applied to
neighboring processing elements on the same chip.
The logic circuitry 40 is shown in more detail in
Fig. 4 to be under control of control lines K8-K11.
This logic receives data from the data bus 58
either in the true state or complementary through
the inverter 130. The logic network 40, under
control of the control signals K8-K11, is then
capable of performing all sixteen Boolean logic
functions which may be performed between the data
from the data bus and that maintained in the P
register 124. The result is then stored in the
P register 124.
It will be understood that with K7=0, gates
130,132 are disabled. Control lines K8 and K9 then
allow either 0, 1, D, or $\bar{D}$ to be gated to the J input
of the P register, flip-flop 124. D is the state of
the data bus 58. Independently, control lines K10
and K11 allow 0, 1, D, or $\bar{D}$ to be sent to the K input.
Following the rule of J-K flip-flop operation, the
new state of the P register is defined as follows:
$P \leftarrow J\bar{P} \lor \bar{K}P$.
As can be seen, in selecting all four states of J
and all four states of K, all sixteen logic functions
of P and D can be obtained.
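That the four J selections and four K selections yield all sixteen functions of P and D may be verified by enumeration, as in the following sketch. The string labels for the selections and the function names are hypothetical; the encoding of K8-K11 onto those selections is not modeled.

```python
from itertools import product

CHOICES = ("0", "1", "D", "notD")   # the four values selectable for J or for K


def select(choice: str, d: int) -> int:
    """Value gated to a J or K input for a given data bus state d."""
    return {"0": 0, "1": 1, "D": d, "notD": 1 - d}[choice]


def next_p(j_choice: str, k_choice: str, p: int, d: int) -> int:
    """New P state under the J-K rule: J and not-P, or not-K and P."""
    j = select(j_choice, d)
    k = select(k_choice, d)
    return (j & (1 - p)) | ((1 - k) & p)


def truth_table(j_choice: str, k_choice: str) -> tuple:
    """Outputs for (P, D) = (0,0), (0,1), (1,0), (1,1)."""
    return tuple(
        next_p(j_choice, k_choice, p, d)
        for p, d in product((0, 1), repeat=2)
    )


def distinct_functions() -> set:
    """Set of truth tables reachable over all sixteen (J, K) selections."""
    return {truth_table(j, k) for j, k in product(CHOICES, repeat=2)}
```

The J selection fixes the result when P is 0 and the K selection fixes it when P is 1, so the sixteen selection pairs produce sixteen different truth tables.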
As discussed above, the output of the P
register may be used in the arithmetic calculations
of the processing elements 36, or may be passed to
the data bus 58. If K21 is at a logic 1, the
current state of the P register is enabled to the
adder logic. If K14 is a logic 0, the output of
the P register is enabled to the data bus. If
K15 is at a logic 0, the output of the P register
is exclusively OR'ed with the complement of the G
register 132, and the result is enabled to the
data bus. It will be noted that certain transfers
to the data bus are achieved via bi-directional
transmission gates 134,136, respectively enabled
by control signals K14 and K15. These types of
gates are well known to those skilled in the art.
The mask register G, designated by the
numeral 132, comprises a simple D-type flip-flop.
The G register reads the state of the bi-directional
data bus on the positive transition of G-CLK. Control
line K19 controls the masking of the arithmetic
sub-unit clocks (A-CLK, SR-CLK, and BC-CLK). When
K19 equals 1, these clocks will only be sent to
the arithmetic sub-units of those processing elements
where G=1. The arithmetic sub-units of those
processing elements where G=0 will not be clocked
and no register and no sub-units will change state.
When K19 = 0, the arithmetic sub-units of all
processing elements will participate in the operation.
Control line K20 controls the masking of
the logic and routing sub-unit. When K20 = 1, the
clock P-CLK is only sent to the logic and routing
sub-units of those processing elements where G=1.
The logic and routing sub-units of those processing
elements where G=0 will not be clocked and their P
registers will not change state.
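The effect of clock masking may be pictured as a per-element conditional update, as in the sketch below. The function name and the list representation are assumptions; `mask_enabled` stands for K19 (arithmetic sub-unit) or K20 (logic and routing sub-unit) at logic 1.

```python
def masked_update(current, proposed, g_bits, mask_enabled: bool) -> list:
    """Apply a broadcast operation subject to the G mask.

    current   -- register states before the clock, one per processing element
    proposed  -- states each element would take if it were clocked
    g_bits    -- G register state of each element
    Elements with G=0 are not clocked when masking is enabled, so they
    keep their current state.
    """
    if not (len(current) == len(proposed) == len(g_bits)):
        raise ValueError("all inputs must cover the same elements")
    return [
        new if (not mask_enabled or g == 1) else old
        for old, new, g in zip(current, proposed, g_bits)
    ]
```

With masking disabled every element takes its proposed state, which corresponds to all processing elements participating in the operation.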
Translation operations are masked when
control line K20 = 1. In those processing elements
where G=1, the P register is clocked by P-CLK and
receives the state of its neighbor. In those where
G=0, the P register is not clocked and does not
change state. Regardless of whether G=0 or G=1,
each processing element sends the state of its P
register to its neighbors.
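A masked translation may be sketched on a small grid as follows. This is illustrative only: the function name is hypothetical, only the case in which each element receives from its west neighbor is shown, and the value presented at the west edge of the array (`edge`) is an assumption, since the edge treatment is not described in this passage.

```python
def translate_from_west(p, g, masked: bool, edge: int = 0) -> list:
    """Each element takes the P state of its west neighbor, subject to G.

    p, g   -- lists of rows; column 0 is the west-most column
    masked -- True corresponds to K20 = 1
    Every element presents its old P state to its east neighbor whether
    or not it is itself clocked.
    """
    result = []
    for p_row, g_row in zip(p, g):
        new_row = []
        for col, (old, gate) in enumerate(zip(p_row, g_row)):
            incoming = p_row[col - 1] if col > 0 else edge
            if masked and gate == 0:
                new_row.append(old)        # not clocked: state is held
            else:
                new_row.append(incoming)   # clocked: neighbor's state is taken
        result.append(new_row)
    return result
```

Note that a masked-off element still supplies its state to its neighbor, so data can pass out of an element that is itself held.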
Brief attention is now given to the
equivalence function provided for by the exclusive
OR gate 138, which provides a logic 1 output when
the inputs thereof from the P and G registers are
of common logic states. In other words, the gate
138 provides the output function of $P \oplus \overline{G}$. This
result is then supplied to the data bus.
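The equivalence function can be checked with a short truth table. The following Python sketch models it as the exclusive OR of P with the complement of G, as described earlier for the K15 path; the function name equivalence is hypothetical.

```python
def equivalence(p: int, g: int) -> int:
    """Exclusive OR of P with the complement of G (1 when P equals G)."""
    g_complement = g ^ 1
    return p ^ g_complement


# Print the full truth table: P, G, output.
for p in (0, 1):
    for g in (0, 1):
        print(p, g, equivalence(p, g))
```

The output is a logic 1 only in the two rows where P and G hold the same state.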
The S register comprises a D-type flip-flop
140 with the input thereto under control of the
multiplexer 142. The output from the S register is
transmitted to the data bus 58 by means of the
bi-directional transmission gate 144. The flip-flop
140 reads the state of its input on the transition
of the clock pulse S-CLK-IN. When control line
K17 is at a logic 0, the multiplexer 142 receives
the state of the S register of the processing ele-
ment immediately to the west. In such case, each
S-CLK-IN pulse will shift the data in the S regis-
ters one place to the east. To store the state of
the S register 140 in local memory, control line
K18 is set to a logic 0 to enable the bi-directional
transmission gate 144 to pass the complementary
output of the S register 140 through the inverter
146 and to the data bus 58. The S register 140 may
be loaded with a data bit from the local memory 56
by setting K17 to a logic 1, and thus enabling the
data bus 58 to the input of the flip-flop 140.
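The behaviour of a row of S registers may be sketched in Python as follows. This is a simplified model under stated assumptions: the function names, the west_in value presented to the west-most element, and the per-element list of data bus bits are hypothetical, and clock edges and gate timing are not modelled.

```python
def s_clock(s_bits: list[int], k17: int,
            data_bus_bits: list[int], west_in: int = 0) -> list[int]:
    """One S-CLK-IN pulse applied to a row of S registers."""
    if k17 == 0:
        # Multiplexer selects the S register to the west, so the
        # row shifts one place to the east.
        return [west_in] + s_bits[:-1]
    # K17 = 1: each S register loads the bit on its own data bus.
    return list(data_bus_bits)


def s_to_data_bus(s_bit: int) -> int:
    """Value placed on the data bus when K18 = 0 enables the gate."""
    complementary_output = s_bit ^ 1   # complementary flip-flop output
    return complementary_output ^ 1    # inverter restores the true value
```

The second function shows why the stored bit is the true state of the S register: the complementary output is inverted once more before it reaches the data bus.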
As mentioned hereinabove, a particular
attribute of the massively-parallel processor 10
is that the array unit 12 is capable of bypassing
a set of columns of processing elements 36 should
an error or fault appear in that set. As discussed
earlier herein, each chip has two processing elements
36 in each of four columns of the array unit matrix.
The instant invention disables columns of chips
and, accordingly, sets of columns of processing
elements. Fundamentally, the columns are dropped
out of operation by merely jumping the set of columns
by interconnecting the inputs and outputs of the
east-most and west-most processing elements on the
chips establishing the set of columns. The method
of inhibiting the outputs of the sum-or tree and
the parity tree of the chips has previously been
described. However, it is also necessary to bypass
the outputs of the P and S registers which intercom-
municate between the east and west neighboring chips.
As shown in Fig. 5A, a chip includes eight
processing elements, PE0-PE7, arranged as earlier
described. The S register of each processing element
may receive data from the S register of the processing
element immediately to the west and may transfer data
to the S register of the processing element immediate-
ly to the east. When enabled, the chip allows data
to flow from S-IN0 through the S registers of PE0-PE3
and then out S-OUT3 to the neighboring chip. Similar
data flow occurs from S-IN7 to S-OUT4. When it is
desired to disable a column of chips, the output
gates of the column of chips which pass the S regis-
ter data to the neighboring east chip are disabled.
That is, control signal K25 may inhibit output gates
148,150 while concurrently enabling the bypass gates
152,154. This interconnects S-IN0 with S-OUT3 and
S-IN7 with S-OUT4, for all chips in the column.
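The effect of the S register bypass may be illustrated with a simplified Python model of one row of chips. Only the PE0-PE3 row of each chip is modelled, the function name shift_row and the west_in value are hypothetical, and the disabled flag stands for control signal K25 selecting the bypass gates, whose polarity the text does not state.

```python
def shift_row(chips: list[list[int]], disabled: list[bool],
              west_in: int = 0) -> list[list[int]]:
    """One eastward S-register shift across a row of chips.

    chips[i] holds the S bits of PE0..PE3 of chip i, west to east.
    disabled[i] is True when chip i is bypassed.
    """
    new_chips = []
    s_in = west_in  # value present on S-IN0 of the west-most chip
    for regs, off in zip(chips, disabled):
        # A bypassed chip passes S-IN0 straight to S-OUT3; an enabled
        # chip drives S-OUT3 from its east-most S register (PE3).
        s_out = s_in if off else regs[3]
        # Registers of a bypassed chip still shift; only its outputs
        # are removed from the rest of the array.
        new_chips.append([s_in] + regs[:3])
        s_in = s_out
    return new_chips
```

With the middle of three chips disabled, the bit leaving PE3 of the west chip arrives in PE0 of the east chip in a single shift, as if the disabled chip were absent.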
In Fig. 5B it can be seen that communica-
tions between the P registers of east-west neighbor-
ing chips may also be bypassed. P register data is
received from the chip to the west via inverters
156,158 and is transmitted thereto by gates 160,162.
Similarly, P register data is received from the
chip to the east via inverters 164,166 and is trans-
mitted thereto via gates 168,170. If the chip is
enabled and P register data is to be routed to the
west, then control line K6 is set to a logic 1 and
K26 to a logic 0 so gates 160,162 are enabled and
gates 168,170 are disabled. When routing to the
east, K6 is set to zero and K26 to one. To disable
the chip, K6 and K26 are both set to a logic 0 to
disable all P register east-west outputs from the
chip and K25 is set to allow the bi-directional bypass
gates 172,174 to interconnect WEST-0 with EAST-3 and
WEST-7 with EAST-4. This connects the P registers of
PE3 of the west chip with PE0 of the east chip and
PE4 of the west chip with PE7 of the east chip.
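The three control settings just described can be tabulated with a small Python sketch. It assumes that K6 alone enables the west output gates and K26 alone enables the east output gates, which is inferred from the settings given rather than stated outright. The bypass_enabled flag stands for K25, whose logic level is not given in the text, and all names are hypothetical.

```python
def p_routing_gates(k6: int, k26: int,
                    bypass_enabled: bool) -> dict[str, bool]:
    """Report which east-west P register gates are enabled on a chip."""
    return {
        "west_output_gates_160_162": k6 == 1,
        "east_output_gates_168_170": k26 == 1,
        "bypass_gates_172_174": bypass_enabled,
    }


# The three settings described in the text.
ROUTE_WEST = p_routing_gates(k6=1, k26=0, bypass_enabled=False)
ROUTE_EAST = p_routing_gates(k6=0, k26=1, bypass_enabled=False)
CHIP_DISABLED = p_routing_gates(k6=0, k26=0, bypass_enabled=True)
```

In the disabled setting neither set of output gates drives the chip boundary, and only the bypass gates connect the west and east neighbors.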
By disabling the parity and sum-or trees and
by jumping the inputs and outputs of bordering P and
S registers of the chips in a column, an entire column
of chips may be removed from service if a fault is
detected. It will be understood that while the pro-
cessing elements of the disabled chips do not cease
functioning when disabled, the outputs thereof are
simply removed from affecting the system as a whole.
Further, it will be appreciated that, by removing
columns, no action need be taken with respect to
intercommunication between north and south neighbors.
Finally, by removing entire chips rather than columns
of processing elements, the amount of bypass gating
is greatly reduced.
In the preferred embodiment of the inven-
tion, the array unit 12 has 128 rows and 132 columns
of processing elements 36. In other words, there are
64 rows and 33 columns of chips. Accordingly, there
is an extra column of chips beyond those necessary for
achieving the desired square array. This allows for
the maintenance of a square array even when a faulty
chip is found and a column of chips is to be removed
from service.
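The array dimensions can be verified with simple arithmetic, sketched here in Python. The figures are those given in the text (two rows by four columns of processing elements per chip); the constant and function names are hypothetical.

```python
PE_ROWS, PE_COLS = 128, 132          # processing elements in array unit 12
PE_ROWS_PER_CHIP, PE_COLS_PER_CHIP = 2, 4

CHIP_ROWS = PE_ROWS // PE_ROWS_PER_CHIP   # rows of chips
CHIP_COLS = PE_COLS // PE_COLS_PER_CHIP   # columns of chips


def active_pe_columns(disabled_chip_columns: int) -> int:
    """Columns of processing elements left in service."""
    return (CHIP_COLS - disabled_chip_columns) * PE_COLS_PER_CHIP
```

Removing one column of chips leaves 128 columns of processing elements, which matches the 128 rows and so preserves the square array.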
Thus it can be seen that the objects of
the invention have been satisfied by the structure
presented hereinabove. A massively-parallel processor,
having a unique array unit of a large plurality of
interconnected and intercommunicating processing
elements, achieves rapid parallel processing. A
variable length shift register allows serial-by-bit
arithmetic computations in a rapid fashion, while
reducing system cost. Each processing element is
capable of performing all requisite mathematical
computations and logic functions and is further
capable of intercommunicating not only with neighbor-
ing processing elements, but also with its own
uniquely associated random access memory. Provisions
are made for removing an entire column of processing
chips wherein at least one processing element has
been found to be faulty. All of this structure
leads to a highly reliable data processor which is
capable of handling large magnitudes of data in
rapid fashion.
While in accordance with the patent stat-
utes, only the best mode and preferred embodiment of
the invention has been presented and described in
detail, it is to be understood that the invention is
not limited thereto or thereby. Consequently, for
an appreciation of the true scope and breadth of the
invention, reference should be had to the following
claims.



Event History

Description	Date
Inactive: Expired (old Act Patent) latest possible expiry date	2001-04-03
Grant by Issuance	1984-04-03

Abandonment History

There is no abandonment history.

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
GOODYEAR AEROSPACE CORPORATION

Past Owners on Record
KENNETH E. BATCHER

Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.

Documents


Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
Claims	1994-03-24	2	46
Cover Page	1994-03-24	1	18
Drawings	1994-03-24	6	149
Abstract	1994-03-24	1	25
Descriptions	1994-03-24	20	868
