Patent 1198523 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. The text of the Claims and Abstract is posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent: (11) CA 1198523
(21) Application Number: 462507
(54) English Title: MULTI-PROCESSOR INTERCOMMUNICATION SYSTEM AND METHOD
(54) French Title: SYSTEME ET METHODE D'INTERCOMMUNICATION A MULTIPROCESSEURS
Status: Expired
Bibliographic Data
(52) Canadian Patent Classification (CPC):
  • 354/234
(51) International Patent Classification (IPC):
  • G06F 9/46 (2006.01)
  • G06F 15/16 (2006.01)
(72) Inventors:
  • NECHES, PHILIP M. (United States of America)
  • SHEMER, JACK E. (United States of America)
  • WATSON, MARTIN C. (United States of America)
  • CRONSHAW, DAVID (United States of America)
  • HARTKE, DAVID H. (United States of America)
  • STOCKTON, RICHARD C. (United States of America)
(73) Owners:
  • TERADATA CORPORATION (Not Available)
(71) Applicants:
(74) Agent: SMART & BIGGAR
(74) Associate agent:
(45) Issued: 1985-12-24
(22) Filed Date: 1982-03-25
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
250,022 United States of America 1981-04-01

Abstracts

English Abstract

Abstract of the Disclosure
A system using a sorting network to intercouple multiple processors so as to distribute priority messages to all processors is characterized by semaphore means accessible to both the local processors and the global resource via the network. Transaction numbers identifying tasks are employed in the messages, and interfaces at each processor are locally controlled to establish transaction number related indications of the current status of each task being undertaken at the associated processor. A single query to all processors via the network elicits a prioritized response that denotes the global status as to that task. The transaction numbers also are used as global commands and local controls for the flow of messages. A destination selection system based on words in the messages is used as the basis for local acceptance or rejection of messages. This arrangement together with the transaction number system provides great flexibility as to intercommunication and control.
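By way of illustration only (the following sketch and its names are not part of the patent), the priority merge summarized above can be modeled as picking the minimum of the competing messages, since the network grants priority to the lowest data content:

# Illustrative model of the network's priority merge; the message layout
# and names here are invented. Lower value = higher priority, matching the
# patent's convention of granting priority to lowest data content.

def merge_network(offers):
    live = [m for m in offers if m is not None]   # idle processors offer nothing
    return min(live) if live else None

offers = [
    (5, 12, b"status"),   # processor 0
    (2, 12, b"data"),     # processor 1: lowest command value, wins
    (5, 40, b"status"),   # processor 2
]
assert merge_network(offers) == (2, 12, b"data")
# The winning message is then delivered to every processor concurrently.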


Claims

Note: Claims are shown in the official language in which they were submitted.






THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. A computing system having a plurality of processors and being
capable of distributed asynchronous processing of multiple tasks and co-
ordinated usage of task results, comprising:
a plurality of processors operating asynchronously but generating
synchronous competing message transmissions, the messages including reference
values of varying data content;
network means coupled to the processors and responsive to the
data content of the competing messages for transferring a priority message
to all processors;
a plurality of storage means, each associated with a different
processor, for storing data relating to the reference values;
a plurality of controller means coupled to the different storage
means and responsive to the reference values and data relating thereto for
controlling processor intercommunication via the network means.

2. The invention as set forth in claim 1 above, wherein the
reference values comprise transaction numbers and the storage means comprises
a library of transaction number locations for storing digital values
representing processor readiness as to tasks identified by particular
transaction numbers.

3. The invention as set forth in claim 1 above, wherein the storage
means each include a dedicated memory section addressable by the reference
values for generating related data.

4. The invention as set forth in claim 2 above, wherein the
reference values comprise destination selection words and the dedicated memory
section comprises a file of destination selection data.


5. The invention as set forth in claim 4 above, wherein the
controller means further comprises means responsive to the destination
selection words for addressing the dedicated memory section, and means
responsive to the related data derived from the memory section for comparing
the values to determine whether the message is intended for the particular
processor.


6. The invention as set forth in claim 5 above, wherein the dedicated
memory section comprises a portion for identifying an individual processor,
a portion for identifying the processor as one of a process class, and a
portion containing hash values.


7. The invention as set forth in claim 6 above, wherein the
dedicated memory section is arranged as at least two groups of commonly
addressed storage maps providing corresponding independent outputs, and gate
means responsive to at least a portion of the destination selection word and
the outputs for detecting the existence of a match.


8. The invention as set forth in claim 7 above, wherein the system
includes a secondary storage having a subset of a data base associated with
each processor and the hash values identify the portion of the data base
associated with the processor.


9. The invention as set forth in claim 3 above, wherein the reference
values comprise a transaction number and a destination selection word, each
having a selectable data content and providing a sort criteria for the
network means, and the dedicated memory section comprises a transaction
number section and a destination selection word section.



10. A system for intercommunication between a plurality of processor
modules comprising:
means at each processor module providing serial messages comprising
initial command bytes varying in data content to characterize the message as
comprising data, status, control or response messages, and further to
characterize messages within the different types, the command bytes being
followed by subsequent message bytes;
means at each processor for transmitting messages in
synchronism following the termination of prior messages; and
means intercoupling each processor to all other processors and
including means for merging the serial messages in accordance with the data
content therein during transmission to all the processors, wherein the data
contents of the command bytes followed by the remaining message information
successively determine message priority, and data, status, control and
response messages are transferred between processor modules in accordance
with a priority protocol without prefatory communications between processors.

11. The invention as set forth in claim 10 above, wherein said
further message bytes comprise alternatively transaction numbers and
originating processor identification, and wherein said means at each
processor for transmitting messages includes means for transmitting responses
to primary messages in synchronism, the responses having higher priority
than primary messages and being merged with a priority dependent first on
response type and then originating processor identification.

12. The invention as set forth in claim 11 above, wherein said
transaction numbers and originating processor identification are fixed byte
length words following immediately after the command bytes and varying in
data content to act in determining priority.

13. The invention as set forth in claim 12 above, wherein said
primary message information further includes a fixed byte length destination
selection word of varying data content following the transaction number.

14. The invention as set forth in claim 13 above, wherein said means
intercoupling each processor comprises a network of node means arranged
in successive tiers and includes means for advancing the serial messages in
successive time frames between tiers and means for determining priority
between competing pairs of messages at each node means, and wherein the
means intercoupling each processor further includes clock means coupled
thereto for advancing the serial messages in synchronism.


15. A multiprocessor system for performing multiple data processing
tasks in different individual processors within the system without the need
for substantial software to determine routing, availability and priority
status, comprising:
means providing message packets including priority determining
commands and data;
spatially distributed network means coupled to receive the message
packets and each including a plurality of bidirectional active circuit means
merging at a return circuit means, each of the active circuit means being
responsive to the message packets to continue transfer of competing message
packets having priority whereby multiple messages concurrently on the network
are prioritized in the time and space domains and the single priority message
packet is distributed concurrently on the network; and
a plurality of processor modules, each comprising a processor
operating at a different rate than the network means and interface means
including a random access memory means, the interface means being coupled
to both the network means and the processor means and including means
responsive to the message packets for responding only to those message
packets appropriate for the individual processor, the interface means
including means for storing receive and send messages to and from the
processor, and dedicated memory sections for retaining task identifying and
destination selection information separately accessible to the network means,
whereby tasks may be automatically accepted for the appropriate processors
without requiring interprocessor communications or central determinations
of priority and routing.


16. The invention as set forth in claim 15 wherein the interface means
includes clocking means operating at a higher rate than the network means
and processor means and control means responsive to the clock means for
time multiplexing operation of the random access memory means between
processor, send and receive phases.


17. The invention as set forth in claim 16 above, wherein the
processors operate asynchronously and the clock means is synchronized with
the network means.


18. The invention as set forth in claim 15 wherein the interface means
further comprise means, including the random access memory means, responsive
to the associated processor for storing status data in the random access
memory means as to different tasks in a form accessible to the network
means.


19. The invention as set forth in claim 18 above, wherein the random
access memory means comprises a reference directory of status indication
response texts and the processor module includes means for controlling status
indication response texts stored in the random access memory means.


20. The invention as set forth in claim 19 above, wherein the
message packets include transaction numbers and the interface means
includes means for directly addressing the random access memory means to
derive status indications without communication with the processor.


21. The invention as set forth in claim 20 above, wherein the status
data includes message count data for use in controlling the flow of
message packets.

22. The invention as set forth in claim 15 above, wherein the
random access memory means includes message buffer means for assembling
pluralities of receive and send message packets for communication between
the network means and the associated processor.


23. The invention as set forth in claim 22 above, wherein the
interface means includes means for determining from the immediate contents
of the receive and send buffer sections whether the processor module
capacity for receive or send is exceeded, and indicating the nonavailability
of the associated processor in response thereto.


24. The invention as set forth in claim 23 above, wherein the
means for indicating the nonavailability of a processor comprises means
providing a circular buffer receive section in the random access memory
means, and means for detecting an overrun condition therein.


25. The invention as set forth in claim 23 above, wherein the
message buffer means for assembling send message packets comprises a send
buffer storage for message packets, and a send output message complete
vector circular buffer for providing pointers to output message locations.


26. The invention as set forth in claim 15 above, wherein the
message packets include destination control information, and the interface
means include means responsive to the destination control information, for
1) providing a response from a specific identified processor module,
2) providing responses from a specific identified group of processor modules,
or 3) providing responses from all processor modules.



27. The invention as set forth in claim 15 above, wherein the message
packets comprise serial byte sequences including an initial command word,
and including at least one control bit per byte, the control bit sequences
defining internal fields, field transitions within the message packets, and
end of message.


28. The invention as set forth in claim 27 above, wherein the
network means includes means for granting priority in accordance with
lowest data content and the processor modules include means providing
an uninterrupted sequence of 1's in at least the control bit to represent
the idle state.


29. The invention as set forth in claim 15 above, wherein the
message packet bytes further include a parity bit, the processor modules
comprise means for forcing parity errors on change of status, and the
network means comprises means for propagating the forced parity
error once.


30. A system for controlling routing of messages between processors
in a multiprocessor system comprising:
means at each processor for providing a serial message train
including destination selection data comprising alternatively specific
processor identification, identification of a process class, and hashing
values;
lookup table means at each processor including at least three
separate sections defining different destination selection criteria
comprising a specific processor identification section, a processor class
section and a hash value section;
means coupled to each of the processors for delivering to
all processors, from a number of messages concurrently transmitted by the
processors, a single priority message; and
means at each processor responsive to the destination selection
data in the single priority message for addressing the lookup table means
directly with the destination selection data to determine whether the
particular processor is being addressed, such that the processor may be
selected in accordance with one of a variety of multiprocessor modes,
including a general broadcasting mode in the absence of a destination
selection data entry.


31. The system as set forth in claim 30 above, wherein said means
for delivering a single priority message comprises an active logic
network and the lookup table means comprises a high speed random access
memory and means for accessing the memory separately from the associated
processor and the active logic network.


32. The invention as set forth in claim 31 above, wherein the means
for addressing the lookup table means comprises means for comparing the
value at the addressed location to at least a portion of the destination
selection data.


33. In a system having a number of processors communicating via
an interconnect network, the combination comprising:
a number of buffer means, each associated with a different
one of the processors and coupled to the network and the associated
processor, each buffer means including a dedicated memory reference section
having at least two different selection maps representing different modes
of selection;
the processors generating messages having selection data for
referencing the selection maps; and
a number of interface means, each associated with a different
processor and coupled to the network, and each responsive to the selection
data for referencing the selection maps in the buffer means to determine
whether the selection data specify that the message is intended for the
associated processor, and coupled to control reception of messages intended
for the associated processor.


34. The invention as set forth in claim 33 above, wherein the
interconnect network includes means for providing the messages concurrently
to the processors, and wherein each buffer means includes at least three
reference sections, one organized to denote individual processors, a second
to denote processors by class codes, and a third providing a hash map for
distributed data.

35. The invention as set forth in claim 34 above, wherein the
interface means further includes means responsive to the selection data and
the reference values for incorporating hash results in the message, and
the buffer means includes a receive message section coupled to store the
messages intended for the associated processor.

36. The invention as set forth in claim 35 above, wherein at least
the hash map section of the buffer means is divided into a number of parts
each identified by a different map selection code and commonly addressed
by a map address, and wherein the messages comprise destination selection
words containing map selection code and map address portions.

37. The invention as set forth in claim 36 above, wherein the
interface means each comprise means for addressing the buffer means with
the map address, and the interface means further comprise comparator means
responsive to the map address and map selection code portions, and the
contents of the buffer means addressed thereby, for determining whether
the message is intended for the associated processor.

38. A bidirectional branching node circuit for a message switching
system in which competing concurrent messages are received, comprising:
a bidirectional upstream port circuit and two bidirectional down-
stream port circuits each including parallel data lines and collision lines;
logic means coupled to the downstream port circuits for
transferring a single one of two competing upstream messages received at the
downstream ports to the upstream port by granting priority in accordance
with a predetermined priority rule determined by message content;
means coupling the upstream and downstream port circuits
and responsive to messages received at the upstream port for providing
downstream transmission of such messages to both downstream ports; and
control means responsive to the grant of priority between up-
stream messages for generating signals on the losing collision line to
indicate that priority has been granted a message other than the message
received at the port, and for transferring a signal received at the
upstream port on the collision line thereat.


39. The invention as set forth in claim 38 above, wherein the port
circuits each include clock lines to and from the port circuit, and the
node circuit further includes clock timing means coupled to the clock lines.


40. The invention as set forth in claim 39, wherein said control
means further comprises means responsive to the termination of a message
for resetting the control means.


41. The invention as set forth in claim 40 above, wherein the node
circuit comprises, at each port, a plurality of upstream and downstream
data lines, and at least one parity line and wherein the collision line
from the upstream port is coupled to reset the control means on the
termination of the collision signal from the upstream port.


42. A system for transferring data from multiple sources along a
chain of nodes with zero time skew between the data flow at the nodes
comprising:
a plurality of individual active circuit nodes in a sequential
array culminating at an apex node, the coupling between successive nodes
introducing no more than a given maximum time delay;
a master clock source coupled to the apex node;
a plurality of individual clock generating circuits, each
disposed at a different node and generating a controllably adjustable clock
signal at the nominal frequency of the master clock source;
means for returning clock signals from the next successive node
to an originating node;
means for comparing the clock from an individual clock
generating circuit to the retardation of the returning clock
signals to generate an error signal; and
means for controlling the individual clock generating
circuit with the error signal, whereby all clock generating
circuits maintain a uniform time relation to the master clock.


43. The system as set forth in claim 42 above, wherein the
sequential array comprises a bidirectional tree, wherein the
data is transmitted in message packets, and wherein the active
circuit nodes include means responsive to the individual clock
signals for transferring the message packets in both directions
from the tree base and along the tree to arrive back at the tree
base in synchronism.


44. The system as set forth in claim 43 above, including
in addition sources coupled to each node at the base of the tree,
and clock means at each source for subdividing the clock interval
to provide at least one other higher frequency clock in pre-
determined phase relationship therewith.


45. The system as set forth in claim 44 above, wherein the
means for providing the higher frequency clock signal includes
means defining at least three signal phases within the clock
interval.


46. The method of placing messages from a plurality of
processors in a sorted order, the messages being generated
during the completion of asynchronous processing tasks, the
method comprising the steps of:
assembling, at the individual processors, sorted
sequences of messages in suborders pertaining to transaction
numbers in the messages;
synchronously transmitting the highest priority mes-
sages from each of the processors pertaining to a given trans-
action number;
dynamically prioritizing the competing messages to
reject all but that message having highest priority;
synchronously repeating the transmission of successive
highest priority messages until all the messages in the sorted
suborder pertaining to a given transaction number have been
completely transmitted; and
successively repeating the sequence for messages per-
taining to transaction numbers in descending order of priority.


47. The method as set forth in claim 46 above further
comprising the steps of:
querying all processors concurrently as to state of
readiness pertaining to a given transaction number;
locally indicating status of readiness concurrently;
and
prioritizing the local indications to determine the
least state of readiness as to the given transaction number.


48. In a multiprocessor system in which processors transfer
message packets by broadcasting to all other processors, means
for generating messages organized as a series of bytes defining
a plurality of fields comprising:
a command field comprising one of a selected sequence
of message characterizing values of fixed byte length;
a tag field following the command field and comprising
a fixed length transaction number or processor identification
number, both of varying data content and fixed byte length; and
control bits in parallel with the bytes, the control
bit sequence defining the fields.

49. The system as set forth in claim 48 above, including
further an idle field to designate separations between messages,
the idle field being defined by control bit sequences of fixed
data content.


50. The system as set forth in claim 49 above, wherein the
message further comprises a destination selection field of fixed
length and varying data content following the tag field, and
a data field of variable length following the destination
selection field, the data field optionally including an initial
key field of varying data content for characterizing the data.


51. The system as set forth in claim 50 above, wherein the
destination selection field is subdivided into hash map address
and hash map code selection portions of fixed length.


52. The system as set forth in claim 51 above, wherein the
message further comprises an end of message field defined by a
predetermined control bit sequence.


Description

Note: Descriptions are shown in the official language in which they were submitted.


This application is a division of our Canadian patent application Serial No. 399,368 filed March 25, 1982.
The Canadian patent application Serial No. 399,367 of Philip M. Neches, filed March 25, 1982, describes and claims improved data processing systems and methods which in substantial measure are based upon and emanate from a novel active logic network. The active logic network is so configured and arranged that it merges a number of concurrent, competing message packets from different sources during message transfer on the network. A converging progression of pair comparisons is continued without interruption until a winning message packet is determined, and then distributed in the opposite direction to all sources. A coherent priority scheme is employed that encompasses a variety of message types such that responses, status and control messages can be intermingled in the transmissions. In consequence of these and other factors, the ratio of data message traffic to overhead traffic on the network is significantly increased over prior systems. Moreover, the system is broadly expandable without a concomitant increase in overhead systems or software. Use of these techniques greatly facilitates the ways in which multiple message sources, such as an array of data processors, may work cooperatively toward a common purpose. The abovementioned copending application further discloses and claims improved multiprocessor systems, including relational data base machines, arranged both as backend processors and stand alone systems.
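As a rough software analogy (not the patent's hardware), the converging progression of pair comparisons behaves like a tournament over the competing packets, assuming lower byte value wins as the claims state for lowest data content:

# Software analogy of the active logic network: competing packets are
# compared in pairs at each tier until one winner reaches the apex, then
# is distributed back to all sources. Packets here are plain byte strings.

def tournament_winner(packets):
    tier = list(packets)
    while len(tier) > 1:
        tier = [min(tier[i:i + 2]) for i in range(0, len(tier), 2)]
    return tier[0]

assert tournament_winner([b"\x05stop", b"\x02data", b"\x05ack", b"\x03nak"]) == b"\x02data"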

Reference should be made to the full co-pending application for a more detailed appreciation of the different aspects and implications of the system, and for a listing and discussion of patents that typify the present state of the art. The present application is concerned with meeting more specific needs of multiprocessor systems in general, and the referenced system in particular.
These needs pertain to the ways in which the processors intercommunicate so that they perform their own tasks efficiently and coherently within the global resource context. A versatile multiprocessor must be able to distribute subtasks in a number of ways, ascertain the status of the processors performing the subtasks, merge and sort messages, correct and revise data, and ascertain when and how resources have changed (as when processors fail or come on line). Performance of such functions has heretofore entailed the use of excessive overhead software and hardware.
As one example, it is often required in a multiprocessor system, such as a data base machine, to route messages between processors so as to select an individual destination processor, or select a class of processors, or select the destination not upon a processor identification but instead upon the portion of the data base distributed to that processor, as by a hashing technique. In some known systems a prefatory communication sequence is used, in which a linkage is established between the sending processor and one or more specific receiving processors. Repeated requests and acknowledgments may be needed to establish the linkage, and deadlock conditions that may arise must be overcome by the use of further hardware and software. In other systems a supervisory control is exercised by one processor or a bus controller, to assure that the transmitting processor is ready to transmit, that the receiving processor or processors are ready to receive, that other processors are blocked out of the linkage and that no extraneous transmissions are generated. Again, the commitment to overhead and the intricacies needed to avoid deadlocks require maintenance functions that become disproportionately large as the system is expanded (e.g. to more than 16 processors).
One objective of the present invention is to make use of the various capabilities of the Neches concept without diminution of the efficiency of the system, requiring additional software, or restricting the capacity for expansion. Another objective is to provide the needed intercommunication versatility in such a way that extremely large numbers of processors (e.g. 1024) can be used in a system without imposing overhead penalties or creating deadlock or lockout possibilities.
Another example of what is required of a modern multiprocessor system relates to how the system can assuredly determine the status of the subtasks being performed by one or a number of processors. A basic requirement is that there be an ability to interrogate a given processor as to its status, without having the interrogation affect the status or creating ambiguity as to the nature of the response. The term "semaphore" has been used in the industry to characterize the function of testing and setting a status indication without interruption. The presence of a semaphore feature is desirable, but should not be incorporated at the expense of diminished performance or increased overhead load. The determination of status in turn becomes extremely important in carrying out sort/merge operations in a multiprocessor system, because the combined results of a plurality of subtasks within a major task cannot be united until the subtasks are appropriately completed. Another requirement is that the processor report its current status and that subtasks be performed only once despite repeated interruptions and changes in the multiprocessor sequence. In most present systems processor routines can be disrupted so that significant problems can arise in these respects. It can readily be appreciated that where a plurality of processors are performing related subtasks, the sequences involved in repeated interrogations and responses as to the degree of readiness of the individual processors can require significant overhead, and that the greater the number of processors the more disproportionate is the commitment to dedicated overhead.
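In conventional terms, a semaphore's test-and-set must be one indivisible operation; a minimal sketch, with Python's lock used only to model the required uninterruptability:

# Sketch of the test-and-set notion: inspect and mark the status flag as
# one indivisible step, so an interrogation can never observe a
# half-updated state. The lock merely models hardware uninterruptability.

import threading

class Semaphore:
    def __init__(self):
        self._flag = 0
        self._guard = threading.Lock()

    def test_and_set(self):
        """Return the prior value and set the flag, indivisibly."""
        with self._guard:
            prior, self._flag = self._flag, 1
            return prior

s = Semaphore()
assert s.test_and_set() == 0      # first caller acquires
assert s.test_and_set() == 1      # later callers see it already set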
Illustrative of the above, a typical drawback in prior art multiprocessor systems is the so-called "distributed update" problem, which is the need to update information a copy of which may be stored in each of several processing elements. This information may consist of data records or of information used to control the operation of the system such that processing can be initiated, terminated, resumed, suspended, or rolled backwards or forwards without causing any required steps to be erroneously either duplicated or omitted. Solutions of the distributed update problem in prior art systems suffer significant limitations. Some consider only two processors at a time. Still others involve protocols of intercommunication which are so complex that they have resisted all efforts to date to prove their correctness in a mathematically rigorous manner.
The complexity of these protocols results from the need to implement a "global semaphore", a control bit which has the appearance that it is tested and set in every processor as one uninterruptable operation. The control bits reside in different processors, with variable delays in communication between them, and necessarily imperfect communications channels introduce noise and also increase the tendency to errors. Those skilled in the art will therefore readily appreciate the difficulty of giving the appearance of a single, uninterrupted operation when the elements which compose the operation are diverse, interruptible, cannot be accessed at the same times, and prone to failures between access attempts.
Summary of the Invention
The many message routing, mode control and status indication functions required for a complex and versatile multiprocessor system are provided in accordance with the invention by a unique combination of message organization and traffic controlling interface circuits functioning with the active logic network. Message packets, which may be of variable lengths, are organized into variable sequences of fields, the field values being selected to have significance in the prioritizing network. The interface circuits store incoming and outgoing message packets with bidirectional accessibility and moreover include sections dedicated to destination, status and control functions. Particular message fields within a broadcast primary message packet determine, solely by interface operation, whether a message packet is to be accepted by the associated processor, and the multiprocessor mode of operation that is to be used. Responses, status and control messages all function within this cooperative format, serving both a prioritizing function and a traffic control or routing function.
A further feature is that transaction identities are included in primary data, status and control messages. Using the transaction identity as a reference, the local processor maintains at the interface circuits, accessible to the network, a locally updated indication of the state of readiness of that processor as to the task bearing the transaction identity.




In conjunction with this feature, a distributed update as to the global status of the system can be derived by a single query to all processors. Each responds concurrently with its local status, and the least ready status is accorded highest priority in sorting on the network. Thus a single response from all processors is merged into one indication of global readiness.
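A minimal sketch of that single-query distributed update, with invented status codes (lower code = less ready, and hence higher priority on the network):

# Minimal sketch of the single-query global status check; status codes and
# names are invented. Lower code = less ready = higher network priority.

NOT_STARTED, WORKING, DONE = 0, 1, 2

local_status = {0: DONE, 1: WORKING, 2: DONE, 3: NOT_STARTED}  # per processor

def global_readiness(transaction_status):
    """One query: every processor answers at once; the network's priority
    merge delivers only the least-ready answer back to the requester."""
    return min(transaction_status.values())

assert global_readiness(local_status) == NOT_STARTED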
In a more specific example of a system and method in accordance with the invention, the message organization incorporates an initial command word, which includes a transaction number for primary, status and control messages, or an originating processor identification for responses. The numerical values of the initial portion of the command word, and the transaction number (or originating processor identification) all form a part of the coherent set of priority values such that the network distinguishes between and gives precedence to messages based both upon type and specific data content within the type. An optional key field used as a sorting criteria may also be employed in the message. The data field, which may be of variable length, may initiate with a destination selection word, which by its data content designates a specific individual processor for reception of the message, a process class, or a hashing value related to distributed data storage. A high speed random access memory serves as a buffer between the relatively high speed network and the substantially lower speed microprocessor, having individual ports for coupling to both the network and the processor bus. A portion of the high speed random access memory includes dedicated sections which may be directly addressed by the destination selection word, enabling a specific processor, process class or hashing value to govern destination selection. The high speed random access memory also contains a directory of numerical values representing readiness states for the processor, which the processor may reference to locally update status pertaining to a given transaction number. An inquiry from the network elicits an immediate and unambiguous response as to the local readiness state. The status responses also have predetermined rankings within the priority schedule, and merger of the responses on the network reveals the lowest state of readiness when received by the requesting processor.
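To see how such fields can double as a sort key, consider this hedged sketch; the field names and two-byte widths are invented, not the patent's actual formats (those appear in Figs. 3 and 12):

# Invented field widths; the point is that the concatenated fields form one
# sortable priority key, so the network can rank messages by type first and
# by transaction number (or originating processor) next.

import struct

def pack_message(command, tag, dsw, payload=b""):
    """command: 2-byte message type; tag: 2-byte transaction number or
    originating-processor ID; dsw: 2-byte destination selection word."""
    return struct.pack(">HHH", command, tag, dsw) + payload

a = pack_message(command=2, tag=12, dsw=0x0103, payload=b"row-1")
b = pack_message(command=5, tag=12, dsw=0x0103)  # a status message

# Plain byte comparison doubles as the priority rule: lower sorts first.
assert min(a, b) == a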
In a sort/merge operation, subtasks identified by transaction numbers are distributed as appropriate throughout various individual processors, where the input messages are stored in a circular buffer section in the high speed random access memory. Sequences of output messages available for transmission to the network are assembled by each local processor, the high speed random access memory including a circular output vector section that enables chaining of the output messages together in an ordered sequence. Thus each local processor provides a sorted sequence of output message packets pertaining to each transaction number. Single queries suffice to determine the global state of readiness as to the particular task. When all processors are ready, they concurrently and repeatedly attempt to launch their highest priority messages onto the network, which prioritizes them by transferring winning messages in sequence until that transaction number is exhausted, after which the next transaction number may be selected. A transaction number of zero establishes the non-merge mode without any further communication being needed.
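In software terms, the coordinated merge might be modeled as follows (queues and message contents are invented; each processor is assumed to hold a pre-sorted output queue for the transaction number being merged):

# Sketch of the coordinated merge: every ready processor repeatedly offers
# the head of its sorted queue; the network passes one winner per round
# until the transaction number is exhausted.

queues = {
    "p0": [(3, b"cherry"), (9, b"fig")],
    "p1": [(1, b"apple"), (7, b"date")],
    "p2": [(2, b"banana")],
}

merged = []
while any(queues.values()):
    # each processor offers its current head; the network takes the minimum
    offers = {p: q[0] for p, q in queues.items() if q}
    winner = min(offers, key=offers.get)
    merged.append(queues[winner].pop(0)[1])

assert merged == [b"apple", b"banana", b"cherry", b"date", b"fig"]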
The system provides an integrated, essentially hardware based, mechanism for logical control of globally distributed resources. The dedicated buffer system and interrelated message organization perform global semaphore, control and pacing functions within the system. The random access memory is not only used in particular ways in receiving and supplying outgoing and incoming messages, but also in assuring that overrun conditions do not occur.
Further in accordance with the invention, each group of multiple parallel data lines interconnecting the node elements in the network is associated with a control line, the signal patterns on which vary in accordance with the field groupings, to demarcate transitions between fields and also to characterize the different fields for different types of messages. A parity line may also be used, not only to identify parity errors occurring during transfer, at the node elements or at the processors, but also in conjunction with means for forcing a parity error to denote the failure or recovery of a processor in the system. This is a readily achieved but immediately effective means for generating an interrupt indication to denote a change in the global resource. Further, the network also includes unidirectional couplings between node elements in the successive tiers, transferring indication of loss of priority from the node at which a determination is made through the successive lower tiers back to the originating processor, which thereupon can terminate transmission for that message packet.
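The field demarcation idea can be illustrated with an invented encoding in which one control bit accompanies each data byte (the patent's actual control sequences differ; see Fig. 3):

# Invented encoding: one control bit rides alongside each data byte; a 1
# marks idle, and a return to 0 marks a field boundary, so a receiver can
# split fields without parsing the data bytes themselves.

def split_fields(data_bytes, control_bits):
    fields, current = [], []
    prev = 1                                   # line idles at control = 1
    for byte, ctl in zip(data_bytes, control_bits):
        if prev == 1 and ctl == 0 and current:
            fields.append(bytes(current))      # field transition detected
            current = []
        if ctl == 0:
            current.append(byte)
        prev = ctl
    if current:
        fields.append(bytes(current))
    return fields

data = b"\x02\x0c\x01\x03"
ctl = [0, 0, 1, 0]                             # the 1 separates two fields
assert split_fields(data, ctl) == [b"\x02\x0c", b"\x03"]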


The bidirectional network and a unique clocking arrangement assure simultaneous broadcasting of a message packet to all processors. A high speed clock fed in at the apex of the network is regenerated at each of the active circuit nodes with zero time skew along the network. The network may be physically distributed with active circuit elements intercoupled by short path length conductors or by transmission lines in a compact but expandable configuration.
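The clocking arrangement amounts to a phase-locked servo at each node; a toy discrete model follows (the gain and step count are arbitrary, not from the patent):

# Toy discrete model of each node's clock servo: compare the local phase
# with the clock echoed back from the next node and nudge the local
# oscillator, so all nodes settle into a fixed relation to the master.

def settle(local_phase, echoed_phase, gain=0.5, steps=20):
    for _ in range(steps):
        error = echoed_phase - local_phase    # retardation of returned clock
        local_phase += gain * error           # control the local generator
    return local_phase

assert abs(settle(0.0, 1.0) - 1.0) < 1e-3     # converges onto the echo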



According to one broad aspect of the invention there is provided a computing system having a plurality of processors and being capable of distributed asynchronous processing of multiple tasks and coordinated usage of task results, comprising:
a plurality of processors operating asynchronously but generating synchronous competing message transmissions, the messages including reference values of varying data content;
network means coupled to the processors and responsive to the data content of the competing messages for transferring a priority message to all processors;
a plurality of storage means, each associated with a different processor, for storing data relating to the reference values;
a plurality of controller means coupled to the different storage means and responsive to the reference values and data relating thereto for controlling processor intercommunication via the network means.
According to another broad aspect of the invention there is provided a system for intercommunication between a plurality of processor modules comprising:
means at each processor module providing serial messages comprising initial command bytes varying in data content to characterize the message as comprising data, status, control or response messages, and further to characterize messages within the different types, the command bytes being followed by subsequent message bytes;
means at each processor for transmitting messages in synchronism following the termination of prior messages; and
means intercoupling each processor to all other processors and including means for merging the serial messages in accordance with the data content therein during transmission to all the processors, wherein the data contents of the command bytes followed by the remaining message information successively determine message priority, and data, status, control and response messages are transferred between processor modules in accordance with a priority protocol without prefatory communications between processors.
According to another broad aspect of the invention there is provided a multiprocessor system for performing multiple data processing tasks in different individual processors within the system without the need for substantial software to determine routing, availability and priority status, comprising:
means providing message packets including priority determining commands and data;
spatially distributed network means coupled to receive the message packets and each including a plurality of bidirectional active circuit means merging at a return circuit means, each of the active circuit means being responsive to the message packets to continue transfer of competing message packets having priority whereby multiple messages concurrently on the network are prioritized in the time and space domains and the single priority message packet is distributed concurrently on the network; and
a plurality of processor modules, each comprising a processor operating at a different rate than the network means and interface means including a random access memory means, the interface means being coupled to both the network means and the processor means and including means responsive to the message packets for responding only to those message packets appropriate for the individual processor, the interface means including means for storing receive and send messages to and from the processor, and dedicated memory sections for retaining task identifying and destination selection information separately accessible to the network means, whereby tasks may be automatically accepted for the appropriate processors without requiring interprocessor communications or central determinations of priority and routing.
According to another broad aspect of the invention there is provided a system for controlling routing of messages between processors in a multiprocessor system comprising:
means at each processor for providing a serial message train including destination selection data comprising alternatively specific processor identification, identification of a process class, and hashing values;
lookup table means at each processor including at least three separate sections defining different destination selection criteria comprising a specific processor identification section, a processor class section and a hash value section;
means coupled to each of the processors for delivering to all processors, from a number of messages concurrently transmitted by the processors, a single priority message; and
means at each processor responsive to the destination selection data in the single priority message for addressing the lookup table means directly with the destination selection data to determine whether the particular processor is being addressed, such that the processor may be selected in accordance with one of a variety of multiprocessor modes, including a general broadcasting mode in the absence of a destination selection data entry.
According to another broad aspect of the invention there is provided, in a system having a number of processors communicating via an interconnect network, the combination comprising:
a number of buffer means, each associated with a different one of the processors and coupled to the network and the associated processor, each buffer means including a dedicated memory reference section having at least two different selection maps representing different modes of selection;
the processors generating messages having selection data for referencing the selection maps; and
a number of interface means, each associated with a different processor and coupled to the network, and each responsive to the selection data for referencing the selection maps in the buffer means to determine whether the selection data specify that the message is intended for the associated processor, and coupled to control reception of messages intended for the associated processor.
According to another broad aspect of the invention there is provided a bidirectional branching node circuit for a message switching system in which competing concurrent messages are received comprising:
a bidirectional upstream port circuit and two bidirectional downstream port circuits each including parallel data lines and collision lines;
logic means coupled to the downstream port circuits for transferring a single one of two competing upstream messages received at the downstream ports to the upstream port by granting priority in accordance with a predetermined priority rule determined by message content;
means coupling the upstream and downstream port circuits and responsive to messages received at the upstream port for providing downstream transmission of such messages to both downstream ports; and
control means responsive to the grant of priority between upstream messages for generating signals on the losing collision line to indicate that priority has been granted a message other than the message received at the port, and for transferring a signal received at the upstream port on the collision line thereat.
According to another broad aspect of the invention there is provided a system for transferring data from multiple sources along a chain of nodes with zero time skew between the data flow at the nodes comprising:
a plurality of individual active circuit nodes in a sequential array culminating at an apex node, the coupling between successive nodes introducing no more than a given maximum time delay;
a master clock source coupled to the apex node;
a plurality of individual clock generating circuits, each disposed at a different node and generating a controllably adjustable clock signal at the nominal frequency of the master clock source;
means for returning clock signals from the next successive node to an originating node;
means for comparing the clock from an individual clock generating circuit to the retardation of the returning clock signals to generate an error signal; and
means for controlling the individual clock generating circuit with the error signal, whereby all clock generating circuits maintain a uniform time relation to the master clock.
According to another broad aspect of the invention there is provided the method of placing messages from a plurality of processors in a sorted order, the messages being generated during the completion of asynchronous processing tasks, the method comprising the steps of:
assembling, at the individual processors, sorted sequences of messages in suborders pertaining to transaction numbers in the messages;
synchronously transmitting the highest priority messages from each of the processors pertaining to a given transaction number;
dynamically prioritizing the competing messages to reject all but that message having highest priority;
synchronously repeating the transmission of successive highest priority messages until all the messages in the sorted suborder pertaining to a given transaction number have been completely transmitted; and
successively repeating the sequence for messages pertaining to transaction numbers in descending order of priority.
According to another broad aspect of the invention there is provided, in a multiprocessor system in which processors transfer message packets by broadcasting to all other processors, means for generating messages organized as a series of bytes defining a plurality of fields comprising:
a command field comprising one of a selected sequence of message characterizing values of fixed byte length;
a tag field following the command field and comprising a fixed length transaction number or processor identification number, both of varying data content and fixed byte length; and
control bits in parallel with the bytes, the control bit sequences defining the fields.

Brief Description of the Drawings
A better understanding of the invention may be had by reference to the following description, taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram of a system in accordance with the invention including a novel bidirectional network;
Fig. 2 is a set of sequential diagrams comprising Fig. 2 and Figs. 2A to 2J showing the transmission of data and control signals in a simplified example of the network of Fig. 1;
Fig. 3 is a graphical representation of the organization of a message packet used in the system of Fig. 1;
Fig. 4 is a block diagram showing further details of the novel bidirectional network of Fig. 1 as to the active logic nodes and clock circuits employed therein;
Fig. 5 is a state diagram depicting various conditions of operation in the active logic nodes;
Fig. 6 is a timing diagram useful in explaining end of message detection in the active logic nodes;
Fig. 7 is a diagram of timing waveforms useful in explaining the operation of the clock circuits of Fig. 4;
Fig. 8 is a block diagram of a processor module, including a high speed random access memory, that may be employed in the system of Fig. 1;
Fig. 9 is a diagram of address allocation in the main RAM of a microprocessor system as shown in Fig. 8;
Fig. 10 is a block diagram of the arrangement of data within one reference portion of the high speed random access memory of Fig. 8;

Fig. 11 is a chart showing the message priority protocol used in the system;
Fig. 12 is a graphical representation of transaction number word formats;
Fig. 13 is a block diagram of interface circuits employed with each processor module in the system of Figs. 1 and 8 and comprises two sheets (Figs. 13 and 13A) that are to be placed together with Fig. 13A on the right;
Fig. 14 is a timing diagram showing various clock and phase waveforms used in the interface circuits of Fig. 13;
Fig. 15 is a block diagram showing further details of organization of memory and a system for mapping in accordance with destination selection words;
Fig. 16 is a simplified flow diagram depicting status changes on reception of an input data message;
Fig. 17 is a flow diagram depicting status changes in receiving a message and comprising two sheets and designated Fig. 17 and Fig. 17A, to be abutted with Fig. 17 on the top;
Fig. 18 is a matrix diagram showing the relationship between primary messages and generated responses and between primary messages and responsive actions;
Fig. 19 is a flow diagram depicting status changes in sending a message and comprising two sheets and designated Fig. 19 and Fig. 19A, to be abutted with Fig. 19 on the top;
Fig. 20 is a block diagram of a stand alone system in accordance with the invention;
Fig. 21, comprising Figs. 21A and 21B, is a diagram of messages stored in the high speed random access memory; and
Fig. 22 is a simplified diagram of one way in which parts of a data base may be distributed among different processes in a data base system.
Detailed Description of the Invention
DATA BASE MANAGEMENT SYSTEM
The system depicted generally in Fig. 1 is illustrative of usage of the concepts of the invention in a data base management application. Specifically, the system is configured for cooperation with one or more host computer systems 10, 12, such as one of the IBM 370 family or DEC PDP-11 family, which advantageously, for purposes of this example, operate with existing and conventional operating system and applications software. In the IBM terminology the principal intercommunication networks between the host and the data base computer are referred to as channels while in the DEC terminology the equivalent is referred to as a "UNIBUS" or "MASSBUS" or some other variant. Whether one of these systems or the main frame computer of another manufacturer is used, the channel or bus is an ohmic or logically passive pathway on which data base tasks and subtasks are presented.
The example of Fig. 1 shows a backend processor complex in association with host systems 10, 12. The system accepts tasks and subtasks from the host system, references the appropriate part or parts of an extensive data base storage, and returns appropriate processed or responsive messages in such fashion that no more than menial software management is required of the host systems, irrespective of the configuration of the backend processor complex. Consequently, the user's data base can be structured in a new multiprocessor system in which the data are organized in relational data base files of broadly expandable capacity; expansion can occur without any need to change the operating system or existing applications software resident in the user's host system. An example of a stand alone system is described below in conjunction with Fig. 20.
It will be recognized by those skilled in the art that relational data base management involves an overall function which is divisible into processing tasks that are separable, at least temporarily, because the data entries in storage are not interdependently linked by address pointers. It will also be recognized that many other data processing situations exist in which dynamic subdivision and independent processing of limited or iterative tasks can be used. Thus while this example of the invention is described in terms of the demanding and widely encountered data base management processing problem, the novel methods and implementations disclosed herein are of broad applicability elsewhere as well.
A large data management system involves both potential advantages and the inherent difficulties when multiple processors are to be used. Vast numbers of entries, ranging into the hundreds of millions, must be held conveniently and rapidly accessible in storage. With the relational data base format, a wide range of data entry and information retrieval functions can be carried out concurrently.
In the great majority of data base systems, however, it is as important to maintain the integrity of the data base as it is to process transaction data rapidly. Integrity of data must be preserved across hardware failures, power outages, and other operational mishaps. Further, the data base system must be capable of restoring the data base to a known state to recover from user errors which can include bugs in applications software code. However, the data cannot be lost or entered erroneously, and all parts of the data base that relate to a specific entry must be changed responsively, whether the event involves new data, corrections for past errors or revision of a portion of a data base.
Integrity therefore implies that a degree of redundancy
is required in the data base system, along with data roll back
and recovery operations, error detection and correction, and de-
tection o and compensation for changes in status of individual
parts of the system. The system may have to be used in a number
of different specific modes to accomplish these objectives.
It is further required of modern systems to be able to accept discretionary queries that can be complex in form and to respond if necessary in an interactive fashion. Those who seek access to the system should not, despite the complexity of the query, be required to be experts in the system. Examples of discretionary queries that may be generated pertaining to a large production operation include the following:
A. A production manager might ask not merely for an item in inventory but for an inventory aging that identifies all parts inventories which are in excess of the monthly production rate for a part whose production rate is at least 10% less than in the comparable month for the prior year.
B. A marketing manager might ask, not merely if a particular account is 90 days overdue, but for all 90 day receivables as to customers from a particularly depressed geographic area who have exceeded 120 days in the past.
C. A personnel director might ask not merely for a listing of all employees having in excess of two weeks sick leave for a given year but for a listing of all employees with more than ten years longevity who were on sick leave for more than one week during frogging season in more than two of the prior five years.
In all of these examples, the user seeks to gain an insight into a real problem confronting him professionally by correlating, in unanticipated ways, information stored in the computer. The user's experience with his own problem areas, and thus his intuition and imagination, permits the non-computer-trained professional to make facile use of a data base system that is capable of handling complex queries.
Modern multiprocessor systems seek to satisfy these many and often conflicting requirements through the use of elaborate overhead and maintenance software systems, which inherently militate against easy expansion of the system. Expandability, however, is a highly desirable concept, because a growing business or operation inherently wishes to enlarge and retain its existing data base management system and not to be forced into the adoption of a new system and software.
The multiprocessor array - In Fig. 1, a typical system in accordance with the invention includes multiple microprocessors of two principal types, herein designated the interface processor (IFP) and the access module processor (AMP). Two IFPs 14, 16, are depicted, each coupled to the I/O system of a different host computer 10 or 12. A number of access module processors 18-23 inclusive are also incorporated in what may be termed a multiprocessor array. The term "array" is used in the conventional sense of referring to a set, collection or number of processor units disposed in a generally ordered linear or matrix fashion, and does not connote what has come to be referred to as an array processor. Although only eight microprocessors have been depicted as a simplified example of the system concept, many more IFPs and AMPs can and typically will be used.
The IFPs 14, 16 and AMPs 18-23 incorporate Intel 8086 16 bit microprocessors having an internal bus and a main memory with direct memory access for peripheral device controllers. Any of a wide variety of microprocessors and microprocessor system products of different manufacturers may be utilized. The "microprocessor" is merely a specific example of one type of computer or processor that may be used in the array, because the system concept can be used to advantage with minicomputers or large computer systems where the application demands such computing power. The 16 bit microprocessor is an advantageous example of a low cost unit having substantial data processing power and a standard replaceable configuration with a wide range of available hardware and software options.
The IFPs and AMPs utilize similar active logic, control logic and interface circuitry; microprocessors; memories; and internal busses, as described below in conjunction with Figs. 1 and 8 respectively. These two processor types differ, however, in the nature of and control logic for their associated peripheral devices. Those skilled in the art will readily appreciate that other processor types with different peripheral controllers and functional assignments can be readily incorporated into this invention.
Each microprocessor has associated therewith a high speed random access memory 26 (described in conjunction with Fig. 8) which not only provides buffering of input and output messages but also coacts uniquely with other parts of the system to provide message management. Briefly, the high speed random access memories 26 function as a circular buffer for variable length input (called "receive") messages, provide sequential message output (called "send") storage, incorporate a table lookup portion for use in hash mapping and other modes, and store control information for orderly and sequenced handling of receive and send messages. The memories 26 are further used to fulfill unique roles in multiprocessor mode selection and in handling data, status, control and response message traffic. As described in detail hereafter, they are also arranged such that, based upon transaction identities in the messages, local and global status determinations and control functions are processed and communicated in highly efficient fashion. Control logic 28 (described below in conjunction with Fig. 13) at each IFP 14, 16 and AMP 18-23 is used in data transfer within the module and in the performance of overhead functions.
The IFPs 14, 16 each include an interface control 30 coupling the IFP to the channel or bus of the associated host computer 10 or 12. In the AMPs 18-23, however, the comparable unit is a disk controller 32 which may be of conventional configuration and which is employed to interface the AMPs 18-23 respectively with individually associated magnetic disk drives 38-43 respectively.
The magnetic disk drives 38-43 provide the secondary or mass storage for the data base management system. In the present example, they comprise proven commercial products, such as Winchester technology, to provide high capacity and high reliability storage with extremely low cost per byte.
The relational data base is stored on these disk drives 38-43 in scatter storage fashion as shown in simplified form in Fig. 22. Each processor and associated disk drive is assigned a disjoint primary subset of the records comprising the complete data base, so that each of n storages has 1/nth of the data base. Further, each processor also is assigned disjoint backup data subsets making up 1/nth of the data base. As seen in Fig. 22, each primary file is duplicated by a backup file at a different processor, giving two complete data bases distributed in different ways. This redundant arrangement of the primary and backup data subsets protects the integrity of the data base, because no large blocks of data or groups of relations can be substantially affected by a single failure.
Distribution of the data base is interrelated, as also shown in Fig. 22, with hashing of the various files, and incorporation of hash mapping data in the messages. The files at each processor are designated by simplified hash buckets shown as groups of binary series. The relations and tuples in a relational data base system thus can be located by the tables of relationships defined by the buckets. Hashing algorithms are used to derive the bucket assignments from keys in the relational data base system, so that expansion and modification of the data base system are readily feasible.
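For purposes of illustration only, the derivation of bucket assignments can be sketched in software form. In the following C fragment the modulo hash, the bucket count and the primary/backup assignment rule are assumptions made for this sketch, and do not represent the particular algorithm employed in the system:

    /* Illustrative sketch: mapping a key to a hash bucket and thence
     * to primary and backup processors.  The hash function, bucket
     * count and assignment rule are assumed for illustration only. */
    #include <stdio.h>

    #define N_BUCKETS    4096   /* number of hash buckets (assumed)   */
    #define N_PROCESSORS 6      /* AMPs 18-23 in the Fig. 1 example   */

    unsigned hash_key(unsigned key)       { return key % N_BUCKETS; }
    unsigned primary_amp(unsigned bucket) { return bucket % N_PROCESSORS; }
    unsigned backup_amp(unsigned bucket)  /* backup differs from primary */
    {
        return (bucket + 1) % N_PROCESSORS;
    }

    int main(void)
    {
        unsigned key = 12345;              /* an arbitrary tuple key  */
        unsigned b   = hash_key(key);
        printf("key %u -> bucket %u -> primary AMP %u, backup AMP %u\n",
               key, b, primary_amp(b), backup_amp(b));
        return 0;
    }

Because the bucket tables, rather than fixed code, carry the assignments, the data base can be redistributed by revising the tables without modifying application software.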
Selection of storage capacity is dependent upon data base management needs, transaction volume, and the processing power of the associated microprocessors. While a number of disk drives may be coupled to a single AMP, or a single disk file coupled to more than one AMP, such modifications will usually be limited to special applications. Extension of the data base typically is achieved by expanding the number of processors (and associated disk drives) in the multiprocessor array.
Active logic network - The objectives of providing orderly message packet flow and facilitating task performance are met by the use of a unique system architecture and message organization, centered upon a novel active logic network structure 50. This structure comprises, relative to the outputs of the microprocessors, a converging, ascending hierarchy of bidirectional active logic nodes 54. The nodes 54 comprise three port bidirectional circuits which may be described as forming a tree network, with couplings to the microprocessors 14, 16 and 18-23 being made at the base of the tree.
It will be recognized by those skilled in the art that nodes may be constructed where the number of logical sources is greater than 2, say 4 or 8, where the greater number of source inputs may also be resolved in the same time by the addition of more combinatorial logic.
For convenience of reference, all nodes (N) in the first tier are designated by the prefix I, in the second tier by the prefix II, and so forth. Individual nodes within a tier are designated by the subscripts 1, 2 and so forth, so that, for example, the fourth node in the first tier may be referred to as IN4. At the up-tree (or upstream) side, there is a single port, called the C port, which is coupled to one of the two down-tree ports, called the A and B ports, of a node in the next higher tier. The tiers converge to an uppermost or apex node 54a representing a convergence and recirculating means which directs upstream (up-tree) messages back in the downstream (down-tree) direction. Two tree networks 50a, 50b are utilized, the nodes and interconnections of the two networks being disposed in parallel to provide the redundancy desired for a large scale system. Inasmuch as the nodes 54 and the networks are identical, only one need be described.
To aid in visualization, it should be understood first that multiple message packets in the form of serial signal trains are or can be launched concurrently into the active logic network 50 on the couplings for many of the microprocessors. The active logic nodes 54 each function in binary fashion in determining priority between two colliding packets, using the data contents of the message packets themselves. Further, all nodes 54 in a network are under the command of a clock source 56 arranged with the nodes 54 in such fashion as to synchronously advance the message packet trains toward the apex node 54a. In this manner each succeeding byte or other incremental segment of a serial train progresses to the next tier at the same time as the corresponding bytes of other messages also advance along other paths in the network 50.
A prioritized sort of competing signal trains takes place for message packets moving up-tree, ultimately to select a single message train to be redirected from the apex node 54a downstream. Because of the system organization, the decision as to ultimate priority need not occur at any particular point in the message packet, so that message transmission can be carried forward without requiring any more than the binary decisions between two colliding packets that are being made at the individual nodes 54. As a result the system provides message selection and data transfer in the space and time domains but does not delay message transmissions for purposes of gaining control of the bus, identifying sending or receiving processors, or performing handshaking operations between processors.
Further, it is important to recognize that when several processors send identical packets at the same time and the common packet is successful, it will appear that all such sending processors were successful. This property is extremely useful in exercising efficient control of a large multiprocessor complex, because of the savings in time and overhead.




The nodes 54 also operate in bidirectional fashion to enable unimpeded downstream distribution of the message packets. At a given node 54, downstream messages received at the port C on the up-tree side are distributed to both ports A and B on the down-tree side and then transmitted on to both associated nodes at the next lower tier. Under the control of the common clock circuit 56, the message packet advances synchronously down-tree to be broadcast to all microprocessors simultaneously, enabling one or many of the processors to carry out the desired processing task or to accept a response.
The network 50 has a high data transfer rate in comparison to the data transfer rates of the microprocessors, typically being a multiple greater than two. In this particular example the network 50 has a byte clock interval of 120 nanoseconds and the data transfer rate is five times that of the microprocessor. Each node 54 is coupled, at each of its three ports, to the associated node port in the next tier, or to the microprocessor, by a set of data lines (here 10 in number) and by control lines (here 2 in number) devoted to clock and collision signals respectively. The data and clock lines run in pairs, with separate lines for the up-tree and down-tree directions. The collision line propagates down-tree only. The connections form a full duplex data path, with no delay needed to "turn around" the drive sense of any line.
Referring now to Fig. 3, the 10 data lines comprise an 8 bit byte, designated as bits 0-7 inclusive, occupying 8 of the 10 data lines. Another line, designated C, is a control line, carrying a control sequence that is used to characterize different parts of the message packet in particular ways. The 10th bit is used for odd parity in the present example. Practitioners skilled in the art will recognize that the system can readily be operated with more or fewer bits in the data path.
The byte sequences are arranged in successive fields, basically divided into command, key, destination selection, and data fields. As is discussed further below, a message may utilize only a single field, and concludes with a detectable End of Message code. An intervening idle field between messages is designated by an unbroken sequence of 1's on the C line, as well as on lines 0-7, and is transmitted whenever no message packet is available. The parity line is also employed in a unique fashion to communicate a change of status of an individual processor.
The idle state is an intermediate state and is not a part of the message packet, which typically begins with a 2 byte command word that includes a tag in the form of a transaction number (TN) for data messages or an originating processor ID (OPID) for response messages. The transaction number has many levels of significance in the system and serves as the basis for a number of functional communications and controls. The packet may thereafter contain any or all of a variable length key field and a fixed length destination selection word (DSW) as the first part of a variable length data field. The key field serves the purpose of providing sorting criteria, where messages are otherwise substantially identical. The DSW provides the basis for a number of special functions and also merits particular attention, along with the TN.
The system operates with the interfaces in word synchronism, so that the first bytes of the command words are provided to the network 50 concurrently by all processors which have a packet to transmit. The data contents of the successive fields are used by the network in sorting on a binary basis at each node, with the lowest numerical value being given priority. Taking bit C as the largest quantity and bit 0 as the smallest in the successive data bits, the sorting priority order is:
1. first arrival at the network 50;
2. lowest command code (word);
3. lowest key field;
4. shortest key field;
5. lowest data field (including the destination selection word);
6. shortest data field.
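Because the lowest value wins byte-serially at each node, the entire ordering above follows from a simple pairwise comparison. The following C sketch is illustrative only (the framing is simplified, and the hardware performs the comparison one byte per clock within the nodes); it compares two serialized messages the way the network effectively does:

    /* Illustrative pairwise priority compare: returns the winning
     * message.  Shorter messages win ties because the idle code
     * (an unbroken sequence of 1's = 0xFF) exceeds any message byte. */
    #include <stddef.h>

    #define IDLE 0xFF

    const unsigned char *winner(const unsigned char *a, size_t alen,
                                const unsigned char *b, size_t blen)
    {
        size_t n = (alen > blen) ? alen : blen;
        for (size_t i = 0; i < n; i++) {
            unsigned char ca = (i < alen) ? a[i] : IDLE; /* pad with idle */
            unsigned char cb = (i < blen) ? b[i] : IDLE;
            if (ca < cb) return a;   /* lowest data content has priority */
            if (cb < ca) return b;
        }
        return a;                    /* identical packets: both "win"    */
    }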
For purposes of this general overview it should be noted primarily that when a priority decision has been made at a node 54, a collision indication (referred to as ACol or BCol) is returned along the path from which the losing transmission was received. This indication enables the transmitting microprocessor to recognize that the network 50 is busy with a higher priority transmission, so that the transmission is terminated and must be retried at a later time.
A simplified example is shown in the various representations of Fig. 2 of the manner in which the network 50 operates with the high speed random access memories in a tree using four different microprocessors, specifically an IFP 14 and three AMPs 18, 19 and 20. Ten subfigures 2A, 2B, ... 2J each correspond to one of ten successive time samples, from t = 0 to t = 9, to show the distribution of different simplified (four character) serial messages from each of the microprocessors within the network at each of these points in time, and the communications between ports and microprocessors at the different times. The diagram labeled simply Fig. 2 shows the state of the system prior to the beginning of signal transmission. In these separate views, the null or idle state requires a transmission designated as idle. With the convention of lowest data content having priority, the message packet "EDDV" from AMP 19 in Fig. 2A should be the first to be transmitted through the system. These messages are retained, as described in greater detail below, in the high speed random access memories (sometimes H.S. RAM) 26 in the microprocessors. The H.S. RAMs 26 have input and output subdivisions that are depicted generally in Fig. 2, with the packets being arranged in FIFO vertical order in the output portion at t = 0, thus being available for transmission, as indicated by the cursor arrow in H.S. RAM 26. At this point in time all transmissions in the network 50 indicate the null or idle state.
At t = 1, however, as designated in Fig. 2B, the first byte of each of the message packets is launched into the network 50 concurrently, with all nodes 54 still returning the idle state indications and all transmissions above the first tier also being in the idle state. In the first clock interval, the initial bytes of the messages are set into the lowest tier nodes, IN1 and IN2, so that at t = 2 (Fig. 2C) contentions have been resolved and both upstream and downstream transmissions continue. Node IN1 has received an "E" on both input ports and is transmitting this upstream to the next tier, indicating the undecided state downstream to both sending processors. At the same tier, however, node IN2 has determined collision priority between the "E" from processor 19 and the "P" from microprocessor 20, in favor of the former, thus coupling port A to up-tree port C and providing the BCol signal back to microprocessor 20. As the BCol signal is returned toward the microprocessor 20, the IN2 node in effect locks the A input port to the C output port, so that the serial train from microprocessor 19 is transmitted on to the apex node IIN1.
At the IN1 node the first two characters are both "E", and no decision can be made at this node at time t = 2, as shown in Fig. 2C. At t = 3 (Fig. 2D), moreover, the common initial character "E" from the three microprocessors 14, 18, and 19 reaches the IIN1 apex node and is redirected toward the downstream direction, as the second character "D", also common to all messages, is transmitted toward apex node IIN1. Node IN1 cannot yet make a decision at this time, but the third characters, "F", "E" and "D" from the successive microprocessors 14, 18, and 19 respectively are in the course of transmission to that node. Reception of the BCol signal at the microprocessor 20 designates that its contention for priority has been lost, and it then and thereafter transmits only the idle indication. The cursor arrows in the output buffers show that the microprocessor 20 has been returned to its initial state but that the other microprocessors continue to send successive characters. Thus at t = 4 (Fig. 2E) the significant events are the decision for the B port at node IN1 and the return transmission toward the first node tier of the initial character ("E") on all lines. The next collision is indicated at t = 5 (Fig. 2F), with the B port of node IIN1 winning contention and ACol being generated.
Broadcasting of the serial signal train in the downstream direction continues through successive clock times, and at time t = 6 (Fig. 2G) the initial message character is set into the input portions of all H.S. RAMs 26. Concurrently it should be noted that the earlier priority determination at node IN1 is now overridden by the ACol indication from the higher tier node IIN1 when the third character ("E") from the microprocessor 18 loses in contention with the third character ("D") from the microprocessor 19. As the cursor arrows show in Fig. 2H, microprocessors 14, 18 and 20 have returned to their initial states, and winning microprocessor 19 previously completed its full transmission at time t = 4. All input buffers are successively loaded with the priority message "EDDV" as seen in Figs. 2H, 2I and 2J.
At t = 8 (Fig. 2I), the message has run out of the first tier, and the apex node IIN1 has been reset at t = 7, because only idle signals are in contention, as the last downstream character is transferred to the microprocessors. At t = 9 (Fig. 2J) the nodes IN1 and IN2 in the first tier are reset, and all the losing microprocessors 14, 18 and 20 now contend again for priority on the network by emitting the first message character when the network is again indicating idle. In practice, as described hereafter, acknowledgment signals are transmitted to the winning microprocessor(s), but this is not required for the most general case of the invention.
The message, once broadcast to all microprocessors in this manner, may be used by any or all as required. This depends upon the mode of operation and the functions being performed, which include many variations.
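The contention just traced can also be expressed in software form. The sketch below mirrors the two-tier tree and the "EDDV" example of Fig. 2; the full contents of the three losing messages are assumptions (only their leading characters are given in the text), and the byte-synchronous node hardware is simplified to a whole-message comparison, which yields the same result:

    /* Illustrative two-tier arbitration over the Fig. 2 example.
     * Each "node" forwards the lower of its two inputs; the apex
     * result is what the network broadcasts back down. */
    #include <stdio.h>
    #include <string.h>

    static const char *node(const char *a, const char *b)
    {
        return (strcmp(a, b) <= 0) ? a : b;   /* lowest value wins */
    }

    int main(void)
    {
        const char *ifp14 = "EDFV";   /* fourth characters assumed    */
        const char *amp18 = "EDEV";
        const char *amp19 = "EDDV";   /* per the text, the winner     */
        const char *amp20 = "PDDV";   /* begins with "P" per Fig. 2C  */

        const char *in1  = node(ifp14, amp18);  /* first tier nodes   */
        const char *in2  = node(amp19, amp20);
        const char *apex = node(in1, in2);      /* apex node IIN1     */

        printf("broadcast winner: %s\n", apex); /* prints EDDV        */
        return 0;
    }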
GLOBAL INTERCOMMUNICATION AND CONTROL
The foregoing example of the manner in which the network prioritizes a given message out of a group of contending messages pertains to the transfer of primary data messages. A complex multiprocessor system must however use many other types of communications and commands to have the efficiency and versatility now required. The principal functions to be supplied encompass, in addition to primary data transfer, what may broadly be termed multiprocessor modes, message acknowledgments, status indications and control signals. The following section provides a general overview from the global, or multiprocessor system, standpoint as to how different modes and messages coact with the prioritized sorting and communication network. Reference should be made to Figs. 8 and 13 and the accompanying descriptions hereafter for a more detailed understanding.
In the general distribution or broadcasting mode, messages are simultaneously delivered to all processors without specific delineation of one or more recipients. This mode is typically used for responses, status queries, commands, and control functions.
Where there is to be a delineation as to the recipient, the destination selection information within the message packet itself provides criteria for local acceptance or rejection of the packet. For example, interface logic in the receiving processor modules identifies whether the data is in range for their particular processor according to map information stored in the high speed RAM 26. A variety of selection criteria can be readily implemented by means of various settings of the map bits in the high speed RAM, including selection of a specific recipient processor, portion of the data base stored ("hashed"), logical process type ("class"), etc. The use of broadcasting with local access control is of particular benefit for a data base management system, inasmuch as minimal overhead software is needed to gain access to any part of the widely dispersed relational data base or to dispersed local copies of any of a number of globally known logical processes. The system is thus capable of specifically selecting a single destination or a class of resources as the destination for a message.
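The broadcast-then-filter discipline may be pictured as follows. This is a behavioral C sketch only; the map layout and names are assumed for illustration (the actual DSW format and map organization are detailed later in conjunction with Figs. 8 and 15):

    /* Behavioral sketch of local accept/reject on a broadcast packet.
     * Every interface sees every packet and consults its own locally
     * stored map bits to decide whether the packet is "in range". */
    #include <stdbool.h>

    enum map_kind { MAP_DPID, MAP_CLASS, MAP_HASH };  /* selection modes */

    struct dsw {               /* destination selection word (simplified) */
        enum map_kind kind;    /* which selection map is consulted        */
        unsigned      address; /* map address within the selected map     */
    };

    struct iface {
        bool dpid_map[4096];   /* "this packet is for me specifically"    */
        bool class_map[4096];  /* "I belong to this process class"        */
        bool hash_map[4096];   /* "I hold this hash bucket"               */
    };

    bool accept_packet(const struct iface *i, const struct dsw *d)
    {
        switch (d->kind) {
        case MAP_DPID:  return i->dpid_map[d->address];
        case MAP_CLASS: return i->class_map[d->address];
        case MAP_HASH:  return i->hash_map[d->address];
        }
        return false;
    }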





Also, high level data base inquiries often require cross-referencing between different portions of the data base, and consistent reference to a given task. The TN incorporated in the messages provides this global transaction identity and reference, among other features. Many tasks may be worked on concurrently by the asynchronous local processor modules, and each task or subtask has its appropriate TN. Using various combinations of the TN, DSW and commands, virtually infinite flexibility is achieved. An extended sort/merge operation can be undertaken on a large number of tasks that are asynchronously assigned and processed. TNs can be assigned and relinquished and merges can be started and stopped. Certain messages, such as continuations, can have priority over other transmissions. Using the TNs and local processor updating of status as to the TNs, one query can determine the status of the global resource as to a given TN. A distributed update can also be accomplished in one communication. The present system enables all these functions to be performed without extending the software or materially increasing the overhead load.
As a consequence of the invention, multiprocessor systems with much larger numbers of processors than feasible in the prior art can be operated with high effectiveness against problem tasks. Because of the present low cost of microprocessors, systems of high performance in a problem domain, and not just in "raw" power, can be realized at low cost.
A coherent priority protocol that encompasses all message types, and various subtypes, embraces all the different messages that are applied to the network. Although responses, status and control messages are of a different form than the primary data messages, they also use the contention/merge operation of the network and thus are prioritized during transfer. Response messages in the present system are positive acknowledgment (ACK), negative acknowledgment (NAK), or an indication that the processor does not have the resources to process the message meaningfully ("not applicable processor" - NAP). The NAK response may be any of several different types, indicating a locked, error or overrun condition. Because an originating processor or processors require such responses after termination of a message transmission, the responses have a higher priority level than primary data messages.
The present system also employs SACK (status acknowledgment) messages that denote the readiness state of a local processor with respect to a particular task or transaction. Such SACK responses are locally updated and held accessible to the network. They provide, in conjunction with the merge operation of the network, a single query global status report for a given task or transaction. Because the status responses are in accord with the priority protocol, the lowest data content response automatically gains priority and establishes the least ready status as the global system state for a transaction number in one uninterruptible operation. The SACK indications also are used in conjunction with certain primary messages to implement various protocols, such as system initialization and lockout operations.
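The single query global status report is simply the network's lowest-value-first sort applied to the SACK codes. A minimal sketch, assuming an ascending encoding in which a lower number denotes a less ready state (the actual seven-level assignment is that shown in Figs. 11 and 12):

    /* Illustrative global status determination: the network merge of
     * the SACK responses is equivalent to taking the minimum of the
     * readiness codes.  The encoding below is assumed. */
    enum sack_level {
        SACK_BUSY = 0, SACK_WAITING, SACK_SEND_READY,
        SACK_RECEIVE_READY, SACK_DONE, SACK_NON_PARTICIPANT,
        SACK_INITIAL
    };

    enum sack_level global_status(const enum sack_level local[], int n)
    {
        enum sack_level g = local[0];
        for (int i = 1; i < n; i++)  /* lowest data content wins the   */
            if (local[i] < g)        /* contention, so the least ready */
                g = local[i];        /* state becomes the global state */
        return g;
    }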
The priority protocol definition for the various message types begins with the command code, using the initial 6 bits of the command word that starts each message and response, as shown in Fig. 11. An adequate range of distinctions as to message types and subtypes is available, although more could be used. Here the SACK response differentiates seven different status levels (and provides a basis for prioritizing as well), as can be seen by reference to Fig. 11. For responses these first 6 bits are followed by the tag in the form of a 10 bit OPID (see Fig. 3). Both the TN and OPID can serve as further sorting criteria because of their differing data contents within the tag.
After each primary message has been transmitted through the network, the interface section of every processor generates a response message, even if it is only a NAP. The responses also contend on the network, and the single or common winning response message is broadcast to all processors. Losing message packets are retried later, synchronously, after a minimal delay so that the network is substantially constantly in use. Where a number of processors provide an ACK response, the responses are sorted by the OPID.
As a consequence of the invention, tasks can be started, stopped, controlled, and interrogated in synchronism in a very large number of physical processors with a minimum of overhead. This permits the raw power of a large number of processors to be effectively applied to problem-state processing with a minimal diversion of that power to coordination and control. The overhead of coordination and control is a fundamental limitation on the efficacy of any distributed processing system.
Different types of control communications are employed where the purpose is globally (i.e. network) oriented. Thus, Stop Merge, Status Request, Start Merge, and certain task assignment and relinquishment messages have the same format as data messages and are also referred to herein as primary messages. These control messages also include the TN and have their places in the priority protocol, as is later discussed relative to Figs. 10 and 11.






The term "global semaphore buffer system" has been adopted to connote the fact that the high speed random access memory 26 and control logic 28 shown in Fig. 1 also play a significant role in both multiprocessor mode selection and bidirectional communication of status and control indications. The global semaphore buffer system provides duality of access, in that both the high speed network structure 50 and the slower speed microprocessors can reference a message, response, control or status indication in the memory 26 without delay or requiring direct communication with each other. To this end the control logic 28 time multiplexes the memory 26 to the network 50 and to the microprocessor in interleaved word cycles, creating in effect different ports having common access to the memory 26. The global resource or network 50 and the microprocessors can use the transaction number as an address locater to a portion of the memory 26 devoted to transaction status. At the local level, the status of a subtask relating to a given transaction, covering all useful states, is updated in the memory 26 under control of the microprocessor and locked by the control logic 28 at the buffer system. One of seven different readiness states is used, the entries conveniently being derived from a different dedicated portion of the memory 26. Upon a query from the network, the status of the processors is communicated (the "semaphore" is read) and prioritized in the network, with the least complete readiness state taking priority. This arrangement provides an immediate hardware response from all processors to a query. Thus it can be known without delay or the use of software whether all of the distributed subtasks of a given task have been accomplished. In the instant system, moreover, any communicating processor module can assign an available transaction number for use with the messages and in each global semaphore buffer system.
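Viewed in software terms, the transaction status section behaves as a TN-indexed table of one-word entries, read on behalf of the network and written by the local microprocessor. The layout and names below are assumptions of this sketch and not the actual H.S. RAM format:

    /* Sketch of the TN-addressed transaction status section.  The
     * network side reads an entry by TN during a status query; the
     * microprocessor side updates its own entry as the local subtask
     * progresses. */
    #define N_TN 4096                   /* TN address space (assumed) */

    static volatile unsigned char tn_status[N_TN]; /* "volatile" models
                                                      the dual-port,
                                                      time-multiplexed
                                                      access            */

    /* Microprocessor side: record local readiness for a transaction. */
    void set_local_status(unsigned tn, unsigned char state)
    {
        tn_status[tn] = state;       /* word write locked by logic 28 */
    }

    /* Network side: a status query simply reads the semaphore; the
     * network merge then selects the least ready state globally.    */
    unsigned char read_semaphore(unsigned tn)
    {
        return tn_status[tn];
    }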




A good example of this integrated use of transaction identity and status indication is presented by a complex merge operation, in which each of a number of processors is called upon to place in order all messages pertaining to a given criterion. In prior art systems each processor would have to receive and complete its tasks and then communicate the results to some "master" processor, which would do the final merge operation. That master processor thus constitutes a significant bottleneck to system throughput.
When the global readiness state establishes that all affected processors are ready, the messages of highest priority from the memory 26 at each processor are entered on the network concurrently and prioritized during merger as previously described. Successive retries with groups of messages generate a serial train of messages of descending priority, ending with the lowest for that transaction number. Specific command messages provide the system with the ability to suspend and resume the merge operation partway through, so that the network 50 can be shared by a number of copending merge operations and thus make most effective utilization of the resources of the system.
At any given time, therefore, all active processors connected to the network 50 can be working asynchronously on messages pertaining to different transaction numbers. When referenced to the same or "present" transaction number by a status query, all respond synchronously with one of the available status levels. For example, the START MERGE message tests the global semaphore represented by a particular transaction number, and if the global state is ready (SEND READY or RECEIVE READY), the present transaction number (PTN) is set to the TN conveyed in the START MERGE message. (If the global state is not ready, the PTN reverts to a value of TN0.)
A STOP MERGE message also resets the present transaction number to 0. TN0 is thus utilized as the "default" transaction number used for single processor to single processor (point-to-point) messages. In another sense, it identifies the "non-merge" mode of operation.
The global intercommunication system uses the message organization shown in Figs. 3A, 3B, 3C and 11, and the high speed random access memory 26 organization shown in Figs. 8 and 10. More detailed examinations are made below in conjunction with Figs. 5, 7, 9 and 13.
In Figs. 3A-3C and Fig. 11 it can be seen that command codes for the responses range from 00 to 0F (hexadecimal) and that those for primary messages range from 10 (hexadecimal) to some higher value. Thus, responses take priority over primary messages, lowest value first, in the sequence shown in Fig. 11.
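The START MERGE / STOP MERGE handling of the PTN described above reduces to a small state update at each interface. A minimal sketch, with the readiness codes assumed as in the earlier status sketch:

    /* Sketch of present transaction number (PTN) handling for the
     * merge control messages.  TN0 is the point-to-point default. */
    #define TN0           0
    #define SEND_READY    2    /* readiness codes assumed for sketch */
    #define RECEIVE_READY 3

    static unsigned ptn = TN0;  /* present transaction number        */

    /* START MERGE: test the global semaphore for the conveyed TN and
     * adopt it as the PTN only if the global state is ready. */
    void start_merge(unsigned tn, unsigned global_state)
    {
        if (global_state == SEND_READY || global_state == RECEIVE_READY)
            ptn = tn;           /* merge proceeds under this TN      */
        else
            ptn = TN0;          /* not ready: revert to the default  */
    }

    /* STOP MERGE: unconditionally return to the non-merge mode. */
    void stop_merge(void)
    {
        ptn = TN0;
    }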
One dedicated section of storage in the high speed RAM memory 26'' (Fig. 8), designated "transaction numbers", is used for storage of the word formats (the seven readiness states, an Assigned TN and an Unassigned TN state) of Fig. 12. Other dedicated portions of the memory 26'' include a circular buffer for input (receive) messages and an output message space. Another separate section of the memory 26'' is used as a message complete vector section, in which pointers can be placed to completed output messages so that output message space can be used efficiently.








It should be appreciated, therefore, that while the queuing and data buffering functions of the memory 26 and control logic 28 are of importance, the multiple coactions by which global transactions are dispersed and manipulated in relation to the individual processors are uniquely significant.
ACTIVE LOGIC NODES
The active logic nodes 54 of Fig. 1 are alike, in both of the redundant networks, except that the recirculation node 54a at the apex of each network has no upstream port, but merely a signal recirculation path that returns to the downstream direction. As shown in Fig. 4, each node 54 may be broadly divided into functional groupings, one of which pertains to message and collision signal transmissions and the other of which pertains to generation and retransmission of the common clock signal. Clock signals are synchronized such that there is zero skew between them at the different nodes. These two functional groupings are not separate, inasmuch as the zero skew clock circuits form important parts of the signal transmission system. Both a word clock (two serial bytes) and a byte clock are utilized. Note should be taken of the fact that external control of the active logic nodes 54 is not required or utilized, whether to establish or reset the node's state or to set up different modes of operation. Furthermore, the identity between the nodes 54 enables them to be made in quantity using modern IC techniques, substantially reducing cost while improving reliability.
The A, B and C "ports" previously referred to each have ten input data lines and ten output data lines. Taking the A port as an example, the input lines are designated AI and the output lines AO. The single "collision" line is used at each port (e.g. ACol for the A port), along with upstream and downstream clock lines. The data lines from the A and B ports are applied to a multiplexer 60 which switches the priority word of two competing words, or the common word (if both words are alike), to an up register 62 coupled to the upstream port (C) as the CO data signals. Concurrently, the downstream data received at the C port from a higher tier node is shifted into and out of a down register 64, appearing as output at both of the A and B ports.
Although one upstream byte serial signal train may be blocked, no added upstream or downstream delay is introduced, and words are advanced in unbroken sequence through the up register 62 and down register 64 under control of the word and byte clocks.
Competing bytes concurrently applied at the A and B ports are supplied to first and second parity detectors 66, 67 and also to a comparator 70 which determines priority on the basis of the eight data bits and one control bit, with lowest data content having priority. The "idle" or no message signal in this protocol is an unbroken sequence of 1's. Parity errors can occur due to typical causes, such as the presence of excessive noise or some other factor affecting signal transmission or circuit operation. In the present system, however, an important additional use is made of parity error indications. Each transition of a microprocessor to an inoperative state is marked by all output lines, including the parity line, going high (or 1 valued), thus establishing an odd parity error. This indication is transferred through the network once upon the presence of an error, as a marker which enables the system to identify a change in global resources and initiate procedures to determine the nature of the change.
The pair of parity detectors 66, 67 and the comparator 70 feed control circuits 72 that include priority message switching circuits 74, responsive to the comparator 70, for locking the multiplexer 60 in one state or the other if priority is determined, and for generating and propagating the downstream collision signals. Transitional parity error propagation circuits 75 are so called because they force the one-time all 1's parity error state along the network. Reset circuits 78 for returning the node to its initial state include an end of message (EOM) detector 80.
It will be appreciated that the functions described above and hereinafter may be accomplished at each active logic node by the use of a microprocessor chip, but they may even more readily be implemented in accordance with the state diagram of Fig. 5 and the logic equations set out below. In the state diagram, the state S0 represents the idle state, and also the state in which competing messages are equal, so that no decision is made to favor one port against another. The S1 and S2 states are the states favoring the A port and B port respectively. Thus the A port is favored (S1 state is established) if the data content of BI is greater than AI and there is no parity error on AI, or if there is a parity error on BI (these conditions being designated AIPE and BIPE respectively and represented by flip-flop states). The converse logic conditions as to AI and BI exist for the system to go into the S2 state. An indication from a higher tier node that a collision has occurred at that tier is reflected back in a downstream signal as COLIN. Whether the system is in the S0, S1 or S2 state, it goes into the S3 state, transferring the collision signal downstream as ACol and BCol. In the S1 and S2 states, with the node having made a decision, the collision signal is sent downstream to the lower tier nodes in like fashion, with the priority message switching circuits 74 locked to the A port or B port as the case may be.
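The per-node behavior can accordingly be captured as a four-state machine. The C sketch below follows the Fig. 5 transitions once per byte interval; the signal names follow the text, while the treatment of simultaneous parity errors on both inputs is an assumption of this sketch, and the ACol/BCol outputs and multiplexer locking are omitted for brevity:

    /* Sketch of the Fig. 5 node state machine, evaluated once per
     * byte.  S0: undecided; S1: A port favored; S2: B port favored;
     * S3: collision reported from a higher tier. */
    enum state { S0, S1, S2, S3 };

    enum state node_step(enum state s,
                         unsigned ai, unsigned bi, /* data + control bits */
                         int aipe, int bipe,       /* input parity errors */
                         int colin, int eom)       /* collision from above,
                                                      end of message seen */
    {
        if (colin)
            return S3;              /* any state: collision from above */
        switch (s) {
        case S0:                    /* no decision yet: compare bytes  */
            if ((bi > ai && !aipe) || bipe)
                return S1;          /* favor the A port                */
            if ((ai > bi && !bipe) || aipe)
                return S2;          /* favor the B port                */
            return S0;              /* equal so far: remain undecided  */
        case S1:
        case S2:
            return eom ? S0 : s;    /* EOM detection resets the node   */
        case S3:
        default:
            return S0;              /* COLIN removed: back to base     */
        }
    }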







The reset circuits 78 include the EOM detector 80, used to reset the node from S3 to S0 (Fig. 5). A first reset mode uses the end of message (EOM) field that concludes the data field in a primary message, as shown in Fig. 6. A group of flip flops and gates are used to establish the logic:

URINC · URC · URCDLY

where URC represents the control bit in the up register, URINC represents the control bit value at the up register input, and URCDLY represents the C value in an up register delay flip flop.
As is seen in Fig. 6, control bit sequence pairs establish certain fields and transitions between them. For example, a transition from the all 1's used during idle to a 0, 1 bit sequence defines the start of a field. The same 0, 1 sequence is used to identify the start of the data field. Successive 1, 0 control bit strings denote the internal field or subfield, and the end of message (EOM) is identified by the 0, 0 control bit pair. The condition in which the string of 1, 0 pairs is followed by the 0, 0 pair is unique and readily identified. The URINC, URC and URCDLY signals are ANDed together, with each having a 1 byte clock delay from the other. The result is a waveform that is high until the start of the message packet, at which point it goes low and stays low through the data. It returns high 2 byte clocks following the EOM occurrence. This positive-going transition in the waveform URINC · URC · URCDLY is the EOM detection. It triggers, as shown by the legend in Fig. 5, a return from S1 or S2 to S0.
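A minimal software rendering of the same detection shifts the control bit through a two-stage delay and watches for the rising edge of the three-way AND (the framing is simplified; names follow the text):

    /* Sketch of EOM detection: URINC is the incoming control bit;
     * URC and URCDLY are one- and two-byte-clock delayed copies.
     * EOM is flagged on the positive-going transition of their AND. */
    #include <stdbool.h>

    struct eom_det { bool urc, urcdly, prev_and; };

    bool eom_step(struct eom_det *d, bool urinc)
    {
        bool and_now = urinc && d->urc && d->urcdly; /* all three high */
        bool eom     = and_now && !d->prev_and;      /* rising edge    */

        d->prev_and = and_now;     /* advance the delay chain one byte */
        d->urcdly   = d->urc;
        d->urc      = urinc;
        return eom;
    }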
A higher node tier that is reset removes COLIN, indicating that the collision state has been removed. This logic state initiates a return from S3 back to the base state, S0. Note that the removal of COLIN will propagate down the tiers of the network as the end of the message "runs out". The nodes are thus self-resetting no matter how long or short the message. Also note that no matter the state in which the network starts out, all nodes will be reset to the S0 state by the idle signals.
Collision signals are returned to the processor modules, which store the collision state information and revert to the transmission of the idle sequence as the winning processor continues to transmit. A processor may begin a new transmission as soon as it detects the transition of COLIN from its asserted to its removed state. In addition, a processor may begin a new transmission after receiving idles for 2N byte times, where N is the number of tiers in the network, as this also indicates that the network is clear of any prior transmissions. This latter method of enabling new transmissions permits a processor entering a network for the first time to get into message synchronism with the network under conditions of light traffic, so that it need not wait for a poll from another processor in order to start interchange with other processors on the network.
Parity error states have been noted in the state diagram of Fig. 5 and are established pursuant to the following logic:

PESIG = AIPE · (NOT AIPEDLY) + BIPE · (NOT BIPEDLY)
If PESIG, then URIN 0:7, C, P = 1 ... 1, 1, 1 (all lines forced to 1)

To implement this logic, the transitional parity error propagation circuits 75 comprise an AIPE, or A input parity error, flip flop and a delay flip flop (AIPEDLY). The latter is set 1 byte clock later in accordance with the AIPE setting. For A inputs, the PESIG value thus goes high for 1 byte clock when the AIPE flip flop is set by a parity error, so that the PESIG signal is propagated once, at the first indication of the parity error. The same condition arises when all of the data bits, control and parity bits are 1 values, which occurs at the previously noted transition in the state of the global resources. All lines then go high, forcing all 1's and establishing an even total of 1's (an error under odd parity), so that the AIPE and AIPEDLY flip flops are set as previously described, to denote the parity error. This system operates in the same way when the message packet received on the B port contains a parity error or a forced parity indication of change of status.
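The one-shot propagation may be sketched in C as follows; this is an illustrative rendering of the PESIG logic above, with the flip flop update ordering an assumption of the sketch:

    /* Sketch of transitional parity error propagation: PESIG pulses
     * high for exactly one byte clock when a parity error first
     * appears; the node then forces all ten lines (data, control and
     * parity) to 1 for that byte, propagating the marker once. */
    #include <stdbool.h>

    struct pe_ff { bool aipe, aipedly, bipe, bipedly; };

    bool pesig_step(struct pe_ff *f, bool a_err, bool b_err)
    {
        f->aipedly = f->aipe;  /* delay flip flops follow 1 byte later */
        f->bipedly = f->bipe;
        f->aipe    = a_err;
        f->bipe    = b_err;

        /* PESIG = AIPE AND NOT AIPEDLY, OR the same for the B input */
        return (f->aipe && !f->aipedly) || (f->bipe && !f->bipedly);
    }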
Parity errors arising because of noise effects or other variables will typically not affect processor operation because of the redundant networks. For monitoring and maintenance purposes, indicator lights (not shown) are utilized to indicate the occurrence of parity errors. The once-propagated parity error denoting change of status, however, initiates routines for assessing the significance of the change.
The clocking system used in the node 54, as shown in Fig. 4, provides a unique means for maintaining zero skew between the clocks at all the node elements, despite the number of tiers used in the network. The clock circuits 86 include first and second EXCLUSIVE OR gates 88, 89 respectively, the outputs of which, designated A and B respectively, are subtractively combined (in the B-A sense) by a summing circuit 92, the output of which is passed through a low pass filter 94 to control the phase of the output from a phase locked loop or oscillator 96. The inputs to the first gate 88 are the output of the PLO 96 and a downstream clock passed from the next higher tier node element through an isolating driver 97. This line is designated as the word clock, and is derived from the next higher tier after a certain known delay, τ, the same signal being returned through another isolating driver 98 to the node at the next higher tier. The inputs to the second gate 89 comprise the word clock and a clock feedback from the next lower tier, which also receives a signal from the PLO 96.
The word clock line feeds the two inputs of a third gate 100, both directly and through a τc delay line 101, to derive a byte clock signal at twice the frequency of the word clock and in timed relation to it.
The functioning of the clock circuits 86 may be better understood by reference to the timing diagram of Fig. 7. The clock out signal is the output of the PLO 96. Inasmuch as a paramount objective of the system is to maintain a zero time skew between these outputs for all nodes in the network, it is clear that they must also have the same nominal frequency. The transmission line delay, τ, between nodes is kept substantially constant, but can be long. Using the presently disclosed technique the length could be as long as 28 feet, with the network and node byte clock rates (nominally 120 ns.) used in a practical system. Those skilled in the art will recognize that lengths which are integer multiples of 28 feet can readily be obtained by adding tiers to the network which are not fully populated with the maximum possible number of processor modules. There will be a corresponding increase in the latency or transit time through the network.
The word clock derived from the next higher tier, as shown by the next lower waveform, is a similar waveform but delayed by τ. The word clock constitutes the basic timing reference throughout all the nodes, and this is made possible because the leading edge of each clock out signal is controllable within the circuit and can be made to lead the word clock so that all nodes can be held in synchronism. Referring to the waveforms A and B, it can be seen that the first gate 88 generates a pulse A which terminates with the leading edge of the word clock, while the second gate 89 generates a pulse B whose leading edge is coincident with the word clock. The trailing edge of the B pulse is defined by the initiation of the feedback pulse from the next lower tier node, which is delayed by τ, so that the B pulse is of fixed duration. The clock circuits 86 function to keep the pulse A of the same duration as pulse B, because the summed signal, B-A, tends toward a null as the PLO 96 is advanced in phase so as to establish synchronism. In effect, the leading edge of the A signal, which may lead or lag the desired position as shown by dotted lines, is adjusted to precede the leading edge of the word clock by the interval τ. When the leading edge of the clock out signal is in this desired nominal position at all the nodes, there is zero skew between the word clocks. For this reason the processors coupled to the network are freed from any constraints as to the total length of the path between one processor and another, because additive delays and differential propagation times are eliminated.
To produce the double frequency byte clock, the word clock signal is replicated at a delay τc by the delay line 101, which also feeds the gate 100. Thus, as seen in the waveform labeled byte clock in Fig. 7, at either edge of the word clock a byte clock pulse is produced having a duration τc. This occurs twice each word clock interval and in synchronism with the word clock throughout all the nodes. It is implicit in the prior description that the delay introduced by the transmission lines between nodes is nearly identical in both directions between tiers, so that in effect all word clocks and byte clocks within the system are held in stable phase relationship. The locally generated byte clocks therefore provide clocking at each node for the individual bytes in the 2 byte words of the messages.
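The byte clock derivation amounts to an EXCLUSIVE OR of the word clock with a copy of itself delayed by τc, which produces a pulse of width τc at every word clock edge, that is, at twice the word clock frequency. An illustrative sketch:

    /* Sketch of byte clock derivation: gate 100 XORs the word clock
     * with a copy delayed by tau-c, yielding a pulse at each edge. */
    #include <stdio.h>

    int main(void)
    {
        /* word clock sampled at intervals of tau-c (two samples per
         * half period, purely for illustration) */
        int word_clk[] = { 0, 0, 1, 1, 0, 0, 1, 1, 0, 0 };
        int n = sizeof word_clk / sizeof word_clk[0];

        for (int i = 1; i < n; i++) {
            int byte_clk = word_clk[i] ^ word_clk[i - 1]; /* XOR with
                                                             delayed copy */
            printf("%d", byte_clk);    /* pulses at both edges, so the
                                          rate is twice the word clock */
        }
        printf("\n");
        return 0;
    }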
The active logic nodes are of potential benefit wherever a competition between concurrent message packets is to be resolved on the basis of data content. Most known systems, as exemplified by patent No. 4,251,879 issued February 17, 1981 on a "Speed Independent Arbiter Switch for Digital Communication Networks", are directed toward determining the first signal received in time, and utilize external processing or control circuits.
PROCESSOR MODULES
The individual processors in the overall system diagram of Fig. 1 are identified as examples of interface processors (IFPs) 14 and 16 and access module processors (AMPs) 18 to 23 respectively, and are broadly subdivided into principal elements. A more specific example of the organization of the processor modules shows the correspondence to the broad functional subdivisions of Fig. 1 but also reveals a substantial number of further subdivisions. As used herein, the term "processor module" refers to the entire assembly shown in Fig. 8, which with the optional features that are noted can serve either as an IFP or an AMP. The term "microprocessor system" refers to a system 103 that incorporates a microprocessor 105 such as a 16 bit microprocessor of the Intel 8086 type. The address and data busses of the microprocessor 105 are coupled within the microprocessor system 103 to conventional peripheral systems, such as the main RAM 107 and a peripheral controller 109. The peripheral controller 109 exemplifies what may be used when the processor module is an AMP and the peripheral unit is a disk drive 111. As shown in the dotted line rectangle, however, this controller or interface may alternatively be a channel interface if the processor module is to serve as an IFP. In this instance the channel interface would communicate with the channel or bus of a host system. Inasmuch as conventional controllers and interfaces may be used in the microprocessor system 103, they need not be further described.
It is noteworthy that it can be shown to be advantageous to use one disk drive per microprocessor, in terms of both cost and performance. This is true as to data base machines in general, even though there may be benefit at times in arranging one microprocessor so as to have access to a number of secondary storages. The diagram omits, for purposes of brevity, the incorporation of other subsystems that would typically be used, such as interrupt controllers that are supplied by semiconductor manufacturers for use in conjunction with their systems. Those skilled in the art will recognize the importance of a suitable scheme for distribution of electrical power to the processor modules to attainment of the full degree of redundancy and reliability the invention can provide.
The peripheral controller 109 and the channel interface depicted as an option in the microprocessor system 103 correspond to the IFP interface and disk controller in Fig. 1. The high speed RAM 26 of Fig. 1, however, actually comprises first and second H.S. RAMs 26', 26'' respectively, each of which through time multiplexing is effectively a three-port device coupled to the microprocessor bus system at one of its ports (designated C). Each of the H.S. RAMs 26', 26'' cooperates respectively with a first or second network interface 120, 120', providing communication with the first and second networks 50a and 50b (not shown in Fig. 8) respectively at an input (receive) port A and output (send) port B. With these redundant systems, only the second network interface 120' and the second H.S. RAM 26'' need be described in detail. The network interfaces 120, 120' are further shown and described in conjunction with Fig. 13, but can be generally subdivided into four principal parts:
Input register array and control circuits 122, coupling the ten input lines from the second network 50b to the A port of the H.S. RAM 26'' via an interface data bus and address bus.
An output register array and control 124, coupling the output lines of the second network 50b to the interface data and address busses and to the B port of the second H.S. RAM 26''.
A microprocessor bus interface and control 126, coupled to the interface address and data busses and to the A and B ports of the H.S. RAM 26''.
A clock generator 128 that receives the word clock from the network and generates synchronized, properly phased clocks for controlling the interface 120'.
The second network interface 120' and H.S. RAM 26'' cooperate with the microprocessor system 103 in coordinating data transfers between the high speed network and the relatively slower speed microprocessor, and also provide queuing of messages between these different systems. The microprocessor bus interface and control 126 may be referred to as performing read/write (R/W) functions with the microprocessor system, which (at least with the Intel 8086) has the capability of writing directly into and receiving data from the H.S. RAM 26''.
Although the IFP and AMP systems are functionally alike, there can be a substantial disparity in the sizes of the incoming message storage and outgoing message storage in the H.S. RAM 26'', as between the IFP and the AMP. In a relational data base system, the IFP has a large incoming message space in the H.S. RAM 26'', in order to receive new messages from the high speed network so that the needs of the host computer may be serviced by constant usage of the network. In the AMP the reverse is true, because more storage space should be available to send processed message packets to the high speed network. The H.S. RAM 26'' functions with the main RAM 107 in the microprocessor system 103, which has message buffer sections for each network.
The allocation of system address space in the main RAM 107 for the microprocessor system 103 is shown in Fig. 9, to which reference should briefly be made. It is conventional, with addresses devoted to system random access functions, an expansion space for use in the event the random access capacity is increased, an I/O address space, and an address space reserved for ROM and PROM (including EPROM) functions. In addition, portions of the system address space are reserved for message packets from and to the first and second high speed RAMs 26', 26'' respectively. This provides greater flexibility in the system operation, inasmuch as even though the microprocessor 105 can address the H.S. RAMs 26', 26'', the main RAMs 107 assure greater freedom from software and hardware interdependence.
the bidirectionally accessible H.S. R~Ms 26'' are organized in
such fashion that they perform central functions in mult-iprocessor
mode control, distributed updatin~ and the management of message
packet flow. For these and other purposes, the H.S. RAM 26'' is
divided into a number of different internal sectors. The relative
disposition of the different sectors shown in Fig. 8 is used




-44-

~9

3S~3

throughout the different processor modules in the syste~, and
the specific addresses that designate the limits of the sectors
refer to those used in an actual system. It will be appreciated
that the sizes of these sectors of memory and their relative
disposition are widely variable dependent on the specific system
context. Sixteen bit memory words are employed in this example.
The selection map and response directory are dedicated lookup
tables of the type that may be written in once during initiali7a-
tion, while the transaction number section provides a dynamically
revisable lookup table.
The selection map section of memory starts with location 0 but is based upon the use of four different maps used in interrelated fashion within the memory section. The destination selection word (DSW) that is contained within the message packet is used cooperatively with the dedicated selection maps in the H.S. RAM 26''. The destination selection word, comprising 16 total bits, has a map address in 12 bit positions, and map selection data in the four other bits. Each of the first 1024 16 bit memory words of the H.S. RAM contains four map address values. The address value specified by the DSW provides, with a single memory access to the H.S. RAM, map bits for all four maps, while the map selection bits in the DSW determine which map is to be used.
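The single-access character of this lookup can be made concrete with a short sketch in C, using the 10-bit word address and 2-bit nybl address split of the 12 address bits described later in connection with Fig. 13; the packing of the fields within the DSW itself is an assumption made here for illustration only.

    #include <stdint.h>

    static uint16_t selection_map[1024];  /* first 1024 words of H.S. RAM */

    /* Returns nonzero if any selected map holds a "one" bit for this
       DSW, i.e. if the local processor is a destination. */
    int dsw_accept(uint16_t dsw)
    {
        uint16_t word_addr = (dsw >> 6) & 0x3FF; /* 10 map word address bits */
        uint16_t nybl_addr = (dsw >> 4) & 0x3;   /* 2 map nybl address bits  */
        uint16_t map_sel   = dsw & 0xF;          /* 4 map selection bits     */

        /* A single H.S. RAM access yields one bit from each of the
           four maps at the common map address. */
        uint16_t word = selection_map[word_addr];
        uint16_t nybl = (word >> (4 * nybl_addr)) & 0xF;

        return (nybl & map_sel) != 0;
    }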
Fig. 15 shows the conceptual organization of the map section as if each map had a physically separate 4096-by-1 bit RAM. As a matter of implementation convenience all map data is stored in a single portion of the H.S. RAM, as shown by Fig. 8. The DSW Management Section 190 (Fig. 13) controls multiplexing of four bits, from each of the four maps of Fig. 15, from one 16-bit word of H.S. RAM. Those skilled in the art will recognize the advantage of the scheme in that the maps can be initialized by the processor by the same means as used to access other parts of the H.S. RAM.
There are also three different classes of destination selection word that are used, and the selection map locations are correspondingly divided into a hash selection portion, a class selection portion and a destination processor identification (DPID) selection portion. The DPID specifies whether the processor 105 is the specific one for which the message packet is intended, whereas the class selection portion specifies whether or not the processor is one of the number of processors in a particular process class that is to receive the message packet. The hash values are stored in accordance with the manner in which the data base is distributed throughout the relational data base system, following a predetermined algorithm for the particular relations and method of scatter storage that is employed. The hash value in this instance can designate the processor either as having primary or backup responsibility for the data. Thus the selection maps provide a technique for directly addressing the H.S. RAM 26'' so as to determine processor destination. This function complements the broadcasting of prioritized messages to all network interfaces 120' and enables local accessing without interruption of microprocessor 105 status.
A separate section of H.S. RAM 26'' serves as a pivotal means for checking and controlling globally distributed activities. Transaction numbers (TNs) are assigned to various ones of the processes that are sent on and received from the network 50b, as discussed above and shown in Fig. 3. TNs within messages are retained as global transaction identities as each microprocessor system 103 independently performs the subtasks accepted by it. The block within the H.S. RAM 26'' that is dedicated to a number of available transaction number addresses contains status entries that are locally controlled and updated by the microprocessor system 103 as these subtasks are performed. The TN is used in a number of different ways, both locally and globally, in performing intercommunication functions. The transaction number is used to identify subtasks, to call forth data, to provide commands, to control message flow and to characterize the dynamics of a global process. Transaction numbers may be assigned, relinquished and changed in the course of global communication. These aspects are explained more fully in the following description.
The most complex, but perhaps the most dramatic, aspect
of the TN is the capability it affords, with the sort network,
for distributed updating of the status of local processors as to
a given control process. Each control process (i.e., task or
multiprocessor activity) has its own TN.
Readiness state values are held in the transaction number section of H.S. RAM 26'' and are locally modified under control of the microprocessor system 103. The microprocessor 103 can initialize the appropriate entry (e.g. SACK/Busy) in the response directory (address 050D (hex)) of Fig. 10, and enter the SACK/Busy status by transferring the exact image thus reproduced to the H.S. RAM 26''. An entry at a TN address is accessible to the network 50b via the interface 120', at the A and B ports of the H.S. RAM 26''. Queries are made using a Status Request message containing the status request command code (see Fig. 11) and TN. The interface 120' uses the content of the designated TN to reference the response directory which contains a properly formatted response message. A global status query as to a given TN, when received at the second network interface 120', elicits a direct response that is solely hardware controlled. No prefatory communication is needed, and the microprocessor system 103 is not interrupted or affected. However, the microprocessor 103 can assure against interruption when setting the status by transmitting a LOCK indication to the interface 120', which communicates the Lock word derived from 0501 (hex) until removed at a later time.

The word format of the readiness states is shown by the seven states from "busy" to "initial" in Fig. 12, which depict the useful variants employed in a practical system. More or fewer status variants may be used, but these seven states provide versatile and comprehensive control. It is the responsibility of the microprocessor system continually to update status levels for different TNs in the H.S. RAM 26'' to reflect availability for, or progress toward, the completion of a subtask. Such updates are made simply by writing into the TN address in the H.S. RAM 26'', using the formats shown in Fig. 12.
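A minimal sketch in C of such an update follows. Since Fig. 12 is not reproduced here, the packing is assumed: a state code in the bits above a 14-bit value field (the next message vector or receive count), which fits the 17 bit wide H.S. RAM word described later in connection with Fig. 13; a 32-bit integer stands in for the 17-bit word, and the numeric state codes are illustrative.

    #include <stdint.h>

    /* The seven readiness states of Fig. 12; codes are illustrative. */
    enum tn_state {
        TN_BUSY, TN_WAITING, TN_SEND_READY, TN_RECEIVE_READY,
        TN_DONE, TN_NON_PARTICIPANT, TN_INITIAL
    };

    #define TN_STATE_SHIFT 14        /* assumed packing: state above value */
    #define TN_VALUE_MASK  0x3FFFu   /* 14-bit vector or receive count     */

    uint32_t hs_ram_tn[4096];        /* transaction number section image   */

    /* A status update is a single memory write to the TN address, with
       no network handshake, exactly as described above. */
    void tn_update(uint16_t tn, enum tn_state s, uint16_t value)
    {
        hs_ram_tn[tn] = ((uint32_t)s << TN_STATE_SHIFT)
                      | (value & TN_VALUE_MASK);
    }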
In Fig. 10 each status response is accompanied by an initiating status acknowledgment command code (SACK), from 05 to 0D (hexadecimal). The SACK responses sent to the network are essentially the command codes of Fig. 10, the numeric portions of the word formats of Fig. 12, and an originating processor ID (OPID), as seen in Fig. 11. The SACK responses thus define a consecutive priority subgrouping within the overall coherent priority scheme shown in Fig. 11. The OPID is significant in the priority scheme, because if a number of processors are working on a TN but are "Busy", the highest priority message that is broadcast is determined by the OPID. Transfers and system coordination can be based on this data.

The SACK message priority schedule, the simultaneous responses from a number of microprocessor systems 103, and the dynamic prioritizing in the network 50b, enable the status of the global resource as to a given task to be determined in a vastly improved manner in comparison to prior art systems. The response given is unambiguous, cannot represent an invalid state and requires no software or local processor time. Thus deadlock cannot arise due to repeated requests interfering with task performance, for example. Numerous multiprocessor options can be used at the different status levels. It is unique that the local processors can continue to operate independently and that a single query derives a global, prioritized response.
Some specific discussion of the successive states depicted in Fig. 12 may be useful. The "busy" and "waiting" states connote successively more complete phases as to the assigned subtask, the latter identifying a condition in which a further communication or event is required. These states exemplify the "elevation" of the status of the TN until it reaches a level at which a message packet corresponding to the TN can be sent or received.

When a message packet is to be sent or received, however, a different feature of the TN comes into play, namely its capability for message control. When the microprocessor system 103 has a message for transmission, the status indication becomes "send ready". The microprocessor system 103 not only updates status but it also enters a "next message vector" value in the H.S. RAM 26'', using the word format of Fig. 12. This entry defines the location at which the corresponding output message may be fetched from H.S. RAM 26''. This vector is used internally in the network interface 120' in chaining together output messages pertaining to a specified TN.
A related function is performed during the "receive ready" state, in which the TN storage location retains an input message count from the microprocessor system 103 as to the number of messages that may be received pertaining to a given TN. This count can be decremented until reduced to zero as successive input messages are transferred. At zero, no more messages can be received and an overrun condition can be indicated. This enables the TN to be used in pacing transfers between the network 50b and the microprocessor system 103.

Locally, at each processor, the TN is retained in the send and receive messages during processing, as a constant and uniform reference throughout the system. The TN0 or default state also provides a local command to identify the fact that a message is to be used in a non-merge mode.
From the global standpoint, moreover, the distinction between TN0 and TN > 0 values establishes one of the command functions for which the TN is used. The merge/non-merge characterization thus inherent in each message packet provides a valuable systems approach to prioritizing and sorting messages. Similarly, the "Assigned", "Unassigned", "Non-Participant" and "Initial" states are used to fulfill global intercommunication and control functions. The "Unassigned" state is one in which the processor has previously relinquished a TN, so that it must receive a new primary message reactivating the TN. If the processor indicates "Unassigned" when it should be "Assigned", this establishes that the TN was not properly entered and that corrective action should be taken. When a TN is "Assigned" where it should be "Unassigned", this may indicate a faulty transfer or a competition between two processors for a new TN. Neither


"Assigned" nor "Unassigned" is treated as a readiness state,
inasmuch as the processor has not yet undertaken work on the TN
at these stages.
The "Initial" and "Non-Participant" states are also
significant in global resource terms. A processor t~hich comes
on line and which therefore must be brought into the system is
in the "Initial" state, which indicates that a~ministrative steps
are needed to bring the processor on line. Those processors
which are "Non-Participants" in a given task do no~ need to do
any processing locally, but must keep track of the TN so as not
to inadvertently use it in an erroneous manner.
The dedicated directory or reference section of the H.S. RAM 26'', referring again to Fig. 10, also includes other types of prioritized messages for use in generation of responses by hardware. NA (not assigned) entries are held available for future use. Three different types of NAK responses (Overrun; TN Error; Locked) are of lowest data content and at highest priority levels because they identify error conditions. The SACK responses are followed, in order of decreasing priority, by the ACK response and the NAP (not applicable processor) response. In the present implementation, two response command codes are not assigned (NA) and are available for future use. This directory can be initialized by software and is used by hardware to quickly and flexibly generate any of the range of response message texts.
A separate portion of this directory is used to store TOP, GET, PUT, and BOTTOM addresses or pointers related to the functioning of circular buffers for input messages and completed output message pointers. These pointers function in conjunction with the dedicated sectors of H.S. RAM 26'' devoted to input message management and output message management respectively. For incoming messages, a circular buffer scheme is used, with "TOP", stored in the directory section of the H.S. RAM 26'', being a variable address defining the upper position for incoming messages. The PUT address, also stored in the directory section, defines where the circuits are to store the next message that is received. The GET address is provided and kept updated by software to enable the hardware to recognize the location at which the software is emptying the buffer.

The incoming message buffer is managed by setting PUT at the bottom of the buffer and starting with the GET address equal to TOP. The rule of operation assured by software is that GET is not set equal to PUT, which would create an ambiguous condition. As messages are entered into the incoming message buffer in H.S. RAM 26'', the message length value in the message itself establishes the beginning point of the next message, and the PUT address stored in the directory is then changed to indicate where the next succeeding message is to be received in the buffer. Incoming messages can thus be fetched by the microprocessor system 103 as its capabilities permit.
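The discipline just described reduces to a few lines of pointer arithmetic; a sketch in C, with the wrap rule of the text (PUT wraps from TOP back to BOTTOM, and GET equal to PUT reserved as the forbidden/overrun mark) and names chosen for illustration.

    #include <stdint.h>

    /* The four pointers live in the directory section of H.S. RAM;
       here they simply index a buffer array. */
    struct circ_buf {
        uint16_t top, bottom;  /* limits, set by software             */
        uint16_t put;          /* advanced by hardware per message    */
        uint16_t get;          /* advanced by software as it consumes */
    };

    /* Advance PUT past a stored message of len words; returns -1 when
       the buffer would overrun (PUT catching up to GET). */
    int buf_put(struct circ_buf *b, uint16_t len)
    {
        uint16_t next = b->put + len;
        if (next >= b->top)                 /* wrap from TOP to BOTTOM */
            next = b->bottom + (next - b->top);
        if (next == b->get)                 /* GET == PUT is forbidden */
            return -1;
        b->put = next;
        return 0;
    }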
Data in the output message space within the H.S. RAM 26'' is utilized in conjunction with the output message complete vectors held in a separate circular buffer and the next message vector in the H.S. RAM 26''. Messages can be assembled and stored in arbitrary locations, and related messages can be chained together for transmission on the network. In the directory section of the H.S. RAM 26'', TOP, BOTTOM, PUT and GET addresses are entered and updated as previously described to maintain active present references to locations within the output message complete buffer. The message complete vectors constitute addresses which reference messages in the output message space that have been successfully transmitted, as indicated by a response received. As described below, the system enables the microprocessor system 103 to enter output messages readily, but to handle complex linkage vector sequences in orderly fashion so that output message space is efficiently used and message chains can be transmitted.
The protocol of Fig. 11, which was previously discussed as to responses, is continuous with respect to the primary messages as well. Response messages are given in sequence, the hexadecimal command codes being stated in ascending order. In the primary message grouping, the stop merge (also the base or non-merge control) message is of lowest data content and highest priority. This message constitutes a control communication that terminates merge mode within the network and at the processor modules.

A substantial number of different types of primary data messages can be used in ascending priority, and can be categorized in priority order based upon application and system requirements. As mentioned, continuation messages can have higher priority so as to maintain continuity with the preceding message packet to which they relate.

The last grouping of four primary messages in Fig. 11 comprises, in descending order of priority, the status request message, which is the only type of status message needed to obtain the status response, control messages calling for "relinquish TN" and "assign TN", and, of lower priority, a "start merge" control message.
This system permits versatile operation, as will be evident from the more detailed examples given hereafter. A processor module operates on a present transaction number (PTN hereafter), whether this is externally commanded from the network or generated internally in the course of successive operations. When merge operations are being carried out, the processor modules utilize the global reference or transaction identity defined by the TN. Starting, stopping and restarting of merge operations utilizes only simple message changes. When subtasks do not require messages to be merged, or message packets are generated that have no specific relationship to other messages, they are queued to an output against TN0 and are transmitted when the base or default condition defined by the Present Transaction Number (being 0) holds true. The TN0 condition enables messages to be queued for transmission whenever a merge mode is not utilized.
NETWORK INTERFACE SYSTEM
Fig. 13, to which reference is now made, depicts in further detail one example of interface circuits useful in systems in accordance with the invention. This section of the description includes a number of detailed features that are not necessary to an understanding of the invention but are embedded in a practical system example and are therefore included to place the examples more firmly in context. Specific gating arrangements and details not the subject matter of the invention and involving well known expedients for which many alternatives are available have been omitted or abbreviated. Fig. 13 is an elaboration of the second network interface 120' and the H.S. RAM 26'' from Fig. 8. The interfaces 120 for both networks function in like fashion and thus description of one will suffice.
In Fig. 13, inputs from the particular active logic network 50 associated with the interface are applied at network message management circuits 140 via a multiplexer 142 and a known parity check circuit 144. The multiplexer 142 is also coupled to the microprocessor system data bus, enabling access to the message management circuits 140 via the bus. This feature permits the microprocessor system to operate the interface in a step-by-step test mode, transferring data as if the interface were on line to the network. Inputs from the networks are applied to a receive network data register 146, both directly in a first section and through a receive byte buffer 148 which thereafter enters the byte into a different section of the receive network data register 146. Consequently, both bytes of each word received are entered into and held available at the receive network data register 146.
Output messages for transmission are entered into a send network data register 150, while a parity bit is added in a conventional parity generator 132. Messages are sent to the associated network from the network message management unit 140, or (where the test mode is to be used) to the microprocessor system data bus. For message management purposes within the interface, the send message format in the random access memory 168 comprises identifying data as well as message data. As seen in Fig. 21A, command, tag, key and DSW can all be incorporated along with primary data that is to be transmitted.
The organization shown in Fig. 13 is essentially the same as that shown in Fig. 8, which illustrates the interface data bus and address bus as separately coupled to input ports A and B at the H.S. RAM 26'', while the address and data busses of the microprocessor system 103 are illustrated as coupled to a separate C port. In actuality, as seen in Fig. 13, this separate bidirectional access is achieved by time division multiplexing of input and output address functions within the interface and at the H.S. RAM 26''. The microprocessor data and address busses are coupled to the interface busses via gates 145, 149 respectively, so that the microprocessor can operate asynchronously on its own internal clock.
The timing system used is based upon clock pulses, phase control waveforms and phase subdivision waveforms generated by interface clock circuits 156 (Fig. 13) and having the timing relationships shown in Fig. 14, to which reference is also made. The interface clock circuits 156 receive the network word clock from the nearest node, and a phase locked clock source 157 includes means for maintaining zero time skew as previously described in conjunction with Fig. 4. The nominal network word clock rate of 240 ns. in the network is subdivided in time in the interface clock circuits 156 because a frequency multiplier (not shown in detail) held in phase locked relationship provides a faster clock defining a reference period of 40 ns. duration (shown in Fig. 14 as PLCLK). The basic word period is defined by the opposite-going half cycles of a cyclic signal designated CLKSRA having a total 240 ns. duration. Two other signals of like frequency and duration are generated from PLCLK by frequency dividers 158 at times delayed by one and two cycles of PLCLK respectively from CLKSRA, and these are designated CLKSRB and CLKSRC respectively.

From these signals, control logic 159 develops timing waveforms, designated IO GATE, RECV GATE and SEND GATE, denoting successive equal thirds of the word period. These intervals are appropriately referred to as IO, receive and send phases. The phases defined by the gate signals are each further subdivided into two equal half intervals by IO CLK, RECV CLK and SEND CLK signals which define the last half of each of the phases. Byte clocking functions are governed by BYTE CTRL and BYTE CLK signals.



The IO, RECV and SEND phases provide the basis for time division multiplexed operation of the random access memory 168 and the microprocessor system busses. The interface can receive or send no more than one word per word period from or to the high speed network, and receiving and sending are obviously never concurrent. Transfer rates to and from the microprocessor system are substantially lower, but even if equal the capacity of the interface circuits would not be taxed. The interface system arrangement hinges in large part on direct access to the random access memory 168, and substantially obviates the need for internal processing or software. Thus as the system cycles through the successive phases in each word period, words are successively advanced in non-conflicting progressions along their predetermined signal paths for the different functions taking place. For example, sending a message to the bus may be interleaved with reception of a message from the microprocessor, each being interchanged using a different portion of the memory 168.
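A purely illustrative sketch in C of this three-way time slicing follows; in the actual interface the IO GATE, RECV GATE and SEND GATE waveforms of Fig. 14 perform the sequencing, not software.

    /* One word period: each client of memory 168 gets the RAM only in
       its own third of the period, so no arbitration is needed. */
    enum phase { PH_IO, PH_RECV, PH_SEND };

    void word_period(void)
    {
        for (int p = PH_IO; p <= PH_SEND; p++) {
            switch (p) {
            case PH_IO:   /* microprocessor read/write slot         */ break;
            case PH_RECV: /* store one word arriving from network   */ break;
            case PH_SEND: /* fetch one word queued for transmission */ break;
            }
        }
    }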
Intercommunication between the data bus for the microprocessor system and the network interface is effected in IO (which may also be referred to as Read/Write) management circuits 160. A write gate 162, for words from the microprocessor system, and a system read register 164, for transferring words to the microprocessor system, provide the coupling between the microprocessor's bus and the bus interface to the network interface.

A memory address register 165 and parity generator and check circuits 166 are also incorporated in the network interface subsystem. In this example the high speed storage comprises a 4K word x 17 bit random access memory 168, the internal subdivision of which, and the use of dedicated memory portions within it, have previously been described. The size of the random access memory can readily be reduced or expanded to meet the needs of a particular application.
Receive message buffer management circuits 170 are coupled to the data bus of the microprocessor and in turn are coupled to the address bus for the memory 168. The term "received messages" refers both to the incoming messages from the network for entry into the circular buffer at a location referred to as PUT and to the subsequent transfer of that message to the microprocessor system when a GET value specifies where the system is to sequence in extracting a received message for transfer to the microprocessor system. Address values for accessing the random access memory 168 are entered into a GET register 172, a TOP register 174, a PUT counter 175 and a BOTTOM register 176 respectively. The PUT counter 175 is updated by incrementing from an initial position defined by the BOTTOM register 176. The TOP register 174 provides an opposite limit reference. Both TOP and BOTTOM may be manipulated by software control to modify both the size of the receive buffer and the absolute location within H.S. RAM. When the contents of the PUT register equal the contents of the TOP register, the PUT register is reset to the contents of the BOTTOM register, thus effecting the circular nature of usage of the buffer. The GET, TOP and BOTTOM registers and the PUT counter are shared in managing both the incoming message and output message complete circular buffers.
Entries are made in the GET register 172 under software control, because the length of the then active message in the buffer determines the next address. Comparison circuits 178, 179 coupled to the outputs of the GET register 172, the PUT counter 175 and the TOP register 174 are used in detecting and indicating overrun conditions. An overrun exists when the GET and PUT settings are set equal or when GET is attempted to be set greater than TOP. In either instance an overrun status indication is to be sent until the condition can be corrected.
The concatenated manner in which the "receive message" circular buffer is arranged and operated is particularly useful in the present system. Hardware management of PUT and dynamic management of GET can be employed with cross-checks being available to avoid conflict. However, other buffer systems may be employed, although perhaps at some added expense in circuitry and software. The receive message format in memory 168 also, referring now to Fig. 21B, contains identifying data in the form of map results, data length and key length, derived as described hereafter.
A DSW management section 190 within the interface includes a destination selection word register 192 into which the destination selection word is entered for transfer to the address bus. In using the DSW to address the dedicated DSW section of the memory 168, the output on the data bus, from the memory, returns data from which the DSW management section 190 may also determine that the message packet is appropriate for the processor. It will be noted from Fig. 13 that the destination selection word comprises 2 bits of map nybl address, 10 bits of map word address, and 4 bits for map selection. The "nybl" address is used to delineate a subsection of a word from memory 168. The 4 map selection bits are applied to a map result comparator 194, which receives relevant map data from the memory 168 via a multiplexer 196. The multiplexer 196 receives 16 bits, representing 4 different map data nybls stored at the address defined by the 10 map word address bits in the DSW. The memory 168 is specifically organized in comparable fashion in the dedicated map section to facilitate this comparison. The appropriate one of the four map nybls is selected by the 2 remaining bits in the DSW, applied to control the multiplexer 196. A comparison is made, and the resulting map code is entered in the map result register 197 and inserted in the incoming message entered in the memory 168. If the comparison shows that a "one" bit is not present in any of the selected maps, a "reject" is generated to indicate that the processor module is not intended to receive the message packet.
Referring to Fig. 15, there is shown in general form an advantageous way to subdivide the memory 168 in the dedicated destination selection portion and to make the map result comparison. Each map is organized as 4096 words x 1 bit, and further subdivided (see Fig. 8) into specific processor ID, class ID and hashing sectors. Using the 12 address bits (10 address and 2 nybl bits) the common map address is selected and a 1 bit output is derived from each map. (The multiplexer and nybls of Fig. 13 are not shown for simplicity in Fig. 15.) The four parallel bit outputs can be compared to the 4 map selection bits in a group of four AND gates 198, so that if one or more compare, the output of an OR gate 199 goes true. The map result can be entered in the map result register 197 of Fig. 13 and the message can be accepted in the memory 168. Alternatively, the message is rejected and a NAK is transmitted.
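In software terms the four AND gates 198 and the OR gate 199 reduce to a single masked test; a sketch in C (names illustrative):

    #include <stdint.h>

    /* map_bits: one bit from each of the four maps at the common map
       address; map_sel: the 4 map selection bits from the DSW. */
    int map_result(uint8_t map_bits, uint8_t map_sel)
    {
        /* True when any selected map holds a "one" bit, i.e. the
           message packet is to be accepted. */
        return (map_bits & map_sel & 0xF) != 0;
    }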
The command word management section 200 includes a command register 202 that receives the command word. The TN field of the command word has access to the address bus so that the referenced receive TN may be examined to determine the proper response message (see Fig. 18). In addition, during a Start Merge command, a data path exists from the TN field to the PTNR 206 in order that the PTN value may be changed in conjunction with a Start Merge command.


The incoming messages entered into the memory 168 also include, as discussed relative to Fig. 21, for address vector purposes, the value of the length of the data field and the key field, if these fields are used. These values are derived by a receive data length counter 210 and a receive key length counter 211, each of which counts the sequence of words in the appropriate field as the fields are presented by the input source.
A send message management section 220 is also used to encompass the functions of accepting processed packets for memory 168 and transferring them to the network at a later time. This section 220 comprises a send transaction vector counter 222, a send data length counter 224 and a send key length counter 226 coupled bidirectionally to the data bus. The send transaction vector counter 222 is coupled to the address bus, while the send data length counter 224 is coupled to an address generator 228 which is in turn coupled to the data bus. The output buffer section and the circular buffer comprising the output message complete vector section of Fig. 8 are both used in sending messages. In this instance, however, message packets are entered serially, but then fetched in a sequence defined by the vectors.
Within the interface, the separate operating phases are carried out at mutually exclusive times, and this time sharing permits the memory 168 to receive and supply network message packets at the network clock rate, operate internally at an effectively higher rate, and communicate with the microprocessor system, which operates asynchronously at its slower clock rate. To control gating of the messages to the various counters and registers, phase controls respond to control bits which generate command, DSW, data and other signals denoting the individual fields within the messages. Send state controls 250, receive state controls 260 and R/W state controls 270 receive the clock pulses, identify the fields within the data and control sequencing of data flow during the send, receive and processor phases.
Control of the interface is then effected by three finite state machines (FSMs), one each for the send, receive and processor (R/W) phases. The FSMs are implemented in a conventional manner, using programmable logic arrays (PLAs), a state register and action ROMs. Each FSM is advanced to its next state once per network clock cycle. Because of the number of control signals to be generated, the outputs of the PLAs are further encoded by action ROMs. Those skilled in the art will readily appreciate that translation of the control sequences implied by the operation of the network as described to FSM mode, so as to incorporate conventional details and operations, is a tedious but straightforward task.
The state diagrams of Figs. 17 and 19, and the matrix diagram of Fig. 18, are included to provide comprehensive detail as to internal design features that may be used within a complex system. In Fig. 17, which pertains to the receive phase, and Fig. 19, which pertains to the send phase, the designations used correspond to those employed elsewhere in the specification and drawings. For example, the following terms apply:

RKLC = Receive Key Length Counter
RDLC = Receive Data Length Counter
RNDR = Receive Network Data Word Register
PUTC = Put Counter
GETR = Get Register

The state diagrams thus are essentially self explanatory when taken in conjunction with Fig. 13 and the specification. The state diagrams detail the sequences and conditional statements involved in complex message management and inter-processor communication. In Fig. 17, the states labeled "Generate Response" and "Decode Response" and the conditional statements indicated by dashed line rectangles are in accordance with the designated responses and actions set out in the matrix diagram of Fig. 18. Fig. 18 depicts both the responses generated and the actions undertaken for any given combination of primary message and readiness state pertaining to a given TN. Obviously, normal system operation involves some message rejection but very few error conditions.
In both Figs. 17 and 19, many of the conditional determinations can be made concurrently, while the state steps will be changed sequentially. In either event, the Send and Receive operations are regular progressions which do not require external control because of the organization of the messages and manner of operation of the network.

A number of features that are used in typical processor systems, or in multiprocessor systems, are not germane to the invention and are therefore not depicted. These include parity error circuits, interrupt circuits, and various means of monitoring activity, including a watchdog timer and a wide variety of test functions.
EXAMPLES OF SYSTEM OPERATION
The following are provided as examples of how the integrated system of Figs. 1, 8 and 13 functions in different modes of operation internally while cooperating with the network and the H.S. RAMs. These examples demonstrate how interrelations between the priority scheme, the addressing techniques utilized and the transaction identities provide both local control and global intercommunication.
Primary Data Message Reception and Transmission - Fig. 16, to which reference is now made in addition to the other Figures, is a simplified state diagram of the states involved in the ultimate acceptance of a primary message. Reception of the message, at buffers and in storage, does not effectuate acceptance until these logical states have been satisfied. Although depicted as a serial train of events, determinations are essentially made in parallel or concurrently because conditions are mutually exclusive or circuitry bypasses intermediate stages in arriving at a certain operating level.

A message on the network of Fig. 1 is passed through the receive network data register 146 of Fig. 13 until the EOM state is identified, at which point it is known that the message is complete. If a LOCK condition exists, the system references the response directory in the H.S. RAM 26'' of Fig. 8, sending the NAK/LOCK reject message.

Alternatively, if the LOCK condition does not exist, the system moves to the map comparison check, carried out within the DSW management section 190 in the interface as shown in Fig. 13. If a proper comparison exists, as indicated by Map Output = 1, the system can continue to receive the message. If not, the message is rejected and a NAP is sent.
Having a correct Map determination, the system is then ready to check for TN status by making reference to the directory of the TNs as shown in Fig. 8, specifically to determine whether the local status is "receive ready". It is assumed that the TN was previously assigned by a prior primary message.

If this check reveals that the TN is in the done, non-participant or initial states, a NAP reject message is sent. If the status is some other condition that is invalid, the reject message is a NAK/TN error, both types being again taken from the response directory of Fig. 8. If the status is "receive ready" another determination can be made.
This determination is for "input overrun", and is made in the input/output management buffer section 170 of Fig. 13, by comparing the GET and PUT addresses as previously described. The transaction number is also checked for a receive message count of zero, which also indicates input overrun. If the overrun condition does exist, the NAK/input overrun is sent and the message is rejected.

When all conditions are satisfied, an ACK message is derived from the response directory in H.S. RAM 26'' and transmitted on the network, in priority competition with other processor modules, some of which may also have acknowledged the message. At this point the message is accepted if the common (i.e. merged) response from the network is an ACK affirming that all selected receiving processor modules can accept the message. If the response is in any other form, the message is rejected by all processors.
In this example of reception and response, it should be noted that after a primary message has been received, all processors generate one of the ACK, NAK, or NAP responses. After receiving one of these response messages, processors may attempt to transmit a primary message. (They may also do so after a delay that is equal to or greater than the total latency delay through the network as discussed above under "Active Logic Node".) Note further that if several processors transmit identical messages, they may all win, in effect, in contention on the network. In this case, all of the transmitting processors receive the ACK response. This fact is of importance in the operation of the broadcast and global semaphore modes, which are described in detail in later examples.
Practical examples of the invention incorporate substantially more responses and perform a number of different actions in addition to those just described. Fig. 18 demonstrates these in the vertical column entries for Lock, TN error, and Overrun interrupt conditions, the nine different previously identified status levels and the Acknowledgment and Not Applicable Processor responses.
When a processor module is ready to transmit a message, the PTN value stored in the PTN register 206 of Fig. 13 is available, and it need only be ascertained that the TN status is in the "send ready" condition. As seen in Fig. 12, the "send ready" entry contains the next message vector address for the output message. The assembled output message will be transmitted on the network and, if it loses in contention, it is repeated until successful, and a response is received, unless the PTN is changed in the interim. After successful transmission and acknowledgment, the address vectors are changed. The next message vector is obtained from the second word of the present message (Fig. 21A) that is transmitted from the send transaction vector counter 222 to the random access memory 168. The PUT counter 175 is advanced by one if the output message section is not in an overrun condition, which would be indicated by PUT equals GET. Finally, the next message vector from the send transaction vector counter 222 is entered into H.S. RAM at the transaction number address specified by the present transaction number register 206. Thus if the new TN is in the "send ready" state, the vector value again points to the location of the next message pertaining to the transaction identity. Refer to Fig. 21 for the format of an output message in H.S. RAM.

Message management in transmitting messages, however, can involve a number of variants, including an internal or external change in the PTN. Errors, overrun or locked conditions can cause the system to shift the transaction number to TN0, causing the system to revert to the non-merge mode and examine the status at TN0 until the "send ready" state is identified or a new TN is assigned. Refer to the flow chart of Fig. 19 for a delineation of states and conditions that may be used in a detailed example.
Output Message Complete Buffer Example - Upon completion of transmission of a message, as evidenced by any response message except LOCK, a pointer to the newly-completed output buffer is placed in the Output Message Complete Circular Buffer section of H.S. RAM (see Fig. 8). The pointer is simply a 16 bit word which gives the address of the output message buffer. (The format of an output message buffer is shown in Fig. 21. It should be noted that the output message buffer includes a place to record the response message received from the network.)
The output message complete circular buffer provides communication between the network interface hardware 120 and supervisory programming on the microprocessor 105. Programs in the microprocessor place messages to be output in H.S. RAM. As described in a subsequent example in detail, output messages can be chained together, with TNs acting as head-of-chain pointers, to form complex sequences of activity. A further factor is that because the network can be multiplexed among TNs (also described in detail below), messages may be output in various orders depending on events throughout the network.

It is important, however, to quickly recover the space in H.S. RAM taken by a successfully transmitted packet so that the space can be reused for another outgoing packet. The output message complete circular buffer serves this function.
When a data message has been successfully sent and a non-Lock response received, the network interface advances the PUT pointer at 0510 (hex) in H.S. RAM (see Fig. 10) by one and stores the address of the first word of the output message just sent at the address in the PUT register. (If the PUT value becomes larger than the TOP pointer at 0512 (hex), it is first reset to be the same as the BOT pointer, which is stored at 0513 (hex).) If the PUT pointer becomes larger than the GET pointer (location 0511 (hex)), the circular buffer has overrun, and an error interrupt is generated to the microprocessor.

Asynchronously, software executing in the microprocessor examines the output message buffer pointed to by the GET pointer. After completing any processing required, the processor advances the GET pointer by 1 (if GET becomes larger than TOP, it is reset to BOT). If GET = PUT, there are no more output messages to service. Otherwise additional output messages have been successfully sent, and must be processed. Processing includes returning H.S. RAM output buffer space to a free pool, so that it can be reused for other packets.
It is important to note that the output message complete circular buffer is distinct from the input message circular buffer, and the two circular buffers are managed by different PUT, GET, TOP, and BOT pointers. In one implementation, as shown by Fig. 13, circular buffer management hardware 170 can be shared by both circular buffers, although this is not essential.
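The pointer arithmetic of the preceding paragraphs can be stated compactly; a sketch in C assuming, per Fig. 10, that locations 0510-0513 (hex) hold the PUT, GET, TOP and BOT values, with equality tests standing in for the "becomes larger than" comparisons of the text, and with the convention that PUT and GET each point at the last entry written or consumed.

    #include <stdint.h>

    static uint16_t hs_ram[0x1000];  /* image of a 4K word H.S. RAM */
    #define PUT 0x0510
    #define GET 0x0511
    #define TOP 0x0512
    #define BOT 0x0513

    /* Hardware side: record the address of an output message whose
       transmission just completed; -1 models the error interrupt on
       circular buffer overrun. */
    int complete_msg(uint16_t msg_addr)
    {
        uint16_t put = hs_ram[PUT] + 1;
        if (put > hs_ram[TOP]) put = hs_ram[BOT];
        hs_ram[put] = msg_addr;          /* pointer to completed buffer */
        hs_ram[PUT] = put;
        return (put == hs_ram[GET]) ? -1 : 0;
    }

    /* Software side: fetch one completed-message pointer, or -1 when
       GET = PUT and there is nothing to service. */
    int next_completed(uint16_t *msg_addr)
    {
        uint16_t get = hs_ram[GET];
        if (get == hs_ram[PUT]) return -1;   /* nothing to service */
        get += 1;                            /* advance, wrapping   */
        if (get > hs_ram[TOP]) get = hs_ram[BOT];
        *msg_addr = hs_ram[get];
        hs_ram[GET] = get;
        return 0;
    }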
Initializing Procedures - Each processor module has access to the TNs in its own high speed random access memory 168 (Fig. 13) comprising the directory of potentially available TNs.
Those TNs which are unassigned, however, are specifically so designated by the transaction number value in the relevant location. Consequently, the microprocessor system 103 can identify the unassigned transaction numbers and select one for use in initiating communications with other processor modules pertaining to a given transaction identity.

Although TNs are locally assigned and updated under local microprocessor control, global control through the network is achieved by the primary control messages "relinquish TN" and "assign TN". No deadlock condition can arise between competing processor modules that may desire the same TN, because the network will give priority to the lower-numbered processor. The other attempting processors will receive a NAK/TN error response which indicates that they must try to reserve a different TN. There is thus complete flexibility in reserving and referencing these transaction identities within the system and locally.
It will also be noted that repeated use is made of the TN in shifting between the base transmission mode, TN0, and the merge mode, when TN is greater than 0. The system is thus able to change not only the focus of its operations but also the character of its operations by a single broadcast transmission of the TN.

A different and particularly useful technique for transmitting changes of global status is the forced parity error propagation previously discussed in conjunction with Fig. 4. This unique indication, interleaved among other transmissions, enables discontinued system resources to be surveyed and appropriate action to be undertaken.
Processor-to-Processor Communications - There are two forms of specific processor communication, one directed to an individual destination and the other directed to a class of processors. Both types of transmissions utilize the DSW, and both functions are carried out by broadcasting in the non-merge mode.
When specifically communicating between an originating processor and a single destination processor, a destination processor identification (DPID) is used in the DSW. Referring to Fig. 8, when this value is used to address the selection map portion of the H.S. RAM 26'' at each receiving processor module, only that specific desired processor module provides an affirmative response and accepts the message. Transmission and ultimately successful reception of the acknowledgment enable both processors to take whatever future action is required.

When a class of processors related to a control process are to receive a message, the map nybl and map address within the DSW specify the corresponding section in the selection map portion of the H.S. RAM. All receiving processors then transmit acknowledgments, competing for access to the originating processor module until the communication interchange is finally complete.
The full broadcast mode of processor communication may be used with primary data messages, as well as status, control and response messages. The inherent capabilities of the priority protocol and prioritizing network facilitate the interjection of such messages into other message sequences.

The hashing mode of processor selection is predominantly used in data processing tasks in a relational data base system. Primary and backup disjoint data subsets are distributed in accordance with an appropriate algorithm among the different secondary storages. When two processors respond concurrently, because one is responsible for the primary and the other for the backup subset, the primary message will be prioritized. A higher priority command code (see Fig. 12) can be chosen to insure this condition. Maintenance of the reliability and integrity of the data base is also achieved by using the various multiprocessor modes to best advantage for each condition that arises. If, for example, the secondary storage having responsibility for a primary data subset fails, it can be updated by a specific processor-to-processor communication. An error can be corrected or a part of the data base can be rolled back in similar fashion or by operating in a class mode.
Transaction Number Example - The transaction number concept provides powerful new hardware facilities for the control of a multiprocessor system. In the present system, the transaction number implements a "global semaphore", plays important roles in sending and receiving messages on the network, and in rapidly ascertaining the readiness of a given task distributed among plural processors.

The physical realization of transaction numbers (TNs) is as a set of 16 bit words in H.S. RAM 26. The words are formatted as shown in Fig. 12 to achieve a variety of functions. Because TNs are stored in H.S. RAM, they can be accessed by both the microprocessor 105 and the network interface 120.
Global Semaphore - The term "semaphore" has come into common use in computer science literature to denote a variable used for control of asynchronously executing processes. A semaphore has the property that it can be tested and set in one uninterruptible operation.

Consider, for example, a semaphore variable with two states: UNASSIGNED and ASSIGNED. The test-and-set operation is then defined as: "if the semaphore is in the UNASSIGNED state, set it to the ASSIGNED state and indicate success; otherwise if the semaphore is already in the ASSIGNED state, leave it in the ASSIGNED state, but indicate failure." The semaphore thus permits the process which successfully tested and set the semaphore to proceed with its task, where a process which fails must either wait for the semaphore to be reset to the UNASSIGNED state or try to test and set another semaphore controlling another equivalent resource. It will be readily seen that if the test-and-set operation could be interrupted, two processes could simultaneously gain access to the same resource, resulting in unpredictable erroneous results.
Every multiprocessor system implements in hardware a concept which can be equated to semaphores to control access to system resources. Prior art systems, however, can maintain only one copy of the semaphore. It is desirable to maintain plural copies of a semaphore, one in each processor, in order to reduce contention for simple test-only access to semaphores, and to use multiple-valued semaphore variables for other purposes, as will be discussed below. The problem is that multiple copies of a semaphore must be manipulated in exact synchronism, or else the access-to-resources integrity which semaphores are intended to enforce will be lost.

A plural-copy, or "global", semaphore is provided by the present system. The following table contrasts the operations on global semaphores with a simple semaphore:

Operation/Property   Simple Semaphore                   Global Semaphore

states               2 (UNASSIGNED, ASSIGNED)           n (UNASSIGNED, ASSIGNED, ...)

number of semaphore  1                                  m (number of processors)
"copies"

test                 returns either ASSIGNED or         returns highest priority (numerically
                     UNASSIGNED.                        lowest) state of all m copies.

reset                sets semaphore to UNASSIGNED,      sets all m copies of the semaphore to
                     regardless of prior state.         UNASSIGNED, regardless of prior state.

test-and-set         if semaphore is UNASSIGNED,        if all m copies are UNASSIGNED, set all
                     set to ASSIGNED and indicate       m copies to ASSIGNED and indicate
                     success; else if semaphore is      success; else if any copy is in any
                     ASSIGNED, leave it alone and       state other than UNASSIGNED, leave all
                     indicate failure.                  copies unchanged and indicate failure.


In the present system, the ASSIGN TN and RELINQUISH TN commands provide the test-and-set and reset functions respectively on transaction numbers used as global semaphores. With reference to Fig. 12, the NAK/TN error response provides the failure indication, where the SACK/ASSIGNED response provides the success indication.

The nature of the network, including the synchronous clocking scheme employed for nodes and the broadcast of the priority packet simultaneously to all processors, is the basis for implementation of the global semaphore concept. With this concept in place, the system can allocate, deallocate, and regulate access to plural copies of any desired system resource simply by associating the resource with a TN. It is important to notice that the control of a distributed resource can be effected with nearly the same modest software overhead as a simple semaphore. This is a considerable advance on prior art systems which either cannot manage distributed resources or require complex software protocols and lead to hardware bottlenecks.
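A behavioral sketch in C of this global test-and-set: the m copies are modeled as an array, and the network's prioritized merge of simultaneous responses as a simple scan. This simulates the semantics tabulated above; the real mechanism is the broadcast ASSIGN TN or RELINQUISH TN command and the merged NAK/TN error or SACK/ASSIGNED response.

    #include <stddef.h>

    enum sem_state { UNASSIGNED, ASSIGNED /* , ... further states */ };

    /* Succeeds (returns 0) only if every copy is UNASSIGNED, in which
       case all m copies are set together, as the simultaneous network
       broadcast guarantees in the real system; else returns -1. */
    int global_test_and_set(enum sem_state *copies, size_t m)
    {
        for (size_t i = 0; i < m; i++)
            if (copies[i] != UNASSIGNED)
                return -1;               /* NAK/TN error: failure  */
        for (size_t i = 0; i < m; i++)
            copies[i] = ASSIGNED;        /* SACK/ASSIGNED: success */
        return 0;
    }

    /* RELINQUISH TN: unconditional reset of all copies. */
    void global_reset(enum sem_state *copies, size_t m)
    {
        for (size_t i = 0; i < m; i++)
            copies[i] = UNASSIGNED;
    }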
State of Readiness - The set of values BUSY, WAITING, READY (send or receive), DONE, and NON-PARTICIPANT (refer to Fig. 12) provides the capability to rapidly ascertain the state of readiness of a task associated with a TN. In the present system, the following table shows the meaning associated with each state:

BUSY - The processor is working on the task in question, and results are not ready.

WAITING - The processor has completed processing, and is waiting for all other processors to complete processing for this task.

SEND READY - Data is available in H.S. RAM for output for this task (TN).

RECEIVE READY - Space and other needed resources are available in this processor to receive data for this task (TN).

DONE - The processor has no further data to send for this TN.

NON-PARTICIPANT - The processor does not have any processing for this task.

INITIAL - This TN has never been used since the processor started.
A task is associated dynamically with a TN by use of the ASSIGN TN command. Success (a SACK/ASSIGNED response to the ASSIGN TN message) indicates that all active processors have successfully assigned the TN to a task. Note from Fig. 11 that since the NAK/TN ERROR response has higher priority (lower value), if any processor's network interface 120 detects a conflict in the use of the TN, all processors see the failure response. Further, the OPID (originating processor ID) field of the response on the network will indicate the first (lowest numbered) processor with a conflict - a fact of use to diagnostic routines.

By software action, each processor will process the task and will set the TN to BUSY, WAITING, SEND READY, RECEIVE READY, DONE, or NON-PARTICIPANT as appropriate. Any processor, including the processor which issued the original ASSIGN TN, can readily ascertain the state of completion of the task (TN) by issuing either a STATUS REQUEST or START MERGE command at any time.
The STATUS REQUEST corresponds to a simple test of the multiple-valued global semaphore. Notice from Fig. 11 that the highest priority status response (SACK) message will win in contention on the network, thus indicating the least state of readiness. Further, the OPID field will indicate the identity of the first (lowest numbered) processor in the least ready state.
This latter property is used to implement the "non busy" form of waiting for completion of a task distributed to plural processors. The processor which originally issued the ASSIGN TN is considered the original "wait master". That processor then designates some other processor the new "wait master" on any arbitrary basis. When the new "wait master" has itself reached the desired state of readiness, it interrogates all processors by issuing either START MERGE or STATUS REQUEST. If all other processors have become ready, the SACK will so indicate. If some processors are still not ready, the OPID field of the SACK response will indicate the first least ready processor. The "wait master" instructs that processor to become the new "wait master". Eventually, all processors will become ready, but in the meantime, the system only tries to interrogate status when it is known that at least one processor has become ready. The system is thus not burdened by periodic status interrogations, which consume resources without producing results. Further, this scheme guarantees that the system will know that all processors have completed work at the exact moment when the last processor to complete is done. Those skilled in the art will recognize that many other "waiting" schemes are feasible within the context of the invention.
The START MERGE command is a special kind of test-and-set instruction. If the status of the global semaphore is SEND READY or RECEIVE READY, the Present Transaction Number Register (PTNR) 206 (see Fig. 13) is set to the transaction number in the START MERGE message (see Fig. 3), thus setting the PTNR register. If any active processor is in a lesser state of readiness, the PTNR will not be altered.

The STOP MERGE command is the corresponding reset operation, and unconditionally resets the PTNR of all active processors to TN0.

As discussed below, messages pertaining only to the PTNR current global task are output by the network interface 120. The START MERGE and STOP MERGE commands thus provide the ability to time multiplex the network among plural tasks, which can arbitrarily be suspended and/or resumed.
An important detail of the present system is that the
network interface 120 must insure that a command from the net-
work and the microprocessor 105 cannot access a TN simultaneously.
lS In the present implementation this is accomplished by a signal
from the receive state controls 2~0 to the read/write state
controls 270 which is asserted whenever a command from the network
which could alter a TN is being processed. For this brief time,
access to the H.S. RAM is denied to the processor by the controls
270. Those skilled in the art will recognize that many alternate
implementations are possible within the scope of the invention.
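In software terms the interlock behaves like a short critical section around each TN word. The hardware uses a deny signal rather than a lock, so the sketch below (Python, with illustrative names) is only a behavioral model of the same guarantee.

    import threading

    class TNWord:
        """Behavioral model of the TN access interlock in H.S. RAM."""
        def __init__(self):
            self._busy = threading.Lock()  # stands in for the deny signal
            self.value = 0

        def network_command(self, new_value):
            # The receive state controls assert the signal for the brief
            # time a network command that could alter the TN is processed.
            with self._busy:
                self.value = new_value

        def processor_access(self):
            # The read/write state controls grant the microprocessor
            # access only while the signal is deasserted.
            with self._busy:
                return self.value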
Receive Control - Another function of the TN is control
of incoming messages. A given task can be associated with an in-
coming message stream on plural processors by the ASSIGN TN
command. When the TN for that task in a given processor is set
to RECEIVE READY, the TN then additionally indicates a count of
packets that processor is prepared to accept (Fig. 12). The net-
work interface 120 decrements the count (by arithmetically sub-
tracting 1 from the TN word) for every packet successfully re-
ceived, until the count reaches zero. At that time, a NACK/OVERRUN
response is generated, signaling sending processors that they
must wait until the NACKing processor is ready to accept more
input. Note also from Fig. 18 that in this case, the PTNR is
also reset to TN0.
This mechanism leads to straightforward implementation
of control of the flow of packets through the network 50. It
guarantees that a processor will not become congested with un-
processed packets and thus become a bottleneck to the system.
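A sketch of the receive-side check, in Python with illustrative field names, follows; per Figs. 12 and 18, exhaustion of the count produces NACK/OVERRUN and resets the PTNR to TN0.

    TN0 = 0

    def receive_packet(tn_word, iface):
        """tn_word: {'state': ..., 'count': ...} for the packet's TN;
        iface: {'ptnr': ...}. Returns the response put on the network."""
        if tn_word["state"] != "RECEIVE READY" or tn_word["count"] == 0:
            iface["ptnr"] = TN0      # Fig. 18: PTNR reset on overrun
            return "NACK/OVERRUN"    # senders must wait before resending
        tn_word["count"] -= 1        # arithmetic subtraction from TN word
        return "ACK"

    tn = {"state": "RECEIVE READY", "count": 2}
    iface = {"ptnr": 5}
    print([receive_packet(tn, iface) for _ in range(3)])
    # -> ['ACK', 'ACK', 'NACK/OVERRUN']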
Send Control - Referring to Fig. 21, it can be seen
that each message in H.S. RAM contains a field for a new TN
vector value. After a message is sent and the corresponding
response received successfully, the new TN vector from the
message just sent is stored in H.S. RAM at the address for the
Present Transaction (from the PTNR). The TN is thus updated for
every message sent, and thus can automatically be set to any
desired state on successful transmission of a message.
Referring to Fig. 12, the SEND READY TN format includes
a 14 bit address in H.S. RAM which is used to point to the next
packet to be output for the given task (TN). Thus, the TNs in
H.S. RAM also serve as head pointers to FIFO queues of messages
for various tasks. Within a given task (TN), each processor will
thus attempt to output its packets in the serial order defined
by the New TN vector chains.
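The chaining can be pictured as a linked list threaded through H.S. RAM, with the TN word as head pointer. A minimal sketch, in Python with assumed field names:

    def send_next(hs_ram, tn_word):
        """hs_ram maps a 14 bit address to a stored message; each message
        carries a new TN vector naming the next packet's address (or None).
        tn_word['head'] is the SEND READY TN's pointer field."""
        addr = tn_word["head"]
        if addr is None:
            return None                  # FIFO queue for this task is empty
        msg = hs_ram[addr]
        # ...packet transmitted; on a successful response the new TN
        # vector from the message replaces the head pointer:
        tn_word["head"] = msg["new_tn_vector"]
        return msg["body"]

    ram = {0x100: {"body": "pkt 1", "new_tn_vector": 0x2A0},
           0x2A0: {"body": "pkt 2", "new_tn_vector": None}}
    tn = {"head": 0x100}
    print(send_next(ram, tn), send_next(ram, tn), send_next(ram, tn))
    # -> pkt 1 pkt 2 None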
When combined with the facilities for rapidly multi-
plexing the network among TNs (tasks) discussed previously, it
becomes apparent that complex sets of tasks distributed among
many processors can be managed with minimal software overhead.
The coaction of the network, interface, and processors provides
facilities to allocate, deallocate, suspend, resume, and other-
wise control resources and tasks, copies of which are distributed
among potentially hundreds or even thousands of processors.
DSW Examples - The destination selection word (Fig. 3)
coacts with DSW logic 190 (Fig. 13) and the DSW section of H.S.
RAM 26 (Fig. 8) to provide several modes by which the network
interface 120 of each receiving processor can rapidly determine
if the message being received is intended for processing by the
associated microprocessor 105. As described above, the DSW in
the received message both selects and is compared to a nybl in
the DSW section of H.S. RAM.
Processor Address - As shown in Fig. 8, a part of the
DSW section of H.S. RAM is devoted to processor address selection
nybls. In the present system, each of the 1024 possible proces-
sors is associated with a bit address in this part of H.S. RAM.
The bit is set to 1 at the bit address which corresponds to the
processor's ID; all other bits in this section are set to 0.
Each processor thus has only one bit set in this section.
Hash Maps - Another part of the DSW section of H.S. RAM
is devoted to hash maps. In the present system, two of the map
selection bits are devoted to hash maps, giving two complete sets
of 4096 possible values. In the hashed mode, keys to records
stored on secondary storages are put through a hashing algorithm
which results in a "bucket" assignment between 0 and 4095. A
processor which has responsibility for a given "bucket" of records
has a 1 bit set in the map bit whose address corresponds to the
bucket number. Other bits are set to 0. A given processor can
be assigned responsibility for plural buckets simply by setting
plural map bits.
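A sketch of the hashed mode, in Python with a stand-in hash function (the actual hashing algorithm is not reproduced in this description):

    N_BUCKETS = 4096

    def bucket_of(key: bytes) -> int:
        return sum(key) % N_BUCKETS   # placeholder, not the real algorithm

    def accepts(map_bits: set, key: bytes) -> bool:
        """map_bits: bucket numbers whose map bit is set to 1 in the DSW
        section of H.S. RAM for the selected hash map."""
        return bucket_of(key) in map_bits

    # A processor assigned buckets 0-2047 in the primary map:
    primary_map = set(range(2048))
    print(accepts(primary_map, b"employee record key"))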
In the present implementation, it is readily appreciated
that map bits can be set so that for a given map selection bit,
each bit address is set to 1 in only one processor, and further
that every bit address is set to a 1 in some processor. As a
direct consequence, each processor (AMP) is responsible for a
distinct disjoint subset of the records of the data base, and
further, across the entire system, a complete set of records
exists.
Although the present example is couched in terms of
the relational data base problem, those skilled in the art will
readily appreciate that the same technique can be applied to any
problem domain where disjoint subsets of the problem can be
assigned to different processors in a multiprocessor complex.
It is further worth noting that with two complete maps,
the scheme described above can be arranged so that buckets as-
signed to a given processor in one map can be assigned to dif-
ferent processor(s) in the other map. If one map is considered
as "primary" and the other as "backup", then as a direct conse-
quence, records which are primary on a given processor can be
guaranteed to be backed up on other processor(s). Further, the
number of processors which provide backup to a given processor is
completely flexible.
Those skilled in the art will recognize that the number
of distinct map sets which can be realized within the scope of
this invention can be greater than two, and further, that the
number of buckets can be any value.
Class - In both of the previous examples, examination of
a given bit address in every processor shows that that bit address
is set to a 1 in only one processor; the corresponding bit address
in all other processors is set to 0. However, it is possible and
useful for a corresponding bit address to be set to 1 in plural
processors. This is referred to as "class address" mode.





The class address is thought of as the name of a
process or capability, a copy of which exists in plural proces-
sors. Any processor which has the process or capability in
question has a 1 bit set in the corresponding bit address.
A message is sent to a class address by setting that
class address in the DSW (Fig. 3). All active processors which
"belong" to the class, as indicated by a bit set to 1 at the ap-
propriate spot in H.S. RAM, will respond to the packet with an ACK.
Processors which do not belong to the class respond with NAP.
The DSW thus provides in hardware most of the routing
calculations needed to control the flow of messages in a multi-
processor system. Further, programs can be written without
knowledge of which processors hold the various capabilities of
the system. Since the maps are part of H.S. RAM, and thus
can be accessed by the microprocessor 105, it is further possible
to relocate a capability from one processor to another dynamically.
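Local acceptance under class addressing reduces to a single bit test; the sketch below, in Python with illustrative encodings (bit addresses shown as plain integers), models the ACK or NAP decision each processor makes before the responses compete on the network.

    def class_response(dsw_class_bit: int, my_class_bits: set) -> str:
        """my_class_bits: bit addresses set to 1 in this processor's DSW
        section, one per process or capability it holds a copy of."""
        return "ACK" if dsw_class_bit in my_class_bits else "NAP"

    # Processors 2 and 5 hold copies of capability 17; processor 9 does not.
    for opid, bits in {2: {17}, 5: {17, 30}, 9: {30}}.items():
        print(opid, class_response(17, bits))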
Merge Example - In complex multiprocessor systems, tasks
may require a series of interrelated actions to be undertaken.
This is particularly true in a relational data base system
handling complex queries, in which reference to a number of
secondary storages may be needed to assemble data into a file that
can then be redistributed in a particular way to a number of
processors. The following example briefly delineates how the
system of Figs. 1, 8 and 13 can readily carry out these functions,
by manipulating the TNs, DSWs and global semaphores.
First, a merge coordinator, typically but not neces-
sarily an IFP 14 or 16, identifies a class of AMPs (from AMPs
18-23) that are to merge a file, acting as data sources. An
unassigned TN is selected and assigned to identify the data
source function. The second principal function, of distributing
or hashing the file to another set of AMPs (which may be the
original processors) is referenced to a different hitherto un-
assigned TN.
The coordinator for the merge function uses the DSW
to identify the class of processor modules that are to perform
work in merging of a file related to the first TN. Each par-
ticipant elevates the status of its TN to a busy or waiting status,
and control of the merge operation is then passed over (assigned)
to one of the participants in the merge. After reception and
acknowledgment of the message packets pertaining to the merge
task that has been defined, each participant (all other processor
modules are nonparticipants for that transaction number) carries
forward with its subtask, updating its status level as appro-
priate. Consequently, when the assigned merge coordinator reaches
the end of its task, it can request status from all other partici-
pants as to that transaction number, and receive a response that
indicates the least ready of the participants. Control of the
merge can be passed to the least ready respondent, which can then
poll all of the other participants at such time as its work is
completed, and this process can be repeated if necessary until a
response has been received that indicates that all participants
are ready. The then active coordinator can then, using a DSW to
identify the class of participants, initiate the transfer of mes-
sages to H.S. RAM 26, accompanied by update of the status levels
to "send ready" with appropriate output message vector information.
When subsequent polling reveals that all participant AMPs are in
a send ready state, the coordinator issues a start merge command
for the specified TN.
During merge, the processed data packets are to be
directed to the class of processor modules which are then to
distribute the results to secondary storage in accordance with
the relational data base. Whether or not the receiving proces-
sors are the same as the now originating processors, the class of
participants in the distribution are identified by a DSW and the
transaction is identified by a new TN. All participants in the
new transaction will have been assigned that TN and will have
elevated their readiness state to "receive ready". The DSW can
be a hashing selection designation instead of a class designation,
but in any event all of the participants are enabled to receive the
broadcast messages during merge. Initiation of the start merge
is followed by concurrent transmission of the message packets
by each sending participant on the network with dynamic pri-
oritizing taking place. When it completes its set of messages,
each sending participant tries to send an identical End of File
message of lower priority than the data messages. The End of
File messages lose in contention with data messages, until such
time as all participants send End of File, when the End of File
finally carries through. The coordinator then transmits an End
of Merge message, and can follow this with a Relinquish TN which
completes the transaction. Overrun, error or lock conditions can
be treated appropriately by reinitiating the merge or the trans-
mission.
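The End of File mechanism can be modeled as repeated rounds of prioritized contention in which every participant offers its current head-of-queue message; EOF, carrying the lowest priority, is delivered only when no data remains anywhere. A sketch in Python, with illustrative priorities (lower value wins):

    def merge(queues):
        """queues: one list of (priority, payload) per participant, each
        ending in the identical low-priority End of File message."""
        delivered = []
        while True:
            # All participants transmit concurrently; the network delivers
            # only the common winner of the contention.
            winner = min(q[0] for q in queues)
            delivered.append(winner[1])
            if winner[1] == "EOF":
                return delivered         # every participant offered EOF
            for q in queues:             # the winning sender advances
                if q[0] == winner:
                    q.pop(0)
                    break

    print(merge([[(1, "a1"), (9, "EOF")],
                 [(1, "b1"), (2, "b2"), (9, "EOF")]]))
    # -> ['a1', 'b1', 'b2', 'EOF']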
When a merge has been completed relating to one TN, the
system can then shift to the next successive TN in the sequence.
Each processor module will have queued the appropriate message
packets for that TN and can again commence trying the network to
carry the merge operation forward. Separate intra-processor merges
accompanied by efficient use of the network merge enable the
system to carry out extremely large sort/merge tasks with marked

superiority to prior art systems. The time to sort a file in a
system in accordance with the invention having n records and m
processors can be shown to be:

    C1 (n/m) log2 (n/m) + C2 n

where C2 is a constant, estimated for the present implementation
to be about 10 microseconds for a 100 byte message; and C1 is a
constant estimated to be about 1 millisecond for a typical 16 bit
microprocessor. The approximate sort/merge times, in seconds,
for combinations of n and m under different relationships are
shown in the following table, based on 100 byte records:
[The table of approximate sort/merge times could not be recovered from the source image.]
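In place of the lost table, the formula can be evaluated directly. The reconstruction of the expression above is our reading of the garbled source, and the figures below are illustrative computations from the stated constants, not the original table's entries.

    from math import log2

    C1, C2 = 1e-3, 10e-6   # seconds: local sort per record, network per record

    def sort_merge_seconds(n, m):
        per_processor = n / m                 # records sorted locally
        return C1 * per_processor * log2(per_processor) + C2 * n

    for n in (100_000, 1_000_000):
        for m in (16, 256, 1024):
            print(f"n={n:>9,} m={m:>4}: {sort_merge_seconds(n, m):10.1f} s")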




Comparing these examples to prior art systems is not
readily feasible because two interrelated sorting sequences
(processor and network) are involved and because few systems even
exist that have this capability. Moreover, the system sorts and
merges long and variable length messages whereas most sort capa-
bilities are rated in terms of bytes or words.
A significant further factor is that the present system
is truly a multiprocessor and not dedicated to sort/merge opera-
tions. It can shift with complete flexibility between merge and
non-merge operations, both locally and globally, and do so without
software penalties or loss of system efficiency.
Task Request/Task Response Cycle Example - Any processor
14, 16 or 18-23 on the network 50, referring to Fig. 1, can
formulate a request, in the form of a message packet, for a task
to be performed by one or more other processors. In the relational
data base system, most of these tasks will originate with the
host computers 10, 12 and be entered into the system via the
interface processors 14, 16, but this is not a necessary condition.
The message packet that is formulated is put into competition on
the network 50 with packets from other processors to gain priority
at a time dependent upon the priority level of other tasks and
the level of activity on the processor. The task may be defined
by a single message packet or continuation packets that are as-
signed higher priority levels within the data message groupings
(see Fig. 11) so as to minimize delay in the reception of the
continuation portion.
The message packet contains the transaction identity in
the form of the transaction number, which inherently distinguishes,
by the choice made, between the nonmerge or default mode (TN0) and
the merge mode (all other TNs) as to the manner in which results
are to be derived. Further, the message packet contains the DSW,
which implicitly identifies the target processor and the mode of
multiprocessor operation, whether by specific processor identifi-
cation, plural processor class or hashing, in this instance to a
part of the relational data base. A message packet, being broad-
cast through the network 50 to any target processor, is locally
accepted at that processor and the reception is acknowledged by a
response. All processors 14, 16, and 18-23 provide a concurrent
response following the EOM to the network 50, but the ACK from
the target processor(s) gains priority and is received by the
originating processor.
The target processor or processors then asynchronously
perform the processing required by the request packet, at such
time as the message is transferred through the local H.S. RAM 26
and interface 120 (Figs. 8 and 13) into the local microprocessor.
For a relational data base task, the DSW typically references a
part of the disjoint data subset stored at the associated disk
drive, although tasks sometimes may be performed that require no
reference to the data base. Any specific operation or algorithm
may be executed by the individual processor, and if plural proces-
sors have been targeted, each can work on a disjoint subset of
the overall task. A variable length message packet is structured
such that the request can specify the operations to be performed
and the files to be referenced in the data base system. It
should be noted that there may be numerous message packets in
conjunction with a given task, and to provide adequate sorting
distinctions within the network 50, the optional key field
(Fig. 3) becomes important.
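The fields that matter to this cycle can be summarized in a small structure. The layout below (Python, with assumed field names and types; the actual wire format is given in Fig. 3) also shows how a sort over packets of one task can use the optional key field.

    from dataclasses import dataclass, field

    @dataclass(order=True)
    class MessagePacket:
        priority: int              # continuation packets carry higher priority
        tn: int                    # transaction number; TN0 = non-merge mode
        key: bytes = b""           # optional key field for sort distinctions
        dsw: int = field(default=0, compare=False)       # destination selection
        body: bytes = field(default=b"", compare=False)  # variable length data

    # Packets of one task order by (priority, tn, key) in contention:
    packets = sorted([MessagePacket(2, 5, b"beta"),
                      MessagePacket(1, 5, b"alpha")])
    print([p.key for p in packets])    # -> [b'alpha', b'beta']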
The task response packet that is generated by each
processor that is to respond is shifted from the microprocessor
into the local H.S. RAM 26 via the control logic 28 of Fig. 1
where it is stored in the send message format of Fig. 21A. Where
the task response requires continuation packets to be employed,
these are transmitted after the lead packet but given the higher
continuation priority. Where the system is operating in the
merge mode and each processor generates a number of packets
pertaining to a transaction number, the packets may first be
locally chained together in a sort order for subsequent merge
into a global sort order on the network 50.
The task result packet or packets are sent from the
processors 14, 16 and 18-23 to the network 50 in concurrent
groupings, with a single priority message packet being broadcast
back to all processors after the predetermined network delay.
The transfer of these task result packets may, dependent upon the
nature of the task, be targeted for the original requestor or to
one or more other processors, and may be transferred in accordance
with any one of the multiprocessor modes. The most general case
in the relational data base system is a concurrent merge and re-
distribution using hashing for destination selection. Thus it
is seen that each processor can act as an originator, coordinator
or responder, or all three, in a task request/task response cycle.
Because multiple task request/task response cycles are involved,
the processors 14, 16 and 18-23 and the network 50 are multiplexed
among the tasks, but the multiplexing is done on a priority as
well as a time basis.
Complex Query Example - In the relational data base
system, using the host computers 10, 12 and a distribution of the
relational data base among the disk drives 38-43 in accordance
with an algorithm defining the tuples and the primary and backup
disjoint data subsets, complex queries are entered into the
system from the host computer 10 or 12 at the IFP 14 or 16. The
message packet is first parsed by the IFP 14 or 16, in order to
transform the message from the host into task requests for the
AMPs 18-23. At the outset, the IFP 14 or 16 may be required to
initiate request packets to derive information from one or more
specific AMPs to obtain the system data that is needed to parse
the message from the host. Having the data needed for processing
the request, the IFP 14 or 16 may then require any number of
task request/task response cycles with the AMPs 18-23, and
actually process the data and satisfy the request from the host.
In these sequences, the task request and task response cycles
enumerated above are utilized, and may be of arbitrary length.
The IFP 14 or 16 then communicates with the host via the IFP
interface.
interface. This response may simply provide the data needed for
the host computer 10 or 12 to generate another complex query.
STAND ALONE MULTIPROCESSOR SYSTEM
The primary example of a system in accordance with the
invention, discussed above in conjunction with Fig. 1, exempli-
fies a backend processor that may be utilized in conjunction with
a host computer and its existing software package. As noted
above, however, the present invention is of particular advantage
for a wide range of processing applications, particularly those
of the type in which processing tasks may be readily subdivided
and distributed without requiring massive central processing
power. Fig. 20 illustrates one simplified example of a stand
alone multiprocessor system in accordance with the invention. In
Fig. 20, all of a number of processors 300 are coupled by inter-
faces 302 to an active logic network 304, such as previously
described. It is understood that redundant active logic networks
304 may be utilized for data integrity. The processors 300 may
again use 16 bit microprocessor chips, and incorporate adequately
large main RAM memory. Only nine processors 300 are shown, each
associated with a different peripheral, in order to depict the
versatility of the system. In actuality the system becomes far
more efficient with many more processors on the network, although
particular advantages can be gained in system reliability and
data integrity even with relatively few processors.
In the present example, the processors 300 may be
physically separated by convenient distances, inasmuch as the
maximum internode spacing of 28 feet for the stated data rates
allows a large array of processors to be utilized on one building
floor or on several adjacent floors without undue crowding.
In the stand alone system the peripheral controllers
and peripherals themselves can vary much more widely than in the
backend processor example discussed above. It is convenient to
assume that each input and output device is associated with a
separate processor 300. For example, an I/O terminal 310 in-
cluding a keyboard 312 and display 314 is coupled to its respective
processor 300 by a terminal controller 320, even though a 16 bit
processor might control a substantial network of relatively slow
terminals. The I/O terminal shown merely illustrates how manual
keyboards and other manual entry and processing devices may be
tied into the system. Using the processing power of the proces-
sor 300, the terminal 310 can comprise a word processor, and it
can communicate via the network 304 with a data base, other word
processors, or various output devices. Large secondary memories,
such as a rigid disk drive 322 may be coupled to their respective
processor by disk controllers 324, it being understood that more
disk drives or a different form of mass memory could be used in a

large system. Output devices, such as a printer 326 and a plotter

330 interface with their processors 300 via a printer controller
328 and plotter controller 332, respectively. Interaction with
other systems, not shown, is via a communications system 336,
such as a TTY or one of the larger networks (e.g. Ethernet),
through a communications controller 338. Some processors 300
may be coupled to the network 304 without a peripheral device
(not shown).
Bidirectional data transfers may take place using a
tape drive 340 and tape drive controller 342, and a floppy disk
drive 344 with its associated controller 346. Tape drives not
only provide large storage capacity for on line use, but can
also be employed for disk drive backup. For this purpose tape
is used to retain data that was stored, as of a fixed point in
time, in a sealed rigid disk system. Because these backup
operations are usually performed during low load periods (e.g.
nights or weekends) long "streaming" transfers can take place
through the network 304. Similarly, because floppy disk drives
344 may be used for program entry during system initialization,
some network time is available in these modes for transfers of
substantial amounts of data. An optical character reader 350
provides another source of input data for the system through its
controller 352. Finally, peripheral units, labeled simply
"other devices 354", can be coupled to the system via a controller
356 to provide further capabilities as needed.
Using the technique of concurrently transmitting mes-
sage packets from the various processor modules, and prioritizing
these message packets into a single or common message packet that
is concurrently broadcast within a predetermined fixed time
interval back to all processor modules, each processor that is on
line can have equal access to other modules in the system.
Because the global semaphore system using prioritized transaction
numbers and readiness indications, as well as destination selec-
tion entries in the messages, permits any processor to act as
controller, the system can function in either a hierarchical or
non-hierarchical manner. It is highly significant that the system
can be expanded, or contracted, without requiring software elabo-
ration or modification.
Where access for messages substantially longer than
those previously described but still relatively limited in length
is required, this can still be accomplished. For example, complex
computer graphics systems (not shown) may require access only to
particular parts of an extensive data base to generate sophisti-
cated two dimensional and three dimensional representations. A
word processing system may need, from the data base, only small
sequences of data at any one time because of the slowness of the
human operator. In these and other situations the capability of
the system for handling variable length messages and giving
priority to continuation messages can be of benefit. While situ-
ations that call for concentrated processing power and extremely
long message transfers impose limits on use of the system, there
are other circumstances in which it functions to much greater
advantage. Any dynamic situation involving the manipulation of
different data forms and inherent sorting or merging functions
falls in this category. Management decision making involving the
gathering, collation and analysis of complex data is one such
example, while the preparation and assembly of the visual and
graphical input for a periodical is another.
CONCLUSION
It will be immediately evident to those skilled in the
art that the system of Fig. 1 is expandable to incorporate any




-92-


number of processors (up to a practical limit imposed by data
transfer capacity) without any need for software modification.
It will also be appreciated that the system circumvents in large
measure the need for supervisory and overhead software to ascertain
the status of units, to establish task and processor priorities
and to assure efficient usage of processor capability.
Clear benefits are derived for data base systems and
for other systems in which it is similarly feasible to subdivide
the overall task into subtasks that can be processed independently.
In the relational data base context, for example, a dramatic in-
crease in secondary storage capacity requires only that the ad-
ditional data base be properly integrated into the primary and
backup structure. Stated in another way, the network can be in-
definitely expanded because, being based upon a binary progression
with standardized intersections or nodes, there is no change with
expansion in the functions performed at each individual node.
Further, no setup sequence or external control of node operation
is required. Consequently if a system in accordance with the
invention is coupled as shown in Fig. 1 to serve as back-end
processor for one or more host computers, the system user can
arbitrarily expand (or contract) the data base without changing
either operating system software or applications software. From
the standpoint of the host processor system the back-end proces-
sor is transparent, whatever its configuration, because its
interaction with the host processor remains the same. Conversion
of the back-end processor to service a different host processor
system requires only that the IFPs converse properly with the new
host processor channel or bus.
The network configuration in one exemplification enables
up to 1024 microprocessors to be used in a single array without
excessive delay in transferring messages through the network or
encountering undue delays due to interprocessor competition.
Those skilled in the art will readily see how to extend the
example described to more than 1024 processors. For a system
using 1024 processors, the maximum line length between active
nodes in a practical example is found to be 28 feet, which
presents no problems in configuring an array. The delay intro-
duced by the network is constant for all messages, and is 2τN,
where τ is the byte clock period and N is the number of tiers in
the hierarchy. As is evident, doubling of the number of proces-
sors by adding another tier imposes only a minimal increase in
the delay. Because data messages will necessarily be long (of
the order of 200 bytes) and because the prioritizing is done
during data transfer along the network with respect to all com-
peting messages, the network can carry data messages with much
higher utilization percentages than prior art systems.
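As a worked check of the 2τN figure: with 1024 processors the network has N = 10 tiers, and doubling to 2048 processors adds a single tier. The byte clock period used below is an assumed illustrative value, not one stated in this specification.

    tau = 100e-9                                # assumed 100 ns byte clock
    for processors in (1024, 2048):
        n_tiers = processors.bit_length() - 1   # log2 for powers of two
        delay = 2 * tau * n_tiers               # constant for all messages
        print(f"{processors} processors: {delay * 1e6:.1f} microseconds")
    # doubling the array adds one tier: delay grows 2.0 -> 2.2 microseconds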
Important economic and operative aspects of the system
also derive from the fact that standardized active logic circuits
are used instead of software or even firmware in the network
system. Using modern LSI and VLSI techniques, reliable circuitry
can be incorporated at low cost in relation to overall processor
and peripheral equipment costs. Expenditures of time and money
for software are limited to the important aspects that pertain
to data base management or other problem-domain tasks. For ex-
ample, the system organization is such that all of the functions
needed to maintain the integrity of a data base can be carried
out within the message packets and network configurations. Such
functions as polling, change of status and recovery of data are
provided for within the system.





An important further consideration is that this network
compares very favorably to the conventional ohmic wire bus in its
capability for rapid data transfer. Message packets are trans-
mitted concurrently for a time, and transferred while priority is
being determined, so that delays for conventional status requests,
responses and priority determinations are avoided. Moreover, the
inter-nodal couplings can be kept to less than a predetermined
length, even for an extremely large number of processors, so that
propagation time through the bus is not a limitation on data
transfer rate.
It has been established that the system approaches the
optimum in terms of efficiency of usage of the microprocessors
and the network as well. What is important is that all micro-
processors be kept busy and that the network be utilized to full
advantage. The IFP-network-AMP configuration effectively makes
this possible because microprocessors whose transmitted message
packets lose in contention for priority simply retry at the
earliest feasible time, thus helping to assure a high bus duty
cycle. The high speed random access memories contribute further
to this result, because they accumulate both input message packets
to be processed and output messages to be transmitted, and thus a
backlog of work is constantly available for each microprocessor
and a backlog of message packets is also available for the
network. When all input buffers are filled, a processor indicates
that fact on the network. Further, when the input buffers used
in an IFP to receive messages from a host computer are full, an
indication to that effect is placed on the channel. Both inter-
nally and externally, therefore, the system is self pacing.
The system is realized in such a way, employing both
the architecture and the message organization, as to carry out
numerous other functions necessary to a versatile multiprocessor
system. The prior art, for example, devotes considerable at-
tention to schemes for assessing and monitoring changes in the
status of the global resource. In accordance with the present
invention, however, the parity channel alone is arranged and
utilized as the means for communicating both the existence of
parity errors and the fact of a change in processor availability.
Shut down of one or more processors is substantially immediately
communicated throughout the system such that an interrupt sequence
can be initiated. Using the prioritized sort of responses, the
nature of the change in global capability can be identified with
far less circuitry and system overhead than previously.
The single query, prioritized global response that is
achieved by the use of global semaphores and the active logic
network has profound systems implications. Obtaining an unam-
biguous global result by a broadcast query in this fashion bypasses
any need for complex software and overhead. Distributed update
and other status identifying operations can be realized even
though numerous simultaneous operations are taking place at
different processors.
The system also provides, by using the network, trans-
action numbers, global semaphores, and destination selection
words, superior capabilities for distributing work and collecting
results in a multiprocessor system. A number of multiprocessor
modes and control messages are available and priority levels are
easily defined and changed using only the priority protocol.
The capability of simultaneously broadcasting to all processors,
in conjunction with message sorting in the network, insures that
any processor grouping or individual processor can be reached
and that processed results can be derived in orderly fashion.
Complex queries to a relational data base system can thus
initiate any sequence needed for a data base operation.
Another advantage of the system lies in the redundancy
which can readily be built into a multiprocessor, such as a
relational data base system. Dual networks and dual interfaces
provide redundancy which permits the system to continue to operate
if one network fails for any reason. The distribution of the
data base into disjoint primary and backup subsets reduces the
probability of data loss to a minimal level. Where a failure or
change does occur, the integrity of the data base can be maintained
because of the versatile controls that can be used.






Administrative Status

Title                   Date
Forecasted Issue Date   1985-12-24
(22) Filed              1982-03-25
(45) Issued             1985-12-24
Expired                 2002-12-24

Abandonment History

There is no abandonment history.

Payment History

Fee Type          Anniversary Year   Due Date   Amount Paid   Paid Date
Application Fee                                 $0.00         1984-09-05
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
TERADATA CORPORATION
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

List of published and non-published patent-specific documents on the CPD.

Document Description   Date (yyyy-mm-dd)   Number of pages   Size of Image (KB)
Drawings               1993-06-22          25                704
Claims                 1993-06-22          13                534
Abstract               1993-06-22          1                 27
Cover Page             1993-06-22          1                 22
Description            1993-06-22          103               4,543