Note: Descriptions are shown in the official language in which they were submitted.
CA 0220~708 1997-0~-20
Process for interconnecting the nodes of a real time parallel
computer
The present invention relates to a process for
interconnecting the nodes of a real time parallel computer,
in which use is made, as an interconnection system, of a
local switching network with virtual channels, each virtual
channel being associated with a given pass band, so as to
ensure end to end control of the transmission latency of a
message to be transmitted on a real time basis, in a chain
including a transmitting node with a main processor and an
interconnection interface adaptor, a receiving node with a
main processor and an interconnection interface adaptor, and
at least one switch connecting the said interface adaptors.
It also relates to an interconnection interface adaptor for
implementing the said process.
The present invention relates to the field of parallel
(multi-node) computing systems and, more especially, to those
used in distributed real time environments. The process
according to the invention applies whenever a parallel
computing system requires a high bandwith for data transfers
between nodes, at the same time as low deterministic latency
for real time communications.
It is known that, in a system comprising a plurality of
calculating nodes connected by an interconnection system, it
is necessary to exchange control type and data type
informations, via messages conveyed by the interconnection
system.
A first method consists in connecting the computing
nodes by a specific, local communication network, providing
a fixed, regular network topology, of the toric, matrix, etc.
type. Such interconnection systems are used in machines of
the MPP (Massively Parallel Processor) type.
This method generally permits the interconnection of a
fairly large number of computing nodes with good performances
in terms of bandwith. The main drawback of this type of
CA 0220~708 1997-0~-20
method is that the latency of message transmission, from end
to end, is very poorly controlled as the data type
information is transferred in large-sized blocks to make use
of the network pass band, and the control messages are not
differentiated.
In addition, these interconnection systems do not permit
real geographical distribution of the computing nodes. Thus,
remote data acquisition can only be accomplished through
input/output interfaces, which makes local processing of the
data impossible, or by a local network connected to one of
the nodes of the computer, which further increases the
latency of the transfers for such data acquisition.
A second approach, used at present, is to place
computers in a network, this method being known as the
" cluster " method. In this method, the priorities of
transferred data are neither transmitted nor processed by the
network, which makes impossible to control the overall
behaviour of the system. In addition, the operation of the
hardware necessitates the assistance of software to control
the interconnection network and to ensure reliable transport
of the information, which multiplies the latency of the
transmissions by a large factor and does not permit the
control over latency necessary for real time operation.
The main object of the present invention is thus to
overcome these drawbacks and, to do so, it provides a process
enabling low, deterministic latency to be obtained between
the different nodes of a real time parallel computer using
the switch and the physical interface of a local switching
network as an interconnection system.
This process is essentially characterised in that it
comprises the steps of :
- allocating, upon initialisation of a real time
application on the main processor of the transmitting node,
virtual channels with a bandwith available for the intensive
data to be transmitted and virtual channels with a bandwith
CA 0220~708 1997-0~-20
reserved for the data to be transmitted on a real time
basis ;
- allocating a priority level to each message to
transmitted on a real time basis ;
- segmenting the intensive data and the real time data
into blocks of data, adding to each block of real time data
the priority information of the corresponding message ;
- transmitting, over the adaptor of the interconnection
interface of the transmitting node, the blocks of data
corresponding to the intensive data at a speed corresponding
to the available bandwith of the virtual channel under
consideration and the blocks of data corresponding to the
data to be transmitted on a real time basis at a speed
corresponding to the reserved bandwith of the virtual channel
under consideration, while observing the priority level
allocated to each block of data ;
- re-assembling the blocks of data in the adaptor of the
interconnection interface of the receiving node in order to
reconstitute the messages ; and
- transmitting the latter to the main processor of the
receiving node, while observing the priority level of the
said messages.
Thus, the real time aspects are taken into account from
end to end. Among others, latency determinism is obtained
thanks to differentiation between data messages and those
ensuring the control of the execution of the software from
the node transmitting the message, through its interface with
the communication network, the network itself, and the
interface with the network of the receiving node, to the
receiving node. This process remains valid when the data
transferred is voice, image or video data.
Finally, the proposed approach permits identical
interconnection for internal and external communications with
adaptation of the rates as a function of requirements.
Integration in a communication network is thus simple, while
CA 0220~708 1997-0~-20
providing both internal and external real time behaviour.
It will be noted, moreover, that the process according
to the invention makes it possible to place certain nodes of
the parallel computer at remote locations, in particular data
acquisition nodes, according to the capabilities of the local
communication network.
According to the invention, an interconnection interface
adaptor for implementing the said process is essentially
characterised in that it includes :
at the transmission end
- means for placing the real time messages, created by
the main processor of the transmitting node and pre-allocated
a priority level, in message transmission queues taking
account of each virtual channel used and of the priority
level inside the said channel ;
- arbitration means for selecting the next message to be
transmitted and the corresponding virtual channel, as a
function of the priority level of each message ;
- means for segmenting the message to be transmitted
into blocks of data comprising the priority information of
the corresponding message ;
- means for formatting the blocks of data ;and
- means for transmitting the said blocks of data over
the local switching network comprising at least one switch ;
and, at the reception end
- means for receiving the blocks of data that have
transited via the said local network switch ;
- means for re-assembling the message from the blocks of
data received ; and
- means for placing the reconstituted messages in
reception queues, taking account of each virtual channel used
and of the priority level of the message inside the said
channel prior to transmitting them to the main processor of
the receiving node.
One form of embodiment of the invention is described
CA 0220~708 1997-0~-20
hereinafter by way of example, with reference to the annexed
drawings, wherein :
- figure l very schematically represents the
interconnection of the nodes of a parallel computer by means
of a local switching network ; and
- figure 2 schematically illustrates the different steps
in the transmission of a message between two nodes of figure
l, according to the process of the invention.
With reference, first of all, to figure l, we can see a
plurality of nodes A, B, ... X, Y of a real time parallel
computer, interconnected by means of a local switching
network including a network of switches l and of physical
links such as 2A~ 2B ... 2x and 2y.
Each of nodes A, B, ... X, Y includes a main processor,
3A~ 3B~ ... 3X~ 3y~ respectively, which accesses the local
switching network by means of an interconnection interface
adaptor, 4A~ 4B~ ... 4X~ 4y~ respectively, via an input/output
bus, 5A' 5B~ ~-- 5X~ 5y~ respectively. According to the
invention, each adaptor 4A~ 4B~ ... 4X~ 4y provides a medium
for real time communications, in addition to the set of
standard characteristics proper to the local switching
network.
It should be noted that, in the particular form of
embodiment of the invention described herein by way of
example, the local switching network used is of the ATM
(Asynchronous Transfer Mode) type. In such an ATM network,
communication between nodes takes place by means of virtual
channels using physical links 2A~ 2B~ ... 2x, 2y, and with
which are associated " Quality of service, known as QoS "
representing guarantees of bandwith or data rate parameters.
According to the invention, the Constant Bit Rate grade
of service, known as CBR, is used for the transfer of data
having real time characteristics, and the Available Bit Rate
grade of service, known as ABR, is used for the transfer of
intensive data, which makes it possible to use the priority
CA 0220~708 1997-0~-20
mechanisms of switches l to differentiate between these two
types of data and to ensure that the priority handling
process for real time data is compatible with the standard
operation of the said switches.
5We shall now describe the process for transferring real
time data between two nodes of the parallel computer, for
example between node A, operating as a transmitter, and node
B, operating as a receiver, with more particular reference to
figure 2.
10When a real time application that is run on processor 3A
of node A creates a message for transmitting real time data
to the real time application that is being run on processor
3B Of node B, it transmits, in the header of the message, the
current priority allocated to the said message. This is
15ensured by the operating system or by the application itself.
The different messages thus created are then queued in
transmission queue queuing means 6A. According to the
invention, a message queue is used for each CBR virtual
channel, VCl..., Vci VCn, respectively, and for each priority
level inside the said virtual channel, Pl... Pi... P
respectively.
It is to be noted here that ABR virtual channels, which
do not convey real time data, but only intensive data, do not
need to be organised according to priority. The intensive
Z5 data is thus transmitted between the different nodes of the
parallel computer in the same way as the real time data, as
a function of its particular Quality of Service, without
priority considerations.
The message to be transmitted is selected thanks to
arbitration means 7A. These arbitration means first select
the virtual channel, observing the Quality of Service
parameter, as specified in the operation of the local
switching network under consideration, in this case the ATM
network, and then, according to the invention, select from
the transmission queue of this virtual channel the message
CA 0220~708 1997-0~-20
having the highest level of priority.
Supposing, for example, that virtual channel Vci has
been selected ; arbitration means 7A will then select, from
this channel VCi, the non-empty message queue having the
highest priority, for example queue Pi. The message selected
will then be the message first to enter the said queue Pi.
It will be noted that each message the transmission of
which has been interrupted in favour of a higher priority
message has its references preserved in the message
transmission queue of the virtual channel that is allocated
thereto, this remaining so until the message has been
transmitted in full. Supposing that there was a message being
transmitted in the case of this virtual channel VCi, this
message being of a lower priority than the message in queue
Pi; the references of this message are preserved in the queue
of corresponding priority, as well as an address indicating
the part of the message still to be transmitted. This message
will be selected again by the arbitration means 7A when all
the queues with a priority higher than its own for the said
virtual channel VCi are exhausted.
The message selected by the arbitration means 7A is then
segmented into cells or blocks of data by segmenting means 8A
so as to ensure transfer of the message cell by cell.
According to the invention, and in the case of a message to
be transmitted on a real time basis, the useful load of the
cell or block of data is reduced in order to insert the
priority information of the message, as given by the
application and used for placing in the transmission queue 6A
and arbitration 7A.
The cell is then formatted at 9A~ according to the
specification of the local switching network under
consideration, in this case the ATM network, in such a way
that it can be transmitted correctly by transmission means
1OA over the physical links, such as 2A~ and correctly
processed by the switches of interconnection network l, with
CA 0220~708 1997-0~-20
the corresponding Quality of Service.
It should be noted, moreover, that the cells or blocks
of data corresponding to the intensive data are transmitted
over adaptor 4A at a speed corresponding to the available
bandwith of the ABR virtual channel under consideration,
while the cells corresponding to the real time data are
transmitted over the said adaptor at a speed corresponding to
the reserved bandwith of the CBR virtual channel under
consideration.
After passing through the network of switches l, the
cell arrives at adaptor 4B ~f receiving node B via physical
link 2B. When the reception means 11B receive a cell, they
separate the useful load from the control information,
including the priority information. This information is then
used to re-assemble the message in re-assembly means 12B.
According to the invention, one message queue is used for
each CBR virtual channel and for each priority level inside
the said virtual channel. All the messages can thus be re-
assembled simultaneously whatever the order of interleaving
of the cells between the messages.
When the complete message has been re-assembled, its
reference is placed in a reception queue 13B organised on a
priority basis, according to the priority of the message.
This priority is then taken into account by the real time
application running on processor 3B Of receiving node B. It
will be noted that, at a given moment, the reception queue
means 13B can contain several messages originating from
different nodes and having different priorities.
Obviously, all the adaptors 4A' 4B~ 4x, 4y of the
different nodes are of identical design and contain the means
necessary both for transmission and reception of the
messages.
It can be seen, in the final analysis, that the process
according to the invention for interconnecting the nodes of
a parallel computer enables the message cell with the highest
CA 0220~708 1997-0~-20
priority to be transmitted at any time, in observance of the
Quality of Service and of the priority of each message, given
by the applications that are run on the different processors.
This process thus makes it possible to obtain low,
deterministic latency for transmission of the real time
messages between the different nodes of the computer.
It will be noted, moreover, that such an interconnection
process makes it possible, since a local network is used, to
place certain nodes of the parallel computer at remote
locations, in particular the data acquisition nodes, which
makes gives the system considerable flexibility.