Patent 2515283 Summary

(12) Patent:	(11) CA 2515283
(54) English Title:	MULTIPROCESSOR COMPUTER ARCHITECTURE INCORPORATING A PLURALITY OF MEMORY ALGORITHM PROCESSORS IN THE MEMORY SUBSYSTEM
(54) French Title (English translation):	MULTIPROCESSOR COMPUTER ARCHITECTURE INCORPORATING SEVERAL MEMORY ALGORITHM PROCESSORS IN THE MEMORY SUBSYSTEM
Status:	Expired

Bibliographic Data

(51) International Patent Classification (IPC):	G06F 15/76 (2006.01) G06F 12/00 (2006.01) G06F 13/00 (2006.01)
(72) Inventors :	HUPPENTHAL, JON M. (United States of America) LESKAR, PAUL A. (United States of America)
(73) Owners :	SRC COMPUTERS, LLC (United States of America)
(71) Applicants :	SRC COMPUTERS, INC. (United States of America)
(74) Agent:	GOWLING WLG (CANADA) LLP
(74) Associate agent:
(45) Issued:	2011-04-05
(22) Filed Date:	1998-12-03
(41) Open to Public Inspection:	1999-06-24
Examination requested:	2005-09-12
Availability of licence:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	No

(30) Application Priority Data:

Application No.	Country/Territory	Date
08/992,763	United States of America	1997-12-17

Abstracts

English Abstract

A multiprocessor computer architecture incorporating a plurality of programmable hardware memory algorithm processors (MAPs) in the memory subsystem. The MAP may comprise one or more field programmable gate arrays (FPGAs), which function to perform identified algorithms in conjunction with, and tightly coupled to, a microprocessor, and each MAP is globally accessible by all of the system processors for the purpose of executing user definable algorithms. A circuit within the MAP signals when the last operand has completed its flow, thereby allowing a given process to be interrupted and thereafter restarted. Through the use of read only memory (ROM), located adjacent to the FPGA, a user program may use a single command to select one of several possible pre-loaded algorithms, thereby decreasing system configuration time. A computer system structure MAP may function in normal or direct memory access (DMA) modes of operation and, in the latter mode, one device may feed results directly to another, thereby allowing pipelining or parallelizing execution of the user defined algorithm. The system also provides a user programmable performance monitoring capability and utilizes parallelizer software to automatically detect parallel regions of the user applications containing algorithms that can be executed in the programmable hardware.

French Abstract (English translation)

This invention relates to a multiprocessor computer architecture incorporating several programmable hardware memory algorithm processors (MAPs) in the memory subsystem. The MAPs may contain one or more field programmable gate arrays (FPGAs), which operate to execute identified algorithms in association and close relation with a microprocessor, and each MAP is globally accessible by all of the system processors to allow the execution of user-definable algorithms. A circuit contained in the MAP signals when the last operand has completed its operation, thereby allowing a given process to be interrupted and then restarted. Through the use of a read only memory (ROM) located beside the FPGA, a user program can use a single instruction to select one of the possible pre-loaded algorithms, thereby reducing system reconfiguration time. A MAP computer system memory structure, presented here, can operate in direct memory access (DMA) or normal mode and, in the latter mode, one device can route results directly to another device, which allows simultaneous or parallel execution of a user-defined algorithm. This system also offers a user-programmable performance monitoring capability, and it uses parallelization software to automatically detect the parallel regions of user applications containing algorithms that can be executed in the programmable hardware.

Claims

Note: Claims are shown in the official language in which they were submitted.

What is claimed is:

1. A computer system comprising:
at least one processor;
at least one circuit of direct execution logic;
a common memory space accessible by said at least one processor and said
at least one circuit of direct execution; and
a unified executable program comprising a first portion thereof executable by
said at least one processor and a second portion thereof executable by said at
least one circuit of direct execution logic;
wherein said at least one circuit of direct execution logic is programmed to
perform at least one identified algorithm on an operand received from said
common memory space.

2. The computer system of claim 1, wherein said at least one processor
comprises a microprocessor.

3. The computer system of claim 1, wherein said at least one circuit of direct
execution logic comprises at least one field programmable gate array.

4. The computer system of claim 1, wherein said at least one circuit of direct
execution logic is operative to access said common memory space independently
of said at least one processor.

5. The computer system of claim 1, wherein said at least one identified
algorithm is programmed into a memory device associated with said circuit of
direct execution logic.

6. The computer system of claim 5, wherein said memory device comprises at
least one read only memory device.

7. The computer system of claim 1, wherein said first portion of said unified
executable program executable by said at least one processor is resident in
said common memory space.

8. The computer system of claim 1, wherein said second portion of said unified
executable program is resident in said at least one circuit of direct
execution logic.

9. The computer system of claim 1, wherein said second portion of said unified
executable program is resident in said at least one field programmable gate
array.

10. The computer system of claim 1, wherein said at least one processor
comprises a fixed instruction set processor.

11. A method for operating a computer system comprising:
providing at least one processor;
providing at least one circuit of direct execution logic;
enabling access by said at least one processor and said at least one circuit
of direct execution logic to a common memory space;
executing a unified executable program on said computer system such that a
first portion of said unified executable program is executable by said at
least one processor and a second portion of said unified executable program
is executable by said at least one circuit of direct execution logic;
wherein said common memory space is accessible by said at least one circuit
of direct execution logic independently of said at least one processor.

12. The method of claim 11, wherein said step of providing at least one
processor is carried out by a microprocessor.

13. The method of claim 11, wherein said step of providing at least one
processor is carried out by a fixed instruction set processor.

14. The method of claim 11, wherein said step of providing at least one
circuit of direct execution logic is carried out by at least one field
programmable gate array.

15. The method of claim 11, further comprising:
programming said at least one circuit of direct execution logic to perform at
least one identified algorithm received from said common memory space.

16. The method of claim 15, further comprising:
storing said at least one identified algorithm in a memory device associated
with said circuit of direct execution logic.

17. The method of claim 16, wherein said step of storing said at least one
identified algorithm is carried out by a read only memory device.

18. The method of claim 11, further comprising:
storing said first portion of said unified executable program in said common
memory space.

19. The method of claim 11, further comprising:
storing said second portion of said unified executable program in said at
least one circuit of direct execution logic.

20. The method of claim 11, further comprising:
storing said second portion of said unified executable program in said at
least one field programmable gate array.

21. A system for processing data using a plurality of circuits of direct
execution logic, said system comprising:
at least one processor;
a common memory space coupled to said at least one processor and said
plurality of circuits of direct execution logic;
a first one of said plurality of circuits of direct execution logic coupled to
a first address in said common memory space and responsive to a first data
value being written to said first address, said first one of said plurality of
circuits of direct execution logic performing a first configured function in
accordance with a unified executable program, generating a second data value
and writing said second data value to a second address in said common memory
space;
a second one of said plurality of circuits of direct execution logic coupled
to said second address in said common memory space and responsive to said
second data value being written to said second address, said second one of
said plurality of circuits of direct execution logic retrieving said second
data value and performing a second configured function in accordance with said
unified executable program;
a first control logic block in a first communication path between said at
least one processor and said common memory space for accessing data at
specified addresses within said common memory space;
a data bus and an address bus coupling said control logic block and said
common memory space;
a third communication path between said first one of said plurality of
circuits of direct execution logic and said address bus;
a second control logic block in said third communication path between said
first one of said plurality of circuits of direct execution logic and said
address bus;
where said second control logic block comprises a command decoder for
decoding commands from said at least one processor, a pipeline counter for
counting clock cycles, an equality comparator for determining whether an
output of said pipeline counter corresponds to a predetermined number of said
clock cycles and status registers for receiving an output from said equality
comparator.

22. The system of claim 21, wherein said second one of said plurality of
circuits of direct execution logic generates a third data value.

23. The system of claim 21, further comprising a second communication path
between said first one of said plurality of circuits of direct execution logic
and said data bus.

24. The system of claim 21, wherein said at least one processor transmits
commands on said address bus.

25. The system of claim 21, wherein said at least one processor periodically
accesses said status register.

26. The system of claim 21, wherein said first and second ones of said
plurality of circuits of direct execution logic comprise field programmable
gate arrays.

27. The system of claim 21, wherein said first and second ones of said
plurality of circuits of direct execution logic are operative to access said
common memory space independently of said at least one processor.

28. The system of claim 21, wherein said first one of said plurality of
circuits of direct execution logic is programmed to perform at least one
identified algorithm on an operand received from said common memory space.

29. The system of claim 28, wherein said at least one identified algorithm is
programmed into a memory device associated with said first one of said
plurality of circuits of direct execution logic.

30. The system of claim 29, wherein said memory device comprises at least one
read only memory device.

31. The system of claim 21, wherein a first portion of said unified executable
program is resident in said common memory space for execution by said at least
one processor.

32. The system of claim 21, wherein a second portion of said unified
executable program is resident in said first one of said plurality of circuits
of direct execution logic.

33. The system of claim 21, wherein said at least one processor comprises a
fixed instruction set processor.

34. A method for operating a computer system comprising:
providing at least one processor;
providing at least one circuit of direct execution logic;
enabling access by said at least one processor and said at least one circuit
of direct execution logic to a common memory space;
executing a unified executable program on said computer system such that a
first portion of said unified executable program is executable by said at
least one processor to generate a first value and a second portion of said
unified executable program is executable by said at least one circuit of
direct execution logic to generate a second value, wherein said first value is
used by said at least one circuit of direct execution to generate said second
value;
programming said at least one circuit of direct execution logic to perform at
least one identified algorithm received from said common memory space, and
storing said at least one identified algorithm in a memory device associated
with said circuit of direct execution logic.

35. The method of claim 34, wherein said third data value is written to a
third memory location in said common memory space.

36. The method of claim 34, wherein performing said first function includes
multiplying.

37. The method of claim 36, wherein configuring said first circuit of direct
execution logic includes at least one processor selecting configuration bits
corresponding to said first function.

38. The method of claim 37, wherein said at least one processor comprises a
fixed instruction set processor.

39. The method of claim 37, wherein said at least one processor performs a
math function.

40. The method of claim 39, wherein said math function comprises a 64-bit
floating point math function.

41. The method of claim 37, further comprising:
signaling said at least one processor when said third data value is available.

42. The method of claim 41, wherein said signaling said at least one processor
includes writing a status value to a status register.

43. The method of claim 34, wherein writing said second data value includes
operatively passing said second data value from said first circuit of direct
execution logic to said second circuit of direct execution logic.

44. The method of claim 37, wherein said configuring said first circuit of
direct execution logic is carried out in accordance with a unified executable
program.

45. The method of claim 37, wherein said at least one processor is operative
in accordance with said unified executable program.

46. A computer system comprising:
at least one processor comprising a fixed instruction set processor;
at least one circuit of direct execution logic;
a common memory space accessible by said at least one processor and said
at least one circuit of direct execution; and
a unified executable program comprising a first portion thereof executable by
said at least one processor for generating and storing in the common memory a
first result and a second portion thereof executable by said at least one
circuit of direct execution logic for generating a second result by retrieving
and using said first result;
wherein said first portion of said unified executable program executable by
said at least one processor is resident in said common memory space; and
wherein said second portion of said unified executable program is resident in
said at least one circuit of direct execution logic.

47. The computer system of claim 46, wherein said at least one processor
comprises a microprocessor.

48. The computer system of claim 46, wherein said at least one circuit of
direct execution logic comprises at least one field programmable gate array.

49. The computer system of claim 46, wherein said at least one circuit of
direct execution logic is programmed to perform at least one identified
algorithm on an operand received from said common memory space.

50. The computer system of claim 49, wherein said at least one identified
algorithm is programmed into a memory device associated with said circuit of
direct execution logic.

51. The computer system of claim 50, wherein said memory device comprises at
least one read only memory device.

52. The computer system of claim 46, wherein said first portion of said
unified executable program executable by said at least one processor is
resident in said common memory space.

53. The computer system of claim 46, wherein said second portion of said
unified executable program is resident in said at least one circuit of direct
execution logic.

54. The computer system of claim 46, wherein said second portion of said
unified executable program is resident in said at least one field programmable
gate array.

55. The computer system of claim 46, wherein said at least one processor
comprises a fixed instruction set processor.

56. A system for processing data using a plurality of reconfigurable
processors, the system comprising:
a memory subsystem coupled to a data processor and including an
addressable memory array;
a first reconfigurable processor within the memory subsystem and coupled to
the addressable memory array, wherein the first reconfigurable processor is
identified to the addressable memory array by a first address, and wherein
responsive to a first data value being written at the first address, the first
reconfigurable processor performs a first configured function, generates a
second data value, and writes the second data value to a second address in the
addressable memory array;
a second reconfigurable processor within the memory subsystem and coupled
to the addressable memory array, wherein the second reconfigurable processor
is identified to the addressable memory array by the second address, and
wherein, responsive to the second data value being written at the second
address, the second reconfigurable processor retrieves the second data and
performs a second configured function;
a control logic block in the memory subsystem in the communication path
between the data processor and the addressable memory array for accessing data
at specified addresses within the addressable memory array; a data bus and an
address bus connecting the control logic block and the addressable memory
array; a communication path between the first reconfigurable processor and the
address bus; and a control block in the communication path between the first
reconfigurable processor and the address bus, wherein the control block
comprises a command decoder for decoding commands from the data processor.

57. The system of claim 56, further comprising:
a first control logic block in a first communication path between said at
least one processor and said common memory space for accessing data at
specified addresses within said common memory space.

58. The system of claim 57, further comprising a data bus and an address bus
coupling said control logic block and said common memory space.

59. The system of claim 58, further comprising a second communication path
between said first one of said plurality of circuits of direct execution logic
and said data bus.

60. The system of claim 58, further comprising a third communication path
between said first one of said plurality of circuits of direct execution logic
and said address bus.

61. The system of claim 60, further comprising a second control logic block in
said third communication path between said first one of said plurality of
circuits of direct execution logic and said address bus.

62. The system of claim 61, where said second control logic block comprises a
command decoder for decoding commands from said at least one processor, a
pipeline counter for counting clock cycles, an equality comparator for
determining whether an output of said pipeline counter corresponds to a
predetermined number of said clock cycles and status registers for receiving
an output from said equality comparator.

63. The system of claim 62, wherein said at least one processor transmits
commands on said address bus.

64. The system of claim 62, wherein said at least one processor periodically
accesses said status register.

65. The system of claim 56, wherein said first and second ones of said
plurality of circuits of direct execution logic comprise field programmable
gate arrays.

66. The system of claim 56, wherein said first one of said plurality of
circuits of direct execution logic is programmed to perform at least one
identified algorithm on an operand received from said common memory space.

67. The system of claim 66, wherein said at least one identified algorithm is
programmed into a memory device associated with said first one of said
plurality of circuits of direct execution logic.

68. The system of claim 67, wherein said memory device comprises at least one
read only memory device.

69. The system of claim 56, wherein a first portion of said unified executable
program is resident in said common memory space for execution by said at least
one processor.

70. The system of claim 56, wherein a second portion of said unified
executable program is resident in said first one of said plurality of circuits
of direct execution logic.

71. The system of claim 56, wherein said at least one processor comprises a
fixed instruction set processor.

Description

Note: Descriptions are shown in the official language in which they were submitted.

CA 02515283 1998-12-03
WO 99/31617 PCT/US98/25587
MULTIPROCESSOR COMPUTER ARCHITECTURE
INCORPORATING A PLURALITY OF MEMORY
ALGORITHM PROCESSORS IN THE MEMORY SUBSYSTEM
BACKGROUND OF THE INVENTION
The present invention relates, in general, to the field of computer
architectures incorporating multiple processing elements. More
particularly, the present invention relates to a multiprocessor computer
architecture incorporating a number of memory algorithm processors in
the memory subsystem to significantly enhance overall system
processing speed.
All general purpose computers are based on circuits that have
some form of processing element. These may take the form of
microprocessor chips or could be a collection of smaller chips coupled
together to form a processor. In any case, these processors are
designed to execute programs that are defined by a set of program
steps. The fact that these steps, or commands, can be rearranged to
create different end results using the same computer hardware is key
to the computer's flexibility. Unfortunately, this flexibility dictates that
the hardware then be designed to handle a variety of possible
functions, which results in generally slower operation than would be
the case were it able to be designed to handle only one particular
function. On the other hand, a single function computer is inherently
not a particularly versatile computer.
Recently, several groups have begun to experiment with creating
a processor out of circuits that are electrically reconfigurable. This
would allow the processor to execute a small set of functions more
quickly and then be electrically reconfigured to execute a different
small set. While this accelerates the execution of some programs, many
functions, such as 64-bit floating point math, cannot be implemented well
in this type of system due to the limited circuit densities that can be
achieved in reconfigurable integrated circuits.
In addition, all of these systems are presently intended to contain
processors that operate alone. In high performance systems, this is
not the case. Hundreds or even tens of thousands of processors are
often used to solve a single problem in a timely manner. This
introduces numerous issues that such reconfigurable computers cannot
handle, such as sharing of a single copy of the operating system. In
addition, a large system constructed from this type of custom hardware
would naturally be very expensive to produce.
SUMMARY OF THE INVENTION
In response to these shortcomings, SRC Computers, Inc.,
Colorado Springs, CO, assignee of the present invention, has
developed a Memory Algorithm Processor ("MAP") multiprocessor
computer architecture that utilizes very high performance
microprocessors in conjunction with user reconfigurable hardware
elements. These reconfigurable elements, referred to as MAPs, are
globally accessible by all processors in the system. In addition, the
manufacturing cost and design time of a particular multiprocessor
computer system is relatively low inasmuch as it can be built using
industry standard, commodity integrated circuits and, in a preferred
embodiment, each MAP may comprise a Field Programmable Gate
Array ("FPGA") operating as a reconfigurable functional unit.
Particularly disclosed herein is the utilization of one or more
FPGAs to perform user defined algorithms in conjunction with, and
tightly coupled to, a microprocessor. More particularly, in a
multiprocessor computer system, the FPGAs are globally accessible by
all of the system processors for the purpose of executing user
definable algorithms.
In a particular implementation of the present invention disclosed
herein, a circuit is provided either within, or in conjunction with, the
FPGAs which signals, by means of a control bit, when the last operand
has completed its flow through the MAP, thereby allowing a given
process to be interrupted and thereafter restarted. In a still more
specific implementation, one or more read only memory ("ROM")
integrated circuit chips may be coupled adjacent the FPGA to allow a
user program to use a single command to select one of several
possible algorithms pre-loaded in the ROM thereby decreasing system
reconfiguration time.
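The control-bit and ROM-select behaviour described above can be pictured with a small cycle-level model. It is a sketch under stated assumptions, not the patented circuit: the class MapPipelineModel, the ROM table of Python callables (a real configuration ROM would hold FPGA configuration data) and the fixed pipeline depth are all hypothetical. The counting logic borrows the pipeline counter, equality comparator and status register recited in claim 21: once the last operand enters, the model counts clock cycles and sets `done` when the count equals the pipeline depth, meaning no operand of the current process is still in flight.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

# Hypothetical stand-ins for algorithms pre-loaded into the configuration ROM.
# A real ROM would hold FPGA configuration data, not Python callables.
ROM: Dict[int, Callable[[int], int]] = {
    0: lambda x: x * x,
    1: lambda x: 2 * x + 1,
}


class MapPipelineModel:
    """Toy cycle model of a MAP pipeline with a 'last operand done' status bit."""

    def __init__(self, depth: int) -> None:
        self.depth = depth                        # predetermined number of cycles
        self.algorithm = ROM[0]
        # One slot per pipeline stage; results are computed on entry and
        # simply carried through the stages in this simplified model.
        self.pipe: Deque[Optional[int]] = deque([None] * depth)
        self.counter: Optional[int] = None        # pipeline counter (None = idle)
        self.done = False                         # status register bit
        self.results: List[int] = []

    def select_algorithm(self, rom_index: int) -> None:
        # A single command selects one of the pre-loaded algorithms.
        self.algorithm = ROM[rom_index]

    def clock(self, operand: Optional[int] = None, last: bool = False) -> None:
        # Advance one clock: the value leaving the final stage becomes a result.
        leaving = self.pipe.pop()
        if leaving is not None:
            self.results.append(leaving)
        self.pipe.appendleft(None if operand is None else self.algorithm(operand))
        # Pipeline counter plus equality comparator driving the status bit.
        if self.counter is not None:
            self.counter += 1
            if self.counter == self.depth:
                self.done = True                  # last operand has left the pipeline
                self.counter = None
        if last:
            self.counter = 0                      # start counting after the last operand
            self.done = False


if __name__ == "__main__":
    m = MapPipelineModel(depth=3)
    m.select_algorithm(1)
    for i, x in enumerate([1, 2, 3]):
        m.clock(x, last=(i == 2))
    while not m.done:
        m.clock()
    print(m.results)  # [3, 5, 7]
```

In this model, the moment `done` is set is presumably where an operating system could save state and switch processes, which is the interrupt-and-restart use the text describes.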
Still further provided is a computer system memory structure
which includes one or more FPGAs for the purpose of using normal
memory access protocol to access it as well as being capable of direct
memory access ("DMA") operation. In a multiprocessor computer
system, FPGAs configured with DMA capability enable one device to
feed results directly to another, thereby allowing pipelining or
parallelizing execution of a user defined algorithm located in the
reconfigurable hardware. The system and method of the present
invention also provide a user programmable performance monitoring
capability and utilize parallelizer software to automatically detect
parallel regions of user applications containing algorithms that can be
executed in programmable hardware.
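The DMA chaining described above, in which one device feeds its results directly to another, can be sketched as objects wired output to input. The DmaMap class and its connect and push methods are hypothetical, and the sketch models only the data flow; the bus-level DMA protocol is not specified in this passage.

```python
from typing import Callable, List, Optional


class DmaMap:
    """Hypothetical model of a MAP that can stream results to a downstream MAP."""

    def __init__(self, name: str, fn: Callable[[int], int]) -> None:
        self.name = name
        self.fn = fn                               # configured function
        self.downstream: Optional["DmaMap"] = None
        self.output: List[int] = []                # results left in memory at chain end

    def connect(self, other: "DmaMap") -> None:
        # Route this MAP's results straight into another MAP's input.
        self.downstream = other

    def push(self, operand: int) -> None:
        result = self.fn(operand)
        if self.downstream is not None:
            self.downstream.push(result)           # direct hand-off, no processor step
        else:
            self.output.append(result)             # final stage: results await the processor


if __name__ == "__main__":
    scale = DmaMap("scale", lambda x: 3 * x)
    offset = DmaMap("offset", lambda x: x - 1)
    scale.connect(offset)
    for x in [1, 2, 3]:
        scale.push(x)
    print(offset.output)  # [2, 5, 8]
```

Each operand passes through both configured functions without the processor handling the intermediate value; only the final results are left for it to read.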
Broadly, what is disclosed herein is a computer including at least
one data processor for operating on user data in accordance with
program instructions. The computer includes at least one memory
array presenting a data and address bus and comprises a memory
algorithm processor associated with the memory array and coupled to
the data and address buses. The memory algorithm processor is
configurable to perform at least one identified algorithm on an operand
received from a write operation to the memory array.
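One way to picture a memory algorithm processor that takes its operand from a write to the memory array is a bank in which the MAP watches the shared address bus. The class name, the operand and result addresses, and the choice to leave the result at a fixed memory location are assumptions for illustration only.

```python
from typing import Callable, Dict


class MemoryBankWithMap:
    """Toy bank whose MAP watches the shared address bus for one operand address."""

    MAP_OPERAND_ADDR = 0x1F000   # hypothetical address decoded by the MAP
    MAP_RESULT_ADDR = 0x1F001    # hypothetical location where the MAP leaves results

    def __init__(self, algorithm: Callable[[int], int]) -> None:
        self.array: Dict[int, int] = {}   # sparse stand-in for the memory array
        self.algorithm = algorithm        # the identified algorithm the MAP is configured for

    def write(self, addr: int, value: int) -> None:
        # An ordinary write cycle on the data and address buses.
        self.array[addr] = value
        # The MAP sees the same address; a match means "value is an operand".
        if addr == self.MAP_OPERAND_ADDR:
            self.array[self.MAP_RESULT_ADDR] = self.algorithm(value)

    def read(self, addr: int) -> int:
        return self.array.get(addr, 0)


if __name__ == "__main__":
    bank = MemoryBankWithMap(lambda x: x * x)
    bank.write(0x0100, 7)                          # normal store, MAP not involved
    bank.write(MemoryBankWithMap.MAP_OPERAND_ADDR, 12)
    print(bank.read(0x0100), bank.read(MemoryBankWithMap.MAP_RESULT_ADDR))  # 7 144
```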
Also disclosed herein is a multiprocessor computer including a
first plurality of data processors for operating on user data in
accordance with program instructions and a second plurality of memory
arrays, each presenting a data and address bus. The computer
comprises a memory algorithm processor associated with at least one
of the second plurality of memory arrays and coupled to the data and
address bus thereof. The memory algorithm processor is configurable
to perform at least one identified algorithm on an operand received
from a write operation to the associated one of the second plurality of
memory arrays.
BRIEF DESCRIPTION OF THE DRAWINGS
The aforementioned and other features and objects of the
present invention and the manner of attaining them will become more
apparent and the invention itself will be best understood by reference
to the following description of a preferred embodiment taken in
conjunction with the accompanying drawings, wherein:
Fig. 1 is a simplified, high level, functional block diagram of a
standard multiprocessor computer architecture;
Fig. 2 is a simplified logical block diagram of a possible
computer application program decomposition sequence for use in
conjunction with a multiprocessor computer architecture utilizing a
number of memory algorithm processors ("MAPs") in accordance with
the present invention;
Fig. 3 is a more detailed functional block diagram of an
individual one of the MAPs of the preceding figure and illustrating the
bank control logic, memory array and MAP assembly thereof; and
Fig. 4 is a more detailed functional block diagram of the control
block of the MAP assembly of the preceding figure, illustrating its
interconnection to the user FPGA thereof.
DESCRIPTION OF A PREFERRED EMBODIMENT
With reference now to Fig. 1, a conventional multiprocessor
computer 10 architecture is shown. The multiprocessor computer 10
incorporates N processors 12_0 through 12_N which are bi-directionally
coupled to a memory interconnect fabric 14. The memory interconnect
fabric 14 is then also coupled to M memory banks comprising memory
bank subsystems 16_0 (Bank 0) through 16_M (Bank M).
With reference now to Fig. 2, a representative application
program decomposition for a multiprocessor computer architecture 100
incorporating a plurality of memory algorithm processors in accordance
with the present invention is shown. The computer architecture 100 is
operative in response to user instructions and data which, in a coarse
grained portion of the decomposition, are selectively directed to one of
(for purposes of example only) four parallel regions 102_1 through 102_4
inclusive. The instructions and data output from each of the parallel
regions 102_1 through 102_4 are respectively input to parallel regions
segregated into data areas 104_1 through 104_4 and instruction areas
106_1 through 106_4. Data maintained in the data areas 104_1 through
104_4 and instructions maintained in the instruction areas 106_1 through
106_4 are then supplied to, for example, corresponding pairs of
processors 108_1, 108_2 (P1 and P2); 108_3, 108_4 (P3 and P4); 108_5,
108_6 (P5 and P6); and 108_7, 108_8 (P7 and P8) as shown. At this point, the
medium grained decomposition of the instructions and data has been
accomplished.
A fine grained decomposition, or parallelism, is effectuated by a
further algorithmic decomposition wherein the output of each of the
processors 108_1 through 108_8 is broken up, for example, into a number
of fundamental algorithms 110_1A, 110_1B, 110_2A, 110_2B through 110_8B as
shown. Each of the algorithms is then supplied to a corresponding one
of the MAPs 112_1A, 112_1B, 112_2A, 112_2B through 112_8B in the memory
space of the computer architecture 100 for execution therein as will be
more fully described hereinafter.
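The three levels of the Fig. 2 example (four coarse-grained regions, a pair of processors per region, and two fundamental algorithms per processor, each assigned to its own MAP) can be enumerated as below. The decompose function is hypothetical; it reproduces only the example counts and reference numerals and is not the parallelizer software mentioned in the summary.

```python
from typing import Dict, List


def decompose(regions: int = 4, procs_per_region: int = 2,
              algos_per_proc: int = 2) -> Dict[str, Dict[str, object]]:
    """Enumerate the Fig. 2 example: regions -> processor pairs -> MAPs."""
    plan: Dict[str, Dict[str, object]] = {}
    p = 0
    for r in range(1, regions + 1):              # coarse grain: parallel regions 102_r
        for _ in range(procs_per_region):        # medium grain: processors 108_p
            p += 1
            maps: List[str] = [f"112_{p}{chr(ord('A') + k)}"   # fine grain: MAPs
                               for k in range(algos_per_proc)]
            plan[f"P{p}"] = {"region": f"102_{r}", "processor": f"108_{p}", "maps": maps}
    return plan


if __name__ == "__main__":
    for name, entry in decompose().items():
        print(name, entry["region"], entry["processor"], entry["maps"])
```

With the default arguments this yields eight processors and sixteen MAPs, matching the 112_1A through 112_8B range in the figure.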
With reference additionally now to Fig. 3, a preferred
implementation of a memory bank 120 in a MAP system computer
architecture 100 of the present invention is shown for a representative
one of the MAPs 112 illustrated in the preceding figure. Each memory
bank 120 includes a bank control logic block 122 bi-directionally
coupled to the computer system trunk lines, for example, a 72-line bus
124. The bank control logic block 122 is coupled to a bi-directional
data bus 126 (for example 256 lines) and supplies addresses on an
address bus 128 (for example 17 lines) for accessing data at specified
locations within a memory array 130.
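The example bus widths fix the size of one bank's address space. Assuming, as the text does not state, that each address on the 17-line address bus selects one full word of the 256-line data bus:

```latex
\begin{aligned}
N_{\text{words}} &= 2^{17} = 131\,072 \\
B_{\text{word}}  &= 256 / 8 = 32\ \text{bytes} \\
C_{\text{bank}}  &= N_{\text{words}} \, B_{\text{word}} = 4\,194\,304\ \text{bytes} = 4\ \text{MiB}
\end{aligned}
```

If the address lines were instead multiplexed, as DRAM row and column addresses commonly are, the capacity per bank would be larger; the description does not say which scheme is used.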
The data bus 126 and address bus 128 are also coupled to a
MAP assembly 112. The MAP assembly 112 comprises a control block
132 coupled to the address bus 128. The control block 132 is also bi-
directionally coupled to a user field programmable gate array ("FPGA")
134 by means of a number of signal lines 136. The user FPGA 134 is
coupled directly to the data bus 126. In a particular embodiment, the
FPGA 134 may be provided as a Lucent Technologies OR3T80 device.
The computer architecture 100 comprises a multiprocessor
system employing uniform memory access across common shared
memory with one or more MAPs 112 located in the memory subsystem,
or memory space. As previously described, each MAP 112 contains at
least one relatively large FPGA 134 that is used as a reconfigurable
functional unit. In addition, each MAP 112 contains a control block 132
and a preprogrammed or dynamically programmable configuration
read-only memory ("ROM", as will be more fully described hereinafter)
that holds the information needed by the reconfigurable MAP assembly
112 to perform a specific algorithm. It is also possible for the user to
directly download a new configuration into the FPGA 134 under program
control, although in some instances this may consume a number of
memory accesses and might result in an overall decrease in system
performance if the algorithm were short-lived.
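The trade-off between selecting a pre-loaded configuration and downloading a new one can be sketched as a rough cost model. In this hypothetical Python sketch, selecting a ROM image is counted as one command, while a download is counted as one memory write per 256-bit pattern (the width of the user configuration pattern bus 174 described with Fig. 4). The class name, the ROM slot contents and the 1,000,000-bit bitstream size are assumptions chosen only for illustration; they are not figures for the OR3T80 or any other device.

```python
import math

class MapConfigurator:
    """Hypothetical cost model for configuring one MAP 112."""

    PATTERN_BITS = 256  # assumed: one write per 256-bit user configuration pattern

    def __init__(self, rom_images):
        self.rom_images = dict(rom_images)  # ROM slot -> pre-loaded algorithm name
        self.loaded = None                  # algorithm currently in the FPGA
        self.memory_accesses = 0            # accesses spent on configuration

    def select_rom(self, slot):
        """A single command selects one of the pre-loaded algorithms."""
        self.loaded = self.rom_images[slot]
        self.memory_accesses += 1

    def download(self, name, bitstream_bits):
        """A user-supplied bitstream is written one 256-bit pattern at a time."""
        self.loaded = name
        self.memory_accesses += math.ceil(bitstream_bits / self.PATTERN_BITS)


rom_map = MapConfigurator({0: "fft", 1: "sort"})
rom_map.select_rom(1)
print(rom_map.loaded, rom_map.memory_accesses)    # sort 1

user_map = MapConfigurator({})
user_map.download("custom_filter", 1_000_000)     # illustrative size only
print(user_map.loaded, user_map.memory_accesses)  # custom_filter 3907
```

The comparison only shows that a ROM selection costs a small, fixed number of accesses, whereas a download grows with bitstream size; for a short-lived algorithm that overhead may dominate.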
FPGAs have particular advantages in the application shown for
several reasons. First, commercially available, off-the-shelf FPGAs
now contain sufficient internal logic cells to perform meaningful
computational functions. Second, they can operate at speeds
comparable to microprocessors, which eliminates the need for speed
matching buffers. Third, the internal programmable routing
resources of FPGAs are now extensive enough that meaningful
algorithms can be programmed without the need to reassign the
locations of the input/output ("I/O") pins.
Because the MAP 112 is placed in the memory subsystem or memory
space, it can be readily accessed through the use of memory read and
write commands, which allows the use of a variety of standard
operating systems. In contrast, other conventional implementations
propose placement of any reconfigurable logic in or near the
processor. This is much less effective in a multiprocessor environment
because only one processor has rapid access to it. Consequently,
reconfigurable logic must be placed by every processor in a
multiprocessor system, which increases the overall system cost. In
addition, MAP 112 can access the memory array 130 itself, referred to
as Direct Memory Access ("DMA"), allowing it to execute tasks
independently and asynchronously of the processor. In comparison,
were it placed near the processor, it would have to compete with
the processors for system routing resources in order to access
memory, which deleteriously impacts processor performance. Because
MAP 112 has DMA capability (allowing it to write to memory), and
because it receives its operands via writes to memory, it is possible to
allow a MAP 112 to feed results to another MAP 112. This is a very
powerful feature that allows for very extensive pipelining and
parallelizing of large tasks, which permits them to complete faster.
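The MAP-to-MAP chaining described above can be sketched in software. This hypothetical Python model treats shared memory as a dictionary of named regions; each MAP reads its operands from one region and writes its results to another, so a second MAP can take the first MAP's output as its input without a processor copying the data. For simplicity each stage processes a whole buffer before the next begins; in hardware the downstream MAP could start as soon as the first results are written. The class and region names are assumptions.

```python
from typing import Callable, Dict, List

Memory = Dict[str, List[int]]  # named regions of shared memory (toy model)

class Map:
    """Hypothetical MAP: reads operands from one memory region and, over its
    own DMA path, writes results to another, with no processor involvement."""

    def __init__(self, func: Callable[[int], int], src: str, dst: str):
        self.func, self.src, self.dst = func, src, dst

    def run(self, mem: Memory) -> None:
        mem[self.dst] = [self.func(x) for x in mem[self.src]]


mem: Memory = {"input": [1, 2, 3, 4]}
stage1 = Map(lambda x: x * x, "input", "squares")   # first MAP
stage2 = Map(lambda x: x + 1, "squares", "output")  # fed by stage1's DMA writes
for m in (stage1, stage2):
    m.run(mem)
print(mem["output"])  # [2, 5, 10, 17]
```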
Many of the algorithms that may be implemented will receive an
operand and require many clock cycles to produce a result. One such
example may be a multiplication that takes 64 clock cycles. This same
multiplication may also need to be performed on thousands of
operands. In this situation, the incoming operands would be presented
sequentially so that while the first operand requires 64 clock cycles to
produce results at the output, the second operand, arriving one clock
cycle later at the input, will show results one clock cycle later at the
output. Thus, after an initial delay of 64 clock cycles, new output data
will appear on every consecutive clock cycle until the results of the last
operand appear. This is called "pipelining".
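The timing described above can be checked with a small cycle-level model. The following Python sketch, a hypothetical illustration rather than the MAP hardware, shifts operands through a fixed number of stages, one stage per clock, and records the clock on which each result leaves the pipeline. The function name and the counting convention (the first operand enters on clock 1) are assumptions.

```python
from collections import deque

def run_pipeline(operands, depth, op):
    """Cycle-level model of a pipelined unit with `depth` stages.

    One operand may enter per clock; its result leaves `depth` clocks after
    it entered. Returns (clock, result) pairs in output order.
    """
    pending = deque(operands)
    stages = deque([None] * depth)   # stages[0] is newest, stages[-1] is oldest
    results = []
    clock = 0
    while pending or any(s is not None for s in stages):
        clock += 1
        finished = stages.pop()                                     # oldest stage exits
        stages.appendleft(pending.popleft() if pending else None)  # new operand enters
        if finished is not None:
            results.append((clock, op(finished)))
    return results


out = run_pipeline(range(1000), depth=64, op=lambda x: 3 * x)
print(out[0][0], out[-1][0])  # 65 1064
```

With 1,000 operands the last result appears on clock 1,064, that is 64 + 1,000 clocks in total, whereas waiting for each result before issuing the next operand would take about 64 × 1,000 = 64,000 clocks.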
In a multiprocessor system, it is quite common for the operating
system to stop a processor in the middle of a task, reassign it to a
higher priority task, and then return it, or another, to complete the
initial task. When this is combined with a pipelined algorithm, a
problem arises if the processor stops issuing operands in the middle
of a list and stops accepting results: some operands have already
been issued but are not yet through the pipeline. To handle this issue,
a solution involving a combination of software and hardware is
disclosed herein.
To make use of any type of conventional reconfigurable
hardware, the programmer could embed the necessary commands in
his application program code. The drawback to this approach is that a
program would then have to be tailored to be specific to the MAP
hardware. The system of the present invention eliminates this
problem. Multiprocessor computers often use software called
parallelizers. The purpose of this software is to analyze the user's
application code and determine how best to split it up among the
processors. The present invention provides significant advantages
over a conventional parallelizer by enabling it to recognize portions of
the user code that represent algorithms that exist in the MAPs 112 for
that system and then to treat the MAP 112 as another computing element.
The parallelizer then automatically generates the necessary code to
utilize the MAP 112. This allows the user to write the algorithm directly
in his code, making the code more portable and reducing the knowledge
of the system hardware that he must have to utilize the MAP 112.
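The disclosure does not specify how the parallelizer recognizes MAP algorithms. As one possible approach, sketched here only as a hypothetical example, a parallelizer could pattern-match loops in the user's source code against a table of algorithms resident in the MAPs. The Python sketch below uses the standard ast module (Python 3.9 or later, where a subscript's index is stored directly in `slice`) to find loops of the form x[i] = y[i] <op> z[i]; the table MAP_ALGORITHMS and its entries "vector_multiply" and "vector_add" are invented names, and a practical parallelizer would need much broader analysis.

```python
import ast

# Hypothetical table of algorithms resident in this system's MAPs, keyed by
# the binary operator found in an element-wise loop body.
MAP_ALGORITHMS = {ast.Mult: "vector_multiply", ast.Add: "vector_add"}

def _elementwise_op(loop: ast.For):
    """Match `for i in ...: x[i] = y[i] <op> z[i]` and return <op>'s class."""
    if not (isinstance(loop.target, ast.Name) and len(loop.body) == 1):
        return None
    stmt = loop.body[0]
    if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.value, ast.BinOp)):
        return None
    idx = loop.target.id

    def indexed_by_loop_var(node):
        return (isinstance(node, ast.Subscript)
                and isinstance(node.slice, ast.Name) and node.slice.id == idx)

    if all(indexed_by_loop_var(n)
           for n in (stmt.targets[0], stmt.value.left, stmt.value.right)):
        return type(stmt.value.op)
    return None

def find_map_candidates(source: str):
    """Return (line number, MAP algorithm) for each loop a MAP could execute."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.For):
            op = _elementwise_op(node)
            if op in MAP_ALGORITHMS:
                hits.append((node.lineno, MAP_ALGORITHMS[op]))
    return hits


user_code = """
for i in range(n):
    c[i] = a[i] * b[i]
for i in range(n):
    total = total + a[i]
"""
print(find_map_candidates(user_code))  # [(2, 'vector_multiply')]
```

The second loop is a reduction rather than an element-wise operation, so this simple matcher leaves it for the processors.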

With reference additionally now to Fig. 4, a block diagram of the
MAP control block 132 is shown in greater detail. The control block
132 is coupled to receive a number of command bits (for example, 17)
from the address bus 128 at a command decoder 150. The command
decoder 150 then supplies a number of register control bits to a group
of status registers 152 on an eight bit bus 154. The command decoder
150 also supplies a single bit last operand flag on line 156 to a
pipeline counter 158. The pipeline counter 158 supplies an eight bit
output to an equality comparator 160 on bus 162. The equality
comparator 160 also receives an eight bit signal from the FPGA 134 on
bus 136 indicative of the pipeline depth. When the equality comparator
160 determines that the pipeline is empty, it provides a single bit pipeline
empty flag on line 164 for input to the status registers 152. The status
registers are also coupled to receive an eight bit status signal from the
FPGA 134 on bus 136, and they produce a sixty-four bit status word
output on bus 166 in response to the signals on buses 136 and 154 and
line 164.
The command decoder 150 also supplies a five bit control signal
to a configuration multiplexer ("MUX") 170 as shown. The
configuration MUX 170 receives the single bit output of a 256 bit
parallel-to-serial converter 172 on line 176. The inputs of the 256 bit
parallel-to-serial converter 172 are coupled to a 256 bit user configuration
pattern bus 174. The configuration MUX 170 also receives sixteen single
bit inputs from the configuration ROMs (illustrated as ROM 182) on bus
178 and provides a single bit configuration file signal on line 180 to the
user FPGA 134 as selected by the control signals from the command
decoder 150 on bus 168.
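The configuration path can be sketched as follows. In this hypothetical Python model, parallel_to_serial stands in for the 256 bit parallel-to-serial converter 172 and config_mux stands in for configuration MUX 170. The disclosure does not give the encoding of the five bit select signal or the bit order of the converter, so the sketch assumes that select value 0 chooses the converter output, values 1 to 16 choose one of the sixteen ROM inputs, and bits are shifted out most significant first.

```python
from typing import Iterator, List

PATTERN_BITS = 256  # width of user configuration pattern bus 174

def parallel_to_serial(pattern: int, width: int = PATTERN_BITS) -> Iterator[int]:
    """Model of converter 172: emit one bit per clock, MSB first (assumed)."""
    for i in reversed(range(width)):
        yield (pattern >> i) & 1

def config_mux(select: int, user_bit: int, rom_bits: List[int]) -> int:
    """Model of MUX 170 with an assumed encoding of the five bit select:
    0 -> converter output (line 176), 1..16 -> ROM inputs (bus 178)."""
    if not 0 <= select < 32:
        raise ValueError("select must fit in five bits")
    if select == 0:
        return user_bit
    if select <= len(rom_bits):
        return rom_bits[select - 1]
    raise ValueError("select code not assigned in this sketch")


# Stream a user pattern toward the FPGA configuration input (line 180).
pattern = 0b1011 << 252            # arbitrary illustrative 256 bit pattern
rom_bits = [0] * 16                # current output bit of each ROM (all zero here)
stream = [config_mux(0, b, rom_bits) for b in parallel_to_serial(pattern)]
print(len(stream), stream[:4])     # 256 [1, 0, 1, 1]
```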
In operation, when a processor 108 is halted by the operating
system, the operating system will issue a last operand command to the
MAP 112 through the use of command bits embedded in the address
field on bus 128. This command is recognized by the command
decoder 150 of the control block 132, which initiates a hardware
pipeline counter 158. When the algorithm was initially loaded into the
FPGA 134, several output bits connected to the control block 132 were
configured to display a binary representation of the number of clock
cycles required to get through its pipeline (i.e., the pipeline "depth") on
bus 136, which is input to the equality comparator 160. After receiving
the last operand command, the pipeline counter 158 in the control block
132 counts clock cycles until its count equals the pipeline depth for that
particular algorithm. At that point, the equality comparator 160 in the
control block 132 de-asserts a busy bit on line 164 in an internal group
of status registers 152. After issuing the last operand signal, the
processor 108 will repeatedly read the status registers 152 and accept
any output data on bus 166. When the busy flag is de-asserted, the
task can be stopped and the MAP 112 utilized for a different task. It
should be noted that it is also possible to leave the MAP 112
configured, transfer the program to a different processor 108 and
restart the task where it left off.
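The last operand sequence can be illustrated with a cycle-level sketch. This hypothetical Python model combines a pipelined unit with a control block whose counter starts on the last operand command and whose equality test clears a busy bit once the count reaches the pipeline depth; the processor side keeps clocking and collecting output until the busy bit clears. The class and function names, and the choice to model one processor poll per clock, are assumptions for illustration.

```python
from collections import deque

class ControlBlock:
    """Models pipeline counter 158, equality comparator 160 and the busy bit."""

    def __init__(self, depth):
        self.depth = depth      # pipeline depth reported by the FPGA on bus 136
        self.count = None       # counter idle until the last operand command
        self.busy = True

    def last_operand(self):
        self.count = 0          # command decoder 150 starts the counter

    def clock(self):
        if self.count is not None and self.busy:
            self.count += 1
            if self.count == self.depth:   # equality comparator fires
                self.busy = False          # busy bit de-asserted


class PipelinedMap:
    """A MAP running a pipelined algorithm of fixed depth (toy model)."""

    def __init__(self, depth, op):
        self.stages = deque([None] * depth)
        self.op = op
        self.ctrl = ControlBlock(depth)

    def clock(self, operand=None):
        """Advance one clock; return a result if one leaves the pipeline."""
        finished = self.stages.pop()
        self.stages.appendleft(operand)
        self.ctrl.clock()
        return None if finished is None else self.op(finished)


def drain(map_unit):
    """Processor side: send the last operand command, then poll the busy bit
    and accept output until the pipeline has emptied."""
    map_unit.ctrl.last_operand()
    out = []
    while map_unit.ctrl.busy:
        result = map_unit.clock()
        if result is not None:
            out.append(result)
    return out


m = PipelinedMap(depth=4, op=lambda x: 10 * x)
collected = []
for x in range(1, 7):           # the task is interrupted after six operands
    result = m.clock(x)
    if result is not None:
        collected.append(result)
collected += drain(m)
print(collected)  # [10, 20, 30, 40, 50, 60]
```

Once the busy bit clears, every issued operand has produced its result, so the MAP can be reassigned, or the remaining operands can be issued later, possibly from a different processor.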
In order to evaluate the effectiveness of the use of the MAP 112
in a given application, some form of feedback to the user is required.
Therefore, the MAP 112 may be equipped with internal registers in the
control block 132 that allow it to monitor efficiency-related factors such
as the number of input operands versus output data, the number of idle
cycles over time and the number of system monitor interrupts received
over time. One of the advantages that the MAP 112 has is that
because of its reconfigurable nature, the actual function and type of
function that are monitored can also change as the algorithm changes.
This provides the user with an almost infinite number of possible
monitored factors without having to monitor all factors all of the time.
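The kind of feedback described above can be sketched as ratios computed from monitoring counters. The register set, field names and sample values in this hypothetical Python sketch are assumptions chosen for illustration; the disclosure names the monitored factors but does not define a register layout.

```python
from dataclasses import dataclass

@dataclass
class MapPerfCounters:
    """Hypothetical snapshot of MAP monitoring registers in control block 132."""
    input_operands: int
    output_results: int
    idle_cycles: int
    total_cycles: int
    monitor_interrupts: int

    def throughput_ratio(self) -> float:
        """Output data produced per input operand supplied."""
        return self.output_results / self.input_operands if self.input_operands else 0.0

    def utilization(self) -> float:
        """Fraction of cycles in which the MAP was not idle."""
        return 1 - self.idle_cycles / self.total_cycles if self.total_cycles else 0.0

    def interrupts_per_kcycle(self) -> float:
        """System monitor interrupts received per thousand cycles."""
        return 1000 * self.monitor_interrupts / self.total_cycles if self.total_cycles else 0.0


c = MapPerfCounters(input_operands=1000, output_results=1000,
                    idle_cycles=250, total_cycles=2000, monitor_interrupts=3)
print(c.throughput_ratio(), c.utilization(), c.interrupts_per_kcycle())  # 1.0 0.875 1.5
```

Because the MAP is reconfigurable, a different algorithm could expose a different set of counters; the fixed fields here are a simplification.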
While there have been described above the principles of the
present invention in conjunction with a specific multiprocessor
architecture it is to be clearly understood that the foregoing description
is made only by way of example and not as a limitation to the scope of
the invention. Particularly, it is recognized that the teachings of the
foregoing disclosure will suggest other modifications to those persons
skilled in the relevant art. Such modifications may involve other
features which are already known per se and which may be used
instead of or in addition to features already described herein. Although
claims have been formulated in this application to particular
combinations of features, it should be understood that the scope of the
disclosure herein also includes any novel feature or any novel
combination of features disclosed either explicitly or implicitly or any
generalization or modification thereof which would be apparent to
persons skilled in the relevant art, whether or not such relates to the
same invention as presently claimed in any claim and whether or not it
mitigates any or all of the same technical problems as confronted by
the present invention. The applicants hereby reserve the right to
formulate new claims to such features and/or combinations of such
features during the prosecution of the present application or of any
further application derived therefrom.

Administrative Status

Event	Date
Forecasted Issue Date	2011-04-05
(22) Filed	1998-12-03
(41) Open to Public Inspection	1999-06-24
Examination Requested	2005-09-12
(45) Issued	2011-04-05
Expired	2018-12-03

Abandonment History

There is no abandonment history.

Payment History

Fee Type	Anniversary Year	Due Date	Amount Paid	Paid Date
Request for Examination			$800.00	2005-09-12
Registration of a document - section 124			$100.00	2005-09-12
Application Fee			$400.00	2005-09-12
Maintenance Fee - Application - New Act	2	2000-12-04	$100.00	2005-09-12
Maintenance Fee - Application - New Act	3	2001-12-03	$100.00	2005-09-12
Maintenance Fee - Application - New Act	4	2002-12-03	$100.00	2005-09-12
Maintenance Fee - Application - New Act	5	2003-12-03	$200.00	2005-09-12
Maintenance Fee - Application - New Act	6	2004-12-03	$200.00	2005-09-12
Maintenance Fee - Application - New Act	7	2005-12-05	$200.00	2005-09-12
Maintenance Fee - Application - New Act	8	2006-12-04	$200.00	2006-11-17
Maintenance Fee - Application - New Act	9	2007-12-03	$200.00	2007-11-19
Maintenance Fee - Application - New Act	10	2008-12-03	$250.00	2008-12-02
Maintenance Fee - Application - New Act	11	2009-12-03	$250.00	2009-11-24
Maintenance Fee - Application - New Act	12	2010-12-03	$250.00	2010-10-06
Final Fee			$300.00	2011-01-05
Maintenance Fee - Patent - New Act	13	2011-12-05	$250.00	2011-12-02
Maintenance Fee - Patent - New Act	14	2012-12-03	$250.00	2012-11-21
Maintenance Fee - Patent - New Act	15	2013-12-03	$450.00	2013-11-14
Registration of a document - section 124			$100.00	2013-12-19
Maintenance Fee - Patent - New Act	16	2014-12-03	$450.00	2014-11-14
Maintenance Fee - Patent - New Act	17	2015-12-03	$450.00	2015-11-13
Maintenance Fee - Patent - New Act	18	2016-12-05	$650.00	2017-04-17
Maintenance Fee - Patent - New Act	19	2017-12-04	$450.00	2017-12-04

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
SRC COMPUTERS, LLC

Past Owners on Record
HUPPENTHAL, JON M.
LESKAR, PAUL A.
SRC COMPUTERS, INC.

Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.

Documents

List of published and non-published patent-specific documents on the CPD.

Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
Claims	2010-06-09	11	393
Abstract	1998-12-03	1	31
Description	1998-12-03	11	534
Claims	1998-12-03	17	716
Representative Drawing	2005-10-24	1	22
Drawings	1998-12-03	4	99
Cover Page	2005-11-02	2	70
Cover Page	2011-03-04	2	72
Correspondence	2005-09-23	1	39
Assignment	1998-12-03	4	105
Correspondence	2011-01-05	2	52
Prosecution-Amendment	2005-10-26	1	38
Correspondence	2005-11-17	1	17
Fees	2006-11-17	1	40
Fees	2007-11-19	1	41
Fees	2008-12-02	1	45
Fees	2009-11-24	1	42
Prosecution-Amendment	2010-03-01	2	48
Prosecution-Amendment	2010-06-09	13	455
Fees	2010-10-06	1	49
Assignment	2013-12-19	5	140
