Language selection

Search

Patent 1262968 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent: (11) CA 1262968
(21) Application Number: 1262968
(54) English Title: METHOD AND APPARATUS FOR ADDRESSING A MEMORY BY ARRAY TRANSFORMATIONS
(54) French Title: METHODE ET DISPOSITIF D'ACCES A UNE MEMOIRE PAR TRANSFORMATION D'AGREGATS DE DONNEES
Status: Expired and beyond the Period of Reversal
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06F 12/06 (2006.01)
  • G06F 9/345 (2018.01)
  • G06F 12/02 (2006.01)
(72) Inventors :
  • DEERFIELD, ALAN J. (United States of America)
  • SIU, SUN-CHI (United States of America)
(73) Owners :
  • MICRON TECHNOLOGY, INC.
(71) Applicants :
  • MICRON TECHNOLOGY, INC. (United States of America)
(74) Agent: SMART & BIGGAR LP
(74) Associate agent:
(45) Issued: 1989-11-14
(22) Filed Date: 1986-03-19
Availability of licence: Yes
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
720,330 (United States of America) 1985-04-05

Abstracts

English Abstract


Abstract of the Disclosure
A memory having an address generator in an intelligent
port which generates address sequences specified by an array
transformation operator in a programmable processor, thereby
allowing a controlling processor to proceed immediately to
the preparation of the next instruction in parallel with
memory execution of a present instruction. The intelligent
port of the memory creates complex data structures from
input data arrays stored in memory and directs the trans-
formation of the data structures into output data streams.
The memory comprises a plurality of read-write memory banks
and a bank of read-only memory interconnected through in-
telligent ports and busses to other units of the processor.
An arbitration and switching network assigns memory banks to
the intelligent ports.


Claims

Note: Claims are shown in the official language in which they were submitted.


62901-680D
THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. A method of generating addressing sequences for an array
of data in a digital system comprising the steps of,
interpreting instruction commands;
performing arithmetic operations based on said commands;
storing data used in said arithmetic operations;
generating addressing sequences specified by an array
transformation for accessing said data; and
transferring said data between means for storing said data
and means for storing said arithmetic operations as specified by
said array transformation.
2. The method as recited in claim 1 wherein said step of
storing data comprises the step of:
generating said addressing sequences according to a nested
series of a plurality of parameters in said array transformation.
3. The method as recited in claim 1 wherein:
said method further comprises the step of interpreting a
boundary parameter of said array transformation for controlling
the generating of said addressing sequences when an address of
said sequences may be generated outside a boundary of said array.
4. The method as recited in claim 3 wherein:
said boundary parameter specifies a plurality of modes
comprising a wrap-around mode, a zero-fill mode and an ignore

boundaries mode. 62901-680D
5. A digital system comprising:
means for interpreting instruction commands;
means for performing arithmetic operations based on
said commands;
means for storing data for said arithmetic operations,
said storing means comprising means for generating a plurality
of addresses in response to an array transformation for accessing
said data; and
means for transferring said data between said storing
means and said arithmetic operations performing means in accor-
dance with said array transformation.
6. The digital system as recited in claim 5 wherein:
said storing means comprises a port means for generat-
ing said plurality of addresses according to said array trans-
formation.
7. The digital system as recited in claim 5 wherein:
said array transformation specifies array addressing
in terms of a factored series of nested addressing sequences
defined by a plurality of displacement and length parameters.
8. The digital system as recited in claim 6 wherein:
said port means comprises an addresses generator for
generating said plurality of addresses to load or access a data
array in said storing means.
- 51 -

Description

Note: Descriptions are shown in the official language in which they were submitted.


~6~
METHOD AND P.PPARATUS FOR ADDRESSING
A MEMORY eY AE~RAY TRANSFORMATIONS
The Government has rights in this invention pursuant
to Contract No. N62269-82-C-0492 awarded by the Department
of the Navy.
Back~round of the Invention
This invention relates generally to addressing a memory
of a digital system~ and in particular to the method and
apparatus for addressing a memory by a set of parameters
which specify an addressing sequence within the memory for
data arrays.
In many digital processin~ systems, specially programmed
units control the ordering of data access from memories while
special purpose interfacing units link up components with
different data formatting requirements. The proliferation of
special purpose units results.in inefficiencles, causing high
system development costs, long development times, high
programming costs, and high system maintenance costs.
The arithmetic unit (AU) in prior processors usually
assisted the processor control unit to sequence data items
to the arithmetic.section or transform data items into a form
appropriate for an operation to be performed; this not only
increased the complexity of the'control unit, but also in-
terrupted data processing, causing a reduction in processor
e~ficiency. In addition, o~ten the AU is idle while the next
~.

3~;~
instruction is being interpreted. It is desirable to
continuously control formatting operations over related data
items, like arrays, and to let the AU per~orm continuous AU
functions.
Sometimes special purpose instructions are implemented
in digital signal processors to facilitate performing vector-
matrix mathematics. Generally, a series of instructions are
required to per~orm a signal processing al~orithm using the
available special purpose instructions and other instructions
for correctly indexing and dimensioning arrays. A higher
order language that eliminates the need for ancillary
parameters to index and dimension arrays is highly desirable,
especially when the hardware required to implemen~ such a
language is no~ prohibitive.

Summary of the Inven_ on
In accordance with the present invention, a method is
provided for generating addressing sequences for arrays, in-
cluding a vector, a matrix or a block, in a memory specified
5 by an array transformation operator comprising a plurality
of parameters and under the control of said memory, the
method comprising the steps of loading an address generator
portion of the memory with a plurality of parameters of the
array transformation operator including an initial address
parameter, generating a plurality of row and column indices
specified by displacement and length parameters of the array
transformation relative to the initial address parameter and
translating each paix of the row and column indices into an
address for the memory. The method further comprises the
step of interpreting a boundary parameter of ~he arxay
transformation for controlling or modifying the generating
o the addressing sequences when one of said displacement
parameters causes the address to be generated outside a
boundary of an array. The boundary parameter speci.fies one
of the boundary modes of operation comprising a wrap-around
mode, a zero-fill mode and an ignore boundaries mode.
In accordance with a further feature of the invention a
method is provided for generati~g addressing sequences in
a memory for an array specified by paxameters of an array
transformation comprising the steps of loading an- index
--3--
!

generator means in an address generator of the memory with
an initial address specified by an initial address parameter,
delta 0, of the array transformation, loading the index
generator means with a displacement parameter of the array
S transformation, loading a counter in the address qenerator
with a length parameter, L1, of the array transformation and
generating a plurality of addresses by incrementing the
initial address parameter in the index generator with the
displacement parameter a number of times equal to the length
parameter. This method of the invention produces a line or
vec~or of Ll addresses. Further methods in accordance with
this invention produce a sequence of addresses for arrays of
a two dimensional matrix or a three dimensional block.
In accordance with one embodiment of the invention, a
memory having a plurality of ports is provided with at least
three of said ports being intelligent ports comprising an
address generator for generating a sequence of addresses for
arrays including a vector, a matrix or a block in response
to parameters of an array transformation operator. The
parame~ers are loaded into an indices generator by a dis
placement control word and a length control word for
generating a plurality of row and column indices relative to
an initial reerence point of a single ~r multi-dimensional
array. An address translator coupled to the indices generator
2S converts each pair of row and column indices into an address.

A microprogrammed sequencer controls the operation of ~he
address generator. Each of the intelligent ports may operate
in a read or write mode.
In accordance with a further feat~re o the embodiment
o~ the invention an address generator of an intelligent
memory port is provided comprising an index generator means
for generating a plurality of row and column indices, the
index generator means being loaded with an initial address
specified by an initial address parameter, delta 0, of an
array transformation, means for storing in the index generator
means a displacement parameter, delta 1, of the array trans-
formation, means for storing in the address generator a
length parameter, Ll, of the array transformation means for
incrementing the initial address with the displacement param-
eter a plurality of times equal to the length parameter, and
means for translating the row and column indices into a
plurality of addresses. The index generator means comprises
a plurality of row index generators and column index gener-
ators~ This embodiment generates a line or vector of L1
addresses, and it may also generate addresses for a plurality
o arrays o ~wo ~r more dimensions.
--5--
.

~62~
6290~-680
According to a ~irst broad aspect, the present invention
provides a method of generating addressiny sequences for an array
of data in a digital system comprisin~ the steps of: interpreting
instruction commands; performing arithmetic operations based on
said commands; storing data used in said arithmetic operations;
generating addressing sequences speciiied hy an array
transformation for accessing said data; and transferring said data
between means for storing said data and means for storing said
arithmetic operations as speclfied by said array transformation.
According to a second broad aspec~, the present
invention provides a digital system comprising: means for inter-
preting instruGtion commands; means for performing axithmetic
operations based on said commands; means for storing data for said
arithmetic operations, said storing means comprising means for
generating a plurali~y of addresses in response to an array
transformation for acces~ing said data; and means for transferring
said data between said storing means and said arithmetic
operations performing means in accordance with said array
transformation.

3~;8
Brie~ Descripti _ of the Drawings
The above-mentioned aspects and other features of the
invention are explained more fully in the following descrip-
tion taken in connection with the accompanying drawings in
which:
FIG. 1 is a block diagram of a Macro Function Signal
Processor (MFSP) utilizing an intelligent memory device of
the present invention.
FIG. 2 is a block dlagram of an intelligent memory and
its interfacing bussesO
FIG. 3 is a block diagram of the invention comprising
an intelligent memory port.
FIG. 4A and FIG. 4B show the control word formats for
specifying an array trans~ormation.
FIGs. 5A-5E show the sequences of factored addressing
for transposing an array specified by an array transformation.
FIG. 6 illustrates three boundary modes and functions
when an address generator displacement encounters the righ~
edge of a matrix.
FIG. 7 illustrates directions of i, j and k displace
ments on a I; J; X block.
~IG. 8 shows some initial points within a block for
wrap-around boundary.
FIGo 9 shows some initial points in the ~ero-fill
boundary mode.

:~6~8
FIG. lO is a block diagra~ of the address generator
of the present invention.
FIG. 11 shows the location of the array transformation
parameters within the upper and lower ma~rix access chips.
FIG, 12 is a block diagram o one of the matrix access
chips in the indices generator of the address generator.
FIG. l~ is a ~lock diagram o~ one of the index gener-
ators in the matrix access chip.
FIG. 14 is a block diagram of one o~ the length
counters in the matrix access chip.
FIG. 15 illustrates a matrix having row and column
boundaries and references zQnes outside said boundaries.
F~G. 16 shows one illustrative sequence of elements
g nerated by a matrix access chip to define an output shape.
FIG. 17 illustrates index adjustment from outside
- array boundaries.
FIG. 18 shows a Digital Fourier Transform coefficient
matrix illustrating that the exponent of w is the product of
the row and column index.

'~ ~
;8
Descriptlon of the Preferred Embodiment
Referring to FIG. 1, there i5 shown a block diagram of
a Macro Function Signal Processor (MFSP) 10 illustrating an
overall system in which an intelligent memory 12 may be used
comprising ~n array transformation address generator implemen-
tation of the present invention. More particularly, the
intelligent memory 12 may have a plurality of intelligent
ports 14-22, although in the MFSP 10 embodi~ent described
herein two of the ports 14 and 22 are simply serving as
direct memory access (DMA) ports known to one skilled in the
art, Portl 16, port2 18 and port3 20 are the intelligent
ports in the system of FIG. 1, primarily because of their
ability to execute addressing sequences based on an array
transformation operator while completely hiding data attribute
considerations from an arithmetic unit 38.
In addition to the intelligent memory 12, the MFSP 10
comprises the arith~etic unit 3~, a control processor 32,
a node control unit 24, a Syste~ I/0 unit ~0, an I Bus 26,
an S ~us 28 and an ~ Bus 30~ elli~ent ~rt2 18 and
intelligent port3 20 o the intelligent me~ory 12 each'have
32-bit direct con.nections to in?uts of the arithmetic unit 38
and a 64-bit output of the arit..~etic unit 38 connecting to
intelligent portl 16, thereby p-oviding th~ means for stream-
ing data to and from the arith~-tic URit 3~. The A bus 33
not only interconnects tlle three intellige~.t ports 16-20, but
--8--
,

12~
in addition serves as the MFSP 10 internal control bus inter-
connecting a 2-port RAM 34 of control processor 32 and the
arithmetic unit 38. The conkrol processor 32 comprises the
2-port RAM 34 and a command interpreter 36 for interpreting
instruc~ions and setting up the intelligent memor~ 12 and
the arithmetic unit 38 for execution of said instruc~ions.
The system bus, S Bus 28, interconnects a plurality of units
including the node control unit 24, portO 14l port4 22,
2-port RAM 34, command interpreter 36, arithmetic unit.38
and system I/O unit 40. DMA portO 14 is connected to the
node control unit 24 via ~he I Bus 26. The node control
unit 24 provides an interface between networks o high speed
busses in a distributed multiprocessor and a processor or
other device such as the MFSP lO. DMA port4 24 has a direct
lS 32-bit connection to the system I/O unit 40 for system
application I/O information transfer.
Referring now to FIG. 2, there is shown a block diagram
of the intelligent memory 12 comprising five memory banks
52-60, an arbitration and switching networX 62 and a plurality
o ports 14-22, including the three intelligent ports ~6-20
and two DMA ports.l~ and 22. The arbitration and switching
network 62 directs the flow of data between the ports 14~22
and the memory banks 52-60; it provides an 88-bit wide, 5 X 5
crossbar switch with arbitration logic for the five inputs
(ports) and five outputs (memory banks)O .~lthough a
.

particular embodiment of the intelligent memory is described
here for the M~SP lO, the invention is not limited to a
specific number of memory banks nor to a specific number
of intelligent ports nor to a specific size crossbar switch.
Still referring to FIG. 2, ~he 5 X 5 crossbar handles 64
bits for data, 24 bits for address and control from each of
the ports 14-22. The arbitration logic resolves conflicts
between the ports 14-22. If more than one port requests the
same bank of memory, the arbitration logic holds off the
lower (fixed) priority port until the higher priority port
completes its transfer. Arbitration is performed on a cycle
by cycle basis. Four of the ~emory banXs, bankO 52, bankl
54, bank2 56 and bank3 58 are random acces~ memories ~RAM),
each organized as 64K words by 64 bits. Me~ory bank4 60 is
a read-only memory (ROM) organized as 16K words by 64 bits.
Two of the RAM memory banks 52, 58 are used primarily by
port2 18 and port3 20 which are primarily used as READ ports~
The other two RA~ memory banks are scratch pad areas of
memory and as such are used primarily for storing intermediate
values via portl 16 which is primarily used as a WRITE port.
~OM memory bank 6D stores constants and approximation tables
for use during the operation of various macro functions.
The DMA ports 14 and 22 act as an interface between the I
Bus 26 or the system I/O unit 40 and the intelligent memory
12 for the purpose of transferring large blocks of data; the
--10--

6~1
blocks of transferred data reside in consecutive locations
in the intelligent memory 12. The DMA ports 14 and 22 also
interface with the S Bus 28 for control and status, the I
Bus 26 or the system I~O unit 40 for transfer block data
accesses external to the intelligent memory 12 and to the
arbitration and switching network for transfer block data
accesses to the intelligent memory 12.
The intelligent ports, portl 16, port2 18 and port3 20,
have independent controls to address and format each data
element. Each port's setup parameters describe the shape of
the packed data, where it is to be found (i.e. base address),
and the method of access (i.e~ read/write, transposed,
reversed, etc.). When a port is started, i~ begins accessing
the first element of the described data and continues until
all data is read or written regardless of errors. The read
ports and ~rite ports are identical except for port ID bits
for dete~nining the port's function.
Referring now to FIG. 3, there is shown a block diagram
o~ one of the intelligent memory ports 16-20 comprising an
address generator lO0, a memory controller 102 and a data
formatter 104. The address generator lO0 produces addresses
to access a data array in memory banks 59 so that the data
array can appear in various convenient shapes for arithmetic
unit 38 manipulation. The data formatter 104 acts as a data
2; translator between the memory banks 59 and the arithmetic

unit 38. Data i~ packed into 64-bit words in the ~emory
bank S9. Packed data is unpacked and left justified in the
data formatter 104 prior to its use in arithmetic unit 38.
The memory controller 102 provides control for the address
generator 100 and the data formatter 104, as well as providing
(control line) interfaces to the arithmetic unit 38. In
addition, the memory controller 102 is coupled to the arbi-
tration and switching network 62 and it initiates and controls
all memQry accesses of an intelligent memory port.
The intelligent memory 12 requires a small set of
parameters to execute any addressing sequence required by a
high~level signal processing language such as a Macro Function
Language ~MFL) described hereinbelow. These parameters have
been integrated into a single addressing operator called an
array transformation that directly specifies the hardware
control parameters from the signal processing language syntax.
The address generator 100 implements the address functions
specified by the array transformation. A pair of 16-bit
control words, the displacement control word 80 and length
control word 90, as shown in FIGS. ~A and 4B and described
hereinbelow, cont.ain array transformation parameters and
initialize address registers within the address generator 100
of any one of the intelligent ports 16-20 which then proceeds
to execute memory address sequences specified by the array
-25 ~ransformation.
-12-
,

1lZ6;29~8
Prior to describing furthe- the structure and operation
of the invention in conjunctior. with the drawings, it is
nece~sary at this point to describe the array transformation
operator and certain aspects of the language used to specify
the parameters in the array tr~ sformation in order to
understand the invention. The address generator 100 of the
intelligent memory l2 as previo~sly noted is functionally
specified by the array transfo~ation operator. The word
"operator~ is used here in the ~eneral mathematical sense of
an entity that transforms an ir.~ut into an output according
to the definition of the operat~r. The input and output are
both arrays, hence the name "ar-ay transformation. n This
operator describes array addressing in terms of a ac~ored
series of nested addressing sec~ences. An array transforma-
tion comprises ten parameters 'or ~pecifying an operation and
has the following syntax which is described below:
~4 a3 A2 ~ 0~
~L4 L3 L2 Ll ¦ B]
The language syntax corresponds directly to th~ parameters
required to initialize the address generator 100 component
in the intel}igent memory 12. '-5 a result, the mathematical
definition oE the array transfc~ation operator serves as
the hardware definition of the _~d~ess generator 100.
An intelligent memory 12 t-co_es possible when the
teohnique of instruction factor-ng is used in conjunction
~~ .
.

- '~
with the separation of data parameters from the processing pro
gram. Instructions are factored into control operators, variable
functions, array modifiers, and operands. Each of these with
the exception oE function~ ha~ a signiicant effect upon memory
operation. Control operators act as an addressing control mode
to determine the sequence(s) of applying operand-data ~o the
arithmetic unit 38. Control operators specify relationships
between array transformations such as length parameters. Vari-
able functions specify the arithmetic and logical operations to
be performed on the elements fetched from memory. Array modi-
fiers alter the normal addressing mode specified by the control
operator in use. Operands refer to the specific data to be used~
When these instruction para~eters are intentionally separated
from the parameters of data, the data parameters can be maintained
in a data descriptor. A data descriptor would consist of a
collection of information describing the variable operands; such
as data type, format and location. At run-time, a program refer-
ences an operand through the variable's descriptor. Dynamic
changes in data "shape" can be handled with no changes affecting
the program. One important requirement of an intelligent memory
is the treatment o~ an entire variable data array as a single
operand. Consequently, the location of the data is determined
by a base address or an initial ~eference~ point which references
the initial element o the array of data~
Signal processing algorith~s are conve~iently expressed

3~3
in the language of matrix mathematics. For this r~ason, MFL
is an array-oriented language. Most variables in MFL programs
denote many elements of related data to be treated as single
entities. Most operations are defined directly on arrays
S without requiring item-by-item statements. MFL ~rrays take-
the Eorm of Yectors, matrices and blocks. Referencing an
individual element of an array requires one, two or three
numbers called indices to mark its position in the array.
A vector is an array whose elements are selected by a
single index, In other words, a vector has one coordinate
and is considered to be a collection of ele~ents arranged in
a line. The number of elements in this line is called the
leng~h of the vector. A null vector is a vector containing
no elements. The length of a null vector is 3ero.
A matr1x is an array whose elements are selected by two
~ndices. It has two coordinates and i5 considered to be a
collection of elements arranged in a rectangle. The number
oE elements in each row is called the row length. The posi-
tion oE an element along the row coordinate is called the
row position or the column number. The number of elements
in each column is called the column length. The position of
an element along the column coordinate is called the column
position or row number. Togethér, the row length I and
column length J constitute the ~shape" of the matrix. The
shape is written J;I~

`~
6~
In some applications the matrix has a direct ~orres-
pondence to some physical reality such as the data derived
from an array of pixels which represent a planar i~age. In
the~e cases the properties of a matrix are directly applicable
to the processing. In other cases the matrix is simply a
convenience for purposes of processing. In many signal
processing operations, a matrix is merely a collection of
vectors. The shape is consistent with the number of row
vectors and the length of each vector (which is equivalent
to the number of columns). The processing require~ an itera-
tive use of the vector set using and modifying one vector at
a time. Consequently the usual interpretation of an array
modifier i5 to apply it to each row vector rather than to
the entire array or matrix. Of course it is sometimes neces-
sary to modify the complete array as well. In this case the
array can be considered as a single long vector.
A block is an ~rray whose elements are selected by three
indices. It has three coordinates and is considered to be a
collection of elements arranged in a set of matrices. A
block uses the row and column terminology defined prevlously
for matrices. In.addition, the number of matrices in the
block K is called the depth of the block. The position of
an element along this coordinate is called the depth position
or the matrix number. Together, the row length, column
length and depth constitute the shape of the block. The
-16

3~;~
shape of the block is written K;J;I.
To illustrate the concept of factored ~ddressing, the
following example qualitatively describes ~ray transposition
in terms of factored addressing. When a mztri~ in row-major
order is read by columns instead, the outp~t will be a matrix
with rows consisting of columns from the in?ut array, The
output array is of rank two, implying that the procedure mu~t
require two displacement sequences.
Referring now to FIG. 5, let I equal ~he length of rows
in the input array and let J equal the length of columns.
The array transformation is described by t~e following
sequences:
(O) Start at the upper left corner o~ the array.
(1) Move down one point in the colu~- direction
J 1 times to define a line of J points.
~eturn to the point before the f ~st
displacement in the column direc,ion~
(2) Move across one point in the row ~irection
and repeat step (1) I - 1 times ~ d~fine an
I x J matrix.
The sequences illustrated in the precs~in_ example may
be generalized into a procedural definitlo- of the array
transforma~ion: '
(~) Go to the initial point delta O 'O).
(13 Move by displacemen~ delta 1 ~1 a =otal of
_17_
.

~ 2 ~
Ll - 1 times to define a line o. Ll points.
Return to the point before the first dis-
placement of this sequence.
(2) Move by displacement delta 2 (Q2) and repeat
step one L2 - 1 times to define a matrix of '
L2 x Ll points. Return ~o the point before
the first displacement of this s~quence.
~3) Move by displacemen~ delta 3 (~3~ and repeat
steps one and two L3 - 1 times to deflne a
block of L3 x L2 x Ll points. R~turn to the
point before the first displace~ent of this
sequence~
(4) Move by displacement del~a 4 (~4) and repeat
steps one, two and three L4 - 1 times and
stop.
The re~ult is a sPt o~ ~4 blocks of L3 x ~2 x Ll points.
The rank of the out~ut array is equal to t:~e number of dis-
placement sequences required to generate it. With the above
definition, output arrays of up to rank folr are possible.
Each nested sequence corresponds to a sepzrate hardware
circuit~ When ne,cessary, more sequences ~y be added to ~he
,definition to produce shapes of higher ra~c.
In a Macro Function Language ~MFL), an array trans-
formation on an input array C is specifiec ~y ten parameters
-25 written directly below the name of the ar_~y and its da~a
-18-

6~
descriptor according to the following syntax:
C
~1~ 10;20
[~4 ~ Ql I ~0]
[L4 L3 L2 Ll ¦ B 1
The parameters fall into tbree categories: displa~ements
(~), lengths (L) and boundary modes (B).
Referring now to FIG 6, the boundary mode (B) parameter
of an array transformation sets a geometrical context for
the displacements and lengths. The boundary mode determines
the action o the address generator 100 if a given displace-
ment results in an address that falls outside the boundaries
of the array. The boundary modes are as follows:
W = wrap-around. When a boundary is encountered
by a displacement, the address generator con-
tinues to read on the other side of the array.
For example, when reading from left to right
along a row, the address generator 100 moves
back to the left-most element of the row if
the right edge is encountered.
Z a zero-fill. All points outside the array are
assumed to be zero for a read port and valid
data from the AU 38 ~s dropped for ~ write
port.
I = ignore boundaries. This suffix may be appended
--19~
.

to either zero-fill or wrap-around. In this
case, all boundaries except the last point in
the array are ignored. So wrap-around with this
suffix moves the address pointer back to the head
of the array, whereas zero-fill keeps the pointer
9oing.
FIG. 6 shows the effect of each boundary mode on an address
generator displacement encountering the ri~ht edge of a
ma~rix.
Displacements of an array transformation are iteratively
added to an initial point to generate addresses in a regular
sequence. The displacements are defined as pairs of numbers
representing generalized spacing on a two-dimensional surface,
thereby facilitating detection of the endpoints of each row
and column o~ a matrix. This representation is appropriate
for most si~nal processing macros. If desired, the concept
may be extended to n-tuples for general displacements on an
n-dimensional array. Displacements on arrays may be written
as complex numbers where the imaginary part is the displace-
ment in the row direction and the real part is the displacement
in the column dir,ection. Real number displacements are in
the row direction wi~h no displacement in the column direction.
Either form may be used with vectors, matrices and blocks~
Symbols may also be used~ As shown in ~IG. 7, "i~ defines a
unit displacement to the right across the row direction, ~j n
-20-
,

. ~ J
~6~
defines a unit displacement down the colu~ direction, and
"k" deines a unit displacement into the de~th directlon.
Displacements by multip~es of either i, j ~r k are indicated
by preceding the symbol with the size of t~e displacement.
For example, -5j denotes a displacement by five points in the
negative column direction.
A general displacement through a bloc~ requires a triplet
of numbers in a particular form (e.g~ dept~; column; row~. For
most applications, the k direc~ion does no- require the s~me
level of flexibility given to the j and i Zirections. Data
ormed into a three-dimensional block usua'ly can be treated
as a set of matrices. In these instances, ~wo dimensional
displacements on each matrix along with se~lential accessing
of each matrix in the block are suffficien_~ As a result,
displacement triplets through a block are not always directly
supported in hardware. However, devices i-?lementing larger
~-tuples might be advantageous for some ap-Iications. A
variable displacement value may be stored n a co-operand to
the array transformation. Presence of an explicit value in
the co-operand is indicated by a "d" symbo' in the corres-
ponding displacement of the array transfor-~3tion. The "d"
is used for user-defined variable, and may ~lso be defined
by a translator in order to insært a non-c~ded constant into
the appropriate register of the address ge-~rator 100. The
displacements notation of an array transfc-ation along with
-21-

their interpretation in terms of ~1 through ~4 are summarized
as follows:
Xi = Move across the row X points to the right.
Yj = Move Y points down the column.
Zk = Move Z points into the depth.
0 = Do not displace -- repeat the previous sequence.
d = The displacement i5 contained in a variable
co-operand to the array transformationO
b;a = Mo~ "a" points alon~ the row and "b" points
down the column.
c;b;a = Move "a~ points along the row, "b~ points down
the column, and "ca points into the depth~
The instruction loading time is reduced by assigning
codes to the most frequently required displacements. One
possibility is to choose three-bit codes for ~i, +j, ~k, or
0. The eighth code is for "d~ r to signify that the displace-
ment value is a separate complex number sent by the command
lnterpreter 36. All variable displacements and constants
not equal to +i, +j, +k, or 0 are sent to the address gen
erator 1~0 as ~d"'s. In a two-dimensional address gen~rator,
each "d" is replaced with a comolex number denoting a general
displacement in the combined i znd j directiQns. If either
the real or imaginary part is zero, the displace,ment is
solely in the i or j direction ~espectively. "k" displace-
ments of the form Zk are specified as Z x Jj, on an array

~ J
.~6~6~3
reshaped to (K x J);I. Hardware directly supporting the
full three-dimensional representation would re~uire "d"'s in
the form of ~riplets. With the three bit codes, four dis-
placements and the initial point can be placed into a single
16 bit word as shown in FIG. 4Ao
Referring now to FIG. 8 and FIG. 9, the initial point
is a displacement from 0;0;0, the upper left corner of the
array. For nesative values, its location depends on the
boundary mode. FIGS. 8 and 9 compare the different locations
of some initial points for wrap-around and zero-fill.
For wrap-around, the displacements to a few of the corners
of a block are as fol~ows:
-i = upper right corner
-j = lower left corner
-k = upper left deep corner
-';-1 = lower right corner
O = upper left corner
For zero-fill 9 all of these points are clustered about the
upper left corner of the array as shown in FIG. 9.' The
zero-fill mode interprets the array as having infinite'extent
wi,th elements out,side the R:J;I shape set to zero.
Lengths oE an array transformation are real integers
indicating the number of times ~ displacement is performed
and the resulting shape of the array. Several mnemonics are
2~ provided to specify output lengths in terms of the input
-23-

~J
shape of the array. A capital letter ~K" indicates the
depth, "J~ indicates the column leng~h and "I" indicates the
row leng~h of the input array. Other numeric lengths must
be written explicitly in the transformation if they are
constant and in a co-operand if they are variable in a form
similar to explicit displacements.
For most numerical appllcations, two ports simultaneously
access data to create a pair of input streams for a dyadic
function. If a displacement must occur as many times as
necessary to match the corresponding displacements of the
other input argument to a dyadic function, the number "1" is
written. This symbol may be overwritten at run time by the
command interpreter 36 with the appropriate length to match
the other argument's array transformation or portl 16 and
port2 18 can monitor the corresponding port length to deter-
mine the replacement length dynamicallyO If the two array
transformations hav~ corresponding lengths equal to one, no
displacement occurs for the se~uence. This is how a control
operator becomes two coupled array transformations.
A displacement may be specified to continue until a
boundary of the a~ray is encountered. The length S~ mean~
ing "stop on boundary, n is used for this caseO In wrap-around
mode, if length m is set to S, displacements by ~m continue
until a ~m encounters a boundary. In zero-fill mode, dis-
placements by ~m continue until any lower level dis~lacement
-24-

J
~6;~3168
encounters a boundary.
When stop on boundary is used as a length parameter of
a port addressing one of the input data streams of a dyadic
function, the corresponding length in the other port must be
"1", meaning ~repeat until the other array stops on its length.
The following summarizes the array transformation length
symbols:
K = depth of input array
J - column length of input array
I = row length of input array
d = the length is contained in a variable
1 = repeat as necessary to match a corresponding
output array shape
S - stop displacing when a boundary is
encountered
These six lengths are specified in terms of three-bit codes
similar to displacements. The entire length ield and bound-
ary mode occupy a single 16 bit word as shown in FIG. 4B.
The following examples show a few simple forms of array
operations possible with array transformations. Each example
is given in terms of a Macro Function Language (MFL) syntax.
The array transformation is applied to an array B or C of
some sample shape and type contained in its data descriptor.
Below the array transformation is written the data descriptor
of the output array from the transformation~ The parentheses
-2;-

enclosing this information indicate that it is a responsefrom the intelligent rnemory 12.
Normal_read. The ordering of arrays in row-major order
with blocks consisting of sets of matrices implies that
normal accessing of data from memory will be:
B
C16 2;8;4
[O k j ilo]
~1 K J I¦W]
(C16 2;8;~3
The transformation is applied in this example to a block B
of 16-bit complex data shaped 2;8;4. The notation specifies
that the array access begins at the initial point 0, the
upper left corner of the input array. The primary displace-
ment is i, the row direction, to form an output row of length
I. Next the access moves by j, the column direction, and
repeats the first set of displacements to form an output
matrix shaped J;I. The ~hird displacement sequence i5 by k,
the depth, fo]lowed by repetition of the column and row dis-
placements to produce the final K;J;I output shape. The
fourth displacement is a repeat with length 1 to allow for
repetition of the block shape to match a c~rresponding output
array of rank four, if necessary. Since ,;~is transfor~ation
left the array in its original form, it is considered the
identity operator.
-26-
.

:~Z~i2~
Block Transposition. Some algorithms require that a
, .
block of data is to be read by depthr then by column, for
each row. In this case, the array transformation is as
follows:
C16 2;8;4
- 10 i j klO]
[1 I J ~¦~]
~C15 4;8;~)
The array access begins at the initial point 0, the upper
left corner of the input array. The primary displacement
- is k, the depth direction, to form an output row of length K.
Next the access moves by j, the co].umn direction, and
repeats the first set of displacements to form an output
matrix shaped J;R. The third displacement sequence is by
i, the row direction, followed by repetition of the column
and depth displacements to produce the final I;J;K output
shape.
~ . The following array transformation trans-
poses matrix B so that rows become columns and columns
become rows. The R16 means that the array shaped 10;20
contains real 16-bit data.
--21--
.

6B
R16 10;20
[O o i j 1]
~1 1 I J¦o]
tR16 20;10)
In this and subsequent matrix examples, it is understood
that for blocks, the array trans~ormation would be repeated
for each matrix of the block.
Inner product addressingO The addressing for a matrix
multiply matches the row vectors of B against the column
vectors of C. Each row vector of B is repeated to match
each of C[I] column vectors in C before the next row of B is
read. The array transformations are as follows:
E3, C
15R16 10;20 R16 20;15
~O j O i I O] [O U i j I O~
[1 J 1 I¦W] [1 1 I J~W]
(R16 10;15;20) (R16 10;15;20)
The first sequence in the B array transformation is by i for
length I r meaning that a row vector from B is read. The
corresponding sequence for C reads a column, implying that
B[I]~ the row length of B, and C[J], the column length of C,
must be identical. The second sequence for C is i for length
I, meaning that the columns of C are read consecutively.
The row vector of B is repeated to match the number of columns
-28-
.

J
d 9 f~ ~B
in C, as indicated by the O displacement and the 1 length.
This matching is repeated for every row vector in B, as
indicated by the j for length J in the third sequence of B
and ~he coxresponding repeat for C.
Several conventions may be used to abbreviate array
transformations as follows:
1. When no displacement is written, assume that it
is 0(1), meaning "repeat as necessary to match
the other input array.~
2. When no length is written with an i or ~i,
assume that the length is I, the row length of
the variable.
3. When no length is written with a j or -j, assume
that the length is j, the column length of the
variableO
4. When no length i~ written with a k or -k,
assume that the lenyth is K, the depth of the
variable.
5. When no initial point is given, assume that it
is 0. When the initial point is defaulted, the
vertical line used to separate the initial
point from other displacements is deleted.
6. When no boundary mode is given, assume that
B equals wrap-around.
7. If no array transformation is given, assume
--2g -

that it is a normal read:
[0 k j i¦o]
~1 K J I¦W~
8. An array transformation may be written with the
entire displacement line omitted. In this case,
the remaining line specifies the lengths in the
form to be used with a normal read. To indicate
that the displacement line has been omitted, the
length line is written with the type and packing
in the form of a new data descriptor. Semi-
colons are used to.separate the lengths.
B B
R16 3;8 is equivalent to R16 3;8
R16 3;4 [0 k j i¦o]
[1 K 3 4¦W]
Returning now to the description of the structure and
operation of the invention, and referring now to FIG. 10,
there is shown a block diagram of the address generator 100,
which provides an address 130 for its intelligent port each
machine cycle. The address generator 100 comprises a micro
sequencer 122 along with its control store 120, an indices
- generator 111 comprising two matrix access chips (MAC) 110
and 112, an address translator 114, a multiplier 118 and a P
Bus 128 for information transfer within the address generator
100. The matrix access chips (MAC) 110 and 112 provide row
-30-
.

12~9'{;B
and column indices specified by the parameters of an array
transformation which are loaded into MAC 110, 112 via ~he
16-bit displacement control word 80 shown in FIG. 4A and the
l~-bit length control word 90 shown in FIG. 4B for addressing
S each and every data element of an array.
This paix of 16-bit control words containing array
transformation parameters shown in FIG. 4A and FIG. 4B
initialize address registers within the address generator
100 of any one of the intelligent ports 16-20 which then
proceeds to execute memory address sequences specified by
the array transformation. The displacement control word 80
as shown in FIG. 4A has six fields. Five of these fields
are identical and contain 3-bit codes for speciying each of
the displacement parameters, delta 4 through delta 0, of an
array transformation. The other field is a l-bi~ fractional
field (F) which when set to a 1 indicates that the least
significant 5-bits of a 16-bit data bus are used to support
fractional displacement. Table 1 lists the eight functions
available for most of the delta fields (except for the P
function).
The length control word 90 as shown in FIG. ~B has six
fields. Four of these fields, length 4 through length 1,
are identical and contain 3-bi~ codes for specifying each of
the length parameters of an array transformation. The
"mode field" is also 3-bits and the Boundary (Bj field is
-31

~L~6~
l-bit~ Each of the lerlgth fields has eight possible func-
tions and Table 2 lists these functions and their definitions.
The mode field defines the modes of operation within the
address generator 100 o-f the matrix access chips 110, 112,
and Table 3 lists the eight modes. The boundary field
bit, B, determines the response of the intelligent ports
16, 18, 20 when a boundary condition is encountered. When
set to "0~ the port performs a zero-fill mode. The boundary
modes are described herein and illustrated in FIG. 6.
Each MAC device 110, 112 may be implemented with a 180
pin, 7500 gate, CMOS gate array technology. The address
translator 114 is coupled to the MAC 110 and 112 and the
multiplier 118. The multiplier 118 may be embodied by
model IDT7217L multiplier manufactured by Integrated Device
Technology, Inc. of 3236 Scott 80ulevard, Santa Clara, CA
95051. The address translator 114 converts the row and
column indices to the 30 bit address 130 which points to
the location of the most signiicant bit of a data element
in the intelligent memory 12. The multiplexer 116 can be
controlled so that row and column indices are multiplied
together to produce FFT coefficients. The address trans-
lator may be implemented with a 144 pin, 5700 gate, CM05
gate array technology. The microsequencer 122 and its
associated control store 120 directly control index g~n-
eration in the MAC 110 and 112 and provide control signals

for data flow within the entirC intelligen~ memory 12 pipeline.
The microsequencer may be imple_ented with an AS890 sequencer
manufactured by Texas Instrumer.~s Incorpor2ted of Dallas,
Te~as 75265. The MAC 110, 112, devices together with the
microsequencer 122, constitute ~he bulk of the 1l intelligence"
of the Ports 16, 18 and 20.
The partitioning of the M~C 110, 112 logic provides a
slice architecture that allows each MAC llQ, 112 to independ-
ently address two dimensional variables sp~cified by an
array transformation. The intelligent memory architecture
supports two independent address streams one at a time, with
switching between the streams ~der microcc~e control 124 as
directed by the 3 mode bits in ~he length control word 90.
Hence, the 3 mode bits provide eight modes of operation
summarized in Table 3 for the ~C 110, 112 and are as follows:
- MAC DUPLEX Mode refers to the c?eration of each MAC independ-
ently, each producing its own a~ress sequence, while MAC
SIMPLEX Mode refers to the operation of th~ MAC 110, 112
devices together to produce a s ngle address sequence.
Simplex mode comprises a full s-mplex mode and a half simplex
mode~ Full simpl~x mode uses ~th MACs 11~, 112 but half
simplex mode uses only the lowc~ MAC 112~ ~alf simplex mode
is only an optimization for 5pc~ that com-s from microcode
partitioningO Duplex mode con-^ins submod^s called flushed
duplex mode and non-flushed du-:ex mode ea~'n having three
-3~
.

;8
variations for a scalar, vector, and matrix. Flushed mode
data are routed back into Address Generator 100 through the P
BUS 128, whereas nonflushed mode data go to the AU 38 or the
address directs where AU 38 data goes in RAM memory 50.
Scalar, vector, and matrix modes determine when the r~st of
intelligent memory services the indices generated by the
other MAC~
The architecture of the MAC 110, 112 devices allows
direct support of array transformation statements and MAC
110, 112 devices are cascaded within the indices generator
111 to handle up to four dimension variables~ The idea of
displacement and length parameters as described hereinbeore
is the basis of the array transformation statementC. FIG. 11
shows the location of the array trans~ormation parameters
within the cascaded upper and lower matriY access chips 110,
112 of the indices generator 111. Array transformations
supported by the pair of MAC 110, 112 devices allow for a
generic method of generating an address sequence for a rank
four output shape array from a rank two input shape array.
Referring now to FIG. 11, ~IG. 12 and FIG. 13, there is
shown the functional structure of a MAC 110, 112 device.
Within each MAC 110, 112 there are four index generators
140-143 comprising an upper row and column index pair 140,
141 and a lower row and column index pair 142, 143. Each
index generator contains a 16-bit base working register 162
-34-
, , .

`-- ~
and a 16~bit displacement ~3gister 160 as shown in FIG. 12.
Each of these two registers can be wri~ten to and read from
via the A BUS 30 prior to -~struction execution.
The data in the displacement register 160 of the lower
MAC 112 once loaded from t~e CI 36, will not change. All
data held in the MAC 110, 112 will be 16-bit two's complement
and loaded as integers. W2en the fractional addressing
mode is selected, the shiftnrs 148, 149 will shift the data
5 bits to the right ~sign e~tended) before the index is pro-
duced. When the fractional addressing mode is not selected
then all loaded data is in integer format and the shifters
are transparent.
The comparators 146, 1~7 make running comparisons between
the current index and the c~ntents of the two boundary regis-
ters, column length registe~ 144 and row length register
145. The running comparisc~ allows the circuitry to deter-
mine when the current inde~ has exceeded the boundary limits
of the "Input Shape~ via t~s status bits in the condition
code register and MUX 158. The Input Shape is defined by
the contents of the column length register 144 and row length
register 145. The upper le~gth counter 150 and lower length
counter 151 are used to co~t the number of displacement
increments imposed on the conteffts of the ~0 register con-
tained in the same level. ~s each displacement is added to
~he current content of ~0 -:~e associated length counter
-35-
.

within that level is decremented. Negative sign detection
circuits generate status 152, 153 bits which flag to the
microsequencer 122 the boundary of the output shape.
Still referring to FIG. 13 showing a hlock diagram of
s one of the index generators 140-143~ both the displacement
register 160 and the working re~ister 162 have four clear
and preset inputs 171-174. Two of these are dedicated to
the selective clears or presets of the upper 15-bits of the
register. The other two preset and clear inputs are dedi-
cated to the least significant bit (LSB) of the register.
Hence, these registers 160 and 162 are asynchronously prese~
and cleared to one of four default values 0, 1, -1, -2. Each
one of the index generators 140-143 has a 16 bit complement
capabil~ty performed by exclusive-or gates 168 and the com-
plement 175 signal. Also, there is a 16-bit fast carry
adder 170 complete with carry input 176~ This architecture
allows the uncomple~ented or complemented contents of the
displacement register 160 or column length registers 144
and row length register 145 to be added to the curren~
content of the working register 162.
The upper and lower level registers in the index genera-
tors 140-143 in each MAC 110, 112 are distinguished by the
prefix U and L respectively. Thus, the displacement register
160 and the working register 162 in the index generator of
the MAC 110, 112 devices have the following designations:
-3~-

~i uci = Row and Column upper level w~rking r~isters ~U 0)
lri;lci = Rcw and Column lower level w~rking registers (L 0
~ri;udci = Row and Column upper level displacement registers (U 2/4
ldri;ldci = Raw and Column lawer level displa~nt registers (L 1/3~
Hence, each MAC 110, 112 has an upper and lower displacement
register 160 in each of the upper and lower index generators
140-143 for storing the ~4 and ~3 displacement parameters
of an array transformation. Where there i5 a need to dis-
tinguish between upper and lower MACs, then the prefix U or L
will be used.
Referring now to FIG. 14, there is shown a block diagram
of the length counter 150, 151. The length number register 180
contains static data (iOe., not updated during execution) and
acts as a reference for re-initializing the length count
number register 182. The length count number register 182
contains a current count of the number of displacements that
an index generator 140-143 will invoke in otder ~o generate a
one dimension access sequence. Each level of ~ MAC 110, 112
handles one dimension. Thus, the, MAC 110, 112 length registers
are referred to as follows:
UN = upper level number register
UCN = upper level count number register
LN = lower level n,umber register
LCN = lower level count number register
The length counter 150, 151 i5 provided with a negative sign
-37-
,

6~
detectlon capability which allo-~ the microcode to determine
via the XCNEG 194 statu~ bit whether or not to continue with
the displacement increments wit~in the index generator belong-
ing to the same level~
The length counter 150, 151 decrements the current content
of the length count number register 182 by one when a valid
index offset pair has been generated. If the current count
is negative then it stays negative even after several decrements
executed due to condition code ~ipeline delay. This gives the
circuitry more time to catch th~ end condition. Both the
length number and count number registers 180 and 182 contained
in a length counter 150, lSl al'~w read and writ`e access
over ~he bus in a similar manner as the index generator
140-143. However, these registers 180 and 182 are not
lS independently writable because -~is function is not required.
Referring now to FIG. 15, 'here is 5hown a matrix with
a plurality of zones identified outside of its boundaries,
.The row length 200 and column l_ngth 202 comprise the "INPUT
SHAPEn. The top left hand correr is regarded as the zero
reference point. An initial d,s?lacement 0 is an offset
from this reference point. The in?ut shape or matrix consists
of the number of rows and colur-.s of data elements and is
thus limited to two dimensions. Each data element is uniquely
defined in terms of its row in-^x z~d column index. The
reference point is therefore de-ine~ as (0,0). An element
,

- .~2~ 6~
ln zones 1, 4 or 7 is said ~o have exceeded the lower row
boundary of the input shape. The column index of any such
elements will be negative. The MAC 110, 112 are therefore
required to handle negative numbers via the ~wo's-complement
notation. The MAC 110, 112 supplies to the microsequencer
122 a status bit 157 sourced from the sign bit of the current
column index~ An element in zones 1, 2 or 3 is said to have
exceeded the lower column boundary of the input shape or
matrix. The row index of any such elements will be negative.
The MAC 110, 112 supplies to the microsequencer 122 a status
bit sourced from the sign bit of the current row index.
An element in zones 3, 6 or 9 is said to have exceeded
the upper row boundary of the input shape or matrix. Any
such elements will have positive column indices but the
integer value of the column length 202 will be greater than
or equal to the row length 200.
A running comparison is performed between the current
column index of each level and the column length of the input
shape. The MAC 110, 112 supplies to the microsequencer 122
a ~tatus bit 154 which indicates the result of this running
comparison.
An element in zones 7, 8 or 9 is said to have exceeded
the upper column boundary of the input sha~e or matrix. Any
such elements will have positlve row indices oE ~nteger value
greater than or equal to the column length. A running
-39-

comparison i5 performed between the current row index of each
level and the column length. The MAC 110, 112 supplies to
the microsequencer a status bit 155 which indicates the
result of this running comparison.
Referring now to FIG. 16, in addition to FIGs. 11-14,
a MAC 112 is required to generate a sequence of elements in
response to an array transformation where each element is
specified by a row index and a column index pair which will
define an output shape. Movement from one element to the
next element in a sequence is accomplished by specifying the
contents of a row and column displacement register 160 and
adding it to the contents of the current working register 162
and decrementing the length counter 182 in a MAC 112. The
lower level of the MAC 112 is used for this one dimensional
movement. The displacement register 160 in the lower level
and the working register 162 in lower level are loaded with
the ~1 displacemen~ index para~eter of the array trans-
formation, In the example sho~n in FIG. 16 the working
register 162 initially is loade~ indirectly by the command
interpreter 36 prior to going into execution mode with the
index of point "1." as defined by a ~0 in an array trans-
formation. When a movement in the one dimension specified
by the contents of the lower level displacement register
160 comes to an end (index 6 in FIGo 16) and when the length
counter is negative then the following occurs: the displacement
-4~-

2t 9 6 ~3
register 160 in the upper level of the sa~e MAC 112 will
contain the ~2 displacement index of the array trans~orma~ion
required to move the current index of that level to the next
linear sequence ~i.e. from 1 to 7 in FIG. 16) D The result,
after being loaded back into the lower level working register
162 and validated, will be down-loaded into the lower level
working register 162 of the MAC 112 and the length counter
number (LCN) register 182 will be re-loaded from the content~
o~ the length number (LN) register 180. The lower level o
MAC 112 is then ready to once more complete a linear sequence
from 7 to 12 in FIG. 16 as specified by the ~1 displacement
index parameter. This process corresponds to a two-dimensional
address sequence since ~1 and ~2 will not be altered; also,
this description can be extended to higher order address
sequences up to four dimensions in the present embodiment.
Each time a new dimension is invoked, transfer of data from
an upper level to a lower level of MAC 110, 112 hardware is
re~uired. It is therefore a requirement of the MAC 110, 112
that each ~0 working register 162 have access from the
level above which includes upper MAC 110 to lower MAC 112
transfers.
Referring now to FIG, 16 and FIG. 17, the indexes 5, 6
and 12 as shown in FIG. 16 are 'said to be outside the input
shape or "out of bounds." The occurrence o these indices
will be handled differently depending on the boundary mode

selected~ Each level of MAC 110, 112 contains a length
number register 180 which stores the length control word as
shown in FIG. 4~. The length number register 180 i9 loadable
from the CI 36 during initialization and one bit of this
control word is allocated to boundary mode B. The two
modes selectable via this bit are "wrap-around~ and "zero-filln.
Each invo~es different system responses when the current
index is out of bound (indicated by the boundary status
bits). In zero-fill mode an out of bounds index is ~valid"
but the data obtained from memory will be substituted by
zero. In a write port the data would simply not be written
to memory. In wrap-around mode indices outside the boundary
are regarded as invalid. It is a requirement in this mode
that if a single displacement results in an index outside
the input shape (matrix) it must be adjusted to point to an
index inside the input shape. For index adjustment of ~he
example shown in FIG. 17 to go from 24 to 19, it is required
that the row length be subtracted rom the column index. To
go from 4 to 40 requires column length to be subtracted rom
the row index. In order to support this adjustment require-
ment, the MAC 110, 112 architecture allows the selective
subtraction of the content of the boundary length registers
144, 145 in each level of the MAC 110, 112.
Referring now to FIG. 10, tne MAC 110, 112 de~ices pro-
duce row and column indices representing offsets from an
-42-

2~
initial reference or starting point (~ 0) as speci~ied in an
array transformation. The Address Translator (AT) 114 toge-
ther with the multiplier 118 converts this offset into a 30
bit address 130. In order to calculate this address, the
AT 114 must know the base address, ~0 (initial starting
point), the row length (Ll - L4 number of elements in a row),
and the packing factor of the data element (number of bits
that comprise the data element~. These values are loaded
into the AT 114 prior to current instruction execution by
the command interpreter 36 via the A BUS 30. The AT 114
registers are double-banked, allowing the command interpreter
36 to set up for the next instruction while the AT 114 is
executing the current instruction.
The address translator 114 essentially converts a row
index and column index identifying a location of data to a
linear address to identify the location of the element of
data. The AT 114 supplies the row length 139 to the multi-
plier 11~, which then multiplies the row index 137 offset by
the row length 139 and supplies the result back to the AT
114. The AT 114 adds this product with the column index 138
offset to obtain the total index offset from the initial
starting point. The AT 114 converts this index offset to a
physical offset by shifting the index offset by an amount
equal to the pac~ing factor effectively multiplyinc by the
number of bi~s in the data element. This address cffset is
-~3-

J
then added to the base address to obtain the 30 bit address
130 for the data element. This 30-bit address 130 now points
to the most significant bit of the data element to be accessed.
An alternate path exists ~or generating Fast Fourier Transform
(FFT) coefficient addresses. This path replaces row length
- by a column index (multiplier 167) and the sum of the product
and column index by the product itsel~ (which is the product
of the row and column index). The product of the row and
column index represents a linear offset into the linear
table of complex exponentials stored in the ROM memory bank
60, The Digital Fourier TransforTn (DFT) coefficient ~atrix
is shown in FIG. 13; it illustrates that the exponent of w
is the product of the row and column index.
Referring now to FIG. 3, the data formatter (DF) 104
may be implemented with two identical 180 pin, 2500 gate,
CMOS gate arrays, each having a 32-bit slice and may be
configured either as.Read or Write data ormatters. This
configuration is controlled by hardwired connections,
As a Read formatter, the data formatter 104 reads 64
bit packed data from the Memory banks 59, unpacks the data
element, shifts and masks all unnecessary bits to zero, and
presents to the AU 38 a left-justified data element. Shift
amounts and mask parameters (i.e. packing factor etc.) were
pre~iously loaded into the data formatter 104 by the command
interpreter 36. As in the address translator 114, data

formatter 104 control registers are also double banked. All
normalization calculations are performed by the command in-
terpreter 36, whiah in turn informs the DF 104 of the results
via the shift amount.
As a Write formatter, the data ~ormatter 104 ls presented
left-justified data elements from the arithmetic unit 38.
The data formatter 104 must perform a read-modify write
operation, packing this new data element among the unchanged
data elements of the 64 bit word in intelligent memory 12.
The control circuitry of intelligent memory 12 is such that
a read for the Read modiy write occurs only if the 64 bit
boundary has been crossed. This eliminates unnecessary memory
reads by the write port.
S~ill referring to FIG. 3 there is shown the Memory
Controller (MC) 102 which may be implemented with a 144 pin,
3500 gate, CMOS gate array and provides overall central
control for the intelligent memory 12 pipeline. In
addition, it provides interfaces to the command interpreter
36, arithmetic unit 38 and the arbitration and switching
network 62.
The A BUS 30.interface of ~he memory controller 102
decodes ~he A BUS address, providing chi? selects as required,
and acts upon or distributes all A BUS control signals as
required. The memory controller 102 also controls bidirec-
tional buffers for the A BUS 30 data path. This method

J
allows each intelligent memory 16-20 port to present only
one load to the A BUS 30, and allows all decode circuitry to
reside in one central location for each of said ports 16-20.
The AU interface supplies data ready and data request
control lines to the arithmetic unit 38. These lines are
used to control data flow between the AU 38 and an intelligent
port 16/ 18, 20. Based on the state of these control lines,
the memory controller 102 has the ability to selectively
start and stop ~he intelligent port pipeline as required.
The memory controller 102 also provides the intelligent
ports interface to the arbitration and switching network 62.
The memory controller 102 receives the 30 bit address 130
from the address generator 100, and decides if a memory
access is necessary (if the 64 bit word boundary has been
crossed). If a memory access is required, the memory con-
troller 102 generates a 3 bit BANK REQUEST 93 code to the
arbitration and switching network 62. The memory controller
102 then looks for the bank acknowledge 94 signal from the
arbltration logic, stoppiny the intelligent port pipeline
and notifying the arithmetic Ullit 38 if the port has lost
memory arbitration. Thus thé memory controller 102 controls
the overall flow of data through the intelligent port pipelinet
and between the intelligent port 16-20, the arlthmetic unit
38 and the memory banks 52-50.
This concludes the description of the preferred embodiment.
-46-

6~3
However, many modifications and alterations would be obvious
to one of ordinary skill in the art without departing from
the spirit and the scope of the inventive concept. For
example, the number of RAM or ROM storage locations in the
S intelligent memory 12 may vary and the number of intelligent
ports may vary depending on system applications. Also, the
multiplier 118 in the address generator 100 could be removed
if memory chips with two dimensional structures were available
externally for row and column indices to index into directly
instead of the current approach of producing a linear address
displacement first and then brea'~ing it down inside memory
chips into row and column addresses. In this case, FFT
coefficients could be generated by using variable "del~as"
as provided for in duplex mode. In addition, the parameters
of the array transformation and supporting hardware embodiment
- can be expanded to specify additional displacement sequences
for arrays of higher rank. Therefore, it is intended that
the scope of this invention be limited only by the appended
claims.
-47-

TABLE 1 - DISPLACEMENT FLELD
Function Description
z Initialize the specified row and column
registers to zero so that they point to
upper left corner of the matrix. T~is
results in zero displcement.
+i Initialize the specified delta row register
with zero, and the delta column register
with one. This results in the movement
across the row one point to the right.
~j Initiali2e the specified delta row register
with one, and the delta column register
with zero. This results in the movement
down the column one point.
+~ Initialize the specified delta row register
with one and the delta column register
with one. This results in a one point
- diagonal movement downward to the right.
P Initialize the specified delta row register
with zero and the delta column register
with zero. This function ~lso specifies that
the row and column delta registers need
data substitution prior to every round of
execution. (Not applicable to Delta 1 and
Delta 2 fields.)
i Initiali2e the delta row register with zero
and the delta column register with negative
one. Movement is one point to the left~,
-j Initialize the delta row register with
negative one and the column register with
zero/ Movement is one point up the column.
-k Initialize the delta row 'and column register
with negative one. Movement is one point
diagonally upward to the left.
-48-

TA8LE 2 - LENGTH FIELD
._~
FUNCTION DEFINITION
1 Repea~ as necessary to match a corres-
ponding output array shape
I Row length of input array
J Column length of input array
S Stop on boundary
NOP Default option
NV Not used
NU Not used
ABLE 3 - MODE FIELD
.
NAME DEFINITION
FSPX Full Simplex Mode
DPXS Duplex Mode Scalar
DPXV Duplex Mode Vector
DPXM Duplex Mode Matrix
HSPX Half 5implex Mode
FDPXS FIushed Duplex Mode Scalar
FDPXV Flushed Duplex Mode Vector
FDPXM Flusbed Duplex Mode Matrix
~ 9_

Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

2024-08-01:As part of the Next Generation Patents (NGP) transition, the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new back-office solution.

Please note that "Inactive:" events refers to events no longer in use in our new back-office solution.

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer , as well as the definitions for Patent , Event History , Maintenance Fee  and Payment History  should be consulted.

Event History

Description Date
Inactive: IPC from MCD 2006-03-11
Inactive: IPC from MCD 2006-03-11
Inactive: CPC assigned 2003-04-24
Inactive: CPC assigned 2003-04-24
Inactive: Adhoc Request Documented 1993-11-14
Time Limit for Reversal Expired 1993-05-15
Letter Sent 1992-11-16
Grant by Issuance 1989-11-14

Abandonment History

There is no abandonment history.

Fee History

Fee Type Anniversary Year Due Date Paid Date
Registration of a document 1998-07-23
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
MICRON TECHNOLOGY, INC.
Past Owners on Record
ALAN J. DEERFIELD
SUN-CHI SIU
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

To view selected files, please enter reCAPTCHA code :



To view images, click a link in the Document Description column. To download the documents, select one or more checkboxes in the first column and then click the "Download Selected in PDF format (Zip Archive)" or the "Download Selected as Single PDF" button.

List of published and non-published patent-specific documents on the CPD .

If you have any difficulty accessing content, you can call the Client Service Centre at 1-866-997-1936 or send them an e-mail at CIPO Client Service Centre.


Document
Description 
Date
(yyyy-mm-dd) 
Number of pages   Size of Image (KB) 
Drawings 1993-09-14 10 244
Cover Page 1993-09-14 1 17
Claims 1993-09-14 2 64
Abstract 1993-09-14 1 20
Descriptions 1993-09-14 50 1,533
Representative drawing 2002-02-14 1 10
Fees 1991-10-29 1 29