Note : Les descriptions sont présentées dans la langue officielle dans laquelle elles ont été soumises.
6i3~0
BACKGROUND _ THE INVENTION:
1. Field of the Invention.
The invention relates to a memory system for use in
digital computers.
2. Prior Art.
Countless memory systems are known for permitting a
processing means (e.g., central processing unit, arithmetic logic
unit, etc.) to select locations in a random-access memory ~RAM).
For purposes of discussion, and recognizing the pitfalls in
10characterizing memory systems, the prior art is briefly discussed
in two general categories. One category (non-virtual memories)
receives a logical address and employs some means such as an
address extender technique, memory management unit (MMU)~ bank
switching, etc., to provide a larger, physical address for
15 addressing a RAM. In the second category, a larger logical address
from the processing means is translated to a generally smaller,
physical address for accessing the RA~I. As will be seen, the
present invention is more like the second category of memory
systems than the first.
The first category of memory systems is typically used by
microprocessors, and the like, and often uses an ~iU. This unit
receives a portion o~ the logical address and provides a portion
of the physical address. For the mapping provided with this
non-virtual storage, a physical address exists for each logical
25 address.
In the second category, often two memories for storing
data are employed. One, commonly referred to as a data cache, is
a smaller, higher speed RAM (e.g., employing static devices and
'7~
h~ving a system cycle tirne oE approxima-tely 200 nsec.). Data
frequently addressed by the processing means is stored in the data
cache memory. A larger RAM (e.g., dynamic devices with system
cycle times of 1-2 micro sec.) provides the bulls of the RAM
storage. In a typical process, usually more than 90% of -the time,
data sought by the processing means is in the data cache and if
not there, much greater than 99% of the time the data is in the
dynamic RAM~ A fast memory (address translation unit (ATU) or
translation look-aside buffer) is used to examine addresses from
the processing means and for providing addresses for the RAMs. As
many as three serial accesses can be required with this
arrangement. The effective cycle time for this memory system is in
the 300-400 nsec. range for the described examples. The effective
cycle time is reduced from what would appear to be faster access
in the data cache, since to resolve a miss in the data cache, and
actually access the main RAM requires approximately 1-2 microsec.
because of the serial accessing. With the above described memory,
for each context switch, the ATU cache must be reprogrammed, thus
further reducing the speed of the memory system where context
switching is required.
As will be seen, the present invention employs only one
type of memory for data storage (e.g., dynamic ~AMs) without the
equivalent of a data cache. An associative memory operation used
to identify locations in RAM occurs in parallel with the accessing
of portions of the RAM to accelerate the overall cycle time.
Context switching can occur much more quickly than with prior art
systems. The cycle time in the invented system is slightly slower
than in the above-described virtual memory systems. However,
because of numerous operation advantages the effective cycle time
7(~
in many cases is fa~ter without the complications inherent in
prior art memory systems. For instance, the inven-ted system has a
guaranteed, cons-tan-t cycle time (assuming the data is in memory).
This is particularly important for "pinned" or "locked" pages~
Sll~MARY _F T~IE INVENTION
A memory system for use in a digital computer is
described. The memory itself comprises a plurality of R~Ms which
store digital signals for the computer's processing means or like
5 means. A plurality of tag storage memories are used for storing
information relating to the locations of information stored in the
RAMs. These tag storage memories are programmed from the data
bus. A plurality of comparators each of which is associa-ted with
one of the tag storage memories compares a first field and second
field of digital signals, one received from the address bus and
the other from the tag storage memories. The comparators' output
signals indicate a set association used in selection of sets in
memory. A hardware implemented "hash function" greatly reduces
the likelihood of collision. This logic receives least
significan-t hits of the segment, space and page offset o~ a
universal or uniform address and provides a line address for the
tag storage memories and RAMs. The invented memory system includes
other unique features which shall be described in the body of the
application, such as a hardware implemented page replacement
algorithm. In general, the invented memory system provides
equivalent (or better) performance over the more commonly used
virtual memory systems without many of the complications
associated with these systems.
~RIEF DESCRIPTION OF T~IE DRAWINGS~ 637V
Figure 1 is a block diagram illus-trating the coupling of
the invented memory system in a computer.
Figure 2 is a block diagram used to describe the address
bit distribution used in the presently preferred embodiment.
Figure 3 is a block diagram of the portion of the invented
memory system used for set association.
Figure 4a is a diagram used in describing the hash
function used in the presently preferred embodiment.
Figure 4b is an electrical schematic of the hash function
means used in the presently preferred embodiment.
Figure 5 is a block diagram of the hardware implemented
page replacement algorithm.
6~
1 DE~AILED DESCRIPTION OF THE INVENTION
. _
A random-access memory system for use with a
digital computer is describedO In the following description
numerous specific details are set forth such as-specific
number of bits, etc., in order to provide a thorough
understanding of the present invention. It will be
obvious to one skilled in the art, however, that the present
invention may be practiced without these specific details.
In other instances, well-known circuits and processes
have not been described in detail in order no-t to un-
necessarily obscure the present invention.
8LOCK DIAGRAM OF FIGURE
Referring first to Figure 1, the memory system of
the present inVention is illustrated as memory 11 and
field isolation unit 12. The memory 11 includes the
RAMs (e.g., 64K "chips") which provide system storage
-and the circuits for accessing these R~s. In the
presently preferred embodiment, the memory system is used
with an arithmetic logic unit 10 which provides a 67 bit
address. 60 bits of this address are coupled to memory
11 and 7 to the field isolation unit ~FIU~ 12. The
data bus 20 associated with the AL~ 10 is coupled to
the memory 11 and FIU 12. The remainder of the computer
system associated with the ALU 10 such as input/output
ports, etc., is not illustrated in Figure 1.
The memory 11 in its presently preferred embodiment
6 -
,
.
3~
1 may emplo~ one to four boards each of which stores two
megabytes. The 60~bit address coupled to the memory 11
selects a 128~bit word which is coupled to the field iso~
lation 12 over the bus 2Q. The 7 bits on bus 15 selects
tisolates) 1 to 64 bits within the 128-bit word. Thus, the
A~U 10 may address anything from a single hit to and includ-
ing a 64-bit word. As presently implemented, the memory
employs 64k dynamic NMOS "chips" for main storage, although
other memory devices may be employed.
LOGICAL ADDRESS BIT DISTRIBUTION
_ _ . _ .... . .. _
Referring to Figure 2,the 67 bits of the universal or
uniform address from the ALU are shown at the top of the
figure. As discussed in conjunc-tion with Figure 1, 7 bits
of this address are coupled to the FIU 12. The remaining
60 bits are coupled to the memory boards (one to four)
within the memory system. Six bits of the 60 bits on each
board are used for a page offset. Fifty-four bits are used
for set association on each board (block 25). These bits
identify one set on the boards by association; there are
four sets on each board and 512 lines within each set.
Eighteen bits of the 54 bits select from the 512 possible
lines a 64x128 bit field (block 261. As illustrated by
block 27, 6 bits ~page offset) select a single 128~bit
word from the 64~128 bits. Then from the 128 bits, a one
to 64-bit word is selected by the 7 bits coupled to the
FIU as illustrated by block 28.
In practice, 15 bits of the address begin accessing
- 7 -
!L63~
1 the RAM memory to select four 128-bit word sets, one ~xom
each set on each board present in the system. Concurrently
with this accessing, the set association occurs as
illustrated by blocks 25, 26 and 27 of Figure 2 to
select a single 128-bit word from all o~ the 1~8-bi.t
words selected by all the boards. Thus, the accessing of four
37()
128-bit words on each board occur in parallel with the se-t
association required to select a single 123-bit word.
Parity bits used throughout the memory system are not
discussed or shown to prevent unnecessary complications in the
description. These bits may be implemen-ted using well-known
circuits and techniques.
SET ASSOCI~TION LOGIC
Referring to Figure 3, the set association process on each
board employs four tag store memories such as memories 40-43, and
four comparators, one associated with each of the tag store
memories shown as comparators 45-48 in Figure 3. These tag store
memories and comparators are duplicated on each of the memory
boards. ~ach tag store memory is addressed by a 9-bit line number
field and a four bit set number field not relevant to the present
discussion, and provides a 54-bit output to its respective
comparator. (A field of four additional bits occurs for the page
replacement algorithm discussed later in addition to other outputs
such as a parity bit.) This 54-bit tag value is compared in the
comparators with a 54 bit -field of address from the ALU. I~ the
54 bits from the tag store memory matches the 54 bits o the
logical address, then an output signal from the comparator selects
a 128-bit word. Tag values are written into the tag memories from
data bus 20 and may be read on this bus.
As again can be seen in Figure 3, 7 bits of the 67 bits
of the physical address are used for field isolation, 6 bits for
the page offset, and 54 bits are coupled to the comparators.
Eighteen bits are coupled to a hash function means 35 which is
discussed in conjunction with Figures 4a and 4b. The output of
n
'7C3
this means are the 9 bit line address field used -to address both
the tag store memories and the ~AMs.
In the preferred embodiment there are two tag storage
memories and two comparators per board. Each pair is used -twice
per memory cycle. For purpose of explanation, four (4) separate
memories and comparators are illustrated.
~ASH FUNCTION LOGIC
In Figure 4a, the universal or uniform address from the
ALU is shown as comprising a 32 bit name, a 3 bit space field, and
a 32 bit offset. The name is further implemented as a 24 bit
segment and a 8 bit virtual processor identification (VPI~). The
page offset includes a 19 bit page field, a 6 bit word field, and
finally, 7 bits which are used to isolate a 1-64 bit field from a
128 bit word.
In a typical application, the more significant bits of the
segment and page and the most significant bit of the space field
will var~v very little. And, in contrast, the least significant
bits of the segment, page and space field tend to vary a great
deal. If the memory system is implemented without a hash
function, the repeated variations of the least significant bits,
particularly of the segment and page, will cause repeated
collisons, that is, address lines within the RAM will not be
available for many addresses. To reduce the p~obability of such
collision, the hash function logic of Figure 4b provides exclusive
ORing of these highly varying, least significant bits.
The hash function logic of Figure 3 comprises 9 exclusive
OR gates, 70a through 70i, as shown in Figure 4b. The exclusive
OX gate 70a receives the segment address bit 23 and the space
address bit 2, and provides the line address bit 0. Similarly,
L6~
the exclusive OR gate 70b receives the segmen-t acldress bit 22 an(l
space bit 1, and provides the line address bit 1. This hashing is
rep~esen-ted by line 71 of ~igure ~a. The gates 70c -through 70i
provide the segment/page hashing, and for instance, the gate 70c
receives the segment address 21 and page address 12 and provides
the line address 2. The remaining gates 70d through 70i receive
the remaining least si~nificant bits of the page address and
segment address as indicated in Figure 4b and provide the
remaining line addresses 4-8. This hashing is represented by line
?2 of Figure 4a.
This exclusive ORing causes a wide dispersal, or mapping,
of the most frequently used addresses, thereby reducing the
probability of collision. It should be noted that the hash
function is implemented in hardware, and substantially, no cycle
time is lost in hashing the addresses~ The delay associated with
the exclusive OR gates 70a through 70i (e.g., 5 nanoseconds) is
almost de mimimus when compared to the cycle time of the MOS RAMs.
In some prior art memories, hash functions are implemented
with software routines, and thus effectively, are performed in
series with memory processing. This increases cycle time, thereby
reducing the usefulness of the hash function.
PAGE REPLACEMENT ALGORITHM
The memory system includes a hardware implemented page
replacement algorithm. It is used to identify least used lines
when a collision occurs, allowing a page in RAM to be displayed.
As previously mentioned, a ~-bit field is associated with
each line in each of the tag store memories. This field is used
for implementing the page replacement algorithm. The memory value
represented by these four bits is referred to as the least
L637~
recently used (LRU) value. As will be seen, this value is unique
in all sets for any given line.
~ hen the memory system is addressed, two possible
conditions can occur at the output of the comparators, such as
comparators 45 through 48 o~ Figure 3. Either no match occurs
indicating that there is no location in memory corresponding to
that address (no hit) or if a comparison occurs, this indicates a
location exists in memory corresponding to that address (hit~.
The signal (and its complement) at the output of each comparator
~or these conditions is coupled to the circuit of Figure 5 on
lines 74 and 75.
The circuit of Figure 5 is used to implement the LRU
algorithm. For purposes of discussion, it will be assumed that
the circuitry is repeated for each set, that is, up to 1~ such
circuits are used if four boards are employed. In actual
practice, as was the case for the tag store memories and
comparators, half this number of circuits is employed; each
circuit is used twice per memory cycle.
Referring now to Figure 5, the LRU value from the tag
store memory is coupled to a register 77. A clocking signal
applied to *his register causes it to accept the ~ bit LRU value
from the tag storage memory. The output of the register 77 is
coupled to the driver 78, comparator 79, adder 81, and multiplexer
82. The "greater than" comparator 79 compares the two input
signals to this comparator and provides a "one" output if the 4
bit digital number on bus 87 is greater than the number on bus 80,
and conversely, a "zero" output if the number on bus 87 is equal
to or less than the number on bus 80. A one at the output of
comparator 79 causes the multiplexer 82 to select bus 87;
3'7~
otherwise, the multiplexer 82 selects the output of the "minus
one" adder 81. The outpu-t of -the multiplexer 82 is coupled to hus
86 through a clriver 83 for a no-hit condition.
First, as~u~e that a particular set contains the data for
a particular address and thus, a hit occurs. The S~T i Hit/
signal on line 74 is low, and since it is coupled through an
inverting input terminal of the driver 78, this driver is
selected. The 4 bit L~U value for the selecte~ set passes through
the driver 78 and is broadcast to all the other sets on the l.RU
bus 80. At the same time, driver 85 is selected and the maximum
value (most recently used) value is coupled through the driver 85
onto the bus 86. For four boards, this value is 15; for three
boards, 11; for two boards, 7; and for one board, 3. The value is
coupled to the tag store memory associated with the hit condition.
Assurne now that the circuit of Figure 5 is part of one of
those sets which did not hit. Obviously, for this condition, the
LRU value coupled to register 77 is not put on the bus 80 since
the driver 78 is not activated. That is, only one LRU value, that
corresponding to the hit set is put on the bus 80. The LRU value
is coupled to the comparator 79, multiplexer 82, and adder 81. If
the value on line 87 is greater than the LRtl value ~or the hit
set, this 4 bit value on bus 87 passes through the multiplexer 82,
driver 83, and is restored into the tag store memory for that set.
If, on the other hand, the value on bus 87 is less than, or equal
to, the value on the bus 80, multiplexer 82 selects the output of
the adder 81. The adder 81 subtracts 1 ~rom the value on line 87
and this new LRU value is coupled through the multiplexer 82 and
driver 83 and is placed back into the tag store memory. (In
practice, adder 81 is a ROM used to also update the parity bit.)
'7~
Th~ls, as apparen-t, the circuit of Figure 5, ti) for the
hit set places in memory the ma~imum LRU value; (ii) i~ the stored
LRU value is greater than the LRU value for the hit set, the
stored value is returned to memory, and finally, (iii) if the
stored LRU value is less than or equal to that of the hit set, it
is decremented bY 1 and re-stored.
If no hit occurs in any of the sets (collision condition),
the set with an LRU value of zero is the least recently used set
for the address line. This line is used (data rep]aced in RAMs)
and the LRU value for this line is set to the maximum value and
all the LRU values are decremented.
Initially, the LRU values for each line are set with a
different value for sach set based on a predetermined highest
implemented set number. From analyzing the LRU values from the
circuit of Figure 5, it will become apparent that the LRU values
are always unique. Thus, there will only be one set with a zero
~alue LRU number.
It is important to note that the LRU algorithm is
implemented in hardware and the LRU values are determined
substantially in parallel while the memory is accessed. This
eliminates the time required in some prior art memories to
calculate new LRU values.
Referring again to Figure 3, when the memory is to be
accessed, ignoring for the moment the 7 bits to the field
isolation unit, the 6 bits of the page offset and 9 bits for the
line address, are immediately coupled to the memory since there is
no substantially no delay involved in the hash function logic
means 35. These bits immediately begin accessing four 128 bit
words on each board. While this is occurring, the tag store
;3~
m~l~ories are addressed, and the comparison is completed within the
comparators 45 through 48. The tag store memories are static
memories and their cycle time is much shorter than that of the
RAMs. Before the four, 128 bit words on each board are selected,
the results of the comparison are completed, and assuming a hit
condition occurs, one 128 bit word is coupled to the FIU. The ?
bits to the field isolation unit simply allow some or all of these
bits to appear on the data bus, and the "setting up" for this
isolation occurs during the time that the RAMs are being accessed~
Consequently, the cycle for accessing the data or the like stored
in the RAMs begins substantially when the address is available on
the address bus with the set association performed through the tag
storage memories and comparators occurring in parallel.
Thus, as mentioned, the cycle time of the memory is a
known, constant time without the unusually long access times that
occur in prior art systems, for instance, when the data is not
located in the data cache. An important feature of the presently
described memory is the fact that both the least recently used
algorithm and hash functions are implemented in hardware in a
manner that does not significantly increase access time, ~r
require interaction with the ALU or CPU.
14