Note: Descriptions are shown in the official language in which they were submitted.
CA 02233435 2006-12-15
1
ENHANCED PROGRAM COUNTER STACK FOR MULTI-TASKING
CENTRAL PROCESSING UNIT
FIELD OF INVENTION
The invention relates generally to the structure of a central processing unit
and
more specifically to an enhanced microprocessor program counter stack for
providing
efficient multi-tasking capability.
BACKGROUND OF INVENTION
Central processing units such as microprocessors typically employ a program
counter to indicate the memory address of the next instruction to be executed
in a program
sequence. The instruction referenced by the value or contents of the program
counter is
retrieved from memory and typically transferred to an instruction register for
subsequent
decoding and execution. After each instruction is retrieved from memory, the
program
counter is typically automatically incremented so that it references the
following instruction in
the program sequence or instruction thread.
Some instructions, such as subroutine calls, deviate from the default
sequential
instruction flow and require that the program counter be modified to point to
the starting
address of the first instruction of the subroutine. At the end of the
subroutine, the program
20455187.1
CA 02233435 2006-12-15
2
counter contents must be restored to contain the address of the next
instruction to be
executed from the calling instruction thread. This is accomplished by storing
or pushing the
contents of the program counter onto a last-in first-out (LIFO) stack at the
beginning of a
subroutine, and retrieving or pulling the saved program counter value from the
stack at the
end of the subroutine, when a return-from-subroutine instruction is
encountered. The stack
also enables subroutines to be nested.
The stack, which is a memory construct, may be implemented as an
independent, hard-wired, series of interconnected LIFO registers. An example
of such a
hardwired stack is found in the PlCmicroTM microcontroller manufactured by
Microchip
Technology Inc., of Chandler, Arizona. Alternatively, the LIFO stack can be
implemented
in a general random access memory, in which case a stack pointer register (or
dedicated
general memory location) is typically needed to keep track of the top of the
stack.
The conventional practice of providing a program counter and associated
stack for controlling program flow results in various inefficiencies when the
microprocessor is
required to execute multiple processes or tasks, i.e. multi-tasking, by time
slicing usage of the
microprocessor amongst the various tasks. The time slicing is typically
administered by a
core software program, often referred to as a"kernel", which typically has to
save the state
of the program counter and its associated stack every time the kernel
temporarily terminates
the execution of a given task. Later, the kernel has to restore the saved
state of the program
counter and its associated stack when the time comes to continue the execution
of the given
task. This set of events represents inefficient overhead in the sense that
many memory
accesses may be required to transfer the contents of multiple copies of the
program counter
stacks merely to switch tasks.
20455187.1
CA 02233435 2006-12-15
3
For example, the above mentioned PICmicroTm microprocessor suffers from
this problem. This microprocessor exhibits a Harvard-type architecture,
meaning that the
program memory and data memory are physically distinct and accessed from
different buses,
thereby allowing an instruction to be executed while another instruction is
being fetched. As
is not atypical in Harvard-type architectures, the instruction length and
instruction address
bus is wider than the data length and data bus. (Specifically, the PlCmicroTM
has a thirteen bit
wide instruction address bus to accommodate an 8K program memory and an eight
bit wide
data bus.) Thus, the program counter and its associated hardwired LIFO stack
require two
instruction cycles to transfer each entry in the program counter and
associated stack to the
data memory in order to store the program counter state of a given task. This
is quite
wasteful of processor resources.
SUMMARY OF INVENTION
The invention seeks to improve the multi-tasking capability of central
processing units such as those found in low cost microprocessors. Broadly
speaking, the
invention presents a central processing unit having a program counter
connected to multiple
program counter stacks and a means for selecting which of the various program
counter
stacks the program counter is set to read and write from. In this manner,
multi-tasking can
be more efficiently accompfished by selecting the use of one of the program
counter stacks in
association with a given task.
According to one aspect of the invention, a central processing unit (CPU) is
provided. The CPU comprises at least one memory for storing instructions and
data, and a
20455187.1
CA 02233435 2006-12-15
4
program counter for storing a program counter value. An execution unit is
connected to the
memory for retrieving and processing an instruction located in the memory at
an address
corresponding to the program counter value. A plurality of stacks (independent
of the
memory) are provided for storing program counter values. A multiplexer
connects the
program counter to each of the stacks, and a stack select register is
connected to a control
input of the multiplexer. The stack select register enables the transfer of
program counter
values between the program counter and the stack indicated by the stack select
register.
20455187.1
CA 02233435 2006-12-15
gRiFF DESCRIPTION OF DRAWINGS
These and other aspects of the invention are described in greater detail with
reference to the following drawings, provided for the purposes of description
and not of
5 limitation, wherein:
Fig. 1 is a block diagram illustration of a Harvard-type microprocessor having
multiple program counter stacks in accordance with the preferred embodiment;
Fig. 2 is a block diagram of a software architecture for implementing multi-
tasking capability on the microprocessor shown in Fig. 1; and
Fig. 3 is a block diagram illustration of a von Neuman-type microprocessor
having multiple program counter stacks in accordance with the preferred
embodiment.
DETAILED DESCRIPTION OF PREFERRED EMBODIlyIENTS
Fig. I is a simplified block diagram of a Harvard-type microprocessor 12. The
microprocessor comprises a program memory 14 which stores microprocessor
instructions
associated with one or more processes or tasks. The program memory 14 is
connected to a
program counter (PC) 16 via an instruction address bus 18. The program memory
14 is also
connected to an execution unit 20 via an instruction bus 22. The execution
unit 20 comprises
various components (not shown), including an arithmetic logic unit, an
instruction queue,
internal reg_Aers, status registers, timers and clocks. The execution unit 20
also includes an
instruction decoder and control module, e.g., microcode execution module,
which enables the
execution unit 20 to decode and execute native microprocessor instructions.
2045S1s7.I
CA 02233435 2006-12-15
6
The execution unit 20 is connected to a data memory 24 via a bidirectional
data bus 26 in order to enable the transfer of data therebetween. The data bus
26 also
connects the program counter 16 to the execution unit 20. In addition, a data
address bus 34
is disposed between the execution unit 20 and data memory 24. The execution
unit 20 uses
the data address bus 34 to provide the address or location for a read or write
operation from
or to the data memory 24. (In this manner, the microprocessor 12 features a
Harvard
architecture having separate instruction and data buses.)
The program counter 16 is additionally connected to one side of a multiplexer
28. The other side of the multiplexer 28 is connected to the data inputs of a
stack array 30
which is organized into a plurality of last in, first out (LIFO) stacks,
indicated in Fig. 1 by
reference numerals 30a, 30b, .., 30n. Thus, the multiplexer 28, as known in
the art,
provides a bi-directional data path connecting each stack 30a, 30b, ..., 30n
in the stack array
30 to the program counter 16.
The stack array 30 is preferably implemented in the integrated circuit
topography of the microprocessor 12, as known by those skilled in this art, or
alternatively
provided by way of separate commercially available packaged integrated
circuits. Each stack
comprises a plurality of memory locations, each of which preferably has the
same bit-length
as the program counter 16 in order to enable the stack to store a plurality of
program counter
values, which reference instruction addresses in the program memory 14. The
number, n, of
stacks is a design choice that depends on the number of anticipated concurrent
processes or
tasks that the microprocessor can efficiently handle. For example, if it is
desired that the
microprocessor support multi-tasking between eight different processes or
tasks, then the
20455187.1
7
stack array 30 preferably comprises eight stacks. The depth of each stack is
also design
choice that depends upon how many levels of nesting the microprocessor is
designed to
handle.
The selection of a particular stack for the transfer of data to and from the
program counter 16 is determined by the value or contents of a stack select
(SS) register 32
which is connected to a control input of the multiplexer 28; that is, the
connection provided
by the multiplexer 28 is controlled by the contents of the stack select
register 32. The length
of the stack select register 32 in terms of the number of bits depends upon
the number, n, of
stacks in the stack array 30 since the stack select register 32 must be able
to identify each
stack. Therefore, if there are n stacks, the stack select register 32
preferably comprises at
least roundup(1 + log2(n)) bits. For example, if there are six stacks, the
stack select register
32 preferably comprises at least three bits.
The stack select register 32 is also connected as an input to a decoder 36.
Similarly, a control signal 38 from the execution unit 20 is connected as an
input to the
decoder 36. The control signa138 instructs the stack array 30 to store the
contents of the
program counter 16 or to load the program counter with a (top) valve from the
stack array.
The decoder 36 has a plurality of outputs connected to control inputs 40 of
the stack array
30, and functions to distribute or route the store/retrieve control signal 38
to the stack
control inputs based on the contents of the stack select register 32. Thus,
for example, when
the execution unit 20 signals that the contents of the program counter 16
should be stored,
the control signa138 is routed to the correct stack in the stack array 30.
20455197.1
CA 02233435 2006-12-15
8
The preferred embodiment also contemplates the presence of a native
microprocessor instruction for setting the value of the stack select register
32. Accordingly,
the execution unit 20 comprises the necessary decoding logic to execute such
an instruction.
The details of implementing such an instruction will vary depending upon the
design and
construction of the execution unit 20, but will be known to those skilled in
this art.
In operation, the program counter 16 stores the address of the next
instruction
to be executed by the execution unit 20. The program memory 14 receives this
data (i.e., the
address of the next instruction to be executed) via the instruction address
bus 18, which is
connected to the address lines of the program memory 14. Thus, the execution
unit 20
fetches and then executes the instruction stored at the location in the
program memory 14
indicated by the program counter 16. This data (i.e. the instruction) is
transferred over the
instruction bus 22 to the execution unit 20. After (or during) an instruction
fetch, the
execution unit 20 increments the program counter 16 to address the next
instruction in the
program memory.
However, when the instruction being executed is a subroutine call, the
execution unit 20 will cause the contents of the program counter 16 (which
represents the
next instruction to be executed from the calling instruction thread) to be
pushed onto or
stored in the stack selected by the stack select register 32, as described
above. When the
subroutine terminates, the execution unit 20 will cause the program counter 6
to be restored
with the instruction address stored at the top of the currently selected stack
in stack array 30,
thereby continuing the execution of the calling instruction thread from the
point at which it
was interrupted. This process can be nested to the depth of the stack.
20455187.1
CA 02233435 2006-12-15
9
The effectiveness of the microprocessor architecture shown in Fig. 1 for multi-
tasking capability is described with additional reference to the software
architecture
illustration of Fig. 2. As shown in Fig. 2, a kernel program, as is known in
the art,
administrates the time slicing of microprocessor 12 amongst m tasks 52a, 52b,
... 52m. This
includes associating each task 52a, 52b, .. 52m, with one stack in the stack
array 30.
Preferably, the kernel 50 does not allow more tasks 52 to be spawned than the
number n of
stacks, i.e. max(m) = n.
The kernel 50 is a compact program which sets the microprocessor 12 to
generate a timed interrupt for the time period or window allotted to the
current task. When a
currently executing task, say 52i, is interrupted because its window has
terminated, the kernel
50 must save the state of the microprocessor 12. This includes the state of
the program
counter 16 and its associated stack. In the microprocessor 12, the kernel 50
accomplishes
this function by executing an instruction to store the contents of the program
counter 16 onto
its associated stack. Then, the kernel 50 sets the stack select register 32 to
point to the stack
(in array 30) which is associated with the next task, 52(i+l), to be executed
by the
microprocessor 12. Next, the kernel 50 executes an instruction to load the
program counter
16 with the (top) contents of the stack. This will cause the stack associated
with task 52(i+1)
to re-introduce the previously saved next instruction address to be executed
for task 52(i+1)
into the program counter 16, after which the task 52(i+l) can continue to be
executed from
its previously interrupted position. (Status and other context information
would be saved
explicity using an index register.)
20455187.1
CA 02233435 2006-12-15
10
It will be seen from the foregoing that since there are multiple program
counter stacks independent of the data memory 24, it is not necessary to
access the data
memory 24 for the purposes of saving and/or retrieving the program counter and
its
associated stack when switching from one task to another. Moreover, if the bit-
length of the
program counter 16 is larger than the width of the data bus 26, the necessity
of requiring two
instruction cycles to transfer the program counter and its associated stack to
and from the
data memory 24 is precluded.
Fig. 3 illustrates an alternative embodiment of the invention featuring a von-
Neuman type microprocessor 112. It is similar to the microprocessor shown in
Fig. 1, but
features a common program and data memory 125 and a common bus 130 linking the
program counter 16, execution unit 20, and memory 125. (For this example, the
true Von
Neuman architecture has been modified to use a separate stack area.) The
microprocessor
112 otherwise behaves as described above. Similarly, those skilled in the art
will appreciate
that the invention is not limited by what has been particularly shown and
described herein as
numerous modifications and variations may be made to the illustrated
embodiments without
departing from the spirit and scope of the invention.
20453187.1
CA 02233435 2006-12-15