Patent 2053941 Summary

(12) Patent Application:	(11) CA 2053941
(54) English Title:	SYSTEM FOR PREPARING INSTRUCTIONS FOR INSTRUCTION PARALLEL PROCESSOR AND SYSTEM WITH MECHANISM FOR BRANCHING IN THE MIDDLE OF A COMPOUND INSTRUCTION
(54) French Title:	SYSTEME DE GENERATION D'INSTRUCTIONS POUR PROCESSEUR D'INSTRUCTIONS PARALLELE ET SYSTEME A MECANISME DE BRANCHEMENT AU MILIEU D'UNE INSTRUCTION COMPOSEE
Status:	Deemed Abandoned and Beyond the Period of Reinstatement - Pending Response to Notice of Disregarded Communication

Bibliographic Data

(51) International Patent Classification (IPC):	G06F 09/30 (2018.01) G06F 09/38 (2018.01)
(72) Inventors :	VASSILIADIS, STAMATIS (United States of America) BLANER, BARTHOLOMEW (United States of America) JEREMIAH, THOMAS L. (United States of America)
(73) Owners :	INTERNATIONAL BUSINESS MACHINES CORPORATION
(71) Applicants :	INTERNATIONAL BUSINESS MACHINES CORPORATION (United States of America)
(74) Agent:
(74) Associate agent:
(45) Issued:
(22) Filed Date:	1991-10-22
(41) Open to Public Inspection:	1992-09-30
Examination requested:	1991-10-22
Availability of licence:	N/A
Dedicated to the Public:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	No

(30) Application Priority Data:

Application No.	Country/Territory	Date
677,685	(United States of America)	1991-03-29

Abstracts

English Abstract

??9-90-040
SYSTEM FOR PREPARING INSTRUCTIONS FOR INSTRUCTION PARALLEL
PROCESSOR AND SYSTEM WITH MECHANISM FOR BRANCHING
IN THE MIDDLE OF A COMPOUND INSTRUCTION
ABSTRACT OF THE DISCLOSURE
An instruction processor system for decoding compound instructions created
from a series of base instructions of a scalar machine, the processor
generating a series of compound instructions with an instruction format text
having appended control bits in the instruction format text enabling the
execution of the compound instruction format text in said instruction
processor with a compounding facility which fetches and decodes compound
instructions which can be executed as compounded and single instructions
by the arithmetic and logic units of the instruction processor while preserving
intact the scalar execution of the base instructions of a scalar machine which
were originally in storage. The system nullifies any execution of a member
instruction unit of a compound instruction upon occurrence of possible
conditions, such as branch. which would affect the correctness of recording
results of execution of the member instruction unit portion based upon the
interrelationship of member units of the compound instruction with other
instructions. The resultant series of compounded instructions generally
executes in a faster manner than the original format which is preserved due
to the parallel nature of the compounded instruction stream which is
executed.

Claims

Note: Claims are shown in the official language in which they were submitted.

??9-90-040
The embodiments of the invention in which an exclusive property or
privilege is claimed are defined as follows:
1. A system for processing instructions with an instruction processor,
comprising
compounding decoding means for decoding a series of base instructions
of a scalar machine and for generating a series of compound instructions
with an instruction format text having appended control bits in the instruction
format text enabling the execution of the compound instruction format text in
said instruction processor;
fetch means for fetching said compound instructions;
decoding means for decoding fetched compound and single instructions:
execution means for executing compounding and single instructions;
and wherein said system provides a compound instruction program to
said execution means which preserves intact the scalar execution of said
base instructions of a scalar machine.
2. A system for processing instructions with an instruction processor
according to claim 1, wherein said base instructions are program instructions
originally constructed for a scalar machine which differs from said instruction
processor, but which may be executed by said instruction processor by
compilation or assembly of said compound instruction program.
3. A system for processing instructions with an instruction processor
according to claim 1, wherein an original base instruction may be of varying
length, and there is provided a compounding facility having a plurality of n
compoundable units which enables execution by said instruction processor of
compound instructions of n member instruction unit length, and wherein in
each compound instruction is found a control field relevant to execution of
said compound instruction.

EN9-90-040
4. A system for processing instructions with an instruction processor
according to claim 3, wherein said control field contains function information
with function bit information indicating the beginning of a compound
instruction or the beginning of a scalar instruction.
5. A system for processing instructions with an instruction processor
according to claim 3, a compound instruction includes delimiter bit
information which set in one condition delimits a compound instruction and
wherein the compound instruction contains field information which links the
initial and other member units of the compound instruction.
6. A system for processing instructions with an instruction processor
according to claim 5 wherein compound instructions and scalar instructions
are intermixed in said compound instruction program, and said delimiter bit
information indicates whether a member instruction unit of the compounded
format text belongs to a previously initiated compound instruction or is the
start initiator member unit of another instruction of the compounded format
text which may be scalar or compounded in execution of the compounded
instruction program.
7. A system for processing instructions with an instruction processor
according to claim 1 including a pre-execution time compounding facility in
which a program having instructions of a compare or branch class the result
of which is related to another instruction member unit of the compounded
instruction program which follows, and pre-execution time analysis of the
various members of the compare class and all members of the branch class
are identified during a pre-execution time instruction decoding by said
compounding decoding means.

EN9-90-040
8. A system for processing instructions with an instruction processor
according to claim 1 wherein said system is provided with architecture
compounding rule means based on the architectural information of the
instruction processor identified for execution of said compound instruction
program, said compounding rule means providing a check that ensures that
no interlocks between member instruction units of a compound instruction
exist which cannot be handled by the instruction processor.
9. A system for processing instructions with an instruction processor
according to claim 1 wherein said compound instruction format text includes
control field information for a compound instruction, and wherein said control
field information is selected from a group consisting of field information with
the function that:
a) marks the beginning of a compound instruction,
b) causes execution of two compound instructions in parallel
c) indications that the compound instruction has more than one execution
cycle
d) suspends pipelining
e) causes a branch predicted to be taken if the instruction is a branch and
the field is set for taking a branch
f) indicates whether of not the instruction has a storage interlock from a
previous compound instruction
g) enables dynamic instruction issue
h) indicates whether or not the instruction uses an arithmetic logic unit
(ALU).
10. A system for processing instructions with an instruction processor
according to claim 9 wherein the control field which marks the beginning of a
compound instruction functions as a delimiter for instructions, including
instructions which are not compounded but remain scalar for execution.

EN9-90-040
11. A system for processing instructions with an instruction processor
according to claim 1, wherein said compound instruction program is
executed by a process which includes fetching up to the maximum length
compound instruction permitted by the pre-compounding of base instructions
and by beginning execution at a branch target address and executing all
instructions where a beginning control field indicates said instructions are
members of a compounded instruction.
12. A system for processing instructions with an instruction processor
according to claim 1 further including means for checking compounding rules
for the instruction processor based upon architectural information of the
instruction processor to execute the instruction format text as a compound
instruction program that insures that no interlocks between member
instruction units of a compound instruction exist which cannot be executed by
the instruction processor which is to execute the compound instruction
program.
13. A system for processing instructions with an instruction processor
according to claim 1 wherein said compound instruction program originates
from a series of base instructions which is either originally in source or
assembly language applicable to a machine which is different from said
instruction processor which is to execute said compound instruction program,
and said compound instruction program is recompiled or compounded for
said instruction program with static optimization applied to the machine code
to enhance performance of the compound Instruction program when executed
by said instruction processor.

EN9-90-040
14. A system mechanism for processing instructions in a machine with
branching in the middle of a compound instruction, comprising:
decoding means for decoding a compound instruction contained in
fetched text and composed of member units for examining appended control
bits in an instruction format of said compound instruction which are relevant
to the execution of the compound instruction, and
issuing means, responsive at least in part to the decoding means, for
issuing the member units from a target address to the end of the compound
instruction and maintaining the remainder of the text for potential execution
after the execution of the compound instruction has been initiated.
15. A system mechanism for processing instructions in a machine with
branching in the middle of a compound instruction, comprising:
decoding means for decoding a compound instruction composed of
member instruction units for examining appended control bits in an
instruction format of said compound instruction which are relevant to the
execution of the compound instruction, and
nullifying means for nullifying any execution of a member unit of a
compound instruction upon occurrence of possible conditions which would
affect the correctness of results of execution of the said member unit of the
compound instruction or the correctness of a series of compound instructions
coupled as a compound instruction program, said correctness being based
upon the relevant interrelationship of member units of the compound
instruction with other instructions of the series; and
means for continuing the execution of the series of instructions as if the
nullified execution had not occurred.

EN9-90-040
16. A system mechanism for processing instructions with branching in the
middle of a compound instruction according to claim 15, wherein the
compound instruction is a multimember compound instruction and one of
said member units of the compound instruction is a branch followed by at
least a second member unit, and wherein during execution if a branch is
taken the results of the second member unit are nullified.
17. A system mechanism For processing instructions with branching in the
middle of a compound instruction according to claim 14, wherein said
compound instruction includes a member unit that is a branch instruction
which is able to be executed in parallel with another member unit of the
compound instruction, then said nullifying means are provided, if a branch is
taken, to nullify the member unit(s) following the branch taken so that the
effect of the subsequent instruction is as if said nullified member unit(s) werenot executed.
18. A system mechanism for processing instructions with branching in the
middle of a compound instruction according to claim 15, wherein if a second
member unit of a compound instruction after a branch appears to be an
instruction, and such said second member unit could begin to be executed as
if it were an instruction, the nullifying means nullifies the execution of the
second member unit.
19. A system mechanism for processing instructions with branching in the
middle of a compound instruction according to claim 14, where fetching
means have been provided to fetch a maximal compound instruction and
wherein said decoding means causes execution of component member units
of a compound instruction to start at an arbitrary point in the compound
instruction and execute to the end of the compound instruction.

EN9-90-040
20. A system mechanism for processing instructions with branching in the
middle of a compound instruction according to claim 14, wherein said
decoding means detects the end of a compound instruction and causes the
issuing means to maintain the remainder of the instruction text for later
execution.

Description

Note: Descriptions are shown in the official language in which they were submitted.

I,N9-9()-()4() I 2 ~ 5 3 9 4 ~
SYSTi-:M FOR PREPARIN~ IN~TRUCTIONS FOR INSTRUCTION PARALLEL
PROCESSOR AND SY~TEM WITH MECHANISM FOR BRANCHING
IN THE MIDDL.E OF A COMPOUND INSTRUCTION
FIELD OF THE INVENTION
This invention relates to digital computers and digltal data processors,
and particularly to digita, computers and data processors capable of
executing 1wo or more instructions in parallel, and provides a mechanism for
branching in the middie o~ a compound instruction. Particularly also, the
system thus provided enables a set of instructions which may be combined
into compound instructiom tu be statically determined, and allows the
appending of control inforrnation to the Instructions In the program to be
executed with branching in l:he middle of a compound instruction enabled.
BACKGROUND OF THE INVENTION.
Traditional computers which receive a sequence of instructions and
execute the sequence one instruction at a time are known, and are referred
to as scalar computers. With each new generation of computing machines,
new acceleration mechanlsms must be dlscovered for tradltional scalar
machines. A recent mechanlsm for acceleratlng computatlonal speed is
found In reduced instruction set architecture (RISC) that employs a limlted
set of very simple inst! uctlons executing at a high rate of speed.
Alternatively, another acceleration mechanism may be to add more complex
instructions to the architecture to provlde more computation function per

2~5334~
r;~is-s~ 4() - 2 -
instruction and thus re~ ce the number of instructions in a program.
Application of either of these approaches to an existing scalar computer
would require a fundament~l alleration of the instruction set and architecture
of the machine. Such a far-reaching transformation is fraught with expense,
downtime, and an initial reduction in the machine's reliability and availability.
Recently, "superscalar" computers have been developed which further
accelerate computation speed. These machines are essentially scalar
machines whose performance is increased by adapting them to execute more
than one instruction at a time from an instruction stream including a
sequence of single scalar instructions. These machines typically decide at
instruction execution time whether two or more instructions in a sequence of
scalar instructions may be executed in parallel. The decision is based upon
the operation codes (OP codes) of the instruc~ions and on data dependencies
which may exist between instructions. An OP code signifies the
computational hardware required for an instruction. In general, it is not
possible to concurrently execute two or more instructions which utilize the
same hardware (a hardware dependency) or the same operand (a data
dependency). These hardware and data dependencies prevent the parallel
execution of some instruction combinations. In these cases, the affected
instructions are executed serially. This, of course, reduces the performance
of a superscalar machine.
Superscalar computers suffer from disadvantages which it is desirable to
minimize. A concrete amount of time is consumed in deciding at instruction
execution time which instructions can be executed in parallel. This time
cannot be readily masked by overlapping with other machine operations.
This disadvantage becomes more pronounced as the complexity of the
instruction set architecture increases. Also, the parallel execution decision
must be repeated each time the same instructions are to be executed.
In extending the useful lifetime of existlng scalar computers, every means
of acceleratlng execution is vit31. However, acceleration by means of

2 ~ ~ 3 9 4 1
reduced instruction set architecture, complex instruction se~ architecture, or
traditional superscalar t~cllniques is potentially too costly or too
disadvantageous to consider for an existing scalar rnachine~
It would be preferred tO accelerate the speed of execution of such a
computer by parallel, or concurrent, execution of instructions in an existing
instruction set without requiring change of the instruction set, change of
machine architec4ure, or extension of the time required for instruction
execution .
SUMMARY OF THE INVENTION
It is to this object that the inventions which are here stated are addressed
as a system which enables original programs to be processed as parallel and
single instructions as fits the original program functions which are
implemented for execution by a machine capable of instruction-level parallel
processing. We have provided a way for existing programs written in
existing high level languages or existing assembly language programs to be
processed by a preprocessor which can identify sequences of instructions
which can be executed as a single compound instruction in a computer
designed to execute compound instructions.
The instruction processor system provides compounding decoding for a
series of base instructions of a scalar machine, generates a series of
compound instructions, provides for fetching of the compound instructions
and for decoding the fetched compound instructions and necessary single
instructions, and provides a compound instruction program which preserves
intact the scalar execution of the base instructions of a scalar machine when
the program with compound instructions Is executed on the system. This
system also provldes a mechanism for branching in the middle of a
compounding instruction. For branching in the middle of a compound
instruction, compound instructions are examined for appended control bits

2~539~1
r;~'9 9()-04() - 4 -
and a nullification mechanism responsive to the control bits is provided for
nullifying the execution of;`member units of the branched-to or subsequent
compound instructions if Iheir execution would cause incorrect program
behavior.
The instruction processor described herein provides decoding of
compound instructions created from a series of base instructions of a scalar
machine, the processor generating a series of compound instructions with
instructlon text having appended control bits enabling the execution of the
compound instruction in said instruction processor with a compounding
facility which is provided in the system. The system fetches and decodes
compound instructions which can be executed as compounded and single
instructions by the functional instruction units of the instruction processor
while preserving intact the scalar execution of the base instructions of a
scalar machine which were originally in storage. The resultant series of
compounded instructions yenerally executes in a faster manner than the
original format due to the parallel nature of the compounded instruction
stream which is executed.
For a better understan~ing of the invention, together with its advantages
and features, reference should be made to the following description and the
below-described drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
-
FIGURE 1 is h,gh-level schematic diagram of a computing system which is
capable of compounding instructions in a sequence of scalar instructions for
concurrent execution.
FIGURE 2 is a timing dlagram for a uni-processor implementation showing !
the parallel execution of certain Instructions which have been selectively
grouped in a compound instruction stream.

T;~9-s()-()4() ~ 2 ~ ~ 3 9 41
FIGURE 3 is a block diagram of a hierarchical memory organization in a
scalable compound instru~tion set machine with in-memory processing
shown as an alternative preferred embodiment for the operational
envlronrnent .
FIGURE 4-A is a high level schematic diagram of the process which will
be described pictorially for providing means for processing existing
programs to identify sequences of instructions which can be executed as a
single compound instruction in a computer designed to execute compound
instructions, while
FIGURE 4-B illustrates the preferred operational environment of the
inventions and the inventions' location in the environment.
FIGURE 5 shows a path taken by a program from original program code
to actual execution.
FIGURE 6 is a flow diagram showing generation of a compound
Instructlon set program frorn an assembly language program; while
FIGURE 7 is a flow diagram showing execution of a compound instruction
set.
FIGURE 8 illustrates a situation where a branch target may be at the
beginning or middle of a compound instruction.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
. _
Background

1 ~'9-9()-04() 1, 2053941
A scalable compound instruction set machine (SCISM) architecture
method is a system in which Instruction level parallelism is achieved by
staticaily analyzing a sequence of scalar instruction at a time prior to
instruction execution to generate compound instructions formed by grouping
adjacent instructions in a sequence which are capable of parallel execution.
Relevant control information in the form of compounding tags is added to the
instruction stream to indicate where a compound instruction starts, as well as
to indicate the number of instructions which are incorporated into a
compound instruction. i~<elatedly, when used herein, the term "compounding"
refers to the grouping of instructions contained in a sequence of instructions,
the grouping being for the purpose of concurrent or parallel execution of the
grouped instructions. At minimum, compounding is satisfied by "pairing" of
two instructions for simultaneous execution. Preferably, compounded
instructions are unaltered from the forms they have when presented for
scalar execution.
In a digital computer system which includes a means for executing a
plurality of instructions in parallel, a particularly advantageous embodiment
of the invention of this ai~plication is based upon a memory architecture
which provides for compounding of instructions prior to their issue and
execution. This memory structure provides instructions to the CPU (central
processing unit) of a computer. Typically, a hierarchical memory structure
includes a high-speed cache storage containing currently accessed
instructions, a medium speed main memory connected to the cache, and a
low-speed, high-capacity auxiliary storage. Typically, the cache and main
storage (referred to collectively as "real storage") contain instructions which
can be directly referenced for execution. Access to instructions in the
auxiliary storage is had through an Input/output (1/0) adaptor connected
between the main memory and the auxiliary storage.
An in-memory preprocessor for a SCISM architecture is a system in which
an instruction compounding mechanlsm in real storage produces
compounding tag information for a sequence of scalar instructions, the

J:~9_9()_~)4() 7 205~941
compounding tag information indicaSing instructions of the sequence which
may be executed in parallen The instruction compounding unit produces the
cornpounding tag information as a page of instructions is being prefetched
and stored in the main memory. Although the particular embodiment of that
patent application teaches compounding of up to two instructions, it is
contemplated that up to N instructions may be compounded for concurrent
execution in a scalar computer.
A compounding preprocessor for a cache in a hierarchical memory is a
system in which the instruction compounding unit is located between the
main memory and cache of a hierarchical storage organi~ation. The
instruction compounding Ul1it produces compounding tag information for the
instructions in a line of instructions being fetched into the cache from the
main memory.
Instructions in a cache have accompanying compounding tags to which a
plurality of parallel execution units respond by executing one or more of the
instructions in a group of up to N instructions in parallel. In these related
systems, there is proposed in the context of a computer system capable of
concurrently executing up to N instructions in a sequence of scalar
instructions, the sequence including compounding tags which accompany the
set of ordered scalar instructions and which are activated to indicate the
instructions which are to be concurrently executed. There is a mechanism for
managing the compounding tags of scalar instructions which are stored in
the real storage of the computer system, and it includes a merging unit
connected to the real memory for merging a modified instruction from the
real memory with non-modified instructions in the real memory. A tag
reduction unit connected to the merglng unit and to the real memory
deactivates the compounding tags of the modified Instruction and N-1
instructions in the real mernory with which the modifled instruction could be
compounded.

~ ~9-9~)-04() - x - 2 0 ~ 3 9 4 1
However, in all of ~he~;e developments there is an unfulfilled need to
determine in the process`lng of a set of instructlons or program to be
executed by a computer exactly which instructions may be combined into
compound instructions, and in what manner there shall be accomplished the
appending of cor,trol information to the instructions in the program to be
executed .
Overview
Referring to FIGURE 1 of the drawings, there is shown a representative
embodlment of a portion of a digital computer system for a digital data
processing system constructed in accordance with the present invention.
This computer system is capable of executing two or more instructions in
parallel. In order to support the concurrent execution of a group of two or
more instructions, the computer system includes a plurality of functional units
which operate in parallel in a concurrent manner; each on its own, is capable
of processing one or more types of machine-level instructions. The system
Includes the capabllity of compounding Instructlons for parallel or concurrent
executlon. In this regard, "compounding" refers to the grouping of a plurality
of instructions in a sequence of scalar instructions, wherein the size of the
grouping is scalable from 1 to N. The sequence of scalar instructions could,
for example, be drawn from an existlng set of scalar instructions such as that
used by the IE~M~ System/S70 products. (IBM is a registered trade mark of
International Business Machines Corporation.) However, it is understood
that compounding is intended to facilitate the parallel issue and execution of
instructions in all computer architectures that can be adapted to process
multlple instructions per cycle.
When an instruction stream is compounded, adlacent scalar Instructions
are selectively grouped for the purpose of concurrent or parallel execution.
In general, an instruction compoundlng facllity will look for classes of
instructions that may be executed in parallel. When compatible sequences of
instructions are found, a compound instruction is created.

I;~9-9n-n4() (~ 20~3941
Instructi~)n Compounding anci Execution
As is generally shown in FIGURE 1, an instruction compounding unit 20
takes a stream of binary scalar Instructions 21 and selectively groups some of
the adjacent scalar instructions to form encoded compound instructions. A
resulting compounded instruction stream 22, therefore, provides scalar
instructions to be executecl singly or in compound instructions formed by
groups of scalar instructions to be exeGuted in parallel. When a scalar
instruction is presented to an instructlon processing unit 24, it is routed to
the appropriate one of a plurality of execution units for serial execution.
When a compound instruction is presented to the instruction processing unit
24, its scalar components are each routed to an appropriate execution unit
for simultaneous parallel e~ecution. Typical functional units include, but are
not limited to, an arithmetic logic unit (ALU) 26, 28 a floating point arithmetic
unit (FP) 30 and a storage address generation unit (AU) 32.
Referring now to FIGURE 2, compounding can be implemented in a
uni-processor environment where each functional unit executes a scalar
Instruction !S) or, alternatively a compound instruction (CS). As shown in
the drawing, an instruction stream 33 containing a sequence of scalar and
compounded scalar instructions has control bits or tags (T) associated with
each compound instruction. Thus, a first scalar instruction 34 could be
executed sin~ly by functional unit A in cycle 1; a triplet compound instruction
36 identified by tag T3 could have its 3 compounded scalar instructions
executed in parallel by functional units A, C, and D in cycle 2; another
compound instruction 38 identified by tag T2 could have its pair of
compounded scalar instructions executed In parallel by functional units A
and B in cycle 3; a second scalar instructlon 40 could be executed slngly by
functional unit C in cycle ~i; a large group compound instruction 42 could
have its 4 compounded scalar instructions executed in parallel by functional

' ~Is-sn-n4~) 1() 2 0 5 3 9 41
units A-D in cycle 5; and a third scalar instruction 44 could be executed
singly by functional unit A in cycle 6.
One example of a c~mputer architecture which can be adapted for
handling compound instructions is an IBM System/370 instruction level
architecture in which multiple scalar instructions can be issued for execution
in each machine cycle, in accordance with FIGURE 2. In this context, a
machine cycle refers to a single pipeline stage required to execute a scalar
instruction.
Generally, it is useful to provide for compounding at a time prior to
instruction issue 50 that the process carl be done once for an instruction or
instructions that may be executed many times. It has been proposed to
locate the instruction compounding functions in the real memory of a
computer system in order to implement compounding in hardware, affer
compile time, yet prior to instruction issue. Such compounding is considered
to be a preferred alternative to other alternatives described herein and
referred to as "in-memory compounding".
Generally, in-memory compounding is illustrated in FIGUF<E 3. In FIGURE
3, a hierarchical memory organization includes an l/O adaptor 40 which
interfaces with auxiliary storage devices and with a computer real memory.
The real memory of the organization includes a medium speed, relatively
high capacity main memory 46 and a high-speed, relatively low capacity
instruction cache 48. (The main memory and cache collectively are referred
to herein as "real memory", "real storage", or, simply "memory".) A stream
of instructions is brought in from auxiliary storage devices by way of the l/O
adaptor 40, and stored In biocks called "pages" in the main memory 46. Sets
of conti~uous instructions called "lines" are moved from the main memory 46
to the instruction cache 48 where they are available for hlgh-speed relerence
for processing by the instruction fetch and Issue unit 50. Instructlons which
are fetched from the cache are issued, decoded at 52, and passed to the
functional units 56, 58, ..., 60 for execution.

~ ()~()4() Il~ 23~3941
During execution, when`reference is made to an instruction which is in the
program, the instruction's address is provided to a cache management unit
62 which uses the address to fetch one or more instructions, including the
addressed instruction, from the instruction cache 48 into the queue in the unit
50. If the addressed instruction is in the cache, a cache "hit" occurs.
Otherwise, a cache "miss' occurs. A cache miss will cause the cache
management unit 62 to sen~i the line address of the requested instruction to a
group of storage management functions 64. These functions can include, for
example, real storage management functions which use the line address
provided by the cache management unit 62 to determine whether the page
containing the addressed line is in the main memory 46. If the page is in
main memory, the real storage management will use the line address to
transfer a line containing the missing instruction from the main memory 46 to
the instruction cache 48. If the line containing the requested instruction is
not in the main memory, the operating system will activate another storage
management function, providing it with the identification of the page
containing the needed line. Such a storage management function will send to
the 1/0 adaptor 40 an address identlfylng the page contalning the line. The
1/0 adaptor 40 will bring the page from auxiliary storage and provide it to the
main memory 46. To make room for the fetched page, the storage
management function selects a page in the main memory 46 to be replaced
by the fetched page. In ~CISM architecture, it is contemplated that the
replaced page is returned to auxiliary storage through 1he 1/0 adaptor
without compounding tag information. In this manner, those instructions
most likely to be immediately required during execution of an instruction
sequence are adjacent to the functional units in the instruction cache 48. The
hierarchical memory organlzation provides the capabillty for fast retrleval of
Instructions that are require!d but not in the cache.
In the context of SCISM architecture, in-memory instruction compounding
can be provided by an instruction compounding unit 70 which is located
functionally between the 1/0 adaptor 40 and the main memory 46 so that

~Ns-90-()4() - 12 2 0 ~ 3 9 4 1
compoundin~ of the scalar instruction stream can take place at the input to,
or in, the maln memory 46~ In this location, instructlons can be compounded
during an ongoing page fetch.
Alternatively, the instruction compounding unit can occupy the position 72
between the main memory 46 and the instruction cache 48 and compound
instructions are formed line-by-line as they are fetched into the instruction
cache 48, as may be considered a preferred embodiment.
Example of Compounding
The particular technique for compounding is a matter of design choice.
However, for purposes of illustration, one technique for creating compound
instructions formed from adjacent scalar instructions can be stated. By way
of example, instructions may occupy 6 bytes (3 half words), 4 bytes (2 half
words), or 2 bytes (1 half word) of text. For this exampie, the rule for
compounding a set of instructions which includes variable instruction lengths
provides that all instructions which are 2 bytes or 4 bytes long are
compoundable with each other. That Is, a 2 byte instruction is capable of
parallel execution in this particular example with another 2 byte or another 4
byte instruction and a 4 byte instruction is capable of parallel execution with
another 2 byte or another 4 byte instruction. The rule further provides that
all instructions which are 6 bytes long are not compoundable. Thus, a 6 byte
instruction is only capable of execution singly by itself. Of course,
compounding is not limited to these exemplary rules, but can embrace a
plurality of rules which define the criteria for parallel execution of existing
instructions in a specific configuration for a given computer architecture.
The instruction set used for this example is taken from the Systeml370
archltecture. By examining the OP code for each Instruction, the length of
each Instruction can be de~ermined from an instructlon length code (ILC) in
the op code. The Instruction's type Is further defined in other op code bits.
Once the type and length ot the instruction is determined, a compounding tag

1 ~9-9()-()4() 1~l 2 ~ 5 3 9 4 1
containing tag bits is then generated for that specific instruction to denote
whether it is to be comp~unded with one or more other instructions for
par~llel execution, or to be ~xecuted singly by Itself.
In the tag format of th5s example twhich is not limiting), if 2 adjacent
instructions can be compounded, the tag bits, which are generated in
memory, are "1" for the first compounded instruction and "zero" for the
second compounded instruction. However, if the first and second
instructions cannot be compounded, in this first tag format the tag bit for the
first instruction is "zero" and the second and third instructions are then
considered for compounding. Once an instruction byte stream has been
processed in accordance with the chosen compounding technique and the
compounding bits encoded for various scalar instructions, more optimum
results for achieving parallal execution may be obtained by using a bigger
window for looking at larger groups of instructions and then picking the best
combination of N instructions for compounding.
Compounding Programs
However, taking the above by way of example of the problem solved here,
it may be considered that we have generally a need to provide the system
and process illustrated generally by FIGURE 4-A. Existing programs written
in existing high level languages, as described with reference 10 FIGURE 5, or
existing assembly languagé programs to be processed, as described with
reference to FIGURE 6, need to be processed, and we have provided a
system which has the capability to identify sequences of instructions which
can be executed as a single compound instruction in a computer deslgned to
execute compound instructlons.
Turning now to FIGUR~ 4-A, it will be seen that there is illustrated
pictorially the sequence ir~ which a program is provided as an input to a
compounding facility that produces a compound Instruction program based
on a set of rules which reflect both the system and hardware archltecture.

I-Ng-90-040 14- 20~3941
These rules are hereafter referred to as compounding rules. The programproduceci by the compou`~$~ing facility can then be executed directly by a
compound instruction execution engine generally illustrated by FIGURE 7.
However, FIGURE 5 shows a typical path taken by a program from higher
level source code to actual execution, and may also be considered as one of
the possible organizations suggested by FIGURE 4-A. An alternative related
to assembly level programs is discussed with respect to FIGURE 6.
As will have been appreciated, referring to FIGURE 5, there are many
possible locations in a computer system where compounding may occur, both
in software and in hardware. Each has unique advantages and
disadvantages. As shown in FIGURE 5, there are various stages that a
program typically takes from source code to actual execution. During the
compilation phase, a sour~e program is translated into machlne code and
stored on a disk 46. During the execution phase the program is read from
the disk 46 and loaded Into a main memory 48 of a partTcular computer
system configuration 50 where the Instructions are executed by appropriate
Instructlon processlng units 52, 54, 56. Compoundlng could take place
anywhere along thls path. In general as the compounding facility is located
closer to an instruction processing unit (IPU), the time constraints become
more strlngent. As the compounding facility is located further from the IPU,
more instructions can be examined In a large sized instruction stream
window to determine the best grouping for compounding for increasing
execution performance.
Compoundillg an Assembly-language Program
The flow diagram of FIGURE 6 shows the generation of a compound
instructlon set program from an assembly language program in accordance
with a set of cust~mized compounding rules 58 which reflect both the system
and hardware architecture. The assembly language program is provlded as
an input to a soHware compounding facility 59 which parses the program In

~N9-90-()40 - IS - 2 3 5 3 9 4 ~
ml-, m2-, ... mn-instruction ~loc~;s (m = 1, 2, ...) and produces the compound
instructlon program. Suc'c`'esslve blocks of Instructlons are analyzed by the
compounding facility 59. Blocks are numbered from 1 to n. Member
instructions of block n are indicated by In, where m ranges from 1 up to the
number of instructions in th~ block. The number of instructions in each block
60, 62, 64 in the byte stream which contains the group of instructions
considered together for compounding depends on the design of the
compounding facility and may vary from block to block.
As shown in FIGURE 6, this particular compounding facility is designed to
consider two-way compounding for "m" number of instructions in each block.
The primary first step is to consider if the first and second instructions
constitute a compoundable pair, and then if the second and third constitute a
compoundable pair, and then if the third and fourth constitute a
compoundable pair, all the way to the end of the block. Once the various
possible compoundable pairs C1-C~ have been identified, the compounding
facility can select the preferred of compounded instructions and use flags or
identifier bits to identify the optimum grouping of compound instructions.
If there is no optimum ~rouping, all of the compoundable adjacent scalar
instructions can be identified so that a branch to a target located amongst
various compoun~ instructions can exploit any of the compounded pair
which are encountered. Where multiple compounding units are available,
multiple successive blocks in the instruction stream could be compounded at
the same time.
Executing a Compounded Program
The flow diagram of FIGURE 7 shows the execution of a compound
instruction set program which has been generated by a hardware
preprocessor 66 or a soffware preprocessor 67. A byte stream having
compound instructions flows into a compound instruction (Cl) cache 68 that
serves as a storage buffer providing fast access to compound Instructlons.

I N9-9()-()4() - I(i - 2 ~ ~ 3 9 41
Cl issue logic 69 fetches cornpound instructions from the Cl Cache and issues
their individual compounded instructions to the appropriate functional units
for parallel execution.
It is to be emphasized that instruction execution units (Cl EU) 71 such as
ALU's in a compnund instruction computer system are capable of executing
either scalar instructions one at a time by themselves or alternatively
compounded scalar instructions in parallel with other compounded scalar
instructions. Also, such parallel execution can be done in different types of
execution units such as ALU's, floating point (FP) units 73, storage
address-generation units (AU) 7~ or in a plurality of the same type of units
(FP1, FP2, etc) in accordance with the computer architecture and the specific
computer system configuration.
Control Information In Compounding
. .
In the first tag format discussed in the aforementioned e)tample,
compounding information is added to the instruction stream as one bit for
every two bytes of text (instructions and data~.
In general, a tag containing controi information can be added to each
instruction in the comi~ounded byte stream -- that is, to each
non-compounded scalar instruction as well as to each compounded scalar
instruction included in a pair, triplet, or larger compounded group. As used
herein, identifier bits refers to that part of the tag used specifically to identify
and differentiate those compounded scalar instructions forming a
compounded group from the r~malning non-compounded scalar Instructions
whlch remain in the compound instruction prograrn and that when fetched
are executed slngly.
Where more than two sc:alar instructions can be grouped together to form
a compound instruction, additional identifier bits may be advantageous. The
minimum number of identifier bits needed to indicate the specific number of

20~39~1
l~Ns-9o-n4n - 17-
scalar instructions actually compounded is the logarithm to the base 2
(rounded up to the nearest whole number) of the maximum number of scalar
instructions that can be grouped to form a compound Instruction. For
example, if the maximum is two, then one Identifier bit is needed for each
compound instruction. If the maximum is three or four, then two identifier
bits are needed for each compound instruction. If thz maximum is five, six,
seven or eight, then three identifier bits are needed for each compound
instruction.
A second tag format, implementing this encoding scheme, is shown below
in Table 1:

~: N9-~()-040 1~ - 2 3 ~ 3 9 ~ 1
`~;
Table 1
identifier Total #
Bits Encoded meaning Compounded
00 This instruction is not none
compounded with its following
instruction
01 This instruction is two
compounded with its one
following instruction
This instruction i5 three
compounded with its two
following instructions
11 This Instruction Is four
compounded with its three
following instructions
Assuming instruction ali~nment is such that each halfword of text needs a
tas, it can be appreciated that the IPU ignores all but the tag for the first
instruction when an instruction stream is fetched for execution. In other
words, a half word of fetched text is examined to determine if it begins a
compound instruction by checking its identifier bits. In accordance with
Table 1, if it is not the beginning of a compound Instructlon, its Identifler blts
are zero. If the half word is the beginnln~ of a compound instruction
containin~ two scalar instructions, the identifier bits are "1" for the first
instruction and "0" for the second instruction. If the half word is the
beginning of a compound instruction containing three scalar instructions, the
Identifier bits are "2" for the first instructlon and "1" for the second Instructlon

1 N9-9()-o~() - 11 2~3941
and O for the third instrul.:tion. In other words, the identifier bits for each
half word identify whether or not this particular half word is the beginning of
a compound instruction while at the same time indicating the number of
instructions which make up the compounded group.
Thi; method of encoding compound instructions assumes that if three
instructions are compounded to form a triplet group, the second and third
instructions are also compounded to form a pair group. In other words, if a
branch to the second instruction in a triplet group occurs, the identifier bit 1for the second instruction hldicates that the second and third instruction will
execute as a compounded pair in parallel, even though the first instruction in
the triplet group was not executed.
It will be apparent to those skilled in the art that the present invention
requlres an instruction stream to be compounded only once for a particular
computer system configuration, and thereafter any fetch of compounded
instructions will also cause a fetch of the identifier bits associated therewith.
This avoids the need for the ineflicient last-minute determination and
selection of certain scalar instructions for parallel execution that repeatedly
occurs every time the same or different instructions are fetched for execution
in the so-called super scalar machlne.
Compounding Without Reference Points Between Instructions
It is straightforward to create compound instructions from an instruction
stream if reference points are known that indicate where instructions begin.
As used herein, a reference point means knowledge of which byte of text,
which may contain instructions and/or data, is the first byte of an Instruction.This knowledge couid be obtained by some marking fleld or other Indicator
whlch provides information about the location of instruction boundarles. In
many computer systems such a reference point is expressly known only by
the compiler at compile time and only by the CPU when instructlons are
fetched. When compounding is done after compile time, a compiler could

n~g-~()-n4n -2n 2~39~1
indicate with ~ags which bytes contain the first byte of an instruction and
which contain data. Thi~` extra information results in a more efficient
compounding facility since exact instruction locations are known. Of course,
the compiler would differentiate between instructions and data in other ways
in order to provide the cornpounding facility with specific information
indicating instruction boundaries.
In a system with all 4-byte instructions aligned on a four byte boundary,
one tag is associated with each four bytes of text. Similarly, if instructions
can be aligned arbitrarily, a tag is needed for every byte of text. However,
certain computer architectures allow instructions to be of variable length, and
may further allow data anct instructions to be intermixed, thus complicating
the compounding process. Of course, at execution time, instruction
boundaries must be known to allow proper execution. But since
compounding is preferably done a sufficient time prior to instruction
execution, a technique is needed to compound instructions without
knowledge of where instructions start and without knowledge of which bytes
are data. This technique needs to be appllcable to all of the accepted types
of architectures, including the RISC (Reduced Instruction Set Computers)
architectures in which instructions are usually a constant length and are not
intermixed with data. In connection with our invention, accomplishment of
these tasks, which are genf!~rally described with reference to FIGURE 4-A, is
achieved with the assistance of a compound instruction execution engine in
an alternative environment.
Organization of Preferred System With Compounding Facility
Generally the preferred operating envlronment may be represented by the
operational environment shown in FIGURE 4-B. While the compounding
facllity may be a software entity, the preferred embodiment of the
compounding facility may ~e implemented by an instruction compounding
unit. Referring to FIGURE 4-B of the drawings, there is shown a
representative embodiment of a portion of a digital computer system or

1 ~19-9()-()4() -21 - 2~3~41
digital data processing sys~em constructed in accordance with the present
invention with a cache m~naç~ement unit 144. This computer system is
capable of processing two or more instructions in parallel. It includes a first
storage mechanism for storing instructions and data to be processed in the
form of a series of base instructions for a scalar machine. This storage
mechanism is identified as higher-level storage 136. This storaye (also main
memory ) is a large capacity lower speed storage mechanism and may be,
for example, a large capacity system storage unit or the lower portion of a
comprehensive hierarchical storage system.
The computer system of FIGURE 4-B also includes an Instruction
compounding facility or mechanism for receiving instructions from the higher
level storage 1.36 and associating with these instructions compound
information in the form of tags which indicate which of these instructions may
be processed in paralle with one another. A suitable instruction
compounding unit is represented by the instruction compounding unit 137.
This instruction compounding unit 137 analyzes the incoming instructions for
determining which ones may be processed in parallel. Furthermore,
instruction compounding unit 137 produces for these analyzed instructions
tag bits which indicate which irlstructions may be processed in parallel with
one another and which o~es may not be processed in parallel with one
another but must be processed singly.
The FIGURE 4-B system further includes a second storage mechanism
coupled to the instruction cnmpounding unit 137 for receiving and storing the
analyzed instructions and their associated tag fields so that these stored
compounded instructions rs7ay be fetched. This second or further storage
mechanism is represented by compound instruction cache 138. The cache
138 is a smaller capacity higher speed storage mechanism of the kind
commonly used for improving the performance rate of a computer system by
reducing the frequency of having to access the lower speed storage
mechanism 136.

I,N9-9()-04() - 22 - 2 0 ~ 3 9 ~1
The FIGURE 4-B system further includes a plurali~y of functional
instruction processing unit~î 139, 140, 141, et cetera. These functional units
139-141 operate in parallel with one another in a concurrent manner and
each, on its own, is capable of processing one or more types of machine-ievel
instructions. Examples of functional units which may be used are: a general
purpose arithmetic and logic unit (ALU), an address generation type ALU, a
data dependency collapsing ALU, a branch instruction processlng unit, a data
shifter unit, a floating point processing unit, and so forth. A given computer
system may include two or more or some of the possible functional units.
For exampie, a given computer system may include two or more general
purpose ALUs. Also, no given computer system need include each and every
one of these different types of functional units. The particular confi~uration
will depend on the nature of the particular computer system being
considered.
The computer system of FIGURE 4-B also includes an instruction fetch
and issue mechanism coupled to compound instruction cache 138 for
supplying adjacent instructions stored therein to different ones of the
functional instruction processing units 139-141 when the instruction tag bits
indicate that they may be processed in parallel. This mechanism also
provides single instructions to individual functional units when their tag bits
indicate parallel execution is not possible and they must be processed singly.
This mechanism is represented by instruction fetch and issue unit 142. Fetch
and issue unit 142 fetches instructions form the cache 138 and examines the
tag bits and instruction operation code (opcode) fields, performing a decode
function, and based upon such examinations and other such pipeline control
signals as may be relevant, sends the instruction under consideration to the
appropriate ones of the functional units 138- 141.
In the context of SCISM architecture, in-cache instruction compounding is
provided by the instructlon compounding unit 137 so that compoundlng of
each cache line can take place at the input to the compound instruction cache
138. Thus, as each cache line Is fetched from the main memory 136 into the

~ N9-9()-041) - 2~ - ~ 0 5 3 ~ ~ ~
cache 138, the line is analyzed for compounding in the unit 137 and passed,
with compounding inforr~ation tag bits, for storage in the compound
instruction cache 138.
Prior to caching, the line is compounded in the instruction compounding
unlt 137 which generates a set of tag bits. These tag bits may be appended
directly to the instructions with which they are associated. Or they may be
provided in parallel with the instructions themselves. In any case, the bits
are provided for storage together with their line of instructions in the cache
138. As needed, the compounded instruction in the cache 138 is fetched
together with its tag bit information by the instruction and issue unit 142.
Instruction fetch and issue unit 142 and compound instruction cache 138 are
designed such that a maximum length compound instruction (as defined by
the system) may be fetched from compound instruction cache 138 and issued
to the functional units 139~ 140, 141, and so on. As instructions are received
by the fetch and issue unit 142, their tag bits are examined to determine by
decoding examination whether they may be processed in parailel and the
opcode fields are examined to determine which of the available functlonal
unlts is most appropriate for their processing. If the tag bits indicate that two
or more of the instructions are suitable for processing in parallel, then they
are sent to the appropriate ones in the functional units in accordance with the
codings of their opcode Fields. Such instructions are then processed
concurrently with one another by their respective functional units.
When an instruction is encountered that is not suitable for parallel
processing, it is sent to the appropriate functional unit as determined by its
opcode and it is thereupon processed alone and singly by itself in the
selected functional unit.
In the Ideal case, where plural instructions are always bein0 processed in
parallel, the instruction execution rate of the computer system would be N
times as great as for the case where instructions are executed one at a time,

20~3941
rl~N9-s()-o4() - 2~-
with N being the numbel of instructions in the groups which are being
processed in parallel.
Further Control Information (Tag) in Compounding
As stated previously, the control field or tag contains information
delimiting compound instruction boundarles, but may contain as much
addltional information as IS deemed eflicacious for a particular
implementation.
For example, in a third tag format, a control field (tag) might be defined
as an 8-bit field,
to t, t2 t3 t4 tS t6 t7
with the bits defined as follows
Bit Function
to If 1, this instruction marks the beginning of a
compound instruction.
t, If 1, ther. execute two compound instructions in
parallel
t2 If 1, then this compound instruction has more than
one execution cycle.
t3 If 1, then suspend pipelining.
f4 If instruction is a branch, then if this bit is 1,
the branch is predicted to be taken.
f5 If 1, then this instructlon has a storage
interlock from a previous compound instructlon.
t6 If 1, then enable dynamic instruction issue.
t7 If 1, then this instruction uses an ALU.

~Ns-sn-040 - 2~ - 2 ~ 5 3 9 ~ ~
Classifyin~ Instructions for Con~pounding
In general, the compounding facility will look for classes of instructions
that may be executed in parallel, and ensure that no interlocks between
members of a compound instruction exist that cannot be handled by the
hardware. When compatible sequences of instructions are found, a
compound instruction is created. For example, the System/370 architecture
might be partitioned into the following classes:
1. RR-format loads, logicals, arithmetics, compares
o LCR -- Load Complement
o LPR -- Load Positive
o LNR -- Load Negative
o LR -- Load Register
o LTR -- Load and Test
o NR -- AND
o OR -- OR
o XR -- Exclusive OR
o AR -- Add
o SR -- Subtract
o ALR -- Add Logical
o SLR -- Subtract Logical
o CLR -- Compare Logical
o CR -- Compare

1 ~9-9~) ~wn 2h - 2 V 5 3 9 4 1
2. RS-format shifts (no storage access)
.;
o SRL -- Shift Right Logicai
o SLL -- Shift Left Logicai
o SRA -- Shift Right Arithmetic
o SLA -- Shift Leff Arithmetic
o SRDL -- Shift Right Logical
o SLDL -- Shift Left Logical
o SRDA -- Shift Right Arithmetic
o SLDA -- Shift Left Arithmetic
3. Branches -- on count and index
o BCT-- Branch on Count (RX-format~
o BCTR -- Branch on Count (RR-format)
o BXH -- Branch on Index High (RS-format)
o BXLE -- Branch on Index Low (RS-format)
4. Branches -- on condition
o BC -- Branch on Condition (RX-format)
o BCR -- Branch on Condition (RR-format)
5. Branches -- and link
o BAL -- Branch and Link (RX-format)
o BALR -- Branch and l ink (RR-format)
o BAS -- Brar,ch and Save (RX-format)
o BASR -- Branch and Save (RR-format)

2~53~41
1 ~9-9()-()4() - 27 -
6. Stores
o STCM -- Store Characters Under Mask
(0-4-b~/te store, RS-format)
o M1VI -- Move Immediate (1 byte, Sl-format)
o ST -- Store (4 bytes)
o STC -- Store Character (1 byte)
o STH -- Store Half (2 bytes)
7. Loads
o LH -- Load Half (2 bytes)
o L -- Load (4 bytes)
8. LA-- Load Address
9. RX-format arithmetics, I,:~gicals, inserts, compares
o A -- Add
o AH -- Add Half
o AL -- Add Logical
o N--AND
o O -- OR
o S -- Subtract
o SH -- Subtract Half
o SL -- Subtract Logical
o X-- Exclusive OR
o iC -- Insert Character
o ICM -- Insert Characters Under Mask
(0- to 4-byte fetch)
o C -- Compare
o CH -- Compare Half
o CL -- Compare Loglcal

I,N9-s()-04() - 2x - 2 ~ 5 3 ~ 41
o CLI -- Compare Logical Ir,lmediate
o CLM -- Compare Logi.^al Character Under Mask
10. TM -- Test Under Mask
The rest of the System/370 instructions are not considered to be
compounded for execution in ~his invention. This does not preclude them
from being compounded on a future compound instruction execution engine.
One of the most common sequences in programs is to execute an
instruction of the TM or RX-forrnat compare class, the result of which Is used
to control the execution of a BRANC~i-on-condition type instruction which
immediately follows. Performance can be improved by executing the
COMPARE and the BRANC:H instructions in parallel, and this is sometimes
done dynamically in high performance instruction processors. Some
difficulty lies in quickly identifying all the various members of the COMPARE
class of instructions and all the members of the BRANCH class of instructions
in a typical architecture during the instruction decoding process. This
difficulty is avoided by the invention, because the analysis of all the members
of the classes are accompl!shed ahead of time and a compound instruction
which is 0uaranteed to work. is c,reated.
Many classes of instructions may be execu1ed in parallel, depending on
how the hardware is designed. In addition to the COMPARE and BRANCH
compound instruction desctibed above, other compound instructions can be
envisioned, such as LOADS and RR-~ormat instructions, BRANCH and LOAD
ADDRESS, etc. A compound instruction may even include multiple
instructions from the same class, for example, RR-format arithmetlc, if the
processor has the necessary execution units.
Optimizing the Compounding of Instructions

2~3941
I N9-90-()4() ~()
In any realizable instruction processor, there will be an upper limit to the
number of instructions that can comprise a compound instruction. This upper
limit, m, must be specified to the compounding facility which is creating the
executable instructions by generating compound instructions, so that it can
generate compound instructions no longer than the maximum capability o
the underlying hardware. Note that m is strictly a consequence of the
hardware Implementation; i-l does not limit the scope of instructions that may
be analyzed fGr compounding in a given code sequence by the software. In
general, the broader the scope of the analysis, the greater the parallelism
achieved will be, as more advantageous compoundings are recognized by
the compounding facility. To illustrate, consider the sequence:
X1 ;any compoundable instruction
X2 ;any co~poundable instruction
LOAD R1,(X) ;load Rl from memory location X
ADD R3,R1 ;R3 = R3 ~ R1
SUB Rl,R2 ;R1 = R1 - R2
COMP R1,R3 ;compare R1 with R3
X3 ;any compoundable instruction
X4 ;any compoundable instruction.
If the hardware-imposed upper limit on compoundlng is m=2, then there
are a number of ways to compound this sequence of instructions depending
on the scope of the compuunding facility. If the scope were equal to four,
then the compounding fac31ity would produce the pairings <- X1 > <X2
LOAD> <ADD SUB> <COMP X3> cx4 ->, relieving completely the
hazards between the LOAD and the ADD and the SUB and the COMP. On
the other hand, a superscalar machine with m=2, which pairs instructions in
its instruction issue logic on strictly a first-in-first-ou1 basis, would produce
the pairlngs <X1 X2~ <LOAD ADD> <SUB COMP> <X3 X4>, incurring
the full penalty of the interlocking instructlons. Unfortunately, the LOAD and
ADD cannot be executed In parallel because ADD requires the result of the
load. Similarly, the SUB and COMP cannot execute In parallel. Thus no
performance improvement is gained.

I Ns-s()-()4() ~(~ 2 ~ ~ 3 9 41
Branching and C~mpound Instructions
The systems describe~ provide a method of generating a compound
instruction program and furthe, suggest a variety of methods for maintaining
this association of compounding information (tags) with instruction text in a
relatlvely permanent manner. But since there can be no guarantee that a
branch into the middle of a compound instruction wlll not occur, what is not
thus far apparent is how this compounding information can be maintained,
and further, and more importantly, how correct program behavior can be
maintained, in the presence of branches to arbitrary locations. This problem
is in fact solved by instruction fetch and issue unit 142 of FIGURE 4-B, as willbe explained. First, assume a fourth definition of the compounding tag, T,
such that T= 1 if the associated instruction either begins a compound
instruction or is an instruction to be executed singly. Further, T= O for
member instructions of a compound instruction other than the first. Then,
assuming that instruction fetch and issue unit 142 can fetch a
maximum-length compouncl instruction (as stated above), when fetching
instructions in response to a taken branch, it always fetches a block of text
equai to the maximum lenyth compound instruction beginning at exactly the
branch target address, and then executes all Instructions fetched with T=O as
a compound instruction, up to the point in the fetched text where an
instruction is encountered having T=1, indicating the beginning of the next
compound instruction. If the T bit of the branch target is 1, then it is the
beginning of a compound instruction and may be executed directly. FIGURE
8 illustrates this situation. The notation used in FIGURE 8 is Im~ where n is
the compound instruction number, and m is the number of the instruction
within compound instruction n. The allowable range for m is from 1 to the
maximum length of a compound instruction, which, in this example, is
assumed to be three. Instruction 2 of compound Instructlon 1 is a branch
Instruction, that, ill this example, will have two posslble target paths, "a" and
"b". The "a" path branches to the middle (12) of compound Instruction j, while
the "b" path branches to the first instruction of compound instruction j. If thebranch were to follow path "a", the hardware fetches the maximum length

2~3941
I N9-90-()40 - ~1 -
compound instruction, that ïs, three instructions, then executes 1~ and /~ as a
comps~und instruction Th~ remainder of the fetch, namely /,, is recogni~ed to
be the beg,nning of either a new compound or single instruction (from its
having T=1) and is held in reserve while the next block of text is is fetched
for subsequent execution. If the branch instruction took the "b" path to the
first instruction of compound instruction j, the hardware would again fetch a
block of text equal in length to the maximum length compound instruction,
yielding, in this case, a complete compound instruction, namely 1~ and /~.
Execution of that compound instruction proceeds directly.
This technique is equally applicable to other tag formats, such as the
second tag format discussed above.
Note in FIGURE 8 that branch instruction B~ is compounded with the
instructions following it, but since the branch is taken, the prefetched text
followlng the branch, whether it is comprised of member units of 1he
compound instruction containing the branch or subsequent single or
compound instructions or in fact data ~in systems that allow data and
instructions to intermix), cannot be allowed to update the machine state if
correct program behavior is to be maintained. Further, fetching the target of
a branch Instruction may result in the fetching of a compound instruction
containing a second branch that too can be followed by text that can not be
allowed to update the machine state should the second branch be taken.
This branch-to-branch sequence may continue ad infinitum. In both cases,
the solution for maintaining correct program behavior is to prevent
instructions that should not have been executed from updating the machine
state. That is, the execution results of prefetched text sequentially following
the taken branch must be nullified. Algorithmically, this may be stated as
follows:
1. Begin the execution of the compound instruction, including any branch
instruction within the cornpound instruction.

2as3s4l
r ~-9()-()4() - .~2 -
2. If the compound instruction does in fact contain a branch instruction, do
not permit any instruct~ons following the branch, be they in the same
compound instruction as the branch, or in subsequent compound or single
instructions, to update the machine state.
3. If the branch is taken, nullify the execution results of any instructions
sequentially following the taken branch. Do not, however, nullify the
execution of the branch target.
While we have described our 7nventions in their preferred embodiment
and alternative embodiments, various modifications may be made, both now
and in the future, as those skilled in the art will appreciate upon the
understanding of our disclosed inventions. Such modifications and future
improvements are to be understood to be intended to be within the scope of
the appended claims which are to be construed to protect the rights of the
inventors who first conceived of these inventions.

Representative Drawing

A single figure which represents the drawing illustrating the invention.

Administrative Status

2024-08-01:As part of the Next Generation Patents (NGP) transition, the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new back-office solution.

Please note that "Inactive:" events refers to events no longer in use in our new back-office solution.

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer , as well as the definitions for Patent , Event History , Maintenance Fee and Payment History should be consulted.

Event History

Description	Date
Inactive: IPC from MCD	2006-03-11
Time Limit for Reversal Expired	1997-10-22
Application Not Reinstated by Deadline	1997-10-22
Deemed Abandoned - Failure to Respond to Maintenance Fee Notice	1996-10-22
Application Published (Open to Public Inspection)	1992-09-30
All Requirements for Examination Determined Compliant	1991-10-22
Request for Examination Requirements Determined Compliant	1991-10-22

Abandonment History

Abandonment Date	Reason	Reinstatement Date
1996-10-22

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
INTERNATIONAL BUSINESS MACHINES CORPORATION

Past Owners on Record
BARTHOLOMEW BLANER
STAMATIS VASSILIADIS
THOMAS L. JEREMIAH

Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.

Documents

To view selected files, please enter reCAPTCHA code :

To view images, click a link in the Document Description column. To download the documents, select one or more checkboxes in the first column and then click the "Download Selected in PDF format (Zip Archive)" or the "Download Selected as Single PDF" button.

List of published and non-published patent-specific documents on the CPD .

If you have any difficulty accessing content, you can call the Client Service Centre at 1-866-997-1936 or send them an e-mail at CIPO Client Service Centre.

Filter

Download Selected in PDF format (Zip Archive)

Download Selected as Single PDF

Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
Abstract	1992-09-29	1	30
Claims	1992-09-29	7	205
Drawings	1992-09-29	9	141
Descriptions	1992-09-29	32	1,083
Representative drawing	1999-07-04	1	18
Fees	1994-05-10	1	46
Fees	1995-05-08	1	48
Fees	1993-04-29	1	30

Language selection

Menus

English Abstract

Event History

Abandonment History

Your request is in progress.

Requested information will be available
in a moment.

Thank you for waiting.

Patent 2053941 Summary

English Abstract

Event History

Abandonment History

Your request is in progress.Requested information will be availablein a moment.Thank you for waiting.

Your request is in progress.

Requested information will be available
in a moment.

Thank you for waiting.