Note: Descriptions are shown in the official language in which they were submitted.
CA 02741253 2011-05-27
Title: System and method for development of a system architecture
Field
[11 The described embodiments relate to systems and methods of developing a
system architecture, and in particular, relate to systems and methods of
developing a
system architecture based on a plurality of optimization parameters.
Background
[2a The design and development of systems requires extensive analysis and
assessment of the design space, not only due to the assorted nature of design
parameters, but also due to the diversity in architecture for implementation.
Given
specifications and system requirements, the aim of designers is to reduce a
large and
complex design space into a set of feasible design solutions meeting
performance
objectives and functionality.
[31 For systems based on operational constraints the selection of an optimal
architecture for system design is an important step in the development
process. Design
space architecture can have innumerable design options for selection and
implementation based on the parameters of optimization. Selection of the
optimal
architecture from the design space that satisfies all the performance
parameter
objectives may be useful for the present generation of System-on-chip (SoC)
designs
and Very Large Scale Integration (VLSI) design. As it is possible to implement
different
functions of a system on different hardware components, the architecture
design space
becomes more complex to analyze. In the case of high level synthesis,
performing
design space exploration to choose candidate architecture by concurrently
satisfying
many operating constraints and performance parameters is considered an
important
stage in the whole design flow. Since the design space is huge and complex
there
exists a desire to efficiently explore candidate architectures for the system
design based
on the application to be executed. The method for exploration of candidate
architecture
should not only be less in terms of complexity factor and time but should also
explore
the variant in an efficient way meeting specifications provided. The process
of high-level
-1-
CA 02741253 2011-05-27
synthesis design is very complicated and descriptive and is usually performed
by
system architects- Depending on the application, the process of defining the
problem,
performing design space exploration and the other steps required for its
successful
accomplishment may be very time consuming. Furthermore, recent advancements in
areas of communications and multimedia have led to the growth of a wide array
of
applications requiring huge data processing at minimal power expense. Such
data
hungry applications demand satisfactory performance with power efficient
hardware
solutions. Hardware solutions should satisfy multiple contradictory
performance
parameters such as power consumption and time of execution, for example. Since
the
selection process for the best design architecture is complex, an efficient
approach to
explore the design space for selecting a design option is desirable.
Summary
[4j In a first aspect, some embodiments of the invention provide a method of
developing a system architecture comprising:
defining a plurality of resources constraints mm.xN1e,...... maxNNR, for a
plurality
of kinds of resources R1...... Rn, wherein each resource constraint maxNR, is
a maximum number, 1 ss i s n, of a kind of resources Ri available to construct
the system architecture, wherein n is an integer greater than 1,-
defining a plurality of constraint values comprising a constraint value for
each
optimization parameter of at least three optimization parameters for the
system architecture, wherein the at least three optimization parameters
comprise a final optimization parameter;
defining a design space as a plurality of variants representing different
combinations of a number of each kind of resource R1....... Rn available to
construct the system architecture, wherein each variant is a vector of the
form;
Võ m (Nk,,._..NR.)
-2-
CA 02741253 2011-05-27
wherein NR, Is i:% n represents the number of the kind of resource Ri,
wherein based on the resource constraints, I A NR, s maxNR,;
determining a plurality of satisfying sets of variants by, for each of the
optimization parameters except for the final optimization parameter.
generating a universe of discourse set by sorting the plurality of
variants of the design space:
assigning a membership value to each variant of the universe of
discourse, wherein each membership value is within the interval [0,11,
wherein for each variant of the universe of discourse set, the
membership value assigned to that respective variant indicates a
position of the respective variant in the universe of discourse set;
determining a satisfying set of variants from the universe of discourse
set by determining a border variant of the universe of discourse set,
wherein each variant of the satisfying set substantially satisfies the
constraint value for the optimization parameter, wherein the border
variant is the last variant of the universe of discourse set to satisfy
the constraint value for the optimization parameter such that all
variants to one side of the border variant in the universe of discourse
set satisfy the constraint value for the optimization parameter and all
variants to the other side of the border variant in the universe of
discourse set do not satisfy the constraint value for the optimization
parameter, and wherein the border variant is determined by
performing a fuzzy search of the universe of discourse set using the
corresponding membership values;
-3-
CA 02741253 2011-05-27
determining a set of variants based on an intersection of the plurality of
satisfying sets of variants for the optimization parameters except the final
optimization parameter;
for the final optimization parameter:
generating an ordered list of variants by sorting the set of variants;
selecting a variant from the set of variants based on a position of the
variant in the ordered list of variants; and
developing the system architecture using the selected variant.
[61 In accordance with some embodiments described herein, the developed system
architecture comprises, for each kind of resource Ri in the plurality of kinds
of resources
R1.....Rn, the number Ni of that kind of resource defined by the selected
variant.
[61 In accordance with some embodiments described herein, the membership value
assigned to each variant of the universe of discourse set is determined by a
membership function based on the position of the variant in the universe of
discourse
set, and an order of the first and last element of the universe of discourse
set.
171 in accordance with some embodiments described herein, if the Qptimization
parameter represents a time of execution constraint then the membership
function is of
the form:
x-13
TX 25 or else the membership function is of the form:
x-a
tX
13--a
wherein xis the position of the variant in the universe of discourse set, TX
is the
assigned membership value of the variant that is in the xth position in the
universe of
discourse set, a and 13 are an order of the first and last element of the
universe of
CA 02741253 2011-05-27
discourse set, wherein a is 1 and fi is the total number of the variants in
the universe of
discourse set.
[81 In accordance with some embodiments described herein, performing a fuzzy
search comprises:
calculating an initial membership value, wherein the initial membership value
is a
function of a maximum value of the optimization parameter, a minimum value of
the optimization parameter, and a value for the border variant based on the
constraint value of the respective optimization parameter;
determining a closest variant from the variants of the universe of discourse
set,
wherein the closest variant is the variant having a membership value that is
closest to the initial membership value:
calculating an optimization parameter value for the closest variant;
calculating a new initial membership value;
determining a new closest variant from the variants of the universe of
discourse
set, wherein the closest variant is the variant having a membership value that
is
closest to the calculated new initial membership value;
if the new closest variant has already been checked then determining an
unchecked variant having a membership value that is the next membership value
and set the new closest variant to be the unchecked variant;
calculating an optimization parameter value for the new closest variant;
determining if the border variant is found;
-5-
CA 02741253 2011-05-27
if the border variant is still not found then return back to the step of
calculating a
new initial membership value, else end.
[9] In accordance with some embodiments described herein, the initial
membership
value is calculated based on the following function:
Vsod, -Alin
Max - Min
wherein T. is the initial membership value, V, is the constraint value of the
respective optimization parameter, Min and Max are the minimum and maximum
values
for the respective optimization parameter.
[101 In accordance with some embodiments described herein, if the optimization
parameter value for the closest variant is less than the constraint value of
the respective
optimization parameter then the new initial membership value is calculated
using a
function of the form:
T A. TV MaX - Vv,. 1141
To - T , VGwu~r - VV urcnu
or else the new initial membership value is calculated using a function of the
form:
TMt Tr fin-Vra.wor
T8 --Ty Y9oraer - ~rrulan7
wherein TM is the assigned membership value of the variant that is in the
maximum
position in the universe of discourse set, TM,f, is the assigned membership
value of the
variant that is the minimum position in the universe of discourse set, TT, is
the assigned
membership value of the variant in the sorted universe of discourse. T. is the
calculated
new initial membership value, is the constraint value of the respective
optimization
parameter, V,,aõ,, is the optimization parameter value and Max is the maximum
value for
the respective optimization parameter.
CA 02741253 2011-05-27
[111 In accordance with some embodiments described herein, for each of the
plurality
of optimization parameters:
defining a priority factor function for each kind of resource R1....... Rn,
wherein a priority factor function defines a rate of change of the
optimization
parameter with respect to a change in a number N, of the corresponding
kind of resource Ri, 1 s is n;
calculate a priority factor for each kind of resource R1------- Rn available
to
construct the system architecture, using the corresponding priority factor
function for the respective optimization parameter; and
wherein the universe of discourse set is generated by sorting the variants of
the design space based on a relative magnitude of the calculated priority
factors.
[121 in accordance with some embodiments described herein, the method further
comprises:
determining a priority order by sorting the calculated priority factors based
on a
relative magnitude of the calculated priority factors; and
wherein the universe of discourse set is sorted based on the priority order
for the
respective optimization parameter.
[131 In accordance with some embodiments described herein, the method further
comprises selecting the selected variant using the priority order for the
final optimization
parameter-
[14a In accordance with some embodiments described herein, the design space is
sorted to construct the universe of discourse set using a tree as follows:
CA 02741253 2011-05-27
for each optimization parameter, assign to the kind of resource R1....... Rn
available to construct the system architecture with the highest priority
factor a
level one of the tree with all its variants as branch nodes, wherein level one
refers to a root node of the tree, the kind of resource R1....... Rn available
to
construct the system architecture with the next highest priority factor a
level two
of the tree with all its variants as branch nodes ...... the kind of resource
RI....... Rn available to construct the system architecture with the lowest
priority
factor a level n of the with all the variants as leaf nodes such that the leaf
nodes
are level n+1.
(15] In accordance with some embodiments described herein, the system
architecture
comprises a Register Transfer Level data path circuit.
(161 In accordance with some embodiments described herein, the system
architecture comprises a Register Transfer Level control timing sequence.
[17] In accordance with some embodiments described herein, the Register
Transfer
Level data path circuit is configured to generate output data as a result of
performing a
sequence of operations on data using Register Transfer Level modules, wherein
the
Register Transfer Level modules include the number of each kind of resources
represented by the selected variant.
[18] In accordance with some embodiments described herein, the Register
Transfer
Level modules are selected from the group consisting of registers for storage
of data,
memory modules, latches for sinking of data, multiplexers and demultiplexers.
(19] in accordance with some embodiments described herein, the kinds of
resources
R1,.--..Rn are selected from the group consisting of adders, subtractors,
clock
oscillators, multipliers, divider, comparator, Arithmetic Logic Unit (ALU),
integrator,
summer and other functional modules
[201 In accordance with some embodiments described herein, the optimization
parameters are selected from the group consisting of hardware area, cost, time
of
execution, and power consumption.
[21] In accordance with some embodiments described herein, the Register
Transfer
Level control timing sequence provides a control configuration for a data path
circuit to
CA 02741253 2011-05-27
provide timing and synchronization required by data traversing through the
Register
Transfer Level modules of the data path circuit.
[22] In accordance with some embodiments described herein, the final
optimization
parameter is a hardware area of a total number of all kinds of resources
R1...... Rn, and
wherein, for the hardware area, the priority factor function of each kind of
resource
R1...... Rn is an indicator of a change of area contributed by a change in the
number of
the kind of resource Ri, wherein 1:5 i s n .
[23] In accordance with some embodiments described herein, for the hardware
area,
the priority factor for each Kind of resource R1...... Rn that is not a clock
oscillator is
calculated from NR,, 4NR,, KR, wherein NR,is the number of the kind of
resource Ri, Kx,
is an area occupied by the kind of resource Ri, ANR, = KR, is a change of area
contributed
by the Kind of resource Ri, wherein Ri is a member of the kinds of resources
R1------- Rn,
and, for the hardware area, the priority factor function of resource Ri that
is a clock
oscillator is calculated from AA(R,,,4) NR(,,,, R,,u , wherein R,,, is a clock
oscillator used to
construct the system architecture, dA(R,.,.)is a change of area occupied by
clock
oscillators, NR,,k is a number of clock oscillators.
[24] In accordance with some embodiments described herein, for the hardware
area,
the priority factor for each kind of resource R1...... Rn that is not a clock
oscillator is of
the form:
PF(Ri) s ANR~- ,KR,
NR,
and wherein, for the hardware area, the priority factor function of resource
Ri that
is a clock oscillator is of the form:
4A(R, )
NRr,
[25] In accordance with some embodiments described herein, the plurality of
optimization parameters comprise a time of execution of a total number of all
kinds
resources R1,._....Rn, and wherein, for the time of execution, the priority
factor function
for each kind of resource R1,...Rn is a function of the rate of change of a
cycle time
CA 02741253 2011-05-27
with a change in the number NK, of the kind of resources Ri at a maximum clock
period,
wherein I :z i s n and Ri is a member of the kinds of resources R1,.....Rn.
[26] In accordance with some embodiments described herein, the priority factor
function for the time of execution of the resources R1....... Rn that is not a
clock
oscillator is calculated by NR; Tx. T1,`4"' wherein NR, is the number of the
kind of
resource Ri, TR, a number of clock cycles required by the kind of resource Ri
to finish
each operation, Tp is the time period of the clock, T'"", is the maximum clock
period;
and, for the time of execution, the priority factor function of resource Ri
that is a clock
oscillator is calculated by k4,, N,4 TR,, R,,,,, NRC,k , where Re,, is a clock
oscillator used to
provide necessary clock frequency to the system, NR, is the number of the kind
of
resource Ri, NR,,, is the number of clock oscillators, TR, a number of clock
cycles
required by the kind of resource Ri to finish each operation.
L271 In accordance with some embodiments described herein, the priority factor
function for the time of execution of the resources Ri,......Rn that is not a
clock
oscillator is of the form:
PF(R,) _ R, J (Tam)
and wherein, for the time of execution, the priority factor function of
resource Ri
that is a clock oscillator is of the form:
PF(R ) = NRL O T'KL t NR2 * TR3.....+NR4 = Tsn (ATP)
cK
[28] In accordance with some embodiments described herein, the plurality of
optimization parameters comprise a time of execution of a total number of all
kinds
resources R1....... Rn, and wherein, for the time of execution, the priority
factor function
for each kind of resource RI.......Rn is a function of a difference between
the time of
execution when resource Ri, wherein I is of the interval [1,n], is at its
minimum value
when all other resources are at their maximum value and the time of execution
when
resource Ri is at its maximum value when all other resources are at their
minimum
values.
-10
CA 02741253 2011-05-27
[29] In accordance with some embodiments described herein, the priority factor
function for the time of execution of the resources R1....... Rn that is not a
clock
oscillator is calculated by NR, TR"H" Tk - wherein NR, is the number of the
Kind of
resource Ri, T,u' and T'n'^are the maximum and minimum value of the execution
time
when resource Rn is maximum and minimum, respectively, at maximum clock
frequency all other resources being maximum, wherein, for the time of
execution, the
priority factor function of resource Ri that is a clock oscillator is
calculated by NR n, TRak
,,',n where Nn,,4 is the number of clock oscillators TX j and ? are maximum
and
minimum values of execution time when the clock period is maximum and minimum
respectively, and all available resources have a maximum value.
1301 In accordance with some embodiments described herein, the priority factor
function for the time of execution of the resources R1------- Rn that is not a
clock
oscillator is of the form:
T ,, _TA4fl
PF(Rn) = 8a_ R.
JA'Xn
and wherein, for the time of execution, the priority factor function of
resource Ri
that is a clock oscillator is of the form:
T,u.-TRu
PF(Rclk)
[31] In accordance with some embodiments described herein, the plurality of
optimization parameters comprise a power consumption of the resources R1 ------
Rn.
and wherein, for the power consumption, the priority factor function for each
kind of
resource R1...... Rn is a function of a change in power consumption per unit
area due to
deviation of clock frequency from maximum to minimum and a change in the
number
NR, of the kind of resource Ri at maximum clock frequency, wherein I s i s n,
and Ri is a
member of the kinds of resources R1------- Rn.
-11-
CA 02741253 2011-05-27
1321 In accordance with some embodiments described herein, the priority factor
function for the power consumption of the resources R1....... Rn that is not a
clock
oscillator is calculated by NR,, KR,,, ANR;, (pj`" ", pf wherein N. is the
number of
resource RI, KRõ is an area occupied by resource Ri, ANA, = KRrs is a change
of area
contributed by resource Ri, p, is power consumed per area unit resource at a
particular
frequency of operation, (pj"a" is power consumed per area unit resource at a
maximum
clock frequency; and for the power consumption, the priority factor function
of resource
Ri that is a clock oscillator is calculated by NR,, Tx, R} I NR,,,,, p,,
where RR, is a clock
oscillator used to provide necessary clock frequency to the system, NR, is the
number of
the kind of resource Ri, TRn a number of clock cycles required by resource Ri
to finish
each operation, p,, is power consumed per area unit of resource at a
particular
frequency of operation-
[331 In accordance with some embodiments described herein, the priority factor
function for the power consumption of the resources R1....... Rn that is not a
clock
oscillator is of the form-
PF(Ra) = R` R, (P').
NR,
and wherein, for the power consumption, the priority factor function of
resource
Ri that is a clock oscillator is of the form:
NMI * T,,tNR2'TR2----- 1-NJR,'TR,(Ap)
NRC,k
[341 in accordance with some embodiments described herein, the plurality of
optimization parameters comprise a total cost of the total number of all kinds
resources
R1 ...... Rn, and wherein, for the total cost, the priority factor function
for each kind of
resource R1....... Rn is an indicator of change in total cost of the total
number of all kinds
resources R1....... Rn with respect to a change in the number of the kind of
resource RI
and the cost per unit resource, wherein I i s n.
1351 In accordance with some embodiments described herein, for the cost, the
priority
factor function for each kind of the resources R1,., ...Rn that is not a clock
oscillator is
calculated by Nk,, KR,, ANR,, CR;, wherein NR, is the number of the kind of
resource Ri,
-12-
CA 02741253 2011-05-27
KR,is an area occupied by the kind of resource Ri, AN,, = KR; is a change of
area
contributed by the kind of resource RI, Cis the cost per area unit of the kind
of
resource Ri; and wherein, for the cost, the priority factor function of
resource Ri that is a
clock oscillator is calculated by RC,k, NRCrk, 44(R41k), CR,.,,,, wherein Rau
is a clock
oscillator used to provide necessary clock frequency to the system, AA(R,,k)is
a change
of area occupied by clock oscillators, N,,,i is a total number of clock
oscillators available
to construct the system architecture, Cam,,, is the cost per area unit of
clock oscillators.
[36] In accordance with some embodiments described herein, for the cost, the
priority
factor function for each kind of the resources RI....... Rn that is not a
clock oscillator is
of the form:
PF (R) - Y (;' KR. ' CR,
NR.
and wherein, for the cost, the priority factor function of resource Ri that is
a clock
oscillator is of the form:
'
PF(rtk) = A(Rrk) "krlL
NRcu
[37] In accordance with some embodiments described herein, the method further
comprises, for each optimization parameter, determining the satisfying set of
variants
from the universe of discourse set using the border variant-
[38] In accordance with some embodiments described herein, the method further
comprises, for each of the plurality of optimization parameters, determining
whether the
constraint value for the optimization parameter is valid by:
determining a minimum value for the optimization parameter;
determining a maximum value for the optimization parameter;
determining whether the constraint value is greater than or equal to the
minimum value for the optimization parameter and whether the constraint
value is less than or equal to the maximum value for the optimization
parameter;
if the constraint value is greater than or equal to the minimum value for the
optimization parameter and the constraint value is less than equal than or
-13-
CA 02741253 2011-05-27
equal to the maximum value for the optimization parameter, determining
that the constraint value is valid; and
otherwise, determining that the constraint value is invalid and prompting
for correction.
[39] In accordance with some embodiments described herein, the method further
comprises determining whether the set of variants is valid by determining
whether the
set of variants is null; and upon determining that the set of variants is not
valid, relaxing
the constraint values for each optimization parameter by predetermined
percentage.
[40] In accordance with some embodiments described herein, the method further
comprises representing the combination of the number of resources R1........
Rn of the
selected variant in a temporal and a spatial domain using a sequencing and
binding
graph and a plurality of registers.
[41] In accordance with some embodiments described herein, the method further
comprises determining a multiplexing scheme for the resources R1..... Rn of
the
selected variant, with inputs, outputs, operations, interconnections and time
steps.
[42] in accordance with some embodiments described herein, the method further
comprises producing a Register Transfer Level data path circuit using the
multiplexing
scheme-
[43] In accordance with some embodiments described herein, the method further
comprises producing an integrated circuit using the system architecture.
[441 In accordance with some embodiments described herein, the set of variants
based on the intersection of the satisfying sets of variants is a pareto set
of variants.
[451 In another aspect, embodiments described herein provide a non-transitory
computer-readable storage medium comprising instructions for execution on a
computing device, wherein the instructions, when executed, perform acts of a
method of
developing a system architecture, wherein the method comprises:
defining a plurality of resources constraints max NR,,... __ maxNRõ , for a
plurality
of kinds of resources RI......Rn, wherein each resource constraint maxNR, is
a maximum number, I s i:% ri, of a kind of resources Ri available to construct
the system architecture, wherein n is an integer greater than 1;
-14-
CA 02741253 2011-05-27
defining a plurality of constraint values comprising a constraint value for
each
optimization parameter of at least three optimization parameters for the
system architecture, wherein the at least three optimization parameters
comprise a final optimization parameter;
defining a design space as a plurality of variants representing different
combinations of a number of each kind of resource R1....... Rn available to
construct the system architecture, wherein each variant is a vector of the
form:
V, ~NKl, ...NRn~
wherein NR, t s is n represents the number of the kind of resource Ri,
wherein based on the resource constraints, 1 z NR, s max NR,;
determining a plurality of satisfying sets of variants by, for each of the
optimization parameters except for the final optimization parameter:
generating a universe of discourse set by sorting the plurality of
variants of the design space;
assigning a membership value to each variant of the universe of
discourse, wherein each membership value is within the interval f0,11,
wherein for each variant of the universe of discourse set, the
membership value assigned to that respective variant indicates a
position of the respective variant in the universe of discourse set;
determining a satisfying set of variants from the universe of discourse
set by determining a border variant of the universe of discourse set,
wherein each variant of the satisfying set substantially satisfies the
constraint value for the optimization parameter, wherein the border
variant is the last variant of the universe of discourse set to satisfy
-15,-
CA 02741253 2011-05-27
the constraint value for the optimization parameter such that all
variants to one side of the border variant in the universe of discourse
set satisfy the constraint value for the optimization parameter and all
variants to the other side of the border variant in the universe of
discourse set do not satisfy the constraint value for the optimization
parameter, and wherein the border variant is determined by
performing a fuzzy search of the universe of discourse set using the
corresponding membership values;
determining a set of variants based on an intersection of the plurality of
satisfying sets of variants for the optimization parameters except the final
optimization parameter;
for the final optimization parameter:
generating an ordered list of variants by sorting the set of variants;
selecting a variant from the set of variants based on a position of the
variant in the ordered list of variants; and
developing the system architecture using the selected variant.
1461 In a further aspect, embodiments described herein provide a system of
developing a system architecture comprising:
a resource constraint module defining a plurality of resources constraints
maxNR,...... maxNR,,, for a plurality of kinds of resources R1...... Rn,
wherein
each resource constraint maxNR, is a maximum number, Is i s n, of a kind of
resources Ri available to construct the system architecture, wherein n is an
integer greater than 1;
_15.-
CA 02741253 2011-05-27
an optimization parameter constraint module for defining a plurality of
constraint values comprising a constraint value for each optimization
parameter of at least three optimization parameters for the system
architecture, wherein the at least three optimization parameters comprise a
final optimization parameter;
a design space module for defining a design space as a plurality of variants
representing different combinations of a number of each kind of resource
R1....... Rn available to construct the system architecture, wherein each
variant is a vector of the form:
Võ - (NR3......
wherein Ns, 1 s i $ n represents the number of the kind of resource Ri,
wherein based on the resource constraints, Is NR, s maxN.;;
a satisfying set module for determining a plurality of satisfying sets of
variants
by, for each of the optimization parameters except for the final optimization
parameter:
generating a universe of discourse set by sorting the plurality of
variants of the design space:
assigning a membership value to each variant of the universe of
discourse, wherein each membership value is within the interval [0,11,
wherein for each variant of the universe of discourse set, the
membership value assigned to that respective variant indicates a
position of the respective variant in the universe of discourse set;
determining a satisfying set of variants from the universe of discourse
set by determining a border variant of the universe of discourse set,
wherein each variant of the satisfying set substantially satisfies the
constraint value for the optimization parameter, wherein the border
-17-
CA 02741253 2011-05-27
variant is the last variant of the universe of discourse set to satisfy
the constraint value for the optimization parameter such that all
variants to one side of the border variant in the universe of discourse
set satisfy the constraint value for the optimization parameter and all
variants to the other side of the border variant in the universe of
discourse set do not satisfy the constraint value for the optimization
parameter, and wherein the satisfying set module interacts with a
fuzzy search module for determining the border variant by performing
a fuzzy search of the universe of discourse set using the
corresponding membership values;
an intersection module for determining a set of variants based on an
intersection of the plurality of satisfying sets of variants for the
optimization
parameters except the final optimization parameter;
a selection module for selecting a variant for use in developing the system
architecture by, for the final optimization parameter:
generating an ordered list of variants py sorting the set of variants;
selecting a variant from the set of variants based on the ordered list
of variants; and
a system architecture module for developing the system architecture using
the selected variant.
(471 In another aspect, embodiments described herein provide a method of
determining a variant representing a combination of a number of each Kind of
resource
R1...... Rn available for constructing a system architecture comprising:
_18-
CA 02741253 2011-05-27
defining a plurality of resources constraints maxNJe,,..... maxN,,, for a
plurality
of kinds of resources R1...... Rn, wherein each resource constraint maxNR, is
a maximum number, I :s i s n, of a kind of resources Ri available to construct
the system architecture, wherein n is an integer greater than 1;
defining a plurality of constraint values comprising a constraint value for
each
optimization parameter of at least three optimization parameters for the
system architecture, wherein the at least three optimization parameters
comprise a final optimization parameter;
defining a design space as a plurality of variants representing different
combinations of a number of each kind of resource R1....... Rn available to
construct the system architecture, wherein each variant is a vector of the
form:
V,, ={NRI,.....NRR)
wherein Na, t s i s n represents the number of the kind of resource Ri,
wherein based on the resource constraints, Is NR, s maxN,t,;
determining a plurality of satisfying sets of variants by, for each of the
optimization parameters except for the final optimization parameter:
generating a universe of discourse set by sorting the plurality of
variants of the design space;
assigning a membership value to each variant of the universe of
discourse, wherein each membership value is within the interval [0,11,
wherein for each variant of the universe of discourse set, the
membership value assigned to that respective variant indicates a
position of the respective variant in the universe of discourse set;
-19-
CA 02741253 2011-05-27
determining a satisfying set of variants from the universe of discourse
set by determining a border variant of the universe of discourse set,
wherein each variant of the satisfying set substantially satisfies the
constraint value for the optimization parameter, wherein the border
variant is the last variant of the universe of discourse set to satisfy
the constraint value for the optimization parameter such that all
variants to one side of the border variant in the universe of discourse
set satisfy the constraint value for the optimization parameter and all
variants to the other side of the border variant in the universe of
discourse set do not satisfy the constraint value for the optimization
parameter, and wherein the border variant is determined by
performing a fuzzy search of the universe of discourse set using the
corresponding membership values;
determining a set of variants based on an intersection of the plurality of
satisfying sets of variants for the optimization parameters except the final
optimization parameter,
for the final optimization parameter:
generating an ordered list of variants by sorting the set of variants;
and
selecting a variant from the set of variants based on a position of the
variant in the ordered list of variants.
[48] In a further aspect, embodiments described herein provide a method of
developing a system architecture comprising:
defining a plurality of resources constraints maxN1ej...... maxN,t,, for a
plurality
of kinds of resources R1.. _ _.. Rn, wherein each resource constraint maxNR,
is
-20-
CA 02741253 2011-05-27
a maximum number, 1:5 i:5 n, of a kind of resources Ri available to construct
the system architecture, wherein n is an integer greater than 1;
defining a plurality of constraint values comprising a constraint value for
each
optimization parameter of at least three optimization parameters for the
system architecture, wherein the at least three optimization parameters
comprise a final optimization parameter;
defining a design space as a plurality of variants representing different
combinations of a number of each kind of resource R7....... Rn available to
construct the system architecture, wherein each variant is a vector of the
form:
V, = (NR,,-...NR,)
wherein Na, I s i s n represents the number of the kind of resource Ri,
wherein based on the resource constraints, i s NR, s maxN.,;
determining a plurality of satisfying sets of variants by, for each of the
optimization parameters except for the final optimization parameter:
generating a universe of discourse set by sorting the plurality of
variants of the design space;
determining a satisfying set of variants from the universe of discourse
set by determining a border variant of the universe of discourse set.
wherein each variant of the satisfying set substantially satisfies the
constraint value for the optimization parameter, wherein the border
variant is the last variant of the universe of discourse set to satisfy
the constraint value for the optimization parameter such that all
variants to one side of the border variant in the universe of discourse
set satisfy the constraint value for the optimization parameter and all
variants to the other side of the border variant in the universe of
-21-
CA 02741253 2011-05-27
discourse set do not satisfy the constraint value for the optimization
parameter;
determining a set of variants based on an intersection of the plurality of
satisfying sets of variants for the optimization parameters except the final
optimization parameter;
for the final optimization parameter:
generating an ordered list of variants by sorting the set of variants;
selecting a variant from the set of variants based on a position of the
variant in the ordered list of variants; and
developing the system architecture using the selected variant-
wherein the design space is sorted to construct the universe of discourse set
using a tree having n+1 levels of nodes as follows:
for each optimization parameter, assigning to each Kind of resource R1.......
Rn
available to construct the system architecture a level 1 to n of the tree
based on
the relative magnitude of its priority factor, wherein level 1 is a root node
in the
tree;
a level one of the tree with all its variants as branches, wherein level one
refers
to a root node of the tree, the Kind of resource R1....... Rn available to
construct
the system architecture with the next highest priority factor a level two of
nodes
of the tree with all its variants as branches,.--- the kind of resource
R1,...... Rn
available to construct the system architecture with the lowest priority factor
a
level n of nodes of the tree with all the variants as branches; and
-22-
CA 02741253 2011-05-27
generating a tree structure by representing the Kind of resource Ri assigned
level
i I:% is. n as one or more nodes in the tree based on the number of branches
from the one or more nodes of the level i-1 except for level one which is a
root
node, wherein each node in level i has a number of branches maxNR, and
representing all variants of the design space as leaf nodes at a level n+1 of
the
tree, each leaf node connected by a branch to a node of the level n of the
tree,
wherein each variant corresponds to a path from the root node at level one of
the
tree to the leaf node representing that variant at level n+1 of the tree.
149] In another aspect, embodiments described herein provide a method of
developing a system architecture comprising:
defining a plurality of resources constraints maxNR....... max NRR , for a
plurality
of Kinds of resources R1...... Rn, wherein each resource constraint maxNR, is
a maximum number, I s i s n, of a kind of resources Ri available to construct
the system architecture, wherein n is an integer greater than 1;
defining a plurality of constraint values comprising a constraint value for
each
optimization parameter of at least three optimization parameters for the
system architecture, wherein the at least three optimization parameters
comprise a final optimization parameter;
defining a design space as a plurality of variants representing different
combinations of a number of each kind of resource R1....... Rn available to
construct the system architecture, wherein each variant is a vector of the
form:
V, . (NR,,....NR1}
wherein NR, l s i s n represents the number of the kind of resource Ri,
wherein based on the resource constraints, I :s NR, s maxNF,;-
determining a plurality of satisfying sets of variants by, for each of the
optimization parameters
-23-
CA 02741253 2011-05-27
generating a universe of discourse set by sorting the plurality of
variants of the design space;
assigning a membership value to each variant of the universe of
discourse, wherein each membership value is within the interval [0,1],
wherein for each variant of the universe of discourse set, the
membership value assigned to that respective variant indicates a
position of the respective variant in the universe of discourse set;
determining a satisfying set of variants from the universe of discourse
set by determining a border variant of the universe of discourse set,
wherein each variant of the satisfying set substantially satisfies the
constraint value for the optimization parameter, wherein the border
variant is the last variant of the universe of discourse set to satisfy
the constraint value for the optimization parameter such that all
variants to one side of the border variant in the universe of discourse
set satisfy the constraint value for the optimization parameter and all
variants to the other side of the border variant in the universe of
discourse set do not satisfy the constraint value for the optimization
parameter, and wherein the border variant is determined by
performing a fuzzy search of the universe of discourse set using the
corresponding membership values;
determining a set of variants based on an intersection of the plurality of
satisfying sets of variants for the optimization parameters;
generating an ordered list of variants by sorting the set of variants;
-24-
CA 02741253 2011-05-27
selecting a variant from the set of variants based on a position of the
variant
in the ordered list of variants; and
developing the system architecture using the selected variant.
Brief Description of the Drawings
[501 For a better understanding of embodiments of the systems and methods
described herein, and to show more clearly how they may be carried into
effect,
reference will be made, by way of example, to the accompanying drawings in
which:
[51] FIG. I illustrates a block diagram of a system for developing a system
architecture in accordance with embodiments described herein;
152] FIG. 2 illustrates a flow chart of a method of determining a variant
representing a
combination of a number of each kind of resource available to construct a
system
architecture in accordance with embodiments described herein;
[531 FIG. 3 illustrates a flow chart of a method of developing a system
architecture in
accordance with embodiments described herein,
[541 FIG. 4 illustrates a flow chart of a method for determining a satisfying
set of
variants for an optimization parameter in accordance with embodiments
described
herein;
[551 FIG. 5 illustrates a flow chart of a method of conducting a fuzzy search
of the
universe of discourse set in accordance with embodiments described herein;
1561 FIG. 6 illustrates a graphical representation of a fuzzy search of the
universe of
discourse set based on assigned membership values in accordance with
embodiments
described herein
[571 FIG. 7 illustrates a flow chart of a method of generating a universe of
discourse
set in accordance with embodiments described herein;
[681 FIG. 8 illustrates a flow chart of a method of sorting the design space
based on
priority factors in accordance with embodiments described herein;
1591 FIG. 9 illustrates a flow chart of another method of sorting the design
space
based on priority factors in accordance with embodiments described herein.-
-25-
CA 02741253 2011-05-27
[60] FIG. 10 illustrates a flow chart of a method of generating a universe of
discourse
set in accordance with embodiments described herein;
[61] FIG. 11 illustrates an example hierarchical tree structure in accordance
with
embodiments described herein;
[62] FIG. 12 illustrates a flow chart of a method of generating a universe of
discourse
set in accordance with an example embodiment;
[63] FIG. 13 illustrates a representation of a universe of discourse set in
accordance
with an example embodiment;
[64] FIG. 14 illustrates a representation of a universe of discourse set in
accordance
with an example embodiment:
[65] FIG. 15 illustrates a flow chart of a method for developing a system
architecture
using the selected variant in accordance with an example embodiment;
[66] FIG. 16 illustrates a sequencing and binding graph in accordance with an
example embodiment;
[67] FIG. 17 illustrates a sequencing and binding graph with data registers in
accordance with an example embodiment;
[66] FIG. 10 illustrates the cycle time calculation for the combination of
resources
specified by the selected variant in accordance with an example embodiment;
[69] FIG. 19 illustrates a block diagram of the data path circuit for the
resources
specified in the selected vector in accordance with an example embodiment; and
[70] FIG. 20 illustrates a schematic structure of a device designed in
accordance with
an example embodiment.
[71] The drawings, described below, are provided for purposes of illustration,
and not
of limitation, of the aspects and features of various examples of embodiments
of the
invention described herein. The drawings are not intended to limit the scope
of the
applicants' teachings in any way- For simplicity and clarity of illustration,
elements
shown in the figures have not necessarily been drawn to scale. The dimensions
of some
of the elements may be exaggerated relative to other elements for clarity.
Further,
where considered appropriate, reference numerals may be repeated among the
figures
to indicate corresponding or analogous elements.
-26-
CA 02741253 2011-05-27
Description of Exemplary Embodiments
(72] It will be appreciated that numerous specific details are set forth in
order to
provide a thorough understanding of the exemplary embodiments described
herein.
However, it will be understood by those of ordinary skill in the art that the
embodiments
described herein may be practiced without these specific details. In other
instances,
well-known methods, procedures and components have not been described in
detail so
as not to obscure the embodiments described herein. Furthermore, this
description is
not to be considered as limiting the scope of the embodiments described herein
in any
way, but rather as merely describing implementation of the various embodiments
described herein.
173] The embodiments of the systems and methods described herein may be
implemented in hardware or software, or a combination of both. However, these
embodiments may be implemented in computer programs executing on programmable
computers, each computer including at least one processor, a data storage
system
(including volatile and non-volatile memory and/or storage elements), and at
least one
communication interface. For example, the programmable computers may be a
server,
network appliance, set-top box, embedded device, computer expansion module,
personal computer, laptop, personal data assistant, or mobile device. Program
code is
applied to input data to perform the functions described herein and to
generate output
information. The output information is applied to one or more output devices,
in known
fashion. In some embodiments, the communication interface may be a network
communication interface. In some embodiments, the communication interface may
be a
software communication interface, such as those for inter-process
communication. in
still other embodiments, there may be a combination of communication
interfaces.
(74] Each program may be implemented in a high level procedural or object
oriented
programming or scripting language, or both, to communicate with a computer
system.
However, alternatively the programs may be implemented in assembly or machine
language, if desired. In any case, the language may be a compiled or
interpreted
language. Each such computer program may be stored on a storage media or a
device
(e.g. ROM or magnetic diskette), readable by a general or special purpose
programmable computer, for configuring and operating the computer when the
storage
-27-
CA 02741253 2011-05-27
media or device is read by the computer to perform the procedures described
herein.
Embodiments of the system may also be considered to be implemented as a non-
transitory computer-readable storage medium, configured with a computer
program,
where the storage medium so configured causes a computer to operate in a
specific
and predefined manner to perform the functions described herein.
1751 Furthermore, the system, processes and methods of the described
embodiments
are capable of being distributed in a computer program product including a
physical
non-transitory computer readable medium that bears computer usable
instructions for
one or more processors. The medium may be provided in various forms, including
one
or more diskettes, compact disks, tapes, chips, magnetic and electronic
storage media,
and the like. The computer useable instructions may also be in various forms,
including
compiled and non-compiled code.
[76] Embodiments described herein may provide Design Space Exploration (DSE)
with multi parametric objective in High Level Synthesis (HLS) which involves
assessing
variants of an architecture design space to find an optimum (or near optimum)
solution
or variant for the architecture design according to the system requirements
specified.
Due to the time to market pressure, the cost of solving the problem of
architecture
variant selection by exhaustive analysis may be forbidden. The tradeoffs
linked to the
selection of the appropriate variant during architecture evaluation may
require careful
assessment for efficient design space exploration. Further DSE requires
satisfying
multiple (and in some cases conflicting) objective conditions and constraints
such as
increase in accuracy of evaluation during DSE with simultaneous speedup in the
exploration process. Embodiments described herein may provide DSE based on a
fuzzy
search technique for architecture design space evaluation and variant
selection. Other
embodiments described herein may provide a hybrid design space exploration
based on
a combination of a fuzzy search technique and priority factors for
architecture evaluation
and selection. Further embodiments described herein may provide a hybrid
design
space exploration based on a combination of a fuzzy search technique and a
hierarchical tree structure.
[77] Complex Digital Signal Processing (DSP) Very Large Scale integration
(VLSI)
designs may be possible due to DSE techniques. These techniques may provide
the
-28-
CA 02741253 2011-05-27
platform for superior designing by applying the most suitable architecture
variant
according to the user specifications and constraints. Each step in the design
process
may require efficient usage of the existing conditions and resources.
Optimization may
also be required and may only be achieved by setting system unconstrained
parameters
in such a way so as to maximize performance while satisfying multiple design
specifications including multiple optimization parameter constraints and
resource
constraints- The HLS design process used for the development of the multi-
objective
VLSI designs and the complex system-on-chip designs may be characterized by
combined use of heterogeneous techniques, methodologies and significant
decision
making strategies with which an architectural model is gradually carved out
step by step
on the basis of the user specifications and system requirements. Furthermore,
an ever-
wide array of embedded Application Specific Integrated Circuits (ASICs) have
been
designed and deployed to satisfy the explosive growth in increase of demand of
electronic devices All the application domains ranging from highly efficient
but less
flexible ASICs to the highly complex System-on-Chip (SoC) designs require
proficient
multi objective optimized design methodology where the cost of analyzing the
architectural variants for selection meets the design objectives specified.
Proliferation of
the mentioned VLSI circuits in today's modern portable and other high end
electronic
devices may be possible due to efficient IOSE methodologies. Embodiments
described
herein may provide a new framework for DSE with a fuzzy search approach to
explore
the architecture design space with reduced analysis time for the evaluation
and
selection of the architecture with multiple optimization parameter
requirements or
constraints such as hardware area, time of execution and power consumption.
[78] The high level synthesis methodology may contain a sequence of tasks to
convert the abstract behavioral description of the algorithm into its
respective structural
block at register transfer (RT) level. The design at the RT level may comprise
functional
units such as ALU's, storage elements, registers, busses and interconnections-
HLS
may offer advantages such as productivity gains and efficient IDSE. Performing
OSE at
a higher level of abstraction may pay more dividend than at lower levels of
abstraction
i.e. transistor level or logic level. Traditional high level synthesis design
methodology
may be much simpler than modern design techniques. In general the initial step
of
-29-
CA 02741253 2011-05-27
synthesis is to compile the behavioral specification into an internal
representation. The
next step is to apply high level transformation techniques with'the aim to
optimize the
behavior as per the desired performance. In order to realize the structure, a
final step is
to perform scheduling to determine the time at which each operation is
executed and
the allocation, which is synthesizing the necessary hardware to perform the
operations.
Scheduling can be of two different classes: time constrained scheduling and
resource
constrained scheduling. Time constrained scheduling refers to finding the
minimum cost
schedule that satisfies the given set of constraints with the given maximum
number of
control steps. Resource constraint scheduling on the other hand refers to
finding the
fastest possible schedule that satisfies the given set of constraints with the
given
maximum number of resources. Resource constraints may be generally specified
by the
area occupied by the functional units such as adders/subtractors, multipliers,
dividers
and AWUs. Although the data path of the system consists of registers and
interconnections they are not considered to be included as resource
constrained
because they are difficult to specify. High level synthesis can be broadly
divided into the
following steps: input description, internal representation, design space
exploration,
allocation, scheduling and binding. Therefore the final structure at the RT
level may
include the data path and the control path. The new generation of system
designs may
require multi parametric optimization strategies in HLS while simultaneously
utilizing
rapid and efficient DSE approaches for finding the best suitable architecture.
[791 Embodiments described herein may avoid constructing hierarchical
structures for
architecture evaluation and thereby may minimize time overhead. Further
embodiments
described herein may avoid evolutionary algorithms that may be slow in finding
the
global optimum solution and do not always guarantee the selection of global
optimum
and might eventually end up in finding the local minima. Embodiments described
herein
consider multi objective problems (such as area, delay and power consumption,
for
example)- Embodiments described herein may avoid using a genetic algorithm,
which
may be inherently slow in nature and does not always guarantee reaching the
global
optimum solution. The chances of yielding the local minima always exist. Due
to current
time to market pressure, the objective specification has become equally
important as
intended functionality of the system. So exploration approaches should not
just produce
-30-
CA 02741253 2011-05-27
the correct optimal architecture but should also be able to find the optimal
solution with
increased acceleration to satisfy the time to market pressure conditions.
Embodiments
described herein may be capable of accurately and rapidly evaluating the
design space
for finding the optimal design solution and can thereby assist the designers
in finding
the best architecture for the design with increased acceleration-
[80] Embodiments described herein use a fuzzy search technique to search the
universe of discourse set (the sorted design space) to identify variants that
satisfy
constraint values for optimization parameters and resource constraints.
Embodiments
described herein may be directed to a design flow starting with the real
specification and
formulation to receive the constraint values as input, and eventually
obtaining the
register transfer level structure performing DSE. As an illustrative example,
three
optimization parameters may be optimized during the following demonstration of
design
flow for high level synthesis; however, more than three optimization
parameters may be
optimized and different combinations of parameters may be optimized. This
illustrative
example will be based on the following optimization parameters: power
consumption,
time of execution and hardware area of the resources, but another example
includes
cost of resources.
[81] Reference is first made to FIG. 1. which illustrates a block diagram of a
system
10 for developing a system architecture in accordance with embodiments
described
herein. System may include a resource constraint module 12, an optimization
parameter
constraint module 14, a design space module 16, a priority factor module 18, a
satisfying set module 20, an intersection module 22, a selection module 24, a
system
architecture module 26, and a fuzzy search module 28
[82] System 10 may be implemented using a server which includes a memory
store,
such as database(s) or file system(s), or using multiple servers or groups of
servers
distributed over a wide geographic area and connected via a network. System 10
has a
network interface for connecting to network in order to communicate with other
components, to serve web pages, and perform other computing applications.
System 10
may reside on any networked computing device including a processor and memory,
such as an electronic reading device, a personal computer, workstation,
server, portable
computer, mobile device, personal digital assistant, laptop, smart phone, WAP
phone,
31-
CA 02741253 2011-05-27
an interactive television. video display terminals, gaming consoles, and
portable
electronic devices or a combination of these. System 10 may include a
microprocessor
that may be any type of processor, such as, for example, any type of general-
purpose
microprocessor or microcontroller, a digital signal processing (DSP)
processor, an
integrated circuit, a programmable read-only memory (PROM), or any combination
thereof. System 10 may include any type of computer memory that is located
either
internally or externally such as, for example, random-access memory (RAM),
read-only
memory (ROM), compact disc read-only memory (CDROM), electro-optical memory,
magneto-optical memory, erasable programmable read-only memory (EPROM), and
electrically-erasable programmable read-only memory (EEPROM), or the like.
System
10 may include one or more input devices, such as a keyboard, mouse, camera,
touch
screen and a microphone, and may also includes one or more output devices such
as a
display screen and a speaker. System 10 has a network interface in order to
communicate with other components by connecting to any network(s) capable of
carrying data including the Internet, Ethernet, plain old telephone service
(POTS) line,
public switch telephone network (PSTN), integrated services digital network
(ISDN),
digital subscriber line (DSL), coaxial cable, fiber optics, satellite, mobile,
wireless (e.g.
WW-Fi, WiMAX), S87 signaling network, fixed line, local area network, wide
area
network, and others, including any combination of these.
183] Resource constraint module 12 is operable to define a plurality of
resources
constraints maxNR, ...... maxNRõ , for a plurality of kinds of resources
R1.... ...... Rn. Each
resource constraint maxNR, is a maximum number, I s i s n, of a kind of
resources Ri
available to construct the system architecture, where n is an integer greater
than 1.
Examples of resources include adders, subtractors, clock oscillators,
multipliers,
dividers, comparators, Arithmetic Logic Units (ALU), integrators, summers and
other
functional modules. An example of a resources constraint is a maximum amount
of
each type of resource available to construct the system architecture.
[84] Optimization parameter constraint module 14 is operable to define
constraint
values comprising a constraint value for each optimization parameter of at
least three
optimization parameters for the system architecture- The at least three
optimization
parameters comprise a final optimization parameter. Examples of optimization
- 32 --
CA 02741253 2011-05-27
parameters include hardware area, cost of resources, time of execution, and
power
consumption.
[85] Design space module 16 is operable to define a design space as a
plurality of
variants representing different combinations of a number of each kind of
resource
R1....... Rn available to construct the system architecture. Each variant is a
vector of the
form:
Võ Wx...... Nj)
wherein NR, 1:9 i:5 a represents the number of the Kind of resource Ri, where
based on
the resource constraints, I :s Nz :5 max N.,,;
[86] Satisfying set module 20 is operable to generate a universe of discourse
set by
sorting the plurality of variants of the design space- Satisfying set module
20 is operable
to determine a satisfying, or near satisfying, set of variants from the
universe of
discourse set by determining a border variant of the universe of discourse
set. Each
variant of the satisfying set satisfies (or nearly satisfies) the constraint
value for the
optimization parameter- If a variant obeys the constraints or exceeds the
constraint
value by an acceptable amount, such as 5-10% for example, or some other
configured
amount, then the variant may satisfy or nearly satisfy the constrain value
respectively.
The border variant is the last variant of the universe of discourse set to
satisfy (or nearly
satisfy) the constraint value for the optimization parameter such that all
variants to one
side of the border variant in the universe of discourse set satisfy (or nearly
satisfy) the
constraint value for the optimization parameter and all variants to the other
side of the
border variant in the universe of discourse set do not satisfy the constraint
value for the
optimization parameter. Satisfying set module 20 is operable to generate a
universe of
discourse set by sorting the variants of the design space using priority
factors, an
arrangement criterion function, a hierarchical tree structure, a priority
order, and other
sorting methodologies. Satisfying set module 20 is operable to determine a
satisfying
set of variants for each of the optimization parameters except for the final
optimization
parameter. Satisfying set module 20 is operable to interact with fuzzy search
module 28
to determine the satisfying set of variants by searching the universe of
discourse set.
[87] Fuzzy search module 28 is operable to conduct a fuzzy search of a
universe of
discourse set to identify the border variant. Fuzzy search module 28 is
operable to
-33-
CA 02741253 2011-05-27
assign a membership value to each variant of the universe of discourse, where
each
membership value is from the interval [0,1]_ A membership value assigned to a
variant
of the universe of discourse set indicates the position of the respective
variant in the
universe of discourse set. Fuzzy search module 28 is operable to search the
variants of
the design space based on the assigned membership values. In general,
searching the
design space to identify a set of variants that represents a combination of
resources to
construct a system architecture in view of constraint values for optimization
parameters
and resource constraints can be a tedious and time consuming task. The search
may
demand great accuracy and elaborate analysis of the variants in the design
space.
System 10 is operable to implement a fuzzy search of the universe of discourse
set (the
sorted design space) to identify variants in a relatively short period of
time, as compared
to an exhaustive search for example.
[88] Satisfying set module 20 is operable to determine a satisfying set of
vectors for
each of the optimization parameters (except the final optimization parameter)
by
interacting with fuzzy search module 28 to determine the border variant from
the
universe of discourse set Fuzzy search module 28 is operable to determine the
border
variant by conducting a fuzzy search of the universe of discourse set using
the
corresponding membership values.
[89] Intersection module 22 is operable to determine a set of variants based
on an
intersection of the satisfying sets of variants for the optimization
parameters.
[90] Selection module 24 is operable to select a variant for use in developing
the
system architecture by, for the final optimization parameter, generating an
ordered list of
variants by sorting the set of variants, and selecting a variant from the set
of variants
based on the ordered list of variants.
[91] System architecture module 26 is operable to develop the system
architecture
using the selected variant. The system architecture may comprise a Register
Transfer
Level data path circuit and a Register Transfer Level control timing sequence.
The
Register Transfer Level data path circuit is configured to generate output
data as a
result of performing a sequence of operations on data using Register Transfer
Level
modules. The Register Transfer Level modules include the number of each Kind
of
resources represented by the selected variant. Examples of Register Transfer
Level
34
CA 02741253 2011-05-27
modules include registers for storage of data, memory modules, latches for
sinking of
data, multiplexers and demultiplexers_ The Register Transfer Level control
timing
sequence provides a control configuration for a data path circuit to provide
timing and
synchronization required by data traversing through the Register Transfer
Level
modules of the data path circuit.
(921 In accordance with some embodiments, satisfying set module 20 is operable
to
interact with priority factor module 18 to generate the universe of discourse
set. System
may include a priority factor module 18 that is operable to define, for each
optimization parameters, a priority factor function for each kind of resource
R1....... Rn.
10 A priority factor function defines a rate of change of the optimization
parameter with
respect to a change in a number NR, of the corresponding kind of resource Ri,
Is is n.
Examples of priority factor functions are illustrated herein in relation to
hardware area,
execution time, cost, and power consumption- Other priority factor functions
may also
be used by system 10 for these optimization parameters, and for other
optimization
parameters.
[931 Reference is first made to FIG. 2 which illustrates a flow chart of a
method 100a
of determining a variant representing a combination of a number of each kind
of
resource available to construct a system architecture in accordance with
embodiments
described herein, and FIG. 3 which illustrates a flow chart of a method 100b
of
developing a system architecture in accordance with embodiments described
herein.
1941 At step 102, system 10 defines resources constraints maxNR,......
maKNie,,, for a
plurality of kinds of resources R 1.... - Rn, wherein each resource constraint
maxNR4 is a
maximum number, Is i s n, of a kind of resources Ri available to construct the
system
architecture, n being an integer greater than 1. Examples of kinds of
resources include
adders, subtractors, clock oscillators, multipliers, divider, comparator, ALU,
integrator,
summer and other functional modules. This step may be part of the problem
description
and technical specifications definition stage, which provides input data for
the high level
synthesis tools. Resource constraints may also include specifications and data
regarding the kind resources such as number of clock cycles required for each
operation, area occupied by each unit of each kind of resource, cost of each
unit of
-35-
CA 02741253 2011-05-27
each Kind of resource, power consumed by each unit of each kind of resource,
and so
on_
[95] At step 104, system 10 defines a constraint value for each of at least
three
optimization parameters for the system architecture, wherein the at least
three
optimization parameters comprise a final optimization parameter. Examples of
optimization parameters include hardware area, cost, time of execution, and
power
consumption. A constraint value may be a maximum or minimum value for a
respective
optimization parameter. For example, if an optimization parameter is power
consumption then a constraint value may be a maximum power consumption for the
system architecture. The optimization parameters include a final optimization
parameter, which provides a frame of reference to evaluate the set of variants
to select
a variant for developing the system arcnitecture. The developed system
architecture
comprises, for each Kind of resource Ri in the plurality of kinds of resources
R1._._.Rn,
the number Ni of that kind of resource defined by the selected variant.
[961 In accordance with embodiments described herein, system 10 is operable to
validate the constraint values for the optimization parameters.
[971 System 10 performs this validation step as a first screening level of
check by
performing a Minimum-Maximum evaluation for the constraint to verify whether
the
constraints specified are valid and feasible.
[901 System 10 is operable to perform this validation using the following
example
inputs: Module Library, Data Flow Graph (or Mathematical function) of the
application
and constraints values. System 10 is operable to produce the following output:
the
decision whether the design process continues or terminates (i.e. constraints
are valid
or invalid). System 10 is operable to perform the validation according to the
following
algorithm.
Repeat for all the constraints values specified
f
1. Calculate the minimum value of the optimization parameter under
consideration. For
the parameter discussed in the supporting document, calculate the minimum
value of
the hardware area (power consumption)/execution time based on the minimum
-36-
CA 02741253 2011-05-27
resource/maximum resource (considering that whichever parameter among hardware
area, power consumption or execution time is the first user constraint) using
any one of
the functions described below based on the user requirement:
In case of hardware Area-
= (NRI ' K.. + Nxz - K'n ... t N,,. xa,) t A(ka )
where, NR, represents the number of resource Ri and is equal to 1 for all
cases.
Therefore for calculating the minimum area, NR, = NR2 = NR3 = ... = NRn= 1.
Also 'KR;
represents the area occupied per unit resource RI which is obtained from the
user as
input. A(Rclk) refers to the area of clock oscillator used as a resource
providing the
necessary clock frequency to the system. 'KR,' represents the area occupied
per unit
resource 'Ri' (1 <=i<=n)-
In case of power consumption:
Pmjn =(NRI =Kg, +NR2 - K,_ +..t1VR,. -KKn)=p--
Therefore for calculating the minimum area, NR, = NR2 = NR3 = -.- = NRn= 1 _
Moreover, `p-
c' is the slowest clock frequency available in the module library which
consumes the
least power per unit area.
In case of execution time:
7 =[L+(N-1)-T j
'L' and 'TC' should be calculated based on minimum resources considering NR, =
NR2 =
NR3 = .. = NR.= 1- '1..' represents latency of execution, 'Tc' represents the
cycle time of
execution during data pipelining. Also, 'N' is the number of sets of data to
be pipelined
obtained from library (users input).
2. Calculate the maximum value of the optimization parameter under
consideration.
Calculate the maximum value of the hardware area based on the minimum resource
(considering that hardware area is the first user constraint) using the
function described
below-
3a
-37-
CA 02741253 2011-05-27
In case of hardware Area:
A -(Nk;=KR,tNR,=Kkzt...tNm,Kw,)t4(RCA)
Where, NR, represents the number of resource R;. Therefore for calculating the
maximum area, NR1 = NR2 = NR3 = ... = NRn = Maximum resource of certain
functional unit
specified by user in the library. Also 'KRi represents the area occupied per
unit resource
'RP which is obtained from the user as input.
In case of power consumption:
Pm., = (Nx, . K,RI + Nut ' KR2 t .. t Nita . Kw, )=p,.
Therefore for calculating the minimum area, NR, = Nn;-- NR3 = ..- = NRõ =
Maximum
resource of certain functional unit specified by user in the library.
Moreover, 'p,' is the
fastest clock frequency available in the module library which consumes the
maximum
power per unit area
in case of execution time:
Tm~0 =[Lt(N-1) z l
'L' and 'Tc' should be calculated based on maximum resources considering NRi =
NR2 =
NR3 = ... = NRn= Maximum resource of certain functional unit specified by user
in the
library.
Check if Constraint specified satisfies the upper threshold (maximum value)
and
lower threshold (minimum value) of the parameter calculated above in steps 1
and 2.
In other words, let the constraint for hardware area is'Aconsz', constraint
for power
consumption is 'Pconst' and constraint for execution time is 'Twnst'= Then,
the following
conditions are checked:
Airun Acons,<= A,uxx (For Hardware area)
Tm,n `=TLonsj c-- Tmaõ (For Execution Time)
Pmm <=pcons-<= P," ," (For Power consumption)
If the above conditions satisfy then, the design process continues
-38-
CA 02741253 2011-05-27
Elseif the above conditions fail then the design process stops and prompt for
correction
of constraint values.
END
[99] At step 106, system 10 defines a design space as a plurality of variants
representing different combinations of a number of each kind of resource
R1,...... Rn
available to construct the system architecture. Each variant is a vector of
the form:
Võ = (NR,,....NRf )
where NR, 14 I s n represents the number of the kind of resource Ri, wherein
based on
the resource constraints, 1 -s Ng, ;s max NR,.
[100] This initial arrangement of variants of the design space can be made in
any
order. The design space is used by system 10 to visualize the total
architectural variants
available to construct the system architecture. The design space can change
based on
the resources available to construct the system architecture, as defined by
the resource
constraints. The design space is created according to the resource constraints
for total
available resources available to construct the system architecture and can
represent all
different combinations of each kind of available resource, or a subset
thereof.
[101] At step 108, system 10 determines a plurality of satisfying sets of
variants-
Specifically, system 10 is operable to determine at least one satisfying set
for each
optimization parameter. In some examples, system 10 is operable to determine
at least
one satisfying set for each optimization parameter except for the final
optimization
parameter. Each variant in the satisfying set of variants for a given
optimization
parameter satisfies the constraint value(s) for the respective optimization
parameter.
[102] Referring now to FIG. 4, there is shown a flowchart of a method of
determining a
satisfying set of variants for an optimization parameter. In accordance with
some
embodiments described herein, system 10 is operable to implement the method
134 for
each optimization parameter, except the final optimization parameter, in order
to
determine a satisfying set of variants for each optimization parameter-
[103] At step 130, system 10 generates a universe of discourse set by sorting
the
variants of the design space. System 10 is operable to sort the variants of
the design
space using various sorting methodologies. For example, system 10 is operable
to sort
-39-
CA 02741253 2011-05-27
the variants of the design space using priority factors, a hierarchical tree
structure, a
priority order, and an arrangement criterion function. Further details in
relation to sorting
the variants to generate the universe of discourse set will be described in
relation to
FIGS. 7, 8 9, and 10.
11041 At step 132, system 10 assigns a membership value to each variant of the
universe of discourse, where each membership value is from the interval 10,11.
The
resulting set of assigned membership values may be referred to herein as a
fuzzy set.
The fuzzy set may be the same size as the universe of discourse set, such that
there is
a 1-1 mapping between an assigned membership value and a variant. The
membership
value assigned to a variant of the universe of discourse set indicates the
position of the
respective variant in the universe of discourse set. For example, if a variant
is an
extreme position in the universe of discourse set (such as the first element
or last
element, for example) then the membership value assigned to that variant will
indicate
that the variant is in an extreme position. Such a membership value may also
be in an
extreme position in the fuzzy set. As another example, in accordance with some
embodiments, a variant that is the xth position in the universe of discourse
set may be
assigned a membership value that is in the xth position in the fuzzy set.
[1051 In accordance with embodiments described herein, system 10 is operable
to
determine a satisfying set of architectural variants for each optimization
parameter by
performing a fuzzy search of the universe of discourse set. A variant is a
vector that
represents a combination of a number of each kind of resource available to
construct
the system architecture. A fuzzy search is a search based on fuzzy logic
decision
making. Fuzzy logic decision making based searching during OSE in high level
synthesis may reduce the number of architectural variants to be analyzed for
selection
of an optimal system architecture, as compared to an exhaustive search for
example.
[1Q91 Fuzzy set theory involves manipulation of the fuzzy linguistic
variables. The basic
difference between the classical set theory and the fuzzy set theory is the
assignment of
every element x, a value from the interval [0, 1] instead of the two element
set {Q, 1).
where x E U, U being the set of elements. In fuzzy set theory, the
characteristic function
is generalized to a membership function that assigns a membership value to
every
-40-
CA 02741253 2011-05-27
element Y in the set of elements U. The membership function NF of a fuzzy set
F is a
function:
U-~[Q,f]
[1071 The set of sorted variants is referred to herein as the universe of
discourse set.
The architectural variants of the universe of discourse set are assigned
membership
values from the interval [0,11 such that the variants are represented in the
form of a
fuzzy set of membership values between 0 and 1. Each variant corresponds to an
assigned membership value in the fuzzy set based on the characterized
membership
function. The membership value will be assigned to each variant in such a way
so as to
reflect the way that the variants of the universe of discourse set are sorted-
That is,
each assigned membership value indicates the position of the corresponding
variant in
the universe of discourse set. For example, the assigned membership values are
either
increasing or decreasing from the left to the right of the extreme of the
fuzzy set to
reflect the increasing or decreasing order of the corresponding variants in
the universe
of the discourse set. In this theory, only the variants in the extreme
positions in the
universe of discourse set optimization parameter values (which are the minimum
and
the maximum values or maximum and minimum optimization parameter values) are
calculated at the beginning. The membership value of the variants between the
two
variants in the extreme positions will be considered to be directly
proportional to the
position of the variants in the sorted arrangement. The membership value of a
variant
can be calculated by the equations (1) or (2) depending on the kind of
optimization
parameter being considered-
x-a
T = (1)
¾-a
x-/3
Or. T = (2)
a-~B
[108J If the optimization parameter is a time of execution then system 10 is
operable to
calculate the membership values assigned to the variants using equation (2) is
used.
otherwise system 10 is operable to calculate the membership values assigned to
the
variants using equation (1)
-41
CA 02741253 2011-05-27
11091 The optimization parameter value of the variant is assumed to be
proportional to
the position of the variant in the universe of discourse set (i_e. the sorted
design space).
In equation (1) and (2), 'x' is the position of the variant; 'T' represents
the assigned
membership value of the variant which is the xv' position in the universe of
discourse
set; 'a' and '3' are the order of the first element and Me last element in the
universe of
discourse set, thus 'a' is equal to 1 and '[i' is equal to the total number of
variants in the
universe of discourse set.
[1101 A graphical representation of the above function represents a straight
line which
will aid in finding the border variant. The border variant is the last variant
of the universe
of discourse set to satisfy the constraint value for the optimization
parameter such that
all variants to one side of the border variant in the universe of discourse
set satisfy the
constraint value for the optimization parameter and all variants to the other
side of the
border variant in the universe of discourse set do not satisfy the constraint
value for the
optimization parameter. For example, the border variant will be the first
variant that
satisfies the constraint value for the optimization parameter execution time
constraint
and the border variant is the last variant that satisfies the constraint value
for the
optimization parameter area or power.
[111] Referring now to FIG. f, there is shown graphical representations 20Qa,
200b,
200c, 200d of a fuzzy search algorithm used to search for the border variant
in
accordance with embodiments described herein. In particular, there is shown
graphical
representations 200a, 200c of the fuzzy search algorithm for searching a
greater and
lesser optimization parameter value, respectively, such as hardware area or
power
consumption, and graphical representations 200b, 200d of the fuzzy search
algorithm
for searching a greater and lesser optimization parameter value, respectively,
such as
execution time. The x-axes 202a, 202b, 202c, 202d of the graphical
representations
200a, 200b, 200c, 200d refer to the architectural variants of the universe of
discourse
set. The y-axes right side 206a, 206b, 206c, 206d refer to the optimization
parameter
values of the variants and the y-axes left side 204a, 204b, 204c, 204d refer
to the
membership values assigned to the variants The variable 'Tg' refers to the
membership
value for the border variant for the optimization parameter. Similarly, 'Tv'
is the
membership value for the variant under test and Vvar,ant is its respective
value. Similarly,
-42-
CA 02741253 2011-05-27
TMin' and 'TMax' are the membership values assigned to the minimum and maximum
variants in the universe of discourse set, while 'Max' and 'Min' are its
respective
optimization parameter values.
1112] The increasing trend line from left to right of the graphical
representations 200a,
200c for area and power consumption optimization parameters and the decreasing
in
trend line from left to right of the graphical representations 200b, 200d for
execution
time optimization parameter are represented by membership values assigned to
each
variant in the universe of discourse set. The optimization parameter value of
each
variant may be directly proportional to its assigned membership value.
1113] The trend line shown in the graphical representations 200a, 200c for
area and
power consumption optimization parameters represents the increase in
membership
values assigned to each variant in universe of discourse set for area and
power
consumption optimization parameters. The trend line shown in the graphical
representations 200b, 200d for execution time represents the decrease in
membership
values assigned to each variant in the universe of discourse set for the
execution time
optimization parameter. The assigned membership values may not be calculated
using
separate functions for each variant but instead are calculated by applying
equations (1)
and (2) to all variants in the universe of discourse set. System 10 is
operable to sort the
variants of the design space for each optimization parameter to generate a
universe of
discourse set for each optimization parameter and also operable to assign
membership
values to each variant in a way that preserves the order of the variants. For
example,
the variant in the xth position of the universe of discourse set may be
assigned a
membership value that has the xth position with respect to the other assigned
membership values. This is because actual optimization parameter values of the
variants in the design space may be proportional (e.g. directly proportional)
to the
membership values assigned to those variants
1114] A graphical representation 200a illustrates the increase in membership
value for
the area (or power consumption) optimization parameter to illustrate that the
actual area
optimization parameter value for the variants increases in the universe of
discourse set
for area. The optimization parameter values for the variants are approximated
by the
straight line from points 0 208a to R 210a drawn from origin to the maximum.
The point
43
CA 02741253 2011-05-27
M 212a refers to the point in the line corresponding to the constraint value
for the
respective optimization parameter (Vsoraer) system 10. The point 'V,' 214a
indicates the
initial variant closest to the calculated initial membership value (Tin,). The
point P 216a is
a point in the straight line corresponding to the assigned membership value
(h) and the
optimization parameter value (Vvarjani) of variant 'V,'. Now if for example,
the
optimization parameter value (Vvarranr) calculated is less than the constraint
value
(Vso,r), then the search should be performed between points P 216a and point R
210a.
A second straight line from points P 216a and R 210a is approximated for the
increase
in membership values assigned to variants for the area/power optimization
parameter.
In this straight line point N 218a corresponds to the constraint value for the
optimization
parameter- Using the similarity between the triangles 4 PNQ (created by points
P 216a,
N 218a, and Q 220a)ana A PRS (created by points P 216a, R 210a, and S 222a)
the
following function (3) can be attained:
--cY Max- Vv~ (3)
xa -T , VI-I, - VYarwau
[115] System 10 is operable to conduct a similar analysis for graphical
representation
220b with a decreasing trend line for the time of execution optimization
parameter. The
trend line shows the decrease in magnitude of membership value based on the
decrease in actual execution time optimization parameter values of the
variants in the
universe of discourse set for the time of execution optimization parameter.
(1161 A graphical representation 200c illustrates the increase in trend line
for area
optimization parameter (or power consumption optimization parameter). The
point 'M'
212c refers to the point on the trend line corresponding to the constraint
value for the
optimization parameter (V rier). The point 'V,' 214c indicates the initial
variant closest
to the calculated initial membership value (r,,). The point 'P' 216c is a
point in the trend
line corresponding to the actual membership value (rõ) and the actual
optimization
parameter value of the variant (Vvarianm) of variant 'V,'. For example, if the
calculated
variant optimization parameter value is more than the constraint value for the
optimization parameter (Vsoraer) then the system 10 should perform the search
between
points 'P' 216c and point 'Q' 208c. System 10 is operable to approximate a
second
straight tine for the increase in membership values for area optimization
parameter (or
-44-
CA 02741253 2011-05-27
power consumption optimization parameter), and the point 'N' 218c is a point
corresponding to the constraint value for the optimization parameter (Va er).
Now using
the similarity between the triangles A MPN (created by points M 212c, P 216c,
and N
218c) and 4 RPO (created by points R 210c, P 216c, and 0 208c) the following
function
(4) can be derived:
TMm_Tr min -Vvwknr (4)
T-0 - Tp Vam. - Yy1WW
[117] System 10 is operable to conduct a similar analysis for graphical
representation
200d with a decreasing trend line for the time of execution optimization
parameter to
derive the function (4) above. The trend line shows the decrease in value of
membership value based in the decrease in actual execution time optimization
parameter values for the variants.
(118] System 10 is operable to assign membership values to each variant of the
universe of discourse set for each optimization parameter based on the trend
line
(increasing or decreasing) for the optimization parameter, by using equations
(1) or (2)
for example. System 10 is operable to calculate the membership value assigned
to a
given variant as a function of the position of the variant in the universe of
discourse set,
and the order of the first and last elements of the universe of discourse set,
namely 1
and the total number of variants in the universe of discourse set.
[119] Referring back to FIG. 4, at step 134, system 10 determines a satisfying
set of
variants from the universe of discourse set by determining a border variant.
Each vector
of the satisfying set satisfies (or nearly satisfies) the constraint value for
the optimization
parameter. The border variant is the last variant of the universe of discourse
set to
satisfy the constraint value for the optimization parameter such that all
variants to one
side of the border variant in the universe of discourse set satisfy the
constraint value for
the optimization parameter and all variants to the other side of the border
variant in the
universe of discourse set do not satisfy the constraint value for the
optimization
parameter. System 10 determines the border variant by performing a fuzzy
search of
the universe of discourse set using the corresponding membership values
assigned to
the variants of the universe of discourse set.
45-
CA 02741253 2011-05-27
[120] Referring now to FIG. 5 there is shown a flowchart of a method 134 of
determining the border variant by conducting a fuzzy search of the universe of
discourse set using the assigned membership values. The fuzzy search algorithm
may
be used as a means for searching the design space to identify the border
variant after it
has been sorted to generate the universe of discourse set. in accordance with
some
embodiments described herein, system 10 is operable to implement the method
134 for
each optimization parameter, except the final optimization parameter, in order
to identify
the border variant in the universe of discourse set for each optimization
parameter.
11211 At step 140, system 10 calculates an initial membership value. System 10
is
operable to calculate the initial membership value as a function of a maximum
value of
the optimization parameter, a minimum value of the optimization parameter, and
a value
for the border variant based on the constraint value of the respective
optimization
parameter.
[122] In accordance with some embodiments, system 10 calculates the initial
membership value based on the following function (5):
V' Min
T `"' - Max - Min (5)
wherein z,,, is the initial membership value, V,,,,,,,, is the constraint
value for the
respective optimization parameter value, Min and Max are the minimum and
maximum
values for the respective optimization parameter.
[123] At step 142, system 10 determines the variant (from the variants of the
universe
of discourse set) that is assigned a membership value that is closest to the
initial
membership value, which is referred to herein as the closest variant.
[1241 At step 144, system 10 determines whether it has already considered or
checked
the closest variant while implementing the method 134.
[1251 If the system 10 has not already checked that closest variant, then at
step 146,
system 10 calculates an optimization parameter value for the respective
closest variant.
Each optimization parameter may be expressed as a function and that respective
function may be used to determine the optimization parameter value for a given
variant.
[1261 At step 148, system 10 determines whether the border variant is found.
System
10 is operable to determine whether a border variant is found based on a
termination
-46-
CA 02741253 2011-05-27
condition. For example, if the universe of discourse is sorted in increasing
order, such
as for power consumption and hardware area, then system 10 is operable to
implement
the following termination condition: continue fuzzy search until a last
variant at position i
which satisfies Voorder is found where the variant at position i+1 (if
position i is not the
first position) does not satisfy (or nearly satisfy) Vporde. That is, the
border variant has
not been found until system 10 determines the last variant in position i for
which Pi<=
Vbaraer and is the variant with the optimization parameter value that is most
closest to
Vboraer. As another example, if the universe of discourse is sorted in
decreasing order,
such as for time of execution, then system 10 is operable to implement the
following
termination condition: continue fuzzy search until a first variant at position
i which
satisfies Vboraer is found where the variant at position i-1 (if position I is
not the first
position) does not satisfy (or nearly satisfy) Vnoror. That is, the border
variant has not
been found until system 10 determines the first variant in position i for
which Pi<= Vbprder
and is the variant with the optimization parameter value that is most closest
to Vbora$r=
[1271 If the border variant has not been found, at step 150, system 10
calculates a new
initial membership value and returns to step 144 to determine the closest
variant for the
new initial membership value. That is, system 10 returns to step 144 to
determine the
variant (from the variants of the universe of discourse set) that is assigned
a
membership value that is closest to the new initial membership value.
11281 In accordance with some embodiments, system 10 is operable to calculate
the
new initial membership value based on whether the optimization parameter value
of the
variant is less than or greater than a value for the border variant which is
based on the
constraint value of the optimization parameter, which may be referred to as
VBoraer=
[1291 In accordance with some embodiments, system 10 is operable to calculate
the
new initial membership value as a function of the assigned membership value of
the
variant that is in the maximum position in the universe of discourse set, the
assigned
membership value of the variant that is the minimum position in the universe
of
discourse set, the constraint value of the optimization parameter, the
assigned
membership values for the variants in the universe of discourse set, the
optimization
parameter value of the closest variant, the minimum optimization parameter
value of the
-47-
CA 02741253 2011-05-27
variants for the respective optimization parameter and the maximum
optimization
parameter value of the variants for the respective optimization parameter.
[1301 For example, in accordance with some embodiments, if the optimization
parameter value for the closest variant is less than the constraint value of
the
optimization parameter then system 10 may calculate the new initial membership
value
using a function (3) of the form:
t yt. -,r,, Max - VVurwn:
T$ -T' V"d, - VVarw,u (3)
[1311 Or else, system 10 May calculate the new initial membership value using
a
function (4) of the form:
r,00. -TV Min- VYrnw,
T B -T V _ ~~waer - rya. u (4)
wherein is the assigned membership value of the variant that is in the maximum
position in the universe of discourse set, tM,,is the assigned membership
value of the
variant that is the minimum position in the universe of discourse set, Tv is
the assigned
membership value of the variant in the universe of discourse set, z,, is the
calculated
new initial membership value, V,,,,,Qp, is the constraint value of the
optimization
parameter, V,,is the optimization parameter value of the closest variant, and
Max is
the maximum optimization parameter value of the variants for the respective
optimization parameter.
[1321 If at step 144, system 10 determines that the closest variant has
already been
checked, then at step 152, system 10 determines an unchecked variant having a
membership value that is the next membership value and set the new closest
variant to
be the unchecked variant.
[1331 For example, in accordance with some embodiments described herein,
system
10 may first compare the optimization parameter value of the closest variant
to the
constraint value for the optimization parameter. If system 10 determines that
the
optimization parameter value of the closest variant is less than the
constraint value for
the optimization parameter. then system 10 is operable to determine whether
the variant
-48-
CA 02741253 2011-05-27
assigned a membership value that is the next higher membership value has been
checked, and if not then system 10 sets that unchecked variant as the closest
variant
and proceeds to step 146. If system 10 determines that the optimization
parameter
value of the closest variant is more than the constraint value for the
optimization
parameter, then system 10 is operable to determine whether the variant
assigned a
membership value that is the next lower membership value has been checked, and
if
not then system 10 sets that unchecked variant as the closest variant and
proceeds to
step 146.
11341 If at step 148, system 10 determines that the border variant has been
found then
the method 134 ends at step 154. System 10 is operable to determine whether a
border
variant is found based on the termination condition described above. In
addition, if at
step 152, the system 10 determines that the optimization parameter value of
the closest
variant is equal to the constraint value for the optimization parameter, the
system 10 is
operable to determine that the border variant has been found at step 148 and
the
method 134 ends at step 154.
[135) The border variant indicates the last variant of the universe of
discourse to satisfy
(or nearly satisfy) the constraint value of the respective optimization
parameter. As an
example, if the optimization parameter is hardware area or power consumption
then the
universe of discourse set may sort the variants in increasing order of
magnitude and the
border variant may indicate the last variant to satisfy (or nearly satisfy)
the constraint
value. If the optimization parameter is execution time then the universe of
discourse set
may sort the variants in decreasing order of magnitude and the border variant
is the first
variant to satisfy (or nearly satisfy) the constraint value.
[1361 Referring back to FIGS. 2 and 3, after system 10 determines the
satisfying sets
of variants, at step 110, system 10 determines a set of variants based on an
intersection
of the satisfying sets of vectors for the optimization parameters. That is,
the set of
variants includes variants that are in each of the satisfying sets of
variants. In
accordance with some embodiments, each variant in the set of variants
satisfies (or
nearly satisfies) the constraint values for each optimization parameter. In
some
embodiments, the final optimization parameter may have a constraint value that
is
defined in such a way that a satisfying set of variants cannot be determined
-49-
CA 02741253 2011-05-27
independently of the satisfying sets of variants for the other optimization
parameters.
For example, the constraint value for the final optimization parameter may be
defined as
the minimum value for the respective final optimization parameter provided
that the
other constraint values have been met. For such an example, each variant in
the set of
variants satisfies or nearly satisfies) the constraint values for each
optimization
parameter except the final optimization parameter, based on how the constraint
value
for the final optimization parameter is defined. In accordance with some
embodiments,
the set of variants may be referred to as the pareto-optimal set.
[1371 In accordance with some embodiments, system 10 is operable determine
whether the constraint values are valid using the set of variants. System 10
performs
this constraints validation check by determining if the set of variants is
absolutely
vacant. A vacant set of variants signifies that the constraint values provided
are too
tightfstrict. If so, the strict constraint values of the given optimization
parameter need to
be relaxed to a certain extent. The algorithm for used by system 10 to detect
the
problem and resolve is described below:
1 _ Let the variant vectors obtained in the set of vectors (P) after applying
the proposed
design space exploration approach be P = {Va, VD, Vc..... V}, where Va, Vb,
V"...., Vn are
vectors of the design space that are elements of the set of vectors.
2. If the set of vectors, P = 4) (Null), then there exists no variants in the
set P. This
indicates that the constraint values are too tight and it needs to be relaxed.
This is
because there exist no variant vector from the design space that
simultaneously obeyed
the constraint values. Proceed to step #4-
3. Elseif P # co (not null), then there exists variants in the Pareto set, P.
Continue the
design process and stop the validation-
4. Relax the constraint values by a predetermined percentage, such as 5 % for
example, to set new constraint values for the optimization parameter. Using
this
illustrative example, the constraint for hardware area is 'Acou1', constraint
for power
- 50 -
CA 02741253 2011-05-27
consumption is 'Pconst and constraint for execution time is 'Tcor,s1', then
depending on the
user specified constraints, the new constraint values after applying the
relaxation phase
is as follows:
a) Acoaa (new) = Ate,,,, (Original) + 5% of A~,ri, (Origiinal)
b) Tc.. (new) = T... (original) -s- 5% of T,o., (original)
r) P.n%r (new) = Pro.,g (original) t 5% of Pcoon (original)
[1381 At step 112, system 10 selects a variant from the set of variant. System
10 is
operable to select a vector by sorting the variants in the set of variants,
ranking the
vectors or otherwise evaluating the variants from the set of variants in order
to generate
an ordered list of variants- System 10 is operable to select a variant based
on a position
of the variant in the ordered list of variants. For example, the constraint
value for the
final optimization parameter may be a minimum value provided the constraint
values for
all other optimization parameters are satisfied. In such a case, system 10 is
operable to
evaluate each vector of the set of vectors in view of the final optimization
parameter
function, priority function for final optimization parameter, or other
function that may be
used to evaluate the variants in view of the final optimization parameter.
System 10 is
operable to sort the set of variants using the various sorting methodologies
described
herein in relation to FIG. 7, 8, 9, and 10 After the variants in the set of
variants are
sorted system 10 is operable to select the first or last variant to satisfy
(or nearly or
substantially satisfy) the constraint value for the final optimization
parameter depending
on whether the variants are sorted in decreasing or increasing order,
respectively. In
some cases, the selected variant will be in the first or last position in the
sorted set of
variants depending on whether the variants are sorted in decreasing or
increasing
order. The selected variant defines a combination of a number of each kind of
resource
available to construct the system architecture, which satisfies or nearly
satisfies the
constraint values for the optimization parameters and the resource
constraints. If the
vectors of the satisfying set obey the constraints or exceed the value by an
acceptable
amount, such as 5-10% for example, or some other configured acceptable amount,
then
the vectors are said to satisfy or nearly satisfy (or substantially satisfy)
the constrain
respectively. The selected variant should concurrently optimize the
optimization
parameters while meeting the specifications and constraints provided.
- 51 -
CA 02741253 2011-05-27
11391 In accordance with some embodiments, system 10 is operable to determine
a
satisfying set of variants for each optimization parameter including the final
optimization
parameter. Then system 10 determines the set of variants based on the
intersection of
the satisfying sets of variants for all optimization parameters. each variant
in the set of
variants should satisfy (or nearly satisfy) all constraint values. System 10
selects a
variant from the set of variants by sorting or ranking the variants to
generate an ordered
list of variants. System 10 selects a variant from the set of variants based
on a position
of the variant in the ordered list of variants.
[140] At step 114, system 10 provides the selected variant for use in
developing the
system architecture. System 10 is operable to provide the selected variant by
transmitting data represented the selected variant, displaying a
representation of the
selected variant, storing data representing the selected variant on shared
memory for
access by another system or component, print a representation of the selected
vector or
otherwise output the selected variant.
[1411 Alternatively or additionally, at step 116, system 10 develops the
system
architecture using the selected variant. The developed system architecture may
comprise, for each kind of resource Ri in the plurality of Kinds of resources
R1.....Rn,
the number Ni of that kind of resource defined by the selected variant.
11421 In accordance with some embodiments, the system architecture may include
a
Register Transfer Level data path circuit and a Register Transfer Level
control timing
sequence. The Register Transfer Level data path circuit may be configured to
generate
output data as a result of performing a sequence of operations on data using
Register
Transfer Level modules, where the Register Transfer Level modules include the
number
of each kind of resources represented by the selected variant. Examples of
Register
Transfer Level modules include registers for storage of data, memory modules,
latches
for sinking of data, multiplexers and demultiplexers. The Register Transfer
Level control
timing sequence provides a control configuration for a data path circuit to
provide timing
and synchronization required by data traversing through the Register Transfer
Level
modules of the data path circuit.
11431 System 10 is operable to represent the combination of the number of
resources
R 1,.. .. Rn of the selected variant in a temporal and a spatial domain using
a
-52-
CA 02741253 2011-05-27
sequencing and binding graph and a plurality of registers- System 10 is
operable to
determine a multiplexing scheme for the resources R1....... Rn of the selected
variant,
with inputs, outputs, operations, interconnections and time steps. System 10
is operable
to produce the Register Transfer Level data path circuit using the
multiplexing scheme
[144] System 10 is operable to produce an integrated circuit, such as a FPGA
or ASIC
for example, using the system architecture.
[145] Referring now to FIGS. 7. 8, 9 and 10 there is shown flow chart diagrams
of
methods 130a, 154a,154b,130b of sorting the variants of the design space to
generate
a universe of discourse set in accordance with embodiments described herein.
As
explained herein these methods 130a, 154a, 154b, 130b may also be used to
generate
an ordered list of variants by sorting the set of variants determined based on
the
intersection of the satisfying sets, with respect to a particular optimization
parameter,
such as the final optimization parameter or other optimization parameter.
[145] Referring to FIG, 7, at step 150, for each of the plurality of
optimization
parameters, system 10 is operable to define a priority factor function for
each Kind of
resource R1, ...... Rn. A priority factor function defines a rate of change of
the
optimization parameter with respect to a change in a number N,Q, of the
corresponding
Kind of resource Ri, 1 s i n. Fxamples of priority factor functions, including
an analysis
as to how the priority factor functions are derived, are described in detail
in the
Applicants' U.S. Patent Application No. 12/974,925 filed December 21, 2010,
the entire
contents of which is hereby incorporated by reference.
[147] The term Priority Factor (PF) will be used herein as an example approach
to
sorting the variants of the design space. The PF may be used as a determining
factor
which helps judge the influence of a particular resource an the variation of
the
optimization parameters such as area, time of execution, power consumption,
and so
on. System 10 is operable to use the PF will be used later to organize the
architecture
design space consisting of variants in increasing or decreasing order of
magnitude.
[145] The optimization parameters may include hardware area, cost, time of
execution,
and power consumption, each of which may be defined by a optimization
parameter
function. As explained in Applicants' U.S. Patent Application No. 12/974,925
filed
December 21, 2010, system 10 is operable to generate the priority factor
functions by
-53-
CA 02741253 2011-05-27
applying partial derivatives to the optimization parameter function and by
using the
theory of approximation by differentials.
[149] An optimization parameter may be hardware area of a total number of all
kinds of
resources RI ......Rn. As an example, the total hardware area may be
represented by
the following function (6):
A A(R) (5)
where A represents the hardware area of the total number of all kinds of
resources
Ri ------ Rn, A(Ri) is the hardware area of the total number of resource Ri,
where Ri
denotes a resource available to construct the system architecture, 1 s i:5 n .
[150] Area can be expressed as the sum of the resources i.e. adderlsubtractor,
multiplier, divider etc and also the clock frequency oscillator. Therefore for
a system with
'n' functional resources equation (6) can also be represented as shown in
equation (7):
4T(Nx, KR,*' -Kx,*...*N,~,'i, )+A(1.) (7)
Where NR, represents the number of resource Rand 'KR,' represents the area
occupied
per unit resource 'RI (1<=i<=n);
[151] For the optimization parameter hardware area, the system 10 is operable
to
define for hardware area a priority factor function of each kind of resource
R1...... Rn
that is an indicator of a change of area contributed by a change in the number
of the
kind of resource Ri, wherein t s a s n. System 10 uses the optimization
parameter
hardware area to determine how a variation in area is affected by the change
of a
number of a certain resource so the priority factor for hardware area may be
the rate of
change of area with respect to the change in a number of resources.
[152] For example, for the hardware area, system 10 is operable to define the
priority
factor for each kind of resource R1...... Rn that is not a clock oscillator as
a function of
NR,, ANt,, KR, wherein IV,, is the number of the kind of resource Ri, K1e, is
an area
occupied per unit resource Ri, AN,, is the difference in number of a maximum
and
minimum resource Ri, ANx, -,K,, is a change of area contributed by the kind of
resource
Ri, wherein Ri is a member of the kinds of resources R1,...- .Rn. For each
resource Ri
that is a clock oscillator system 10 is operable to define the priority factor
as a function
of AA(R,.,,) Nxc,E, R~.,k , wherein R,,k is a clock oscillator used to
construct the system
- 54
CA 02741253 2011-05-27
architecture, AR(R,,k)is a change of area occupied by clock oscillators,
NRC,,k is a number
of clock oscillators.
[153] As a specific example, for the hardware area, system 10 is operable to
use a
priority factor (8) for each kind of resource R1...... Rn that is not a clock
oscillator of the
form:
Ka,
PF(Ri) _ AN' . R.
NR, (8)
[1541 For the hardware area, system 10 is operable to use a priority factor
function (9)
of resource Ri that is a clock oscillator of the form:
AA(Rr,k )
N8,1k (9)
11551 Another optimization parameter may be a time of execution of a total
number of
all kinds resources R1, ..... Rn. The time of execution can be represented by
the
following optimization parameter function (10)-
Ttar =[L+(D-1)' ).r,,, (10)
where L represents latency of execution, T,: represents the cycle time, Q
represents the
number of data elements to be processed, and Tp represents the time period of
the
clock.
[156] For the time of execution, system 10 is operable to use a priority
factor function
for each kind of resource R1....... Rn that is a function of the rate of
change of a cycle
time with a change in the number NR, of the kind of resources Ri at a maximum
clock
period, wherein 1 a i s n and Ri is a member of the kinds of resources
R1...... Rn.
[157] For example, system 10 is operable to use a priority factor function for
the time of
execution of the resources Ri.......Rn that is not a clock oscillator that is
a function of
NR, TR, T li37 wherein NR, is the number of the kind of resource Ri, T,, a
number of
clock cycles required by the kind of resource Ri to finish each operation, TP
is the time
period of the clock, Tp"'" is the maximum clock period. For resource Ri that
is a clock
oscillator, system 10 is operable to use a priority factor function that is a
function of R,,k
NR, TR, R,,k NR,,,,. where R ,k is a clock oscillator used to provide
necessary clock
-55-
CA 02741253 2011-05-27
frequency to the system, N., is the number of the kind of resource Ri, NK,p is
the
number of clock oscillators, TR, a number of clock cycles required by the kind
of
resource Ri to finish each operation.
11581 As a specific example, system 10 is operable to use a priority factor
function (11)
for the time of execution of the resources R1....... Rn that is not a clock
oscillator of the
form.
PF(R,) = ~R(b'r`a''
(11)
[1591 System 10 is operable to use a priority factor function (12) of resource
Ri that is a
clock oscillator of the form:
FF R . NRI 11 TR, -r N.== TR2 .....+ Ntõ = TR. (AT,)
NR,.,, (12)
[160] As another example, for the time of execution, system 10 is operable to
use a
priority factor function for each kind of resource R1....... Rn that is a
function of a
difference between the time of execution when resource Ri, wherein i is of the
interval
[1,n], is at its minimum value when all other resources are at their maximum
value and
the time of execution when resource Ri is at its maximum value when all other
resources are at their minimum values.
11611 System 10 is operable to use a priority factor function for the time of
execution of
the resources R1....... Rn that is not a clock oscillator that is a function
NR, Ty j- T4fA
wherein NR, is the number of the kind of resource Ri, T, ' and T, rare the
maximum
and minimum value of the execution time when resource Rn is maximum aria
minimum,
respectively, at maximum clock frequency all other resources being maximum.
11621 System 10 is operable to use a priority factor function of resource Ri
that is a
clock oscillator that is a function of N,,,,,,, T'" ' T*,k where NRclc is the
number of clock
oscillators T~rx` and T' , are maximum and minimum values of execution time
when the
clock period is maximum and minimum respectively, and all available resources
have a
maximum value.
-56-
CA 02741253 2011-05-27
[163] As a specific example of such a priority factor function for the time of
execution of
the resources R1....... Rn that is not a clock oscillator, system 10 is
operable to use a
priority factor function (13) of the form:
TM4 --T"r"'
PF(Rn)
Nei (13)
[164] System 10 is operable to use a priority factor function (14) for
resource Ri that is
a clock oscillator of the form.
T ck - TRc,k
PF(Relk)
NW IL (14)
[165] System 10 is operable to determine the TR `" in the above equations (13)
(14)
using the following algorithm.
[166] System 10 is operable to analyze the architecture configuration needed
to
calculate the TR `A`. By definition 'TRflM ' is the maximum execution time at
maximum
clock frequency when resource 'Rn' is minimum and all other resources being
maximum. Therefore the architecture needed to calculate TR"xis Rn = 1 and rest
of the
resources Ri = NMax, where i = any real integer number (any other resource
type) except
n (i i6 n), NMax= Maximum number of resource Ri specified by the user in the
specification and Rclk = Max.
[167] System 10 is operable to use the architecture configuration analyzed
above to
determine the latency (14 from the scheduling graph and Cycle time from the
pipelined
timing diagram for 'D' sets of pipelined data based on the user specified user
module
library specifications.
[166] System 10 is operable to determine the Texe using the function (10):
T. =[L +(D-1)=T,J'TE,
where `1_' represents latency of execution, ' Tc'represents the cycle time,
'O' denotes the
number of data elements to be processed and 'Tp' is the time period of the
clock.
-57-
CA 02741253 2011-05-27
[169] System 10 is operable to determine the T,`" in the above equation by
implementing the following algorithm:
[1701 System 10 is operable to analyze the architecture configuration needed
to
calculate Tj'n . Based on the definition, the architecture needed to calculate
Tj ~` is
Rn = maximum, rest of the resources RI = NM and Rclk = Max, where i = any real
integer number (any other resource type) except n (i 0 n) and NMax= Maximum
number
of resource RI specified by the user in the specification.
[1711 System 10 is operable to use the architecture configuration analyzed to
determine the Latency L from the scheduling graph and cycle time from the
pipelined
timing diagram for the application for 'D' sets of pipelined data based on the
user
specified user module library specifications.
[172] System 10 is operable to determine the TeAe using the function (10);
T..,, =[L+(D-1)=T FTp
[1731 Where 'L' represents latency of execution, ' Tc' represents the cycle
time, 'D'
denotes the number of data elements to be processed and ' Tp is the time
period of the
clock.
[1741 System 10 is operable to determine the tk in the above equations (13)
(14)
by analyzing the architecture configuration needed to calculate the TR dk . By
definition
TR M4` ' is the maximum execution time, when resource 'Rclk' is minimum clock
frequency and all other resources are maximum. Therefore the architecture
required to
calculate TR' k is Rclk = Minimum clocK frequency (Maximum clock period), rest
of the
resources Ri = NMa, where i signifies any resource type except 'clk' (i # clK)
and Nea=
Maximum number of resource Ri specified by the user in the specification
[175] System 10 is operable to determine the Latency and Cycle time for the
application for 'D' sets of pipelined data based on the user specified user
module library
specifications using the architecture configuration analyzed.
(176] System 10 is operable to determine the Tee using the function (10),
T'a=1Lt(D-D-T'1-T
-58-
CA 02741253 2011-05-27
where 'L' represents latency of execution, 'T,'represents the cycle time, 'D'
denotes the
number of data elements to be processed and 'TT' is the time period of the
clock.
[1771 System 10 is operable to determine the Turk in the above equation by
analyzing the architecture configuration needed to calculate the TR k . By
definition:
'TRcfxM' ' is the minimum execution time, when resource 'Rclk' is maximum
clock
frequency and all other resources being maximum. Therefore the architecture
needed to
calculate 'i,rk is Rclk = Maximum clock frequency (Minimum clock period), rest
of the
resources Ri = NMax; where i signifies any resource type except 'cik' (i 0
clk) and NMax=
Maximum number of resource Ri specified by the user in the specification.
(1781 System 10 is operable to determine the latency and Cycle time for the
application for 'D' sets of pipelined data based on the user specified user
module library
specifications using the architecture configuration analyzed above.
11791 System 10 is operable to determine the Texe using the function (10):
Ta. = fLi(D-I)=T,1=Tp
where 'L' represents latency of execution, ' Tc' represents the cycle time, 1Y
denotes the
number of data elements to be processed and 'Tp is the time period of the
clock.
[180] The above priority factor functions for time of execution indicate the
average
change in execution time with the change in number of a particular resource
and
change in clock frequencies, respectively. System 10 is operable to use a
priority faction
function to calculate a real number, which may be referred to as a priority
factor, which
represents the extent to which a change in number of a particular resource
contributes
to the change in execution time.
11811 Another optimization parameter is a power consumption of the resources
R 1. . . ... Rn. The total power consumption of the resources R1,. _ .. Rn can
be represented
by the following function (15)_
F= (N,r,-KR,)=p,
(15)
-59-
CA 02741253 2011-05-27
[1821 where N,,, is the number of resource Ri, KR, is an area occupied per
unit of
resource RI, and p, is power consumed per area unit resource at a particular
frequency
of operation.
[1831 For power consumption, system 10 is operable to use a priority factor
function for
each kind of resource R1...... Rn that is a function of a change in power
consumption
per unit area due to deviation of clock frequency from maximum to minimum and
a
change in the number NR, of the kind of resource Ri at maximum clock
frequency,
where Isis n, and Ri is a member of the kinds of resources R1,_._-..Rn.
[1841 For power consumption, the priority factor function may indicate the
rate of
change in the total power consumption with the change in the number of
resources at
maximum clock frequency, in accordance with embodiments described herein. The
maximum clock frequency may be considered because the total power consumption
is
maximum at this frequency. Hence, the change in the number of a specific
resource at
maximum clock frequency will influence the change in the total power
consumption the
most, compared to the change at other clock frequencies.
[1651 For the power consumption optimization parameter, for the resources
R1....... Rn
that is not a clock oscillator, system 10 is operable to use a priority factor
function that is
a function of N,,, K,,,, AN,,,, (p,)ml", p, where NR, is the number of
resource Ri, Kh is
an area occupied by resource Ri, ANR4 =KR. is a change of area contributed by
resource
Ri, p, is power consumed per area unit resource at a particular frequency of
operation,
(pd"'" is power consumed per area unit resource at a maximum clock frequency.
For
resource Ri that is a clock oscillator, system 10 is operable to use a
priority factor
function that is a function of N, TR, RR,, NR,.,A., p, where icy,,, is a clock
oscillator used to
provide necessary clock frequency to the system. NR, is gfthe number of the
kind of
resource Ri, TRe a number of clock cycles required by resource Ri to finish
each
operation, p, is power consumed per area unit of resource at a particular
frequency of
Operation.
[1861 For power consumption, as a specific example, for resources R1....... Rn
that is
not a clock oscillator, system 10 is operable to use a priority factor
function (16) of the
form,
-60-
CA 02741253 2011-05-27
PF(Ri) = ANN= KR. (P")""& (16)
N,
1187] For resource Ri that is a clock oscillator, system 10 is operable to use
a priority
factor function (17) of the form:
PF(Rca) W NRl'p TRi t NR2 * TR2.." NRn 0 TRn (AP J
NRtrk (17)
[188] Another example optimization parameter is a total cost of the total
number of all
kinds resources R1...... Rn.
[189] As noted herein the total area can be defined by the following functions
(18) (19):
.~ = 2 A(Ai) (18)
R-(Nõ,-AK,+Nx: K, + rtNp KRõ)+A(J( )+NõM'A~, (19)
where 'NRi' represents the number of resource 'Ri', 'KRi' represents the area
occupied
per unit resource 'Ri', 'NRM' represents the number of memory elements present
(such
as registers) and 'KRM' represents the area occupied by each memory element.
Let the
total cost of all resources in the system is 'CR'. Further, cost per area unit
of the
resource (such as adders, multipliers etc) is given as 'CR,', the cost per
area unit of the
clock oscillator is 'CRcrk' and finally the cost per area unit of memory
element is 'CRM'.
Therefore total cost of the resources may be defined by the following function
(20);
CR"(Nkl n.lrtNK,-KR'1t..tIV K,,)CK,tA(RRu)-CC..+N 'K,,,,-C,w, (20)
[1901 For the total cost, the priority factor function for each kind of
resource RI....... Rn
is an indicator of change in total cost of the total number of all kinds
resources
R1....... Rn with respect to a change in the number of the kind of resource Ri
and the
cost per unit resource, where I s i s n.
[191] For the cost, the priority factor function for each kind of the
resources R1,_.....Rn
that is not a clock oscillator may be a function of N, , , KR,, ANR, , C,,,
wherein NR, is the
number of the kind of resource Ri, KN,is an area occupied by the Kind of
resource Ri,
ANk, -KR, is a change of area contributed by the kind of resource Ri. Cis the
cost per
area unit of the Kind of resource Ri. For the cost, the priority factor
function of resource
Ri that is a clock oscillator may be a function of 8r , ]Vkilk , AA (Rrj,
CRrA , wherein R,.,k is
a clock oscillator used to provide necessary clock frequency to the system,
AA(R,,k)is a
-61-
CA 02741253 2011-05-27
change of area occupied by clock oscillators, N,,,, is a total number of clock
oscillators
available to construct the system architecture, C,,, is the cost per area unit
of clock
oscillators.
11921 As a specific example, for the cost, the priority factor function (21)
for each kind
of the resources R1....... Rn that is not a clock oscillator may be of the
form:
P.F(Rt)= ANK,'KR.'CR.
NR, (21)
11931 The priority factor function (22) of resource Fri that is a clock
oscillator may be of
the form:
PF(R~,k)= A(RC,x)'CRcu
NR~" (22)
11941 At step 152, for each of the plurality of optimization parameters,
system 10 is
operable calculate a priority factor for each kind of resource R1....... Rn
available to
construct the system architecture, using the corresponding priority factor
function for the
respective optimization parameter. The priority factors provide a measure of
the change
in an optimization parameter with the change in number of a specific kind of
resource
11951 At step 154, system 10 is operable to sort the variants of the design
space to
generate the universe of discourse set based on a relative magnitude of the
calculated
priority factors.
[1961 As an example of sorting based on the relative magnitude of the
calculated
priority factors, referring to FIG. 8, at step 156, system 10 is operable to
generate a tree
representing the sorted design space based on the priority factors.
1197] System 10 is operable to generate the tree by, for each optimization
parameter,
assigning to the kind of resource Ri with the highest priority factor a level
one of the tree
represented by a root node with maxNR, branches, as each branch represents a
variant
for that kind of resource. Further, system 10 is operable to, for the kind of
resource Rm
with the next highest priority factor, assign a level two of the tree
represented by nodes
at each branch extending from the level 1 above. Each node in level two has
maxNRm
branches, as each branch represents a variant for that kind of resource Rm.
System 10
is operable to continue to assign a tree level to each kind of resource in
this manner
until level n is assigned to a kind of resource R1...... Rn. Finally system 10
represents
-62-
CA 02741253 2011-05-27
all variants as nodes at level n+1 of the tree, with a node at each branch
extending from
the level n above. That is, the number of branches extending from level n
should
correspond to the number of variants in the design space.
[1981 As another example of sorting based on the relative magnitude of the
calculated
priority factors, referring to FIG. 9, at step 166, system 10 is operable to
determine a
priority order for the respective optimization parameter by sorting the
calculated priority
factors based on a relative magnitude of the calculated priority factors. For
example,
according to the priority factors calculated, the priority order may be
arranged so that
the resource with the lowest priority factor is assigned the highest priority
order while
the resource with the highest priority factor is assigned the lowest priority
order. The
priority order of the resources increases with the decrease in priority factor
of the
resources.
[199] At step 160, system 10 is operable to sort the variants of the design
space to
generate the universe of discourse set based on the priority order for the
respective
optimization parameter. Further, system 10 is operable to select the selected
variant
using the priority order for the final optimization parameter.
(200] Referring now to FIG. 12, which illustrates flow chart of a method 300
for
generating an ordered list of vectors by arranging the random design space in
an
organized increasing order for the power consumption parameter in accordance
with an
example embodiment. The arrangement is based on priority order (PO) sequencing
from step 158 (FIG. 9). In this example, the variants will be placed in such a
way that
the variant first in the set will have the least/greatest value for the
respective
optimization parameter and the variant last in the set will have the greatest/
least value
for the respective optimization parameter.
[201] Let NR, represent the number of a particular resource Ri, and at step
302, the
initial number of all Kinds of resources is set to one, NR,,...... NRn=1-
[2021 Let position 'p' represent the position where a particular variant
vector is located
within the arranged design space, and at step 304, the position 'p' is set to
one (p = 1)
and the variant vector (NR, , ...... NRn) is assigned to position 'p'.
12031 Let 'i' be an index, and at step 306, 'i represents the resource whose
PO is
maximum.
-63-
CA 02741253 2011-05-27
[204] Let NR,m,.. (also referred to herein as maxNR,) represent the maximum
number of
a kind of resource Ri, at step 308, it is determined whether NR,= NRõW.
[205] If Nk,does not equal NR,a,n,, then at step 310, NR, is increased by one.
[206] At step 312, variant vector (NR. ....... NRn) is assigned to position
'p+1'.
(207] At step 314, position 'p' is increased by one, p=pt1.
[208) Let p(final) be the final position according to the maximum number of
design
options available, which is the number of vectors of the design space, and at
step 316, it
is determined whether p=p(f nal).
[209] If p does not equal p(final) then method 300 returns to step 306.
[210] If p does equal p(final) then method 300 proceeds to step 318 and ends.
[211] If N,,, equals N~ õwõ . then at step 320, NR, is reset to one.
[212] At step 322, i represents the next resource with the next higher PO, and
the
method returns to step 308.
[213] Referring now to FIG. 10, there is shown another example sorting method
130b
for generating the universe of discourse set. This method sorts the variants
in order,
increasing or in some cases decreasing order depending on the optimization
parameter,
using a hierarchical tree structure.
[214] At step 170, system 10 is operable to locally arrange the variants of
each kind of
resource Ri in order (decreasing or in some cases increasing depending on the
optimization parameter) with respect to optimization parameter. For each kind
of
resource Ri, system 10 is operable to arrange the variants in increasing order
by
number of that kind of resource from 1 to MaxNR,.
[215] At step 172, system 10 is operable to define an arrangement criterion
function for
arranging each kind of resource. In accordance with some embodiments, system
10 is
operable to define one arrangement criterion function for use in relation to
all types of
optimization parameters- The arrangement criterion function may be based on a
value
of the optimization parameter with the maximum number of resources, a value of
the
optimization parameter with the minimum number of resources, and the number of
variants of the respective kind of resource.
[216] For example, for all types of optimization parameters and for each kind
of
resource, system 10 is operable to use the following arrangement criterion
function (23):
-64-
CA 02741253 2011-05-27
K(Rs) = Psm.,a (Rd) ... P m t(RI)
MI-1 (23)
where Psmax represents the value of the optimization parameter with the
maximum
number of resources, Psm,n represents the value of the optimization parameter
with a
critical variant configuration, and mi is the number of variants of resource
Ri. The critical
variant configuration may be a function of the minimum number of resource Ri
with the
maximum number of all others kinds of resources.
[217] At step 174, system 10 is operable to calculate an arrangement criterion
value for
each kind of resource using the arrangement criterion function.
12181 At step 176, system 10 is operable to assign a level of a hierarchical
tree
structure to each kind of resource based on the relative magnitude of the
arrangement
criterion values. For example, system 10 is operable to assign the Kind of
resource with
the highest value to level 1, the kind of resource with the second highest
value to level
2, and so on.
[2191 At step 178, system 10 is operable to generate a hierarchical tree
structure
based on the relative magnitude of the calculated arrangement criterion
values. System
10 is operable to generate the tree structure as described at step 166 of FIG.
8 except
system 10 uses the relative magnitude of the calculated arrangement criterion
values
instead of the relative magnitude of the priority factor functions. Each leaf
node (node
with no children branches) at level nt1 represents a variant of the design
space (partial
or full) that is sorted in increasingidecreasing order of magnitude with
respect to the
optimization parameter from left to right. The branches from each level of
nodes
represent the variants of the kind of resource corresponding to that level in
order as per
the local arrangement determined at step 170, namely 1 to maxNR,. Each variant
is
determined based on the path from the root node at level 1 to the leaf node
that
corresponds to the variant, where each branch represents the number of a kind
of
resource that corresponds to the level of node the branch stems from in the
hierarchical
tree structure.
[2201 An illustrative example will be explained with reference to FIGS. 2 to
9.
12211 Referring to FIG. 2, at step 102, system 10 defines the following
resource
constraints based on specifications received as input data, where each
resource
-65-
CA 02741253 2011-05-27
constraint corresponds to a maximum number N1e,, 1!5; 1 s n. of each kind of
resource
R1 ....... Rn available to construct the system architecture:
3 adder/subtractor units.
4 multiplier units
2 clock frequency oscillators: 50 MHz and 200 MHz
[2221 The following example specifications are also received by system 10 for
each
kind of resource R1....... Rn:
No. of clock cycles needed for multiplier to finish each operation: 4 clock
cycles (cc)
No. of clock cycles needed by the adder/subtractor: 2 cc
Area occupied by each adder/subtractor: 20 area units (a.u.) on the chip.
(e.g. 20 CLB on an FPGA)
Area occupied by each multiplier: 100 a.u. on the chip. (e.g. 100 CLB on
an FPGA)
Area occupied by the 50MHz clock oscillator: 4a. u.
Area Occupied by the 200 MHz clock oscillator: 1 Oa.u.
Power consumed at 50 MHz. 10 milli-watt (mW)/a.u.
Power consumed at 200 MHz: 40mW/a.u.
[223] At step 104, system 10 defines the following constraint values for each
of at least
three optimization parameters based on received input data:
Maximum power consumption: 8 W
Maximum time of execution: 140 ps (For 1000 sets of data)
Hardware area of resources: Minimum while satisfying the above
constraints-
[2241 In this illustrative example, power consumption, time of execution and
hardware
area of the resources are three optimization parameters with constraint values
to be
satisfied. In this example, hardware area is the final optimization parameter
as its
constraint value is defined as a minimum value while satisfying the other
constraint
values, so a satisfying set of variants cannot be determined for this
optimization
parameter independently of the other optimization parameters. Hardware area
typically
is a major design objective, but for the current generation of multi objective
VLSI
-66-
CA 02741253 2011-05-27
computing systems such as the portable devices (e.g. mp3 players, PDA's,
mobile etc),
power consumption may also act a major design constraint, Hence power
consumption
has been used as an example for an optimization parameter with a specified
constrain;
value. This is an arbitrary example only and execution time, power or cost may
be used
as final optimization parameters.
12251 During the problem formulation stage for high level synthesis the
mathematical
model of the application may be used to define the behavior of the algorithm-
The model
suggests the input/output relation of the system architecture and the data
dependency
present in the function. For this illustrative example a digital OR
Butterworth filter may be
used as a benchmark to demonstrate the design space exploration approach in
accordance with embodiments described herein. The choice of the OOR
Butterworth filter
is arbitrary and other filters may be used. The transfer function of a second
order digital
IIR Butterworth filter can be given as:
H(z) - Y(z) z' rt 3z2 t 3z t 1
X(T) 6z' t2z (24)
y(n) - 0.167x(n) t O.Sx(i7 -1) t O.Sx(ri - 2) t 0.167x(n - 3)
- 0.33y(n -- 2) (25)
where H (z) denotes the transfer function of the filter in the frequency
domain and x(n),
x(n-1), x(n-2), x(n-3) represent the input variables for the filter
respectively in time
domain, y(n) and y(n-2) represent the present output of the filter and the
previous output
of the filter in the time domain.
(226] At step 106, system 10 defines the design space for this illustrative
example as
follows.
V1N1,1,1) V7=(1,3,2) V13=(2,1,2) V19=(3,3,1)
V2=(1,2,1) V8=(1,4,2) V14=(2,2,2) V20=(3,4,1)
V3=(1, 3,1) V9=(2,1,1) V15=(2,3,2) V21=(3,1,2)
V4=(1,4,1) V1O=(2,2,1) V16=(2,4,2) V22=(3,2,2)
V5=(1,1,2) V11=(2,3,1) V17=(3,1,1) V23=(3,3.2)
V6=(1,2,2) V12=(2,4,1) V18=(3,2,1) V24=(3,4,2)
-67-
CA 02741253 2011-05-27
[227] At step 108, system 10 determines a satisfying set of vectors for the
optimization
parameters that are not the final optimization parameter, namely, power
consumption
and time of execution. As noted herein, in accordance with some embodiments,
system
is operable to determine satisfying set of vectors for all the optimization
parameters
5 and at step 110 determine a set of vectors based on the intersection of the
satisfying set
of vectors for all the optimization parameters
[228] System 10 is operable to determine a satisfying set of variants as
described in
FIG. 4 for each optimization parameter, except the final optimization
parameter. At step
130, system is operable to sort the design space to generate a universe of
discourse
10 set. As described herein system 10 is operable to sort the design space
using a variety
of methodologies.
[229] As an illustrative example, system 10 is operable to sort the design
space as
described in FIGS. 7 to 9.
[230] System 10 implements steps 150 and 152 for each optimization parameter
except for the final optimization parameter, which for this example are power
consumption and time of execution.
[2311 At step 150, system 10 defines the priority factor function for the
optimization
parameters power consumption as per equations (16) (17) described herein and
reproduced below for ease of reference:
PF(Rn } ¾ A'VRR ' Kft = ~ c /
NR" {i6)
PF(Rcdk) NR1 'TRI +NR7 =TRZ +..+Nw, -Tx,, (aPj
'NR; represents the number of resource of the kind of resource R1,'KR,'
represents the
area occupied per unit resource Ri and 'pc' denotes the power consumed per
area unit
resource at a particular frequency of operation. Where 'TR, represents the
number of
clock cycles needed by resource 'Ri' (1 c i<=n) to finish each operation.. In
the above
equations, the maximum clock frequency was considered because the total power
consumption is maximum (i.e. (pt) max) at this frequency. Hence, the change in
the
number of a specific resource at maximum clock frequency will influence the
change in
the total power consumption (P) the most, compared to the change at other
clock
-68-
CA 02741253 2011-05-27
frequencies. ,O,pc is the difference in power consumption at maximum and
minimum
frequency.
[2321 At step 152, for each kind of resource available to construct the system
architecture, system 10 calculates a priority factor using the priority factor
function for
the power consumption optimization parameter. For power consumption
optimization
parameter, system 10 calculates the following priority factors for each kind
of resource:
For resource adder/subtractor (R1): PF(R1 533.33
For resource multiplier (R2): PF(R2) - 3000
For resource clock oscillator (Rak): PF(Rcjk) 6900
[2331 At step 154, system 10 sorts the variants of the design space based on
the
calculated priority factors for each kind of resource in order to generate the
universe of
discourse set.
[2341 Referring now to FIG. 8, there is shown an example method 15ta of
sorting the
variants of the design space based on the priority factors. At step 156,
system 10
generates a hierarchical tree structure 250 (FIG. 11) representing the
universe of
discourse set based on the priority factors by assigning a level of the tree
to each
resource based on the relative magnitude of the corresponding calculated
priority factor.
[2351 In order to generate the hierarchical tree structure, system 10 is
operable to
assign the resource type with highest priority factor at Level t.1 represented
as a root
node with branches corresponding to each of its variants. The resource type
with next
highest priority factor at L2 with all its variants as branches, continuing in
this manner
until system 10 reaches the resource type with lowest priority factor and
assigns it last
level of the tree with all its variants as branches of the tree. The
hierarchical tree
structure represents the sorted variants as leaf nodes from the branches of
the last
assigned level.
[236j Since the priority factor for resource Rclk is the highest it is
assigned level 1.1 of
the hierarchical tree structure. Since the priority factor for resource R2
multiplier is the
next highest hence it is assigned level 2 of the hierarchical tree structure.
Finally, the
priority factor for resource (R1) adder/subtractor is the lowest so it is
assigned the last
level of the hierarchical tree structure.
-69-
CA 02741253 2011-05-27
[237] After assigning levels to each Kind of resource, system 10 is operable
to
construct a representation of the hierarchical tree structure based on the
assigned
levels.
[238] Referring now to FIG. 11, there is shown a representation of the
hierarchical tree
structure 250 for the power consumption optimization parameter for this
example in
accordance with embodiments described herein.
[239] In this example, system 10 constructs a hierarchical tree structure 250
with three
levels L1 252, L2 254,1-3 256. The resource Rclk is assigned 1-1 252 and is
represented
as a root node 258 with two branches 260, 262 since it has two variants. The
resource
R2 multiplier is assigned L2 254 and is represented as two nodes 264, 266
(corresponding to the two branches 260, 262). Each of the two nodes 264, 266
has four
branches 268, 270, 272, 274, 276, 278, 280, 282 since the resource R2
multiplier has
four variants. The resource R1 adder/subtractor is assigned L3 256 and is
represented
as eight nodes 284, 286, 288, 290, 292, 294, 296, 298 (corresponding to the
eight
branches 268, 270, 272, 274, 276, 278, 280, 202). Each of the nodes 284, 286,
288,
290, 292, 294, 296, 298 has three branches since the resource R1
adder/subtractor has
three variants.
[240] System 10 uses the hierarchical tree structure 260 to sort the variants
of the
design space to generate the universe of discourse set. The variants are
indicated as
the final leaves in the tree as they correspond to the specific combination of
resources
represented by the path in the tree from root to leaf. That is, each path from
Li to L3
represents a combination of resources, which in turn corresponds to a variant-
For
example, the path from the resource RcIK at the root node 252 to branch 260
represents
the subset of variants with one of resource Rclk and the path from the
resource Rclk at
the root node 252 to branch 262 represents the subset of variants with two of
resource
RclK. The path from resource R2 multiplier node 264 to branch 268 represents
that
subset of variants with one of resource R2 multiplier, the path from resource
R2
multiplier node 264 to branch 270 represents that subset of variants with two
of
resource R2 multiplier, the path from resource R2 multiplier node 264 to
branch 272
represents that subset of variants with three of resource R2 multiplier, and
so on. This
also applies to the resource R1 adder/substractor nodes 284, 286, 288, 290,
292, 294,
-70-
CA 02741253 2011-05-27
296, 298 and branches that extend to the leaves that correspond to the
variants.
Accordingly, the hierarchical tree structure 250 sorts the variants of the
design space to
generate the following universe of discourse set = (Vi, V9, V17, V2, Via, V18,
V3, V11,
V19, V4, V12, V20, V5, V13, V21, V6, V14, V22, V7, V15, V23, V8, V16, V24).
[241] Referring now to FIG. 9, there is shown another example method 154b of
sorting
the variants of the design space based on the priority factors.
[242] At step 155, system 10 determines a priority order based on the relative
magnitude of the priority factors. For example, system 10 is operable to
determine the
following priority order for sorting the design variants in increasing order:
PQ (R1) > PC (R2) ' Po (RcIK)
[243] System 10 sorts the design space based on the priority order in
increasing order
from top to bottom for power consumption parameter, as described in FIG. 12.
For this
example variant n is represented as Vn = (NR1, NR2, NR3). The variables NR1,
NR2
and NR3 indicate the number of adder/subtractors, multipliers and clocK
frequencies,
respectively. Therefore according to resource constraints specified 1 <=NR1
<=3,
1 <=NR2<=4, and 1<=NR3<=2.
[244] Referring now to FIG. 10, there is shown another example sorting method
130b
for generating the universe of discourse set. This method sorts the variants
in order
(increasing or decreasing order depending on the optimization parameter) using
a
hierarchical tree structure.
[245] At step 170, system 10 is operable to locally arrange the variants of
each kind of
resource Ri in decreasing order with respect to optimization parameter. For
each kind of
resource Ri, system 10 is operable to arrange the variants in increasing order
by
number of that kind of resource from 1 to MaxNR,.
[246] At step 172, system 10 is operable to define an arrangement criterion
function for
arranging each Kind of resource. In accordance with some embodiments, system
10 is
operable to define one arrangement criterion function for use in relation to
all types of
optimization parameters. The arrangement criterion function may be based on a
value
of the optimization parameter with the maximum number of resources, a value of
the
optimization parameter with the minimum number of resources, and the number of
variants of the respective kind of resource.
-- 71
CA 02741253 2011-05-27
[247] For example, for all types of optimization parameters and for each kind
of
resource, system 10 is operable to use the following arrangement criterion
function (23):
K(Ri) = Prma.(Ri) - P"", (Ri)
rai -1 (23)
[248] Where Psmax represents the value of the optimization parameter with the
maximum number of resources, Psm,n represents the value of the optimization
parameter
with a critical variant configuration, and mi is the number of variants of
resource Ri. The
critical variant configuration may be the minimum number of resource Ri with
all the
maximum number of the all others kinds of resources-
[249] At step 174, system 10 is operable to calculate an arrangement criterion
value for
each kind of resource using the arrangement criterion function-
[250] For the power consumption parameter, the calculated arrangement
criterion are
as follows:
For the Kind of resource R1 adder/subtractor:
118.4-16. -0.80
K (R1: adder) = 3-1
where the critical variant configuration is 1 AdderlSubtractor, 4 Multipliers
and 200 MHz
clock frequency. The value Psm,n is obtained based on (1.20 t 4*100)w40 =
16,800mW
or 16.8W. The specifications of each resource may be based on table I
described herein
and the functions defining the optimization parameters.
For the kind of resource R2 multiplier-
118.4-6 . --4
K (R2: multiplier) 4-1
Where the critical variant configuration is 3 Adder/Subtractor, 1 Multipliers
and 200 MHz
clock frequency. The value P,,õn is obtained based on (31120 +
1*100)"40=6,400mW or
6.4W. The specifications of each resource may be based on table I described
herein.
For the kind of resource Rclk clock:
K (Rclk: clock frequency oscillator) = 19-4 - 4. _ 13.8
2-1
-72-
CA 02741253 2011-05-27
Where the critical variant configuration is 3 Adder/Subtractor, 4 Multipliers
and 50 MHz
clock frequency. The value Psm,n is obtained based on (3*20 +
4*100)"10=4,60OmW or
4.6W. The specifications of each resource may be based on table I described
herein.
1251] At step 176, system 10 is operable to assign a level of a hierarchical
tree
structure to each kind of resource based on the relative magnitude of the
arrangement
criterion values. For example, system 10 is operable to assign the kind of
resource with
the highest value to level 1, the kind of resource with the second highest
value to level
2, and so on. For this example, system 10 assigns resource Rclk clock to level
1,
resource R2 multiplier to level 2, and resource R1 adder/substractor to level
3.
1252] At step 178, system 10 is operable to generate a hierarchical tree
structure
based on the relative magnitude of the calculated arrangement criterion
values. FIG. 11
illustrates the resultant tree structure 250 for the power consumption
parameter. Each
leaf node (node with no children branches) represents a variant of the design
space
(partial or full) that is sorted in increasing order of magnitude with respect
to the power
consumption optimization parameter from left to right. The branches from each
level of
nodes represent the variants of the kind of resource corresponding to that
level in order
as per the local arrangement determined at step 170, namely 1 to maxNR, Each
variant
is determined based on the path from the root node at level 1 to the leaf node
that
corresponds to the variant, where each branch represents the number of a kind
of
resource that corresponds to the level of node the branch stems from in the
hierarchical
tree structure.
[253] Referring now to FIG. 13, there is shown a universe of discourse set 400
generated by system 10 by sorting variants of the design space as described
herein-
12541 Referring back to FIG. 4, after system 10 generates a universe of
discourse set,
at step 132, system 10 is operable to assign a membership value to each
variant in the
universe of discourse set.
[255] For this example power consumption optimization parameter, system 10 is
operable to calculate a membership value (r) for each variant of the universe
of
discourse set created based on the function (1):
IC _ x-a
fl -a (1)
-73-
CA 02741253 2011-05-27
[2561 Where `x' is the position of the variant in the universe of discourse
set; 'T'
represents the assigned membership value of the variant which is the x4'
position in the
universe of discourse set; 'a' and 'R' are the order of the first element and
the last
element in the universe of discourse set, where 'a' is equal to 1 and 'P' is
equal to the
total number of variants in the universe of discourse set.
[2571 System 10 assign the calculated membership to the variants in the
universe of
discourse set, where one membership value is assigned to each variant.
[258] Referring back to the example for the power consumption optimization
parameter, the following is an example universe of discourse set:
{ Vl' Y9 V17 V2 V10' Y18' V3' Vll
V19'V4'V12'V20'V5'V13'V21'V6'
V14'V22'Y7'V15'Y23'Y8'V16'724
(259] For each variant in the universe of discourse set, system 10 calculates
a
membership value (T) using the function described above. For example,
membership
value (T) for:
(Position 1) variant 1: r = x - a = 11 =0
.6-a 24-1
0.043
(Position 2) Variant 9= 't x a 1--l = = -a 24-1
(Position 3) Variant 17: x = , a = 2411, 0.067
(Position 4) Variant 2: r = x - a = 411 = 0.130
/3--a 24-1
(Position 5) Variant 10: z - fix---a = it = 0.174
--a T4 -
74--
CA 02741253 2011-05-27
(Position 23) Variant 16: = x-a = 23-1 =
-a 24-1
0.957
(Position 24) Variant 24: T = = 1
-a 24-1
[2601 After calculating a membership value for each variant in the universe of
discourse set, system 10 assigns a calculated membership value to each
corresponding
variant, and for this example will obtain the following-
0 0.043 0.087 0.130 0.174 0.217 0.261 0.304
W V9'V17' V2'V10'V18' V3'V11
0.348 0.391 0.435 0.478. 0.522 0.565 0.609 0.652
V19 V4 ' V12 ' V20 V5 ' V13 V21 ' V6
0.696 0.739 0.783 0.826 0.870 0.913 0,957 1
V14 ' V22 ' Y7 ' V15 ' Y23 ' V8 ' V16 ' V24
[2611 For this example, the variants of the design space shown are sorted in
increasing
order of magnitude from the north extreme to the south extreme. This sorted
arrangement helps system 10 to search the universe of discourse set to
identify the
border variant for the power consumption parameter. The assigned membership
values
indicate the position of each variant in the universe of discourse set. For
example, the
first variant (position=1) in the sorted design space is the first element of
the fuzzy set
(the set of calculated membership values assigned to the variants of the
universe of
discourse set), the second variant (position-2) in the sorted design space is
the second
element of the fuzzy set and so on, until last variant of the sorted design
space is the
last element of the fuzzy set. The universe of discourse with the assigned
membership
values can be represented by the set shown above-
[2621 At step 134, system 10 is operable to determine a satisfying set of
variants (or
vectors) for the power consumption optimization parameter by determining a
border
variant. System 10 is operable to identify the border variant by conducting a
fuzzy
search of the universe of discourse set using the assigned membership values.
-75-
CA 02741253 2011-05-27
(263] Referring to FIG. 5, there is shown a method 134 for conducting a fuzzy
search
of the universe of discourse set using the assigned membership values.
[264] At step 140, system 10 calculates the initial membership value (T n.)
based on the
minimum and the maximum value of power consumption optimization parameter.
(265] In accordance with some embodiments, system 10 calculates the initial
membership value based on equation (5) described above, and reproduced below:
VBoraer - Min
Max - Min (5)
(266] According to the specification provided for power consumption, Veorder =
8 W (the
constraint value for the optimization parameter), the minimum value is 1.2 W
and the
maximum value is 18.4 W are the calculated (according to equation 15) minimum
and
maximum values of the variants with minimum and maximum resources
respectively.
1267] The results of steps 140 to 154 of FIG. 4 are shown in Table I below.
Table I The v4nallt3 obtained for power consumption after apply me Fuzzy
search method on the arranged design space
Calculated Var.ar~ts
Equations for obtaining the were iponding in the Power Consumption value
Decision based on the
4alculated membership values membership set according to the or the variant
Vaw,~,
valuc~(r) calculated 't'
2-1.2 ((I.20)t(4*100))'I0mW Vs õk
18.4-1.2 T,, = 0.395 0.387N4 4.2 W search down to the
= de
1-0.387 18.4-4.2 ((2'20M1.100))*40mW P ` V5QZaa,
T ,g T 8 - 4.2 =a = 0.531 0.565~V 13 = 3.6 W search down in the
a - dese a Spam
1-0. 65 _ }8.4 5.6 120)+(24 = VB4
({! 100}} 40mW
U 646 U 642N0 search up in the.
TB-0.565- 8-56 to= =SttW
desi n aye
0. 21
0 - 0.642 1.2-S-8
r 0.642 8-8.8 rp = 0.574 {Since V t3 has beau ((3'20)t'( *65))*4 rnW = yto
e - cl.eelc4) 6AW p
[268] System 10 implements the fuzzy search to find the border variant in four
comparisons based on the calculation of power consumption using equation (15).
The
border variant obtained is variant 21. This value indicates the last variant
in the universe
of discourse set that satisfies the constraint value for the power consumption
optimization parameter (Vsoraer)=
[266] Referring back to FIG. 3, at step 108, system 14 now determines a
satisfying set
of variants for the other optimization parameter, time of execution.
-76-
CA 02741253 2011-05-27
12101 Referring bacK to FIG. 4, alt step 130, system 10 generates a universe
of
discourse set for the time of execution optimization parameter. System 10 is
operable to
generate the universe of discourse set by sorting the variants of the design
space using
various sorting methodologies.
[271] For example, FIG. 7 illustrates one example sorting method 130a for
generating
the universe of discourse set.
[2721 At step 150, system 10 is operable to define, for the time of execution
optimization parameter, a priority factor function for each kind of resource.
[273] As described above and reproduced below for ease of reference, as a
specific
example of such a priority factor function for the time of execution of the
resources
R1,.. _ _ - -Rn that is not a clock oscillator, system 10 is operable to use a
priority factor
function of the form:
PF(Rn)
Nlf,
(2741 System 10 is operable to use a priority factor function for resource Ri
that is a
clock oscillator of the form:
PF(Rclk) Rc'Ik flrt[
J Nk,u
[2751 At step 152, system 10 calculates a priority factor for each resource
using the
priority factor functions. Specifically, system 10 calculates the following
priority factors:
For resource addertsubtractor (Ri):
T'- " Ta P' - (40.03 - 40.03)
PF(RO = ---~- =0
NR4 3
For resource multiplier (R2):
PF(R2) _ TN' ' - TR '' = (100.01- 40.03) a 14.995
NRZ 4
For resource clock oscillator (RG,k):
T R - = (160.09 - 40.03)
PF{Rclk) ~ N z = 60.025
Ra
-77-
CA 02741253 2011-05-27
(282] Referring now to FIG. 9, there is shown another method 164 of sorting
the
variants of the design space based on the priority factors.
[2831 At step 158, according to the above calculated priority factors, system
10
determines the following priority order by arranging the variants in
increasing order-
6 PO (R1) PO (R2) > PO (RGIK)
12841 At step 160, system 10 using the priority order to sort the design space
in
decreasing order to generate the universe of discourse set 402 shown in FIG.
14.
12851 Referring now to FIG. 10, there is shown another example sorting method
130b
for generating the universe of discourse set.
1286] System 10 is operable to implement steps 170 and 172 as described above
in
relation to the power consumption optimization parameter- That is, system 10
is
operable to use the same arrangement criterion function for both optimization
parameters (power consumption and time of execution).
[2871 At step 174, system 10 is operable to calculate the following
arrangement
criterion values for each Kind of resource,
For kind of resource R1 adder/subtractor_
40-02 - 40-02 .0
K (R1) = 3-1
Where the critical variant configuration is 1 Adder/Subtractor, 4 Multipliers
and 200 MHz
clock frequency. The value Psmin is obtained based on (12i-(1000-
1)'8)1/200=40.02us
(from (ki (N-1)*Tc)Tp)- The specifications of each resource may be based on
table I
described herein.
For Kind of resource R2 multiplier.-
140.02-100,011 .19.99
K(R2)= 4-1
Where the critical variant configuration is 3 Adder/Subtractor, 1 Multipliers
and 200 MHz
clock frequency. The value Psmin -is obtained based on (22+(1000-
1)*20)`1/200=100.01us (from (1-+(N-1) Tc)*Tp). The specifications of each
resource
may be based on table I described herein
-79--
CA 02741253 2011-05-27
For kind of resource Rcik clock:
40.02 -160. -12002
K (Rclk) = 2-1
Where the critical variant configuration is 3Adder/Subtractor, 4Multipliers
and 50 MHz
clock frequency. The value Psm,n is obtained based on (10t(1000-
7)`8)"7/50=160,04us
(from (L-*(N-1)"Tc)'Tp). The specifications of each resource may be based on
table I
described herein.
[288] System 10 is operable to implement steps 176 and 178 as above for power
consumption optimization parameter in order to generate the hierarchical tree
structure
250 shown in FIG. 11. As noted above, this is the same tree structure 260 used
for the
power consumption parameter as well (and the other sorting method based on
priority
factors) because the King of resource clock is also assigned level 1,
multiplier is
assigned level 2, and adder/substractor is assigned level 3 based on the
arrangement
criterion values.
[289] Referring back to FIG. 4, after system 10 generates a universe of
discourse set
for the time of execution optimization parameter, at step 132, system 10 is
operable to
assign a membership value to each variant in the universe of discourse set.
[290] System 10 is operable to calculate a membership value (T) for each
variant of the
universe of discourse based on the function: x - x
a-fl
[291] Where 'x' is the position of the variant in the universe of discourse
set; 'T'
represents the approximated membership value of the variant which is the x'"
position in
the universe of discourse set, 'a' and '[3' are the order of the first element
and the last
element in the universe of discourse set, thus 'a' is equal to 1 and 'P' is
equal to the total
number of variants in the universe of discourse set.
[292] System 10 then is operable to assign the calculated membership values to
the
corresponding variants in the universe of discourse set for execution time.
[293] The following is an example representation of the universe of discourse
set;
-60--
CA 02741253 2011-05-27
t V1'Y9'Y17'Y2'V10'Y18'Y3'Y11'
V19'V4'V12'V20' V5'V21' V6' tt
V14'V22'V7'V15'V23'78'V16'724
(2941 System 10 is operable to calculate the following membership values (T)
to be
assigned to each variant of the universe of discourse set:
(Position 1) V I. T = L-,# 1-24 1
a-fl 1-24
(Position 2) V9: T = a = T = 1- 24
24 0.957
-24
24 = 0.913
(Position 3) V 17. T m X T = 1-24
a-P x-~
(Position 4) V2; T 4 - 24 = = T = = 0.870
1-24
(Position 5) V10: T= x~f =T= 5224=0.826
a-~ 1--24
......................................................
x-~ _
(Position 23) V16: T 23T24
= a _ = 1 _ 24 = 0.043
x - = 2`1;24= 0
(Position 24) V24. 'C = cx + ~ 1-24
12951 System 10 then is operable to assign the calculated membership values to
each
corresponding variant of execution time optimization parameter, to obtain the
following
representation of the assigned membership values and the universe of discourse
set,
-81
CA 02741253 2011-05-27
1 0.957 0.913 0.870 0.826 0.783 0.739 0.696
Vl' V9 V17' V2 ' VIO' V18' V3 ' VII
0.652 0.609 0.565 0.522 0,478 0.435 0.391 0.348
V19' 74 ' V12' V20' V5 ' V13'V21' V6
0.304 0.261 0.217 0.174 0.130 0.087 0,043 0
V14 ' Y22 ' V7 ' V15 ' V23 ' V8 ' 1 7 2 4
[296] At step 134, system 10 is operable to determine a satisfying set of
variants by
determining a border variant. System 10 is operable to determine the border
variant by
'rabic 11. The variants obta(ncd for %mccunan umc after applylug Fuzzy sc :h
method on the arTangcd clcsign space
C41cW arc Varwwls cbrh,4wa tog in
F bgac 1 o
M Inc upWuupg t(i~ c41cu141ca [kcrsron
"lwur rrumbetslap the Sc; accor4mg to Itlc E wcuuon rime
menrbcr inp values rir(uaF(T) ew,:,44ed 'T'
140-40.02 (16 +(1000-1)*12)
T,,,, 0.277 0 261/V22 O 005 = iexrCh up in the
400.04 -40.02 0 40.02 _ > dcsr n V CC
1-0.261 400-04-60-02 r - U 433 0.43 HIV 13 r0=or3 (22 x(1000-1)'20) S ~r,:h up
aw,
, III The
T.0-0.261 140-60-02 40.005 =100.01 s do-10 !Ma.
1-0-435 400-04-100-01 20=(j2-(i000-j)-$) T ' a*,aa ,
T 0.510 0 522/V20 T= q starch down in
re -0.435 _ 140-100.01 n 60.02 =160.08s
the tltaigu Spur
1-0.522 _ 40-02-160-08 0.47 IV5
T 0.522 140-160.08 xa=0.435 (Since V13 has lien 1 , =(22+(1000.1)+20) Sto
a - checked so according to *0.005 =100 Ul p
the algorithm check V5)
conducting a fuzzy search of the universe of discourse set based on the
assigned
membership values.
[297) Referring now to FIG. 5, there is shown a method 134 of determining a
satisfying
set of variants by conducting a fuzzy search of the universe of discourse set
based on
the assigned membership values.
[2991 At step 140, system 10 calculates the initial membership value (T,,,,)
based on the
minimum value and the maximum value of time of execution optimization
parameter
using the above described equations (5) (10) and the following values: VBorder
(the
constraint value for the optimization parameter) = 140 .is, minimum value =
40.02iis,
maximum value = 400.041 ,s (calculated using equation 10). In the universe of
discourse
set 402 shown in FIG. 14, variant 'V1' and variant 'V24' correspond to the
minimum and
maximum variants respectively.
[2991 The results of steps 140 to 154 of FIG. 4 are shown in Table 11 below.
-82-
CA 02741253 2011-05-27
[300] The proposed fuzzy search approach determines the border variant in four
comparisons by analyzing the execution time according to equation (10). The
border
variant obtained is variant 5. This value indicates the first variant in the
design space
which satisfies the constraint value for execution time optimization parameter
(Vsoraer).
(3011 Referring back to FIG. 2. after determining a satisfying set of variants
for each
optimization except the final optimization parameter, at step 110, system 10
is operable
to determine a set of variants based on the intersection of the satisfying
sets of variants.
For this example, each variant of the set of variants should simultaneously
satisfy the
constraint values for both power consumption and execution time. For this
example, the
set of variants includes three variants V5, V13 and V21 that are common to
each
satisfying set of variants for the optimization parameters power consumption
and time of
execution. Hence these three variants (V5, V13 and V21) simultaneously satisfy
both
power consumed and execution time constraint values.
[3021 At step 112, system 10 is operable to select a variant from the set of
variants
(V5, V13, V21)_ System 10 is operable to select a vector by sorting the
variants in the
set of variants, ranking the vectors or otherwise evaluating the variants from
the set of
variants in order to generate an ordered list of variants. System 10 is
operable to select
a variant based on a position of the variant in the ordered list of variants.
(3Q31 In accordance with some embodiments, system 10 is operable to select the
vector by defining a priority factor function for the final optimization
parameter, which for
this example is the hardware area of the total number of resources. System 10
is
operable to use the priority factor for area defined by equations (8) (9)
above- System
10 is operable to compute a priority factor for each kind of resource, which
is R1
adder/subtractor, R2 multiplier, RclK clock for this example. After computing
the priority
factors for each resource, system 10 is operable to determine a priority order
by sorting
the calculated priority factors in increasing order. Specifically, system 10
determines the
following priority order: PO (Rclk) > PO (R9) > P(? (R2). System 10 is
operable to use
the priority order to sort the variants V5, V13, V21 of the set of variants in
increasing
order of magnitude.
-33--
CA 02741253 2011-05-27
13041 System 10 is operable to select variant number 'V5' of the set of
variants based
on the priority order- Variant 5 is the variant that concurrently satisfies
the multiple
constraint values for the optimization parameters hardware area, power
consumption
and time of execution.
[305] At step 114, system 10 is operable to provide the selected variant 5.
Alternatively
or additionally, at step 116, system 10 is operable to develop a system
architecture
based on the selected variant 5.
13081 System 10 is operable to determine and demonstrate the final variant
(which
satisfies all the three optimization parameters constraints values) from the
intersection
set of vectors. In accordance with some embodiments, system 10 is operable to
determine the selected variant according to the following summary algorithm-
1 _ Determine the Border variant of power consumption and form the satisfying
set of
power consumption. In the example, based on the user constraint of BWatts, the
border
variant obtained was V21.
The satisfying set (A) was = {V1, V9, V17, V2, V10, V18, V3, V11, V19, V4,
V12, V20,
V5, V13, V21)
2. Determine the Border variant for execution time and form the satisfying set
for
execution time. In the example demonstrated in the paper, based on user
constraints of
140us, the border variant obtained was V5.
The satisfying set (13) was = {V5, V13, V21, V6, V14, V22, V7, V15, V23, V7,
V15, V23,
V8, V16, V24}
3. Find the Intersection Pareto set which is the intersection of set A and B.
The Pareto
set (P) was T {V5, V13, V21}
4. Calculate the Priority factor of different resources available for hardware
area
(equations 9-12) which in this case is the third optimization parameter.
PF(Rl)=A--'.KRI _ (3-1),20 ¾13,33
1Vn, 3
PF(R2)= n2 K1 =(4-1)-100 _75
4
84 -
CA 02741253 2011-05-27
PF(Rclk) = M(Rclk) 1 4 3
NR.4r
5- Obtain the priority order (PO) for the above resources. Since the resource
which has
the highest
PF is given the lowest priority hence, PO of R2 is least while PO of RclK is
highest.
Hence, the following PO sequence is obtained:
PO (Rclk) > PO (RI) > PO (R2)
6. Construct the architecture vector design space for hardware area using the
PO.
7. Assign a pointer 'p' which indicates the position of the three variants
(V5, V13, V21)
obtained above.
8. Find the pointer which points to the least position p and yield the variant
corresponding to that position as the final variant for the exploration
process. The
pointer pointing to the least position indicates the variant with the minimum
hardware
area (among the three variants), since the design space for hardware area is
arranged
in increasing order from top element to bottom element. Therefore, V5' in
position p, 2
is the final best variant of the exploration process.
[3071 Referring now to FIG. 15, which shows a flow chart of a method 500 for
developing a system architecture using the selected vector.
[3081 At step 502, system 10 generates a sequencing graph and a binding graph
based on the selected vector. System 10 uses the aid of a sequencing graph and
a
binding graph to represent the combination of a number of each Kind of
resource
specified by the selected vector in the temporal and spatial domain. The flow
of data
elements through different operators in the data path can be visualized with
the help of
sequencing graphs. This graphical representation of the application can
distinctly
underline the operations in discrete time steps while maintaining the
precedence
constraints specified. Referring now to FIG. 16, which shows a sequencing and
binding
graph 60 for the adderlsubtractor (R1) 76 and the multiplier (R2) 78 specified
by the
85,
CA 02741253 2011-05-27
selected variant V5 for time slots TO 62, Ti 64, T2 66, T3 68, T4 70, T5 72,
T6 74, in the
context of the example problem.
1309] Scheduling is a process that states the time slot for every operation
while fixing
the timing length (latency) in such a manner so that a synthesized hardware
structure
meets the timing restriction specified. A classical example of time constraint
scheduling
where the scheduler must achieve the goal with a minimum number of functional
units
possible to realize the behavior is shown. The scheduling of operations may be
performed based on the As Soon As Possible (ASAP) algorithm. Though many
algorithms may be used for scheduling operations such as the As Late as
Possible
(ALAP), List scheduling, Force Directed scheduling, ASAP, and so on. ASAP was
selected because the operations should be done as soon as the resources R1 76
and
R2 78 become free. As the processed data is ready the prepared data from the
previous
stage is used for the next operation. The binding graph will be used in
further design
stages to realize the function used as a benchmark application for
demonstration of the
optimized high level synthesis design flow.
1310] Referring back to FIG. 15, at step 504, system 10 adds data registers to
the
sequencing and binding graph 60. In terms of architectural synthesis and
optimization a
circuit is generally specified by the following three dimensions. First a
sequencing
graph, second a set of functional resources described in the form of area and
latency
and, finally the operating constraints. The function of registers is to
perform data storage
and the wires interconnect the different discrete components. Referring now to
FIG. 17,
which illustrates the sequencing graph 60 with data registers 80. In the
sequencing
graph 60 of this example, Register P 80 has been added in time slot T2 66
because the
results of the multiplier (R2) 78 at time slot Ti 64 are not used until time
slot T3 68. The
latency for the function is calculated as 22 clock cycles. Referring now to
FIG. 18, which
illustrates a graph a1 for the cycle time calculation for the combination of
resources
specified by the selected vector, and in particular a multiplier 53 and an
adder/subtractor 56.
(311] Referring back to FIG. 15, at step 506, system 10 determines a
multiplexing
scheme for the resources specified in the selected vector The binding of the
resources
as shown in FIGS. 19 and 20 enables a methodology to be formalized that
incorporates
-86-
CA 02741253 2011-05-27
the multiplexers and demultiplexers into the data path circuit of the system
architecture.
The multiplexing scheme is an important stage in high level synthesis design
flow
Multiplexing scheme is a procedure for representing each system resource with
respective inputs, outputs, operations, time steps and the necessary
interconnections.
Multiplexing scheme highlights the actual usage of resources by the operands
at
different times while strictly adhering to the data dependency present. The
illustration
provides a guide for developing system block diagrams before developing the
control
unit structure for the data path- This scheme prevents any errors in the final
hardware
structure that could result in catastrophic consequences. The control unit is
responsible
for the coordination of the data path of the system. Multiplexers and
demultiplexers can
be constructed and assigned to their respective inputs and Outputs based on
the
multiplexing scheme, keeping in mind the dependency of the data. In this
example, two
functional resources one adder/subtractor R1 and one multiplier R2 perform
different
functions for the circuit. A multiplexing scheme table for each of the above
mentioned
resources is shown in Tables 3 and 4 respectively.
Table 3 Mulrip1exin83eheMe for Adder-:wburactorresoueCe(RI)
Time O uafOf Input l 14 uc 2 Ou ut
0 ------- ....... .......
l -- ------- -------
--- R2Out Rc8P
S Rate Riout Mat
R2 Rlaai Rtin
- 112 out Riout Riiu
Ct - ------ ,.. Regy
7 ---=-- == ..
I,4-fulriptexurg scheme for Multiplier resource (R2)
II~r C1 xanou IO ur 1 z 2 4u t
0 - R.__ (n) R (n-1) Re- O Reg?
2 R6 (A-2) RcgE Rliu
3 Arg,~, $g(n-3) Rlin
4 R~$C R (s .2) Rlw
=----- ------ Rliu
6 ------
-87-
CA 02741253 2011-05-27
f3121 System 10 is operable to develop the multiplexing scheme table (MST)
from the
scheduling step (Sequencing Graph with data registers) by implementing the
following
algorithm.
S Create a table with 5 columns and n rows (where n = number of time steps in
the
sequencing graph)-
The 15` column = the time step, 2'0 column = operation, 3`a column = Input 1,
4`4 column
Input 2 and 5m column = Output.
Repeat for i=0toN
f
For time step i (O-,=i<=N, where N is the total number of time steps in the
sequencing
graph), check if an operation exists. There could be the following three
conditions:
if an (operation exists in the current time step and there also exists an
operation in the
next time step) then assign the respective operation for the current time step
i in the
column specifying the operation of the MST and assign input 1 and input 2 of
next
operation which would be used in the next time step (i*1) in the columns
specifying the
inputs of the MST. Finally, assign the output of the current operation in the
column
specifying the output of the MST
Elseif, (there exists no operation in the current time step and there exists
no operation in
the next time step), then assign "-for the operation in the column specifying
the
operation of the MST, assign "-" for the input 1, assign "" for the input 2 in
the columns
specifying the inputs of the MST and assign "-" for the output of the current
operation in
the column specifying the output of the MST.
Elseif, (there exists no operation in the current time step but there exists
an operation
for the next time step), then assign "-"for the operation in the column
specifying the
-88--
CA 02741253 2011-05-27
operation of the MST, assign respective inputs for trie input 1 and input 2
which would
be used in the next time step (i*1), in the columns specifying the inputs of
the MST.
Also, assign "-A for the output of the current operation in the column
specifying the
output of the MST.
Elseif, (there exists an operation in the current time step but there exists
no operation
for the next time step), then assign the respective operation for the current
time step i in
the column specifying the operation of the MST, assign '-"for the input 1,
assign =" for
the input 2 in the columns specifying the inputs of the MST (since there is no
operation
in time step i+1, hence no inputs are prepared). Finally, assign the
respective output of
the current operation in the column specifying the output of the MST.
}
STOP
(3131 System 10 is operable to implement the algorithm mentioned above to
create
table 3 (multiplexing table for adder/subtractory
Create a table with 5 columns and n rows (where n = number of time steps in
the
sequencing graph).
The 15' column = the time step, 2" column = operation, 3r" column = Input 1,
4'1' column
= Input 2 and 54' column = Output.
At time step 0 of the sequencing graph, since there is no operation, hence,
assign "-" for
the operation, "-" for the input 1, -" for the input 2 and 02 for the output
in the
multiplexing scheme table for 151 row.
At time step 1 of the sequencing graph, since there is no addition operation
as well,
hence, assign "= for the operations, "-" for the input 1, "-"for the input 2
and '-"for the
output in the multiplexing scheme table for 2"' row.
-89-
CA 02741253 2011-05-27
At time step 2 of the sequencing graph, since there is no addition operation
as well,
hence, assign %' for the operations. But at this time step, the data for the
addition
operation in the next time step (time step 3) is getting ready hence assign
"R2out" for
the input 1, "RegP" for the input 2 and for the output (as there is no
operation
performed in this step), in the multiplexing scheme table for 31 row.
At time step 3 of the sequencing graph, since there is an addition operation,
hence,
assign "t" for the operations. Also, at this time step, the data for the next
addition
operation for the next time step (time step 4) is getting ready hence assign
"R2outA for
the input 1, "R1out" for the input 2. Now the result of current addition at
time step 3 is
fed to next adder (Minded as resource R1). Therefore assign "R1 in" for the
output, in the
multiplexing scheme table for 41" row.
At time step 4 of the sequencing graph , since there is an addition operation,
hence,
assign "t" for the operations. Also, at this time step. the data for the next
addition
operation for the next time step (time step 5) is getting ready hence assign
"R2out" for
the input 1, "R1 out' for the input 2. Now the result of current addition at
time step 4 is
fed to next adder (binded as resource R1). Therefore assign "R1 in" for the
output, in the
multiplexing scheme table for 5`" row.
At time step 5 of the sequencing graph, since there is an addition operation,
hence,
assign "+" for the operations. Also, at this time step, the data for the
subtraction
operation for the next time step (time step 6) is getting ready hence assign
"112out for
the input 1, "R1out" for the input 2. Now the result of current addition at
time step 5 is
fed to subtractor (binded as resource R1). Therefore assign "R1 in" for the
output, in the
multiplexing scheme table for 51" row.
At time step 6 of the sequencing graph, since there is a subtraction
operation, hence,
assign "-" for the operations. Now the result of current subtraction at time
step 6 is fed
to output register Y. Therefore assign "Reg Y" for the output, in the
multiplexing scheme
table for 6'" row.
_90-
CA 02741253 2011-05-27
At time step 7 of the sequencing graph, since there is no operation, hence,
assign "-" for
the operation, "-" for the input 1, "-" for the input 2 and "-" for the output
in the
multiplexing scheme table for 7" row.
[3141 Similarly, Table 4, the multiplexing scheme table for multiplier can be
obtained-
[3151 At step 508, system 10 generates a system block diagram. After the
multiplexing
scheme has been successfully performed, the next phase of the design flow is
the
development of the system block diagram. The system block diagram comprises
two
divisions, data path circuit and the control unit. The data path is
responsible for the flow
of data through the buses and wires after the operations have been performed
by the
components present in the data path circuit. Thus, the data path provides the
sequence
of operations to be performed on the arriving data based on the intended
functionality.
As an example, the data path can comprise registers for storage of data,
memory
elements such as latches for sinking of data in the next stage, as well as
multiplexers
and demultiplexers for preparation of data at run time by change of
configuration. The
data path circuit also consists of functional resources which are accountable
for
performing the operations on the incoming data. The block diagram for the
benchmark
application consists of two resources (an adder/subtractor and a multiplier)
for executing
their respective assigned operations. Another component of the system block
diagram
is the control unit or the controller. A centralized control unit controls the
entire data path
circuit and provides the necessary timing and synchronization required by data
traversing through the data path circuit. The control unit acts as a finite
state machine
that changes its state according to the requirement of activating and
deactivating the
various elements of the data path at different instances of time- Based on the
multiplexing scheme the block diagram of the data path circuit was constructed
to
demonstrate design flow for the benchmark application.
[316] Referring now to FIG. 19, which shows a block diagram of the data path
circuit
86 for the resources specified in the selected vector, which for this
illustrative example,
is a adder/substractor 82 and a multiplier 84.
-91-
CA 02741253 2011-05-27
13171 At step 510, system 10 generates a RTL level representation of the
system
architecture. System 10 is operable to create the RTIR data path circuit
diagram from
Multiplexing Scheme as follows.
[318] The Block diagram of the RTL data path circuit in FIG. 19 is obtained
from the
information extracted by the multiplexing scheme in table 3 and table 4 (In
the tables,
R1 = ad(ler/subtractor resource and R2 = multiplier resource) The procedure
implemented by system 10 for constructing the data path circuit of the RTL is
discussed
below;
[319] Let the number of variables available for INPUT 1 of the multiplexing
scheme
table for resource R1 be denoted as V,. Therefore, from the multiplexing
scheme table 3
for adder/subtractor. V. = 4, since there are 4 possible input variables
(R2out, R2out,
R2out and R2out) for INPUT lof adder/ subtractor.
[320] Let the number of possible variables available for INPUT 2 of the
multiplexing
scheme table for resource R1 be denoted as Vs,- Therefore, from the
multiplexing
scheme table 3 for adder/subtractor, V, = 4, since there are 4 possible input
variables
(RegP, Riout, R'lout and Riout) for INPUT 2 of adder/subtractor.
[321] Rased on the value of V,, = 4, a 4-bit multiplexer (MUX 1) component is
adopted
from the module library- The inputs to the 4-bit MUX would be the 4 possible
variables
acting as inputs for INPUT 1 which are in this case R2out, R2out, R2out and
R2out as
mentioned in step 1
[322] Similarly, based on the value of Vy = 4, a second 4-bit multiplexer (MUX
2)
component is again adopted from the module library. The inputs to this second
4-bit
MUX would be the 4 possible variables acting as inputs for INPUT 2 which are
in this
case RegP, R1 out, Riout and R1 out as mentioned in step 1. Selector signals
are
assigned to each multiplexer which selects different inputs based on the
information of
the select lines.
[323] If the multiplexers obtained in step 2 are an N bit multiplexer then
input storage
elements are needed for each case to store the data from different inputs at
different
time instances.
[324] Since for the applications discussed as example, the multiplexers (MUX 1
and
MUX 2) obtained in step 2 are 4 bit Multiplexers hence there will be a sharing
of the
-92-
CA 02741253 2011-05-27
same mux unit at different time instances. This mandates the incorporation of
storage
latches to temporarily hold the data for various inputs until needed by the
next
component. Therefore, for each multiplexer formed in step 2, a corresponding
latch is
added. Strobe signals are assigned to each input latch which latches the data
when
needed. Thus this strobe maintains the synchronization process.
[3251 Elseif the multiplexers obtained in step 2 is a 1 bit multiplexer then
no input
storage element is needed in the design.
[3261 Followed by the latch component, the main functional unit has to be
added from
the library which actually processes the data based on the inputs received at
different
time instances by the two multiplexers through the corresponding storage
latches. in
this case, the outputs of two latches act as inputs of the adderlsubtractor
resource.
Hence, the same adder/subtractor resource performs the same functional
operation but
on different inputs at different time instances (as received from the
latches). Enable
signal is assigned to functional unit (resource) which activates the resource
when both
the inputs are ready. Thus enable also maintains the synchronization.
[3271 Now since the same adderlsubtractor resource performs the same
functional
operation but on different inputs at different time instances, hence an output
storage
latch needs to be incorporated in the data path unit of the RTL circuit. This
output
storage latch holds different data from the functional resource based on the
different
outputs processed by the functional unit. Output Strobe signals is assigned to
the output
latch which latches the data when needed. Thus this strobe is also responsible
for the
synchronization process.
[328] If an output storage latch is present in the data path unit, then a
demultiplexer
has to be added in the data path unit. This is because, based on the data
stored by the
output latch due to different output processing from functional unit, a
structure is needed
that can produce the output of all the data from the latch through parallel
wires. Hence
an N-bit demultiplexer is needed- In the case of example discussed so far, the
value of
the N bit width of the demultiplexer = value of N bit width of any input
multiplexer. De-
selector signal is assigned to the demultiplexer which outputs the different
results of the
latch through different wires.
-93-
CA 02741253 2011-05-27
Table 5.
Adder (R 1) MulupUer CR2} Strobes
(3291 Flseif an output storage latch is not
n s o a a w. ; K present in the data path unit then no
g - R
0 t1 0 0 04 0 0 0 o naw 5011 0 0 0 0 demultiplexer needs to be added in the
data path
000 1 ' unit.
[3301 This process is repeated for all the
1 001 000 3
0 4 multiplexing tables developed in the design
s process. In this case, the steps from 1-6 was
6 repeated for multiplexing table for multiplier
7 resource (R2).
0 , 3
(331) Once all the connections mentioned in step
1 4
00 l a 010 U01 10 1-7 for the discrete components for a specific
0 ' l resource is complete, then the outputs of each
t2 resource stage are connected to the inputs of the
93 other resource stage based on the information
0 , 14
1s present in multiplexing scheme tables. For
1 0 lei 00 1 0 011 u10 16 example, in this case, input 'R 1 out' of resource
17 R1 results from the output 'R1 in' of the same
0 0 is
114 resource. Also, output 'R1 in' of the second
0 , u 1 20 resource R2 acts as the input of the first resource
1 21 R1 in the form of 'R2out' through the second
1 0 0 10 01 1 0 1 011 22 MUX. Similarly, using the information from the
u 0 23
44 multiplexing tables, the interconnected
25 components of each resource is connected
0 1 0 1 26 among each other to obtain a circuit in FIG. 13.
'-' Referring pack to FIG. 15, at step 512, system 10
1 0 0 1 10 1 - 100 26 generates a control timing sequence, also
0 0 ,q
30 referred to herein as a control unit. The next
31 design stage is the development of the control
0 1 0 ' 32 unit structure, which is accountable for any mis-
33 coordination in timing among the various
1 1 U 11 34
1 1 35 elements of the data path. The function of the
U , 36
0 1 3'7 _
CA 02741253 2011-05-27
controller is to activate and deactivate the different elements of the data
path based on
the timing specification determined for the objective function. The control
unit prepares
the data path units for the incoming data by changing the configuration to
perform the
next assigned function. For synchronous functioning of all data elements in
the system
the controller must respond to the requirement at exactly the right moment.
Failure to
activate or deactivate any functional block in the data path will result in
fatal
consequences in the system output. The determination of the timing
specification from
the control unit helps to create an error free structure of the controller. As
an example,
VHDL was used here as the hardware description language for designing the
control
unit, but other hardware description languages may also be used. The timing
specification data shown in Table 5 is developed with clock cycles placed in
the Y-axis
and the control signals placed in the X-axis. At every count the transition of
the different
control signals can be clearly observed. This facilitates in the description
of the control
structure in a hardware description language.
[3321 System 10 is operable to implement the following procedure for
determination of
the controller table.
[3331 The procedure for determination of controller table is to identify the
various
control signals that control the different components of the data paths. With
the
developed block diagram in the previous step (see the algorithm for
development of
block diagram of the data path from multiplexing scheme table), there can be
'm' # of
control signals depending on the complexity of the data path unit
architecture. For the
example used, after development of the block diagram of the data path unit for
the
example as shown in FIG. 19, the control signals for the different components
can be
extracted. Examples of control signals are:
Latch strobe - Control signal of input latch for adder/subtractor (R1)
enable add_sub - Control signal of resource for adder/subtractor (R1)
Output strobe- Control signal of output latch for adder/subtractor (R1)
Selector - Control signal of multiplexer for adder/subtractor (R1)
Deselector- Control signal of de-multiplexer for adderlsubtractor (R1)
Latch strobe - Control signal of input latch for multiplier (R2)
-95
CA 02741253 2011-05-27
enable multiplier - Contra{ signal of resource for multiplier (R2)
Output strobe- Control signal of ot.ttput la-t4-_n far multiplier (R2)
Selector - Cantrol signal of multiplexer for multiplier (F 2)
pBSetactor- Control signal of c7e-multiplexer far multiplier (P.2)
Stobes~register - Control signal of Reg x(n-1). Rag x(n-2). RBgA, Rexg(-. Reg
x(n),
Re1gI3. Reg x(n-3). Rea y(n-2)_
lastrobe_ragP - Control signal of RegP.
Stroba_regY- Control signal of output register Y.
1 When count0, iP (clod: event and clock== 1 =) then
if (reset-'O=) then
if co04at=0 they
1.1 strobes' O'; --initialize strobe for the registers
1-2 Iatch_strobs_R2 0;---initiali~ the latch strobe
1 .3 larch-Qtr-beR l = O, -initialize the latch strobe
1.4 output,-strobe. R2 --0; ---
1.5 output_strobc_RI
1 .6 strebe_regY -0; ---initialize
1 .7 I>strobc.regP -0; ---inItieIizc
1.8 Increment count;
2. When count=-1 then
strobes-9" 1
2-1
2.2 selector .2~"'000'=;
2 3 Increment count;
3. When cotanw-2 then
3-1 busy- ! =:
3.2 Ie;ch_suobe_R2 ' 1';
3.3 Increment count;
4. W hen count-3 then
4.1 eneble_1 2n-'1';
4.2 selector R2o ==OO I ";
4.3 Dcsclector_R2c===000'=
4.4 increment count;
5. When count=4 then
5.1 laten_strobe_R2 U=---resenting
5.2 Increment count;
6. When count-5 then
6.1 increment count;
_ gg-
7- When count-6 the-
7-1 Increment co.xnt:
8_ When count=7 then
8-1 OurpuT_strobe_1Z2-t== 1';
8.2 Increment count;
When cosant~8 then
9-1 enable R2-n 0';---resetting the value
9.3 l~strnha_regPn 1';
9_3 lncr'emarit count;
l_ Wnen cottrO 9 then
10.1 latch sTrebe_R2K 'I';
10.2 Increment count;
I 1 When counnl0 then
ti .1 enable_ t2<~'l';
1 1 2 snlactor_R2c===ol0";---- preparing for the next input
1 1.3 selector R1 - =='OU='; --preparing for the next input
1 1.4 l7eseIecter_9.2-e="001 =';
11.5 outpu!-strone_R2<==O';----resetting
11-6 Increment count:
12 When count=II the-
12-1 laic n=-stro ---- reserving
12-:2 Increment count;
13. When count-12 then
13.1 increment count;
14- When count=13 than
14.1 Increment count;
15_ When cotult=l4 than
15.1 outpur_strobc_R2t='1';
15.3 enable R2-n 0'; -----resetting
4 Inc remeni count;
16. When count=l5 then
16.1 latch_strobc,R2--' l';
16.2 latch strobo_R1 en 1';
16.3 Increment count;
_97
CA 02741253 2011-05-27
17. When count-- 16 then
17.1 enable R.2< 1';
17.2 add sub<="0';-------------- for the add operation
17.3 enable_Rl <=T;
17.4 selector R2<="011 ";
17.5 selector R1<=' 01";
17.6 Deselector_R2<="010";
17.7 Deselector R1<="00";
17.8 output-strob, ,-R2<='0'; ---resetting
17.9 Increment count;
18. When count- 17 then
18.1 Increment count;
19. When count=18 then
19.1 latch strobe_R2<='0;----resetting
19.2 latch-strobe-R1<='O';----resetting
19.3 Increment count;
20. When count=19 then
20.1 Increment count;
21. When count=20 then
21.1 output_strobe_R1 <_' 1;
21.2 ouiput_strobe-R2<='1 ;
21.2 enable R2<='0; -----resetting
21.3 enable-R1<;--'O'; -----resetting
21.4 Increment count;
22. When count=21 then
22.1 latch_strobe_R2 {=' 1';
22.2 laich_strobe_R1 ='1
22.3 Increment count;
23. When count=22 then
23.1 enable - R2<7--'I',
23.2 add_sub<='0';
23.3 enable-R1<--'I';
23.4 selector R2<=" 100";
23.5 selector RI <_" 10";
23.6 Deselector_R2<="O 1 l ";
23.7 Deselector_R1 <="OI ";
23.8 output_sirobe K2='0'; ---resetting
23.9 output strobe_Rl<='[1';---resetting
23.10- Increment count;
-98-
CA 02741253 2011-05-27
24. When count=23 then
24.1 latch strobc_R2<='0';---- resetting
24.21atch_strobe_R1<='O'---- resetting
24.3Increment count;
25. When count=24 then
25.1 Increment count;
26. When count=25 then
26.1 Increment count;
27. When count=26 then
27.1 output_strobe_Rl <=' 1';
27.2 otttput_strobe R2<=' 1;
27.3 enable R2<='0'; ----resetting
27.4 enable__R1<='0'; -----resetting
27.5 increment count;
28. When count-27 then
28.1 latch ,-strobe_R2<=' 1';
28.2 latch -strobe _R I <_' I';
28.3 Increment count;
29. When count--28 then
29.1 enable-R2<='1' 1';
29.2 add sub<='0';
29.3 enable RI <=T;
29.4 selector^R 1 <_" 11 ";
29.5 Drselecior R2<=" 100";
29.6 Deselector R1<."10";
29.7 output-strobe-R2 <=V ----resetting
29.8 output strobe_R1-='0',---resetting
29.9 Increment count;
30. When countw29 then
30.1 latch _strobe_R2<='0'---- resetting
30.2 latch ^strobe _RI '0'; ---resetting
30.3 Increment count;
31 When cuunt=30 then
31.1 Increment count;
32. When coun131 then
32.1 Increment count;
-99-
CA 02741253 2011-05-27
33. When count-32 then
33.1 output_strobe_R1<='l';
33 2 autput_sirobe_R2-c 1';
33.3 enable-R2<='u'; -----resetting
33.4 enable-----resetting
33.5 Increment Count,
34 When count-33 then
34 1 latch _Sirobe_R 1 < t
34.2 lncrement count;
35. When count=34 then 10
35.1 add sub<-' 1 ; -----For subtraction
35.2 enable^R1=='1 ;
35 3 Deselector_R 1 < ' 11
35.4 output_strobe~RI<='O';---resetting
35.5 IncremcnT count;
36. When count-35 then
36.1 Increment count; 15 At step 514, system 10
simulates the schematic
37_ When count structure for testing and
then
37.1 latch strobe*R <.='O',---resetting verification and then
37 2 output_srrobe_RI < 1'; implements the schematic
37.3 Increment count;
20 structure of the device may be
38. When count=37 then developed in any of the
38.1 strobe_regY<=' I';
38.2 enable-RI <='U'; ---resetting the value synthesis tools available
38.3 Increment count; Examples include Synopsys,
Xilinx integrated Software,
25 Environment (ISE) and Altera Quartus II. For this example, components in
the data path
may be described and implemented in VHE L before verification. Then, as an
example,
the schematic structure of the whole device may be designed and implemented in
Xilinx
integrated Software Environment (ISE) version 9.2i_ Referring now to FIG. 20,
there is
shown a schematic structure 93 of the whole device in accordance with an
example
30 embodiment, which is designed in Xilinx ISE 9.21 as an illustrative
example.
[3341 The described fuzzy search framework for design space exploration is
efficient in
architecture evaluation and exploration time. Example sizes of the design
space
consisting of variants for the benchmarks are indicated in Table III. For
example, the
-100-
CA 02741253 2011-05-27
total number of variants in the design space for Discrete Wavelet
Transformation
(DWT)is 288, while on the other hand, the total number of variants of the
design space
for EWF and FIR are 450 and 1200 respectively. The results of the comparison
of the
proposed design space exploration process with exhaustive analysis are shown
in
Table III. Results indicate that the proposed approach may be capable of
achieving
speedup compared to the exhaustive search. Speed up of up to 92.70 % is
achieved for
the DWT high level synthesis benchmark. Moreover speedup of 94.22 % and 97.75
%
for EWF and FIR benchmarks are obtained respectively when compared to
exhaustive
search as shown in Table Ill.
[335] The described approach using a fuzzy search technique offers fast
exploration of
the architecture design space according to the system specifications provided.
The
method is based on the sorting of the architecture design space, such as for
example in
an increasing or decreasing order based on the calculation of the PF or by
using a tree
structure or another sorting method. It then uses the fuzzy search mechanism
to find an
optimal variant for the design- The speed up of the proposed fuzzy search
method
indicates improvement (or at least an alternative) in the speedup for finding
the optimal
variant of the design space. Moreover since the time to market pressure has
led to
growing attention for automation of OSE approaches, hence automations of the
DSE
methodology for multi objective designs has become significant. This fuzzy
search
method allows automation of the DSE methodology which is exceedingly important
for
the current generation of the VLSI designs which has multi objective
requirement.
[335] The present invention has been described here by way of example only.
Various
modification and variations may be made to these exemplary embodiments without
departing from the spirit and scope of the invention, which is limited only by
the
appended claims.
-101-