Patent 2086174 Summary

(12) Patent:	(11) CA 2086174
(54) English Title:	COMPUTATIONAL STRUCTURES FOR THE FREQUENCY-DOMAIN ANALYSIS OF SIGNALS AND SYSTEMS
(54) French Title:	STRUCTURES DE CALCUL POUR L'ANALYSE FREQUENTIELLE DE SIGNAUX ET DE SYSTEME
Status:	Deemed expired

(51) International Patent Classification (IPC):	G06F 17/10 (2006.01) G06F 17/14 (2006.01)
(72) Inventors :	SUNDARARAJAN, DURAISAMY (Canada) AHMAD, M. OMAIR (Canada) SWAMY, M. N. SRIKANTA (Canada)
(73) Owners :	SUNDARARAJAN, DURAISAMY (Canada) AHMAD, M. OMAIR (Canada) SWAMY, M. N. SRIKANTA (Canada)
(71) Applicants :
(74) Agent:
(74) Associate agent:
(45) Issued:	1998-08-25
(22) Filed Date:	1992-12-23
(41) Open to Public Inspection:	1994-06-24
Examination requested:	1992-12-23
Availability of licence:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	No

(30) Application Priority Data:	None

Note: Descriptions are shown in the official language in which they were submitted.

SPECIFICATION 2 0 8 6 1 7 4

The present invention is related to improved fast-Fourier-transform (FFT) analyzers. It
is directed to both the butterfly circuits and the interconnection of the butterfly circuits to
form FFT analyzers.
The use of the discrete Fourier transform (DFT) for the frequency-domain analysis and
design of signals and systems came into widespread use after the publication of the paper
"An Algorithm for the Machine Calculation of Complex Fourier Series, in "Math. Comp~-
tation., vol. 19, Apr. 1965, pp. 297-301 by Cooley, J. W. and Tukey, J. W. describing, in
general, the decomposition of an N-point DFT into a number of DFTs of a smaller size and,
in particular, the radix-2 decimation-in-time (DIT) algorithm. The radix-2 decimation-in-
frequency (DIF) algorithm was reported later by Gentleman, W. M. and Sande, G. in "Fast
Fourier Transforms--for Fun and Profit,"in Proc. AFIPS, Joint Comp~ter Conf., vol. 29,
1966, pp. 563-578. However, both the DIT and DIF algorithms, in general, are referred
to as Cooley-Tukey algorithms. A detailed description and the history of the development
of these algorithms can be found in Special Issue on FFT and Applications, IEEE Trans.
Alldio Electroaco~st., vol. AU-15, June 1967. Despite the subsequent development of other
algorithms, it is the Cooley-Tukey radix-2 algorithm that has been most widely used as
presented originally without any significant changes, due to the simplicity, regularity, and
efficiency of the resulting computational structure. A drawback of the computational struc-
ture of the Cooley-Tukey radix-2 algorithm is that it requires more complex multiplications
compared with that of the higher-radix algorithms. In addition, this structure has the over-
heads of bit-reversal and data-swapping operations. This disclosure reports the invention of
a large set of computational structures, designated as plus-minus (PM) structures, derived
from a new family of radix-2 algorithms, designated as the plus-minus (PM) algorithms.

,!' ~ .

2086~ 74
The Cooley-Tukey computational structure based on the radix-2 algorithm uses a divide-
and-conquer strategy to decompose the problem of computing an N-point DFT into a num-
ber of sub-problems of computing 2-point DFTs each characterized by a basic computational
process commonly known as the butterfly operation. Each of the input and output nodes
of a butterfly has only one data value. Therefore, a single butterfly computes only one 2-
point transform. However, it is possible to design a large number of radix-2 computational
structures by making the input data to each node of a basic computational unit a vector
rather than a scalar. The computational structures of the present invention are developed
from this observation.
Since a 2-point transform is computed with one data value from each of the two input
nodes of the butterfly, the existence of more than one data value (i. e., the elments of
a vector) at each node implies the computation of two or more 2-point transforms by
each butterfly. The ~-element vector formation from the raw given data is carried out by
computing ~b-point DFTs in a preprocessing structure (the first part of the PM structure).
The computational structures derived from the PM family of radix-2 algorithms handle the
data values as vectors with 2 or more elements and compute more than one 2-point DFTs
in each butterfly of the second part of the structure.
In drawings which illustrate embodiments of the invention,
~igure 1 is the butterfly of the 2 x 1 PM DIT DFT structure. Two of the twiddle factors
assume a value of 1, whereas the other two differ by a factor of WN,
~igure 2 is the flowchart of the 2 x 1 PM DIT DFT structure for complex-valued input
data with N = 32,
Figure 3 is the butterfly of the 2 x 1 PM DIF DFT structure. Two of the twiddle factors
assume a value of 1, whereas the other two differ by a factor of W~
Figure 4 is the flowchart of the 2 x 1 PM DIF DFT structure for complex-valued input
data with N = 32,
~igure 5 is the hardware realization of the butterfly of the 2 x 1 PM DIT DFT structure,

20861 7~
Figure 6 is the hardware realization of the 2 x 1 PM DIT DFT structure for complex-
valued input data with N = 8,

Figure 7 is the flowchart of the 2 x 1 PM DIT DFT structure for real-valued input data
with N = 32,

Figure 8 is the flowchart of the 2 x 1 PM DIF IDFT structure for the transform of
real-valued data with N = 32,
Figure 9 is the butterfly of the 2 x 2 PM DIT DFT structure with four twiddle factors,
Figure 10 is the flowchart of the 2 x 2 PM DIT DFT structure for complex-valued input
data with N = 32,

Figure 11 is the flowchart of the 2 x 2 PM DIT DFT structure for complex-valued input
data with N = 16,

Figure 12 is the butterfly of the 2 x 2 PM DIF DFT structure with four twiddle factors,
Figure 13 is the flowchart of the 2 x 2 PM DIF DFT structure for complex-valued input
data with N = 32,

Figure 14 is the flowchart of the 2 x 2 PM DIF DFT structure for complex-valued input
data with N = 16,

Figure 15 is the hardware realization of the basic computing unit of the 2 x 2 PM DIT
DFT structure,
Figure 16 is the flowchart of the 2 x 2 PM DIT DFT structure for real-valued input data
with N = 32,

Figure 17 is the flowchart of the 2 x 2 PM DIF IDFT structure for the transform of
real-valued data with N = 32,
Figure 18 is the flowchart of the 2 x 1 PM DIT DFT structure for 8 x 8 complex-valued
2-D input data, a twiddle factor W8 is shown only by its exponent s,

~f

- - 2086 1 7~
Figure 19 is the flowchart of the 2 x 1 PM DIF DFT structure for 8 x 8 complex-valued
2-D input data, a twiddle factor W8 is shown only by its exponent s,
Figure 20 is the flowchart of the 2 x 2 PM DIT DFT structure for 8 x 8 complex-valued
2-D input data, a twiddle factor W8 is shown only by its exponent s, and
Figure 21 is the flowchart of the 2 x 2 PM DIF DFT structure for 8 x 8 complex-valued
2-D input data, a twiddle factor W8 is shown only by its exponent s.

Each member of the set of computational structures of the present invention is designated
hereafter as u x v PM structure and the algorithm through which it is derived is designated
as u x v PM algorithm. The letter u indicates the length of the input vectors to a butterfly.
It is a power of 2 with a positive integer exponent in the range 1 to M--1 (In this disclosure
it is assumed that N, the number of samples in the input data set, is equal to an integral
power of 2 i.e., M = log2 N is a positive integer.). It is possible to combine fully or partially
the multiplication operations of two or more consecutive stages in the PM structures in
order to reduce the total number of multiplication operations. The letter v, in the range 1
to (M--log2 u), specifies the number of stages whose mult;plications are combined.
The decomposition process of the computation of an N-point DFT into 2-point DFTsrequires the incorporation of the vectorization process in the definition of the DFT. Starting
from the definition of the DFT, we present an equivalent vectorized format for any value of
u.
The DFT of a data sequence {x(n)~ n = 0,1, . . ., N--1} is defined as
N--1
X(k) = ~, 2(n)WNk, k = 0,1, . . ., N--1, (1)
n=O
where WN = exp(--j2N). For a positive integer u as defined earlier, (1) can be rewritten as

X(k) = {x(0) + x(1--)WN u + ~ ~ ~ + X((fl--1)--)WN ) u }WNk

N)WlkN + ~ ~ + 2(1 + (U--1)--) WN } N

'll ( U 1 ) WN + ~ ~ ~ + X (U----1 ) W( --l )k N } (----1 )k (2)

Since WN = WU1, (2) can be expressed as 2 0 8 6 1 7 4

X(k) = {x(O) + x(l--)Wuk + ~ ~ ~ + x((u--1)--)WUu l)k}WNk

+{x(l) + x(l + 1--)Wulk + ~ ~ ~ + x(l + (u-- 1)--)WU )k}WN +

U ) + (2 U 1)WU + ~ ~ + X(U----1)W(U--l)k}W(N-l)k
Note that the expressions inside each pair of braces is a u-point DFT and when evaluated
will give rise to u distinct values for k = O, 1, . . ., ?b--1. Let a(n)(n = O, 1, . . ., N _ 1) denote
vectors, each consisting of u-elements (Vectors will be denoted by boldfaced letters). The
vector a(n) consists of the nth u-point transform values aq (n) (q = O, 1, . . ., u- 1) as defined
below.
a(n) = {aO(n),al(n),...,au-l(n)} =
{valuesof theu--pointDFTof {x(n+O--),x(n+1--),...,x(n+(u--1)--)}},

n=0,1,.. ,----1. (4)
u
Therefore, (3) can be rewritten as

X(k) = aq(O)WN + aq(1)WN + ~ ~ ~ + aq(----1)W( u

= ~, aq(n)WNk7 k = O, l, . . ., N--1- (5)
n=O
Since the values of the u-point transform aq(n) are periodic with a period of u, for values
of k equal to or greater than u in (5), the values aq(n) repeat. Therefore,
q = k mod u. (6)
Replacing the argument k by k + PN in (5) and (6) yields

N) (O)WO(k+PU)+aq(1)WN( U + +aq(U 1)WN
k = 0,1, . . .,----1, p = 0,1, . . ., u--1 (7)

q = (Ir + p--) mod u. (8)

- 2086~ 74
Let us vectorize the output transform values X (k) as

A(k) = {Ao(k) , A1 (k) , . . ., Au-l (k)} = { X (k + p - ),p = O,l,...,u - 1},
k = 0,1,.. .,----1. (9)

Therefore, (7) can be expressed as

Ap(k) = aq(O)WNk(WNU )~P + aq(l)WNk(WNU )lp + ~ - + aq( - - 1)WNU ) (WNU )( U -1)p

Ok( l)op + (l)Wlk(Wl)lP + ~ ~ ~ + aq(----1)WN ( U)
With the vectorization of the input quantities x(n) as carried out in (4) and the output
quantities X (k) as carried out in (9) and from (10), the DFT as defined by (1) can be
equivalently written as
N - l
Ap(k) = ~ (WUI)Pnaq(n)WNk, k = 0,1,..., - - 1 (11)

where q = (k + PNU ) mod u and p = O, 1, . . ., u--1. Thus, the PM structures must have a
part in which u-element vectors are formed by implementing u-point transforms.
The inverse DFT (IDFT) is given by
N--1
x(n) = N ~, X(k)WNnk, n = 0,1, . ..,N--1. (12)

A set of expressions similar to those given by (4), (9), and (11) can be obtained for the case
of IDFT. The input quantities X (k) are vectorized as
B(k) = {Bo(k)~ Bl(k),..., Bu-l(k)} =
{values of the u - point IDFT of { X (k + O - ), X (k + 1 - ),..., X (k + (u - 1) - )}},

k = 0,1,. .. ,----1. (13)
The output data values x(n) are vectorized as

b(n)={bo(n),bl(n), -,bu-l(n)} = {x(n + p - ),p = O,1,~..,u - 1},
n=O,l,..... ,----1. (14)

A~

-- 20861 7~
An expression similar to (11) for computing the IDFT can be obtained from (12) using (13)
and (14) as

( ) u ~ ( W -l)PkB (k) W Nnk~ n = 0,1,. ~ u 1 (15)
where q = (n+ pUN) mod u and p= 0,1,...,u--1.
In the subset of the computational structures with u = 2, data samples are organized as
2-element vectors. For the specific value of u = 2, (4), (9), and (11) become, respectively,
as

a(n) = {aO(n),al(n)} = {x(n) +x(n+ 2 )~ x(n)--x(n+ 2 )}~
n = 0, 1, . . ~ 2 -1 (16)

A(k) = {Ao(k)7Al(k)} = {X (k)~ X (k + 2 )}'
k = 0,1,.. ., 2 --1 (17)
N - l
Ap(k) = ~ (_l)P aq(n)wNk7 k = 0~ 2 - 1 (18)
where p = 0 or 1 represents the first or the second element, respectively, of the output
vector A(k), and
q = k mod 2, N > 2
Similarly, in the case of IDFT, for u = 2, (13), (14), and (15) become, respectively, as

B(k) = {Bo(k)7Bl(k)} = {X (k) + X (k + 2 )~ X (k) - X (k + 2 )}'
k = 0,1,..., 2 --1. (19)

Note that the process of forming the vectors B (k) differs from the one using (12), in that
the values are not divided by 2, the value of the divisor N outside the summation in (12).
This is done to avoid the division by a constant more than once.

b(n) = {bO(n),bl(n)} = {x(n),x(n+ 2 )}'
n=O,l,..... , 2 -1 (20)

~- 20861 7~

bp(n) = N ~ )PkBq(k)WNnk~ n = 0,1, . ..,----1 (21)
where p = O or 1 represents the first or the second element, respectively, of the output
vector b(n), and
q = n mod 2, N > 2
Equations (18) and (21) can have direct hardware or software implementation to compute
the DFT and IDFT, respectively. The decomposition of these equations can be expected
to yield faster and more efficient computational structures. Therefore, the procedure for
the decomposition of these equations and the derived structures will be presented in detail
in the following sections. The procedure of obtaining the computational structures for
the parameter values of ~ ~ 2 follow along the same lines. It should be noted that the
number of array-index updatings in the PM structures will be proportional to Nu due to the
vectorization of the data samples.
The 2 x 1 PM DIT computational structure to compute the DFT for complex-valued in-
put data is derived by decomposing (18) corresponding to a number of computational stages
where the multiplication operations of consecutive stages are not combined. Decomposition
of (18) corresponding to the even-indexed and the odd-indexed input samples aq(n) yields
N_1 N_1
Ap(k) = ~ )P(2n)aq(2n)WNnk + ~ (_ I )P(2n+l)a (2n + l)W(2n+l)k (22)
n=O n=O
Since (--1)P(2n) = 1 and WNnk = W_k~ (22) can be rewritten as
4 --1 4N_1
Ap(k) = ~, aq(2n)WN + ( - 1)PWN ~ aq(2n + 1)WN . (23)
n=O 2 n=~ 2
The output transform Ap(k+ N) which is N samples ahead of the kth sample Ap(k), can be
obtained by replacing k by k + N in (23) . The resulting equation, by noting that WN =--i
N

and WN =--1, can be written as
N - 1 N - 1
Ap(k + 4 ) = ~ (--1)naq(2n)WNk + (--1)P(--j)Wk ~ (--1)naq(2n + 1)WNk,
n=O 2 n=O 2
k= 0,1,.. , 4 --1. (24)

- 20861 74
Using (18), the transform of the N even-indexed input samples aq(n), represented by Ao(k)
and A1(k), respectively, is given by
4N - 1

Ao(k) = ~ aq(2n)W2N
n=O
and 4N_1
Ae (k) = ~ (--1 ) na (2n) wnk
n=O 2
and the transform of the N odd-indexed input samples aq(n), represented by Ao(k) and
Al~(k), respectively, is given by
N - 1

Ao(k) = ~ aq(2n + 1)WN
n=O 2
and N_1

Al(k) = ~ (_l)naq(2n+ l)WNk

n=O
where k = 0,1,,, , N4 _ 1. Now (23) and (24), using the smaller size transforms given above,
can be written as
Ap (k) = Ao (k) + (--l ) pwNAo (k) (25)
and
Ap(k + 4 ) = Al (k) + (--l)P(--j)WNAl~(k) (26)
where k = O, 1, , N _ 1. From (25) and (26), we see that in order to compute the transform
values Ap(k) and Ap(k+ N), the precomputation of Ao(k) and A1(k), and Ao(k) and Al~(k)
is required. Thus, the problem of computing an N--point DFT has been decomposed into a
problem of computing two N -point DFTs. This process of decomposition can be continued.
In general, the relations governing the basic computation at the rth stage can be deduced
from (25) and (26) as
Aor+l) (h) = Aor) (h) + A(r) (I ) wS
Al ) (h) = Aor) (h)--Aor) (I ) wS
Aor+l)(l) = A(r)(h) + A(r)(l)WN+ 4
A(r+l)(l) = A(r)(h)--A(r)(l)WN 4

.

- 20861 74
where s is an integer whose value depends on the stage of computation r and the index h.
These equations characterize the input-output relations of a butterfly, the basic computing
unit, shown in Fig. 1. These equations represent two 2-point DFTs after the input values
indexed with I are multiplied with appropriate twiddle factors. In the general case, each
butterfly of the PM structures compute ~b 2-point transforms. In Fig. 1, two input ports of
the butterfly are marked 100 and 101. Two output ports of the butterfly are marked 102
and 103.
Repeated use of (25) and (26) yields the flowchart of the computational structure for
a specific value of N. The set of butterflies for the last stage is found by allowing k to
vary from 0 to N _ 1 in (25) and (26). The output vectors of the preceding stage are split
into two sets of N-vector DFTs. The set of butterflies for each of the two independent sets
of this stage is found by allowing k to vary from 0 to N _ 1 in equations obtained from
(25) and (26) by replacing N by N2 As seen from (25) and (26) ( which correspond to the
last stage), h and I of the general butterfly differ by a number that ranges from 1 in the
first stage to 2N in the last stage. The process of decomposition stops when the size of the
transforms reduces to two vectors. The flowchart of the 2 x 1 PM DIT DFT computational
structure for N = 32 is shown in Fig. 2. One of the butterflies that makes this structure
is shown in dashed box 106. Reference numeral 104 indicates an input vector with index
zero and reference numeral 105 indicates an output vector with index zero.
The 2 x 1 PM DIT structure for computing IDFT can be obtained by decomposing (21)
in a similar way. The process will lead to relations similar to (25) and (26) with the sign
of the exponents of WN negated and--j replaced by j. Finally, each output data vector
component bq(n) is divided by N.
The 2 x 1 PM DIF computational structure to compute the DFT for complex-valued in-
put data is derived by decomposing (18) corresponding to a number of computational stages
where the multiplication operations of consecutive stages are not combined. Decomposition
of (18) corresponding to the first-half and the second-half of the input samples aq(n) yields
4 --I I~T_1
A (k) = ~ (--l)Pnaq(n)wNk + ~ (--l)Pnaq(n)wN (27)
~=0 n= 4

~- - 2086 1 74

Replacing n by n + N in the second summation of the right side of (27) gives

4N - 1 4N - 1 ( 2 8 )
n=O n=O
The pair of summations in (28) can be combined into a single one, giving
4 --1
Ap(k) = ~ (--l)Pn{aq(n) + (--l)P4 (--j)kaq(n + 4 )}WN (29)
n=O
The even-indexed and odd-indexed transform values Ap(k) are readily obtained from (29)
by replacing k by 2k and k by 2k + 1, respectively, as
N - 1

Ap(2k) = ~, (--l)P {aO(n) + (--l)P 4 (--1) aO(n+ 4 )}WN (30)
n=O
and
Ap(2k + 1) = ~, (--1) {al (n) + (--1)P 4 (--j)(--l)kal (n + 4 )}WNWNk (31)
n=O
where k = 0~ N _ 1. From (30) and (31)~ we see that the input values aq(n) and
nq(n+ N4 ) are combined to reduce the problem size. The problem of computing an N--point
DFT, therefore, has been decom~iosed into a problem of computing two N2--point DFTs.
This process of decomposition can be continued. In general, the relations governing the
basic computation at the rth stage can be deduced from (30) and (31) as

aOr+l)(h) = aOr)(h) + aO )(I)

a(r+l ) (h) = aOr) (h)--aOr) (I )

aO (I) = al (h)WN + al (I)WN
(r+l)(l) = a(r)(h)WN--a( )(I)WN 4

where s is an integer whose value depends on the stage of computation r and the index h.
These equations characterize the input-output relations of a butterfly, the basic computing
unit, shown in Fig. 3. In Fig. 3, two input ports of the butterfly are marked 200 and 201.
Two output ports of the butterfly are marked 202 and 203.
Repeated use of (30) and (31) yields the flowchart of the computational structure for a
specific value of N. The set of butterflies for the first stage is found by allowing k to vary
12

. .

20861 74
from 0 to N _ 1 in (30) and (31). The input vectors of the succeeding stage are split into
two sets of N-vector DFTs. The set of butterflies for each of the two independent sets for
this stage is found by allowing ~ to vary from 0 to N _ 1 in equations obtained from (30)
and (31) by replacing N by N. The process of decomposition stops when the size of the
transforms reduces to two vectors. The flowchart of the 2 x 1 PM DIF DFT computational
structure for N = 32 is shown in Fig. 4. One of the butterflies that makes this structure is
shown in dashed box 206.
The 2 x 1 PM DIF structure for computing IDFT can be obtained by decomposing (21)
in a similar way. The process will lead to relations similar to (30) and (31) with the sign
of the exponents of WN negated and--j replaced by j. Finally, each output data vector
component bq(n) is divided by N.
Fig. 5 shows a possible hardware realization of the butterfly of the 2 x 1 PM DIT
DFT structure. Several of these units can be used, and in the ultimate realization all the
stages can be realized with complete hardware without any software at all. For the sal~e
of discussion, assume that only one butterfly is available in a computing system as part of
an arithmetic unit. The butterfly unit could be made for different precisions and types of
arithmetics. The butterfly essentially consists of four basic arithmetic units: multipliers,
adders, subtractors, and plus-minus units. Reference numerals 110, 111, 112, 113, 114,
115, 116, and 117 indicate real multipliers. Reference numerals 121 and 122 indicate
adders. Reference numerals 120 and 123 denote subtractors. Dashed boxes 140 and 141
point to complex multipliers. Reference numerals 130, 131, 132, and 133 indicate plus-
minus units. The + and - symbols on a subtractor indicates the minuend and subtrahend,
respectively. A plus-minus unit has a + input and a +/- input and two outputs. The +
output is simply the addition of the two inputs, whereas the - output is the difference of
+ and +/- inputs. As the sum and the difference produced by the plus-minus units are
stored in consecutive locations in the memory, both the results need only one destination
address and can be efficiently moved to the memory as a long word. Each butterfly has
two twiddle factors WN and WN+ 9 . Assuming that WN = C + jd, WN+ 4 = d--jc. Note
that the inputs to the butterfly are decomposed into their real and imaginary parts before
they are applied to multiplier or plus-minus units. Similarly, the outputs from the butterfly

2086 1 74
are also in decomposed form. Specifically, A(r) = AR(r) + jAl~(r)~ (i = 0,1). The input
ports of the butterfly for the real and imaginary parts of the first complex input signal are
marked 100rO and 100iO, respectivly. For the second input signal, the ports are maked
100rl and 100il. For the third input signal, the ports are maked 101rO and 101iO. For
the fourth input signal, the ports are maked 101rl and 101il. The output ports of the
butterfly for the real and imaginary parts of the first complex output signal are marked
102rO and 102iO, respectivly. For the second output signal, the ports are maked 102rl
and 102il. For the third output signal, the ports are maked 103rO and 103iO. For the
fourth output signal, the ports are maked 103rl and 103il. The multipliers, the adders
and subtractors, and the plus-minus units of a butterfly can be used as a three-stage pipeline
architecture. For each cycle, one set of four complex-valued outputs are produced from four
complex-valued input signals. Each stage of the pipeline is busy with the next set of data.
The operation of the pipeline needs buffer units between the stages and control circuits that
are not shown in Fig. 5.
Fig. 6 shows the hardware realization of the 2 x 1 PM DIT DFT structure for complex-
valued input data with N = 8. The first part of structure marked 150 implements the
2-point transforms to form the vectors and the swapping of the data. The real part of the
complex input signal indexed zero is marked 151. The real part of the complex output
signal indexed zero is marked 159. The second part of the structure consists of two stages
of butterflies. The first stage has no multiplier units as the twiddle factors for this stage are
all either 1 or--j. Dashed boxes 152 and 153 are the two first stage butterflies. Dashed
boxes 156 and 158 constitute the first butterfly of the second stage. Dashed boxes 154,
155 and 157 constitute the second butterfly of the second stage. The hardware realization
for the 2 x 1 PM DIF structures will contain same number of multipliers, adders, etc., but
the interconnections will be different.
The transform of a set of real-valued input data is Hermitian-symmetric, i.e., the real
part of the transform is even symmetric and the imaginary part is odd symmetric. For an
even N, the transform values X(O) and X(N) are real-valued and distinct and

X(k) = X*(N--k), 1 < k < 2 --1.

14

-- 2086 1 74
Therefore, for a specific value of N, only the first-half of the butterflies of each group of the
computational structure corresponding to complex-valued data are computed during the
computation of any stage. Due to this reason, the memory requirement is reduced by one
half and the computation requirement is slightly less than one half of that of the structures
for complex-valued input data. Therefore, the computational structure is similar to but
only one half in size corresponding to complex-valued input data.
In the subset of the computational structures with ?1 = 2, data samples are organized as
2-element vectors. The storage of the vectors differs from that of the complex-valued data.
For the specific value of ~ = 2, vectors defined in (16) and (17) are modified, respectively,
as
a'(n) = {a(n), a(n + 4 )} =

{{x(n)+x(n+ 2),x(n)--x(n+ 2)},{x(n+ 4)+x(n+ 4 ),x(n+ 4)--x(n+ 4 )}},
N

n = 0,1,.. , 4 --1 (32)

A (O) = {A(O), Ao( 4 )} = {{X(O), X( 2 )}~ X( 4 )},
A'(k) = {A(k)} = {X(k),X(k+ 2 )}'
N

k = 1, 2, . . ., 4 --1. (33)

Fig. 7 shows the flowchart of the 2 x 1 PM DIT DFT computational structure for real-valued
input data with N = 32. Reference numeral 160 indicates an input vector indexed zero.
Reference numeral 170 indicates an output vector indexed zero. Although the structure
has the same number of stages as its complex-valued counterpart given in Fig. 2, there
are some differences that must be noted. The number of butterflies in each stage is one
half. In the first stage, 4-point transforms are computed by merging two 2-point transforms.
As a vector contains two 2-point transforms, the four point transform is computed using
the elements of a single vector. This operation in Fig. 7 is marked as f indicating the
computation of a 4-point transform in the flowchart. A butterfly marked f is shown in
dashed box 161. In the implementation, however, the vector formation, its scrambling, and
the computation of the 4-point transform are all carried out at the same time. Since the

-

- 20861 74
first output vector has three transform values, the first butterflies in each group marked g is
to be interpreted differently. A butterfly marked g is shown in dashed box 162. While the
second vector is computed in a regular manner, the computation of the first vector consists
of computing the first transform value and two others that are separated from it by one-
quarter and one-half of the samples in the group. The other butterflies are similar to the
structures of the complex-valued input data. Butterfly with ports marked 163, 164, 165
and 166 is one such butterfly. As only the first half of each group of butterflies in each stage
corresponding to the structure for complex-valued data are computed, negation, swapping
and conjugation operations must be carried out in order to get the first half of the output
transform vectors. The conjugation operations are shown in Fig. 7 by the symbol * and one
of them is marked 168. These operations are shown in Fig. 7 as post-extended operations
of each stage, but in the implementation they are merged with the main stage and they
do not pose any additional computational complexity. Butterfly with ports marked 163,
164,167 and 169 constitute a butterfly for real data. The hardware implementation of the
PM structures for real-valued input data for computing DFT is similar to PM DIT DFT
structures for complex-valued input data with the differences as explained above.
While the redundancies due to real-valued data can be exploited in the development of
either the DIT or the DIF structures, it is easier to implement the DFT computation using
the DIT structure and implement the TDFT computation using the DIF structure.
In the case of IDFT, for the specific value of ~ = 2, the vector definitions (19) and (20)
are modified, respectively, as
N

B'(O) = {B(O), B( 4 )} =

{{X(0) + X( 2 )~ X(0)--X( 2 )}~ {X( 4 ) + X*( 4 ), X( 4 )--X*( 4 )}},
B'(k) = {B(k)} = {X(k) + X(k + 2 )~ (X(k)--X(k + 2 ))}'
N

k = 1,2,. .. , 4 -1 (34)

b~(n) = {b(n),b(n+ 4 )} = {{x(n), 2(n+ 2 )}~ {~(n+ 4 ), ~(n+ 4 )}},
n=0,1,.. , 4 -1. (35)

16

- 20861 74
Fig. 8 shows the 2 x 1 PM DIF IDFT computational structure for the transform of real-
valued data with N = 32. Reference numeral 260 indicates an input vector indexed zero.
Reference numeral 270 indicates an output vector indexed zero. In computing the IDFT,
the input is always the first half of the vectors but the computation requires first quarter
and third quarter of the input vectors. Therefore, similar to the forward transform, the
conversion of the second quarter data into that of the third quarter data requires the op-
erations of negation, data swapping, and conjugation. In the IDFT these operations are
carried out before the butterfly operations of each stage, i.e., each stage is pre-extended. The
conjugation and data swapping operations are shown in the pre-extended stages but they
are combined with the butterfly operations without posing any addit;onal computational
complexity. Butterfly with ports marked 263,264,265 and 266 is similar to a complex
butterfly. Butterfly with ports marked 263,264,267 and 269 is a butterfly for the trans-
form of real data. Since the first vector has three real values and one imaginary value that
are independent, the first butterfly marked g of each group must be interpreted differently.
While the second vector is computed in the same manner as for the complex-valued data,
the computation of the first vector consists of computing the first inverse transform value
and three others that are separated from it by one-quarter, one-half, and three-quarter of
the samples of each group. A butterfly marked g is shown in dashed box 262. In the last
stage, two 2-point IDFTs are computed in a butterfly as usual. However, all the four input
and output values are stored in a single vector and this operation is marked as f in Fig.
8. A butterfly marked f is shown in dashed box 261. This operation of this stage can
be combined with the operations of scrambling and dividing the transforrn values by N.
The hardware implementation of the PM structures for real-valued input data for comput-
ing IDFT is similar to PM DIF IDFT structures for complex-valued input data with the
differences as explained above. -
In the 2 x 2 PM structures the data values are 2-element vectors and the multiplication
operations of each pair of adjacent stages are combined. To derive the 2 x 2 PM DIT
structure for computing the DFT, consider the input-output relations (25) and (26) for the
last stage of the 2 x 1 PM DIT structure, as given by

Ap(k) = Ao(k) + ( - l)P(Wk Ao(k)) (36)

- 20861 74
and
Ap(k + 4 ) = Ale(k) + (--l)P(--j)(WkAl~(k)) (37)
where k = 0,1, . ., N _1. The output transform values Ap(k+N) and Ap(k+3N) which are
N8 samples ahead of those given by (36) and (37), respectively, can be obtained by replacing
k by k + N in these equations and the resulting equations are given by

Ap(k + 8 ) = Ao(k + 8 ) + (--1)P(WN+ 8 Ao(k + 8 )) (38)
and
Ap(k + 8 ) = Ael(k+ 8 ) + (--l)P(--j)(WN+8 Al~(k + 8 )) (39)
where k = 0,1, . . ., N8 - 1. The variables Aq(i) and Aq~(i) (q = 0,1; i = k, k + N8 ) on the
right side of (36) through (39) represent the output values of the previous stage. These
values, in turn, can be obtained recursively as

Ae(k) = Aoe(k) + (--l)PWNkAo~(k), (40)

Ap(k+ 8 ) = Alee(k) + (--l)P(--j)wNkAeo(k) (41)
A~ (k) = Aoe (k) + (--l )PWNkAo~ (k), (42)
and
Ap(k + 8 ) = Al~e(k) + (--l)P(--j)wNkAoo(k)
where k = 0,1, . . ., N _ 1. Since the terms Aq~(k) and Aq(k + N) Of (36) through (39) have
twiddle factors associated with them, (42) and (43) can be modified accordingly as

WNAp(k) = WkAoe(k3 + (--l)PW3kA~~(k)
and
WN 8 Ap(k + 8 ) = WN 8 Al~e(k) + (--1)P(--j)WN 8 Al (k) (45)
where k = 0,1, . .., N8 - 1. From (44) and (45), we see that the butterfly of the 2 x 2 PM
structure has up to four non-trivial twiddle factors. The form of these twiddle factors and
their interrelationships, in fact, characterizes this structure. The input-output relations of

2086 1 74
a butterfly of the 2x 2 PM DIT structure with four twiddle factors shown in Fig. 9 can be
deduced from (44) and (45) as

A(r+l ) (h) = A(r) (h) WN + A( ) (I ) WN

A(r+l ) (h) = A(r) (h) WN--A( ) (I ) WN
A(r+l) (l) = A(r) (h) WN+ 8 + A(r) (l ) WN 8
A(r+l) (I) = A(r) (h)WN+ 8 _ A(r) (I)WN 8
where s is an integer whose value depends on the stage of computation r and on the index
h.
Fig. 10 shows the flowchart of the 2 x 2 PM DIT structure for N = 32. There are
four stages of computation after the formation of the input vectors. The multiplication
operations of the first and the second stages and those of the third and the fourth stages
are separately combined. A 2x2 PM DIT butterfly is shown in dashed box 175. Butterflies
marked 173 and 174 require no nontrivial multiplications. Butterfly marked 171 requires,
in general, two nontrivial multiplications. Butterfly marked 172 requires, in general, four
nontrivial multiplications. If M is even, there is an odd number of stages after the formation
of the input vectors. In this case, for the purpose of merging of multiplication operations,
stages are paired starting from the last stage. The processing of the left out first stage is
done in the same way as in the 2 x 1 PM DIT structure. The reason of not pairing this
stage with the second stage is that it has only trivial multiplications. As an example, the
flowchart for the 2 x 2 PM DIT structure for N = 16, (i.e., M = 4) is shown in Fig. 11.
The 2 x 2 PM DIT structure, as implemented by (36) through (39), (40), (41), (44), and
(45), saves 25% of complex multiplications when compared with the implementation of the
2 x 1 PM DIT structure. The multiplications in two adjacent stages in the 2 x 2 PM DIT
structure have been combined such that the second stage of the pair has multiplications
only by -j and the first one has less number of multiplications than those of the two stages
together of the 2 x 1 PM DIT structure. In general, multiplication operation of two or more
consecutive stages of a PM structure can be combined together in order to reduce such
operations.

19

.0-,
~ ~!

-

2086 1 74
We will now derive the 2 x 2 PM DIF structure for computing the DFT. Expanding (30)
such that the summation is only over N terms and noting that WN =--j, yields

8~ ( l)Pn{(ao(n) + (--l)P4 (--1) aO(n+ 4 ))
n=O
(_l)P 8 (--j)k(aO(n + --) + (--1)P 4 (--1) aO(n + 8 ))}WN
k = 0~ 4 --1. (46)
Changing k to 2k in (46) gives

Ap(4k) = ~ (--l)P {(aO(n) + (--l)P 4 aO(n + 4 )) +
n=O
(--l)P8 (--l)k(an(n+ 8 ) + (--l)P4 aO(n+ 8 ))}WN,
k=0,1,.. , 8 -1. (47)
Similarly, replacing k by 2k + 1 in (46) yields

Ap(4k + 2) = ~ )Pn{(aO(n)--(_l)P 4 aO(n + 4 )) +
n=O
(_l)P8 (--j)(--l)k(aO(n+ 8 )--(--l)P4 aO(n+ 8 ))}WN WN,
k = 0, 1, . . . ~ 8 --1. (48)
Expanding (31) such that the summation is only over N8 terms yields

8~ ( l)Pn{(al(n) + ( l)P 4 (--j)( 1) a1( 4
n=O
(--1)P 8 WN (--j)k(al (n +--) + (--l)P 4 (--j)(--l)kal (n + 8 ))}WNWN,
k=0,1,.. , 4 -1. (49)
Changing k to 2k in (49) gives

Ap(4k+1) = ~ )Pn{(al(n)+(-l)P 4 (--j)al(n+ 4))+
n=O
(--l)P8 WN (--1) (al (n+ 8 ) + (--l)P4 (--j)al(n+ 8 ))}WNW~k,
k = 0,1, . . ~ 8 --1. (50)

- A~
.~

Similarly, replacing k by 2k + 1 in (49) yields 2 n 8 6 1 7 4

Ap(4k+3) = ~ )P {(al(n)--(--l)P 4 (--j)al(n+ 4 ))+
n=O
(--1)P8 (--j)WN (--1)k(al(n+ 8 )--(--1)P4 (--j)a1(n+ 8 ))}WN WN ~
N

k=0,1,..., 8 -1. (51)

The input-output relations of a butterfly of the 2 x 2 PM DIF structures with four twiddle
factors shown in Fig. 12, can be deduced from (50) and (51) as

(r+l)(h) = aOr)(h)WN + aO (I)WN

(r+l)(h) = aOr)(h)WN--aO (I)WN

a(r+l) (¦) = a(r) (h)W3s + a(r) (I)W3s+ 8
(r+l)(l) = a(r)(h)WN5--a( )(I)WN

where s is an integer whose value depends on the stage of computation r and on the index
h.
Fig. 13 shows the flowchart of the 2 x 2 PM DIF structure for N = 32. A 2 x 2 PM DIF
butterfly is shown in dashed box 275. Butterflies marked 273 and 274 require no nontrivial
multiplications. Butterfly marked 271 requires, in general, two nontrivial multiplications.
Butterfly marked 272 requires, in general, four nontrivial multiplications. If M is even,
there is an odd number of stages after the formation of the input vectors. In this case, in
contrast to the 2 x 2 PM DIT structure, stages are paired starting from the first stage for the
purpose of merging of multiplication operations. The processing of the left out last stage
is done in a similar way as in the 2 x 1 PM DIF structure. The reason for not pairing the
last stage with the preceding one is that it has only trivial multiplications. As an example,
the flowchart for the 2 x 2 PM DIF structure for N = 16, (i.e., M = 4) iS shown in Fig.
14. The 2 x 2 PM DIF structure, as implemented by (47), (48), (50), and (51), saves 25~o
of complex multiplications when compared with the implementation of the 2 x 1 PM DIF
structure. The multiplications in two adjacent stages in the 2 x 2 PM DIF structure have
been combined such that the first stage of the pair has multiplications only by -j and the

. ;~ ~;
~ .

2086 1 74
second one has less number of multiplications than those of the two stages together of the
2 x 1 PM DIF structure.
Fig. 15 shows a possible hardware realization of the smallest basic computing unit of the
2 x 2 PM DIT DFT structure marked 175 in Fig. 10 that repeats itself. It consists of 2 pairs
of butterflies, with butterflies in each pair coming from two consecutive stages. Butterflies
marked 180, 181, 182, and 183, respectively, are detailed diagrams of butterflis 172, 171,
174, and 173 of Fig. 10, respectively, As seen from Fig. 5, a single butterfly realization has
8 multipliers, 2 adders, 2 subtractors, and 4 plus-minus units. Therefore, the 2 x 2 basic
computing unit realization of Fig. 15 has a savings of 25% each for multipliers, adders,
and subtractors. The basic computing unit produces 8 complex outputs in a cycle when
operated as a pipeline. As the 2 x 2 PM structures have nontrivial multiplications in one
of the two consecutive stages, the pipeline of the basic computing unit has only four stages
compared with the 6 stages of an equivalent implementation for the 2 x 1 PM butterfly.
The hardware realization for the 2 x 2 PM DIF structures will contain same number of
multipliers, adders, etc., but the interconnections will be different.
The 2 x 2 PM DIT DFT structure for real-valued data is similar to the 2 x 1 PM DIT
DFT structure with the advantage of further saving of 25% of complex multiplications by
merging the multiplication operations of consecutive stages. Fig. 16 shows the 2 x 2 PM DIT
DFT structure for real-valued data. A butterfly marked g with ports 190, 191, 192, and
193 is slightly different from that of Fig. 7 due to the merging of multiplication operations.
The 2 x 2 PM DIF IDFT structure for the transform of real-valued data is similar to
the 2 x 1 PM DIF IDFT structure with the advantage of further saving of 25% of complex
multiplications by merging the multiplication operations of consecutive stages. Fig. 17
shows the 2 x 2 PM DIF IDFT structure for the transform of real-valued data. A butterfly
marked g with ports 290, 291, 292, and 293 is slightly different from that of Fig. 8 due
to the merging of multiplication operations.
The DFT and the IDFT of 2-D discrete signals is usually obtained by computing 1-D
transforms of each row of data followed by 1-D transforms of each column of the resulting
data or vice versa. With this approach, direct application of 1-D DFT or IDFT computa-
tional structures yields the required transform. Figs. 18,19, 20, and 21 show, respectively

22
c~

2086 1 74
the flowchart of the 2 x 1 DIT, 2 x 1 DIF, 2 x 2 DIT, and 2 x 2 DIF computational structures
for computing 2-D DFT of 8 x 8 complex-valued data. In these figures, the twiddle factors
are of the form W8 and are represented by their exponents. In Figs. 18, 19, 20, and 21, the
2-D input data is read row by row. The column transform is carried out in the first two
stages. The third stage makes the vectors for the row transform carried out in the last two
stages. In Fig. 18, reference numerals 197 and 198 show, respectively, an input and output
vector with index zero. Reference numeral 199 shows a butterfly for the computation of
vectors for the row transforms. It is possible to do the row transform first and column
transform next by inputting the data column by column. Also it is possible to use DIF
structure for one transform and DIT structure for the next.
In this disclosure, the invention of a large set of computational structures, referred
to as the PM computational structures, along with their hardware implementations for
transformation of real-valued or complex-valued one-dimensional or two-dimensional dis-
crete signals from the time-domain to the frequency-domain and vice versa has been re-
ported. These structures have been derived from the design of a large family of radix-2
decimation-in-time and decimation-in-frequency algorithms, referred to as the PM algo-
rithms, to compute the discrete Fourier transform or the inverse discrete Fourier transform.
A member of the set of PM structures is characterized by two parameters, ~ and v where ?l
(~ = 2r, r = 1, 2, . . ., (log2 N)--1) specifies the size of each data vector applied at the two
input nodes of a butterfly and v represents the number of consecutive stages of the structure
whose multiplication operations are merged partially or fully. The formation of the vectors
of the input data allows the computation of u 2-point DFTs or IDFTs by a single butterfly.
Each computational structure essentially consists of two parts. In the first part, ~-element
vectors are formed from the N samples of the given discrete signal. In the second part, each
butterfly of an interconnected network of butterflies operating on 2 ~-element input vectors
produce 2 ~-element output vectors.
The nature of the problem of computing the discrete Fourier transform is such that a
more efficient solution is achieved by the computational structures with ~ = 2. In this
disclosure, a detailed description of the 2 x 1 and 2 x 2 PM computational structures for
real and complex-valued one and two-dimensional discrete signals have been presented. The

2086 1 7~
PM structures described in this disclosure provide the advantages of less complex multi-
plications, less bit-reversals, less array-index updating, and no independent data swapping
compared with the Cooley-Tukey radix-2 structure.
From the general principles of the present invention and the description of the two
preferred embodiments, the 2 x 1 and 2 x 2 PM structures, presented in this disclosure,
those skilled in the art will readily comprehend the various modifications to which the
present invention is susceptible. Examples of some of the modifications are listed below: (i)
Structures with butterflies that handle vectors of more than one size can be derived. (ii)
Structures with varying vector lengths from stage to stage can be realized. (iii) Structures
for any positive integer value of the data length N and for vector length ~ less than N
can be designed. (iv) Structures for m-dimensional signals (m > 2) can be designed. (v)
Higher-radix structures can be designed. (vi) Structures in which multiplication operations
of more than two consecutive stages merged can be derived. For example, when the mul-
tiplication operations of three consecutive stages are merged in the 2 x 3 PM structure an
additional savings of 83~o in real multiplication operations over that provided by the 2 x 2
PM structures is obtained.

24

Fee Type	Anniversary Year	Due Date	Amount Paid	Paid Date
Application Fee			$0.00	1992-12-23
Maintenance Fee - Application - New Act	2	1994-12-23	$50.00	1994-03-11
Maintenance Fee - Application - New Act	3	1995-12-25	$50.00	1995-12-20
Maintenance Fee - Application - New Act	4	1996-12-23	$50.00	1996-12-20
Maintenance Fee - Application - New Act	5	1997-12-23	$75.00	1997-12-23
Final Fee			$150.00	1998-04-09
Maintenance Fee - Patent - New Act	6	1998-12-23	$275.00	1999-02-05

Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
Drawings	1995-06-17	21	1,765
Description	1995-06-17	46	3,687
Drawings	1997-07-09	21	690
Cover Page	1995-06-17	1	64
Abstract	1995-06-17	1	70
Claims	1995-06-17	2	139
Abstract	1997-07-09	1	35
Description	1997-07-09	23	900
Claims	1997-07-09	3	109
Cover Page	1998-08-13	2	101
Representative Drawing	1998-08-13	1	20
Fees	1999-02-05	2	116
Correspondence	1998-02-02	1	2
Prosecution-Amendment	1997-12-23	2	65
Prosecution-Amendment	1998-03-27	1	36
Correspondence	1998-04-09	1	44
Fees	1998-02-18	1	44
Fees	1997-12-23	1	49
Prosecution Correspondence	1994-03-11	1	30
Prosecution Correspondence	1994-12-19	76	2,560
Prosecution Correspondence	1995-05-23	1	38
Office Letter	1994-04-29	1	13
Office Letter	1995-05-05	1	29
Office Letter	1994-04-29	1	19
Office Letter	1995-05-05	1	39
PCT Correspondence	1994-03-11	1	30
PCT Correspondence	1995-05-30	1	37
Fees	1996-12-20	1	60
Fees	1995-12-20	1	57
Fees	1994-10-19	1	44
Fees	1994-03-11	1	48

Language selection

Menus

English Abstract

French Abstract

Administrative Status

Abandonment History

Payment History

Your request is in progress.

Requested information will be available
in a moment.

Thank you for waiting.

Title	Date
Forecasted Issue Date	1998-08-25
(22) Filed	1992-12-23
Examination Requested	1992-12-23
(41) Open to Public Inspection	1994-06-24
(45) Issued	1998-08-25
Deemed Expired	2000-12-27

Current Owners on Record
SUNDARARAJAN, DURAISAMY
AHMAD, M. OMAIR
SWAMY, M. N. SRIKANTA