Patent 2310186 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract is posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent Application: (11) CA 2310186
(54) English Title: METHOD AND SYSTEM FOR DECODING
(54) French Title: METHODE ET SYSTEME DE DECODAGE
Status: Deemed Abandoned and Beyond the Period of Reinstatement - Pending Response to Notice of Disregarded Communication
Bibliographic Data
(51) International Patent Classification (IPC):
  • H03M 13/31 (2006.01)
  • H03M 13/29 (2006.01)
  • H04L 1/22 (2006.01)
(72) Inventors :
  • CASTURA, JEFFREY P. (Canada)
  • KSCHISCHANG, FRANK (Canada)
(73) Owners :
  • SOMA NETWORKS, INC.
(71) Applicants :
  • SOMA NETWORKS, INC. (United States of America)
(74) Agent:
(74) Associate agent:
(45) Issued:
(22) Filed Date: 2000-06-02
(41) Open to Public Inspection: 2001-12-02
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data: None

Abstracts

English Abstract


Reduced complexity decoding algorithms for Low Density Parity Check codes are presented. The performance of these algorithms is optimized using the concept of density evolution and they are shown to perform well in practical decoding situations. The codes are examined from a performance vs. complexity point of view, and it is shown that there is an optimal complexity for practical decoders beyond which performance will suffer. The idea of practical decoding is used to develop the sum-transform-sum algorithm, which is very well suited for a fixed-point hardware implementation. The performance of this decoder approaches that of the sum-product decoder, but it is much less complex.


Claims

Note: Claims are shown in the official language in which they were submitted.


We Claim:
1. A method for decoding comprising the steps of:
i) receiving a set of probability messages in a first format;
ii) updating said set of probability messages based on an operation that compares said set of probability messages with a first-set of code constraints;
iii) transforming said set updated at step ii) into a second format;
iv) updating said set transformed at step iii) based on an operation that compares said set of probability messages with a second-set of code constraints complementary to said first-set;
v) transforming said set updated at step iv) back into said first format;
vi) repeating steps ii)-v) on said set of probability messages transformed at v) until a desired level of decoding is reached; and,
vii) outputting said set of probability messages determined at step vi).
2. The method according to claim 1 wherein said first-set is a set of variable node descriptors and said second-set is a set of check-node descriptors.
3. The method according to claim 1 wherein said desired level of decoding is reached when step vi) is performed a predetermined number of times.
4. The method according to claim 1 wherein said desired level of decoding is reached when step vi) is performed until said set of probability messages matches a known type of valid code.

Description

Note: Descriptions are shown in the official language in which they were submitted.


Method and System for Decoding
Contents

List of Acronyms

1 Introduction
1.1 History
1.2 An Introduction to LDPC Codes
1.3 Generalizations
1.4 Organization of this work

2 LDPC Code Fundamentals
2.1 Construction of LDPC Codes
2.2 Properties of Low Density Parity Check Codes
2.2.1 Goodness properties
2.2.2 Distance Properties
2.2.3 Encoding
2.2.4 Decoding
2.3 Factor Graphs and Message-Passing Algorithms
2.4 Density Evolution
2.4.1 Fixed Points

3 Optimization of LDPC Decoders
3.1 Genetic Algorithms
3.1.1 Optimization Approach
3.2 Optimization of the Sum-Product Algorithm
3.3 Optimizing the Min-Sum Algorithm
3.4 The Modified Min-Sum Algorithm
3.4.1 Comments on Good Degree Sequences
3.5 Finite Effects
3.6 Performance vs. Complexity

4 The Sum-Transform-Sum Algorithm
4.1 Algorithm Description
4.1.1 Decoding
4.2 Optimization and Performance
4.3 Complexity
4.4 Implementation Architecture

5 Conclusions and Future Work
5.1 Suggestions for Future Research

A The Forward-Backward Algorithm
List of Figures

1.1 Factor graph representation of a LDPC code
2.1 Message passing on a factor graph
2.2 Generalized message passing at variable and check nodes
2.3 Tree representation of a section of Figure 1.1
2.4 Fixed points in density evolution
3.1 Sum-product algorithm performance
3.2 Sum-product algorithm performance vs. iterations
3.3 Check node output vs. input
3.4 Optimized min-sum performance
3.5 Optimized modified min-sum performance
3.6 Min-sum performance vs. block length
3.7 Modified min-sum performance vs. block length
3.8 Performance vs. maximum iterations
3.9 Performance at high SNR
3.10 Performance vs. complexity for BER of 10^-3
3.11 Performance vs. complexity for BER of 10^-5
3.12 Performance vs. complexity for BER of 10^-5, 25 iterations
3.13 Performance vs. complexity for BER of 10^-5, n = 100,000
3.14 Performance vs. complexity for BER of 10^-5, n = 1000
4.1 Bit allocation for the SXS algorithm
4.2 SXS performance for regular 3-6 code, length 10,000
4.3 SXS performance for regular 3-6 code, length 1000
4.4 Comparison of SXS and sum-product performance
4.5 Performance of optimized SXS codes
4.6 Performance-complexity comparison of the SXS algorithm
4.7 Block diagram of SXS architecture
A.1 Trellis diagram of check with sites x1, x2 and x3
A.2 FBA α, β computation
List of Tables

2.1 Possible binary message representations for iterative decoding algorithms
3.1 Optimized min-sum degree sequences
3.2 Comparison between sum-product and min-sum algorithm thresholds
3.3 Optimized modified min-sum degree sequences
4.1 Threshold performance for SXS codes
4.2 Complexity of SXS algorithm
List of Acronyms
APP A Posteriori Probability
AWGN Additive White Gaussian Noise
BCJR Forward-Backward Algorithm (B, C, J and R are the initials of its inventors)
BER Bit Error Rate
BSC Binary Symmetric Channel
CDMA Code Division Multiple Access
FBA Forward-Backward Algorithm
FEC Forward Error Correction
GA Genetic Algorithm
LDPC Low Density Parity Check
LLR Log Likelihood Ratio
LLDP Log Likelihood Difference Pair
MAP Maximum A-Posteriori
SNR Signal to Noise Ratio
SXS Sum-Transform-Sum Algorithm
Chapter 1
Introduction
1.1 History
Communication systems of all types have to deal with the presence of noise
over the com-
munications channel, and it is the goal of system designers to mitigate the
effect of noise
in the system as efficiently as possible. To this end, many communications
devices utilize
some sort of forward error correction (FEC) code, which entails the addition
of controlled
redundancy to a transmitted signal in the hope that this structure will allow
errors caused
by the noisy channel to be detected and corrected.
Shannon put bounds on the signal-to-noise ratio (SNR) that is required to communicate reliably over an additive white Gaussian noise (AWGN) channel, and for a certain rate R of transmission, this Shannon limit is [22]

Eb/N0 > (2^(2R) - 1) / (2R),   (1.1)

where R is defined as the ratio of information bits to transmitted symbols.
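A quick numerical check of (1.1), not part of the original text: the short Python sketch below (the function name is ours) evaluates the limit for two rates.

import math

def shannon_limit_db(R: float) -> float:
    """Minimum Eb/N0 in dB implied by (1.1) for rate R on the AWGN channel."""
    return 10 * math.log10((2 ** (2 * R) - 1) / (2 * R))

print(shannon_limit_db(0.5))    # 0.0 dB for a rate-1/2 code
print(shannon_limit_db(8 / 9))  # about 1.4 dB for a rate-8/9 code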
Traditionally,
codes such as Hamming codes, BCH codes, and Reed-Solomon codes with algebraic
decod-
ing algorithms, and convolutional codes with trellis-based decoding algorithms
have been
employed to achieve gains in overall system performance. In recent years
however, there
has been a great deal of interest in iterative decoding schemes. Based on the
idea of using
very powerful and complex codes and utilizing sub-optimal decoding methods,
they perform
much better than any other class of code known. In fact, codes based on
iterative decoding
can perform very well at SNRs that are within a fraction of a decibel of the
Shannon limit.
There are two common types of iteratively decoded codes: low density parity check codes and turbo codes. Low density parity check (LDPC) codes were first introduced by Gallager in his PhD thesis in 1960 [6], and the performance of these codes was better than most other codes known at the time. Unfortunately, it was too difficult to implement the decoding algorithm for practical applications given the technology of the day. Indeed, Gallager was only able to simulate LDPC codes for short block lengths and relatively high probability of bit error on the IBM 7090 computer he had available to him. Thus, due to the relative complexity, and the excitement at the time around convolutional codes, Gallager's LDPC codes remained relatively unmentioned for the next thirty years [28].
Berrou et al. [2] astounded the coding community in 1993 with their discovery of turbo
codes, also known as parallel concatenated convolutional codes. The
performance of these
codes was shown to come to within less than 1dB of the Shannon limit when
using iterative
decoding. This remarkable improvement over the previously best known codes
renewed
interest in the search for other Shannon limit approaching codes and iterative
decoding
schemes.
MacKay was the first to show in 1996 that LDPC codes of long block length are
very good
and that their performance is close to, though not quite as good as, turbo
codes ~14~. Since
then, a great deal of inters>.st has been shown in these codes and a number of
improvements
and advancements have pushed their performance closer to the Shannon limit.
Some of
these advancements include LDPC codes over GF(2"~) (4~, irregular LDPC codes
(12~, and
techniques to analyze the behaviour of different LDPC codes decoding
algorithms (19, 18~.
Recently, results have been presented showing that some constructions of LDPC
codes
actually outperform turbo codes. For example, Spielman gave experimental results for a rate 1/2 irregular LDPC code that achieves a bit error probability of 3 x 10^-5 for an Eb/N0 of 0.7 dB [26]. Richardson and Urbanke have also reported similar results [18]. In fact, they have reported a construction for highly irregular LDPC codes of extremely long block length that achieves an asymptotically good performance to within 0.06 dB of the Shannon limit for the AWGN channel. This is a remarkable result and an important milestone in the development of error control codes.
1.2 An Introduction to LDPC Codes
LDPC codes, despite the long name, are simply binary linear block codes based on a sparse parity check matrix. A code may be defined as an LDPC code if the average weight of rows and columns in the parity check matrix is small. As linear block codes, they have well defined generator and parity check matrices G and H respectively, and as for other binary linear block codes, the parity check matrix is used to decode a noise-corrupted codeword.
As Shannon's Channel Coding Theorem shows, good codes can be found by randomly choosing a codebook and using a very large block length [22]. These very good codes can achieve an arbitrarily low probability of bit error for any rate below capacity. Unfortunately, Shannon did not give any constructive and practical methods for encoding and decoding these codes, and this has been the problem that coding theorists have since attempted to resolve [3].
Gallager developed LDPC codes with the benefits of randomness while maintaining a practical encoding and decoding scheme. His LDPC codes are created by randomly creating a large sparse parity check matrix H with uniform row and column weight, both of which are small to keep H sparse. By keeping the matrix sparse, the complexity of decoding is kept within reasonable limits.
In the realm of graph theory, this corresponds to a random bipartite graph
that has a
constant small number of edges terminating on both variable nodes and check
nodes. This
constraint of uniform variable and check node degree gives rise to what is
known as regular
LDPC codes, which is the original construction Gallager outlined.
It is a simple matter to convert a parity check matrix to a bipartite graph. Each row in H is represented by a parity check node, and each column is represented by a variable node, and these two types of nodes are placed across from each other on the graph. For every element in the binary matrix H that is a 1, an edge is drawn connecting the corresponding check node (row) and variable node (column). Let d_v be the degree of the variable nodes (the weight of the columns of H), d_c the degree of the check nodes (the weight of the rows of H), and n be the length of a codeword (the number of columns in H). A regular LDPC code C can then be described as an {n, d_v, d_c} LDPC code.
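The row/column correspondence just described is easy to exercise in code. The Python sketch below is our own illustration (the small matrix, function name and use of NumPy are assumptions, not taken from this document); it builds the edge list and node degrees of the factor graph from H.

import numpy as np

def factor_graph_edges(H: np.ndarray):
    """Edge list and node degrees of the bipartite (factor) graph of H.

    Rows of H become check nodes, columns become variable nodes, and every
    1-entry H[r, c] becomes an edge (check r, variable c)."""
    checks, variables = np.nonzero(H)
    edges = list(zip(checks.tolist(), variables.tolist()))
    check_degrees = H.sum(axis=1)      # row weights -> check node degrees
    variable_degrees = H.sum(axis=0)   # column weights -> variable node degrees
    return edges, check_degrees, variable_degrees

# a small regular example with n = 6, d_v = 2, d_c = 4 (illustrative only)
H = np.array([[1, 1, 1, 1, 0, 0],
              [1, 0, 0, 1, 1, 1],
              [0, 1, 1, 0, 1, 1]])
edges, dc, dv = factor_graph_edges(H)
print(len(edges), dc, dv)   # 12 edges, row weights [4 4 4], column weights [2 2 2 2 2 2]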
Figure 1.1: Factor graph representation of a LDPC code
It is also possible to create irregular LDPC codes. These codes differ from regular codes in that the weights of the columns and rows in H are not equal for all columns or rows. For example, some columns may have weight 3, while others may have weight 4 or 5. The same holds true for the weights of the rows. Well chosen irregular codes have been shown to perform better than their regular counterparts [14]. The degrees that arise in the graph can be described by the degree sequence polynomials λ(x) and ρ(x) respectively, where

λ(x) = Σ_i λ_i x^(i-1)  and  ρ(x) = Σ_i ρ_i x^(i-1),   (1.2)

in which λ_i (resp. ρ_i) represents the fraction of edges which are incident on a variable (resp. check) node of degree i. It follows that λ(1) = ρ(1) = 1. These polynomials describe an ensemble of codes that may be generated from a given λ(x) and ρ(x); they do not describe a single construction. Note that throughout the remainder of this paper, the above polynomials are used in place of their integer counterparts d_v and d_c when discussing irregular LDPC codes.
Example: For the graph in Figure 1.1,

λ(x) = (1/3)(2x + x^2)  and  ρ(x) = (1/9)(x + 6x^2 + 2x^3).   (1.3)
A lower bound on the rate R of an LDPC code can be determined from the degree sequence polynomials that describe it. Let e be the number of edges in the graph. The number of variable nodes of degree i is eλ_i/i. Therefore the total number of variable nodes n is the summation over all degrees and is thus

n = Σ_i eλ_i/i = e ∫_0^1 λ(x) dx.   (1.4)

Similarly, the total number of check nodes is

r = Σ_i eρ_i/i = e ∫_0^1 ρ(x) dx.   (1.5)

Since the dimension of the code is at least k ≥ n - r, with strict inequality if some of the r check nodes are linearly dependent, we find

R = k/n ≥ 1 - r/n = 1 - (∫_0^1 ρ(x) dx) / (∫_0^1 λ(x) dx).   (1.6)
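As a worked check of (1.6), not from the original text, the Python sketch below evaluates the bound for the degree sequences of (1.3); the result, 1/4, is consistent with the graph of Figure 1.1, which has eight variable nodes and six check nodes.

def rate_lower_bound(lambda_coeffs, rho_coeffs):
    """Lower bound (1.6) on the rate of an LDPC ensemble.

    lambda_coeffs[i] / rho_coeffs[i] hold the fraction of edges incident on
    variable / check nodes of degree i (indices 0 and 1 unused here)."""
    # integral of sum_i c_i x^(i-1) over [0, 1] is sum_i c_i / i
    int_lambda = sum(c / i for i, c in enumerate(lambda_coeffs) if i > 0)
    int_rho = sum(c / i for i, c in enumerate(rho_coeffs) if i > 0)
    return 1.0 - int_rho / int_lambda

# degree sequences from (1.3): lambda(x) = (2x + x^2)/3, rho(x) = (x + 6x^2 + 2x^3)/9
lam = [0, 0, 2 / 3, 1 / 3]
rho = [0, 0, 1 / 9, 6 / 9, 2 / 9]
print(rate_lower_bound(lam, rho))   # 0.25, i.e. k >= 2 for 8 variable and 6 check nodes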
1.3 Generalizations
There are many areas of interest to researchers studying LDPC codes. Since LDPC codes are linear block codes, it is possible to define them over a larger field, GF(q) for q > 2. LDPC codes defined over larger fields tend to perform better than similar codes taken over GF(2). In fact, as q increases, the performance improves. MacKay showed in [4] a regular LDPC code defined over GF(16) with block length on the order of 24k bits that performs better by about 0.6 dB than a comparable code over GF(2). The same trend is seen when dealing with irregular LDPC codes. MacKay's results show an improvement of about 0.5 dB when taking a code over GF(8) instead of GF(2).
Decoding complexity is an issue with larger fields. The decoding complexity for the sum-product algorithm as used by MacKay for his tests scales as q^2, so decoding over GF(8) instead of GF(2) results in a complexity increase by a factor of 16. There is work being done by MacKay and others in an attempt to find simpler methods to decode over larger fields in an optimal or near-optimal manner [4]. If the complexity can be reduced, even in a sub-optimal decoding scheme, the resulting performance improvement due to the larger fields may be able to offset the loss in performance of a sub-optimal decoder for close to the same amount of computational complexity.
One other area of interest for LDPC codes is their use in high rate or short block length applications, and this too has been examined by MacKay and Davey [15]. Their paper shows the performance of a rate 8/9 code of short block length n = 1998 bits to be better than either comparable BCH codes or Reed-Solomon codes using hard-decision decoding by more than 1 dB.
Despite their relatively poor performance with short block lengths, LDPC codes
have
been proposed as an FEC scheme for CDMA applications. Work on short block
length codes
by Sorokine [23] showed a class of irregular LDPC codes of block lengths on the order of 1000
the order of 1000
and less to be particularly effective in fading channels. These codes appear
to be well suited
for carrying voice information on cellular networks, and allow for a greater
user capacity per
cell.
1.4 Organization of this work
The aim of this work is to investigate reduced complexity decoding methods for LDPC codes. Chapter 2 introduces the concepts and principles of LDPC codes, and outlines some of the algorithms that may be used for decoding. Density evolution [19], a technique which allows a decoding algorithm's performance to be determined numerically and aids in an overall understanding of the iterative decoding process, is also described in Chapter 2. Density evolution is applied in Chapter 3 to optimize reduced-complexity LDPC decoders. Trade-offs between performance and complexity are examined and presented. In Chapter 4, an alternate decoding approach is developed that exploits the duality of the LDPC factor graph and is well suited to fixed-point applications. Performance and complexity issues are treated, and possible implementation issues are discussed. Finally, Chapter 5 provides some conclusions and suggestions for further research.
Chapter 2
LDPC Code Fundamentals
The defining feature of LDPC codes is their low weight parity check matrices.
It is important
to understand the properties that this gives LDPC codes, and we will examine
some here.
2.1 Construction of LDPC Codes
There are numerous methods for generating parity-check matrices that satisfy
LDPC con-
straints, and a few commonly used techniques will be outlined here. While
either a structured
or a pseudo-random approach may be used for construction, it has been found
that a higher
degree of randomness generally results in codes that perform better than those
that are
highly structured.
Gallager originally proposed a method of partitioning the parity check matrix H into blocks and performing random permutation operations. Set the first row of H to d_c ones followed by (n - d_c) zeroes [6]. Create the next (n/d_c) - 1 rows (assuming this is an integer) by shifting the previous row by d_c positions. This sub-section of the matrix is copied and randomly permuted a number of times to obtain the desired dimension for H. An example of this construction is given in (2.1). This construction generates regular codes and has a low probability of repeated rows. However, since the sum of the rows of the top block and the rows of any other block is zero, this method results in linearly dependent rows and as such, the rate of the code is higher than might first be assumed.
        [ 1 1 1 1 0 0 0 0 0 0 0 0 ]  \
        [ 0 0 0 0 1 1 1 1 0 0 0 0 ]   |  Top block
        [ 0 0 0 0 0 0 0 0 1 1 1 1 ]  /
        [ 1 0 0 0 1 0 1 0 0 0 0 1 ]  \
H =     [ 0 0 1 1 0 0 0 1 0 1 0 0 ]   |  Permutation of top block               (2.1)
        [ 0 1 0 0 0 1 0 0 1 0 1 0 ]  /
        [ 0 1 0 0 1 1 0 0 0 0 1 0 ]  \
        [ 1 0 1 0 0 0 1 0 0 0 0 1 ]   |  Another permutation of top block
        [ 0 0 0 1 0 0 0 1 1 1 0 0 ]  /
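A sketch of this construction in Python follows. It is our own illustration, not code from the original, and it assumes one common reading of the description above: "randomly permuting" the copied top block is implemented as a random column permutation. The function name and the use of NumPy are likewise assumptions.

import numpy as np

def gallager_regular_H(n: int, d_v: int, d_c: int, seed: int = 0) -> np.ndarray:
    """Sketch of Gallager's construction: an (n, d_v, d_c) regular parity check
    matrix built from a top block of shifted rows plus d_v - 1 random column
    permutations of that block. Assumes d_c divides n."""
    rng = np.random.default_rng(seed)
    rows_per_block = n // d_c
    top = np.zeros((rows_per_block, n), dtype=int)
    for r in range(rows_per_block):
        top[r, r * d_c:(r + 1) * d_c] = 1          # d_c consecutive ones per row
    blocks = [top]
    for _ in range(d_v - 1):
        blocks.append(top[:, rng.permutation(n)])  # random column permutation
    return np.vstack(blocks)

H = gallager_regular_H(n=12, d_v=3, d_c=4)
print(H.shape, H.sum(axis=1), H.sum(axis=0))       # (9, 12), rows of weight 4, columns of weight 3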
An alternate method was presented by MacKay that allows a nearly uniform parity check matrix to be generated [16], with no strict guarantee of uniformity. The construction operates by assigning a fixed number of ones, j, to each column of the parity check matrix. A bit in a particular column is assigned to a row with uniform probability, with no regard to whether that particular location is already occupied by a one. It is possible then to have a column with lower weight than j, though it is not possible to have a column with greater weight, resulting in an irregular code. MacKay has also suggested that any rows which share multiple ones in the same locations should be avoided. This prevents short cycles from appearing in the graph, which have been shown to reduce the performance of LDPC codes.
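The following Python sketch, our own illustration rather than MacKay's code, implements the column-weight assignment just described, together with a simple test for the row overlaps that give rise to the short cycles mentioned above; names and the NumPy dependency are assumptions.

import numpy as np

def mackay_column_weight_H(m: int, n: int, j: int, seed: int = 0) -> np.ndarray:
    """Place j ones in each column at uniformly chosen rows, ignoring collisions,
    so a column may end up with weight less than j but never more."""
    rng = np.random.default_rng(seed)
    H = np.zeros((m, n), dtype=int)
    for col in range(n):
        rows = rng.integers(0, m, size=j)   # duplicate rows collapse to a single 1
        H[rows, col] = 1
    return H

def has_four_cycle(H: np.ndarray) -> bool:
    """True if some pair of rows shares more than one column (a length-4 cycle)."""
    overlap = H @ H.T
    np.fill_diagonal(overlap, 0)
    return bool((overlap > 1).any())

H = mackay_column_weight_H(m=50, n=100, j=3, seed=1)
print(H.sum(axis=0).max(), has_four_cycle(H))   # maximum column weight is at most j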
A more general method of LDPC parity check matrix construction is simply to assign ones randomly with low probability to a matrix of the desired dimensions. This will almost surely result in an irregular code, with average row and column weights that can easily be fixed. For example, the matrix
    [ 1 0 1 0 1 0 0 0 ]
    [ 0 1 0 0 0 1 1 0 ]
H = [ 0 0 0 1 0 1 0 0 ]                                                        (2.2)
    [ 1 0 0 1 0 0 0 1 ]
    [ 0 1 1 0 1 0 0 0 ]
    [ 0 0 1 0 1 0 1 1 ]
was obtained in this way. Care must be taken to ensure that there are no
identical rows,
and that linear dependence among the rows is kept relatively small.
Although the matrix in (2.2) is not likely to be considered sparse, it is a good demonstration of the generation of an irregular LDPC parity check matrix. The polynomials describing the distribution of the degrees of the check and variable nodes are given in (1.3), and the bipartite graph representing this code is shown in Figure 1.1.
MacKay has also outlined a modification to his construction technique that allows the resulting LDPC code to be encoded much more efficiently, based on the work of Spielman [16, 13, 24]. This will be discussed in detail in Section 2.2.3.
2.2 Properties of Low Density Parity Check Codes
It is useful to first outline some terminology. In defining the ability of a
family of codes
to achieve capacity, at least asymptotically, the terms very good, good and
bad are used by
MacKay to describe this attribute and are adopted here [14]. Very good codes
are those
that can asymptotically achieve an arbitrarily low probability of bit error
for any rate below
capacity using some decoding algorithm (e.g., maximum likelihood decoding).
Good codes
are codes that can achieve this same performance, but only up to some rate
which is bounded
away from the channel capacity. A bad code is any code that has some
negligible probability
of bit error only for a vanishingly small rate. It is also useful to define a
practical code as one
which can be implemented effectively with limited computational complexity,
i.e., encoded
and decoded in a useful amount of time, depending on the application.
2.2.1 Goodness properties
In Gallager's original paper on the topic, he proves that LDPC codes are good
with a simple
decoder [6]. This is based on a binary symmetric channel (BSC), and uses a bit-
flipping
iterative decoding algorithm he devised which will be discussed in Section
2.2.4. The LDPC
code is limited to achieving arbitrarily low probability of bit error only for
channel errors
occurring with a frequency less than that which corresponds to capacity of the
BSC.
MacKay shows in [14] that not only are these codes good, but they are very good when certain criteria are met using a sub-optimal typical set decoder. First define a satisfactory matrix as:

A satisfactory (M, λ, t, ε) matrix A for the distribution P(x) is a matrix having M rows and L ≥ λM columns with weight t or less per column, with the following property: if x is generated from P(x), the optimal decoder achieves a probability of bit error less than ε. [14]
MacKay's theorem states that given some distribution P(x) of the probabilities of inputs to the decoder with mean entropy H(X) < 1 bits, and a desired code rate R < 1/H(X), there exists some integer t(H(X), R) > 3 with the following implication: there exists some row dimension of H above which there is a satisfactory (M, λ, t, ε) matrix A that can achieve a block error probability of ε > 0. In this case, t represents the column weight of H, or d_v. Thus as d_v is allowed to grow large, capacity may be achieved. Unfortunately, the complexity of most LDPC decoding algorithms grows as d_v gets large.
2.2.2 Distance Properties
Regular LDPC codes have the strict constraint on their parity check matrix that each row have some fixed weight and each column also have some fixed weight. With just these constraints, Gallager was able to give bounds on the performance of this ensemble of random codes, bound the distance functions, and provide a powerful yet simple decoding technique, which can be generalized to belief propagation.
The LDPC codes Gallager examined exhibit good distance properties for d_v > 2, and
he showed that the minimum distance of these codes grows linearly with block length n for most codes in the ensemble. MacKay showed in [14] that the Gilbert-Varshamov minimum distance bound can be obtained as d_v → ∞. For a parity check matrix H with parameters d_v and d_c as defined above, the Shannon limiting bound on entropy for the codeword x is H(x) < n/m, where m is the size of the information block. It can then be said that for d_v > 3 and a fraction δ < 0.5, some H can be found such that the minimum distance of the code is at least δn.
2.2.3 Encoding
One of the obstacles preventing LDPC codes from being widely adopted in practical applications is the complexity involved in the encoding process, which can be many times greater than for other codes such as convolutional codes or turbo codes. A great deal of attention has been paid to the performance of LDPC codes and efficient optimal decoding methods, but the encoding problem has often been overlooked. Using standard matrix multiplication to generate codewords c from a generator matrix and message word m, i.e. c = mG, encoding complexity increases with the second power of the block length n, where one would hope for a linear relationship between encoding complexity and block length.
Spielman was one of the first to examine how sparse matrix techniques may be used to efficiently encode LDPC codes [24]. His codes, which are a subset of codes called expander codes, are linear time encodable. These codes however do not perform as well as other LDPC codes and are not good codes in the technical sense.
MacKay took the work of Spielman in [25] and developed a nearly lower-triangular reduced matrix form for H that is almost linear time encodable, as mentioned above in Section 2.1 [16, 15]. The structure of H is lower-triangular except for the last M_l rows, which allows all but these last M_l rows to be encoded using sparse matrix techniques, which is a linear time effort. It is not possible to express an LDPC code fully in lower-triangular form as this would violate the LDPC constraints, but it can be made very close. In fact, if the total number of rows in H is M, and if M_l << M, then MacKay has suggested that the total code can be linear time encodable, though he has not expressly proven that this is possible.
2.2.4 Decoding
An LDPC decoder employs iterative decoding, and its goal is to compute the a posteriori probability (APP) of each of the received symbols, taking the symbol with maximum probability as the decision. There are many ways this can be done, one of which is the sum-product algorithm, a practical decoding algorithm. This approach will compute the exact APP for each symbol in a cycle-free graph, and in the case of a graph with cycles, will generally compute close approximations of the APP, though this behaviour is not clearly understood. We will examine a basic instantiation of the MAP decoder to explore its properties.
Gallager devised an algorithm that is similar to sum-product decoding and developed an analysis of iterative decoding for the binary symmetric channel (BSC) using this algorithm. This attempt at a clear analysis of iterative decoding is the first step towards a deeper understanding of this decoding approach, and towards the concept of density evolution, a powerful analytical technique for iterative decoding.
A basic bit-flipping algorithm known as Gallager's decoding Algorithm A [6] is described here, and other algorithms will be examined in Chapter 3. The procedure is a graph based iterative decoding method using message passing as described in Section 2.3, and assumes a BSC. Bits are read off the binary channel and are loaded into the variable nodes of the graph. These bits are the messages which are passed along the edges of the graph and are used for decoding. Messages are successively passed from variable nodes to check nodes, and from check nodes to variable nodes, with new messages being computed at the nodes.
At a check node, parity is computed as follows. For each edge incident on a given check node, the check computes and returns along that edge the modulo-2 sum of all the bits incident on the check, excluding the edge for which parity is being computed. These results are passed back along the edges to the variable bits. At a variable node, the value of the bit sent out along an edge is flipped only if all of the other edges incident on that node return values that are equal to one another and opposite to the current value. So a 1 along an edge will be flipped to a 0 only if all of the other edges incident on that node return the value 0. The procedure then repeats until some termination criterion is satisfied.
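A minimal Python sketch of this bit-flipping procedure is given below. It is our own illustration, not code from the original: messages are held on the nonzero entries of H, and the final hard decision flips a received bit only when every incident check disagrees with it, which is one common variant of the stopping rule. Names and the NumPy dependency are assumptions.

import numpy as np

def gallager_a_decode(H, r, max_iters=50):
    """Sketch of the bit-flipping decoder (Gallager's Algorithm A) described above.
    H is an m x n parity check matrix and r the hard-decision bits from a BSC."""
    m, n = H.shape
    mask = (H == 1)
    deg_v = mask.sum(axis=0)                        # variable node degrees
    v2c = mask * r                                  # variable-to-check messages, start at r
    for _ in range(max_iters):
        # check-to-variable: mod-2 sum of the other bits on each check
        row_par = v2c.sum(axis=1) % 2
        c2v = np.where(mask, (row_par[:, None] + v2c) % 2, 0)
        # variable-to-check: send the complement of r only if *all other*
        # incoming check messages equal that complement
        disagree = mask & (c2v == (1 - r))
        n_dis = disagree.sum(axis=0)
        others_all_disagree = np.where(disagree, n_dis - 1, n_dis) == (deg_v - 1)
        v2c = np.where(mask & others_all_disagree, 1 - r, mask * r)
        # tentative decision (one common variant): flip where every check disagrees
        x_hat = np.where(n_dis == deg_v, 1 - r, r)
        if not ((H @ x_hat) % 2).any():             # all checks satisfied
            break
    return x_hat

# (7,4) Hamming code of (2.6), all-zeros codeword with one bit flipped
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])
r = np.array([1, 0, 0, 0, 0, 0, 0])
print(gallager_a_decode(H, r))   # [0 0 0 0 0 0 0]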
Given that the received bits are 1's and 0's, let p_i be the probability that a bit is in error after the i-th round of decoding. For some bit in error, a parity check involving that bit will be unsatisfied only if an even number of the other bits in the check are in error. An error will be corrected only if all of the checks participating with that bit are unsatisfied, and this occurs with probability

[(1 + (1 - 2p_{i-1})^(d_c - 1)) / 2]^(d_v).   (2.3)

Similarly, it is possible for a correct bit to be mistakenly flipped to an incorrect state, and that probability is

[(1 - (1 - 2p_{i-1})^(d_c - 1)) / 2]^(d_v).   (2.4)

Applying these results, a recursive relation can be found for the next round's overall probability of bit error:

p_i = p_0 - p_0 [(1 + (1 - 2p_{i-1})^(d_c - 1)) / 2]^(d_v) + (1 - p_0) [(1 - (1 - 2p_{i-1})^(d_c - 1)) / 2]^(d_v).   (2.5)

This result is an important step towards an understanding of iterative decoding. Gallager showed that there exists some maximum value p_0* of p_0 for which p_i goes to zero as i gets large for all p_0 less than p_0*. In other words, if the initial probability of error created by the channel is below some threshold, there is a very high probability that the received word will be properly decoded. This shows that even for the simple decoding algorithm that Gallager used, LDPC codes can achieve an arbitrarily low probability of bit error for some rate that is bounded away from capacity.
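The recursion (2.5) is easy to iterate numerically. The Python sketch below, our own illustration, does so for a regular (d_v, d_c) pair and bisects for the largest channel error probability for which the recursion is driven to zero; the iteration counts, tolerances and function names are arbitrary choices, not values from the original.

def error_evolution(p0, dv, dc, iters=200):
    """Iterate the recursion (2.5) and return p_i after `iters` rounds."""
    p = p0
    for _ in range(iters):
        t = (1 - 2 * p) ** (dc - 1)
        p = p0 - p0 * ((1 + t) / 2) ** dv + (1 - p0) * ((1 - t) / 2) ** dv
    return p

def threshold(dv, dc, tol=1e-6):
    """Numerical sketch: bisect for the largest p0 with p_i -> 0 under (2.5)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if error_evolution(mid, dv, dc) < 1e-8 else (lo, mid)
    return lo

print(threshold(3, 6))   # estimated threshold for a regular (3, 6) code under this recursion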
Clearly, there are more powerful ways to decode LDPC codes. Gallager's Algorithm A does not use soft decisions, and assumes a BSC. We shall now explore message passing on graphs to see how more powerful algorithms may be exploited.
2.3 Factor Graphs and Message-Passing Algorithms
Factor graphs are bipartite graphs that describe how a function of many variables is comprised of its constituent factors [10]. Figure 1.1, showing the random LDPC code construction in a graphical form, is an example of a factor graph. In graphical notation, variables are shown as circles, and function nodes are squares, sometimes with an operator in them to
describe the function that is being performed at that node. For example, the
factor graphs
for LDPC codes often have a '+' sign representing the constraint that
neighbouring variables
have zero sum modulo-2.
We can use this representation to provide a framework for algorithm execution,
and in
this case we are interested in investigating decoding algorithms. To assist in
understanding
message passing, an example for the (7, 4) Hamming code will be investigated.
A parity
check matrix for the Hamming code is given by
    [ 0 0 0 1 1 1 1 ]
H = [ 0 1 1 0 0 1 1 ]                                                          (2.6)
    [ 1 0 1 0 1 0 1 ]
As a linear block code, we are interested in finding the a posteriori probabilities (APP) for each received symbol,

f(x_i | y) = Σ_{~x_i} f(x_1, x_2, x_3, x_4, x_5, x_6, x_7 | y),   (2.7)

where the sum is over all configurations of the codeword with x_i held fixed, i is the i-th digit in the code, and the function f(·) represents the conditional joint probability mass of (x_1, ..., x_7) given the received channel output y. The goal is to calculate each of these probabilities based on the constraints of the code. In order to properly see this, the first digit x_1 will be examined, noting that the analysis proceeds similarly for all other bits.
We know from Bayes' rule that the probability of having sent a codeword x upon receiving the word y from the noisy channel is

f(x | y) = f(y | x) f(x) / f(y),   (2.8)

where f(y) can be viewed as a constant since y is known upon reception of the word, and p(x) depends on the code used. If we assume independence of the channel symbols, i.e. a memoryless channel, and assuming that codewords are selected with uniform probability, then we can factor (2.8) into

f(x | y) = K(y) [x ∈ C] Π_i f(y_i | x_i),   (2.9)
where K is a constant factor that depends only on y, and the brackets [·] follow Iverson's convention. The above probability can be further factored into each of the constraints of the code,

f(x | y) = K Π_s [x ∈ C_s] Π_i f(y_i | x_i),   (2.10)

where each C_s corresponds to one parity constraint. For the example of the Hamming code, this factorization can be expressed as

f(x | y) = K [x4 ⊕ x5 ⊕ x6 ⊕ x7 = 0] [x2 ⊕ x3 ⊕ x6 ⊕ x7 = 0] [x1 ⊕ x3 ⊕ x5 ⊕ x7 = 0] Π_i f(y_i | x_i),   (2.11)

where the three Boolean propositions are derived from the three rows of (2.6) representing the constraints of the code. Since we are interested in probabilities for each symbol, we compute marginals for x_i, which for the case of x_1 is

f(x_1 | y) = f(x_1 | y_1) Σ_{x7} f(x7 | y7) Σ_{x5} f(x5 | y5) Σ_{x3} f(x3 | y3) [x1 ⊕ x3 ⊕ x5 ⊕ x7 = 0]
             × Σ_{x6} f(x6 | y6) Σ_{x2} f(x2 | y2) [x2 ⊕ x3 ⊕ x6 ⊕ x7 = 0] Σ_{x4} f(x4 | y4) [x4 ⊕ x5 ⊕ x6 ⊕ x7 = 0].   (2.12)

We see then that the problem of computing the marginals has been factored into many smaller problems, and this is naturally expressed through the use of factor graphs and a message passing algorithm that passes these marginals as messages. Unfortunately this factor graph has cycles, so the marginals computed are not exact. This example is used however, since virtually all codes have factor graph representations with cycles.
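For a code this small, the marginals of (2.12) can be checked by brute force. The Python sketch below is our own illustration, not part of the original: it sums (2.11) over all 2^7 words for a BSC with crossover probability 0.1 and an assumed received word, with the valid-codeword test playing the role of the Iverson brackets.

import itertools
import numpy as np

def brute_force_app(H, f, i):
    """Brute-force marginal f(x_i = 1 | y): sum (2.11) over all codewords.

    f is an n x 2 array with f[k, b] = f(y_k | x_k = b)."""
    n = H.shape[1]
    num = den = 0.0
    for x in itertools.product((0, 1), repeat=n):
        x = np.array(x)
        if ((H @ x) % 2).any():          # Iverson bracket: a check is violated
            continue
        w = np.prod(f[np.arange(n), x])  # product of per-symbol likelihoods
        den += w
        if x[i] == 1:
            num += w
    return num / den

# (7,4) Hamming code from (2.6), BSC with crossover 0.1, assumed received word r
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])
r = np.array([1, 0, 0, 0, 0, 0, 0])
eps = 0.1
f = np.where(np.arange(2) == r[:, None], 1 - eps, eps)   # f[k, b] = P(y_k | x_k = b)
print(brute_force_app(H, f, i=0))   # about 0.28: the all-zeros codeword still dominates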
The goal of MAP decoding is to find the maximum a posteriori symbols of a noise corrupted codeword. In general, this can be factored into the problem of finding the most likely symbol x_i given that the symbol y_i has been received and that the transmitted word x is subject to the constraints of the code. The sum-product algorithm applied to the factor graph for the code of interest calculates the most likely transmitted symbols in this manner.
Message passing on a factor graph describes how the calculated probabilities
on the graph
will behave over time given a certain set of initial conditions. There are
numerous algorithms
(Footnote: Iverson's convention takes a Boolean proposition and returns a '1' if that proposition is true, otherwise a '0' is returned. For example, given a proposition P, [P] = 1 if P is true and [P] = 0 otherwise.)

Figure 2.1: (a) Factor graph representation of a (7,4) Hamming code. (b) Initial message passing to a check node. (c) Return message from a check node. (d) Return message from a variable node.
that can be used to perform message passing, but in particular the sum-product
algorithm
will be described as it can be used to decode LDPC codes quite powerfully in
many cases.
The BCJR algorithm is actually a specific implementation of the sum-product
algorithm,
and belief propagation and turbo-decoding also turn out to be variations of
the sum-product
algorithm. These types of message passing are iterative decoding algorithms.
The message passing algorithm commences by initializing each of the variable
nodes
with the corresponding received symbol probability. Some message passing
schedule is used
which sequentially passes all the information from each node to each adjacent
node. There
are many schedules which can be used, and it is up to the decoder designer to
determine a
suitable one. For instance, a sequential schedule in which each message is
updated in turn
such that no message is updated twice before any other message is updated may
be used.
This schedule is used in all trials performed in this work. Varying the message schedule may have some impact on the performance of the decoder, but the schedule chosen was found to result in good performance.
Figure 2.2: Generalized message passing at variable and check nodes

Messages are represented by the symbol μ, and represent some measure of the reliability of the information being passed. The rule determining the update of the information at each node is:
Sum-Product Update Rule [10]: The message sent from a node v on an edge e is the product of the local function at v (or the unit function if v is a variable node) with all messages received at v on edges other than e, summarized for the variable associated with e.
The summarization operator refers to the computation of a marginal, meaning that the value is summed over all possible configurations of all variables excluding the one which is of interest. Figure 2.2 provides more detail. Within each node, it is assumed that calculations are performed using an efficient method, so in all cases the forward-backward algorithm is used [11], as described in Appendix A.
There are many probability metrics that may be used for messages, and a few are outlined in Table 2.1. Depending on what domains are chosen for representing messages, the form that the sum-product update rules take may vary drastically. Two representations of particular interest are the log likelihood ratio (LLR) and the log likelihood difference pair (LLDP). Using log likelihood ratios, we can write the variable node update rule as

μ_{x→f} = μ_ch + Σ_{h ∈ n(x)\f} μ_{h→x},   (2.13)

where x is the variable node of interest and f is the check node attached to x along the edge on which we are sending the output. The value μ_ch is the message from the channel corresponding to the variable node x being updated, and n(x) is the set of edges connected to x.
Name                                    Representation (μ)                        Domain (μ ∈ E)
Basic                                   p[0], p[1] = 1 - p[0]                     [0, 1]
Likelihood Ratio                        p[0] / p[1]                               [0, ∞)
Likelihood Difference                   p[0] - p[1]                               [-1, 1]
Log-Likelihood Ratio (LLR)              log(p[0] / p[1])                          (-∞, ∞)
Log-Likelihood Difference Pair (LLDP)   (sgn(p[0] - p[1]), -log|p[0] - p[1]|)     ({±1}, (0, ∞))

Table 2.1: Possible binary message representations for iterative decoding algorithms
The check node update equation using log-likelihood ratios is

μ_{f→x} = log [ ( Π_{y∈n(f)\x} (e^{μ_{y→f}} + 1) + Π_{y∈n(f)\x} (e^{μ_{y→f}} - 1) )
              / ( Π_{y∈n(f)\x} (e^{μ_{y→f}} + 1) - Π_{y∈n(f)\x} (e^{μ_{y→f}} - 1) ) ],   (2.14)

where d_c is the number of edges connected to f, the check node being updated. This rule is actually quite complicated, and is one of the drawbacks of the sum-product algorithm with this representation. It is interesting to note that the update rule for the check nodes using the log-likelihood difference pair is very similar to (2.13), and may be written as

μ_{f→x}(s, r) = Σ_{h ∈ n(f)\x} μ_{h→f}(s, r),   (2.15)

where s ∈ W_2 = {±1}, r ∈ R+, and the addition of the signs in {±1} is real multiplication. Later, we will exploit this representation in an attempt to reduce the decoding complexity.
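The following Python sketch is our own illustration of single-node versions of (2.13) and (2.14), together with the additive LLDP form (2.15); it also checks numerically that the two check-node forms agree after converting LLR magnitudes with the change of variable ln coth(y/2) used later in Section 2.4. Scheduling, clipping and the channel model are omitted, and the names are assumptions.

import numpy as np

def variable_update(mu_ch, incoming):
    """Variable-to-check LLR message (2.13): channel LLR plus the other incoming LLRs."""
    return mu_ch + sum(incoming)

def check_update(incoming):
    """Check-to-variable LLR message (2.14), using the products of (e^mu +/- 1);
    fine for moderate LLRs, where the exponentials do not overflow."""
    plus = np.prod([np.exp(m) + 1 for m in incoming])
    minus = np.prod([np.exp(m) - 1 for m in incoming])
    return np.log((plus + minus) / (plus - minus))

def check_update_lldp(incoming):
    """(2.15): incoming is a list of (s, r) pairs; signs multiply, magnitudes add."""
    sign = np.prod([s for s, _ in incoming])
    mag = sum(r for _, r in incoming)
    return sign, mag

gamma = lambda y: np.log(1 / np.tanh(y / 2))   # LLR magnitude -> LLDP magnitude
llrs = [1.2, -0.7, 2.5]
s, r = check_update_lldp([(np.sign(m), gamma(abs(m))) for m in llrs])
print(check_update(llrs), s * gamma(r))        # the two forms agree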
Thus, returning to the example of the (7, 4) Hamming code, we can follow the algorithm through. As mentioned previously, the seven variable nodes are initialized to the corresponding probabilities representing the likelihood of the received bits being either a 0 or 1. Each of the seven variable nodes sends its likelihood message along each edge to the corresponding check nodes. The message that is returned to each of the variables is the result of the check node function applied to all message inputs excluding the input message along the edge on which the output message is passed. Figure 2.1(c) depicts the return message to x_4 based on the inputs of x_5, x_6, and x_7 to the parity check node. Each variable node is updated
following the same rule, such that the output along an edge is dependent on
all of the inputs
to the node excluding the one along the output edge.
At this stage, one full iteration through the graph has been completed and all
of the
variable nodes have an updated estimate as to their probability mass function.
This itera-
tive procedure continues until some termination criterion is met. This could
possibly be a
convergence measure on the variable node output messages, a threshold on the
maximum
number of iterations, or a check node 'satisfaction'. Once the algorithm is
complete, a de-
cision is made based on each bit's probability mass whether it is more likely
a 0 or a 1 and
this hard decision is output as the decoded word. It is also possible to
output soft decisions,
meaning that the reliability measure is also output along with the decision.
This approach provides a relatively straightforward method to decode LDPC codes. One problem that has not been mentioned is the case of cycles in graphs, which are unavoidable in LDPC codes and most other codes of finite length. For the cycle free case, the sum-product algorithm exactly computes the desired factors, but this is not the case for graphs with cycles. Despite this drawback, the sum-product algorithm still works extremely well on graphs with cycles, as the APP estimates are 'good enough'. Many different simulations have shown that for codes without small cycles, the sum-product algorithm works very well.
2.4 Density Evolution
Since the rediscovery of LDPC codes in the mid 1990's, a great deal of work
has been
done to further develop the performance and understanding of LDPC codes and
iterative
decoding schemes in general. A first extension of Gallager's analysis for
decoding outlined
in Section 2.2.4 was presented by Luby, Spielman, et al., who used the same
bit-flipping
algorithm applied to irregular LDPC codes [12].
In late 1998, Richardson and Urbanke applied the bit-flipping algorithm to a number of channels including the Gaussian channel, and found the maximum allowable parameters for these channels, verifying Gallager's results and presenting new ones [19].
They were also able
to successfully analyze the capacity of LDPC codes using soft-message-passing
algorithms,
Figure 2.3: Tree representation of a section of Figure 1.1
one of which is the sum-product algorithm described in Section 2.3. Their
approach will be
detailed here, though expanded to include irregular codes.
First it is useful to describe how a factor graph can be unrolled. Message passing algo-
rithms work on graphs and it is insightful to be able to view how the messages
move through
the graph as a function of time. Time in this case is assumed to be some
discrete counter
that advances at a rate that is a function of the message passing algorithm.
For example
one iteration of a decoding algorithm may be considered to be one time step.
It is possible
to move backward in time from a certain node n in the graph and recursively
view all of the
nodes and edges which have contributed to the message being output from n.
If we draw the node n and all of the connected edges and nodes which participate with node n back t time steps, we have an unrolled tree as in Figure 2.3, which shows an unrolled graph of depth 2. The resulting graph is a tree, or is tree-like, if and only if all of the nodes in the tree are distinct. This is not the case in Figure 2.3, as the variable x5 is repeated, but it serves its purpose as an example. A message passing algorithm used on the graph sends messages up the tree, and we are able to understand how the distribution of messages at each level l in the tree affects the message output at depth l + 1. Thus we see that this represents the time evolution of messages on the graph. We would like to find out how the distribution of the messages evolves as the messages proceed up the tree, in order to understand how iterative decoding methods work. First, some assumptions need to be made about the system.
The all-zeros codeword is taken to be the transmitted codeword. If symmetry conditions of the channel and of both variable and check nodes are satisfied, then the probability of a decoding error is independent of the transmitted codeword, validating the use of this word; this is proved in [19]. The all-zeros codeword also has the advantage of having a simple distribution of bits, namely a delta function, and this will assist in the simplicity of the analysis.
The symmetry conditions that are needed for the use of the all-zeros word are symmetry in the channel, and symmetry of both the check and variable node update functions. Symmetry in the channel implies zero-mean noise for the AWGN channel, or an equal likelihood of flipping a bit on the BSC. Symmetry at the nodes means that given a set of input message reliabilities, i.e. message magnitudes, to the function, the output reliability should be independent of the signs of the inputs. These conditions prevent the possibility of the distributions erroneously drifting.
We will assume that the channel is memoryless and all received symbols are i.i.d. Each of these symbols is converted to a probability metric and is used to initialize the 0th level of the tree. The distribution of these probabilities can be denoted as P_0. Working under the assumption that the code is tree-like to a depth l, where l is the decoding round, we can develop an exact analysis of the densities at the l-th level of the tree.
Let the messages sent from variable nodes to check nodes be represented by LLRs, and let the messages sent from check nodes to variable nodes be represented by LLDPs. Let P_l be the probability distribution (density) of messages sent from the set of all variable nodes, and let R_l be the density of messages passed from the set of all check nodes, both at round l in the decoding process (or depth l on the tree). Also, define

λ(f) = Σ_i λ_i f^⊗(i-1)  and  ρ(f) = Σ_i ρ_i f^⊗(i-1),   (2.16)

where f is an input distribution, and the notation f^⊗x represents a repeated convolution of the distribution f, x times. Both λ_i and ρ_i represent the same degree sequence statistics as in (1.2). Using the LLR representation as messages, we have an expression for the output of a variable node given in (2.13). The density of these messages over all variable nodes is the convolution of all the participating input densities. This can be written as

P_l = P_0 ⊗ λ(R_{l-1}),   (2.17)
where ⊗ denotes convolution. Similarly, representing the messages being passed to a check node as an LLDP, (2.15) can be used to determine the output message. Again, the density of these messages is the convolution of all of the participating input densities, and can be written

R_{l-1} = ρ(P_{l-1}).   (2.18)

In order to develop a relationship between the densities from one round of decoding to the next, we require a transformation function which can convert densities from the LLR domain to the LLDP domain and back; these transformations are derived in [18] and given here. To change a probability distribution from either representation to the other, one may apply the following change of variable,

y → ln coth(y/2),  y > 0,   (2.19)

which, for probability distributions, may be written as the functional

γ[f](y) = f(ln coth(y/2)) / sinh(y),  y > 0.   (2.20)

This operator maps distributions over the positive reals, f(y), to distributions over the positive reals:

γ : R^(R+) → R^(R+).   (2.21)

Note that

ln coth( (ln coth(y/2)) / 2 ) = y,   (2.22)

so it follows that γ[γ[f]](y) = f(y). Thus we have a method of converting positive distributions of probabilities from one domain to the other.

For some density g, let g = g+ + g-, where g+ and g- represent the positive and negative portions of the density respectively, defined by

g-(x) = g(x), x < 0,  and  g+(x) = g(x), x > 0.   (2.23)

Also define a reflection in the y-axis operator, R, defined as R[g-](x) = g-(-x). Finally, define the change of variable operator that takes as its input an LLR distribution over R and returns an LLDP distribution over W_2 x R+,

Γ : R^(R) → R^(W_2 x R+),   (2.24)

where W_2 is the set {±1}. This operator can then be defined as

Γ[f](s, r) = (+1, γ[f+](r)) + (-1, γ[R[f-]](r)),   (2.25)

and its inverse, which takes an LLDP input distribution and returns the equivalent LLR distribution, is defined as

Γ^(-1)[f](x) = γ[f_{+1}](x) + R[γ[f_{-1}]](x).   (2.26)
Given the previous equations and operators, we can combine them to produce the heart of Richardson and Urbanke's work, their theorem of density evolution for continuous message passing algorithms. If the distribution of messages from check nodes is initialized to erasures, R_0 = δ(0), i.e. p[0] = p[1] for all messages, then we can write the theorem of density evolution as:

Density Evolution [18]: Let P_0 denote the initial message distribution, under the assumption that the all-zeros codeword was transmitted, of an LDPC code specified by edge degree sequences λ(x) and ρ(x). If R_l is the distribution of messages passed from check nodes to message nodes at round l of sum-product decoding, then

R_l = Γ^(-1)[ρ(Γ(P_0 ⊗ λ(R_{l-1})))].   (2.27)
This powerful result provides a method to find good LDPC codes that perform very well without the need to simulate codes on a trial and error basis. Degree sequence polynomials ρ(x) and λ(x) can be found which maximize a particular channel parameter, such as the threshold SNR, thus providing near-optimal performance. Since the all-zeros codeword is sent, we know that the algorithm has converged to the correct codeword when the density approaches a delta function where p[0] = 1, which is at positive infinity for the LLR representation.
It should be noted that we cannot actually converge to a delta function at infinity, but it can be approached. Thus, the resulting LDPC codes used with the optimized degree sequences may not asymptotically perform as well as expected, since the probability of bit error is not zero. It is vanishingly small however, and empirical evidence supports the validity of asymptotic performance.
Hence, we can try to find a degree sequence, using some directed search, that maximizes the channel parameter P_0 for which the density evolution converges to the all-zeros codeword distribution, and this maximum is labelled P_0*. This threshold is the supremum of all channel parameters for which the probability of error approaches zero for some number of iterations on the graph. For example, on the AWGN channel the noise variance can be maximized such that the BER of the decoder is vanishingly small.
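As a rough, self-contained illustration of this kind of threshold search, the Python sketch below approximates density evolution for a regular (3, 6) code on the binary-input AWGN channel by Monte Carlo sampling of messages, relying on the same tree-like independence assumption; it is our own sketch, not the density-convolution computation of (2.27), and the sample sizes, iteration counts and names are arbitrary choices.

import numpy as np

def sampled_density_evolution(sigma, dv=3, dc=6, iters=100, n_samples=100_000, seed=0):
    """Monte Carlo sketch of sum-product density evolution, all-zeros assumption.

    Returns the fraction of variable-to-check LLRs still in error (negative)
    after `iters` rounds; the exact approach convolves the densities instead."""
    rng = np.random.default_rng(seed)
    # channel LLRs for the all-zeros (all +1) word: mean 2/sigma^2, std 2/sigma
    ch = rng.normal(2 / sigma**2, 2 / sigma, size=n_samples)
    v2c = ch.copy()
    for _ in range(iters):
        # check update (tanh rule) on dc - 1 independently resampled messages
        inc = rng.choice(v2c, size=(n_samples, dc - 1))
        c2v = 2 * np.arctanh(np.clip(np.prod(np.tanh(inc / 2), axis=1), -0.999999, 0.999999))
        # variable update on dv - 1 incoming check messages plus the channel LLR
        inc = rng.choice(c2v, size=(n_samples, dv - 1))
        v2c = ch + inc.sum(axis=1)
    return float((v2c < 0).mean())

# crude scan around the regular (3, 6) sum-product threshold, near sigma ~ 0.88
for sigma in (0.80, 0.85, 0.90, 0.95):
    print(sigma, sampled_density_evolution(sigma))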
One beneficial feature that stems from this form of analysis is that an additional constraint on the number of iterations used to decode can be enforced. Since level l of the tree corresponds to the l-th decoding iteration, we can attempt to find degree sequence polynomials for which P_0* is maximized within a constrained number of iterations. This is very useful when searching for good codes that are also practical.
Using the density evolution methods described, Richardson and Urbanke were able to develop what they term highly irregular LDPC codes that asymptotically perform to within 0.06 dB of the Shannon limit [18]. This is a significant improvement over the previously best known performing code. In the subsequent chapters, we make use of this approach to optimize LDPC codes for reduced complexity decoding algorithms.
2.4.1 Fixed Points
It is useful to examine how the distribution of messages passed during the decoding process evolves over time. To this end we define a statistic P_E^l which represents the proportion of messages being passed at round l that, if a hard decision were made as to their sign, would be in error.
Using this statistic, the time evolution of errors through the message passing algorithm may be plotted, and two examples are given in Figure 2.4. The curve to the right represents how the distribution of messages approaches the correct codeword (P_E^l = 0) as the decoding round gets very large, in this case about 1600. The inner curve gives the same evolution for density evolution constrained to a maximum of fewer than 200 decoding iterations. Note that
Figure 2.4: Time evolution of hard-decision errors in message passing. The outer curve represents the evolution of a code optimized with no iteration constraints; the inner curve corresponds to a code optimized with a constraint to a maximum of 200 iterations.
the curves are monotonically decreasing but their slope fluctuates. The regions where the slope approaches 0 are called fixed points, and are locally near-stable regions of the graph. At these fixed points, the incremental improvement in P_E is very small, such that for realizable codes, i.e., codes of finite length with cycles, it is possible that the evolution will become 'stuck' at these points.
In fact, this is the most common cause of decoder failure. Recall that density evolution
allows us to find highly optimized LDPC codes, and these codes are very sensitive to any
deviations in their structure. Codes of shorter block length tend to deviate further
from the optimal degree sequences due to finite-length effects. These deviations make it
possible for the slope at a fixed point to actually reach 0, depending on the received
probabilities, such that the code will not improve any further and a decoding error
results. Techniques to mitigate this effect are discussed in [18].

Chapter 3
Optimization of LDPC Decoders
In Chapter 2, we developed the concepts of density evolution and presented a
relationship describing the evolution of densities over time through the unrolled tree.
We will expand on these ideas in this chapter and develop a similar approach to
analyzing decoding algorithms that have a reduced complexity compared to the sum-product
algorithm.
The ability to find the threshold performance for an LDPC code leads to the problem of
how one can find good degree sequence polynomials that maximize these thresholds. This
is a non-linear optimization problem with a huge parameter space. We can utilize a
powerful, global optimization technique called genetic algorithms to provide the means
to solve this problem.
3.1 Genetic Algorithms
A genetic algorithm (GA) is a global optimization technique which takes the
concepts of
biological evolution and Darwinian survival and applies them to finding
optimal solutions
to some generally hard problem. The basic operation of a GA is conceptually
simple: One
creates a population of solutions to some problem, selects and combines the
better solutions
in some manner, and uses their recombination to replace poorer solutions. The
combination
of selection pressure and innovation generally leads to improved solutions,
often approaching
the globally optimal solution.
The problem to be solved must be expressed through artificial chromosomes, each of which
represents one possible solution to the problem. A chromosome may be a string of bits, a
list of parameters, or a more complex representation, such as a tree of procedures, but
it must represent some embodiment of a solution to the problem. For example, suppose the
problem is to find the point furthest away from the origin that is within the bounds of
some closed region of a 2-dimensional space. One simple method to encode the problem is
to use a chromosome that contains two real numbers (x, y) representing a location on the
2-D surface.
The GA requires a method to evaluate how good or bad a particular solution, or
chromosome, is. This is usually some function that takes the chromosome as its input and
outputs a fitness measure, and the fitness is what the GA uses to score the potential
solution. Using our previous example, the GA's fitness function may take the chromosome,
which is an (x, y) coordinate, measure its Euclidean distance from the origin, and
determine whether or not it is within the bounds of the closed region in question. If it
is, it may return a fitness equal to the distance from the origin; otherwise it may
return some non-valid measure, such as -1.
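As an illustration of this encoding and evaluation, a minimal Python sketch follows; the
circular region and its radius are assumptions made only for the example and are not
part of the original description.

import math

REGION_RADIUS = 3.0  # assumed closed region: a disc of radius 3 centred at the origin

def in_region(x, y):
    """Membership test for the (assumed) closed region."""
    return x * x + y * y <= REGION_RADIUS ** 2

def fitness(chromosome):
    """Score an (x, y) chromosome: its distance from the origin if it lies
    inside the region, otherwise the non-valid measure -1."""
    x, y = chromosome
    if not in_region(x, y):
        return -1.0
    return math.hypot(x, y)

print(fitness((1.0, 2.0)))   # valid point: fitness equals its distance from the origin
print(fitness((4.0, 4.0)))   # outside the region: fitness is -1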
Having encoded the problem onto a chromosome, and having an evaluation procedure, the GA
evolves solutions to the problem by creating a population of chromosomes. A population
is generally randomly created, and for each individual in the population its fitness is
determined. The GA then selects those individuals with the highest fitness and performs
some genetic operations on them to create a new generation of individuals for the
population, which replace the individuals with generally the worst fitness. The effect,
hopefully, is an overall improvement in fitness.
Genetic operators include selection, recombination, and mutation, as well as other less
common ones. Selection is the basic idea that a GA will prefer better solutions to worse
solutions; this is Darwin's 'survival of the fittest' in action. There are many ways
selection can be accomplished, from choosing the individuals with the highest fitness to
random chance. Recombination combines various parts of the chromosomes from two or more
parents and produces a new child chromosome that contains traits from its parents, just
as in biological systems. Mutation makes some sort of alteration to a chromosome,
effecting some change in the trait of that chromosome, such that its fitness may be
altered.
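A minimal sketch of these three operators on list-of-floats chromosomes follows; the
single-point crossover and Gaussian mutation shown are common illustrative choices and
are assumptions, not necessarily the exact operators used in this work.

import random

def select(population, fitnesses, k=2):
    """Selection: keep the k individuals with the highest fitness."""
    ranked = sorted(zip(fitnesses, population), key=lambda p: p[0], reverse=True)
    return [chrom for _, chrom in ranked[:k]]

def crossover(parent_a, parent_b):
    """Recombination: splice the two parents at a random point."""
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rate=0.1, scale=0.05):
    """Mutation: perturb each gene with probability `rate`."""
    return [g + random.gauss(0.0, scale) if random.random() < rate else g
            for g in chromosome]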
Using these operators, the GA repeatedly replaces members of the population, and over a
number of generations the average fitness of the population will hopefully improve as
worse individuals are replaced. Usually, the population will converge to some
steady-state distribution, meaning an optimum has been found. It may be a local optimum
rather than a global one, and it is often difficult to know which it is. GA
practitioners make sure that the initial population is large enough and random enough
that the entire search space is adequately covered, and they control the genetic
operators in such a way as to keep the rate of evolution from converging too quickly on
a local optimum before a better optimum has been found.
For the work in this paper, the GA package GAlib was used [27]. This package is a
library of C++ classes that allows the user to easily set up chromosomes and fitness
functions and to define genetic operators. It is available from MIT at the URL
http://lancet.mit.edu/ga/, which also contains a full description of how the package
works.
3.1.1 Optimization Approach
The potential search space for this problem is enormous, being all possible
degree sequences
defining an LDPC code, and it is useful to constrain it in order to reduce the
overall com-
plexity. To this end, we constrain the rate of the code to be a fixed value,
and for all of
the cases presented, this value was R = '-'-z . We also constrain the maximum
degree that any
check or variable node can take. The goal of the optimization is to maximize
the threshold
performance of an LDPC code through the use of density evolution. As we see in
(18~, the
threshold performance increases as the maximal degrees increase, so we should
be aware of
the impact on threshold performance by limiting the maximal degrees for ~(x)
and p(x).
Let the maximal degree allowed for λ(x) be d_v, and for ρ(x) be d_c. Define the
chromosome for the genetic algorithm to be a list of real numbers q_i ∈ (0, 1] such that

    chromosome = [ q_2, q_3, ..., q_{d_v}, q'_2, q'_3, ..., q'_{d_c-1} ]        (3.1)

where q_i and q'_i correspond to the variable and check coefficients of degree i
respectively. In order for the polynomials λ(x) and ρ(x) to meet their coefficient
constraints that Σ_i λ_i = 1 and Σ_i ρ_i = 1, we normalize both sections of the
chromosome in the fitness function. Note that we do not include an element in the
chromosome for the check node of highest degree.
This allows the rate constraint to be met: the check-node coefficients are normalized so
that the desired rate of the resulting code is achieved.
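One way this normalization could be carried out is sketched below, under the standard
assumption that the rate of an ensemble satisfies R = 1 − (∫ρ)/(∫λ) with
∫λ = Σ_i λ_i/i and ∫ρ = Σ_i ρ_i/i; the function name and the treatment of infeasible
chromosomes are illustrative assumptions.

def to_degree_sequences(chromosome, d_v, d_c, rate=0.5):
    """Convert a chromosome [q_2..q_dv, q'_2..q'_(dc-1)] into normalized
    degree-sequence coefficients (lambda_i, rho_i).  Returns None when the
    rate constraint cannot be met with non-negative coefficients."""
    var_part = chromosome[:d_v - 1]          # q_2 .. q_dv
    chk_part = chromosome[d_v - 1:]          # q'_2 .. q'_(dc-1)

    # Normalize the variable coefficients so that sum(lambda_i) = 1.
    total = sum(var_part)
    lam = {i + 2: q / total for i, q in enumerate(var_part)}
    int_lam = sum(l / i for i, l in lam.items())            # integral of lambda(x)

    # Scale the given check weights by c and choose the free coefficient
    # rho_dc so that sum(rho_i) = 1 and sum(rho_i / i) = (1 - rate) * int_lam.
    target = (1.0 - rate) * int_lam
    s1 = sum(chk_part)
    s2 = sum(q / (i + 2) for i, q in enumerate(chk_part))
    c = (target - 1.0 / d_c) / (s2 - s1 / d_c)
    rho_dc = 1.0 - c * s1
    if c < 0.0 or not (0.0 <= rho_dc <= 1.0):
        return None                                          # infeasible chromosome
    rho = {i + 2: c * q for i, q in enumerate(chk_part)}
    rho[d_c] = rho_dc
    return lam, rho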
The initial population of 50 individuals is randomly generated, and the GA commences
evaluation and evolution. The fitness function takes the chromosome as its input and
converts it into degree sequence polynomials such that the desired code rate is correct.
The function performs density evolution to determine the threshold performance for the
ensemble of codes described by the polynomials, and the threshold is returned by the
function as the fitness score. The number of iterations allowed can also be fixed to
limit the search to more practical codes. Trials were performed with the maximum number
of iterations limited to 25, 50, 100, and 200.
Those individuals with the highest fitness are chosen for reproduction, and their
offspring are used to replace the individuals with the worst performance. A splicing
crossover is used for recombination, in which two parent chromosomes are randomly
'spliced' and recombined to create an offspring chromosome. Over the course of many
evolutionary steps, the thresholds generally converge on what is likely the maximum
threshold achievable given the construction of the decoder and the constraints of the
code. The best single individual is selected and its chromosome is used to create the
degree sequence polynomials for the top performing code ensemble.
The equations used for density evolution analysis contain many convolutions that need to
be computed, as seen in (2.27). An efficient method for computing convolutions is to
take Fourier transforms and multiply; this is how all convolutions are calculated in
this work, and a few notes should be made about the effects of the Fourier transform.
Fourier transforms are efficiently computed using N-point DFTs, where N is the level of
quantization of the probability distribution, and the quantization error that results
can affect the analysis. The value N must be large enough that this error is small; it
was chosen to be 2048 for these trials, which was found to be sufficient through
multiple attempts with different values of N. The C library FFTW was used to perform all
Fourier transforms in this work [5].
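A minimal sketch of the convolution-by-transform step is given below, using NumPy's FFT
on two quantized densities; the grid size 2048 follows the text, while the example
densities themselves are arbitrary placeholders.

import numpy as np

N = 2048                        # quantization level used for the densities

def convolve_densities(p, q):
    """Convolve two quantized probability mass functions by multiplying
    their DFTs.  Both inputs are zero-padded so the linear (not circular)
    convolution is returned."""
    size = len(p) + len(q) - 1
    P = np.fft.rfft(p, size)
    Q = np.fft.rfft(q, size)
    return np.fft.irfft(P * Q, size)

# Example: the density of a sum of two independent quantized variables.
p = np.full(N, 1.0 / N)         # arbitrary uniform density on the grid
q = np.full(N, 1.0 / N)
r = convolve_densities(p, q)
print(len(r), r.sum())          # length 2N-1, total mass ~1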
With the density evolution tools in place, and a global search mechanism
available to us,
we are ready to optimize the different decoders, beginning with the sum-
product decoder.
3.2 Optimization of the Sum-Product Algorithm
Using the GA setup discussed, optimally performing codes can be generated for the
sum-product algorithm. Density evolution for this algorithm follows the procedure
presented in [18], and the reader is directed to this paper for details. The performance
of these codes is very good, and for larger block lengths they approach their
thresholds. This is the only decoding algorithm examined that consistently improves in
performance as the maximal degree increases, so long as the number of iterations allowed
is sufficiently large, and this behaviour is predicted by MacKay in [14]. On average, a
large number of iterations is required to decode these highly optimized codes; the
average number of iterations required rises to as high as several hundred for low SNR
conditions. Again, the reader is directed to [18] for performance evaluation with
unlimited iterations.
When a restriction on the maximum number of iterations is enforced, the performance of
the sum-product decoder degrades; Figure 3.1 provides more detail. These codes are
derived from density evolution optimization with a constraint on the maximum number of
iterations set to 200 and a block length of 10,000. For codes with an unlimited number
of iterations, performance of the sum-product algorithm improves as the maximal degree
is increased. However, as can be seen from these results, this is not necessarily the
case for constrained iterations.
There appears to be an error floor introduced by limiting the number of iterations,
which gets more severe as the maximal degree is increased, such that the performance of
lower degree codes is better than that of larger degree codes at higher SNR. At low SNR,
the higher degree codes outperform the inherently weaker lower degree codes, though
still not nearly as well as comparable codes that are allowed a very large number of
iterations. As the maximal degree is increased, variables on average participate in a
greater number of parity check constraints, meaning that the decoding procedure requires
more iterations for all of the received information to have a significant effect. As the
number of iterations is allowed to grow for these codes, the error floor disappears and
the standard 'waterfall' curve is seen, as in Figure 3.2. The performance of these codes
when allowed an unlimited number of iterations, however, is worse than that of the codes
that were optimized to

[Plot for Figure 3.1: BER versus SNR (Eb/No), 0.7 to 1.5 dB.]

Figure 3.1: Sum-product algorithm performance for rate 1/2 optimized codes of block
length 10,000 with the maximum number of iterations set to 200. The numbers 05-20
indicate the maximal variable degree of each code.
operate with a very large number of iterations.
The error floor problem becomes worse as the code's block length is shortened. It may be
useful to couple these codes with a second high-rate code to remove the error floor
while still reducing the overall number of computations required for successful
decoding, due to a decreased overall number of iterations. Alternatively, it would be
desirable to identify those sections of the graph that are still in error and
concentrate the decoding effort on those sub-graphs only.
When the inherent power of the sum-product algorithm is limited by practical
constraints, there is a significant degradation in the codes' performance, though
overall the performance is still excellent compared with traditional codes. We shall see
in the next section that the min-sum algorithm can also be optimized for practical
situations, and that this reduced complexity decoder can perform nearly as well as the
powerful sum-product algorithm.
[Plot for Figure 3.2: BER versus SNR (Eb/No), 0.7 to 1.5 dB.]
Figure 3.2: Effect of iterations on sum-product algorithm performance for
optimized rate
1/2 codes of block length 10,000 and a maximal degree of 20. Iterations range
from 200 to
1200 in increments of 200.
3.3 Optimizing the Min-Sum Algorithm
Most of the complexity in the sum-product algorithm arises from the need to calculate
products for the check node update rule in (2.14). If this equation is closely examined,
certain simplifications may be made to greatly reduce the complexity. Figure 3.3(a)
shows the input-output characteristics of the check node equation when all inputs are
equal, and (b) shows the situation when one input is much smaller than the rest. It is
easy to see that when one input is much smaller than the rest, output values are
approximately equal to the input values, and even when all inputs are equal, as in (a),
the difference from the input = output line is not too great, and is bounded.
Using this knowledge, we can reduce the computational complexity of the check node
update rule through the use of an approximating function; the result will be called the
min-sum algorithm. The check node function for the min-sum algorithm returns a value
computed as follows. Let the magnitude of an output message be the minimum magnitude of
all inputs excluding the edge whose output is being calculated. Let the sign of the
output message be
[Plots for Figure 3.3: check node output message versus input message, panels (a) and (b).]
Figure 3.3: Check node output vs. input curves. (a) Input-Output
characteristics of check
node for equal inputs. (b) Input-Output characteristics of check node for one
message much
less than all others.
the modulo-2 addition of the signs of all the inputs excluding the edge whose output is
being calculated. If the messages are represented by the log likelihood ratios given in
Table 2.1, this rule can be written as

    μ_{f→x} = [ ∏_{y∈n(f)\x} sgn(μ_{y→f}) ] · min_{y∈n(f)\x} |μ_{y→f}|        (3.2)

where f represents the check node in question, x the variable which connects to f
through the particular edge we are examining, and n(f) the set of variables
participating in f. The update rule for the variable side remains the same as for the
sum-product algorithm in (2.13).
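A direct Python sketch of the check node rule in (3.2) follows; it computes each
extrinsic output from the signs and the minimum magnitude of the other incoming LLRs
rather than using the more efficient forward-backward form.

def min_sum_check_update(incoming):
    """Min-sum check node rule: for each edge, the output magnitude is the
    minimum magnitude of the other inputs, and the output sign is the
    product (modulo-2 sum in sign form) of the other inputs' signs."""
    outputs = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for m in others:
            sign *= 1.0 if m >= 0 else -1.0
        magnitude = min(abs(m) for m in others)
        outputs.append(sign * magnitude)
    return outputs

print(min_sum_check_update([1.2, -0.4, 3.0]))   # -> [-0.4, 1.2, -0.4]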
This simplification is a useful step, but we would also like to determine optimal
threshold performance for this algorithm using density evolution concepts. In order to
do this, the distribution of messages must be determined at the output of the check
nodes. We already know how the density evolves at the variable nodes, and with the
described update rules for the min-sum algorithm, the cumulative distribution of the
messages output from a check node can be written
    F(y) = 1 − (1/2) [ (1 − F_X(y) + F_X(−y))^N + (1 − F_X(y) − F_X(−y))^N ],   y ≥ 0
    F(y) = (1/2) [ (1 − F_X(|y|) + F_X(−|y|))^N − (1 − F_X(|y|) − F_X(−|y|))^N ],  y < 0    (3.3)

where F_X(y) is the cumulative distribution function of the log-likelihood ratio
messages on the edges entering a check node and N is the number of inputs combined, one
fewer than the check degree. This function enables us to model
the evolution of probabilities on the graph over time. We may now proceed to find degree
sequence polynomials with optimized performance thresholds.
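As a sanity check on (3.3), the distribution of check-to-variable messages can also be
estimated empirically by drawing input LLRs and applying the min-sum rule directly; the
Gaussian input density used below is only an illustrative stand-in for the true message
density at a given iteration.

import numpy as np

rng = np.random.default_rng(0)

def check_output_samples(n_inputs, n_samples, mean=2.0, std=2.0):
    """Draw LLR inputs from an (assumed) Gaussian density and apply the
    min-sum check rule: sign = product of signs, magnitude = minimum."""
    x = rng.normal(mean, std, size=(n_samples, n_inputs))
    signs = np.prod(np.sign(x), axis=1)
    mags = np.min(np.abs(x), axis=1)
    return signs * mags

samples = check_output_samples(n_inputs=5, n_samples=100_000)
# Empirical CDF of the check node output, which can be compared against (3.3).
ys = np.linspace(-5, 5, 11)
print([(round(float(y), 1), float(np.mean(samples <= y))) for y in ys])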
Families of optimized codes were generated using a genetic algorithm. Degree sequences
were limited to maximal degrees of 4, 5, 7, 9, 11, 13, 15, 20, and 25. The maximum
number of iterations allowed was set to 25, 50, 100, and 200. For each case, an
optimized degree sequence was returned by the GA. Table 3.1 shows the resulting degree
sequences and thresholds for the cases in which iterations were limited to 100. Similar
results were found when iterations were allowed to be greater or fewer, and in general
the thresholds found improve as the number of iterations is allowed to increase. Note
that the thresholds given in Table 3.1 grow smaller as the maximal degree increases, as
is seen with the sum-product algorithm as well.
The min-sum algorithm, as expected, does not perform as well as the sum-product
algorithm. The thresholds found using density evolution optimization are higher than
those of the sum-product algorithm, though the trend of decreasing thresholds with
increasing maximal degree is still seen. Table 3.2 gives a comparison between the
thresholds found for each algorithm when iterations are unlimited and when they are
limited to 100. Clearly, from a threshold perspective, the sum-product algorithm is
superior; however, it is useful to compare their performance in practical situations.
First, an examination of the performance curves for the min-sum algorithm reveals that
for practical cases, larger maximal degree codes perform worse than smaller maximal
degree codes at all SNRs. As an extreme case, one can see in Figure 3.4 that the code
with a maximal degree of 25 performs much worse at all SNRs investigated than the code
with a maximal degree of 11, which was found to be the best performing code of the
group. Since the min-sum algorithm does not compute exact APP values, errors which occur
from the approximation are propagated through the graph. Check nodes with high degrees
have a much greater chance of sending a larger error to many more nodes in the graph,
and due to this inherent problem the performance of these codes suffers.
There is an error floor that can be seen with these codes and, as with decoding using
the sum-product algorithm, it is a result of the iteration constraint put on the codes.
Coupled with the errors introduced by the approximation of the APP values, this results
in an error floor
d_v        4         5         7         9         11        13        15        20        25
λ_2        0.369966  0.355097  0.316601  0.334492  0.302308  0.309376  0.297488  0.284456  0.267085
λ_3        0.411707  0.340715  0.400099  0.375263  0.321379  0.344362  0.265682  0.291418  0.312483
λ_4        0.218327  0.163818  0.076003  0.071285  0.034657  0.065113  0.089733  0         0
λ_5        0         0.140370  0         0.024016  0.021179  0.017904  0.089733  0         0
λ_6        0         0         0         0         0         0         0         0.051250  0.051886
λ_7        0         0         0.207297  0         0         0         0.065060  0.110688  0.046961
λ_8        0         0         0         0         0         0         0         0.041201  0.040485
λ_9        0         0         0         0.194945  0         0.146449  0         0.051250  0.023379
λ_10       0         0         0         0         0.174584  0.051684  0         0.035519  0.026411
λ_11       0         0         0         0         0.140277  0         0         0         0
λ_13       0         0         0         0         0         0.065113  0         0         0
λ_14       0         0         0         0         0         0         0.117540  0         0
λ_15       0         0         0         0         0         0         0.074765  0         0
λ_19       0         0         0         0         0         0         0         0.023530  0
λ_20       0         0         0         0         0         0         0         0.110688  0
λ_24       0         0         0         0         0         0         0         0         0.126607
λ_25       0         0         0         0         0         0         0         0         0.104703
ρ_5        0.652006  0.510473  0.108624  0.330566  0.028971  0.157893  0         0.182163  0.050973
ρ_6        0.347994  0.337985  0.885216  0.346133  0.057793  0.348033  0.415099  0.295964  0.176702
ρ_7        0         0.151542  0.006161  0.229296  0.083734  0.380856  0.441478  0.045857  0.183145
ρ_8        0         0         0         0.094006  0.829502  0.095024  0.143423  0.033483  0.381413
ρ_9        0         0         0         0         0         0.018193  0         0.155725  0.203786
ρ_10       0         0         0         0         0         0         0         0.286808  0.003980
(Eb/N0)*   1.221     1.213     1.157     1.089     1.040     1.016     0.998     0.967     0.914
σ*         0.8688    0.8696    0.8752    0.8821    0.8871    0.8896    0.8914    0.8946    0.9001

Table 3.1: Good rate 1/2 degree sequences for the min-sum algorithm over the AWGN
channel, optimized for 100 iterations. The values σ* and (Eb/N0)* are the threshold
noise power and SNR for the given degree sequence.

Iterations   Maximal Degree   Sum-Product SNR*   Min-Sum SNR*
100          4                0.926              1.221
100          5                0.916              1.213
100          7                0.865              1.157
100          9                0.812              1.089
100          15               0.746              0.998
∞            4                0.782              0.967
∞            5                0.723              0.931
∞            7                0.502              0.787
∞            9                0.409              0.692
∞            15               0.335              0.651

Table 3.2: Comparison between sum-product and min-sum algorithm thresholds.
that is severe for few iterations. As with the sum-product algorithm, the error floor
becomes less severe, and eventually disappears, as the number of iterations increases.
The performance of the min-sum codes relative to their threshold values also degrades
faster at smaller block lengths than the sum-product results do. The reason for this
degradation in performance is the effect of finite block lengths; recall that threshold
performance is the expected asymptotic performance as the code length approaches
infinity. The min-sum algorithm is actually an approximation of the sum-product
algorithm, and as such these finite-length effects are much more pronounced, resulting
in a rapid degradation in performance as block lengths decrease.
From the above discussion, we see that though the complexity of the min-sum algorithm is
lower than that of the sum-product algorithm, the performance suffers significantly, and
it is up to the code designer to decide which may be of more value. It is interesting to
ask whether there is a better approximation function than (3.2) that is still
computationally efficient. We will attempt to answer this in the next section.
3.4 The Modified Min-Sum Algorithm
The simplification of the check node update rule for the min-sum algorithm introduces an
error into the output messages, which certainly impacts the performance of the decoder.
This message error, Δμ, defined as the difference between the actual output from a factor
[Plot for Figure 3.4: BER versus SNR (Eb/No), 1.2 to 1.6 dB.]
Figure 3.4: Optimized min-sum algorithm performance for rate 1/2 codes of
length 10,000
with 200 maximum iterations.
graph update rule, μ_actual, and the output from the min-sum update rule, μ_min-sum, is

    Δμ = μ_min-sum − μ_actual        (3.4)

and is bounded by
    0 ≤ Δμ ≤ log d_c                 (3.5)

where d_c is the degree of the particular check node. It is easy to check that no error
occurs when all message magnitudes but the minimum are infinite, and that the maximal
error occurs when all input messages are of the same magnitude.
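The bound in (3.5) can be checked numerically. For d_c − 1 equal inputs of magnitude m,
the exact sum-product (tanh-rule) check output is 2·artanh(tanh(m/2)^(d_c−1)), while the
min-sum output is simply m, so the gap grows toward log(d_c − 1) ≤ log d_c as m
increases. The tanh form of the exact rule used below is the standard sum-product check
update and is assumed here for the comparison.

import math

def exact_check_output(m, n_inputs):
    """Sum-product (tanh-rule) check output for n_inputs equal inputs of value m."""
    t = math.tanh(m / 2.0) ** n_inputs
    return 2.0 * math.atanh(t)

d_c = 6                        # check node degree: 5 inputs feed each output edge
for m in (1.0, 2.0, 5.0, 10.0):
    delta = m - exact_check_output(m, d_c - 1)   # error of the min-sum approximation
    print(m, round(delta, 3), "bound:", round(math.log(d_c), 3))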
Knowing this bound, we can attempt to improve the min-sum decoding performance by
introducing an additive term to the check node update rule. This approach has been used
for other codes, and is very similar to the Max-Log-MAP algorithm [17, 1, 20]. The goal
is to closely approximate the transfer function of the check node as simply as possible,
and one effective approach is to use a piece-wise linear approximation. A number of
different limits and variations were tried for the approximation, and the best found is
given here. Let μ_initial be equal to the result in (3.2); then let the resulting
message output from a check node in
[Plot for Figure 3.5: BER versus SNR (Eb/No), 1.1 to 1.4 dB; curves labelled by maximal degree 04-09.]

Figure 3.5: Performance of the modified min-sum decoder for rate 1/2 codes with block
lengths of 10,000 and iterations limited to 200, for varying maximal degrees.
the modified decoder be

    μ_{f→x} = μ_initial − K    if μ_initial > K
              μ_initial + K    if μ_initial < −K        (3.6)
              μ_initial        otherwise
where K is a constant that is some fraction of log d_c. The actual value of K used
depends on the average d_c for the code, and was found to be effective when chosen to be
roughly log(d_c)/3. There are many alterations that can be made to this function by
altering K or the limits. One could do a least-squares fit to the transfer function, and
the resulting approximating function used would then depend on complexity and
performance considerations. For the remainder of this discussion, only (3.6) will be
used.
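A sketch of the correction in (3.6), applied on top of a min-sum check output, is given
below; K is taken as a fraction of log d_c as described, and the particular fraction
used in the example is an assumption for illustration only.

import math

def modified_min_sum_output(mu_initial, d_c, fraction=1.0 / 3.0):
    """Apply the piecewise-linear correction of (3.6) to a min-sum check
    output mu_initial: shift its magnitude toward zero by K, where K is a
    fraction of log(d_c), leaving small-magnitude messages untouched."""
    k = fraction * math.log(d_c)
    if mu_initial > k:
        return mu_initial - k
    if mu_initial < -k:
        return mu_initial + k
    return mu_initial

print(modified_min_sum_output(2.5, d_c=6))    # shifted toward zero by K
print(modified_min_sum_output(0.2, d_c=6))    # inside [-K, K]: unchanged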
Once again, the ideas of density evolution may be applied to this modified algorithm to
determine optimized degree sequences. The addition of the constant K complicates the
expression for the probability density, and numerical methods were used to calculate the
required densities. It should be noted that the effect of the additional constant is to
'shift' portions of the probability distribution towards the y-axis.
d" 4 :i 7 9 13 15
~a 0.3699660.3684360.3436570.2998280.2722850.278550
a3 0.4117070.3984930.296163(1.2800350.2808720.268489
a4 0.2183270.1255190.2046970.0721210.0048040.059741
a5 0 0.1075520 0.0679800.0765610.041022
afi o 0 o a o a
a7 0 0 0.1554830 0 0.044403
a8 0 0 0 0 0 0
ay 0 0 0 (1.2800350.0049160
ale 0 0 0 0 0.1165790
al 0 0 0 0 0 0
ais 0 0 0 0 0.2440420
a74 0 0 0 0 0 0.058201
~ls 0 0 0 0 0 0.249594
p5 0.6520060.6134660.1917500.3677160.153(1800
p~ 0,3479940.2964020.7624450.0080950.1994900.319192
p7 0 0.0901310.0458060.0047160.0043290.068649
pn 0 0 0 0.6194730.1934560.612159
pa 0 0 0 0 0.4496460
F6 1.2211.197 1.128 0.793 0.7350.709
NO
~
0.8f>880.87120.87820.91270.91890.9216
Table 3.3: Good degree sequences for the modified min-sum algorithm with rate
1~2 codes
over the AWGN, optimized for 100 iterations. The values Q* and ( N ) * are the
threshold
noise power and SNR for the given degree sequence.
Performance curves for the modified min-sum algorithm are given in Figure 3.5, and
Table 3.3 shows the optimized degree sequences for the case of iterations limited to
100. The curves look similar to those of the regular min-sum algorithm, with improved
overall performance in lower SNR scenarios. The selected update equation has the effect
of introducing many very low-valued messages, and these messages are essentially treated
as erasures by the decoder. For larger maximal degrees, these small-valued messages are
propagated through certain sections of the graph, resulting in some sub-sections of the
graph becoming stuck in a non-convergent, inconclusive state. This is seen in the very
prominent error floor of these codes. As with the other algorithms examined, this error
floor gets much worse for larger maximal degrees, but a greater number of iterations
mitigates the effect. Very high maximal degree codes were not investigated, as it was
assumed that they would have poor performance; this appears to be the case, since the
maximal degree 9 code is already worse than the maximal degree 7 code at all SNRs
investigated.
The modifications to the min-sum algorithm do give the decoder better performance, but
the error floor becomes more pronounced. We see again that there are trade-offs involved
in the selection of a decoding algorithm. Depending on the target BER that is to be
achieved, the modified min-sum may or may not be a good choice. We will examine some
other factors affecting iterative decoder performance in the next section.
3.4.1 Comments on Good Degree Sequences
There are similarities between all of the optimized degree sequences, and it is
worthwhile to examine them. The most striking commonality between all the optimized
sequences, which holds for all of the algorithms, is the maximization of degree-2
variables. In all cases, the proportion of degree-2 variables is very close to the
limiting value at which every check node has two degree-2 variable nodes participating
in it. This means that the number of degree-2 variables is less than, but close to, the
number of check nodes. If the number of degree-2 variables were greater than the number
of checks, it would be impossible for a code to be constructed without a cycle of
length 2. It seems, then, that the small degree variable nodes, counter-intuitively, are
an important factor in powerful codes, as there are generally

a great deal of degree-3 variable nodes in these codes as well.
Besides this concentration, it also appears that good performance is achieved by having
a large proportion of edges associated with the highest variable degrees. Beyond the
large number of edges participating in degree-2 and degree-3 variable nodes, a majority
of the remaining edges tend to participate in the highest degree variables, especially
in the case of the sum-product algorithm. These edges, however, seem to be the cause of
the finite-length effects that degrade the practical performance of these codes, and
there are generally fewer edges participating in these high degree nodes when the codes
are optimized with iteration constraints.
3.5 Finite Effects
Fundamentally, there are two factors preventing the codes presented in the
previous sections
from achieving their threshold performances. The first is the finite block
length of the codes,
and the second is the finite number of maximum iterations allowed.
Both the min-sum and modified min-sum algorithms share certain commonalities
which
will be discussed. First, as with other iterative decoding schemes including
turbo-codes,
performance is highly dependent on the block length of the code. Figure 3.6
shows this
relationship for the min-sum algorithm given a maximum of 200 iterations, and
Figure 3.7
gives the same relationship for the modified min-sum algorithm.
In addition to the block length, both the min-sum and modified min-sum algorithms
perform much better when given a greater number of iterations to decode. Figure 3.8
gives results for the min-sum algorithm, and similar results hold for the other decoding
methods examined.
Finally, it is useful to investigate the nature of the error floor found in the various
decoders. Figure 3.9 shows the error floor drop-off for the three decoding schemes
discussed at higher SNRs. An examination of Figure 3.9 shows that all three algorithms
operate in three 'modes'. The first, at the lowest SNRs, is the standard performance
seen for these curves, where performance is limited by the noise conditions. As the SNR
increases, these codes enter an 'iteration limited' region of performance in which the
performance improvement with increasing
[Plot for Figure 3.6: BER versus SNR (Eb/No), 1.1 to 2 dB; curves labelled by maximal degree and block length.]

Figure 3.6: Performance of the min-sum decoding scheme for varying block lengths. All
algorithms are highly dependent on the block length used. The first number is the
maximal variable degree and the second is the block length. Iterations are limited to
200, and the rate is 1/2.
[Plot for Figure 3.7: BER versus SNR (Eb/No), 1 to 2 dB; curves labelled by maximal degree and block length.]

Figure 3.7: Performance of the modified min-sum decoder for a maximum of 200 iterations,
for rate 1/2 codes of varying maximal degrees and block lengths.
[Plot for Figure 3.8: BER versus SNR (Eb/No), 1.2 to 1.7 dB; curves labelled by maximal degree and maximum iterations.]
Figure 3.8: Performance of the min-sum decoder for block length of 10,000, for
varying
maximum number of iterations.
[Plot for Figure 3.9: BER versus SNR (Eb/No) at high SNR.]

Figure 3.9: Performance of the three decoding schemes at high SNR for rate 1/2 codes,
showing the drop-off of the error floor.
SNR is smaller than expected; performance is limited by the fixed number of iterations
allowed. Finally, at high SNRs, the performance improvement begins to increase as
expected.
3.6 Performance vs. Complexity
We have discussed in previous sections some of the impact that practical considerations,
such as the maximum number of iterations allowed and the block length, have on
performance. It would be useful to quantify these trade-offs in a meaningful way to get
a clear picture of the issues, and we accomplish this by plotting the performance of
these decoding algorithms against the decoding cost in terms of their computational
complexity.
A code's true complexity is not simple to calculate and is highly dependent on how it is
implemented. The complexity calculated and used in these trials is based on the direct
computational effort required only within the decoding algorithm, that is, the
computations required at each node. All basic computations are considered to be of equal
effort, so an addition has the same complexity as a compare or a multiplication. Also,
overhead computational requirements such as the loading and unloading of buffers are not
included. The intention is to provide a rough measure that serves as a relative
comparison rather than an absolute description.
Based on efficient calculations (i.e., using the FBA), we can derive expressions for the
complexity of a given decoding procedure. At a node of degree D, the number of basic
computations required for the FBA is 3(D − 2). For the sum-product algorithm, 5
operations are required per basic FBA operation at a check node, such that the
complexity per bit of the algorithm, C, can be written as

    C_S-P = I ( variable operations + check operations )                       (3.7)
          = I [ ( 3 − 5 ∫₀¹ λ(x) dx ) + 5 ( 3 − 6 ∫₀¹ ρ(x) dx ) ]              (3.8)

which simplifies to

    C_S-P = I ( 18 − 5 ∫₀¹ λ(x) dx − 30 ∫₀¹ ρ(x) dx )                          (3.9)
where I is the number of iterations used, set to one of 25, 50, 100 or 200 depending on
how many iterations are allowed. The min-sum algorithm requires an additional select
operation at each check node beyond the basic computations, so its complexity per bit
can be written

    C_M-S = I ( 6 − 5 ∫₀¹ λ(x) dx − 4 ∫₀¹ ρ(x) dx )                            (3.10)
and similarly for the modified min-sum, which requires an additional compare and add per
bit at each check node,

    C_MM-S = I ( 8 − 5 ∫₀¹ λ(x) dx − 4 ∫₀¹ ρ(x) dx )                           (3.11)
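Taking the reconstructed forms of (3.9)-(3.11) at face value, the per-bit,
per-iteration cost can be evaluated directly from the degree-sequence integrals; the
regular (3, 6) code is used below only as a worked example, and the constants are those
of the formulas as reconstructed here, not independently verified values.

def int_poly(coeffs):
    """Integral over [0, 1] of a degree-sequence polynomial given as
    {degree: coefficient}, i.e. sum of coefficient_i / i."""
    return sum(c / i for i, c in coeffs.items())

# Constants (a, b, c) in C = I * (a - b*int(lambda) - c*int(rho)).
CONSTANTS = {
    "sum-product":      (18.0, 5.0, 30.0),
    "min-sum":          (6.0, 5.0, 4.0),
    "modified-min-sum": (8.0, 5.0, 4.0),
}

def complexity_per_bit_per_iteration(int_lambda, int_rho, algorithm):
    a, b, c = CONSTANTS[algorithm]
    return a - b * int_lambda - c * int_rho

# Worked example: the regular (3, 6) code, lambda(x) = x^2 and rho(x) = x^5.
il, ir = int_poly({3: 1.0}), int_poly({6: 1.0})
for alg in CONSTANTS:
    print(alg, round(complexity_per_bit_per_iteration(il, ir, alg), 2))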
Figures 3.10 and 3.11 give the performance-complexity trade-off for BER targets of 10^-3
and 10^-5 respectively. The curves represent codes of block length 10,000, rate 1/2, and
maximum iterations limited to 200. For all of the cases examined there is an optimal
operating point, meaning that for some added complexity in the form of higher maximal
degrees, decoding performance actually degrades. This is an important result for
practical situations. What seems to be happening in general is that the highest degree
nodes require a very large number of iterations to be decoded correctly, and in
situations where there is a practical limit on iterations the code actually performs
worse.
For a target BER of 10^-3, one can see that there is no obvious 'win' in terms of the
performance-complexity trade-off between the three decoding schemes, although it seems
one can achieve higher performance with lower complexity in some cases with the modified
min-sum over the standard min-sum algorithm. For a target BER of 10^-5, we see a clear
win for the modified min-sum algorithm over the standard min-sum, and it even
outperforms the sum-product algorithm in certain instances.
Figure 3.12 shows the performance-complexity trade-off for a target BER of 10^-5 with
iterations constrained to 25. The performance is worse than for the case of maximum
iterations set to 200, but the general pattern remains. Note that the lower maximal
degree codes perform relatively better with fewer iterations. For example, the best
performing code for the modified min-sum algorithm with 25 iterations has a maximal
degree of 5, whereas it is 9 for 200 iterations.

[Plot for Figure 3.10: required SNR (Eb/No) versus complexity (computations/bit/iteration), 5 to 14; curves for the sum-product, modified min-sum, and min-sum algorithms, points labelled by maximal degree.]

Figure 3.10: Performance-complexity trade-off for a target BER of 10^-3 for the
sum-product, min-sum and modified min-sum algorithms.
[Plot for Figure 3.11: required SNR (Eb/No) versus complexity (computations/bit/iteration), 5 to 14; curves for the sum-product, modified min-sum, and min-sum algorithms, points labelled by maximal degree.]

Figure 3.11: Performance-complexity trade-off for a target BER of 10^-5 for the
sum-product, min-sum and modified min-sum algorithms.
[Plot for Figure 3.12: required SNR (Eb/No) versus complexity (computations/bit/iteration), 7 to 14; curves for the sum-product, modified min-sum, and min-sum algorithms, points labelled by maximal degree.]

Figure 3.12: Performance-complexity trade-off for a target BER of 10^-5 for the
sum-product, min-sum and modified min-sum algorithms with maximum iterations set to 25.
It is useful to understand the relationship with varying block lengths for these
algorithms, so two plots are given in Figures 3.13 and 3.14 for comparison, having block
lengths of 100,000 and 1000 respectively. As the block length increases, the codes
become more sensitive to the maximal degree parameter. The difference between the
performance of the optimal code and one with a smaller or larger maximal degree is much
greater for block lengths of 100,000 than for 1000, especially for the min-sum and
modified min-sum decoders.
We have seen that for practical cases, reduced complexity decoders such as the min-sum
and modified min-sum provide attractive alternatives to the sum-product algorithm in
terms of decoding power per unit operation. Overall performance does suffer, however,
and we would ultimately like a practical, realizable decoder that can perform as well as
the sum-product without the complexity. The answer, perhaps, may be found by examining
the actual implementation of the decoder and exploiting the power of the platform.
[Plot for Figure 3.13: required SNR (Eb/No) versus complexity (computations/bit/iteration), 5 to 14; curves for the sum-product, modified min-sum, and min-sum algorithms, points labelled by maximal degree.]

Figure 3.13: Performance-complexity trade-off for a target BER of 10^-5 for the
sum-product, min-sum and modified min-sum algorithms for a block length of 100,000.
[Plot for Figure 3.14: required SNR (Eb/No) versus complexity (computations/bit/iteration), 5 to 14; curves for the sum-product, modified min-sum, and min-sum algorithms, points labelled by maximal degree.]

Figure 3.14: Performance-complexity trade-off for a target BER of 10^-5 for the
sum-product, min-sum and modified min-sum algorithms for a block length of 1000.
Chapter 4
The Sum-Transform-Sum Algorithm
Applications that require very powerful FEC schemes can benefit greatly from LDPC codes,
as their performance is beyond most other coding choices. One difficulty with the LDPC
codes presented so far is that they are not necessarily well suited for implementation
in devices such as modems. These codes have been developed with decoding performance in
mind, with little thought given to the application.
A different approach can be used to find a good decoding algorithm: first understand the
implementation, and then use the features inherent in the implementation to find
powerful decoding algorithms. Applied to a hardware instantiation, this approach leads
to the sum-transform-sum (SXS) algorithm.
4.1 Algorithm Description
The SXS algorithm is a hardware architecture and algorithm for a soft-input, soft-output
iterative decoder for LDPC codes, and is similar to the standard sum-product or min-sum
algorithms. The inputs to the algorithm are soft representations of the probabilities of
the possible transmitted symbols, and for a hardware implementation it is natural to
represent these values as fixed-point numbers. The precision with which these numbers
are represented is a parameter of the system and will depend on, among other things,
memory constraints and hardware complexity.
Unlike the other iterative decoding schemes described, the SXS algorithm utilizes two
separate probability representations for messages and requires a transformation to
convert between the two. The two representations used are the LLR and the LLDP, both of
which have been previously described and defined. Variable nodes work in the LLR domain,
while check nodes operate in the LLDP domain. Using these representations, both the
variable node update equation and the check node update equation are simple to
calculate, being the summations of (2.13) and (2.15) respectively.
It is useful to note that these two representations are duals, meaning that decoding in
one domain is equivalent to decoding the dual code in the other domain. The idea of
using the dual domain to decode linear codes was originally proposed by Hartmann and
Rudolph [7]. Forney has suggested the use of dual codes and a Fourier transform method
to convert between them [9]. The use of the dual code can reduce the complexity involved
in the decoding procedure without any performance loss; however, the conversion between
domains may require intensive computations.
With two separate probability representations, the SXS algorithm requires a function to
transform messages between them. Let μ_ratio be a message in the log-likelihood ratio
domain, and μ_diff be a message in the log-likelihood difference pair domain. The
transform Λ(x) takes as its argument a message of the form μ_ratio and returns a message
of the form μ_diff. The returned message expresses the same probability as the input
message, only in a different form. Similarly, the inverse transform Λ⁻¹(s, r) takes as
its input a message of the form μ_diff and returns the equivalent probability as a
message of the form μ_ratio. The function Λ(x) may be written as

    Λ(x) = (s, r) = ( sgn[ 2 e^x / (1 + e^x) − 1 ],  log| 2 e^x / (e^x + 1) − 1 | )    (4.1)

where x is a message of the form μ_ratio. Likewise, the inverse transform Λ⁻¹(s, r) may
be written as

    Λ⁻¹(s, r) = log[ (1 + s e^r) / (1 − s e^r) ]                                        (4.2)

where (s, r) is the sign and magnitude of a message of the form μ_diff.
The function in (4.1) and its inverse, (4.2), may be implemented very simply using a
pair of look-up tables in memory. If, for example, messages are stored as 8-bit values
(N-bit in general), then two look-up tables, each of size 2^8 = 256 bytes (2^N N-bit
values in general), are required to completely describe the forward and inverse
functions. One need only calculate the value of the transform for each possible input
once and store the results in memory for future use. Alternatively, the function and its
inverse may be evaluated directly to calculate the necessary transforms on an as-needed
basis.
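A sketch of precomputing the forward and inverse transform tables in fixed point is
given below; the 8-bit two's-complement format and the scaling steps are illustrative
assumptions, not the allocation actually used in the trials.

import math

# Assumed fixed-point formats: LLR messages with a step of 1/16, difference-pair
# magnitudes with a step of 1/64, both stored in 8-bit codes.
LLR_STEP, DIFF_STEP, BITS = 1.0 / 16.0, 1.0 / 64.0, 8
CODES = range(-(1 << (BITS - 1)), 1 << (BITS - 1))

def clamp(code):
    lo, hi = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1
    return max(lo, min(hi, code))

def forward(code):
    """LLR code -> (sign, quantized magnitude of log|tanh(x/2)|), as in (4.1)."""
    x = code * LLR_STEP
    t = math.tanh(x / 2.0)
    if t == 0.0:
        return 1, -(1 << (BITS - 1))        # a zero LLR carries no information: saturate
    sign = 1 if t >= 0 else -1
    r = math.log(abs(t))                    # r <= 0
    return sign, clamp(round(r / DIFF_STEP))

def inverse(sign, r_code):
    """(sign, r) -> LLR code, as in (4.2)."""
    v = sign * math.exp(r_code * DIFF_STEP)
    x = math.log((1.0 + v) / (1.0 - v)) if abs(v) < 1.0 else math.copysign(1e9, v)
    return clamp(round(x / LLR_STEP))

# Precomputed tables indexed by the fixed-point codes.
forward_table = {code: forward(code) for code in CODES}
inverse_table = {(s, r): inverse(s, r) for s in (1, -1) for r in CODES}
print(forward_table[32], inverse_table[forward_table[32]])   # round-trips back to 32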
It may also be possible to implement the transformation through a combination of logic
and memory. For example, it may be a simple matter to use some logic to determine the
most significant bits of the transform output, and to determine the least significant
bits from a reduced look-up table. This flexibility allows the algorithm to be used in
an optimal manner depending on the platform on which it is implemented. If processing
speed is a constraint, it makes sense to take advantage of the look-up table as much as
possible as opposed to calculating multiple transforms.
One difficulty in using fixed-point precision for this algorithm is the many-to-one,
one-to-many problem, which occurs as a result of the non-linear nature of the
transformation in a fixed-point system. Multiple values in one representation may map to
the same value in the other, and vice versa, making the choice of ranges and precision
for these values critical. For example, the values 7, 8, and 9 may all map to the value
1 in the other domain, and the problem is that information is lost in mapping back from
1 to the original domain.
Another issue with fixed-point implementations is the range of values that may be taken
on by the precision used for the system. If too few bits are used to represent the
values, performance may be very poor, and in fact this is exactly the case. When 6 bits
are used to represent the data, the decoder works very poorly indeed, but as the
precision is increased beyond about 12 bits, performance is limited by the power of the
code itself rather than the fixed-point precision.
The two message representations, μ_ratio and μ_diff, have different domains, as shown in
Table 2.1, but since a sign bit is naturally represented in fixed-point hardware the
domains become the same, and the range is finitely limited by the choice of how decimal
bits and whole bits are allocated within the overall bits available. For effective
performance, μ_ratio messages may need to take on very large values, whereas μ_diff
messages may have very small decimal values. It is advantageous to allocate more whole
bits to μ_ratio messages, and
Precision   μ_ratio allocation   μ_diff allocation
6-bit       1-3-2                1-1-4
8-bit       1-3-4                1-1-6
10-bit      1-4-5                1-1-8
12-bit      1-5-6                1-2-9
16-bit      1-6-9                1-3-12

Figure 4.1: Bit allocation for messages in the SXS algorithm. The first digit is the
number of sign bits, then the number of whole bits, then the number of decimal bits.
more decimal bits to μ_diff messages. Figure 4.1 shows how bits are allocated for the
varying precisions used in the trials performed with these codes.
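A small sketch of how a sign-whole-decimal allocation translates into a step size and a
representable range is given below; the helper simply interprets the three fields of
Figure 4.1 and the arithmetic convention is an illustrative assumption.

def fixed_point_format(sign_bits, whole_bits, frac_bits):
    """Return (step, max_value) for a sign/whole/decimal bit allocation."""
    step = 2.0 ** -frac_bits
    max_value = 2.0 ** whole_bits - step
    return step, max_value

# The 8-bit allocations of Figure 4.1: 1-3-4 for LLR messages, 1-1-6 for
# difference-pair messages.
print(fixed_point_format(1, 3, 4))   # LLR:  step 0.0625, values up to 7.9375
print(fixed_point_format(1, 1, 6))   # diff: step 0.015625, values up to ~1.98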
4.1.1 Decoding
The decoding algorithm commences by updating the check node output messages. This is
accomplished using the following equation:

    μ_ratio(i) = Λ⁻¹ ( ∏_{x∈n(i)} s_x ,  Σ_{x∈n(i)} r_x )        (4.3)

where i is the i-th edge on the graph and n(i) is the set of edges connecting to the
check node associated with i, excluding i itself. The variables s_x and r_x represent
the sign and magnitude of the pair (s, r) by which μ_diff messages are represented. This
equation is used a number of times, based on the structure of the LDPC code, to update
all of the elements of the variable input array. In order to increase the efficiency of
this computation, the FBA may be used [11], or an efficient one-pass algorithm may be
used instead [8].
Once completed, the variable node edges are then updated. The rule for updating these
messages is governed by the equation

    μ_diff(j) = Λ ( μ_init + Σ_{w∈n(j)} μ_ratio(w) )             (4.4)
Code                      Iterations   Threshold
Regular (3, 6)            ∞            1.13 dB
Optimized, max degree 4   200          1.04 dB
Optimized, max degree 4   50           1.26 dB
Optimized, max degree 5   200          0.97 dB
Optimized, max degree 5   50           1.16 dB

Table 4.1: Threshold performance for SXS codes.
where j is the j-th edge in the graph and n(j) is the set of edges connecting to the
variable node associated with j, excluding j itself. The order in which the variable and
check node input arrays are updated may vary depending on the actual code used. These
messages are iteratively updated until some convergence criterion is met, as with the
other algorithms explored in this paper.
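A floating-point Python sketch of one SXS iteration, built from (4.1)-(4.4), is given
below; it is only a functional model (the algorithm itself targets fixed-point
hardware), and the edge-indexing convention and small numerical guards are assumptions.

import math

def to_diff(x):
    """Forward transform (4.1): LLR -> (sign, log|tanh(x/2)|)."""
    t = math.tanh(x / 2.0)
    return (1.0 if t >= 0 else -1.0), math.log(max(abs(t), 1e-300))

def to_ratio(s, r):
    """Inverse transform (4.2): (sign, r) -> LLR, with a guard against overflow."""
    v = max(min(s * math.exp(r), 1.0 - 1e-12), -1.0 + 1e-12)
    return math.log((1.0 + v) / (1.0 - v))

def sxs_iteration(check_edges, var_edges, mu_init, mu_diff):
    """One decoding iteration: check update (4.3) then variable update (4.4).
    mu_diff maps edge index -> (sign, r); mu_init holds the channel LLRs."""
    mu_ratio = {}
    for edges in check_edges:                      # check nodes: multiply signs, sum r's
        for e in edges:
            others = [mu_diff[o] for o in edges if o != e]
            sign = math.prod(s for s, _ in others)
            r = sum(r for _, r in others)
            mu_ratio[e] = to_ratio(sign, r)
    new_diff = {}
    for v, edges in enumerate(var_edges):          # variable nodes: sum LLRs, transform
        for e in edges:
            total = mu_init[v] + sum(mu_ratio[o] for o in edges if o != e)
            new_diff[e] = to_diff(total)
    return mu_ratio, new_diff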
4.2 Optimization and Performance
Density evolution can be used to find optimized codes for this algorithm, and the
equations involved are actually identical to those used for the sum-product algorithm.
The sum-product algorithm, however, uses floating-point values, whereas the SXS
algorithm operates in the fixed-point domain; as such, density evolution also needs to
operate in fixed point. Fixed-point number precision adds another dimension to the
search space for good codes. Trials were performed at 10-bit and 12-bit precision, and
all results presented here are from the 12-bit case.
As with the other decoders optimized in this paper, the SXS algorithm was optimized for
block lengths of 1000 and 10,000, maximal degrees of 4 and 5, and iteration limits of 50
and 200. For a comparison reference, a regular {n, 3, 6} LDPC code is also presented.
The threshold for the regular code under sum-product decoding is 1.13 dB. Thresholds for
the optimized codes are presented in Table 4.1.
The performance for a regular {10000, 3, 6} code is presented in Figure 4.2. Given that
[Plot for Figure 4.2: BER versus SNR (Eb/No), 1.2 to 1.7 dB; curves labelled by fixed-point precision (08, 10, 12, 16 bits).]
Figure 4.2: Performance of the SXS algorithm for a regular {10000, 3, 6} code
with maximum
iterations set to 200. The effect of varying precision is also shown.
the theoretical threshold for this regular code is 1.13 dB, the performance of the
decoder is very good, especially considering the limitation on the number of iterations.
Note that the performance of the decoder degrades rapidly for fixed-point precisions of
less than 10 bits. The performance for 8-bit precision is poor, and the 6-bit
performance would not even be visible on this graph. Once a limit is reached, at about
12 bits, there appears to be little additional gain from greater precision.
Figure 4.3 shows the performance of this decoder for a regular {1000, 3, 6} code. The
performance is worse than that of the longer code presented in Figure 4.2, as is
expected. For comparison with the sum-product algorithm, Figure 4.4 shows the
performance of both the SXS decoder and the sum-product decoder for the same regular
code. We see from the plot that the sum-product algorithm performs better, though not by
much.
Genetic algorithms were used to find good degree sequences for the case of 200 maximum
iterations, with maximal variable degrees of 4 and 5, these two being chosen to keep the
overall complexity to a minimum. The performance of these two codes and of the optimized
sum-
[Plot for Figure 4.3: BER versus SNR (Eb/No), 1.75 to 2.1 dB.]

Figure 4.3: Performance of the SXS algorithm for a regular {1000, 3, 6} code versus
varying maximum iterations. The effect of varying precision is also shown. The curves,
from worst to best, are: 10-bit, 50 iterations; 10-bit, 200 iterations; 12-bit, 50
iterations; 12-bit, 200 iterations.
[Plot for Figure 4.4: BER versus SNR (Eb/No), 1.2 to 2.1 dB; SXS and sum-product (SP) curves.]
Figure 4.4: Comparison between SXS and sum-product performance for regular (3,
6) codes.
The two left curves are block lengths of 10,000, the two right curves are
block lengths of
1000. In both cases the sum-product decoder is more powerful.

[Plot for Figure 4.5: BER versus SNR (Eb/No), 1.2 to 2.1 dB; curves labelled SP-OPT5 and SXS-OPT5 for lengths 10^3 and 10^4, and REG for the regular codes.]

Figure 4.5: Performance of optimized SXS codes with maximum iterations set to 200, for
lengths 1000 and 10,000. REG refers to the regular (3, 6) code; OPT5 and SXS-OPT5 refer
to the codes of maximal variable degree 5 optimized for the sum-product and SXS
algorithms respectively.
product codes of Section 3.2 is shown in Figure 4.5. Surprisingly, the optimized
sum-product codes perform very poorly when decoded using the SXS algorithm. Even in the
higher SNR regions, most codewords did not decode correctly; instead, two or three bits
got 'stuck' in an incorrect state. The reason for this is the quantization of values in
the SXS algorithm: the highly optimized sum-product codes get stuck at fixed points,
since the incremental gains made in these regions are quantized away. The optimized SXS
codes, however, perform well, and their performance could possibly be improved further
given a more advanced transformation technique.
Solving the transformation mapping problem is not straightforward, and a few approaches
were attempted besides simply increasing the precision of the fixed-point values.
'Pessimistic', 'realistic', and 'optimistic' mapping schemes were all tried. The
pessimistic approach assumed that, in the case of a one-to-many mapping, the transformed
value's magnitude was the smallest magnitude that would be mapped to that value.
Similarly, the realistic scheme used an average value, while the optimistic scheme used
the largest value. None of the attempts yielded a significant improvement in
performance, and the realistic
scheme was selected for the trials.
This mapping problem is similar to the dynamic range problem associated with linear
amplifiers, and a useful solution in that realm is companding. It may be possible to
further mitigate the effects of this problem by borrowing the companding technique.
Companding allocates the bits in a non-linear manner, in such a way that small values
are not affected too much by quantization error while large values do not extend beyond
the range of the fixed-point number. This is an interesting area for future research on
this algorithm.
The figures presented provide a good indication of the performance of this decoding
algorithm. Given that the SXS algorithm was developed first with the ideas of
implementation and simplicity in mind, and only then looking toward performance, it is a
positive sign that the codes perform well in general. We shall now examine the
complexity of this algorithm.
4.3 Complexity
Having been designed with reduced complexity in mind, and having seen that the
performance of these codes can be quite good, it is important to understand the overall
complexity involved in a practical implementation of the SXS algorithm. To this end, we
calculate the computational power required per bit decoded.
At each computational node, we assume that the FBA is used. It is also assumed that the
hardware on which the algorithm is implemented is able to handle the required bit-depths
in one operation; e.g., 12-bit memory look-ups require one operation, as do 10-bit
additions. Taking these assumptions into account, the number of computations required
per bit is

    C_SXS = I ( 8 − 5 ∫₀¹ λ(x) dx − 6 ∫₀¹ ρ(x) dx )                            (4.5)
It is difficult to make a direct comparison with the decoders discussed in Chapter 3 due
to the fixed- versus floating-point computation differences. The computations required
per bit for the SXS algorithm are presented in Table 4.2. If we assume a one-to-one
complexity between fixed- and floating-point computations, we can compare the optimized
SXS decoder with a min-sum or modified min-sum decoder of equal complexity. Plotting the
performance-complexity point in Figure 4.6, we see that the overall performance is
superior to the min-sum
Code                      Complexity   SNR for 10^-3 BER   SNR for 10^-5 BER
Regular (3, 6)            5.33         1.27 dB             1.41 dB
Optimized, max degree 4   5.06         1.18 dB             1.33 dB
Optimized, max degree 5   5.55         1.14 dB             1.28 dB

Table 4.2: Complexity and target SNR for SXS algorithm codes of length 10,000.
Complexity is presented as computations per bit per iteration.
decoder with similar complexity, and is not quite as good as the optimal modified
min-sum decoder, although its decoding complexity is lower. This situation remains true
at lower target BERs as well. Given this pessimistic comparison between fixed-point and
floating-point complexity, it is encouraging to see that, even as a worst case, the SXS
algorithm performs quite well for its complexity. A more realistic comparison would only
show an overall better outcome in favour of the SXS decoder.
4.4 Implementation Architecture
The SXS algorithm has been designed with the intention of being easily implemented in a
fixed-point hardware environment, and we have seen that with this goal in mind good
performance is still possible. The use of a non-optimized, regular code avoids the
problem of error floors, and the constant degree of both check and variable nodes
simplifies the structure of the hardware required to implement the algorithm. We now
provide an example of a hardware implementation.
Let the code be a regular {1000, 3, 6} code, so that the number of edges on the graph is
3000. Figure 4.7 shows a block diagram of a possible implementation, and it is the one
used for this example. Assume that there is a method to convert channel information into
fixed-point probability messages (μ_ratio messages) and to store the received word in an
array called Input. Two sets of messages are needed; the memory requirements for these
are each of size 3000, and they form the check and variable message memory blocks. There
is also the need for two look-up tables to perform the necessary transformations from
one message
[Plot for Figure 4.6: required SNR (Eb/No) versus complexity (computations/bit/iteration), 4.5 to 7; SXS operating points (OPT4, OPT5) plotted against the min-sum and modified min-sum curves.]
Figure 4.6: Performance-Complexity comparison of the SXS algorithm with
floating-point
algorithms. Target BER performance is 10-3.
domain to the other, called the check and variable node transform blocks. The
memory
requirements for these arrays is entirely dependent on the fixed-point
precision used, and if
we assume 8-bit precision, we require a memory of size 2 x 2~ --- 512 bytes.
The decoder must know which edges participate at each of the nodes, and this
is achieved
through the use of the check and variable node descriptors, both of length
3000. Each of
these arrays contains a list of how each edge is connected to its neighbour,
and provides the
information required to perform the update rules. Note that there is only a
need for one of
these arrays, as the information from one can be determined from the other,
but the format
in which the information is stored in both allows for a simple approach to
updating both
check and variable nodes.
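As an illustration of the storage just described, the following sketch lays out the arrays for the {1000, 3, 6} example. The array names are our own illustrative choices, not identifiers from the text.

```python
# Storage for the {1000, 3, 6} example (array names are illustrative, not from
# the text). Each edge of the graph gets one slot per message array, and the
# descriptors list which edge slots belong to each node.
n, dv, dc = 1000, 3, 6
n_edges = n * dv                              # 3000 edges, 500 check nodes

input_msgs    = [0] * n                       # received word (mu_ratio form)
check_msgs    = [0] * n_edges                 # check message memory block
variable_msgs = [0] * n_edges                 # variable message memory block

check_transform    = [0] * 256                # two 8-bit transform look-up
variable_transform = [0] * 256                # tables: 2 x 2^8 = 512 bytes

# Edge descriptors (placeholder identity permutations here); the variable-side
# descriptor is read in groups of dv = 3, the check-side in groups of dc = 6.
variable_node_descriptor = list(range(n_edges))
check_node_descriptor    = list(range(n_edges))

total = (len(input_msgs) + len(check_msgs) + len(variable_msgs)
         + len(check_transform) + len(variable_transform)
         + len(variable_node_descriptor) + len(check_node_descriptor))
print(total)    # 13512 entries, the "about 13k" figure quoted later in the text
```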
Once the input array is initialized, the variable message memory block is set to erasures, and the decoding process may commence. The variable node descriptor is stepped through and, in groups of three, the FBA is used to produce three output messages, which are transformed and written to the proper locations in the check message memory block. A similar process is then used to write new variable messages. The process repeats until some
Figure 4.7: Block diagram of the SXS algorithm architecture. Square blocks represent memory locations, rounded boxes are computation nodes.
termination criterion is satisfied, which is not shown in Figure 4.7. The
criterion may simply
be some counter tracking the iterations, or may be a test to determine if a
valid codeword
has been reached. After the decoder has terminated, the decoded word is
output. This may
be a soft decision or a hard decision, but some decision is made and is
output. The word
may be used from that point in future processing.
This relatively straightforward implementation requires an overall memory of size 13,512, or about 13k. In general, the amount of memory M needed for this implementation, for a code of length n and a bit precision of b, is

M = 4n / ∫₀¹ λ(x) dx + 2^(b+1)    (4.6)
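A quick numerical check of (4.6) as reconstructed: for n = 1000 and b = 8 the formula counts the four edge-length arrays and the two transform tables, and adding the length-n input array as well reproduces the 13,512 figure quoted above. This reconciliation is our reading of the garbled original, so treat it as an assumption.

```python
# Evaluate (4.6), as reconstructed above, for the {1000, 3, 6} example with
# 8-bit precision; lambda(x) = x^2 integrates to 1/3 over [0, 1].
n, b = 1000, 8
int_lam = 1.0 / 3.0
M = 4 * n / int_lam + 2 ** (b + 1)      # four edge-length arrays + two tables
print(int(M))                           # 12512
print(int(M) + n)                       # 13512 once the length-n input array is
                                        # also counted, the figure quoted above
```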
It is also useful to look at the throughput capabilities of this architecture. The number of computations per bit per iteration is given in (4.5). Given a rate R = 1/2 code, and limiting the number of iterations to 25, the number of computations required per information bit is

C_I = 25 × 5.33 / (1/2) = 266.5    (4.7)
In a completely serial implementation such as the one presented, if the processor were capable of 40 MIPS, then the throughput would be approximately 150 kbps. As a very rough comparison, convolutional codes can achieve about the same throughput on a similar system, though with lower memory requirements [21]. However, these codes do not necessarily perform as well.
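Repeating the throughput arithmetic under the stated assumptions (25 iterations, a rate-1/2 code, 5.33 computations per bit per iteration, one operation per computation, and a 40 MIPS serial processor):

```python
# Throughput of the serial architecture under the assumptions in the text.
iterations = 25
rate = 0.5
comps_per_bit_per_iter = 5.33

comps_per_info_bit = iterations * comps_per_bit_per_iter / rate   # eq. (4.7)
print(comps_per_info_bit)                  # 266.5

mips = 40e6                                # 40 MIPS serial processor
throughput_bps = mips / comps_per_info_bit
print(round(throughput_bps / 1e3))         # ~150 kbps, as stated
```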
One of the benefits of LDPC codes is their inherent parallel structure. This
structure can
be taken advantage of in a hardware implementation to greatly enhance the
throughput of the
system. It would not be difficult to introduce parallel update function blocks
to speed up the
decoder, and although doubling the number of function blocks wouldn't
necessarily double
the throughput, it would increase it dramatically as most of the computations
required take
place in these blocks.
Chapter 5
Conclusions and Future Work
Low Density Parity Check codes were shown to be the most powerful codes found to date, approaching to within a fraction of a decibel of the Shannon limit. We
investigated the
properties of LDPC codes and examined the notion of density evolution. This
powerful
analytical technique allows the performance of LDPC codes to be predicted and
optimized
under certain assumptions, and empirical evidence has supported these claims.
Reduced complexity LDPC decoders were presented. Three decoding algorithms,
the
sum-product, min-sum, and modified min-sum algorithms, were described, and a
technique for
optimizing their performance was developed. Density evolution was presented as
a tool to
maximize the performance of a given decoding scheme and was applied
successfully to the
three algorithms discussed. This approach may be used for any iterative
decoding scheme
that may be represented on a graph. We explored some of the issues affecting
the performance
of these codes, including maximal degree, the maximum number of iterations allowed,
and block
length of the codes. The sum-product algorithm was found to be the most
powerful decoder,
but also the most computationally intensive.
Performance-complexity issues were examined, and the trade-offs clearly
presented. For
all the decoders, there was a limit, at least in practical decoding scenarios,
where increasing
the computational effort no longer increased the performance of the code. This
important
result illustrates that an optimum operating point can be found for these
decoders. While
the sum-product algorithm provided the best performance, the modified min-sum
algorithm
outperforms it in certain situations, and with many fewer computations required per bit. The min-sum algorithm, the least complex of the decoders presented, performed nearly as well as the modified min-sum algorithm and may be an advantageous scheme for applications requiring low complexity.
The sum-transform-sum algorithm was described as an attempt to build a
powerful yet
simple LDPC decoder by exploiting the efficiencies found in the implementation
platform,
in this case, hardware. The fixed-point representations and access to memory provided the
means for the success of this approach. The performance of the SXS algorithm
was good,
but very dependent on the fixed-point precision and transformation look-up
tables. This
algorithm is a very viable solution for the implementation of an LDPC decoder
in hardware,
and a description of an actual implementation was detailed.
5.1 Suggestions for Future Research
LDPC research is at an exciting and vibrant stage; there are a great number of
advancements
being made, and thoughts are turning more towards the application of these
powerful codes.
The use of sparse matrix techniques has improved the encoding problem for these codes, but it would be desirable to find a truly linear-time encoding procedure. This problem may be solved by exploiting irregular codes and their tendency to have many low-weight variables, which would allow the parity check matrix to be constructed in a manner that can be encoded in linear time.
We have seen the effective use of practical decoding systems and have presented an algorithm that is easily implementable in hardware. Many applications could take advantage of this system, including the use of LDPC codes for wireless data transmission, satellite communications, and storage media such as disk drives. Third generation (3G) cellular systems have discovered the power of iterative decoding schemes and are writing turbo-codes into their specifications. LDPC codes could also be used with the same performance, and they have the added advantage of being patent free, so no royalties are required for their use.
As an extension to the work presented in this paper, it would be useful to
examine a
few areas in more detail. There may be greater gains in performance made by
tweaking the
modified min-sum algorithm. If a simple yet accurate representation can be
found for the
check node update rule, the algorithm may be made more powerful. Perhaps it is
possible
approach the problem from a stochastic viewpoint, and find a probabilistic
approximation
to the transfer function. It is also useful to investigate the error floor
that results from
optimizing the codes and finding a balance between performance and the
severity of the
error floor.
The effect of cycles in LDPC codes and efficient LDPC code construction techniques are also useful areas for further work. There is still no analysis available to determine the effect that cycles have on these codes, though we know that they are detrimental to performance. There are
likely certain construction techniques that may be used to mitigate the
existence of cycles
without affecting the probabilistic balance required for optimal decoding
power.
Finally, further investigation into some of the negative aspects of the SXS
algorithm
would be instructive. The performance of these codes could possibly be much
better if there
were a practical solution to the mapping problem caused by the domain
transformation that
the algorithm requires. The use of companding to extend the dynamic range of the fixed-point numbers may provide answers. It may also be possible to adjust the
computations at
each of the nodes in such a way that their output is already in the
transformed state, and
this non-trivial advancement could further simplify the decoder.
Ultimately, the next major step with LDPC codes will be their adoption and use
in
widespread applications. The power of these codes, and the advances in
decoding and
encoding performance have put them into the 'practical and useful' realm. The
potential for
exploiting the parallel structure of these codes should provide a throughput
boost that will
also enable their use for high bit rate applications in the near future.
Appendix A
The Forward-Backward Algorithm
The Forward-Backward algorithm (FBA, also known as the BCJR algorithm, after its authors) was developed by Bahl, Cocke, Jelinek and Raviv in 1974, and represented a major step forward in computational efficiency for decoding on trellises. The algorithm is used throughout the course of this work and is presented here.
Consider a parity check equation f₁(x₁, x₂, x₃), a function of three input variables. Figure A.1 shows a trellis diagram associated with f₁(·). The check having degree three, we want to determine the three messages to be output along each edge, based on the three input messages. A simple but computationally intensive method is to calculate each output one by one. This requires a number of repetitive calculations, and the FBA removes the need to repeat these, resulting in a more efficient procedure.
Each stage in this trellis represents one of the three variables that participate in the particular parity check, and each node represents a point where the parity function is satisfied. Edges represent the possible values that each variable can take. Each edge also has a corresponding input message, μ_{x_i→f₁}(z), represented by P_{x_i}(z) in the figure. Any possible path between the nodes represents a possible combination of (x₁, x₂, x₃) which satisfies parity.
Figure A.1: Trellis diagram of a check with sites x₁, x₂ and x₃.

The FBA is aptly named, as it first computes a set of forward values, known as the α values, and then a set of backward values, known as the β values. There are two forward values, α₀[n] and α₁[n], for each stage n of the trellis. Moving forward through the trellis we calculate the values of α, with α₀[1] and α₁[1] initialized to 1 and 0 respectively. Figure A.2 shows how the α values are computed at each stage, using the equations

α₀[n] = α₀[n-1] μ_{x_{n-1}→f₁}(0) + α₁[n-1] μ_{x_{n-1}→f₁}(1)

and

α₁[n] = α₀[n-1] μ_{x_{n-1}→f₁}(1) + α₁[n-1] μ_{x_{n-1}→f₁}(0)
Once all of the α values are determined, there is a backward step, which sweeps through the trellis in the opposite direction. Two values, β₀ and β₁, are computed for each stage of the trellis in the backward step. The first values, β₀[3] and β₁[3] in our example, are set to 1 and 0 respectively. The remaining values are computed using the equations

β₀[n] = β₀[n+1] μ_{x_{n+1}→f₁}(0) + β₁[n+1] μ_{x_{n+1}→f₁}(1)

and

β₁[n] = β₀[n+1] μ_{x_{n+1}→f₁}(1) + β₁[n+1] μ_{x_{n+1}→f₁}(0)
After all of the α and β values have been determined, the output messages from each of the check node edges may easily be determined. This is calculated using the equations

μ_{f₁→x_n}(0) = α₀[n] β₀[n] + α₁[n] β₁[n]

and

μ_{f₁→x_n}(1) = α₀[n] β₁[n] + α₁[n] β₀[n]
At this point, all of the output messages from the node have been determined. This process is repeated at all nodes to update all of the messages on one side of the graph, and the algorithm can be applied to both the check and variable nodes. The algorithm works well regardless of the actual computation required to determine an α or β value; whether it is as simple as an addition or a more complex product, the overall computational power needed can be greatly reduced, especially as the trellis grows in length.
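To make the procedure concrete, here is a small sketch of the FBA for a parity check of arbitrary degree (the text's example has degree three), written for probability-pair messages (P(0), P(1)). The function name and the example inputs are ours; the recursions are the α, β and output equations given above.

```python
# Forward-backward computation of the outgoing messages of one parity check.
# Each incoming message is a pair (P(0), P(1)); the outputs are the extrinsic
# messages u_{f->x_n}(0), u_{f->x_n}(1) for every edge of the check.
def fba_check_update(msgs):
    d = len(msgs)
    # forward values alpha[n] = (a0, a1), initialized to (1, 0) at the first stage
    alpha = [(1.0, 0.0)] * d
    for n in range(1, d):
        a0, a1 = alpha[n - 1]
        p0, p1 = msgs[n - 1]
        alpha[n] = (a0 * p0 + a1 * p1, a0 * p1 + a1 * p0)
    # backward values beta[n] = (b0, b1), initialized to (1, 0) at the last stage
    beta = [(1.0, 0.0)] * d
    for n in range(d - 2, -1, -1):
        b0, b1 = beta[n + 1]
        p0, p1 = msgs[n + 1]
        beta[n] = (b0 * p0 + b1 * p1, b0 * p1 + b1 * p0)
    # combine: u(0) = a0*b0 + a1*b1, u(1) = a0*b1 + a1*b0
    out = []
    for n in range(d):
        a0, a1 = alpha[n]
        b0, b1 = beta[n]
        out.append((a0 * b0 + a1 * b1, a0 * b1 + a1 * b0))
    return out

# Example: three input messages for the degree-3 check of Figure A.1.
print(fba_check_update([(0.9, 0.1), (0.6, 0.4), (0.2, 0.8)]))
```

For the third edge this yields (0.58, 0.42), i.e. 0.9 × 0.6 + 0.1 × 0.4 and 0.9 × 0.4 + 0.1 × 0.6, exactly what a direct extrinsic calculation would give.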
Figure A.2: FBA α, β computation.
Bibliography
[1] S. Crozier, A. Hunt, and D. Falconer. Hyper-codes: High-performance low-complexity error-correcting codes. In Proceedings of the 19th Biennial Symposium on Communications, Queen's University, Kingston, Ontario, pages 263-270, 1998.

[2] C. Berrou, A. Glavieux, and P. Thitimajshima. Near Shannon limit error-correcting coding and decoding: Turbo-codes. In Proceedings of 1993 IEEE International Conference on Communications, pages 1064-1070, Geneva, Switzerland, 1993.

[3] T. Cover and J. Thomas. Elements of Information Theory. John Wiley & Sons, NY, NY, 1991.

[4] M.C. Davey and D. MacKay. Low density parity check codes over GF(q). IEEE Communications Letters, Jun 1998.

[5] M. Frigo and S. Johnson. FFTW: Fastest Fourier transform in the west v.2.1.3. http://www.fftw.org.

[6] R.G. Gallager. Low Density Parity Check Codes. Number 21 in Research Monograph Series. MIT Press, Cambridge, Mass., 1963.

[7] C. Hartmann and L. Rudolph. An optimum symbol-by-symbol decoding rule for linear codes. IEEE Transactions on Information Theory, IT-22(5), Sept. 1976.

[8] R. Johannesson and K. Zigangirov. Fundamentals of Convolutional Coding. IEEE Press, New York, New York, 1999.
[9] G. David Forney Jr. Codes on graphs: Generalized state realizations. Submitted to IEEE Transactions on Information Theory, 1999.

[10] F. Kschischang, B. Frey, and H. Loeliger. Factor graphs and the sum-product algorithm. Submitted to IEEE Transactions on Information Theory, Jul 1998.

[11] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv. Optimal decoding of linear codes for minimizing symbol error rate. IEEE Transactions on Information Theory, Mar 1974.

[12] M. Luby, M. Mitzenmacher, M. Shokrollahi, and D. Spielman. Analysis of low density codes and improved designs using irregular graphs. In Proceedings of the 30th Annual ACM Symposium on the Theory of Computing, pages 249-258, 1998.

[13] D. MacKay. Decoding times of irregular Gallager codes. In Progress, Oct 1998.

[14] D. MacKay. Good error-correcting codes based on very sparse matrices. IEEE Transactions on Information Theory, 45(2), Mar 1999.

[15] D. MacKay and M. Davey. Evaluation of Gallager codes for short block length and high rate applications. In Progress, Jan 1999.

[16] D. MacKay, S. Wilson, and M. Davey. Comparison of constructions of irregular Gallager codes. In 1998 Allerton Conference on Communication, Control, and Computing, 1998.

[17] P. Robertson, E. Villebrun, and P. Hoeher. Optimal and sub-optimal maximum a posteriori algorithms suitable for turbo decoding. IEEE Communications Theory, 8(2):119-120, March 1997.

[18] T. Richardson, A. Shokrollahi, and R. Urbanke. Design of provably good low-density parity check codes. Submitted to IEEE Transactions on Information Theory, Mar 1999.

[19] T. Richardson and R. Urbanke. The capacity of low-density parity check codes under message-passing decoding. Submitted to IEEE Transactions on Information Theory, Nov 1998.
[20] S. Crozier, A. Hunt, K. Gracie, and J. Lodge. Performance and complexity comparison of block-like turbo codes, hyper-codes, and tail-biting convolutional codes. In Proceedings of the 19th Biennial Symposium on Communications, Queen's University, Kingston, Ontario, pages 84-88, 1998.

[21] S. Crozier, J. Lodge, P. Sauve, and A. Hunt. SHARC DSP in flushed and tail-biting Viterbi decoders. The DSP Applications Newsletter, 41, September 1999.

[22] C.E. Shannon. A Mathematical Theory of Communication. Bell Systems Technical Journal, 1948.

[23] Vladislav Sorokine. Gallager Codes for CDMA Applications. PhD thesis, University of Toronto, Canada, 1998.

[24] D. Spielman. Linear-time encodable and decodable error-correcting codes. IEEE Transactions on Information Theory, 42, Nov 1996.

[25] D. Spielman. The complexity of error correcting codes. Lecture Notes in Computer Science, 1279:67-84, 1998.

[26] D. Spielman. Finding good LDPC codes. 36th Annual Allerton Conference on Communication, Control, and Computing, 1998.

[27] Michael Wall. GAlib: A C++ library of genetic algorithm components. http://lancet.mit.edu/ga/.

[28] S. Wicker. Error Control Systems for Digital Communication and Storage. Prentice Hall, Upper Saddle River, New Jersey, 1995.
We present the Sum-X-Sum algorithm, a representation and method for computation of error control decoding metrics, and provide an example of a hardware-implementable low density parity check decoder which performs well with low complexity. Performance and complexity are compared with the sum-product algorithm, and it is found that while the Sum-X-Sum algorithm performs nearly as well, it is much less computationally complex.
1 Introduction
One of the major drawbacks in the implementation of iterative decoding schemes has been the relative complexity of the decoding procedure. For example, low density parity check (LDPC) codes that perform well generally require too many iterations or require block lengths that are too long for practical applications. Conversely, those codes that may be practical do not stand up to comparisons with other decoding methods such as Turbo-codes [1].
It has been a challenge to find a method to reduce the complexity of the iterative decoding process without significantly reducing the performance [2]. We make use of the duality found in code graphs to reduce the complexity without an effect on code performance. The use of a dual code for the purpose of decoding is not new [3, 4], but the efficient transformation technique from one state to another has not been previously used.
Iterative decoding on graphs generally has two steps: updating the messages sent to check nodes, and updating the messages sent to variable nodes. With the Sum-Product algorithm, a complicated yet powerful decoding technique, it is simple to compute the 'Sum' messages, but quite complicated to compute the 'Product' messages [5].
We solve this difficulty by taking the messages needed for the 'Product' computation and transforming them into their dual representation, so that the 'Product' computation actually becomes a 'Sum' computation. The added complexity of the transformation is generally much less than the savings from the dual message calculation. The algorithm may be implemented in conventional manners, both in hardware and software.
2 Algorithm Description
Iterative decoding schemes such as the sum-product algorithm are generally
described in a graphical form.
Figure 1 gives an example of a graphical description of a parity check code.
The graph is bipartite, with
variable nodes being represented by the circles at the top, check nodes
represented by squares at the bottom,
and edges showing the interconnection between the two. Iterative decoders
operate by passing messages
back and forth along the edges, computing new messages at both the variable
and check nodes.
Figure 1: Example of a graphical representation of an LDPC code. The circles
represent variables, and the
squares are parity check nodes
Figure 2: Message passing and updating at variable and check nodes showing
which message representations
are used for each update equation.
The Sum-Transform-Sum algorithm (Sum-X-Sum algorithm) is an instantiation of a soft-input, soft-output iterative decoding algorithm, and is similar to the standard sum-product or min-sum algorithms. The inputs to the algorithm are soft representations of the probabilities of the possible transmitted symbols. For example, over the binary-input channel, these possible states are 0 and 1. The probabilities may be represented by a floating-point value, a fixed-point value, or some other proportional means, depending on the actual implementation.
Unlike other iterative decoding schemes, the Sum-X-Sum algorithm utilizes two separate probability representations for messages and requires a transformation algorithm to convert between the two. The method with which the probabilities are transformed may be implemented using logic functions, memory, or any combination of these.
The two representations used for messages are the log-ratio and the log-magnitude-difference, both of which have been used in other decoding algorithms [2], [6]. The log-ratio representation can be written

μ_ratio = log( P(0) / P(1) )    (1)

The log-magnitude-difference representation is a pair of values (s, r), the first defined over the set {-1, 1} and representing the sign bit, and the second over the set of positive real numbers, such that

μ_diff = ( sgn(P(0) - P(1)), |log|P(0) - P(1)|| )    (2)
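As a small worked example of the two representations as reconstructed here, the probability pair (P(0), P(1)) = (0.8, 0.2) has log-ratio log 4 ≈ 1.386 and log-magnitude-difference pair (+1, |log 0.6|) ≈ (+1, 0.511):

```python
# Both message representations for the probability pair P(0)=0.8, P(1)=0.2.
import math

p0, p1 = 0.8, 0.2
mu_ratio = math.log(p0 / p1)                     # eq. (1): ~1.386
s = 1 if (p0 - p1) >= 0 else -1                  # sign of P(0) - P(1)
r = abs(math.log(abs(p0 - p1)))                  # eq. (2): ~0.511
print(mu_ratio, (s, r))
```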
With two separate probability representations, the Sum-X-Sum algorithm requires a function τ(x) → (s, r) to transform messages between them. Let μ_ratio be a message in the log-likelihood-ratio domain, and μ_diff be a message in the log-likelihood-difference-pair domain. The function τ(x) → (s, r) takes as its argument a message of the form μ_ratio and returns a message of the form μ_diff. The returned message has the same probability as the input message, only expressed in a different manner. Similarly, the inverse transform function τ⁻¹(s, r) → x takes as its input a message of the form μ_diff and returns the equivalent probability in a message of the form μ_ratio. The function τ(x) → (s, r) may be written as

τ(x) = ( sgn( 2e^x/(1+e^x) - 1 ), |log| 2e^x/(e^x+1) - 1 || )    (3)

where x is a message of the form μ_ratio. Likewise, the inverse transform function τ⁻¹(s, r) → x may be written as

τ⁻¹(s, r) = log( (1 + s e^(-r)) / (1 - s e^(-r)) )    (4)

where (s, r) are the sign and magnitude of a message of the form μ_diff.
The functions in (3) and (4) may be implemented very simply using a pair of look-up tables in memory. If, for example, messages are stored as 8-bit values (N-bit in general), then two look-up tables, each of size 2^8 = 256 bytes (2^N N-bit values in general), are required to completely describe the forward and inverse functions. One need only calculate the value of the transform for each possible input once and store the results in memory for future use. Alternatively, the function and its inverse may themselves be used to calculate the necessary transforms on an as-needed basis.
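The sketch below implements the transform pair (3)-(4) as reconstructed above and tabulates them as two 256-entry look-up tables. The uniform quantizer (log-ratios clipped to |x| < 8, magnitudes to r < 16) is an illustrative assumption of ours, not a choice specified in the text.

```python
# The transform pair (3)-(4) as reconstructed above, plus 8-bit look-up tables.
import math

def tau(x):
    """mu_ratio -> (s, r), equation (3)."""
    d = 2.0 * math.exp(x) / (1.0 + math.exp(x)) - 1.0        # = P(0) - P(1)
    d = math.copysign(min(max(abs(d), 1e-12), 1.0 - 1e-12), d)   # keep log finite
    return (1 if d >= 0 else -1, abs(math.log(abs(d))))

def tau_inv(s, r):
    """(s, r) -> mu_ratio, equation (4)."""
    d = s * math.exp(-r)
    return math.log((1.0 + d) / (1.0 - d))

# Two tables of 2^8 = 256 entries each (2 x 2^8 = 512 entries total, as in the
# text): one indexed by the quantized log-ratio, one by sign and quantized r.
RATIO_STEP, R_STEP = 16.0 / 256, 16.0 / 128
ratio_to_diff = [tau((q - 128 + 0.5) * RATIO_STEP) for q in range(256)]
diff_to_ratio = [tau_inv(s, (q + 0.5) * R_STEP)
                 for s in (+1, -1) for q in range(128)]

s, r = tau(1.2)
print(round(tau_inv(s, r), 4))    # ~1.2: the two transforms invert each other
```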
It may also be possible to implement the transformation through a combination of logic and memory. For example, it may be a simple matter to use some logic to determine the most significant bits of the transform output, and to determine the least significant bits from a reduced look-up table. This flexibility allows the algorithm to be used in an optimal manner depending on the platform on which it is implemented. If processing speed is a constraint, it would make sense to take advantage of the look-up table as much as possible, as opposed to calculating multiple transforms.
The algorithm passes and updates two sets of messages back and forth iteratively: one set such that the parity constraints are met, using the μ_diff representation as inputs, and the other such that certain variable constraints are met, using the μ_ratio representation. An examination of Figure 2 shows how messages are input into a node and an output is determined. For each message input into a node, a corresponding message which has been transformed is used to update the message going in the other direction, and this will be used as an input on the other side of the graph. For example, the output of a check node will be transformed from a μ_diff message to a μ_ratio message.
Two arrays may be used to store the sets of messages being passed in the two directions on the graph: one for check node input (and hence variable node output), and the other for variable node input. The message updating rule for a variable node takes as its inputs a number of elements from the variable input array and outputs, in the transformed representation, the same number of elements to possibly different elements of the check node input array. This is done for all variable nodes in the graph, so that all elements of the variable input array are used, possibly multiple times. Likewise, the message updating rule for check nodes takes as inputs a number of elements from the check input array and outputs, in the transformed representation, the same number of elements to possibly different elements of the variable input array.
2.1 Algorithm Initialization
Information is sampled from the channel and converted into probability messages of the form μ_ratio, which are stored in an array as the messages μ_init. This set of probabilities is used to determine whether a valid codeword has been received. If a valid codeword is detected, no decoding needs to be done, and the codeword is output from the algorithm. If a valid codeword is not detected, the received messages are transformed into μ_diff messages and are used to seed the check node input array.
The algorithm requires a useful description of the code construction, which may be a table listing the connectivity between the check nodes and variable nodes. The actual table may be randomly determined using permutations or some other technique, or the code may be constructed manually. This table is used to determine which messages participate in a given check node or variable node update equation; these are the sets f(i) and f(j) in Figure 2.
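A minimal sketch of this initialization step follows. The BPSK-over-AWGN channel mapping (log-ratio 2y/σ²) and the small parity-check matrix H are our own illustrative assumptions; the text only requires that channel samples be converted into μ_ratio messages and tested against the code constraints.

```python
# Initialization sketch: form mu_init from channel samples and test whether the
# hard decisions already satisfy every parity check (a hypothetical H is used).
import math

H = [[1, 1, 0, 1, 0, 0],                   # illustrative parity-check matrix
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
sigma2 = 0.5                               # assumed AWGN noise variance
y = [0.9, -1.1, 0.7, 1.3, -0.8, 1.0]       # received BPSK samples (+1 <-> bit 0)

mu_init = [2.0 * yi / sigma2 for yi in y]          # channel log-ratios
hard = [0 if m >= 0 else 1 for m in mu_init]       # tentative hard decisions

valid = all(sum(h * b for h, b in zip(row, hard)) % 2 == 0 for row in H)
print(valid)    # True -> output the codeword; False -> transform mu_init with
                # tau, seed the check node input array, and start iterating
```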
2.2 Decoding Procedure
Once information has been sampled off the channel and a valid codeword has not been detected, the decoding algorithm commences by updating the variable input array messages. This is accomplished using the following equation:

μ_ratio = τ⁻¹( ∏_{k∈f(i)} s_k , Σ_{k∈f(i)} r_k )    (5)

where i is the ith element of the variable input array and f(i) is the set of messages connecting to the node associated with i, excluding i itself. The variables s and r represent the sign and magnitude of the pair (s, r), which is how μ_diff messages are represented. This equation is used a number of times, based on the structure of the LDPC code, to update all of the elements of the variable input array. In order to increase the efficiency of this equation, the forward-backward algorithm may be used [7], a one-pass algorithm may also be used [8], or some other efficient technique may be utilized. These techniques also apply to equation (6) when updating the check input array.
Once completed, the check node input message array is then updated. The rule for updating these messages is governed by the equation

μ_diff = τ( μ_init + Σ_{k∈f(j)} μ_ratio,k )    (6)

where j is the jth element of the check node input array and f(j) is the set of messages connecting to the node associated with j, excluding j itself. The order in which the variable and check node input arrays are updated may vary depending on the actual code used. The updating may be performed in a serial manner, or parallelism may be exploited to increase the overall throughput of the decoder. The actual update schedule may vary; it is important only that all messages are updated in a timely manner.
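Putting equations (5) and (6) together, the sketch below runs one Sum-X-Sum iteration on a tiny hypothetical code in floating point (a hardware implementation would use fixed-point messages and the look-up tables described earlier). The parity-check matrix, channel values, and function names are illustrative assumptions of ours; the update rules follow the equations as reconstructed above.

```python
# One Sum-X-Sum iteration on a small illustrative code.
import math

H = [[1, 1, 0, 1, 0, 0],                        # hypothetical parity-check matrix
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]

def tau(x):                                     # ratio -> (s, r), eq. (3)
    d = math.tanh(x / 2.0)                      # = 2e^x/(1+e^x) - 1 = P(0) - P(1)
    d = math.copysign(min(max(abs(d), 1e-12), 1.0 - 1e-12), d)
    return (1 if d >= 0 else -1, abs(math.log(abs(d))))

def tau_inv(s, r):                              # (s, r) -> ratio, eq. (4)
    d = s * math.exp(-r)
    return math.log((1.0 + d) / (1.0 - d))

edges = [(j, i) for j, row in enumerate(H) for i, h in enumerate(row) if h]

mu_init = [1.2, -0.4, 0.8, 2.0, -1.5, 0.3]      # illustrative channel log-ratios
var_in = {e: 0.0 for e in edges}                # variable input array (mu_ratio)
chk_in = {(j, i): tau(mu_init[i]) for (j, i) in edges}   # check input array (mu_diff)

def iterate():
    # eq. (5): at each check node, update the variable input array entries
    for (j, i) in edges:
        others = [chk_in[(j, k)] for (jj, k) in edges if jj == j and k != i]
        sign = math.prod(s for s, _ in others)
        mag = sum(r for _, r in others)
        var_in[(j, i)] = tau_inv(sign, mag)
    # eq. (6): at each variable node, update the check node input array entries
    for (j, i) in edges:
        total = mu_init[i] + sum(var_in[(jj, k)] for (jj, k) in edges
                                 if k == i and jj != j)
        chk_in[(j, i)] = tau(total)

iterate()                                        # one full decoding iteration
posterior = [mu_init[i] + sum(var_in[(j, k)] for (j, k) in edges if k == i)
             for i in range(len(mu_init))]
print([0 if p >= 0 else 1 for p in posterior])   # tentative hard decisions
```

In a fixed-point hardware implementation, tau and tau_inv would be replaced by the look-up tables, and the node descriptors would be stepped through instead of the edge dictionary used here.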
After all of the messages have been updated once, one iteration of the decoding algorithm has been completed. The decoding algorithm may check to determine whether the current probabilities in both the variable input array and the check node input array correspond to a valid codeword. If a valid codeword is detected, the algorithm stops and the current probabilities are output. The algorithm may also continue the decoding procedure by repeating each of the two update steps until some criterion is reached, such as a certain fixed number of iterations, or until a valid codeword is detected.
If the decoding procedure fails to produce a valid codeword, a number of actions may be taken. The errors may simply be ignored and the current estimate of the codeword output. A second, outer code may be used to attempt to fix the remaining errors. Some type of retransmission request may be made, or the output may be flagged as being in error.
3 Performance and Complexity
Figure 3 shows the bit-error performance of the Sum-X-Sum and sum-product
algorithms for a regular (3,6)
LDPC code. The Sum-X-Sum algorithm performance approaches that of the sum-
product algorithm, and

for large enough fixed-point precision, the performance actually equals that of the sum-product algorithm. The performance of both these algorithms may be increased significantly by using optimized irregular LDPC codes, but for comparative purposes we shall focus on regular LDPC codes.

Figure 3: Comparison between Sum-X-Sum and sum-product performance for regular (3, 6) codes. The two left curves are for block lengths of 10,000, the two right curves for block lengths of 1,000. In both cases the sum-product decoder is more powerful.
Having been designed with reduced complexity in mind, and having seen that the performance of these codes can be quite good, it is important to understand the overall complexity involved in a practical implementation of the Sum-X-Sum algorithm. To this end, we calculate the computational power required per bit decoded.
At each computational node, we assume that the Forward-Backward algorithm [7] is used. It is also assumed that the hardware on which the algorithm is implemented is able to handle the required bit-depths with one operation; e.g., a 12-bit memory look-up requires one operation, as does a 10-bit addition. Taking these assumptions into account, the number of computations required per bit is

C_Sum-X-Sum = I ( 8 - 5 ∫₀¹ λ(x) dx - 6 ∫₀¹ ρ(x) dx )    (7)

where I is the number of iterations. It is difficult to do a straight
complexity comparison with the sum-product algorithm due to the fixed vs. floating-point computation differences. The computations required per bit for the Sum-X-Sum and sum-product algorithms are presented in Figure 4, assuming a one-to-one complexity between fixed- and floating-point computations. The Sum-X-Sum algorithm requires a significantly lower number of computations than the sum-product algorithm, so clearly there is an advantage in its use. Given the pessimistic comparison between fixed-point and floating-point complexity, it is encouraging to see that, as a worst case, the Sum-X-Sum algorithm performs quite well for its complexity. A more realistic comparison will only show an overall better outcome in favor of the Sum-X-Sum decoder.

Figure 4: Complexity and target SNR for Sum-X-Sum Algorithm codes of length
10,000 vs. similar codes
using the sum-product algorithm. Complexity is presented as computations per
bit per iteration.
4 Implementation Architecture
The Sum-X-Sum algorithm has been designed with the intention of being easily
implemented in a fixed-point
hardware environment, and we have seen that with this goal in mind good
performance is still possible. The
use of a non-optimized, regular code simplifies the structure of the hardware
required to implement the
algorithm and will be used to describe the hardware. In general however, any
iterative code may use this
algorithm
Let the code be a regular {1000, 3,6} code, therefore the number of edges on
the graph is 3,000. Figure 5
shows a block diagram for a possible implementation, and is the used for this
example. Assume that there
is a method to convert channel samples into fixed-point probability messages
(~cratio messages), and store
the received word in an array called Input. There will be two sets of messages
that will be needed, and the
memory requirement for these are each of size 3,000 and are the check and
variable message memory blocks.
There is also the need for two look-up tables to perform the necessary
transformations from one message
domain to the other, called the check and variable node transform blocks. The
memory requirements for
these arrays is entirely dependent on the fixed-point precision used, and if
we assume 8-bit precision, we
require a memory of size 2 x 28 = 512 bytes.
The decoder must know which edges participate at each of the nodes, and this
is achieved through the
use of the check and variable node descriptors, both of length 3,000. Each of
these arrays contains a list of
how each edge is connected to its neighbour, and provides the information
required to perform the update
rules. Note that there is only a need for one of these arrays, as the
information from one can be determined
from the other, but the format in which the information is stored in both
allows for a simple approach to
updating both check and variable nodes.
Once the input array is initialized, the variable message memory block is set
to be erasures, and the
decoding process may commence. The variable node descriptor is stepped through
and in groups of three,
the FBA is used to produce three output messages which are transformed and
written to the proper locations
in the check message memory block. A similar process is used to then write new
variable messages. The
process repeats until some termination criterion is satisfied, which is not
shown in Figure 5. The criterion
may simply be some counter tracking the iterations, or may be a test to
determine if a valid codeword has
been reached. After the decoder has terminated, the decoded word is output.
This may be a soft decision or
a hard decision, but some decision is made and is output. The word may be used
from that point in future
processing.
This relatively straightforward implementation requires an overall memory of size 13,512, or about 13k. In general, the amount of memory M needed for this implementation, for a code of length n and a bit precision of b, is

M = 4n / ∫₀¹ λ(x) dx + 2^(b+1)    (8)
It is also useful to look at the throughput capabilities of this architecture. The number of computations per bit per iteration is given in equation (7). Given a rate R = 1/2 code, and limiting the number of iterations to 25, the number of computations required per information bit is

C_I = 25 × 5.33 / (1/2) = 266.5    (9)

Figure 5: Block diagram of the Sum-X-Sum algorithm architecture. Square blocks represent memory locations, rounded boxes are computation nodes.

In a completely serial implementation such as the one presented, if the processor were capable of 40 MIPS, then the throughput would be approximately 150 kbps. As a very rough comparison, convolutional codes can achieve about the same throughput on a similar system, though with lower memory requirements [9]. However, these codes do not necessarily perform as well.
One of the benefits of LDPC codes is their inherent parallel structure. This
structure can be taken
advantage of in a hardware implementation to greatly enhance the throughput of
the system. It would not
be difficult to introduce parallel update function blocks to speed up the
decoder, and although doubling the
number of function blocks wouldn't necessarily double the throughput, it would
increase it dramatically as
most of the computations required take place in these blocks.
Conclusions
We presented a novel iterative decoding representation and method which is
well suited to practical imple-
mentations. The use of the dual domain can greatly simplify the decoding
procedure. It is a viable solution
for a wide variety of codes, and can perform asymptotically as well as the sum-
product algorithm. The
overall complexity can be made very low through the use of look-up tables or
other simple transformation
techniques to perform some of the required calculations.
References
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in Proceedings of 1993 IEEE International Conference on Communications, (Geneva, Switzerland), pp. 1064-1070, 1993.

[2] D. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Transactions on Information Theory, vol. 45, pp. 399-431, Mar 1999.

[3] C. Hartmann and L. Rudolph, "An optimum symbol-by-symbol decoding rule for linear codes," IEEE Transactions on Information Theory, vol. IT-22, pp. 514-517, Sept. 1976.

[4] G. D. Forney, "Codes on graphs: Generalized state realizations," Submitted to IEEE Transactions on Information Theory, 1999.

[5] F. Kschischang, B. Frey, and H. Loeliger, "Factor graphs and the sum-product algorithm," Submitted to IEEE Transactions on Information Theory, Jul 1998.

[6] T. Richardson, A. Shokrollahi, and R. Urbanke, "Design of provably good low-density parity check codes," Submitted to IEEE Transactions on Information Theory, Mar 1999.

[7] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, pp. 284-287, Mar 1974.

[8] R. Johannesson and K. Zigangirov, Fundamentals of Convolutional Coding. New York, New York: IEEE Press, 1999.

[9] S. Crozier, J. Lodge, P. Sauve, and A. Hunt, "SHARC DSP in flushed and tail-biting Viterbi decoders," The DSP Applications Newsletter, vol. 41, September 1999.

Administrative Status


Event History

Description Date
Revocation of Agent Requirements Determined Compliant 2009-12-15
Inactive: Office letter 2009-12-15
Inactive: Office letter 2009-12-01
Revocation of Agent Request 2009-11-02
Inactive: IPC from MCD 2006-03-12
Revocation of Agent Request 2004-06-18
Application Not Reinstated by Deadline 2004-06-02
Time Limit for Reversal Expired 2004-06-02
Revocation of Agent Requirements Determined Compliant 2004-03-23
Inactive: Office letter 2004-03-23
Inactive: Office letter 2004-03-19
Revocation of Agent Request 2004-02-17
Inactive: Office letter 2003-07-10
Letter Sent 2003-07-10
Deemed Abandoned - Failure to Respond to Maintenance Fee Notice 2003-06-02
Inactive: Office letter 2003-03-26
Letter Sent 2003-03-13
Inactive: Single transfer 2003-01-31
Inactive: Cover page published 2001-12-02
Application Published (Open to Public Inspection) 2001-12-02
Inactive: Adhoc Request Documented 2001-11-20
Inactive: Office letter 2001-11-16
Revocation of Agent Requirements Determined Compliant 2001-11-16
Inactive: Office letter 2001-11-16
Revocation of Agent Requirements Determined Compliant 2001-11-14
Inactive: Office letter 2001-11-14
Inactive: Office letter 2001-11-14
Revocation of Agent Request 2001-11-08
Revocation of Agent Request 2001-11-02
Revocation of Agent Request 2001-10-26
Inactive: Office letter 2001-10-23
Inactive: Adhoc Request Documented 2001-10-18
Inactive: Delete abandonment 2001-10-18
Inactive: Status info is complete as of Log entry date 2001-10-15
Inactive: Abandoned - No reply to Office letter 2001-09-05
Inactive: Correspondence - Formalities 2001-06-15
Revocation of Agent Request 2001-04-03
Inactive: First IPC assigned 2000-08-17
Inactive: IPC assigned 2000-08-17
Inactive: Courtesy letter - Evidence 2000-07-18
Inactive: Filing certificate - No RFE (English) 2000-07-12
Filing Requirements Determined Compliant 2000-07-12
Application Received - Regular National 2000-07-11

Abandonment History

Abandonment Date Reason Reinstatement Date
2003-06-02

Maintenance Fee

The last payment was received on 2002-05-31


Fee History

Fee Type Anniversary Year Due Date Paid Date
Application fee - small 2000-06-02
MF (application, 2nd anniv.) - small 02 2002-06-03 2002-05-31
Registration of a document 2003-01-31
Registration of a document 2003-02-11
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
SOMA NETWORKS, INC.
Past Owners on Record
FRANK KSCHISCHANG
JEFFREY P. CASTURA
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



Document Description    Date (yyyy-mm-dd)    Number of pages    Size of Image (KB)
Description 2000-06-02 85 3,545
Abstract 2000-06-02 1 19
Claims 2000-06-02 1 31
Cover Page 2001-11-23 1 29
Filing Certificate (English) 2000-07-12 1 164
Request for evidence or missing transfer 2001-06-05 1 108
Reminder of maintenance fee due 2002-02-05 1 111
Courtesy - Certificate of registration (related document(s)) 2003-03-13 1 130
Courtesy - Abandonment Letter (Maintenance Fee) 2003-06-30 1 174
Correspondence 2000-07-12 1 15
Correspondence 2001-04-03 1 23
Correspondence 2001-06-15 3 102
Correspondence 2001-10-19 1 14
Correspondence 2001-10-26 4 130
Correspondence 2001-11-02 3 80
Correspondence 2001-11-14 1 13
Correspondence 2001-11-14 1 16
Correspondence 2001-11-08 2 52
Correspondence 2003-03-26 1 11
Correspondence 2003-07-10 1 9
Correspondence 2004-02-17 6 173
Correspondence 2004-03-19 1 13
Correspondence 2004-03-23 1 19
Correspondence 2004-06-18 4 119
Correspondence 2009-11-02 4 404
Correspondence 2009-12-01 1 13
Correspondence 2009-12-15 1 20
Correspondence 2009-12-15 4 675