Design and Implementation of Interconnect Efficient Low Density Parity Check Error Correcting Codes

A Research Proposal Submitted for SABIC/Fast Track Research Grants

By

Aiman H. El-Maleh¹ and Mohammed Adnan Landolsi²
E-Mail: {aimane, andalusi}@kfupm.edu.sa

¹Department of Computer Engineering
King Fahd University of Petroleum & Minerals
Dhahran 31261, Saudi Arabia

²Department of Electrical Engineering
King Fahd University of Petroleum & Minerals
Dhahran 31261, Saudi Arabia
ABSTRACT ............................................................................................................................................3

1. INTRODUCTION ..........................................................................................................................3

2. OVERVIEW OF LDPC CODES ..................................................................................................4

3. ITERATIVE DECODING OF LDPC CODES ............................................................................7

4. LDPC CODE DESIGN: A HARDWARE-ORIENTED PERSPECTIVE .....................................11

5. PROPOSED METHODOLOGY ................................................................................................14

6. PROJECT OBJECTIVES ...........................................................................................................15

7. PROJECT SCHEDULE ...............................................................................................................17

8. UTILITY VALUE & DELIVERABLES .....................................................................................18

9. MONITORING AND EVALUATION PLAN .............................................................................18

10. DETAILED BUDGET .............................................................................................................19

11. REFERENCES .........................................................................................................................23
Abstract
Forward Error Correcting (FEC) codes are an essential component of modern state-of-the-art digital communication and storage systems. The two leading families of FEC codes are widely considered to be Turbo codes and Low Density Parity Check codes (LDPCs), as they demonstrate performance very close to the information-theoretic bounds predicted by Shannon theory. Recently, the trend in error correcting code implementation is focused on using custom chips for codecs. This is motivated by the need to maximize performance and reduce power dissipation by the codecs. Turbo codes are hard to parallelize, and hence, benefit less from hardware implementation. LDPC codes, however, are parallelizable, and current advances in VLSI technology enables the implementation of their codecs over fairly large code word sizes. Most of LDPC code designs are based on random code generation which results in long interconnect wires. Long interconnect wires increase the load capacitance resulting in slower designs and higher power dissipation. In addition, random code designs result in interconnect routing congestion which reduces the chip area utilization. In this work, we will investigate the design of LDPC codes that are interconnect efficient resulting in high speed designs with low power consumption and maximum chip area utilization due to lower routing congestion.

1. Introduction
Forward Error Correcting (FEC) codes are an essential component of modern state-of-the-art digital communication and storage systems. Indeed, in many of the recently developed standards, FEC codes play a crucial role for improving the error performance capability of digital transmission over noisy interference-impaired communication channels. For example, Turbo Codes [1] are adopted for 3rd generation mobile cellular systems (CDMA2000/UMTS) and broadband wireless fixed access systems (IEEE 802.16), while Reed-Solomon [1] codes are used in digital versatile storage disks (DVDs). Also, Low Density Parity Check (LDPC) codes [2] have been selected for the next-generation digital satellite broadcasting standard (DVB-S2), ultra high-speed Local Area Networks (10Gbps Ethernet LANs) and are being considered for systems as well.
The two leading families of FEC codes are widely considered to be Turbo codes [1] and Low Density Parity Check codes (LDPCs) [2], as they demonstrate performance very close to the information-theoretic bounds predicted by Shannon theory, while at the same time having the distinct advantage of low-complexity, near-optimal iterative decoding.

Recently, the trend in error correcting code implementation is focused on using custom chips for codecs. This is motivated by the need to maximize performance and reduce power dissipation by the codecs. This is particularly true for coding schemes which can use the parallelism offered by hardware: by using more devices, frequency can be reduced, and voltage scaled, thereby achieving the same throughput for less power [3]. Turbo codes are hard to parallelize [4], and hence, benefit less from hardware implementation. LDPC codes, however, are parallelizable, and current advances in VLSI technology enables the implementation of their codecs over fairly large code word sizes.

Most of LDPC code designs are based on random code generation which results in long interconnect wires. Long interconnect wires increase the load capacitance resulting in slower designs and higher power dissipation. In addition, random code design result in interconnect routing congestion which reduces the chip area utilization. In [5], based on a parallel implementation of LDPC codes, it is reported that routing congestion has resulted in 50% chip area utilization and that most of the power dissipation is due to the switching activity of the long interconnect wires.

In this work, we are interested in designing LDPC codes that are interconnect efficient resulting in high speed designs with low power consumption and maximum chip area utilization due to lower routing congestion.

2. Overview of LDPC Codes

Low Density Parity Check (LDPC) codes are being re-discovered and undergoing a lot of active research after their original introduction by Gallager [2] some forty years
ago. A tutorial review of the work of Gallager that highlights the powerful, capacity-achieving performance of LDPC codes can be found in [6]. Recent advances in LDPC code design have also demonstrated special classes of codes that are shown to outperform their main competitors, namely the powerful state-of-the-art Turbo Codes, for the same block size and code rate [9]. LDPC codes that essentially achieve theoretical capacity limits (with very long block lengths) have also been constructed [12]. Moreover, LDPC codes enjoy the added advantage of a highly parallelizable decoding structure, which makes them well suited for efficient hardware implementation. It should be pointed out, however, that these outstanding performance results are only achievable when the code block length becomes very large (an observation well in line with information theory projections), but nonetheless, a major advantage of LDPC codes is the existence of near-optimal low-complexity decoding algorithms based on iterative probabilistic message passing (or belief propagation), as will be described in the following sections.

In general, LDPC codes can be considered as a class of linear block codes, with their main characteristic being the use of a very sparse, random-like parity-check matrix (having very small column and row weights compared to the code block length). An LDPC code can also be represented by a bi-partite factor graph, which contains two types of nodes: the code bit nodes and the parity check nodes, interconnected by edges whenever a given check bit appears in the parity check equation of the corresponding information bit. An example of a rate 1/2 LDPC code with block size 10 is given by the following 5x10 parity check matrix $H$ (although this is not really a sparse matrix, for this simple illustrative example), and the corresponding code graph is illustrated in the figure below.

$$H = \begin{bmatrix}
1 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 1 \\
\end{bmatrix}$$

This particular example corresponds to a class of LDPC codes known as regular LDPC codes, for which all nodes of a given type (whether bit nodes or check nodes)
have the same degree which corresponds to the number of graph edges connected to that node. The example illustrated above is classified as a (3,6) regular LDPC code, since every bit node has degree 3, and every check node has degree 6. Recently, more powerful LDPC code constructions known as \textit{irregular} LDPC codes [8, 9] have been introduced and shown to achieve better performance than regular codes. These irregular LDPC codes do not impose the constraint of having the same degree of connectivity for every bit and check node, and they have been shown to achieve near-capacity performance (also out-performing turbo codes), as discussed in [9].

\begin{figure}[h]
\centering
\includegraphics[width=0.8\textwidth]{ldpc_code_graph_example.png}
\caption{LDPC Code Graph Example.}
\end{figure}

An important feature that impacts LDPC code performance is the presence of so-called loops (or closed paths) on the code's graph. Loops (and particularly the shortest ones, of length 4) hinder the performance of the iterative decoding algorithm (to be discussed next) and therefore result in poor error performance. An example of a 4-loop from the code's matrix given above is shown in Figure 1.
Another feature of LDPC codes is the need for the encoding operation to transform $H$ into systematic form to obtain the generator matrix $G$. The technique of Gaussian elimination is typically used for this operation. However, this operation usually destroys the sparse nature of the original $H$ matrix, and therefore yields costly storage and computing requirements for the LDPC encoder. In fact, one of the aspects of LDPC codes that is perceived to adversely affect their excellent performance is the “complex encoding” process, in contrast with the straightforward encoding of convolutional and turbo codes for example. However, recent contributions [11, 13] have shown that LDPC codes are also amenable to simple encoding structures without sacrificing decoder error performance.

These aspects, in combination with the highly efficient low-complexity sum-product message passing decoding algorithm (to be discussed next), give LDPC codes a distinct advantage over turbo codes, their main competitors, and this is one of the main motivations for the proposed research work.

3. Iterative Decoding of LDPC Codes

Similar to turbo codes, LDPC decoding is also developed within the framework of iterative decoding principles, with the main feature of exchanging soft information between successive iterations. An efficient iterative message-passing algorithm known as the belief propagation algorithm [7], also known as the sum-product algorithm, is typically used for the decoding of LDPC codes, and is shown to be closely approximate optimal maximum likelihood decoding [2, 6].

In the following, we give a brief description of this algorithm, mainly using the notation and terminology of [6]. We also note that there is a well-known, computationally efficient log-domain implementation of the sum-product LDPC decoding algorithm which we don't discuss here for the sake of brevity (see, for example,[14]).

In the subsequent description, the elements of a codeword $c$ are referred to as bits, and the rows of the parity matrix $H$ are designated as checks. The set of bits $l$ that participate in check $m$ is denoted by $L(m) = \{ l : H_{ml} = 1 \}$. Similarly, the set of checks
in which bit \( l \) participates is denoted by \( M(l) = \{ m : H_{ml} = 1 \} \). Also, a set \( L(m) \) in which bit \( l \) is excluded is denoted by \( L(m) \setminus l \), and likewise for \( M(l) \setminus m \) which refers to \( M(l) \) not including check \( m \). The decoding algorithm has two alternating parts in which quantities \( q_{ml} \) and \( r_{ml} \) associated with each nonzero element of \( H \) are iteratively updated. The quantity \( q_{ml}^i \) can be thought of as the probability information that the bit node \( l \) sends to the check node \( m \) indicating \( \text{Prob}(x_l = i) \) with \( i=0,1 \). On the other hand, the quantity \( r_{ml}^i \) is meant to be the probability information that the \( m \)th check node gathers about the \( l \)th bit of \( x \) being \( x_l = i \) when the other bits have probabilities given by \( \{q_{ml'}^i : l' \in L(m) \setminus l \} \). The belief propagation algorithm is shown to converge to the actual posterior probabilities of the received bits if the bipartite graph associated with the underlying parity matrix \( H \) has no cycles \([6, 7]\). In the following, we briefly summarize the algorithm steps.

1. **Initialization:** The variables \( q_{ml}^0 \) and \( q_{ml}^1 \) are initialized to the probabilities \( p_l^0 \) and \( p_l^1 \) that code bit \( x_l \) is zero or one, respectively. For memoryless symmetric channels model with binary input bits \( x_l \) and continuous channel output \( y_i \), these probabilities are given by \( p_l^i = P(y_i | x_l = i) \), where \( i = 0,1 \). For every position \( (m,l) \) in the matrix \( H \) where the entry is non-zero (i.e., \( H_{ml} = 1 \)), we have the following initialization:

\[
q_{ml}^i = p_l^i.
\]

2. **Horizontal Step (Checks to Bits):** Each check node \( m \) gathers all the incoming information \( q_{ml}^i \), and updates the belief on the \( l \)th bit based on all other bits that participate in check equation \( m \).

\[
r_{ml}^i = \sum_{x_l \in \tilde{L}(m) \setminus l} P(\{x_r\} | x_l = i, H) \times \prod_{l' \in \tilde{L}(m) \setminus l} q_{ml'}^{x_{l'}}
\]

for \( i=0,1 \). Notice that the exclusion of the term \( q_{ml} \) in the computation of \( r_{ml} \) is necessary in order to only include “extrinsic” information about the \( l \)th bit from the \( m \)
parity check equation. In the above expression, the function $P(\{x_r\} \mid x_i = i, H)$ is an indicator function given by:

$$P(\{x_r\} \mid x_i = i, H) = \begin{cases} 1, & \text{if } \sum \oplus \{x_r, i\} = 0 \\ 0, & \text{otherwise} \end{cases}$$

where $\sum \oplus$ represents a modulo-2 summation.

3. **Vertical Step (Bits to Checks):** The computed values of $q_{ml}^i$ are used to update the information (or belief) that each bit node $l$ propagates back to check node $m$. In doing so, the information coming from check node $m$ itself should not be included. We thus have

$$q_{ml}^i = \alpha_{ml} p_i^l \prod_{m' \in M(l) \setminus m} r_{ml}^i$$

for $i=0,1$. The term $\alpha_{ml}$ is used as a normalization factor to ensure $q_{ml}^0 + q_{ml}^1 = 1$.

Each node bit $l$ collects probability information from the check nodes that connect to it, and updates its a posteriori probabilities (APP) through the following expressions:

$$q_i^l = \alpha_i p_i^l \prod_{m' \in M(l)} r_{ml}^i$$

for $i=0,1$. The normalization factor $\alpha_i$ is chosen such that $q_i^0 + q_i^1 = 1$. Notice that two components contribute to the APP terms: the intrinsic information from $p_i$, and an extrinsic part through the $r_{ml}$ terms.

After each vertical-horizontal iteration, a hard decision is made on each bit APP $q_i$, resulting in a tentative decoded vector $\hat{x}$. This is then used to decide whether to stop the decoding algorithm as outlined next.

4. **Stopping Criterion:** The decoded vector $\hat{x}$ is used to check whether it satisfies $H\hat{x} = 0$, in which case it is declared as a valid codeword. If not, iterative
decoding continues from Step 2 above until it is eventually successful, or a preset number of maximum iterations is reached.

It should also be noted that any failed decoding is automatically detected. That is, the belief propagation decoding of LDPC codes gives error detection for free, unlike turbo or convolutional codes which require a separate additional mechanism for error detection. An illustration of the iterative LDPC belief propagation decoding algorithm is shown in Figure 2 below.

It is also possible to implement this algorithm in the log-domain, thus resulting in further computational simplifications (because all multiplicative operations are transformed into additions, which are simpler to implement in hardware). In addition, the structure of the decoding algorithm is highly amenable to efficient hardware parallelization (which would greatly benefit system throughput), and this will be further explored in the proposed work.

Figure 2 Illustration of LDPC Decoding by Message Passing.
4. LDPC Code Design: a Hardware-Oriented Perspective

There are several approaches that have recently been proposed to design good LDPC codes. Basically, the classical design approaches can broadly be classified as: 1) deterministic constructions based on finite geometries and graph-theoretic concepts to obtain good distance and cycle properties for the codes [23,24], and 2) pseudo-random constructions for girth conditioning aided by computer search/simulations to obtain parity check matrices giving good error performance [21,22]. Other researchers have also looked at the LDPC encoder design aspects, since the encoder for purely random parity check matrices can be complex compared to competing FEC schemes, and efficient solutions were proposed in [13,20]. All these preceding approaches address the problem of code design from the point of view of error performance optimization. However, another equally important aspect (that didn’t receive as much attention in the literature) is related to the hardware implementation issues (particularly in VLSI), and it is this aspect of code design that is the guiding motivation of this proposed research.

Some of the recent contributions in this regard are found in [15, 16] where LDPC codes are designed based on special families of graphs (e.g., Ramanujam graphs) which impose constraints on the code parameters. In [17], simulated annealing is used to generate LDPC graphs satisfying certain conditions. In [18], the bit-filling method is proposed to generate LDPC graphs given constraints on the degree of the nodes and the girth (length of the smallest cycle in a bipartite graph). In [19], an algorithm is proposed to generate LDPC codes with a bound on the ‘height’ of the VLSI layout of the decoder. It is shown that this approach limits the routing congestion with little impact on error correcting capability of the code.

Practical implementations of error correction algorithms are evaluated by cost (silicon area), power, throughput, latency, flexibility and scalability. Some LDPC research that promise very good error performance, very close to Shannon limit, resulted in complex code structures that are not practical for implementation.

In implementing a LDPC decoder, we have to consider two main requirements: message computation and exchange of message between nodes. The later should be
considered when constructing a LDPC code. The message computation is usually expressed in log-probability domain since it replaced multipliers with adders. Although working in the log-probability domain simplifies the evaluation of products, it requires lookup tables to evaluate the functions.

A major requirement that should be fulfilled by a LDPC decoder is to provide an interconnect network to facilitate message passing between check and bit nodes. Each edge in a Tanner Graph defines a path of a message passing between two nodes. Direct wiring of the interconnect network may lead to congestion due to disorganized nature of the LDPC graph. This problem can be partially solved by using memory, or in other words by making the structure of the decoder serial instead of being fully parallel.

Two main classes of decoder architectures exist: parallel and serial. Parallel decoders directly map the nodes of the graph onto processing elements, and the edges of the graph onto an interconnect network. This approach will fully utilize the inherent parallelism of the decoding algorithm and it will be throughput-efficient but with the expense of requiring larger area. On the other hand, serial architectures distribute the arithmetic requirements sequentially among small number of processing elements. This approach requires fewer processing elements but it will also need additional memory elements.

In [5], a 1024-bit, rate=1/2 LDPC code fully parallel decoder with the maximum symbol throughput of 1 Gbit/s has been implemented using ASIC technology. The ASIC LDPC decoder [5] with only 1K-bit code length consumed 1.7M gates with a high routing overhead resulting in 50% chip utilization due to the randomness of the LDPC code interconnections.

A high-performance parallel implementations of an encoder and decoder for the Parallely Concatenated Parity Check (PCPC) class of LDPC codes [32] was presented in [33]. The designs are based on a code rate of 8/9, a block size of 576 bits and a (6, 2) finite-precision implementation. The Bit Error Rate (BER) performance has been shown to be very close to that for a hypothetical floating-point implementation of the same architecture.

In [25], a joint code and decoder design methodology was proposed for (3, k)-regular LDPC code and partly parallel decoder design to achieve appropriate trade-offs
between hardware complexity and decoding throughput. Based on applying the joint code and decoder design methodology [25], a high-speed (3, \(k\))-regular LDPC code partly parallel decoder architecture was developed in [26], based on which a 9216-bit, rate\(=\frac{12}{16}\) (3, 6)-regular LDPC code decoder was implemented on Xilinx FPGA device. With a maximum of 18 iterations for each code block decoding, this partly parallel decoder supported a maximum symbol throughput of 54 Mbps and achieved BER \(10^{-6}\) at 2dB over additive white Gaussian noise (AWGN) channel.

In [31], a methodology for the implementation of an LDPC encoder/decoder in a commercial FPGA and a Genetic algorithm that is used to optimize the implementation were presented. The proposed solutions optimized the routing delays in the solutions generated by the Xilinx’s tools.

A massively scalable architecture that reduces the decoding latency of an LDPC decoder was proposed in [27]. The architecture uses hardware scaling and memory partitioning to achieve a throughput of 100 Gbps. A pseudo-random construction of the parity check matrix that is specifically designed to facilitate parallelization of the decoder is used.

Different architectures for low-density parity-check (LDPC) decoders are discussed in [28], with methods to reduce their complexity. Serial implementations similar to traditional microprocessor datapaths are compared against implementations with multiple processing elements that exploit the inherent parallelism in the decoding algorithm. Several classes of LDPC codes, such as those based on irregular random graphs and geometric properties of finite fields are evaluated in terms of their suitability for VLSI implementation and performance as measured by bit-error rate.

In [29], the finite precision effects on the decoding performance of LDPC codes were analyzed and several quantization schemes for the received data and extrinsic information for log-BP based LDPC decoder were investigated. It is reported that 4 bits and 6 bits are adequate for representing the received data and extrinsic information, respectively and offer a good tradeoff between performance and hardware complexity.

Fixed point DSP implementation of LDPC decoder was reported in [30]. A simplified decoding algorithm was used and a stopping criteria for the iterative decoder was implemented to reduce the average number of required iterations. This led to an
implementation with increased throughput compared to other implementations of LDPC codes or Turbo codes. The implemented decoder was able to decode at 5.4Mbps on a Texas Instruments TMS320C64xx DSP running at 600MHz.

5. Proposed Methodology

In this work, we investigate the design of LDPC codes that result in interconnect efficient decoder implementations. We assume a fully parallel implementation of the LDPC decoder. Our objective is to design LDPC codes with a constraint on the interconnect wire length. This will result in constraining the wire capacitance which has a direct impact on the maximum signal delay in the design and the power dissipation. Furthermore, this will lower the interconnect routing congestion and hence reduces the chip area and maximizes chip utilization.

One approach that will be investigated in this direction is to assume that the bit nodes and check nodes are laid out in a two-dimensional structure as shown in Figure 3. Then, with a given requirement on the degree of bit nodes, connections between each bit node and check nodes will be constrained within a given window of a specified number of rows and columns. This guarantees that the wire length interconnecting a bit node with check nodes is bounded by a maximum length. For example, as shown in Figure 3 for a ½ rate 32-bit code, bit nodes 19 and 20 with the given constraint window of two rows and three columns can only connect to check nodes 5, 6, 7, 9, 10, and 11 bounded by the specified window. In addition, the code will be designed taking into account the girth of the graph i.e., it will be ensured that the design code has no cycles of a predetermined length e.g. 4. Connections between bit nodes and check nodes will be done randomly as long as they do not violate the specified constraints.

Various LDPC code designs will be investigated based on this approach by varying the constraint window size, the girth of the graph, and the bit and check nodes degrees. The design that results in the best error performance will be chosen.
In order to validate the hardware advantage by the window constrained LDPC code design, a general VHDL model for fully parallel implementation of any LDPC code based on the H matrix will be developed. This will allow the use of synthesis tools like Synopsys and Xilinx Synthesis tools to synthesize the designed LDPC code and obtain estimates on area, delay and power dissipation.

Another direction in this work is to investigate the design of scalable LDPC code that has the advantage of being automatically generated for any code size and guarantees a maximum wire length regardless of the code size. The idea is based on designing a basic cell that can be replicated with well defined inter-cell connections that maintain certain properties that are known to correlate with error performance of the code e.g. girth. An example of such a cell is shown in Figure 4, which has 16 bit nodes and 8 check nodes with a girth of 8. The advantage of this approach is that once the code is designed and optimized for performance, it is expected that good properties of the code will be maintained for any code size required. Thus, the design will be scalable as it will maintain the speed irrespective of the code length since the maximum wire length will have the same constraint. This is in contrast to random-based LDPC codes which are expected to have lower performance and higher power dissipation with the increase in the code size due to the expected increase in interconnect wire length.

### 6. Project Objectives

The principle objectives of this project are as follows:

1. Design of interconnect-efficient Low Density Parity Check codes with competitive performance compared to other random-based LDPC codes.
2. Investigate different approaches for LDPC code design taking into consideration performance and implementation complexity (at short to moderate block lengths).
3. Prototype the proposed LDPC code design and verify the efficiency of their implementations using synthesis tools and Field Programmable Gate Arrays (FPGAs).
Figure 3 Two-dimensional layout of bit nodes and check nodes for 1/2 rate 32-bit LDPC code.

Figure 4 An example 16-bit LDPC code cell.
7. **Project Schedule**

The time span of this project is eighteen (18) months during which eight tasks will be carried out. Table 1 shows a schedule of these tasks indicating the time requirement of each.

<table>
<thead>
<tr>
<th>TASK</th>
<th>MONTH</th>
</tr>
</thead>
<tbody>
<tr>
<td>1. Literature Survey</td>
<td>1-2</td>
</tr>
<tr>
<td>2. Investigate different approaches for LDPC code design taking into consideration performance and implementation complexity (at short to moderate block lengths).</td>
<td>1-2</td>
</tr>
<tr>
<td>3. Development of random-based LDPC code design tool with girth constraints.</td>
<td>1-2</td>
</tr>
<tr>
<td>4. Design of interconnect efficient Low Density Parity Check codes with competitive performance to other random-based LDPC codes.</td>
<td>1-2</td>
</tr>
<tr>
<td>5. Comparison of Error performance analysis of proposed designs using simulations.</td>
<td>1-2</td>
</tr>
<tr>
<td>6. Prototype the proposed LDPC code design and verify the efficiency of their implementations using synthesis tools and FPGAs.</td>
<td>1-2</td>
</tr>
<tr>
<td>7. Progress reports.</td>
<td>1-2</td>
</tr>
<tr>
<td>8. Preparing final report and publications.</td>
<td>1-2</td>
</tr>
</tbody>
</table>
8. Utility Value & Deliverables

The utility value of this project is many-fold, and includes the following aspects:

1. The results of this research can be used by industry as the designed LDPC codes can offer higher speed, less area and low power than existing designs.

2. The results of this work can be used by other investigators in academia and industry to enhance existing methods.

3. The project provides an opportunity for training graduate students on conducting scientific research.

The deliverables of this project can be summarized as follows:

1. A tool for interconnect efficient LDPC code design that generates the LDPC code based on a given input parameters like code size and node degree and maximum interconnect length.

2. A model in high-level description language (VHDL) than can be used for synthesis and implementation of the LDPC code decoder.

9. Monitoring and Evaluation Plan

The investigators will be involved in all the stages of the project and will work for the whole duration of the project. A graduate student will also assist during the entire duration of the project. The investigators will hold meetings as often as necessary (minimum once a week) to coordinate their work and to make necessary decisions. Periodic progress reports will be submitted every 6 months summarizing the status and accomplishments of the project and a final report documenting the project achievement will be submitted at the end of the project.
10. Detailed Budget

1. Investigators

Both the principle investigator and co-investigator will work for 18 months during the regular semesters for the entire duration of the project. They will receive payments as per the university regulations. The graduate student(s) will assist in the implementation aspects of the research. His total compensation will be (10,800/- per Graduate Student/Research Assistant).

Dr. Aiman El-Maleh (PI)  SR 1,200 X 18 = SR 21,600
Dr. Mohammed Adnan Landolsi (CO-I)  SR 1,000 X 18 = SR 18,000
Graduate Student  SR 600 X 18 = SR 10,800

Subtotal  SR 50,400

2. Equipment, Materials and Charges

Facilities available at KFUPM will be used at no charge to the project. Additional equipment required for the project includes a high quality scanner with an estimated amount of SR 1500/-. Stationary and miscellaneous expenses for consumables such as floppies, CDs, paper, printer toner, etc., will amount to SR 2,000/-, purchase of literature and books, etc., will require SR 2,500/-. A secretary will work for the entire duration of the project. SR 2,000/- for payments to secretary will be required.

Individual items of the budget are summarized below:

Man Power  SR 50,400/-
Scanner  SR 1,500/-
Books, other literature  SR 2,500/-
Stationary and Miscellaneous  SR 2,000/-
<table>
<thead>
<tr>
<th>Item</th>
<th>Cost</th>
</tr>
</thead>
<tbody>
<tr>
<td>Secretary</td>
<td>SR 2,000/-</td>
</tr>
<tr>
<td>Conference Attendance (1 Trip)</td>
<td>SR 10,000/-</td>
</tr>
</tbody>
</table>

The total cost of the project is estimated to be SR 68,400/-

SR 68,400/- = US $ 18,240/-
Project Researchers’ Qualifications:

Dr. Aiman El-Maleh holds a B.Sc. in Computer Engineering, with first honors, from King Fahd University of Petroleum & Minerals in 1989, a M.A.SC. in Electrical Engineering from University of Victoria, Canada, in 1991, and a Ph.D in Electrical Engineering, with dean’s honor list, from McGill University, Canada, in 1995.

Dr. El-Maleh is an Assistant Professor in the Computer Engineering Department at King Fahd University of Petroleum & Minerals since September 1998. He was a member of scientific staff with Mentor Graphics Corp., a leader in design automation, from 1995-1998.

Dr. El-Maleh’s research interests are in the areas of synthesis, testing, and verification of digital systems. In addition, Dr. El-Maleh has research interests in VLSI design, design automation, and computer arithmetic.

Dr. El-Maleh is the winner of the best paper award for the most outstanding contribution in the field of test for 1995 at the European Design & Test Conference. His paper presented at the 1995 Design Automation Conference was also nominated for best paper award. He holds one US patent.

Dr. El-Maleh was a member of the program committee of the Design Automation and Test in Europe Conference (DATE’98).

Dr. Mohammed Adnan Landolsi received his BS from the National Engineering Institute of Tunis in 1988, and the M.S and Ph.D in Electrical Engineering from the University of Michigan in 1991 and 1996, respectively.

He joined KFUPM EE Dept as Assistant Professor in September 2001, where he has been involved in teaching undergraduate and graduate courses in the communications filed. His research interests are in the areas of digital communications and coding, and wireless networking, where he has supervised a number of MS theses, and is presently involved in a number of research projects related to synchronization in CDMA wireless communications, mobile radiolocation, and rate-compatible LDPC coding for wireless fading channels.
Prior to joining KFUPM, Dr. Landolsi worked in the Telecommunication Research & Development industry (with Nortel Networks in the US and Canada), where he held various positions in the areas of mobile cellular networks, broadband satellite communications, and fixed wireless access systems.

Dr Landolsi has authored and co-authored 4 U.S patents, and over 20 technical publications in international journals and conference proceedings.
References


Dr. El-Maleh’s List Publications:


18) Aiman El-Maleh and Raslan Al-Abaji, “Extended Frequency-Directed Run Length Code with Improved Application to System-on-a-chip Test Data


27) Sadiq Sait, Habib Yousef, Aiman El-Maleh, and Mahmud Minhas, “Iterative


and Test of Computers, pp. 32-39, summer 1995. (A special issue on First International Test Synthesis Workshop.)


**Patents:**

**Dr. Landolsi’s Publications:**


**Journal Papers:**

**Patents:**


**Conference Papers:**


