Implementation of a 3GPP LTE Turbo Decoder Accelerator on GPU



TABLE III: Throughput vs. W

Max-log-MAP throughput / full-log-MAP throughput (Mbps)

# Iter   W=32          W=64          W=96          W=128
1        49.02/36.59   34.75/23.87   26.32/19.50   17.95/12.19
2        24.14/18.09   17.09/12.72   12.98/9.62    8.82/5.59
3        16.01/12.00   11.34/8.45    8.57/6.39     5.85/3.97
4        11.98/9.01    8.48/6.51     6.41/4.78     4.37/2.97
5        9.57/7.19     6.77/5.2      5.12/3.82     3.49/2.37
6        7.97/5.99     5.64/4.33     4.26/3.18     2.91/1.97

improves the BER performance of the decoder significantly.
Therefore, full-log-MAP is a better choice for this design.
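
The two algorithms differ only in how they combine log-domain
metrics. As a minimal sketch (plain Python rather than the GPU
kernel code, with hypothetical function names), full-log-MAP
evaluates the exact Jacobian logarithm, while max-log-MAP drops
the correction term and keeps only the maximum:

```python
import math


def max_star_full(a: float, b: float) -> float:
    """Full-log-MAP combine: exact log(exp(a) + exp(b))."""
    # max(a, b) plus a correction term that depends only on |a - b|.
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_maxlog(a: float, b: float) -> float:
    """Max-log-MAP combine: the correction term is dropped."""
    return max(a, b)


if __name__ == "__main__":
    # The gap between the two is largest when the inputs are equal
    # and vanishes as they move apart.
    for a, b in [(0.0, 0.0), (1.0, 2.0), (-5.0, 5.0)]:
        print(a, b, max_star_full(a, b) - max_star_maxlog(a, b))
```

The correction term costs an extra exponential and logarithm per
combine, which is consistent with the lower full-log-MAP
throughput reported in Table III.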

C. Architecture Comparison

Table IV compares our decoder with other programmable turbo
decoders. Our decoder with W = 64 compares favorably in terms of
throughput and BER performance. We can support both the
full-log-MAP (FLM) algorithm and the simplified max-log-MAP
(MLM) algorithm, while most other solutions support only the
sub-optimal max-log-MAP algorithm.

TABLE IV: Our decoder vs. other programmable turbo decoders

Work   Architecture      MAP Algorithm   Throughput           Iter.
[19]   Intel Pentium 3   MLM / FLM       366 Kbps / 51 Kbps   1
[20]   Motorola 56603    MLM             48.6 Kbps            5
[20]   STM VLIW DSP      FLM             200 Kbps             5
[21]   TigerSHARC DSP    MLM             2.399 Mbps           4
[22]   TMS320C6201 DSP   MLM             500 Kbps             4
[5]    32-wide SIMD      MLM             2.08 Mbps            5
ours   Nvidia C1060      MLM / FLM       6.77 / 5.2 Mbps      5

VI. Conclusion

In this paper, we presented a 3GPP LTE compliant turbo decoder
implemented on a GPU. We partition the workload across the cores
of the GPU by dividing the codeword into many sub-blocks that
are decoded in parallel. Furthermore, all computation within
each sub-block is fully parallel. To reduce the memory bandwidth
needed to keep the cores fed, we prefetch data into shared
memory and keep intermediate data in shared memory. As different
sub-block sizes can lead to BER performance degradation, we
showed how both BER performance and throughput are affected by
sub-block size. We show that our decoder provides higher
throughput than the other programmable turbo decoders in
Table IV, even when the full-log-MAP algorithm is used. As the
decoder is implemented in software, we can easily change the QPP
interleaver and the trellis structure to support other codes.
Future work includes other partitioning and memory strategies to
improve the throughput of the decoder. Furthermore, we will
implement a fully iterative MIMO receiver by combining this
decoder with a MIMO detector on the GPU.
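
To illustrate why a software decoder can change the interleaver
easily, the sketch below generates a QPP permutation of the form
Pi(i) = (f1*i + f2*i^2) mod K, following the permutation
polynomials of [17]. It is plain Python rather than the GPU
code, and the function name is hypothetical. The parameters
K = 40, f1 = 3, f2 = 10 are recalled from the LTE interleaver
table and should be checked against the specification before
use.

```python
def qpp_interleave(K: int, f1: int, f2: int) -> list[int]:
    """Return the QPP permutation Pi(i) = (f1*i + f2*i*i) mod K."""
    return [(f1 * i + f2 * i * i) % K for i in range(K)]


def is_permutation(indices: list[int]) -> bool:
    """Check that every index 0..K-1 appears exactly once."""
    return sorted(indices) == list(range(len(indices)))


if __name__ == "__main__":
    # Assumed example parameters (smallest LTE block size); verify
    # against the 3GPP interleaver parameter table.
    perm = qpp_interleave(40, 3, 10)
    print(perm[:8], is_permutation(perm))
```

Supporting a different block size or a different code then
amounts to supplying another (K, f1, f2) triple; no hardware
change is involved.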

References

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon
limit error-correcting coding and decoding: Turbo-Codes,” in
IEEE International Conference on Communications, May 1993.

[2] D. Garrett, B. Xu, and C. Nicol, “Energy efficient turbo
decoding for 3G mobile,” in International Symposium on Low Power
Electronics and Design. ACM, 2001, pp. 328-333.

[3] M. Bickerstaff, L. Davis, C. Thomas, D. Garrett, and C. Nicol,
“A 24Mb/s radix-4 logMAP turbo decoder for 3GPP-HSDPA mobile
wireless,” in IEEE Int. Solid-State Circuit Conf. (ISSCC),
Feb. 2003.

[4] M. Shin and I. Park, “SIMD processor-based turbo decoder
supporting multiple third-generation wireless standards,” IEEE
Trans. on VLSI, vol. 15, pp. 801-810, Jun. 2007.

[5] Y. Lin, S. Mahlke, T. Mudge, C. Chakrabarti, A. Reid, and
K. Flautner, “Design and implementation of turbo decoders for
software defined radio,” in IEEE Workshop on Signal Processing
Design and Implementation (SIPS), Oct. 2006.

[6] Y. Sun, Y. Zhu, M. Goel, and J. R. Cavallaro, “Configurable
and Scalable High Throughput Turbo Decoder Architecture for
Multiple 4G Wireless Standards,” in IEEE International
Conference on Application-Specific Systems, Architectures and
Processors (ASAP), July 2008, pp. 209-214.

[7] P. Salmela, H. Sorokin, and J. Takala, “A Programmable
Max-Log-MAP Turbo Decoder Implementation,” Hindawi VLSI Design,
vol. 2008, pp. 636-640, 2008.

[8] C.-C. Wong, Y.-Y. Lee, and H.-C. Chang, “A 188-size 2.1mm²
reconfigurable turbo decoder chip with parallel architecture for
3GPP LTE system,” in 2009 Symposium on VLSI Circuits, June 2009,
pp. 288-289.

[9] G. Falcao, V. Silva, and L. Sousa, “How GPUs Can Outperform
ASICs for Fast LDPC Decoding,” in ICS ’09: Proceedings of the
23rd International Conference on Supercomputing, pp. 390-399.

[10] M. Wu, S. Gupta, Y. Sun, and J. R. Cavallaro, “A GPU
Implementation of A Real-Time MIMO Detector,” in IEEE Workshop
on Signal Processing Systems (SiPS’09), Oct. 2009.

[11] M. Wu, Y. Sun, and J. R. Cavallaro, “Reconfigurable
Real-time MIMO Detector on GPU,” in IEEE 43rd Asilomar
Conference on Signals, Systems and Computers (ASILOMAR’09),
Nov. 2009.

[12] Xilinx Corporation, 3GPP LTE Turbo Decoder v2.0, 2008.
[Online]. Available:
http://www.xilinx.com/products/ipcenter/DO-DI-TCCDEC-LTE.htm

[13] K. Amiri, Y. Sun, P. Murphy, C. Hunter, J. R. Cavallaro, and
A. Sabharwal, “Warp, a unified wireless network testbed for
education and research,” in MSE ’07: Proceedings of the 2007
IEEE International Conference on Microelectronic Systems
Education, June 2007.

[14] NVIDIA Corporation, CUDA Compute Unified Device Architecture
Programming Guide, 2008. [Online]. Available:
http://www.nvidia.com/object/cuda_develop.html

[15] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal
Decoding of Linear Codes for Minimizing Symbol Error Rate,” IEEE
Transactions on Information Theory, vol. IT-20, pp. 284-287,
Mar. 1974.

[16] F. Naessens, B. Bougard, S. Bressinck, L. Hollevoet,
P. Raghavan, L. V. der Perre, and F. Catthoor, “A unified
instruction set programmable architecture for multi-standard
advanced forward error correction,” in IEEE Workshop on Signal
Processing Systems (SIPS), October 2008.

[17] J. Sun and O. Takeshita, “Interleavers for turbo codes using
permutation polynomials over integer rings,” IEEE Trans. Inform.
Theory, vol. 51, pp. 101-119, Jan. 2005.

[18] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of
optimal and sub-optimal MAP decoding algorithms operating in the
log domain,” in IEEE Int. Conf. Commun., 1995, pp. 1009-1013.

[19] M. Valenti and J. Sun, “The UMTS Turbo Code and an Efficient
Decoder Implementation Suitable for Software-Defined Radios,”
International Journal of Wireless Information Networks, vol. 8,
no. 4, pp. 203-215, Oct. 2001.

[20] H. Michel, A. Worm, M. Munch, and N. Wehn, “Hardware
software trade-offs for advanced 3G channel coding,” in
Proceedings of Design, Automation and Test in Europe, 2002.

[21] K. Loo, T. Alukaidey, and S. Jimaa, “High performance
parallelised 3GPP turbo decoder,” in IEEE Personal Mobile
Communications Conference, April 2003, pp. 337-342.

[22] Y. Song, G. Liu, and Huiyang, “The implementation of turbo
decoder on DSP in W-CDMA system,” in International Conference on
Wireless Communications, Networking and Mobile Computing,
Dec. 2005, pp. 1281-1283.
