TABLE III: Throughput vs W
Max-log-MAP throughput / full-log-MAP throughput (Mbps)
| # Iter | W=32 | W=64 | W=96 | W=128 |
|---|---|---|---|---|
| 1 | 49.02/36.59 | 34.75/23.87 | 26.32/19.50 | 17.95/12.19 |
| 2 | 24.14/18.09 | 17.09/12.72 | 12.98/9.62 | 8.82/5.59 |
| 3 | 16.01/12.00 | 11.34/8.45 | 8.57/6.39 | 5.85/3.97 |
| 4 | 11.98/9.01 | 8.48/6.51 | 6.41/4.78 | 4.37/2.97 |
| 5 | 9.57/7.19 | 6.77/5.2 | 5.12/3.82 | 3.49/2.37 |
| 6 | 7.97/5.99 | 5.64/4.33 | 4.26/3.18 | 2.91/1.97 |
improves the BER performance of the decoder significantly.
Therefore, full-log-MAP is a better choice for this design.
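As an illustration of the difference between the two algorithms, the following minimal Python sketch contrasts the textbook max* operation (the Jacobian logarithm) used by full-log-MAP with the plain maximum used by max-log-MAP. It is not the GPU kernel of this work; the function names `max_star` and `max_approx` are hypothetical and chosen only for this example.

```python
import math

def max_star(a: float, b: float) -> float:
    """Full-log-MAP building block: exact log(exp(a) + exp(b)).

    Computed as max(a, b) plus a correction term (Jacobian logarithm).
    """
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_approx(a: float, b: float) -> float:
    """Max-log-MAP building block: the correction term is dropped."""
    return max(a, b)

# The approximation error is largest when both inputs are equal (log 2)
# and shrinks as the inputs move apart.
for a, b in [(1.0, 1.0), (0.0, 2.0), (0.0, 10.0)]:
    print(a, b, max_star(a, b) - max_approx(a, b))
```

The dropped correction term is what makes max-log-MAP cheaper per trellis stage and sub-optimal in BER, which is consistent with the throughput gap between the two columns of Table III.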
C. Architecture Comparison
Table IV compares our decoder with other programmable turbo
decoders. Our decoder with W = 64 compares favorably in terms of
throughput and BER performance: at 5 iterations it reaches 6.77 Mbps
with max-log-MAP and 5.2 Mbps with full-log-MAP, higher than the
throughput reported for any of the other decoders listed. We support
both the full-log-MAP (FLM) algorithm and the simplified max-log-MAP
(MLM) algorithm, whereas most other solutions support only the
sub-optimal max-log-MAP algorithm.
TABLE IV: Our decoder vs other programmable turbo decoders
| Work | Architecture | MAP Algorithm | Throughput | Iter. |
|---|---|---|---|---|
| [19] | Intel Pentium 3 | MLM / FLM | 366 Kbps / 51 Kbps | 1 |
| [20] | Motorola 56603 | MLM | 48.6 Kbps | 5 |
| [20] | STM VLIW DSP | FLM | 200 Kbps | 5 |
| [21] | TigerSHARC DSP | MLM | 2.399 Mbps | 4 |
| [22] | TMS320C6201 DSP | MLM | 500 Kbps | 4 |
| [5] | 32-wide SIMD | MLM | 2.08 Mbps | 5 |
| ours | Nvidia C1060 | MLM / FLM | 6.77 / 5.2 Mbps | 5 |
VI. Conclusion
In this paper, we presented a 3GPP LTE compliant turbo decoder
implemented on a GPU. We partition the workload across the cores of
the GPU by dividing the codeword into many sub-blocks that are decoded
in parallel. Furthermore, all computation within each sub-block is
fully parallel. To reduce the memory bandwidth needed to keep the
cores fed, we prefetch data into shared memory and keep intermediate
data in shared memory. As the sub-block size affects BER performance,
we showed how both BER performance and throughput vary with sub-block
size. Our decoder provides higher throughput than the other
programmable turbo decoders in Table IV, even when the full-log-MAP
algorithm is used. As the decoder is implemented in software, we can
easily change the QPP interleaver and the trellis structure to support
other codes. Future work includes exploring other partitioning and
memory strategies to improve the throughput of the decoder.
Furthermore, we will implement a fully iterative MIMO receiver by
combining this decoder with a MIMO detector on the GPU.
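To make the two software-configurable pieces mentioned above concrete, the following Python sketch generates a QPP interleaver, pi(i) = (f1*i + f2*i^2) mod K, and splits a codeword of K trellis stages into sub-blocks of W stages. This is an illustrative sketch, not the decoder's actual code: the function names are hypothetical, the block size K = 6144 is an assumed example, and the coefficients (f1, f2) = (263, 480) are taken from the interleaver parameter table of 3GPP TS 36.212 for that block size.

```python
from typing import List

def qpp_interleave(K: int, f1: int, f2: int) -> List[int]:
    """Return the QPP permutation pi, with pi[i] = (f1*i + f2*i*i) mod K."""
    return [(f1 * i + f2 * i * i) % K for i in range(K)]

def split_into_subblocks(K: int, W: int) -> List[range]:
    """Split stage indices 0..K-1 into consecutive sub-blocks of W stages."""
    if K % W != 0:
        raise ValueError("this sketch assumes that W divides K")
    return [range(start, start + W) for start in range(0, K, W)]

# Assumed example: largest LTE block size with its (f1, f2) pair.
K, f1, f2 = 6144, 263, 480
pi = qpp_interleave(K, f1, f2)
assert sorted(pi) == list(range(K))  # a valid interleaver is a permutation

subblocks = split_into_subblocks(K, W=64)
print(len(subblocks))  # 96 sub-blocks that could be decoded in parallel
```

Supporting another code in this sketch amounts to passing a different (K, f1, f2) triple or replacing `qpp_interleave` altogether, which mirrors the flexibility argument made for a software decoder.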
References
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon
limit error-correcting coding and decoding: Turbo-Codes,” in IEEE
International Conference on Communication, May 1993.
[2] D. Garrett, B. Xu, and C. Nicol, “Energy efficient turbo decoding
for 3G mobile,” in International Symposium on Low Power
Electronics and Design. ACM, 2001, pp. 328-333.
[3] M. Bickerstaff, L. Davis, C. Thomas, D. Garrett, and C. Nicol, “A
24Mb/s radix-4 logMAP turbo decoder for 3GPP-HSDPA mobile
wireless,” in IEEE Int. Solid-State Circuit Conf. (ISSCC), Feb.
2003.
[4] M. Shin and I. Park, “SIMD processor-based turbo decoder
supporting multiple third-generation wireless standards,” IEEE Trans.
on VLSI, vol. 15, pp. 801-810, Jun. 2007.
[5] Y. Lin, S. Mahlke, T. Mudge, C. Chakrabarti, A. Reid, and
K. Flautner, “Design and implementation of turbo decoders for
software defined radio,” in IEEE Workshop on Signal Processing
Design and Implementation (SIPS), Oct. 2006.
[6] Y. Sun, Y. Zhu, M. Goel, and J. R. Cavallaro, “Configurable and
Scalable High Throughput Turbo Decoder Architecture for
Multiple 4G Wireless Standards,” in IEEE International Conference
on Application-Specific Systems, Architectures and Processors
(ASAP), July 2008, pp. 209-214.
[7] P. Salmela, H. Sorokin, and J. Takala, “A Programmable
Max-Log-MAP Turbo Decoder Implementation,” Hindawi VLSI Design,
vol. 2008, pp. 636-640, 2008.
[8] C.-C. Wong, Y.-Y. Lee, and H.-C. Chang, “A 188-size 2.1mm²
reconfigurable turbo decoder chip with parallel architecture for
3GPP LTE system,” in 2009 Symposium on VLSI Circuits, June
2009, pp. 288-289.
[9] G. Falcao, V. Silva, and L. Sousa, “How GPUs Can Outperform
ASICs for Fast LDPC Decoding,” in ICS ’09: Proceedings of the
23rd International Conference on Supercomputing, pp. 390-399.
[10] M. Wu, S. Gupta, Y. Sun, and J. R. Cavallaro, “A GPU
Implementation of A Real-Time MIMO Detector,” in IEEE Workshop
on Signal Processing Systems (SiPS’09), Oct. 2009.
[11] M. Wu, Y. Sun, and J. R. Cavallaro, “Reconfigurable Real-time
MIMO Detector on GPU,” in IEEE 43rd Asilomar Conference on
Signals, Systems and Computers (ASILOMAR’09), Nov. 2009.
[12] Xilinx Corporation, 3GPP LTE Turbo Decoder v2.0, 2008.
[Online]. Available:
http://www.xilinx.com/products/ipcenter/DO-DI-TCCDEC-LTE.htm
[13] K. Amiri, Y. Sun, P. Murphy, C. Hunter, J. R. Cavallaro, and
A. Sabharwal, “WARP, a unified wireless network testbed for
education and research,” in MSE ’07: Proceedings of the 2007 IEEE
International Conference on Microelectronic Systems Education,
June 2007.
[14] NVIDIA Corporation, CUDA Compute Unified Device
Architecture Programming Guide, 2008. [Online]. Available:
http://www.nvidia.com/object/cuda_develop.html
[15] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal Decoding of
Linear Codes for Minimizing Symbol Error Rate,” IEEE
Transactions on Information Theory, vol. IT-20, pp. 284-287, Mar. 1974.
[16] F. Naessens, B. Bougard, S. Bressinck, L. Hollevoet, P. Raghavan,
L. V. der Perre, and F. Catthoor, “A unified instruction set
programmable architecture for multi-standard advanced forward
error correction,” in IEEE Workshop on Signal Processing
Systems (SIPS), Oct. 2008.
[17] J. Sun and O. Takeshita, “Interleavers for turbo codes using
permutation polynomials over integer rings,” IEEE Trans. Inform.
Theory, vol. 51, pp. 101-119, Jan. 2005.
[18] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of
optimal and sub-optimal MAP decoding algorithm operating in the
log domain,” in IEEE Int. Conf. Commun., 1995, pp. 1009-1013.
[19] M. Valenti and J. Sun, “The UMTS Turbo Code and an Efficient
Decoder Implementation Suitable for Software-Defined Radios,”
International Journal of Wireless Information Networks, vol. 8,
no. 4, pp. 203-215, Oct. 2001.
[20] H. Michel, A. Worm, M. Munch, and N. Wehn, “Hardware
software trade-offs for advanced 3G channel coding,” in Proceedings
of Design, Automation and Test in Europe, 2002.
[21] K. Loo, T. Alukaidey, and S. Jimaa, “High performance
parallelised 3GPP turbo decoder,” in IEEE Personal Mobile
Communications Conference, April 2003, pp. 337-342.
[22] Y. Song, G. Liu, and Huiyang, “The implementation of turbo
decoder on DSP in W-CDMA system,” in International Conference
on Wireless Communications, Networking and Mobile Computing,
Dec. 2005, pp. 1281-1283.