KETONEN et al. : PERFORMANCE-COMPLEXITY COMPARISON OF RECEIVERS FOR A LTE MIMO-OFDM SYSTEM
Fig. 12. Parallel PED calculation and sorting.
TABLE IV
The 2 × 2 K-Best LSD Receiver Complexity
| Block | Slices | DSPs | GE | mW |
| --- | --- | --- | --- | --- |
| QRD | 1346 | 25 | 67 k | 35.1 |
| K-best LSD | | | | |
| 16-best, 4-QAM | 913 | 28 | 54 k | 79 |
| 8-best, 16-QAM | 4172 | 15 | 110 k | 120 |
| 8-best, 64-QAM | 5712 | 22 | 124 k | 261 |
| 16-best, 64-QAM | 8818 | 16 | 166 k | 246 |
| LLR calculation, non-iterative/iterative | | | | |
| 4-QAM | 1664/3485 | 5/5 | 27/34 k | 32/51.8 |
| 16-QAM | 746/1865 | 2/2 | 18/59 k | 20/95.7 |
| 64-QAM, K = 8 | 798/1666 | 2/2 | 25/51 k | 29/66.7 |
| 64-QAM, K = 16 | 1280/2241 | 2/2 | 36/46 k | 38/68.7 |
| Total, non-iterative/iterative | | | | |
| 4-QAM | 3926/5747 | 58/58 | 148/155 k | 146/166 |
| 16-QAM | 6279/7398 | 42/42 | 197/236 k | 175/251 |
| 64-QAM, K = 8 | 7969/8837 | 49/49 | 219/241 k | 326/364 |
| 64-QAM, K = 16 | 11557/12518 | 43/43 | 267/277 k | 319/350 |
and the final K symbol vectors are demapped to bit vectors and
their Euclidean distance is used in the LLR calculation.
The modified K-best LSD tree search was used in the implementation
of the 64-QAM case. The architecture of the second-stage parallel
Euclidean distance calculation and insertion sorting is illustrated
in Fig. 12. Two PEDs are calculated in parallel and the smaller one
is added to the list.
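The second-stage step can be sketched in software as follows. This is a minimal illustration, not the hardware architecture of Fig. 12: it assumes a 2 × 2 system after QR decomposition, where `y1` is the first element of the rotated received vector and `r11`, `r12` are entries of the upper triangular matrix R. The function names and the way the two candidate symbols are supplied are hypothetical.

```python
def ped(parent_ped, y1, r11, r12, x1, x2):
    """Partial Euclidean distance after extending parent symbol x2 with child x1.

    Assumes the usual K-best recursion: the parent's distance plus the
    squared error on the first row of the triangular system.
    """
    return parent_ped + abs(y1 - r11 * x1 - r12 * x2) ** 2


def smaller_of_pair(parent_ped, y1, r11, r12, x2, cand_a, cand_b):
    """Evaluate two child candidates and return (ped, symbol) of the smaller.

    In hardware the two PEDs would be computed in parallel; here they are
    simply computed one after the other.
    """
    ped_a = ped(parent_ped, y1, r11, r12, cand_a, x2)
    ped_b = ped(parent_ped, y1, r11, r12, cand_b, x2)
    return (ped_a, cand_a) if ped_a <= ped_b else (ped_b, cand_b)


# Illustrative values only (not taken from the paper).
best = smaller_of_pair(
    parent_ped=0.5, y1=4 + 2j, r11=2.0, r12=1 + 1j,
    x2=1 - 1j, cand_a=1 + 1j, cand_b=-1 + 1j,
)
print(best)
```

The pair that survives would then be handed to the insertion sorter that maintains the candidate list.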
The 2 × 2 K-best LSD receiver complexity is shown in Table IV and
the 4 × 4 receiver complexity in Table V. Seven BRAMs are needed to
store the results of the QRD in a 5-MHz bandwidth. In the iterative
2 × 2 64-QAM 16-best LSD, nine additional BRAMs are needed to store
the list and LLRs from the previous iteration. The clock frequency
in the ASIC implementation is 140 MHz and in the FPGA implementation
from 94 MHz to 100 MHz, depending on the modulation.
The word lengths for the K-best LSD and LLR calculation are mainly
16 bits, and computer simulations have been performed to confirm
that there is no performance degradation [24]. The sorters are
insertion sorters. List sizes of 16 and 8 are used in the
implementation, and the sorters accordingly have 16 or 8 registers
in which the smallest Euclidean distances are kept during the
sorting. A full list is used in the QPSK case and no sorting is
required, which decreases the complexity of the detector.
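The behavior of such an insertion sorter can be sketched as below. This is a software model under stated assumptions: the K registers are modeled as a Python list kept in ascending order of Euclidean distance, and the function name `insert_candidate` and the `(distance, candidate)` tuple layout are illustrative, not taken from the implementation.

```python
def insert_candidate(registers, distance, candidate, k):
    """Insert (distance, candidate) into an ascending list of at most k entries.

    Entries with a larger distance are shifted down by one position; if the
    list is already full, the entry with the largest distance is dropped.
    """
    pos = len(registers)
    # Walk up from the bottom until an entry with a smaller-or-equal
    # distance is found.
    while pos > 0 and registers[pos - 1][0] > distance:
        pos -= 1
    if pos < k:
        registers.insert(pos, (distance, candidate))
        if len(registers) > k:
            registers.pop()  # discard the worst entry
    return registers


# Keep the 3 smallest distances out of a stream of 6 (illustrative values).
regs = []
for i, d in enumerate([5.0, 1.0, 4.0, 2.0, 9.0, 0.0]):
    insert_candidate(regs, d, "cand%d" % i, k=3)
print([d for d, _ in regs])
```

With K equal to the total number of candidates, every candidate is kept, which is consistent with the remark that the full-list case needs no sorting.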
TABLE V
The 4 × 4 K-Best LSD Receiver Complexity
| Block | Slices | DSPs | GE | mW |
| --- | --- | --- | --- | --- |
| QRD | 39647 | 76 | 352 k | 255 |
| K-best LSD | | | | |
| 8-best, 4-QAM | 9761 | 35 | 98 k | 158 |
| 8-best, 16-QAM | 14107 | 21 | 209 k | 290 |
| 16-best, 16-QAM | 27291 | 20 | 331 k | 443 |
| 16-best, 64-QAM | 16946 | 19 | 453 k | 564 |
| LLR calculation, non-iterative/iterative | | | | |
| 4-QAM | 844/2958 | 2/2 | 18/57 k | 20/96 |
| 16-QAM, K = 8 | 776/3432 | 2/1 | 21/73 k | 24/156 |
| 16-QAM, K = 16 | 1326/6760 | 2/1 | 28/54 k | 30/80 |
| 64-QAM | 1299/6370 | 1/1 | 34/57 k | 40/82 |
| Total, non-iterative/iterative | | | | |
| 4-QAM | 50252/52366 | 113/113 | 468/507 k | 432/508 |
| 16-QAM, K = 8 | 54530/57186 | 99/98 | 582/634 k | 568/700 |
| 16-QAM, K = 16 | 68264/73698 | 98/97 | 716/736 k | 729/779 |
| 64-QAM | 57892/62963 | 96/96 | 841/861 k | 859/901 |
The LLR calculation block was designed for both the iterative and
the noniterative receiver. The iterative LLR block was designed to
have a low latency in order to perform additional global iterations
quickly. The complexity of the block is therefore high. Using the
decoder soft outputs in calculating the LLRs also adds to the
complexity. If extra iterations are not needed, the LLR block can
be scheduled to have the same latency as the K-best LSD and thus
have a lower complexity.
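For the noniterative case, LLR calculation from the candidate list can be illustrated with the common max-log approximation. The sketch below is an assumption for illustration: this section does not specify the approximation, the sign convention or the clipping used in the implemented block, and the name `max_log_llrs` and the `clip` parameter are hypothetical. Decoder soft outputs (a priori information), which the iterative block additionally uses, are not modeled.

```python
def max_log_llrs(candidates, num_bits, noise_var, clip=8.0):
    """Max-log LLRs from a list of (euclidean_distance, bits) candidates.

    Convention assumed here: a positive LLR means bit value 0 is more likely.
    If no candidate in the list carries one of the two bit values, the LLR
    is set to the clipping value with the appropriate sign.
    """
    llrs = []
    for b in range(num_bits):
        d0 = min((d for d, bits in candidates if bits[b] == 0), default=None)
        d1 = min((d for d, bits in candidates if bits[b] == 1), default=None)
        if d0 is None:
            llr = -clip          # only bit value 1 present in the list
        elif d1 is None:
            llr = clip           # only bit value 0 present in the list
        else:
            llr = (d1 - d0) / noise_var
            llr = max(-clip, min(clip, llr))
        llrs.append(llr)
    return llrs


# Three candidates with two bits each (illustrative values).
cands = [(0.5, (0, 1)), (1.5, (1, 1)), (3.0, (0, 0))]
print(max_log_llrs(cands, num_bits=2, noise_var=1.0))
```

Each bit only needs the two smallest distances in its respective subsets, so a block of this kind can be scheduled over more clock cycles when low latency is not required.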
B. Soft Interference Cancellation
The SIC receiver consists of an LMMSE detector, an LLR calculation
block, a symbol expectation calculation block and an interference
cancellation block, as presented in Fig. 4. The top-level
architecture of the LMMSE detector for a 2 × 2 antenna system is
presented in Fig. 13. The channel matrix H is first multiplied by
its complex conjugate transpose and the noise variance σ² is added
to the diagonal elements. The resulting 2 × 2 matrix G is positive
definite and Hermitian (conjugate symmetric). This simplifies the
matrix inversion, which is performed by dividing the elements by the
determinant, swapping the diagonal elements and negating the
off-diagonal elements. The determinant is real valued and the
off-diagonal elements are complex conjugates of each other.
Therefore, fewer operations are needed.
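The closed-form 2 × 2 inversion described above can be sketched as follows. This is a floating-point illustration, not the fixed-point architecture of Fig. 13; the function name `gram_and_inverse` and the nested-tuple matrix layout are assumptions for the example.

```python
def gram_and_inverse(h, noise_var):
    """Form G = H^H H + noise_var * I for a 2 x 2 channel and invert it.

    h is ((h11, h12), (h21, h22)) with complex entries. Only g11, g22 (real)
    and g12 (complex) need to be computed, since g21 = conj(g12).
    """
    (h11, h12), (h21, h22) = h
    g11 = abs(h11) ** 2 + abs(h21) ** 2 + noise_var   # real diagonal entry
    g22 = abs(h12) ** 2 + abs(h22) ** 2 + noise_var   # real diagonal entry
    g12 = h11.conjugate() * h12 + h21.conjugate() * h22
    det = g11 * g22 - abs(g12) ** 2                   # real valued
    g = ((g11, g12), (g12.conjugate(), g22))
    # Swap the diagonal elements, negate the off-diagonals, divide by det.
    g_inv = ((g22 / det, -g12 / det),
             (-g12.conjugate() / det, g11 / det))
    return g, g_inv


# Illustrative channel values only.
H = ((1 + 1j, 0.5 - 0.5j), (0.25j, 2 + 0j))
G, G_inv = gram_and_inverse(H, noise_var=0.1)
print(G_inv)
```

Because the determinant is real, the division reduces to one real reciprocal followed by multiplications, and only one off-diagonal element has to be computed explicitly.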
The architecture for the real part of the symbol expectation
calculation in the 16-QAM case is presented in Fig. 14. The
imaginary part is calculated in parallel in the same manner from