KETONEN et al.: PERFORMANCE-COMPLEXITY COMPARISON OF RECEIVERS FOR A LTE MIMO-OFDM SYSTEM
Fig. 7. Data transmission throughput versus SNR in a 2 × 2 V-BLAST system and a moderately correlated channel.
The simplification again has only a small impact on the performance. A comparison of implementations done with the Catapult tool showed that the simplified LLR calculation reduces the FPGA complexity by a factor of approximately four. The SIC receiver is simplified in the same way: instead of soft demodulation, its LLR calculation uses the approximate log-likelihood criterion. The symbol expectations are also calculated with (14).
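As a hedged illustration of why a simplified LLR calculation costs little performance, the sketch below compares the exact bit LLR with the common max-log approximation, which keeps only the closest constellation point in each bit subset and therefore needs no exponentials or logarithms. It assumes a real-valued Gray-mapped 4-PAM alphabet (one dimension of 16-QAM), equiprobable symbols, and real AWGN with variance `sigma2`; the names `PAM4`, `llr_exact`, and `llr_maxlog` are hypothetical, and the paper's own approximate criterion and equation (14) are not reproduced here.

```python
import math

# Hypothetical Gray-mapped 4-PAM alphabet: one real dimension of 16-QAM.
# Each amplitude maps to its two bits (b0, b1); neighbours differ in one bit.
PAM4 = {-3.0: (0, 0), -1.0: (0, 1), 1.0: (1, 1), 3.0: (1, 0)}


def _metrics(y, sigma2, bit, value):
    """Log-likelihood metrics -(y - x)^2 / (2 sigma2) of symbols whose bit == value."""
    return [-(y - x) ** 2 / (2.0 * sigma2)
            for x, bits in PAM4.items() if bits[bit] == value]


def _logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def llr_exact(y, sigma2, bit):
    """Exact LLR ln P(b = 0 | y) / P(b = 1 | y) for equiprobable symbols in real AWGN."""
    return (_logsumexp(_metrics(y, sigma2, bit, 0))
            - _logsumexp(_metrics(y, sigma2, bit, 1)))


def llr_maxlog(y, sigma2, bit):
    """Max-log LLR: keep only the best symbol in each bit subset (no exp or log)."""
    return max(_metrics(y, sigma2, bit, 0)) - max(_metrics(y, sigma2, bit, 1))
```

For example, with y = 0.8 and sigma2 = 0.5, bit b0 gets a max-log LLR of -3.2 against an exact value of about -3.21, so replacing the log-sum-exp with minimum-distance searches changes the soft output only slightly.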
IV. Implementation Comparisons
The theoretical complexity of the receivers with M = N is presented in Table III, which lists the numbers of multiplications, additions, and comparisons. The memory entry is the number of bits used in each block. In the squared Givens rotations (SGR) and the Gram-Schmidt-based QRD, the divisions and square roots were approximated with additions and shifts [28]. The $M^3$ term arises from matrix multiplications in the SGR and from repeated vector multiplications in the QRD. The number of operations in the K-best algorithm depends on the list size K. If K is larger than $\sqrt{|\Omega|}^{\,l}$, the number of possible paths at level $l$, the latter value is used as K at that level. In the first level, the number of multiplications is $2\sqrt{|\Omega|}$ and the number of additions is $\sqrt{|\Omega|}$. The LLR calculation operations for the linear receivers depend on the modulation. Constant multiplications by powers of two were implemented as shifts. The number of multiplications in the K-best algorithm increases with the list size and the modulation order, whereas the modulation order does not have a major impact on the complexity of the SIC receiver. The impact of this can be seen later in the implementation results: with 4 × 4 QPSK, the SIC receiver requires more multiplications than the K-best LSD, and its implementation gate count is also higher, whereas with 64-QAM, both the number of multiplications and the gate count are higher with the K-best algorithm.
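To make the list-size rule concrete, the sketch below counts how many partial paths a K-best search extends and keeps at each tree level, so that the surviving list at level l is min(K, √|Ω|^l). It assumes a real-valued decomposition in which each of the 2M levels offers √|Ω| candidate amplitudes, and it models path counts only; apart from the first-level figures of 2√|Ω| multiplications and √|Ω| additions given in the text, the per-level arithmetic of Table III is not reproduced. The function names are hypothetical.

```python
from __future__ import annotations

import math


def kbest_path_counts(K: int, omega: int, levels: int) -> tuple[list[int], list[int]]:
    """Per-level counts of expanded and surviving paths in a K-best tree search.

    Assumes a real-valued model where each level offers sqrt(|Omega|) candidate
    amplitudes, so at most sqrt(|Omega|)**l paths exist after level l.
    """
    branch = math.isqrt(omega)
    if branch * branch != omega:
        raise ValueError("|Omega| must be a square QAM size such as 4, 16 or 64")
    expanded, survivors = [], []
    paths = 1  # the search starts from the root node
    for _ in range(levels):
        children = paths * branch   # every surviving path is extended
        paths = min(K, children)    # equals min(K, branch ** l) at level l
        expanded.append(children)
        survivors.append(paths)
    return expanded, survivors


def first_level_ops(omega: int) -> tuple[int, int]:
    """First-level (multiplications, additions) as stated in the text."""
    branch = math.isqrt(omega)
    return 2 * branch, branch
```

For K = 8 and 16-QAM over the 2M = 8 real-valued levels of a 4 × 4 system, the survivor counts are [4, 8, 8, 8, 8, 8, 8, 8] and 32 children are expanded per level from the third level on; with 64-QAM, 64 children are expanded per level from the second level on, which is one way to see why the K-best cost grows with the modulation order.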
Fig. 8. Data transmission throughput versus SNR in a 4 × 4 system in (a) a highly correlated, (b) moderately correlated, and (c) uncorrelated channel.
[Panel (b): 4 × 4 antenna system, moderately correlated TU channel. Curves: LMMSE, 8-best, and 8-best with 2 iterations.]

The Catapult C Synthesis tool [29] was used to implement the receivers. It synthesizes algorithms written in ANSI C++ into high-performance, concurrent hardware. This single-source methodology allows designers to pick the best architecture for a given performance/area/power specification while minimizing design errors and reducing the overall verification burden. Although the results may not always be as efficient as those of hand-coded HDL, the tool allows different architectures to be explored in a short time, and different algorithms can be compared, provided that they are all implemented with the same tool. The complexity results seem to be