An Efficient Circulant MIMO Equalizer for CDMA Downlink: Algorithm and VLSI Architecture



Yuanbin Guo et al.

saves roughly 50% of the real multiplications because the FFT
length within 64 points will su
ffice for most realistic equalizer
applications.

4.3. Hermitian matrix inverse architectures

In this section, we utilize the Hermitian feature and focus on
the optimization of the submatrix inverse and multiplication
module following the element-wise FFT modules in the tap
solver. Although the FFT-based tap solver avoids the direct
matrix inverse of the original covariance matrix with the di-
mension of (
NF × NF), the inverse of the diagonal matrix
F is inevitable. For a MIMO receiver with high receive di-
mension, the matrix inverse and multiplication in
F- 1hm is
not trivial. Because
F is a block-diagonal matrix, its inverse
can be decomposed to the inverse of
LF submatrices of size
(
N × N)asin

F -1 = diag (f[0] -1, F[1] -1,..., F[ Lf - 1]- 1).      (15)

A traditional (N×N) matrix inverse using Gaussian elim-
ination has the complexity at
O(N3) complex operations.
Cholesky decomposition can be applied to facilitate the in-
verse of these matrices. However, this method requires arith-
metic square root operation, which is expensive for hardware
implementation. Considering the fact that it is unlikely to
have more than four Rx antennas in a mobile terminal, we
consider the two special cases individually, that is, 2 and 4 Rx
antennas. We propose complexity-reduction schemes and ef-
ficient architectures suitable for VLSI implementation based
on the exploration of block partitioning. The commonality
of the partitioned block matrix inverse is extracted to design
generic RTL modules for reusable modularity. We then build
the (4
× 4) receiver by reusing the (2 × 2) block partitioning.

4.3.1. Dual-antenna MIMO receiver

From (11), a straightforward partitioning is at the matrix
inversion of
F and then the matrix multiplication of the
dimension-wise FFT of the channel coe
fficients as F- 1(D ®
I)hm. In this partitioning, we would first compute the inverse
of the entire subblock matrix in
F and then carry out a ma-
trix multiplication. However, this partitioning involves two
separate loop structures. In the VLSI circuit design, this will
introduce some overhead for memory access and finite-state
machine logic. Since the two steps have the same loop struc-
ture, it is more desirable to merge the two steps and reduce
the overhead shown as follows. The inverse of a (2
× 2) sub-
matrix is given by

F[k]-1 =   fOO(k) fO1(k)    1

f1O(k) f11(k)

=_______________1_______________( fii( k )   - f)1( k ))

foo(k) fn(k) - f)1(k) flo(k) ∖-flo(k) foo(k) ) .

(16)

Let Γ = (D ® I)h m = [Γ[0], Γ[1],..., Γ[ Lf - 1]], where
Γ[k] = [e1 (k) e2(k)] is the combination of the kth elements
of the dimension-wise FFT coe
fficients, then a merged com-
putation of the matrix inverse and multiplication is given by

W = F-1 (D ® I)hm

= diag (F[0]-1, F[1]-1,..., F[ Lf - 1]- 1)Γ

= [f[0] - 1Γ[0] T, F[1]- 1Γ[1] T,..., F[ Lf -1]- 1Γ[ Lf -1] T ].

(17)

W(k) =


Thus, we can use a single merged loop to compute the
final result of
W instead of using separate loops. Moreover,
with the
Hermitian features of F00 and F11, we can reduce the
number of real operations in the matrix inverse and multi-
plication module. This leads to a simplified equation for the
kth element of the matrix W as
f)0(k) fL1(k) - f)1(k) I

( fL1(k) O e 1(k) - f)1(k) * e2(k) ∖
■ ^-flo(k) о e2(k) - f)i(k) * eι(k)J,

(18)


where “a b,” “a O b,” and “a * b” indicate a “real × real,”
“real
× complex” and “complex × complex” multiplication,
respectively. The complex division is replaced by a real divi-
sion. From this, we derived the simplified data path with the
Hermitian optimization as in
Figure 4. In this figure, f00(k)
and
f11(k) are real numbers. The single multiplier means a
real multiplication. The multiplier with a circle means the
“real
× complex” multiplication and the multiplier with a
rectangle is a “complex
× complex” multiplication. The sim-
plified data path facilitates the scaling, and thus increases the
stability in the fixed-point implementation.

4.3.2. Receiver with 4 Rx antennas

The principle operation of interest is the inverse of the (4 × 4)
submatrices. To achieve a scalable design, we first partition
the (4
× 4) submatrices in F[i] into four (2 × 2) block sub
matrices as

f11(i)

f12(i)

f13(i)

f14(i)

F( i )4 × 4 =

f21(i)

f22 (i)

f23 (i)

f24(i)

f31(i)

f32 (i)

f33(i)

f34(i)

(19)

f41 (i)

f42(i)

f43(i)

f44(i)

=

B11(i)

B21(i)

B12(i)

B22(i)

).

The inverse of the (4 × 4) matrix can be carried out by a se-
quential inverse of four (2
× 2) submatrices. We also partition
the inverse of the (4
× 4) element matrix as

F(i)-1 = C11(i) C12(i)

F(i)    =   C21(i) C22(i) .

(20)




More intriguing information

1. A Dynamic Model of Conflict and Cooperation
2. The name is absent
3. The name is absent
4. The name is absent
5. Unemployment in an Interdependent World
6. Portuguese Women in Science and Technology (S&T): Some Gender Features Behind MSc. and PhD. Achievement
7. Stillbirth in a Tertiary Care Referral Hospital in North Bengal - A Review of Causes, Risk Factors and Prevention Strategies
8. POWER LAW SIGNATURE IN INDONESIAN LEGISLATIVE ELECTION 1999-2004
9. The name is absent
10. The name is absent
11. The name is absent
12. The name is absent
13. The name is absent
14. The value-added of primary schools: what is it really measuring?
15. Long-Term Capital Movements
16. Evidence of coevolution in multi-objective evolutionary algorithms
17. NATURAL RESOURCE SUPPLY CONSTRAINTS AND REGIONAL ECONOMIC ANALYSIS: A COMPUTABLE GENERAL EQUILIBRIUM APPROACH
18. Why unwinding preferences is not the same as liberalisation: the case of sugar
19. Learning-by-Exporting? Firm-Level Evidence for UK Manufacturing and Services Sectors
20. Feature type effects in semantic memory: An event related potentials study