EURASIP Journal on Applied Signal Processing
The direct inverse of the matrix is very expensive for hard-
ware implementation.
To reduce the computation complexity, an FFT-based fast
algorithm is presented in this section. It is known that a cir-
culant matrix S can be diagonalized by the FFT operation
as S = DHΛD, where D is the FFT phase coefficient ma-
trix and Λ is a diagonal matrix whose diagonal elements are
the FFT result of the first column of the circulant matrix S.
This known lemma is applied to simplify the MIMO equal-
izer computation dramatically. It is shown that the covari-
ance matrix Rrr can be approximated by a block-circulant
matrix after we add two corner matrices as
0
'rr R rr + .
CL
0CLH
.
.. 0
00
(7)
where
/Eh [ L ] 0 0 ∖
CL = . ... 0
∖Eh [1] ■■■ Eh [ L ])
(8)
Using the extension of the diagonalization lemma and the
features of Kronecker product, the block-circulant matrix can
be decomposed as
Crr = (Dh 0 I)( ɪ Wi 0 E[iɔ (D ® I),
(9)
where W = diag(1, W-1 ■ ■ ■ W- LF 1)) and WLF = e(2n/LF)
is the phase factor coefficient for the DFT computation.
denotes the Kronecker product. By denoting
F=
( ɪ Wi 0 E[if),
(10)
it can be shown that the final MIMO equalizer taps are com-
puted as the following equation:
wopt ≈ (Dh 0 I) ■ F-1 ■ (D 0 I)hm. (11)
F = diag(F[0], F[1], ..., F[LF]) is a block-diagonal matrix
with elements taken from the element-wise FFT of the first
column of the block-circular matrix Crr. For an (M × N)
MIMO system, this reduces the inverse of an (NLF × NLF)
matrix to the inverse of subblock matrices with size (N × N).
3.3. System-on-chip (SoC) architecture partitioning
To achieve the real-time implementation, either DSP pro-
cessors or VLSI architectures could be applied. For exam-
ple, a multiple-processor architecture using TI’s DSP proces-
sors has been reported in [21] for the 3G base station im-
plementation. However, the requirement for low power con-
sumption and compact size makes it difficult to use multiple
DSPs in a mobile handset to achieve the real-time processing
power for the chip-level physical layer design, especially for
the MIMO systems. SoC architecture is a major revolution
for integrated circuits due to the unprecedented levels of
integration and many advantages on the power consump-
tion and compact size. However, the straightforward imple-
mentation of the proposed equalizer has many redundancies
in computation. Many optimizations are needed to make it
more suitable for real-time implementation. We emphasize
the interaction between architecture, system partitioning,
and pipelining in this section with these objectives: (1) pro-
pose VLSI-oriented optimizations to further reduce the com-
putation complexity; (2) implement the equalizer with the
minimum hardware resource to meet the real-time require-
ment; (3) obtain an efficient architecture with optimal par-
allelism and pipelining for the critical computation parts. To
explore the efficient architectures, we elaborate the tasks as
the following procedure.
(1) Compute the independent correlation elements [E[0],
... , E[L]], and form the first block column of circu-
lant Cr(1r) by adding the corner elements as Cr(1r) =
[E[0],...,E[L],0,...,0,EH [L],...,EH [1]]T. Each ele-
ment is an (N × N)subblockmatrix.
(2) Take the element-wise FFT of C(r1r), where the element
vectors Fn1,n2 = FFT{E(nc1),n2} and E(nc1),n2 (i) = Cr(1r)[(n1 -
i - 1) * N + n2 - 1] for i ∈ [0,LF], n 1,n2 ∈ [1,N].
(3) For m ∈ [1, M ]andn ∈ [1, N ], compute the
dimension-wise FFT of the channel estimation as
Φm = (D 0 I)hm = FFT([0, ... ,0, hm,n(L),... , hm,n(0),
0,...,0]).
(4) Compute the inverse of the block-diagonal matrix F,
where F-1 = diag(F[0]-1, ..., F[LF - 1]-1) and F[i] is
an (N × N) submatrix formed from the ith subcarrier
of Fn1,n2.
(5) Compute the matrix multiplication of the submatrices
inverse with the FFT output of channel estimation co-
efficients Ψm = F-1 Φm .
(6) Compute the dimension-wise IFFT of the multiplica-
tion results wopt ≈ (Dh 0 I)Ψm.
With a timing- and data-dependency analysis, the top-
level block diagram for the MIMO equalizer is shown in
Figure 2. The system-level pipeline is designed for bet-
ter modularity. In the front end, a correlation estimation
block takes the multiple-input samples for each chip to
compute the covariance coefficients of the first column of
Rrr. It is made circulant by adding corner to form the
matrix [E[0], ... , E[L], 0, ... , 0, E[L]H , ... , E[1]H ]. The com-
plete coefficients are written to DPRAMs and the (N × N)
element-wise FFT module computes [F[0], ..., F[LF ]] =
FFT{E[0],...,E[L],0,...,0,E[L]H ,...,E[1]H }.
Another parallel data path is for the channel estimation
and the (M × N) dimension-wise FFTs on the channel coef-
ficient vectors as in (D 0 I)hm. A submatrix inverse and mul-
tiplication block takes the FFT coefficients of both the chan-
nel estimation and correlation estimation coefficients from
DPRAMs and carries out the computation as in F-1Φm. Fi-
nally, an (M × N) dimension-wise IFFT module generates
the results for the equalizer taps wopt and sends them to the
More intriguing information
1. APPLYING BIOSOLIDS: ISSUES FOR VIRGINIA AGRICULTURE2. The name is absent
3. The name is absent
4. Bridging Micro- and Macro-Analyses of the EU Sugar Program: Methods and Insights
5. The name is absent
6. Palkkaneuvottelut ja työmarkkinat Pohjoismaissa ja Euroopassa
7. AJAE Appendix: Willingness to Pay Versus Expected Consumption Value in Vickrey Auctions for New Experience Goods
8. The ultimate determinants of central bank independence
9. AN IMPROVED 2D OPTICAL FLOW SENSOR FOR MOTION SEGMENTATION
10. AN ECONOMIC EVALUATION OF THE COLORADO RIVER BASIN SALINITY CONTROL PROGRAM