An Efficient Circulant MIMO Equalizer for CDMA Downlink: Algorithm and VLSI Architecture



EURASIP Journal on Applied Signal Processing

In this paper, we first present an FFT-based fast algorithm for solving
the equalizer taps. It approximates the block-Toeplitz structure of the
covariance matrix with a block-circulant matrix, which avoids a direct
matrix inverse. The inverse of the large covariance matrix is reduced to
a set of parallel FFT/IFFT operations and the inverses of much smaller
submatrices. This algorithm reduces the complexity order to
O(NF log2(F)), which makes real-time implementation much easier. An
algorithmic-level comparative study of different equalizers demonstrates
their promising performance/complexity tradeoff.
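
The reduction described above can be illustrated with a small sketch. It is not the paper's implementation: it assumes a block-circulant system with F blocks of size N × N (F a power of two), and the helper names (fft, solve_small, block_circulant_solve, block_circulant_apply) and all numeric values are hypothetical. An element-wise FFT across the block index turns the (NF × NF) solve into F independent (N × N) solves, followed by an inverse FFT.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def solve_small(a, b):
    """Solve the small N x N system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

def block_circulant_solve(blocks, rhs):
    """blocks[k] is block C_k of the first block column; rhs[p] is block p."""
    F, N = len(blocks), len(rhs[0])
    # Element-wise FFT across the block index gives F small matrices.
    lam = [[[None] * N for _ in range(N)] for _ in range(F)]
    for r in range(N):
        for c in range(N):
            col = fft([blocks[k][r][c] for k in range(F)])
            for f in range(F):
                lam[f][r][c] = col[f]
    # FFT of the right-hand side, one FFT per row index inside a block.
    bf = [fft([rhs[p][r] for p in range(F)]) for r in range(N)]
    # F independent N x N solves in the frequency domain.
    wf = [solve_small(lam[f], [bf[r][f] for r in range(N)]) for f in range(F)]
    # Inverse FFT with 1/F scaling returns to the tap domain.
    cols = [[v / F for v in fft([wf[f][r] for f in range(F)], inverse=True)]
            for r in range(N)]
    return [[cols[r][p] for r in range(N)] for p in range(F)]

def block_circulant_apply(blocks, w):
    """Direct product with the block-circulant matrix, used only as a check."""
    F, N = len(blocks), len(w[0])
    out = []
    for p in range(F):
        acc = [0j] * N
        for q in range(F):
            blk = blocks[(p - q) % F]
            for r in range(N):
                acc[r] += sum(blk[r][c] * w[q][c] for c in range(N))
        out.append(acc)
    return out

# Demo with N = 2 and F = 4; the values are made up for illustration.
C0 = [[4, 1j], [-1j, 3]]
C1 = [[1, 0.5], [0.2, 0.5j]]
C1H = [[1, 0.2], [0.5, -0.5j]]      # conjugate transpose of C1
Z = [[0, 0], [0, 0]]
blocks = [C0, C1, Z, C1H]            # first block column C_0 .. C_3
rhs = [[1, 0], [0, 1j], [2, -1], [0.5, 0.5]]
taps = block_circulant_solve(blocks, rhs)
check = block_circulant_apply(blocks, taps)
residual = max(abs(u - v) for bu, bv in zip(check, rhs) for u, v in zip(bu, bv))
print("max residual:", residual)
```

Only the F small solves depend on N; everything else is FFT work. This is consistent with the O(NF log2(F)) order quoted above when N is small.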

As far as real-time implementation is concerned, a system-on-chip (SoC)
architecture offers more parallelism, a more compact size, and lower
power consumption than general-purpose DSP processors. However, research
on SoC architectures for the MIMO-HSDPA mobile receiver remains
relatively new and active. Recently, Nokia successfully demonstrated a
single-antenna HSDPA real-time system at the CTIA’03 wireless trade show
[13, 14]. Although MIMO-VLSI implementations have been reported for
Lucent’s BLAST ASIC chip [15] and some MIMO detection algorithms [16],
the VLSI architecture design of MIMO-CDMA equalizers remains a new
research topic. To support the MIMO-CDMA downlink in a multipath fading
channel, it is necessary to explore efficient VLSI design architectures
[17] for the complex equalizer.

In the second part, we focus on VLSI-oriented optimizations of the
architecture complexity. A Hermitian optimization is proposed that
exploits the structure of the correlation coefficients and of the FFT
algorithm. A reduced-state FFT module is proposed to avoid redundant
computation of the symmetric coefficients and the zero coefficients.
This reduces both the number and the complexity of the conventional FFT
modules. On the other hand, although the (NF × NF) inverse is avoided,
the inverse of some smaller submatrices of size (N × N) is inevitable
for the MIMO receiver. For a high-order MIMO receiver, the complexity
still increases dramatically with the number of antennas. Therefore, the
Hermitian feature is applied to reduce the complexity of the submatrix
inverse. Of particular interest is the nontrivial (4 × 4) MIMO
configuration. We apply a divide-and-conquer method to partition the
(4 × 4) submatrices into four (2 × 2) submatrices. The (4 × 4) matrix
inverse is then dramatically simplified by exploiting the commonality in
a partitioned matrix inverse lemma. Generic VLSI architectures are
derived from the special design blocks to eliminate the redundancies in
the complex operations. The regulated model facilitates the design of
efficient parallel VLSI modules such as “complex-Hermitian-multiplication,”
“Hermitian inverse,” and “diagonal transform.” This leads to efficient
architectures with a further 3× complexity reduction and a more parallel
and pipelined schematic.
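
The divide-and-conquer idea can be sketched as follows. It uses the standard partitioned (Schur complement) inverse for a Hermitian (4 × 4) matrix split into (2 × 2) blocks. It illustrates the lemma only and is not the paper's hardware mapping. The helper names (mul2, sub2, herm2, inv2, hermitian_inverse_4x4) and the test matrix are hypothetical. The shared term T = A11^{-1} A12 is computed once, and the lower-left block is obtained by conjugate transposition instead of a second multiplication.

```python
def mul2(a, b):
    """Product of two 2 x 2 complex matrices."""
    return [[a[r][0] * b[0][c] + a[r][1] * b[1][c] for c in range(2)]
            for r in range(2)]

def sub2(a, b):
    """Difference of two 2 x 2 matrices."""
    return [[a[r][c] - b[r][c] for c in range(2)] for r in range(2)]

def herm2(a):
    """Conjugate transpose of a 2 x 2 matrix."""
    return [[complex(a[c][r]).conjugate() for c in range(2)] for r in range(2)]

def inv2(a):
    """Closed-form inverse of a 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def hermitian_inverse_4x4(a):
    """Invert a Hermitian 4 x 4 matrix through its four 2 x 2 blocks."""
    a11 = [row[:2] for row in a[:2]]
    a12 = [row[2:] for row in a[:2]]
    a22 = [row[2:] for row in a[2:]]
    a11_inv = inv2(a11)
    t = mul2(a11_inv, a12)               # shared term T = A11^{-1} A12
    t_h = herm2(t)                       # equals A21 A11^{-1} since A is Hermitian
    s = sub2(a22, mul2(herm2(a12), t))   # Schur complement S = A22 - A21 T
    b22 = inv2(s)
    b12 = [[-v for v in row] for row in mul2(t, b22)]   # -T S^{-1}
    b11 = sub2(a11_inv, mul2(b12, t_h))  # A11^{-1} + T S^{-1} T^H
    b21 = herm2(b12)                     # Hermitian symmetry, no extra multiply
    return [b11[0] + b12[0], b11[1] + b12[1],
            b21[0] + b22[0], b21[1] + b22[1]]

# Hypothetical Hermitian, diagonally dominant test matrix.
a = [[5, 1 + 1j, 0.5j, 1],
     [1 - 1j, 4, 0.3, -0.2j],
     [-0.5j, 0.3, 6, 1 - 0.5j],
     [1, 0.2j, 1 + 0.5j, 3]]
b = hermitian_inverse_4x4(a)
prod = [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
        for r in range(4)]
print("diagonal of A times its inverse:", [prod[r][r] for r in range(4)])
```

In this sketch the only inversions are two (2 × 2) closed-form inverses, and the Hermitian structure removes one of the block multiplications.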

In addition to minimizing the circuit area, the design needs to work
within a time budget. There are many area/time tradeoffs in the VLSI
architectures. An extensive architecture tradeoff study provides
critical insights into implementation issues that may arise during the
product development process. However, this type of SoC design space
exploration is extremely time-consuming because today's standard
trial-and-optimize approaches are usually tied to a hand-coded
VHDL/Verilog-based methodology [18, 19]. In this paper, we present a
Catapult C-based [13] high-level-synthesis (HLS) methodology which
integrates several key technologies to explore the VLSI architecture
tradeoffs extensively. Extensive design space exploration is enabled by
allocating different architecture/resource constraints in a Catapult C
architecture scheduler [13]. A synthesizable register-transfer-level
(RTL) design is generated from an algorithmic C/C++ fixed-point design,
integrated into other downstream flows, and validated on a Xilinx FPGA
prototyping platform.

The rest of the paper is organized as follows. Section 2 gives the
MIMO-CDMA downlink system model. The FFT-based circulant chip equalizer
is presented in Section 3. Section 4 presents the system-level
partitioning and the VLSI-level complexity optimization. The comparative
performance and complexity analysis are presented in Section 5. Finally,
Section 6 presents the HLS-based design space exploration and an
experimental implementation on FPGA.

2. SYSTEM MODEL FOR MIMO-CDMA DOWNLINK

The system model of the MIMO multicode CDMA downlink with M Tx antennas
and N Rx antennas is described in Figure 1. In a multicode CDMA
downlink, multiple spreading codes are assigned to a single user to
achieve a high data rate. By using spatial multiplexing, the
high-data-rate symbols are demultiplexed into KM lower-rate substreams,
where K is the number of spreading codes for data transmission. The
substreams are divided into M groups, where each substream in the group
is spread with a spreading code of spreading gain G. Each group is then
combined and scrambled with long scrambling codes and transmitted
through the mth Tx antenna. The chip-level signal at the mth transmit
antenna is given by

$$d_m(i + jG) = \sum_{k=1}^{K} s_m^k(j)\, c_m^k(i) + s_m^P(j)\, c_m^P(i),$$

where j is the symbol index, i is the chip index, and k is the index of
the composite spreading code. $s_m^k(j)$ is the jth symbol of the kth
code at the mth substream. In the following, we focus on the jth symbol
and omit the symbol index for notational simplicity.
$c_m^k(i) = c_k(i)\, c_m^{(s)}(i)$ is the composite spreading code
sequence for the kth code at the mth substream, where $c_k(i)$ is the
user-specific Hadamard code and $c_m^{(s)}(i)$ is the antenna-specific
long scrambling code. $s_m^P(j)$ denotes the pilot symbols at the mth
antenna. $c_m^P(i) = c_P(i)\, c_m^{(s)}(i)$ is the composite spreading
code for pilot symbols at the mth antenna. The received chip-level
signal at the nth Rx antenna is given by

$$r_n(i) = \sum_{m=1}^{M} \sum_{l=0}^{L_{m,n}} h_{m,n}(l)\, d_m(i - \tau_l) + z_n(i), \tag{1}$$

where $h_{m,n}(l)$ and $L_{m,n}$ are the lth path channel coefficient and
the delay spread between the mth Tx antenna and the nth Rx antenna,
respectively. $z_n(i)$ is the additive Gaussian noise at the nth receive
antenna.
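
The signal model above can be exercised with a short simulation sketch covering one symbol period. Several choices are assumptions made only for illustration and are not from the paper: chip-spaced path delays (tau_l = l chips), a random QPSK sequence standing in for the long scrambling code, Sylvester-construction Hadamard codes with row 0 used as the pilot code, and made-up channel taps. The function names (hadamard, transmit_chips, receive_chips) are hypothetical.

```python
import random

def hadamard(g):
    """Sylvester-construction Hadamard matrix of order g (a power of two)."""
    h = [[1]]
    while len(h) < g:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def transmit_chips(symbols, pilot, data_codes, pilot_code, scramble):
    """Chips of one symbol at one Tx antenna:
    d_m(i) = sum_k s_m^k c_m^k(i) + s_m^P c_m^P(i), c_m^k(i) = c_k(i) c_m^(s)(i)."""
    chips = []
    for i in range(len(scramble)):
        data = sum(s * code[i] for s, code in zip(symbols, data_codes))
        chips.append((data + pilot * pilot_code[i]) * scramble[i])
    return chips

def receive_chips(d, h, noise_std=0.0, rng=None):
    """r_n(i) = sum_m sum_l h[m][n][l] d[m][i - l] + z_n(i), assuming tau_l = l."""
    rng = rng or random.Random(0)
    n_rx, length = len(h[0]), len(d[0])
    r = []
    for n in range(n_rx):
        row = []
        for i in range(length):
            acc = 0j
            for m in range(len(d)):
                for l, tap in enumerate(h[m][n]):
                    if i - l >= 0:          # chips before the block are taken as zero
                        acc += tap * d[m][i - l]
            z = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
            row.append(acc + z)
        r.append(row)
    return r

rng = random.Random(1)
G, K, M, N = 8, 2, 2, 2
had = hadamard(G)
pilot_code, data_codes = had[0], had[1:1 + K]
qpsk = [complex(x, y) / 2 ** 0.5 for x in (1, -1) for y in (1, -1)]
scramble = [[rng.choice(qpsk) for _ in range(G)] for _ in range(M)]
symbols = [[rng.choice(qpsk) for _ in range(K)] for _ in range(M)]
d = [transmit_chips(symbols[m], qpsk[0], data_codes, pilot_code, scramble[m])
     for m in range(M)]
# Two-path channel for every antenna pair; tap values are made up.
h = [[[1.0, 0.3j] for _ in range(N)] for _ in range(M)]
r = receive_chips(d, h, noise_std=0.05, rng=rng)
print(len(r), "receive antennas,", len(r[0]), "chips each")
```

Without multipath, correlating d_m(i) with the conjugate of the composite code and dividing by G recovers each data symbol, because the Hadamard codes are orthogonal and the scrambling chips have unit magnitude. The multipath terms in (1) break this orthogonality, which is what motivates chip-level equalization.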

By packing the received chips from all the receive antennas in a vector
$\mathbf{r}(i) = [r_1(i), \ldots, r_n(i), \ldots, r_N(i)]^T$ and
collecting the $L_F = 2F + 1$ consecutive chips with center at the ith


