Yuanbin Guo et al.
defined as
$$\frac{1}{b_{11}b_{22} - |b_{21}|^{2}} \begin{pmatrix} b_{22} & -b_{21}^{*} \\ -b_{21} & b_{11} \end{pmatrix}, \tag{24}$$
where there are only real multiplications and divisions.
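As a quick numerical illustration of (24), the following minimal Python sketch inverts a (2 × 2) Hermitian matrix from its three independent elements using only real multiplications and divisions. The function name `hinv2` and the choice to pass `b11`, `b21`, `b22` separately are assumptions made for this example, not part of the original design.

```python
def hinv2(b11: float, b21: complex, b22: float):
    """Invert the 2x2 Hermitian matrix [[b11, conj(b21)], [b21, b22]].

    Returns (c11, c21, c22); the (1,2) element of the inverse is conj(c21).
    Hypothetical helper for illustration only.
    """
    # The determinant of a Hermitian matrix is real: b11*b22 - |b21|^2.
    det = b11 * b22 - (b21.real * b21.real + b21.imag * b21.imag)
    if det == 0.0:
        raise ZeroDivisionError("matrix is singular")
    # Every operation below acts on real numbers only.
    c11 = b22 / det
    c22 = b11 / det
    c21 = complex(-b21.real / det, -b21.imag / det)
    return c11, c21, c22


c11, c21, c22 = hinv2(2.0, 1 + 1j, 3.0)
```

Only the lower off-diagonal element is returned because the upper one follows by conjugation, which mirrors the way the Hermitian structure is exploited in the text.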
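Definition 4 (diagonal transform). Given the (4 × 4) Hermitian matrix A which is partitioned into four subblocks as
$$A = \begin{pmatrix} A_{11} & A_{21}^{H} \\ A_{21} & A_{22} \end{pmatrix} = A^{H},$$
the DT (diagonal transform) of A is defined as
$$\begin{aligned} T(A) &= T(A_{11}, A_{21}, A_{22}) \\ &= A_{22} - A_{21} A_{11} A_{21}^{H} \\ &= A_{22} - M(A_{21}, A_{11}) A_{21}^{H}. \end{aligned} \tag{25}$$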
With these definitions, we reduce the inversion of the (4 × 4) Hermitian matrix $F = F^{H}$ to simplified operations on (2 × 2) matrices. After some manipulation, the partitioned subblock computation equations can be mapped to the following procedure using the defined operators:
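$$\begin{aligned}
B_{\mathrm{inv}} &= \mathrm{HInv}(B_{11}) = B_{\mathrm{inv}}^{H}; \\
D &= M(B_{21}, B_{\mathrm{inv}}); \\
C_{22} &= \mathrm{HInv}\bigl(T(B_{\mathrm{inv}}, B_{21}, B_{22})\bigr); \\
C_{12} &= -M(D^{H}, C_{22}); \\
C_{11} &= B_{\mathrm{inv}} + D^{H} C_{22} D = T(-C_{22}, D^{H}, B_{\mathrm{inv}}).
\end{aligned} \tag{26}$$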
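The procedure above can be checked numerically with a short software model. The sketch below is a plain Python reference, not the VLSI design: matrices are nested lists, M is realized as an ordinary (2 × 2) product, and all function names (`hinv`, `m_op`, `t_op`, `invert_hermitian_4x4`) are hypothetical names chosen for this example.

```python
def mat_mul(x, y):
    """Ordinary 2x2 matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def herm(x):
    """Hermitian (conjugate) transpose of a 2x2 matrix."""
    return [[x[j][i].conjugate() for j in range(2)] for i in range(2)]

def neg(x):
    return [[-x[i][j] for j in range(2)] for i in range(2)]

def add(x, y):
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]

def hinv(b):
    """Inverse of a 2x2 Hermitian matrix; reads only b11, b21, b22."""
    b11, b22, b21 = b[0][0].real, b[1][1].real, b[1][0]
    det = b11 * b22 - (b21.real ** 2 + b21.imag ** 2)
    return [[b22 / det, -b21.conjugate() / det],
            [-b21 / det, b11 / det]]

def m_op(a, b):
    """M(A, B) = A B, with B assumed Hermitian."""
    return mat_mul(a, b)

def t_op(a11, a21, a22):
    """Diagonal transform T(A11, A21, A22) = A22 - M(A21, A11) A21^H."""
    return add(a22, neg(mat_mul(m_op(a21, a11), herm(a21))))

def invert_hermitian_4x4(b11, b21, b22):
    """Return the subblocks (C11, C12, C22) of the inverse; C21 = C12^H."""
    b_inv = hinv(b11)
    d = m_op(b21, b_inv)
    c22 = hinv(t_op(b_inv, b21, b22))
    c12 = neg(m_op(herm(d), c22))
    c11 = t_op(neg(c22), herm(d), b_inv)
    return c11, c12, c22


# Example subblocks of a 4x4 Hermitian matrix (values chosen arbitrarily).
B11 = [[4 + 0j, 1 - 1j], [1 + 1j, 3 + 0j]]
B21 = [[1 + 0j, 0.5j], [0j, 1 + 0j]]
B22 = [[5 + 0j, 2j], [-2j, 4 + 0j]]
C11, C12, C22 = invert_hermitian_4x4(B11, B21, B22)
```

Multiplying the original block matrix by the block matrix assembled from `C11`, `C12`, `herm(C12)`, and `C22` should give the identity up to rounding, which is a convenient sanity check before committing the data path to hardware.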
The overall computational complexity is reduced to 2 HInv operations, 2 DTs, and 1 extra CHM block. Because the sign inverter and the Hermitian formatter $[\cdot]^{H}$ consume no hardware resources at all, the computational complexity is determined by these three generic blocks. The data path of the computation shows the timing relationship between the different design modules. This regular procedure facilitates the design of efficient parallel VLSI modules, whose details are given in the following.
4.3.3. Parallel architecture modules
Now we derive efficient VLSI modules for the generic M and T operations. Because the operation M is also embedded in the T transform, we need to design the interface so that the computing architecture is reused efficiently. Grouping the computations and making careful use of interim registers eliminates the redundancy and gives a simple and generic interface to the design modules. For a single M(A, B) module, we define
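$$\begin{pmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{pmatrix} = M(A, B) = \begin{pmatrix} a_{11} \circ b_{11} + a_{12} * b_{21} & a_{11} * b_{21}^{*} + a_{12} \circ b_{22} \\ a_{21} \circ b_{11} + a_{22} * b_{21} & a_{21} * b_{21}^{*} + a_{22} \circ b_{22} \end{pmatrix}. \tag{27}$$
To extract the commonality in the M and T operations, we have the following lemma for Hermitian matrices.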
Lemma 3 (inverse 4 × 4). If $B = B^{H}$ is a (2 × 2) Hermitian matrix, then $ABA^{H}$ is also a Hermitian matrix, where A in this lemma is a general (2 × 2) matrix. The associated computation is given by 6 complex multiplications (CMs), 4 complex-real multiplications (CRMs), 4 pPow(a, b), and 2 ℜ(a, b).
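Proof. We extend the computation of $G = ABA^{H}$ as
$$G = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix} = ABA^{H} = M(A, B)A^{H} = \begin{pmatrix} a_{11} \circ b_{11} + a_{12} * b_{21} & a_{11} * b_{21}^{*} + a_{12} \circ b_{22} \\ a_{21} \circ b_{11} + a_{22} * b_{21} & a_{21} * b_{21}^{*} + a_{22} \circ b_{22} \end{pmatrix} \cdot \begin{pmatrix} a_{11}^{*} & a_{21}^{*} \\ a_{12}^{*} & a_{22}^{*} \end{pmatrix}. \tag{28}$$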
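We then group the operations for each element as
$$\begin{aligned}
g_{11} &= (a_{11} \circ b_{11}) * a_{11}^{*} + [a_{12} * b_{21} * a_{11}^{*} + a_{11} * b_{21}^{*} * a_{12}^{*}] + (a_{12} \circ b_{22}) * a_{12}^{*}, \\
g_{21} &= d_{21} * a_{11}^{*} + d_{22} * a_{12}^{*}, \\
g_{12} &= d_{11} * a_{21}^{*} + d_{12} * a_{22}^{*}, \\
g_{22} &= (a_{21} \circ b_{11}) * a_{21}^{*} + [a_{22} * b_{21} * a_{21}^{*} + a_{21} * b_{21}^{*} * a_{22}^{*}] + (a_{22} \circ b_{22}) * a_{22}^{*}.
\end{aligned} \tag{29}$$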
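We define the interim registers $\mathrm{tmp1} = (a_{11} \circ b_{11})$, $\mathrm{tmp2} = (a_{12} * b_{21})$, $\mathrm{tmp3} = (a_{12} \circ b_{22})$, $\mathrm{tmp5} = (a_{21} \circ b_{11})$, $\mathrm{tmp6} = (a_{22} * b_{21})$, $\mathrm{tmp7} = (a_{22} \circ b_{22})$. These interim values are added to generate $d_{11}$, $d_{12}$, $d_{21}$, $d_{22}$. Moreover, instead of using a general complex multiplication, we can employ the special functional components. For example, it is easy to verify that $(a_{11} \circ b_{11}) * a_{11}^{*} = \mathrm{pPow}(\mathrm{tmp1}, a_{11}^{*})$. By changing the computation order and combining common computations, we can finally show that G is a Hermitian matrix with the elements given by
$$\begin{aligned}
g_{11} &= \mathrm{pPow}(\mathrm{tmp1}, a_{11}^{*}) + 2\Re(\mathrm{tmp2}, a_{11}^{*}) + \mathrm{pPow}(\mathrm{tmp3}, a_{12}^{*}), \\
g_{21} &= d_{21} * a_{11}^{*} + d_{22} * a_{12}^{*}, \\
g_{12} &= g_{21}^{*}, \\
g_{22} &= \mathrm{pPow}(\mathrm{tmp5}, a_{21}^{*}) + 2\Re(\mathrm{tmp6}, a_{21}^{*}) + \mathrm{pPow}(\mathrm{tmp7}, a_{22}^{*}).
\end{aligned} \tag{30}$$
□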
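The grouping used in the proof can be illustrated with a small software model. The Python sketch below reuses the interim register names from the text and compares the grouped result with a direct evaluation of $ABA^{H}$. It is only a sketch under stated assumptions: the pPow and ℜ components are modeled here simply as the real part of a complex product, which is an assumption about their behavior, and the function names `aba_h_grouped` and `aba_h_direct` are hypothetical.

```python
def aba_h_grouped(a, b11, b21, b22):
    """Return (g11, g21, g22) of G = A B A^H; g12 follows as conj(g21).

    a is a general 2x2 complex matrix (nested lists); b11 and b22 are the
    real diagonal of the Hermitian matrix B and b21 its lower element.
    """
    (a11, a12), (a21, a22) = a
    # Interim registers, named as in the text.
    tmp1 = a11 * b11          # complex-real multiplication
    tmp2 = a12 * b21          # complex multiplication
    tmp3 = a12 * b22          # complex-real multiplication
    tmp5 = a21 * b11          # complex-real multiplication
    tmp6 = a22 * b21          # complex multiplication
    tmp7 = a22 * b22          # complex-real multiplication
    # Only d21 and d22 are needed for the off-diagonal element g21.
    d21 = tmp5 + tmp6
    d22 = a21 * b21.conjugate() + tmp7
    # Diagonal elements are real; the bracketed cross terms in (29) are
    # complex conjugates of each other, so they sum to twice a real part.
    g11 = ((tmp1 * a11.conjugate()).real
           + 2.0 * (tmp2 * a11.conjugate()).real
           + (tmp3 * a12.conjugate()).real)
    g22 = ((tmp5 * a21.conjugate()).real
           + 2.0 * (tmp6 * a21.conjugate()).real
           + (tmp7 * a22.conjugate()).real)
    g21 = d21 * a11.conjugate() + d22 * a12.conjugate()
    return g11, g21, g22


def aba_h_direct(a, b11, b21, b22):
    """Reference: evaluate A B A^H with plain matrix products."""
    b = [[b11, b21.conjugate()], [b21, b22]]
    ab = [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(ab[i][k] * a[j][k].conjugate() for k in range(2))
             for j in range(2)] for i in range(2)]


A = [[1 + 2j, 3 - 1j], [0.5j, 2 + 0j]]
g11, g21, g22 = aba_h_grouped(A, 2.0, 1 - 1j, 3.0)
G = aba_h_direct(A, 2.0, 1 - 1j, 3.0)
```

Comparing `g11`, `g21`, and `g22` against `G[0][0]`, `G[1][0]`, and `G[1][1]` confirms that the grouped form reproduces the full product while computing only three of the four elements.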
Thus the simplified M(A, B) RTL module can be designed as in Figure 6. Its inputs are both the real and imaginary parts of A, {a11(r/i), a12(r/i), a21(r/i), a22(r/i)}, and only the necessary elements of the Hermitian matrix B, {b11(r), b21(r/i), b22(r)}. The output ports include {tmp1, tmp2, tmp3, tmp5, tmp6, tmp7}. We only need to compute d21 and d22 to get the G elements. Built from the simplified M(A, B) module, the data path RTL module of the