Fig. 1. a) Original data set, b) after sphering, and c) after denoising. After these steps,
the projection yielding the best signal-to-noise ratio, indicated by the arrow, can be obtained
by simple correlation-based learning.
where the N × T matrix S contains the sources, the M × T matrix X contains the
observations, and ν is additive noise. If the sources are assumed Gaussian, this is a
general linear factor analysis model with rotational invariance.
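To make the model concrete, the following Python/NumPy sketch generates data from this linear mixing model; the dimensions, the Laplacian source distribution, and the noise level are illustrative choices of ours, not taken from the text:

import numpy as np

rng = np.random.default_rng(0)
N, M, T = 2, 3, 10000                     # sources, observations, samples
S = rng.laplace(size=(N, T))              # N x T source matrix (super-Gaussian here)
A = rng.standard_normal((M, N))           # M x N mixing matrix
nu = 0.1 * rng.standard_normal((M, T))    # additive observation noise
X = A @ S + nu                            # M x T observation matrix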
DSS, like many other computationally efficient ICA algorithms, resorts to
sphering. In the case of DSS, the main reason is that after sphering, denoising
combined with simple correlation-based estimation, akin to Hebbian learning
(on-line) or the power method (batch), is able to retrieve the signal with the highest
SNR. Here the SNR is implicitly defined by the denoising. The effect of sphering
and subsequent denoising is depicted in Fig. 1.
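As a rough sketch of the sphering step, one standard approach is symmetric whitening with the inverse square root of the sample covariance; the helper name sphere below is ours:

import numpy as np

def sphere(X):
    # Sphere (whiten) an M x T data matrix: zero mean, identity sample covariance.
    # (Assumes the covariance is full rank; in practice small eigenvalues
    # may need regularisation or dimension reduction.)
    Xc = X - X.mean(axis=1, keepdims=True)   # remove each channel's mean
    C = Xc @ Xc.T / Xc.shape[1]              # M x M sample covariance
    d, E = np.linalg.eigh(C)                 # eigendecomposition of C
    V = E @ np.diag(d ** -0.5) @ E.T         # symmetric whitening matrix C^(-1/2)
    return V @ Xc                            # sphered data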
Assuming that X is already sphered and f(s) is the denoising procedure, a
simple DSS algorithm can be written as follows:
\begin{align}
s &= w^T X \tag{2} \\
s^+ &= f(s) \tag{3} \\
w^+ &= X (s^+)^T \tag{4} \\
w_{\text{new}} &= \operatorname{orth}(w^+) \,, \tag{5}
\end{align}
where s is the source estimate (a row vector), s^+ is the denoised source esti-
mate, w is the previous weight vector (a column vector), and w^+ and w_new denote
the new weight vector before and after orthonormalisation, respectively (e.g., deflatory
or symmetric orthogonalisation as in FastICA [2]).
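A minimal one-unit implementation of Eqs. (2)-(5) could look as follows, assuming X has already been sphered; for a single unit, orth() reduces to normalisation to unit length:

import numpy as np

def dss_one_unit(X, f, n_iter=100):
    # One-unit DSS iteration on a sphered M x T data matrix X,
    # with denoising function f acting on a length-T source estimate.
    rng = np.random.default_rng(0)
    w = rng.standard_normal(X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        s = w @ X                            # Eq. (2): source estimate
        s_plus = f(s)                        # Eq. (3): denoised source estimate
        w_plus = X @ s_plus                  # Eq. (4): correlation-based update
        w = w_plus / np.linalg.norm(w_plus)  # Eq. (5): orth() for one unit
    return w, w @ X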
Note that if X were not sphered and no denoising were applied, i.e., f(s) = s,
the above equations would describe the power method for computing the princi-
pal eigenvector of the data covariance matrix. When X is sphered, all eigenvalues
are equal to one, and without denoising the solution is degenerate, i.e., any unit
vector w is a fixed point of the iteration. This shows that for sphered X, even
the slightest denoising f(s) can determine the convergence point.
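This degeneracy is easy to check numerically: after sphering, X X^T / T is the identity matrix, so a power-method step without denoising returns w essentially unchanged. A small demonstration, reusing the sphere helper sketched above:

import numpy as np

rng = np.random.default_rng(1)
X = sphere(rng.standard_normal((3, 10000)))  # any data set, sphered
w = rng.standard_normal(3)
w /= np.linalg.norm(w)
w_step = X @ (X.T @ w) / X.shape[1]          # power-method step with f(s) = s
print(np.allclose(w_step, w))                # True: every unit vector is a fixed point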
If, for instance, f(s) is chosen to be low-pass filtering, the signals are implicitly
assumed to contain relatively more low frequencies than the noise, and the above itera-
tion converges to the signal with the most low-frequency content. If, on the
other hand, f(s) is a shrinkage function, suppressing small components
of s while leaving large components relatively untouched, the signals are implicitly
assumed to have heavy tails and thus super-Gaussian distributions.
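As illustrations of these two choices, a moving-average low-pass filter and a soft-threshold shrinkage could be written as follows; the filter width and threshold are arbitrary values of ours:

import numpy as np

def lowpass(s, width=10):
    # Low-pass denoising: a moving average favours slowly varying
    # (low-frequency) sources.
    return np.convolve(s, np.ones(width) / width, mode='same')

def shrink(s, thresh=1.0):
    # Shrinkage denoising: suppresses small values while keeping large ones,
    # implicitly favouring heavy-tailed, super-Gaussian sources.
    return np.sign(s) * np.maximum(np.abs(s) - thresh, 0.0)

Either function can be passed as f to the dss_one_unit sketch above, e.g., dss_one_unit(sphere(X), lowpass).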