It is possible to begin with an objective function $g(s)$, in which case the denoising can be chosen¹ to be the gradient: $f(s) = \nabla g(s)$. In practice, denoising functions can easily be designed without explicitly starting from an objective function. Such functions often work exceedingly well, and good denoising functions result in fast and accurate algorithms.
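The sketch below illustrates the gradient route. It uses a hypothetical objective $g(s) = \sum_t \log\cosh s_t$, which is the familiar FastICA contrast and is not taken from the text. For this objective the gradient is `tanh` applied element-wise. The function names `objective`, `denoise` and `numerical_gradient` are illustrative only. A central finite difference confirms that the chosen denoising really is $\nabla g$.

```python
import math


def objective(s):
    """Hypothetical objective g(s) = sum_t log cosh(s_t) (illustration only)."""
    return sum(math.log(math.cosh(v)) for v in s)


def denoise(s):
    """Denoising chosen as the gradient f(s) = grad g(s): element-wise tanh."""
    return [math.tanh(v) for v in s]


def numerical_gradient(g, s, h=1e-6):
    """Central finite-difference gradient of a scalar function g at s."""
    grad = []
    for t in range(len(s)):
        sp = list(s)
        sp[t] += h
        sm = list(s)
        sm[t] -= h
        grad.append((g(sp) - g(sm)) / (2 * h))
    return grad


s = [0.5, -1.5, 2.0]
print(numerical_gradient(objective, s))
print(denoise(s))
```

The two printed lists agree to about six decimal places. Any differentiable objective could be substituted, but as the text notes, a denoising function does not need to come from an objective at all.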
3 Accelerating and stabilising convergence by spectral shift and adaptation of learning rate
If the denoising function is not able to reduce noise significantly more than signal, the basic DSS iterations (2)–(5) may converge slowly. This is closely related to the fact that the power method converges slowly if the largest eigenvalue is only slightly larger than the next largest one. Consequently, convergence in DSS can be accelerated in much the same way as in the power method.
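The standard convergence argument for the power method makes this link concrete. It is sketched here under the usual assumptions, which the text does not state: $A$ is symmetric with eigenpairs $(\lambda_i, u_i)$, $|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \dots$, and the starting vector $w_0 = \sum_i c_i u_i$ has $c_1 \neq 0$.

```latex
w_k \propto A^k w_0
  = \sum_i c_i \lambda_i^k u_i
  = c_1 \lambda_1^k \Big( u_1
    + \sum_{i \ge 2} \frac{c_i}{c_1} \Big( \frac{\lambda_i}{\lambda_1} \Big)^{k} u_i \Big)
```

Each iteration therefore shrinks the unwanted components roughly by the factor $|\lambda_2/\lambda_1|$. With $\lambda_2/\lambda_1 = 0.95$, reducing them by a factor of $10^6$ takes about $\ln 10^{-6} / \ln 0.95 \approx 270$ iterations. With $\lambda_2/\lambda_1 = 0.5$, it takes about 20.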
A well-known speedup for the power method is the spectral shift. It is based on modifying an iteration of the form $w^+ = Aw$ into $w^+ = Aw + \beta w$. In the original iteration, $w^+ = \lambda w$ holds at the fixed points, and consequently $w^+ = (\lambda + \beta) w$ after the modification. The fixed points remain the same, but the eigenvalues $\lambda$ are shifted by $\beta$, hence the name spectral shift.
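The following sketch numerically checks both claims: the fixed points are unchanged and the eigenvalue moves by $\beta$. It uses a hypothetical 2×2 symmetric matrix with eigenvalues 3 and 1. The shifted power iteration is run for several values of $\beta$, and at the fixed point the eigenvalue is read off as $w^T w^+$. The matrix and the function names are illustrative assumptions.

```python
import math


def shifted_step(A, w, beta):
    """One un-normalised step of the shifted iteration w+ = A w + beta w."""
    n = len(w)
    return [sum(A[i][j] * w[j] for j in range(n)) + beta * w[i] for i in range(n)]


def shifted_fixed_point(A, beta, iters=200):
    """Run the shifted power iteration; return the fixed point and its eigenvalue."""
    w = [1.0, 0.0]
    for _ in range(iters):
        w_plus = shifted_step(A, w, beta)
        norm = math.sqrt(sum(x * x for x in w_plus))
        w = [x / norm for x in w_plus]
    # At a fixed point w+ = (lambda + beta) w, so w^T w+ recovers lambda + beta.
    eig = sum(a * b for a, b in zip(w, shifted_step(A, w, beta)))
    return w, eig


A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1; top eigenvector (1, 1)/sqrt(2)
for beta in (0.0, -1.0, 0.5):
    print(beta, shifted_fixed_point(A, beta))
```

Each value of $\beta$ returns the same direction $(1, 1)/\sqrt{2}$ with eigenvalue $3 + \beta$. With $\beta = -1$, the second eigenvalue is shifted exactly to zero, so the iteration reaches the fixed point after a single step.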
If all eigenvalues are large and their differences are small, convergence can be greatly accelerated by using a negative $\beta$ whose absolute value is close to the second largest eigenvalue. On the other hand, the power method converges to the eigenvector whose eigenvalue has the largest absolute value. This means that with a sufficiently negative $\beta$, the minor component is obtained instead of the principal component.
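The sketch below demonstrates both effects. It assumes hypothetical eigenvalues 10, 9.5 and 9.3, which are large and close together. The test matrix is diagonal, which loses nothing for this purpose because convergence depends only on the eigenvalues. Shifting by $\beta = -9.5$ cuts the iteration count sharply. Shifting by $\beta = -15$ makes the smallest eigenvalue dominant in absolute value, so the iteration lands on the minor component. The function names and tolerances are illustrative choices.

```python
import math


def normalise(w):
    """Scale w to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in w))
    return [x / n for x in w]


def shifted_power_method(eigvals, beta, tol=1e-12, max_iter=100_000):
    """Iterate w+ = A w + beta w with A = diag(eigvals), normalising each step.

    Returns (index of the dominant eigenvector, number of iterations used).
    """
    w = normalise([1.0] * len(eigvals))  # equal weight on every eigenvector
    for k in range(1, max_iter + 1):
        w_new = normalise([(lam + beta) * x for lam, x in zip(eigvals, w)])
        # Compare directions up to sign: a negative shifted eigenvalue flips w.
        cos = abs(sum(a * b for a, b in zip(w_new, w)))
        w = w_new
        if 1.0 - cos < tol:
            break
    dominant = max(range(len(w)), key=lambda i: abs(w[i]))
    return dominant, k


eigvals = [10.0, 9.5, 9.3]  # hypothetical: large eigenvalues with small gaps
for beta in (0.0, -9.5, -15.0):
    print(beta, shifted_power_method(eigvals, beta))
```

Without a shift, the convergence factor is $9.5/10 = 0.95$. With $\beta = -9.5$, the shifted eigenvalues are $0.5, 0, -0.2$, so the factor drops to $0.4$. With $\beta = -15$, the shifted eigenvalues are $-5, -5.5, -5.7$, and the last one dominates. The shifted iteration then returns the minor eigenvector.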
In DSS, (3) can be modified into

$$s^+ = \alpha(s) f(s) + \beta(s) s \tag{6}$$

without changing the fixed points, as long as $\alpha(s)$ and $\beta(s)$ are scalar functions. Since $\alpha(s)$ only scales the source estimate, from now on we assume $\alpha(s) = 1$.
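The sketch below shows that the shifted update (6) leaves the DSS fixed point unchanged. It relies on several assumptions made only for this illustration. The basic iteration is taken to be $s = w^T X$, $s^+ = f(s)$, $w^+ = X s^{+T}$, followed by normalisation of $w$. The data $X$ are whitened. The sources are two exactly uncorrelated sinusoids. The denoising is a hypothetical circular moving average, which favours the slow source. The shift $\beta = -0.2$ is an arbitrary illustrative value.

```python
import math

T = 200  # number of samples; both sinusoids complete whole periods


def make_whitened_mixture(theta=0.6):
    """Two uncorrelated unit-variance sources, rotated (rotation keeps X white)."""
    s1 = [math.sqrt(2) * math.sin(2 * math.pi * 1 * t / T) for t in range(T)]   # slow
    s2 = [math.sqrt(2) * math.sin(2 * math.pi * 20 * t / T) for t in range(T)]  # fast
    c, sn = math.cos(theta), math.sin(theta)
    x1 = [c * a - sn * b for a, b in zip(s1, s2)]
    x2 = [sn * a + c * b for a, b in zip(s1, s2)]
    return [x1, x2]


def smooth(sig, m=2):
    """Hypothetical denoiser: centred circular moving average over 2m+1 samples."""
    n = len(sig)
    return [sum(sig[(t + j) % n] for j in range(-m, m + 1)) / (2 * m + 1)
            for t in range(n)]


def dss(X, f, beta=0.0, iters=100):
    """DSS with the shifted denoising s+ = f(s) + beta * s (alpha = 1)."""
    w = [1.0, 0.0]
    for _ in range(iters):
        s = [w[0] * a + w[1] * b for a, b in zip(X[0], X[1])]        # s = w^T X
        s_plus = [fs + beta * st for fs, st in zip(f(s), s)]           # (6)
        w_plus = [sum(x * sp for x, sp in zip(row, s_plus)) for row in X]  # X s+^T
        norm = math.sqrt(sum(v * v for v in w_plus))
        w = [v / norm for v in w_plus]
    return w


X = make_whitened_mixture(0.6)
print(dss(X, smooth, beta=0.0))
print(dss(X, smooth, beta=-0.2))
```

Both runs converge to $(\cos 0.6, \sin 0.6)$, the mixing direction of the slow source. With whitened data, the extra term $\beta X s^T = \beta T w$ is parallel to $w$, so $\beta$ changes only the speed of convergence, not the point reached.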
In DSS, $s^+ s^T / T$ plays the role of the eigenvalue [1]. Since Gaussian signals are the least desirable ones in source separation, a reasonable choice for $\beta$ is the one that shifts the eigenvalue of Gaussian signals to zero:

$$\beta = -E\{f(\nu)\nu^T / T\}, \tag{7}$$

where $\nu$ is a normally distributed signal.
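Shifting the Gaussian eigenvalue to zero means that $\beta$ is minus that eigenvalue, $E\{f(\nu)\nu^T/T\}$. The sketch below estimates this value by Monte Carlo. It assumes that $f$ acts element-wise and that $\nu$ has unit variance. Under the first assumption, every sample contributes the same amount, so the average over $T$ reduces to the scalar expectation $E\{f(\nu)\nu\}$ with $\nu \sim N(0, 1)$. The function name, sample size and seed are illustrative choices.

```python
import math
import random


def gaussian_shift(f, n_samples=200_000, seed=0):
    """Monte Carlo estimate of beta = -E{f(nu) nu} for element-wise f, nu ~ N(0, 1).

    This is minus the 'eigenvalue' that a Gaussian signal receives under f.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        nu = rng.gauss(0.0, 1.0)
        total += f(nu) * nu
    return -total / n_samples


print(gaussian_shift(math.tanh))        # shift for a tanh denoiser
print(gaussian_shift(lambda v: 2 * v))  # linear denoiser: close to -2
```

By Stein's lemma, $E\{f(\nu)\nu\} = E\{f'(\nu)\}$ for a unit-variance Gaussian $\nu$. For `tanh`, the estimate therefore agrees with minus the mean of $1 - \tanh^2(\nu)$.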
It is interesting to note that the fixed-point equation of FastICA [2] can be interpreted within this framework, although the speedup used in FastICA is normally justified as an approximation to Newton's method. In [1], it was shown that if $\beta(s)$ is based on a linearisation of $f(s)$ around the current source estimate $s$, the spectral shift (7) becomes

$$\beta(s) = -\operatorname{tr} J(s) / T, \tag{8}$$

where $J(s)$ is the Jacobian matrix of $f$ evaluated at $s$.
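The sketch below computes (8) for an arbitrary denoising function. It estimates $\operatorname{tr} J(s)$ by central finite differences, which costs $O(T^2)$ and is meant for illustration only. For an element-wise $f$, one would instead use the derivative directly. With $f = \tanh$, $J(s)$ is diagonal with entries $1 - \tanh^2(s_t)$. The resulting update $s^+ = \tanh(s) + \beta(s)s$ mirrors the FastICA tanh update $w^+ = E\{x\, g(w^Tx)\} - E\{g'(w^Tx)\}\, w$. The function names are hypothetical.

```python
import math


def jacobian_trace(f, s, h=1e-6):
    """Central finite-difference estimate of tr J(s), where J[t][u] = d f(s)_t / d s_u."""
    tr = 0.0
    for t in range(len(s)):
        sp = list(s)
        sp[t] += h
        sm = list(s)
        sm[t] -= h
        tr += (f(sp)[t] - f(sm)[t]) / (2 * h)
    return tr


def linearised_shift(f, s):
    """Spectral shift (8): beta(s) = -tr J(s) / T."""
    return -jacobian_trace(f, s) / len(s)


def shifted_denoise(f, s):
    """Shifted denoising (6) with alpha = 1 and beta(s) from (8)."""
    beta = linearised_shift(f, s)
    return [fs + beta * st for fs, st in zip(f(s), s)]


def tanh_denoiser(s):
    return [math.tanh(v) for v in s]


s = [0.3, -1.2, 2.0, 0.0, -0.5]
print(linearised_shift(tanh_denoiser, s))
print(shifted_denoise(tanh_denoiser, s))
```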
There is some freedom in this choice, because several denoising functions have the same convergence points, namely those given by (6).