Submitted to ICA’2003
FASTER TRAINING IN NONLINEAR ICA USING MISEP
Luís B. Almeida
INESC-ID, R. Alves Redol, 9, 1000-029 Lisboa, Portugal
ABSTRACT
MISEP has been proposed as a generalization of the INFO-
MAX method in two directions: (1) handling of nonlinear
mixtures, and (2) learning the nonlinearities to be used at
the outputs, making the method suitable to the separation
of components with a wide range of statistical distributions.
In all implementations up to now, MISEP has used multilayer
perceptrons (MLPs) to perform the nonlinear ICA operation.
The use of MLPs sometimes leads to relatively slow
training. This has been attributed, at least in part, to the
non-local character of the MLP’s units. This paper investi-
gates the possibility of using a network of radial basis func-
tion (RBF) units for performing the nonlinear ICA opera-
tion. It shows that the local character of the RBF network’s
units allows a significant speedup in the training of the sys-
tem. The paper gives a brief introduction to the basics of the
MISEP method, and presents experimental results showing
the speed advantage of using an RBF-based network to per-
form the ICA operation.
1. INTRODUCTION
Linear independent component analysis (ICA) is becoming
a well-researched area. Its nonlinear counterpart (nonlinear
ICA) is much less researched, but interest in this area has
been increasing, e.g. [1, 2, 3, 4, 5, 6]. In this paper we deal
with a method for performing nonlinear ICA which is an
extension of INFOMAX, called MISEP [7, 6, 8].
MISEP extends the well known INFOMAX method in
two ways: (1) it is able to perform nonlinear ICA, and (2)
it uses adaptive nonlinearities at the outputs. These nonlin-
earities are intimately related to the statistical distributions
of the components, and the adaptivity allows the method to
deal with components with a wide range of distributions.
As originally proposed, MISEP could use any parame-
terized, linear or nonlinear network to perform the ICA op-
eration. However, all previous implementations have used
multilayer perceptrons (MLPs) to perform that operation.
This has sometimes resulted in relatively slow learning.
This work was partially supported by Praxis project P/EEI/14091/1998
and by the European IST project BLISS.
In this paper, after a brief introduction to MISEP, we dis-
cuss the possible causes of this slowness, conjecturing that
it is due, at least in part, to the nonlocal character of the
MLP’s units. We test this conjecture by comparing systems
based on MLPs with systems based on radial basis function
(RBF) units, which have a local character. The experimen-
tal results confirm the validity of this conjecture. They also
show that, while the MLP-based systems could usually per-
form a good separation without the use of any explicit form
of regularization, the RBF-based ones do need an explicit
regularization.
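To make the distinction between local and nonlocal units concrete, the following minimal sketch compares the response of a single sigmoidal (MLP-style) unit with that of a single Gaussian RBF unit at points increasingly far from the origin. The unit definitions, the weights w and b, the centre c and the width sigma are illustrative assumptions and are not taken from the MISEP implementations discussed in this paper.

```python
import numpy as np

def sigmoid_unit(x, w, b):
    """MLP-style unit: tanh of an affine projection of the input."""
    return np.tanh(x @ w + b)

def rbf_unit(x, c, sigma):
    """Gaussian RBF unit centred at c with width sigma."""
    d2 = np.sum((x - c) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Points at increasing distance from the origin along the first axis.
x = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [50.0, 0.0]])

# Illustrative (assumed) parameters.
w = np.array([1.0, 0.0])
b = 0.0
c = np.array([0.0, 0.0])
sigma = 1.0

mlp_out = sigmoid_unit(x, w, b)   # stays close to 1 far from the origin
rbf_out = rbf_unit(x, c, sigma)   # decays towards 0 away from the centre

print("tanh unit:", mlp_out)
print("RBF unit: ", rbf_out)
```

In this sketch the sigmoidal unit remains active over a whole half-space, so adjusting it changes the mapping over a large region of the input space, whereas the RBF unit only influences the neighbourhood of its centre.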
The paper is organized as follows. Section 2 gives a
brief introduction to the MISEP method. Section 3 dis-
cusses the causes of the slow learning that is sometimes
observed. Section 4 describes the alternative implementation
based on RBF units and presents experimental results, and
Section 5 concludes.
2. THE MISEP METHOD
In this section we briefly summarize the MISEP method
for linear and nonlinear ICA. Given observation vectors o,
drawn from an unknown distribution, MISEP tries to find
a transformation y = F(o) (where o and y have the same
dimension n), such that the components of y are as inde-
pendent as possible, according to a mutual information cri-
terion. The mutual information of the components of y is
defined as
I(y) = \sum_i H(y_i) - H(y),    (1)

where H denotes Shannon’s entropy, for discrete variables,
or Shannon’s differential entropy,

H(y) = -\int p(y) \log p(y) \, dy,    (2)

for continuous variables, p(.) denoting the probability density
of the random variable y. The mutual information I(y)
is non-negative, and is zero only if the components of y are
mutually statistically independent. It is known to be a good
independence criterion for ICA.
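As an illustration of definition (1), the sketch below evaluates the mutual information of two discrete variables from their joint probability table. The discrete case is used only because it can be computed exactly; the paper itself is concerned with continuous variables, for which the differential entropy (2) applies. The function names and the two example tables are assumptions made for this illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # terms with p = 0 contribute 0
    return -np.sum(p * np.log(p))

def mutual_information(joint):
    """I(y) = sum_i H(y_i) - H(y) for a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    marginal_entropies = 0.0
    for axis in range(joint.ndim):
        # Marginal of component `axis`: sum over all other axes.
        other = tuple(a for a in range(joint.ndim) if a != axis)
        marginal_entropies += entropy(joint.sum(axis=other))
    return marginal_entropies - entropy(joint)

# Independent components: the joint is the product of the marginals.
independent = np.outer([0.3, 0.7], [0.6, 0.4])
# Fully dependent components: y_2 is always equal to y_1.
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)

print(mi_independent, mi_dependent)
```

For the product table the result is zero up to rounding error, while for the fully dependent table it equals log 2, in agreement with the properties of I(y) stated above.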