are. However, ultrametrics do give an indication of closeness, and this can be
compared: firstly to the closeness indicated by features, secondly to the idea
that if elements of a sentence are not sufficiently close then there is a barrier
(Chomsky (1986b) [7]) to movement; roughly speaking, barriers impede the
movement of phrases to different places in a sentence. Only the closeness as indicated
by features is looked at here. In traditional syntax, phrases can be iteratively
embedded to give sentences of unbounded length and complexity. A degree of
sentence complexity perhaps corresponds to the height of the tree representing
the sentence. As people can only process a finite amount of information, this
height must be finite. In the traditional theoretical framework there is no
finite bound on sentence length. An upper bound could perhaps be found by
experiment. Inspection of phrase trees suggests a first guess of h = 12.
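The idea of tree height as a measure of sentence complexity can be illustrated with a minimal sketch. The nested-tuple encoding of phrase trees, the helper names `height`, `within_bound` and `embed`, and the example sentence are all assumptions made for illustration; only the trial bound h = 12 comes from the text.

```python
def height(tree):
    """Height of a phrase tree.

    Assumed encoding: a leaf is a word (a string) with height 0; an internal
    node is a tuple (label, child, child, ...) with at least one child.
    """
    if isinstance(tree, str):
        return 0
    _label, *children = tree
    return 1 + max(height(child) for child in children)


def within_bound(tree, h_max=12):
    """True if the tree is no higher than the trial bound h = 12."""
    return height(tree) <= h_max


def embed(tree, times):
    """Iteratively embed `tree` as the complement of a hypothetical 'Mary said ...' clause."""
    for _ in range(times):
        tree = ("S", ("NP", ("N", "Mary")), ("VP", ("V", "said"), tree))
    return tree


# A hypothetical example sentence: "the dog barked".
sentence = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barked")))

print(height(sentence))                  # height of the simple sentence
print(within_bound(embed(sentence, 4)))  # four embeddings
print(within_bound(embed(sentence, 5)))  # five embeddings
```

In this encoding each embedding adds two levels to the tree, so iterated embedding eventually exceeds any fixed bound on height, which is the sense in which a finite h limits the unbounded embedding of traditional syntax.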
The third is that it means that syntax is described in the same formalism as
that used in a lot of other sciences, for example those topics described in the
first paragraph of §1.1, so that there is the possibility of techniques used in
one area being deployed in another.
The fourth is that an ultrametric formulation might allow a generalization
so that ideas in syntax can be applied to other cognitive processes.
The fifth, and perhaps the most important (see the next section 1.4), is that it
might be possible to use some sort of minimum distance principle in syntax: it
could be this minimum description which would have application in other
cognitive processes. In other words, ultrametric trees should be simple rather
than complicated, and the sort of mechanism used to encode simple trees
might be used elsewhere.
1.4 Ockam’s Razor
Minimum description in science goes back several hundred years to “Ockam’s
razor” or perhaps further, see for example Sorton (1947) [37] page 552. The
principle of least action (see for example Bjorken and Drell (1965) [4] §11.2) in
physics is that requiring a given action to be stationary under variation gives
field equations which describe the dynamics of a system. For example, Maxwell’s
equations can be derived by varying a simple action. In the present context one would
hope that syntax allows for a minimum encoding of semantic information, the
minimum encoding being given by some ultrametric measure. A different
approach along these lines is that of Rissanen (1982) [31] and Zadrozny (2000)
[41]. Briefly, they assign a length of 1 to each symbol in a sentence; the
minimum description length principle then states that the best theory to explain a set
of data is the one which minimizes the sum of: i) the length, in bits, of
the description of the theory, and ii) the length, in bits, of the data when encoded
with the help of the theory. Christiansen (2001) [8] discusses how constraint
handling rules (CHR) can be applied to grammars. This can be thought of as
a minimizing procedure.
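The two-part sum in the minimum description length principle can be made concrete with a small sketch. It is not the procedure of Rissanen or Zadrozny: the two candidate "theories" (a uniform symbol model and a frequency model), the cost of 8 bits per stated parameter, and all function names are assumptions chosen only to show how the sum i) + ii) selects a theory.

```python
import math
from collections import Counter


def data_length_bits(data, probs):
    """ii) Length in bits of `data` under an ideal code for symbol probabilities `probs`."""
    return sum(-math.log2(probs[symbol]) for symbol in data)


def two_part_length(data, probs, theory_bits):
    """Sum of i) the bits describing the theory and ii) the bits for the data given it."""
    return theory_bits + data_length_bits(data, probs)


def candidate_theories(data, bits_per_parameter=8):
    """Two hypothetical theories for `data`, each given as (probs, theory_bits)."""
    alphabet = sorted(set(data))
    counts = Counter(data)
    # Theory 1 (assumption): all symbols equally likely, nothing further to state.
    uniform = ({s: 1 / len(alphabet) for s in alphabet}, 0.0)
    # Theory 2 (assumption): observed frequencies, each free parameter costing a fixed number of bits.
    frequency = (
        {s: counts[s] / len(data) for s in alphabet},
        float(bits_per_parameter * (len(alphabet) - 1)),
    )
    return {"uniform": uniform, "frequency": frequency}


def best_theory(data):
    """Name of the candidate theory with the smallest two-part description length."""
    theories = candidate_theories(data)
    return min(theories, key=lambda name: two_part_length(data, *theories[name]))


print(best_theory("abab"))                 # short, balanced data
print(best_theory("a" * 90 + "b" * 10))    # longer, skewed data
```

For the short string, the extra bits needed to state the frequency model outweigh any saving in encoding the data, so the simpler theory is preferred; for the longer, skewed string the saving in part ii) more than pays for the larger part i).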