
significantly better than all of the alternatives that were examined [7].

SPEECH RECOGNITION

This section examines strategies 1, 2, and 5: contextual normalization, contextual expansion, and contextual weighting. The problem is to recognize a vowel spoken by an arbitrary speaker. There are ten continuous primary features (derived from spectral data) and two discrete contextual features (the speaker's identity and sex). The observations fall into eleven classes (eleven different vowels) [8].

For speech recognition, spectral data are the primary features for recognizing a vowel. The sex of the speaker is a contextual feature, since we can achieve better recognition by exploiting the fact that a man's voice tends to sound different from a woman's voice. Sex is not a primary feature, since knowing the speaker's sex, by itself, does not help us to recognize a vowel. The experimental design ensures this, since all speakers spoke the same set of vowels. This background knowledge lets us distinguish primary from contextual features without having to determine the probability distribution.

The data were divided into a training set and a testing set. Each of the eleven vowels was spoken six times by each speaker. The training set is from four male and four female speakers (11 × 6 × 8 = 528 observations). The testing set is from four new male and three new female speakers (11 × 6 × 7 = 462 observations). Using a wide variety of neural network algorithms, Robinson [9] achieved accuracies ranging from 33% to 56% correct on the testing set. The mean score was 49%, with a standard deviation of 6%. Table 3 summarizes Robinson's results.

Three of the five strategies discussed above were
applied to the data:

Contextual normalization: Each feature was normalized by equation (11), where the context vector c was simply the speaker's identity. The values of μ_i(c) and σ_i(c) were estimated by taking the average and standard deviation of x_i for the speaker c. In a practical application, this will require storing speech samples from a new speaker in a buffer until enough data are collected to calculate the average and standard deviation.
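
Equation (11) is not reproduced in this excerpt; assuming it is the usual per-context z-score, x_i' = (x_i − μ_i(c)) / σ_i(c), the procedure can be sketched in Python/NumPy as follows (function and array names are illustrative, not from the paper):

```python
import numpy as np

def contextual_normalization(X, speakers):
    """Normalize each primary feature relative to its speaker's statistics.

    X        : (n_samples, n_features) array of primary features.
    speakers : (n_samples,) array giving the speaker (context) of each row.
    Each feature x_i is replaced by (x_i - mu_i(c)) / sigma_i(c), where
    mu_i(c) and sigma_i(c) are estimated from all observations of speaker c.
    """
    X_norm = np.empty_like(X, dtype=float)
    for c in np.unique(speakers):
        rows = speakers == c
        mu = X[rows].mean(axis=0)      # mu_i(c): per-speaker feature means
        sigma = X[rows].std(axis=0)    # sigma_i(c): per-speaker deviations
        X_norm[rows] = (X[rows] - mu) / sigma
    return X_norm
```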

Contextual expansion: The sex of the speaker was
treated as another feature. This strategy is not applicable to
the speaker’s identity, since the speakers in the testing set
are distinct from the speakers in the training set.
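
Since contextual expansion simply treats the contextual feature as one more input, the sketch is a single column append (Python/NumPy; the 0/1 coding of sex is an assumption, since the paper does not specify an encoding):

```python
import numpy as np

def contextual_expansion(X, sex):
    """Treat the speaker's sex as an extra input feature.

    X   : (n_samples, 10) array of primary spectral features.
    sex : (n_samples,) indicator of the speaker's sex, coded here as
          0 or 1 (hypothetical coding; any consistent encoding works).
    """
    return np.column_stack([X, sex])
```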

Contextual weighting: Let x be a vector of primary features and let c be a vector of contextual features. As with contextual normalization, the context vector c is simply the speaker's identity.

Table 3: Robinson's (1989) results with the vowel data.

classifier                no. of hidden units   no. correct (of 462)   percent correct
Single-layer perceptron            -                     154                 33
Multi-layer perceptron            88                     234                 51
Multi-layer perceptron            22                     206                 45
Multi-layer perceptron            11                     203                 44
Modified Kanerva Model           528                     231                 50
Modified Kanerva Model            88                     197                 43
Radial Basis Function            528                     247                 53
Radial Basis Function             88                     220                 48
Gaussian node network            528                     252                 55
Gaussian node network             88                     247                 53
Gaussian node network             22                     250                 54
Gaussian node network             11                     211                 47
Square node network               88                     253                 55
Square node network               22                     236                 51
Square node network               11                     217                 50
Nearest neighbor                   -                     260                 56

The features were multiplied by weights, where the weight w_i for a feature x_i was the ratio of the inter-class deviation σ_i^inter to the intra-class deviation σ_i^intra:

    w_i = σ_i^inter / σ_i^intra    (12)


The inter-class deviation of a feature indicates the variation in the feature's value across class boundaries. It is the average, over all speakers c in the training set, of the standard deviation of the feature across all classes (all vowels) for a given speaker. Let σ_1, ..., σ_m be the standard deviations of x_i for each of the m speakers in the training set. The inter-class deviation of x_i is:

    σ_i^inter = (1/m) Σ_{j=1}^{m} σ_j    (13)


The intra-class deviation of a feature indicates the variation in the feature's value within a class boundary. It is the average, over all speakers in the training set and all classes, of the standard deviation of the feature for a given speaker and a given class. Let σ_{j,k}, where 1 ≤ j ≤ m and 1 ≤ k ≤ n, be the standard deviations of x_i for each of the m speakers and each of the n classes in the training set.
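
Putting equations (12) and (13) together, the weights can be computed directly from the training set. A minimal Python/NumPy sketch (array names are illustrative; the intra-class term is reconstructed from the verbal definition above, since its equation is cut off in this excerpt):

```python
import numpy as np

def contextual_weights(X, speakers, classes):
    """Compute w_i = sigma_i^inter / sigma_i^intra per feature (eq. 12).

    X        : (n_samples, n_features) array of primary features.
    speakers : (n_samples,) speaker identity (the context) of each row.
    classes  : (n_samples,) class label (vowel) of each row.
    """
    speaker_ids = np.unique(speakers)
    class_ids = np.unique(classes)

    # Inter-class deviation (eq. 13): per speaker, the standard deviation
    # of each feature across all of that speaker's observations (all
    # vowels), averaged over the m speakers.
    inter = np.mean([X[speakers == c].std(axis=0) for c in speaker_ids],
                    axis=0)

    # Intra-class deviation: the standard deviation of each feature for a
    # given speaker and a given class, averaged over all m * n
    # speaker-class pairs (follows the paper's verbal definition).
    intra = np.mean([X[(speakers == c) & (classes == k)].std(axis=0)
                     for c in speaker_ids for k in class_ids],
                    axis=0)

    return inter / intra

# Usage: multiply each primary feature by its weight before classification.
# X_weighted = X * contextual_weights(X, speakers, vowels)
```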


