MACHINE LEARNING
cantly better than all of the alternatives that were
examined [7].
SPEECH RECOGNITION
This section examines strategies 1, 2, and 5: contex-
tual normalization, contextual expansion, and contextual
weighting. The problem is to recognize a vowel spoken by
an arbitrary speaker. There are ten continuous primary
features (derived from spectral data) and two discrete con-
textual features (the speaker’s identity and sex). The
observations fall in eleven classes (eleven different
vowels) [8].
For speech recognition, spectral data is a primary
feature for recognizing a vowel. The sex of the speaker is a
contextual feature, since we can achieve better recognition
by exploiting the fact that a man’s voice tends to sound
different from a woman’s voice. Sex is not a primary
feature, since knowing the speaker’s sex, by itself, does
not help us to recognize a vowel. The experimental design
ensures this, since all speakers spoke the same set of
vowels. This background knowledge lets us distinguish
primary and contextual features, without having to
determine the probability distribution.
The data were divided into a training set and a testing
set. Each of the eleven vowels was spoken six times by
each speaker. The training set is from four male and four
female speakers (11 × 6 × 8 = 528 observations). The
testing set is from four new male and three new female
speakers (11 × 6 × 7 = 462 observations). Using a wide
variety of neural network algorithms, Robinson [9]
achieved accuracies ranging from 33% to 56% correct on
the testing set. The mean score was 49%, with a standard
deviation of 6%. Table 3 summarizes Robinson’s results.
Three of the five strategies discussed above were
applied to the data:
Contextual normalization: Each feature was normalized
by equation (11), where the context vector c was simply
the speaker’s identity. The values of μ_i(c) and σ_i(c) were
estimated by taking the average and standard deviation of
x_i for the speaker c. In a practical application, this would
require buffering speech samples from a new speaker until
enough data are collected to calculate the average and
standard deviation.
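The normalization step can be sketched as follows. This is a minimal Python example; the helper name normalize_by_speaker and the array shapes are illustrative, not from the paper:

```python
import numpy as np

def normalize_by_speaker(features, speaker_ids):
    """Contextual normalization: scale each primary feature to zero
    mean and unit variance within each speaker's observations."""
    features = np.asarray(features, dtype=float)
    normalized = np.empty_like(features)
    for speaker in np.unique(speaker_ids):
        mask = speaker_ids == speaker
        mu = features[mask].mean(axis=0)    # mu_i(c) for this speaker
        sigma = features[mask].std(axis=0)  # sigma_i(c) for this speaker
        sigma[sigma == 0] = 1.0             # guard against zero variance
        normalized[mask] = (features[mask] - mu) / sigma
    return normalized

# Two speakers, three observations each, two primary features.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
              [4.0, 1.0], [6.0, 2.0], [8.0, 3.0]])
speakers = np.array([0, 0, 0, 1, 1, 1])
Xn = normalize_by_speaker(X, speakers)
```

After normalization, each feature has zero mean and unit variance within each speaker's observations, which removes the speaker-specific shift and scale before classification.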
Contextual expansion: The sex of the speaker was
treated as another feature. This strategy is not applicable to
the speaker’s identity, since the speakers in the testing set
are distinct from the speakers in the training set.
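Contextual expansion amounts to appending the contextual feature to the primary feature vector. A minimal sketch, where the data are randomly generated for illustration:

```python
import numpy as np

# Ten primary spectral features per observation (illustrative random data).
rng = np.random.default_rng(0)
primary = rng.normal(size=(528, 10))

# Sex of the speaker, coded 0 (male) / 1 (female), one value per observation.
sex = np.repeat([0, 1], 264)

# Contextual expansion: treat sex as an eleventh input feature.
expanded = np.column_stack([primary, sex])
```

The classifier is then trained on the expanded eleven-dimensional vectors instead of the ten primary features alone.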
Table 3: Robinson’s (1989) results with the vowel data.

classifier              | no. of hidden units | no. correct | percent correct
Single-layer perceptron |   -                 | 154         | 33
Multi-layer perceptron  |  88                 | 234         | 51
Multi-layer perceptron  |  22                 | 206         | 45
Multi-layer perceptron  |  11                 | 203         | 44
Modified Kanerva Model  | 528                 | 231         | 50
Modified Kanerva Model  |  88                 | 197         | 43
Radial Basis Function   | 528                 | 247         | 53
Radial Basis Function   |  88                 | 220         | 48
Gaussian node network   | 528                 | 252         | 55
Gaussian node network   |  88                 | 247         | 53
Gaussian node network   |  22                 | 250         | 54
Gaussian node network   |  11                 | 211         | 47
Square node network     |  88                 | 253         | 55
Square node network     |  22                 | 236         | 51
Square node network     |  11                 | 217         | 50
Nearest neighbor        |   -                 | 260         | 56

Contextual weighting: Let x be a vector of primary
features and let c be a vector of contextual features. As
with contextual normalization, the context vector c is
simply the speaker’s identity. The features were multiplied
by weights, where the weight w_i for a feature x_i was the
ratio of the inter-class deviation σ_i^inter to the intra-class
deviation σ_i^intra:

    w_i = σ_i^inter / σ_i^intra    (12)
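Given estimates of the two deviations, equation (12) reduces to a per-feature rescaling. A minimal sketch with hypothetical deviation values:

```python
import numpy as np

# Hypothetical deviations for three primary features.
sigma_inter = np.array([2.0, 1.0, 0.5])   # variation across classes
sigma_intra = np.array([0.5, 1.0, 2.0])   # variation within a class

# Equation (12): features that vary a lot across classes but little
# within a class receive large weights.
w = sigma_inter / sigma_intra

x = np.array([1.0, 1.0, 1.0])             # a primary feature vector
weighted = w * x
```

The effect is that a subsequent distance-based classifier pays more attention to features that discriminate between classes and less to features that merely vary with the speaker.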
The inter-class deviation of a feature indicates the
variation in a feature’s value, across class boundaries. It is
the average, for all speakers c in the training set, of the
standard deviation of the feature, across all classes (all
vowels), for a given speaker. Let σ_1, ..., σ_m be the
standard deviations of x_i for each of the m speakers in the
training set. The inter-class deviation of x_i is:

    σ_i^inter = (1/m) Σ_{j=1}^{m} σ_j    (13)
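Equation (13) can be computed directly from the training data. A minimal sketch, where the function name and the tiny one-feature data set are illustrative:

```python
import numpy as np

def inter_class_deviation(features, speaker_ids):
    """Equation (13): average, over the m speakers, of the standard
    deviation of each feature across all of that speaker's observations."""
    features = np.asarray(features, dtype=float)
    per_speaker_std = [features[speaker_ids == s].std(axis=0)
                       for s in np.unique(speaker_ids)]
    return np.mean(per_speaker_std, axis=0)

# One feature, two speakers with two observations each.
X = np.array([[1.0], [3.0], [2.0], [6.0]])
speakers = np.array([0, 0, 1, 1])
sigma_inter = inter_class_deviation(X, speakers)
```

Here the per-speaker standard deviations are std([1, 3]) = 1 and std([2, 6]) = 2, so the inter-class deviation is their average, 1.5.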
The intra-class deviation of a feature indicates the
variation in a feature’s value, within a class boundary. It is
the average, for all speakers in the training set and all
classes, of the standard deviation of the feature, for a given
speaker and a given class. Let {σ_{j,k}}, where 1 ≤ j ≤ m
and 1 ≤ k ≤ n, be the standard deviations of x_i for each of