ROBUST CLASSIFICATION WITH CONTEXT-SENSITIVE FEATURES



MACHINE LEARNING



the m speakers and n classes in the training set. The intra-class deviation of x_i is:

\sigma_i^{\mathrm{intra}} = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} \sigma_{i,j,k}    (14)

The ratio of inter-class deviation to intra-class deviation is
high when a feature varies greatly across class boundaries,
but varies little within a class. A high weight (a high ratio)
suggests that the feature will be useful for classification.
This is a form of contextual weighting, because the weight
is calculated on the basis of the speaker’s identity, which is
a contextual feature.
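The weighting scheme described above can be illustrated in code. The following is a minimal sketch, not the paper's implementation: the function name contextual_weights is ours, and the inter-class deviation is assumed here to be the standard deviation of the per-class feature means, while the intra-class deviation follows equation (14).

```python
import numpy as np

def contextual_weights(X, speakers, classes):
    """Weight each feature by its inter-class / intra-class deviation ratio.

    X        : (n_obs, n_features) array of primary features
    speakers : (n_obs,) array of speaker identities (the contextual feature)
    classes  : (n_obs,) array of class labels

    Intra-class deviation (eq. 14): the standard deviation of a feature
    within each (speaker, class) cell, averaged over the m speakers and
    n classes.  Inter-class deviation (an assumption here): the standard
    deviation of the per-class feature means.
    """
    n_features = X.shape[1]
    spk_ids, cls_ids = np.unique(speakers), np.unique(classes)

    # intra: average sigma_{i,j,k} over speakers j and classes k
    intra = np.zeros(n_features)
    cells = 0
    for j in spk_ids:
        for k in cls_ids:
            cell = X[(speakers == j) & (classes == k)]
            if len(cell) > 1:
                intra += cell.std(axis=0)
                cells += 1
    intra /= cells

    # inter: deviation of the class means
    class_means = np.array([X[classes == k].mean(axis=0) for k in cls_ids])
    inter = class_means.std(axis=0)

    return inter / intra  # high ratio => feature useful for classification
```

A feature that shifts strongly between classes but is stable within each (speaker, class) cell receives a high weight, matching the intuition in the text.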

Table 4 shows the results of using different combinations of these three strategies with IBL. These results show a form of synergy: the sum of the improvements from each strategy used separately is less than the improvement from the three strategies used together ((58 - 56) + (55 - 56) + (58 - 56) = 3 percentage points for the sum of the separate improvements, versus 66 - 56 = 10 percentage points for the three strategies combined).

The three strategies were also tested with cascade-correlation (CC) [5]. Because of the time required to train CC, results were gathered for only two cases. With no preprocessing, cascade-correlation correctly classified 216 of the 462 observations (47%). With preprocessing by all three strategies, it correctly classified 236 observations (51%). This shows that contextual information can benefit both neural network and nearest neighbor pattern recognition.

HEPATITIS PROGNOSIS

As in the previous section, this section examines strategies 1, 2, and 5: contextual normalization, contextual expansion, and contextual weighting. The problem is to
determine whether hepatitis patients will live or die from
their disease. There are seventeen primary features, of
which twelve are discrete (such as “patient is taking
steroids”, “patient reports fatigue”) and five are continu-
ous (such as “patient’s bilirubin level”). There are two
contextual features, of which one is discrete (patient’s sex)
and one is continuous (patient’s age). The patient’s sex
was not used in the following experiments, since 90% of
the patients were male. The observations fall in two
classes (live or die) [10]. There are many missing values in
the hepatitis data. These were filled in by using the single-
nearest neighbor algorithm with the training data.
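The missing-value treatment mentioned above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name impute_1nn is ours, and since the paper does not specify a distance metric, plain Euclidean distance over the features both observations share is assumed.

```python
import numpy as np

def impute_1nn(X):
    """Fill missing values (NaN) using the single nearest neighbour.

    For each observation with missing entries, the nearest complete
    observation is found by Euclidean distance over the features the
    incomplete observation does have, and that neighbour's values fill
    the gaps.
    """
    X = X.astype(float).copy()
    # complete cases: rows with no missing values
    complete = X[~np.isnan(X).any(axis=1)]
    for row in X:
        missing = np.isnan(row)
        if not missing.any():
            continue
        # distance measured only over the features present in this row
        d = np.sqrt(((complete[:, ~missing] - row[~missing]) ** 2).sum(axis=1))
        row[missing] = complete[d.argmin(), missing]
    return X
```

In the experiments described above, the neighbour search would be restricted to the training data, so that the test set does not influence the imputation.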

For hepatitis prognosis, bilirubin level is a primary
feature for determining whether the patient will die from
the disease. The age of the patient is a contextual feature,
since we can achieve more accurate prognoses by using
the patient’s age. Age is not a primary feature, since
knowing the patient’s age,
by itself, does not help us to
make a prognosis. In support of this claim, compare rows
one and three in Table 5. Adding age as a feature actually
reduces accuracy. Background knowledge does not help us
to determine whether age is primary or contextual, since it
is plausible that the patient’s age could be a primary factor
in hepatitis prognosis. In this case, we must use the data to
estimate the probability distribution. The data suggest that
age is a contextual feature.

The data were divided into a training set and a testing
set. Unlike the previous two experiments, there was no
systematic distinction between the training and testing
sets. The data consist of 155 observations, which were
randomly split to make 10 pairs of training and testing
sets. In each pair, there were 100 training observations and
55 testing observations. Thus the total number of observa-
tions for testing purposes was 550.
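The splitting protocol just described can be sketched directly. A minimal sketch, assuming only what the text states; the function name make_splits is ours.

```python
import numpy as np

def make_splits(n_obs=155, n_train=100, n_pairs=10, seed=0):
    """Randomly split the observations into training/testing pairs.

    Mirrors the protocol above: 10 random splits of the 155 hepatitis
    observations into 100 training and 55 testing cases, giving
    10 * 55 = 550 test observations in total.
    """
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n_pairs):
        perm = rng.permutation(n_obs)
        pairs.append((perm[:n_train], perm[n_train:]))
    return pairs
```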

Three of the five strategies discussed above were
applied to the data:

Contextual normalization: Each feature was normalized by equation (11), where the context vector c is simply the patient's age. Age was converted into a discrete feature by dividing it into five intervals, with an equal number of observations in each interval.
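This normalization step can be sketched as follows. A minimal sketch under two assumptions: the function name normalize_by_age is ours, and equation (11) is taken to be the usual (x - mu(c)) / sigma(c) transform computed within each context c.

```python
import numpy as np

def normalize_by_age(X, age, n_bins=5):
    """Contextual normalization of primary features by patient age.

    Age is discretized into n_bins equal-frequency intervals; within
    each interval, every feature is normalized to zero mean and unit
    variance (the assumed form of equation (11)).
    """
    X = X.astype(float).copy()
    # equal-frequency bin edges from the age quantiles
    edges = np.quantile(age, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(age, edges)
    for b in np.unique(bins):
        idx = bins == b
        mu = X[idx].mean(axis=0)
        sigma = X[idx].std(axis=0)
        sigma[sigma == 0] = 1.0  # guard against constant features
        X[idx] = (X[idx] - mu) / sigma
    return X
```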

Table 4: The three strategies applied to the vowel data.

  strategy 1:      strategy 2:     strategy 5:     no. correct   percent
  contextual       contextual      contextual      (of 462)      correct
  normalization    expansion       weighting
  No               No              No              258           56
  No               No              Yes             269           58
  No               Yes             No              253           55
  No               Yes             Yes             272           59
  Yes              No              No              267           58
  Yes              No              Yes             295           64
  Yes              Yes             No              273           59
  Yes              Yes             Yes             305           66

Table 5: The three strategies applied to the hepatitis data.

  strategy 1:      strategy 2:     strategy 5:     no. correct   percent
  contextual       contextual      contextual      (of 550)      correct
  normalization    expansion       weighting
  No               No              No              393           71
  No               No              Yes             393           71
  No               Yes             No              390           71
  No               Yes             Yes             391           71
  Yes              No              No              454           83
  Yes              No              Yes             460           84
  Yes              Yes             No              457           83
  Yes              Yes             Yes             464           84


