DEFINITION OF CONTEXT
This section presents a precise definition of context.
Let C be a finite set of classes. Let F be an n-dimensional
feature space. Let X = (x0, x1, ..., xn) be a member of
C × F; that is, (x1, ..., xn) ∈ F and x0 ∈ C. We will use
x to represent a variable and a = (a0, a1, ..., an) to
represent a constant in C × F. Let p be a probability
distribution defined on C × F. In the definitions that follow,
we will assume that p is a discrete distribution. It is easy
to extend these definitions to the continuous case.
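As a concrete illustration of this setup, the following sketch (ours, not part of the definitions) represents a discrete p on C × F as a Python dict mapping each tuple (a0, a1, ..., an) to its probability; the distribution p_example and the helper names marginal and conditional are hypothetical.

# A minimal sketch, assuming p is stored as a dict from points of
# C x F to probabilities. The distribution below is an arbitrary
# illustration with a binary class x0 and two binary features.
p_example = {
    (0, 0, 0): 0.25,
    (0, 0, 1): 0.25,
    (1, 1, 0): 0.25,
    (1, 1, 1): 0.25,
}

def marginal(p, fixed):
    """Probability that the positions in `fixed` ({index: value}) take
    the given values, summing p over all remaining positions."""
    return sum(prob for point, prob in p.items()
               if all(point[i] == v for i, v in fixed.items()))

def conditional(p, target, given):
    """p(target | given) for {index: value} dicts; None when the
    conditioning event has probability zero."""
    denom = marginal(p, given)
    if denom == 0:
        return None
    return marginal(p, {**target, **given}) / denom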
Primary Feature: Feature xi (where 1 ≤ i ≤ n) is a
primary feature for predicting the class x0 when there is a
value ai of xi and there is a value a0 of x0 such that:

p(x0 = a0 | xi = ai) ≠ p(x0 = a0)   (1)
In other words, the probability that x0 = a0, given
xi = ai, is different from the probability that x0 = a0.
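Condition (1) can be checked mechanically. The sketch below is illustrative: it reuses the marginal and conditional helpers from above, the names values and is_primary are ours, and a small tolerance stands in for exact inequality under floating-point arithmetic.

def values(p, i):
    """Values that position i takes with nonzero probability."""
    return {point[i] for point in p}

def is_primary(p, i):
    """Test condition (1): some pair (a0, ai) satisfies
    p(x0 = a0 | xi = ai) != p(x0 = a0)."""
    for a0 in values(p, 0):
        for ai in values(p, i):
            cond = conditional(p, {0: a0}, {i: ai})
            if cond is not None and abs(cond - marginal(p, {0: a0})) > 1e-12:
                return True
    return False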
Contextual Feature: Feature xi (where 1 ≤ i ≤ n) is a
contextual feature for predicting the class x0 when xi is
not a primary feature for predicting the class x0 and there
is a value a of x such that:

p(x0 = a0 | x1 = a1, ..., xn = an) ≠
p(x0 = a0 | x1 = a1, ..., xi-1 = ai-1, xi+1 = ai+1, ..., xn = an)   (2)
In other words, if xi is a contextual feature, then we can
make a better prediction when we know the value ai of xi
than we can make when the value is unknown, assuming
that we know the values of the other features,
x1, ..., xi-1, xi+1, ..., xn.
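In the same illustrative style, condition (2) can be tested by comparing the class probability conditioned on all features against the probability with xi dropped, for each point in the support of p (scanning only the support suffices: a difference at a zero-probability class value forces a compensating difference at some supported one).

def is_contextual(p, i):
    """Test condition (2): xi is not primary, but dropping xi from a
    full conditioning event changes the class probability."""
    if is_primary(p, i):
        return False
    n = len(next(iter(p))) - 1  # number of features
    for point in p:             # candidate constants a = (a0, ..., an)
        rest = {j: point[j] for j in range(1, n + 1)}
        without_i = {j: v for j, v in rest.items() if j != i}
        full = conditional(p, {0: point[0]}, rest)
        dropped = conditional(p, {0: point[0]}, without_i)
        if (full is not None and dropped is not None
                and abs(full - dropped) > 1e-12):
            return True
    return False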
The definitions above refer to the class x0. In the
following, we will assume that the class is fixed, so that
we do not need to explicitly mention the class.
Irrelevant Feature: Feature xi (where 1 ≤ i ≤ n) is an
irrelevant feature when xi is neither a primary feature nor
a contextual feature.
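This definition translates directly, given the two sketches above:

def is_irrelevant(p, i):
    """A feature is irrelevant iff it is neither primary nor contextual."""
    return not is_primary(p, i) and not is_contextual(p, i)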
Context-Sensitive Feature: A primary feature xi is
context-sensitive to a contextual feature xj when there are
values a0, ai, and aj, such that:

p(x0 = a0 | xi = ai, xj = aj) ≠ p(x0 = a0 | xi = ai)   (3)
The primary concern here is strategies for handling
context-sensitive features.
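Condition (3) admits the same kind of illustrative test as the earlier ones, again reusing the helpers sketched above:

def is_context_sensitive(p, i, j):
    """Test condition (3): fixing contextual xj shifts the class
    probability beyond what conditioning on primary xi alone gives."""
    if not is_primary(p, i) or not is_contextual(p, j):
        return False
    for a0 in values(p, 0):
        for ai in values(p, i):
            for aj in values(p, j):
                both = conditional(p, {0: a0}, {i: ai, j: aj})
                alone = conditional(p, {0: a0}, {i: ai})
                if (both is not None and alone is not None
                        and abs(both - alone) > 1e-12):
                    return True
    return False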
Table 1 illustrates the above definitions. Since
p(x0 = 1) = 0.5 and p(x0 = 1 | x1 = 1) = 0.44, it follows
that x1 is a primary feature:

p(x0 = 1) ≠ p(x0 = 1 | x1 = 1)   (4)
Since p(x0 = a0 | x2 = a2) equals p(x0 = a0) for all values
a0 and a2, it follows that x2 is not a primary feature.
However, x2 is not an irrelevant feature, since:

p(x0 = 1 | x1 = 1, x2 = 1, x3 = 1) ≠ p(x0 = 1 | x1 = 1, x3 = 1)   (5)
Therefore x2 is a contextual feature. Furthermore, primary
feature x1 is context-sensitive to the contextual feature
x2, since:

p(x0 = 1 | x1 = 1, x2 = 1) = 0.53   (6)

and

p(x0 = 1 | x1 = 1) = 0.44   (7)
Finally, x3 is an irrelevant feature, since, for all values a0,
a1, a2, and a3:

p(x0 = a0 | x1 = a1, x2 = a2, x3 = a3) = p(x0 = a0 | x1 = a1, x2 = a2)   (8)
When p is unknown, it is often possible to use background
knowledge to distinguish primary, contextual, and irrelevant
features.
Table 1: Examples of the different types of features.

class x0 | primary x1 | contextual x2 | irrelevant x3 | probability
0 | 0 | 0 | 0 | 0.03
0 | 0 | 0 | 1 | 0.03
0 | 0 | 1 | 0 | 0.08
0 | 0 | 1 | 1 | 0.08
0 | 1 | 0 | 0 | 0.07
0 | 1 | 0 | 1 | 0.07
0 | 1 | 1 | 0 | 0.07
0 | 1 | 1 | 1 | 0.07
1 | 0 | 0 | 0 | 0.07
1 | 0 | 0 | 1 | 0.07
1 | 0 | 1 | 0 | 0.07
1 | 0 | 1 | 1 | 0.07
1 | 1 | 0 | 0 | 0.03
1 | 1 | 0 | 1 | 0.03
1 | 1 | 1 | 0 | 0.08
1 | 1 | 1 | 1 | 0.08
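To tie the worked example together, Table 1 can be encoded in the dict representation sketched earlier and the claims in the text checked numerically (printed values are approximate under floating-point arithmetic):

table1 = {
    (0, 0, 0, 0): 0.03, (0, 0, 0, 1): 0.03,
    (0, 0, 1, 0): 0.08, (0, 0, 1, 1): 0.08,
    (0, 1, 0, 0): 0.07, (0, 1, 0, 1): 0.07,
    (0, 1, 1, 0): 0.07, (0, 1, 1, 1): 0.07,
    (1, 0, 0, 0): 0.07, (1, 0, 0, 1): 0.07,
    (1, 0, 1, 0): 0.07, (1, 0, 1, 1): 0.07,
    (1, 1, 0, 0): 0.03, (1, 1, 0, 1): 0.03,
    (1, 1, 1, 0): 0.08, (1, 1, 1, 1): 0.08,
}

print(marginal(table1, {0: 1}))                   # ~0.5
print(conditional(table1, {0: 1}, {1: 1}))        # ~0.44, as in (4)
print(conditional(table1, {0: 1}, {1: 1, 2: 1}))  # ~0.53, as in (6)
print(is_primary(table1, 1))                      # True
print(is_contextual(table1, 2))                   # True
print(is_irrelevant(table1, 3))                   # True
print(is_context_sensitive(table1, 1, 2))         # True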