ROBUST CLASSIFICATION WITH CONTEXT-SENSITIVE FEATURES



MACHINE LEARNING


DEFINITION OF CONTEXT

This section presents a precise definition of context. Let $C$ be a
finite set of classes. Let $F$ be an $n$-dimensional feature space. Let
$\mathbf{x} = (x_0, x_1, \ldots, x_n)$ be a member of $C \times F$; that
is, $(x_1, \ldots, x_n) \in F$ and $x_0 \in C$. We will use $\mathbf{x}$
to represent a variable and $\mathbf{a} = (a_0, a_1, \ldots, a_n)$ to
represent a constant in $C \times F$. Let $p$ be a probability
distribution defined on $C \times F$. In the definitions that follow, we
will assume that $p$ is a discrete distribution. It is easy to extend
these definitions for the continuous case.

Primary Feature: Feature $x_i$ (where $1 \le i \le n$) is a primary
feature for predicting the class $x_0$ when there is a value $a_i$ of
$x_i$ and there is a value $a_0$ of $x_0$ such that:

$$p(x_0 = a_0 \mid x_i = a_i) \neq p(x_0 = a_0) \tag{1}$$

In other words, the probability that $x_0 = a_0$, given $x_i = a_i$, is
different from the probability that $x_0 = a_0$.

Contextual Feature: Feature $x_i$ (where $1 \le i \le n$) is a
contextual feature for predicting the class $x_0$ when $x_i$ is not a
primary feature for predicting the class $x_0$ and there is a value
$\mathbf{a}$ of $\mathbf{x}$ such that:

$$\begin{aligned}
p(x_0 = a_0 \mid x_1 = a_1, \ldots, x_n = a_n) \neq{}
  & p(x_0 = a_0 \mid x_1 = a_1, \ldots, x_{i-1} = a_{i-1}, \\
  & \quad x_{i+1} = a_{i+1}, \ldots, x_n = a_n)
\end{aligned} \tag{2}$$

In other words, if $x_i$ is a contextual feature, then we can make a
better prediction when we know the value $a_i$ of $x_i$ than we can make
when the value is unknown, assuming that we know the values of the other
features, $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$.

The definitions above refer to the class x0. In the
following, we will assume that the class is fixed, so that
we do not need to explicitly mention the class.

Irrelevant Feature: Feature $x_i$ (where $1 \le i \le n$) is an
irrelevant feature when $x_i$ is neither a primary feature nor a
contextual feature.
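
The three definitions above can be checked mechanically when $p$ is given as a finite table. The following is a minimal sketch, not the author's procedure: it stores the joint distribution of Table 1 as a dictionary and tests Equations (1) and (2) by brute force. The function names (`prob`, `cond`, `classify`, and so on) and the tolerance `TOL` are assumptions introduced for illustration.

```python
from itertools import product

# Joint distribution p from Table 1; keys are (x0, x1, x2, x3).
TABLE_1 = {
    (0, 0, 0, 0): 0.03, (0, 0, 0, 1): 0.03, (0, 0, 1, 0): 0.08, (0, 0, 1, 1): 0.08,
    (0, 1, 0, 0): 0.07, (0, 1, 0, 1): 0.07, (0, 1, 1, 0): 0.07, (0, 1, 1, 1): 0.07,
    (1, 0, 0, 0): 0.07, (1, 0, 0, 1): 0.07, (1, 0, 1, 0): 0.07, (1, 0, 1, 1): 0.07,
    (1, 1, 0, 0): 0.03, (1, 1, 0, 1): 0.03, (1, 1, 1, 0): 0.08, (1, 1, 1, 1): 0.08,
}
TOL = 1e-9  # assumed tolerance for comparing floating-point probabilities


def prob(p, fixed):
    """Probability that x_k = v for every (k, v) in `fixed` (a marginal of p)."""
    return sum(w for a, w in p.items() if all(a[k] == v for k, v in fixed.items()))


def cond(p, a0, given):
    """p(x0 = a0 | given), or None when the conditioning event has probability 0."""
    denom = prob(p, given)
    return None if denom <= TOL else prob(p, {0: a0, **given}) / denom


def differs(u, v):
    """True when both probabilities are defined and unequal beyond TOL."""
    return u is not None and v is not None and abs(u - v) > TOL


def values(p, k):
    """Values that component k takes anywhere in the table."""
    return sorted({a[k] for a in p})


def is_primary(p, i):
    """Equation (1): p(x0 = a0 | xi = ai) != p(x0 = a0) for some a0, ai."""
    return any(
        differs(cond(p, a0, {i: ai}), prob(p, {0: a0}))
        for a0, ai in product(values(p, 0), values(p, i))
    )


def is_contextual(p, i):
    """Not primary, and Equation (2) holds for some value a of x."""
    if is_primary(p, i):
        return False
    n = len(next(iter(p))) - 1
    for a in product(*(values(p, k) for k in range(n + 1))):
        full = {k: a[k] for k in range(1, n + 1)}
        rest = {k: v for k, v in full.items() if k != i}  # drop feature i
        if differs(cond(p, a[0], full), cond(p, a[0], rest)):
            return True
    return False


def classify(p, i):
    """Label feature i as primary, contextual, or irrelevant."""
    if is_primary(p, i):
        return "primary"
    return "contextual" if is_contextual(p, i) else "irrelevant"


for i in (1, 2, 3):
    print(f"x{i}: {classify(TABLE_1, i)}")
```

On the Table 1 distribution this labels $x_1$ as primary, $x_2$ as contextual, and $x_3$ as irrelevant, matching the column headings of the table. The brute-force search over all values is exponential in $n$, so it is only practical for small discrete examples like this one.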

Context-Sensitive Feature: A primary feature $x_i$ is context-sensitive
to a contextual feature $x_j$ when there are values $a_0$, $a_i$, and
$a_j$, such that:

$$p(x_0 = a_0 \mid x_i = a_i, x_j = a_j) \neq p(x_0 = a_0 \mid x_i = a_i) \tag{3}$$

The primary concern here is strategies for handling
context-sensitive features.
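
Equation (3) can also be tested directly on a tabulated distribution. The sketch below is illustrative only: it lists the value triples $(a_0, a_i, a_j)$ for which the inequality holds, using the Table 1 distribution. The helper names (`prob`, `cond`, `witnesses`) and the tolerance are assumptions, and the code tests only the inequality; it assumes the caller has already established that $x_i$ is primary and $x_j$ is contextual.

```python
from itertools import product

# Table 1 written compactly: its probabilities do not depend on x3, so each
# (x0, x1, x2) weight is repeated for x3 = 0 and x3 = 1.
WEIGHTS = {
    (0, 0, 0): 0.03, (0, 0, 1): 0.08, (0, 1, 0): 0.07, (0, 1, 1): 0.07,
    (1, 0, 0): 0.07, (1, 0, 1): 0.07, (1, 1, 0): 0.03, (1, 1, 1): 0.08,
}
TABLE_1 = {key + (x3,): w for key, w in WEIGHTS.items() for x3 in (0, 1)}


def prob(p, fixed):
    """Probability that x_k = v for every (k, v) in `fixed`."""
    return sum(w for a, w in p.items() if all(a[k] == v for k, v in fixed.items()))


def cond(p, a0, given):
    """p(x0 = a0 | given); the caller must ensure prob(p, given) > 0."""
    return prob(p, {0: a0, **given}) / prob(p, given)


def values(p, k):
    """Values that component k takes anywhere in the table."""
    return sorted({a[k] for a in p})


def witnesses(p, i, j, tol=1e-9):
    """Tuples (a0, ai, aj, with_context, without_context) satisfying Equation (3)."""
    found = []
    for a0, ai, aj in product(values(p, 0), values(p, i), values(p, j)):
        if prob(p, {i: ai, j: aj}) <= tol:
            continue  # conditional probability undefined for this combination
        with_context = cond(p, a0, {i: ai, j: aj})
        without_context = cond(p, a0, {i: ai})
        if abs(with_context - without_context) > tol:
            found.append((a0, ai, aj, with_context, without_context))
    return found


# x1 (primary) against x2 (contextual): report the witness a0 = a1 = a2 = 1.
for a0, a1, a2, w, wo in witnesses(TABLE_1, 1, 2):
    if (a0, a1, a2) == (1, 1, 1):
        print(f"p(x0=1 | x1=1, x2=1) = {w:.2f}, p(x0=1 | x1=1) = {wo:.2f}")
```

A non-empty result for the pair $(x_1, x_2)$ means that knowing $x_2$ changes what $x_1$ says about the class. For the pair $(x_1, x_3)$ the list is empty, since conditioning on $x_3$ leaves the class probabilities unchanged.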

Table 1 illustrates the above definitions. Since $p(x_0 = 1) = 0.5$ and
$p(x_0 = 1 \mid x_1 = 1) = 0.44$, it follows that $x_1$ is a primary
feature:

$$p(x_0 = 1) \neq p(x_0 = 1 \mid x_1 = 1) \tag{4}$$
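
Both values in Equation (4) follow from summing rows of Table 1. A worked sketch of the arithmetic, grouping rows that share the same probability:

```latex
\begin{aligned}
p(x_0 = 1) &= 4(0.07) + 2(0.03) + 2(0.08) = 0.50, \\
p(x_0 = 1, x_1 = 1) &= 2(0.03) + 2(0.08) = 0.22, \\
p(x_1 = 1) &= 4(0.07) + 0.22 = 0.50, \\
p(x_0 = 1 \mid x_1 = 1) &= \frac{0.22}{0.50} = 0.44.
\end{aligned}
```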

Since $p(x_0 = a_0 \mid x_2 = a_2)$ equals $p(x_0 = a_0)$ for all values
$a_0$ and $a_2$, it follows that $x_2$ is not a primary feature. However,
$x_2$ is not an irrelevant feature, since:

$$p(x_0 = 1 \mid x_1 = 1, x_2 = 1, x_3 = 1) \neq p(x_0 = 1 \mid x_1 = 1, x_3 = 1) \tag{5}$$
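
The two sides of Equation (5) can be computed from the rows of Table 1 with $x_1 = 1$ and $x_3 = 1$. A worked sketch, rounded to two decimal places:

```latex
\begin{aligned}
p(x_0 = 1 \mid x_1 = 1, x_2 = 1, x_3 = 1)
  &= \frac{0.08}{0.07 + 0.08} = \frac{0.08}{0.15} \approx 0.53, \\
p(x_0 = 1 \mid x_1 = 1, x_3 = 1)
  &= \frac{0.03 + 0.08}{0.07 + 0.07 + 0.03 + 0.08} = \frac{0.11}{0.25} = 0.44.
\end{aligned}
```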


Therefore $x_2$ is a contextual feature. Furthermore, primary feature
$x_1$ is context-sensitive to the contextual feature $x_2$, since:

$$p(x_0 = 1 \mid x_1 = 1, x_2 = 1) = 0.53 \tag{6}$$

and

$$p(x_0 = 1 \mid x_1 = 1) = 0.44 \tag{7}$$

Finally, $x_3$ is an irrelevant feature, since, for all values $a_0$,
$a_1$, $a_2$, and $a_3$:

$$p(x_0 = a_0 \mid x_1 = a_1, x_2 = a_2, x_3 = a_3) = p(x_0 = a_0 \mid x_1 = a_1, x_2 = a_2) \tag{8}$$
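
Equation (8) can be read off Table 1: rows that differ only in $x_3$ carry the same probability. A brief sketch of the cancellation, where $q$ is shorthand introduced here (it does not appear in the original) for that shared row probability:

```latex
\begin{aligned}
p(x_0 = a_0, x_1 = a_1, x_2 = a_2, x_3 = a_3)
  &= q(a_0, a_1, a_2) \quad \text{for } a_3 \in \{0, 1\}, \\
p(x_0 = a_0 \mid x_1 = a_1, x_2 = a_2, x_3 = a_3)
  &= \frac{q(a_0, a_1, a_2)}{q(0, a_1, a_2) + q(1, a_1, a_2)} \\
  &= \frac{2\,q(a_0, a_1, a_2)}{2\,q(0, a_1, a_2) + 2\,q(1, a_1, a_2)}
   = p(x_0 = a_0 \mid x_1 = a_1, x_2 = a_2).
\end{aligned}
```

Equation (8) rules out $x_3$ being contextual. It is also not primary: summing the rows of Table 1 with $x_3 = 1$ gives $p(x_0 = 1 \mid x_3 = 1) = 0.25 / 0.50 = 0.5 = p(x_0 = 1)$, and the same holds for $x_3 = 0$.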


When $p$ is unknown, it is often possible to use background knowledge to
distinguish primary, contextual, and

Table 1: Examples of the different types of features.

class   primary   contextual   irrelevant   probability
$x_0$   $x_1$     $x_2$        $x_3$        $p$
0       0         0            0            0.03
0       0         0            1            0.03
0       0         1            0            0.08
0       0         1            1            0.08
0       1         0            0            0.07
0       1         0            1            0.07
0       1         1            0            0.07
0       1         1            1            0.07
1       0         0            0            0.07
1       0         0            1            0.07
1       0         1            0            0.07
1       0         1            1            0.07
1       1         0            0            0.03
1       1         0            1            0.03
1       1         1            0            0.08
1       1         1            1            0.08


