Implementation of Rule Based Algorithm for Sandhi-Vicheda Of Compound Hindi Words



IJCSI International Journal of Computer Science Issues, Vol. 3, 2009



ISSN (Online): 1694-0784

ISSN (Printed): 1694-0814

"an" (sandhi is, however, reflected in the writing
system of Sanskrit and Hindi). External sandhi effects
can sometimes become morphologized. Most tonal
languages have tone sandhi, in which the tones of
words alter according to pre-determined rules. For
example, Mandarin has four tones: a high monotone, a
rising tone, a falling-rising tone, and a falling tone. In
the common greeting nǐ hǎo, both words in isolation
would normally have the falling-rising tone. However,
this is difficult to say, so the tone on nǐ is pronounced
as ní (but still written nǐ in Hanyu Pinyin).
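
The Mandarin rule described above can be illustrated with a minimal sketch. It assumes syllables written in numbered pinyin (for example "ni3" for a falling-rising tone), and the function name apply_third_tone_sandhi is hypothetical. It looks only at adjacent pairs and does not model how longer runs of third tones are grouped in real speech.

```python
def apply_third_tone_sandhi(syllables):
    """Change a third tone to a second tone when another third tone follows.

    Syllables are numbered-pinyin strings such as "ni3" (an assumption of
    this sketch). The spelling in Hanyu Pinyin would stay unchanged; only
    the pronounced tone is modelled here.
    """
    result = list(syllables)
    for i in range(len(syllables) - 1):
        # Decide on the original tones so each pair is judged independently.
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            result[i] = syllables[i][:-1] + "2"
    return result


print(apply_third_tone_sandhi(["ni3", "hao3"]))  # ['ni2', 'hao3']
```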

The Sanskrit Sandhi engine software is not currently
available as a standalone application, since its local use
demands the installation of an HTTP server on the
user's host.

The Sandhi module [1] was developed by RCILTS-Sanskrit,
Japanese, Chinese at Jawaharlal Nehru University (JNU),
New Delhi. RCILTS, JNU is a resource center of DIT,
Government of India, for the Sanskrit language. At JNU,
work started in three languages, viz. Sanskrit,
Japanese, and Chinese. Using this module, the user can
get information about Sandhi rules and processes. The
sutra number in the Ashtadhyayi and its description are
displayed. The user can learn three types of Sandhi
(Svara Sandhi, Vyanjan Sandhi, and Hal Sandhi) through
this module. Data is in Unicode. Sandhi exceptions and
options are also incorporated. The module takes two
words as input: the first word cannot be null, but the
second word can be. A user inputs the two words and
submits the form to get the result.
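
The two-word input contract described above (first word required, second word optional) can be sketched as follows. This is not the RCILTS module or its API. The function name join_sandhi and the table VOWEL_TO_MATRA are hypothetical, and only a few Svara Sandhi cases are covered, for a first word that ends in a consonant carrying the inherent "a".

```python
# Hypothetical table: initial vowel of the second word -> vowel sign (matra)
# produced when it merges with the inherent "a" ending the first word.
VOWEL_TO_MATRA = {
    "अ": "ा", "आ": "ा",   # a + a/aa -> aa
    "इ": "े", "ई": "े",   # a + i/ii -> e
    "उ": "ो", "ऊ": "ो",   # a + u/uu -> o
    "ए": "ै", "ऐ": "ै",   # a + e/ai -> ai
}


def join_sandhi(first, second=""):
    """Join two Unicode Devanagari words with a few Svara Sandhi rules."""
    if not first:
        raise ValueError("the first word cannot be null")
    if not second:
        return first  # the second word is allowed to be empty
    last, head = first[-1], second[0]
    # U+0915..U+0939 are the Devanagari consonant letters; a word ending in
    # one of them ends in the inherent vowel "a".
    if "\u0915" <= last <= "\u0939" and head in VOWEL_TO_MATRA:
        return first + VOWEL_TO_MATRA[head] + second[1:]
    # No rule of this sketch applies: return the words simply concatenated.
    return first + second


print(join_sandhi("पर", "उपकार"))  # परोपकार
```

Exceptions and optional forms, which the module is said to incorporate, are outside the scope of this sketch.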

In work on Chinese tone sandhi [2], Cheng and Chin-Chuan
of the Phonology Laboratory, University of California,
Berkeley, examined how English stresses are interpreted
by Chinese speakers when they speak Chinese with English
words inserted, as Chinese speakers in the United States
usually do. In Mandarin Chinese, a tone-sandhi rule
changes a third tone preceding another third tone to a
second tone. Using this rule, they designed an
experiment to find out how English stresses are
interpreted in Chinese sentences. Stress does not exist
in the underlying representations of English phonology,
but in studying bilingual phenomena the phonetic level
is also important. Fry (1995) found that when a vowel
was long and of high intensity, listeners agreed that
the vowel was strongly stressed. The results of his
experiments indicate that the duration ratio has a
stronger influence on judgements of stress than the
intensity ratio. Lehiste and Peterson (1959) also
reported experiments on stress.

English l-sandhi [3] involves an allophonic alternation
in alveolar contact for word-final /l/ in connected
speech [4]. EPG data for five Scottish Standard English
and five Southern Standard British English speakers
shows that there is individual and dialectal variation in
contact patterns.

III PROBLEM DEFINITION

Developing programs that understand a natural
language is a difficult task. Natural languages are
large: they contain infinitely many different sentences,
and no matter how many sentences a person has heard or
seen, new ones can always be produced. Natural language
is also highly ambiguous. Many words have several
meanings, and sentences can have different meanings in
different contexts. Compound words are created by
joining an arbitrary number of existing words together,
which can greatly increase the vocabulary size and thus
lead to sparse-data problems. Compound words therefore
pose challenges for many NLP applications. The problem
with which this paper is concerned is the breaking up of
Hindi compound words into their constituent words. In
Hindi, words are sequences of characters, formed by
combining ‘swar’ (vowels), ‘vyanjan’ (consonants), and
‘matras’ (vowel signs). Hindi has its own rules of
Sandhi-vicheda. They are, however, less well defined
than the Sanskrit rules and far fewer in number. So my
problem is to break a compound word into its constituent
words with the help of the rules of ‘Sandhi-vicheda’ in
Hindi grammar, and to design a Graphical User Interface
that accepts a Hindi word (source text) from the
keyboard or mouse and breaks it into constituent words
(target text). The source text is converted into target
text in Unicode format.
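
As an illustration of rule-based splitting, the sketch below generates candidate splits at vowel signs that can arise from Svara Sandhi and keeps only those whose two parts are known words. It is not the algorithm of this paper. The rule table RULES, the word list LEXICON, and the function names are hypothetical, the input is assumed to be Unicode Devanagari, and only a few vowel rules are covered.

```python
# Hypothetical rule table: vowel sign (matra) found in the compound ->
# possible (ending of left word, initial vowel of right word) pairs.
# An empty ending stands for the inherent "a" of the final consonant.
RULES = {
    "ा": [("", "अ"), ("", "आ"), ("ा", "अ"), ("ा", "आ")],
    "ी": [("ि", "इ"), ("ि", "ई"), ("ी", "इ"), ("ी", "ई")],
    "ू": [("ु", "उ"), ("ु", "ऊ"), ("ू", "उ"), ("ू", "ऊ")],
    "े": [("", "इ"), ("", "ई"), ("ा", "इ"), ("ा", "ई")],
    "ो": [("", "उ"), ("", "ऊ"), ("ा", "उ"), ("ा", "ऊ")],
    "ै": [("", "ए"), ("", "ऐ"), ("ा", "ए"), ("ा", "ऐ")],
}

# Tiny illustrative lexicon; a real system would need a full word list.
LEXICON = {"पर", "उपकार", "अधीन", "शिव", "आलय",
           "कवि", "इन्द्र", "परम", "ईश्वर", "एक"}


def split_candidates(word):
    """Yield every (left, right) pair that some rule could have merged."""
    for i, ch in enumerate(word):
        for left_end, right_start in RULES.get(ch, []):
            yield word[:i] + left_end, right_start + word[i + 1:]


def sandhi_vicheda(word, lexicon=LEXICON):
    """Return the candidate splits whose two parts are both known words."""
    return [(left, right) for left, right in split_candidates(word)
            if left in lexicon and right in lexicon]


for compound in ["परोपकार", "पराधीन", "शिवालय", "कवीन्द्र", "परमेश्वर"]:
    print(compound, "->", sandhi_vicheda(compound))
```

The lexicon check matters because the rules alone are ambiguous: a long "aa" sign, for instance, may come from any of four combinations of short and long vowels, and only some of the resulting pieces are actual words.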

Compound Word           Sandhi-Vicheda

पराधीन                  पर + अधीन
भावार्थ                 भाव + अर्थ
शिवालय                  शिव + आलय
कवीन्द्र                कवि + इन्द्र
(illegible in source)   (illegible in source)
परमेश्वर                परम + ईश्वर
एकैक                    एक + एक
यथैक                    यथा + एक
परोपकार                 पर + उपकार
सन्धिच्छेद              सन्धि + छेद
विच्छेद                 वि + छेद
