Implementation of Rule Based Algorithm for Sandhi-Vicheda Of Compound Hindi Words




IJCSI International Journal of Computer Science Issues, Vol. 3, 2009

ISSN (Online): 1694-0784

ISSN (Print): 1694-0814


Priyanka Gupta¹, Vishal Goyal²
¹M.Tech. (ICT) Student, ²Lecturer
Department of Computer Science
Punjabi University, Patiala

Abstract

Sandhi means joining two or more words to coin a new
word. Sandhi literally means 'putting together' or
combining (of sounds). It denotes all combinatory
sound changes effected (spontaneously) for ease of
pronunciation. Sandhi-vicheda describes [5] the process
by which one letter (whether single or conjoined) is
broken to form two words. Part of the broken letter
remains as the last letter of the first word, and part of
the letter forms the first letter of the next word.
Sandhi-vicheda offers an easy and interesting way to
add an entirely new dimension to the traditional
approach to teaching Hindi. In this paper, using a
rule-based algorithm, we report an accuracy of 60-80%,
depending on the number of rules implemented.

Keywords: Rule Based Algorithm, Sandhi-Vicheda,
Compound Hindi Words

I INTRODUCTION

Natural Language Processing (NLP) refers to techniques
that attempt to make computers analyze, understand and
generate natural languages, enabling one to address a
computer as one would address a human being. Natural
Language Processing is both a modern computational
technology and a method of investigating and evaluating
claims about human language itself. It is a subfield of
artificial intelligence and computational linguistics that
studies the problems of automated generation and
understanding of natural human languages.

In written text, a word can be defined as a sequence of
characters delimited by spaces, punctuation marks, etc.
A compound word (also known as a conjoined word) can be
broken up into two or more independent words. A
Sandhi-Vicheda module breaks each compound word in a
sentence into its constituent words. Sandhi takes place
at the junction of two words when it involves a swara
(a vowel), a consonant with a halanta, or a visarga.
Sanskrit has a well-defined set of rules for
Sandhi-vicheda. Hindi has its own rules of
Sandhi-vicheda; they are, however, less well defined
and far fewer in number than the Sanskrit rules.
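The rules used in the paper are not listed at this point, so the following Python sketch only illustrates the general idea of a rule-based split for one family of swara rules, dirgha sandhi, in which two like vowels merge into the corresponding long vowel (विद्या + आलय = विद्यालय, हिम + आलय = हिमालय). Everything named in it is an assumption for illustration: the rule table `DIRGHA_RULES`, the toy lexicon `LEXICON` and the functions `tokenize`, `split_compound` and `sandhi_vicheda` are hypothetical, not the authors' implementation. The sketch tokenises a sentence on whitespace and punctuation (including the danda ।), treats each long matra (ा, ी, ू) inside a word as a candidate junction, undoes every rule that could have produced it, and keeps a split only when both halves are found in the lexicon.

```python
import re

# ASSUMPTION: an illustrative subset of Hindi swara-sandhi rules (dirgha only),
# a toy lexicon and hypothetical function names; not the paper's implementation.

# Matra written when a word ends in the given vowel ("" = inherent अ, not written).
MATRA = {"अ": "", "आ": "\u093E", "इ": "\u093F", "ई": "\u0940", "उ": "\u0941", "ऊ": "\u0942"}

# Dirgha sandhi: a long matra in the compound may come from any of these
# (final vowel of word 1, initial vowel of word 2) pairs.
DIRGHA_RULES = {
    "\u093E": [("अ", "अ"), ("अ", "आ"), ("आ", "अ"), ("आ", "आ")],  # ा
    "\u0940": [("इ", "इ"), ("इ", "ई"), ("ई", "इ"), ("ई", "ई")],  # ी
    "\u0942": [("उ", "उ"), ("उ", "ऊ"), ("ऊ", "उ"), ("ऊ", "ऊ")],  # ू
}

# Hypothetical dictionary used to validate candidate splits.
LEXICON = {"विद्या", "आलय", "हिम", "गिरि", "ईश", "भानु", "उदय"}


def tokenize(sentence):
    """Split on whitespace and punctuation, including the danda (।) and double danda (॥)."""
    return [w for w in re.split(r"[\s\u0964\u0965.,;:!?]+", sentence) if w]


def split_compound(word, lexicon=LEXICON):
    """Return every (first, second) split licensed by a rule and found in the lexicon."""
    splits = []
    # Skip index 0 (no first word) and the last index (no second word).
    for i in range(1, len(word) - 1):
        rules = DIRGHA_RULES.get(word[i])
        if rules is None:
            continue
        stem, rest = word[:i], word[i + 1:]
        for final_vowel, initial_vowel in rules:
            first = stem + MATRA[final_vowel]   # restore word 1's final vowel
            second = initial_vowel + rest       # restore word 2's initial vowel
            if first in lexicon and second in lexicon:
                splits.append((first, second))
    return splits


def sandhi_vicheda(sentence, lexicon=LEXICON):
    """Replace each word by its first licensed split; keep it unchanged otherwise."""
    result = []
    for word in tokenize(sentence):
        splits = split_compound(word, lexicon)
        result.extend(splits[0] if splits else [word])
    return result


if __name__ == "__main__":
    print(sandhi_vicheda("हिमालय और विद्यालय।"))
```

Run on the sample sentence, the sketch splits हिमालय into हिम + आलय and विद्यालय into विद्या + आलय, and leaves और untouched. Checking both halves against a lexicon is what stops the rules from over-splitting ordinary words; coverage would grow by adding rule tables for the other swara sandhi types (guna, vriddhi, yan, ayadi) and for halanta and visarga sandhi.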

1.1 The Hindi Language

Hindi is spoken in northern and central India. Linguists
regard Hindi and Urdu as the same language, the
difference being that Hindi [5] is written in the
Devanagari script and draws much of its vocabulary
from Sanskrit, while Urdu is written in the Persian
script and draws a great deal of its vocabulary from
Persian and Arabic. More than 180 million people in
India regard Hindi as their mother tongue, and another
300 million use it as a second language. Hindi is an
official language of India, is spoken by almost half a
billion people in India and throughout the world, and is
the world's second most spoken language. It allows one
to communicate with a far wider variety of people in
India than English, which is spoken by only around five
percent of the population. It is written in an easy-to-learn
phonetic script called “Devanagari”, which is also
used to write Sanskrit, Marathi and Nepali. Hindi is
normally spoken using a combination of 52 sounds: ten
vowels, 40 consonants, nasalisation and a kind of
aspiration. These sounds are represented in the
Devanagari script by 52 symbols: ten vowels, two
modifiers and 40 consonants.
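A rule-based Sandhi-vicheda program has to recognise these symbol classes before it can apply any rule, since sandhi involves vowels, halantas and visargas. The following Python sketch assigns a coarse class to each Devanagari character. It assumes Unicode input and uses code-point ranges from the Unicode Devanagari block (U+0900 to U+097F); Unicode does not follow the 52-symbol inventory above one-to-one (it also encodes dependent vowel signs, the halanta and nukta consonants separately), so the class names and the functions `classify` and `profile` are illustrative assumptions rather than the paper's own grouping.

```python
# Sketch: classify Devanagari code points (Unicode block U+0900-U+097F).
# ASSUMPTION: class names and range boundaries are illustrative, chosen from
# the Unicode chart; they are not the paper's 52-symbol grouping.


def classify(ch):
    """Return a coarse class for a single Devanagari character."""
    cp = ord(ch)
    if 0x0904 <= cp <= 0x0914 or cp in (0x0960, 0x0961):
        return "vowel"           # independent vowel letter, e.g. अ, आ, औ
    if 0x0915 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F:
        return "consonant"       # e.g. क, ख, and nukta forms such as क़
    if 0x093E <= cp <= 0x094C or cp in (0x0962, 0x0963):
        return "vowel sign"      # dependent vowel (matra), e.g. ा, ि
    if cp == 0x094D:
        return "halanta"         # virama: removes the consonant's inherent अ
    if cp == 0x0903:
        return "visarga"         # ः
    if cp in (0x0901, 0x0902):
        return "nasal modifier"  # chandrabindu ँ, anusvara ं
    return "other"               # digits, danda, nukta sign, non-Devanagari


def profile(word):
    """Pair every character of a word with its class."""
    return [(ch, classify(ch)) for ch in word]


if __name__ == "__main__":
    # दुःख ("sorrow") contains a visarga after its first syllable.
    for ch, kind in profile("दुःख"):
        print(ch, kind)
```

Working on code points rather than on rendered glyphs matters here: a visible syllable such as दु is stored as a consonant followed by a vowel sign, so any rule that inspects the end of a word has to look at these underlying characters.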

II RELATED WORK

Sandhi (in linguistics) [1] is a cover term for a wide
variety of phonological processes that occur at
morpheme or word boundaries, such as the fusion of
sounds across word boundaries and the alteration of
sounds due to neighboring sounds or due to the
grammatical function of adjacent words. Internal
sandhi features the alteration of sounds within words
at morpheme boundaries, as in sympathy
(syn- + pathy). External sandhi refers to changes found
at word boundaries, such as in the pronunciation
[tɛm bʊks] for ten books. This is not true of all dialects
of English. The linking R of some dialects of English is
a kind of external sandhi, as is the process called
liaison in the French language. While it may be
extremely common in speech, sandhi (especially
external) is typically ignored in spelling, as is the case
in English, with the exception of the distinction
between "a" and
