Implementation of Rule Based Algorithm for Sandhi-Vicheda Of Compound Hindi Words




IJCSI International Journal of Computer Science Issues, Vol. 3, 2009

ISSN (Online): 1694-0784

ISSN (Print): 1694-0814


Priyanka Gupta¹, Vishal Goyal²
¹M.Tech. (ICT) Student, ²Lecturer
Department of Computer Science
Punjabi University Patiala

Abstract

Sandhi means joining two or more words to coin a new word. Sandhi literally means 'putting together' or combining (of sounds); it denotes all the combinatory sound changes effected (spontaneously) for ease of pronunciation. Sandhi-vicheda describes [5] the process by which one letter (whether single or conjoined) is broken to form two words: part of the broken letter remains as the last letter of the first word, and part of it forms the first letter of the next word. Sandhi-vicheda offers an easy and interesting way to add an entirely new dimension to the traditional approach to teaching Hindi. In this paper, using a rule-based algorithm, we report an accuracy of 60-80%, depending on the number of rules implemented.
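The splitting process described above can be illustrated with a minimal Python sketch. It assumes that a rule is a triple (vowel sign seen at the junction, ending restored on the first word, vowel restored at the start of the second word) and that a candidate split is accepted only if both halves occur in a lexicon. The names `RULES`, `LEXICON` and `sandhi_vicheda` are hypothetical, the four rules cover only two vowel-sandhi patterns (dirgha and guna), and the toy lexicon stands in for a real Hindi dictionary; this is not the authors' rule set or implementation.

```python
# Hypothetical reverse-sandhi rules (illustrative only, not the paper's rule set).
# Each rule: (vowel sign seen at the junction,
#             ending restored on the first word,
#             vowel restored at the start of the second word).
RULES = [
    ("\u093e", "\u093e", "\u0906"),  # ा  <-  आ + आ  (dirgha sandhi)
    ("\u093e", "", "\u0906"),        # ा  <-  अ + आ  (dirgha sandhi, inherent अ)
    ("\u0947", "", "\u0907"),        # े  <-  अ + इ  (guna sandhi)
    ("\u0947", "\u093e", "\u0908"),  # े  <-  आ + ई  (guna sandhi)
]

# Hypothetical toy lexicon used to accept or reject a candidate split.
LEXICON = {"विद्या", "आलय", "हिम", "नर", "इन्द्र", "महा", "ईश"}


def sandhi_vicheda(word, rules=RULES, lexicon=LEXICON):
    """Return every (first, second) split licensed by a rule and the lexicon."""
    splits = []
    for i, ch in enumerate(word):
        if i == 0:
            continue  # a vowel sign cannot start a word
        for sign, left_end, right_start in rules:
            if ch != sign:
                continue
            # The vowel sign at position i is "broken": part goes back onto
            # the first word, part becomes the first letter of the second.
            left = word[:i] + left_end
            right = right_start + word[i + 1:]
            if left in lexicon and right in lexicon:
                splits.append((left, right))
    return splits


for compound in ["विद्यालय", "हिमालय", "नरेन्द्र", "महेश"]:
    print(compound, "->", sandhi_vicheda(compound))
```

With these assumptions the loop prints one split per word, e.g. `विद्यालय -> [('विद्या', 'आलय')]`. Because every rule is tried at every vowel sign, coverage grows with the number of rules, while the lexicon check discards splits that do not yield two valid words.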

Keywords: Rule-Based Algorithm, Sandhi-Vicheda, Compound Hindi Words

I INTRODUCTION

Natural Language Processing (NLP) refers to techniques that attempt to make computers analyze, understand and generate natural languages, enabling one to address a computer as one would address a human being. Natural Language Processing is both a modern computational technology and a method of investigating and evaluating claims about human language itself. It is a subfield of artificial intelligence and computational linguistics that studies the problems of automated generation and understanding of natural human languages.

In written text, a word can be defined as a sequence of characters delimited by spaces, punctuation marks, etc. A compound word (also known as a conjoined word) can be broken up into two or more independent words, and a Sandhi-Vicheda module breaks each compound word in a sentence into its constituent words. Sandhi takes place wherever there is a swara (a vowel), a consonant with a halanta, or a visarga. Sanskrit has a well-defined set of rules for sandhi-vicheda. Hindi has its own rules of sandhi-vicheda; they are, however, less well defined than the Sanskrit rules and far fewer in number.
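The three conditions above (swara, halanta, visarga) suggest a first pass that scans a word for positions where a junction could lie. The following is a minimal sketch, assuming candidate points are identified purely by Unicode code point: vowels are the independent vowels U+0904 to U+0914 and the dependent vowel signs (matras) U+093E to U+094C, the halanta is the virama U+094D, and the visarga is U+0903. The function names are hypothetical, and the range test ignores the extended Devanagari characters.

```python
VIRAMA = "\u094d"   # halanta (्)
VISARGA = "\u0903"  # visarga (ः)


def is_vowel(ch):
    """True for independent vowels (ऄ..औ) and dependent vowel signs (ा..ौ)."""
    cp = ord(ch)
    return 0x0904 <= cp <= 0x0914 or 0x093E <= cp <= 0x094C


def candidate_points(word):
    """Return (index, kind) for every position where a sandhi junction may lie."""
    points = []
    for i, ch in enumerate(word):
        if i == 0 or i == len(word) - 1:
            continue  # a junction cannot sit at either edge of the word
        if ch == VIRAMA:
            points.append((i, "halanta"))
        elif ch == VISARGA:
            points.append((i, "visarga"))
        elif is_vowel(ch):
            points.append((i, "vowel"))
    return points


print(candidate_points("जगन्नाथ"))
print(candidate_points("निःसंदेह"))
```

The points are only candidates. For जगन्नाथ (जगत् + नाथ) the halanta at index 3 marks the real junction, while the vowel sign at index 5 is a false candidate; a later rule-matching step, such as a lexicon lookup, has to decide which positions are genuine word boundaries.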

1.1 The Hindi Language

Hindi is spoken in northern and central India. Many linguists regard Hindi and Urdu as the same language, the difference being that Hindi [5] is written in the Devanagari script and draws much of its vocabulary from Sanskrit, while Urdu is written in the Persian script and draws a great deal of its vocabulary from Persian and Arabic. More than 180 million people in India regard Hindi as their mother tongue, and another 300 million use it as a second language. Hindi is an official language of India; it is spoken by almost half a billion people in India and throughout the world, and is the world's second most spoken language. It allows one to communicate with a far wider range of people in India than English, which is spoken by only around five percent of the population. It is written in an easy-to-learn phonetic script called “Devanagari”, which is also used to write Sanskrit, Marathi and Nepali. Hindi is normally spoken using a combination of 52 sounds: ten vowels, 40 consonants, nasalisation and a kind of aspiration. These sounds are represented in the Devanagari script by 52 symbols: ten vowels, two modifiers and 40 consonants.
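Because a single written letter in Devanagari may be a conjunct made of several code points, a sandhi-vicheda program has to group code points into letters (aksharas) before it can break one apart. The sketch below uses Python's standard `unicodedata` module and assumes a simplified grouping rule: a code point joins the previous group if it is a combining mark (Unicode category Mn or Mc, which covers matras, virama, nukta, anusvara and visarga) or if the previous group ends in a virama. `aksharas` is a hypothetical helper, and this rule does not cover every orthographic case.

```python
import unicodedata

VIRAMA = "\u094d"  # halanta (्), which joins two consonants into a conjunct


def is_combining_mark(ch):
    """Matras, virama, nukta, anusvara, candrabindu and visarga are Mn/Mc."""
    return unicodedata.category(ch) in ("Mn", "Mc")


def aksharas(word):
    """Group code points into written letters (a simplified akshara split)."""
    groups = []
    for ch in word:
        if groups and (is_combining_mark(ch) or groups[-1].endswith(VIRAMA)):
            groups[-1] += ch  # attach to the letter being built
        else:
            groups.append(ch)  # start a new letter
    return groups


# The conjunct क्ष is one letter but three code points.
print([unicodedata.name(c) for c in "क्ष"])
print(aksharas("क्ष"), aksharas("विद्यालय"))
```

Here क्ष decomposes into KA, VIRAMA and SSA yet forms a single group, and विद्यालय groups as वि, द्या, ल, य; the conjunct द्या is exactly the kind of "conjoined" letter that sandhi-vicheda may have to break.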

II RELATED WORK

Sandhi (in linguistics) [1] is a cover term for a wide variety of phonological processes that occur at morpheme or word boundaries, such as the fusion of sounds across word boundaries and the alteration of sounds due to neighboring sounds or due to the grammatical function of adjacent words. Internal sandhi features the alteration of sounds within words at morpheme boundaries, as in sympathy (syn- + pathy). External sandhi refers to changes found at word boundaries, such as the pronunciation [tɛm bʊks] for ten books; this is not true of all dialects of English. The linking R of some dialects of English is a kind of external sandhi, as is the process called liaison in the French language. While it may be extremely common in speech, sandhi (especially external sandhi) is typically ignored in spelling, as is the case in English, with the exception of the distinction between "a" and
