Record Details

Synset Based Multilingual Dictionary: Insights, Applications and Challenges

DSpace at IIT Bombay

 
Field Value
 
Title Synset Based Multilingual Dictionary: Insights, Applications and Challenges
 
Creator MOHANTY, RK
BHATTACHARYYA, P
KALELE, S
PANDEY, P
SHARMA, A
KOPRA, M
 
Subject multilingual dictionary
dictionary standardization
concept based dictionary
lightweight WSD and lexical choice
multilingual dictionary database
 
Description In this paper, we report our effort at the standardization, design and partial implementation of a multilingual dictionary in the context of three large-scale projects, viz., (i) Cross Lingual Information Retrieval, (ii) English to Indian Language Machine Translation, and (iii) Indian Language to Indian Language Machine Translation. These projects are large scale because each involves 8-10 partners spread across the length and breadth of India, with a great amount of language diversity. The dictionary is based not on words but on WordNet synsets, i.e., concepts. An identical dictionary architecture is used for all three projects, where source to target language transfer is initiated by concept to concept mapping. The whole dictionary can be looked upon as an M × N matrix, where M is the number of synsets (rows) and N is the number of languages (columns). This architecture maps the lexeme(s) of one language that stand for a concept to the lexeme(s) of the other languages that stand for the same concept. In actual usage, a preliminary WSD step identifies the correct row for a word, and then a lexical choice procedure identifies the correct target word from the corresponding synset. Currently the multilingual dictionary is being developed for 11 languages: English, Hindi, Bengali, Marathi, Punjabi, Urdu, Tamil, Kannada, Telugu, Malayalam and Oriya. The pivot synsets, with which all other languages link, come from Hindi. Our work with this framework has made us aware of many benefits of this multilingual concept-based scheme over language pair-wise dictionaries. Interesting insights emerge, and challenges are faced, in dealing with linguistic and cultural diversity. Economy of representation is achieved on many fronts and at many levels.
We have been greatly assisted by our long-standing experience in building the WordNets of two major languages of India, viz., Hindi and Marathi, which rank 5th (approximately 500 million speakers) and 14th (approximately 70 million speakers) respectively in the world in terms of number of speakers.
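The row/column lookup described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `SynsetRow` type, the synset IDs, the glosses, the sample lexemes, the gloss-overlap heuristic standing in for "preliminary WSD", and the take-the-first-lexeme rule standing in for "lexical choice".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SynsetRow:
    """One row of the M x N matrix: a concept plus its lexemes per language."""
    synset_id: str                      # hypothetical identifier
    gloss: str                          # short definition used by the toy WSD
    lexemes: dict[str, list[str]] = field(default_factory=dict)


# Illustrative rows only; these are not entries from the actual dictionary.
ROWS = [
    SynsetRow("S1", "financial institution that accepts deposits and lends money",
              {"en": ["bank"], "hi": ["बैंक"], "mr": ["बँक"]}),
    SynsetRow("S2", "sloping land beside a river or lake",
              {"en": ["bank", "riverbank"], "hi": ["किनारा", "तट"], "mr": ["किनारा", "काठ"]}),
]


def candidate_rows(word: str, lang: str) -> list[SynsetRow]:
    """All rows whose column for `lang` contains `word`."""
    return [row for row in ROWS if word in row.lexemes.get(lang, [])]


def disambiguate(word: str, lang: str, context: list[str]) -> Optional[SynsetRow]:
    """Toy WSD (assumption): pick the candidate whose gloss shares the most context words."""
    candidates = candidate_rows(word, lang)
    if not candidates:
        return None
    ctx = {w.lower() for w in context}
    return max(candidates, key=lambda row: len(ctx & set(row.gloss.lower().split())))


def choose_lexeme(row: SynsetRow, target_lang: str) -> Optional[str]:
    """Toy lexical choice (assumption): take the first lexeme listed for the target language."""
    options = row.lexemes.get(target_lang, [])
    return options[0] if options else None


def translate(word: str, src: str, tgt: str, context: list[str]) -> Optional[str]:
    """Concept-to-concept transfer: find the row, then read the target column."""
    row = disambiguate(word, src, context)
    return choose_lexeme(row, tgt) if row else None
```

For example, `translate("bank", "en", "hi", ["river", "water"])` selects the riverside row, while a context such as `["money", "deposits"]` selects the financial row. Adding a language in this layout means filling one more column per row rather than building a new pair-wise dictionary for every language pair, which is the kind of economy the abstract points to.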
 
Publisher UNIV SZEGED, DEPT INFORMATICS
 
Date 2011-09-01T08:17:40Z
2011-12-26T12:59:33Z
2011-12-27T05:51:54Z
2007
 
Type Article
 
Identifier GWC 2008: FOURTH GLOBAL WORDNET CONFERENCE, PROCEEDINGS, (), 321-333
http://dspace.library.iitb.ac.in/xmlui/handle/10054/12718
http://hdl.handle.net/10054/12718
 
Language en