Record Details

Synset Based Multilingual Dictionary: Insights, Applications and Challenges

DSpace at IIT Bombay


Field	Value

Title	Synset Based Multilingual Dictionary: Insights, Applications and Challenges

Creator	MOHANTY, RK; BHATTACHARYYA, P; KALELE, S; PANDEY, P; SHARMA, A; KOPRA, M

Subject	multilingual dictionary; dictionary standardization; concept based dictionary; light weight WSD and lexical choice; multilingual dictionary database

Description	In this paper, we report our effort at the standardization, design and partial implementation of a multilingual dictionary in the context of three large-scale projects, viz., (i) Cross Lingual Information Retrieval, (ii) English to Indian Language Machine Translation, and (iii) Indian Language to Indian Language Machine Translation. These projects are large-scale because each involves 8-10 partners spread across the length and breadth of India, with a great deal of language diversity. The dictionary is based not on words but on WordNet synsets, i.e., concepts. An identical dictionary architecture is used for all three projects, in which source-to-target language transfer is initiated by concept-to-concept mapping. The whole dictionary can be viewed as an M × N matrix, where M is the number of synsets (rows) and N is the number of languages (columns). This architecture maps the lexeme(s) of one language standing for a concept to the lexeme(s) of other languages standing for the same concept. In actual usage, a preliminary word sense disambiguation (WSD) step identifies the correct row for a word, and a lexical choice procedure then identifies the correct target word from the corresponding synset. The multilingual dictionary is currently being developed for 11 languages: English, Hindi, Bengali, Marathi, Punjabi, Urdu, Tamil, Kannada, Telugu, Malayalam and Oriya. Our work with this framework has made us aware of many benefits of this multilingual, concept-based scheme over language pair-wise dictionaries. The pivot synsets, to which all other languages link, come from Hindi. Interesting insights emerge, and challenges arise, in dealing with linguistic and cultural diversity. Economy of representation is achieved on many fronts and at many levels.
We have been greatly assisted by our long-standing experience in building the WordNets of two major languages of India, viz., Hindi and Marathi, which rank 5th (approximately 500 million speakers) and 14th (approximately 70 million speakers), respectively, in the world in terms of the number of people speaking them.
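The lookup flow in the description (a synset-by-language matrix, a WSD step that selects the row, then a lexical choice step that selects a word from the target-language cell) can be illustrated with a minimal sketch. This is not the authors' implementation: the class and function names (`Synset`, `MultilingualDictionary`, `toy_wsd`, `toy_lexical_choice`, `build_demo`), the synset IDs and the English glosses are hypothetical. The record does not describe the actual WSD or lexical choice procedures, so a simplified Lesk-style gloss-overlap heuristic and a "first listed lexeme" rule are swapped in as stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Synset:
    """One row of the M x N matrix: a concept plus its lexemes in each language."""
    synset_id: str                 # hypothetical ID scheme, for illustration only
    gloss: str                     # English gloss, used only by the toy WSD below
    lexemes: dict[str, list[str]]  # language code -> synonymous words for this concept


class MultilingualDictionary:
    """Hypothetical container: rows are synsets (M), columns are languages (N)."""

    def __init__(self, languages: list[str]) -> None:
        self.languages = languages      # the N columns
        self.rows: list[Synset] = []    # the M rows

    def add(self, synset: Synset) -> None:
        # Reject columns outside the declared language set; a missing column
        # simply means no lexeme has been linked for that language yet.
        unknown = set(synset.lexemes) - set(self.languages)
        if unknown:
            raise ValueError(f"unknown languages: {sorted(unknown)}")
        self.rows.append(synset)

    def candidate_rows(self, word: str, lang: str) -> list[Synset]:
        # Every row whose cell in the source-language column contains the word.
        return [s for s in self.rows if word in s.lexemes.get(lang, [])]

    def translate(self, word: str, src: str, tgt: str, context: list[str]) -> str | None:
        candidates = self.candidate_rows(word, src)
        if not candidates:
            return None
        row = toy_wsd(candidates, context)                 # step 1: pick the row (concept)
        return toy_lexical_choice(row.lexemes.get(tgt, []))  # step 2: pick the target word


def toy_wsd(candidates: list[Synset], context: list[str]) -> Synset:
    """Stand-in WSD (assumption): choose the row whose gloss shares the most
    words with the context; max() keeps the earliest row on ties."""
    ctx = {w.lower() for w in context}
    return max(candidates, key=lambda s: len(ctx & set(s.gloss.lower().split())))


def toy_lexical_choice(targets: list[str]) -> str | None:
    """Stand-in lexical choice (assumption): take the first listed lexeme, if any."""
    return targets[0] if targets else None


def pairwise_dictionaries(n_languages: int) -> int:
    """Directed bilingual dictionaries needed to connect every ordered language pair."""
    return n_languages * (n_languages - 1)


def build_demo() -> MultilingualDictionary:
    # Toy data: two senses of English "bank", with a Hindi column as the pivot.
    d = MultilingualDictionary(["hi", "en"])
    d.add(Synset("syn-001", "financial institution that accepts deposits and lends money",
                 {"hi": ["बैंक"], "en": ["bank"]}))
    d.add(Synset("syn-002", "sloping land beside a river or other body of water",
                 {"hi": ["किनारा", "तट"], "en": ["bank", "shore"]}))
    return d


if __name__ == "__main__":
    demo = build_demo()
    print(demo.translate("bank", "en", "hi", ["he", "sat", "on", "the", "river", "bank"]))
    print(demo.translate("bank", "en", "hi", ["deposit", "money", "in", "the", "bank"]))
    print(demo.translate("तट", "hi", "en", ["river"]))
    print(pairwise_dictionaries(11))
```

Run as a script, the two English "bank" queries resolve to different rows and therefore to different Hindi words (किनारा and बैंक), and the Hindi query resolves back to "bank" through the same row. Because every language is a column of one shared matrix, adding a language means filling one new column, whereas connecting all 11 languages with directed pair-wise dictionaries would need 11 × 10 = 110 of them (55 unordered pairs).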

Publisher	UNIV SZEGED, DEPT INFORMATICS

Date	2007; record timestamps: 2011-09-01T08:17:40Z, 2011-12-26T12:59:33Z, 2011-12-27T05:51:54Z

Type	Article

Identifier	GWC 2008: Fourth Global WordNet Conference, Proceedings, pp. 321-333; http://dspace.library.iitb.ac.in/xmlui/handle/10054/12718; http://hdl.handle.net/10054/12718

Language	en
