In proceedings details

Identifying Bilingual Segments for Translation Generation

Oct 2014

We present an approach that uses known translation forms in a validated bilingual lexicon to identify bilingual stem and suffix segments. By applying the longest sequence common to pairs of orthographically similar translations, we initially induce the bilingual suffix transformations (replacement rules). Redundant analyses are discarded by examining the distribution of stem pairs and their associated transformations. Sets of bilingual suffixes that conflate various translation forms are grouped. Stem pairs sharing similar transformations are subsequently clustered, which serves as the basis for the generative approach. The primary motivation behind this work is to eventually improve lexicon coverage by using the correct bilingual entries to suggest translations for out-of-vocabulary (OOV) words. In preliminary results, 90% of the generated translations are correct. This was achieved when both bilingual segments (bilingual stem and bilingual suffix) in the bilingual pair being analysed are known to have occurred in the training data set.
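
The segmentation and generation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the shared sequence is a simple longest common prefix, the function names `split_pair` and `generate` are hypothetical, and the English-Portuguese word pairs are toy examples chosen for illustration, not data from the paper.

```python
from os.path import commonprefix  # character-wise longest common prefix of strings


def split_pair(pair_a, pair_b):
    """Split two orthographically similar bilingual pairs into a shared
    stem pair and one suffix pair per entry.

    Assumption: the shared segment is taken to be the longest common
    prefix on each language side, a simplification of the longest
    common sequence mentioned in the abstract.
    """
    (src_a, tgt_a), (src_b, tgt_b) = pair_a, pair_b
    src_stem = commonprefix([src_a, src_b])
    tgt_stem = commonprefix([tgt_a, tgt_b])
    stem = (src_stem, tgt_stem)
    suffix_a = (src_a[len(src_stem):], tgt_a[len(tgt_stem):])
    suffix_b = (src_b[len(src_stem):], tgt_b[len(tgt_stem):])
    return stem, suffix_a, suffix_b


def generate(stem, suffixes):
    """Attach each known bilingual suffix to a bilingual stem to propose
    candidate translation pairs (hypothetical helper)."""
    src_stem, tgt_stem = stem
    return [(src_stem + s, tgt_stem + t) for s, t in suffixes]


# Toy lexicon entries (illustrative only).
stem, suf_a, suf_b = split_pair(("declared", "declarado"),
                                ("declaring", "declarando"))

# Reuse the induced suffix pairs with another stem pair assumed to be known.
candidates = generate(("import", "importa"), [suf_a, suf_b])
```

Here the induced stem pair is `("declar", "declara")` with suffix pairs `("ed", "do")` and `("ing", "ndo")`. Applying those suffix pairs to a second stem pair yields candidate translations, which corresponds to the setting in which both the bilingual stem and the bilingual suffix have already been seen.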

Organization:

Publisher: Springer Berlin Heidelberg

Authors: Kavitha Mahesh, Luís Gomes, Gabriel Pereira Lopes

Editors:

Series: Lecture Notes in Computer Science

Volume: 8819

ISSN:

ISBN:

Url: http://www.springer.com/computer/database+management+%26+information+retrieval/book/978-3-319

Notes:

Bibtex Key:

DOI:

Pages: 191 to 212

Publication Date: 1 Oct 2014

Publication File: