Identifying Bilingual Segments for Translation Generation
Oct 2014
We present an approach that uses known translation forms in a validated bilingual lexicon to identify bilingual stem and suffix segments. We first induce bilingual suffix transformations (replacement rules) from the longest sequence common to each pair of orthographically similar translations. Redundant analyses are then discarded by examining the distribution of stem pairs and their associated transformations. Sets of bilingual suffixes that conflate various translation forms are grouped, and stem pairs that share similar transformations are subsequently clustered; these clusters serve as the basis for the generative approach. The primary motivation behind this work is to eventually improve lexicon coverage by using the correct bilingual entries to suggest translations for out-of-vocabulary (OOV) words. In preliminary experiments, 90% of the generated translations are correct. This figure was obtained for bilingual pairs in which both bilingual segments (the bilingual stem and the bilingual suffix) are known to have occurred in the training data set.
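The induction, clustering and generation steps can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: the "longest sequence common" to a pair is taken to be the longest common prefix on each side of the bilingual pair, orthographic similarity is approximated by a minimum stem length, the redundancy filter is omitted, and the toy English-Spanish lexicon and the names `segment`, `induce`, `cluster`, `generate`, `min_stem` and `min_support` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from os.path import commonprefix

# Hypothetical toy lexicon of (source, target) translation pairs.
LEXICON = [
    ("organize", "organizar"),
    ("organized", "organizado"),
    ("realize", "realizar"),
    ("realized", "realizado"),
    ("utilize", "utilizar"),
]


def segment(entry_a, entry_b, min_stem=4):
    """Split two entries into a bilingual stem and two bilingual suffixes.

    Assumption: the shared part is the longest common prefix on each side.
    Returns None when either stem is shorter than min_stem, a hypothetical
    stand-in for the orthographic-similarity test.
    """
    (src_a, tgt_a), (src_b, tgt_b) = entry_a, entry_b
    src_stem = commonprefix([src_a, src_b])
    tgt_stem = commonprefix([tgt_a, tgt_b])
    if len(src_stem) < min_stem or len(tgt_stem) < min_stem:
        return None
    suffix_a = (src_a[len(src_stem):], tgt_a[len(tgt_stem):])
    suffix_b = (src_b[len(src_stem):], tgt_b[len(tgt_stem):])
    return (src_stem, tgt_stem), suffix_a, suffix_b


def induce(lexicon, min_stem=4):
    """Map each bilingual stem to the bilingual suffixes observed with it."""
    stems = defaultdict(set)
    for entry_a, entry_b in combinations(lexicon, 2):
        analysis = segment(entry_a, entry_b, min_stem)
        if analysis is None:
            continue
        stem, suffix_a, suffix_b = analysis
        stems[stem].update([suffix_a, suffix_b])
    return stems


def cluster(stems):
    """Group stem pairs that share the same set of bilingual suffixes."""
    clusters = defaultdict(list)
    for stem, suffixes in stems.items():
        clusters[frozenset(suffixes)].append(stem)
    return clusters


def generate(entry, clusters, min_support=2):
    """Suggest new translation pairs from a single known entry.

    The entry is analysed with each bilingual suffix of a suffix group that
    at least min_support stem pairs share; the group's other suffixes are
    then attached to the resulting bilingual stem.
    """
    src, tgt = entry
    suggestions = set()
    for suffix_set, members in clusters.items():
        if len(members) < min_support:
            continue
        for src_suf, tgt_suf in suffix_set:
            if not (src.endswith(src_suf) and tgt.endswith(tgt_suf)):
                continue
            # Slice by length so that an empty suffix keeps the whole word.
            src_stem = src[:len(src) - len(src_suf)]
            tgt_stem = tgt[:len(tgt) - len(tgt_suf)]
            for other in suffix_set:
                if other != (src_suf, tgt_suf):
                    suggestions.add((src_stem + other[0], tgt_stem + other[1]))
    return suggestions


if __name__ == "__main__":
    clusters = cluster(induce(LEXICON))
    print(generate(("utilize", "utilizar"), clusters))
    # {('utilized', 'utilizado')}
```

In this toy run, (utilize, utilizar) never pairs with a similar entry, so its stem is not induced directly. Analysing it with the known bilingual suffix ("", "r") yields the stem pair (utilize, utiliza), and the suffix group shared by (organize, organiza) and (realize, realiza) then proposes (utilized, utilizado).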