Publication date: 1 June 2021

Compressed Bilingual Text Framework

This prototype manages collections of parallel texts in two different languages using compressed data structures. Two texts are said to be parallel when one is a translation of the other (in either direction).

With this framework, huge text collections can be indexed in main memory while still supporting linear-time query operations.
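The source does not say which compressed index the prototype uses, nor in which quantity its query time is linear. Purely as an illustration, the sketch below assumes an FM-index, a standard compressed self-index whose backward search counts the occurrences of a pattern in time linear in the pattern length. The class name `ToyFMIndex` is hypothetical, and for readability it keeps uncompressed occurrence tables and the full suffix array; a genuinely compressed index would replace them with succinct rank structures and a sampled suffix array.

```python
class ToyFMIndex:
    """Hypothetical toy FM-index: BWT + C array + occurrence table (not compressed)."""

    def __init__(self, text: str) -> None:
        text = text + "\0"  # unique sentinel, smaller than every other character
        # Naive suffix array construction: fine for a toy, far too slow for real corpora.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        # BWT: character preceding each sorted suffix (text[-1] is the sentinel).
        self.bwt = "".join(text[i - 1] for i in self.sa)
        alphabet = sorted(set(text))
        # C[c] = number of characters in text strictly smaller than c.
        self.C, total = {}, 0
        for c in alphabet:
            self.C[c] = total
            total += text.count(c)
        # occ[c][i] = number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in alphabet}
        for i, ch in enumerate(self.bwt):
            for c in alphabet:
                self.occ[c][i + 1] = self.occ[c][i] + (c == ch)

    def _range(self, pattern: str):
        """Backward search: one step per pattern character, so O(len(pattern)) steps."""
        lo, hi = 0, len(self.bwt)  # half-open range of suffix-array rows
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0, 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern: str) -> int:
        lo, hi = self._range(pattern)
        return hi - lo

    def locate(self, pattern: str) -> list:
        """Start offsets of every occurrence, read from the suffix array."""
        lo, hi = self._range(pattern)
        return sorted(self.sa[lo:hi])
```

For example, `ToyFMIndex("the cat and the hat").locate("the")` gives `[0, 12]`. Counting never touches the original text, which is what lets a compressed self-index stand in for the text itself in main memory.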

The framework also represents the alignment between parallel texts, that is, which segment_A of text_en is translated by which segment_B of text_pt, where text_en and text_pt are parallel texts.
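One way to picture such an alignment layer is sketched below. It assumes, without support from the source, that each text is cut into sentence-like segments and that alignment links are stored as pairs of segment numbers. `AlignedPair`, `segment_at` and `translations_at` are hypothetical names, and plain Python lists stand in for the compressed structures the framework would actually use. A character offset in text_en is mapped to its segment by binary search over segment start offsets, and the links are then followed to text_pt; one segment may link to several, so one-to-many alignments are allowed.

```python
from bisect import bisect_right


class AlignedPair:
    """Hypothetical sketch: two parallel texts cut into segments, plus alignment links."""

    def __init__(self, segs_en, segs_pt, links):
        self.segs_en, self.segs_pt = segs_en, segs_pt
        self.text_en, self.starts_en = self._concat(segs_en)
        self.text_pt, self.starts_pt = self._concat(segs_pt)
        # links: (i, j) pairs meaning segment i of text_en is translated by segment j of text_pt.
        self.en_to_pt = {}
        for i, j in links:
            self.en_to_pt.setdefault(i, []).append(j)

    @staticmethod
    def _concat(segs):
        """Join segments with single spaces and record each segment's start offset."""
        starts, pos = [], 0
        for seg in segs:
            starts.append(pos)
            pos += len(seg) + 1  # +1 for the separating space
        return " ".join(segs), starts

    def segment_at(self, pos):
        """Index of the text_en segment containing character offset pos (binary search)."""
        return bisect_right(self.starts_en, pos) - 1

    def translations_at(self, pos):
        """text_pt segments aligned with the text_en segment that contains pos."""
        i = self.segment_at(pos)
        return [self.segs_pt[j] for j in self.en_to_pt.get(i, [])]
```

With `pair = AlignedPair(["The cat sleeps.", "The dog barks."], ["O gato dorme.", "O cão ladra."], [(0, 0), (1, 1)])`, the occurrence of "dog" at offset 20 falls in segment 1, so `pair.translations_at(20)` returns `["O cão ladra."]`.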

The framework can be used in several Machine Translation tasks, such as bilingual concordancing, extraction of translation candidates and parallel text alignment.
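As a rough picture of the first of these tasks, a bilingual concordancer lists each occurrence of a term in context next to its aligned translation. The sketch below is only an assumption about how such a tool could look: the function name `concordance`, the sentence-aligned list of (English, Portuguese) pairs and the `width` parameter are hypothetical, and a naive `str.find` scan stands in for the compressed index the framework would query.

```python
def concordance(pairs, term, width=10):
    """Hypothetical keyword-in-context search over sentence-aligned (en, pt) pairs.

    Returns (left context, term, right context, aligned pt sentence) tuples.
    """
    results = []
    for en, pt in pairs:
        start = en.find(term)
        while start != -1:
            left = en[max(0, start - width):start]
            right = en[start + len(term):start + len(term) + width]
            results.append((left, term, right, pt))
            start = en.find(term, start + 1)  # continue after this occurrence
    return results


corpus = [
    ("The cat sleeps.", "O gato dorme."),
    ("The dog sees the cat.", "O cão vê o gato."),
    ("It rains.", "Chove."),
]
for left, term, right, pt in concordance(corpus, "cat", width=4):
    print(f"{left:>4}[{term}]{right:<4} | {pt}")
```

Each hit carries its aligned Portuguese sentence, which is also the raw material from which translation candidates could be extracted.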

Authors

Gabriel Pereira Lopes, Luís Gomes, Jorge Costa, Ankica Barisic

Date: 01/01/2013