
Publication date: 1 June 2021


LID is a language identifier that classifies a web page according to the language in which it is written [Silva et al., 2006 and 2007]. It identifies the languages for which it was trained and also detects when a language is unknown to it. It works with character n-grams that are selected according to their language-discriminant value. It is rather robust: it can be trained for very difficult language identification tasks, such as discriminating between variants of the same language, for example Brazilian and European Portuguese.
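
The general idea of identifying a language from selected character n-grams can be illustrated with a minimal Python sketch. This is not the LID implementation: the function names (char_ngrams, train, identify), the "spread" score used to rank n-grams, the top_k and min_score parameters, and the toy training sentences are all assumptions made for illustration. The description above does not specify how LID computes the discriminant value or how it decides that a language is unknown.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3, top_k=500):
    """Build one n-gram set per language, keeping only the top_k n-grams
    ranked by a stand-in discriminant score (an assumption, not LID's measure)."""
    profiles = {}
    for lang, text in samples.items():
        counts = Counter(char_ngrams(text, n))
        total = sum(counts.values())
        profiles[lang] = {g: c / total for g, c in counts.items()}

    def spread(gram):
        # Difference between the highest and lowest relative frequency
        # of this n-gram across the training languages.
        freqs = [p.get(gram, 0.0) for p in profiles.values()]
        return max(freqs) - min(freqs)

    vocab = set().union(*profiles.values())
    # Sort by descending spread; ties are broken alphabetically.
    ranked = sorted(vocab, key=lambda g: (-spread(g), g))
    selected = set(ranked[:top_k])
    return {lang: {g for g in p if g in selected} for lang, p in profiles.items()}


def identify(text, model, n=3, min_score=0.5):
    """Return the best-matching language, or "unknown" when no language
    covers at least min_score of the text's n-grams (hypothetical threshold)."""
    grams = char_ngrams(text, n)
    if not grams:
        return "unknown"
    scores = {
        lang: sum(g in kept for g in grams) / len(grams)
        for lang, kept in model.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else "unknown"


# Toy training data, far too small for real use.
samples = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "pt": "o rapido cao castanho salta sobre a raposa e o gato dorme",
}
model = train(samples)
print(identify("the dog and the fox", model))
print(identify("o gato dorme", model))
print(identify("zzzz qqqq xxxx", model))
```

With realistic corpora, top_k would be set well below the vocabulary size so that only the most discriminating n-grams are kept. Telling apart close variants such as Brazilian and European Portuguese would likely need far more training data and a stronger selection measure than this sketch uses.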


Joaquim Ferreira da Silva, Gabriel Pereira Lopes

Date: 01/03/2007