Natural Language; Multilingual Corpus; Parallel Corpus; Aligned Corpus; Comparable Corpus; Annotation
Issue Date:
2011
Publisher:
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
Citation:
Digital Presentation and Preservation of Cultural and Scientific Heritage, Vol. 1, No. 1 (2011), pp. 151-160
Abstract:
This article briefly reviews multilingual language resources for
Bulgarian developed within several international projects: the first-ever
annotated Bulgarian MTE digital lexical resources, the Bulgarian-Polish corpus,
the Bulgarian-Slovak parallel and aligned corpus, and the Bulgarian-Polish-Lithuanian
corpus. These resources are valuable multilingual datasets for language
engineering research and development for the Bulgarian language. Multilingual
corpora are large repositories of language data that play an important role in
preserving and supporting the world's cultural heritage, because natural
language is an outstanding part of human cultural values and collective
memory, and a bridge between cultures.