Annotation; Old-Church Slavonic; Lexical Processing; Cultural Heritage
Issue Date:
2004
Publisher:
Institute of Information Theories and Applications FOI ITHEA
Abstract:
This work presents ACT (Annotated Corpora of Text), a software package for lexical and corpus
processing of European written cultural sources (currently used for processing mediaeval Slavonic
manuscripts). I offer ACT as a contribution towards a contextual and intelligent heritage Information Technology
framework. The software is suited to capturing the characteristics of old written sources, including rich language
variability at the word and sentence level. The central processing unit is not the word-form but its interpretations,
each of which can be assigned morphological distinctions, head-words (including recensional ones), and
translation equivalents; interpretations can also be joined into multi-word units or correlated with other
sources. The whole annotation process is automated, and individual sorting orders and morphological tag
structures can easily be defined. ACT incorporates modules for complex searches over one or more sources,
creation of various ready-to-use documents, web access to texts and images, incorporation of lexical card-files into a
corpus, and reconstruction of texts from card-files.
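The interpretation-centred model described in the abstract can be pictured with a minimal sketch. This is not ACT's actual data format or API, which the abstract does not specify. All class, field, and function names below (`Interpretation`, `Token`, `MultiWordUnit`, `make_sort_key`) are hypothetical and chosen for illustration only. The sketch assumes that one word-form may carry several competing interpretations, and that a user-defined sorting order can be expressed as an explicit letter sequence.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Interpretation:
    """One reading of a word-form (hypothetical structure, not ACT's own)."""
    headword: str
    morphology: Dict[str, str] = field(default_factory=dict)
    translation: Optional[str] = None            # translation equivalent
    recensional_headword: Optional[str] = None   # recension-specific head-word


@dataclass
class Token:
    """A word-form as attested in a source, with its competing readings."""
    form: str
    interpretations: List[Interpretation] = field(default_factory=list)


@dataclass
class MultiWordUnit:
    """Joins interpretations (not raw word-forms) into one lexical unit."""
    label: str
    members: List[Interpretation]


def make_sort_key(alphabet: str) -> Callable[[str], List[int]]:
    """Build a sort key from a user-defined letter order.

    Characters missing from `alphabet` sort after all listed ones.
    """
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word: str) -> List[int]:
        return [rank.get(ch, len(alphabet)) for ch in word]

    return key


# Illustrative use: one attested form with two morphological readings.
token = Token(
    form="слово",
    interpretations=[
        Interpretation("слово", {"case": "nominative", "number": "singular"}),
        Interpretation("слово", {"case": "accusative", "number": "singular"}),
    ],
)

# A toy custom ordering; a real one would list the full alphabet of the source.
toy_key = make_sort_key("вба")
ordered = sorted(["аб", "ба", "ва"], key=toy_key)
```

Attaching annotations to `Interpretation` rather than to `Token` lets an ambiguous form keep all of its readings side by side, which is the design point the abstract stresses.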
Description:
* The following text was originally published in the Proceedings of the Language Resources and Evaluation
Conference held in Lisbon, Portugal, 2004, under the title "Towards Intelligent Written Cultural Heritage Processing -
Lexical processing". I present here a revised version of that paper and add the latest work carried out
at the Center for Computational Linguistics in Prague in the field under discussion.