Globalisation is challenging the full functionality of the Slovene language, and of its terminology in particular. However, the TERMIS project proposes a new research model, built around an exemplary discipline, that may hold the key to standardising terminology in other fields in the future.
F.18 Transfer of new know-how to direct users (seminars, fora, conferences)
COBISS.SI-ID: 31431261
The paper describes Termania, a free dictionary portal with a search engine and an online dictionary editor. Termania is designed to become a central exchange point for terminological and other lexicographic data, first for Slovene and subsequently for other languages. The portal is aimed at general web users, so the primary goal of its design is user-friendliness. This does not prevent the implementation of more advanced features, such as an editorial management system or the use of language technologies for extracting language data from text corpora.
F.07 Improvements to an existing product
COBISS.SI-ID: 26419239
The paper describes Obeliks, a new statistical tagger for Slovene developed within the "Communication in Slovene" project. The tool consists of three modules: a rule-based sentence splitter and tokenizer, a morphosyntactic tagger, and a version of the LemmaGen lemmatizer that works in combination with the tagger. Obeliks is trained on the ssj500k corpus, which is tagged according to the JOS tagset. In the JOS system, which includes 1,903 possible tags, the tagger achieved 91.34% accuracy for all tags and 98.30% for POS only. Lemmatization accuracy is 97.88% with capitalization included and 98.55% for all-lowercase letters. The paper presents the design of the tagger and an analysis of the tagging accuracy. Obeliks is freely available for download on the Web.
F.06 Development of a new product
COBISS.SI-ID: 26418983