This article presents a lexical analysis of collocations from the Janes and Kres corpora of Slovene. The results of the analysis are shown to be relevant to the monitoring of lexical innovations in Slovene vocabulary. The extracted data were examined in terms of novelty, typical collocations and set phrases, as well as semantic shifts. The linguistic analysis of the extracted collocations shows that a contrastive comparison can be used to identify the main characteristics and trends regarding lexical innovations, as well as to highlight their problematic aspects, e.g., when lexical innovations, particularly those influenced by foreign-language elements, also introduce changes in spelling and syntactic features.
COBISS.SI-ID: 1538097859
This paper presents the categorization of verbal multi-word expressions (VMWEs) according to the PARSEME COST Action Shared Task 1.1 Guidelines. The categorization is universal but takes into account the characteristics of the individual included languages. It was used to annotate 13,511 sentences of the Slovene ssj500k 2.0 training corpus, which resulted in 3,364 identified VMWEs categorized as inherently reflexive verbs, light verb constructions, inherently adpositional verbs, and verbal idioms. The paper presents both the quantitative and qualitative results of the analysis and compares the suggested categorization system to existing work on VMWEs in Slovene linguistics.
COBISS.SI-ID: 1538298563
This study identifies, analyses and compares dictionary-relevant formulaic sequences in reference corpora of written and spoken Slovenian. The sequences were identified using a semi-automatic approach, whereby the most frequently recurring word combinations in each corpus were ranked according to their statistical salience and manually inspected for formulaic expressions with lexicographic relevance. Despite its semantic heterogeneity, the resulting list illustrates the distinct characteristics of formulaic multi-word expressions, such as high frequency of usage, prevalent inclusion of grammatical words and common non-propositional meaning, especially in speech, where research revealed numerous understudied formulaic expressions.
COBISS.SI-ID: 24446723
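The candidate-ranking step described in the abstract above (recurring word combinations ranked by statistical salience before manual inspection) can be illustrated with a minimal sketch. The abstract does not name the salience measure, so pointwise mutual information (PMI) over bigrams is used here purely as an assumption; the function name `rank_bigrams`, the `min_count` threshold and the toy token list are hypothetical.

```python
import math
from collections import Counter


def rank_bigrams(tokens, min_count=2):
    """Rank recurring bigrams by PMI (an assumed salience measure)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # keep only recurring combinations
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        scored.append(((w1, w2), count, pmi))

    # Highest PMI first; ties broken by frequency, then alphabetically.
    scored.sort(key=lambda item: (-item[2], -item[1], item[0]))
    return scored


# Toy input only; a real run would read tokens from a corpus.
tokens = "na primer to je na primer tako da je to".split()
candidates = rank_bigrams(tokens)
for bigram, count, pmi in candidates:
    print(" ".join(bigram), count, round(pmi, 2))
```

The ranked list is only a shortlist of candidates: deciding which items are formulaic and lexicographically relevant remains a manual step, as in the study.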
Using the example of nouns, the paper presents the methodology for the expansion of the Sloleks lexicon with morphological patterns. First, the patterns were extracted from the lexicon using morphosyntactic tags and mutable word parts. We then manually separated patterns that are systemic and based on actual language use from examples extracted because of noise attributable to either the extraction method or inconsistencies in Sloleks; arranged patterns into groups based on their relatedness; and defined variability at the level of word forms. We show that the chosen empirically based approach could improve accuracy, clarity and comprehensibility of the current grammatical description of modern Slovene.
COBISS.SI-ID: 69559906
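The pattern-extraction step described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the mutable word part is taken to be whatever follows the longest common prefix of a lemma's forms, the lexicon is a hypothetical in-memory dictionary rather than the actual Sloleks format, and the tags are illustrative MULTEXT-East-style labels.

```python
import os
from collections import defaultdict


def extract_pattern(forms):
    """Split forms into a shared stem and (tag, mutable ending) pairs.

    `forms` maps a morphosyntactic tag to a word form.
    """
    stem = os.path.commonprefix(list(forms.values()))
    pattern = tuple(sorted((tag, form[len(stem):]) for tag, form in forms.items()))
    return stem, pattern


def group_by_pattern(lexicon):
    """Group lemmas that share the same set of (tag, ending) pairs."""
    groups = defaultdict(list)
    for lemma, forms in lexicon.items():
        _, pattern = extract_pattern(forms)
        groups[pattern].append(lemma)
    return dict(groups)


# Hypothetical mini-lexicon: nominative, genitive and dative singular forms.
lexicon = {
    "miza": {"Ncfsn": "miza", "Ncfsg": "mize", "Ncfsd": "mizi"},
    "hiša": {"Ncfsn": "hiša", "Ncfsg": "hiše", "Ncfsd": "hiši"},
    "korak": {"Ncmsn": "korak", "Ncmsg": "koraka", "Ncmsd": "koraku"},
}

groups = group_by_pattern(lexicon)
for pattern, lemmas in groups.items():
    print(lemmas, pattern)
```

Automatically extracted groups like these would still need the manual review the abstract describes, since noise in the lexicon or in the extraction itself can produce spurious patterns.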
The paper presents the results of a manual annotation of a Slovene training corpus with multi-word units (MWUs) relevant for inclusion in a lexicon of Slovene MWUs. The findings will help revise the criteria for the identification and categorization of dictionary-relevant MWUs in relation to free phrases. They will also help define more clearly the distinction between the lexicalized elements of MWUs and the more or less stable elements of their textual environment, which will be useful when determining the canonical forms of MWUs in the lexicon on the one hand and their relation to their variable elements and syntactic conversions on the other.
COBISS.SI-ID: 1538554563