The digital age brings dramatic changes to language and communication; its effects can be seen in the ways we use language, in the channels we use to communicate and in the manner in which ideas are spread. At the other end of the spectrum, our linguistic behaviour, communications and knowledge are transformed into data that can be used, or bought, to feed intelligent technologies. The article presents a bird's-eye view of these dynamics of change. It first focuses on the impact of digitisation on language itself, then analyses current trends in the language industry, where traditional services are being replaced by technology- and data-driven solutions, and finally explores the impact of these technologies on individuals and society at large. We make a case for digital linguistics as an interdisciplinary field of study that adopts a human-centred approach to the sociolinguistic, technological, economic, infrastructural and ethical issues emerging with regard to language in the digital age.
B.04 Guest lecture
COBISS.SI-ID: 40839683
The results of a manual annotation of a Slovene training corpus with multi-word units (MWUs) relevant for inclusion in a lexicon of Slovene MWUs are presented. We analyse the annotations in terms of (a) the frequency with which a string has been identified as an MWU, (b) the degree to which the annotators agree on the category of the identified MWU, and (c) the degree to which the annotators agree on the span of the MWU in terms of its lexicalized elements. The results of the analysis will be useful at different stages of compiling a Slovene MWU lexicon. The list of dictionary-relevant MWUs obtained in the annotation task will be used to enrich the lexicon and to train models for the automatic identification of MWUs in running text. The findings will also help revise the criteria for identifying and categorizing dictionary-relevant MWUs as opposed to free phrases. In addition, they will help draw a clearer distinction between the lexicalized elements of MWUs and the more or less stable elements of their textual environment, which will be useful when determining the canonical forms of MWUs in the lexicon on the one hand, and their relation to their variable elements and syntactic conversions on the other.
B.03 Paper at an international scientific conference
COBISS.SI-ID: 1538554563
This paper presents the Slovene training corpus ssj500k 2.2, which has been annotated on the levels of tokenization, sentence segmentation, part-of-speech tagging, lemmatization, syntactic dependencies, named entities, verbal multi-word expressions and semantic role labelling. It describes the individual annotation layers and shows how the training corpus can be used in the production of various lexicons, such as the lexicon of multi-word units and the valency lexicon of modern Slovene. It concludes by outlining future work, namely the annotation of multi-word expressions based on the Slovene Lexical Database.
B.03 Paper at an international scientific conference
COBISS.SI-ID: 70824034
Visiting professor delivering a series of invited lectures at the International Summer School of Beijing Foreign Studies University, Beijing, 16-26 July 2019.
B.05 Guest lecturer at an institute/university
COBISS.SI-ID: 70999906
The monograph is the result of the mutual respect and fruitful cooperation of several researchers. As interest in academic discourse has been growing in Slovenia and in the wider region, and as shared research paradigms that take cross-cultural encounters in academic contexts into consideration are emerging, it seems important to create opportunities for interaction among scholars who juxtapose different lingua-cultures. With this edited volume, we wished to provide such an opportunity by bringing together researchers examining different language combinations, including those contrasting English as an academic lingua franca with L1 discourse, as well as experts investigating other languages and cultures. A central and recurring theme of the volume is the dynamic evolution of academic discourse conventions through language contact, predominantly in Slovene but also, in the context of the region, in Croatian and Serbian.
C.02 Editorial board of a national monograph
COBISS.SI-ID: 304629760