1.

CLASSLA: the CLARIN Knowledge Centre for South Slavic languages

The CLARIN Knowledge Centre for South Slavic languages (CLASSLA) was established in 2019. The centre offers expertise on language resources and technologies for South Slavic languages. Its core activities are providing researchers, students, citizen scientists and other interested parties with information on the available resources and technologies through its documentation, supporting them in producing, modifying or publishing resources and technologies through its helpdesk, and organizing training activities. CLASSLA is operated by CLARIN.SI and CLADA-BG. Webpage: https://www.clarin.si/info/k-centre

D.02 Establishment of a research centre, laboratory, study course, association

2.

Centre for language resources and technologies

The Centre for Language Resources and Technologies at the University of Ljubljana (CJVT UL) is a research unit focusing on scientific research as well as on the development and maintenance of key digital language resources and language technology applications for contemporary Slovene. The resources and applications it develops have practical value and are accessible to all users of Slovene around the world. CJVT was founded to ensure the systematic, long-term development of technologies, resources and tools for Slovene, enabling the language to keep up with other languages in the digital world. The University of Ljubljana, which hosts many study and research programmes as well as a large number of researchers, provides an interdisciplinary institutional framework for our main vision: a well-developed language infrastructure for Slovene. Below we list one of the dictionaries created and maintained by CJVT UL.

D.07 Presiding over a centre/laboratory

COBISS.SI-ID: 294177280

3.

Gigafida 2.0 - corpus of standard written Slovene

Gigafida, currently available in version 2.0, is a reference corpus of written Slovene. It comprises texts that were selected and automatically processed with the aim of creating a corpus that represents a sample of modern standard Slovene and can be used for research in linguistics and other branches of the humanities, for compiling modern dictionaries, grammars and learning materials, and for developing language technologies for Slovene. At the 36th Slovenian Book Fair (2020), the Gigafida corpus received a special award in the field of e-publishing, which is given as part of the Book of the Year awards to the project with the most imaginative, fresh and specific solutions on digital platforms related to books: https://www.knjiznisejem.si/index.php/sl/nagrade.

E.01 National awards

COBISS.SI-ID: 18023939

4.

Collocations dictionary of modern Slovene KSSS 1.0

The database of the Collocations Dictionary of Modern Slovene 1.0 contains entries for 35,862 headwords and 7,310,983 collocations. The collocations were automatically extracted from the Gigafida reference corpus using predefined linguistic parameters. The database is an example of language data for modern Slovene compiled with advanced interdisciplinary procedures and published under an open licence in the CLARIN.SI repository. In the last two years, we have published 19 databases and training corpora, including the Morphological Lexicon of Slovene with machine-assigned accents, the Multiword Expressions lexicon, the Valency lexicon, the Reference List of Slovene Frequent Common Words, and other resources.

F.15 Development of a new information system/databases

COBISS.SI-ID: 20172291

P6-0411 — Interim report