This paper describes how a reference speech corpus is created and distributed to potential users, using the Slovene corpus GOS as a case study. It covers the corpus structure, experience with recording, the labelling system, and the two levels of transcription (pronunciation-based and standardized), as well as the main characteristics of the corpus interface (a web concordancer) and the availability of the original corpus files.
COBISS.SI-ID: 16771606
In recent years, authorship attribution has attracted growing interest because of its usefulness in law (plagiarism), criminology (threat letters), literary history (pseudonyms), and commercial research (client profiling). In the present paper, we use a support vector machine (SVM) to compare an anonymous text called "voters in sportsuits" to 75 texts written by 21 known authors. The results show that one of the authors resembles the anonymous text in vocabulary variability, Brunet's formula, and the frequency of hapax legomena.
COBISS.SI-ID: 51943522
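The three lexical measures named in the authorship attribution abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `stylometric_features`, the regex tokenizer, the use of the type-token ratio as a stand-in for "vocabulary variability", and the exponent 0.172 in Brunet's W = N^(V^-a) are all assumptions made for illustration (values of a around 0.17 are commonly quoted). The SVM classification step is omitted; vectors like these would be the input to it.

```python
import re
from collections import Counter


def stylometric_features(text, brunet_a=0.172):
    """Compute simple lexical-richness features for one text.

    Assumptions (not taken from the paper): tokens are lowercase
    runs of word characters, and Brunet's exponent defaults to 0.172.
    """
    tokens = re.findall(r"\w+", text.lower())
    n = len(tokens)  # N: number of word tokens
    if n == 0:
        raise ValueError("text contains no word tokens")

    counts = Counter(tokens)
    v = len(counts)  # V: number of distinct word types
    hapax = sum(1 for c in counts.values() if c == 1)  # types seen exactly once

    return {
        "tokens": n,
        "types": v,
        "type_token_ratio": v / n,         # proxy for vocabulary variability
        "hapax_ratio": hapax / n,          # frequency of hapax legomena
        "brunet_w": n ** (v ** -brunet_a), # Brunet's W = N^(V^-a)
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat the end"
    for name, value in stylometric_features(sample).items():
        print(name, value)
```

Computing the same feature dictionary for the anonymous text and for each known author's texts gives comparable vectors; because all three measures depend on text length, samples of similar size make the comparison more meaningful.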
The main goal of the project activity “Pedagogical grammar portal” is to establish which real problems pupils and students encounter when writing in standard Slovene, and to offer them explanations and solutions to these problems in an interesting and easy-to-understand form.
COBISS.SI-ID: 35714349