This paper describes the design and compilation of a reference speech corpus and its distribution to potential users, using the Slovene corpus GOS as a case study. It describes the corpus structure and the experience gained with recording, with the labelling system, and with two levels of transcription (pronunciation-based and standardized), as well as the main characteristics of the corpus interface (a web concordancer) and the availability of the original corpus files.
COBISS.SI-ID: 16771606
In this paper we compare an anonymous text to 75 texts written by 21 known authors. The analysis is based on support vector machines (SVM), which make it possible to detect differences and similarities between the compared texts on the basis of lexical and readability features. The results show that the properties of one of the authors significantly resemble those of the anonymous text, especially in lexical diversity, Brunet's formula, and the relative frequency of hapax legomena.
COBISS.SI-ID: 51943522
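The lexical features named in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions used, so the following are assumptions: words are lowercased `\w+` tokens, lexical diversity is the type-token ratio V/N (V distinct words, N tokens), Brunet's W is computed in its commonly cited form W = N^(V^-0.165), and hapax relative frequency is the number of words occurring exactly once divided by N. The function names `tokenize` and `lexical_features` are hypothetical, and the SVM classification step itself is not shown.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; in Python 3, \w matches Unicode letters,
    # so Slovene characters such as č, š and ž are kept.
    return re.findall(r"\w+", text.lower())


def lexical_features(text: str) -> dict[str, float]:
    """Compute three lexical features (assumed definitions, see lead-in)."""
    tokens = tokenize(text)
    n = len(tokens)  # N: number of word tokens
    if n == 0:
        raise ValueError("text contains no word tokens")
    counts = Counter(tokens)
    v = len(counts)  # V: number of distinct word types
    hapaxes = sum(1 for c in counts.values() if c == 1)  # types seen exactly once
    return {
        "type_token_ratio": v / n,       # simple lexical-diversity measure
        "brunet_w": n ** (v ** -0.165),  # Brunet's W; lower means richer vocabulary
        "hapax_rel_freq": hapaxes / n,   # hapax legomena per token (assumed normalization)
    }
```

For example, `lexical_features("the cat saw the dog")` has N = 5 and V = 4, giving a type-token ratio of 0.8 and a hapax relative frequency of 0.6 (three words occur once). The type-token ratio falls as texts get longer, so it is best compared across texts of similar length.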
The main aim of the paper was to identify the most effective strategic structures used when two speakers want to speak at the same time, in French and in Slovene. The study is based on three spontaneous conversations, which were analyzed at the prosodic, morphosyntactic, and discourse levels. The results show that speakers who were efficient in taking the turn produced many pauses, repetitions, and self-corrections, as well as longer discourse preambles than speakers who were less successful in taking (or keeping) the turn.
COBISS.SI-ID: 56746594
In this paper we examine an authentic anonymous text which provoked intense reactions in the Slovenian media in 2011. Within this authorship attribution task, a corpus of 75 texts written by 21 potential authors was analysed with a predefined set of lexical and readability features. The results show that one of the candidate authors resembles the anonymous text in most of the features, although it is not possible to verify whether the actual author was included in the analysis or not.
COBISS.SI-ID: 55987554
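The abstract does not say how resemblance "in most of the features" was measured. One simple way to operationalize it, shown here as a hypothetical Python sketch rather than the authors' procedure, is per-feature voting: each candidate is summarized by one value per feature (for example, the mean over that author's texts, which is an assumption), and for every feature the candidate closest to the anonymous text receives one vote. The function `closest_by_feature`, the author labels, and all feature values below are illustrative, not data from the study.

```python
from __future__ import annotations


def closest_by_feature(anonymous: dict[str, float],
                       candidates: dict[str, dict[str, float]]) -> dict[str, int]:
    """Count, for each candidate, the features on which it is closest
    to the anonymous text (per-feature voting)."""
    wins = {name: 0 for name in candidates}
    for feature, target in anonymous.items():
        # Candidate with the smallest absolute difference on this feature;
        # on a tie, min() keeps the first candidate in iteration order.
        best = min(candidates, key=lambda name: abs(candidates[name][feature] - target))
        wins[best] += 1
    return wins


# Hypothetical feature values, for illustration only.
anonymous = {"type_token_ratio": 0.52, "brunet_w": 11.8, "hapax_rel_freq": 0.31}
candidates = {
    "author_A": {"type_token_ratio": 0.50, "brunet_w": 12.0, "hapax_rel_freq": 0.30},
    "author_B": {"type_token_ratio": 0.61, "brunet_w": 10.9, "hapax_rel_freq": 0.315},
}
print(closest_by_feature(anonymous, candidates))  # {'author_A': 2, 'author_B': 1}
```

Because each feature is compared on its own scale, no normalization is needed. Like any closed-set comparison, the vote always names some candidate, so on its own it cannot show that the actual author was among those analysed.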
The aim of the paper is to search for common guidelines for the future development of speech databases, so as to make them as useful as possible for both of their main fields of use: linguistic research and speech technologies. We compare the Slovene speech database for automatic speech recognition, BNSI Broadcast News, with the Slovene reference speech corpus GOS, and outline possible common guidelines for future work.
COBISS.SI-ID: 17960982