1.

The SI TEDx-UM speech database

A new Slovenian spoken language resource was built from TEDx Talks. The speech database contains 242 talks with a total duration of 54 hours. The acquired spoken material was annotated and transcribed automatically, using acoustic segmentation and automatic speech recognition. The development and evaluation subsets were also transcribed manually, following the guidelines specified for the Slovenian GOS corpus. The manual transcriptions were used to evaluate the quality of the unsupervised transcriptions. The average word error rate for the SI TEDx-UM evaluation subset was 50.7%. SI TEDx-UM is a valuable new spoken language resource for Slovenian, which belongs to the group of under-resourced languages. SI TEDx-UM is freely available.
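The word error rate used to compare automatic and manual transcriptions can be illustrated with a minimal sketch. It assumes the standard definition (substitutions, deletions and insertions divided by the number of reference words) and plain whitespace tokenization; the function name and the example sentences are hypothetical and are not taken from the database or its evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four
# reference words.
print(word_error_rate("dober dan vsem skupaj", "dober dan vsi"))  # 0.5
```

A corpus-level figure such as the one reported above is normally obtained by summing the errors and reference lengths over all utterances rather than averaging per-utterance rates; which variant was used here is not stated in the abstract.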

B.03 Paper at an international scientific conference

COBISS.SI-ID: 19822102

2.

Spoken corpus Gos VideoLectures 1.0 (transcription)

Gos VideoLectures is an add-on to the Gos reference speech corpus of Slovene (http://hdl.handle.net/11356/1040) and covers public academic speech. The recordings are a selection of public lectures available through the web portal Videolectures.net, provided by the Jožef Stefan Institute. The first release covers 4.5 hours of speech.

B.03 Paper at an international scientific conference

COBISS.SI-ID: 29725735

3.

Morphosyntactic tags in statistical machine translation of highly inflectional language

We investigate the usefulness of morphosyntactic information in statistical machine translation between English and Slovenian, which is a highly inflectional language. Translation is explored in both directions, with translation into the inflectional language being the more challenging task. Morphosyntactic tags are attached to words by two different taggers (TreeTagger and Obelix) and are used during translation in three different ways: N-best list re-scoring, factored translation, and OSM modeling. We compare a complete set of morphosyntactic tags with a reduced set containing only the most relevant morphosyntactic information. The results show that morphosyntactic information is an important factor in translation. When used in factored translation with OSM models, it improves the BLEU score by almost 10% relative when translating from English to Slovenian, and by 2% when translating from Slovenian to English.
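The idea of N-best list re-scoring mentioned above can be sketched as follows. This is only an illustration under stated assumptions: each hypothesis is assumed to carry a baseline decoder log-score and a log-score from a model over its morphosyntactic tag sequence, and the two are combined linearly. The function name, the weight, and all scores are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# (translation text, baseline log-score, tag-sequence log-score)
Hypothesis = Tuple[str, float, float]


def rescore_nbest(nbest: List[Hypothesis], tag_weight: float) -> List[Hypothesis]:
    """Re-rank hypotheses by baseline score plus weighted tag-model score."""
    return sorted(
        nbest,
        key=lambda hyp: hyp[1] + tag_weight * hyp[2],
        reverse=True,  # higher (less negative) log-score is better
    )


# Hypothetical 2-best list for "I see a beautiful house". The first entry
# has wrong case agreement, so an assumed tag model scores it lower.
nbest = [
    ("vidim lepa hiša", -4.0, -6.0),
    ("vidim lepo hišo", -4.2, -2.0),
]

print(rescore_nbest(nbest, tag_weight=0.0)[0][0])  # vidim lepa hiša
print(rescore_nbest(nbest, tag_weight=0.5)[0][0])  # vidim lepo hišo
```

With a zero weight the baseline ranking is kept; with a positive weight the hypothesis with the more plausible tag sequence moves to the top.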

B.03 Paper at an international scientific conference

COBISS.SI-ID: 20055318

4.

Quick and efficient definition of hangbefore and hangover criteria for voice activity detection

Hangbefore and hangover criteria are integrated into the VAD algorithm after the basic VAD decision step. The hangbefore criterion solves the problem of incorrect detection of unvoiced speech at the beginning and in the middle of a speech segment. The hangover criterion mitigates the problem of incorrect detection of unvoiced speech at the end and in the middle of a speech segment. There is always a dilemma about how many frames should be used for the hangover and hangbefore criteria. The main purpose of this study was to set up a procedure that quickly and reliably defines the number of frames for the hangbefore and hangover criteria. Our test results showed that the new quick procedure yields final results very similar to those obtained with previously known procedures.
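How such criteria act on the output of the basic VAD decision step can be sketched in a few lines. The sketch assumes the common interpretation that every frame marked as speech also marks a fixed number of preceding frames (hangbefore) and following frames (hangover) as speech. The function name and the frame counts are hypothetical; the procedure proposed in the paper for choosing those counts is not reproduced here.

```python
from typing import List


def apply_hang_criteria(decisions: List[int], hangbefore: int, hangover: int) -> List[int]:
    """Extend raw per-frame VAD decisions (1 = speech, 0 = non-speech)."""
    if hangbefore < 0 or hangover < 0:
        raise ValueError("frame counts must be non-negative")

    n = len(decisions)
    smoothed = list(decisions)
    for i, is_speech in enumerate(decisions):
        if is_speech:
            start = max(0, i - hangbefore)  # frames kept before the speech frame
            end = min(n, i + hangover + 1)  # frames kept after the speech frame
            for k in range(start, end):
                smoothed[k] = 1
    return smoothed


# Hypothetical raw decisions with a short speech burst and an isolated frame.
raw = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0]
print(apply_hang_criteria(raw, hangbefore=1, hangover=2))
# [0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1]
```

Larger frame counts recover more weak speech onsets and offsets but also label more non-speech frames as speech, which is the trade-off behind the dilemma described above.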

B.03 Paper at an international scientific conference

COBISS.SI-ID: 19584790

5.

Member of the editorial board of the International Journal of Speech Technology

Zdravko Kačič - member of the editorial board (2003 - ) of the International Journal of Speech Technology; publisher: Springer-Verlag GmbH; ISSN: 1381-2416; http://www.springer.com/engineering/signals/journal/10772/PS2?detailsPage=editorialBoard

C.06 Editorial board membership

COBISS.SI-ID: 16846341

P2-0069 — Annual report 2016
