1.

The TTS-driven affective embodied conversational agent EVA, based on a novel conversational-behavior generation algorithm

As a result of the convergence of different services delivered over the internet protocol, internet protocol television (IPTV) may be regarded as one of the most widespread user interfaces, accepted by a highly diverse user domain. Every generation, from children to the elderly, can use IPTV for recreation, as well as for social contact and mental stimulation. However, technological advances in digital platforms go hand in hand with increasingly complex user interfaces, which can induce technological disinterest and technological exclusion. Therefore, from the perspective of advanced user interfaces, interactivity and affective content presentation are two key factors in any application involving human-computer interaction (HCI). Furthermore, the perception and understanding of the information (meaning) conveyed is closely interlinked with the visual cues and non-verbal elements that speakers generate throughout human-human dialogue. In this regard, co-verbal behavior contributes information to the communicative act: it supports the speaker's communicative goal and allows a variety of other information to be added to his/her messages, including (but not limited to) psychological states, attitudes, and personality. In the present paper, we address complexity and technological disinterest by integrating natural, human-like multimodal output that incorporates a novel combined data- and rule-driven co-verbal behavior generator able to extract features from unannotated, general text. The core of the paper discusses the processes that model and synchronize non-verbal features with verbal features, even when dealing with unknown context and/or limited contextual information. The proposed algorithm incorporates both data-driven concepts (speech prosody, a repository of motor skills) and rule-based concepts (grammar, gesticon).
The algorithm first classifies the communicative intent, then plans the co-verbal cues and their form within the gesture unit, generates temporally synchronized co-verbal cues, and finally realizes them as human-like co-verbal movements. In this way, information can be represented as co-verbal cues that are synchronized with the accompanying synthesized speech both in meaning and in time, using the communication channels to which people are most accustomed.
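The four stages named above (classify intent, plan cues, synchronize, realize) can be pictured as a chain of small functions. The sketch below is a toy illustration only: the marker-to-intent table `INTENT_RULES`, the gesticon entries in `GESTICON`, and the word timings are hypothetical stand-ins, not EVA's actual rules, gesticon or TTS output.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table: linguistic marker -> communicative intent.
INTENT_RULES: Dict[str, str] = {"no": "negation", "not": "negation",
                                "all": "inclusivity", "i": "self-reference"}
# Hypothetical gesticon: intent -> motor-skill identifier from the repertoire.
GESTICON: Dict[str, str] = {"negation": "head_shake",
                            "inclusivity": "open_arms_sweep",
                            "self-reference": "hand_to_chest"}

def classify(words: List[str]) -> List[Tuple[int, str]]:
    """Stage 1: tag each word that acts as a marker of some intent."""
    return [(i, INTENT_RULES[w.lower()]) for i, w in enumerate(words)
            if w.lower() in INTENT_RULES]

def plan(intents: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Stage 2: choose a motor skill for each detected intent."""
    return [(i, GESTICON[intent]) for i, intent in intents]

def synchronize(items: List[Tuple[int, str]],
                timings: List[Tuple[float, float]]) -> List[Tuple[float, float, str]]:
    """Stage 3: anchor each cue to the (start, end) time of its word."""
    return [(timings[i][0], timings[i][1], skill) for i, skill in items]

def realize(cues: List[Tuple[float, float, str]]) -> List[str]:
    """Stage 4: emit a simple procedural script, one line per cue."""
    return [f"{start:.2f}-{end:.2f}s: {skill}" for start, end, skill in cues]

if __name__ == "__main__":
    words = ["I", "do", "not", "know"]
    timings = [(0.0, 0.1), (0.1, 0.25), (0.25, 0.5), (0.5, 0.8)]
    for line in realize(synchronize(plan(classify(words)), timings)):
        print(line)
```

Keeping each stage a pure function of the previous stage's output mirrors the staged description in the abstract and lets any single stage be replaced without touching the others.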

COBISS.SI-ID: 19965206

2.

Large vocabulary continuous speech recognition of an inflected language using stems and endings

In this article, we focus on building a large vocabulary speech recognition system for the Slovenian language. State-of-the-art recognition systems currently use vocabularies of 20,000 to 100,000 words. These systems have mostly been developed for English, which belongs to the group of languages with little inflection. Slovenian, as a Slavic language, belongs to the group of inflectional languages. Its rich morphology presents a major problem for large vocabulary speech recognition: compared to English, Slovenian requires a vocabulary approximately 10 times larger for the same degree of text coverage. For a vocabulary of a given size, this results in a high rate of out-of-vocabulary (OOV) words, which directly degrades recognizer performance. We took the characteristics of inflectional languages into account when developing a new search algorithm that restricts sub-word units to their correct order and uses separate sub-word-based language models. The search algorithm combines the advantages of sub-word-based models (reduced OOV rate) and word-based models (longer context). It also enables better search-space limitation for sub-word models. Using sub-word models, we increase recognizer accuracy while achieving a search space comparable to that of a standard word-based recognizer. Our methods were evaluated in experiments on the SNABI speech database.
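Why stems and endings reduce OOV, and what "restricting the correct order of sub-word units" means, can be illustrated with a toy example. This is a minimal sketch assuming a hand-picked ending list (`ENDINGS`) and simple longest-suffix splitting; the paper's actual segmentation and its order restriction inside the decoder are not reproduced here. Units starting with `+` denote endings.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical ending list; a real system would derive endings from a
# morphological lexicon or a data-driven segmentation.
ENDINGS = ["a", "e", "i", "o", "ah"]

def split_word(word: str, endings: Iterable[str] = ENDINGS) -> Tuple[str, str]:
    """Split a word into (stem, ending); endings carry a leading '+'."""
    for ending in sorted(endings, key=len, reverse=True):  # longest match first
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], "+" + ending
    return word, "+"  # empty ending

def subword_vocabulary(words: Iterable[str]) -> Set[str]:
    """Collect the distinct stems and endings seen in the training words."""
    units: Set[str] = set()
    for w in words:
        units.update(split_word(w))
    return units

def valid_order(units: List[str]) -> bool:
    """Accept only stem, ending, stem, ending, ... sequences."""
    expect_stem = True
    for u in units:
        is_ending = u.startswith("+")
        if is_ending == expect_stem:  # stem where an ending is due, or vice versa
            return False
        expect_stem = is_ending
    return expect_stem  # a hypothesis must not end on a bare stem

if __name__ == "__main__":
    train = ["miza", "mize", "mizi", "mizah", "knjiga", "knjige", "knjigi"]
    print(len(train), sorted(subword_vocabulary(train)))
    print(split_word("knjigah"))  # unseen word form, covered by known units
```

Here 7 word forms collapse into 6 sub-word units, and the unseen form "knjigah" is covered by the known stem "knjig" and ending "+ah". `valid_order` shows the kind of constraint that keeps a sub-word hypothesis from pairing two stems or two endings.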

COBISS.SI-ID: 11385366

3.

The impact of context on discourse marker use in two conversational genres

The relationships between text or talk and context are a basic field of pragmatic research, and insight into their nature may contribute to a better understanding of language use. In this article, we use the results of an analysis of discourse marker use in two different conversational genres (telephone conversations and television interviews) to examine the impact of context on the use of discourse markers, generalized for each analysed genre. In the first stage of the analysis, we observe important differences between the two genres: discourse markers are used far more frequently in telephone conversations than in television interviews. In the second stage, we identify several contextual factors that contribute to these differences. In this way, we gain insight into this particular aspect of the relationship between genre context and talk, and identify some of the characteristics of the genres in question.
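Comparing how often markers occur in two genres only makes sense after normalizing for corpus size. A common choice is occurrences per 1,000 words. The sketch below assumes pre-tokenized, lower-cased text and a hypothetical marker list. The example sentences are invented and are not data from the study. Multi-word markers such as "you know" are matched as token sequences.

```python
from typing import Iterable, List

def count_markers(tokens: List[str], markers: Iterable[str]) -> int:
    """Count occurrences of (possibly multi-word) markers in a token list."""
    total = 0
    for marker in markers:
        parts = marker.lower().split()
        n = len(parts)
        total += sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == parts)
    return total

def per_thousand(tokens: List[str], markers: Iterable[str]) -> float:
    """Normalized frequency: marker occurrences per 1,000 tokens."""
    return 1000 * count_markers(tokens, markers) / len(tokens)

if __name__ == "__main__":
    markers = ["well", "you know", "i mean"]  # hypothetical marker inventory
    phone = "well you know i mean it is fine".split()            # invented example
    interview = "the minister said the budget is fine".split()   # invented example
    print(per_thousand(phone, markers), per_thousand(interview, markers))
```

Note that overlapping entries in the marker list (e.g. both "you know" and "know") would be counted twice. A real analysis would also need to separate marker uses from non-marker uses of the same words.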

COBISS.SI-ID: 12612886

4.

Time and space-efficient architecture for a corpus-based text-to-speech synthesis system

This paper proposes a time- and space-efficient architecture for a text-to-speech (TTS) synthesis system. The proposed architecture can be used efficiently in unlimited-domain applications that require multilingual or polyglot functionality. The integration of a queuing mechanism, heterogeneous relation graphs, and finite-state machines yields a powerful, reliable, and easily maintainable architecture for the TTS system. A flexible, language-independent framework efficiently integrates all the algorithms used within the TTS system. Heterogeneous relation graphs are used for linguistic information representation and feature construction. Finite-state machines are used for time- and space-efficient representation of language resources, for time- and space-efficient lookup, and for separating language-dependent resources from the language-independent TTS engine. The queuing mechanism consists of several deque (double-ended queue) data structures and is responsible for activating the TTS engine modules that have to process the input text. In the proposed architecture, all modules use the same data structure for gathering linguistic information about the input text. All input and output formats are compatible, the structure is modular and interchangeable, and the system is easily maintainable and object-oriented. The proposed architecture was successfully used to implement the Slovenian PLATTOS corpus-based TTS system, as presented in this paper.
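The combination of deque-based scheduling with one shared per-utterance structure can be sketched as follows. This is a minimal illustration, not the PLATTOS design. `Utterance` is a much-simplified stand-in for a heterogeneous relation graph: named relations over shared item dictionaries. The two modules `tokenizer` and `number_tagger` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List

@dataclass
class Utterance:
    """Shared structure: named relations over items (simplified relation graph)."""
    text: str
    relations: Dict[str, List[dict]] = field(default_factory=dict)

    def relation(self, name: str) -> List[dict]:
        return self.relations.setdefault(name, [])

Module = Callable[[Utterance], None]

def tokenizer(utt: Utterance) -> None:
    """Hypothetical module: fill the 'Word' relation from whitespace tokens."""
    for token in utt.text.split():
        utt.relation("Word").append({"name": token})

def number_tagger(utt: Utterance) -> None:
    """Hypothetical module: annotate items already in the 'Word' relation."""
    for item in utt.relation("Word"):
        item["class"] = "NUM" if item["name"].isdigit() else "WORD"

def synthesize(texts: List[str], modules: List[Module]) -> List[Utterance]:
    """Pending utterances and per-utterance module stages are both deques."""
    pending: Deque[Utterance] = deque(Utterance(t) for t in texts)
    done: List[Utterance] = []
    while pending:
        utt = pending.popleft()
        stages: Deque[Module] = deque(modules)
        while stages:
            stages.popleft()(utt)  # every module reads and writes the same structure
        done.append(utt)
    return done

if __name__ == "__main__":
    for u in synthesize(["call 112 now"], [tokenizer, number_tagger]):
        print(u.relation("Word"))
```

Because every module reads and writes the same `Utterance`, modules can be swapped or reordered without changing any interface. That matches the abstract's point about compatible formats and interchangeable modules.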

COBISS.SI-ID: 11323158

5.

Adjustment method for embedded metrology engine in an EM773 series microcontroller

This paper presents the problems of implementing and adjusting (calibrating) the metrology engine embedded in NXP's EM773 series microcontroller. The metrology engine is used in a smart metering application to collect data about energy utilization and is controlled by metrology engine adjustment (calibration) parameters. The aim of this research is to develop a method that enables operators to find and verify the optimum parameters, ensuring the best possible accuracy. Properly adjusted (calibrated) metrology engines can then be used as a basis for a variety of products used in smart and intelligent environments. This paper focuses on the problems encountered in the development, partial automation, implementation, and verification of this method.
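A generic way to find and verify adjustment parameters is to compare the engine's readings against a reference meter at several load points. Fit a correction, then check the worst-case relative error against a tolerance. The sketch below assumes a simple linear gain/offset model fitted by least squares. It does not use the EM773 parameter set or any NXP API, and the readings in the usage lines are invented.

```python
from typing import Sequence, Tuple

def fit_gain_offset(measured: Sequence[float],
                    reference: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of reference ~= gain * measured + offset."""
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, reference))
    gain = sxy / sxx
    return gain, mean_y - gain * mean_x

def verify(measured: Sequence[float], reference: Sequence[float],
           gain: float, offset: float, tolerance_percent: float) -> Tuple[bool, float]:
    """Apply the correction and report whether the worst relative error is within tolerance."""
    errors = [abs(gain * x + offset - y) / abs(y) * 100.0
              for x, y in zip(measured, reference)]
    worst = max(errors)
    return worst <= tolerance_percent, worst

if __name__ == "__main__":
    raw = [10.0, 20.0, 30.0]   # invented engine readings
    ref = [21.0, 41.0, 61.0]   # invented reference-meter values
    g, o = fit_gain_offset(raw, ref)
    print(g, o, verify(raw, ref, g, o, tolerance_percent=0.5))
```

Keeping fitting and verification separate mirrors the abstract's two goals, finding the parameters and verifying them. Verification would normally run on load points not used for fitting.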

COBISS.SI-ID: 18658838

6.

TTS-driven synthetic behavior generation model for embodied conversational agents

This paper presents a novel TTS-driven non-verbal behaviour system for co-verbal gesture synthesis. The system's architecture and grammar, used to synchronize the non-verbal expressions with verbal information in the symbolic and temporal domains, are presented in detail. We also discuss how a visual representation of meaning can be selected, how the structure of its propagation can be generated as a sequence of movement phases (based on lexical affiliation and semiotic rules), and how movement phases and movement durations can be aligned with the verbal content. Finally, we explain how a procedural script is formed that drives the synchronized verbal and co-verbal behaviour. The generated synthetic behaviour already features a very high degree of lip-sync and includes iconic, symbolic, and indexical expressions, as well as adaptors. As the evaluation shows, most of the generated behaviour appears 'natural' and may adequately represent the verbal content.
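Aligning movement phases with verbal content can be illustrated by placing a gesture's stroke on the TTS timing of its lexical affiliate. Preparation, an optional hold and retraction are then arranged around the stroke. This is a minimal sketch. The fixed durations (`prep_dur`, `retract_dur`, `hold_after`) are hypothetical defaults. Aligning the stroke to the whole word, rather than to a stressed syllable or prosodic peak, is a simplifying assumption, not the paper's rule set.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    text: str
    start: float  # seconds, as reported by the TTS engine
    end: float

@dataclass
class Phase:
    name: str
    start: float
    end: float

def plan_gesture(words: List[Word], affiliate: int, prep_dur: float = 0.3,
                 retract_dur: float = 0.4, hold_after: float = 0.0) -> List[Phase]:
    """Place the stroke on the affiliate word; fit the other phases around it."""
    w = words[affiliate]
    stroke = Phase("stroke", w.start, w.end)
    # Preparation ends where the stroke begins; it cannot start before time 0.
    phases = [Phase("preparation", max(0.0, w.start - prep_dur), w.start), stroke]
    hold_end = w.end + hold_after
    if hold_after > 0:
        phases.append(Phase("hold", w.end, hold_end))
    phases.append(Phase("retraction", hold_end, hold_end + retract_dur))
    return phases

if __name__ == "__main__":
    words = [Word("a", 0.0, 0.2), Word("big", 0.2, 0.6), Word("house", 0.6, 1.0)]
    for p in plan_gesture(words, affiliate=1):
        print(f"{p.name:12s} {p.start:.2f} -> {p.end:.2f}")
```

When the preparation would start before the utterance, it is clipped to time 0. A fuller planner would instead shorten or merge phases according to the available time.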

COBISS.SI-ID: 17284886

7.

ACTLW - an action-based computation tree logic with unless operator

Model checkers for systems represented by labelled transition systems are not as extensively used as those for systems represented by Kripke structures. This is partially due to the lack of an elegant formal language for property specification that is neither as low-level as, for example, HML, nor as complex as, for example, the μ-calculus. This paper proposes a new action-based propositional branching-time temporal logic, ACTLW, which enhances the popular computation tree logic (CTL) with the notion of actions in a similar but more comprehensive way than the action-based CTL introduced by De Nicola and Vaandrager [R. De Nicola, F.W. Vaandrager, Action versus state based logics for transition systems, in: Semantics of Systems of Concurrent Processes, Proceedings LITP Spring School on Theoretical Computer Science, LNCS 469, 1990, pp. 407-419]. ACTLW is defined using only the temporal operators until and unless; all other temporal operators are derived from them. Fixed-point characterisations of the operators are given, together with symbolic algorithms for global model checking. The use of this new logic is illustrated by an example of verifying mutual-exclusion algorithms.
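The fixed-point characterisation behind global model checking can be illustrated with a simplified, action-restricted existential until and unless over an explicit labelled transition system. This is a sketch under stated assumptions, not ACTLW's exact semantics. State sets `phi` and `psi` stand in for subformula results. A path may only follow transitions whose action is in `actions`. Sets are explicit Python sets rather than BDDs. Until is the least fixed point of Z = psi | (phi & pre_A(Z)); unless (weak until) is the greatest fixed point of the same equation.

```python
from typing import Iterable, List, Set, Tuple

Transition = Tuple[str, str, str]  # (source, action, target)

def pre_exists(trans: List[Transition], actions: Set[str], target: Set[str]) -> Set[str]:
    """States with some transition labelled by an action in `actions` into `target`."""
    return {s for (s, a, t) in trans if a in actions and t in target}

def eu(trans: List[Transition], actions: Set[str],
       phi: Set[str], psi: Set[str]) -> Set[str]:
    """Existential until: least fixed point, iterated upwards from the empty set."""
    z: Set[str] = set()
    while True:
        new = psi | (phi & pre_exists(trans, actions, z))
        if new == z:
            return z
        z = new

def ew(states: Iterable[str], trans: List[Transition], actions: Set[str],
       phi: Set[str], psi: Set[str]) -> Set[str]:
    """Existential unless: greatest fixed point, iterated downwards from all states."""
    z: Set[str] = set(states)
    while True:
        new = psi | (phi & pre_exists(trans, actions, z))
        if new == z:
            return z
        z = new

if __name__ == "__main__":
    states = {"s0", "s1", "s2"}
    trans = [("s0", "a", "s1"), ("s1", "a", "s1"), ("s1", "b", "s2")]
    phi, psi = {"s0", "s1"}, {"s2"}
    print(sorted(eu(trans, {"a", "b"}, phi, psi)))
    print(sorted(eu(trans, {"a"}, phi, psi)), sorted(ew(states, trans, {"a"}, phi, psi)))
```

The example separates the two operators. With only action `a` allowed, `s0` and `s1` cannot reach `psi`, so they fail the until formula. They do satisfy unless, because the `a`-self-loop on `s1` gives an infinite path that stays in `phi`.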

COBISS.SI-ID: 12047638

8.

An expressive conversational-behavior generation model for advanced interaction within multimodal user interfaces

The authors introduce a flexible and efficient algorithm and a novel system for the planning, generation, and realization of conversational behavior (co-verbal behavior). Such behavior is best described as a set of meaningful body-part movements that are synchronized with the prosody of the accompanying speech. The movements and shapes generated as co-verbal behavior represent a contextual link between a repertoire of independent motor skills (shapes, movements, and poses that a conversational agent can reproduce and execute) and the intent/meaning of spoken sequences (context). The actual intent/meaning of spoken content is identified through language-dependent linguistic markers and prosody. The knowledge databases used to determine the intent/meaning of text are based on linguistic analysis and the classification of text into semiotic classes and subclasses, achieved by annotating multimodal corpora with the proposed EVA annotation scheme. The scheme captures features at a functional (context-dependent) as well as a descriptive (context-independent) level. The functional level captures high-level features that describe the correlation between speech and co-verbal behavior, whereas the descriptive level captures and defines body poses and shapes independently of verbal content and at high resolution. The annotation scheme therefore not only interlinks speech and gesture at a semiotic level, but also serves as a basis for creating a context-independent repertoire of movements and shapes.
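The two annotation levels can be modelled as two record types. A functional annotation links a span of speech to a semiotic (sub)class. The descriptive shapes it references carry no verbal context, so a context-independent repertoire can be collected from many functional annotations. This is a minimal sketch; the field names, class and subclass labels, and shape descriptors are hypothetical and are not the actual EVA annotation scheme.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class Shape:
    """Descriptive level: context-independent form of one body part."""
    body_part: str
    form: str

@dataclass
class FunctionalAnnotation:
    """Functional level: links a word span to a semiotic class and subclass."""
    semiotic_class: str
    subclass: str
    word_span: Tuple[int, int]  # (first, last) word index in the utterance
    shapes: List[Shape] = field(default_factory=list)

def build_repertoire(annotations: List[FunctionalAnnotation]) -> List[Shape]:
    """Collect distinct shapes, in order of first appearance, ignoring verbal context."""
    repertoire: List[Shape] = []
    for ann in annotations:
        for shape in ann.shapes:
            if shape not in repertoire:
                repertoire.append(shape)
    return repertoire

if __name__ == "__main__":
    palm_up = Shape("right_hand", "open_palm_up")
    anns = [
        FunctionalAnnotation("symbolic", "offering", (0, 2), [palm_up]),
        FunctionalAnnotation("iconic", "size", (3, 4),
                             [palm_up, Shape("left_hand", "open_palm_up")]),
    ]
    print(build_repertoire(anns))
```

Making `Shape` frozen (hashable, compared by value) means the same physical shape annotated in different verbal contexts is recognised as a single repertoire entry.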

COBISS.SI-ID: 19378454

9.

Modelling medium access control in IEEE 802.15.4 nonbeacon-enabled networks with probabilistic timed automata

This paper concerns the formal modelling of medium access control in nonbeacon-enabled IEEE 802.15.4 wireless personal area networks with probabilistic timed automata, supported by the PRISM probabilistic model checker. In these networks, the devices contend for the medium by executing an unslotted carrier sense multiple access with collision avoidance (CSMA-CA) algorithm. A model of a network consisting of two stations sending data to two different destination stations has been introduced in the literature. We have improved this model and, based on it, propose two ways of modelling a network with an arbitrary number of sending stations, each with its own destination. We show that the same models are valid representations of a star-shaped network with an arbitrary number of stations sending data to the same destination station. We also propose how to model such a network if some of the sending stations are not within radio range of the others, i.e. if they are hidden. We present some results obtained for these models by probabilistic model checking using PRISM.
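For readers unfamiliar with the contention procedure being modelled, the sketch below simulates one station's unslotted CSMA-CA attempt. It uses the backoff rules of IEEE 802.15.4 with the standard default MAC attributes (macMinBE = 3, macMaxBE = 5, macMaxCSMABackoffs = 4). It is a plain Monte Carlo illustration, not a probabilistic timed automaton and not the PRISM models from the paper. The channel is abstracted as a hypothetical `channel_busy` callback that answers each clear channel assessment (CCA).

```python
import random
from typing import Callable, Tuple

MAC_MIN_BE = 3             # default macMinBE
MAC_MAX_BE = 5             # default macMaxBE
MAC_MAX_CSMA_BACKOFFS = 4  # default macMaxCSMABackoffs

def unslotted_csma_ca(channel_busy: Callable[[], bool], rng: random.Random,
                      min_be: int = MAC_MIN_BE, max_be: int = MAC_MAX_BE,
                      max_backoffs: int = MAC_MAX_CSMA_BACKOFFS) -> Tuple[bool, int, int]:
    """Return (success, number of CCAs performed, backoff periods waited)."""
    nb, be = 0, min_be
    waited = 0
    while True:
        # Random backoff of 0 .. 2^BE - 1 unit backoff periods (randint is inclusive).
        waited += rng.randint(0, 2 ** be - 1)
        if not channel_busy():     # CCA reports an idle channel: transmit
            return True, nb + 1, waited
        nb += 1                    # channel busy: back off again with a larger window
        be = min(be + 1, max_be)
        if nb > max_backoffs:      # too many busy CCAs: channel access failure
            return False, nb, waited

if __name__ == "__main__":
    rng = random.Random(42)
    results = [unslotted_csma_ca(lambda: rng.random() < 0.5, rng) for _ in range(10000)]
    print(sum(ok for ok, _, _ in results) / len(results))
```

With a busy probability of 0.5 per CCA, as in the usage lines, the failure rate should approach 0.5^5 = 1/32, since failure requires five consecutive busy CCAs. Probabilistic model checking with PRISM computes such probabilities exactly and also captures interaction between stations, which this single-station sketch ignores.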

COBISS.SI-ID: 16807958

10.

Self-adaptive differential evolution algorithm using population size reduction and three strategies

Many real-world optimization problems are large-scale in nature. Solving them requires an optimization algorithm that can perform a global search regardless of the particularities of the problem. This paper proposes a self-adaptive differential evolution algorithm, called jDElscop, for solving large-scale optimization problems with continuous variables. The proposed algorithm employs three strategies and a population size reduction mechanism. The performance of the jDElscop algorithm is evaluated on a set of benchmark problems provided for the Special Issue on the Scalability of Evolutionary Algorithms and other Metaheuristics for Large Scale Continuous Optimization Problems. Nonparametric statistical procedures were used for multiple comparisons between the proposed algorithm and three well-known algorithms from the literature. The results show that jDElscop handles large-scale continuous optimization effectively and, in most cases, performs significantly better than the other three algorithms in the comparison.
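The two ingredients named above, self-adaptive control parameters and population size reduction, can be sketched in a compact differential evolution loop. This is a jDE-style illustration under stated assumptions. It uses a single DE/rand/1/bin strategy instead of jDElscop's three strategies. F and CR are resampled with probabilities `tau1` and `tau2`. The population is halved `pmax` times by letting individual i compete with individual i + NP/2. The function name, defaults and reduction schedule are hypothetical choices.

```python
import random
from typing import Callable, List, Sequence, Tuple

def jde_with_reduction(f: Callable[[Sequence[float]], float], dim: int,
                       lower: float, upper: float, np_init: int = 40,
                       max_evals: int = 20000, pmax: int = 3, seed: int = 1,
                       tau1: float = 0.1, tau2: float = 0.1) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(np_init)]
    fit = [f(x) for x in pop]
    F, CR = [0.5] * np_init, [0.9] * np_init
    evals, reductions = np_init, 0
    stage = max_evals // (pmax + 1)  # evaluations between population reductions
    while evals < max_evals:
        n = len(pop)
        new_pop, new_fit = list(pop), list(fit)
        for i in range(n):
            if evals >= max_evals:
                break
            # jDE rule: occasionally resample F in [0.1, 1.0] and CR in [0, 1].
            Fi = 0.1 + 0.9 * rng.random() if rng.random() < tau1 else F[i]
            CRi = rng.random() if rng.random() < tau2 else CR[i]
            r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
            jrand = rng.randrange(dim)
            trial = []
            for j in range(dim):
                if j == jrand or rng.random() < CRi:  # binomial crossover
                    v = pop[r1][j] + Fi * (pop[r2][j] - pop[r3][j])  # rand/1 mutation
                else:
                    v = pop[i][j]
                trial.append(min(max(v, lower), upper))  # clip to the bounds
            ft = f(trial)
            evals += 1
            if ft <= fit[i]:  # greedy selection; successful F and CR survive
                new_pop[i], new_fit[i], F[i], CR[i] = trial, ft, Fi, CRi
        pop, fit = new_pop, new_fit
        # Population size reduction: x_i competes with x_{i + n/2}.
        if reductions < pmax and evals >= stage * (reductions + 1) and len(pop) >= 8:
            half = len(pop) // 2
            keep = [i if fit[i] <= fit[i + half] else i + half for i in range(half)]
            pop = [pop[k] for k in keep]
            fit = [fit[k] for k in keep]
            F = [F[k] for k in keep]
            CR = [CR[k] for k in keep]
            reductions += 1
    best = min(range(len(pop)), key=fit.__getitem__)
    return pop[best], fit[best]

if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    x_best, f_best = jde_with_reduction(sphere, dim=5, lower=-5.0, upper=5.0)
    print(f_best)
```

Reducing the population in later stages concentrates the remaining evaluation budget on fewer, better individuals. The `len(pop) >= 8` guard keeps at least four individuals, which DE/rand/1 needs to draw three distinct partners.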

COBISS.SI-ID: 14398230

P2-0069 — Final report

1.