One of the major goals of the project was to develop a robust incremental method for learning object representations. In this paper we applied the developed method to the problem of room categorisation. We presented a new approach to room categorisation from 2D laser range data, built on a novel spatial model of mid-level parts constructed on top of a low-level part-based representation. This approach is then fused with a vision-based room categorisation method, which likewise relies on a spatial model of mid-level visual parts. Both the laser-based and the vision-based modalities use a new discriminative dictionary learning technique for part-dictionary selection. Finally, we presented a comparative analysis of the laser-based, vision-based, and laser-vision-fusion approaches within a uniform part-based framework, evaluated on a large dataset containing several categories of rooms from domestic environments. The results showed that the proposed representation-learning algorithm yields strong categorisation performance.
COBISS.SI-ID: 1537424323
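As an illustration of the part-dictionary selection and fusion ideas summarised above, the following Python sketch ranks candidate parts by a class-discriminability score and concatenates the selected laser-based and vision-based activations for classification. The scoring criterion (mutual information), the linear classifier, and the function names are illustrative assumptions, not the exact formulation used in the paper.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import LinearSVC

def select_discriminative_parts(activations, labels, k=50):
    """Illustrative sketch: rank candidate parts by how well their activation
    separates room categories and keep the top-k as the part dictionary.
    `activations` is an (n_rooms, n_candidate_parts) matrix of part-activation
    strengths, `labels` the room categories."""
    scores = mutual_info_classif(activations, labels)
    return np.argsort(scores)[::-1][:k]

def fuse_and_classify(laser_act, vision_act, labels, laser_idx, vision_idx):
    """Concatenate the selected laser-based and vision-based part activations
    into one descriptor and train a linear classifier on it."""
    fused = np.hstack([laser_act[:, laser_idx], vision_act[:, vision_idx]])
    return LinearSVC().fit(fused, labels)
```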
In this paper, we summarise the work that we had initiated before this project started and completed within the framework of this project. The goal of the research was the development of an intelligent robot capable of learning representations of objects and their properties in a natural dialogue with a human teacher. We present representations and mechanisms that facilitate continuous learning of visual concepts in dialogue with a tutor, and we describe and evaluate the implemented robot system. We show how beliefs about the world are created by processing visual and linguistic information, and how they are used to plan the system's behaviour with the aim of satisfying its internal drive to extend its knowledge. The system supports different kinds of learning, initiated either by the human tutor or by the system itself. We demonstrate and experimentally evaluate these principles on a robot capable of learning about the visual properties of objects.
COBISS.SI-ID: 1536908227
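The following Python sketch illustrates, under assumed thresholds and a hypothetical belief structure, how uncertainty in beliefs fused from visual and linguistic evidence can drive the choice between system-initiated questions and tutor-driven teaching; it is a minimal sketch, not the implemented system.

```python
def choose_next_action(beliefs, high=0.9, low=0.5):
    """Hypothetical structure and thresholds: beliefs map (object, property)
    pairs to a value and a confidence.  Confident beliefs need no action,
    moderately uncertain ones trigger a confirmation question to the tutor,
    and poorly supported ones trigger a request for teaching, satisfying the
    drive to extend the system's knowledge."""
    for (obj, prop), (value, conf) in beliefs.items():
        if conf < low:
            return ("ask_tutor_to_teach", obj, prop)
        if conf < high:
            return ("ask_tutor_to_confirm", obj, prop, value)
    return ("no_action",)

# Example: the system would ask the tutor to confirm the uncertain colour.
beliefs = {("mug", "colour"): ("red", 0.7), ("mug", "shape"): ("cylindrical", 0.95)}
print(choose_next_action(beliefs))
```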
In this paper we identified excessive feature sharing and the lack of discriminative learning as problems in hierarchical compositional models. We proposed to overcome these issues by fully exploiting the discriminative features already present in the generative models of hierarchical compositions. We formed a discriminative descriptor from the existing library of parts and added it on top of the compositional model to discriminatively verify the compositional proposals of object regions produced by the generative model. The approach was evaluated on several visual object classification and detection tasks and improved performance compared to classic hierarchical compositions. We also demonstrated that it achieves state-of-the-art performance under partial occlusion compared to convolutional neural networks. The results of this research were demonstrated mostly on general visual object recognition and detection tasks and formed the basis for all our further research on hierarchical compositions and deep networks.
COBISS.SI-ID: 1536363971
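A minimal Python sketch of the verification idea described above: a fixed-length descriptor is built from the library parts that fire inside a proposed object region, and a linear classifier accepts or rejects the proposal. The histogram descriptor and the SVM verifier are illustrative assumptions rather than the exact descriptor and classifier used in the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

def part_histogram(part_detections, region, n_parts):
    """Build a fixed-length descriptor for a proposed object region by
    histogramming which library parts fired inside it.
    `part_detections` is a list of (part_id, x, y) tuples."""
    x0, y0, x1, y1 = region
    hist = np.zeros(n_parts)
    for part_id, x, y in part_detections:
        if x0 <= x <= x1 and y0 <= y <= y1:
            hist[part_id] += 1
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def train_verifier(descriptors, is_object):
    """Train the verification stage on descriptors of true-positive and
    false-positive proposals; at test time, only accepted proposals are kept."""
    return LinearSVC().fit(descriptors, is_object)
```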
We present a new formulation of the constellation model with correlation filters that treats the geometric and visual constraints within a single convex cost function, and we derive a highly efficient optimization for maximum a posteriori inference of a fully connected constellation. We propose a tracker that models the object at two levels of detail: the coarse level approximately localizes the object, while the mid-level representation carries out fine localization. The model is capable of adapting to changes in the target's aspect and to partial occlusion. The resulting tracker is rigorously analyzed on the highly challenging OTB, VOT2014, and VOT2015 benchmarks, exhibits state-of-the-art performance, and runs in real time. The tracker was used to provide temporal context for improving object detection, particularly of traffic signs. We developed a three-stage method that, in the first stage, detects traffic signs with the Faster R-CNN detector; in the second stage, tracks the detections with the developed tracker and links the individual detections over time; and in the third stage, verifies the tracks and merges multiple detection hypotheses into the final detection result. Using this approach we were able to improve detection performance. The method and the results were published in article [12]. The first author of the article joined the project as a student and has developed into an extremely promising researcher; already in the second year of his doctoral studies he has published two papers in journals with impact factors greater than 7, as well as other high-quality publications.
COBISS.SI-ID: 1537625283
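The three-stage design can be sketched in Python as follows. The interfaces are illustrative only: `detect`, `update_tracks`, and `verify` stand in for the Faster R-CNN detector, the developed tracker, and the verification step described in [12], and the box-merging rule is a simple assumption, not the published implementation.

```python
import numpy as np

def merge_boxes(track):
    """Average the per-frame boxes of a verified track into one detection."""
    boxes = np.array([d["box"] for d in track])
    return boxes.mean(axis=0)

def detect_track_verify(frames, detect, update_tracks, verify, accept_thr=0.5):
    """Three-stage pipeline: (1) per-frame detection, (2) temporal linking of
    detections into tracks, (3) track-level verification that merges the
    per-frame hypotheses of accepted tracks into final detections."""
    tracks = []
    for t, frame in enumerate(frames):
        detections = detect(frame)                            # stage 1: boxes + scores (e.g. Faster R-CNN)
        tracks = update_tracks(tracks, frame, detections, t)  # stage 2: link detections over time
    return [merge_boxes(tr) for tr in tracks if verify(tr) >= accept_thr]  # stage 3
```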
In this paper we identified the lack of explicit structure as an important drawback of deep convolutional networks, and the lack of a well-defined discriminative cost function as an important drawback of hierarchical compositions. We proposed to address both issues with a novel analytic model of a basic unit in a layered hierarchical model that has both an explicit compositional structure and a well-defined discriminative cost function. The proposed model was applied to object classification tasks, where we showed improved inference time. Furthermore, the model allowed us to perform a novel visualization of the structure of deep network features, which led to a new understanding of the features learned by deep networks. This work formed the basis for our further research on combining the properties of deep networks and compositional hierarchies, which resulted in a novel deep network architecture with spatially-adaptive receptive fields. This model provided significant improvements in deep convolutional networks for semantic segmentation, both in classification accuracy and in inference speed. The resulting work [16] was accepted for publication at CVPR 2018, the most prestigious conference in the field of computer vision.
COBISS.SI-ID: 1537308611
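A simplified PyTorch sketch of a unit with a spatially-adaptive receptive field: each input channel is shifted by a learnable displacement before a 1x1 convolution aggregates the channels, so the effective receptive field adapts during training. This is an illustrative reduction under assumed per-channel offsets, not the formulation published in [16].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisplacedAggregationSketch(nn.Module):
    """Simplified sketch (not the published unit): learnable per-channel
    displacements applied via bilinear resampling, followed by 1x1 aggregation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.offsets = nn.Parameter(torch.zeros(in_ch, 2))  # (dx, dy) per input channel
        self.aggregate = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Base sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device),
            torch.linspace(-1, 1, w, device=x.device),
            indexing="ij",
        )
        base = torch.stack([xs, ys], dim=-1)  # (h, w, 2), grid order is (x, y)
        shifted = []
        for i in range(c):
            # Displace channel i by its learned offset (in normalized units).
            grid = (base + self.offsets[i]).unsqueeze(0).expand(n, -1, -1, -1)
            shifted.append(F.grid_sample(x[:, i:i + 1], grid, align_corners=True))
        return self.aggregate(torch.cat(shifted, dim=1))
```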