Reconstruction of 3D space from visual data has long been a significant challenge in computer vision. A popular approach to this problem is offered by bottom-up reconstruction techniques, which model complex 3D scenes through a constellation of volumetric primitives. Such techniques are inspired by the current understanding of the human visual system and are therefore, as suggested by recent visual neuroscience literature, closely related to the way humans process visual information. While considerable advances have been made in 3D reconstruction in recent years, the problem remains challenging due to the many possible ways of representing 3D data, the ambiguity of determining shape and position in 3D space, and the difficulty of training efficient models for predicting volumetric primitives. In this article, we address these challenges and present a novel solution for recovering volumetric primitives from depth images. Specifically, we focus on the recovery of superquadrics, a special type of parametric model able to describe a wide array of 3D shapes with only a few parameters. We present a new learning objective that relies on the superquadric (inside-outside) function and develop two learning strategies for training convolutional neural networks (CNNs) capable of predicting superquadric parameters. The first uses explicit supervision and penalizes the difference between the predicted and reference superquadric parameters. The second uses implicit supervision and penalizes differences between the input depth images and depth images rendered from the predicted parameters. CNN predictors for superquadric parameters are trained with both strategies and evaluated on a large dataset of synthetic and real-world depth images. Experimental results show that both strategies compare favourably with the existing state-of-the-art and yield high-quality 3D reconstructions of the modelled scenes in a much shorter processing time.
COBISS.SI-ID: 45630467
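For illustration, the superquadric inside-outside function on which the learning objective above is based can be written as a short function. This is a minimal sketch in the canonical, pose-free frame; the function and parameter names are illustrative and not taken from the paper's code.

```python
import numpy as np

def inside_outside(points, size, shape):
    """Superquadric inside-outside function F(x, y, z).

    points : (N, 3) array of 3D points in the superquadric's canonical frame
    size   : (a1, a2, a3) extents along the three axes
    shape  : (eps1, eps2) squareness exponents

    F < 1 inside the superquadric, F = 1 on its surface, F > 1 outside.
    """
    x, y, z = np.abs(points).T + 1e-6   # abs() keeps the fractional powers real-valued
    a1, a2, a3 = size
    e1, e2 = shape
    xy = (x / a1) ** (2.0 / e2) + (y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + (z / a3) ** (2.0 / e1)

# Example: the point (1, 0, 0) lies on the surface of a unit sphere (eps1 = eps2 = 1),
# so the returned value is close to 1.
print(inside_outside(np.array([[1.0, 0.0, 0.0]]), (1, 1, 1), (1, 1)))
```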
The article presents a detailed analysis of state-of-the-art approaches to ear recognition, which further confirms the authors' standing in the field. This is also evidenced by the 2019 competition paper on the same topic [COBISS.SI-ID 1538531011], describing a competition we organised at a prestigious biometrics conference.
COBISS.SI-ID: 1537788099
A long-term visual object tracking performance evaluation methodology and a benchmark are proposed. Our new performance measures offer greater interpretation potential than existing ones and better distinguish between different tracking behaviours. We show that these measures generalize the short-term performance measures, thus linking the two tracking problems. Furthermore, the new measures are highly robust to temporal annotation sparsity and allow annotation of sequences hundreds of times longer than in current datasets without increasing manual annotation labour. A new challenging dataset of carefully selected sequences with many target disappearances is proposed as well. The new methodology and the dataset have become part of the largest visual object tracking challenge in computer vision, VOT. This work was part of the contribution recognised by the Slovenian Research Agency (ARRS) as an excellent scientific achievement in 2020. In addition, ARRS recognised our work on discriminative correlation filters for tracking as an excellent scientific achievement in 2019.
COBISS.SI-ID: 1538564803
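The core of the evaluation methodology above can be illustrated with a simplified, single-threshold sketch of per-sequence tracking precision, recall and F-score; the published methodology additionally accounts for the tracker's prediction-certainty threshold, and the function and argument names below are illustrative.

```python
import numpy as np

def longterm_f_score(overlaps, gt_visible, predicted):
    """Simplified long-term tracking measures for one sequence.

    overlaps   : per-frame IoU between the prediction and the ground truth
                 (0 where either of them is absent)
    gt_visible : boolean mask, True where the target is visible in the ground truth
    predicted  : boolean mask, True where the tracker reported the target
    """
    overlaps = np.asarray(overlaps, dtype=float)
    gt_visible = np.asarray(gt_visible, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)

    # Precision: quality of the frames in which the tracker reports the target.
    pr = overlaps[predicted].mean() if predicted.any() else 0.0
    # Recall: how well the frames with a visible target are covered.
    re = overlaps[gt_visible].mean() if gt_visible.any() else 0.0
    f = 2 * pr * re / (pr + re) if (pr + re) > 0 else 0.0
    return pr, re, f
```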
We proposed a novel reconstruction-based method for unsupervised visual anomaly detection. We addressed the issue of over-generalisation, common in other reconstruction-based anomaly detection methods, by formulating the reconstruction as an iterative inpainting process. Since the reconstruction of an anomalous region is not conditioned on the corresponding anomalous input pixels, the likelihood of accurately reconstructing the anomaly is low. Anomalies can then be detected by comparing the input image and its reconstruction with an image similarity measure. The approach uses multi-scale image gradient similarity both as the similarity measure and as a training loss, which is more robust to random-pattern regions than the distance measures previously used in anomaly detection. This reduces the number of false-positive detections in random-pattern regions. The proposed approach improved the state-of-the-art results on the widely used MVTec anomaly-detection dataset and also achieved excellent results on commonly used datasets for video anomaly detection. The resulting paper was published in Pattern Recognition (IF: 7.196). Since the method does not require annotated anomaly samples, it is especially useful in cases where the acquisition of anomalous samples is unfeasible.
COBISS.SI-ID: 49664003
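The anomaly-scoring step described above can be sketched as a gradient-magnitude similarity between the image and its reconstruction, averaged over a few scales. This is an assumed, simplified variant of the multi-scale gradient similarity used in the paper; the stabilising constant and the scale set are illustrative choices.

```python
import numpy as np
from scipy import ndimage

def gms_map(img, recon, c=1e-4):
    """Gradient-magnitude similarity between an image and its reconstruction.
    Values near 1 indicate similar local structure; low values mark regions
    where the reconstruction deviates from the input (anomaly cues).
    The constant c stabilises the ratio and depends on the image value range."""
    def grad_mag(x):
        gx = ndimage.prewitt(x, axis=0)
        gy = ndimage.prewitt(x, axis=1)
        return np.sqrt(gx ** 2 + gy ** 2)
    g_i, g_r = grad_mag(img), grad_mag(recon)
    return (2 * g_i * g_r + c) / (g_i ** 2 + g_r ** 2 + c)

def anomaly_map(img, recon, scales=(1, 2, 4)):
    """Average the similarity over a small image pyramid and turn it into an
    anomaly score (1 - similarity). Assumes a single-channel image whose size
    is divisible by the largest scale, so down- and up-sampling restore it."""
    maps = []
    for s in scales:
        i_s = ndimage.zoom(img, 1.0 / s, order=1)
        r_s = ndimage.zoom(recon, 1.0 / s, order=1)
        maps.append(ndimage.zoom(gms_map(i_s, r_s), s, order=1))
    return 1.0 - np.mean(maps, axis=0)
```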
This paper proposed a novel displaced aggregation unit (DAU) for deep convolutional networks, which introduces novel compositional properties into deep models. In contrast to classical filters with units (pixels) placed on a fixed regular grid, the displacements of the DAUs are learned, which resulted in deep networks with novel properties, such as decoupling of the number of parameters from the receptive field size, learning of receptive field sizes, and automatic adjustment of the spatial focus of features. These properties resulted in more efficient deep networks with fewer operations and parameters, and also enabled novel analysis of the parameters and of the spatial coverage of features. The strength of DAUs was extensively demonstrated on classification, semantic segmentation and blind image deblurring tasks. Results showed that DAUs allocate parameters efficiently, yielding networks that are up to four times more compact in terms of the number of parameters at similar or better performance. The proposed method is therefore suitable for modelling visual information in fully-convolutional models for segmentation as well as in generative models, since it can adapt the size of the receptive fields to the content of the images, i.e. to the consistency of the content of the image set, much more easily than standard convolution.
COBISS.SI-ID: 1538492611
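A heavily simplified sketch of the idea behind a displaced-aggregation layer is given below. For brevity the displacements are shared across channels and the Gaussian smoothing uses a fixed kernel, which the actual DAU formulation does not require; this is an illustration of the principle, not the authors' implementation, and all names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DAULayer(nn.Module):
    """Simplified displaced-aggregation layer: each filter is a sum of a few
    units with learned weights and learned sub-pixel displacements, applied
    to a Gaussian-smoothed input instead of a fixed regular grid."""

    def __init__(self, in_ch, out_ch, units=4, sigma=0.5, init_disp=2.0):
        super().__init__()
        # one 1x1 weight set per unit and a learned (dx, dy) offset per unit
        self.weight = nn.Parameter(torch.randn(units, out_ch, in_ch, 1, 1) * 0.1)
        self.disp = nn.Parameter((torch.rand(units, 2) - 0.5) * 2 * init_disp)
        ax = torch.arange(5).float() - 2
        g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
        k = torch.outer(g, g)
        self.register_buffer("blur", (k / k.sum()).view(1, 1, 5, 5).repeat(in_ch, 1, 1, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        # smooth each channel so that sub-pixel sampling is well behaved
        xs = F.conv2d(x, self.blur, padding=2, groups=c)
        out = 0.0
        zeros = torch.zeros((), device=x.device)
        ones = torch.ones((), device=x.device)
        for k in range(self.disp.shape[0]):
            dx, dy = self.disp[k, 0], self.disp[k, 1]
            # translate the smoothed input by the learned offset (normalised coords)
            theta = torch.stack([torch.stack([ones, zeros, 2 * dx / w]),
                                 torch.stack([zeros, ones, 2 * dy / h])])
            grid = F.affine_grid(theta.expand(b, -1, -1), xs.shape, align_corners=False)
            shifted = F.grid_sample(xs, grid, align_corners=False)
            # aggregate the displaced responses with the unit's weights
            out = out + F.conv2d(shifted, self.weight[k])
        return out
```

In a full formulation each input/output channel pair would carry its own set of displacements, and the smoothing width could also be learned; those are the ingredients that enable the adaptive receptive fields described above.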