1.

Methods for finding and explaining groups of genes with similar expression time-course profiles

We have developed a method for predicting time series (values of a continuous variable), based on predictive clustering trees. The method identifies groups of examples with similar temporal profiles and, at the same time, provides a description of each group. We have used the method to identify groups of yeast genes that respond similarly to various kinds of environmental stress, and to explain the groups in terms of gene annotations with terms from the Gene Ontology.

COBISS.SI-ID: 23488807

2.

A data mining ontology

We have developed the OntoDM ontology of data mining. It represents entities such as data, data mining tasks and algorithms, and generalizations (resulting from the latter). OntoDM covers much of the diversity in data mining research, including recently developed approaches to mining structured data and constraint-based data mining. In contrast to other ontologies of data mining, OntoDM is a deep ontology and complies with best practices in ontology engineering.

COBISS.SI-ID: 24216359

3.

Learning multi-target trees from massive or streaming data

We have developed methods for learning trees for single- and multi-target regression from massive or streaming data. To our knowledge, no other methods for structured prediction on streaming (or massive) data have been proposed so far. The methods can be used for analyzing very large datasets, such as those generated by high-throughput omics techniques in the area of systems biology.

COBISS.SI-ID: 24647719

4.

Autocorrelation in predictive clustering

We have developed a method that explicitly takes into account spatial and network autocorrelation in data that are not independently and identically distributed (i.i.d.) and provides a multilevel insight into the autocorrelation phenomenon. The method is based on the concept of predictive clustering trees (PCTs) and works for different predictive modeling tasks, including classification and regression, as well as some clustering tasks. We applied this method to several real-world problems of spatial regression and classification, as well as problems of network regression coming from the areas of social and spatial networks.

COBISS.SI-ID: 26073895

5.

Hierarchical annotation of medical images

We propose the use of random forests and bagging of predictive clustering trees in the domain of medical image annotation with labels organized into a hierarchy. The experiments show four things. First, ensembles of predictive clustering trees perform consistently better than SVMs. Second, SIFT descriptors are the most discriminative. Third, combinations of several descriptors improve the predictive performance of the classifiers. Finally, the results of the annotation of the considered image database are the best results reported so far both in the literature and at image annotation competitions.

COBISS.SI-ID: 24848423

J2-2285 — Final report
