1.

Using ensembles of trees for hierarchical multi-label classification for predicting gene function

We have developed a method for learning tree ensembles for hierarchical multi-label classification. We have used it for gene function prediction in three different organisms: S. cerevisiae, A. thaliana and M. musculus. The results show that our method is as accurate as state-of-the-art methods for automatic gene function prediction, but has a much lower time complexity.

F.02 Acquisition of new scientific knowledge

COBISS.SI-ID: 23480359

2.

Inductive databases and constraint-based data mining

We have edited a book on inductive databases and constraint-based data mining, which introduces this research area and gives an overview of recent research. Inductive databases are of great importance for the integrative analysis of data in general and for systems biology in particular. In addition to data, inductive databases contain patterns, which are generated by inductive queries. The book contains several chapters authored by members of this project team, covering topics such as ontologies for data mining, constrained clustering, gene function prediction and microarray data analysis.

C.01 Editorial board of a foreign/international collection of papers/book

COBISS.SI-ID: 24215079

3.

Feature ranking for biomarker discovery

We have developed several methods for feature ranking and applied them to practical problems of biomarker discovery. These include a method for feature ranking in the context of predicting structured outputs (such as multiple targets), methods for evaluating rankings, and methods for aggregating rankings. We have applied these methods to discover biomarkers for neuroblastoma, a type of embryonal tumor, and for Huntington's disease, a neurodegenerative disorder.

F.02 Acquisition of new scientific knowledge

COBISS.SI-ID: 24222247
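
To illustrate what aggregating feature rankings means in item 3, here is a minimal sketch that uses the Borda count, a standard aggregation scheme. It is an assumption chosen for illustration only; the report does not state which aggregation methods the project used. The function name and the feature names (geneA, geneB, geneC) are hypothetical.

```python
from typing import Dict, List, Sequence


def borda_aggregate(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Combine several rankings of the same features into one consensus ranking.

    Each input ranking lists features from most to least important.
    A feature at position p (0-based) in a ranking of n features
    receives n - p points; points are summed over all rankings.
    """
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - position)
    # Highest total score first; ties are broken alphabetically.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical rankings, e.g. produced by three different ranking methods.
example_rankings = [
    ["geneA", "geneB", "geneC"],
    ["geneB", "geneA", "geneC"],
    ["geneA", "geneC", "geneB"],
]
consensus = borda_aggregate(example_rankings)
print(consensus)  # ['geneA', 'geneB', 'geneC']
```

In this toy example geneA collects 8 points, geneB 6 and geneC 4, so the consensus ranking places geneA first.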

4.

Chairing the program committees and organization of the Third and Fourth International Workshops on Machine Learning in Systems Biology (MLSB-09, -10)

We organized the Third and Fourth International Workshops on Machine Learning in Systems Biology (MLSB-09 and MLSB-10), held in Ljubljana and Edinburgh respectively, and co-chaired the program committees of both events. The workshop is a well-regarded event with high-quality invited speakers and peer-reviewed contributions, and it attracts more than 60 participants. The papers presented at the workshops were published in proceedings. At the workshops, we presented results from this project and from the EU projects EETP and PHAGOSYS.

B.02 Presiding over the programming board of a conference

COBISS.SI-ID: 24513831

5.

Analysis of time series data on agroecosystem vegetation

Using the k-medoids clustering algorithm with the dynamic time warping distance between time series, we clustered the time-course profiles of oilseed rape crop cover. The clustering revealed five typical clusters of crop cover profiles that differed in rate of increase, lag phase and maximum value, but were largely independent of the crop type (winter or spring oilseed rape) and the weed management regime. We then constructed predictive clustering trees (a generalized form of decision trees) that predict the weed cover profile (a time series) from independent (input) variables, including the crop cover cluster, other crop descriptors and environmental variables. The approach successfully identified the interdependencies between weed and crop vegetation.

F.02 Acquisition of new scientific knowledge

COBISS.SI-ID: 24218407
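
The clustering step in item 5 (k-medoids with the dynamic time warping distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the absolute-difference local cost, the farthest-first initialization, the function names and the toy profiles are all chosen here for the example.

```python
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def k_medoids(series: Sequence[Sequence[float]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Cluster time series; returns (medoid indices, cluster label per series)."""
    n = len(series)
    # Precompute the pairwise DTW distance matrix.
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])

    # Deterministic farthest-first initialization (an assumption of this sketch):
    # start from the most central series, then add the series farthest
    # from the medoids chosen so far.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(meds: List[int]) -> List[int]:
        # Each series joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                # New medoid: the member with the smallest total distance to the others.
                new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
            else:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy cover profiles (hypothetical values): three slow-rising, three fast-rising.
profiles = [
    [0, 1, 2, 3, 4, 5],
    [0, 0, 1, 2, 3, 4, 5],
    [0, 1, 2, 3, 4, 5, 5],
    [0, 10, 20, 20, 20, 20],
    [0, 0, 10, 20, 20, 20],
    [0, 10, 20, 20, 20],
]
medoids, labels = k_medoids(profiles, k=2)
```

Because dynamic time warping tolerates shifts along the time axis, profiles that differ only in the length of their lag phase or plateau end up in the same cluster here, even though they have different lengths.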

J2-2285 — Final report

1.

Using ensembles of trees for hierarchical multi-label classification for predicting gene function

2.

Inductive databases and constraint-based data mining

3.

Feature ranking for biomarker discovery

4.

Chairing the program committees and organization of the Third and Fourth International Workshops on Machine Learning in Systems Biology (MLSB-09, -10)

5.

Analysis of time series data on agroecosystem vegetation
