Analysis of heterogeneous information networks for knowledge discovery in life-sciences

Code

J7-7303 (C) - included in ARIS records

Head

PhD Nada Lavrač

Period

1/1/2016 - 12/31/2018

Range in 2018

1.63 FTE

Science

Natural sciences and mathematics (2)
Engineering sciences and technologies (8)
Biotechnical sciences (3)
Humanities (3)

Researcher status

Researcher (16)
Junior expert or technical associate (0)

Education

Doctoral degree (15)
Other (1)

Sex

Woman (10)
Man (6)

Status

Employed at RO and RRD (10)
No data on employment in RO (5)
Retired (1)

No. of publications

10–99 (6)
100–999 (9)
1,000–9,999 (1)


Source: ARIS

Research activity

Code	Science	Field	Subfield
7.00.00	Interdisciplinary research

Code	Science	Field
P176	Natural sciences and mathematics	Artificial intelligence

Code	Science	Field
1.02	Natural Sciences	Computer and information sciences

Keywords

Data mining, knowledge discovery, semantic data mining, workflows, heterogeneous networks, plant immune signalling

Researchers (16)

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	19116	PhD Špela Baebler	Biotechnology	Researcher	2016 - 2018	320
2.	34130	PhD Anna Coll Rius	Biochemistry and molecular biology	Researcher	2016 - 2018	169
3.	12688	PhD Kristina Gruden	Biotechnology	Researcher	2016 - 2018	1,001
4.	36355	PhD Jan Kralj	Communications technology	Researcher	2017 - 2018	39
5.	08949	PhD Nada Lavrač	Computer science and informatics	Head	2016 - 2018	873
6.	50070	PhD Matej Martinc	Linguistics	Researcher	2017 - 2018	87
7.	36836	PhD Biljana Mileva Boshkoska	Computer science and informatics	Researcher	2017 - 2018	162
8.	36912	PhD Dragana Miljković	Computer science and informatics	Researcher	2016 - 2018	71
9.	35475	PhD Matic Perovšek	Computer science and informatics	Researcher	2016	15
10.	29539	PhD Vid Podpečan	Computer science and informatics	Researcher	2016 - 2018	106
11.	31844	PhD Senja Pollak	Linguistics	Researcher	2016 - 2018	302
12.	18467	PhD Maruša Pompe Novak	Biotechnology	Researcher	2016 - 2018	298
13.	34502	PhD Živa Ramšak	Biology	Researcher	2016 - 2018	123
14.	37679	Andraž Repar	Linguistics	Researcher	2018	34
15.	04586	PhD Tanja Urbančič	Computer science and informatics	Researcher	2018	290
16.	34262	PhD Anže Vavpetič	Computer science and informatics	Researcher	2016 - 2017	30

Organisations (2)

no.	Code	Research organisation	City	Registration number	No. of publications
1.	0105	National Institute of Biology	Ljubljana	5055784	13,510
2.	0106	Jožef Stefan Institute	Ljubljana	5051606000	91,734

Abstract

The proposal addresses knowledge discovery in complex data mining scenarios in the life sciences. With the development of high-throughput molecular biology techniques, the volume of generated data is reaching the range of so-called Big Data. Information relevant to a given biological question is scattered across different public resources, in heterogeneous formats and in forms inaccessible to most biologists. To overcome this, the information needs to be fused into a single data source that can then be mined. The aim of the proposed project is to develop, implement, evaluate and apply a new methodology for analyzing large heterogeneous data in the life sciences. The methodology is motivated by the tremendous increase in data generation in life-science research, while the means for explanatory knowledge discovery from these large heterogeneous data sources are still lagging behind. We aim to improve existing data analysis approaches by extending and combining text mining, relational data mining and information fusion methods. To evaluate the proposed methodology, we will use several benchmark and real-world problems in the life sciences, aiming to advance translational research in agriculture by extracting novel knowledge on plant immune signaling. The project has the following objectives:

1. Development of a new methodology that will enable fusing texts and complex relational background knowledge into a large heterogeneous information network. This will be achieved by extending our own methodology for mining heterogeneous information networks, by contextualizing the information on data instances in terms of available semantic background knowledge (domain taxonomies and ontologies), and by adapting the methodology to big data and complex life-science scenarios.
2. Implementation of the methodology in the ClowdFlows or TextFlows platform, and experimental evaluation of the proposed methodology on publicly available benchmark data sets, including selected medical problems for which large public heterogeneous data sets exist.
3. Application of the methodology to three life-science scenarios: (i) cross-domain knowledge discovery from documents on two unrelated life-science problems, aiming to uncover as yet unknown relations between "redox status" and "plant immune signaling"; (ii) mining a time-stamped stream of heterogeneous experimental data in the domain of plant immune signaling; and (iii) identification of key components in plant immune signaling that determine the outcome of a disease.

The project will contribute to the development of new algorithms for mining large heterogeneous data. Accessibility of the developed methodology will be ensured by implementing it in one of our web-based data mining platforms, ClowdFlows or TextFlows, which will make the developed technology available to a broader research audience and increase its relevance for life-science experts. The research will be performed in close collaboration between data mining experts from JSI (Jožef Stefan Institute) and domain experts from NIB (National Institute of Biology).
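The first objective (fusing documents and background knowledge into one heterogeneous information network) and scenario (i) (cross-domain discovery between "redox status" and "plant immune signaling") can be illustrated with a small sketch. This is not the project's own algorithm. It assumes a toy network with three hypothetical node types (documents, terms and taxonomy concepts), uses personalized PageRank as one standard way of scoring nodes relative to a set of seed nodes, and ranks candidate bridging terms by the product of their scores from the two domains' documents. The function names, the taxonomy format (child name mapped to parent name) and the scoring rule are illustrative assumptions.

```python
# Hypothetical sketch of a heterogeneous information network (HIN):
# node types, taxonomy format and the bridging score are assumptions.


def build_network(docs, taxonomy=None):
    """Build an undirected network with typed nodes.

    docs: dict doc_id -> (domain_label, list_of_terms)
    taxonomy: optional dict child_name -> parent_name (hypothetical format)
    Nodes are tuples ("doc", id), ("term", word) or ("concept", name).
    """
    taxonomy = taxonomy or {}
    adj = {}
    domain = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for doc_id, (label, terms) in docs.items():
        d = ("doc", doc_id)
        domain[d] = label
        adj.setdefault(d, set())
        for t in terms:
            link(d, ("term", t))

    # Contextualize each term with its chain of taxonomy ancestors;
    # the `seen` set guards against cycles in the taxonomy.
    for t in [n[1] for n in adj if n[0] == "term"]:
        child, name, seen = ("term", t), t, {t}
        while name in taxonomy and taxonomy[name] not in seen:
            name = taxonomy[name]
            seen.add(name)
            parent = ("concept", name)
            link(child, parent)
            child = parent
    return adj, domain


def personalized_pagerank(adj, seeds, alpha=0.85, iters=100):
    """Power iteration for PageRank that restarts uniformly at the seeds."""
    seeds = set(seeds) & set(adj)
    if not seeds:
        raise ValueError("need at least one seed node present in the network")
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    rank = dict(restart)
    for _ in range(iters):
        new = {n: (1 - alpha) * restart[n] for n in adj}
        for n, neighbours in adj.items():
            if neighbours:
                share = alpha * rank[n] / len(neighbours)
                for m in neighbours:
                    new[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for s in seeds:
                    new[s] += alpha * rank[n] / len(seeds)
        rank = new
    return rank


def bridging_terms(docs, domain_a, domain_b, taxonomy=None, top=5):
    """Rank terms by the product of their PageRank from the two domains."""
    adj, domain = build_network(docs, taxonomy)
    seeds_a = {n for n, lab in domain.items() if lab == domain_a}
    seeds_b = {n for n, lab in domain.items() if lab == domain_b}
    pr_a = personalized_pagerank(adj, seeds_a)
    pr_b = personalized_pagerank(adj, seeds_b)
    scores = {n[1]: pr_a[n] * pr_b[n] for n in adj if n[0] == "term"}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, score in ranked[:top] if score > 0]


if __name__ == "__main__":
    # Toy, made-up documents from two domains that share only "ros".
    docs = {
        "a1": ("redox", ["glutathione", "ros", "peroxidase"]),
        "a2": ("redox", ["ros", "ascorbate"]),
        "b1": ("immunity", ["salicylic_acid", "ros", "npr1"]),
        "b2": ("immunity", ["npr1", "jasmonate"]),
    }
    print(bridging_terms(docs, "redox", "immunity"))
```

On this toy input the shared term `ros` ranks first. Multiplying the two scores favors terms that are close to both document sets rather than central to only one of them; a real system would also need term extraction, weighting of edge types and a much larger network.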

Significance for science

This project addresses the open problem of assisting scientists with the increasingly daunting task of fusing heterogeneous, distributed information and discovering knowledge in it. Solving this problem requires a new computational paradigm that integrates ideas from several supporting domains. An adequate solution will result in new technologies relevant to a range of applications, some of which are also named in the EU FP7 ICT work programme, such as Challenge 4 on Content and Challenge 5 on Healthcare. The project covers issues such as knowledge management and creation, but goes beyond them by assisting users (particularly scientists) in knowledge discovery across distributed information repositories.

The project will advance the state of the art by developing a framework for mining heterogeneous information networks, new data mining algorithms, and a new approach to interactively formulating and refining powerful knowledge discovery workflows. The proposed project thus addresses an open problem and pursues a long-term objective with high technological potential.

Successful results of the MinHIN project can contribute to Europe’s knowledge industry, enabling it to become more effective, efficient and competitive. The challenges addressed by MinHIN cannot be adequately met with existing ICT methodologies or their incremental improvements, since the methods developed within MinHIN will differ substantially from existing information fusion and knowledge discovery technologies. Developing them will require scientists with diverse backgrounds to collaborate on challenges in innovative information fusion, data mining, distributed information retrieval, and sophisticated user interfaces. A successful outcome of the project may have a significant impact, first on data mining technology and on science, and in the longer term, once adapted to knowledge discovery, also on the ability of Europe’s private and public sectors to analyze public data.

The proposed MinHIN project has the potential to implement and demonstrate a paradigm shift in information and knowledge management, discovery, fusion and understanding. The MinHIN prototype will establish a strong scientific and technological basis for a broader, interdisciplinary research community, and will help develop the underlying methodologies to a level at which they can attract investment from industry, especially in the pharmaceutical and biotechnology sectors.

Significance for the country

Since the project analyzes heterogeneous information networks related to potato, its results will directly influence the food industry. Potato is currently the third most important food crop worldwide. It yields large amounts of non-allergenic plant protein per hectare, contains many vitamins and health-promoting compounds, and is therefore of increasing significance as a food crop in the developing world. Its production, however, is currently suboptimal because of the high cultivation input costs needed to achieve adequate yields and its susceptibility to biotic and abiotic factors. The EU potato industry is very competitive and is continuously gaining market share worldwide. Hundreds of cultivars are used, many with close cultural and regional ties. In Slovenia in the 1980s, a PVY (Potato virus Y) epidemic completely eliminated the susceptible but then-leading potato cultivars, which virtually ended Slovenian seed potato production. Currently there are only a few completely resistant cultivars, but growing them is problematic both because of Slovenia's specific climate and from the perspective of genetic diversity. The findings of the proposed project will provide a basis for precision breeding of environmentally resilient cultivars.

Most important scientific results

Interim report, final report

Most important socioeconomically and culturally relevant results

Interim report, final report