The paper presents a methodology for analyzing time series of gene expression data collected from the leaves of potato plants infected with potato virus Y (PVY) and of non-infected potato plants. We aim to identify differentially expressed genes whose expression values differ statistically significantly between PVY-infected and non-infected plants, and whose expression values in PVY-infected plants also change statistically significantly over time. The novelty of the approach includes stratified data randomization, used to estimate the statistical properties of gene expression in the control set of non-infected potato plants. We also define a novel estimate, the relative minimal distance between samples, which enables reliable identification of differences between the target and control datasets when these sets are small. The relevance of the outcomes is demonstrated by visualizing the relative minimal distance of gene expression changes over time for three different types of potato leaves, for the genes identified as relevant by the proposed methodology.
COBISS.SI-ID: 32031015
This paper presents a new approach to discovering dependencies between different biological domains based on copula analysis of literature mining results. More specifically, we explore dependencies between the literature of two domains: plant defence response and redox potential. Copula analysis of triplets extracted by the Bio3graph tool shows that dependencies exist between these two domains, indicating a potential for cross-domain literature exploration. Bio3graph is a rule-based natural language processing tool that extracts relations in the form of (subject, predicate, object) triplets. It is publicly available at http://ropot.ijs.si/bio3graph/software/. Copula analysis was performed using Clayton and Frank fully nested copulas, and the software is publicly available at http://source.ijs.si/bmileva/copulasfordexapps.git.
COBISS.SI-ID: 2048463379
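As an illustration of the Clayton and Frank copulas named in the abstract above, the following is a minimal sketch of their standard bivariate distribution functions and of one fully nested three-variable construction. It is not the code from the cited repository: the function names (clayton_cdf, frank_cdf, nested_cdf) and the parameter values are assumptions made for this example, and the nesting condition theta_outer <= theta_inner is the usual sufficient condition for same-family Archimedean copulas.

```python
import math


def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v) for theta > 0 and u, v in [0, 1]."""
    if theta <= 0:
        raise ValueError("Clayton copula requires theta > 0")
    if u == 0.0 or v == 0.0:
        return 0.0  # the copula is grounded: C(0, v) = C(u, 0) = 0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def frank_cdf(u, v, theta):
    """Bivariate Frank copula C(u, v) for theta != 0 and u, v in [0, 1]."""
    if theta == 0:
        raise ValueError("Frank copula requires theta != 0")
    # expm1/log1p keep the formula numerically stable for small arguments.
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def nested_cdf(copula, u1, u2, u3, theta_outer, theta_inner):
    """Fully nested construction C(u1, C(u2, u3; theta_inner); theta_outer).

    Hypothetical helper for illustration; assumes both levels use the same
    family and that theta_outer <= theta_inner.
    """
    if theta_outer > theta_inner:
        raise ValueError("expected theta_outer <= theta_inner")
    return copula(u1, copula(u2, u3, theta_inner), theta_outer)


if __name__ == "__main__":
    print(clayton_cdf(0.3, 0.6, 2.0))
    print(frank_cdf(0.3, 0.6, 5.0))
    print(nested_cdf(clayton_cdf, 0.5, 0.3, 0.6, 1.0, 2.0))
```

Both functions satisfy the uniform-margin property C(u, 1) = u, which is a quick sanity check when experimenting with other parameter values.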
This paper presents Community-Based Semantic Subgroup Discovery (CBSSD), a novel approach that advances ontology-based subgroup identification by exploiting the structural properties of induced complex networks related to the studied phenomenon. Following the idea of multi-view learning, which uses different sources of information to obtain better models, the CBSSD approach can leverage different types of nodes of the induced complex network, simultaneously using information from multiple levels of a biological system. The approach was tested on ten data sets consisting of genes related to complex diseases, as well as core metabolic processes. The experimental results demonstrate that the CBSSD approach is scalable, that it is applicable to large complex networks, and that it can identify significant combinations of terms that cannot be uncovered by contemporary term enrichment analysis approaches.
COBISS.SI-ID: 32057639
Literature-based discovery tools have often been used to overcome the problem of fragmentation of science and to assist researchers in cross-domain knowledge discovery. In this paper we propose a methodology for cross-domain literature-based discovery that focuses on outlier documents to reduce the search space of potential cross-domain links and to improve search efficiency. In a previous study, two literature mining tools, OntoGen for document clustering and CrossBee for cross-domain bridging term exploration, were combined to search for hidden relations in scientific papers from two different domains of interest; the utility of the approach was demonstrated in a study involving PubMed papers about Alzheimer’s disease and gut microbiome. This paper extends the approach by proposing a methodology implemented as a repeatable workflow in the web-based text mining platform TextFlows, which gives interested researchers easy access to the methodology and its execution.
COBISS.SI-ID: 30497575
The paper presents an approach to mining heterogeneous information networks by decomposing them into homogeneous networks. The proposed HINMINE methodology is based on previous work that classifies nodes in a heterogeneous network in two steps. In the first step, the heterogeneous network is decomposed into one or more homogeneous networks using different connecting nodes. We improve this step with new methods inspired by the weighting schemes for bag-of-words vectors commonly used in information retrieval. In the second step, the resulting homogeneous networks are used to classify data either by network propositionalization or by label propagation. We propose an adaptation of the label propagation algorithm to handle imbalanced data and test several classification algorithms in propositionalization. The new methodology is tested on three data sets with different properties. Our results show that HINMINE, using different network decomposition methods, can significantly improve the performance of the resulting classifiers, and that using a modified label propagation algorithm is beneficial when the data set is imbalanced.
COBISS.SI-ID: 30214439
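To make the label propagation step in the abstract above more concrete, here is a minimal sketch of iterative label propagation on a small homogeneous network, with an optional adjustment for imbalanced classes. This is not the HINMINE implementation: the function name propagate_labels, the choice of scaling each seed label by the inverse of its class size, and the values of alpha and iters are all assumptions made for this example.

```python
import math


def propagate_labels(adj, labels, n_classes, alpha=0.85, iters=200, balance=True):
    """Classify nodes of a weighted undirected graph by label propagation.

    adj       -- symmetric non-negative weight matrix (list of lists)
    labels    -- class index for labeled nodes, None for unlabeled nodes
    balance   -- if True, scale each seed label by 1 / class size
                 (an assumed, illustrative way to counter class imbalance)
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    s = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
          for j in range(n)] for i in range(n)]

    # Seed matrix Y: one column per class, non-zero only for labeled nodes.
    counts = [sum(1 for lab in labels if lab == c) for c in range(n_classes)]
    y = [[0.0] * n_classes for _ in range(n)]
    for i, lab in enumerate(labels):
        if lab is not None:
            y[i][lab] = 1.0 / counts[lab] if balance else 1.0

    # Iterate F <- alpha * S * F + (1 - alpha) * Y until (approximately) stable.
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n))
              + (1.0 - alpha) * y[i][c]
              for c in range(n_classes)] for i in range(n)]

    # Each node takes the class with the highest propagated score.
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 with the two end nodes labeled.
    path = [[0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0]]
    print(propagate_labels(path, [0, None, None, 1], n_classes=2))
```

With balance=True, a class with many labeled nodes contributes the same total seed mass as a class with few, so the majority class does not dominate the propagated scores simply by being larger.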