Abstract. The main goal of reporting in the financial system is to ensure high-quality, useful information about the financial position of firms and to make it available to a wide range of users, including existing and potential investors, financial institutions, employees, the government, and others. Formal reports contain both strictly regulated financial sections and unregulated narrative parts. Our research starts from the hypothesis that business performance is related not only to the content but also to the linguistic properties of the unregulated parts of annual reports. In this paper, we first present our dataset of financial reports and the techniques we used to extract the unregulated textual parts. Next, we introduce our approaches to differential content analysis and to the analysis of correlation with financial aspects. The differential content analysis is based on TF-IDF weighting and aims to find the characteristic terms for each year (i.e., terms that did not prevail in previous reports by the same firm). To correlate the linguistic characteristics of reports with financial aspects, we considered an array of linguistic features and used selected financial indicators. The linguistic features range from measurements, such as the ratio of personal to impersonal pronouns, to assessments of characteristics such as financial sentiment, trust, doubt, and discursive features expressing certainty, modality, etc. While some features correlate strongly with industry (e.g., reports in the IT industry are shorter and more personal than those in the automotive industry), doubt and communication, as well as (to some extent) necessity and cognition words, are positively correlated with failure.
COBISS.SI-ID: 24345318
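The differential content analysis described in this abstract contrasts a year's report with the same firm's earlier reports through TF-IDF weighting. The following is a minimal sketch of that idea, assuming reports are already tokenized and lowercased; the function name `characteristic_terms`, the smoothed IDF formula, and the `top_k` cutoff are illustrative assumptions, not the weighting used in the paper. A term that occurs in every earlier report receives an IDF of zero, so only terms that did not prevail before can surface.

```python
import math
from collections import Counter


def characteristic_terms(current, previous, top_k=10):
    """Rank terms of the current report against the firm's earlier reports.

    current:  list of tokens from this year's report.
    previous: list of token lists, one per earlier report by the same firm.
    (Hypothetical helper; the smoothing below is an assumption.)
    """
    tf = Counter(current)
    total = sum(tf.values())
    earlier = [set(doc) for doc in previous]
    n = len(earlier)
    scores = {}
    for term, count in tf.items():
        df = sum(term in doc for doc in earlier)  # earlier reports containing the term
        idf = math.log((1 + n) / (1 + df))        # 0 when the term appears in all of them
        scores[term] = (count / total) * idf
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked if score > 0][:top_k]
```

With no earlier reports every IDF is zero and the function returns an empty list, which matches the intuition that nothing can be characteristic without a history to compare against.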
The aim of this work is to reproduce the approach to detecting semantic orientations in economic texts presented in the paper "Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts" by Malo et al. The approach employs the Linearized Phrase Structure model for sentence-level classification of short economic texts into a positive, negative, or neutral category from an investor's perspective, and it yields state-of-the-art results. The proposed method employs both rule-based linguistic models and machine learning. Where possible, we follow the approach described in the original paper, with some documented modifications. Our solution is simplified in at least two respects, but its performance is comparable to the original and, overall, remains better than the reported results of the other benchmark algorithms mentioned in the original paper. The differences between the two models and their results are described in detail and lead to the conclusion that the original approach is to a large extent repeatable and that our simplified version does not overly sacrifice performance for generalizability.
COBISS.SI-ID: 31308327
We present initial investigations toward a diachronic study of lexical change in financial reporting, looking at methods suitable for analysing semantic associations between financial terms and how these change over time. Our corpus consists of US 10-K annual reports of 30 companies included in the Dow Jones Industrial Average stock index over the years 1996–2015. We grouped the reports by reported fiscal year and derived word embedding models for each year using both GloVe and a count-based PPMI method; these vectors were then used to calculate the cosine similarity between pairs of words. We expect the resulting diachronic patterns in the lexical contexts of financial terms to vary with the economic cycle. Here we select pairs of terms with strongly increasing association over time (e.g., dividend and shareholder) or strongly decreasing association over time (e.g., dividend and gain), and suggest some qualitative explanations that relate these changes to the economic crisis.
COBISS.SI-ID: 31366439
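For the count-based side of this diachronic study, a PPMI model and the yearly cosine similarity between two terms can be sketched with NumPy. This is a minimal illustration, assuming each year's reports are given as tokenized sentences; the symmetric window size, the helper names, and the dense matrix are assumptions chosen for readability, and the GloVe models mentioned in the abstract are not reproduced here. PPMI is computed as max(0, log(P(w,c) / (P(w) P(c)))) from co-occurrence counts, and the association between two terms in a given year is the cosine similarity of their PPMI rows.

```python
import numpy as np


def cooccurrence(sentences, window=2):
    """Symmetric word-context counts within +/- `window` tokens."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    counts[index[w], index[s[j]]] += 1
    return index, counts


def ppmi(counts):
    """Positive PMI: max(0, log(P(w,c) / (P(w) P(c))))."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts or empty rows -> 0
    return np.maximum(pmi, 0.0)


def cosine(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return 0.0 if nu == 0 or nv == 0 else float(u @ v / (nu * nv))


def similarity_by_year(corpus_by_year, w1, w2, window=2):
    """Cosine similarity of two terms' PPMI vectors, one value per year."""
    result = {}
    for year, sentences in sorted(corpus_by_year.items()):
        index, counts = cooccurrence(sentences, window)
        if w1 not in index or w2 not in index:
            result[year] = None  # term absent from that year's reports
            continue
        m = ppmi(counts)
        result[year] = cosine(m[index[w1]], m[index[w2]])
    return result
```

Because each year's similarity is computed within that year's own vector space, the yearly values can be compared as a time series without aligning the embedding spaces across years.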
In this paper, we present an experimental assessment of a dynamic adaptation of an approach to sentiment classification of tweets. Specifically, this approach dynamically adapts the parameters used for three-class classification with a binary SVM classifier. It is suited to incremental active learning scenarios in domains with frequent concept alterations and changes. Our target application is in the financial domain, and the assessment is partially domain-specific, but the approach itself is not limited to a particular domain.
COBISS.SI-ID: 29892903
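One common way to obtain three classes from a binary SVM is to treat scores close to the separating hyperplane as neutral. The sketch below assumes that the width of this neutral zone is the kind of parameter being adapted; the quantile-based re-estimation rule, the class name `AdaptiveNeutralZone`, and all default values are hypothetical choices for illustration, not the adaptation rule assessed in the paper. The scores are the signed decision values of any trained binary SVM (for example, the output of scikit-learn's `LinearSVC.decision_function`).

```python
from collections import deque


class AdaptiveNeutralZone:
    """Turn binary-SVM decision values into positive/negative/neutral labels.

    Hypothetical adaptation rule: the neutral-zone half-width is reset to a
    quantile of recently seen absolute scores.
    """

    def __init__(self, initial_threshold=0.5, quantile=0.3, window=500):
        self.threshold = initial_threshold
        self.quantile = quantile
        self.recent = deque(maxlen=window)  # most recent |score| values

    def classify(self, score):
        self.recent.append(abs(score))
        if abs(score) < self.threshold:
            return "neutral"  # too close to the hyperplane to commit
        return "positive" if score > 0 else "negative"

    def update(self):
        """Re-estimate the neutral-zone half-width from recent scores."""
        if not self.recent:
            return
        ordered = sorted(self.recent)
        k = min(int(self.quantile * len(ordered)), len(ordered) - 1)
        self.threshold = ordered[k]
```

Calling `update()` periodically, for instance after each batch in an incremental learning loop, lets the neutral zone widen or narrow as the score distribution shifts; the active-learning part (choosing which tweets to label) is not covered by this sketch.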
In this paper, we present a study of expressions of trust and doubt in financial tweets and in the official periodic reports of companies. Using trust and doubt wordlists that we created, we analyze the presence of trust and doubt terms in both text collections after some domain-specific text processing. In tweets, we found that doubt is expressed more frequently than trust and forms higher peaks. Next, we analyzed the relation between the filing dates of reports and the peaks in financial tweets with regard to their overall volume, the volume of trust tweets, and the volume of doubt tweets. The analysis indicates that the Twitter community reacts more often to quarterly than to annual reports, and that the peaks usually occur on the day of the report, not before or after it. From a corresponding analysis of the textual content of annual reports, we present the frequencies of different trust/doubt terms in these reports and indicate some notable differences in their use by different companies.
COBISS.SI-ID: 31366695
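The tweet-side analysis in this abstract can be outlined as counting, per day, all tweets and those containing trust or doubt terms, flagging peak days, and measuring how far each peak lies from the nearest report filing date. In the sketch below, the small `TRUST` and `DOUBT` sets are placeholders rather than the authors' wordlists, and the peak rule (mean plus `k` standard deviations over observed days) and the `max_offset` window are assumptions.

```python
import re
from collections import Counter
from datetime import date
from statistics import mean, pstdev

# Hypothetical placeholder wordlists; NOT the authors' trust/doubt lists.
TRUST = {"trust", "confident", "reliable"}
DOUBT = {"doubt", "uncertain", "risky"}


def daily_volumes(tweets):
    """Count all, trust and doubt tweets per day from (date, text) pairs."""
    total, trust, doubt = Counter(), Counter(), Counter()
    for day, text in tweets:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        total[day] += 1
        if tokens & TRUST:
            trust[day] += 1
        if tokens & DOUBT:
            doubt[day] += 1
    return total, trust, doubt


def peak_days(volume, k=2.0):
    """Days whose volume exceeds mean + k * population std over observed days."""
    values = list(volume.values())
    if len(values) < 2:
        return set()
    threshold = mean(values) + k * pstdev(values)
    return {day for day, v in volume.items() if v > threshold}


def peak_offsets(peaks, filing_dates, max_offset=3):
    """Histogram of signed day offsets from each peak to its nearest filing date."""
    offsets = Counter()
    for peak in peaks:
        nearest = min(((peak - f).days for f in filing_dates), key=abs, default=None)
        if nearest is not None and abs(nearest) <= max_offset:
            offsets[nearest] += 1
    return offsets


if __name__ == "__main__":
    days = [date(2015, 2, d) for d in range(1, 10)]
    tweets = [(d, "stock news") for d in days] + [(days[4], "I doubt it.")] * 9
    total, _, doubt = daily_volumes(tweets)
    print(peak_offsets(peak_days(total), filing_dates=[days[4]]))
```

A positive offset means the peak came after the filing, a negative one before it, and zero means the peak fell on the filing day. Days with no tweets are absent from the counters, so the peak threshold is computed over observed days only.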