Dimitris Dimitriadis

Language Processing R&D

Full Stack Web Developer

Academic Articles
Journals
Enhancing yes/no question answering with weak supervision via extractive question answering
2023
D. Dimitriadis, G. Tsoumakas

The effectiveness of natural language processing models relies on various factors, including the architecture, number of parameters, data used during training, and the tasks they were trained on. Recent studies indicate that models pre-trained on large corpora and fine-tuned on task-specific datasets, covering multiple tasks, can generate remarkable results across various benchmarks. We propose a new approach based on a straightforward hypothesis: improving model performance on a target task by considering other artificial tasks defined on the same training dataset. By doing so, the model can gain further insights into the training dataset and attain a greater understanding, improving efficiency on the target task. This approach differs from others that consider multiple pre-existing tasks on different datasets. We validate this hypothesis by focusing on the problem of answering yes/no questions and introducing a multi-task model that outputs a span of the reference text, serving as evidence for answering the question. The task of span extraction is an artificial one, designed to benefit the performance of the model answering yes/no questions. We acquire weak supervision for these spans by using a pre-trained extractive question answering model, dispensing with the need for costly human annotation. Our experiments, using modern transformer-based language models, demonstrate that this method outperforms the standard approach of training models to answer yes/no questions. Although the primary objective was to enhance the performance of the model in answering yes/no questions, we also found that the span texts are a significant source of information. These spans, derived from the question reference texts, provided valuable insights for the users to better comprehend the answers to the questions.
The model’s improved accuracy in answering yes/no questions, coupled with the supplementary information provided by the span texts, led to a more comprehensive and informative user experience.

Artificial fine-tuning tasks for yes/no question answering
2022
D. Dimitriadis, G. Tsoumakas

Current research in yes/no question answering (QA) focuses on transfer learning techniques and transformer-based models. Models trained on large corpora are fine-tuned on tasks similar to yes/no QA, and then the captured knowledge is transferred for solving the yes/no QA task. Most previous studies use existing similar tasks, such as natural language inference or extractive QA, for the fine-tuning step. This paper follows a different perspective, hypothesizing that an artificial yes/no task can transfer useful knowledge for improving the performance of yes/no QA. We introduce three such tasks for this purpose, by adapting three corresponding existing tasks: candidate answer validation, sentiment classification, and lexical simplification. Furthermore, we experimented with three different variations of the BERT model (BERT base, RoBERTa, and ALBERT). The results show that our hypothesis holds true for all artificial tasks, despite the small size of the corresponding datasets that are used for the fine-tuning process, the differences between these tasks, the decisions that we made to adapt the original ones, and the tasks’ simplicity. This gives an alternative perspective on how to deal with the yes/no QA problem, one that is more creative and, at the same time, more flexible, as it can exploit multiple other existing tasks and corresponding datasets to improve yes/no QA models.

Semantic Indexing of 19th-Century Greek Literature Using 21st-Century Linguistic Resources
2021
D. Dimitriadis, S. Zapounidou, G. Tsoumakas

Manual classification of works of literature with genre/form concepts is a time-consuming task requiring domain expertise. Building automated systems based on language understanding can help humans do this work faster and more consistently. In this direction, we present a case study on automatic classification of Greek literature books of the 19th century. The main challenges in this problem are the limited number of literature books and resources from that era and the quality of the source text. We propose an automated classification system based on the Bidirectional Encoder Representations from Transformers (BERT) model trained on books from the 20th and 21st centuries. We also dealt with BERT’s constraint on the maximum sequence length of the input, leveraging the TextRank algorithm to construct representative sentences or phrases from each book. The results show that BERT trained on recent literature books correctly classifies most of the books of the 19th century despite the disparity between the two collections. Additionally, the TextRank algorithm improves the performance of BERT.
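
As a rough illustration of how TextRank can shorten a long text before it reaches a length-limited model such as BERT, here is a minimal sketch. It is not the system described in the article: the sentence splitter, the word-overlap similarity, the parameter values, and all function names are assumptions made for this example.

```python
import math
import re


def split_sentences(text):
    """Naive sentence splitter (an assumption; a real system would use a proper tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?;])\s+", text) if s.strip()]


def similarity(tokens_a, tokens_b):
    """Word overlap normalised by log lengths, a simplified TextRank similarity."""
    words_a, words_b = set(tokens_a), set(tokens_b)
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom == 0:
        return 0.0
    return len(words_a & words_b) / denom


def textrank_sentences(text, top_k=3, damping=0.85, iterations=50):
    """Return the top_k highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []
    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    # Power iteration of the weighted PageRank update.
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]
```

The selected sentences could then be concatenated and truncated to the model's maximum input length; how many sentences to keep is a tuning choice that this sketch does not settle.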

Word Embeddings and External Resources for Answer Processing in Biomedical Factoid Question Answering
2019
D. Dimitriadis, G. Tsoumakas

Biomedical question answering (QA) is a challenging task that has not yet been successfully solved, according to results on international benchmarks, such as BioASQ. Recent progress on deep neural networks has led to promising results in domain-independent QA, but the lack of large datasets with biomedical question-answer pairs hinders their successful application to the domain of biomedicine.

We propose a novel machine-learning-based answer processing approach that exploits neural networks in an unsupervised way through word embeddings. Our approach first combines biomedical and general-purpose tools to identify the candidate answers from a set of passages. Candidates are then represented using a combination of features based on both biomedical external resources and input textual sources, including features based on word embeddings. Candidates are then ranked based on the score given at the output of a binary classification model, trained from candidates extracted from a small number of questions, related passages and correct answer triplets from the BioASQ challenge.

Our experimental results show that the use of word embeddings, combined with other features, improves the performance of answer processing in biomedical question answering. In addition, our results show that the use of several annotators improves the identification of answers in passages. Finally, our approach participated in the 2017 and 2018 editions of the BioASQ challenge, achieving competitive results.

Workshops
2020
Yes/No Question Answering in BioASQ 2019
Dimitris Dimitriadis, Grigorios Tsoumakas

The field of question answering has gained greater attention with the rise of deep neural networks. More and more approaches adopt paradigms based primarily on powerful language representation models and transfer learning techniques to build efficient learning models that are able to outperform current state-of-the-art systems. Endorsing this trend, in this paper we take a step towards the goal of answering yes/no questions in the field of biomedicine. Specifically, the task is to give a short answer (yes or no) to a question written in natural language, using clues contained in a set of snippets related to the question. We propose three different deep neural network models, which are free of assumptions about predefined specific feature functions; their key elements are ELMo embeddings, similarity matrices and/or sentiment information. The results show that, by incorporating sentiment information, we can improve the performance of a yes/no question answering system, while the proposed learning models significantly outperform the BioASQ baseline.

2016
Large-scale semantic indexing and question answering in biomedicine
Eirini Papagiannopoulou, Yiannis Papanikolaou, Dimitris Dimitriadis, Sakis Lagopoulos, Grigorios Tsoumakas, Manos Laliotis, Nikos Markantonatos, Ioannis Vlahavas

In this paper we present the methods and approaches employed in our participation in the 2016 version of the BioASQ challenge. For the semantic indexing task, we extended our successful ensemble approach of last year with additional models. The official results obtained so far demonstrate a continuing consistent advantage of our approaches over the National Library of Medicine (NLM) baselines. For the question answering task, we extended our approach on factoid questions, while we also developed approaches for the document, concept and snippet retrieval sub-tasks.

2014
Ensemble Approaches for Large-Scale Multi-Label Classification and Question Answering in Biomedicine
Yannis Papanikolaou, Dimitrios Dimitriadis, Grigorios Tsoumakas, Manos Laliotis, Nikos Markantonatos, Ioannis P Vlahavas

This paper documents the systems that we developed for our participation in the BioASQ 2014 large-scale bio-medical semantic indexing and question answering challenge. For the large-scale semantic indexing task, we employed a novel multi-label ensemble method consisting of support vector machines, labeled Latent Dirichlet Allocation models and meta-models predicting the number of relevant labels. This method proved successful in our experiments as well as during the competition. For the question answering task, we combined different techniques for scoring candidate answers based on recent literature.

Conferences
2022
Development of a Mobile Application to Calculate the River Flow with the Mid-section Method (in Greek)
Dimitrios Pantelakis, Dimitris Dimitriadis, Konstantinos Dimitriadis, Xaralampos Doulgeris, Euaggelos Hatzigiannakis

The design and development of mobile applications in the environmental sciences is expanding. This work presents the design and implementation of a mobile application for the Android platform that calculates the river flow (on time) with the Mid-Section method and stores the measurement data and results in databases. The Kotlin and Python programming languages were used for application development. The application is innovative in the field of hydrometry, and its development aims to directly calculate the flow of a river.
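
For readers unfamiliar with the mid-section method, the sketch below shows the textbook form of the calculation: the river cross-section is divided at measurement verticals, and each vertical contributes its mean velocity times its depth times half the distance between its two neighbours. This is an illustrative assumption about the standard method, not code from the application; the function name, argument names, and sample values are hypothetical.

```python
def midsection_discharge(distances, depths, velocities):
    """Estimate river discharge (m^3/s) with the textbook mid-section method.

    distances:  position of each vertical from the bank, in metres (ascending)
    depths:     water depth at each vertical, in metres
    velocities: mean flow velocity at each vertical, in m/s
    """
    n = len(distances)
    if n < 2 or len(depths) != n or len(velocities) != n:
        raise ValueError("need at least two verticals and equal-length inputs")

    total = 0.0
    for i in range(n):
        # Each vertical represents the strip reaching halfway to its neighbours;
        # the first and last verticals only have a neighbour on one side.
        left = distances[i - 1] if i > 0 else distances[i]
        right = distances[i + 1] if i < n - 1 else distances[i]
        width = (right - left) / 2.0
        total += velocities[i] * depths[i] * width
    return total


# Hypothetical measurements: five verticals across a 4 m wide channel.
flow = midsection_discharge(
    distances=[0.0, 1.0, 2.0, 3.0, 4.0],
    depths=[0.0, 0.5, 1.0, 0.5, 0.0],
    velocities=[0.0, 0.2, 0.4, 0.2, 0.0],
)
print(f"Estimated discharge: {flow:.2f} m^3/s")
```

In practice the first and last verticals usually sit at the water's edge with zero depth, so they add nothing to the total; the function handles that case without special treatment.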