Dimitris Dimitriadis

Language Processing R&D

Full Stack Web Developer

About Me

Greetings! I’m Dimitris Dimitriadis, a Thessaloniki native with a passion for research and a dedicated programmer. I hold a Ph.D. from the School of Informatics at Aristotle University of Thessaloniki. Alongside my academic pursuits, I am a co-founder of Contia, a company established in 2017 in Thessaloniki, Greece.

My research lies at the intersection of Natural Language Processing and Machine Learning, with a particular emphasis on Question Answering models. I also have extensive programming experience, particularly in web development, which is a cornerstone of Contia’s work.

I am driven by a keen interest in problem-solving and thrive in collaborative environments. I’m particularly drawn to applied innovative ideas that serve the betterment of society. Let’s work together to make a meaningful impact!

  • Age: 31
  • Residence: Greece
  • Freelance: Available
  • Address: Thessaloniki, Greece
My Services
Creating and Implementing Machine Learning Systems

Proficient in (1) crafting machine learning applications to meet specific requirements, (2) adeptly selecting relevant datasets and employing effective data representation methods, (3) conducting rigorous machine learning tests and experiments, and (4) executing thorough statistical analysis and precision fine-tuning.

Mastering Natural Language Processing and Text Mining Challenges

Engaged in various tasks including named-entity recognition, question answering, sentiment analysis, and textual entailment. Proficient in both traditional machine learning techniques and advanced deep learning approaches. Specialized in question answering within the domain of biomedical scientific publications.

Developing Desktop Applications

Proficient in Python programming, with a strong command of open-source NLP, numerical computing, and machine learning libraries such as NLTK, spaCy, NumPy, Keras, and scikit-learn. Also skilled in Java, C, and C++. Familiar with MATLAB, VB, and Visual C++.

Creating Websites and Web Applications

Designing user interfaces with CSS/HTML/JS, primarily using the Bootstrap framework for front-end development. Proficient in back-end development using Python (Django framework) and plain PHP. Well-versed in WordPress. Skilled in implementing the MVC design pattern.

Pricing
Web Development
15 hour
  • WordPress
  • SEO Optimization
  • Custom Web Design
  • Custom Programming Scripts
Advanced Web Development
25 hour
  • Coding From Scratch
  • Project Planning
  • Python/PHP/Apache
  • JS/CSS/HTML
Machine Learning & Development
35 hour
  • Building Learning Models
  • Research on task
  • Clean Code
  • Experiments on Datasets / Tuning models
Fun Facts
2 Academic Projects
5 Awards Won
5 000+ Cups Of Coffee
2 Countries Visited
200+ Solutions
Resume
Academic Experience
ECARLE Project
2018 - 2021
Exploitation of Cultural Assets with computer-assisted Recognition, Labeling and meta-data Enrichment

My work focused primarily on semantic indexing of Greek literature. Specifically, I developed a deep learning architecture for text classification that takes Genre/Form terms into account. The main challenges arose from the archaic Greek language of the texts and the varying quality of the source text, which was frequently degraded by the OCR applied to digitized books.

Large Scale Semantic Indexing and Question Answering
2015-2019
In collaboration with Atypon Systems, LLC

My primary focus has been on Question Answering within the biomedical field. Specifically, I aimed to refine the conceptualization of QA by conducting an in-depth study of related works in this area. This involved constructing various learning models, creating novel algorithms, and integrating multiple resources.

2016 - 2020
Lab Assistant
Object Oriented Programming (Java)

My responsibilities were to:

  1. instruct students in the practical application of the Java programming language.
  2. administer GitHub Classroom to assess student performance.
  3. assist students in comprehending object-oriented programming concepts.
Education
2020-Present
Enrolled at the School of Mathematics.

I’m immersed in the study of Mathematics at Aristotle University of Thessaloniki in Greece.

2015-2022
PhD

The title of my thesis is “Machine Learning and Natural Language Processing Techniques for Question Answering”.

2014 - 2016
MSc in Informatics and Communications
Knowledge, Data and Software Technologies

Courses I have completed include:

  1. Distributed Resource Management (Apache Hadoop, MapReduce)
  2. Advanced Algorithms (e.g. hashing algorithms in detail)
  3. Advanced Machine Learning (e.g. bagging, boosting)
  4. SPSS
  5. Semantic Web (e.g. RDF Schema)
2010 - 2014
Bachelor's Degree in Informatics

I studied at the School of Informatics of Aristotle University of Thessaloniki, a field I am deeply fond of.

Jobs
2016 - Present
Private Tutor
Academic Lessons / Support on Academic Exercises, Theses
2014-2016
IT member
Media Markt Thessaloniki
  1. Technical Support
  2. Customer Service
Languages
2016
German
Goethe (B1)
2010
English
Michigan (B2)
  • Presentations (e.g. Conference Article )
  • Writing (e.g. Journal Articles, Reports )
  • Communication (e.g. via Google Hangouts, Skype, f2f)
Challenges
BioASQ Challenge
2014 - 2019
Task B, Phase (Question Answering, exact answers)

Several awards:

  1. 2/5 test batches (2014)
  2. 2/5 test batches (2016)
  3. 1/5 test batches (2017)
  4. 4/5 test batches (2018) 
  5. 3/5 test batches (2019)
Certificates
2022
Continuous Engineering and Deep Learning for Trustworthy Autonomous Systems
Summer School

Deep learning has developed into a mature technology and it is nowadays an essential part in systems that may include timing and cyber-physical components, such as self-driving cars, autonomous control systems in medical applications and so on. We call these systems learning-enabled autonomous systems and we focus on key challenges in their design and development, which lie in the intersection of the two H2020 research projects that jointly organize this one-week summer school with prominent invited speakers and hands-on sessions on related tools and state-of-the-art industrial technologies.

Other
2018
Basic Life Support
2011
Braille
2001-2013
Basketball Player
Volunteering
2020
DS2020 Conference
Organizing Team

I was a member of the organizing team of the DS2020 Conference. My responsibilities were:

  1. To set up a new Slack workspace and add channels for the conference sessions. I also configured the permissions, invited the attendees, resolved participants' issues, and provided guidelines about the conference.
  2. To assist participants in Gather Town, helping them find the rooms by giving directions and answering their questions.
  3. To oversee discussions on the Zoom platform. I welcomed the speakers and was available to resolve any technical issues.
My Skills
Research
  • Question Answering
  • Natural Language Processing
  • Machine Learning
  • Deep Learning
Coding
  • Python / PHP / JS / CSS / HTML
  • Java
  • C++ / C
  • MATLAB / VB / VC++
Academic Articles
Journals
Enhancing yes/no question answering with weak supervision via extractive question answering
2023
D. Dimitriadis, G. Tsoumakas

The effectiveness of natural language processing models relies on various factors, including the architecture, number of parameters, data used during training, and the tasks they were trained on. Recent studies indicate that models pre-trained on large corpora and fine-tuned on task-specific datasets, covering multiple tasks, can generate remarkable results across various benchmarks. We propose a new approach based on a straightforward hypothesis: improving model performance on a target task by considering other artificial tasks defined on the same training dataset. By doing so, the model can gain further insights into the training dataset and attain a greater understanding, improving efficiency on the target task. This approach differs from others that consider multiple pre-existing tasks on different datasets. We validate this hypothesis by focusing on the problem of answering yes/no questions and introducing a multi-task model that outputs a span of the reference text, serving as evidence for answering the question. The task of span extraction is an artificial one, designed to benefit the performance of the model answering yes/no questions. We acquire weak supervision for these spans, by using a pre-trained extractive question answering model, dispensing the need for costly human annotation. Our experiments, using modern transformer-based language models, demonstrate that this method outperforms the standard approach of training models to answer yes/no questions. Although the primary objective was to enhance the performance of the model in answering yes/no questions, it was discovered that span texts are a significant source of information. These spans, derived from the question reference texts, provided valuable insights for the users to better comprehend the answers to the questions. 
The model’s improved accuracy in answering yes/no questions, coupled with the supplementary information provided by the span texts, led to a more comprehensive and informative user experience.

Artificial fine-tuning tasks for yes/no question answering
2022
D. Dimitriadis, G. Tsoumakas

Current research in yes/no question answering (QA) focuses on transfer learning techniques and transformer-based models. Models trained on large corpora are fine-tuned on tasks similar to yes/no QA, and then the captured knowledge is transferred for solving the yes/no QA task. Most previous studies use existing similar tasks, such as natural language inference or extractive QA, for the fine-tuning step. This paper follows a different perspective, hypothesizing that an artificial yes/no task can transfer useful knowledge for improving the performance of yes/no QA. We introduce three such tasks for this purpose, by adapting three corresponding existing tasks: candidate answer validation, sentiment classification, and lexical simplification. Furthermore, we experimented with three different variations of the BERT model (BERT base, RoBERTa, and ALBERT). The results show that our hypothesis holds true for all artificial tasks, despite the small size of the corresponding datasets that are used for the fine-tuning process, the differences between these tasks, the decisions that we made to adapt the original ones, and the tasks’ simplicity. This gives an alternative perspective on how to deal with the yes/no QA problem, that is more creative, and at the same time more flexible, as it can exploit multiple other existing tasks and corresponding datasets to improve yes/no QA models.

Semantic Indexing of 19th-Century Greek Literature Using 21st-Century Linguistic Resources
2021
D. Dimitriadis, S. Zapounidou, G. Tsoumakas

Manual classification of works of literature with genre/form concepts is a time-consuming task requiring domain expertise. Building automated systems based on language understanding can help humans to achieve this work faster and more consistently. Towards this direction, we present a case study on automatic classification of Greek literature books of the 19th century. The main challenges in this problem are the limited number of literature books and resources of that age and the quality of the source text. We propose an automated classification system based on the Bidirectional Encoder Representations from Transformers (BERT) model trained on books from the 20th and 21st century. We also dealt with BERT’s constraint on the maximum sequence length of the input, leveraging the TextRank algorithm to construct representative sentences or phrases from each book. The results show that BERT trained on recent literature books correctly classifies most of the books of the 19th century despite the disparity between the two collections. Additionally, the TextRank algorithm improves the performance of BERT.

Word Embeddings and External Resources for Answer Processing in Biomedical Factoid Question Answering
2019
D. Dimitriadis, G. Tsoumakas

Biomedical question answering (QA) is a challenging task that has not been yet successfully solved, according to results on international benchmarks, such as BioASQ. Recent progress on deep neural networks has led to promising results in domain independent QA, but the lack of large datasets with biomedical question-answer pairs hinders their successful application to the domain of biomedicine.

We propose a novel machine-learning based answer processing approach that exploits neural networks in an unsupervised way through word embeddings. Our approach first combines biomedical and general purpose tools to identify the candidate answers from a set of passages. Candidates are then represented using a combination of features based on both biomedical external resources and input textual sources, including features based on word embeddings. Candidates are then ranked based on the score given at the output of a binary classification model, trained from candidates extracted from a small number of questions, related passages and correct answer triplets from the BioASQ challenge.

Our experimental results show that the use of word embeddings, combined with other features, improves the performance of answer processing in biomedical question answering. In addition, our results show that the use of several annotators improves the identification of answers in passages. Finally, our approach has participated in the last two versions (2017, 2018) of the BioASQ challenge achieving competitive results.

Workshops
2020
Yes/No Question Answering in BioASQ 2019
Dimitris Dimitriadis, Grigorios Tsoumakas

The field of question answering has gained greater attention with the rise of deep neural networks. More and more approaches adopt paradigms which are based primarily on the powerful language representations models and transfer learning techniques to build efficient learning models which are able to outperform current state of the art systems. Endorsing this current trend, in this paper, we strive to take a step towards the goal of answering yes/no questions in the field of biomedicine. Specifically, the task is to give a short answer (yes or no) for a question written in natural language, finding clues including in a set of snippets that are related with this question. We propose three different deep neural network models, which are free of assumptions about predefined specific feature functions, while the key elements of these are the ELMo embeddings, the similarity matrices and/or sentiment information. The results have shown that incorporating the sentiment, we can improve the performance of a yes/no question answering system while the proposed learning models significantly outperform the BioASQ baseline.

2016
Large-scale semantic indexing and question answering in biomedicine
Eirini Papagiannopoulou, Yiannis Papanikolaou, Dimitris Dimitriadis, Sakis Lagopoulos, Grigorios Tsoumakas, Manos Laliotis, Nikos Markantonatos, Ioannis Vlahavas

In this paper we present the methods and approaches employed in terms of our participation in the 2016 version of the BioASQ challenge. For the semantic indexing task, we extended our successful ensemble approach of last year with additional models. The official results obtained so-far demonstrate a continuing consistent advantage of our approaches against the National Library of Medicine (NLM) baselines. For the question answering task, we extended our approach on factoid questions, while we also developed approaches for the document, concept and snippet retrieval sub-tasks

2014
Ensemble Approaches for Large-Scale Multi-Label Classification and Question Answering in Biomedicine.
Yannis Papanikolaou, Dimitrios Dimitriadis, Grigorios Tsoumakas, Manos Laliotis, Nikos Markantonatos, Ioannis P Vlahavas

This paper documents the systems that we developed for our participation in the BioASQ 2014 large-scale bio-medical semantic indexing and question answering challenge. For the large-scale semantic indexing task, we employed a novel multi-label ensemble method consisting of support vector machines, labeled Latent Dirichlet Allocation models and meta-models predicting the number of relevant labels. This method proved successful in our experiments as well as during the competition. For the question answering task we combined different techniques for scoring of candidate answers based on recent literature.

Conferences
2022
Development of a Mobile Application to Calculate the River Flow with the Mid-section Method (in Greek)
Dimitrios Pantelakis, Dimitris Dimitriadis, Konstantinos Dimitriadis, Xaralampos Doulgeris, Euaggelos Hatzigiannakis

The design and development of mobile applications in the environmental sciences is expanding. This work presents the design and implementation of a mobile application for the Android platform that calculates river flow (on time) with the Mid-Section method and stores measurement data and results in databases. The Kotlin and Python programming languages were used for application development. The application is innovative in the field of hydrometry, and its development aims to calculate the flow of a river directly.

Recent Works
Latest Posts
23 August 2020 Estimating Running Time of an Algorithm using Simple Rules.

To estimate the running time of an algorithm and the difference of the algorithm against others, we make a strong…

9 February 2020 Deterministic Finite State Automata for Regular Expressions

A finite state automaton (FSA) is a computational model and is defined by the following 5 parameters: Q: a finite…

28 January 2020 Google Apps for Education (Aristotle University of Thessaloniki)

Google supports students and researchers (academic staff, in general) offering some of their services for free. In Aristotle University of…

2 November 2019 Datasets In Question Answering

Question Answering (QA) is an AI-complete problem, meaning that the current state-of-the-art approaches cannot solve it. One of the…

Get in Touch
  • Address: Tsimiski 17, Thessaloniki, Greece
  • Email: dimitrisqa@gmail.com
  • Freelance: Available