NLP & Machine Learning Freelancer - Dimitris Dimitriadis

Dimitris Dimitriadis

Language Processing R&D

Full Stack Web Developer

About Me

Greetings! I’m Dimitris Dimitriadis,

a Thessaloniki native, a dedicated programmer, and a researcher at heart. I hold a Ph.D. from the School of Informatics at Aristotle University of Thessaloniki. Alongside my academic work, I am a co-founder of Contia, a company established in 2017 here in Thessaloniki, Greece.

My research lies at the intersection of Natural Language Processing and Machine Learning, with a particular emphasis on Question Answering models. I also have extensive programming experience, particularly in web development, which is the core of Contia’s work.

I am driven by a keen interest in problem-solving and thrive in collaborative environments. I’m particularly drawn to applied innovative ideas that serve the betterment of society. Let’s work together to make a meaningful impact!

Age: 31
Residence: Greece
Freelance: Available
Address: Thessaloniki, Greece

My Services

Creating and Implementing Machine Learning Systems

Proficient in (1) crafting machine learning applications to meet specific requirements, (2) adeptly selecting relevant datasets and employing effective data representation methods, (3) conducting rigorous machine learning tests and experiments, and (4) executing thorough statistical analysis and precision fine-tuning.

Mastering Natural Language Processing and Text Mining Challenges

Engaged in various tasks including named-entity recognition, question answering, sentiment analysis, and textual entailment. Proficient in both traditional machine learning techniques and advanced deep learning approaches. Specialized in question answering within the domain of biomedical scientific publications.

Developing Desktop Applications

Proficient in Python programming, with a strong command of open-source NLP, machine learning, and scientific computing libraries such as NLTK, spaCy, NumPy, Keras, and scikit-learn. Also skilled in Java, C, and C++. Familiar with MATLAB, VB, and Visual C++.

Creating Websites and Web Applications

Designing user interfaces with CSS/HTML/JS, primarily using the Bootstrap framework for front-end development. Proficient in back-end development using Python (Django framework) and plain PHP. Well-versed in WordPress. Skilled in implementing the MVC design pattern.

Pricing

Web Development

€15 / hour

WordPress
SEO Optimization
Custom Web Design
Custom Programming Scripts

WD Projects (Coming Soon)

Advanced Web Development

€25 / hour

Coding From Scratch
Project Planning
Python/PHP/Apache
JS/CSS/HTML

AWD Projects (Coming Soon)

Machine Learning & Development

€35 / hour

Building Learning Models
Research on task
Clean Code
Experiments on Datasets / Tuning models

ML Projects (Coming Soon)

Fun Facts

2 Academic Projects

5 Awards Won

5,000+ Cups Of Coffee

2 Countries Visited

200+ Solutions

Resume

Academic Experience

2018 - 2021

ECARLE Project

Exploitation of Cultural Assets with computer-assisted Recognition, Labeling and meta-data Enrichment

My work focused primarily on semantic indexing in the domain of Greek literature. Specifically, I developed a deep learning architecture for text classification that takes Genre/Form terms into account. The main challenges were the archaic Greek language of the texts and the varying quality of the source text, which often contained errors introduced by OCR of digitized books. ECARLE website

2015-2019

Large Scale Semantic Indexing and Question Answering

In collaboration with Atypon Systems, LLC

My primary focus has been on Question Answering within the biomedical field. Specifically, I aimed to refine the conceptualization of QA by conducting an in-depth study of related works in this area. This involved constructing various learning models, creating novel algorithms, and integrating multiple resources.

2016 - 2020

Lab Assistant

Object Oriented Programming (Java)

My responsibilities were to:

instruct students in the practical application of the Java programming language.
administer GitHub Classroom to assess student performance.
assist students in comprehending object-oriented programming concepts.

Education

2020-Present

Enrolled at the School of Mathematics.

I’m immersed in the study of Mathematics at Aristotle University of Thessaloniki in Greece.

2015-2022

PhD

The title of my thesis is “Machine Learning and Natural Language Processing Techniques for Question Answering”. Read It Here

2014 - 2016

MSc in Informatics and Communications

Knowledge, Data and Software Technologies

Courses I have completed include:

Distributed Resource Management (Apache Hadoop, MapReduce)
Advanced Algorithms (e.g. hashing algorithms in detail)
Advanced Machine Learning (e.g. Bagging, Boosting)
SPSS
Semantic Web (e.g. RDF Schema)

2010 - 2014

Bachelor's Degree in Informatics

I studied at the School of Informatics of Aristotle University of Thessaloniki, a field I am deeply fond of.

Jobs

2016 - Present

Private Tutor

Academic Lessons / Support on Academic Exercises, Theses

2014-2016

IT member

Media Markt Thessaloniki

Technical Support
Customer Service

Languages

2016

German

Goethe (B1)

2010

English

Michigan (B2)

Presentations (e.g. conference articles)
Writing (e.g. journal articles, reports)
Communication (e.g. via Google Hangouts, Skype, face to face)

Challenges

2014 - 2019

BioASQ Challenge

Task B, Phase (Question Answering, exact answers)

Several awards:

Certificates

2022

Continuous Engineering and Deep Learning for Trustworthy Autonomous Systems

Summer School

Deep learning has developed into a mature technology and it is nowadays an essential part in systems that may include timing and cyber-physical components, such as self-driving cars, autonomous control systems in medical applications and so on. We call these systems learning-enabled autonomous systems and we focus on key challenges in their design and development, which lie in the intersection of the two H2020 research projects that jointly organize this one-week summer school with prominent invited speakers and hands-on sessions on related tools and state-of-the-art industrial technologies.

Other

2018

Basic Life Support

2011

Braille

2001-2013

Basketball Player

Volunteering

2020

DS2020 Conference

Organizing Team

I was on the organizing team of the DS2020 Conference. My responsibilities were:

To set up a new Slack workspace and add channels for the conference sessions. I also configured the permissions, invited the attendees, resolved participants' issues, and provided guidelines about the conference.
To assist participants in Gather Town, helping them find the rooms by giving directions and answering their questions.
To oversee discussions on Zoom. I welcomed the speakers and was available to resolve any technical issues.

My Skills

Research

Question Answering
Natural Language Processing
Machine Learning
Deep Learning

Coding

Python / PHP / JS / CSS / HTML
Java
C++ / C
MATLAB / VB / VC++

Academic Articles

Journals

2023

Enhancing yes/no question answering with weak supervision via extractive question answering

D. Dimitriadis, G. Tsoumakas

The effectiveness of natural language processing models relies on various factors, including the architecture, number of parameters, data used during training, and the tasks they were trained on. Recent studies indicate that models pre-trained on large corpora and fine-tuned on task-specific datasets, covering multiple tasks, can generate remarkable results across various benchmarks. We propose a new approach based on a straightforward hypothesis: improving model performance on a target task by considering other artificial tasks defined on the same training dataset. By doing so, the model can gain further insights into the training dataset and attain a greater understanding, improving efficiency on the target task. This approach differs from others that consider multiple pre-existing tasks on different datasets. We validate this hypothesis by focusing on the problem of answering yes/no questions and introducing a multi-task model that outputs a span of the reference text, serving as evidence for answering the question. The task of span extraction is an artificial one, designed to benefit the performance of the model answering yes/no questions. We acquire weak supervision for these spans, by using a pre-trained extractive question answering model, dispensing the need for costly human annotation. Our experiments, using modern transformer-based language models, demonstrate that this method outperforms the standard approach of training models to answer yes/no questions. Although the primary objective was to enhance the performance of the model in answering yes/no questions, it was discovered that span texts are a significant source of information. These spans, derived from the question reference texts, provided valuable insights for the users to better comprehend the answers to the questions. 
The model’s improved accuracy in answering yes/no questions, coupled with the supplementary information provided by the span texts, led to a more comprehensive and informative user experience.

Link

2022

Artificial fine-tuning tasks for yes/no question answering

D. Dimitriadis, G. Tsoumakas

Current research in yes/no question answering (QA) focuses on transfer learning techniques and transformer-based models. Models trained on large corpora are fine-tuned on tasks similar to yes/no QA, and then the captured knowledge is transferred for solving the yes/no QA task. Most previous studies use existing similar tasks, such as natural language inference or extractive QA, for the fine-tuning step. This paper follows a different perspective, hypothesizing that an artificial yes/no task can transfer useful knowledge for improving the performance of yes/no QA. We introduce three such tasks for this purpose, by adapting three corresponding existing tasks: candidate answer validation, sentiment classification, and lexical simplification. Furthermore, we experimented with three different variations of the BERT model (BERT base, RoBERTa, and ALBERT). The results show that our hypothesis holds true for all artificial tasks, despite the small size of the corresponding datasets that are used for the fine-tuning process, the differences between these tasks, the decisions that we made to adapt the original ones, and the tasks’ simplicity. This gives an alternative perspective on how to deal with the yes/no QA problem, that is more creative, and at the same time more flexible, as it can exploit multiple other existing tasks and corresponding datasets to improve yes/no QA models.
Link

2021

Semantic Indexing of 19th-Century Greek Literature Using 21st-Century Linguistic Resources

D. Dimitriadis, S. Zapounidou, G. Tsoumakas

Manual classification of works of literature with genre/form concepts is a time-consuming task requiring domain expertise. Building automated systems based on language understanding can help humans to achieve this work faster and more consistently. Towards this direction, we present a case study on automatic classification of Greek literature books of the 19th century. The main challenges in this problem are the limited number of literature books and resources of that age and the quality of the source text. We propose an automated classification system based on the Bidirectional Encoder Representations from Transformers (BERT) model trained on books from the 20th and 21st century. We also dealt with BERT’s constraint on the maximum sequence length of the input, leveraging the TextRank algorithm to construct representative sentences or phrases from each book. The results show that BERT trained on recent literature books correctly classifies most of the books of the 19th century despite the disparity between the two collections. Additionally, the TextRank algorithm improves the performance of BERT.
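
To illustrate the idea of using TextRank to fit a long book into a fixed input length, here is a minimal sketch in plain Python. It is not the code used in the paper: the function names, the word-overlap similarity, the damping factor, and the word budget are all assumptions chosen for illustration, and a real pipeline would count tokens with the model's own tokenizer.

```python
import math
import re


def tokenize(sentence):
    """Lowercased set of word tokens (a simplification of real tokenization)."""
    return set(re.findall(r"\w+", sentence.lower()))


def similarity(words_a, words_b):
    """Word overlap normalized by log lengths, a simplified TextRank-style measure."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over the similarity graph."""
    words = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = similarity(words[i], words[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def select_sentences(sentences, max_words):
    """Greedily keep the highest-scoring sentences that fit the word budget,
    returned in their original order."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, budget = [], max_words
    for i in ranked:
        length = len(sentences[i].split())
        if length <= budget:
            chosen.append(i)
            budget -= length
    return [sentences[i] for i in sorted(chosen)]
```

Calling select_sentences(book_sentences, max_words=400) would return a shortened text that can then be passed to a classifier with a limited input length; the budget of 400 words is only an example value.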

2019

Word Embeddings and External Resources for Answer Processing in Biomedical Factoid Question Answering

D. Dimitriadis, G. Tsoumakas

Biomedical question answering (QA) is a challenging task that has not been yet successfully solved, according to results on international benchmarks, such as BioASQ. Recent progress on deep neural networks has led to promising results in domain independent QA, but the lack of large datasets with biomedical question-answer pairs hinders their successful application to the domain of biomedicine.

We propose a novel machine-learning based answer processing approach that exploits neural networks in an unsupervised way through word embeddings. Our approach first combines biomedical and general purpose tools to identify the candidate answers from a set of passages. Candidates are then represented using a combination of features based on both biomedical external resources and input textual sources, including features based on word embeddings. Candidates are then ranked based on the score given at the output of a binary classification model, trained from candidates extracted from a small number of questions, related passages and correct answer triplets from the BioASQ challenge.

Our experimental results show that the use of word embeddings, combined with other features, improves the performance of answer processing in biomedical question answering. In addition, our results show that the use of several annotators improves the identification of answers in passages. Finally, our approach has participated in the last two versions (2017, 2018) of the BioASQ challenge achieving competitive results.

Workshops

2020

Yes/No Question Answering in BioASQ 2019

Dimitris Dimitriadis, Grigorios Tsoumakas

The field of question answering has gained greater attention with the rise of deep neural networks. More and more approaches adopt paradigms which are based primarily on powerful language representation models and transfer learning techniques to build efficient learning models which are able to outperform current state-of-the-art systems. Endorsing this current trend, in this paper, we strive to take a step towards the goal of answering yes/no questions in the field of biomedicine. Specifically, the task is to give a short answer (yes or no) for a question written in natural language, finding clues included in a set of snippets that are related to this question. We propose three different deep neural network models, which are free of assumptions about predefined specific feature functions, while the key elements of these are the ELMo embeddings, the similarity matrices and/or sentiment information. The results have shown that by incorporating sentiment, we can improve the performance of a yes/no question answering system, while the proposed learning models significantly outperform the BioASQ baseline.

2016

Large-scale semantic indexing and question answering in biomedicine

Eirini Papagiannopoulou, Yiannis Papanikolaou, Dimitris Dimitriadis, Sakis Lagopoulos, Grigorios Tsoumakas, Manos Laliotis, Nikos Markantonatos, Ioannis Vlahavas

In this paper we present the methods and approaches employed in our participation in the 2016 version of the BioASQ challenge. For the semantic indexing task, we extended our successful ensemble approach of last year with additional models. The official results obtained so far demonstrate a continuing consistent advantage of our approaches against the National Library of Medicine (NLM) baselines. For the question answering task, we extended our approach on factoid questions, while we also developed approaches for the document, concept and snippet retrieval sub-tasks.

2014

Ensemble Approaches for Large-Scale Multi-Label Classification and Question Answering in Biomedicine

Yannis Papanikolaou, Dimitrios Dimitriadis, Grigorios Tsoumakas, Manos Laliotis, Nikos Markantonatos, Ioannis P Vlahavas

This paper documents the systems that we developed for our participation in the BioASQ 2014 large-scale bio-medical semantic indexing and question answering challenge. For the large-scale semantic indexing task, we employed a novel multi-label ensemble method consisting of support vector machines, labeled Latent Dirichlet Allocation models and meta-models predicting the number of relevant labels. This method proved successful in our experiments as well as during the competition. For the question answering task we combined different techniques for scoring of candidate answers based on recent literature.

Conferences

2022

Development of a Mobile Application to Calculate the River Flow with the Mid-section Method (in Greek)

Dimitrios Pantelakis, Dimitris Dimitriadis, Konstantinos Dimitriadis, Xaralampos Doulgeris, Euaggelos Hatzigiannakis

The design and development of mobile applications in the environmental sciences is expanding. This work presents the design and implementation of a mobile application on the Android platform that aims to calculate the river flow (on time) with the Mid-Section method and to store measurement data and results in databases. The Kotlin and Python programming languages have been used for application development. This application is innovative in the field of hydrometers and its development aims to directly calculate the flow of a river.
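
As a rough illustration of the Mid-Section method itself, the sketch below computes the total discharge from a set of measurement verticals. It is not the application's code: the function name, argument layout, and units (metres and metres per second) are assumptions. Each vertical contributes its mean velocity times its depth times a width that reaches halfway to each neighbouring vertical.

```python
def midsection_discharge(distances, depths, velocities):
    """Total river discharge (m^3/s) by the mid-section method.

    distances  -- distance of each vertical from a bank reference point (m)
    depths     -- water depth at each vertical (m)
    velocities -- mean flow velocity at each vertical (m/s)
    """
    n = len(distances)
    if not (n == len(depths) == len(velocities)):
        raise ValueError("all inputs must have the same length")
    if n < 2:
        raise ValueError("at least two verticals are required")
    if any(b <= a for a, b in zip(distances, distances[1:])):
        raise ValueError("distances must be strictly increasing")

    total = 0.0
    for i in range(n):
        # The first and last verticals only extend halfway towards
        # their single neighbour.
        left = distances[i - 1] if i > 0 else distances[i]
        right = distances[i + 1] if i < n - 1 else distances[i]
        width = (right - left) / 2.0
        total += velocities[i] * depths[i] * width
    return total
```

For example, midsection_discharge([0, 1, 2, 3, 4], [0, 1, 2, 1, 0], [0, 0.5, 1.0, 0.5, 0]) sums the three interior verticals, each with a width of 1 m.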

Recent Works

Question Answering in Biomedical Domain

Big Projects