Funding Program: Own funding through CoNSeRT lab resources

Topic: Development and training of Natural Language Processing Agents through the use of Deep Reinforcement Learning

Period: 01/02/2021 – 31/01/2022

Total Cost: 6.555 €

Role in the Project: Internal CoNSeRT project

Description: The objective of this project is to train a Natural Language Processing (NLP) algorithm to generate text based on the sparse rewards produced by a Deep Reinforcement Learning (DRL) model. In particular, a Transformer-based Natural Language Generation (NLG) model (e.g., GPT-2) will be used to create text. At the end of a sentence, another Transformer-based model fine-tuned on a specific task (e.g., RoBERTa on Sentiment Analysis) will evaluate whether the goal has been accomplished (e.g., whether the NLG model has produced a positive comment). Using this pipeline, the reward or penalty assigned by the evaluator will be backpropagated to the weights of the NLG model via a DRL algorithm such as Proximal Policy Optimization (PPO).
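The loop described above can be sketched in miniature. The toy below is an illustration only, under loud assumptions: a unigram policy over a six-word vocabulary stands in for the Transformer NLG model, a keyword-counting function stands in for the fine-tuned sentiment classifier, and a plain REINFORCE policy-gradient update stands in for PPO. All names (VOCAB, POSITIVE, reinforce_step, etc.) are hypothetical; the real project would use the actual model gradients.

```python
import math
import random

random.seed(0)

# Stand-ins: in the project, the generator is a Transformer (e.g. GPT-2)
# and the reward comes from a fine-tuned classifier (e.g. RoBERTa).
VOCAB = ["good", "great", "fine", "bad", "awful", "<eos>"]
POSITIVE = {"good", "great", "fine"}

# One logit per token: a toy stand-in for the NLG model's parameters.
logits = {w: 0.0 for w in VOCAB}

def softmax(lg):
    m = max(lg.values())
    exps = {w: math.exp(v - m) for w, v in lg.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

def sample_sentence(max_len=5):
    # "Generation": sample tokens until <eos> or max length.
    words = []
    for _ in range(max_len):
        probs = softmax(logits)
        w = random.choices(list(probs), weights=probs.values())[0]
        if w == "<eos>":
            break
        words.append(w)
    return words

def reward(words):
    # Sparse end-of-sentence reward, standing in for the sentiment
    # classifier: +1 if the sentence is majority-positive, else -1.
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    return 1.0 if pos > len(words) / 2 else -1.0

def reinforce_step(lr=0.1):
    # Simplified policy gradient (REINFORCE) in place of PPO:
    # scale the log-probability gradient of each sampled token by the reward.
    words = sample_sentence()
    r = reward(words)
    probs = softmax(logits)
    grads = {v: 0.0 for v in VOCAB}
    for w in words:
        for v in VOCAB:
            grads[v] += (1.0 if v == w else 0.0) - probs[v]
    for v in VOCAB:
        logits[v] += lr * r * grads[v]
    return r

for _ in range(300):
    reinforce_step()
```

After training, the policy's probability mass should have shifted toward tokens that the stand-in classifier rewards, which is the same credit-assignment idea the full GPT-2 + RoBERTa + PPO pipeline applies at scale.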

This approach will be extremely useful for augmenting textual data in tasks with few annotated examples, and for goal-based chatbots that must accomplish an objective, such as booking a restaurant.


  1. Goal-based Natural Language Generation
  2. Textual data augmentation