Defusing toxic online debates with AI

L3S Best Publication of the Quarter (Q3/2024)
Category: Natural Language Processing, Large Language Models

LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback 

Authors: Timon Ziegenbein, Gabriella Skitalinskaya, Alireza Bayat Makou, Henning Wachsmuth 

Published in the Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024, an A* conference)

The paper in a nutshell:  

Ensuring that online discussions remain civil and productive is a major challenge for social media platforms. Such platforms usually rely on both users and automated detection tools to flag other users' inappropriate arguments, which moderators then review. However, this kind of post-hoc moderation is expensive and time-consuming, and moderators are often overwhelmed by the amount and severity of flagged content. A promising alternative is to prevent negative behavior already during content creation. This paper studies how inappropriate language in arguments can be computationally mitigated. We propose an approach based on a large language model (LLM) that balances content preservation and appropriateness. We evaluate different degrees of both properties in human assessment studies. Systematic experiments provide evidence that our approach can mitigate the inappropriateness of arguments while largely preserving their content, significantly outperforming competitive computational baselines as well as humans.

What is the potential impact of your findings?  

The findings from this research can inform a computational writing-assistance tool that helps users phrase their arguments more appropriately in online discussions. Integrated into social media platforms, such a tool could prevent negative behavior during content creation rather than flagging it afterwards, improving the quality of online discussions and making the platforms more civil and productive.

What is new about the research?  

This research is the first to propose a computational approach that balances content preservation and appropriateness when rewriting inappropriate arguments. The approach is based on a large language model (LLM) and uses reinforcement learning to learn how to rewrite arguments without the need for parallel data, i.e., without pairs of inappropriate arguments and their appropriate rewrites.
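To make the balancing idea concrete, here is a minimal, hypothetical sketch of a reward function of the kind an RL-based rewriter could optimize: it combines an appropriateness score (assumed to come from some external classifier) with a crude content-preservation measure. The function names, the token-overlap proxy, and the weighting parameter `alpha` are illustrative assumptions, not the authors' actual reward design.

```python
# Toy sketch (NOT the paper's implementation): a reward that trades off
# appropriateness against content preservation, the two objectives the
# approach balances when rewriting an argument.

def token_overlap(original: str, rewrite: str) -> float:
    """Jaccard overlap of token sets, a crude stand-in for content preservation."""
    ta, tb = set(original.lower().split()), set(rewrite.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def reward(original: str, rewrite: str, appropriateness: float,
           alpha: float = 0.5) -> float:
    """Combine an appropriateness score in [0, 1] (e.g., from a classifier)
    with content preservation; alpha controls the trade-off."""
    preservation = token_overlap(original, rewrite)
    return alpha * appropriateness + (1.0 - alpha) * preservation
```

In an RL setup, a reward of this shape would be computed for each generated rewrite and used to update the LLM's policy, which is what removes the need for parallel training data: only a scoring function, not gold rewrites, is required.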

Paper link: https://aclanthology.org/2024.acl-long.244.pdf