The COVID-19 Humor Project

TL;DR

My first research publication! This project covers the creation of a COVID-19 pandemic humor dataset, along with humor detection and maladaptive humor detection tasks evaluated on state-of-the-art language models. The aim was to assess the subtle and often dark nature of pandemic-related humor, which may support social and mental-health analysis of a population living through such an era. Here’s a high-level pipeline of the project and its steps; check out the Resources for more!

[Figure: high-level project pipeline]

Motivation

It all started as part of my third-year undergraduate project, in which we were trying to analyze the interesting online discourse about the COVID-19 pandemic of 2020. Since these were the early days of this torturous pandemic, I was fascinated by how people from various sections of society used humorous text to cope with the day-to-day stress COVID-19 imposed on us. From this stemmed the topic of topical humor as a thesis project. Along with my classmate Neha and Dr. Sunny Rai, my Natural Language Processing professor at the time, I began defining our motivations, and we asked ourselves:

Is there a labeled topical humor dataset that could help us understand people's behavior in a pandemic? How are people affected by this kind of humor? Are jokes made during a crisis insensitive to certain people, and if so, how?

Since we found few research papers in the specific domain of COVID-19 humor, we decided to focus on different aspects of the project. This was a long-term project spanning around three years, so needless to say, we had our share of diversions and interesting conclusions that propelled the work in different directions.

The Project

With this motivation in mind, we began extracting COVID-19 humor data from various online sources, namely Reddit, Twitter, Onion headlines and real news headlines. We also collected memes as images for future work on COVID-19 humor in meme form.

Pre-processing, cleaning and normalizing the data culminated in a single dataset of about 2,510 hand-annotated samples, with labels such as humor style, type, theme, target, and the stereotypes formed or exploited in creating the humor, in addition to 909 memes.
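To make the annotation scheme a little more concrete, here is a minimal sketch of how one annotated text sample might be represented in Python. The label categories mirror the ones listed above, but the field names, the placeholder values, and the `is_maladaptive` helper are hypothetical and not the released dataset's actual format. It also assumes the humor-style labels follow Martin et al.'s four-style model (affiliative, self-enhancing, aggressive, self-defeating), in which aggressive and self-defeating humor count as maladaptive.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: humor-style labels follow Martin et al.'s four-style model,
# where aggressive and self-defeating humor are the maladaptive styles.
ADAPTIVE_STYLES = {"affiliative", "self-enhancing"}
MALADAPTIVE_STYLES = {"aggressive", "self-defeating"}


@dataclass
class HumorSample:
    """One hand-annotated sample; all field names are hypothetical."""
    text: str
    source: str                         # e.g. "reddit", "twitter", "onion", "news"
    is_humor: bool                      # label for the humor detection task
    humor_style: Optional[str] = None   # only set when is_humor is True
    humor_type: Optional[str] = None
    theme: Optional[str] = None
    target: Optional[str] = None
    stereotypes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject style labels outside the assumed four-style scheme.
        if self.is_humor:
            if self.humor_style not in ADAPTIVE_STYLES | MALADAPTIVE_STYLES:
                raise ValueError(f"unknown humor style: {self.humor_style!r}")
        elif self.humor_style is not None:
            raise ValueError("non-humorous samples should not carry a humor style")

    @property
    def is_maladaptive(self) -> bool:
        # Derived binary label for the maladaptive humor detection task.
        return self.is_humor and self.humor_style in MALADAPTIVE_STYLES


# Placeholder values only; no real samples from the dataset are shown.
joke = HumorSample(text="<joke text>", source="reddit", is_humor=True,
                   humor_style="aggressive", target="<target label>")
headline = HumorSample(text="<headline text>", source="news", is_humor=False)
print(joke.is_maladaptive, headline.is_maladaptive)  # True False
```

Deriving the maladaptive label from the humor style keeps the two task labels consistent with each other; if maladaptive intent were annotated directly instead, it would simply become one more field.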

Having a well-annotated dataset meant we could employ machine learning models for the tasks we were most curious about. We evaluated humor detection and maladaptive humor detection on the state-of-the-art models GPT-3, RoBERTa and fine-tuned RoBERTa.
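Both tasks can be framed as binary classification, so a model's outputs can be scored against the hand annotations. The sketch below is a minimal, dependency-free way to compute accuracy, precision, recall and F1 for the positive class, assuming each model (GPT-3, RoBERTa or fine-tuned RoBERTa) has already been reduced to one 0/1 prediction per sample. The metric choice and the toy labels are illustrative assumptions, not the paper's reported evaluation setup or results.

```python
from typing import Dict, Sequence


def binary_scores(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, plus precision, recall and F1 for the positive class (label 1)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    # Guard every division so empty or degenerate inputs return 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(gold) if gold else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only (1 = humorous / maladaptive, 0 = not).
gold = [1, 1, 0, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
print(binary_scores(gold, pred))
```

Reporting F1 for the positive class alongside accuracy matters when, for example, maladaptive jokes are a minority of the data: a model that never flags anything can still score a deceptively high accuracy.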

We found that these state-of-the-art models still fall short on both tasks: they often fail to detect humor or misidentify the maladaptive intent of an offensive joke. This points to maladaptive humor detection through zero-shot learning as a potentially unexplored research direction.

ACM WebConf Presentation

As a result of the work we put into this project, our paper was accepted and published at the ACM Web Conference 2023 (WWW’23 Companion, SocialNLP Workshop), April 30 – May 4, 2023, in Austin, TX, USA. SocialNLP was the 11th International Workshop on Natural Language Processing for Social Media, and its main objectives were:

Addressing issues in social computing using NLP techniques
Solving NLP problems using information from social networks or social media
Handling new problems related to both social computing and natural language processing

Our paper was selected under the theme of Disaster Management Using Social Media, specifically COVID-19 on Social Media. We presented it virtually at the conference, and I am pleased to say that our work was well received and encouraged by the panel members and the wider research community. Here is a picture of us presenting live!
[Figure: ACM WebConf presentation]

The Impact

We believe that the dataset we produced could have a significant positive impact on research in the following areas:

Microaggression detection
Creating a safe and inclusive digital atmosphere
Understanding non-traditional, topical humor
Stereotype and norm detection, including identifying unfair or harmful stereotypes
Information sharing for public welfare

Resources

Here are some useful links if you are interested in checking out this project!