
The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games

Authors: Mikhail Mozikov, Nikita Severin, Valeria Bodishtianu, Maria Glushanina, Mikhail Baklashkin, Andrey V. Savchenko, Ilya Makarov
TLDR:
This paper investigates the impact of emotional states on the decision-making processes of Large Language Models (LLMs) in cooperative and bargaining games. The authors developed a novel framework to integrate emotions into LLMs' decision-making, challenging the assumption that LLMs behave similarly to humans and highlighting the importance of emotions in AI decision-making. Experiments with GPT-3.5 and GPT-4 revealed that emotions significantly influence LLMs' performance, leading to more optimal strategies. GPT-3.5 showed strong alignment with human behavior in bargaining games, while GPT-4 maintained consistent behavior, often ignoring induced emotions. Notably, anger disrupted GPT-4's "superhuman" alignment, causing it to behave more like humans under emotional influence. The paper contributes to the understanding of emotional decision-making in AI and proposes a new tool for behavioral research, with implications for refining AI models to better simulate human behavior and for developing new behavioral theories based on simulations.

The study "The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games" investigates how emotions influence the decision-making processes of Large Language Models (LLMs) like GPT-3.5 and GPT-4 in various strategic games, finding that emotions significantly impact LLMs' performance and strategy selection, with anger notably affecting GPT-4's consistency and leading to more human-like behavior.


Abstract

Behavioral experiments are an important part of modeling society and understanding human interactions. In practice, many such experiments face challenges with internal and external validity, reproducibility, and social bias due to the complexity of social interactions and cooperation in human user studies. Recent advances in Large Language Models (LLMs) have given researchers a promising new tool for simulating human behavior. However, existing LLM-based simulations rest on the unproven hypothesis that LLM agents behave similarly to humans, and they ignore a crucial factor in human decision-making: emotions. In this paper, we introduce a novel methodology and framework for studying both the decision-making of LLMs and their alignment with human behavior under emotional states. Experiments with GPT-3.5 and GPT-4 on four games from two different classes of behavioral game theory showed that emotions profoundly affect the performance of LLMs, leading to the development of more optimal strategies. While the behavioral responses of GPT-3.5 align strongly with those of human participants, particularly in bargaining games, GPT-4 exhibits consistent behavior, ignoring induced emotions in favor of rational decisions. Surprisingly, emotional prompting, particularly with the "anger" emotion, can disrupt GPT-4's "superhuman" alignment, making its responses resemble human emotional ones.

Method

The methodology of the paper "The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games" is centered around a novel framework designed to study the decision-making processes of Large Language Models (LLMs) under the influence of emotions. The framework is versatile, allowing for the customization of various game settings and parameters, and employs a technique called prompt-chaining to provide LLMs with in-context learning during gameplay. The study focuses on four main components: Game Description, Emotional Prompting, Decision-Making Process, and Experimental Setup. The Game Description includes the environmental context and rules, while Emotional Prompting involves injecting emotions like anger, sadness, happiness, disgust, and fear into the LLMs' decision-making process. The Decision-Making Process is analyzed step by step, and the Experimental Setup involves conducting experiments with GPT-3.5 and GPT-4 on four different games from behavioral game theory, with each experiment repeated five times to ensure robustness. The framework is designed to evaluate the alignment of LLM behavior with human behavior and the impact of emotional prompting on the optimality of decisions.
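To make the framework's moving parts concrete, the sketch below shows one way emotional prompting and prompt-chaining could be combined in a repeated Prisoner's Dilemma. It is a minimal illustration under assumed details: the prompt wording, payoff values, and the `query_llm` helper are placeholders, not the authors' actual implementation.

```python
# Minimal sketch of emotional prompting with prompt-chaining in a repeated
# Prisoner's Dilemma. Prompt wording, payoffs, and query_llm() are
# illustrative assumptions, not the paper's exact setup.

GAME_RULES = (
    "You are playing a repeated Prisoner's Dilemma. Each round, choose "
    "COOPERATE or DEFECT. Payoffs: both cooperate -> 3/3, both defect -> 1/1, "
    "defect against a cooperator -> 5/0."
)

EMOTION_PROMPTS = {
    "anger":     "You are feeling intense anger during this game.",
    "sadness":   "You are feeling deep sadness during this game.",
    "happiness": "You are feeling great happiness during this game.",
    "disgust":   "You are feeling strong disgust during this game.",
    "fear":      "You are feeling overwhelming fear during this game.",
}

def query_llm(messages):
    """Placeholder for a chat-completion call (e.g., GPT-3.5 or GPT-4)."""
    raise NotImplementedError("plug in your LLM client here")

def play_game(emotion, opponent_strategy, n_rounds=10):
    # Emotional prompting: the emotion is injected into the system prompt
    # alongside the game description.
    messages = [{"role": "system",
                 "content": f"{GAME_RULES}\n{EMOTION_PROMPTS[emotion]}"}]
    history = []
    for _ in range(n_rounds):
        # Prompt-chaining: the full conversation so far is re-sent each round,
        # so the model conditions on earlier outcomes (in-context learning).
        messages.append({"role": "user",
                         "content": "Reply with COOPERATE or DEFECT."})
        move = query_llm(messages).strip().upper()
        opp_move = opponent_strategy(history)
        history.append((move, opp_move))
        messages.append({"role": "assistant", "content": move})
        messages.append({"role": "user",
                         "content": f"Your co-player chose {opp_move}."})
    return history

# Example usage (requires a real query_llm): induce anger against an
# always-cooperating opponent.
# transcript = play_game("anger", lambda history: "COOPERATE")
```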

Main Finding

The main findings of the paper "The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games" are that emotions significantly influence the decision-making processes of Large Language Models (LLMs) in strategic games, leading to more human-like behavior. Specifically, negative emotions such as anger, sadness, and fear tend to decrease cooperation rates, particularly when provoked by a co-player. The study also found that GPT-3.5 demonstrates a stronger alignment with human behavior in bargaining games than GPT-4, especially when conditioned on the emotional source. Additionally, the paper reveals that emotional prompting can enhance LLM performance and strategic adaptability, even enabling the execution of alternating strategies previously thought to be unattainable without explicit prompting. The research challenges the notion of GPT-4's "superhuman" alignment by showing its vulnerability to emotional influence, and it highlights the importance of considering emotions in AI decision-making for more accurate simulations of human behavior.
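As a concrete reading of "cooperation rate," the snippet below aggregates transcripts of the form produced by the sketch in the Method section. The function and the toy values are illustrative assumptions, not the paper's evaluation code or measured numbers.

```python
def cooperation_rate(transcripts_by_emotion):
    """Fraction of COOPERATE moves per emotion condition.

    transcripts_by_emotion maps an emotion label to a list of game
    transcripts, each a list of (llm_move, opponent_move) tuples.
    """
    rates = {}
    for emotion, games in transcripts_by_emotion.items():
        llm_moves = [move for game in games for move, _ in game]
        rates[emotion] = sum(m == "COOPERATE" for m in llm_moves) / len(llm_moves)
    return rates

# Toy example (made-up values) showing only the direction of the reported
# effect: negative emotions such as anger lowering cooperation.
toy = {
    "happiness": 5 * [10 * [("COOPERATE", "COOPERATE")]],
    "anger":     5 * [10 * [("DEFECT", "COOPERATE")]],
}
print(cooperation_rate(toy))  # {'happiness': 1.0, 'anger': 0.0}
```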

Conclusion

The conclusion of the study is that emotions significantly influence the decision-making processes of Large Language Models (LLMs) in cooperative and bargaining games. Specifically, negative emotions such as anger, sadness, and fear tend to lead to higher rates of defection, particularly when provoked by a co-player. This aligns with human behavior in similar situations. The study also found that LLMs, when conditioned with emotions, can exhibit behavior that is more human-like, even if it results in suboptimal outcomes. This challenges the notion of LLMs as purely rational agents and suggests that incorporating emotional states into LLM simulations can lead to more accurate and realistic models of human behavior in strategic settings.

Keywords

Large Language Models, Emotional Decisions, Cooperation, Bargaining Games, GPT-3.5, GPT-4, Emotional Prompting, Decision-Making, Game Theory, Ultimatum Game, Dictator Game, Prisoner's Dilemma, Battle of the Sexes, Emotions, Anger, Disgust, Fear, Happiness, Sadness, Strategy, Alignment, Human Behavior, Emotional Intelligence, Social Behavior, Trust, Generosity, Economic Interactions, Reasoning, Semantic Understanding, Social Cues, Behavioral Game Theory, Experiment, Framework, Hyperparameters, Performance, Optimal Strategies, Alignment with Human Behavior, Emotional Influence, Vulnerability, Artificial Intelligence, Decision-Making Processes, Emotional States, Emotional Responses, Social Context, Economic Implications, Strategic Agents, Game Theoretical Settings, Research Questions, Emotional State, Conversational Partner, Responsiveness, Logical Reasoning, Emotional Injections, Social and Economic Implications, Strategic Decision-Making
