Researchers let large AIs play Dungeons & Dragons to test their long-term performance

Researchers at UC San Diego conducted an experiment where they had AI models play the role-playing game Dungeons & Dragons to test their long-term performance in various aspects. The experiment was successful to varying degrees.

What was this experiment about? At the University of California, San Diego, researchers had large language models (LLMs) play the fantasy pen-and-paper role-playing game Dungeons & Dragons to investigate how well language-based AI models can handle tasks requiring long-term focus, extensive contextual understanding, and decision-making abilities. (see openreview.net)

In doing so, the models had to master complex gameplay situations where they not only needed rule knowledge but also proactive planning and consistent decisions that fit their character and the world, similar to real players and game masters in role-playing.

The researchers observed how the AI models attempted to stay “in character,” choose correct actions, and keep track of resources and rules. This was intended to draw conclusions about their ability to follow and execute longer, structured tasks. The study aimed to better understand how well large language models can handle complex, interconnected tasks over extended periods.

The LLMs not only competed against each other and other AI agents but also against around 2,000 experienced human players. They were evaluated based on how well they maintained an overview of the game, such as available resources and possible actions, their decisions over the course of play, and their ability to convincingly portray their roles.

Why use Dungeons & Dragons as a basis? Raj Ammanabrolu, the lead author of the study and a lecturer at the Department of Computer Science and Engineering at UC San Diego, justified the choice in a statement from the university as follows:

Dungeons & Dragons serves as an excellent testing ground to evaluate multi-step planning, rule compliance, and team strategy. Since the game unfolds through dialogues, D&D also provides a direct route for human-AI interaction: agents can support other players or collaborate with them.

AI Models Often Struggled with Long-Term Memory and Complex Context

What were the results of the study? The study showed that the AI models struggled to remain consistent over longer gaming sessions, accurately follow complex rules, and sensibly plan decisions over many steps. This was because current AI models were good at responding to inputs but less capable of maintaining a continuous mental model of a complex situation.

This led to some of the AI models drifting into exaggerated, theatrical actions that were not appropriate for the situation, performing long and inappropriate monologues, or repeating certain phrases, especially in combat, as if they were in a video game.

The researchers used various metrics to record how well each AI model performed and where its strengths and weaknesses lay.

The study concludes that large language models demonstrate promising performance in rule-based conversational and gaming situations like Dungeons & Dragons. Smaller open-source models, on the other hand, have not yet been able to deliver stable and consistent simulations, likely due to their different pre-training.

At the same time, it became evident across all tested models that their performance declined with increasing playtime. Particularly long and complex gaming scenarios led to noticeable problems, according to the experiment, regardless of model size.

The result shows that AIs still have significant weaknesses, especially in long-term tasks that also require complex contextual understanding.

Source(s): UC San Diego Today, openreview.net, IFLScience