We are building an agent to play the grandfather of JRPGs, Dragon Quest. In this article, we highlight the overall goals, the methodology & the current progress of the agent's development.

Goals for the overall project

The goal of the project is to evaluate the capabilities of open-source and closed-source models in playing RPGs. Having been exposed to the progress made by open-source models in multi-modal understanding, we naturally gravitated towards picking them first, and will eventually compare them to their closed-source counterparts.

Why RPGs?

Before we get into RPGs, let’s start with the primary question: why games?

Firstly, games serve as rich testbeds for AI: they mimic real-world complexity with dynamic rules, clear objectives, and countless pathways to success. We also draw inspiration from prior breakthroughs: AlphaZero and AlphaGo, IBM’s advances, OpenAI’s systems for Atari and Dota 2 and, more recently, challenges built around beating Pokémon & Minecraft. These efforts motivated us to tackle games with higher narrative and decision-making complexity.

Secondly, games are, in our opinion, the best medium for experience, learning & storytelling. Between us, we have played nearly 800 games in our lifetimes (mostly Sairaj) across several device & software generations, and we love nerding out on game design. We wanted to understand game design & game reward mechanics much more deeply, and we wanted to learn how to build RL environments. Thus this experiment was born.

What is special about RPGs is that they combine fantasy, grind, progression & status games, very similar to what we experience in the modern world. We have played through almost all of them. The more we play RPGs, the more we realise that even though there are well-defined ways of playing them, the vast agency available to the player means we keep finding different ways of doing the same thing as we explore the game. This helps people come up with different metas and strategies for progression & resource accumulation, which adds to the fun of the game. Our question here: could LLMs also figure out ways to beat the game, and expand on the existing strategies in the rulebook, simply by exploring the game environment like humans do?

There are turn-based RPGs (DnD-based ones like Baldur’s Gate, or JRPGs like Dragon Quest, Chrono Trigger, Final Fantasy etc.) & action RPGs (Morrowind, Witcher 3, Dragon Age etc.). For this experiment we chose turn-based RPGs for their relative simplicity.

Battling a character in the game (human)

Earlier game-playing AIs were largely trained with Deep Reinforcement Learning (DRL), in which systems ‘learn by doing’ through hundreds of thousands of iterations. We learn all of our skills in a similar way, although our greater intelligence & proprioception allow us to reason better with those skills & carry learnings across contexts. When we play games, we put ourselves in a much more structured environmental ruleset and figure out how to navigate and progress through such environments in the promise of a reward.

Recent advances in research have brought multi-modal support to LLMs, for example via vision-language models. This equips agents with the essential inputs needed for RPGs: handling complex stories (projected via continuous images), navigating (through actions), managing combat (through text-based game state), and interacting with NPCs (via screenplay).
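To make those inputs concrete, here is a minimal sketch of how a single step of such an agent could be represented in Python. The `Observation` fields, `build_prompt` and `parse_action` are hypothetical names used for illustration rather than a description of the final agent; the only fixed fact is the set of eight inputs on an NES controller. The screenshot would be passed to the vision-language model alongside the text prompt.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Button(Enum):
    """The eight inputs on a standard NES controller."""
    A = "A"
    B = "B"
    SELECT = "SELECT"
    START = "START"
    UP = "UP"
    DOWN = "DOWN"
    LEFT = "LEFT"
    RIGHT = "RIGHT"


@dataclass
class Observation:
    """One step of input for the agent (hypothetical layout)."""
    frame_png: bytes            # screenshot of the current frame (image input)
    game_state: Dict[str, int]  # text-based state, e.g. {"hp": 15}
    dialogue: str = ""          # on-screen story or NPC text, if any


def build_prompt(obs: Observation, history: List[Button]) -> str:
    """Render the text half of the observation; the image is sent separately."""
    state = ", ".join(f"{k}={v}" for k, v in obs.game_state.items())
    recent = " ".join(b.name for b in history[-5:]) or "none"
    allowed = ", ".join(b.name for b in Button)
    return (
        f"Game state: {state}\n"
        f"Dialogue: {obs.dialogue or 'none'}\n"
        f"Recent actions: {recent}\n"
        f"Reply with exactly one of: {allowed}"
    )


def parse_action(reply: str) -> Button:
    """Map the model's reply to a button, rejecting anything else."""
    token = reply.strip().upper()
    try:
        return Button[token]
    except KeyError:
        raise ValueError(f"not a valid button: {reply!r}") from None
```

Restricting the reply to a closed set of buttons keeps the agent's agency at the level of controller inputs, and makes malformed replies easy to detect and retry.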

We’ll cover how humans play games, optimum prompting for balancing agency & handholding, plus how we want to assess the benchmarks from a game design perspective, in a later article.

Discovering the right game to pick

Choosing the right game was a challenging decision, as it had to be relatively simple to be a fair starting point. It also needed elements of decision making, such as resource management, map exploration, mini-interactions, skill management & combat mechanics, to be a good testbed for evaluating a game-playing agent. Since a turn-based RPG was the chosen genre, we decided to go with Dragon Quest. This was the simplest to work with because it was the oldest, and the emulation was also fairly simple (NES / SNES). Plus we really loved the name Dragon Warrior, which was the name of its US release and also goes well with a certain Panda character.

For our initial setup we went with RetroArch on Steam, because we were trying to play the SNES version of Dragon Quest, primarily as it was the version we already had available. But the agent also had to be able to play through the emulator, which meant we needed access to the memory map and the controls. RetroArch would not allow us to do that because Lua scripting is not available in RetroArch, so we chose the Mesen emulator instead. Mesen supports both NES and SNES, but for the SNES version of Dragon Quest we couldn't find the exact memory map, which would be crucial later for reading the underlying game-state information. So we stuck to the NES version for simplicity.
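To illustrate why the memory map matters, here is a minimal Python sketch of turning a snapshot of NES RAM into named game-state fields. Everything specific in it is an assumption: the offsets in `MEMORY_MAP` are placeholders chosen for the example and are NOT the real Dragon Quest addresses, the field names are hypothetical, and multi-byte values are assumed to be stored little-endian. How the snapshot is obtained from the emulator is left out.

```python
from typing import Dict, Tuple

# The NES has 2 KiB of internal work RAM.
NES_WORK_RAM_SIZE = 0x800

# name -> (offset, size in bytes).
# PLACEHOLDER offsets for illustration only; these are NOT the real
# Dragon Quest addresses and must be replaced with a verified memory map.
MEMORY_MAP: Dict[str, Tuple[int, int]] = {
    "hp": (0x0010, 1),
    "mp": (0x0011, 1),
    "gold": (0x0012, 2),
    "player_x": (0x0014, 1),
    "player_y": (0x0015, 1),
}


def read_field(ram: bytes, offset: int, size: int) -> int:
    """Read an unsigned integer, assuming little-endian byte order."""
    return int.from_bytes(ram[offset:offset + size], "little")


def decode_state(
    ram: bytes, memory_map: Dict[str, Tuple[int, int]] = MEMORY_MAP
) -> Dict[str, int]:
    """Turn a raw RAM snapshot into named game-state fields."""
    if len(ram) < NES_WORK_RAM_SIZE:
        raise ValueError("RAM snapshot is smaller than NES work RAM")
    return {
        name: read_field(ram, offset, size)
        for name, (offset, size) in memory_map.items()
    }


def describe(state: Dict[str, int]) -> str:
    """Render the state as the kind of text a language model can consume."""
    return ", ".join(f"{name}={value}" for name, value in state.items())
```

With a verified map, `describe(decode_state(ram))` would yield a compact line such as `hp=15, mp=3, gold=300, ...` that can be handed to the model as text-based game state.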

Setting up Dragon Quest on Mesen

We downloaded the Dragon Quest ROM and loaded it up via Mesen. Mesen loads custom ROM files, so as long as you have one you’re good. Once the game was loaded, the first things we noticed were a fresh soundtrack and a more jagged game map. Although the SNES version felt more modern, the essence of the game was still intact in the NES version.

NES version

SNES version