How Will AI Help With Game Testing?

It’s odd to think that your favorite big-budget video games, which can take years and upwards of $50 million to develop, only ship after dedicated testers have spent months of back-to-back hours trying to destroy them.

No gamer wants to see their character stuck floating mid-air or their warriors glitching mid-fight. This would break the illusion of what might have been a very realistic game, pulling the player out of the experience. These disconnections leave players feeling frustrated and disenchanted, unable to control their avatars effectively.

That’s why game developers aim to “break” their products during testing to identify and remedy potential issues before they reach the public. By finding and fixing bugs early on, games feel more realistic and enjoyable to play.

The ongoing debate over how best to go about testing, automation versus manual, continues to challenge game publishers, but the continuing layoffs that started in 2023 show us the industry is in for some much-needed reform.

Testing video games is no walk in the park; the seemingly infinite variables make experienced game testers some of the most elite in the field. But these elite testers have to undertake incredibly tedious and repetitive tasks, often leaving them overworked and underpaid, so something has to give to take the pressure off.

AI will serve as a safety valve for the industry’s testers. It will amplify test case creation tenfold, and risk analysis will become more intuitive thanks to machine learning.

3 Types of Game Testing AI Can Conduct

Positive and negative testing.
Boundary analysis.
Stress and load testing.

Challenges for Game Testers

Although 2023 was one of the best years for gamers, with top releases including The Legend of Zelda: Tears of the Kingdom, Baldur’s Gate 3 and Alan Wake 2, the industry’s testers have had it tough.

During the pandemic’s gaming boom, venture funding surged. Increased profits attracted non-gaming investors, leading to bloated valuations and startup funding rounds — while diluting the sector’s talent. The pool of skilled game testers became insufficient to meet the rising needs of the industry.

Not to mention, inflation remains well above market rates, and game publishers continue to exploit their employees’ passion, underpaying them for what the industry refers to as “crunch.” Federal and state overtime laws exempt high-earning computer professionals in the gaming industry, which encourages studios to rely on overtime work to address development issues. In practice, that can mean up to 20 additional hours on top of a standard 40-hour workweek, with just 8 percent saying they received compensation.

This leaves game testers fully occupied with repeating the same tasks, potentially more than 100 times, all to validate small elements in the game.

Overcoming the Infinite With AI

Automating test case generation with AI will significantly reduce a game’s time to market while also eliminating the most monotonous part of game production. For instance, developers using GitHub Copilot, an LLM-based code generator, reported a 55 percent productivity increase.

Fixing issues is a grueling task, often leading to new bugs appearing in a different context or under different circumstances. This dynamic can make progress feel slow, as if the testing process is never-ending. Rather than testing and encountering the same bugs over and over again, which can be disheartening and frustrating, AI can help create test scenarios to review said bugs on a loop throughout different aspects of the game.

Let’s take a look at some key testing types, including positive and negative testing, boundary analysis, and stress and load testing, where AI can generate bespoke test cases and save game testers hours of time.

Positive and Negative Testing

Testers should always implement at least two tests per requirement, one positive and one negative. This principle is true whether testing functional parts of the game, like the controls, features and logic, or nonfunctional areas, such as its responsiveness and usability.

Positive testing ensures that software features work as intended when given valid inputs. For instance, player name fields typically require alphabetic characters, so a positive test would verify functionality by entering valid names into the user interface. It checks whether the software behaves correctly under normal circumstances.

In contrast, game-makers must also look for ways to exploit game mechanics or glitches to uncover vulnerabilities. Negative testing is important here. It checks how games handle unexpected or invalid inputs, like apostrophes in names or leaving fields blank. The aim is to validate that the software only performs the actions it should and prevents those it shouldn’t, such as allowing unauthorized access or crashing.
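As a rough illustration of pairing one positive and one negative check per requirement, here is a minimal Python sketch for the player name field. The rule itself (3 to 16 alphabetic characters), the function names and the sample names are assumptions made up for this example, not something any particular game specifies.

```python
import re

# Hypothetical rule for illustration: 3 to 16 alphabetic characters.
NAME_PATTERN = re.compile(r"[A-Za-z]{3,16}")


def is_valid_player_name(name: str) -> bool:
    """Return True only when the whole name matches the assumed rule."""
    return NAME_PATTERN.fullmatch(name) is not None


# Positive cases: valid input the game should accept.
POSITIVE_CASES = ["Link", "Astarion", "Saga"]

# Negative cases: invalid input the game should reject without crashing
# (blank field, apostrophe, too short, too long, digits, whitespace only).
NEGATIVE_CASES = ["", "O'Brien", "ab", "x" * 17, "Link42", "   "]


def run_name_tests() -> list[str]:
    """Return a list of failure messages; an empty list means all passed."""
    failures = []
    for name in POSITIVE_CASES:
        if not is_valid_player_name(name):
            failures.append(f"rejected valid name {name!r}")
    for name in NEGATIVE_CASES:
        if is_valid_player_name(name):
            failures.append(f"accepted invalid name {name!r}")
    return failures
```

Calling run_name_tests() returns an empty list when every positive case is accepted and every negative case is rejected.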

With AI, game testers can use the following prompt:

“Test if you can log in with this credential list, each one 10 times:

... // credential list here”

By including positive and negative credentials in the list, game testers can analyze hundreds of credential inputs thoroughly, in seconds. The more credentials you provide the tool, and the more times you ask the tool to review them, the lower the risk of gamers finding a fault. You can even ask AI to keep generating and testing elements, like credentials, on a continuous loop until you stop it.

That’s what makes it so valuable: Developers input the context and the AI runs the tests autonomously while developers can take a well-deserved coffee break.
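For readers who want to see what such a run boils down to, the sketch below repeats every credential a fixed number of times and reports any attempt whose outcome differs from what was expected. It is only a sketch under stated assumptions: fake_login, KNOWN_ACCOUNTS and the sample credentials are hypothetical stand-ins, since a real test would call the game’s own login service.

```python
# Stand-in account store and login call; both are hypothetical.
KNOWN_ACCOUNTS = {"player_one": "correct-horse"}


def fake_login(username: str, password: str) -> bool:
    """Pretend login: succeeds only for a known username/password pair."""
    return KNOWN_ACCOUNTS.get(username) == password


def check_credentials(login, credentials, repeats=10):
    """Try each credential `repeats` times.

    Returns (username, attempt_number) for every attempt whose result
    did not match the expected outcome.
    """
    mismatches = []
    for username, password, should_succeed in credentials:
        for attempt in range(1, repeats + 1):
            if login(username, password) != should_succeed:
                mismatches.append((username, attempt))
    return mismatches


# Each entry: (username, password, expected login result).
CREDENTIALS = [
    ("player_one", "correct-horse", True),   # positive case
    ("player_one", "wrong", False),          # wrong password
    ("", "", False),                         # blank fields
    ("ghost", "correct-horse", False),       # unknown account
]
```

Repeating each attempt matters because intermittent faults, such as a login that fails only on some attempts, can slip past a single run.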

Boundary Analysis

Developers assume that most errors and defects tend to occur at the edges of normal or expected values. Therefore, game makers need to understand how their games function at the boundaries by testing the system inputs and outputs at the extreme values or limits of its range.

To identify input and output boundaries, developers must:

Define the input or output domain: Consider user expectations, developer design, specifications and environmental factors.
Identify input or output boundaries: Consider data type, range, structure, and dependencies.
Test input or output boundaries: Use boundary value analysis, equivalence partitioning, and error guessing.

Even testing one element, such as a character’s life percentage, has seemingly endless possibilities.

Imagine a scenario with the following life boundary values: a minimum of zero (the character is dead); a maximum of 100 (the character is at full health); near-minimum values of one, two or three (near death); and near-maximum values of 97, 98 or 99 (nearly full health).
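Those boundary values can be derived rather than typed out by hand. The helper below is a small sketch that mirrors the list in this scenario; the function name and the margin parameter are assumptions for illustration. Fuller boundary value analysis often also probes values just outside the range, such as -1 and 101, which this sketch leaves out.

```python
def boundary_values(minimum: int, maximum: int, margin: int = 3) -> list[int]:
    """Return both limits plus the `margin` values just inside each limit."""
    near_min = range(minimum + 1, minimum + 1 + margin)
    near_max = range(maximum - margin, maximum)
    values = {minimum, maximum, *near_min, *near_max}
    # Drop anything that falls outside the range when the range is tiny.
    return sorted(v for v in values if minimum <= v <= maximum)
```

For a health range of 0 to 100, boundary_values(0, 100) returns [0, 1, 2, 3, 97, 98, 99, 100].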

Developers will look to heal characters at their boundaries and near boundaries, such as healing a character from 97 health to 100 health with different elements in the game, and confirm that health remains capped at 100.

But what happens at every possible boundary scenario between different life statuses and healing packages? Does the health stay at 100, or does it go over the life allowance based on the amount in the life support? What happens over time after the character has lost its life repeatedly? Are the results the same if equal impact is applied in a shorter window?

AI could significantly enhance boundary testing by autonomously generating numerous test cases with varying health values, damage amounts and healing factors, ensuring comprehensive coverage. It can simulate real-world gameplay conditions and measure variables such as damage over time. Developers can also prompt the AI to keep playing the game differently until a certain outcome is reached. The practical part here is that the tester doesn’t have to write out hundreds of test scenarios and feed them to the AI. The AI can come up with the scenarios itself.
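To make the idea of generated boundary cases concrete, here is a minimal sketch that pairs boundary health values with a range of heal and damage amounts, then checks that health never leaves the 0 to 100 range. Plain enumeration stands in for the AI generator here, and apply_heal, apply_damage and the chosen amounts are assumptions for illustration, not the rules of any real game.

```python
import itertools

MAX_HEALTH = 100


def apply_heal(health: int, amount: int) -> int:
    """Assumed rule: healing is capped at MAX_HEALTH."""
    return min(MAX_HEALTH, health + amount)


def apply_damage(health: int, amount: int) -> int:
    """Assumed rule: damage cannot push health below zero."""
    return max(0, health - amount)


def generate_cases() -> list[tuple[int, int]]:
    """Pair every boundary health value with every heal/damage amount."""
    healths = [0, 1, 2, 3, 97, 98, 99, 100]
    amounts = [0, 1, 3, 50, 100, 250]
    return list(itertools.product(healths, amounts))


def find_violations(cases) -> list[tuple[int, int, int]]:
    """Return (health, amount, result) wherever the result leaves 0..100."""
    violations = []
    for health, amount in cases:
        for result in (apply_heal(health, amount), apply_damage(health, amount)):
            if not 0 <= result <= MAX_HEALTH:
                violations.append((health, amount, result))
    return violations
```

Even this small grid yields 48 combinations, which hints at how quickly the scenario count grows once timing and repeated deaths are added.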

Stress and Load Testing

Similar to how developers must test system behavior at extreme edges of values, game makers must also assess game performance under harsh conditions. This includes heavy loads, numerous users and resource constraints, such as limited CPU or memory. They will look at the response time, throughput and resource usage in each scenario to assess system stability.

To push games to their limits, developers will create scenarios with many objects, characters or players to test the game’s performance under heavy loads. They will also perform repetitive actions or hold buttons down for extended periods to identify issues related to sustained activity.
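A heavy-load scenario of this kind can be sketched in a few lines of Python. The example below fires many simulated player actions through a thread pool and reports throughput and response times. It is a rough sketch: handle_player_action is a hypothetical stand-in that simply sleeps, and it does not measure resource usage such as CPU or memory.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_player_action(player_id: int) -> int:
    """Stand-in for a real server call; sleeps briefly to imitate work."""
    time.sleep(0.001)
    return player_id


def timed_call(player_id: int) -> float:
    """Return how long one simulated action took, in seconds."""
    start = time.perf_counter()
    handle_player_action(player_id)
    return time.perf_counter() - start


def run_load_test(players: int, workers: int = 32) -> dict:
    """Run one action per simulated player and summarize the timings."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(pool.map(timed_call, range(players)))
    elapsed = time.perf_counter() - started
    return {
        "players": players,
        "throughput_per_s": players / elapsed,
        "median_latency_s": latencies[len(latencies) // 2],
        "max_latency_s": latencies[-1],
    }
```

Running run_load_test with a growing number of players, and comparing the reports, shows where response times start to degrade.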

Consider how many elements of the game have the potential to create high stress. For example, if your game can accept multiple players at once, that impacts the number of tests you run. Testing every scenario with AI will help ensure game functionality before the game reaches the user, and without developer burnout.

Moreover, autonomous game-testing AI analyzes what is on the screen and decides where to click, at which moment, and which keyboard keys to press. It then reasons about whether the actions it took were executed correctly and keeps everything in a journal to track its action history. This journal gives the framework a built-in memory, which is a network of vector databases, so it can always generate new random scenarios that differ from earlier ones, without human instruction. This frees up developers’ time to focus on more intricate, higher-order tasks.
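The journal-and-memory idea can be illustrated with a much simpler stand-in. The sketch below swaps the vector databases for an exact-match set of past scenarios, and it picks actions at random instead of reading the screen, so it only shows how an action history can keep new scenarios from repeating old ones. ACTIONS, ActionJournal and new_scenario are all hypothetical names.

```python
import random

# Hypothetical input actions a test agent might take.
ACTIONS = ["click", "press_w", "press_a", "press_s", "press_d", "press_space"]


class ActionJournal:
    """Keeps the history of scenarios tried and whether each succeeded."""

    def __init__(self):
        self.entries = []   # ordered history of (scenario, succeeded)
        self._seen = set()  # exact-match memory of past scenarios

    def record(self, scenario, succeeded):
        self.entries.append((scenario, succeeded))
        self._seen.add(scenario)

    def has_seen(self, scenario):
        return scenario in self._seen


def new_scenario(journal, rng, length=4, max_tries=1000):
    """Draw random action sequences until one is not in the journal."""
    for _ in range(max_tries):
        candidate = tuple(rng.choice(ACTIONS) for _ in range(length))
        if not journal.has_seen(candidate):
            return candidate
    raise RuntimeError("no unseen scenario found; try longer sequences")
```

A similarity-based memory could also reject scenarios that are merely close to earlier ones, which an exact-match set cannot do.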

Test Better With AI

Today’s gamer expectations are exceedingly high, leaving developers with an incredibly fierce testing environment. By automating tasks, generating diverse test cases, simulating multiple scenarios, and providing valuable insights, AI can help game developers ensure that their creations are robust and reliable, even under extreme conditions.