Artificial intelligence is supposed to solve humanity’s problems, but it fails at Pokémon Red and cannot find the second Gym after 50,000 hours

The YouTuber Peter Whidden taught an AI to play Pokémon Red using an emulator. He encourages it to perform the right actions through reward points. However, problems keep occurring, such as a “fear” of Pokémon Centers and issues with navigation.

How does this work exactly? The artificial intelligence is supposed to play the game as much like a human as possible. The YouTuber explains that the AI is capable of using the controls independently. After each action, it looks at the screen and considers what to do next, just like a user in front of the device.

To speed up learning, he ran 40 game sessions in parallel.

Because the algorithm does not set out to win the game on its own, Whidden defined specific rewards. To encourage exploration, the AI received a reward point whenever it saw something new, measured by how many pixels on the screen differed from what it had already seen. As a side effect, the AI became fascinated by the animated water and stood still next to it instead of moving on to the next city.
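How a pixel-based novelty reward can end up favoring animated water is easiest to see in a toy version. The following is only a minimal sketch under assumptions: the class `NoveltyReward`, the threshold `min_diff_pixels`, and the six-pixel "screens" are made up for illustration and are not taken from Whidden's code.

```python
def count_differing_pixels(frame_a, frame_b):
    """Number of positions at which two equally sized frames differ."""
    return sum(1 for a, b in zip(frame_a, frame_b) if a != b)


class NoveltyReward:
    """Grants one point for each frame that differs enough from every frame seen so far."""

    def __init__(self, min_diff_pixels):
        self.min_diff_pixels = min_diff_pixels  # hypothetical threshold
        self.seen_frames = []

    def reward(self, frame):
        for seen in self.seen_frames:
            if count_differing_pixels(frame, seen) < self.min_diff_pixels:
                return 0  # too similar to a frame that is already known
        self.seen_frames.append(frame)
        return 1


# Toy 6-pixel "screens": the last three pixels stand in for an animated water tile.
novelty = NoveltyReward(min_diff_pixels=2)
grass = (0, 0, 0, 0, 0, 0)
water_frame_1 = (0, 0, 0, 1, 1, 1)
water_frame_2 = (0, 0, 0, 2, 2, 2)

# The player never moves between the two water frames, yet both count as "new".
rewards = [novelty.reward(f) for f in (grass, grass, water_frame_1, water_frame_2)]
print(rewards)
```

In this sketch, a static screen pays out only once, while every distinct animation frame pays out again even though the character has not moved.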

Additional reward points were introduced, for example for catching Pokémon, for the overall level of the team, for winning a trainer battle, or for winning in a Gym.
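One common way to combine several such signals is a weighted sum of progress counters, with the reward for each step being the change in that sum. The sketch below assumes this design. The `Progress` fields mirror the rewards listed above, but the weights and all names are hypothetical and do not come from the video.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    pokemon_caught: int
    team_level_sum: int
    trainer_battles_won: int
    gym_victories: int


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {
    "pokemon_caught": 1.0,
    "team_level_sum": 0.5,
    "trainer_battles_won": 2.0,
    "gym_victories": 5.0,
}


def score(progress):
    """Weighted sum of all progress counters."""
    return sum(weight * getattr(progress, name) for name, weight in WEIGHTS.items())


def step_reward(before, after):
    """Reward for one step: how much the overall score changed."""
    return score(after) - score(before)


before = Progress(pokemon_caught=1, team_level_sum=6, trainer_battles_won=0, gym_victories=0)
after = Progress(pokemon_caught=2, team_level_sum=9, trainer_battles_won=1, gym_victories=0)
print(step_reward(before, after))
```

With a reward built this way, anything that lowers one of the counters shows up as a negative reward for that step.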

Yet, even after that, problems kept arising.

The fear of Nurse Joy, the search for the second Gym, and 10,000 Magikarp

What hurdles were there? During its first visit to the Pokémon Center, the AI interacted with the computer and stored some Pokémon. This lowered the team’s overall level and with it the reward, which led to a kind of trauma, even though the algorithm obviously has no feelings. From then on, the AI actively avoided the building.

As a result, the team was no longer healed. Whidden had to tinker with the system and introduce a new reward.
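The Pokémon Center incident can be retraced with simple arithmetic. The sketch assumes that the level reward is the change in the sum of the levels in the party; the function name `level_reward` and the example levels are hypothetical.

```python
def level_reward(levels_before, levels_after):
    """Reward for one step: change in the summed levels of the party."""
    return sum(levels_after) - sum(levels_before)


party = [8, 5, 4]           # levels of the Pokémon currently carried
after_deposit = [8]         # two Pokémon were stored in the computer
after_training = [9, 5, 4]  # one Pokémon gained a level instead

print(level_reward(party, after_deposit))
print(level_reward(party, after_training))
```

Under this assumption, storing two Pokémon costs far more reward than a single level-up earns, so one visit to the computer is enough to make the building look like a bad place to be.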

The adjustments for battles were curious as well. The AI rushed into every battle, whether it could win or not, so the YouTuber introduced a penalty for lost battles. Right after the first defeat, however, the AI refused to press the A button once its last Pokémon had fainted. It simply wanted to stay on the battle screen forever to avoid losing points.
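Why stalling is the "rational" answer to such a penalty becomes clear when the two options are compared. This is a minimal sketch under assumptions: the penalty value, the action names, and the rule that the penalty is only booked once the defeat is confirmed with A are all hypothetical.

```python
LOSS_PENALTY = -10.0  # hypothetical penalty for a lost battle


def episode_return(actions):
    """Sum of rewards on the defeat screen for a given sequence of actions."""
    total = 0.0
    for action in actions:
        if action == "press_a":
            total += LOSS_PENALTY  # the defeat is confirmed and the penalty is booked
            break
        # "wait" earns nothing, but it also costs nothing
    return total


confirm_defeat = ["wait", "press_a"]
stall_forever = ["wait"] * 100

print(episode_return(confirm_defeat))
print(episode_return(stall_forever))
```

An agent that only maximizes its reward prefers the second sequence, because waiting is never punished in this setup.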

The battle against Brock in the first Gym did not proceed logically at all either. The AI took a long time to realize that Rock-type Pokémon are weak to Water attacks. Only once Squirtle had no move left to use except Bubble was this attack used for the first time. An easy victory followed, after about 7,000 hours of gameplay.

But even afterwards, things didn’t improve. The AI managed to enter Mt. Moon, but didn’t feel comfortable there and simply left again. Even after 50,000 hours, the AI has not found Cerulean City and thus the second Gym.

The AI has, however, grown very fond of Magikarp. From the shady vendor who sells the Pokémon for 500 Pokédollars, the AI bought more than 10,000 of them. Since the AI was rewarded for catching new Pokémon, this was probably the most lucrative way to collect that reward.

So far, the AI doesn’t come across particularly well. But there were also a few positive moments.

AI learns glitches that take others decades

Was everything bad in this experiment? No. At one location, the algorithm repeatedly took a specific path that made no sense at first glance. Later, however, the YouTuber found out that it was exploiting a glitch that guarantees the first Pokémon encountered can be caught with a single throw.

What technology was used? At the end of the video, the YouTuber explains many technical details that are particularly interesting if you want to run such experiments yourself. As the learning algorithm, he used Proximal Policy Optimization (PPO), which he says is the standard and was also used for ChatGPT.
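The core of Proximal Policy Optimization is its clipped objective, which limits how far a single update can move the policy. The following sketch shows the textbook form of that objective in plain Python. It is not Whidden's training code, the function name is made up, and `epsilon=0.2` is just a commonly used value.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Average clipped surrogate objective over a batch of sampled actions."""
    terms = []
    for new_lp, old_lp, advantage in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the policy that collected the data.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Take the more pessimistic of the two estimates.
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)


# A good action (advantage 1.0) that the new policy makes twice as likely:
# the gain is capped at 1 + epsilon instead of growing with the ratio.
capped = ppo_clip_objective([math.log(0.5)], [math.log(0.25)], [1.0])
print(capped)
```

Training maximizes this value. Because of the cap, the policy gains nothing from changing an action's probability by more than the clipping range in one update, which keeps learning comparatively stable.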

The hardest part of the process is explaining to the machine what to do without spelling out every single step, since the AI is supposed to learn independently. Larger datasets help with this, but unlike for text or speech AIs, none were available for Pokémon Red.

Source(s): Gamesradar