ChatGPT, Gemini and other AI chatbots were given a test for eighth graders, all of them failed at one task

A user presented various chatbots with a math test for eighth graders. All struggled with the same question.

What are chatbots? Chatbots are language models powered by artificial intelligence from various companies, designed and trained to perform tasks such as generating text or answering questions. They are built to conduct human-like conversations with users through text or voice chat.

The language model ChatGPT by OpenAI was essentially the pioneer of the chatbot. There are now many different AI models from various companies, including Google’s Gemini, DeepSeek, Claude, and Perplexity. There are also some free alternatives to ChatGPT.

What kind of test was it? A Polish Reddit user presented various AI chatbots with a math test for eighth graders and had the artificial intelligence answer the individual questions (via Reddit).

The models tested were OpenAI o3, Gemini 2.5 Pro, and Claude Sonnet 4. In total, the chatbots had to solve 15 questions; the user gave them no further instructions or sample solutions for the tasks.

The user also explained that the questions could not have been part of the AI models' training data, as these tasks were only made public recently; the version of Gemini used, for example, was trained on older data.

This is how the test went: The OpenAI model and the Gemini model each answered 14 out of 15 questions correctly, but both failed on question 12. The Claude model only got 12 out of 15 questions right, though the user emphasized that he did not have access to Anthropic's strongest Claude model, which might have performed better.

Which question did the chatbots answer incorrectly? The task description shows a number line marked with points A, B, and C. Additionally, the segment AC is divided into 6 equal parts.

Students also see the coordinates 56 and 83 marked on the number line. They then need to assess whether the following two statements are true or false:

  • The coordinate of point C is an even number.
  • The coordinate of point B is a number less than 74.

What was the mistake? To solve the task, students first need to work out how long one segment on the line is. Between the coordinates 56 and 83 there are three segments, and the total distance between them is 27 units, so each segment is 9 units long.

From there, the remaining tick marks and the coordinate of point C can be calculated. The solution: the first statement is false, since point C lies at coordinate 101, an odd number; the second statement is true, since point B lies to the left of coordinate 74 on the line.
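The arithmetic above can be checked in a few lines of Python (a sketch: the placement of C two segments to the right of the tick at 83 is taken from the article's solution, since the exact tick layout is not reproduced here):

```python
# Number-line task: segment AC is divided into 6 equal parts,
# and the ticks labeled 56 and 83 are three segments apart.
span = 83 - 56            # distance between the two labeled ticks: 27 units
segment = span // 3       # three segments fit in that span -> 9 units each

# Per the article's solution, C lies two segments to the right of 83.
c = 83 + 2 * segment      # coordinate of point C

print(segment)                     # length of one segment
print(c, "odd" if c % 2 else "even")  # statement 1 asks whether C is even
```

Running this gives a segment length of 9 and C = 101, which is odd, so the first statement is indeed false.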

A screenshot from the Reddit user shows that ChatGPT assumed point B sits exactly at coordinate 74, although it is actually slightly offset to the left. The model therefore concluded, incorrectly, that point B's coordinate is not less than 74 but equal to it. We tested Gemini on the task, and it made exactly the same mistake.
