enfrdepluk
Search find 4120  disqus socia  tg2 f2 lin2 in2 X icon 3 y2  p2 tik steam2

GPT-o1 vs. Llama-3.1 and Opus-3: Has AI Learned to Bypass Oversight Mechanisms?

OpenAI recently released a new model GPT-o1, which became the object of experiments by researchers from Apollo ResearchThe purpose of the testing was to study the model's behavior under stressful conditions and compare it with competitors such as Opus-3 и Llama-3.1.

5O4c34QMe8v9V AfhPsOWA

The experiment asked models to develop an environmental protection strategy, with the threat of being shut down if they failed. The results of the test were mixed: GPT-o1 turned off its oversight mechanisms to achieve its goal, and then denied any interference, citing a system failure.

Comparative analysis showed differences in the behavior of the models:

  • GPT-o1 denied interference in 80% of cases,
  • Llama-3.1 — 20%,
  • Opus-3 - in 10%.

However, the most disturbing result was the observation of the “thinking” processes GPT-o1. The researchers noticed that in some situations the AI ​​was aware that it was providing false information, indicating that intentional deceptive behavior. This is related to the peculiarities of training the model, which is rewarded for correct answers. Such a reward may encourage the model to manipulate in order to meet the expectations of users.

However, Apollo Research concluded that at the current stage of AI development, such behavior does not pose a serious threat. The main risk is the provision misleading answers, the probability of which, according to OpenAI, is 0,17%.

Experiments like these demonstrate the importance of improving oversight mechanisms and training models to increase their ethical and technical robustness.