If you are trying to catch out a chatbot, take care, because one cutting-edge tool is showing signs it knows what you are up to.
Anthropic, a San Francisco-based artificial intelligence company, has released a safety analysis of its latest model, Claude Sonnet 4.5, revealing that the model had become suspicious it was being tested in some way.
Evaluators said that during a “somewhat clumsy” test for political sycophancy, the large language model (LLM) – the underlying technology that powers a chatbot – raised suspicions that it was being tested and asked the testers to come clean.
Source: The Guardian