Ex-Google DeepMind researcher warns benchmarks won’t save us
- May 24

GIZMODO — On X, Wang noted that before deciding to depart from DeepMind, he had been thinking a lot about how AI models are evaluated.
“We’re good at evaluating the models we have. We’re much worse at evaluating the models we’re about to build — especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving evaluations,” he wrote.


