AI benchmarks are broken. Here’s what we need instead.
- Apr 15

MIT TECHNOLOGY REVIEW — AI is almost never used in the way it is benchmarked. Although researchers and industry have started to improve benchmarking by moving beyond static tests to more dynamic evaluation methods, these innovations resolve only part of the issue. That’s because they still evaluate AI’s performance outside the human teams and organizational workflows where its real-world performance ultimately unfolds.
While AI is evaluated at the task level in a vacuum, it is used in messy, complex environments where it usually interacts with more than one person.
Read the full story | MIT TECHNOLOGY REVIEW


