Beating benchmarks can't be the basis for judging the real-world usefulness of an AI.
And Claude just blew me away with its new hybrid reasoning model.