Claude 3.7 Sonnet Can Play Pokémon?! AI Benchmarks Are Getting Weird

Apart from excelling at reasoning & coding, Claude 3.7 Sonnet apparently outperformed all other models in Pokémon gameplay tests.

First, the AI benchmarks were chess & Go. Then we got Dota 2 & StarCraft II. Now… Pokémon??

What’s next: Claude vs AlphaGo in competitive Uno? 😂