Why Weibo’s tiny VibeThinker-3B has the AI world arguing over benchmarks again

On Sunday, a team of nine researchers at Sina Weibo — the Chinese social media giant better known for its microblogging platform than for cutting-edge artificial intelligence — quietly posted a 14-page technical report to arXiv that sent shockwaves through the AI research community. Their claim: a language model with just 3 billion parameters can match or exceed the reasoning performance of flagship systems from Google DeepMind, OpenAI, Anthropic, and DeepSeek that are hundreds of times larger.

The model, called VibeThinker-3B, scored 94.3 on AIME 2026 — the American Invitational Mathematics Examination, one of the most demanding standardized math competitions in the world. That figure places it alongside DeepSeek V3.2, a model with 671 billion parameters, and ahead of Gemini 3 Pro, Google's high-performance flagship reasoning system, which scored 91.7. With a test-time scaling technique the team calls Claim-Level Reliability Assessment, the score climbs...

Copyright of this story solely belongs to venturebeat.com.