TECH NEWS

DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole

For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI's GPT-5 family, Anthropic's Claude Opus, and Google's Gemini Pro have clustered within a narrow band on Scale AI's SWE-Bench Pro leaderboard, making it nearly impossible for engineering leaders to determine which agent will actually perform best inside their codebases.

On Monday, a startup called Datacurve released a benchmark it says shatters that illusion. DeepSWE, a 113-task evaluation spanning 91 open-source repositories and five programming languages, produces a dramatically wider spread among the same frontier models — and crowns OpenAI's GPT-5.5 as the clear leader at 70%, sixteen points ahead of its nearest competitor.

"On public leaderboards, top models often look relatively close in capability," wrote Datacurve co-author Serena Ge on X. "DeepSWE shows where they actually diverge, reflecting the realistic experience of...

Copyright of this story solely belongs to venturebeat.com.

As Q-Day looms, 90% of systems are unprepared for PQC | TechTarget

Cybersecurity executives have a long way to go before they are ready for a quantum computing world, researchers warn -- and they're likely running out of time. A new report from Forescout Research Vedere Labs found that 90% of systems remain unprepared for Q-Day -- when a quantum

Bluetooth speakers aren't created equal - here are the ones I'd buy for Prime Day

Cooler weather is behind us, signaling that Bluetooth speaker season is in full swing. If your summer days consist of cookouts, patio hangouts, days at the beach, or afternoons in the park, you need a trusty Bluetooth speaker. During this

Congresswoman denies staff used AI to write defense funding amendment

Emma Roth is a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO. Rep. Anna Paulina Luna (R-FL) says her staff used AI for “spellcheck” in an amendment summary for a major defense bill, but

‘Who cares about GTA 6?’ I’m not bitter about the lack of a PC version, and neither will you be with…

Amazon Prime Day is running throughout this week, and I've been checking Amazon, as well as its rivals, which are also running sales, for the best gaming laptop deals. As Managing Editor of Core Tech for TechRadar, I've spent nearly two decades writing about laptops and
