Video Coding Benchmarks

Self-invoking code benchmarks help you decide which LLMs to use for your programming tasks

As large language models (LLMs) continue to improve at coding, the benchmarks used to evaluate their performance are steadily becoming less useful. That's because, though many LLMs have similar high ...

Computer Weekly

Secure coding benchmark to increase standards among developers

Developer security advocate Secure Code Warrior (SCW) has launched what it claims is the industry’s first benchmark designed to quantify the security competence of its customers’ software developer ...

Hosted on MSN

AI tools expand from coding benchmarks to classroom transparency

On April 27, multiple AI developments showcased how the technology is advancing in both professional and educational contexts. Open benchmarks revealed ChatGPT 5.5’s strengths in short, well-defined ...

Yahoo Finance

Endor Labs Launches Agentic Code Security Benchmark, Finds Top-Performing AI Coding Agents Pass Tests But Still Fail Security

The benchmark extends the Carnegie Mellon SusVibes framework to continuously evaluate leading AI coding agents and updates as new agents and models are released. PALO ALTO, Calif., April 15, 2026 ...

InfoWorld

Why benchmarks are key to AI progress

Researchers are racing to develop more challenging, interpretable, and fair assessments of AI models that reflect real-world use cases. The stakes are high. Benchmarks are often reduced to leaderboard ...

Hosted on MSN

The Winners (and Losers) of This New Vibe-Coding Benchmark Will Surprise You

The race for best vibe-coding AI model is neck and neck, according to Vals AI. OpenAI is the new king of vibe coding, according to a newly-released benchmark from AI evaluation startup Vals AI. In a ...

VentureBeat

Microsoft’s GRIN-MoE AI model takes on coding and math, beating competitors in key benchmarks

Microsoft has unveiled a groundbreaking artificial intelligence model, ...

SiliconANGLE

Study finds newer LLMs introduce more severe coding bugs despite higher benchmark scores

A new report today from code quality testing startup SonarSource SA is warning that while the latest large language models may be getting better at passing coding benchmarks, at the same time they are ...