• The Recap AI
  • Posts
  • Leaked benchmarks reveal Grok 4's huge leap

Leaked benchmarks reveal Grok 4's huge leap

PLUS: The chip war's new front, Mozilla's browser-based AI agents, and Tesla's Robotaxi goes viral

Good morning, AI enthusiast.

Unconfirmed benchmarks for xAI's upcoming Grok 4 model have surfaced online, suggesting a major leap in performance that could rival top models from OpenAI and Google.

The model reportedly nearly doubles the performance of current systems on difficult tests. If these leaks prove accurate, is the AI race about to be seriously disrupted by a new top contender?

In today’s AI recap:

  • Grok 4’s leaked benchmark scores

  • The chip war’s new front

  • Mozilla’s browser-based AI agents

  • Tesla’s Robotaxi goes viral

Grok 4's Bombshell Benchmarks

The Recap: Unconfirmed benchmarks for xAI's upcoming Grok 4 model have circulated online, suggesting a major leap in performance that could rival top models from OpenAI and Google.

Unpacked:

  • Grok 4 reportedly scored a stunning 45% on the notoriously difficult Humanity's Last Exam (HLE) benchmark, nearly doubling the performance of current models.

  • Beyond HLE, the model also shows top-tier results on other key tests, scoring 87-88% on GPQA and an impressive 72-75% on the SWE coding benchmark.

  • The leak follows an announcement from Elon Musk about a "significant" improvement to Grok, fueling speculation about an imminent official release.

Bottom line: If these benchmarks hold up, Grok 4 represents a serious challenge to the dominance of OpenAI and Google in the AI race. This leap in capability suggests the pace of AI progress is not slowing down, pushing the entire industry toward even more powerful models.

The Chip War's New Front

The Recap: The Trump administration is preparing to restrict exports of advanced AI chips to Malaysia and Thailand. This move aims to prevent semiconductors from being smuggled into China, further escalating the global tech rivalry.

Unpacked:

  • The new controls are designed to close a suspected loophole, as the U.S. has already banned direct sales of its most powerful AI chips to China over national security concerns.

  • Nvidia CEO Jensen Huang pushed back against the plan, stating there is no evidence of chip diversion and noting the massive size of AI systems makes them difficult to smuggle.

  • This development puts major tech companies on alert, as firms like Oracle, Microsoft, and Google have been pouring billions into building new data centers across the region.

Bottom line: The potential restrictions mark a new phase in the U.S. strategy to contain China's technological rise, shifting focus to critical supply chain hubs. For companies operating in Southeast Asia, navigating geopolitical tensions just became a much higher priority.

AI Training

The Recap: In this video, we walk through how to use n8n, firecrawl, and rss.app to scrape virtually any piece of web content and transform it into LLM-ready output.

P.S We also launched a free community for AI Builders looking to master the art and science of building AI Automations — Come join us!

AI Agents Invade The Browser

The Recap: Mozilla.ai has released the Wasm Agents blueprint, an open-source project that allows AI agents to run directly inside a web browser. This approach simplifies development by packaging agents into single HTML files, eliminating complex setup processes.

Unpacked:

  • The framework uses WebAssembly to package Python-based agents into self-contained HTML files that execute entirely within a sandboxed browser tab.

  • This makes creating and sharing agents much easier, as developers can bypass local dependency installations and run code directly on their devices using tools like Pyodide.

  • It supports both major cloud models and local LLMs through compatible APIs, allowing you to run agents offline with services like Ollama for enhanced privacy.

Bottom line: This marks a significant move toward making AI agents more portable and accessible by shifting development from the backend into the browser. It empowers developers to quickly prototype and share agentic workflows, which could accelerate innovation across the open-source community.

Where AI Experts Share Their Best Work

Join our Free AI Automation Community

Join our FREE community AI Automation Mastery — where entrepreneurs, AI builders, and AI agency owners share templates, solve problems together, and learn from each other's wins (and mistakes).

What makes our community different:

  • Real peer support from people building actual AI businesses

  • Complete access to download our automation library of battle-tested n8n templates

  • Collaborate and problem-solve with AI experts when you get stuck

Dive into our course materials, collaborate with experienced builders, and turn automation challenges into shared wins. Join here (completely free).

Robotaxi's Mainstream Moment

The Recap: Tesla’s Robotaxi service is capturing mainstream attention through viral social media videos showing the cars performing complex, everyday tasks with impressive ease.

Unpacked:

  • The clips showcase relatable missions, including a late-night food run where a Robotaxi seamlessly transported a user to and from a fast-food restaurant.

  • Beyond just driving, the service demonstrates advanced maneuvers like autonomously parking in a busy retail lot, a common pain point for human drivers.

  • Users are highlighting the “seamless” experience, signaling that the technology is becoming practical for real-world, daily use.

Bottom line: These tangible demonstrations are quickly moving autonomous driving from a futuristic concept to a present-day reality in the public eye. Successfully navigating everyday suburban life is becoming the new benchmark for leadership in the autonomous vehicle space.

The Shortlist

Publishers filed an EU antitrust complaint against Google, alleging its AI Overviews feature harms their businesses by reducing traffic and misusing their content for summaries.

The European Commission confirmed it will stick to the original implementation timeline for its landmark AI Act, dismissing calls from major tech companies for a pause.

Netcraft found that major LLMs recommend the wrong URL for large brands 34% of the time, creating a significant new attack vector for phishers to register the incorrect domains.

Baidu launched MuseSteamer, an image-to-video model capable of generating 10-second, 1080p cinematic clips with synchronized dialogue and sound effects from a single image.

What did you think of today's email?

Before you go we’d love to know what you thought of today's newsletter. We read every single message to help improve The Recap experience.

Login or Subscribe to participate in polls.

Signing off,

David, Lucas, Mitchell — The Recap editorial team