Meta's Muse Spark Closes the Door on Open Source

Meta's new AI model, built by Alexandr Wang's Superintelligence Labs, ships closed-source. The $14.3 billion bet just changed Meta's identity.

By Shaw Beckett

For more than six years, Meta's pitch to the AI research world was simple: we give away the weights. Llama models seeded startup labs, academic benchmarks, and half the open-source chatbots on Hugging Face. That era ended quietly on April 8, when Meta Superintelligence Labs published the Muse Spark model card without a download link. The company that bet hardest on open weights just shipped its most important model as a closed box.

The timing is not subtle. Ten months after Mark Zuckerberg paid $14.3 billion for a 49% stake in Scale AI and installed its cofounder, Alexandr Wang, as Meta's first-ever chief AI officer, the Superintelligence Labs division Wang runs has produced a flagship model that explicitly walks away from Meta's defining strategic posture. Muse Spark is multimodal, reasoning-capable, and competitive on several leaderboards. It is also, for now, proprietary.

For Meta, a company that has spent years arguing open models were the only way to keep AI accountable, that shift is not a technical footnote. It is an identity change.

What Muse Spark Actually Is

Muse Spark is the first model from the Muse family developed inside Meta Superintelligence Labs, the research unit Zuckerberg built around Wang's hire last summer. According to Meta's model announcement at ai.meta.com/blog/introducing-muse-spark-msl, it is natively multimodal, with support for tool use, visual chain of thought, and a new "Contemplating mode" that orchestrates multiple reasoning agents in parallel.

The capability list reads like a catch-up sheet against OpenAI's GPT-5 and Google's Gemini 3.1 Pro. Visual STEM problems. Entity recognition. Interactive displays for health information. Troubleshooting home appliances from a photo. The model can sketch explanations, annotate diagrams, and generate minigames inside a single conversation. Meta says the underlying pretraining stack is an order of magnitude more compute-efficient than Llama 4 Maverick, its previous flagship.

On paper, that efficiency is the story underneath the story. Llama 4 reportedly burned through unprecedented amounts of GPU time in 2025 without decisively pulling ahead of rivals. Muse Spark is Meta's attempt to rebuild the training pipeline from the substrate up rather than scale a losing architecture further.

The Benchmark Reality Check

Meta's benchmark tables tell a carefully chosen story. Muse Spark's Contemplating mode scores 58% on Humanity's Last Exam, the notoriously difficult frontier benchmark, and 38% on FrontierScience Research, a reasoning test built by physicists and mathematicians. Both are respectable numbers for a debut model from a team that was only assembled last year.

The comparison points are less flattering. On PhD-level reasoning tests, Muse Spark trails Google's Gemini 3.1 Pro by just under five percentage points (89.5% versus 94.3%), according to analysts cited by Fortune. On general reasoning, it sits in a tight pack with Anthropic's Claude and OpenAI's most recent release rather than breaking out in front.

Where Muse Spark clearly pulls ahead is health. Meta says it worked with more than 1,000 physicians to curate training data on clinical topics, and the resulting benchmark scores on health reasoning exceed rival models by a meaningful margin. According to Bloomberg's coverage of the launch, analysts who reviewed the benchmarks said the health-specific scores were the most notable difference from competing general-purpose models, and the area where Meta's physician-curated dataset most clearly paid off.

The health angle also hints at where Meta wants the model to earn its keep: in Messenger conversations, Instagram DMs, and the Ray-Ban Meta glasses that watch what you eat.

Why the Closed-Source Pivot Matters

For six years, Meta's AI team has defended open weights as a moral and strategic good. Yann LeCun, Meta's chief AI scientist, has argued openly that closed-source frontier models concentrate too much power in a handful of companies. When Meta released Llama 2 with a license allowing most commercial use, then Llama 3 and Llama 4 on even more permissive terms, the company positioned itself as the antidote to OpenAI's secrecy.

Muse Spark reverses that posture. No weights. No architectural details. A closed API preview for select partners. Meta's announcement blog includes a carefully worded line that the company "hopes to open-source future versions," but the Muse Spark weights themselves are not coming out.

That is a defining change, and it reflects the internal argument Wang appears to have won. As Axios reported, the closed-model decision was closely identified with Wang himself, who was a public skeptic of unrestricted open-weight releases during his Scale AI tenure, citing misuse risk and competitive leakage. The $14.3 billion Meta spent on the Scale AI deal that brought him in made the shift politically inevitable inside the company, according to reporting from Fortune.

There is also a split inside Meta about how the pivot is being managed. In March 2026, the company created a parallel applied AI engineering unit under Maher Saba, separate from Wang's research lab. Multiple outlets have read that move as a hedge: Wang gets a long runway to build frontier models, while Saba's team is tasked with shipping the Llama-lineage products Meta still sells to developers.

Muse Spark's benchmark numbers are strong on health and visual tasks, mixed on pure reasoning.

The $14.3 Billion Hire That Changed Everything

To understand how Meta arrived here, you have to rewind to June 2025. Llama 4 had landed with mixed reviews. Zuckerberg, by multiple accounts, was frustrated that Meta's AI team had fallen behind OpenAI and Google despite spending freely on compute. His response was the most expensive acqui-hire in Silicon Valley history: $14.3 billion for a 49% non-voting stake in Scale AI, plus a package that brought Wang, then 28, to Menlo Park as chief AI officer and head of a new Superintelligence Labs division.

The deal was structured to buy three things at once: Wang himself, Scale AI's data-labeling pipeline, and an opening to recruit aggressively from Scale's research bench. Within weeks, dozens of senior researchers from OpenAI, Google DeepMind, and Anthropic reportedly accepted packages to join the new unit, many of them at compensation levels that raised eyebrows even in a market already used to seven-figure offers.

What that hiring wave bought is Muse Spark. Ten months from the first Wang-led project meeting to a shipping flagship model is fast by any standard, and the speed is partly why the benchmark numbers do not yet dominate. TechCrunch's Amanda Silberling characterized the launch as "a ground-up overhaul" of Meta's AI stack rather than an incremental upgrade, which is the framing analysts keep returning to. The question is not whether Muse Spark beats Gemini 3.1 Pro today. It is whether the pipeline that produced it in ten months can produce a v2 that does, and whether Meta will have the patience to wait for that.

The $14.3 billion price tag means Meta cannot wait forever. The company's AI capital spending ran above $65 billion in 2025, according to filings, and is projected higher this year. Muse Spark descendants need to drive real ad revenue and subscription conversion inside Instagram, WhatsApp, and the Meta AI app on a meaningful timeline for that spend to look rational. The pressure is visible across the industry: Oracle's 30,000-person layoff wave in March was framed in earnings calls as a direct consequence of reallocating payroll toward AI data center buildouts, and first-quarter tech job losses hit 71,000, with cloud providers pointing to the same capex-versus-headcount tradeoff.

Who Loses If Meta Closes Up

The group with the most to lose from the Muse Spark pivot is not OpenAI or Google. It is the open-source AI community that treated Llama as infrastructure. Hugging Face hosts tens of thousands of Llama derivative models, many of them fine-tuned for specialized tasks at a fraction of proprietary API access costs. Startups like Together AI and Perplexity built substantial businesses on top of Llama weights. Academic labs relied on Llama for reproducible research.

The alternatives, Alibaba's Qwen series and France's Mistral, are credible but do not yet match the compute scale Meta was willing to burn on open releases. If Meta's open-weight output slows or stops, the gravitational center of open-source AI shifts to Hangzhou and Paris, which is not a neutral geopolitical fact for a technology Washington has spent three years trying to keep onshore. The Sanders-AOC AI data center moratorium bill and California Governor Newsom's AI executive order have both been justified, in part, as attempts to keep US AI development accountable, and both lose leverage if the most-downloaded open model in the world is suddenly a Chinese product.

The Llama ecosystem supported tens of thousands of derivative models. Muse Spark's closed release changes that calculus.

Meta has not said what happens to future Llama releases, and the silence is loud. The Llama 5 model that multiple reports suggested was in training as of late 2025 has not been mentioned in any Wang-era communications. Researchers inside Meta's applied AI unit, when pressed by reporters, have stopped giving timelines.

That uncertainty is the real story underneath the Muse Spark benchmarks. For six years, Meta's open-weight strategy was a gravity well that pulled the entire open-source AI ecosystem toward it. Remove that gravity and the field reshapes: Alibaba's Qwen becomes the default open model, Mistral picks up European mindshare, and a meaningful slice of derivative-model builders move to paid APIs because they have no comparable alternative.

The Health Play Is a Tell

Tucked inside the Muse Spark launch is a signal about where Meta thinks AI actually makes money. The 1,000-physician collaboration on training data is not cheap, and it is not glamorous. It suggests the company is aiming squarely at the conversational-health use case: the user who asks an AI about their prescription, their recovery plan, or their diet, and needs answers that will not get Meta sued.

The early benchmark lead on health questions appears genuine. On a battery of clinical reasoning tests, Muse Spark scored 42.8% against rivals clustered in the mid-30s, according to figures published by Fortune. That is not doctor-level performance, but it is the first general-purpose model where health-specific training data appears to have paid off in measurable ways.

It also explains the device strategy. Ray-Ban Meta glasses can see what a user is eating and offer nutritional context in real time. Instagram's shopping flow can surface supplement information without linking out. WhatsApp, already the primary messaging app for billions of health conversations with family, could ship a medical-literate assistant. None of those features are worth building unless the underlying model can be trusted not to hallucinate drug interactions.

Meta's health-focused training data points toward Ray-Ban glasses and WhatsApp as the primary deployment surface.

Whether that bet pays off depends on whether consumers trust any AI to do that job. The Meta brand does not obviously help. Americans have told Pew Research repeatedly that they rank social media companies among the least-trusted stewards of personal data, a reputation that follows the Muse Spark product into every Instagram health conversation and every glasses-based meal lookup. Muse Spark's health benchmarks may be real. The distribution channel still has to earn its reception.

The Real Test

The honest read of Muse Spark is that it is a competent first draft from a team that did not exist a year ago, wrapped in a strategic decision that will define Meta's next decade. The model itself does not displace OpenAI or Google. What it does is mark the end of Meta's posture as the open-source anchor of frontier AI, and the beginning of a straight-line race to ship useful features inside products the company already controls.

The coming months will test whether that pivot was worth breaking Meta's identity for. Wang has a runway, but not an unlimited one. The Llama community is watching for a signal about whether future open releases continue. Advertisers are waiting to see if AI-powered features translate into engagement numbers. And physicians, teachers, and consumers who interact with Muse Spark through a Meta app will decide, in the aggregate, whether a closed model wearing Meta's logo earns the trust the old open-weight strategy tried to buy with transparency.

The $14.3 billion bill is already paid. Muse Spark is the first thing Meta has to show for it. Whether it is the last thing depends entirely on what the Superintelligence Labs pipeline produces next, and how quickly.
