Elon Musk, Diffusion Models, and the Rise of Mercury

  1. A New Paradigm May Be Forming
  2. Meet Inception Labs and Mercury
  3. How Mercury Works
  4. Inside the Diffusion Revolution
  5. Training and Scale
  6. Performance: 10× Faster, Same Quality
  7. A Historical Echo
  8. What Comes Next
  9. Further Reading

A New Paradigm May Be Forming

In a recent exchange on X, Elon Musk echoed a striking prediction: diffusion models — the same architecture that powers image generators like Stable Diffusion — could soon dominate most AI workloads. Musk cited Stanford professor Stefano Ermon, whose research argues that diffusion models’ inherent parallelism gives them a decisive advantage over the sequential, autoregressive transformers that currently power GPT-4, Claude, and Gemini.

While transformers have defined the past five years of AI, Musk’s comment hints at an impending architectural shift — one reminiscent of the deep learning revolutions that came before it.


Meet Inception Labs and Mercury

That shift is being engineered by Inception Labs, a startup founded by Stanford professors including Ermon himself. Their flagship system, Mercury, is the world’s first diffusion-based large language model (dLLM) designed for commercial-scale text generation.

The company recently raised $50 million to scale this approach, claiming Mercury achieves up to 10× faster inference than comparable transformer models by eliminating sequential bottlenecks. The vision: make diffusion not just for pixels, but for language, video, and world modeling.


How Mercury Works

Traditional LLMs — whether GPT-4 or Claude — predict the next token one at a time, in sequence. Mercury instead starts with noise and refines it toward coherent text in parallel, using a denoising process adapted from image diffusion.

This process unfolds in two stages:

  1. Forward Process: Mercury gradually corrupts real text into noise across multiple steps, learning the statistical structure of language.
  2. Reverse Process: During inference, it starts from noise and iteratively denoises, producing complete sequences — multiple tokens at once.

By replacing next-token prediction with a diffusion denoising objective, Mercury gains parallelism, error correction, and remarkable speed. Despite this radical shift, it retains transformer backbones for compatibility with existing training and inference pipelines (SFT, RLHF, DPO, etc.).
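
To make the reverse process concrete, here is a minimal sketch of the kind of parallel, coarse-to-fine decoding loop a diffusion LLM can use. It assumes a masking ("absorbing-state") formulation in which noise means masked tokens and a confidence-based schedule for deciding which positions to commit at each step; Mercury's actual parameterization and sampler are not public, and the denoiser below is a random stand-in for the trained network, not Inception Labs' API.

import random

MASK = "<mask>"

def denoiser(tokens, vocab):
    # Stand-in for the trained network: propose a token and a confidence
    # score for every masked position, all in one parallel pass.
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def diffusion_decode(seq_len, vocab, num_steps=10):
    # Reverse process: start from an all-masked ("pure noise") sequence and
    # iteratively commit predictions until nothing is masked.
    tokens = [MASK] * seq_len
    for step in range(num_steps):
        proposals = denoiser(tokens, vocab)
        if not proposals:
            break
        # Unmask a share of the remaining positions each step (more
        # aggressively as the step budget runs down), highest confidence first.
        budget = max(1, len(proposals) // (num_steps - step))
        best = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)[:budget]
        for i, (tok, _) in best:
            tokens[i] = tok
    return tokens

print(diffusion_decode(seq_len=8, vocab=["the", "cat", "sat", "on", "mat", "."]))

A real sampler can also re-mask and revise positions it has already committed, which is where the error-correction behavior described above can come from; this sketch only commits.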


Inside the Diffusion Revolution

Mercury’s text diffusion process operates on discrete token sequences x \in X. Each diffusion step samples and refines latent variables z_t that move from pure noise toward meaningful text representations. The training objective minimizes a weighted denoising loss:

L(x) = -\mathbb{E}_{t}\left[\gamma(t)\cdot\mathbb{E}_{z_t \sim q}\,\log p_\theta(x \mid z_t)\right]

In practice, this means Mercury can correct itself mid-generation — something autoregressive transformers fundamentally struggle with. The result is a coarse-to-fine decoding loop that predicts multiple tokens simultaneously, improving both efficiency and coherence.
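
For readers who prefer code to notation, here is a hedged PyTorch sketch of that weighted denoising loss. It assumes an absorbing-state (masking) forward process q, a uniform sampler for the noise level t, and a 1/t choice for the weighting gamma(t); these are common choices in the discrete-diffusion literature, not confirmed details of Mercury, and model, mask_id, and num_classes are placeholder names.

import torch
import torch.nn.functional as F

def diffusion_lm_loss(model, x, mask_id, num_classes):
    """x: (batch, seq) integer token ids; model(z_t, t) -> (batch, seq, vocab) logits."""
    b, s = x.shape
    # Sample a noise level t ~ U(0, 1) per example.
    t = torch.rand(b, device=x.device)
    # Forward process q(z_t | x): each token is independently masked with prob t.
    corrupt = torch.rand(b, s, device=x.device) < t[:, None]
    z_t = torch.where(corrupt, torch.full_like(x, mask_id), x)
    # The denoiser predicts the clean tokens from the corrupted sequence in one pass.
    logits = model(z_t, t)
    nll = F.cross_entropy(logits.reshape(-1, num_classes), x.reshape(-1),
                          reduction="none").reshape(b, s)
    # gamma(t): weighting across noise levels; 1/t is an assumed, common choice.
    gamma = 1.0 / t.clamp(min=1e-3)
    # Only corrupted positions contribute -log p_theta(x | z_t).
    loss = (gamma[:, None] * nll * corrupt).sum() / corrupt.sum().clamp(min=1)
    return loss

The key structural point is that the target is the clean sequence x and every corrupted position is predicted in the same forward pass, which is what makes the objective parallel rather than token-by-token.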


Training and Scale

Mercury is trained on trillions of tokens spanning web, code, and curated synthetic data. The models range from compact “Mini” and “Small” versions up to large generalist systems with context windows up to 128K tokens. Inference typically completes in 10–50 denoising steps — orders of magnitude faster than sequential generation.
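
A rough back-of-the-envelope comparison shows why the step count matters; the 1,000-token completion and 25-step budget below are illustrative assumptions, not Inception Labs' figures:

  sequential passes, autoregressive decoding ≈ N = 1,000 (one per token)
  sequential passes, diffusion decoding      ≈ K = 25 (one per denoising step)
  reduction in sequential depth              ≈ N / K = 40×

Each denoising pass touches every position, so the total arithmetic is comparable or higher, but that extra work is exactly the kind of batched matrix math GPUs parallelize well, which is where the wall-clock speedups come from.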

Training runs on NVIDIA H100 clusters using standard LLM toolchains, with alignment handled via instruction tuning and preference optimization.


Performance: 10× Faster, Same Quality

On paper, Mercury’s numbers are eye-catching:

Benchmark                    | Mercury Coder Mini | Mercury Coder Small | GPT-4o Mini | Claude 3.5 Haiku
HumanEval (%)                | 88.0               | 90.0                | ~85         | 90+
MBPP (%)                     | 76.6               | 77.1                | ~75         | ~78
Tokens/sec (H100)            | 1109               | 737                 | 59          | ~100
Latency (ms, Copilot Arena)  | 25                 | N/A                 | ~100        | ~50

Mercury rivals or surpasses transformer baselines on code and reasoning tasks, while generating 5–20× faster on equivalent hardware. Its performance on Fill-in-the-Middle (FIM) benchmarks also suggests diffusion’s potential for robust, parallel context editing — a key advantage for agents, copilots, and IDE integrations.


A Historical Echo

Machine learning has cycled through dominant architectures roughly every decade:

  • 2000s: Convolutional Neural Networks (CNNs)
  • 2010s: Recurrent Neural Networks (RNNs)
  • 2020s: Transformers

Each leap offered not just better accuracy, but better compute scaling. Diffusion may be the next inflection point — especially as GPUs, TPUs, and NPUs evolve for parallel workloads.

Skeptics, however, note that language generation’s discrete structure may resist full diffusion dominance. Transformers enjoy massive tooling, dataset, and framework support. Replacing them wholesale won’t happen overnight. But if diffusion proves cheaper, faster, and scalable, its trajectory may mirror the very transformers it now challenges.


What Comes Next

Inception Labs has begun opening Mercury APIs at platform.inceptionlabs.ai, pricing at $0.25 per million input tokens and $1.00 per million output tokens — a clear signal they’re aiming at OpenAI-level production workloads. The Mercury Coder Playground is live for testing, and a generalist chat model is now in closed beta.

If Musk and Ermon are right, diffusion could define the next chapter of AI — one where text, video, and world models share the same generative backbone. And if Mercury’s numbers hold, that chapter may arrive sooner than anyone expects.


Further Reading

  • Stefano Ermon et al., Diffusion Language Models Are Parallel Transformers (Stanford AI Lab)
  • Elon Musk on X, Diffusion Will Likely Dominate Future AI Workloads
  • Inception Labs, Mercury Technical Overview (2025)

How Direct File is Quietly Redefining Government Software

In a rare but powerful move, the U.S. government has open-sourced one of its most impactful digital public services: Direct File, a platform that allows taxpayers to file their federal returns electronically—completely free of charge and without third-party intermediaries.

At a glance, Direct File might seem like just another government web form. But beneath the surface lies a thoughtfully engineered system that’s not just about taxes—it’s a case study in modern government software, scalable infrastructure, and user-first design.

Let’s break it down.

🧾 What is Direct File?

Direct File is a web-based, interview-style application that guides users through the federal tax filing process. It works seamlessly across devices—mobile, desktop, tablet—and is available in both English and Spanish.

Built to accommodate taxpayers with a wide range of needs, it translates the complexity of IRS tax code into plain-language questions. On the backend, it connects with the IRS’s Modernized e-File (MeF) system via API to handle real-time tax return submissions.

🧠 The Tech Stack: Government Goes Modern

The project reflects a significant leap forward in how federal systems are built and deployed.

  • Fact Graph: At the heart of Direct File is a “Fact Graph”—an XML-based knowledge graph that smartly handles incomplete or evolving user information during the filing process (see the sketch after this list).
  • Programming Stack:
    • Scala for the logic and backend (running on the JVM)
    • Transpiled to JavaScript for client-side execution
    • React frontend in the df-client directory
    • Containerized for Speed: Docker is used for seamless local deployment.
      • This spins up the backend (port 8080) and Postgres DB (port 5432).
  • Modular Architecture:
    • fact-graph-scala: Core tax logic
    • js-factgraph-scala: Frontend port
    • backend: Auth, session management
    • submit: MeF submission engine
    • status: Monitors submission acknowledgments
    • state-api: Bridges federal and state systems
    • email-service: Handles user notifications
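
As promised above, here is a hypothetical Python sketch of the fact-graph idea: user-supplied facts and derived facts live in one graph, and any derived fact whose inputs are still unknown evaluates to "incomplete" instead of failing. The real implementation lives in fact-graph-scala, is written in Scala with XML fact definitions, and shares none of the names or dollar amounts used below.

# Illustrative only: not the IRS Fact Graph API. Amounts are placeholders.
INCOMPLETE = object()  # sentinel meaning "not enough information yet"

class FactGraph:
    def __init__(self):
        self.facts = {}      # user-supplied values from the interview
        self.derived = {}    # name -> (dependencies, function)

    def set_fact(self, name, value):
        self.facts[name] = value

    def define(self, name, deps, fn):
        self.derived[name] = (deps, fn)

    def get(self, name):
        if name in self.facts:
            return self.facts[name]
        if name in self.derived:
            deps, fn = self.derived[name]
            values = [self.get(d) for d in deps]
            if any(v is INCOMPLETE for v in values):
                return INCOMPLETE   # propagate incompleteness, don't fail
            return fn(*values)
        return INCOMPLETE

# Usage: a derived fact stays incomplete until the interview supplies its inputs.
g = FactGraph()
g.define("standard_deduction", ["filing_status"],
         lambda s: 29200 if s == "married_filing_jointly" else 14600)
print(g.get("standard_deduction") is INCOMPLETE)  # True: filing status unknown
g.set_fact("filing_status", "single")
print(g.get("standard_deduction"))                # 14600

Because incompleteness propagates instead of raising errors, the interview can ask questions in any order and the return simply fills itself in as facts arrive.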

🤝 Built by Public Servants, Not Contractors Alone

Unlike many large-scale federal tech initiatives, Direct File was created in-house at the IRS, in partnership with:

  • U.S. Digital Service (USDS)
  • General Services Administration (GSA)
  • Contractors like TrussWorks, Coforma, and ATI

This hybrid structure ensured agile execution while maintaining strong public stewardship.

🔒 Security Without Obscurity

Despite being open source, Direct File excludes any code that touches:

  • Personally Identifiable Information (PII)
  • Federal Tax Information (FTI)
  • Sensitive But Unclassified (SBU) data
  • National Security Systems (NSS) code

This reflects a disciplined balance between transparency and trust—one that more government software projects should emulate.

📜 Legal Framework

Direct File is anchored in a suite of progressive digital policies:

  • Source Code Harmonization And Reuse in IT Act of 2024
  • Federal Source Code Policy
  • Digital Government Strategy
  • E-Government Act of 2002
  • Clinger-Cohen Act of 1996

Together, these policies mandate that custom-developed government software should be shared and reused, not siloed.

💡 Why This Matters

Direct File represents a milestone for civic tech, open government, and digital service delivery:

✅ 1. Open Source, Real Impact
It’s not often we see real, working government platforms open to inspection and reuse. This invites contributions from civic technologists and helps other governments learn from U.S. innovation.

🧩 2. Designing for Complexity
Converting complex tax logic into user-friendly language—using a structured knowledge graph—is a pattern applicable well beyond taxes (think healthcare, benefits, or housing).

🛠️ 3. Engineering Innovation
The Fact Graph and modular backend architecture reflect best practices in modern backend design—resilient, flexible, and portable.

🔐 4. Trust and Privacy by Design
The selective code release shows how governments can be open while still securing sensitive systems.

🌐 5. Interoperability with State Systems
The state-api integration is especially forward-thinking. It could pave the way for smoother federal–state collaboration in everything from benefits to compliance.

🚀 Getting Started with the Code

Want to explore it locally? You’ll need:

  • Java
  • Scala
  • Maven
  • SBT
  • Coursier
  • Docker

Then:

  1. Clone the repo
  2. Run: docker compose up -d --build
  3. Navigate to /direct-file/df-client
  4. Run: npm run start

The frontend runs at http://localhost:3000.

Final Thoughts

Direct File shows that government software doesn’t have to be clunky, slow, or hidden. With the right talent and commitment, it can be modern, secure, and open.

This project is not just about taxes—it’s about showing the public sector what’s possible when we build with purpose and publish with pride.

📎 Explore the repo (once public): github.com/irs-directfile

📩 Want to build something similar? Contact me

Reblog: Can muscle fibers grow without being activated?

Chris Beardsley writes:

After publishing my last short article, I received some feedback from friends and colleagues notifying me that one of the reasons that many people are struggling with the use of the principle of neuromechanical matching for governing exercise…

from Pocket
via Did you enjoy this article? Then read the full version from the author’s website.

Reblog: What Startups Are Really Like

Paul Graham’s essay on what startups are really like:

I wasn’t sure what to talk about at Startup School, so I decided to ask the founders of the startups we’d funded. What hadn’t I written about yet? I’m in the unusual position of being able to test the essays I write about startups.

from Pocket
via Did you enjoy this article? Then read the full version from the author’s website.

Reblog: Entropy: Why Life Always Seems to Get More Complicated

This pithy statement references the annoying tendency of life to cause trouble and make things difficult. Problems seem to arise naturally on their own, while solutions always require our attention, energy, and effort. Life never seems to just work itself out for us. If anything, our lives become more complicated and gradually decline into disorder rather than remaining simple and structured.

Why is that?

Murphy’s Law is just a common adage that people toss around in conversation, but it is related to one of the great forces of our universe. This force is so fundamental to the way our world works that it permeates nearly every endeavor we pursue. It drives many of the problems we face and leads to disarray. It is the one force that governs everybody’s life: Entropy.

from Pocket

via Did you enjoy this article? Then read the full version from the author’s website.

Unpackaging: 360˚ Video and Real-time CG Elements Compositing in Unity

At Unity Vision Summit a couple days ago, Unity announced that 360˚ video compositing will be available in “Unity2017”. Unity2017 is the next stable release of their engine, said to focus on artists and designers.

With this new feature in the Unity engine, anyone can add graphics effects such as lens flares, digital animations, and interactivity to a video in real time. The presenter, Natalie Grant, a Senior Product Marketing Manager for VR/AR/Film at Unity, said one of the most important aspects of VR is that it achieves the "feeling like you are actually there." She continued, "These are a few small ways to make a regular 360˚ video interactive and immersive".

The purpose of this post is to explain this feature and to posit that consumers and creators alike will learn more about computer graphics topics (like general-purpose GPU usage) and the virtualization of the real world. WebVR and 360˚ content on laptops and phones will "bring people up the immersion curve," as Mike Schroepfer says, and this approach, where content is composed of 360˚ video and real-time 3D model content, will contribute to that.

 

What is compositing?

Compositing is the combining of visual elements from separate sources into a single image.

How it’s achieved in Unity with 360˚ videos

As described in the talk*, compositing a 360˚ video with real-time CG elements essentially means placing two spheres in a scene, playing the 360˚ video on the interior of each sphere, and putting a camera at their shared center point. To explain, imagine the layers of the Earth as an analogy.

[Figure: layers-of-the-Earth analogy]

The inner core is essentially the user's head, or main camera, looking around the environment. The outer core is the first 360˚ video player, with a shader** applied to it that masks some of the video but not all of it. Skipping the lower mantle temporarily, the upper mantle is where the second 360˚ video player sits; it shows the same 360˚ video as the inner player but normally, without the masking shader. In between, the lower mantle is where creators can place digital animations, 3D objects, and interactable UI elements. This is where all the magic happens, because the space between the two concentric video-player spheres allows CG content to really seem part of the scene. Both copies of the composited 360˚ video are exactly aligned, so as long as the user's view position is confined to 3DoF, the user can't tell there are two copies of the video file playing.

Finally, for more immersion, the Crust and the rest of Space, is also a layer where any Unity object can be positioned.

Natalie showed why this matters with the use case of matching a Unity directional light source to the position, direction, and intensity of the sun as captured in a 360˚ video. Because of Unity's physically based rendering, the CG elements (in the lower mantle or outer crust, with Standard shaders) then receive shadows, color, reflections, and more in a realistic way, since they are affected by that light source. This increases the effectiveness of the illusion that footage and real-time elements are composited.

Another way to think about this approach is as the Russian nesting dolls of spheres (credit: Ann Greenberg). In this comparison, each doll corresponds to a 360˚ video sphere, and just like the dolls, the spheres are concentrically nested and aligned with the same rotation.

[Figure: Russian nesting dolls]

As demonstrated by Natalie on stage, when done deliberately enough, the 3D content will actually look like it's in the camera-captured shot, creating the illusion that 3D objects are occluded or hidden behind the inner sphere playing video (see below: a 3D dinosaur moving behind the first 360˚ video).

[GIF: a 3D dinosaur moving behind the inner 360˚ video sphere]

In the short-term, I think this will help people engage with more 360˚ video content and potentially excite people about mixing camera captures and virtual content.

 

When demonstrating “locomotion in 360˚ video”, Natalie Grant of Unity showed that one can click to move to another 360˚ video. For starters, this isn’t exactly like movement with teleportation in a completely digital environment, where one can teleport anywhere a pointer collides with a plane. Remember that the creative behind the project must capture each 360˚ video using a camera and tripod, and that’s still a constraint on the freedom of choice for location. However, with potentially a lot less work, the creative can begin making compelling 360˚ video experiences with an interactive component (i.e. switching the 360˚ video) and layers of spatially accurate CG objects.

Also at this year’s F8 developer conference Facebook announced a new camera, the Surround 360 video camera, that will let users move around inside live-action scenes. The product can infer 3D point and vector data of its surroundings using overlapping video image data from adjacent cameras. So a reasonable implication is that we may even have 6 DoF live action scenes eventually with CG elements composited***.

However, I’d imagine that blind spots would exist once a user has moved significantly from the original center of the two spheres, and that will also impact the integrity of the illusion that both CG and video are composited.

 

I look forward to seeing some creative applications of this method.

*found at the 35-minute mark https://www.youtube.com/watch?v=ODXMhaNIF5E

** A Unity Standard Shader attempts to light objects in a "physically accurate" way. This technique is called Physically Based Rendering, or PBR for short. Instead of defining how an object looks in one lighting environment, one needs only to specify the properties of the object (e.g., whether its surface is more like metal or plastic).

Then, the shader computes its shading based on those factors.

***The original Surround 360 was estimated to cost about $30,000 to build using the company's exact schematics.

Three Trending VR Topics from GDC and Unity Updates

Three Trending VR Topics from GDC

The following is a synthesis of a talk given by Greenlight Insight’s Alexis Macklin and Unity’s Tony Parisi along with my own experiences at Oculus Developer Day 2017. The notes on VR experiences are important as the industry grows because we can reflect on what strategies are yielding better experiences for our users.

1) Complex Story-Telling

Dear Angelica and Why it Was Ground-Breaking

  • Creators at Oculus Story Studio used a combination of Houdini (cinema), UE4 (games), and Quill (art)
  • Use of Houdini in the flow of VR development is also illustrated really well by Mike Murdoc, Creative Director at Trihelix VR
  • Mike uses Houdini to permute and design VR interfaces; check out what I mean.

2) Locomotion

  • Design Standards Don't Exist Yet: The styles of teleportation and varieties of movement that are considered correct are still just beginning to be explored and developed; don't expect to see a consensus on a design strategy yet.
  • An Indicator of Motion Sickness: Many VR developers noted, however, that movement in the user's peripheral vision is a good indicator of motion sickness. If a user teleporting from point A to point B sees something moving in the corner of his or her eye, you can expect that scenario to yield some sickness. This is controllable, outside of individual tolerances.
  • Accessibility: How to bring VR to those who can't use VR right now was a big theme. Different control schemes are to be developed throughout the year.

3) Social VR Experiences

  • Sony VR is bringing games and multiplayer activities to users. Eye and mouth movements are said to evoke the most emotional response from users. Do not approach the uncanny valley: in short, stick to surreal, cartoony avatars. The best way I can explain this is with Bitmoji. If you have created one, you probably understand that it is not made to look exactly like you. It is an abstraction that might share a similar skin tone or clothing style with you; it creates just enough resemblance and purposefully doesn't get too close to your real image.
  • Simply put, it can become a little messy as your brain builds a picture around all of the minute details that might be wrong in the representation, as opposed to a clearly abstract depiction of a person.

Examples

Robo Recall by Epic Games

  • Subject: Similar to Destiny or Call of Duty––stop bad guys
  • Lacked an overarching plot intentionally
  • Users are forced to explore new interactions and environments
  • Hopes to achieve longer play times; Stevi Rex and Alexis both note that, for them, it's the first VR title that left them wanting to play more

Sprint Vector by Survios

  • Subject: Run as fast as you can at top speeds to reach the finish line
  • Previously released Raw Data (made $1 million in a month)
  • Throws locomotion standards out of the window and asks you to swing your arms with tracked controllers as if you were on an elliptical machine
  • A lot of people were expecting motion sickness, but so far the reviews have been positive
  • Many many videos of people racing can be found online

Bebylon by Kite and Lightning

  • Subject: Futuristic baby battles
  • Currently, this is in a closed beta
  • Two different levels of social interaction: one amongst other players and another amongst spectators. It is cross-platform:
    • PC VR
    • Mobile VR
    • Console VR
    • Youtube (includes spectator-mode)
    • Twitch (includes spectator-mode)

From Other Suns by Gunfire Games

  • Subject: A four-player RTS in which you fight and try to save humanity
  • A unique movement mechanic combines stepped teleportation with a switch from first person (you are your character) to third person (you are controlling your character) to guide your character to his or her next location
  • They have also handled accessibility well: you can choose to opt out of the aforementioned mode of locomotion if you're more comfortable in VR
  • Finally, comfort turning is handled well in the app; the player rotates in increments of no less than 20 degrees.
  • Oculus is the publisher for this title

Brass Tactics by Hidden Path Entertainment

  • Subject: A real-time strategy game that asks you to engage your enemy and conquer territories
  • Most notable, perhaps, is the paint-brush-stroke-like style in which you are encouraged to corral your troops and send them to other territories with the Oculus Touch controllers
  • This is a social app as well, and you can see your opponent across the expanse of the game board in front of you
  • I felt an awesome thrill and a palpable sense of pressure to guard my already captured territories before the game clock ran out, undoubtedly because you have another player in view

Unity Updates – featuring Tony Parisi, Global Head of VR/AR @ Unity

At the end of the month, Unity 5.6 will be released:

  • Physically based rendering and lighting, yielding much more realism
  • Significant optimization and latency reduction with single-pass rendering for mobile
  • Vulkan support with Unity 5.6

It’s clear that Unity brings a lot of different toolsets together… namely, the gaming and cinematic arts sectors seem to be bridged by Unity

How can those individuals working on VR––coming from Animator and Photo-capture backgrounds––use Unity to optimize their VR experiences and would you please share a few successful examples?

  • Unity is super excited about the cinematic use of the engine for storytelling that isn't about leveling up or gaining points. Rather, you have environments you can explore, and video game technology is used to tell stories.
  • This also ties into Unity's big focus on further enabling designers and artists in 2017, as Unity has historically been more of a programmer's tool. Coming in the Unity 2017 beta (later this year) is a feature called Timeline, a keyframe- and timeline-based animation system in which you can bring in keyframed 3D graphics, skinned characters, audio, and video, and synchronize them all on a linear timeline.

Now there's a video player that supports 360-degree playback, as well as 4K video: you can ingest a 360-degree video and then augment it for enhanced viewing.

Non-Gaming Examples #MadeWithUnity

Asteroids by Baobab

  • Asteroids is a follow-up to their popular "Invasion" (you can find Invasion on Daydream and GearVR). It looks like a feature-film-quality, Pixar-level piece. They have some novel locomotion mechanics, and you play a supporting character who has to participate to move the story along. A great, full, end-to-end narrative where there's only one way the story ends.

A Price of Freedom by Construct Studios

  • You play a secret agent, inspired by the MK Ultra experiments that the CIA was doing–to use psychedelics to create operatives.

The Life of Us by Within

  • Chris Milk's shop down in LA, known for amazing, breakthrough VR storytelling
  • They created a social VR experience here where you start out as a single-celled organism and move up to larger organisms (fish, primates, human, and finally a futuristic robot) and you collaborate with one other person
  • Use your Vive controllers to swim and fly
  • Tony says he’s never felt so embodied in a VR experience and attributes it largely to the nature of social interaction within the experience

Zero Days VR by Scatter

  • A Brooklyn-based shop focused on Cinematic VR
  • They combine video, audio, CG, and data visualization on a completely linear timeline, all synchronized using the Timeline product
  • It takes you through a VR version of a film documentary on the U.S. and Israeli intelligence agencies trying to sabotage an Iranian nuclear facility
  • It integrates interviews and voice-over, and it is super compelling human interest and high drama that you can move around in with the Oculus Rift

 

In Conclusion

Unity Focus Remains on Core Graphics and Physics

  • The upshot is that the core functionality of Unity will continue to serve clients producing all categories of content
  • There is a long, growing list of customers that are using the engine for non-gaming needs and that informs where the product will go in the future
  • PSVR is pushing a million units shipped, and Vive and Rift are each in the low hundreds of thousands of active units
  • Mobile: Cardboard is in the tens of millions, and the more deluxe drop-in headsets, Daydream and GearVR, are at over 5 million units; development here is catalyzing the growth of the overall VR industry's scale
  • However, you cannot replace what the higher-end systems are doing in terms of interaction and room scale
  • The hope is that the two trends–Mobile and High-End–converge
  • Unity supports 30+ platforms, which is one of its biggest draws as a development platform

Snapchat’s Latent AR Strategy

I recently shared that Sony will be debuting a 360˚ ad with Snapchat. The following is a research report put together in June by Matt Terndrup; it covers the potential AR strategy at Snapchat.

 

[Image: my Snapcode]

 

 

Snapchat's approach, which continuously educates the market with a novel app structure and an interface that needs few instructions, is appealing. This report does not cover Facebook's Instagram recently adopting Snapchat's Story functionality (I have written up some thoughts on how that has changed the way I use Instagram personally [to come separately]), nor Snapchat's most recent acquisition of Vurb, whose mission is to create a smarter, more connected mobile world that empowers people to do more of what they want, all in one app.
The highlights include:
  • The features of Snapchat that use AR already
  • Ahead of the curve: future success will be driven by talent from Oculus, Microsoft HoloLens, engineering at Qualcomm on the Vuforia team, and Emblematic Group
  • How small hints of Snapchat building stylish AR glasses have surfaced
Here is the table of contents; to read more, check the Scribd link at the end of the post.
Table of Contents:
  • Userbase
  • Snapchat’s AR History
    • Overlays
    • Lenses
    • 3D Stickers
  • Acquisitions
    • Vergence Labs
    • Looksery
    • Seene
  • Public Spotting
  • Engineering and Research Talent
  • Patent
  • Forecasts
  • Conclusion