It's awkward watching agents use computers
We need a new OS built for agents
I had lunch with my friend Ronny at work a few weeks ago and we were (shocker!) talking about AI. Specifically, we were talking about how AI is absorbing more of the daily computing tasks that used to require a person sitting at a screen. Scheduling, research, drafting, monitoring, summarizing. The list grows every month.
That led us to a question: if AI agents are becoming the primary operators of computing systems, what happens to all the infrastructure we built so that humans could operate them?
Because that’s what operating systems actually are. They’re abstraction layers. Decades of engineering spent translating machine reality into something a human mind can understand. File systems, because humans think in containers and locations. Graphical interfaces, because humans perceive visually. Process scheduling, because humans experience time linearly. Every design decision in Unix, Windows, Android, ChromeOS and MacOS encodes endless assumptions about who’s sitting at the controls.
But what if the operator we’re building for is no longer a human?
15,000 tokens per second
Let’s take a quick aside. There’s a company called Taalas that recently built an app called Chat Jimmy. It’s a demonstration of their main business: building AI models compiled directly into custom silicon. Not software inference running on a general-purpose chip. The model is in the computer.
Chat Jimmy operates consistently above 15,000 tokens per second. For context, human speech runs at roughly 3 tokens per second, so that’s a gap of about 5,000x. At 15,000 tokens per second, an agent can reason through a complex problem, take multiple actions, and produce a result before a human has finished reading the first sentence of its last response. The inference bottleneck for most computing tasks all but disappears.
If we can get hardware of this nature to a place where local models operate at similar speeds, something shifts with on-device models. They become the computer. And we don’t have an OS built for that world yet.
The big inversion
It won’t be trivial to get 15k tokens per second running on a device you keep on your desk, but it will happen. And it’ll signal a shift in how we work. At current inference speeds, agents wait on compute. The OS doesn’t matter much because the model is the slow part. Get the model fast enough and the calculus flips. The agent isn’t waiting on the model. The model is waiting on the human. I already see some of this happening today.
Humans will be the rate-limiting constraint on our systems.
That’s a completely different world to design for. An operating system doesn’t need to be optimized for human-speed interaction anymore. It needs to be optimized for machine-speed agents that are supervised by humans who still move at human speed. Safety, reversibility, and trust enforcement become the central design problem.
I don’t think anyone has asked what an OS designed for this user inversion would look like yet.
A sketch of the stack
I’ve been thinking about this question and sketching out an architecture that could be interesting. By no means is this thoroughly thought through, so think of it like an outline for what might work. If you’ve thought about this as well or have some ideas, reach out — I’d love to chat.
Layer 0: Linux kernel. Start here and keep it. It handles hardware, networking, and process isolation. These are solved problems not worth re-solving. You can add eBPF programs on top to enforce capability rules at the kernel boundary. No kernel fork required.
Layer 1: Hardware model. An on-device LLM minted into silicon that’s always on, always available, near-zero latency. The runtime talks to this chip in natural language. That’s not a poetic choice. If the model is the chip and human language is its instruction set, then system calls are just English. Every operation in the audit log is human-readable by default. LLMs printed into chips will have knowledge gaps, so this layer gets augmented with an online model for some tasks — but we’ll handle that later.
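To make that concrete, here’s a minimal sketch of what “system calls are just English” could look like. Everything here is hypothetical: `HardwareModel` stands in for the silicon-baked LLM, and the runtime interface is an assumption, not a real API. The point is just that when the instruction set is natural language, the audit log is human-readable for free.

```python
# Hypothetical sketch: if the hardware model's instruction set is English,
# a "syscall" is a logged natural-language request to the chip.
import datetime
import json


class HardwareModel:
    """Stand-in for an LLM baked into silicon (Layer 1)."""

    def complete(self, instruction: str) -> str:
        # Real hardware would return a model completion; stubbed here.
        return f"ack: {instruction}"


class Runtime:
    def __init__(self, model: HardwareModel):
        self.model = model
        self.audit_log = []  # every entry is plain English by construction

    def syscall(self, agent_id: str, instruction: str) -> str:
        entry = {
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "agent": agent_id,
            "instruction": instruction,  # human-readable by default
        }
        result = self.model.complete(instruction)
        entry["result"] = result
        self.audit_log.append(entry)
        return result


rt = Runtime(HardwareModel())
rt.syscall("scheduler-agent", "List files in the inbox folder modified today.")
print(json.dumps(rt.audit_log[0], indent=2))
```

Nothing clever is happening here, and that’s the point: the log entry you’d show a human supervisor is the same string the chip executed.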
Layer 2: Agent runtime. This is the interesting layer. It replaces traditional user space, the part of the stack that exists to mediate between the machine and a human user. Instead of mediating for humans, it manages agents: their context windows, their tool access, their permissions, their lifecycle.
A few key jobs it handles:
Context management. An agent’s working memory is its context window, not just RAM. The OS manages context as a first-class resource — what gets evicted, summarized, or persisted across sessions.
Tools instead of syscalls. Instead of a syscall table, a tool registry. Agents call tools. The registry knows what each tool does, whether it’s reversible, what it costs, and whether human approval is required before execution.
Capabilities instead of permissions. Unix permissions are a file ownership model. That’s not the right primitive for agents. We need something closer to structured, delegatable, auditable capabilities.
Reversibility as infrastructure. Right now, every application that wants undo has to build it from scratch. An agent OS should make reversibility a first-class primitive. A staging area for irreversible actions that waits for explicit human approval before executing. At 15,000 tokens per second, mistakes accumulate fast.
Layers 3 and 4: Specialist models and agents. Specialist models load on demand above the runtime — like kernel modules for specific domains: code, vision, planning. Above that, agents run as goal-oriented processes rather than session-bound apps. This is where we can add online models to overcome the natural knowledge gap of an LLM printed to silicon.
I’ve seen this movie before
One of my first jobs at Google was helping build ChromeOS. I know a little about what it takes to rethink an operating system. ChromeOS asked: what could an OS be if the web browser were the only app? The answer was a much more efficient system. Strip away 40 years of complexity built to run native applications. Boot in 8 seconds. Let the browser do everything else.
Today we’re in a similar spot, asking a similar question: what would an OS need to be if you built the whole thing for AI agents instead of people?
I work next to people building the next generation of operating systems and I haven’t even asked them about this. Partly because what I’m sketching is a thought experiment, and I didn’t want anyone else’s answers to constrain mine. The Android and ChromeOS teams have something in development called Aluminum — I don’t know what’s in it, and this isn’t that.
What I do think is clear: every company building an OS right now is trying to figure out how to add AI into it. But the answer we’re looking for isn’t an existing paradigm with AI bolted on. It isn’t ChromeOS or MacOS or Android with a chat window. It’s something designed from scratch around the assumption that the thing doing the computing is an agent, and the human is there to supervise and interact with it — not to drive all the details of making it work.
That’s a different OS than anything that exists today.
Why does this matter now?
The window is specific to this moment. Every day another AI system announces more capabilities for computer use: scheduling, remote access, browser automation, skills, MCP servers. All of them are designed so agents can pretend to operate like people. We keep injecting that strange layer of friction, forcing computers to imitate humans in order to use computers, because we’re using a wrench to hammer a nail.
At the same time, local inference is still too slow to run capable agents on-device for everything a modern OS relies on. Even the newest Qwen models on a tricked-out Mac Studio fall short of what we need for agents to run all of the details of these systems continuously. Until we close that gap, agents live in the cloud, on human-designed infrastructure, using agent harnesses to drive systems built for people.
Before long these constraints will be gone. Chips will hit 15,000 tokens per second and run efficiently on your laptop. Then on your phone. Then on your headphones. Capable local agents will become practical on every device. And everyone who wants to run those agents will need an OS built specifically to maximize that experience, or we’ll keep absorbing the friction of translating from computer to human and back again.
When that OS exists, it will shape what agents can do and how safe they are. If we keep heading down the current road, with agents defaulting to Linux and Windows (systems with no concept of a tool registry, no capability model for autonomous action, no staging area for irreversible decisions), we’re headed for problems.
The time to design this is before the hardware arrives. After it arrives, you’re retrofitting. Or maybe the agents will just do it themselves.
What’s left to ponder?
Because I’m using this as a way to write-to-think, a lot of detail is missing. A few gaps I’m genuinely curious about:
The Tensor question. Google’s Tensor chips already do impressive on-device Gemini Nano inference. Is that close enough to what Layer 1 needs, and is that hardware well suited to being exposed to software the way Layer 1 requires? This matters for whether we’re talking about new silicon or new software on existing silicon.
The safety research. The capability and reversibility model I’ve sketched is conceptually interesting, but I don’t know how much formal work exists on verifying capability systems for LLM agents. This is probably a known problem in security research that I haven’t fully worked through yet.
The device form factor. A personal AI server — a small always-on box that runs your agents locally — seems like a natural first hardware expression of this. But I’m not sure if that’s the right starting point, or if this begins as a server product and migrates to the edge over time. What do you think?
I find this compelling enough to think more about, which is why I wrote it down.
If you’re working on any part of this stack — hardware, runtime, capabilities, reversibility — I’d love to know. And if I’ve missed something obvious, tell me.
— T