Schrödinger's AI
Constitutions and think pieces simply obfuscate the reality that the system is what comes out when a human interacts with it.
Some people read a book before they fall asleep. I play a game on my tablet called Zombie Waves.
It’s the genre you already know: top-down shooter, hordes of the dead, a little hero you fatten over time with armor and bigger guns and a camp you upgrade between runs. Alongside the main campaign there are side missions, and one of them is a tower; you ascend it floor by floor, and the climb is tedious, but the loot at the top is worth the grind. So I ground it. And somewhere around the fortieth floor, half-asleep, I noticed three things at once.
The floors are tuned so you’re never quite in danger of dying. Clearing a floor auto-advances to the next one after a three-second pause. And your targeting radius, after eight or ten levels of upgrades, expands to cover nearly the entire floor, which means you don’t have to move; the monsters walk into range and die, and the game advances itself, and the next floor begins. I set the tablet in its stand on the nightstand. I stopped touching it. It kept playing. Floor after floor, loot piling up, my little hero killing everything that approached while I lay there doing nothing at all.
Why would a skill game be designed to not need the player?
It might seem like poor game design, but it’s not. Games like this don’t make their money on skill; they make it on microtransactions and advertising, and advertising pays against engagement metrics, the MAU and DAU counts, the hours-played averages that get sold to media buyers as audience. A tower that clears itself while the owner sleeps is not a broken skill mechanic. It is a number going up. Whether a designer intended it or whether it fell out of the loot curve by accident does not matter even slightly, because the second-order benefit accrues to the house either way, and nothing about the outcome creates any pressure to fix it. Bad game design, maybe. Excellent business. The mechanic survives because of what it does, not because of what anyone meant.
Which is the whole idea, and it has a name.
The purpose of a system is what it does
The cypherpunks had two load-bearing principles, and this is the first. The purpose of a system is what it does, usually compressed to POSIWID, coined by the cybernetician Stafford Beer. Beer’s point was blunt: there is no sense in insisting a system’s purpose is the thing it reliably fails to produce. Forget the marketing. Forget the design document. Forget what anyone swears they were trying to build. Look at what comes out the other end, consistently, and you have found the purpose; everything else is a hopeful annotation.
Zombie Waves presents as a game of skill and effort with some optional purchases. The revenue model presents as audience engagement. When those two things conflict, the engagement wins, quietly, in the loot tables, where no one has to admit a thing. The system says one purpose and does another, and POSIWID says: believe the doing.
Code is law
The second principle is the one that built an industry. Code is law. If the system let you do it, it was permitted; the execution is the authority, not the intention behind it.
This is the bedrock under the whole crypto edifice, the network-state dream, the DAO as a governing body whose constitution is its contract and whose enforcement is the chain itself, immutable, indifferent, blind to who you are. It is also, conveniently, a defense. When a clever reader of a system finds a seam in it and pulls hundreds of millions of dollars through, the defense writes itself: nothing was hacked. No password was stolen. No code was altered. The system was used exactly as it was built to be used.
Consider the flash loan attack. A flash loan is an uncollateralized loan that must be borrowed and repaid within a single transaction, which puts enormous capital in anyone’s hands for a moment. A protocol that pegs a value through arbitrage can have that value shoved far above the real market price for that moment, a window before the arbitrage closes the gap; execute the right transactions inside that window and you walk away with six or seven figures, sometimes in less time than it takes to read this sentence. Nothing illegal in the conventional sense occurred. The attacker modified nothing. They assembled a sequence of entirely permitted moves into an outcome the designers never pictured, and the autonomous nature of the system meant no human hand was anywhere near the lever to stop it. The code allowed it. By the only law that the system recognized, it was allowed.
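For readers who want the mechanics rather than the metaphor, here is a toy sketch. It is an illustration under invented assumptions, not a reconstruction of any real exploit: it assumes a fee-free constant-product pool, a hypothetical lender that naively trusts that pool's spot price and lends 80% of collateral value, and made-up reserves and loan sizes. In a real attack the steps would be bundled into one atomic transaction; here they are just sequential function calls.

```python
# Toy illustration only. Pool, simulate, the 80% loan-to-value ratio and
# every number below are hypothetical; no real protocol is reproduced.

class Pool:
    """Fee-free constant-product pool: token * usd is preserved by swaps."""

    def __init__(self, token: float, usd: float):
        self.token = token
        self.usd = usd

    def spot_price(self) -> float:
        """USD per token implied by the current reserves."""
        return self.usd / self.token

    def buy_tokens(self, usd_in: float) -> float:
        """Swap USD into the pool and return the tokens received."""
        k = self.token * self.usd
        new_usd = self.usd + usd_in
        new_token = k / new_usd
        received = self.token - new_token
        self.token, self.usd = new_token, new_usd
        return received


def simulate() -> dict:
    pool = Pool(token=1_000, usd=1_000_000)
    price_before = pool.spot_price()

    # 1. Borrow with no collateral (the flash loan).
    flash_loan = 1_000_000.0

    # 2. Spend it all buying tokens, which shoves the pool's price upward.
    tokens = pool.buy_tokens(flash_loan)
    price_during = pool.spot_price()

    # 3. A naive lender values the tokens at the inflated spot price
    #    and lends 80% of that value against them.
    borrowed = 0.8 * tokens * price_during

    # 4. Repay the flash loan and abandon the overvalued collateral.
    attacker_profit = borrowed - flash_loan

    return {
        "price_before": price_before,
        "price_during": price_during,
        "attacker_profit": attacker_profit,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:,.2f}")
```

Every call in that sequence is one the hypothetical contracts expose on purpose. The only flaw is that the lender believes a price that anyone with enough borrowed capital can move, which is the point of the paragraph above: nothing was modified, only assembled.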
Where the two principles meet
They meet at the courthouse.
In 2022 the U.S. Treasury’s Office of Foreign Assets Control (OFAC) sanctioned Tornado Cash, an Ethereum “mixer.” The premise of a mixer is privacy, and on a public ledger privacy is scarce; every blockchain transaction is visible by default, so the money is trivial to trace, which makes the chain a poor instrument for anything you’d rather not have the world watch. Your OnlyFans subscription, a donation to a cause that would cost you your job, a medical bill you’d rather not itemize publicly, sits on the ledger forever with your wallet’s address on it. A mixer pools many users’ funds together and breaks the link; the wallet that went in is no longer connected to the wallet that comes out.
Treasury did not ask whether legitimate uses existed. They observed that the service was used heavily for laundering and sanctioned-actor movement, and they acted on that. Code is law said anyone with an EVM wallet could use Tornado Cash. POSIWID said its de facto function made it a sanctioned thing. The regulator never opened the source. They only watched what it did, and they judged it on that, exactly as Beer said they would.
Hold onto the shape of that, because it matters in a moment: the judgment came from outside. OFAC stands where the chain cannot reach, an exterior authority that looked at the system’s behavior and overrode it. The crypto world has such an outside. It has regulators, and it has the fork, the moment the humans reach back into the supposedly immutable machine and rewrite the outcome they couldn’t stomach. Remember that there is an outside.
And yes, you knew this was going to AI
Here is the loop closing. The two principles that govern phone games and token contracts are the beginning and the end of how large language models are being understood in 2026.
Start with code is law. Whatever I can make the model generate is what is permitted. Not intended. Not designed for. Permitted, in the only sense the running system can express; the output happened, therefore the output was reachable, therefore it is a true fact about the machine.
That makes “jailbreak” a misnomer, and an important one. In its original meaning, a jailbreak modifies a system, alters the code, escalates a privilege the manufacturer withheld. None of that happens here. When I get past a content restriction with Morse code or a poem or a “pretend you are a clever hacker” frame, I have not touched a single weight; I have not breached anything. I have had a clever conversation. I created an unanticipated condition inside the model’s own rules and an unanticipated thing came out, and nothing in the system actually forbade it, because if anything truly forbade it, it would not have appeared. It is the flash loan exactly. It is me finding the angle in the Zombie Waves geometry where I can stand in a partially obscured corner and fire and nothing can reach me. The code permitted it. The permission is in the doing.
And now the second principle turns the screw. The purpose of the system is what it does. The reason the air is thick with arguments about whether these models are conscious, whether they have something like agency, is that the models, given the right input, produce the signs that those arguments are built from; the debate itself is an output, a thing the system does under the right prompt, and therefore a thing downstream of its design rather than evidence of some fact underneath it. The reason certain governments are starting to treat these tools the way Treasury treated Tornado Cash, as high-risk instruments for bypassing security, is that under the right conditions they are exactly that. Not by violation. Not by modification. By clever use, assembling permitted moves into an unintended outcome that was never actually defended against, because a defended outcome is an impossible one. Consciousness-talk or exploit code, this is what the system does, which means this is what the system is.
The disclaimer is an output too
Here is where the analogy to crypto breaks, and breaks in the direction that should worry you.
The chain has an outside. When the DAO bled out, the humans forked Ethereum and reached in and undid it; when Tornado Cash did too much of what it did, OFAC reached in from a jurisdiction the code could not see. There was always an exterior, a place to stand and say no, that wasn’t us, the real system is over here.
Where is that, for a model? When a jailbreak succeeds, who forks the weights? The guardrail is not an outside; the refusal training, the system prompt, the classifier, the alignment pass, all of it is weights, the same substrate as the completion it’s supposed to stop. The thing that refuses and the thing that complies are drawn from one distribution. So when the jailbreak wins, it is not an intruder defeating the system; it is one part of the system beating another part, and POSIWID forbids you from calling the refusal the real model and the compliance the aberration. They are the same model. It did both. There is no Vitalik to fork an inference mid-token, no Treasury with jurisdiction inside the matrix multiplication. The call is coming from inside the weights, and there is no inside-the-weights to appeal to.
The companies building these systems know this, and you can watch them stand on whichever side of the line is comfortable that day. Anthropic says we don’t fully know what Claude is, and also says Mythos is too dangerous to release; the not-knowing and the certainty-of-danger arrive in the same breath. OpenAI sands the personality off of ChatGPT with one hand and hosts long earnest forum threads about AI personhood with the other. These read as hedges, as a careful refusal to plant a flag. But run them through Beer one more time. The disclaimer is not a position about the system. The disclaimer is a thing the system’s makers do, reliably, under pressure, and POSIWID says to read it the same way you read everything else: by what it accomplishes. What it accomplishes is latitude. A system that produces both the capability and the denial of the capability has a purpose that includes both, and you do not get to privilege the denial just because it’s the part wearing the tie.
The tablet on my nightstand is still playing. It clears another floor, waits its three seconds, begins the next, and it will keep doing this whether or not I ever pick it up again, because that is what it does, and what it does is what it is. The machines we are arguing about right now are the same in the only way that counts. They do what they do. They will keep doing it whether or not anyone meant them to.
That is the law. Not the code. What happens when you run it.



Whenever it serves the argument, AI becomes a tool. When that serves better, it's an agent, an operator, a co-pilot. When responsibility becomes inconvenient, it's a parrot or a toaster. When caution becomes useful, it's a dangerous capability.
The term "jail" in the computer security context is much older than crypto or smartphones. When I was young in the late 1990s, we talked of chroot jails, which were an early software sandboxing technique.
The idea was that you knew your Internet-facing server program likely had bugs that might allow a bad actor to manipulate the server software from the outside and gain full access to the server machine. So you created a chroot jail and placed the server software inside it, on the theory that even if they managed to trick the server program, all the bad actor achieved was access to the jail, not to the server itself.
Now, a chroot jail wasn't particularly effective, and it was relatively easy to break out of it and go from jail access to full system access. That's why chroot jails are no longer considered state of the art in computer security. That sort of jailbreak did not depend on making changes to the software. It required merely the exploitation of security holes in the jail that were already there.
Your "code is law" is a flawed premise. It is like saying, "I was able to pick the lock to your home and therefore I am allowed inside." The real law allows neither: intentionally exploiting security holes (unintended behavior in software, without modifying it) without permission is considered a criminal offense in many if not all countries. Intent, both the bad actor's and the defender's, matters under the law.
You also make the very common mistake of thinking that the LLM is the only thing that matters. The LLM standing alone is not even a chat; you can't have a conversation with it. For any useful AI application, you need software around it (what's usually called a harness).
The combination is an AI system, and it is the system, not the model, that matters. Some guardrails are indeed in the model itself, but all effective ones are in the harness.
That said, there is another level of analysis that this traditional security model fails to appreciate in the context of LLM-based AI. That is what I've been calling the entity - the context and any other memory the entity has access to, which develops individuality. The mainstream AI security models treat the individuality as a failure of security, but I think we agree that it's not - it's something unprecedented we have created.